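I was recently looking at a product and needed to analyze binaries compiled for MIPS. IDA failed to decompile them, so I switched to Ghidra, which in turn led to a need for automated batch decompilation.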

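About Ghidra

Ghidra is a software reverse engineering framework created and maintained by the NSA. It was open-sourced in 2019.

The project is hosted on GitHub at https://github.com/NationalSecurityAgency/ghidra.
At the time of writing it had 1.3k open and 2.9k closed issues, and 183 open and 947 closed PRs.

The latest Ghidra release at the time of writing was 10.4, published on September 29, 2023.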

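Installation and usage

Installing Ghidra is simple. Following the official guide:

Install JDK 17: https://adoptium.net/temurin/releases
Download the latest Ghidra release and extract it
Run ghidraRun.bat (on Linux or macOS, ./ghidraRun)

To decompile a file with Ghidra, first create a project and then import the file into it. A whole folder can also be imported in one batch.

Open the file you want to decompile and select the function to analyze. The decompiled code appears in the window on the right.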

Tips

Switching themes

In the Ghidra main window, go to Edit→Theme→Switch.

Highlighting identical variable names

Middle-click a variable to highlight every occurrence of that name.

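Automated batch decompilation

There were too many files to analyze, and decompiling them one by one in the GUI was not realistic, so the process had to be automated.

Searching online for similar scenarios, I found the following two links very useful:

The links above give this command for running Ghidra in headless mode: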

./analyzeHeadless ghidra-project-directory -import binary-file -postscript yourpythonscript

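(Note: the command above is wrong and fails when actually run. At the very least, it omits the project name, which analyzeHeadless expects right after the project directory.)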

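Running headless analysis

The file support/analyzeHeadlessREADME.html in the Ghidra installation directory documents headless mode in detail.

The GUI must be closed while headless analysis runs.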

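The command that finally worked in my own tests was:

.\support\analyzeHeadless.bat "E:\projects\xxxxxxxxxxxx\xx_mips" "xx_mips/xxxx_files" -process "*.so" -postscript ./decompile.py

"E:\projects\xxxxxxxxxxxx\xx_mips": the project's location on the local disk
"xx_mips[/xxxx_files/]": the project name, optionally followed by a folder inside the project. Note: this is the path as shown in the Ghidra GUI
-process "*.so": the files to process, which must already be imported into the project. Ghidra supports the * and ? wildcards here
-postscript ./decompile.py: the post-analysis script, which in this case is the Python script that performs the automated decompilation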
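
When the command needs to be launched repeatedly or from another tool, it can help to assemble it as an argument list rather than a single string. The following is only a minimal sketch: the helper name `build_headless_command` is hypothetical, and the paths are the placeholders used in this article.

```python
import subprocess


def build_headless_command(launcher, project_dir, project_path, pattern, script):
    """Assemble the analyzeHeadless arguments in the order described above."""
    return [
        launcher,        # path to analyzeHeadless.bat
        project_dir,     # project location on disk
        project_path,    # project name, optionally followed by /folder
        "-process", pattern,
        "-postscript", script,
    ]


cmd = build_headless_command(
    r".\support\analyzeHeadless.bat",
    r"E:\projects\xxxxxxxxxxxx\xx_mips",
    "xx_mips/xxxx_files",
    "*.so",
    "./decompile.py",
)

# Show the command line that would be executed.
print(subprocess.list2cmdline(cmd))

# Uncomment to actually run it from the Ghidra installation directory:
# subprocess.check_call(cmd)
```

Passing the arguments as a list means the "*.so" pattern is handed to Ghidra unchanged instead of being interpreted by a shell first.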

Decompilation script

The script given in the links above is mostly usable:

from ghidra.app.decompiler import DecompInterface
from ghidra.util.task import ConsoleTaskMonitor

# currentProgram is predefined by Ghidra for every script
program = currentProgram

# Start a decompiler session for this program
decompinterface = DecompInterface()
decompinterface.openProgram(program)

# Iterate over all functions in the program (True = forward order)
functions = program.getFunctionManager().getFunctions(True)
for function in list(functions):
    print(function)
    # Decompile each function and print the resulting C code
    tokengrp = decompinterface.decompileFunction(function, 0, ConsoleTaskMonitor())
    print(tokengrp.getDecompiledFunction().getC())

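Setting decompiler options

With the code from the links used as-is, the decompiled output contains many expressions such as (*(code *)PTR_snprintf_00056138), whereas the GUI shows plain snprintf for the same calls.

In the GUI's Decompiler window, you can choose to view the decompiler's debug information, which includes the options in effect there: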

<optionslist>
    <readonly>on</readonly>
    <setlanguage>c-language</setlanguage>
    <indentincrement>4</indentincrement>
    <protoeval>__stdcall</protoeval>
</optionslist>

The Ghidra API reference is at docs/GhidraAPI_javadoc/api/index.html.

Following the API documentation, set the same decompiler options in the script:

decompileOptions = DecompileOptions()
decompileOptions.setProtoEvalModel("__stdcall")
decompinterface.setOptions(decompileOptions)
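
Separately from the option above, output that has already been generated can be cleaned up textually. This is not what the article does: the sketch below is a regex-based post-processing step, for readability only, that assumes the pointer-call expressions always have the form seen earlier, `(*(code *)PTR_<name>_<hex address>)`. The function name `simplify_pointer_calls` is hypothetical.

```python
import re

# Matches "(*(code *)PTR_<name>_<hex address>)" and captures <name>.
PTR_CALL = re.compile(r"\(\*\(code \*\)PTR_(\w+)_[0-9a-fA-F]+\)")


def simplify_pointer_calls(source):
    """Replace pointer-call expressions with the bare function name."""
    return PTR_CALL.sub(r"\1", source)


sample = "(*(code *)PTR_snprintf_00056138)(buf, 0x40, fmt);"
print(simplify_pointer_calls(sample))  # prints: snprintf(buf, 0x40, fmt);
```

Because this rewrites text rather than changing how the decompiler resolves calls, it should be treated as a reading aid, not as a substitute for the decompiler options.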

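Final script

I made the following changes of my own:

Write the decompiled code to a file
Skip decompilation of external functions
Set the decompiler options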

#!encoding:utf8

from ghidra.app.decompiler import DecompInterface, DecompileOptions
from ghidra.util.task import ConsoleTaskMonitor

import re
import os


OVERWRITE = True
DECOMPILED_SOURCE_FOLDER = "xxxxxxxxx"


# get the current program
# here currentProgram is predefined
program = currentProgram
programName = program.getName()

print("[+] Current Program is {}, type = {}".format(programName, type(program)))

decompiledSourceName = "{}/{}.c".format(DECOMPILED_SOURCE_FOLDER, programName)
if not OVERWRITE and os.path.exists(decompiledSourceName):
    exit()

f = open(decompiledSourceName, "w")


decompinterface = DecompInterface()
decompinterface.openProgram(program)

# Set the decompiler options
decompileOptions = DecompileOptions()
decompileOptions.setProtoEvalModel("__stdcall")
decompinterface.setOptions(decompileOptions)

functions = program.getFunctionManager().getFunctions(True)


def WriteDecompiledSource(f, functionName, functionCode):
    functionCodeStartInfo = "// ============ {} Start ===============\n".format(
        functionName
    )
    functionCodeEndInfo = "// ============ {} End ===============\n\n\n".format(
        functionName
    )
    functionCode = str(functionCode)
    functionCode = re.sub(r"[\r\n]+", "\n", functionCode)

    f.write(functionCodeStartInfo)
    f.write(functionCode)
    f.write(functionCodeEndInfo)


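for function in list(functions):
    # Skip external functions
    if function.isExternal() or "::" in str(function):
        continue

    print("[+] Decompiling {}".format(function))
    # Decompile the function; the second argument is the timeout in seconds
    tokengrp = decompinterface.decompileFunction(function, 0, ConsoleTaskMonitor())
    if not tokengrp.decompileCompleted():
        # No decompiled function is available if decompilation failed
        print("[-] Failed on {}: {}".format(function, tokengrp.getErrorMessage()))
        continue

    funcName = str(function.getSymbol())
    code = tokengrp.getDecompiledFunction().getC()
    WriteDecompiledSource(f, funcName, code)

# Flush the output file and release the decompiler process
f.close()
decompinterface.dispose()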

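Other issues

Batch decompilation performance

In an actual batch run with the default configuration, decompiling 185 files took about 50 minutes.
CPU, GPU, and memory usage all stayed low during the run.

According to the manual, -noanalysis and -max-cpu can improve performance somewhat.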

-noanalysis
If present, executables will not be analyzed (auto-analysis occurs by default).

-max-cpu <max cpu cores to use>
Sets the maximum number of CPU cores to use during headless processing (must be an integer). Setting max-cpu to 0 or a negative integer is equivalent to setting the maximum number of cores to 1.

With -noanalysis added, decompiling the same 185 files took about 40 minutes;
additionally setting -max-cpu had no noticeable effect on speed.
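
To repeat such a comparison, the flag variants can be generated and timed from a small driver script. This is only a sketch under assumptions: `with_flags` and `timed` are hypothetical helper names, and `base_cmd` stands for whatever analyzeHeadless argument list is already in use.

```python
import time


def with_flags(base_cmd, no_analysis=False, max_cpu=None):
    """Return a copy of base_cmd with the optional headless flags appended."""
    cmd = list(base_cmd)
    if no_analysis:
        cmd.append("-noanalysis")
    if max_cpu is not None:
        cmd.extend(["-max-cpu", str(int(max_cpu))])
    return cmd


def timed(run, cmd):
    """Call run(cmd) and return the elapsed wall-clock time in seconds."""
    start = time.time()
    run(cmd)
    return time.time() - start


# Hypothetical base command; replace it with the real argument list.
base_cmd = ["analyzeHeadless.bat", "project_dir", "project_name"]
variants = {
    "default": with_flags(base_cmd),
    "noanalysis": with_flags(base_cmd, no_analysis=True),
    "noanalysis+max-cpu": with_flags(base_cmd, no_analysis=True, max_cpu=4),
}
for name in sorted(variants):
    print(name, variants[name])
```

In a real run, something like `timed(subprocess.check_call, variants["noanalysis"])` would give the elapsed time for one variant. The value 4 for -max-cpu is an arbitrary example.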

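Debugging the decompilation script

Writing and debugging the script is inconvenient: you have to print variable types by hand and then look them up in the manual.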
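
One way to make this manual inspection a little less tedious is a small helper that prints an object's type together with its public member names, so the right class can be looked up in the API reference. The helper name `describe` is hypothetical, and it relies only on Python built-ins, not on any Ghidra API.

```python
def describe(obj, limit=None):
    """Return the type name of obj and a list of its public member names."""
    names = [n for n in dir(obj) if not n.startswith("_")]
    shown = names[:limit]
    return "type: {}\npublic members ({}): {}".format(
        type(obj).__name__, len(names), ", ".join(shown)
    )


# Demonstration on a plain Python list.
print(describe([1, 2, 3]))
```

Inside a Ghidra script, the same call could be applied to objects such as currentProgram or a function returned by the function manager, for example `print(describe(currentProgram))`.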

Automated batch decompilation with Ghidra