Unflattening the Maze: Automating CFF Deobfuscation Using Hex-Rays Microcode (Part 2)
1. Limitations of Miasm compared to Hex-Rays microcode
Although Miasm is a great data flow analysis tool, it shows two major disadvantages in the final step, when the recovered function has to be patched back at the machine-code level:
Limited patching space: At the machine-code level, each block of instructions has a fixed size. If we replace a short jump with a longer jump (JMP) encoding that needs more bytes than the original instruction occupies, there is no room left to write it. The patch then spills into the instructions or data that follow and corrupts the structure of the executable.
Call and relative-offset instructions: The original blocks often contain function calls (CALL) and relative jumps, whose operands are offsets computed from the instruction's own position. When we patch at the machine-code level, keeping every one of these offsets accurate is difficult and error-prone.
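To make the space problem concrete, here is a minimal sketch that encodes an x86 relative jump and checks whether it still fits in the bytes being replaced. The function names and addresses are made up for illustration; the only facts assumed are the two standard encodings, EB rel8 (2 bytes) and E9 rel32 (5 bytes).

```python
import struct

SHORT_JMP_SIZE = 2  # EB rel8
NEAR_JMP_SIZE = 5   # E9 rel32


def encode_jmp(src_ea, dst_ea):
    """Encode a relative JMP from src_ea to dst_ea, preferring the short form.

    The displacement is relative to the address of the next instruction,
    so it depends on the size of the jump itself.
    """
    rel8 = dst_ea - (src_ea + SHORT_JMP_SIZE)
    if -128 <= rel8 <= 127:
        return b"\xEB" + struct.pack("<b", rel8)
    rel32 = dst_ea - (src_ea + NEAR_JMP_SIZE)
    return b"\xE9" + struct.pack("<i", rel32)


def fits_in_place(src_ea, dst_ea, available_bytes):
    """Return True if the re-targeted jump fits in the bytes being replaced."""
    return len(encode_jmp(src_ea, dst_ea)) <= available_bytes


# Hypothetical addresses: the same jump site, re-targeted near and far
nearby = encode_jmp(0x401000, 0x401010)
distant = encode_jmp(0x401000, 0x401400)
```

With only two bytes available at the patch site, `fits_in_place(0x401000, 0x401400, 2)` is false: the re-targeted jump needs the 5-byte form, and the three extra bytes would overwrite whatever follows.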
Modifying the microcode through a Hex-Rays plugin avoids both problems. Instead of worrying about individual machine-code bytes, we only change the logic of the function's graph; Hex-Rays then recalculates the jumps and generates clean pseudocode for us.
2. Introducing Microcode
Manually patching bytes risks corrupting the file, so we take a different approach: deobfuscating with Hex-Rays Microcode. For an introduction to Hex-Rays Microcode, you can read the details on Elastic's blog. In this article, I assume you already have some familiarity with Microcode.
Here is a brief recap. When we press F5 in IDA Pro to view C pseudocode, Hex-Rays does not translate assembly directly to C, because the semantic gap between the two languages is too large. Instead, it goes through an intermediate representation (IR) called Microcode. The process works as follows:
Step 1: Assembly decoding. Hex-Rays takes machine code from hardware architectures such as x86 or ARM and decodes it into basic assembly instructions.
Step 2: Conversion to an intermediate language (IL). Hardware-dependent instructions are translated into linear microcode. Differences in registers and instruction sets across architectures are abstracted away to produce a common representation.
Step 3: Abstract syntax tree (AST) construction. Instead of staying in flat form, the microcode is reorganized into a graph of blocks, with an abstract syntax tree inside each block. In this form, optimization plugins can easily analyze operands, prune tangled control flow, and safely reconnect code blocks.
Step 4: C pseudocode generation. Once the tree has been cleaned and optimized, the decompiler traverses it to emit readable C code, which makes the program's actual logic easier for the analyst to follow.
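As a rough illustration of going from the flat form of Step 2 to the block graph of Step 3, the sketch below splits a flat instruction list into basic blocks using the classic "leader" rule. It is a toy model: the tuple format, the opcode names, and `split_into_blocks` are invented for this example and do not reflect how Hex-Rays builds its block graph internally.

```python
def split_into_blocks(insns):
    """Split a flat instruction list into basic blocks.

    Each instruction is a tuple (opcode, target), where target is the index
    of the jump destination or None. A new block starts at index 0, at
    every jump target, and right after every jump.
    """
    jumps = {"goto", "jz", "jnz"}
    leaders = {0}
    for idx, (opcode, target) in enumerate(insns):
        if opcode in jumps:
            leaders.add(target)
            if idx + 1 < len(insns):
                leaders.add(idx + 1)
    starts = sorted(leaders)
    ends = starts[1:] + [len(insns)]
    # Each block is the list of instruction indexes it contains
    return [list(range(s, e)) for s, e in zip(starts, ends)]


# Hypothetical flat listing; the comments give each instruction's index
flat = [
    ("mov", None),   # 0
    ("jz", 4),       # 1
    ("add", None),   # 2
    ("goto", 5),     # 3
    ("sub", None),   # 4
    ("ret", None),   # 5
]
blocks = split_into_blocks(flat)
```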
3. Implementation
The sample in this part is similar to the one in Part 1. The implementation steps are also similar; only the tool we use is different.
Step 1: Identify the Dispatcher
As in the previous script, the plugin begins by scanning all microcode blocks and picks the dispatcher based on the number of predecessors (incoming parent blocks) each block has.
# Identify the dispatcher from predecessor-count (predset) statistics.
# The block with the most predecessors is chosen as the candidate.
def filter_pred(self):
    pred_blk = {}
    blk_qty = self.mba.qty
    # Skip the first block (0) and the last block (qty - 1)
    for i in range(1, blk_qty - 1):
        current_blk = self.mba.get_mblock(i)
        pred_blk[current_blk] = len(current_blk.predset)
    if not pred_blk:
        return None
    pred_blk = sorted(pred_blk.items(), key=lambda item: item[1], reverse=True)
    dispatcher = pred_blk[0][0]
    # Preliminary check: verify that the block is a plausible dispatcher
    if dispatcher.serial in range(1, 5) and len(dispatcher.predset) >= 3:
        return dispatcher
    return None
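The same heuristic can be tried outside IDA on a plain successor map. This is a minimal sketch under that assumption: `find_dispatcher`, the `cfg` dictionary, and the block numbers are hypothetical and only mirror the idea behind `filter_pred`, not the plugin's code.

```python
from collections import defaultdict


def find_dispatcher(succs, min_preds=3):
    """Return the block with the most predecessors, or None.

    succs maps a block serial to the list of its successor serials.
    """
    pred_count = defaultdict(int)
    for src, dsts in succs.items():
        for dst in dsts:
            pred_count[dst] += 1
    if not pred_count:
        return None
    candidate = max(pred_count, key=pred_count.get)
    return candidate if pred_count[candidate] >= min_preds else None


# Hypothetical flattened CFG: block 1 is the dispatcher, blocks 2-3 are
# backbone comparisons, and blocks 4-5 are real code that jumps back to 1.
cfg = {
    0: [1],
    1: [2],
    2: [4, 3],
    3: [5, 6],
    4: [1],
    5: [1],
    6: [7],
    7: [],
}
dispatcher = find_dispatcher(cfg)
```

The `min_preds` threshold plays the same role as the `>= 3` check above: a block with only one or two predecessors is unlikely to be the hub of a flattened function.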
Step 2: Build the state map from the Backbone
Once the dispatcher is known, the plugin analyzes the comparison blocks that follow it (the backbone) and builds a mapping from each value of the state variable to the code block that actually handles it.
# Map state values to their corresponding destination blocks
def find_block_status(self):
    blk_qty = self.mba.qty
    status_blk = {}
    statis_state_var = {}
    for i in range(1, blk_qty - 1):
        current_blk = self.mba.get_mblock(i)
        # Type 1 backbone: the block is empty or holds a single goto,
        # so the comparison is recovered from the assembly instead
        if ((current_blk.head is None and current_blk.tail is None) or
                (current_blk.tail.ea == current_blk.head.ea and current_blk.head.opcode == m_goto)):
            insn = ida_ua.insn_t()
            if not ida_ua.decode_insn(insn, current_blk.start):
                continue
            if insn.ops[0].type == ida_ua.o_reg and insn.ops[1].type == ida_ua.o_imm:
                # Map the constant to the successor block
                status = insn.ops[1].value & 0xFFFFFFFF
                # ... logic to determine jz/jnz destination ...
                status_blk[status] = current_blk.succset[0]
                self.clean_blk.add(current_blk.serial)
        # Type 2 backbone: analyzed directly from the microcode instructions
        elif current_blk.tail is not None and current_blk.tail.opcode in [m_jz, m_jnz]:
            mins = current_blk.tail
            if mins.r.is_constant():
                status = mins.r.value(False)
                # ... logic to determine destination ...
                self.clean_blk.add(current_blk.serial)
    return status_blk
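The destination logic is elided in the snippet above. As a rough idea of what such a mapping involves, here is a toy model that runs without IDA. The tuple layout, `build_state_map`, and the constants are assumptions made for this sketch, not the plugin's actual code; the only fact relied on is that a jz is taken when the state equals the constant, while a jnz falls through in that case.

```python
def build_state_map(backbone):
    """Map each state constant to the block that handles it.

    backbone is a list of (opcode, constant, jump_target, fallthrough)
    tuples, one per comparison block.
    """
    state_map = {}
    for opcode, constant, jump_target, fallthrough in backbone:
        key = constant & 0xFFFFFFFF  # normalize to an unsigned 32-bit value
        if opcode == "jz":
            # Jump taken when state == constant
            state_map[key] = jump_target
        elif opcode == "jnz":
            # Jump taken when state != constant, so the match falls through
            state_map[key] = fallthrough
    return state_map


# Hypothetical backbone with two comparison blocks
backbone = [
    ("jz", 0x1A2B3C4D, 10, 3),
    ("jnz", -0x10, 4, 11),
]
state_map = build_state_map(backbone)
```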
Step 3: Trace and link actual code blocks
In this phase, the plugin finds the end of each real code block and determines where it intends to jump next by tracing the instructions that assign a value to the state variable.
This is where the approach differs from Part 1. Instead of patching bytes to modify assembly instructions, we change the links between blocks, namely predset and succset, so that they point to the correct order, as in the illustration below.
# Re-link blocks into their logical execution order
def link_blocks(self):
    is_changed = False
    jump_opcodes = [m_goto, m_jcnd, m_jz, m_jnz, m_ja, m_jae, m_jb, m_jbe, m_jg, m_jge, m_jl, m_jle]
    for src_blk, dst_blk in self.linking_blk.items():
        tail = src_blk.tail
        old_succs = list(src_blk.succset)
        if tail and tail.opcode in jump_opcodes:
            # Update the destination of the existing jump
            if tail.opcode == m_goto:
                tail.l._make_blkref(dst_blk.serial)
                # ... setup succset and predecessors ...
            else:
                # Update the conditional jump target
                tail.d._make_blkref(dst_blk.serial)
                # ... handle fallthrough logic ...
        else:
            # Insert a new goto if the block does not end in a jump
            new_insn = minsn_t(tail.ea if tail else src_blk.start)
            new_insn.opcode = m_goto
            new_insn.l._make_blkref(dst_blk.serial)
            src_blk.insert_into_block(new_insn, src_blk.tail)
        src_blk.mark_lists_dirty()
        is_changed = True
    return is_changed
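The bookkeeping hidden behind the "setup succset and predecessors" comment can be pictured with a small graph model. The sketch below is a simplified illustration using plain dictionaries; `build_preds`, `relink`, and the block numbers are hypothetical and stand in for the succset and predset updates, not for the Hex-Rays API itself.

```python
def build_preds(succs):
    """Derive the predecessor lists from a successor map."""
    preds = {serial: [] for serial in succs}
    for src, dsts in succs.items():
        for dst in dsts:
            preds[dst].append(src)
    return preds


def relink(succs, preds, src, old_dst, new_dst):
    """Redirect the edge src -> old_dst to src -> new_dst.

    Both directions are updated so the successor and predecessor
    lists never disagree.
    """
    succs[src] = [new_dst if s == old_dst else s for s in succs[src]]
    preds[old_dst].remove(src)
    if src not in preds[new_dst]:
        preds[new_dst].append(src)


# Hypothetical graph: dispatcher 1 feeds blocks 4 and 5, both jump back to 1
succs = {1: [4, 5], 4: [1], 5: [1]}
preds = build_preds(succs)

# Tracing showed that block 4 sets the state that selects block 5
relink(succs, preds, src=4, old_dst=1, new_dst=5)
```

Updating both sides in one place is the design point: if only the jump target changes while the predecessor list of the old destination still names the source block, the graph is inconsistent.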
Step 4: Clean up malware code and optimize
Finally, the plugin removes the dispatcher blocks, the junk instructions that assign the state variable, and even the complex Mixed Boolean-Arithmetic (MBA) computations added to mislead analysts. We replace the MBA instructions with nop instructions and then let IDA optimize the rest.
# Remove unnecessary instructions such as state-variable assignments and MBA logic
def remove_state_mov_insns(self, state_backbone, state_relevant):
    # ... logic to find and NOP instructions involving state variables ...
    minsn.opcode = m_nop
    is_changed = True

def remove_assignment_mba(self):
    # Remove complex Mixed Boolean-Arithmetic (MBA) instructions
    opcode_mba = [m_add, m_sub, m_mul, m_and, m_or, m_shr, m_shl]
    # ... logic to identify and NOP complex MBA patterns ...
    minsn.opcode = m_nop
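For readers who have not met MBA before, the sketch below shows why such expressions are hard to read: a plain addition is rewritten as a mix of XOR, AND, and multiplication that computes the same value. The sample's actual MBA patterns are not shown in this article, so the identity used here is a standard textbook one chosen only for illustration, and the function names are hypothetical.

```python
def plain_add(x, y, bits=32):
    """Ordinary addition, truncated to the given register width."""
    mask = (1 << bits) - 1
    return (x + y) & mask


def mba_add(x, y, bits=32):
    """The same addition written as the MBA expression (x ^ y) + 2 * (x & y).

    XOR adds the bits without carries, AND finds the carry positions,
    and multiplying by 2 shifts the carries into place.
    """
    mask = (1 << bits) - 1
    return ((x ^ y) + 2 * (x & y)) & mask


# Both forms agree on every pair of a small sample of inputs
samples = [0, 1, 7, 0x1234, 0x7FFFFFFF, 0xFFFFFFFF]
agree = all(mba_add(a, b) == plain_add(a, b) for a in samples for b in samples)
```

Real obfuscators chain many such rewrites, and the arithmetic and bitwise operators involved are the kind of opcodes listed in `opcode_mba` above.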
The result is shown in the image below. In this version, the pseudocode looks cleaner than in Part 1 thanks to the extra optimization of removing the MBA instructions. The full source code is on GitHub.



