Unflattening the Maze: Automating CFF Deobfuscation using Microcode Hex-Rays (P2)


1. Limitations of Miasm compared to Hex-Rays Microcode

Although Miasm is a great data-flow analysis tool, patching machine code in the final step of function recovery reveals two major disadvantages:

  • Code patching space limitations: In machine code, every instruction occupies a fixed number of bytes. If we replace a short jump with a longer jump (a JMP with a 32-bit displacement) that needs more bytes than the block has available, there is no room for it. The patch then overwrites the instructions or data located right after it, corrupting the structure of the executable file.

  • Problems with call and offset instructions: The original execution blocks often contain function calls (CALL) or relative jumps. These instructions encode offsets calculated from their own position, so when code is patched or moved at the machine-code level, keeping every offset accurate is extremely difficult.
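
Both constraints come down to simple arithmetic. The sketch below is only an illustration and is not part of the plugin: the helper names and addresses are made up. It shows that a 2-byte short jump (`EB rel8`) only reaches targets within a signed 8-bit displacement, that anything further needs the 5-byte form (`E9 rel32`), and that a `CALL rel32` must be re-encoded whenever its bytes are moved.

```python
import struct

SHORT_JMP_SIZE = 2  # EB rel8
NEAR_JMP_SIZE = 5   # E9 rel32


def encode_jmp(src, dst):
    """Encode an x86 JMP located at `src` that goes to `dst`.

    The displacement is relative to the address of the next instruction,
    so the short form is only usable when it fits in a signed byte.
    """
    rel8 = dst - (src + SHORT_JMP_SIZE)
    if -128 <= rel8 <= 127:
        return b"\xEB" + struct.pack("<b", rel8)
    rel32 = dst - (src + NEAR_JMP_SIZE)
    return b"\xE9" + struct.pack("<i", rel32)


def relocate_call(call_bytes, old_addr, new_addr):
    """Re-encode a 5-byte CALL rel32 (opcode E8) after moving it,
    so that it still reaches the same absolute target."""
    assert len(call_bytes) == 5 and call_bytes[0] == 0xE8
    (old_rel,) = struct.unpack("<i", call_bytes[1:])
    target = old_addr + 5 + old_rel
    return b"\xE8" + struct.pack("<i", target - (new_addr + 5))
```

A nearby target fits in two bytes, while a distant one needs five. If the original block only had room for the short form, the three extra bytes land on whatever follows it.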

Modifying Microcode through a Hex-Rays plugin eliminates these problems. Instead of caring about every byte of machine code, we only change the logic of the function graph; Hex-Rays then recalculates all the jumps and generates clean pseudocode for us.

2. Introduction to Microcode

Manually patching bytes risks corrupting the file, so we have another option: deobfuscating with Hex-Rays Microcode. For an introduction to Hex-Rays Microcode, see Elastic's blog. In this article, I assume you already have some knowledge of Microcode.

In brief: when we press the magic F5 key in IDA Pro to see C pseudocode, Hex-Rays does not translate assembly directly into C, because the semantic gap between the two languages is too large. Instead, it goes through a powerful intermediate representation (IR) called Microcode. The processing steps are as follows:

  • Step 1: Assembly decoding. Hex-Rays receives machine code from various hardware architectures like x86 or ARM and decodes them into basic assembly instructions.

  • Step 2: Conversion to an intermediate language (IL). Hardware-dependent instructions are translated into linear Microcode. Differences in registers and CPU instruction sets across architectures are abstracted away to create a common standard.

  • Step 3: Abstract syntax tree (AST) construction. Instead of leaving the code in flat form, Microcode is reorganized into a graph of blocks, with abstract syntax trees inside each block. In this tree format, optimization plugins can easily analyze operands, prune tangled control flow, and safely connect code blocks back together.

  • Step 4: C pseudocode generation. After the syntax tree has been cleaned and fully optimized, the decompiler traverses this entire tree structure to output friendly C code, making it easy for the analyst to read and understand the actual logic of the program.

3. Implementation

The sample used in this part is similar to the one in part 1, and the implementation follows the same steps; only the tool differs.

Step 1: Identify the Dispatcher

As in the previous script, the plugin begins by scanning all Microcode blocks and picks the dispatcher based on the number of predecessors (incoming edges) of each block.

    # Identifies the dispatcher from predecessor-count (predset) statistics:
    # the block with the most predecessors is the dispatcher candidate.
    def filter_pred(self):
        best_blk, best_count = None, 0
        # Skip the first and last blocks of the block array
        for i in range(1, self.mba.qty - 1):
            current_blk = self.mba.get_mblock(i)
            count = len(current_blk.predset)
            if count > best_count:
                best_blk, best_count = current_blk, count

        # Preliminary check: a valid dispatcher has a serial between 1 and 4
        # and at least three predecessors
        if best_blk is not None and 1 <= best_blk.serial <= 4 and best_count >= 3:
            return best_blk
        return None
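
The same heuristic can be tried outside IDA. The sketch below is a toy model rather than the plugin's code: the control-flow graph is a plain dictionary, and `find_dispatcher`, `toy_cfg`, and `min_preds` are hypothetical names introduced for the example.

```python
from collections import defaultdict


def find_dispatcher(succs, min_preds=3):
    """Return the block with the most predecessors, or None when no block
    has at least `min_preds` of them.

    `succs` maps a block id to the list of its successor ids (a toy CFG
    standing in for the microcode block array).
    """
    preds = defaultdict(set)
    for src, dsts in succs.items():
        for dst in dsts:
            preds[dst].add(src)
    if not preds:
        return None
    best = max(preds, key=lambda blk: len(preds[blk]))
    return best if len(preds[best]) >= min_preds else None


# Toy flattened function: block 1 is the dispatcher, blocks 2-3 compare the
# state variable, and blocks 4-5 are real code that jumps back to block 1.
toy_cfg = {
    0: [1],
    1: [2],
    2: [4, 3],
    3: [5, 6],
    4: [1],
    5: [1],
    6: [7],
    7: [],
}
```

Here `find_dispatcher(toy_cfg)` selects block 1, because the entry block and both real blocks flow into it. A function that is not flattened usually has no block that passes the threshold.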

Step 2: Build the state map from the Backbone

Once the dispatcher is known, the plugin analyzes the comparison blocks that follow it to build a mapping between each value of the state variable and the actual destination code block.

    # Maps state values to their corresponding destination blocks
    def find_block_status(self):
        blk_qty = self.mba.qty
        status_blk = {}

        for i in range(1, blk_qty - 1):
            current_blk = self.mba.get_mblock(i)
            # Type 1 Backbone: Missing microcode, verify via assembly
            if ((current_blk.head is None and current_blk.tail is None) or
                    (current_blk.tail.ea == current_blk.head.ea and current_blk.head.opcode == m_goto)):
                insn = ida_ua.insn_t()
                if not ida_ua.decode_insn(insn, current_blk.start):
                    continue
                if insn.ops[0].type == ida_ua.o_reg and insn.ops[1].type == ida_ua.o_imm:
                    # Map constant to successor block
                    status = insn.ops[1].value & 0xFFFFFFFF
                    # ... logic to determine jz/jnz destination ...
                    status_blk[status] = current_blk.succset[0]
                    self.clean_blk.add(current_blk.serial)

            # Type 2 Backbone: Analyzed directly via microcode instructions
            elif current_blk.tail is not None and current_blk.tail.opcode in [m_jz, m_jnz]:
                mins = current_blk.tail
                if mins.r.is_constant():
                    status = mins.r.value(False)
                    # ... logic to determine destination ...
                    self.clean_blk.add(current_blk.serial)
        
        return status_blk
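
The destination logic elided above depends on the sample, so the following is only one plausible rule, sketched on toy data. It assumes that for `jz` the handler is the jump target (the jump is taken when the state equals the constant) and that for `jnz` it is the fall-through block. The tuple layout and the name `build_state_map` are invented for the example.

```python
def build_state_map(compare_blocks):
    """Map each 32-bit state constant to the block that handles it.

    Each entry is (opcode, constant, jump_target, fallthrough); this layout
    is hypothetical and only models a backbone comparison block.
    """
    state_map = {}
    for opcode, const, jump_target, fallthrough in compare_blocks:
        state = const & 0xFFFFFFFF  # normalise sign-extended immediates
        if opcode == "jz":
            # Jump is taken when state == const
            state_map[state] = jump_target
        elif opcode == "jnz":
            # Jump is taken when state != const, so the handler falls through
            state_map[state] = fallthrough
    return state_map
```

Masking with `0xFFFFFFFF` mirrors the snippet above: an immediate read as a negative number and the same value read as unsigned end up under the same key.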

Step 3: Trace state assignments and re-link the blocks

In this phase, the plugin finds the end of each execution block and determines where it intends to jump next by tracing the instructions that assign a value to the state variable.

This is where the approach differs from part 1: instead of patching bytes to modify assembly instructions, we alter the block links (predset and succset) so that they point to the blocks in the correct order, as in the illustration below.

    # Re-links blocks into their logical execution order
    def link_blocks(self):
        is_changed = False
        jump_opcodes = [m_goto, m_jcnd, m_jz, m_jnz, m_ja, m_jae, m_jb, m_jbe, m_jg, m_jge, m_jl, m_jle]

        for src_blk, dst_blk in self.linking_blk.items():
            tail = src_blk.tail
            old_succs = list(src_blk.succset)

            if tail and tail.opcode in jump_opcodes:
                # Update the destination of existing jumps
                if tail.opcode == m_goto:
                    tail.l._make_blkref(dst_blk.serial)
                    # ... setup succset and predecessors ...
                else:
                    # Update conditional jump targets
                    tail.d._make_blkref(dst_blk.serial)
                    # ... handle fallthrough logic ...
            else:
                # Insert a new GOTO if no jump exists
                new_insn = minsn_t(tail.ea if tail else src_blk.start)
                new_insn.opcode = m_goto
                new_insn.l._make_blkref(dst_blk.serial)
                src_blk.insert_into_block(new_insn, src_blk.tail)

            src_blk.mark_lists_dirty()
            is_changed = True
        return is_changed
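
The essential invariant of this step is that successor and predecessor sets stay consistent after every redirected edge. Below is a minimal sketch of that bookkeeping on a toy graph; it is not the plugin's implementation, and `relink`, `unflatten`, `next_state`, and `state_map` are names assumed for the example.

```python
def relink(succs, preds, src, old_dst, new_dst):
    """Redirect the edge src -> old_dst to src -> new_dst and keep the
    predecessor sets in sync with the successor lists."""
    succs[src] = [new_dst if s == old_dst else s for s in succs[src]]
    preds[old_dst].discard(src)
    preds[new_dst].add(src)


def unflatten(succs, dispatcher, next_state, state_map):
    """Replace every jump back to the dispatcher with a direct edge to the
    block that handles the state assigned in the source block.

    `succs` is modified in place; the rebuilt predecessor sets are returned.
    """
    preds = {blk: set() for blk in succs}
    for src, dsts in succs.items():
        for dst in dsts:
            preds[dst].add(src)
    for src, state in next_state.items():
        if state in state_map and dispatcher in succs[src]:
            relink(succs, preds, src, dispatcher, state_map[state])
    return succs, preds
```

For example, if block 4 assigns state `0xB` and `state_map[0xB]` is block 5, the edge from block 4 to the dispatcher becomes an edge from block 4 to block 5. Once every real block is re-linked, the dispatcher keeps only its entry predecessor.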

Step 4: Clean up malware code and optimize

Finally, the plugin removes the dispatcher blocks, the junk instructions that assign the state variable, and even the Mixed Boolean-Arithmetic (MBA) computations added to mislead analysts. MBA instructions are replaced with nop instructions, and IDA's optimizer takes care of the rest.

    # Removes unnecessary instructions like state variable assignments and MBA logic
    def remove_state_mov_insns(self, state_backbone, state_relevant):
        # ... logic to find and NOP instructions involving state variables ...
        minsn.opcode = m_nop
        is_changed = True

    def remove_assignment_mba(self):
        # Removes complex Mixed Boolean-Arithmetic (MBA) instructions
        opcode_mba = [m_add, m_sub, m_mul, m_and, m_or, m_shr, m_shl]
        # ... logic to identify and NOP complex MBA patterns ...
        minsn.opcode = m_nop
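
For readers new to the term, an MBA expression rewrites a simple operation as a mix of arithmetic and bitwise operators. The example below uses a well-known identity for addition, chosen only for illustration and not taken from the sample; `mba_add` and `plain_add` are hypothetical names.

```python
MASK = 0xFFFFFFFF  # model 32-bit registers


def mba_add(x, y):
    """Obfuscated form of x + y: (x ^ y) + 2 * (x & y), truncated to 32 bits.

    XOR adds the bits without carry; AND, doubled, adds the carries back.
    """
    return ((x ^ y) + 2 * (x & y)) & MASK


def plain_add(x, y):
    """Ordinary 32-bit addition, for comparison."""
    return (x + y) & MASK
```

Both functions return the same value for every pair of inputs, which is why such expressions only add noise to the pseudocode.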

The result is shown in the image below. In this version, the pseudocode looks cleaner than in part 1 thanks to the additional optimization of removing MBA. The full source code is on GitHub.