Deobfuscation

What is obfuscation?

Obfuscation is the deliberate transformation of code to make it harder to understand while preserving its behavior. Malware authors, DRM systems, and anti-cheat software all use obfuscation to slow down reverse engineers.

Obfuscated binaries don’t just lack symbols — they actively fight analysis. A function that’s three lines of logic can become two hundred lines of meaningless-looking jumps, dead branches, and encrypted constants. Kong detects five categories of obfuscation and handles each with targeted tooling.

Detected techniques

Control Flow Flattening (CFF)

CFF replaces a function’s natural control flow with a while(1)/switch(state) dispatcher. Every basic block becomes a case in the switch, and transitions happen by updating a state variable.

// Obfuscated: all logic flattened into a state machine
int state = 0x3a7b;
while (1) {
    switch (state) {
        case 0x3a7b: x = input[0]; state = 0x91ff; break;
        case 0x91ff: if (x > 10) state = 0x4c02; else state = 0xd8a1; break;
        case 0x4c02: result = x * 2; state = 0xf100; break;
        case 0xd8a1: result = x + 1; state = 0xf100; break;
        case 0xf100: return result;
    }
}

The original code was just an if/else with two branches. CFF makes it look like a complex state machine.
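
For comparison, here is a sketch of what the dispatcher above computes once the state machine is untangled, next to the flattened form wrapped in a function so the two can be run side by side. The function names and the const int *input signature are assumptions made for illustration; the original snippet does not show them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reconstruction of the pre-obfuscation logic. */
int original_logic(const int *input) {
    assert(input != NULL);
    int x = input[0];
    if (x > 10) {
        return x * 2;   /* corresponds to state 0x4c02 */
    }
    return x + 1;       /* corresponds to state 0xd8a1 */
}

/* The flattened dispatcher from the example, wrapped in a function. */
int flattened_logic(const int *input) {
    assert(input != NULL);
    int x = 0, result = 0;
    int state = 0x3a7b;
    while (1) {
        switch (state) {
            case 0x3a7b: x = input[0]; state = 0x91ff; break;
            case 0x91ff: state = (x > 10) ? 0x4c02 : 0xd8a1; break;
            case 0x4c02: result = x * 2; state = 0xf100; break;
            case 0xd8a1: result = x + 1; state = 0xf100; break;
            case 0xf100: return result;
        }
    }
}

/* Returns 1 when both versions agree on every input value in [lo, hi]. */
int versions_agree(int lo, int hi) {
    for (int x = lo; x <= hi; x++) {
        if (original_logic(&x) != flattened_logic(&x)) return 0;
    }
    return 1;
}
```

Both functions return the same value for any input; only the shape of the control flow differs, which is what makes CFF costly to read but cheap for the obfuscator to apply.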

Bogus Control Flow

Bogus control flow inserts branches guarded by opaque predicates — conditions that always evaluate to the same value but are hard for static analysis to prove.

// (x * (x + 1)) is always even — this branch is always taken
if ((x * (x + 1)) % 2 == 0) {
    real_logic();
} else {
    garbage_code_that_never_runs();
}

Opaque predicate — a condition whose result is known to the obfuscator at compile time but appears unpredictable to a decompiler. Classic example: x * (x + 1) % 2 == 0 is always true because consecutive integers always include an even number.
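
The parity argument also survives integer wraparound, because truncating a product to a fixed number of bits never changes its lowest bit. A minimal sketch that checks this exhaustively for 16-bit values follows. The function name and the choice of 16 bits are illustrative assumptions, and unsigned types are used because signed overflow is undefined behavior in C.

```c
#include <stdint.h>

/* Returns 1 if (x * (x + 1)) % 2 == 0 holds for every 16-bit x,
 * with x + 1 and the product both truncated to 16 bits. */
int predicate_holds_for_all_u16(void) {
    for (uint32_t i = 0; i <= UINT16_MAX; i++) {
        uint16_t x = (uint16_t)i;
        uint16_t next = (uint16_t)(x + 1u);              /* wraps to 0 at 0xFFFF */
        uint16_t product = (uint16_t)((uint32_t)x * next);
        if (product % 2 != 0) return 0;                  /* counterexample found */
    }
    return 1;
}
```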

Instruction Substitution

Simple operations are replaced with mathematically equivalent but cryptic expressions:

Original	Obfuscated
`a ^ b`	`(~a & b) | (a & ~b)`
`a + b`	`(a ^ b) + 2 * (a & b)`
`a - b`	`(a ^ b) - 2 * (~a & b)`
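
Rewrites like these are easy to get wrong by hand, so it can help to check them mechanically. The sketch below compares each rewrite against the plain operator for every pair of 8-bit operands. The XOR and addition forms are the ones in the table; the subtraction form checked here is (a ^ b) - 2 * (~a & b). The function name and the 8-bit width are assumptions chosen to keep the search small.

```c
#include <stdint.h>

/* Counts 8-bit operand pairs for which a rewrite disagrees with the plain
 * operator. Every result is truncated to 8 bits before comparison. */
unsigned count_mismatches(void) {
    unsigned bad = 0;
    for (unsigned i = 0; i < 256; i++) {
        for (unsigned j = 0; j < 256; j++) {
            uint8_t a = (uint8_t)i, b = (uint8_t)j;
            uint8_t not_a = (uint8_t)~a, not_b = (uint8_t)~b;

            uint8_t xor_rewrite = (uint8_t)((not_a & b) | (a & not_b));
            uint8_t add_rewrite = (uint8_t)((a ^ b) + 2 * (a & b));
            uint8_t sub_rewrite = (uint8_t)((a ^ b) - 2 * (not_a & b));

            if (xor_rewrite != (uint8_t)(a ^ b)) bad++;
            if (add_rewrite != (uint8_t)(a + b)) bad++;
            if (sub_rewrite != (uint8_t)(a - b)) bad++;
        }
    }
    return bad;
}
```

A return value of 0 means all three rewrites agree with the plain operators on the whole 8-bit domain.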

String Encryption

Strings are encrypted at compile time and decrypted at runtime, hiding clues that reverse engineers rely on:

// Encrypted string — looks like random bytes in the binary
char buf[] = {0x53, 0x70, 0x78, 0x76, 0x71, 0x00};
for (int i = 0; buf[i]; i++) {
    buf[i] ^= 0x1f;  // XOR decrypt at runtime
}
// buf is now "Login" — but the decompiler only sees the encrypted form
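
One way an analyst might recover such a string is to replay the decryption loop outside the binary. The sketch below does that for a single-byte XOR key; the helper names and the sample bytes (the text Login XORed with 0x1f) are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* XORs every byte of buf with key, stopping at the terminating zero byte. */
void xor_decrypt(char *buf, unsigned char key) {
    assert(buf != NULL);
    for (size_t i = 0; buf[i] != '\0'; i++) {
        buf[i] = (char)((unsigned char)buf[i] ^ key);
    }
}

/* Decrypts a sample blob and reports whether it matches the expected text. */
int sample_decrypts_to(const char *expected) {
    char buf[] = {0x53, 0x70, 0x78, 0x76, 0x71, 0x00};  /* "Login" ^ 0x1f */
    xor_decrypt(buf, 0x1f);
    return strcmp(buf, expected) == 0;
}
```

Because the loop stops at the first zero byte, this scheme cannot encode a plaintext character equal to the key: it would encrypt to zero and truncate the string.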

VM Protection

VM protection is the most aggressive of the five techniques. The original code is compiled into custom bytecode, and the function is replaced with an interpreter for that bytecode:

void vm_execute(uint8_t *bytecode) {
    while (1) {
        switch (*bytecode++) {
            case 0x01: regs[*bytecode++] = stack[sp--]; break;
            case 0x02: stack[++sp] = regs[*bytecode++]; break;
            case 0x03: stack[sp-1] += stack[sp]; sp--; break;
            // ... 15+ more opcodes
            case 0xFF: return;
        }
    }
}

This is the hardest to reverse — the original logic is encoded as data, not code.
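
To make the "logic as data" point concrete, the sketch below runs a five-instruction program on an interpreter with the same three opcodes. The opcode names, the register and stack sizes, and the sample program are all assumptions; a real protector would use a far larger, randomized instruction set.

```c
#include <stddef.h>
#include <stdint.h>

/* Opcode values follow the interpreter above; the names are made up here. */
enum { OP_POP = 0x01, OP_PUSH = 0x02, OP_ADD = 0x03, OP_HALT = 0xFF };

/* Sizes are arbitrary choices for this sketch. */
static int32_t regs[4];
static int32_t stack[16];
static int sp = -1;  /* index of the top of the stack; -1 means empty */

/* Minimal interpreter with no bounds checking. */
static void vm_execute(const uint8_t *bytecode) {
    while (1) {
        switch (*bytecode++) {
            case OP_POP:  regs[*bytecode++] = stack[sp--]; break;
            case OP_PUSH: stack[++sp] = regs[*bytecode++]; break;
            case OP_ADD:  stack[sp - 1] += stack[sp]; sp--; break;
            case OP_HALT: return;
            default:      return;  /* unknown opcode: stop */
        }
    }
}

/* Runs a program that encodes r2 = r0 + r1 and returns r2. */
int32_t vm_add(int32_t a, int32_t b) {
    static const uint8_t program[] = {
        OP_PUSH, 0,   /* push r0 */
        OP_PUSH, 1,   /* push r1 */
        OP_ADD,       /* replace the top two entries with their sum */
        OP_POP, 2,    /* pop the sum into r2 */
        OP_HALT
    };
    regs[0] = a;
    regs[1] = b;
    sp = -1;
    vm_execute(program);
    return regs[2];
}
```

A decompiler looking at vm_execute sees only the dispatch loop; the addition itself lives in the program array.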

The agentic deobfuscation loop

When Kong detects obfuscation in a function’s decompilation, it switches from single-shot LLM analysis to a multi-turn agentic loop. The LLM gets access to six specialized tools and can call them iteratively to peel away layers of obfuscation:

Tool	Purpose
`simplify_expression`	Z3-based symbolic simplification — detects and resolves opaque predicates
`eliminate_dead_code`	Removes unreachable branches after predicates are resolved
`trace_state_machine`	Extracts CFF state transitions and exit conditions from Ghidra’s IR
`identify_crypto_constants`	Scans for known constants (AES S-box, SHA-256 init values, MD5 T-table)
`get_decompilation`	Re-reads decompilation from Ghidra after applying intermediate results
`get_basic_blocks`	Extracts the control flow graph and basic block structure

The LLM drives the loop — it decides which tools to call and in what order based on what it sees in the decompilation. A typical flow:

1. LLM sees a suspicious while(1)/switch → calls trace_state_machine
2. State machine tool returns the transition graph → LLM identifies dead states
3. LLM calls simplify_expression on a guard condition → Z3 proves it’s always true
4. LLM calls eliminate_dead_code with the resolved predicate → dead branch removed
5. LLM calls get_decompilation to see the cleaned-up code → produces final analysis
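
Of the tools in the table, identify_crypto_constants is the easiest to picture in isolation. As a rough illustration of the kind of check such a tool might perform, the sketch below searches a byte buffer for the first four SHA-256 initial hash values stored as consecutive little-endian 32-bit words. The function names, the little-endian assumption, and the choice to match only four words are all hypothetical and say nothing about how Kong implements the tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First four SHA-256 initial hash values (FIPS 180-4). */
static const uint32_t SHA256_INIT[4] = {
    0x6a09e667u, 0xbb67ae85u, 0x3c6ef372u, 0xa54ff53au
};

/* Reads a 32-bit little-endian word from p. */
static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns the offset of the first place where the four constants appear
 * back to back, or -1 if they do not appear. */
long find_sha256_init(const uint8_t *data, size_t len) {
    assert(data != NULL);
    for (size_t off = 0; off + 16 <= len; off++) {
        size_t k = 0;
        while (k < 4 && read_le32(data + off + 4 * k) == SHA256_INIT[k]) k++;
        if (k == 4) return (long)off;
    }
    return -1;
}

/* Plants the constants at byte offset 8 of a zeroed buffer and scans it. */
long demo_scan(void) {
    uint8_t image[32] = {0};
    for (size_t k = 0; k < 4; k++) {
        uint32_t w = SHA256_INIT[k];
        uint8_t *p = image + 8 + 4 * k;
        p[0] = (uint8_t)w;
        p[1] = (uint8_t)(w >> 8);
        p[2] = (uint8_t)(w >> 16);
        p[3] = (uint8_t)(w >> 24);
    }
    return find_sha256_init(image, sizeof image);
}

/* Scans a zeroed buffer that contains no constants. */
long demo_scan_empty(void) {
    uint8_t image[32] = {0};
    return find_sha256_init(image, sizeof image);
}
```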

Z3 simplifier internals

The simplify_expression tool is powered by the Z3 theorem prover. Here’s how it works:

1. Parse: The C expression from the decompilation is parsed into a Z3 AST (abstract syntax tree)
2. Simplify: Z3 attempts algebraic simplification, reducing complex bit operations to simpler forms
3. Prove: Z3 checks whether the expression is a tautology (always true) or a contradiction (always false)
4. Report: If it is either, the expression is an opaque predicate, and the branch that can never be taken is dead code

For example, given (x * (x + 1)) % 2 == 0:

1. Z3 recognizes that x * (x + 1) is always even (one of two consecutive integers is even)
2. Z3 proves the expression is a tautology
3. Kong marks the else branch as dead code

The simplifier also handles instruction substitution by reducing complex bit expressions to their simpler equivalents. (~a & b) | (a & ~b) simplifies to a ^ b through Z3’s bit-vector reasoning.
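
The Prove and Report steps can be imitated without a solver when the operand is small enough to enumerate. The sketch below swaps Z3 for exhaustive evaluation over every 8-bit value and classifies a predicate as a tautology, a contradiction, or neither. This is a stand-in to show the classification, not how Z3 works: a solver reasons symbolically and scales to widths where enumeration is impossible. All names are hypothetical.

```c
#include <stdint.h>

enum verdict { CONTINGENT, TAUTOLOGY, CONTRADICTION };

typedef int (*predicate_fn)(uint8_t x);

/* Evaluates pred on every 8-bit value and reports what was observed. */
enum verdict classify(predicate_fn pred) {
    int seen_true = 0, seen_false = 0;
    for (unsigned i = 0; i < 256; i++) {
        if (pred((uint8_t)i)) seen_true = 1;
        else seen_false = 1;
    }
    if (!seen_false) return TAUTOLOGY;      /* opaque: always true */
    if (!seen_true) return CONTRADICTION;   /* opaque: always false */
    return CONTINGENT;                      /* a real condition */
}

/* Always true: a product of consecutive integers is even. */
int consecutive_product_even(uint8_t x) {
    return (uint8_t)(x * (x + 1)) % 2 == 0;
}

/* Always false: a square is 0 or 1 modulo 4, never 2. */
int square_mod4_is_2(uint8_t x) {
    return (x * x) % 4 == 2;
}

/* Depends on the input, so it is not an opaque predicate. */
int greater_than_10(uint8_t x) {
    return x > 10;
}
```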

Dead code elimination

After opaque predicates are resolved, Kong prunes the dead branches. The algorithm:

1. Take the set of resolved predicates and their constant truth values (from the Z3 simplifier)
2. Walk the decompilation AST
3. For each if/else guarded by a resolved predicate:
   - If the predicate is always true: keep the if body, remove the else body
   - If the predicate is always false: keep the else body (if it exists), remove the if body
4. Remove any variables and assignments that are only referenced in deleted branches

This produces cleaner decompilation with the garbage code stripped away, making the real logic visible to the LLM.
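
The branch-pruning part of the algorithm can be sketched on a toy AST. The node layout, the resolved[] table indexed by a predicate id, and all names below are assumptions made for this example; the final cleanup of variables that are only used in deleted branches is not shown.

```c
#include <stddef.h>

enum truth { UNRESOLVED, ALWAYS_TRUE, ALWAYS_FALSE };
enum kind { STMT, IF };

struct node {
    enum kind kind;
    int pred_id;             /* IF only: index into the resolved[] table */
    struct node *then_body;  /* IF only */
    struct node *else_body;  /* IF only; NULL when there is no else */
    struct node *next;       /* next statement in the same block */
};

/* Replaces every if/else whose predicate is resolved with its surviving
 * body and returns the new head of the statement list. */
struct node *prune(struct node *n, const enum truth *resolved) {
    if (n == NULL) return NULL;
    n->next = prune(n->next, resolved);
    if (n->kind != IF) return n;

    n->then_body = prune(n->then_body, resolved);
    n->else_body = prune(n->else_body, resolved);

    struct node *kept;
    if (resolved[n->pred_id] == ALWAYS_TRUE) kept = n->then_body;
    else if (resolved[n->pred_id] == ALWAYS_FALSE) kept = n->else_body;
    else return n;                      /* unresolved: leave the branch alone */

    if (kept == NULL) return n->next;   /* surviving body is empty */
    struct node *tail = kept;
    while (tail->next != NULL) tail = tail->next;
    tail->next = n->next;               /* splice the body in place of the if */
    return kept;
}

/* Builds `if (p0) real; else garbage; ret;` and prunes it.
 * Returns 1 if real survives, 2 if garbage survives, 0 if the if is kept. */
int demo_prune(enum truth verdict) {
    struct node real = {STMT, 0, NULL, NULL, NULL};
    struct node garbage = {STMT, 0, NULL, NULL, NULL};
    struct node ret = {STMT, 0, NULL, NULL, NULL};
    struct node branch = {IF, 0, &real, &garbage, &ret};
    enum truth resolved[1] = { verdict };

    struct node *head = prune(&branch, resolved);
    if (head == &real && head->next == &ret) return 1;
    if (head == &garbage && head->next == &ret) return 2;
    return 0;
}
```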

State machine tracing for CFF

When a function uses control flow flattening, trace_state_machine reconstructs the original control flow:

1. Identify the dispatcher: Find the while(1)/switch(state) loop and the state variable
2. Extract transitions: For each switch case, determine which values the state variable can be set to; these are the edges in the state graph
3. Find entry and exit: The initial state value is the entry point; cases that break out of the loop or return are exits
4. Reconstruct flow: Build a directed graph of state transitions. The original control flow corresponds to the paths through this graph from entry to exit
5. Simplify: Collapse linear chains of states (A→B→C where B has no other edges) back into sequential code

The result is a representation of the original control flow that the LLM can reason about, even though the compiled code uses a flat dispatcher.
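
The graph-building and chain-collapsing steps can be sketched on the transitions from the CFF example earlier on this page. The struct layout and function names are assumptions; the sketch only identifies which edges are safe to collapse and which states are exits, and does not emit code.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SUCC 2

struct state {
    uint32_t id;
    uint32_t succ[MAX_SUCC];  /* values the case can assign to the state variable */
    size_t nsucc;             /* 0 means the case returns, so it is an exit */
};

/* Transitions read off the flattened example: 0x3a7b is the entry state. */
static const struct state graph[] = {
    {0x3a7b, {0x91ff}, 1},
    {0x91ff, {0x4c02, 0xd8a1}, 2},
    {0x4c02, {0xf100}, 1},
    {0xd8a1, {0xf100}, 1},
    {0xf100, {0}, 0},
};
#define NSTATES (sizeof graph / sizeof graph[0])

/* Counts how many edges in the graph point at the given state. */
size_t count_preds(uint32_t id) {
    size_t n = 0;
    for (size_t i = 0; i < NSTATES; i++)
        for (size_t j = 0; j < graph[i].nsucc; j++)
            if (graph[i].succ[j] == id) n++;
    return n;
}

/* An edge A -> B can be collapsed into sequential code when A has no other
 * successor and B has no other predecessor. */
int has_collapsible_edge(const struct state *a) {
    return a->nsucc == 1 && count_preds(a->succ[0]) == 1;
}

size_t count_collapsible(void) {
    size_t n = 0;
    for (size_t i = 0; i < NSTATES; i++)
        if (has_collapsible_edge(&graph[i])) n++;
    return n;
}

size_t count_exits(void) {
    size_t n = 0;
    for (size_t i = 0; i < NSTATES; i++)
        if (graph[i].nsucc == 0) n++;
    return n;
}
```

For this graph only the edge 0x3a7b → 0x91ff is collapsible. The two arms both lead to 0xf100, which therefore has two predecessors and stays a join point, matching the if/else shape of the original code.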
