What is obfuscation?

Obfuscation is the deliberate transformation of code to make it harder to understand while preserving its behavior. Malware authors, DRM systems, and anti-cheat software all use obfuscation to slow down reverse engineers.
Obfuscated binaries don’t just lack symbols — they actively fight analysis. A function that’s three lines of logic can become two hundred lines of meaningless-looking jumps, dead branches, and encrypted constants. Kong detects five categories of obfuscation and handles each with targeted tooling.

Detected techniques

Control Flow Flattening (CFF)

CFF replaces a function’s natural control flow with a while(1)/switch(state) dispatcher. Every basic block becomes a case in the switch, and transitions happen by updating a state variable.
// Obfuscated: all logic flattened into a state machine
int state = 0x3a7b;
while (1) {
    switch (state) {
        case 0x3a7b: x = input[0]; state = 0x91ff; break;
        case 0x91ff: if (x > 10) state = 0x4c02; else state = 0xd8a1; break;
        case 0x4c02: result = x * 2; state = 0xf100; break;
        case 0xd8a1: result = x + 1; state = 0xf100; break;
        case 0xf100: return result;
    }
}
The original code was just an if/else with two branches. CFF makes it look like a complex state machine.
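To make the comparison concrete, here is a minimal sketch that puts a de-flattened version next to the dispatcher. The function names original and flattened are hypothetical, and the snippet is wrapped in functions taking an int array only so the two forms can be compared side by side.

```c
/* Hypothetical de-flattened form: the if/else the dispatcher hides. */
int original(const int *input) {
    int x = input[0];
    if (x > 10) {
        return x * 2;
    }
    return x + 1;
}

/* The flattened form from the example, wrapped in a function for comparison. */
int flattened(const int *input) {
    int x = 0, result = 0;
    int state = 0x3a7b;
    while (1) {
        switch (state) {
            case 0x3a7b: x = input[0]; state = 0x91ff; break;
            case 0x91ff: state = (x > 10) ? 0x4c02 : 0xd8a1; break;
            case 0x4c02: result = x * 2; state = 0xf100; break;
            case 0xd8a1: result = x + 1; state = 0xf100; break;
            case 0xf100: return result;
        }
    }
}
```

Both functions return the same value for every input; only the shape of the control flow differs.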

Bogus Control Flow

Bogus control flow inserts branches guarded by opaque predicates — conditions that always evaluate to the same value but are hard for static analysis to prove.
// (x * (x + 1)) is always even — this branch is always taken
if ((x * (x + 1)) % 2 == 0) {
    real_logic();
} else {
    garbage_code_that_never_runs();
}
Opaque predicate — a condition whose result is known to the obfuscator at compile time but appears unpredictable to a decompiler. Classic example: x * (x + 1) % 2 == 0 is always true because consecutive integers always include an even number.

Instruction Substitution

Simple operations are replaced with mathematically equivalent but cryptic expressions:
Original    Obfuscated
a ^ b       (~a & b) | (a & ~b)
a + b       (a ^ b) + 2 * (a & b)
a - b       (a ^ b) - 2 * (~a & b)
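Substitution identities like these hold under wraparound (modulo 2^n) arithmetic, which makes them easy to sanity-check. The sketch below uses hypothetical function names and checks the XOR and addition rewrites from the table, plus a standard subtraction rewrite, (a ^ b) - 2 * (~a & b), over every pair of 8-bit inputs. It is an illustration only, not Kong's implementation.

```c
#include <stdint.h>

/* Obfuscated forms. Unsigned arithmetic wraps modulo 2^32, so there is no overflow UB. */
uint32_t xor_obf(uint32_t a, uint32_t b) { return (~a & b) | (a & ~b); }
uint32_t add_obf(uint32_t a, uint32_t b) { return (a ^ b) + 2u * (a & b); }
uint32_t sub_obf(uint32_t a, uint32_t b) { return (a ^ b) - 2u * (~a & b); }

/* Returns 1 if all three identities hold for every pair of 8-bit inputs. */
int identities_hold(void) {
    for (uint32_t a = 0; a < 256; a++) {
        for (uint32_t b = 0; b < 256; b++) {
            if (xor_obf(a, b) != (a ^ b)) return 0;
            if (add_obf(a, b) != a + b)   return 0;
            if (sub_obf(a, b) != a - b)   return 0;
        }
    }
    return 1;
}
```

An exhaustive check over a small domain is only a sanity test. Proving an identity for all 32-bit or 64-bit values calls for symbolic reasoning.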

String Encryption

Strings are encrypted at compile time and decrypted at runtime, hiding clues that reverse engineers rely on:
// Encrypted string — looks like random bytes in the binary
char buf[] = {0x53, 0x70, 0x78, 0x76, 0x71, 0x00};
for (int i = 0; buf[i]; i++) {
    buf[i] ^= 0x1f;  // XOR decrypt at runtime
}
// buf is now "Login" — but the decompiler only sees the encrypted form
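Once the key and the ciphertext bytes have been located, an analyst can replay the decryption offline. A minimal sketch, assuming a single-byte XOR key as in the example (the helper name xor_crypt is hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* In-place single-byte XOR. Applying it twice with the same key restores the input. */
void xor_crypt(uint8_t *buf, size_t len, uint8_t key) {
    for (size_t i = 0; i < len; i++) {
        buf[i] ^= key;
    }
}
```

For instance, applying xor_crypt with key 0x1f to the five bytes 0x53 0x70 0x78 0x76 0x71 yields "Login". The length is passed explicitly rather than stopping at a zero byte, because a plaintext byte equal to the key encrypts to 0x00 and would end a sentinel-based loop early.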

VM Protection

The most aggressive technique. The original code is compiled into custom bytecode, and the function is replaced with an interpreter:
void vm_execute(uint8_t *bytecode) {
    while (1) {
        switch (*bytecode++) {
            case 0x01: regs[*bytecode++] = stack[sp--]; break;
            case 0x02: stack[++sp] = regs[*bytecode++]; break;
            case 0x03: stack[sp-1] += stack[sp]; sp--; break;
            // ... 15+ more opcodes
            case 0xFF: return;
        }
    }
}
This is the hardest to reverse — the original logic is encoded as data, not code.
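To see why the logic ends up "encoded as data", consider a self-contained variant of the interpreter with a tiny program. This is a sketch under assumptions: the vm_t struct, the register and stack sizes, and the ADD_PROGRAM bytes are all hypothetical, and only the three opcodes shown above are implemented.

```c
#include <stdint.h>

typedef struct {
    int32_t regs[4];
    int32_t stack[16];
    int sp;              /* index of the top element; -1 when empty */
} vm_t;

/* Same opcodes as the example. No bounds checks, so the bytecode must be trusted. */
void vm_run(vm_t *vm, const uint8_t *bytecode) {
    vm->sp = -1;
    while (1) {
        switch (*bytecode++) {
            case 0x01: vm->regs[*bytecode++] = vm->stack[vm->sp--]; break;  /* pop into register */
            case 0x02: vm->stack[++vm->sp] = vm->regs[*bytecode++]; break;  /* push register */
            case 0x03: vm->stack[vm->sp - 1] += vm->stack[vm->sp];          /* add top two */
                       vm->sp--; break;
            case 0xFF: return;
            default:   return;  /* assumption: stop on unknown opcodes */
        }
    }
}

/* Bytecode equivalent of: r2 = r0 + r1 */
static const uint8_t ADD_PROGRAM[] = {
    0x02, 0x00,   /* push r0 */
    0x02, 0x01,   /* push r1 */
    0x03,         /* add     */
    0x01, 0x02,   /* pop r2  */
    0xFF          /* return  */
};
```

A decompiler sees only vm_run. The addition lives in the eight bytes of ADD_PROGRAM, which look like ordinary data until the opcode table has been recovered.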

The agentic deobfuscation loop

When Kong detects obfuscation in a function’s decompilation, it switches from single-shot LLM analysis to a multi-turn agentic loop. The LLM gets access to six specialized tools and can call them iteratively to peel away layers of obfuscation:
Tool                         Purpose
simplify_expression          Z3-based symbolic simplification; detects and resolves opaque predicates
eliminate_dead_code          Removes unreachable branches after predicates are resolved
trace_state_machine          Extracts CFF state transitions and exit conditions from Ghidra’s IR
identify_crypto_constants    Scans for known constants (AES S-box, SHA-256 init values, MD5 T-table)
get_decompilation            Re-reads decompilation from Ghidra after applying intermediate results
get_basic_blocks             Extracts the control flow graph and basic block structure
The LLM drives the loop — it decides which tools to call and in what order based on what it sees in the decompilation. A typical flow:
  1. LLM sees a suspicious while(1)/switch → calls trace_state_machine
  2. State machine tool returns the transition graph → LLM identifies dead states
  3. LLM calls simplify_expression on a guard condition → Z3 proves it’s always true
  4. LLM calls eliminate_dead_code with the resolved predicate → dead branch removed
  5. LLM calls get_decompilation to see the cleaned-up code → produces final analysis

Z3 simplifier internals

The simplify_expression tool is powered by the Z3 theorem prover. Here’s how it works:
  1. Parse: The C expression from the decompilation is parsed into a Z3 AST (abstract syntax tree)
  2. Simplify: Z3 attempts algebraic simplification — reducing complex bit operations to simpler forms
  3. Prove: Z3 checks whether the expression is a tautology (always true) or contradiction (always false)
  4. Report: If it’s one of these, the expression is an opaque predicate, and the branch that can never be taken is dead code
For example, given (x * (x + 1)) % 2 == 0:
  • Z3 recognizes that x * (x + 1) is always even (one of two consecutive integers is even)
  • Z3 proves the expression is a tautology
  • Kong marks the else branch as dead code
The simplifier also handles instruction substitution by reducing complex bit expressions to their simpler equivalents. (~a & b) | (a & ~b) simplifies to a ^ b through Z3’s bit-vector reasoning.
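The prove-and-report steps boil down to sorting each predicate into one of three verdicts. The sketch below illustrates that classification with brute-force enumeration over 16-bit inputs, used here as a stand-in for Z3's symbolic reasoning. It is not how Z3 works internally, and all names (verdict_t, classify, and the sample predicates) are hypothetical.

```c
#include <stdint.h>

typedef enum { ALWAYS_TRUE, ALWAYS_FALSE, INPUT_DEPENDENT } verdict_t;
typedef int (*predicate_fn)(uint16_t x);

/* Enumerate every 16-bit input. A real solver reasons symbolically instead. */
verdict_t classify(predicate_fn p) {
    int seen_true = 0, seen_false = 0;
    for (uint32_t x = 0; x <= UINT16_MAX; x++) {
        if (p((uint16_t)x)) seen_true = 1; else seen_false = 1;
        if (seen_true && seen_false) return INPUT_DEPENDENT;
    }
    return seen_true ? ALWAYS_TRUE : ALWAYS_FALSE;
}

/* The opaque predicate from the example, over 16-bit wraparound arithmetic. */
int opaque(uint16_t x) { return (uint16_t)(x * (x + 1u)) % 2u == 0; }

/* A contradiction: x ^ x is always zero. */
int contradiction(uint16_t x) { return (x ^ x) != 0; }

/* A genuine guard whose value depends on the input. */
int genuine(uint16_t x) { return x > 10; }
```

Only the first two verdicts mark a branch as removable. An INPUT_DEPENDENT guard is real program logic and has to stay.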

Dead code elimination

After opaque predicates are resolved, Kong prunes the dead branches. The algorithm:
  1. Take the set of resolved predicates and their constant truth values (from the Z3 simplifier)
  2. Walk the decompilation AST
  3. For each if/else guarded by a resolved predicate:
    • If the predicate is always true: keep the if body, remove the else body
    • If the predicate is always false: keep the else body (if it exists), remove the if body
  4. Remove any variables and assignments that are only referenced in deleted branches
This produces cleaner decompilation with the garbage code stripped away, making the real logic visible to the LLM.
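The pruning walk in steps 1 to 3 can be sketched on a toy AST. Everything here is an assumption made for illustration: the node layout, the PRED_* constants, and the resolved table indexed by predicate id. Bodies are single nodes rather than statement lists, and step 4 (removing orphaned variables) is left out.

```c
#include <stddef.h>

typedef enum { NODE_STMT, NODE_IF } node_kind;

typedef struct node {
    node_kind kind;
    int pred_id;             /* NODE_IF: index into the resolved-predicate table */
    struct node *then_body;  /* NODE_IF: taken when the predicate is true */
    struct node *else_body;  /* NODE_IF: may be NULL */
    const char *text;        /* NODE_STMT: statement text */
} node;

/* Truth values as a solver might report them. */
enum { PRED_UNKNOWN = -1, PRED_FALSE = 0, PRED_TRUE = 1 };

/* Returns the subtree that survives pruning; NULL means nothing is left. */
node *prune(node *n, const int *resolved) {
    if (n == NULL || n->kind == NODE_STMT) return n;
    switch (resolved[n->pred_id]) {
        case PRED_TRUE:  return prune(n->then_body, resolved);  /* drop else */
        case PRED_FALSE: return prune(n->else_body, resolved);  /* drop if body */
        default:                                                /* unresolved: keep both */
            n->then_body = prune(n->then_body, resolved);
            n->else_body = prune(n->else_body, resolved);
            return n;
    }
}
```

An always-false predicate with no else body returns NULL, which corresponds to deleting the whole if statement.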

State machine tracing for CFF

When a function uses control flow flattening, trace_state_machine reconstructs the original control flow:
  1. Identify the dispatcher: Find the while(1)/switch(state) loop and the state variable
  2. Extract transitions: For each switch case, determine what value the state variable is set to — this gives the edges in the state graph
  3. Find entry and exit: The initial state value is the entry point; cases that break out of the loop or return are exits
  4. Reconstruct flow: Build a directed graph of state transitions. The original control flow is the set of paths through this graph from entry to exit
  5. Simplify: Collapse linear chains of states (A→B→C where B has no other edges) back into sequential code
The result is a representation of the original control flow that the LLM can reason about, even though the compiled code uses a flat dispatcher.
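As a rough sketch of steps 2 and 5, the transitions of the flattened example from earlier can be written down as an edge list and tested for collapsible chains. The merge rule used here (the source state has exactly one successor and the target state has exactly one predecessor) is an assumption borrowed from standard basic-block merging. The names edge_t, EDGES, and can_merge are hypothetical.

```c
#include <stddef.h>

typedef struct { unsigned from, to; } edge_t;

/* Transitions read off the flattened example: one edge per state assignment. */
static const edge_t EDGES[] = {
    {0x3a7b, 0x91ff},
    {0x91ff, 0x4c02},
    {0x91ff, 0xd8a1},
    {0x4c02, 0xf100},
    {0xd8a1, 0xf100},
};
#define EDGE_COUNT (sizeof EDGES / sizeof EDGES[0])

static size_t out_degree(unsigned s) {
    size_t n = 0;
    for (size_t i = 0; i < EDGE_COUNT; i++) if (EDGES[i].from == s) n++;
    return n;
}

static size_t in_degree(unsigned s) {
    size_t n = 0;
    for (size_t i = 0; i < EDGE_COUNT; i++) if (EDGES[i].to == s) n++;
    return n;
}

/* A->B collapses into sequential code when A has no other successor
   and B has no other predecessor. */
int can_merge(edge_t e) {
    return out_degree(e.from) == 1 && in_degree(e.to) == 1;
}

size_t count_mergeable(void) {
    size_t n = 0;
    for (size_t i = 0; i < EDGE_COUNT; i++) if (can_merge(EDGES[i])) n++;
    return n;
}
```

In this graph only the first edge is collapsible, so states 0x3a7b and 0x91ff fold into one block. State 0x91ff fans out into the two branches, and both rejoin at 0xf100, which recovers the if/else shape.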

Last modified on March 20, 2026