What is obfuscation?
Obfuscation is the deliberate transformation of code to make it harder to understand while preserving its behavior. Malware authors, DRM systems, and anti-cheat software all use obfuscation to slow down reverse engineers.
Obfuscated binaries don’t just lack symbols — they actively fight analysis. A function that’s three lines of logic can become two hundred lines of meaningless-looking jumps, dead branches, and encrypted constants. Kong detects five categories of obfuscation and handles each with targeted tooling.
Detected techniques
Control Flow Flattening (CFF)
CFF replaces a function’s natural control flow with a while(1)/switch(state) dispatcher. Every basic block becomes a case in the switch, and transitions happen by updating a state variable.
```c
// Obfuscated: all logic flattened into a state machine
int flattened(const int *input) {
    int x = 0, result = 0;
    int state = 0x3a7b;
    while (1) {
        switch (state) {
        case 0x3a7b: x = input[0]; state = 0x91ff; break;
        case 0x91ff: if (x > 10) state = 0x4c02; else state = 0xd8a1; break;
        case 0x4c02: result = x * 2; state = 0xf100; break;
        case 0xd8a1: result = x + 1; state = 0xf100; break;
        case 0xf100: return result;
        }
    }
}
```
The original code was just an if/else with two branches. CFF makes it look like a complex state machine.
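Reading the case bodies back in order suggests what the function looked like before flattening. The sketch below is that reconstruction; the name `original` is hypothetical, since the page does not show the unflattened source.

```c
// Hypothetical pre-flattening version, recovered from the case bodies:
// 0x3a7b loads x, 0x91ff branches, 0x4c02 / 0xd8a1 compute, 0xf100 returns.
int original(const int *input) {
    int x = input[0];
    if (x > 10)
        return x * 2;  // case 0x4c02
    return x + 1;      // case 0xd8a1
}
```

For any input it should return the same value as the flattened version: 40 when `input[0]` is 20, and 4 when it is 3.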
Bogus Control Flow
Bogus control flow inserts branches guarded by opaque predicates — conditions that always evaluate to the same value but are hard for static analysis to prove.
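```c
void real_logic(void);
void garbage_code_that_never_runs(void);

void guarded(unsigned x) {
    // x * (x + 1) is always even, so this branch is always taken
    if ((x * (x + 1)) % 2 == 0) {
        real_logic();
    } else {
        garbage_code_that_never_runs();
    }
}
```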
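Opaque predicate: a condition whose result is known to the obfuscator at compile time but appears unpredictable to a decompiler. Classic example: (x * (x + 1)) % 2 == 0 is always true because one of any two consecutive integers is even, so their product is even. This still holds when the multiplication wraps around in fixed-width unsigned arithmetic, because reducing an even number modulo a power of two leaves it even.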
Instruction Substitution
Simple operations are replaced with mathematically equivalent but cryptic expressions:
| Original | Obfuscated |
|---|---|
| `a ^ b` | `(~a & b) \| (a & ~b)` |
| `a + b` | `(a ^ b) + 2 * (a & b)` |
| `a - b` | `(a ^ b) - 2 * (~a & b)` |
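Identities like these can be sanity-checked by brute force. The sketch below (function names are illustrative) compares each obfuscated form with the plain operator for every pair of 8-bit inputs. A check at one bit-width is not a proof for all widths, but these three identities hold at any width: XOR is addition without carries and `a & b` marks the bit positions that carry, while `a - b` equals the bits set only in `a` minus the bits set only in `b`.

```c
#include <stdint.h>

// Obfuscated forms from the table, evaluated in 8-bit arithmetic.
// The uint8_t casts make the wrap-around explicit.
static uint8_t obf_xor(uint8_t a, uint8_t b) { return (uint8_t)((~a & b) | (a & ~b)); }
static uint8_t obf_add(uint8_t a, uint8_t b) { return (uint8_t)((a ^ b) + 2 * (a & b)); }
static uint8_t obf_sub(uint8_t a, uint8_t b) { return (uint8_t)((a ^ b) - 2 * (~a & b)); }

// Compare every obfuscated form with the plain operator over all 65,536
// input pairs. Returns the number of mismatches (0 means all rows hold).
int count_identity_failures(void) {
    int failures = 0;
    for (int i = 0; i < 256; i++) {
        for (int j = 0; j < 256; j++) {
            uint8_t a = (uint8_t)i, b = (uint8_t)j;
            if (obf_xor(a, b) != (uint8_t)(a ^ b)) failures++;
            if (obf_add(a, b) != (uint8_t)(a + b)) failures++;
            if (obf_sub(a, b) != (uint8_t)(a - b)) failures++;
        }
    }
    return failures;
}
```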
String Encryption
Strings are encrypted at compile time and decrypted at runtime, hiding clues that reverse engineers rely on:
```c
#include <stdio.h>

void print_decrypted(void) {
    // Encrypted string: looks like random bytes in the binary
    char buf[] = {0x53, 0x70, 0x78, 0x76, 0x71, 0x00};
    for (int i = 0; buf[i]; i++) {
        buf[i] ^= 0x1f;  // XOR decrypt at runtime
    }
    // buf is now "Login", but the decompiler only sees the encrypted form
    puts(buf);
}
```
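The build-time half of this scheme can be sketched as follows, assuming the same single-byte XOR key. Because XOR with a fixed key is its own inverse, one helper both encrypts and decrypts; a second helper prints the ciphertext as a C initializer. Both function names are hypothetical, and real string-encryption passes may use per-string keys or stronger ciphers.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>  // strcmp, used when checking the output

// XOR every byte with the key, in place. Applying it twice with the same key
// restores the original bytes, so it serves as encryptor and decryptor.
void xor_crypt(unsigned char *buf, size_t len, unsigned char key) {
    for (size_t i = 0; i < len; i++)
        buf[i] ^= key;
}

// Print bytes as a C initializer such as "{0x53, 0x70}", the form a build
// step could paste into source. Output is truncated if outsz is too small.
void format_c_array(const unsigned char *buf, size_t len, char *out, size_t outsz) {
    int n = snprintf(out, outsz, "{");
    if (n < 0)
        return;
    size_t pos = (size_t)n;
    for (size_t i = 0; i < len && pos < outsz; i++) {
        n = snprintf(out + pos, outsz - pos, "%s0x%02x", i ? ", " : "", (unsigned)buf[i]);
        if (n < 0)
            return;
        pos += (size_t)n;
    }
    if (pos < outsz)
        snprintf(out + pos, outsz - pos, "}");
}
```

Encrypting `"Login"` with key `0x1f` yields `{0x53, 0x70, 0x78, 0x76, 0x71}`. A decrypt loop that stops at the first zero byte would cut a string short if any plaintext byte equaled the key; passing an explicit length, as `xor_crypt` does, avoids that.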
VM Protection
The most aggressive technique. The original code is compiled into custom bytecode, and the function is replaced with an interpreter:
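```c
#include <stdint.h>

static int32_t regs[256];   // virtual registers (one per possible operand byte)
static int32_t stack[64];   // operand stack
static int sp = -1;         // index of the top of stack (-1 = empty)

void vm_execute(const uint8_t *bytecode) {
    while (1) {
        switch (*bytecode++) {
        case 0x01: regs[*bytecode++] = stack[sp--]; break;   // pop into register
        case 0x02: stack[++sp] = regs[*bytecode++]; break;   // push register
        case 0x03: stack[sp - 1] += stack[sp]; sp--; break;  // add top two values
        // ... 15+ more opcodes
        case 0xFF: return;                                    // halt
        }
    }
}
```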
This is the hardest to reverse — the original logic is encoded as data, not code.
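Because the logic lives in data, a common first step is to lift the bytecode back into readable expressions instead of executing it. The sketch below does this for the opcodes shown in `vm_execute` (0x01 pop into register, 0x02 push register, 0x03 add, 0xFF halt). The function name `lift` and the output syntax are assumptions, and a real protected binary would first require decoding every opcode handler.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_DEPTH 16  // deepest virtual stack the lifter tracks
#define EXPR_LEN 128  // longest expression string kept per stack slot

// Translate bytecode into C-like statements without running it. The virtual
// stack holds expression strings instead of numbers; popping into a register
// emits an assignment. Returns 0 on success, -1 on an unknown opcode, stack
// misuse, or a full output buffer.
int lift(const uint8_t *code, char *out, size_t outsz) {
    char stack[MAX_DEPTH][EXPR_LEN];
    int sp = -1;
    size_t pos = 0;
    if (outsz == 0)
        return -1;
    out[0] = '\0';
    for (;;) {
        switch (*code++) {
        case 0x02:  // push register: the operand becomes the symbol "rN"
            if (sp + 1 >= MAX_DEPTH)
                return -1;
            sp++;
            snprintf(stack[sp], EXPR_LEN, "r%u", (unsigned)*code++);
            break;
        case 0x03: {  // add: replace the top two entries with "(a + b)"
            char sum[EXPR_LEN];
            if (sp < 1)
                return -1;
            snprintf(sum, sizeof sum, "(%s + %s)", stack[sp - 1], stack[sp]);
            sp--;
            memcpy(stack[sp], sum, sizeof sum);
            break;
        }
        case 0x01: {  // pop into register: emit "rN = <expr>;"
            if (sp < 0 || pos >= outsz)
                return -1;
            unsigned reg = *code++;
            int n = snprintf(out + pos, outsz - pos, "r%u = %s;\n", reg, stack[sp]);
            sp--;
            if (n < 0)
                return -1;
            pos += (size_t)n;
            break;
        }
        case 0xFF:  // halt
            return pos < outsz ? 0 : -1;
        default:
            return -1;
        }
    }
}
```

For the bytecode `{0x02, 0, 0x02, 1, 0x03, 0x01, 2, 0xFF}` it produces `r2 = (r0 + r1);`, the single statement the interpreter would otherwise compute one stack operation at a time.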
The agentic deobfuscation loop
When Kong detects obfuscation in a function’s decompilation, it switches from single-shot LLM analysis to a multi-turn agentic loop. The LLM gets access to six specialized tools and can call them iteratively to peel away layers of obfuscation:
| Tool | Purpose |
|---|---|
| `simplify_expression` | Z3-based symbolic simplification; detects and resolves opaque predicates |
| `eliminate_dead_code` | Removes unreachable branches after predicates are resolved |
| `trace_state_machine` | Extracts CFF state transitions and exit conditions from Ghidra’s IR |
| `identify_crypto_constants` | Scans for known constants (AES S-box, SHA-256 init values, MD5 T-table) |
| `get_decompilation` | Re-reads the decompilation from Ghidra after intermediate results are applied |
| `get_basic_blocks` | Extracts the control flow graph and basic block structure |
The LLM drives the loop — it decides which tools to call and in what order based on what it sees in the decompilation. A typical flow:
- LLM sees a suspicious `while(1)`/`switch` → calls `trace_state_machine`
- State machine tool returns the transition graph → LLM identifies dead states
- LLM calls `simplify_expression` on a guard condition → Z3 proves it’s always true
- LLM calls `eliminate_dead_code` with the resolved predicate → dead branch removed
- LLM calls `get_decompilation` to see the cleaned-up code → produces final analysis
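The dispatch side of such a loop can be sketched in a few lines. In this hypothetical version a fixed list of tool names stands in for the model's choices, the handlers are stubs that return canned strings, and only four of the six tools are registered; it does not reflect Kong's actual implementation. The turn budget (`max_turns`) keeps a model that never declares itself finished from calling tools forever.

```c
#include <stddef.h>
#include <string.h>

typedef const char *(*tool_fn)(const char *arg);

// Stub handlers: each returns a canned result instead of doing real analysis.
static const char *trace_state_machine(const char *arg) { (void)arg; return "transition graph"; }
static const char *simplify_expression(const char *arg) { (void)arg; return "always true"; }
static const char *eliminate_dead_code(const char *arg) { (void)arg; return "dead branch removed"; }
static const char *get_decompilation(const char *arg)   { (void)arg; return "cleaned-up code"; }

// Registry mapping the names the model may request to handlers.
static const struct {
    const char *name;
    tool_fn fn;
} TOOLS[] = {
    {"trace_state_machine", trace_state_machine},
    {"simplify_expression", simplify_expression},
    {"eliminate_dead_code", eliminate_dead_code},
    {"get_decompilation", get_decompilation},
};

// Find a handler by name; NULL if the name is not registered.
static tool_fn find_tool(const char *name) {
    for (size_t i = 0; i < sizeof TOOLS / sizeof TOOLS[0]; i++)
        if (strcmp(TOOLS[i].name, name) == 0)
            return TOOLS[i].fn;
    return NULL;
}

// Execute requested tool calls in order; a NULL entry in `requests` means
// "done". Stops after max_turns calls. Returns the number of calls made, or
// -1 if an unknown tool was requested. *last receives the latest result.
int run_loop(const char *const *requests, int max_turns, const char **last) {
    int turns = 0;
    while (turns < max_turns && requests[turns] != NULL) {
        tool_fn fn = find_tool(requests[turns]);
        if (fn == NULL)
            return -1;
        *last = fn("");  // a real loop would pass the model's arguments here
        turns++;
    }
    return turns;
}
```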
Z3 simplifier internals
The simplify_expression tool is powered by the Z3 theorem prover. Here’s how it works:
- Parse: The C expression from the decompilation is parsed into a Z3 AST (abstract syntax tree)
- Simplify: Z3 attempts algebraic simplification — reducing complex bit operations to simpler forms
- Prove: Z3 checks whether the expression is a tautology (always true) or contradiction (always false)
- Report: If it’s either, the expression is an opaque predicate, and the branch it can never take is dead code (the `else` body for a tautology, the `if` body for a contradiction)
For example, given (x * (x + 1)) % 2 == 0:
- Z3 recognizes that `x * (x + 1)` is always even (one of two consecutive integers is even)
- Z3 proves the expression is a tautology
- Kong marks the `else` branch as dead code
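When the input space is small, enumeration can stand in for the Prove step. The sketch below is not Z3 and does not use its API: it evaluates a predicate for all 65,536 values of a 16-bit unsigned `x` and reports always true, always false, or input-dependent. Z3 reaches the same verdict symbolically, which is what makes 32- and 64-bit variables tractable. All names are illustrative.

```c
#include <stdint.h>

typedef enum { PRED_ALWAYS_FALSE, PRED_ALWAYS_TRUE, PRED_DEPENDS } pred_class;

// Evaluate pred for every 16-bit value of x. Stops as soon as both outcomes
// have been seen, since the predicate is then not opaque.
pred_class classify(int (*pred)(uint16_t x)) {
    int seen_true = 0, seen_false = 0;
    for (uint32_t v = 0; v <= 0xFFFFu; v++) {
        if (pred((uint16_t)v))
            seen_true = 1;
        else
            seen_false = 1;
        if (seen_true && seen_false)
            return PRED_DEPENDS;
    }
    return seen_true ? PRED_ALWAYS_TRUE : PRED_ALWAYS_FALSE;
}

// The guard from the example, in unsigned arithmetic truncated to 16 bits,
// so overflow wraps instead of being undefined behavior.
int consecutive_product_is_even(uint16_t x) {
    uint16_t p = (uint16_t)(x * (x + 1u));
    return p % 2 == 0;
}

// Its negation: an always-false guard, the kind that hides an if body.
int consecutive_product_is_odd(uint16_t x) {
    return !consecutive_product_is_even(x);
}

// A genuine condition for contrast: its value depends on the input.
int greater_than_ten(uint16_t x) {
    return x > 10;
}
```

A `PRED_ALWAYS_TRUE` verdict marks the `else` branch as dead; `PRED_ALWAYS_FALSE` marks the `if` body as dead.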
The simplifier also handles instruction substitution by reducing complex bit expressions to their simpler equivalents: `(~a & b) | (a & ~b)` simplifies to `a ^ b` through Z3’s bit-vector reasoning.
Dead code elimination
After opaque predicates are resolved, Kong prunes the dead branches. The algorithm:
- Take the set of resolved predicates and their constant truth values (from the Z3 simplifier)
- Walk the decompilation AST
- For each `if`/`else` guarded by a resolved predicate:
  - If the predicate is always true: keep the `if` body, remove the `else` body
  - If the predicate is always false: keep the `else` body (if it exists), remove the `if` body
- Remove any variables and assignments that are only referenced in deleted branches
This produces cleaner decompilation with the garbage code stripped away, making the real logic visible to the LLM.
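The branch-selection part of this algorithm can be sketched over a toy statement tree. The `Stmt` and `Resolved` types and the `prune` function are hypothetical stand-ins, not Kong's or Ghidra's data structures, and the final step (removing variables used only in deleted branches) is left out.

```c
#include <stddef.h>

// Toy statement tree (hypothetical; a real decompiler AST is far richer).
typedef struct Stmt {
    enum { STMT_LEAF, STMT_IF } kind;
    const char *text;          // STMT_LEAF: statement source text
    int pred_id;               // STMT_IF: index of the guarding predicate
    struct Stmt *then_branch;  // STMT_IF: the if body
    struct Stmt *else_branch;  // STMT_IF: the else body, or NULL
} Stmt;

// Truth values from the simplifier, indexed by predicate id:
// 1 = always true, 0 = always false, -1 = not resolved.
typedef struct {
    const int *value;
    size_t count;
} Resolved;

// Return the statement that survives pruning. A resolved if/else is replaced
// by its live branch (NULL when an always-false if has no else); an
// unresolved one is kept and its branches are pruned recursively.
Stmt *prune(Stmt *s, Resolved r) {
    if (s == NULL || s->kind == STMT_LEAF)
        return s;
    int v = -1;
    if (s->pred_id >= 0 && (size_t)s->pred_id < r.count)
        v = r.value[s->pred_id];
    if (v == 1)
        return prune(s->then_branch, r);
    if (v == 0)
        return prune(s->else_branch, r);
    s->then_branch = prune(s->then_branch, r);
    s->else_branch = prune(s->else_branch, r);
    return s;
}
```

For the bogus-control-flow example, a guard resolved to "always true" collapses to the `real_logic();` leaf, and the `garbage_code_that_never_runs();` leaf is no longer reachable from the tree.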
State machine tracing for CFF
When a function uses control flow flattening, trace_state_machine reconstructs the original control flow:
- Identify the dispatcher: Find the `while(1)`/`switch(state)` loop and the state variable
- Extract transitions: For each switch case, determine what value the state variable is set to — this gives the edges in the state graph
- Find entry and exit: The initial state value is the entry point; cases that break out of the loop or return are exits
- Reconstruct flow: Build a directed graph of state transitions. The original control flow is the path through this graph from entry to exit
- Simplify: Collapse linear chains of states (A→B→C where B has no other edges) back into sequential code
The result is a representation of the original control flow that the LLM can reason about, even though the compiled code uses a flat dispatcher.
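The last two steps can be sketched once the transitions are known. This sketch assumes each switch case has already been reduced to its state value and at most two successor states; identifying the dispatcher and extracting transitions from Ghidra's IR are not shown. `Case`, `predecessor_count`, and `collapse_chain` are illustrative names.

```c
#include <stddef.h>

// One case of the flattened switch: its state value and its successor states
// (two for a conditional transition, zero for an exit such as `return`).
typedef struct {
    int state;
    int succ[2];
    int nsucc;
} Case;

static int index_of(const Case *cases, size_t n, int state) {
    for (size_t i = 0; i < n; i++)
        if (cases[i].state == state)
            return (int)i;
    return -1;
}

// Number of transitions that lead into `state`.
static int predecessor_count(const Case *cases, size_t n, int state) {
    int count = 0;
    for (size_t i = 0; i < n; i++)
        for (int k = 0; k < cases[i].nsucc; k++)
            if (cases[i].succ[k] == state)
                count++;
    return count;
}

// Starting at `head`, keep following the successor while the current case has
// exactly one successor and that successor has exactly one predecessor. The
// states written to `chain` form one straight-line block; `cap` also bounds
// the walk if the transitions contain a cycle. Returns the chain length.
size_t collapse_chain(const Case *cases, size_t n, int head, int *chain, size_t cap) {
    size_t len = 0;
    int cur = head;
    while (len < cap) {
        int i = index_of(cases, n, cur);
        if (i < 0)
            break;
        chain[len++] = cur;
        if (cases[i].nsucc != 1)
            break;
        int next = cases[i].succ[0];
        if (predecessor_count(cases, n, next) != 1)
            break;
        cur = next;
    }
    return len;
}
```

With the five cases from the flattened example, starting at the entry state `0x3a7b` yields the chain `0x3a7b, 0x91ff`: loading `x` and testing it merge into one block that ends in a two-way branch. `0x4c02` and `0xd8a1` stay separate because both flow into `0xf100`, which has two predecessors; that join is the end of the original if/else.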
Further reading