Documentation Index
Fetch the complete documentation index at: https://docs.kong.fyi/llms.txt
Use this file to discover all available pages before exploring further.
The problem
Ghidra’s decompiler produces syntactically valid C, but it’s noisy. Negative numbers appear as unsigned hex, modulo operations are expanded into division chains, and undefined types litter the output. This noise wastes LLM tokens and can confuse the model. Kong runs four normalization passes on every function’s decompilation before sending it to the LLM.

Normalization passes
1. Negative literal recovery
Ghidra often represents negative numbers as large unsigned values or with awkward `+ -` syntax. Kong detects `+ -N` patterns and converts them to `- N`, making the code more readable.
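As a minimal sketch of what this pass might look like (an illustrative assumption, not Kong's actual implementation), the two fixes can be expressed as a regex rewrite plus a 32-bit re-signing helper:

```python
import re

# Hypothetical sketch of the negative-literal pass; Kong's real
# implementation works on decompiler output and may differ.

def fix_negative_literals(code: str) -> str:
    # Rewrite "a + -N" (hex or decimal literal) as "a - N".
    return re.sub(r"\+\s*-\s*(0x[0-9a-fA-F]+|\d+)", r"- \1", code)

def to_signed32(hexlit: str) -> int:
    # Reinterpret a 32-bit unsigned hex literal as a signed value,
    # e.g. "0xfffffffc" is really -4.
    v = int(hexlit, 16)
    return v - 0x1_0000_0000 if v >= 0x8000_0000 else v
```

`fix_negative_literals("iVar1 = iVar2 + -0x10;")` yields `"iVar1 = iVar2 - 0x10;"`, the readable subtraction form.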
2. Modulo operation recovery
Compilers optimize `x % N` into a division-multiply-subtract pattern. Ghidra decompiles the optimized form, which obscures the original modulo. Kong recognizes the pattern and collapses it back to `x % N`.
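The division-multiply-subtract form of `x % N` is `x - (x / N) * N`. A minimal Python sketch of this pass (an assumption about the mechanism, not Kong's actual code) can collapse it with a back-referencing regex:

```python
import re

# Hypothetical sketch of the modulo-recovery pass.
def recover_modulo(code: str) -> str:
    # "x - (x / N) * N" -> "x % N"; \1 and \2 force the same variable
    # and the same divisor to appear in both places before rewriting.
    return re.sub(r"(\w+)\s*-\s*\(\1\s*/\s*(\w+)\)\s*\*\s*\2", r"\1 % \2", code)
```

The back-references are what make this safe: `y - (x / 7) * 8` is left alone because the divisor and multiplier differ.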
3. Undefined type inference
Ghidra uses placeholder types like `undefined4` (4-byte unknown) and `undefined8` (8-byte unknown) when it can’t determine the actual type. Kong infers types from usage context:
- `undefined4` → `int` when the variable is used as a loop counter or accumulator (initialized to 0, incremented)
- `undefined8` → `long` when the variable is compared to NULL, assigned from a `DAT_` global, or cast-dereferenced as a pointer
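The heuristics above can be sketched as a small classifier. This is a simplified illustration under the stated rules, not Kong's actual inference engine, and the regexes only cover the obvious textual forms of each pattern:

```python
import re

# Hypothetical sketch: pick a concrete type for a placeholder based on
# how the variable is used in the function body.
def infer_type(placeholder: str, body: str, var: str) -> str:
    if placeholder == "undefined4":
        # Counter/accumulator evidence: initialized to 0 and incremented.
        if (re.search(rf"{var}\s*=\s*0\b", body)
                and re.search(rf"{var}\s*=\s*{var}\s*\+\s*1|{var}\+\+", body)):
            return "int"
    if placeholder == "undefined8":
        # Pointer-like evidence: NULL comparison, DAT_ global
        # assignment, or cast-dereference.
        if (re.search(rf"{var}\s*[=!]=\s*\(\w+\s*\*\)0x0", body)
                or re.search(rf"{var}\s*=\s*DAT_", body)
                or re.search(rf"\*\s*\(\w+\s*\*\)\s*{var}", body)):
            return "long"
    return placeholder  # no evidence: keep Ghidra's placeholder
```

With no matching evidence the placeholder is left alone, which mirrors the conservative behavior the rules imply: a wrong type is worse than an unknown one.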
4. Dead null-assignment removal
Inside null-check blocks, Ghidra sometimes emits redundant assignments that set the variable being checked to `(type *)0x0`. These are artifacts of decompilation, not real logic, so Kong removes them.
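A line-based sketch of this pass (an illustrative assumption; Kong presumably works on a richer representation than raw lines) tracks the variable proven null by the enclosing `if` and drops assignments that re-set it to null:

```python
import re

# Hypothetical sketch: inside "if (var == (type *)0x0) { ... }",
# a "var = (type *)0x0;" line is dead and can be removed.
def drop_dead_null_assignments(lines):
    out, guard, depth = [], None, 0  # guard = variable known to be null
    for line in lines:
        m = re.match(r"\s*if \((\w+) == \([\w ]+\*\)0x0\) \{", line)
        if m and guard is None:
            guard, depth = m.group(1), 0
            out.append(line)
            continue
        if guard is not None:
            depth += line.count("{") - line.count("}")
            if depth < 0:
                guard = None  # left the guarded block
            elif re.match(rf"\s*{guard} = \([\w ]+\*\)0x0;\s*$", line):
                continue  # redundant: guard is already null here
        out.append(line)
    return out
```

Brace counting keeps the sketch honest across nested blocks; once the guarded block closes, assignments to the variable are real logic again and are kept.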
Impact
These transformations are small individually but compound across a binary. In a 500-function analysis, normalization typically reduces total token usage by 10-15% and improves LLM accuracy by removing confusing patterns that the model might misinterpret.

Further reading
- Context Windows — normalized code goes into the LLM prompt
- Pipeline Overview — normalization runs during the analysis phase

