The single biggest factor in Kong’s analysis quality is not which LLM you use — it is what you put in the prompt. Kong builds a structured context window for every function before sending it to the model. This page explains what that window contains and why each piece matters.

The Naive Approach (and Why It Fails)

The simplest possible approach to LLM-assisted reverse engineering is to paste a function’s decompiled output directly into a prompt and ask “what does this do?”
void FUN_00401a30(undefined8 param_1, undefined4 param_2) {
    undefined4 local_10;
    local_10 = FUN_00401b50(param_1, param_2);
    FUN_00401c00(param_1, local_10);
    return;
}
The LLM has almost nothing to work with. Every identifier is meaningless. The callees are opaque. There is no way to infer that param_1 is a request buffer, that param_2 is a max length, or that FUN_00401b50 parses an HTTP header. Kong fixes this by assembling rich context from Ghidra’s program database before the prompt is built.

What Goes Into a Context Window

Every context window includes up to six categories of information, assembled by the Analyzer._build_context method.

1. Normalized Decompilation

The target function’s decompiled C code, run through Kong’s syntactic normalizer to strip Ghidra artifacts and produce cleaner output.
## Target Function: FUN_00401a30 (0x00401a30)
Size: 94 bytes

### Decompilation
This is always present — it is the core of the prompt.
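To make the idea of syntactic normalization concrete, here is a minimal sketch of a normalizer pass. The rewrite rules are hypothetical (Kong's actual normalizer rules are not documented here); the sketch only illustrates the kind of Ghidra artifacts such a pass strips.

```python
import re

# Hypothetical mapping from Ghidra placeholder types to fixed-width C types.
GHIDRA_TYPE_MAP = {
    "undefined8": "uint64_t",
    "undefined4": "uint32_t",
    "undefined2": "uint16_t",
    "undefined1": "uint8_t",
}

def normalize_decompilation(code: str) -> str:
    """Strip common Ghidra artifacts from decompiled C output."""
    # Replace placeholder types with standard fixed-width equivalents.
    for ghidra_type, c_type in GHIDRA_TYPE_MAP.items():
        code = re.sub(rf"\b{ghidra_type}\b", c_type, code)
    # Drop calling-convention annotations such as __fastcall / __cdecl.
    code = re.sub(r"\b__(fastcall|cdecl|stdcall|thiscall)\b\s*", "", code)
    # Collapse runs of blank lines left behind by removed text.
    code = re.sub(r"\n{3,}", "\n\n", code)
    return code
```

A pass like this keeps the prompt free of decompiler-specific noise, so the model's attention goes to program logic rather than Ghidra's type placeholders.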

2. Cross-References (Callers and Callees)

Cross-references (xrefs) are the “who calls whom” relationships in a binary. If function A calls function B, then A has an outgoing xref to B and B has an incoming xref from A. Ghidra tracks these automatically during its initial analysis pass.
Kong fetches decompilation snippets (first 10 lines) for both callers and callees of the target function:
  • Callees (up to 5) — functions called by the target. If a callee has already been analyzed, its recovered name appears in the snippet.
  • Callers (up to 3) — functions that call the target. These provide usage context.
This is where call-graph-ordered analysis pays off. Because Kong processes functions bottom-up, when it analyzes FUN_00401a30, the callee FUN_00401b50 may already have been renamed to parse_http_header. The LLM sees:
### Called Functions
#### parse_http_header (0x00401b50)
int parse_http_header(char *request_buffer, int max_length) {
    ...
Instead of the meaningless FUN_00401b50(param_1, param_2), the model now knows the target function is calling an HTTP header parser. That single piece of context can determine the entire analysis.
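The bottom-up ordering that makes this possible is a topological sort of the call graph, with callees emitted before their callers. A minimal sketch, using a toy call graph with illustrative names (real binaries also contain recursion, which requires cycle handling that this sketch omits):

```python
from graphlib import TopologicalSorter

# Toy call graph: each function maps to the set of functions it calls.
call_graph = {
    "FUN_00401a30": {"FUN_00401b50", "FUN_00401c00"},
    "FUN_00401b50": set(),
    "FUN_00401c00": {"FUN_00401b50"},
}

# TopologicalSorter emits a node only after all of its predecessors
# (here, its callees), so a function is analyzed only after every
# function it calls has already been analyzed and possibly renamed.
order = list(TopologicalSorter(call_graph).static_order())
```

In this ordering, `FUN_00401b50` is guaranteed to come before both of its callers, so by the time `FUN_00401a30` is analyzed it can appear in the context as `parse_http_header`.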

3. String References

String references are literal string constants (like "Content-Type" or "Error: invalid input") that a function accesses. In Ghidra, these show up as data cross-references from the function to addresses in the binary’s read-only data section.
Kong resolves data cross-references from the function to Ghidra’s string table:
### Referenced Strings
- "Content-Type"
- "HTTP/1.1 200 OK"
- "\r\n"
String references are among the most powerful signals for naming. A function that references "AES", "encrypt", and "key" is almost certainly cryptographic. A function that references "malloc failed" is doing memory allocation with error handling.
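The resolution step itself is simple: intersect the function's data cross-references with the binary's string table and render the hits as a prompt section. A sketch with simplified stand-ins for what Ghidra's program database provides (the address values and section format here are illustrative):

```python
def referenced_strings(xref_addrs, string_table):
    """Render a function's string references as a prompt section.
    Returns an empty string when there are no hits, so the section
    can be omitted entirely to save tokens."""
    strings = [string_table[a] for a in xref_addrs if a in string_table]
    if not strings:
        return ""
    lines = ["### Referenced Strings"]
    lines += [f'- "{s}"' for s in strings]
    return "\n".join(lines)

# Hypothetical string table keyed by address in .rodata; the third
# xref address resolves to no string and is silently skipped.
string_table = {0x402000: "Content-Type", 0x402010: "HTTP/1.1 200 OK"}
section = referenced_strings([0x402000, 0x402010, 0x403000], string_table)
```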

4. Already-Identified Functions

A map of all functions that Kong has already named in earlier analysis passes:
### Already Identified Functions
- 0x00401100: init_connection
- 0x00401200: parse_request_line
- 0x00401b50: parse_http_header
This gives the LLM a “vocabulary” of the binary. Even if a function is not a direct caller or callee, seeing that the binary contains init_connection, parse_request_line, and parse_http_header tells the model this is an HTTP server — which influences how it interprets ambiguous functions.

5. Known Struct Types

Struct recovery is the process of reconstructing C struct layouts from pointer arithmetic patterns in decompiled code. When a function accesses *(param + 0x10) and *(param + 0x18), those fixed offsets suggest param points to a struct with fields at those positions.
Struct definitions recovered from earlier function analyses:
### Known Struct Types
struct http_request { // 48 bytes
    char *method;         // offset 0x0, 8 bytes
    char *uri;            // offset 0x8, 8 bytes
    char *headers;        // offset 0x10, 8 bytes
    int content_length;   // offset 0x18, 4 bytes
};
When the current function accesses a pointer at the same offsets, the LLM can reuse the existing struct definition instead of inventing a new one.
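The offset pattern described above can be spotted mechanically. A minimal sketch that collects the fixed offsets at which each pointer variable is dereferenced; real struct recovery also needs access sizes and types, which this regex-only version ignores:

```python
import re
from collections import defaultdict

# Matches dereferences of the form *(var + 0x10) or *(var + 24).
ACCESS = re.compile(r"\*\s*\(\s*(\w+)\s*\+\s*(0x[0-9a-fA-F]+|\d+)\s*\)")

def struct_offsets(decompiled: str) -> dict:
    """Map each dereferenced variable to the sorted set of fixed
    offsets at which it is accessed."""
    offsets = defaultdict(set)
    for var, off in ACCESS.findall(decompiled):
        offsets[var].add(int(off, 0))
    return {var: sorted(offs) for var, offs in offsets.items()}

code = "local = *(param_1 + 0x10);\n*(param_1 + 0x18) = 0;"
# struct_offsets(code) → {"param_1": [16, 24]}: param_1 points to a
# struct with fields at offsets 0x10 and 0x18.
```

Offsets recovered this way can be checked against known struct layouts like `http_request` above, letting the analysis reuse an existing definition when the access pattern matches.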

6. Binary Metadata

Architecture, format, and compiler information for the binary:
Binary: x86_64 ELF (GCC)
This influences calling conventions, name mangling expectations, and common library patterns. A MIPS binary compiled with a different toolchain will have different idioms than an x86_64 ELF.

How the Prompt Is Assembled

The Analyzer._build_prompt method concatenates these sections in order:
  1. Binary metadata (arch, format, compiler)
  2. Target function header (name, address, size)
  3. Normalized decompilation
  4. Referenced strings
  5. Called functions (callee snippets)
  6. Calling functions (caller snippets)
  7. Already-identified function list
  8. Known struct types with field layouts
Each section is conditionally included — if a function has no string references or no known callee snippets, those sections are omitted to save tokens.
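The assembly logic can be sketched as a fixed-order join that skips empty sections. The section keys and internals here are assumptions, not the real `Analyzer._build_prompt` implementation:

```python
def build_prompt(sections: dict) -> str:
    """Concatenate prompt sections in a fixed order, omitting any
    section that is missing or empty (e.g. no string references)."""
    order = [
        "metadata", "target_header", "decompilation", "strings",
        "callees", "callers", "identified", "structs",
    ]
    return "\n\n".join(sections[k] for k in order if sections.get(k))

prompt = build_prompt({
    "metadata": "Binary: x86_64 ELF (GCC)",
    "target_header": "## Target Function: FUN_00401a30 (0x00401a30)",
    "decompilation": "### Decompilation\n...",
    "strings": "",  # no string refs: section dropped to save tokens
})
```

Dropping empty sections rather than emitting empty headers keeps the prompt short and avoids giving the model misleading signals (an empty "Referenced Strings" header could read as "this function references no strings at all").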

Why This Matters

The difference between naive and context-enriched prompting is dramatic:
  • Naive: the LLM sees FUN_00401b50(param_1, param_2) for a callee. Naming accuracy is low, because the model guesses based on structure alone.
  • Kong: the LLM sees parse_http_header(request_buffer, max_length). Naming accuracy is high, because the model recognizes the calling pattern.
Kong’s context windows turn a guessing game into a pattern-matching exercise. The LLM does not need to deduce what a function does from first principles — it recognizes the function’s role from its relationships, its strings, and the names of functions around it. This is also why call-graph ordering and semantic synthesis matter: each analyzed function enriches the context available to every function analyzed after it.
Last modified on March 20, 2026