Why Per-Function Analysis Isn’t Enough
Kong’s main analysis phase processes functions one at a time, bottom-up from the call graph. Each function gets a rich context window and a dedicated LLM call. This produces good per-function results — but it also produces inconsistencies.
When functions are analyzed independently, the LLM may choose different names for the same concept. One function might be named parse_request while a similar function elsewhere is named decode_http_request. A global variable like DAT_00412040 might be described as “config buffer” in one function’s analysis and “settings data” in another.
These inconsistencies are not bugs; they are an inherent limitation of per-function analysis, and semantic synthesis exists to resolve them.
Semantic synthesis is a post-analysis unification pass. After every function has been individually analyzed and named, Kong makes one final LLM call that sees the results together (limited to the most-connected functions when the binary is large) and harmonizes them into a consistent whole.
How the SemanticSynthesizer Works
After the main analysis phase completes, Kong’s SemanticSynthesizer runs a single LLM call over the most-connected functions in the binary. This call has three objectives:
1. Rename Global Variables
Ghidra labels global variables with addresses like DAT_00412040. During per-function analysis, the LLM sees these labels in context but cannot rename them — it only has authority over the current function.
The synthesis pass collects every DAT_XXXXXXXX reference across all decompilations and identifies globals that appear in multiple functions. These multi-use globals are the most valuable to rename because a meaningful name propagates across the entire binary.
The LLM sees each global alongside the list of functions that reference it, giving it enough context to propose names like g_config_buffer or connection_pool.
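The collection step described above can be sketched in a few lines. This is a minimal illustration, not Kong's actual code: the function name find_shared_globals, the min_functions parameter, and the sample decompilations are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Ghidra's default label for unnamed data: "DAT_" followed by a hex address.
DAT_PATTERN = re.compile(r"\bDAT_[0-9a-fA-F]+\b")


def find_shared_globals(decompilations, min_functions=2):
    """Map each DAT_ label to the sorted list of functions that reference it,
    keeping only labels seen in at least `min_functions` functions.

    Hypothetical helper; `decompilations` maps function name -> decompiled C.
    """
    referenced_by = defaultdict(set)
    for func_name, code in decompilations.items():
        # A label used many times inside one function still counts once.
        for label in set(DAT_PATTERN.findall(code)):
            referenced_by[label].add(func_name)
    return {
        label: sorted(funcs)
        for label, funcs in sorted(referenced_by.items())
        if len(funcs) >= min_functions
    }


# Made-up decompilations, for illustration only.
decompilations = {
    "load_config": "void load_config(void) { memcpy(&DAT_00412040, src, 0x40); }",
    "apply_config": "int apply_config(void) { return DAT_00412040 + DAT_00413080; }",
    "init_pool": "void init_pool(void) { DAT_00413080 = 0; DAT_00414000 = 1; }",
}
shared = find_shared_globals(decompilations)
```

In this sample, DAT_00414000 is dropped because only init_pool touches it, while the other two labels are kept along with the functions that reference them.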
2. Synthesize Struct Definitions
When multiple functions access fields at consistent offsets from the same base pointer or global, that pattern strongly suggests a struct. The synthesis pass asks the LLM to identify these patterns and propose struct definitions.
This complements the type recovery system, which accumulates struct proposals during per-function analysis. Synthesis catches cross-function patterns that individual analyses miss.
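Kong delegates struct detection to the LLM, so the sketch below only illustrates the underlying signal: the same base accessed at fixed offsets from more than one function. The FieldAccess record, the struct_candidates helper, and the sample accesses are hypothetical and assume the accesses have already been extracted from the decompilations.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldAccess:
    """One observed memory access (hypothetical record, not a Kong type)."""
    function: str  # function in which the access was observed
    base: str      # base pointer or global, e.g. "DAT_00413080"
    offset: int    # byte offset from the base
    type: str      # access type as the decompiler shows it


def struct_candidates(accesses, min_functions=2):
    """Group accesses by base and keep bases used by several functions.

    Returns {base: [(offset, type), ...]} with fields sorted by offset.
    """
    by_base = defaultdict(list)
    for access in accesses:
        by_base[access.base].append(access)

    candidates = {}
    for base, group in by_base.items():
        if len({a.function for a in group}) < min_functions:
            continue  # seen in a single function: not a cross-function pattern
        fields = {}
        for a in group:
            fields.setdefault(a.offset, a.type)  # first type seen per offset wins
        candidates[base] = sorted(fields.items())
    return candidates


# Made-up accesses, for illustration only.
accesses = [
    FieldAccess("open_conn", "DAT_00413080", 0, "int"),
    FieldAccess("open_conn", "DAT_00413080", 8, "char *"),
    FieldAccess("read_conn", "DAT_00413080", 4, "uint32_t"),
    FieldAccess("read_conn", "DAT_00413080", 8, "char *"),
    FieldAccess("init_log", "DAT_00415000", 0, "int"),
]
candidates = struct_candidates(accesses)
```

A layout recovered this way is only a candidate. Naming the struct and its fields still needs the semantic context that the LLM call provides.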
3. Refine Function Names
With all function names visible at once, the LLM can spot inconsistencies and propose refinements. If the per-function pass produced both parse_request and decode_http_request for similar functions, the synthesis pass can unify them under a consistent convention.
The Synthesis Prompt
The synthesizer builds a prompt that includes:
- A global variables section listing every DAT_XXXXXXXX that appears in two or more functions, alongside the names of the functions that reference it
- A functions section with each function’s name, classification, confidence, and full decompilation
To keep the prompt within LLM context limits, the synthesizer caps the number of included functions. When the binary has more functions than the cap, it prioritizes the most-connected functions — those with the highest cross-reference counts — since they provide the most signal for unification.
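A rough sketch of the selection and prompt assembly follows. The AnalyzedFunction record, the cap value, and the prompt layout are assumptions made for illustration; Kong's real prompt format and cap are not shown here.

```python
from dataclasses import dataclass


@dataclass
class AnalyzedFunction:
    """Per-function result (hypothetical shape, not Kong's actual type)."""
    name: str
    classification: str
    confidence: float
    decompilation: str
    xref_count: int  # number of cross-references to this function


def select_for_synthesis(functions, cap):
    """Keep at most `cap` functions, preferring the most cross-referenced."""
    ranked = sorted(functions, key=lambda f: f.xref_count, reverse=True)
    return ranked[:cap]


def build_prompt(functions, shared_globals):
    """Assemble the two prompt sections as plain text (layout is illustrative)."""
    lines = ["## Global variables"]
    for label, users in sorted(shared_globals.items()):
        lines.append(f"{label}: referenced by {', '.join(users)}")
    lines.append("")
    lines.append("## Functions")
    for f in functions:
        lines.append(f"### {f.name} ({f.classification}, confidence {f.confidence:.2f})")
        lines.append(f.decompilation)
    return "\n".join(lines)


# Made-up analysis results, for illustration only.
functions = [
    AnalyzedFunction("log_debug", "utility", 0.60, "void log_debug(void) {}", 1),
    AnalyzedFunction("parse_request", "parser", 0.90, "int parse_request(void) {}", 9),
    AnalyzedFunction("send_response", "network", 0.80, "int send_response(void) {}", 4),
]
selected = select_for_synthesis(functions, cap=2)
prompt = build_prompt(selected, {"DAT_00412040": ["parse_request", "send_response"]})
```

With a cap of 2, log_debug is left out because it has the fewest cross-references, so its decompilation never reaches the prompt.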
The Response
The LLM responds with a single JSON object containing three fields:
{
    "globals": {
        "DAT_00412040": "g_config_buffer",
        "DAT_00413080": "connection_pool"
    },
    "structs": [
        {
            "name": "ConnectionState",
            "fields": [
                {"name": "socket_fd", "type": "int", "offset": 0},
                {"name": "flags", "type": "uint32_t", "offset": 4},
                {"name": "read_buffer", "type": "char *", "offset": 8}
            ]
        }
    ],
    "name_refinements": {
        "0x00401a30": "parse_http_request",
        "0x00402b10": "send_http_response"
    }
}
Kong parses this into a SynthesisResult and applies the global renames directly to all decompilations — every occurrence of DAT_00412040 becomes g_config_buffer across the entire binary.
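The parse-and-apply step might look roughly like the sketch below. SynthesisResultSketch is a stand-in for Kong's SynthesisResult, whose real fields are not documented on this page, and parse_response and apply_global_renames are hypothetical helper names. The sketch assumes that a rename must match a whole identifier, so that a longer label sharing a prefix is left alone.

```python
import json
import re
from dataclasses import dataclass, field


@dataclass
class SynthesisResultSketch:
    """Stand-in for Kong's SynthesisResult (field names are assumptions)."""
    global_renames: dict = field(default_factory=dict)
    structs: list = field(default_factory=list)
    name_refinements: dict = field(default_factory=dict)


def parse_response(raw):
    """Parse the LLM's JSON reply, tolerating missing top-level fields."""
    data = json.loads(raw)
    return SynthesisResultSketch(
        global_renames=dict(data.get("globals", {})),
        structs=list(data.get("structs", [])),
        name_refinements=dict(data.get("name_refinements", {})),
    )


def apply_global_renames(decompilations, renames):
    """Replace every whole-word occurrence of each old label with its new name."""
    if not renames:
        return dict(decompilations)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(old) for old in renames) + r")\b"
    )
    return {
        name: pattern.sub(lambda m: renames[m.group(1)], code)
        for name, code in decompilations.items()
    }


# Made-up reply and decompilation, for illustration only.
raw = '{"globals": {"DAT_00412040": "g_config_buffer"}, "structs": []}'
result = parse_response(raw)
renamed = apply_global_renames(
    {"load_config": "x = DAT_00412040; y = DAT_004120400;"},
    result.global_renames,
)
```

The word-boundary match is what keeps DAT_004120400 untouched in this sample; a plain string replace would have corrupted it.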
When Synthesis Runs
Semantic synthesis runs after the main analysis phase and cleanup are complete. It is the last step before export:
- Triage: enumerate functions, build call graph, match signatures
- Analysis: per-function LLM calls, bottom-up
- Cleanup: normalize results, unify struct proposals
- Synthesis: global unification pass (this page)
- Export: write to Ghidra and analysis.json
For more on how the full pipeline fits together, see Pipeline Overview.