Semantic Synthesis

Why Per-Function Analysis Isn’t Enough

Kong’s main analysis phase processes functions one at a time, bottom-up through the call graph. Each function gets a rich context window and a dedicated LLM call. This produces good per-function results, but it also produces inconsistencies. When functions are analyzed independently, the LLM may choose different names for the same concept: one function might be named parse_request while a similar function elsewhere is named decode_http_request. Global variables like DAT_00412040 might be described as “config buffer” in one function’s analysis and “settings data” in another.

These inconsistencies are not bugs; they are an inherent limitation of per-function analysis. Semantic synthesis exists to resolve them.

Semantic synthesis is a post-analysis unification pass. After every function has been individually analyzed and named, Kong makes one final LLM call that sees all the results together and harmonizes them into a consistent whole.

How the SemanticSynthesizer Works

After the main analysis phase completes, Kong’s SemanticSynthesizer runs a single LLM call over the most-connected functions in the binary. This call has three objectives:

1. Rename Global Variables

Ghidra labels global variables with addresses like DAT_00412040. During per-function analysis, the LLM sees these labels in context but cannot rename them — it only has authority over the current function. The synthesis pass collects every DAT_XXXXXXXX reference across all decompilations and identifies globals that appear in multiple functions. These multi-use globals are the most valuable to rename because a meaningful name propagates across the entire binary. The LLM sees each global alongside the list of functions that reference it, giving it enough context to propose names like g_config_buffer or connection_pool.
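The collection step can be sketched as a regex scan over every decompilation. This is a minimal illustration, not Kong’s code: the function name find_shared_globals, the input shape (function name mapped to decompiled C text) and the min_functions threshold are assumptions chosen to mirror the “two or more functions” rule described on this page.

```python
import re
from collections import defaultdict

# Ghidra's default label for undefined global data is DAT_ followed by a hex address.
# The digit count depends on address width, so this (assumed) pattern accepts 1+ hex digits.
GLOBAL_LABEL = re.compile(r"\bDAT_[0-9A-Fa-f]+\b")


def find_shared_globals(
    decompilations: dict[str, str], min_functions: int = 2
) -> dict[str, list[str]]:
    """Map each DAT_ label to the sorted names of the functions that reference it,
    keeping only labels referenced by at least `min_functions` distinct functions."""
    refs: dict[str, set[str]] = defaultdict(set)
    for func_name, code in decompilations.items():
        for label in GLOBAL_LABEL.findall(code):
            refs[label].add(func_name)
    return {
        label: sorted(funcs)
        for label, funcs in sorted(refs.items())
        if len(funcs) >= min_functions
    }
```

Using a set per label means a global referenced many times inside one function still counts as a single user, which is what matters for deciding whether a rename will propagate across the binary.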

2. Synthesize Struct Definitions

When multiple functions access fields at consistent offsets from the same base pointer or global, that pattern strongly suggests a struct. The synthesis pass asks the LLM to identify these patterns and propose struct definitions. This complements the type recovery system, which accumulates struct proposals during per-function analysis. Synthesis catches cross-function patterns that individual analyses miss.
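In Kong the LLM identifies these patterns from the decompiled code, but the underlying signal can be illustrated with a small heuristic. The sketch below assumes field accesses have already been extracted into hypothetical FieldAccess records (function, base, offset, C type); it groups them by base and proposes a struct when several functions touch several distinct offsets. The placeholder field names and the thresholds are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldAccess:
    # Hypothetical record of one "base + offset" access seen in a decompilation.
    function: str
    base: str  # e.g. a global label such as DAT_00413080
    offset: int
    c_type: str


def propose_structs(
    accesses: list[FieldAccess], min_functions: int = 2, min_fields: int = 2
) -> dict[str, list[dict]]:
    """Propose a struct layout for each base accessed at several offsets by several functions."""
    by_base: dict[str, list[FieldAccess]] = defaultdict(list)
    for acc in accesses:
        by_base[acc.base].append(acc)

    proposals: dict[str, list[dict]] = {}
    for base, accs in by_base.items():
        functions = {a.function for a in accs}
        fields: dict[int, str] = {}
        for a in accs:
            # First type seen at an offset wins; a real pass would reconcile conflicting types.
            fields.setdefault(a.offset, a.c_type)
        if len(functions) >= min_functions and len(fields) >= min_fields:
            proposals[base] = [
                {"name": f"field_0x{off:x}", "type": t, "offset": off}
                for off, t in sorted(fields.items())
            ]
    return proposals
```

Requiring more than one function is what distinguishes this cross-function view from per-function type recovery: a layout seen in only one function is left to the per-function proposals.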

3. Refine Function Names

With all function names visible at once, the LLM can spot inconsistencies and propose refinements. If the per-function pass produced both parse_request and decode_http_request for similar functions, the synthesis pass can unify them under a consistent convention.
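Once refinements come back, they still have to be applied safely. The sketch below is a hypothetical helper, not part of Kong’s documented API: it assumes refinements are keyed by function address, as in the name_refinements field of the synthesis response, and it rejects any refinement whose address is unknown or whose new name is already used by a different function.

```python
def apply_name_refinements(
    names: dict[str, str], refinements: dict[str, str]
) -> tuple[dict[str, str], list[str]]:
    """Return (updated address -> name map, addresses whose refinement was rejected).

    The input map is not modified. Refinements are applied in order, so a later
    refinement cannot take a name that an earlier one has just claimed.
    """
    updated = dict(names)
    rejected: list[str] = []
    for addr, new_name in refinements.items():
        if addr not in updated:
            rejected.append(addr)  # unknown function address
            continue
        # Rebuilt each time for simplicity; fine for a sketch, O(n) per refinement.
        owner_of = {name: a for a, name in updated.items()}
        owner = owner_of.get(new_name)
        if owner is not None and owner != addr:
            rejected.append(addr)  # name already taken by another function
            continue
        updated[addr] = new_name
    return updated, rejected
```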

The Synthesis Prompt

The synthesizer builds a prompt that includes:

A global variables section listing every DAT_XXXXXXXX that appears in two or more functions, alongside the names of the functions that reference it
A functions section with each function’s name, classification, confidence, and full decompilation
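A prompt with those two sections could be assembled as below. This is a sketch under assumptions: the FunctionResult record, the section headers and the line format are invented for illustration, and the task instructions and JSON response schema that a real prompt would also carry are omitted.

```python
from dataclasses import dataclass


@dataclass
class FunctionResult:
    # Hypothetical shape of one per-function analysis result.
    name: str
    classification: str
    confidence: float
    decompilation: str


def build_synthesis_prompt(
    shared_globals: dict[str, list[str]], functions: list[FunctionResult]
) -> str:
    """Render the global-variables section followed by the functions section."""
    lines = ["## Global variables"]
    for label, users in sorted(shared_globals.items()):
        lines.append(f"{label}: referenced by {', '.join(users)}")
    lines.append("")
    lines.append("## Functions")
    for fn in functions:
        lines.append(f"### {fn.name} ({fn.classification}, confidence {fn.confidence:.2f})")
        lines.append(fn.decompilation)
        lines.append("")
    return "\n".join(lines)
```

Listing each global with its referencing functions before the code lets the model read the decompilations already knowing which labels are shared.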

To keep the prompt within LLM context limits, the synthesizer caps the number of included functions. When the binary has more functions than the cap, it prioritizes the most-connected functions — those with the highest cross-reference counts — since they provide the most signal for unification.
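One way to implement the cap is to rank functions by how many call-graph edges touch them. The sketch assumes that “cross-reference count” means caller count plus callee count derived from a list of (caller, callee) edges; Kong’s actual metric may differ. The function name and parameters are hypothetical.

```python
from collections import Counter


def select_for_synthesis(
    functions: list[str], call_edges: list[tuple[str, str]], cap: int
) -> list[str]:
    """Return at most `cap` function names, most-connected first.

    Connectivity is assumed to be in-degree + out-degree in the call graph;
    ties are broken by name so the selection is deterministic.
    """
    if cap <= 0:
        return []
    degree = Counter({name: 0 for name in functions})  # isolated functions rank last
    for caller, callee in call_edges:
        degree[caller] += 1
        degree[callee] += 1
    ranked = sorted(degree, key=lambda name: (-degree[name], name))
    return ranked[:cap]
```

When the binary is smaller than the cap, every function is returned, so the same code path handles both cases.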

The Response

The LLM responds with a single JSON object containing three fields:

{
  "globals": {
    "DAT_00412040": "g_config_buffer",
    "DAT_00413080": "connection_pool"
  },
  "structs": [
    {
      "name": "ConnectionState",
      "fields": [
        {"name": "socket_fd", "type": "int", "offset": 0},
        {"name": "flags", "type": "uint32_t", "offset": 4},
        {"name": "read_buffer", "type": "char *", "offset": 8}
      ]
    }
  ],
  "name_refinements": {
    "0x00401a30": "parse_http_request",
    "0x00402b10": "send_http_response"
  }
}

Kong parses this into a SynthesisResult and applies the global renames directly to all decompilations — every occurrence of DAT_00412040 becomes g_config_buffer across the entire binary.
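Parsing and applying the global renames might look like the sketch below. The SynthesisResult fields here simply mirror the JSON keys shown above; the real class’s definition is not documented on this page, and both helper functions are hypothetical.

```python
import json
import re
from dataclasses import dataclass, field


@dataclass
class SynthesisResult:
    # Assumed shape: one attribute per top-level key of the response.
    globals: dict[str, str] = field(default_factory=dict)
    structs: list[dict] = field(default_factory=list)
    name_refinements: dict[str, str] = field(default_factory=dict)


def parse_synthesis_response(raw: str) -> SynthesisResult:
    """Parse the model's JSON reply, treating missing keys as empty."""
    data = json.loads(raw)
    return SynthesisResult(
        globals=dict(data.get("globals", {})),
        structs=list(data.get("structs", [])),
        name_refinements=dict(data.get("name_refinements", {})),
    )


def apply_global_renames(
    decompilations: dict[str, str], renames: dict[str, str]
) -> dict[str, str]:
    """Replace every whole-word occurrence of each DAT_ label in every decompilation."""
    if not renames:
        return dict(decompilations)
    # \b on both sides stops DAT_00412040 from matching inside a longer label
    # such as DAT_004120400; one combined pattern applies all renames in a single pass.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, renames)) + r")\b")
    return {
        name: pattern.sub(lambda m: renames[m.group(1)], code)
        for name, code in decompilations.items()
    }
```

Doing all substitutions in one regex pass avoids chained renames, where a new name produced by one rename could otherwise be rewritten again by a later one.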

When Synthesis Runs

Semantic synthesis runs after the main analysis phase and cleanup are complete. It is the second-to-last stage of the pipeline, running immediately before export:

Triage — enumerate functions, build call graph, match signatures
Analysis — per-function LLM calls, bottom-up
Cleanup — normalize results, unify struct proposals
Synthesis — global unification pass (this page)
Export — write to Ghidra and analysis.json

For more on how the full pipeline fits together, see Pipeline Overview.

Related

Type Recovery — how struct proposals are accumulated and merged during analysis
Context Windows — how Kong builds per-function context for LLM calls
Call-Graph Analysis — how bottom-up ordering maximizes context propagation