Kong analyzes stripped binaries through a five-phase pipeline orchestrated by a central Supervisor. Each phase builds on the output of the previous one, and the entire pipeline runs from a single command.
                    ┌──────────────────────┐
                    │       Triage         │
                    │  enumerate, classify,│
                    │  build call graph,   │
                    │  match signatures    │
                    └──────────┬───────────┘
                               │
              ┌────────────────┼────────────────┐
              │                │                │
              ▼                ▼                ▼
     ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
     │   Analyze    │ │   Analyze    │ │     ...      │
     │  (leaf fns)  │ │ (next tier)  │ │              │
     └──────┬───────┘ └──────┬───────┘ └──────┬───────┘
            │                │                │
            └──────────┬─────┴────────────────┘
                       │
                       ▼
            ┌──────────────────────┐
            │       Cleanup        │
            │  normalize, dedupe   │
            └──────────┬───────────┘
                       │
                       ▼
            ┌──────────────────────┐
            │      Synthesis       │
            │  unify names, build  │
            │ structs, deobfuscate │
            └──────────┬───────────┘
                       │
                       ▼
            ┌──────────────────────┐
            │       Export         │
            │   analysis.json +    │
            │  Ghidra writeback    │
            └──────────────────────┘

Phase 1: Triage

Triage is the reconnaissance phase. Before any LLM calls happen, Kong builds a complete picture of the binary.

What it does

  1. Enumerate all functions — Kong queries Ghidra’s program database for every function in the binary, collecting address, size, parameters, return types, and calling convention.
  2. Classify by size — Each function is assigned a classification based on its byte size:

     Classification   Size                Behavior
     IMPORTED         N/A                 External library function (linked dynamically). Skipped.
     THUNK            N/A                 Thin wrapper that jumps to another function. Skipped.
     TRIVIAL          16 bytes or fewer   Too small to meaningfully analyze. Skipped.
     SMALL            17–64 bytes         Analyzed; often simple utility functions.
     MEDIUM           65–256 bytes        The bulk of most binaries.
     LARGE            Over 256 bytes      Complex functions that get full context windows.
  3. Build the call graph — For every function, Kong retrieves its callers and callees from Ghidra’s cross-reference database. This produces a directed graph that determines analysis ordering.
  4. Detect source language — Kong examines function name patterns and string references to identify whether the binary was compiled from C, C++, Go, or Rust. This affects how the LLM interprets the decompiler output.
  5. Run signature matching — Functions are compared against a database of known signatures for standard library and cryptographic routines. Matched functions are marked as resolved and skip LLM analysis entirely, saving time and cost.
Signature matching identifies known library functions (like memcpy or AES_encrypt) by comparing their byte patterns against a database of known signatures. This is faster and more reliable than sending them to an LLM.
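The classification rules from step 2 amount to a short threshold check. The following is an illustrative sketch only — the Function shape and its field names are assumptions for this example, not Kong's actual API:

```python
from dataclasses import dataclass

# Hypothetical function record; Kong's real representation comes from
# Ghidra's program database and is richer than this.
@dataclass
class Function:
    size: int
    is_external: bool = False
    is_thunk: bool = False

def classify(fn: Function) -> str:
    # External and thunk functions are skipped regardless of size.
    if fn.is_external:
        return "IMPORTED"
    if fn.is_thunk:
        return "THUNK"
    # Thresholds follow the table above.
    if fn.size <= 16:
        return "TRIVIAL"
    if fn.size <= 64:
        return "SMALL"
    if fn.size <= 256:
        return "MEDIUM"
    return "LARGE"
```

The thresholds mirror the table: 16 bytes or fewer is trivial, 64 bytes is the small/medium boundary, and anything over 256 bytes gets the full-context treatment.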

Why it matters

Triage determines what needs to be analyzed and in what order. By classifying functions upfront, Kong avoids wasting LLM calls on trivial wrappers and imported functions. By building the call graph first, it ensures the analysis phase processes functions in the optimal bottom-up order.

Phase 2: Analysis

The analysis phase is where the LLM does its work. Kong processes functions in bottom-up order from the call graph, so leaf functions are named first and their callers benefit from that context.

What it does

For each function (or chunk of functions):
  1. Build context window — Kong assembles a prompt that includes the function’s decompilation, cross-references, string references, and the signatures of already-analyzed callees. See Context Windows for details.
  2. Normalize decompiler output — Raw Ghidra decompilation is cleaned up through syntactic normalization: modulo recovery, negative literal reconstruction, and dead assignment removal. This reduces noise and token waste.
  3. Detect obfuscation — If the decompiler output shows signs of obfuscation (control flow flattening, bogus control flow, opaque predicates), the function is routed through an agentic deobfuscation pipeline with symbolic tool access instead of the standard batch path.
  4. Send to LLM — The assembled prompt is sent to the configured LLM (Claude or GPT-4o). For non-obfuscated functions, Kong batches multiple functions into a single prompt to reduce API overhead. Obfuscated functions are processed individually with tool access.
  5. Write back to Ghidra immediately — As soon as the LLM returns a result, Kong writes the recovered name, signature, and type information back into Ghidra’s program database. This is critical: when the next function is analyzed, its decompilation will show the real names of its callees instead of auto-generated labels.
Bottom-up analysis means starting from leaf functions (those that don’t call other analyzed functions) and working upward through the call graph. When you analyze parse_http_header first, then later analyze handle_request which calls it, the LLM sees parse_http_header by name instead of FUN_00401a30.
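Bottom-up ordering is essentially a post-order traversal of the call graph. A minimal sketch, with an illustrative call_graph shape (a real scheduler would also handle batching into chunks and very deep recursion):

```python
# Illustrative sketch: post-order DFS over callee edges, so every function
# is emitted only after all of its callees. The seen-set also keeps
# recursive call cycles from looping forever.
def bottom_up_order(call_graph):
    """call_graph maps each function name to the names it calls."""
    order, seen = [], set()

    def visit(fn):
        if fn in seen:
            return
        seen.add(fn)
        for callee in call_graph.get(fn, []):
            visit(callee)
        order.append(fn)  # all callees are already in the list

    for fn in call_graph:
        visit(fn)
    return order
```

For the example above, parse_http_header is emitted before handle_request, so by the time handle_request is analyzed its callee already carries a recovered name.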

Why it matters

The immediate writeback is what makes call-graph-ordered analysis worthwhile. If Kong waited until the end to write names back, every function would still see FUN_ labels for its callees, defeating the purpose of bottom-up ordering. By writing results back as they arrive, each successive function gets better context.

Phase 3: Cleanup

Cleanup is a reconciliation pass that fixes inconsistencies from the analysis phase.

What it does

  1. Unify struct types — During analysis, the LLM may propose struct definitions for data it sees being accessed through pointers. Different functions might propose overlapping or conflicting structs for the same memory layout. The cleanup phase collects all struct proposals and unifies them into a consistent set of type definitions, then applies them to Ghidra.
  2. Retry failed signatures — Some function signatures proposed by the LLM may fail to apply during analysis because they reference types that didn’t exist yet. After struct unification creates those types, Kong retries the pending signatures.
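Struct unification can be pictured as merging per-function field proposals for the same memory layout. A hypothetical sketch — the proposal format here is an assumption for illustration; see Type Recovery for how Kong actually reconciles types:

```python
# Illustrative sketch: merge several per-function proposals for the same
# struct into one definition, preferring specific names over generic
# placeholder names like "field_8".
def unify_structs(proposals):
    """proposals: list of {offset: field_name} dicts for one struct."""
    merged = {}
    for fields in proposals:
        for offset, name in fields.items():
            current = merged.get(offset)
            # Keep the first non-generic name seen at each offset.
            if current is None or current.startswith("field_"):
                merged[offset] = name
    return dict(sorted(merged.items()))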

Why it matters

The LLM analyzes functions independently (or in small batches). Each function’s struct proposals are local guesses. Cleanup reconciles those guesses into a globally consistent type system. Without this phase, you would end up with duplicate or conflicting struct definitions. See Type Recovery for details on how struct unification works.

Phase 4: Synthesis

Synthesis takes a global view across the entire binary. While the analysis phase works function-by-function, synthesis looks at relationships between functions.

What it does

  1. Unify naming conventions — Different LLM calls may use inconsistent naming styles (parse_header vs parseHeader vs header_parse). Synthesis reviews the most-connected functions and standardizes naming across the binary.
  2. Rename DAT_ globals — Ghidra labels global variables as DAT_00601020. Synthesis infers meaningful names from how those globals are used across multiple functions and renames them.
  3. Synthesize structs from field patterns — When multiple functions access the same pointer offsets in similar ways, synthesis can infer struct definitions even if no single function made the pattern obvious. See Semantic Synthesis for details.
  4. Refine names — Names that looked reasonable in isolation may look wrong in context. If process_data and handle_data both appear in the same call chain doing related things, synthesis may refine them to process_request_body and handle_response_body.
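The naming-style unification in step 1 boils down to normalizing every name to one convention. A minimal sketch that folds camelCase into snake_case (illustrative only — a single regex is a simplification of what convention unification involves):

```python
import re

# Illustrative sketch: insert an underscore before any uppercase letter
# that follows a lowercase letter or digit, then lowercase everything.
def to_snake_case(name):
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()
```

With this rule, parseHeader and parse_header both normalize to parse_header, so every caller ends up referring to one consistent symbol.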

Why it matters

Individual function analysis happens without full cross-binary context. Synthesis is the phase that makes the output coherent — a binary where naming conventions are consistent and struct definitions reflect the actual data layout rather than per-function guesses.

Phase 5: Export

Export writes the final results to disk.

What it does

Kong produces output in the formats configured for the run:
  • analysis.json — A structured JSON file containing every recovered function name, signature, classification, confidence score, and reasoning. This is the primary machine-readable output.
  • Annotated source (decompiled.c) — The full decompilation with all recovered names, types, and signatures applied. Readable C-like source with the LLM’s analysis baked in.
  • Ghidra writeback — All names, types, and signatures are written back into Ghidra’s program database throughout the analysis, so by the time export runs, the Ghidra project already reflects the full analysis.
See Output Formats for details on each format and how to configure them.
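To make the shape concrete, a single analysis.json entry might look roughly like the following. Every field name here is an illustrative assumption, not Kong's documented schema — see Output Formats for the real layout:

```python
import json

# Hypothetical analysis.json entry; field names are assumptions for
# illustration, not Kong's documented schema.
entry = {
    "address": "0x00401a30",
    "original_name": "FUN_00401a30",
    "recovered_name": "parse_http_header",
    "classification": "MEDIUM",
    "signature": "int parse_http_header(char *buf, size_t len)",
    "confidence": 0.9,
    "reasoning": "Scans a buffer for CRLF and splits header lines on ':'.",
}
print(json.dumps(entry, indent=2))
```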

Why it matters

Different workflows need different outputs. If you are continuing manual analysis in Ghidra, the writeback is what matters. If you are feeding results into another tool, analysis.json is the structured format. If you want a quick read, the annotated source gives you the full decompilation with real names.

Running the Pipeline

The entire pipeline runs from a single command:
kong analyze ./path/to/binary
See Analyzing a Binary for the full walkthrough, including provider selection, output configuration, and what to expect during each phase.

Last modified on March 20, 2026