Kong does not analyze functions in an arbitrary order. It builds a call graph from the binary, computes a bottom-up ordering, and processes functions from the leaves upward. This single design decision is one of the biggest factors in analysis quality.
What is a Call Graph?
A call graph is a directed graph where each node is a function and each edge represents a function call. If main calls handle_request, and handle_request calls parse_http_header, the graph has edges main → handle_request → parse_http_header. Leaf functions (like parse_http_header in this example) have no outgoing edges to other analyzed functions.
When Ghidra loads a binary, it identifies cross-references between functions — which function calls which. Kong queries these cross-references during the triage phase and builds caller/callee adjacency lists for every function in the binary.
The call graph captures the dependency structure of the program. Higher-level functions depend on lower-level ones. An HTTP server’s main might call start_server, which calls handle_connection, which calls parse_request, which calls read_line. Understanding read_line makes parse_request clearer, which makes handle_connection clearer, and so on.
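The caller/callee bookkeeping can be sketched with plain adjacency sets. The edge list below is hypothetical and stands in for the cross-references Kong reads from Ghidra; it is not Kong's actual data model:

```python
from collections import defaultdict

# Hypothetical (caller, callee) pairs, standing in for the
# cross-references collected from Ghidra during triage.
edges = [
    ("main", "handle_connection"),
    ("main", "log_startup"),
    ("handle_connection", "parse_request"),
    ("handle_connection", "send_response"),
]

callees = defaultdict(set)
callers = defaultdict(set)
for caller, callee in edges:
    callees[caller].add(callee)
    callers[callee].add(caller)

# Leaf functions: no callees within the analyzed set.
functions = {f for edge in edges for f in edge}
leaves = sorted(f for f in functions if not callees[f])
print(leaves)  # ['log_startup', 'parse_request', 'send_response']
```

Keeping both directions of the adjacency makes the later BFS cheap: walking from a function to its callers is a dictionary lookup rather than a scan over all edges.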
Why Bottom-Up Order Matters
The core problem: when Ghidra decompiles a stripped binary, every function has an auto-generated name like FUN_00401a30. When the LLM analyzes a function, its decompilation includes calls to other functions — and those calls show up as FUN_ labels.
If Kong analyzed handle_request before its callees, the LLM would see something like:
```c
void FUN_00401b00(int param_1) {
    char *buf = (char *)malloc(4096);
    FUN_00401a30(param_1, buf);  // what is this?
    FUN_00401c10(buf);           // what is this?
    free(buf);
}
```
The LLM has to guess what FUN_00401a30 and FUN_00401c10 do based on how they are called. It can make reasonable guesses, but they are guesses.
If Kong instead analyzes the callees first and writes their names back to Ghidra immediately, the decompilation of handle_request now shows:
```c
void FUN_00401b00(int param_1) {
    char *buf = (char *)malloc(4096);
    parse_http_header(param_1, buf);  // already named
    send_response(buf);               // already named
    free(buf);
}
```
Now the LLM can see that this function parses an HTTP header and sends a response. It will almost certainly name it something like handle_request — a much better result than it would produce from the FUN_ version.
This is why immediate writeback matters. As soon as the LLM names a function, Kong writes that name back into Ghidra’s program database. Every subsequent decompilation benefits from all previously recovered names.
How the BFS Work Queue Works
Kong computes the bottom-up ordering using a breadth-first search (BFS) from leaf functions.
Step 1: Identify leaf functions
A leaf function is one that has no callees within the set of functions being analyzed. It might call external library functions (like malloc or printf), but those are already named — they were identified as IMPORTED during triage and excluded from analysis.
Step 2: Assign depths via BFS
Starting from the leaves (depth 0), Kong traverses the call graph upward through callers:
- All leaf functions get depth 0.
- For each function at the current depth, look at its callers.
- A caller’s depth is the maximum depth of any of its callees, plus one.
- Repeat until every reachable function has a depth.
Functions that have no callees and no callers (isolated nodes) also get depth 0.
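The depth-assignment steps above can be sketched as a breadth-first relaxation. This is a minimal version that assumes an acyclic graph; the function names mirror the worked example below and are illustrative, not Kong's actual code:

```python
from collections import defaultdict, deque

# Illustrative call graph: function -> set of callees.
callees = {
    "main": {"handle_connection", "log_startup"},
    "handle_connection": {"parse_request", "send_response"},
    "parse_request": set(),
    "send_response": set(),
    "log_startup": set(),
}
callers = defaultdict(set)
for f, cs in callees.items():
    for c in cs:
        callers[c].add(f)

# Leaves (and isolated nodes) start at depth 0.
depth = {f: 0 for f, cs in callees.items() if not cs}
queue = deque(depth)
while queue:
    f = queue.popleft()
    for caller in callers[f]:
        # A caller's depth is max(callee depths) + 1, so only
        # ever raise a depth when a deeper path is found.
        if depth.get(caller, -1) < depth[f] + 1:
            depth[caller] = depth[f] + 1
            queue.append(caller)

print(depth["handle_connection"], depth["main"])  # 1 2
```

Note that main is first reached through log_startup (which would give it depth 1) and later raised to depth 2 through handle_connection; the "only ever raise" rule is what makes the final depth the maximum over all paths.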
Step 3: Build the priority queue
Functions are sorted by depth (ascending), with ties broken by function size (smaller first). This produces the analysis order: leaves first, then their direct callers, then the callers of callers, up to the top-level entry points.
A Concrete Example
Consider a small binary with four functions:
```
main
├── handle_connection
│   ├── parse_request
│   └── send_response
└── log_startup
```
BFS depth assignment:
| Function | Callees (analyzed) | Depth |
|---|---|---|
| parse_request | none | 0 (leaf) |
| send_response | none | 0 (leaf) |
| log_startup | none | 0 (leaf) |
| handle_connection | parse_request, send_response | 1 |
| main | handle_connection, log_startup | 2 |
Analysis order (depth ascending, smaller functions first within same depth):
1. parse_request (depth 0)
2. send_response (depth 0)
3. log_startup (depth 0)
4. handle_connection (depth 1) — decompilation now shows parse_request and send_response by name
5. main (depth 2) — decompilation now shows handle_connection and log_startup by name
By the time Kong analyzes main, every function it calls has already been named. The LLM sees real names in the decompilation and produces a much better result.
Priority within the same depth
Within the same BFS depth, Kong processes smaller functions first. The priority for each function is computed as:
```
priority = depth * 1,000,000 + size_in_bytes
```
This means all depth-0 functions come before all depth-1 functions, and within depth 0, a 32-byte function is processed before a 200-byte function. Smaller functions tend to be simpler, so naming them first provides useful context for the larger functions at the same depth that may call into them via indirect paths not captured in the static call graph.
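The formula can be checked with a quick sketch; the sizes below are made up for illustration:

```python
DEPTH_WEIGHT = 1_000_000  # the constant from the priority formula

# (name, depth, size_in_bytes) — hypothetical values.
funcs = [
    ("parse_request", 0, 96),
    ("send_response", 0, 48),
    ("log_startup", 0, 32),
    ("handle_connection", 1, 200),
    ("main", 2, 120),
]

# Lower priority value = analyzed sooner.
order = sorted(funcs, key=lambda f: f[1] * DEPTH_WEIGHT + f[2])
print([name for name, _, _ in order])
# ['log_startup', 'send_response', 'parse_request', 'handle_connection', 'main']
```

The large multiplier guarantees that depth always dominates: even the biggest depth-0 function sorts before the smallest depth-1 function, as long as no function exceeds a million bytes.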
What Gets Skipped
Not every function enters the work queue. During triage, the following classifications are excluded:
- IMPORTED — External library functions that are already named (e.g., malloc, printf, send).
- THUNK — Thin wrappers that simply jump to another function. Ghidra already resolves these.
TRIVIAL functions (16 bytes or fewer) enter the queue but are marked as skipped during analysis — they are too small for the LLM to meaningfully analyze, and their names rarely matter to callers.
Everything else (SMALL, MEDIUM, LARGE) is analyzed in full.
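As a sketch, the filtering might look like this; the classification records are hypothetical, not Kong's actual data model:

```python
# Hypothetical triage records: (name, classification, size_in_bytes).
triage = [
    ("malloc", "IMPORTED", 0),
    ("FUN_00401000", "THUNK", 6),
    ("FUN_00401a30", "TRIVIAL", 12),
    ("FUN_00401b00", "MEDIUM", 240),
]

# IMPORTED and THUNK never enter the queue; TRIVIAL enters
# the queue but is marked skipped rather than sent to the LLM.
queued = [(name, cls) for name, cls, _ in triage
          if cls not in ("IMPORTED", "THUNK")]
analyzed = [name for name, cls in queued if cls != "TRIVIAL"]

print(queued)    # [('FUN_00401a30', 'TRIVIAL'), ('FUN_00401b00', 'MEDIUM')]
print(analyzed)  # ['FUN_00401b00']
```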
Handling Cycles
Real call graphs can contain cycles (recursion, mutual recursion). Kong’s BFS handles this by tracking visited nodes. If a function has already been assigned a depth through one path, a later path can only increase its depth, never decrease it. Cyclic functions end up at the depth of their deepest non-cyclic dependency plus one.
In practice, most binaries have relatively few cycles. The BFS still produces a useful ordering — the recursive functions get analyzed after their non-recursive dependencies.
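One simple way to keep a BFS from looping on a cycle is a visit-once check, shown in the simplified variant below (Kong's actual handling may differ). The graph is a hypothetical mutual recursion between tick and tock, where tick also calls a leaf:

```python
from collections import defaultdict, deque

# Hypothetical cycle: tick <-> tock, with tick calling a leaf.
callees = {
    "tick": {"tock", "log_line"},
    "tock": {"tick"},
    "log_line": set(),
}
callers = defaultdict(set)
for f, cs in callees.items():
    for c in cs:
        callers[c].add(f)

depth = {"log_line": 0}
queue = deque(["log_line"])
while queue:
    f = queue.popleft()
    for caller in callers[f]:
        if caller not in depth:  # visit-once check breaks the cycle
            depth[caller] = depth[f] + 1
            queue.append(caller)

print(depth)  # {'log_line': 0, 'tick': 1, 'tock': 2}
```

When tock's turn comes, its caller tick already has a depth, so the edge back into the cycle is ignored and the traversal terminates. Both recursive functions still land above their non-recursive dependency, which is the property the ordering needs.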
Next Steps