
The Problem: Lost Type Information

Stripped binaries have no type information. When the original source had a struct like this:
typedef struct {
    int socket_fd;
    uint32_t flags;
    char *read_buffer;
    size_t buffer_size;
} ConnectionState;
The decompiler sees raw pointer arithmetic:
*(int *)(param_1)           // offset 0x0 — socket_fd
*(uint *)(param_1 + 0x4)    // offset 0x4 — flags
*(char **)(param_1 + 0x8)   // offset 0x8 — read_buffer
*(ulong *)(param_1 + 0x10)  // offset 0x10 — buffer_size
Type recovery is the process of reconstructing high-level type information — struct layouts, field names, field types — from raw pointer offsets and usage patterns in decompiled code. It is one of the most time-consuming tasks in manual reverse engineering.
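To make the offsets above concrete, here is a minimal sketch that computes a struct layout under natural alignment. It assumes an LP64 target (4-byte int, 8-byte pointers and size_t); the `CField` and `layout` names are hypothetical helpers for illustration and are not part of Kong.

```python
from typing import NamedTuple

class CField(NamedTuple):
    name: str
    size: int   # sizeof in bytes
    align: int  # alignment requirement in bytes

def layout(fields):
    """Return ({name: offset}, total_size) using natural alignment."""
    offsets = {}
    cursor = 0
    max_align = 1
    for f in fields:
        # Round the cursor up to the next multiple of the field's alignment.
        cursor = (cursor + f.align - 1) // f.align * f.align
        offsets[f.name] = cursor
        cursor += f.size
        max_align = max(max_align, f.align)
    # Tail padding: the struct size is a multiple of its strictest alignment.
    total = (cursor + max_align - 1) // max_align * max_align
    return offsets, total

# ConnectionState on an assumed LP64 target.
connection_state = [
    CField("socket_fd", 4, 4),
    CField("flags", 4, 4),
    CField("read_buffer", 8, 8),
    CField("buffer_size", 8, 8),
]
offsets, total_size = layout(connection_state)
print({name: hex(off) for name, off in offsets.items()}, total_size)
```

The computed offsets (0x0, 0x4, 0x8, 0x10) match the pointer arithmetic in the decompiler output, which is why the offsets alone are enough to start reconstructing the layout.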
Kong recovers struct definitions automatically by accumulating proposals from per-function LLM analysis, merging proposals that describe the same underlying struct, and creating the unified types in Ghidra.

Phase 1: Struct Proposals

During the main analysis phase, each function is sent to the LLM with a rich context window. When the LLM sees a function accessing a pointer parameter at multiple offsets, it proposes a struct definition. A single function might produce a proposal like:
name: "ConnectionState"
total_size: 24
fields:
  - offset: 0x0,  name: "socket_fd",    type: "int",      size: 4
  - offset: 0x4,  name: "flags",        type: "uint32_t", size: 4
  - offset: 0x8,  name: "read_buffer",  type: "char *",   size: 8
used_by_param: "param_1"
source_function: 0x00401a30
Each proposal is tagged with the function it came from and the parameter it applies to. A proposal covers only the fields its source function touches: the example above has no entry for buffer_size at offset 0x10. The StructAccumulator collects these proposals as functions are analyzed.
Multiple functions often access the same struct. A send_response function might produce its own proposal with different fields at different offsets, all describing the same underlying type. These overlapping proposals are the raw material for the merge phase.
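One way to picture a proposal in memory is as a small record type. The sketch below mirrors the fields of the example proposal; the class names `FieldProposal`, `StructProposal`, and `SketchAccumulator` are assumptions made for illustration, and the real StructAccumulator interface is not shown on this page.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class FieldProposal:
    offset: int
    name: str
    type: str
    size: int

@dataclass
class StructProposal:
    name: str
    total_size: int
    fields: List[FieldProposal]
    used_by_param: str      # e.g. "param_1"
    source_function: int    # entry address of the proposing function

@dataclass
class SketchAccumulator:
    """Stand-in for StructAccumulator; the real interface is not shown here."""
    proposals: List[StructProposal] = field(default_factory=list)

    def add(self, proposal: StructProposal) -> None:
        self.proposals.append(proposal)

acc = SketchAccumulator()
acc.add(StructProposal(
    name="ConnectionState",
    total_size=24,
    fields=[
        FieldProposal(0x0, "socket_fd", "int", 4),
        FieldProposal(0x4, "flags", "uint32_t", 4),
        FieldProposal(0x8, "read_buffer", "char *", 8),
    ],
    used_by_param="param_1",
    source_function=0x00401A30,
))
print(len(acc.proposals), hex(acc.proposals[0].source_function))
```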

Phase 2: Merging Proposals

After all functions have been analyzed, the accumulator’s unify() method merges proposals that describe the same struct.

Grouping

Proposals are grouped by name. When the LLM independently names a struct ConnectionState in two different functions, those proposals land in the same group.
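Grouping by name can be sketched with a dictionary of lists. This is an illustrative sketch, not Kong's code: proposals are plain dicts here, and the second address and the `RequestHeader` struct are made-up values for the example.

```python
from collections import defaultdict

def group_by_name(proposals):
    """Bucket proposals by the struct name the LLM gave them."""
    groups = defaultdict(list)
    for p in proposals:
        groups[p["name"]].append(p)
    return dict(groups)

proposals = [
    {"name": "ConnectionState", "source_function": 0x00401A30},
    {"name": "ConnectionState", "source_function": 0x00401B80},  # hypothetical address
    {"name": "RequestHeader", "source_function": 0x00402000},    # hypothetical struct
]
groups = group_by_name(proposals)
print({name: len(members) for name, members in groups.items()})
```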

Merging Fields

Within a group, fields are merged by offset. When multiple proposals define a field at the same offset, the merge picks the best candidate using a scoring system:
  • Type specificity wins. A field typed char * beats one typed undefined8. The generic types undefined, undefined4, and undefined8 score lowest.
  • Name descriptiveness wins. A field named socket_fd beats one named field_0x0. Generic names like field, unk, undefined, and pad score lowest.
This means that even if only one out of five proposals has a descriptive name for a given field, that name is preserved in the merged result.
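The scoring rules above can be sketched as follows. The lists of generic types and names come from the text, but the exact weights, the way the two scores combine (a simple sum here), the tie-breaking rule (first seen wins), and the regular expression for spotting generic names are all assumptions made for illustration.

```python
import re

GENERIC_TYPES = {"undefined", "undefined4", "undefined8"}
# Matches placeholders such as "field_0x0", "unk", "undefined8", "pad1".
GENERIC_NAME = re.compile(r"(field|unk|undefined|pad)_?(0x)?[0-9a-f]*")

def score(f):
    """Higher is better: one point for a specific type, one for a descriptive name."""
    type_score = 0 if f["type"] in GENERIC_TYPES else 1
    name_score = 0 if GENERIC_NAME.fullmatch(f["name"]) else 1
    return type_score + name_score

def merge_fields(field_lists):
    """Keep the best-scoring candidate per offset; ties keep the first one seen."""
    best = {}
    for fields in field_lists:
        for f in fields:
            current = best.get(f["offset"])
            if current is None or score(f) > score(current):
                best[f["offset"]] = f
    return [best[off] for off in sorted(best)]

proposal_a = [
    {"offset": 0x0, "name": "socket_fd", "type": "int", "size": 4},
    {"offset": 0x8, "name": "field_0x8", "type": "undefined8", "size": 8},
]
proposal_b = [
    {"offset": 0x0, "name": "field_0x0", "type": "undefined4", "size": 4},
    {"offset": 0x8, "name": "read_buffer", "type": "char *", "size": 8},
    {"offset": 0x10, "name": "buffer_size", "type": "size_t", "size": 8},
]
merged = merge_fields([proposal_a, proposal_b])
print([(hex(f["offset"]), f["name"], f["type"]) for f in merged])
```

In this example each proposal contributes the field it described best, and the field that only one proposal saw (offset 0x10) is carried over unchanged.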

Name Selection

The struct name that appears most frequently across proposals wins. If three proposals call it ConnectionState and one calls it ConnState, the merged struct is named ConnectionState.
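A majority vote like this is a one-liner with `collections.Counter`. The sketch below takes the candidate names as a plain list; the function name `pick_name` is hypothetical, and breaking ties in favor of the first name seen is an assumption.

```python
from collections import Counter

def pick_name(names):
    """Return the most frequent name; ties go to the name seen first."""
    return Counter(names).most_common(1)[0][0]

winner = pick_name(["ConnectionState", "ConnState", "ConnectionState", "ConnectionState"])
print(winner)
```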

Size Calculation

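The total struct size is the larger of two values: the largest total_size across all proposals, and the end offset (offset plus size) of the last merged field. Taking the maximum guarantees the unified struct is large enough to hold every merged field, so none is lost to truncation.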
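As a rough sketch of that rule, with a hypothetical `unified_size` helper and fields represented as plain dicts:

```python
def unified_size(proposed_sizes, merged_fields):
    """Max of the proposed sizes and the end offset of the last merged field."""
    last = max(merged_fields, key=lambda f: f["offset"])
    end_of_last = last["offset"] + last["size"]
    return max(max(proposed_sizes), end_of_last)

merged_fields = [
    {"offset": 0x0, "size": 4},
    {"offset": 0x4, "size": 4},
    {"offset": 0x8, "size": 8},
    {"offset": 0x10, "size": 8},
]
# Both proposals guessed 16 bytes, but a merged field ends at 0x18 (24).
size_from_fields = unified_size([16, 16], merged_fields)
# Here a proposal claimed a larger size than the known fields require.
size_from_proposal = unified_size([24, 32], merged_fields)
print(size_from_fields, size_from_proposal)
```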

Phase 3: Ghidra Type Creation

Once structs are unified, Kong creates them in Ghidra’s type system using create_struct. Each UnifiedStruct becomes a real Ghidra data type that Ghidra’s decompiler can use to improve its output.
Kong also applies struct types to function parameters. If a proposal was tagged with used_by_param: "param_1" from function 0x00401a30, Kong resolves which parameter ordinal param_1 corresponds to and sets its type to a pointer to the new struct.
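How Kong resolves the ordinal is not described here. One plausible approach, sketched below, relies on the assumption that default decompiler parameter names follow the pattern param_N with N starting at 1, so param_1 maps to ordinal 0. The `param_ordinal` helper is hypothetical.

```python
import re

def param_ordinal(param_name):
    """Map a default name like "param_1" to a zero-based ordinal, else None."""
    m = re.fullmatch(r"param_(\d+)", param_name)
    if m is None or int(m.group(1)) < 1:
        return None  # renamed or malformed parameter: needs another lookup
    return int(m.group(1)) - 1

print(param_ordinal("param_1"), param_ordinal("param_3"), param_ordinal("conn"))
```

A parameter that has already been renamed (for example to conn) cannot be resolved this way and would need a lookup by name against the function's actual parameter list.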

Error Handling

Type creation in Ghidra can fail — name collisions, invalid sizes, and other edge cases. Kong handles these gracefully: if a struct fails to create, it logs a warning and continues with the remaining structs. If a parameter type application fails, it logs and moves on. No single type failure blocks the rest of the pipeline.
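The log-and-continue behavior can be sketched as a loop that isolates each struct in its own try block. Everything here is illustrative: `create_all` and `fake_create_struct` are hypothetical names, and the fake creator only simulates an invalid-size failure rather than calling Ghidra.

```python
import logging

logger = logging.getLogger("type_recovery_sketch")

def create_all(structs, create_struct):
    """Try to create every struct; a failure is logged and skipped."""
    created, failed = [], []
    for s in structs:
        try:
            create_struct(s)
        except Exception as exc:
            logger.warning("could not create struct %s: %s", s["name"], exc)
            failed.append(s["name"])
            continue
        created.append(s["name"])
    return created, failed

def fake_create_struct(s):
    # Stand-in for the real creation call; rejects non-positive sizes.
    if s["total_size"] <= 0:
        raise ValueError("invalid size")

created, failed = create_all(
    [
        {"name": "ConnectionState", "total_size": 24},
        {"name": "BrokenStruct", "total_size": 0},      # hypothetical bad proposal
        {"name": "RequestHeader", "total_size": 16},    # hypothetical struct
    ],
    fake_create_struct,
)
print(created, failed)
```

The struct after the failing one is still created, which is the property the pipeline depends on.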

What Re-Analysis Gets You

The apply_unified_structs function returns a list of function addresses whose parameters were retyped. These functions are candidates for re-analysis: with the struct types applied, Ghidra’s decompiler produces cleaner output with named field accesses instead of raw offsets, which in turn gives the LLM better input for a second pass.
Before type application:
void FUN_00401a30(long param_1) {
    if (*(int *)(param_1 + 0x4) & 0x1) {
        send(*(int *)param_1, *(char **)(param_1 + 0x8), 0x400, 0);
    }
}
After type application:
void handle_connection(ConnectionState *conn) {
    if (conn->flags & 0x1) {
        send(conn->socket_fd, conn->read_buffer, 0x400, 0);
    }
}
Related pages:
  • Semantic Synthesis: the global unification pass that can synthesize additional structs from cross-function patterns
  • Context Windows: how per-function context is built before LLM analysis
  • Pipeline Overview: where type recovery fits in the overall analysis pipeline
Last modified on March 20, 2026