Signature Matching

The idea

Before spending LLM tokens on a function, Kong checks whether it matches a known pattern. Standard library functions like malloc, strcmp, and printf have well-documented signatures. Cryptographic functions like AES, SHA-256, and MD5 follow recognizable patterns. If Kong can identify these by signature alone, it skips the LLM entirely.

Signature databases

Kong ships with two signature databases:

C Standard Library (`stdlib.json`)

Contains signatures for common C stdlib functions — memory management (malloc, free, calloc, realloc), string operations (strlen, strcpy, strcmp, strcat), I/O (printf, fprintf, fopen, fread), and more. Each entry includes the function name, common aliases, and the expected signature:

{
  "name": "malloc",
  "aliases": ["__libc_malloc", "_malloc"],
  "signature": "void * malloc(size_t size)",
  "classification": "memory"
}
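As a rough illustration of how entries like this could be consumed, the sketch below builds a lookup index keyed by each entry's name and aliases. The `build_index` helper and the assumption that the file holds a JSON array of entries are hypothetical; only the entry fields come from the example above, and Kong's actual loader may differ.

```python
import json

# Hypothetical in-memory copy of one stdlib.json entry (fields as shown above).
# Assumption: the database file is a JSON array of such entries.
STDLIB_JSON = """
[
  {
    "name": "malloc",
    "aliases": ["__libc_malloc", "_malloc"],
    "signature": "void * malloc(size_t size)",
    "classification": "memory"
  }
]
"""

def build_index(entries):
    """Map the canonical name and every alias to its database entry."""
    index = {}
    for entry in entries:
        for key in [entry["name"], *entry.get("aliases", [])]:
            index[key] = entry
    return index

index = build_index(json.loads(STDLIB_JSON))
print(index["__libc_malloc"]["signature"])
```

Indexing aliases up front means a lookup by any known spelling is a single dictionary access, at the cost of storing each entry under several keys.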

Cryptographic Functions (`crypto.json`)

Contains signatures for cryptographic primitives — AES (aes_encrypt, AES_set_encrypt_key), SHA-256 (sha256_init, SHA256_Update), MD5, RC4, ChaCha20, and others.

How matching works

During the triage phase, Kong runs every function through the signature matcher:

1. Name normalization: Strip leading underscores and convert to lowercase, e.g. _SHA256_Init → sha256_init.
2. Database lookup: Check the normalized name against every entry in both databases, including aliases.
3. Import detection: Functions that Ghidra flags as imports are checked against the databases as well.
4. Mark as resolved: Matched functions take their name, signature, and classification from the database and skip LLM analysis entirely.
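The steps above can be sketched in a few lines of Python. This is a minimal illustration, not Kong's implementation: `normalize`, `match_function`, the `Match` type, and the trimmed-down `ENTRIES` list are all hypothetical, and the sketch assumes database names and aliases are normalized the same way as the names found in the binary.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, trimmed-down entries; the real ones live in stdlib.json
# and crypto.json. The SHA256_Init alias list is an assumption.
ENTRIES = [
    {"name": "malloc", "aliases": ["__libc_malloc", "_malloc"],
     "signature": "void * malloc(size_t size)", "classification": "memory"},
    {"name": "SHA256_Init", "aliases": ["sha256_init"],
     "signature": "void SHA256_Init(SHA256_CTX *ctx)", "classification": "crypto"},
]

def normalize(name: str) -> str:
    """Strip leading underscores and lowercase the name."""
    return name.lstrip("_").lower()

# Assumption: names and aliases are indexed under their normalized form,
# so "__libc_malloc" is stored as "libc_malloc".
INDEX = {
    normalize(key): entry
    for entry in ENTRIES
    for key in [entry["name"], *entry["aliases"]]
}

@dataclass
class Match:
    name: str
    signature: str
    classification: str

def match_function(raw_name: str) -> Optional[Match]:
    """Return database metadata for a known name, or None to defer to the LLM."""
    entry = INDEX.get(normalize(raw_name))
    if entry is None:
        return None
    return Match(entry["name"], entry["signature"], entry["classification"])

for raw in ["_SHA256_Init", "__libc_malloc", "FUN_00402000"]:
    print(raw, "->", match_function(raw))
```

In this sketch, a `None` result is what sends a function such as `FUN_00402000` on to the LLM queue, since nothing about its name appears in either database.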

Example

A function named _SHA256_Init in the binary:

1. Normalize: sha256_init
2. Match found in crypto.json → classified as crypto
3. Signature applied: void SHA256_Init(SHA256_CTX *ctx)
4. Function skips the LLM queue, so no tokens are spent

Why this matters

Signature matching saves both time and money. In a binary with 500 functions, 50-100 might be recognizable standard library or crypto functions. At ~$0.02 per function (Claude Opus), that's $1-2 saved per analysis. More importantly, it's 50-100 fewer functions in the LLM queue, which means faster overall analysis.

The matched signatures also improve downstream analysis. When a caller invokes malloc, the LLM sees malloc(size) in the callee context instead of FUN_00401234(param_1), which helps it understand what the caller is doing.
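The savings estimate is simple arithmetic and can be reproduced with a few lines of Python. The helper name is hypothetical, and the sketch assumes the per-function figure quoted above is an approximate cost in US dollars.

```python
# Figures taken from the paragraph above; the helper name is hypothetical.
COST_PER_FUNCTION = 0.02  # approximate LLM cost per analyzed function (assumed USD)

def estimated_savings(matched_functions: int,
                      cost_per_function: float = COST_PER_FUNCTION) -> float:
    """Cost avoided by resolving functions from the signature databases."""
    return matched_functions * cost_per_function

# 50-100 matched functions out of a 500-function binary.
low, high = estimated_savings(50), estimated_savings(100)
print(f"Estimated savings per analysis: {low:.2f} to {high:.2f}")
```

The real figure depends on function size and model pricing, so this is best read as an order-of-magnitude estimate.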

Current limitations

Signature matching is name-based only — it doesn’t do structural matching against the function’s decompilation. A function that implements strlen but has a completely different name (like FUN_00402000) won’t be matched. That’s what the LLM is for.
