The idea
Before spending LLM tokens on a function, Kong checks whether it matches a known pattern. Standard library functions like malloc, strcmp, and printf have well-documented signatures. Cryptographic functions like AES, SHA-256, and MD5 follow recognizable patterns. If Kong can identify a function from its signature databases alone, it skips the LLM entirely for that function.
Signature databases
Kong ships with two signature databases:
C Standard Library (stdlib.json)
Contains signatures for common C stdlib functions — memory management (malloc, free, calloc, realloc), string operations (strlen, strcpy, strcmp, strcat), I/O (printf, fprintf, fopen, fread), and more.
Each entry includes the function name, common aliases, and the expected signature:
{
  "name": "malloc",
  "aliases": ["__libc_malloc", "_malloc"],
  "signature": "void * malloc(size_t size)",
  "classification": "memory"
}
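To make the entry format concrete, here is a minimal sketch of how such entries could be loaded into a lookup table keyed by name and alias. The top-level list layout and the `build_index` helper are assumptions for illustration, not Kong's actual loader.

```python
import json

# Hypothetical inline copy of a stdlib.json fragment. Assumes the file is a
# top-level JSON list of entries shaped like the one shown above.
STDLIB_JSON = """
[
  {
    "name": "malloc",
    "aliases": ["__libc_malloc", "_malloc"],
    "signature": "void * malloc(size_t size)",
    "classification": "memory"
  }
]
"""


def build_index(entries):
    """Map the canonical name and every alias to its database entry."""
    index = {}
    for entry in entries:
        for key in [entry["name"], *entry.get("aliases", [])]:
            index[key] = entry
    return index


index = build_index(json.loads(STDLIB_JSON))
print(index["__libc_malloc"]["signature"])
```

Indexing aliases alongside the canonical name keeps each lookup to a single dictionary access, whichever spelling appears in the binary.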
Cryptographic Functions (crypto.json)
Contains signatures for cryptographic primitives — AES (aes_encrypt, AES_set_encrypt_key), SHA-256 (sha256_init, SHA256_Update), MD5, RC4, ChaCha20, and others.
How matching works
During the triage phase, Kong runs every function through the signature matcher:
- Name normalization: Strip leading underscores and convert to lowercase, for example __libc_malloc → malloc.
- Database lookup: Check the normalized name against all entries in both databases, including aliases.
- Import detection: Functions flagged as imports by Ghidra are checked against the databases.
- Mark as resolved: Matched functions take their name, signature, and classification from the database and skip LLM analysis entirely.
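The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not Kong's implementation: the `DATABASES` contents, the `Match` type, and the `normalize`, `match_signature`, and `triage` helpers are all hypothetical, and the Ghidra import-detection step is left out. In this sketch the alias table is what connects `__libc_malloc` to the `malloc` entry, because aliases are normalized the same way as the names being looked up.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical in-memory stand-ins for stdlib.json and crypto.json.
DATABASES = {
    "stdlib.json": [
        {"name": "malloc", "aliases": ["__libc_malloc", "_malloc"],
         "signature": "void * malloc(size_t size)", "classification": "memory"},
    ],
    "crypto.json": [
        {"name": "SHA256_Init", "aliases": [],
         "signature": "void SHA256_Init(SHA256_CTX *ctx)", "classification": "crypto"},
    ],
}


@dataclass
class Match:
    name: str
    signature: str
    classification: str
    source: str


def normalize(name: str) -> str:
    """Strip leading underscores and convert to lowercase."""
    return name.lstrip("_").lower()


def build_index(databases):
    """Index every entry by its normalized name and normalized aliases."""
    index = {}
    for source, entries in databases.items():
        for entry in entries:
            match = Match(entry["name"], entry["signature"],
                          entry["classification"], source)
            for key in [entry["name"], *entry["aliases"]]:
                index[normalize(key)] = match
    return index


INDEX = build_index(DATABASES)


def match_signature(name: str) -> Optional[Match]:
    """Return the database match for a function name, or None."""
    return INDEX.get(normalize(name))


def triage(function_names):
    """Split functions into resolved matches and the remaining LLM queue."""
    resolved, llm_queue = {}, []
    for name in function_names:
        match = match_signature(name)
        if match is not None:
            resolved[name] = match
        else:
            llm_queue.append(name)
    return resolved, llm_queue


resolved, llm_queue = triage(["_SHA256_Init", "__libc_malloc", "FUN_00402000"])
for original, match in resolved.items():
    print(f"{original} -> {match.signature} [{match.classification}, {match.source}]")
print("LLM queue:", llm_queue)
```

Only the unnamed function `FUN_00402000` ends up in the LLM queue here; the other two are resolved from the databases.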
Example
A function named _SHA256_Init in the binary:
- Normalize: sha256_init
- Match found in crypto.json → classified as crypto
- Signature applied: void SHA256_Init(SHA256_CTX *ctx)
- Function skips the LLM queue, so no tokens are spent
Why this matters
Signature matching saves both time and money. In a binary with 500 functions, 50-100 might be recognizable standard library or crypto functions. At ~$0.02 per function (Claude Opus), that's $1-2 saved per analysis. More importantly, it's 50-100 fewer functions in the LLM queue, which means faster overall analysis.
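The savings estimate is simple multiplication, shown here as a small worked calculation. The per-function cost is the approximate figure quoted above; the `estimated_savings` helper is hypothetical and only illustrates the arithmetic.

```python
# Approximate per-function LLM cost quoted in the text (an estimate, not a price list).
COST_PER_FUNCTION_USD = 0.02


def estimated_savings(matched_functions: int,
                      cost_per_function: float = COST_PER_FUNCTION_USD) -> float:
    """Estimated dollars saved when matched functions skip the LLM."""
    return matched_functions * cost_per_function


for matched in (50, 100):
    print(f"{matched} matched functions -> ~${estimated_savings(matched):.2f} saved")
```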
The matched signatures also improve downstream analysis. When a caller invokes malloc, the LLM sees malloc(size) in the callee context instead of FUN_00401234(param_1), which helps it understand what the caller is doing.
Current limitations
Signature matching is name-based only: it does not compare the function's decompiled body against known implementations. A function that implements strlen but has an unrelated name (like FUN_00402000) won't be matched. That's what the LLM is for.
Last modified on March 20, 2026