The idea
Before spending LLM tokens on a function, Kong checks whether it matches a known pattern. Standard library functions like malloc, strcmp, and printf have well-documented signatures. Cryptographic functions like AES, SHA-256, and MD5 follow recognizable patterns. If Kong can identify these by signature alone, it skips the LLM entirely.
Signature databases
Kong ships with two signature databases:
C Standard Library (stdlib.json)
Contains signatures for common C stdlib functions — memory management (malloc, free, calloc, realloc), string operations (strlen, strcpy, strcmp, strcat), I/O (printf, fprintf, fopen, fread), and more.
Each entry includes the function name, common aliases, and the expected signature:
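The exact schema of stdlib.json is not reproduced on this page, so the following is only a rough illustration. The field names (name, aliases, signature), the sample entry, and the SignatureEntry and parse_entries helpers are assumptions based on the description above, not Kong's actual format or code.

```python
import json
from dataclasses import dataclass, field

# Hypothetical entry layout; the real stdlib.json schema may differ.
SAMPLE_DB = """
[
  {
    "name": "malloc",
    "aliases": ["__libc_malloc"],
    "signature": "void *malloc(size_t size)"
  }
]
"""


@dataclass
class SignatureEntry:
    name: str
    signature: str
    aliases: list[str] = field(default_factory=list)


def parse_entries(text: str) -> list[SignatureEntry]:
    """Parse a JSON array of entries into SignatureEntry objects."""
    return [
        SignatureEntry(
            name=raw["name"],
            signature=raw["signature"],
            aliases=list(raw.get("aliases", [])),
        )
        for raw in json.loads(text)
    ]


entries = parse_entries(SAMPLE_DB)
print(entries[0].name, "->", entries[0].signature)
```

Keeping aliases as a list on each entry means a single record can cover every spelling of the same function.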
Cryptographic Functions (crypto.json)
Contains signatures for cryptographic primitives — AES (aes_encrypt, AES_set_encrypt_key), SHA-256 (sha256_init, SHA256_Update), MD5, RC4, ChaCha20, and others.
How matching works
During the triage phase, Kong runs every function through the signature matcher:
- Name normalization: Strip leading underscores and convert to lowercase. __libc_malloc → malloc
- Database lookup: Check the normalized name against all entries in both databases, including aliases
- Import detection: Functions flagged as imports by Ghidra are checked against the databases
- Mark as resolved: Matched functions get their name, signature, and classification from the database and skip LLM analysis entirely
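The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not Kong's actual implementation: the Function and Match types, the is_import flag, the in-memory DATABASES dict with its sample entries, and the "stdlib" classification label are all hypothetical. Under a literal reading of the normalization rule, __libc_malloc becomes libc_malloc, so this sketch resolves it to malloc through an alias.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical in-memory stand-ins for stdlib.json and crypto.json.
DATABASES = {
    "stdlib": [
        {"name": "malloc", "aliases": ["__libc_malloc"],
         "signature": "void *malloc(size_t size)"},
    ],
    "crypto": [
        {"name": "SHA256_Init", "aliases": ["sha256_init"],
         "signature": "void SHA256_Init(SHA256_CTX *ctx)"},
    ],
}


@dataclass
class Function:
    name: str
    is_import: bool = False  # assumed to mirror Ghidra's import flag


@dataclass
class Match:
    name: str
    signature: str
    classification: str
    skip_llm: bool = True  # resolved functions bypass the LLM queue


def normalize(name: str) -> str:
    """Strip leading underscores and convert to lowercase."""
    return name.lstrip("_").lower()


def build_index(databases: dict) -> dict:
    """Map every normalized name and alias to (entry, classification)."""
    index = {}
    for classification, entries in databases.items():
        for entry in entries:
            for key in [entry["name"], *entry.get("aliases", [])]:
                index[normalize(key)] = (entry, classification)
    return index


INDEX = build_index(DATABASES)


def match_signature(func: Function) -> Optional[Match]:
    """Return a Match for a known name; None leaves the function for the LLM.

    Imports and local functions go through the same lookup in this sketch.
    """
    hit = INDEX.get(normalize(func.name))
    if hit is None:
        return None
    entry, classification = hit
    return Match(entry["name"], entry["signature"], classification)


print(match_signature(Function("_SHA256_Init")))
print(match_signature(Function("FUN_00402000")))
```

Building the index once, keyed by normalized name, makes each lookup a single dictionary access regardless of how many aliases the databases contain.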
Example
A function named _SHA256_Init in the binary:
- Normalize: sha256_init
- Match found in crypto.json → classified as crypto
- Signature applied: void SHA256_Init(SHA256_CTX *ctx)
- Function skips the LLM queue, so no tokens are spent
Why this matters
Signature matching saves both time and money. In a binary with 500 functions, 50-100 might be recognizable standard library or crypto functions, which works out to roughly $1-2 saved per analysis. More importantly, it’s 50-100 fewer functions in the LLM queue, which means faster overall analysis.
The matched signatures also improve downstream analysis. When a caller invokes malloc, the LLM sees malloc(size) in the callee context instead of FUN_00401234(param_1), which helps it understand what the caller is doing.
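To make the downstream effect concrete, here is a small sketch of how resolved names might be substituted into the callee context shown to the LLM. The render_callee_context helper, the address-keyed dictionary, and the second callee are hypothetical; Kong's actual context format is not described on this page.

```python
def render_callee_context(callees, resolved):
    """Describe each callee, preferring a matched call form when one exists.

    callees:  list of (address, placeholder) pairs from the decompiler
    resolved: dict mapping address -> call form supplied by signature matching
    """
    return [resolved.get(address, placeholder) for address, placeholder in callees]


callees = [
    (0x00401234, "FUN_00401234(param_1)"),
    (0x00405678, "FUN_00405678(param_1, param_2)"),  # hypothetical, unmatched
]
resolved = {0x00401234: "malloc(size)"}  # filled in by signature matching

print(render_callee_context(callees, resolved))
```

Unmatched callees keep their placeholder names, so the caller's context degrades gracefully when only some callees are recognized.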
Current limitations
Signature matching is name-based only: it doesn’t do structural matching against the function’s decompilation. A function that implements strlen but has a completely different name (like FUN_00402000) won’t be matched. That’s what the LLM is for.
Further reading
- Pipeline Overview — signature matching runs during triage
- Call-Graph Analysis — matched signatures propagate context to callers

