The idea

Before spending LLM tokens on a function, Kong checks whether it matches a known pattern. Standard library functions like malloc, strcmp, and printf have well-documented signatures. Cryptographic functions like AES, SHA-256, and MD5 follow recognizable patterns. If Kong can identify these by signature alone, it skips the LLM entirely.

Signature databases

Kong ships with two signature databases:

C Standard Library (stdlib.json)

Contains signatures for common C stdlib functions — memory management (malloc, free, calloc, realloc), string operations (strlen, strcpy, strcmp, strcat), I/O (printf, fprintf, fopen, fread), and more. Each entry includes the function name, common aliases, and the expected signature:
{
  "name": "malloc",
  "aliases": ["__libc_malloc", "_malloc"],
  "signature": "void * malloc(size_t size)",
  "classification": "memory"
}
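
As a rough illustration of how an entry like this might be consumed, the sketch below parses it into a typed record. It assumes that a database file is a JSON array of such entries; the `SignatureEntry` class and the `load_entries` helper are hypothetical names, not part of Kong's documented API.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SignatureEntry:
    """One record from a signature database (hypothetical type)."""
    name: str
    aliases: tuple[str, ...]
    signature: str
    classification: str


def load_entries(text: str) -> list[SignatureEntry]:
    """Parse a JSON array of entries; assumes the file layout is a plain list."""
    return [
        SignatureEntry(
            name=raw["name"],
            aliases=tuple(raw.get("aliases", [])),
            signature=raw["signature"],
            classification=raw["classification"],
        )
        for raw in json.loads(text)
    ]


# Inline sample mirroring the malloc entry shown above.
SAMPLE = """[
  {
    "name": "malloc",
    "aliases": ["__libc_malloc", "_malloc"],
    "signature": "void * malloc(size_t size)",
    "classification": "memory"
  }
]"""

entries = load_entries(SAMPLE)
print(entries[0].name, entries[0].classification)
```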

Cryptographic Functions (crypto.json)

Contains signatures for cryptographic primitives — AES (aes_encrypt, AES_set_encrypt_key), SHA-256 (sha256_init, SHA256_Update), MD5, RC4, ChaCha20, and others.

How matching works

During the triage phase, Kong runs every function through the signature matcher:
  1. Name normalization: Strip leading underscores and convert to lowercase. __libc_malloc → malloc
  2. Database lookup: Check the normalized name against all entries in both databases, including aliases
  3. Import detection: Functions flagged as imports by Ghidra are checked against the databases
  4. Mark as resolved: Matched functions get their name, signature, and classification from the database and skip LLM analysis entirely
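
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not Kong's actual implementation: the function names (`normalize`, `build_index`, `match`), the in-memory `DATABASES` stand-in, and the shape of the crypto entry are assumptions. Aliases are normalized the same way as incoming names, so `__libc_malloc` resolves to the `malloc` entry through its alias. Import detection (step 3) depends on Ghidra and is not modeled here.

```python
from typing import Optional

# Hypothetical in-memory stand-ins for stdlib.json and crypto.json.
# The crypto entry assumes the same shape as the stdlib entries.
DATABASES = {
    "stdlib.json": [
        {"name": "malloc", "aliases": ["__libc_malloc", "_malloc"],
         "signature": "void * malloc(size_t size)", "classification": "memory"},
    ],
    "crypto.json": [
        {"name": "SHA256_Init", "aliases": [],
         "signature": "void SHA256_Init(SHA256_CTX *ctx)", "classification": "crypto"},
    ],
}


def normalize(name: str) -> str:
    """Step 1: strip leading underscores and convert to lowercase."""
    return name.lstrip("_").lower()


def build_index(databases: dict) -> dict:
    """Map every normalized name and alias to (database file, entry)."""
    index = {}
    for source, entries in databases.items():
        for entry in entries:
            for key in [entry["name"], *entry["aliases"]]:
                # setdefault keeps the first entry seen for a given key.
                index.setdefault(normalize(key), (source, entry))
    return index


def match(name: str, index: dict) -> Optional[dict]:
    """Steps 2 and 4: look up the name and, on a hit, mark it resolved."""
    hit = index.get(normalize(name))
    if hit is None:
        return None  # no match: the function stays in the LLM queue
    source, entry = hit
    return {
        "name": entry["name"],
        "signature": entry["signature"],
        "classification": entry["classification"],
        "source": source,
        "needs_llm": False,
    }


index = build_index(DATABASES)
print(match("_SHA256_Init", index))
print(match("FUN_00401234", index))
```

A dictionary keyed by normalized name keeps each lookup constant-time, which matters when every function in the binary passes through the matcher.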

Example

A function named _SHA256_Init in the binary:
  1. Normalize: sha256_init
  2. Match found in crypto.json → classified as crypto
  3. Signature applied: void SHA256_Init(SHA256_CTX *ctx)
  4. Function skips the LLM queue — no tokens spent

Why this matters

Signature matching saves both time and money. In a binary with 500 functions, 50-100 might be recognizable standard library or crypto functions. At ~$0.02 per function (Claude Opus), that's $1-2 saved per analysis. More importantly, it’s 50-100 fewer functions in the LLM queue, which means faster overall analysis.

The matched signatures also improve downstream analysis. When a caller invokes malloc, the LLM sees malloc(size) in the callee context instead of FUN_00401234(param_1), which helps it understand what the caller is doing.
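
The savings estimate is simple arithmetic, reproduced below for the figures quoted in this section. The per-function cost of roughly $0.02 is the approximate number given above, not a measured value, and the helper names are hypothetical.

```python
COST_PER_FUNCTION_USD = 0.02  # approximate figure quoted above
TOTAL_FUNCTIONS = 500         # example binary size from the text


def estimated_savings(matched: int, cost: float = COST_PER_FUNCTION_USD) -> float:
    """Dollars not spent on LLM calls for functions resolved by signature."""
    return matched * cost


def queue_reduction(matched: int, total: int = TOTAL_FUNCTIONS) -> float:
    """Fraction of the binary's functions that never enter the LLM queue."""
    return matched / total


for matched in (50, 100):
    print(
        f"{matched} matched: ${estimated_savings(matched):.2f} saved, "
        f"{queue_reduction(matched):.0%} of functions skipped"
    )
```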

Current limitations

Signature matching is name-based only — it doesn’t do structural matching against the function’s decompilation. A function that implements strlen but has a completely different name (like FUN_00402000) won’t be matched. That’s what the LLM is for.

Last modified on March 20, 2026