Background
In March 2024, a Microsoft engineer noticed SSH logins taking 500 milliseconds longer than usual. That observation unraveled one of the most sophisticated supply chain attacks in open-source history.
Supply chain attack — an attack that compromises software by targeting its dependencies or build process, rather than the software itself. The attacker poisons a library that many other projects rely on.
A malicious maintainer had spent two years building trust in the XZ Utils project. Its liblzma compression library reaches OpenSSH's sshd indirectly on several major Linux distributions: their systemd integration patches link sshd against libsystemd, which in turn links liblzma. The backdoored liblzma shipped in XZ Utils 5.6.0 and 5.6.1 hijacked OpenSSH's RSA signature verification to execute arbitrary commands on targeted systems.
The malicious functions were hand-written to blend in with liblzma’s legitimate compression code. No obvious symbols, no suspicious strings, no telltale exports. This is the hardest class of reverse engineering problem: a real-world implant buried inside a large, legitimate codebase.
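A common first triage step on an unknown binary is a strings-style scan for readable text such as function names, paths, or shell commands. The minimal Python sketch below mimics the default behavior of the Unix strings tool, assuming ASCII runs of at least four printable bytes. The function name printable_strings is hypothetical. An implant that embeds no suspicious strings gives a scan like this nothing to find.

```python
import re


def printable_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII (space through tilde) of at least min_len bytes."""
    # Build the byte pattern [\x20-\x7e]{min_len,} for the requested minimum length.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


if __name__ == "__main__":
    # Synthetic bytes for illustration; not taken from the real liblzma binary.
    blob = b"\x7fELF\x02\x01\x00lzma_code\x00\x01ab\x00RSA_public_decrypt\xff"
    for s in printable_strings(blob):
        print(s)
```

Short runs such as "ELF" and "ab" fall below the four-byte threshold and are skipped, which is also why short or computed identifiers slip past this kind of scan.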
The challenge
Analyzing the stripped liblzma.so binary means:
- No symbols: every function shows up as FUN_XXXXXXXX
- 558 functions enumerated, 396 queued for analysis
- Malicious code is ~1% of the binary — a needle in a haystack
- Deliberately camouflaged — the backdoor mimics legitimate LZMA patterns
A skilled reverse engineer would typically spend days working through this binary manually. Finding the implant through static analysis — without symbols, without source, without knowing what to look for — is the kind of task that defines expert-level RE work.
Kong’s results
Kong analyzed the binary in 15 minutes for $6.63.
| Metric | Value |
|---|---|
| Functions analyzed | 355 / 396 |
| High confidence (80%+) | 308 (87%) |
| Medium confidence (60-79%) | 33 (9%) |
| Low confidence (under 60%) | 14 (4%) |
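The three confidence bands add up to the 355 analyzed functions, and the percentages are each band's share of that total. The Python sketch below shows the bucketing under one assumption: each function carries an integer confidence score from 0 to 100. That representation, along with the names band and summarize, is hypothetical, because the post does not say how Kong stores scores.

```python
from collections import Counter


def band(score: int) -> str:
    """Map an integer confidence score (0-100) to the table's three bands."""
    if score >= 80:
        return "high"
    if score >= 60:
        return "medium"
    return "low"


def summarize(scores: list[int]) -> dict[str, tuple[int, int]]:
    """Return {band: (count, percent of all scored functions, rounded)}."""
    counts = Counter(band(s) for s in scores)
    total = len(scores)
    return {b: (counts[b], round(100 * counts[b] / total))
            for b in ("high", "medium", "low")}


if __name__ == "__main__":
    # Synthetic scores that reproduce the table's counts; not Kong's real output.
    scores = [90] * 308 + [70] * 33 + [50] * 14
    print(len(scores), summarize(scores))
```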
Kill chain — fully reconstructed
Kong independently identified all five core backdoor functions and reconstructed the attack chain with no prior knowledge of CVE-2024-3094:
| Function | Confidence | Role |
|---|---|---|
| init_rsa_public_decrypt | 95% | Parses ELF dynamic symbols at load time to locate RSA_public_decrypt |
| function_hook_replace | 90% | Overwrites the GOT entry: changes memory protection, swaps the pointer, restores permissions |
| rsa_public_decrypt_wrapper | 95% | The hook: intercepts RSA verification, checks for root and a magic value, decrypts the payload |
| initialize_cipher_context | 92% | Sets up ChaCha20 state with a 256-bit key and 96-bit nonce |
| chacha20_encrypt | 95% | Decrypts shellcode embedded in RSA signature data |
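A 256-bit key with a 96-bit nonce is the IETF ChaCha20 variant standardized in RFC 8439. For readers unfamiliar with the cipher, here is a minimal pure-Python sketch of that standard. It is not code recovered from the backdoor, and it does not model the implant's key handling. The helper names (quarter_round, chacha20_block, chacha20_xor) are illustrative.

```python
import struct

MASK32 = 0xFFFFFFFF


def _rotl32(v: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((v << n) | (v >> (32 - n))) & MASK32


def quarter_round(a: int, b: int, c: int, d: int) -> tuple[int, int, int, int]:
    """ChaCha quarter round on four 32-bit words (RFC 8439, section 2.1)."""
    a = (a + b) & MASK32
    d = _rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = _rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = _rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = _rotl32(b ^ c, 7)
    return a, b, c, d


def chacha20_block(key: bytes, counter: int, nonce: bytes) -> bytes:
    """Produce one 64-byte keystream block from a 32-byte key and 12-byte nonce."""
    if len(key) != 32 or len(nonce) != 12:
        raise ValueError("key must be 32 bytes and nonce 12 bytes")
    # State layout: "expand 32-byte k" constants, 8 key words, counter, 3 nonce words.
    state = [0x61707865, 0x3320646E, 0x79622D32, 0x6B206574]
    state += list(struct.unpack("<8I", key))
    state.append(counter & MASK32)
    state += list(struct.unpack("<3I", nonce))
    w = state[:]
    for _ in range(10):  # 20 rounds = 10 double rounds (column, then diagonal)
        for i, j, k, l in ((0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15),
                           (0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14)):
            w[i], w[j], w[k], w[l] = quarter_round(w[i], w[j], w[k], w[l])
    # Add the original state back in and serialize little-endian.
    return struct.pack("<16I", *[(x + s) & MASK32 for x, s in zip(w, state)])


def chacha20_xor(key: bytes, nonce: bytes, data: bytes, counter: int = 1) -> bytes:
    """Encrypt or decrypt (the same operation) by XORing data with the keystream."""
    out = bytearray()
    for offset in range(0, len(data), 64):
        block = chacha20_block(key, counter + offset // 64, nonce)
        out += bytes(x ^ y for x, y in zip(data[offset:offset + 64], block))
    return bytes(out)


if __name__ == "__main__":
    key, nonce = bytes(range(32)), bytes(12)
    ct = chacha20_xor(key, nonce, b"payload stand-in")
    print(ct.hex(), chacha20_xor(key, nonce, ct))
```

Because ChaCha20 is a stream cipher, the same XOR routine both encrypts and decrypts. That is why one function, whatever label a decompiler or an analyst gives it, can serve as the decryption step.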
GOT hijacking — the Global Offset Table (GOT) stores addresses of dynamically linked functions. By overwriting a GOT entry, an attacker can redirect calls to a legitimate function (like RSA_public_decrypt) through their own code first.
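The mechanism can be modeled without touching real memory. The Python sketch below uses a dictionary of callables as a stand-in GOT and a writable flag as a stand-in for page protection. ToyGOT, install_hook, and call_verify are hypothetical names. A real implant patches raw pointers in the process image, typically after changing page protection with a call such as mprotect, and this toy model deliberately does neither.

```python
from typing import Callable


class ToyGOT:
    """Stand-in for a Global Offset Table: symbol name -> callable.

    The writable flag models page protection: entries can only be changed
    while it is True, as after an mprotect call on real memory.
    """

    def __init__(self, entries: dict[str, Callable]):
        self._entries = dict(entries)
        self.writable = False

    def resolve(self, name: str) -> Callable:
        return self._entries[name]

    def patch(self, name: str, target: Callable) -> None:
        if not self.writable:
            raise PermissionError(f"GOT page is read-only; cannot patch {name}")
        self._entries[name] = target


def install_hook(got: ToyGOT, name: str,
                 make_hook: Callable[[Callable], Callable]) -> Callable:
    """Swap a GOT entry for a hook that can still call the original.

    Mirrors the three steps in the table: unprotect, swap the pointer,
    reprotect. Returns the original function so it could be restored later.
    """
    original = got.resolve(name)
    got.writable = True                          # 1. change memory protection
    try:
        got.patch(name, make_hook(original))     # 2. swap the pointer
    finally:
        got.writable = False                     # 3. restore permissions
    return original


def call_verify(got: ToyGOT, data: str) -> str:
    # Host code always calls through the GOT, so it reaches whatever the
    # entry currently points to without noticing the swap.
    return got.resolve("RSA_public_decrypt")(data)


if __name__ == "__main__":
    # The "RSA_public_decrypt" entry here is a fake lambda, not the OpenSSL function.
    got = ToyGOT({"RSA_public_decrypt": lambda data: "verified:" + data})
    print(call_verify(got, "sig"))
    install_hook(got, "RSA_public_decrypt",
                 lambda orig: (lambda data: "hooked->" + orig(data)))
    print(call_verify(got, "sig"))
```

The caller never changes. Only the table entry does, which is what makes this kind of hook hard to spot from the calling code alone.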
Kong’s analysis of the hook function:
“XZ backdoor: intercepts RSA_public_decrypt. When running as root and magic matches, decrypts and executes shellcode via ChaCha20.”
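The decision the hook makes fits in a few lines. Everything below is an illustrative assumption: the MAGIC constant, the argument shapes, and the use of os.geteuid() as the root check are placeholders, and decryption and execution are passed in as functions rather than reproduced. The point is the control flow. The hook stays invisible unless both conditions hold, and otherwise it forwards to the genuine verification routine.

```python
import os
from typing import Callable

# Hypothetical 4-byte trigger; the real backdoor's check is not reproduced here.
MAGIC = b"\xde\xad\xbe\xef"


def make_rsa_hook(
    original: Callable[[bytes], bytes],
    decrypt: Callable[[bytes], bytes],
    execute: Callable[[bytes], None],
    euid: Callable[[], int] = lambda: os.geteuid(),
) -> Callable[[bytes], bytes]:
    """Build a wrapper around original that diverts only for root plus magic."""

    def hook(signature: bytes) -> bytes:
        # Trigger path: both conditions must hold, otherwise stay invisible.
        if euid() == 0 and signature.startswith(MAGIC):
            payload = decrypt(signature[len(MAGIC):])
            execute(payload)
            return b""  # toy model: the real implant's return value is not modeled
        # Normal path: behave exactly like the genuine verification routine.
        return original(signature)

    return hook


if __name__ == "__main__":
    ran = []
    hook = make_rsa_hook(
        original=lambda sig: b"verified",
        decrypt=lambda blob: blob[::-1],  # placeholder for ChaCha20
        execute=ran.append,               # record instead of executing
        euid=lambda: 0,                   # pretend to be root
    )
    print(hook(b"ordinary signature"), ran)
    print(hook(MAGIC + b"di"), ran)
```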
Supporting infrastructure
Beyond the five core functions, Kong correctly identified the backdoor’s supporting infrastructure:
- ELF dynamic section parsing: reading the binary's own symbol table at runtime
- /proc/self/maps reads: checking memory permissions before modifying the GOT
- dladdr1-based symbol resolution: finding function addresses by name at runtime
All of these were correctly classified as part of the implant’s runtime hooking mechanism, not legitimate liblzma functionality.
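Each line of /proc/self/maps lists an address range, permission flags, offset, device, inode, and an optional path. The Python sketch below performs a permission check of the kind described above: it parses that text and looks up the flags covering a given address, such as a GOT slot. It takes the file contents as a string so it runs anywhere; on Linux you could pass open("/proc/self/maps").read(). The names parse_maps, perms_at, and is_writable are hypothetical, and the sample addresses are made up.

```python
from typing import NamedTuple, Optional


class Mapping(NamedTuple):
    start: int
    end: int    # exclusive
    perms: str  # e.g. "r--p", "rw-p", "r-xp"
    path: str


def parse_maps(text: str) -> list[Mapping]:
    """Parse maps lines of the form 'start-end perms offset dev inode [path]'."""
    mappings = []
    for line in text.splitlines():
        fields = line.split(maxsplit=5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start_s, end_s = fields[0].split("-")
        path = fields[5].strip() if len(fields) == 6 else ""
        mappings.append(Mapping(int(start_s, 16), int(end_s, 16), fields[1], path))
    return mappings


def perms_at(mappings: list[Mapping], addr: int) -> Optional[str]:
    """Return the permission string of the mapping containing addr, if any."""
    for m in mappings:
        if m.start <= addr < m.end:
            return m.perms
    return None


def is_writable(mappings: list[Mapping], addr: int) -> bool:
    """True if addr lies in a mapping whose second flag is 'w'."""
    perms = perms_at(mappings, addr)
    return perms is not None and perms[1] == "w"


# Made-up sample in the maps format; a read-only page such as a
# RELRO-protected GOT shows up as "r--p".
SAMPLE = """\
55d0c0a00000-55d0c0a21000 r-xp 00000000 08:01 131 /usr/sbin/sshd
55d0c0c21000-55d0c0c24000 r--p 00021000 08:01 131 /usr/sbin/sshd
55d0c0c24000-55d0c0c25000 rw-p 00024000 08:01 131 /usr/sbin/sshd
7ffd1a000000-7ffd1a021000 rw-p 00000000 00:00 0
"""

if __name__ == "__main__":
    maps = parse_maps(SAMPLE)
    print(perms_at(maps, 0x55D0C0C22000), is_writable(maps, 0x55D0C0C22000))
```

Before patching an entry that sits in a read-only page, code like this tells the caller it must first change protection. That is the step the GOT-overwrite function in the table performs.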
Legitimate code
Kong also correctly recovered the full breadth of liblzma’s real functionality:
- LZMA/LZMA2 encoders and decoders
- Match finders and range coders
- Streaming state machines
- CRC32/CRC64 (including CLMUL-accelerated variants)
- SHA-256
- XZ container format handling
- Branch-call-jump filters for x86, ARM64, and RISC-V
Five functions were flagged for potential control flow flattening. Kong's reasoning correctly dismissed all five as false positives: they are legitimate resumable state machines, with 7 to 23 states each, inherent to liblzma's streaming API.
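A resumable decoder keeps an explicit state variable so it can stop when input runs out and resume on the next call. Once compiled, the resulting loop with a switch on the state looks much like the dispatcher of a flattened control flow graph, which is why such functions trip obfuscation heuristics. The toy Python sketch below assumes a made-up record format: one length byte followed by that many payload bytes. The RecordDecoder class is hypothetical and far simpler than liblzma's decoders.

```python
from enum import Enum, auto


class State(Enum):
    LENGTH = auto()   # waiting for the 1-byte length prefix
    PAYLOAD = auto()  # copying payload bytes
    DONE = auto()


class RecordDecoder:
    """Decode one length-prefixed record from input delivered in arbitrary chunks."""

    def __init__(self) -> None:
        self.state = State.LENGTH
        self.remaining = 0
        self.payload = bytearray()

    def feed(self, chunk: bytes) -> bool:
        """Consume a chunk; return True once the record is complete."""
        pos = 0
        # Dispatcher loop: each iteration jumps to the handler for the saved
        # state, the same shape a flattening detector looks for.
        while pos < len(chunk) and self.state is not State.DONE:
            if self.state is State.LENGTH:
                self.remaining = chunk[pos]
                pos += 1
                self.state = State.PAYLOAD if self.remaining else State.DONE
            elif self.state is State.PAYLOAD:
                take = min(self.remaining, len(chunk) - pos)
                self.payload += chunk[pos:pos + take]
                pos += take
                self.remaining -= take
                if self.remaining == 0:
                    self.state = State.DONE
        return self.state is State.DONE


if __name__ == "__main__":
    dec = RecordDecoder()
    for piece in (b"\x05he", b"llo"):  # input split mid-record
        done = dec.feed(piece)
    print(done, bytes(dec.payload))
```

Splitting the input at any byte boundary produces the same result, because all progress lives in the state fields rather than in the call stack.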
Why this matters
The XZ backdoor was discovered because a human noticed a timing anomaly, not because anyone found it in the binary. Locating the implant through static analysis of the stripped library is the kind of task that traditionally takes an experienced reverse engineer days of manual work.
Kong reconstructed the full kill chain autonomously in 15 minutes. This suggests a path toward automated triage of suspected supply chain compromises: point Kong at a suspicious binary and get a structured assessment of what it does, including code that shouldn’t be there.
Further reading