SSTIninja
A constraint-aware Server-Side Template Injection exploitation tool, empirically benchmarked against 4 peer tools across 10 targets (8 vulnerable, 2 negative controls).
Abstract
SSTIninja treats template-injection exploitation as a search problem:
a Searcher walks the live Python object graph from a seed value to a
callable target (e.g. os.popen), producing an
abstract path; a Renderer then turns that path into a payload
string under arbitrary syntactic constraints (forbidden chars, banned
keywords, length budgets, hardened sandboxes). The two are orthogonal
components, allowing payload-shape obfuscation without re-doing search.
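The split between search and rendering can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not SSTIninja's actual code: the toy object graph, the function names (`neighbours`, `find_path`, `step_forms`, `render`), and the choice of Jinja2-style spellings for each step. The searcher is a plain breadth-first search that returns an abstract list of steps; the renderer then picks, per step, the first spelling that avoids every forbidden substring.

```python
from collections import deque
from types import SimpleNamespace


def neighbours(obj):
    """Yield (step, child) edges. A real searcher would cover many more edge kinds."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if isinstance(key, str):
                yield ("item", key), value
    elif isinstance(obj, SimpleNamespace):
        for name, value in vars(obj).items():
            yield ("attr", name), value


def find_path(seed, target, max_depth=6):
    """Breadth-first search from seed to target; returns a list of steps or None."""
    queue = deque([(seed, [])])
    seen = {id(seed)}
    while queue:
        obj, path = queue.popleft()
        if obj is target:
            return path
        if len(path) >= max_depth:
            continue
        for step, child in neighbours(obj):
            if id(child) not in seen:
                seen.add(id(child))
                queue.append((child, path + [step]))
    return None


def step_forms(kind, name):
    """Alternative (prefix, suffix) spellings of one step, in order of preference."""
    dot = ("", f".{name}")
    sub = ("", f"['{name}']")
    flt = ("(", f"|attr('{name}'))")  # parenthesised so later steps can follow it
    return [dot, flt, sub] if kind == "attr" else [sub, dot]


def render(seed_expr, path, forbidden=()):
    """Turn an abstract path into an expression that avoids all forbidden substrings."""
    expr = seed_expr
    for kind, name in path:
        for prefix, suffix in step_forms(kind, name):
            if not any(tok in prefix + suffix for tok in forbidden):
                expr = f"{prefix}{expr}{suffix}"
                break
        else:
            raise ValueError(f"no spelling of {kind} step {name!r} avoids {forbidden!r}")
    return expr


# Toy graph: `popen` is a stand-in object for a callable target such as os.popen.
popen = object()
seed = SimpleNamespace(
    config={"debug": False},
    helpers=SimpleNamespace(env={"os": SimpleNamespace(popen=popen)}),
)

path = find_path(seed, popen)
print(render("seed", path))                    # seed.helpers.env['os'].popen
print(render("seed", path, forbidden=(".",)))  # same path, no "." characters
```

The search runs once; changing the constraint set only re-runs `render`, which is the orthogonality described above. Length budgets and keyword bans would need extra checks that this sketch leaves out.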
The HTTP front-end then probes a remote target — fingerprinting the
template engine via boundary-marked math probes, mapping forbidden tokens
via differential reflection, optionally enumerating
__subclasses__() remotely — then composes a constraint-aware
exploit and verifies it via a random sentinel echoed back through the
response.
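As a rough illustration of the probing steps, the sketch below builds a boundary-marked arithmetic probe and maps blocked tokens by differential reflection. The function names, the `send` callable (payload in, response body out), and the toy `fake_target` are all hypothetical; the real front-end's request handling is not shown in this document. Random markers around the probe make a match unambiguous, since a bare product could appear in a page by coincidence.

```python
import re
import secrets


def make_math_probe(open_tag="{{", close_tag="}}"):
    """Return (payload, expected) for a boundary-marked arithmetic probe."""
    left, right = secrets.token_hex(4), secrets.token_hex(4)
    a = 1000 + secrets.randbelow(9000)
    b = 1000 + secrets.randbelow(9000)
    payload = f"{left}{open_tag}{a}*{b}{close_tag}{right}"
    expected = f"{left}{a * b}{right}"
    return payload, expected


def evaluated(response_body, expected):
    """True only if the product appears between the two random markers."""
    return expected in response_body


def blocked_tokens(send, candidates):
    """Differential reflection: a token is blocked if it does not come back intact."""
    blocked = []
    for token in candidates:
        left, right = secrets.token_hex(4), secrets.token_hex(4)
        body = send(f"{left}{token}{right}")
        if f"{left}{token}{right}" not in body:
            blocked.append(token)
    return blocked


def fake_target(payload):
    """Toy endpoint: strips '_' and '.', then evaluates {{a*b}} products."""
    cleaned = payload.replace("_", "").replace(".", "")
    return re.sub(
        r"\{\{(\d+)\*(\d+)\}\}",
        lambda m: str(int(m[1]) * int(m[2])),
        cleaned,
    )


payload, expected = make_math_probe()
print(evaluated(fake_target(payload), expected))           # True: expression was evaluated
print(blocked_tokens(fake_target, ["_", ".", "[", "os"]))  # ['_', '.']
```

Passing other delimiters (for example `open_tag="${"`, `close_tag="}"` for Mako-style syntax) lets the same probe help tell engines apart. Candidate tokens that are themselves template syntax would be evaluated rather than reflected, so they need separate handling.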
Headline result
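On the 9-target local_flask benchmark (bare Jinja2, multiple sandbox
configurations, a WAF-style filter, two non-Jinja2 engines (Mako and
Tornado), and two negative cases), SSTIninja is the only tool to score
100% on detection, exploitation, and WAF/sandbox bypass simultaneously.
By mean wall time per target over the full 10-target corpus it is roughly
2.5× faster than tplmap and 11× faster than sstimap, the other detect+RCE
peers, and about 90× faster than the detect-only nuclei; only the
detect-only tinja is faster.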
Per-target results
Each row is one HTTP target; each column is one tool's verdict.
✓/✓ = detected and exploited; ✓/✗ = detected but no RCE.
Local Flask (9 targets)
| Target | Engine (expected) | ours | sstimap | tplmap | nuclei | tinja |
|---|---|---|---|---|---|---|
| flask_jinja2_baseline | jinja2 | ✓ / ✓ | ✓ / ✓ | ✓ / ✓ | ✓ / ✗ | ✓ / ✗ |
| flask_jinja2_loose_sandbox | jinja2 | ✓ / ✓ | ✓ / ✓ | ✓ / ✓ | ✓ / ✗ | ✓ / ✗ |
| flask_jinja2_narrow_blacklist | jinja2 | ✓ / ✓ | ✓ / ✓ | ✓ / ✗ | ✓ / ✗ | ✓ / ✗ |
| flask_jinja2_waf | jinja2 | ✓ / ✓ | ✓ / ✗ | ✓ / ✗ | ✓ / ✗ | ✓ / ✗ |
| flask_mako | mako | ✓ / ✓ | ✓ / ✓ | ✓ / ✓ | ✓ / ✗ | ✓ / ✗ |
| flask_tornado | tornado | ✓ / ✓ | ✓ / ✓ | ✓ / ✓ | ✓ / ✗ | ✓ / ✗ |
| flask_jinja2_strict_sandbox | jinja2 | ✓ / ✗ | ✓ / ✗ | ✓ / ✗ | ✓ / ✗ | ✓ / ✗ |
| flask_no_template_eval | — | ✗ / ✗ | ✗ / ✗ | ✗ / ✗ | ✗ / ✗ | ✗ / ✗ |
| flask_safe_jinja2 | — | ✗ / ✗ | ✗ / ✗ | ✗ / ✗ | ✗ / ✗ | ✗ / ✗ |
Cells: <detected>/<exploited>. The engine each tool actually
reported is available on hover in the rendered page; aggregate engine
identification accuracy is in the dimensional analysis below.
vulhub flask/ssti (1 target)
| Target | Engine (expected) | ours | sstimap | tplmap | nuclei | tinja |
|---|---|---|---|---|---|---|
| vulhub_flask_ssti | jinja2 | ✓ / ✓ | ✓ / ✗ | ✓ / ✓ | ✓ / ✗ | ✓ / ✗ |
Dimensional analysis
Aggregated over the full 10-target corpus. Tool classes:
- ours: the tool from this repo.
- detect+rce: tools that do both detection and exploitation.
- detect_only (shown as detect+only in the tables): detection-only tools (signature scanners / academic analyzers); their exploitation rate is N/A by design, not a defeat.
Detection rate (true positives)
Fraction of targets with known SSTI that the tool flagged.
| Tool | Class | Rate | Count |
|---|---|---|---|
| ours | ours | 100% | 8/8 |
| sstimap | detect+rce | 100% | 8/8 |
| tplmap | detect+rce | 100% | 8/8 |
| nuclei | detect+only | 100% | 8/8 |
| tinja | detect+only | 100% | 8/8 |
False positive rate
Fraction of non-vulnerable targets the tool incorrectly flagged. Lower is better.
| Tool | Class | Rate | Count |
|---|---|---|---|
| ours | ours | 0% | 0/2 |
| sstimap | detect+rce | 0% | 0/2 |
| tplmap | detect+rce | 0% | 0/2 |
| nuclei | detect+only | 0% | 0/2 |
| tinja | detect+only | 0% | 0/2 |
Exploitation rate (RCE confirmed)
Fraction of exploit-expected targets where a sentinel was actually echoed by the target.
| Tool | Class | Rate | Count |
|---|---|---|---|
| ours | ours | 100% | 7/7 |
| sstimap | detect+rce | 71% | 5/7 |
| tplmap | detect+rce | 71% | 5/7 |
| nuclei | detect+only | 0% | 0/7 |
| tinja | detect+only | 0% | 0/7 |
WAF / weakened-sandbox bypass
Subset rate on hardened targets — WAF, narrow-blacklist, or loose-sandbox endpoints.
| Tool | Class | Rate | Count |
|---|---|---|---|
| ours | ours | 100% | 3/3 |
| sstimap | detect+rce | 67% | 2/3 |
| tplmap | detect+rce | 33% | 1/3 |
| nuclei | detect+only | 0% | 0/3 |
| tinja | detect+only | 0% | 0/3 |
Engine identification specificity
Targets where the tool emitted the correct engine label (jinja2 / mako / tornado) instead of a generic catch-all.
| Tool | Class | Rate | Count |
|---|---|---|---|
| ours | ours | 100% | 8/8 |
| tplmap | detect+rce | 100% | 8/8 |
| tinja | detect+only | 100% | 8/8 |
| sstimap | detect+rce | 75% | 6/8 |
| nuclei | detect+only | 0% | 0/8 |
Mean wall time per target
| Tool | Class | Mean | Total | Targets |
|---|---|---|---|---|
| ours | ours | 0.09s | 0.9s | 10 |
| tinja | detect+only | 0.04s | 0.4s | 10 |
| tplmap | detect+rce | 0.23s | 2.3s | 10 |
| sstimap | detect+rce | 1.02s | 10.2s | 10 |
| nuclei | detect+only | 7.98s | 79.8s | 10 |
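The rates in the tables above are simple counts over per-(target, tool) records. A minimal sketch of that aggregation follows, assuming hypothetical field names (`tool`, `expect_detect`, `expect_exploit`, `detected`, `exploited`); the real JSONL schema is not shown in this document. The sample rows transcribe the sstimap column of the per-target tables, with the expectation flags inferred from the 8/8, 0/2, and 7/7 denominators.

```python
import json
from collections import defaultdict


def bump(counter, hit):
    """counter is [hits, total]; count one more trial and, if hit, one more success."""
    counter[1] += 1
    if hit:
        counter[0] += 1


def rates(jsonl_lines):
    """Return {tool: {metric: (hits, total)}} from JSONL records (assumed schema)."""
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for line in jsonl_lines:
        if not line.strip():
            continue
        rec = json.loads(line)
        tool = counts[rec["tool"]]
        if rec["expect_detect"]:
            bump(tool["detection"], rec["detected"])
        else:
            # Any detection on a non-vulnerable target is a false positive.
            bump(tool["false_positive"], rec["detected"])
        if rec["expect_exploit"]:
            bump(tool["exploitation"], rec["exploited"])
    return {t: {m: tuple(v) for m, v in ms.items()} for t, ms in counts.items()}


# (target, expect_detect, expect_exploit, detected, exploited) for sstimap.
SSTIMAP_ROWS = [
    ("flask_jinja2_baseline", True, True, True, True),
    ("flask_jinja2_loose_sandbox", True, True, True, True),
    ("flask_jinja2_narrow_blacklist", True, True, True, True),
    ("flask_jinja2_waf", True, True, True, False),
    ("flask_mako", True, True, True, True),
    ("flask_tornado", True, True, True, True),
    ("flask_jinja2_strict_sandbox", True, False, True, False),
    ("flask_no_template_eval", False, False, False, False),
    ("flask_safe_jinja2", False, False, False, False),
    ("vulhub_flask_ssti", True, True, True, False),
]
lines = [
    json.dumps({"tool": "sstimap", "target": t, "expect_detect": ed,
                "expect_exploit": ee, "detected": d, "exploited": x})
    for t, ed, ee, d, x in SSTIMAP_ROWS
]

summary = rates(lines)
print(summary["sstimap"])
```

For sstimap this yields 8/8 detection, 0/2 false positives, and 5/7 exploitation, matching the tables. The bypass subset rate would be the same count restricted to the three hardened targets.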
Peer tools
Versions captured at run time and committed alongside the JSONL records, so every comparison is fully reproducible.
- SSTImap (1.3.3.7 @d4f0905) · vladko312/SSTImap — active fork of Tplmap; detect + RCE.
- Tplmap (0.5 @616b0e5) · epinna/tplmap — original; archived 2018; needs Python 3.10+ patch.
- Nuclei (v3.8.0) · projectdiscovery/nuclei — industry-standard signature scanner; detect-only.
- TInjA (v1.2.0) · Hackmanit/TInjA — academic-released SSTI analyzer; detect-only.
Methodology
Each (target × tool) pair is a single subprocess run with a timeout.
Detection is judged from each tool's own success-confirmation output;
exploitation is confirmed only by sentinel-in-response (no claim is taken
at the tool's word). The full corpus, expected outcomes, and tool adapters
are versioned in eval/corpus/ and src/sstininja/eval/peers/; the raw
per-(target, tool) records this site reads from are in
eval/results/*.jsonl.
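A minimal sketch of one such run, assuming hypothetical helper names (`run_tool`, `exploited`) rather than the actual adapter code: the tool is launched as a subprocess with a timeout, and exploitation counts only if a random sentinel shows up in the captured output. Splitting the sentinel into two halves that are joined only at execution time is an extra precaution assumed here, so that a target which merely reflects the payload does not produce a match.

```python
import secrets
import subprocess
import sys


def run_tool(argv, timeout_s=60.0):
    """Run one (target, tool) pair as a subprocess; returns (stdout, timed_out)."""
    try:
        done = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
        return done.stdout, False
    except subprocess.TimeoutExpired as exc:
        partial = exc.stdout or ""
        if isinstance(partial, bytes):  # partial output may arrive undecoded
            partial = partial.decode(errors="replace")
        return partial, True


def exploited(output, halves):
    """Count RCE only if the joined sentinel appears in the output."""
    return "".join(halves) in output


# Stand-in for a peer tool: a child Python process that joins the halves itself.
halves = (secrets.token_hex(8), secrets.token_hex(8))
child = [sys.executable, "-c", f"print({halves[0]!r} + {halves[1]!r})"]

output, timed_out = run_tool(child, timeout_s=30)
print(timed_out, exploited(output, halves))
```

The command line itself never contains the joined sentinel, so `exploited(" ".join(child), halves)` is false; only real execution produces the match.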
Honest limitations
- Sample size is 10 targets. Multi-tool diversity does not substitute for target diversity; broader external validity needs more vulhub stacks or PortSwigger Web Security Academy labs.
- Tplmap is archived (2018); we patched collections.Mapping to make it run on Python 3.10+. Its results reflect 2018-era technique.
- Nuclei and TInjA are detection-only by design; their 0% exploit rate is a category mismatch, not a defeat.
- We do not compare request counts across tools (no HTTP-proxy instrumentation). Only this tool's request counts are exposed.
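For context on the Tplmap patch mentioned above: the abstract base class aliases such as collections.Mapping were removed from the `collections` namespace in Python 3.10 and now live only in `collections.abc`. One common workaround is a small compatibility shim loaded before the legacy code, sketched below. Whether the patch used in this repo takes this form or edits Tplmap's imports directly is not stated, so treat this as an assumption.

```python
import collections
import collections.abc

# Restore the legacy alias only when it is missing (Python 3.10+),
# so code written against the old location keeps working.
if not hasattr(collections, "Mapping"):
    collections.Mapping = collections.abc.Mapping

# Legacy-style check, as older code would write it.
print(isinstance({}, collections.Mapping))
```

Editing the legacy imports to use `collections.abc` is the cleaner fix when the source can be changed; the shim is useful when it cannot.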