SSTIninja

A constraint-aware Server-Side Template Injection exploitation tool, empirically benchmarked against 4 peer tools across 10 targets (8 with known SSTI, 2 negative cases).

Abstract

SSTIninja treats template-injection exploitation as a search problem: a Searcher walks the live Python object graph from a seed value to a callable target (e.g. os.popen), producing an abstract path; a Renderer then turns that path into a payload string under arbitrary syntactic constraints (forbidden chars, banned keywords, length budgets, hardened sandboxes). The two are orthogonal components, allowing payload-shape obfuscation without re-doing search.
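The Searcher/Renderer split can be illustrated with a minimal sketch. It is not SSTIninja's implementation: the names search, render, Root, Mid, Leaf and is_run are hypothetical, the graph is a harmless toy, and the search skips underscore-prefixed attributes to stay small. The renderer handles only forbidden substrings, not banned keywords or length budgets, and assumes Jinja2-style attribute access (dot, subscript, and the attr filter).

```python
from collections import deque


def search(seed, is_target, max_depth=4):
    """Breadth-first search over public attributes; returns attribute names or None."""
    seen = {id(seed): seed}            # id -> object; holding the object keeps ids unique
    queue = deque([(seed, [])])
    while queue:
        obj, path = queue.popleft()
        if is_target(obj):
            return path
        if len(path) >= max_depth:
            continue
        for name in dir(obj):
            if name.startswith("_"):   # simplification for the toy graph
                continue
            try:
                child = getattr(obj, name)
            except Exception:
                continue
            if id(child) not in seen:
                seen[id(child)] = child
                queue.append((child, path + [name]))
    return None


def render(seed_name, path, forbidden=()):
    """Render an abstract path in the first access style that avoids every forbidden token."""
    styles = (
        lambda n: "." + n,               # seed.mid.leaf
        lambda n: "['" + n + "']",       # seed['mid']['leaf']
        lambda n: "|attr('" + n + "')",  # seed|attr('mid')|attr('leaf')
    )
    for style in styles:
        text = seed_name + "".join(style(n) for n in path)
        if not any(tok in text for tok in forbidden):
            return text
    return None  # no style satisfies the constraints


# Toy object graph with a harmless callable as the target.
class Leaf:
    def run(self):
        return "ok"


class Mid:
    def __init__(self):
        self.leaf = Leaf()


class Root:
    def __init__(self):
        self.mid = Mid()


def is_run(obj):
    return callable(obj) and getattr(obj, "__name__", "") == "run"


path = search(Root(), is_run)                 # ['mid', 'leaf', 'run']
print(render("seed", path))                   # seed.mid.leaf.run
print(render("seed", path, forbidden=["."]))  # seed['mid']['leaf']['run']
```

Because the path is found once and rendered separately, changing the constraint set only re-runs render, which is the property the paragraph above describes.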

The HTTP front-end then probes a remote target — fingerprinting the template engine via boundary-marked math probes, mapping forbidden tokens via differential reflection, optionally enumerating __subclasses__() remotely — then composes a constraint-aware exploit and verifies it via a random sentinel echoed back through the response.
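The probing steps can be sketched without any network code. In this sketch, make_math_probe, map_forbidden, make_sentinel and confirmed are hypothetical names, send stands in for whatever function issues a request and returns the response body, and the default delimiters are an assumption rather than SSTIninja's actual probe set.

```python
import secrets


def make_math_probe(open_tag="{{", close_tag="}}"):
    """Return (payload, expected): random markers around an arithmetic expression."""
    left, right = secrets.token_hex(4), secrets.token_hex(4)
    payload = f"{left}{open_tag}7*191{close_tag}{right}"
    expected = f"{left}{7 * 191}{right}"  # present only if the engine evaluated 7*191
    return payload, expected


def map_forbidden(send, candidates):
    """Differential reflection: a token counts as forbidden if it is not echoed verbatim."""
    control = secrets.token_hex(4)
    if control not in send(control):
        raise RuntimeError("input is not reflected; differential probing does not apply")
    blocked = []
    for tok in candidates:
        marker = secrets.token_hex(4)
        wrapped = f"{marker}{tok}{marker}"
        if wrapped not in send(wrapped):
            blocked.append(tok)
    return blocked


def make_sentinel():
    """Random value the exploit is asked to echo; hard to match by accident."""
    return secrets.token_hex(8)


def confirmed(response_body, sentinel):
    """Exploitation counts only if the sentinel comes back in the response."""
    return sentinel in response_body
```

The boundary markers matter because a bare 1337 could already be present in the page; the marker-product-marker string can only appear if the expression between the markers was evaluated.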

Headline result

On the 9-target local_flask benchmark, which covers bare Jinja2, multiple sandbox configurations, a WAF-style filter, two non-Jinja2 engines (Mako, Tornado), and two negative cases, SSTIninja is the only tool to score 100% on detection, exploitation, and WAF/sandbox bypass simultaneously. Its mean wall time per target (0.09s, measured over the full 10-target corpus) is roughly 2.5× to 11× lower than that of the detect+RCE peers (tplmap 0.23s, sstimap 1.02s).

Per-target results

Each row is one HTTP target; each column is one tool's verdict. ✓/✓ = detected and exploited; ✓/✗ = detected but no RCE.

Local Flask (9 targets)

Target	Engine (exp)	ours	sstimap	tplmap	nuclei	tinja
`flask_jinja2_baseline`	`jinja2`	✓ / ✓	✓ / ✓	✓ / ✓	✓ / ✗	✓ / ✗
`flask_jinja2_loose_sandbox`	`jinja2`	✓ / ✓	✓ / ✓	✓ / ✓	✓ / ✗	✓ / ✗
`flask_jinja2_narrow_blacklist`	`jinja2`	✓ / ✓	✓ / ✓	✓ / ✗	✓ / ✗	✓ / ✗
`flask_jinja2_waf`	`jinja2`	✓ / ✓	✓ / ✗	✓ / ✗	✓ / ✗	✓ / ✗
`flask_mako`	`mako`	✓ / ✓	✓ / ✓	✓ / ✓	✓ / ✗	✓ / ✗
`flask_tornado`	`tornado`	✓ / ✓	✓ / ✓	✓ / ✓	✓ / ✗	✓ / ✗
`flask_jinja2_strict_sandbox`	`jinja2`	✓ / ✗	✓ / ✗	✓ / ✗	✓ / ✗	✓ / ✗
`flask_no_template_eval`	`—`	✗ / ✗	✗ / ✗	✗ / ✗	✗ / ✗	✗ / ✗
`flask_safe_jinja2`	`—`	✗ / ✗	✗ / ✗	✗ / ✗	✗ / ✗	✗ / ✗

Cells: <detected>/<exploited>. Aggregate engine identification accuracy is reported in the dimensional analysis below.

vulhub `flask/ssti` (1 target)

Target	Engine (exp)	ours	sstimap	tplmap	nuclei	tinja
`vulhub_flask_ssti`	`jinja2`	✓ / ✓	✓ / ✗	✓ / ✓	✓ / ✗	✓ / ✗


Dimensional analysis

Aggregated over the full 10-target corpus. Tool classes: ours is the tool from this repo; detect+rce covers tools that do both detection and exploitation; detect_only covers detection-only tools (signature scanners / academic analyzers), whose exploitation rate reflects their design, not a defeat.

Detection rate (true positives)

Fraction of targets with known SSTI that the tool flagged.

Tool	Class	Rate	Count
ours	ours	100%	8/8
sstimap	detect+rce	100%	8/8
tplmap	detect+rce	100%	8/8
nuclei	detect_only	100%	8/8
tinja	detect_only	100%	8/8

False positive rate

Fraction of non-vulnerable targets the tool incorrectly flagged. Lower is better.

Tool	Class	Rate	Count
ours	ours	0%	0/2
sstimap	detect+rce	0%	0/2
tplmap	detect+rce	0%	0/2
nuclei	detect_only	0%	0/2
tinja	detect_only	0%	0/2
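The two rates above use different denominators, which a short calculation makes explicit. The rows below are transcribed from the per-target tables (all five tools have identical detection verdicts); the function names and the tuple layout are illustrative only.

```python
# (target, has_known_ssti, detected)
VERDICTS = [
    ("flask_jinja2_baseline", True, True),
    ("flask_jinja2_loose_sandbox", True, True),
    ("flask_jinja2_narrow_blacklist", True, True),
    ("flask_jinja2_waf", True, True),
    ("flask_mako", True, True),
    ("flask_tornado", True, True),
    ("flask_jinja2_strict_sandbox", True, True),
    ("flask_no_template_eval", False, False),
    ("flask_safe_jinja2", False, False),
    ("vulhub_flask_ssti", True, True),
]


def detection_rate(rows):
    """(flagged, total) over targets with known SSTI."""
    flags = [detected for _, vulnerable, detected in rows if vulnerable]
    return sum(flags), len(flags)


def false_positive_rate(rows):
    """(flagged, total) over targets without SSTI."""
    flags = [detected for _, vulnerable, detected in rows if not vulnerable]
    return sum(flags), len(flags)


print(detection_rate(VERDICTS))       # (8, 8)
print(false_positive_rate(VERDICTS))  # (0, 2)
```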

Exploitation rate (RCE confirmed)

Fraction of exploit-expected targets where a sentinel was actually echoed by the target.

Tool	Class	Rate	Count
ours	ours	100%	7/7
sstimap	detect+rce	71%	5/7
tplmap	detect+rce	71%	5/7
nuclei	detect_only	0%	0/7
tinja	detect_only	0%	0/7

WAF / weakened-sandbox bypass

Subset rate on hardened targets — WAF, narrow-blacklist, or loose-sandbox endpoints.

Tool	Class	Rate	Count
ours	ours	100%	3/3
sstimap	detect+rce	67%	2/3
tplmap	detect+rce	33%	1/3
nuclei	detect_only	0%	0/3
tinja	detect_only	0%	0/3
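The exploitation and bypass percentages for the two detect+rce peers can be recomputed from the per-target tables. The sketch assumes the seven exploit-expected targets are the eight vulnerable ones minus flask_jinja2_strict_sandbox, which is inferred from the 7/7 count rather than stated; the dictionary layout and the rate helper are illustrative.

```python
# target: (hardened, exploited_by_sstimap, exploited_by_tplmap)
EXPLOIT_EXPECTED = {
    "flask_jinja2_baseline":         (False, True,  True),
    "flask_jinja2_loose_sandbox":    (True,  True,  True),
    "flask_jinja2_narrow_blacklist": (True,  True,  False),
    "flask_jinja2_waf":              (True,  False, False),
    "flask_mako":                    (False, True,  True),
    "flask_tornado":                 (False, True,  True),
    "vulhub_flask_ssti":             (False, False, True),
}


def rate(rows, tool_index, hardened_only=False):
    """Return (hits, total, rounded percent) for one tool, optionally on hardened targets only."""
    picked = [r for r in rows.values() if r[0] or not hardened_only]
    hits = sum(r[tool_index] for r in picked)
    return hits, len(picked), round(100 * hits / len(picked))


print(rate(EXPLOIT_EXPECTED, 1))                      # sstimap: (5, 7, 71)
print(rate(EXPLOIT_EXPECTED, 2))                      # tplmap:  (5, 7, 71)
print(rate(EXPLOIT_EXPECTED, 1, hardened_only=True))  # sstimap: (2, 3, 67)
print(rate(EXPLOIT_EXPECTED, 2, hardened_only=True))  # tplmap:  (1, 3, 33)
```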

Engine identification specificity

Targets where the tool emitted the correct engine label (jinja2 / mako / tornado) instead of a generic catch-all.

Tool	Class	Rate	Count
ours	ours	100%	8/8
tplmap	detect+rce	100%	8/8
tinja	detect_only	100%	8/8
sstimap	detect+rce	75%	6/8
nuclei	detect_only	0%	0/8

Mean wall time per target

Tool	Class	Mean	Total	Targets
ours	ours	0.09s	0.9s	10
tinja	detect_only	0.04s	0.4s	10
tplmap	detect+rce	0.23s	2.3s	10
sstimap	detect+rce	1.02s	10.2s	10
nuclei	detect_only	7.98s	79.8s	10
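The relative speeds implied by the table can be checked with a few lines of arithmetic. The numbers are copied from the table above; the helper name is illustrative.

```python
MEAN_SECONDS = {
    "ours": 0.09,
    "tinja": 0.04,
    "tplmap": 0.23,
    "sstimap": 1.02,
    "nuclei": 7.98,
}


def ratio_to(baseline, table):
    """Mean wall time of each other tool divided by the baseline tool's mean."""
    base = table[baseline]
    return {tool: round(sec / base, 1) for tool, sec in table.items() if tool != baseline}


ratios = ratio_to("ours", MEAN_SECONDS)
print(ratios)  # {'tinja': 0.4, 'tplmap': 2.6, 'sstimap': 11.3, 'nuclei': 88.7}
```

A ratio below 1 (tinja) means that tool is faster than ours; tinja is detection-only, so it does less work per target.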

Peer tools

Versions are captured at run time and committed alongside the JSONL records, so every comparison can be reproduced against the same tool builds.

SSTImap (1.3.3.7 @d4f0905) · vladko312/SSTImap — active fork of Tplmap; detect + RCE.
Tplmap (0.5 @616b0e5) · epinna/tplmap — original; archived 2018; needs Python 3.10+ patch.
Nuclei (v3.8.0) · projectdiscovery/nuclei — industry-standard signature scanner; detect-only.
TInjA (v1.2.0) · Hackmanit/TInjA — academic-released SSTI analyzer; detect-only.

Methodology

Each (target × tool) pair is a single subprocess run with a timeout. Detection is judged from each tool's own success-confirmation output; exploitation is confirmed only when the sentinel appears in the response (no tool's claim is taken at its word). The full corpus, expected outcomes, and tool adapters are versioned in eval/corpus/ and src/sstininja/eval/peers/; the raw per-(target, tool) records this site reads from are in eval/results/*.jsonl.
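A harness of this shape can be sketched as follows. This is an assumption-laden illustration, not the code in src/sstininja/eval/: run_pair, to_jsonl and every record field name are hypothetical, and the command line is a stand-in for a real tool adapter.

```python
import json
import subprocess
import sys
import time


def run_pair(target, tool, argv, timeout_s=30.0):
    """Run one (target, tool) pair as a subprocess and return a JSON-serializable record."""
    start = time.monotonic()
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
        status, output = "completed", proc.stdout
    except subprocess.TimeoutExpired:
        status, output = "timeout", ""  # subprocess.run kills the child before raising
    return {
        "target": target,
        "tool": tool,
        "status": status,
        "wall_s": round(time.monotonic() - start, 3),
        "output": output,
    }


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in records)


# Stand-in command; a real adapter would invoke the peer tool against the target URL.
record = run_pair("demo_target", "demo_tool", [sys.executable, "-c", "print('hi')"])
print(to_jsonl([record]), end="")
```

Running each pair in its own process keeps one hung or crashing tool from affecting the timing or outcome of the others.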

Honest limitations

Sample size is 10 targets. Multi-tool diversity does not substitute for target diversity; broader external validity needs more vulhub stacks or PortSwigger Web Security Academy labs.
Tplmap is archived (2018); we patched collections.Mapping to make it run on Python 3.10+. Its results reflect 2018-era technique.
Nuclei and TInjA are detection-only by design; their 0% exploit rate is a category mismatch, not a defeat.
We do not compare request counts across tools (no HTTP-proxy instrumentation). Only this tool's request counts are exposed.
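For the Tplmap limitation above: the abstract base class aliases such as collections.Mapping were removed from the collections module in Python 3.10 and now live only in collections.abc. One common way to keep old code running is a compatibility shim like the one below; whether the patch used here is a shim or an edit to Tplmap's imports is not stated, so treat this as one possible form.

```python
import collections
import collections.abc

# Restore the pre-3.10 alias if the running interpreter no longer provides it.
if not hasattr(collections, "Mapping"):
    collections.Mapping = collections.abc.Mapping
```

The shim has to run before the legacy code is imported, since the old import fails at module load time.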

Source: github.com/WangYihang/sstimap (branch claude/ssti-bfs-automation-KfZAr) · SSTIninja (0.1.0 @6f6b28b) · Reproduce: sstininja eval compare ...