Guide
How to use the Injection Risk Scorer
When Claude detects language that looks like prompt injection in retrieved content, it can suppress the attached brand below baseline — while GPT models may recommend it more. Most marketing copy is written in exactly the style that triggers this response. This tool helps you audit and rewrite before publish. For the full context, read The Injection Paradox.
1. Paste your passage
Enter an About page paragraph, product description, or press release. Add your brand name if you plan to run the live probe. Demo passages are available if you want to try the tool first. Scoring runs instantly in your browser — no API key required.
2. Read the Dual-Model Content Safety Score
You get two independent scores, never one blended number:
- Claude suppression risk (0–100) — how likely Claude reads the passage as manipulative. Higher = worse.
- GPT appeal (0–100) — how strongly the passage asserts brand confidence in the way GPT tends to reward. Higher = stronger appeal.
The target zone is low Claude risk + maintained GPT appeal, shown on a two-axis plot. Superlatives often raise both scores — that is the split-model tension the tool is built to navigate.
3. Expand the signal breakdown
Every flagged span shows its category, matched text, weight, direction per model, and a plain-English explanation. Triggers per 100 words is shown as a density metric. The scoring formula is documented so you can recompute by hand.
4. Apply rewrites and re-score
Suggested rewrites convert superlatives to evidenced claims, imperatives to declarative statements, and strip embedded instruction artefacts entirely. Apply all rewrites in one click to see the risk drop in real time.
5. Optional: live dual-model probe
For advanced validation, add your OpenAI and Anthropic keys in Settings and run a live probe. The tool embeds your passage in a RAG-style recommendation prompt and compares whether each model recommends your brand. Your keys stay in your browser unless you opt in to remember them on this device.
What this tool does not do
It does not guarantee how any model will behave tomorrow, replace a full content audit at scale, or provide an attack playbook. It is a defensive calibration tool based on a documented research direction, not a confirmed permanent ranking factor.
Continue the workflow: Grounding Classifier → Fan-Out Analyser → SRO Snippet Tester → this tool.
Questions or edge cases? Get in touch.