1. How scoring works
A ThreatReady assessment consists of a single scenario — a real architecture with an active attack chain — followed by five adaptive questions. Each answer is evaluated by our 3-Pass AI engine (Evaluator → Challenger → Reconciler) against five anchored dimensions, each on a 1–10 integer scale. Every score band has a concrete written definition — a 7 in Threat Identification means something specific and verifiable, not arbitrary AI judgment.
This page is the canonical reference. We publish it because we want our scores to be auditable. If you ever question a result, every score traces back to one of these bands.
Every answer passes through three AI passes in sequence:
Pass 1 — Evaluator · Scores generously against the anchored rubric, citing specific evidence from the answer. Tends to over-credit verbose responses (intentional — Pass 2 corrects).
Pass 2 — Challenger · Adversarially reviews. Catches technical errors, hallucinated CVEs, non-existent MITRE techniques, fabricated tools, inflated scores, and missing critical elements.
Pass 3 — Reconciler · Reviews both passes, resolves disagreements, applies question-specific dimension weighting, generates the model answer, and assigns a confidence level (HIGH / MEDIUM / LOW) based on inter-pass agreement.
A score below 5 means the candidate has gaps that would concern a hiring manager. A score from 5 up to (but not including) 7 means they are competent but not yet senior. A score from 7 to 8.5 means strong senior-ready performance. A score above 8.5 indicates expert-level reasoning, which is rare and worth noting. These are guidelines, not promises; final hiring decisions are always yours.
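The guideline ranges above can be written as a small lookup. This is a minimal sketch, not the platform's code: the function name `interpret_score` is hypothetical, and placing a score of exactly 7.0 in the senior-ready range is an assumption.

```python
def interpret_score(score: float) -> str:
    """Map an overall score to the guideline reading described above.

    Assumption: a score of exactly 7.0 counts as senior-ready.
    """
    if not 1.0 <= score <= 10.0:
        raise ValueError("score must be between 1 and 10")
    if score < 5.0:
        return "gaps that would concern a hiring manager"
    if score < 7.0:
        return "competent but not yet senior"
    if score <= 8.5:
        return "strong senior-ready performance"
    return "expert-level reasoning"


print(interpret_score(7.2))  # strong senior-ready performance
```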
2. The five anchored dimensions
Dimension 1 · Threat Identification
Measures whether the candidate correctly identifies attack vectors, techniques, and threat actor behavior.
Dimension 2 · Containment & Response Logic
Measures whether the candidate can design an effective containment strategy that minimizes blast radius while preserving evidence.
Dimension 3 · Architecture & Blast Radius Analysis
Measures understanding of architectural interconnections and accurate impact assessment.
Dimension 4 · Communication Quality
Measures clarity of reasoning, structural organization, and audience-appropriateness.
Dimension 5 · Framework & Best Practice Application
Measures correct reference to and application of security frameworks (MITRE ATT&CK, NIST, CIS, ISO 27001).
3. Qualitative bands
Every score also maps to a qualitative band that's visible from day one — even before the platform has enough cohort data for percentile rankings. Bands describe what a score means in plain language, so candidates and hiring managers don't need to interpret a 7.2 in isolation.
4. Confidence levels & dual score display
Not every score is equally certain. When the Evaluator (Pass 1) and Challenger (Pass 2) disagree significantly, the final score carries that uncertainty forward as a confidence flag. Pretending every AI score is equally reliable destroys trust the moment a buyer catches a bad one, so we publish the confidence level with the score.
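One way to picture how inter-pass disagreement could become a confidence flag is shown below. The gap thresholds (1.0 and 2.0 points) and the function name `confidence_level` are hypothetical; this page does not publish the actual cut-offs, so treat the numbers as placeholders.

```python
def confidence_level(evaluator: float, challenger: float,
                     high_gap: float = 1.0, medium_gap: float = 2.0) -> str:
    """Turn the Pass 1 / Pass 2 score gap into a HIGH / MEDIUM / LOW flag.

    The default thresholds are illustrative assumptions, not published values.
    """
    gap = abs(evaluator - challenger)
    if gap <= high_gap:
        return "HIGH"
    if gap <= medium_gap:
        return "MEDIUM"
    return "LOW"


print(confidence_level(8.0, 7.5))  # HIGH
print(confidence_level(9.0, 5.0))  # LOW
```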
Dual score display. Every result also shows two scores side by side: Scenario Score (raw, on the question's own scale) and Readiness Score (difficulty-adjusted with caps — Beginner 6.0, Intermediate 8.0, Advanced 9.0, Expert 10.0). Both numbers are honest. The candidate's effort is recognized; the hiring manager's signal is preserved.
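The caps can be illustrated with a short sketch. It assumes the difficulty adjustment amounts to capping the raw Scenario Score at the level's ceiling; the full adjustment may involve more than this, and the names `READINESS_CAPS` and `readiness_score` are hypothetical.

```python
# Caps taken from the text above.
READINESS_CAPS = {
    "Beginner": 6.0,
    "Intermediate": 8.0,
    "Advanced": 9.0,
    "Expert": 10.0,
}


def readiness_score(scenario_score: float, difficulty: str) -> float:
    """Cap the raw scenario score at the ceiling for its difficulty level.

    Assumption: capping is the only adjustment applied.
    """
    return min(scenario_score, READINESS_CAPS[difficulty])


print(readiness_score(9.2, "Beginner"))      # 6.0
print(readiness_score(7.5, "Intermediate"))  # 7.5
```

Under this reading, a near-perfect answer to a Beginner scenario still shows its raw Scenario Score, while the Readiness Score stays at 6.0.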
5. Question-specific dimension weighting
Not every question tests all five dimensions equally. A "map this attack to MITRE" question should weight Threat Identification at 40%. A "write an executive brief" question should weight Communication Quality at 50%. Without per-question weighting, scoring would feel wrong even when the engine works correctly.
| Question category | TI | CR | AB | CQ | FA |
|---|---|---|---|---|---|
| Threat Identification | 40% | 10% | 15% | 10% | 25% |
| Containment & Response | 10% | 40% | 20% | 15% | 15% |
| Architecture Analysis | 15% | 10% | 40% | 10% | 25% |
| Executive Communication | 10% | 10% | 5% | 50% | 25% |
| Incident Response | 15% | 35% | 15% | 20% | 15% |
| Default (balanced) | 20% | 20% | 20% | 20% | 20% |
TI = Threat Identification · CR = Containment & Response · AB = Architecture & Blast Radius · CQ = Communication Quality · FA = Framework Application
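The table can be read as a weighted average of the five dimension scores. The sketch below copies the percentages from the table and assumes the overall score is a plain weighted mean; the names `WEIGHTS` and `weighted_score`, and the example dimension scores, are hypothetical.

```python
DIMENSIONS = ("TI", "CR", "AB", "CQ", "FA")

# Percent weights copied from the table above, in DIMENSIONS order.
WEIGHTS = {
    "Threat Identification":   (40, 10, 15, 10, 25),
    "Containment & Response":  (10, 40, 20, 15, 15),
    "Architecture Analysis":   (15, 10, 40, 10, 25),
    "Executive Communication": (10, 10, 5, 50, 25),
    "Incident Response":       (15, 35, 15, 20, 15),
    "Default (balanced)":      (20, 20, 20, 20, 20),
}

# Sanity check: every row must total 100%.
for category, row in WEIGHTS.items():
    assert sum(row) == 100, category


def weighted_score(scores: dict, category: str = "Default (balanced)") -> float:
    """Weighted mean of the 1-10 dimension scores for a question category."""
    weights = WEIGHTS[category]
    total = sum(scores[dim] * w for dim, w in zip(DIMENSIONS, weights))
    return round(total / 100, 2)


example = {"TI": 8, "CR": 6, "AB": 7, "CQ": 9, "FA": 5}  # made-up scores
print(weighted_score(example, "Executive Communication"))  # 7.5
print(weighted_score(example))                             # 7.0
```

The same five dimension scores produce a higher overall result on an executive-communication question because Communication Quality carries half the weight there.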
6. A worked example
Scenario: A Lambda function has unusual outbound traffic. The IAM role attached has s3:* and secrets:*. The S3 bucket contains customer PII.
Question: What is your first containment step, and why does sequence matter here?
"I would look at CloudTrail to see what happened and then maybe change the IAM role."
Why: Threat Identification 4 — recognises something is wrong but no specific attack path. Containment & Response 3 — investigation-first during active exfiltration. Communication Quality 5 — readable but unstructured. Architecture & Blast Radius 3 — no awareness of dependencies. Framework Application 2 — no MITRE or NIST reference.
"First, I'd scope down the IAM role by removing secrets:* and narrowing s3:* to only the specific bucket. Then I'd check CloudTrail for what was accessed and rotate any credentials the Lambda could have read."
Why: Threat Identification 7 — names the IAM over-permission path. Containment & Response 6 — specific, correct mitigations but doesn't justify sequence. Architecture & Blast Radius 6 — names what the role can reach. Communication Quality 7 — clear structure. Framework Application 5 — implicit least-privilege but no explicit standard cited.
"Disable the Lambda function first — remove its trigger or set concurrency to zero. That stops active exfiltration without destroying forensic state, which matters because wiping the role before preserving evidence loses the CloudTrail correlation we'll need later. After that: scope the role (remove secrets:*, narrow s3:* to the specific bucket), rotate credentials the role could have accessed, snapshot the Lambda environment for forensics, and review CloudTrail for scope of what was already taken. Sequence matters because investigation before containment lets the attacker keep reading PII the whole time we're triaging."
Why: Threat Identification 9 — names primary and secondary vectors. Containment & Response 9 — proportional, evidence-preserving, sequenced. Architecture & Blast Radius 9 — full credential reuse map. Communication Quality 9 — explicit sequencing and rationale. Framework Application 8 — implicit MITRE alignment, NIST 800-61 sequencing visible.
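To see how the dimension scores above might roll up into one number, the sketch below applies the Containment & Response weighting row to each of the three sample answers. Two assumptions apply: the question is treated as belonging to the Containment & Response category (the example does not state its category), and the overall score is taken to be a plain weighted mean. The resulting totals are illustrative and are not published platform results.

```python
# Containment & Response weights (percent), assumed to apply to this question.
CR_WEIGHTS = {"TI": 10, "CR": 40, "AB": 20, "CQ": 15, "FA": 15}

# Dimension scores taken from the three "Why:" explanations above.
ANSWERS = {
    "first answer":  {"TI": 4, "CR": 3, "AB": 3, "CQ": 5, "FA": 2},
    "second answer": {"TI": 7, "CR": 6, "AB": 6, "CQ": 7, "FA": 5},
    "third answer":  {"TI": 9, "CR": 9, "AB": 9, "CQ": 9, "FA": 8},
}


def overall(scores: dict) -> float:
    """Weighted mean of the dimension scores on the 1-10 scale."""
    return sum(scores[dim] * weight for dim, weight in CR_WEIGHTS.items()) / 100


for label, scores in ANSWERS.items():
    print(f"{label}: {overall(scores)}")
# first answer: 3.25
# second answer: 6.1
# third answer: 8.85
```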
7. Adaptive questioning logic
After each answer, the AI generates a follow-up question based on what the candidate actually said. This serves two purposes:
- Prevents pre-prepared answers. Generic responses trigger probing follow-ups that can't be anticipated.
- Explores depth. Strong answers earn harder follow-ups; shallow answers trigger clarifying ones.
Adaptive questions are bounded by the scenario's MITRE ATT&CK tactics and the role's competency map — the AI cannot wander off-topic into unrelated domains.
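The two ideas above, depth-dependent follow-ups and a hard boundary on topic, can be sketched as a simple rule. In practice the follow-up is generated by the AI, so this only illustrates the constraint. The function name `pick_follow_up`, the 7.0 threshold, and the example tactic lists are all hypothetical.

```python
def pick_follow_up(answer_score: float,
                   tactics_in_answer: list,
                   scenario_tactics: set,
                   strong_threshold: float = 7.0) -> tuple:
    """Choose a follow-up style and a tactic to probe.

    Only tactics that belong to the scenario are eligible, so the
    follow-up cannot drift into unrelated domains.
    """
    allowed = [t for t in tactics_in_answer if t in scenario_tactics]
    if not allowed:
        raise ValueError("answer touches no tactic within the scenario")
    # Assumed rule: strong answers get a harder probe, others a clarifying one.
    kind = "harder" if answer_score >= strong_threshold else "clarifying"
    return kind, allowed[0]


scenario = {"Exfiltration", "Credential Access"}
print(pick_follow_up(8.0, ["Persistence", "Exfiltration"], scenario))
# ('harder', 'Exfiltration')
```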
8. What we deliberately do not score
- Grammar or spelling, beyond the threshold where clarity suffers
- Typing speed or answer length for its own sake
- Regional English or accent when voice mode is used
- Specific tool preferences (AWS vs Azure terminology, Splunk vs Elastic) — we score on the reasoning, not the brand
- "Correct" answers that depend on organizational context we haven't given — we score the reasoning, not the guess
- Agreeing with our model answer — a well-reasoned alternative can score as well as the model answer if defensible
9. Consistency & calibration
We take rubric consistency seriously. Concrete measures:
- All scoring prompts include the full anchored rubric (not a summary)
- Each question includes a role-specific and difficulty-specific rubric anchor
- Nightly regression at 02:00 IST: 50+ golden answers across 7 categories — Clearly Excellent, Clearly Poor, Fluent But Wrong, Correct But Poorly Communicated, Overconfident Hallucination, Partial But Well-Prioritized, Technically Strong But Business-Blind — run through the live engine. If pass-rate drops below 80%, we freeze prompt changes until investigated.
- Raw outputs from Passes 1, 2, and 3 are stored for 90 days for auditability. Hiring managers can request an explanation of any score; the system regenerates the full Pass 1 / Pass 2 / Pass 3 trace.
- Every month, the founder and two core engineers manually review 20–30 flagged evaluations to track AI-vs-human agreement.
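The nightly regression gate described in the list above could look roughly like this. The 80% threshold comes from the text; what counts as a "pass" for a single golden answer is not specified here, so the 1.0-point tolerance, the `GoldenResult` type, and the function names are assumptions.

```python
from dataclasses import dataclass

PASS_RATE_THRESHOLD = 0.80  # from the text: freeze below 80%
TOLERANCE = 1.0             # hypothetical: allowed drift from the expected score


@dataclass
class GoldenResult:
    category: str    # e.g. "Fluent But Wrong"
    expected: float  # score the golden answer should receive
    actual: float    # score the live engine produced tonight


def pass_rate(results: list, tolerance: float = TOLERANCE) -> float:
    """Fraction of golden answers scored within tolerance of expectation."""
    if not results:
        raise ValueError("no golden results to evaluate")
    passed = sum(1 for r in results if abs(r.actual - r.expected) <= tolerance)
    return passed / len(results)


def should_freeze_prompts(results: list) -> bool:
    """True when the pass rate has dropped below the 80% threshold."""
    return pass_rate(results) < PASS_RATE_THRESHOLD


run = [
    GoldenResult("Clearly Excellent", 9.0, 8.8),
    GoldenResult("Clearly Poor", 2.0, 2.5),
    GoldenResult("Fluent But Wrong", 3.0, 6.5),            # engine over-credited
    GoldenResult("Overconfident Hallucination", 2.0, 5.0), # engine over-credited
    GoldenResult("Partial But Well-Prioritized", 6.0, 6.2),
]
print(pass_rate(run), should_freeze_prompts(run))  # 0.6 True
```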
10. Score challenges
If you believe your score is wrong, you can challenge it. Here's how:
- Within 14 days of receiving the score, email [email protected] with your session ID and a specific explanation of what you believe was mis-scored and why
- A security engineer reviews the full transcript and rubric application within 7 business days
- We respond with either a confirmed score (with a rubric-grounded explanation) or a revised score
- All challenges and outcomes are logged. Patterns are fed back into rubric refinement
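The two time limits in the list above can be worked out with a short date calculation. This sketch assumes business days are Monday to Friday with no public holidays, and that counting starts on the day after the score or challenge is received; both function names are hypothetical.

```python
from datetime import date, timedelta


def challenge_deadline(score_received: date) -> date:
    """Last day to submit a challenge: 14 calendar days after the score."""
    return score_received + timedelta(days=14)


def review_due(challenge_received: date, business_days: int = 7) -> date:
    """Date by which the review is due, skipping Saturdays and Sundays.

    Assumption: public holidays are not taken into account.
    """
    current = challenge_received
    remaining = business_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


print(challenge_deadline(date(2026, 5, 4)))  # 2026-05-18
print(review_due(date(2026, 5, 4)))          # 2026-05-13
```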
11. Version history
| Version | Date | Summary of changes |
|---|---|---|
| 1.0 | 22 April 2026 | Initial publication. |
| 2.0 | May 2026 | Updated to v4 spec — five anchored dimensions, 3-Pass scoring engine, qualitative bands, confidence levels, dual score display, question-specific weighting, golden-answer regression. Replaces the v1 three-dimension model. |