Skill

deduplication-protocol

MANDATORY pre-flight protocol for every QA STLC agent run. Must be applied BEFORE creating any test cases, Gherkin feature files, Playwright locators, page objects, or step definitions in Azure DevOps. Prevents duplicate test cases, duplicate feature file attachments, and duplicate Playwright code across multiple runs or agents on the same work item. Triggers on any task involving: ADO work items, test cases, feature files, Gherkin BDD, Playwright automation, locators, page objects, step definitions, qa-gherkin-generator, qa-test-case-manager, or qa-playwright-generator tools.

Slash command

/qa-stlc-agents:deduplication-protocol

SKILL.md

347 lines · ~3.6k tokens

Stats

Language: Python
Parent stars: 0
Maintenance: Excellent
Last commit: Jun 23, 2026

QA Deduplication Protocol

Read AGENT-BEHAVIOR.md before this protocol. This protocol is a pre-flight gate only — it does not authorise creating anything. All creation decisions require separate explicit user confirmation per artifact type.

This protocol is mandatory. No artifact may be created or attached to ADO without completing the READ → DIFF → CREATE-ONLY-WHAT-IS-MISSING pipeline below.

Any agent that skips this protocol and creates duplicates has violated the QA STLC workflow.

Why This Exists

When QA STLC agents run more than once on the same work item, or when multiple agents run in sequence on the same item, they have no memory of what previous runs created. Without this protocol, every run blindly creates new test cases, attaches new feature files, and generates new Playwright code that duplicates existing artifacts. This produces:

- 2–5× the intended number of ADO test cases per work item
- Multiple .feature files on the same work item covering identical scenarios
- Multiple versions of locators.ts / PageObject.ts / steps.ts that diverge silently
- Test suites that fail because the same step definition is registered twice

This skill codifies the fix as a durable, enforceable protocol that any agent can read and follow.

The Three-Phase Mandatory Workflow

┌───────────────────────────────────────────────────────────────┐
│  PHASE 1 — READ (always first, no exceptions)                 │
│  PHASE 2 — DIFF (semantic matching, not just string equality) │
│  PHASE 3 — CREATE only what has no existing coverage          │
└───────────────────────────────────────────────────────────────┘

PHASE 1 — READ Everything That Already Exists

Before touching any artifact, make all of the following calls in this order.

This protocol handles any work item type — PBI, Bug, or Feature — as follows:

| Work item type passed | Step 1A | Step 1B |
|---|---|---|
| PBI or Bug | `fetch_work_item(id)` → get parent Feature id | `fetch_feature_hierarchy(parent_feature_id)` |
| Feature | Skip `fetch_work_item` | `fetch_feature_hierarchy(id)` directly |
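
The routing in the table above can be sketched as a small planning function. This is a minimal illustration under stated assumptions, not the agents' actual implementation: `plan_phase1_reads` and the returned tuples are hypothetical, and only the tool names `fetch_work_item`, `fetch_feature_hierarchy`, and `get_linked_test_cases` come from this protocol.

```python
def plan_phase1_reads(work_item_type: str, work_item_id: int) -> list[tuple[str, str]]:
    """Return the ordered (tool, argument) reads for Steps 1A to 1C.

    Hypothetical helper: the argument is given as text because the parent
    Feature id is only known once Step 1A has returned.
    """
    kind = work_item_type.strip().lower()
    if kind in ("pbi", "bug"):
        return [
            ("fetch_work_item", str(work_item_id)),                    # 1A
            ("fetch_feature_hierarchy", "parent Feature id from 1A"),  # 1B
            ("get_linked_test_cases", str(work_item_id)),              # 1C
        ]
    if kind == "feature":
        return [
            ("fetch_feature_hierarchy", str(work_item_id)),  # 1B (1A skipped)
            ("get_linked_test_cases", str(work_item_id)),    # 1C
        ]
    raise ValueError(f"Unsupported work item type: {work_item_type!r}")
```

Raising on an unknown type, rather than guessing a route, keeps the gate from silently skipping a read.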

1A — Fetch the work item (PBI or Bug only)

qa-test-case-manager:fetch_work_item(
  organization_url, project_name, work_item_id
)

Extract and store:

- Title, description, acceptance_criteria
- Parent Feature ID (used in Step 1B)
- Story points, priority, state

Skip this step if the ID passed is already a Feature.

1B — Fetch the parent Feature hierarchy

qa-gherkin-generator:fetch_feature_hierarchy(
  organization_url, project_name, feature_id   ← parent Feature ID from 1A, or the ID itself if Feature
)

Extract and store:

- Feature title, description, acceptance criteria
- All child PBIs and Bugs with their titles + acceptance criteria
- All existing test cases across the whole feature
- All .feature file attachments already on the feature
- Build the flow map from sibling work items (see generate-gherkin skill Step 1D)

Why sibling PBIs matter: A PBI is one slice of a larger flow defined across multiple work items. Siblings often represent prerequisite or downstream steps. Without reading all of them the agent invents navigation steps and test data already defined elsewhere.

1C — get_linked_test_cases on the specific work item

qa-test-case-manager:get_linked_test_cases(
  organization_url, project_name, work_item_id   ← the original ID (PBI, Bug, or Feature)
)

Extract and store:

All existing test case IDs, titles (normalised), priority values

1D — Check for existing Playwright attachments

From the feature hierarchy response, check for:

- `locators.ts`: extract all locator keys already defined
- `*Page.ts`: extract all method names already defined
- `*.steps.ts`: extract all step definition strings already registered

If you cannot read an attachment's content, treat it as fully covering its domain and produce NO new file of that type unless you can prove a gap.

1E — Helix step-pattern scan (mandatory before generating any step definitions)

ADO attachments only cover files explicitly uploaded. The Helix-QA project on disk contains all currently registered step definitions — many of which are never attached to ADO but will be active at runtime because cucumber.js loads them via src/test/steps/**/*.ts.

Generating a new step that duplicates an existing Helix step causes an Ambiguous step definition runtime error, even if the ADO dedup scan found nothing.

Procedure — run every time before writing any *.steps.ts:

1. qa-helix-writer:list_helix_tree(helix_root)
   → collect all paths matching src/test/steps/**/*.ts (the same glob cucumber.js loads, including nested folders)

2. For each path found:
   qa-helix-writer:read_helix_file(helix_root, path)
   → extract all step pattern strings (Given/When/Then/And/But)
   → extract all Before/After hook tags
   → add to CACHE[id].existing_attachments.steps_files[]

3. For each path matching src/test/features/*.feature:
   qa-helix-writer:read_helix_file(helix_root, path)
   → extract the Background step strings verbatim
   → add to CACHE[id].existing_attachments.background_steps[]
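
The pattern-extraction part of step 2 could look roughly like the sketch below. It assumes step patterns are registered as plain string literals, for example `Given('...', ...)`; regex-literal and template-literal patterns are not handled. The function names are hypothetical, and the file content is assumed to come from `read_helix_file`.

```python
import re

# Matches registrations such as Given('...', ...) or When("...", ...).
# And/But are included only because the procedure above lists them.
STEP_CALL = re.compile(
    r"""\b(Given|When|Then|And|But)\(\s*(['"])((?:\\.|(?!\2).)*)\2"""
)


def extract_step_patterns(ts_source: str) -> list[str]:
    """Return every string-literal step pattern found in a steps file."""
    return [match.group(3) for match in STEP_CALL.finditer(ts_source)]


def already_registered(proposed: list[str], existing: list[str]) -> list[str]:
    """Return proposed patterns that exactly match an existing pattern.

    Exact string comparison only: two differently worded expressions that
    match the same step text would not be caught by this check.
    """
    known = set(existing)
    return [pattern for pattern in proposed if pattern in known]
```

Any pattern returned by `already_registered` should be reused as-is rather than emitted again.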

Background reuse rule: If a Background in a new feature file can be expressed entirely using step patterns already registered in any existing *.steps.ts, you must use the exact registered wording — do NOT generate new step strings for the same intent.

# ✅ Correct — reuses registered pattern from reset-app-state.steps.ts
Background:
  Given the user is logged in as "standard_user"
  And the user is on the inventory page

# ❌ Wrong — generates duplicate steps that cause Ambiguous step definition
Background:
  Given I am on the SauceDemo login page at "https://www.saucedemo.com/"
  And I log in with username "standard_user" and password "secret_sauce"
  And I am on the inventory page at "https://www.saucedemo.com/inventory.html"

This scan is not optional for Jira pipelines either. If the Helix project is present, always perform step 1E before writing steps, regardless of the work item source.

PHASE 2 — DIFF Using Semantic Matching

String equality is not enough. Two test cases are semantically equivalent (duplicates) if they test the same condition on the same subject, even when worded differently.

Normalisation algorithm (apply before comparing)

1. Lowercase the full title
2. Strip tag prefixes: [smoke], [regression], [a11y], [negative], [boundary], etc.
3. Strip filler words: the, a, an, is, are, should, will, when, after, during
4. Extract the subject noun (what is being tested: button, CSV, backend, upload, template…)
5. Extract the condition (visible, absent, downloaded, cleared, rejected, correct…)
6. If subject + condition match an existing case → DUPLICATE, do not create
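
A minimal sketch of this normalisation, assuming a small keyword vocabulary stands in for real subject and condition extraction. `SUBJECTS` and `CONDITION_STEMS` are seeded only from the examples in the list above and are assumptions, as are the function names; a real project would need a richer vocabulary or a proper language-processing step.

```python
import re
from typing import Optional, Tuple

TAG = re.compile(r"\[[^\]]*\]")  # any bracketed tag: [smoke], [regression], ...
FILLER = {"the", "a", "an", "is", "are", "should", "will", "when", "after", "during"}

# Assumed vocabularies taken from the examples in the list above.
SUBJECTS = ("button", "csv", "backend", "upload", "template")
CONDITION_STEMS = ("visible", "absent", "download", "clear", "reject", "correct")


def normalise(title: str) -> Tuple[Optional[str], Optional[str]]:
    """Reduce a title to a (subject, condition) pair; None where unknown."""
    text = TAG.sub(" ", title.lower())
    tokens = [t for t in re.findall(r"[a-z0-9]+", text) if t not in FILLER]
    subject = next((t for t in tokens if t in SUBJECTS), None)
    condition = next(
        (stem for t in tokens for stem in CONDITION_STEMS if t.startswith(stem)),
        None,
    )
    return subject, condition


def is_duplicate(proposed_title: str, existing_title: str) -> bool:
    """True only when both titles yield the same fully resolved pair."""
    pair = normalise(proposed_title)
    return None not in pair and pair == normalise(existing_title)
```

Titles that cannot be resolved to a full pair are deliberately not reported as duplicates here; whether such cases should be escalated for manual review is a design choice this sketch leaves open.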

Semantic coverage matrix

For each proposed test case:
  normalise(proposed.title) → (subject, condition)
  for each existing test case:
    normalise(existing.title) → (subject, condition)
    if subject matches AND condition matches → DUPLICATE → skip
  if no match found → NET-NEW → add to creation list
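
The matrix above amounts to a partition of the proposed titles. The sketch below is one way to write it, with a `key` callable standing in for the normalisation step; `split_net_new` is a hypothetical name, not part of any QA STLC tool.

```python
from typing import Callable, Hashable, Iterable


def split_net_new(
    proposed: Iterable[str],
    existing: Iterable[str],
    key: Callable[[str], Hashable],
) -> tuple[list[str], list[str]]:
    """Partition proposed titles into (net_new, duplicates).

    Two titles count as duplicates when `key` maps them to the same value.
    """
    seen = {key(title) for title in existing}
    net_new: list[str] = []
    duplicates: list[str] = []
    for title in proposed:
        k = key(title)
        if k in seen:
            duplicates.append(title)
        else:
            net_new.append(title)
            seen.add(k)  # also blocks duplicates inside the proposed batch
    return net_new, duplicates
```

Adding each accepted key to `seen` means a single batch cannot introduce two cases for the same (subject, condition) pair.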

Only pass the net-new list to create_and_link_test_cases.

Gherkin scenario deduplication

For each proposed Gherkin scenario:

1. Extract the scenario title
2. Apply the same normalisation algorithm
3. Compare against all scenario titles in existing .feature attachments AND existing ADO test case titles
4. Act on the result:
   - Semantic match: skip the scenario.
   - Zero net-new scenarios: do not attach a file.
   - Some net-new scenarios: attach a delta file containing only those.

Playwright code deduplication

- `locators.ts`: emit only keys not already in the existing file. If the delta is empty, skip; otherwise attach it as `locators.delta.ts`.
- `*Page.ts`: emit only methods not already present. If the delta is empty, skip; otherwise attach it as `*Page.delta.ts`.
- `*.steps.ts`: cross-check every proposed step pattern against both the ADO attachment step strings (1D) and the Helix on-disk step strings (1E), and emit only step strings that appear in neither. NEVER re-register an existing step (it causes an Ambiguous step definition runtime error). Reuse the exact registered wording in Background blocks.
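
For the locator and page-object rules, the delta computation can be sketched as follows. The helper names and the `LoginPage.ts` file name are hypothetical; only the `.delta.ts` naming for `locators.ts` and `*Page.ts` comes from the rules above.

```python
from typing import Optional


def locator_delta(existing_keys: set[str], proposed: dict[str, str]) -> dict[str, str]:
    """Keep only proposed entries whose key is not already defined."""
    return {k: v for k, v in proposed.items() if k not in existing_keys}


def delta_filename(original_name: str, delta: dict[str, str]) -> Optional[str]:
    """Return the attachment name for a non-empty delta, or None to skip.

    locators.ts -> locators.delta.ts, LoginPage.ts -> LoginPage.delta.ts
    """
    if not delta:
        return None
    stem, dot, ext = original_name.rpartition(".")
    return f"{stem}.delta.{ext}" if dot else f"{original_name}.delta"
```

Returning `None` for an empty delta makes the "skip" branch explicit, so a caller cannot attach an empty file by accident.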

PHASE 3 — CREATE Only What Is Missing

| Diff result | Action |
|---|---|
| Zero net-new | Skip. Log: `✅ Already fully covered — nothing to create.` |
| Some net-new, some duplicates | Create only net-new. Log which were skipped and why. |
| All net-new (first run) | Create all. Normal flow. |
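
The three rows above reduce to a small decision function. This is an illustrative sketch; the function name and the returned labels are hypothetical.

```python
def phase3_action(net_new: int, duplicates: int) -> str:
    """Map the Phase 2 diff counts to one of the three Phase 3 actions."""
    if net_new < 0 or duplicates < 0:
        raise ValueError("counts must be non-negative")
    if net_new == 0:
        return "skip"                 # zero net-new: nothing to create
    if duplicates == 0:
        return "create_all"           # all net-new: normal first-run flow
    return "create_net_new_only"      # mixed: create the delta, log the skips
```

An empty proposal (both counts zero) falls into the skip branch, which matches the rule that nothing is created without net-new coverage.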

Mandatory deduplication report

Output after every run regardless of whether anything was created:

## Deduplication Report — Work Item #<id>

### Test Cases
- Existing: <count> linked
- Proposed: <count>
- Duplicates skipped: <count> (<titles>)
- Net-new created: <count> (<IDs if created>)

### Gherkin Feature File
- Existing attachments: <filenames or "none">
- Proposed scenarios: <count>
- Duplicate scenarios skipped: <count> (<titles>)
- Net-new scenarios: <count>
- Action: <"Skipped — fully covered" | "Attached delta" | "Attached full (first run)">

### Playwright Code
- Existing locators.ts: <"found — N keys" | "not found">
- Net-new locator keys: <count> (<list or "none">)
- Existing page object: <"found — N methods" | "not found">
- Net-new page methods: <count> (<list or "none">)
- Existing steps file: <"found — N steps" | "not found">
- Net-new step definitions: <count> (<list or "none">)
- Action per file: <"Skipped" | "Attached delta" | "Attached full (first run)">
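
As an illustration, the Test Cases section of the report template above could be rendered from the Phase 2 and Phase 3 results like this. The function name and parameters are hypothetical; the output lines follow the template.

```python
def render_test_case_section(
    existing: int,
    proposed: int,
    skipped_titles: list[str],
    created_ids: list[int],
) -> str:
    """Render the '### Test Cases' block of the deduplication report."""
    skipped = ", ".join(skipped_titles) or "none"
    created = ", ".join(str(i) for i in created_ids) or "none"
    return "\n".join([
        "### Test Cases",
        f"- Existing: {existing} linked",
        f"- Proposed: {proposed}",
        f"- Duplicates skipped: {len(skipped_titles)} ({skipped})",
        f"- Net-new created: {len(created_ids)} ({created})",
    ])
```

Deriving the counts from the lists keeps the reported numbers and the listed titles or IDs from drifting apart.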

Hard Rules — Never Violate These

- NEVER call create_and_link_test_cases without first calling get_linked_test_cases on the same work item.
- NEVER attach a .feature file without first checking for existing feature file attachments.
- NEVER attach a steps.ts that re-registers an existing step string (this causes an Ambiguous step definition error at runtime). Perform the Helix step-pattern scan (Phase 1E) before writing any step definition. ADO attachment checks alone are insufficient because most Helix step files are never attached to ADO.
- NEVER treat tag differences as meaningful. "[REGRESSION] Export List downloads CSV" is a duplicate of "[SMOKE] Clicking Export List downloads a valid CSV file".
- NEVER create more than one test case covering the same (subject, condition) pair.
- NEVER skip this protocol because the work item appears new. A work item that shows existing_test_cases_count: 0 in the hierarchy can still have linked cases that only get_linked_test_cases returns, so always call both.

Integration With Other QA Skills

This protocol is a work-item-scoped pre-flight gate.

Each unique work_item_id gets exactly one PHASE 1 run. Findings are cached and reused by every subsequent agent operating on the same item.
A different work_item_id always triggers a fresh PHASE 1. Cache from item A must never be used for item B.

work_item_id = 111  →  PHASE 1 runs in full → CACHE[111] populated
  generate-gherkin on 111      →  reads CACHE[111], skips PHASE 1
  generate-playwright on 111   →  reads CACHE[111], skips PHASE 1

work_item_id = 222  →  PHASE 1 runs in full → CACHE[222] populated (CACHE[111] untouched)

Skills that delegate here:

skills/generate-gherkin/SKILL.md          →  delegates here; reads work-item cache
skills/generate-playwright-code/SKILL.md  →  delegates here; reads work-item cache

Work-Item Cache Schema

CACHE[work_item_id] = {
  work_item:            { id, type, title, acceptance_criteria },  # PBI/Bug/Feature
  parent_feature:       { id, title, description },
  sibling_pbis:         [{ id, type, title, acceptance_criteria, state }],
  flow_map:             <assembled string describing the full user journey>,
  existing_test_cases:  [{ id, title, priority }],
  existing_attachments: {
    feature_files: [{ name, content }],
    locators_ts:   { found: bool, keys: [] },
    page_objects:  [{ name, methods: [] }],
    steps_files:   [{ name, step_strings: [] }],   # populated from BOTH 1D (ADO attachments) AND 1E (Helix on-disk read)
    background_steps: [],                               # exact step strings from existing Helix *.feature Background blocks — new Backgrounds must reuse these
  },
  gap_check_completed:  bool,   # set true after generate-gherkin Step 2 passes
  phase1_completed:     bool,   # set true once PHASE 1 has run in full for this work item
}

Before running PHASE 1, check CACHE[work_item_id].phase1_completed:

- If true → skip PHASE 1 entirely; use cached data for PHASE 2.
- If false / not set → run PHASE 1 in full and populate the cache.
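
The cache gate can be sketched as below, assuming an in-memory dictionary keyed by work item id. `ensure_phase1` and the `run_phase1` callback are hypothetical names; only the `phase1_completed` flag comes from the schema above.

```python
from typing import Any, Callable

Cache = dict[int, dict[str, Any]]


def ensure_phase1(
    cache: Cache,
    work_item_id: int,
    run_phase1: Callable[[int], dict[str, Any]],
) -> dict[str, Any]:
    """Return cached Phase 1 findings, running Phase 1 only when needed."""
    entry = cache.get(work_item_id)
    if entry and entry.get("phase1_completed"):
        return entry  # reuse cached data for PHASE 2
    entry = dict(run_phase1(work_item_id))  # full READ phase for this item
    entry["phase1_completed"] = True
    cache[work_item_id] = entry
    return entry
```

Because entries are keyed by `work_item_id`, findings for one item are never returned for another.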

Examples

Multiple work items in one session

► PBI #111 processed:
  CACHE[111] not found → PHASE 1 in full → CACHE[111].phase1_completed = true
  generate-gherkin on 111    → reads CACHE[111]
  generate-playwright on 111 → reads CACHE[111], skips all ADO reads

► Bug #222 processed:
  CACHE[222] not found → PHASE 1 in full → CACHE[222].phase1_completed = true
  (CACHE[111] untouched — completely independent)

Second run on a fully covered item

PHASE 1: get_linked_test_cases(273440) → 35 cases; fetch_feature_hierarchy → .feature attached
PHASE 2: 14 proposed → 14/14 duplicates; 0 net-new locators, methods, steps
PHASE 3: Skip all
REPORT:  ✅ Work item #273440 fully covered. Nothing to create.

Partial gap run

PHASE 1: 3 cases found, no .feature, no Playwright attachments
PHASE 2: 5 proposed → 3 duplicates → 2 net-new
PHASE 3: create 2 test cases; attach full .feature (first run); attach full Playwright files
