Goal
Ship static client-side keyword search across every canonical-store artefact (~200 entries at saturation) at preview.offplan.online/search/. Zero infra: no vector store, no embedding API, no server. One JSON manifest emitted at build time, one HTML page with vanilla JavaScript. The bar is "I half-remember a phrase but not where" — getting from query keystroke to clicking the right result in under five seconds on a phone, with no logged-in state.
Success Criteria
(verbatim in frontmatter above)
Translated to operational checks:
- Speed. Manifest fetch + first render in <500 ms on a phone (LTE); search latency <100 ms per keystroke; build script completes in <2 s.
- Discoverability. `/` keyboard shortcut works on desktop; sticky search field works on mobile; URL hash deep-links work.
- Quality of results. Snippet highlighting + up-to-3-snippets per row + inline "replaced by" links on superseded hits + date-decay and grouping toggles all behave as specified.
- Power-user syntax. `kind:plan auth flow` filters to plan-kind entries containing both tokens; `"exact phrase"` matches the literal sequence; `author:roman status:approved` composes with chip state.
- Build hygiene. Manifest is idempotent (same input → same bytes); pre-commit hook keeps it fresh; unit tests cover the indexer.
Approach
Two-file ship, no framework. One Python build script writes docs/rendered/search/manifest.json; one HTML page at docs/rendered/search/index.html loads it on boot. Vanilla JS. Inline <style> (matches render_md.py + render_about.py pattern; no separate CSS file at this scale). Visual register matches the rest of the preview site (Skeleton White, Helvetica Neue 200–300, Inter body, JetBrains Mono mono, oxidised pills).
Matcher choice: homegrown ~80-line scorer, not vendored fuzzysort. Reasons:
- No dependency vendoring + supply-chain caution; the corpus is small enough that the homegrown matcher will rank just as well.
- We need custom scoring anyway (field weights, date-decay, grouping) — fuzzysort wouldn't save much.
- Plain-text token matching is the right primitive for canonical-store search; fuzzy-edit matching is a misfit for keyword-heavy queries like "P11f" or "CONV-30".
Design decisions captured in /plan interview (2026-05-11):
| Decision | Choice |
|---|---|
| Date-decay scoring | Configurable toggle in UI (default on; recent boost is usually right) |
| Result grouping default | Toggle in UI (default flat-by-score; switch to per-kind grouping) |
| Snippet rendering | Up to 3 snippets per result (more context worth the extra height) |
| Empty state | Curated suggested searches (4–6 hand-picked queries set the tone) |
| Supersedes link in results | Yes, inline '→ replaced by X' link on row |
| Query syntax | Full set: kind:, status:, author:, "quoted phrases" |
| Recent queries | Yes — last 5 above curated (localStorage; cheap polish) |
| Mobile focus | Sticky search field always visible (no FAB; consistent with desktop) |
Manifest is committed, not gitignored. Re-emit on every commit that touches a source .md via the pre-commit hook (Gate 4). Two reasons: (1) preview.offplan.online deploys from the committed docs/rendered/ directory — there's no CI build step that could regenerate it; (2) committing makes the manifest reviewable in diffs and surfaces accidental breakages early.
No service-worker, no offline cache. Static page; browser HTTP cache is sufficient. Re-evaluate if manifest exceeds 2 MB (we'll be far past saturation by then).
Implementation Steps
Phase F1 — Manifest schema + builder
scripts/build_search_index.py:
- CLI: `--all` (default; full rebuild) or `--touched <file ...>` (incremental; currently writes the whole manifest regardless, but the signature matches the `/handoff` Step 4.5 contract for future incremental optimisation).
- Walk the canonical store:
  - `plans/*.md` (excluding `INDEX.md`, `TEMPLATE.md`, `CHANGELOG.md`, anything under `plans/done/`)
  - `workstreams/*.md` (same exclusions, including `workstreams/done/`)
  - `docs/sessions/*.md` (excluding `TEMPLATE.md`, `INDEX.md`)
  - `docs/decisions/*.md`
  - `docs/conventions/*.md`
  - `docs/learnings/*.md` (folder may not exist; handle gracefully)
  - Skip `.claude/memory/*` (internal context, not public content; verified with peer 2026-05-11)
- For each file, read via `scripts/lib/frontmatter.py` (the canonical parser); emit one entry:
{
"slug": "<file stem>",
"kind": "plan|workstream|session|decision|convention|learning",
"status": "<status: or null>",
"title": "<H1 of body, fallback to frontmatter name:>",
"summary": "<frontmatter summary: or empty>",
"headings": ["H2: Goal", "H2: Approach", "H3: ..."],
"body_excerpt": "<first ~400 chars of body, code-fences stripped, no frontmatter>",
"tags": ["..."],
"plans_touched": ["..."],
"workstreams_touched": ["..."],
"adrs_touched": ["..."],
"supersedes": "<frontmatter supersedes: or null>",
"superseded_by": "<computed: which file declares supersedes=this; or null>",
"author": "<frontmatter owner: or 'unknown'>",
"updated_date": "<git last-commit ISO date for this file>",
"render_url": "/<slug>.html",
"about_url": "/<slug>/"
}
- Output: `docs/rendered/search/manifest.json` + a small `docs/rendered/search/manifest.meta.json` with `{generated_at, source_count, total_bytes}` for diagnostics.
- Idempotency: byte-identical output on re-run (sort entries by slug; sort arrays; pretty-print with stable separators).
- Unit tests: `tests/test_build_search_index.py` (10 tests: frontmatter parsing, code-fence stripping, supersedes back-resolution, sort stability, empty-folder handling, missing frontmatter graceful fallback, headings extraction, byte-identical re-run, exclusion paths, author fallback).
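The two computed fields in the entry schema, `updated_date` and `superseded_by`, could be derived roughly as follows. This is a minimal sketch under stated assumptions, not the builder itself: the function names are hypothetical, the log text is assumed to come from a single bulk call such as `git log --name-only --format=@%cI` (newest commit first), and `supersedes` is assumed to hold the slug of the replaced file.

```python
from typing import Dict, List, Optional


def parse_git_log_dates(log_text: str) -> Dict[str, str]:
    """Map path -> most recent commit date from one bulk `git log` dump.

    Assumed input shape: each commit contributes an '@<ISO date>' line
    followed by the paths it touched, newest commit first.
    """
    dates: Dict[str, str] = {}
    current: Optional[str] = None
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("@"):
            current = line[1:]
        elif current is not None:
            # Newest-first order: the first date seen for a path wins.
            dates.setdefault(line, current)
    return dates


def resolve_superseded_by(entries: List[dict]) -> None:
    """Fill each entry's `superseded_by` from other entries' `supersedes`."""
    by_slug = {e["slug"]: e for e in entries}
    for e in entries:
        e.setdefault("superseded_by", None)
    # Iterate in slug order so ties resolve the same way on every run.
    for e in sorted(entries, key=lambda x: x["slug"]):
        target = by_slug.get(e.get("supersedes"))
        if target is not None and target["superseded_by"] is None:
            target["superseded_by"] = e["slug"]
```

Resolving in slug order keeps the result deterministic if two files ever claim to supersede the same target, which matters for the byte-identical re-run requirement.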
Phase F2 — Search page HTML + matcher
docs/rendered/search/index.html:
- `<head>`: same canonical CSS as `render_md.py` / `render_about.py` (copy verbatim). Page-specific styles inlined.
- Body layout:
  - Sticky header: input field (full-width on mobile, capped at 720 px on desktop), filter chips below, settings toggles (date-decay on/off, grouping flat/by-kind) on the right.
  - Empty-state region (shown when input is empty).
  - Results region.
- Matcher (vanilla JS, ~80 lines):
  - Tokenise input. Recognise inline filters: `kind:plan`, `status:approved`, `author:roman`, and `"quoted phrase"`. Strip these into a filter object before tokenising the remainder as plain keywords.
  - Score each manifest entry: per token, sum across fields with weights: title ×3, summary ×2, headings ×2, body_excerpt ×1, tags ×2, slug ×3. Multi-token queries require ALL tokens to be present somewhere.
  - Date-decay (when toggle is on): multiply final score by `2^(-Δdays / 90)`, i.e. `exp(-ln 2 · Δdays / 90)`, so a 90-day-old artefact's score halves. (A bare `exp(-Δdays / 90)` would cut it to about 37%, not 50%.) Toggle via UI checkbox.
  - Filter by chip state AND inline-filter state combined.
  - Group when grouping toggle is on: bucket results by kind, sort each bucket internally by score.
- Result rendering:
  - Each row: title (link to `about_url`, falling back to `render_url`); kind + status pills; updated_date; author; up to 3 snippets (best-scoring contiguous ~120-char windows containing matched tokens; matches wrapped in `<mark>`).
  - If `superseded_by` is set (the entry has been replaced): dim the title slightly; show inline `→ replaced by <successor>` link wired to the successor's `about_url`. Title still clickable to view the superseded artefact.
  - No-results: "No matches for `<query>`" + "Clear filters" link if any chips active.
- Empty state:
  - "Recent: <q1> · <q2> · <q3> · <q4> · <q5>" line from `localStorage['p11f.recent']` (max 5; most recent first).
  - "Try:" curated suggested searches in a 2-column grid (4–6 entries hand-picked; initial list: `launch plan`, `admin panel`, `stripe`, `canonical store`, `obsidian`, `P11e`). Hardcoded in the HTML.
- Keyboard:
  - `/` focuses the input from anywhere on the page (intercepted only when input is not already focused).
  - `↑` / `↓` move selection through results.
  - `Enter` opens selected result.
  - `Esc` clears input (or blurs if already empty).
- URL hash: `#q=<encoded query>&kind=<csv>&status=<csv>&author=<csv>&tag=<csv>&decay=<on|off>&group=<flat|kind>`. Read on page load; write on every state change (debounced 150 ms).
- Recent-query persistence: on successful Enter into a result, push the query string to `localStorage['p11f.recent']` (deduplicate, cap at 5).
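The matcher semantics described in this phase can be pinned down with a small reference model. The sketch below is written in Python purely as a specification aid; the shipped matcher is vanilla JS, and the names `parse_query`, `score_entry`, and `HALF_LIFE_DAYS` are hypothetical. It assumes case-insensitive substring matching per field and a 90-day half-life for date-decay; chip and inline-filter application would happen separately, before scoring.

```python
import re
from typing import Dict, List, Tuple

FIELD_WEIGHTS = {"title": 3, "summary": 2, "headings": 2,
                 "body_excerpt": 1, "tags": 2, "slug": 3}
FILTER_KEYS = ("kind", "status", "author")
HALF_LIFE_DAYS = 90  # assumption: score halves every 90 days

# Either a "quoted phrase" or a run of non-space characters.
_TOKEN_RE = re.compile(r'"([^"]+)"|(\S+)')


def parse_query(query: str) -> Tuple[Dict[str, List[str]], List[str], List[str]]:
    """Split a raw query into (filters, phrases, keywords), all lowercased."""
    filters: Dict[str, List[str]] = {}
    phrases: List[str] = []
    keywords: List[str] = []
    for quoted, bare in _TOKEN_RE.findall(query):
        if quoted:
            phrases.append(quoted.lower())
            continue
        key, sep, value = bare.partition(":")
        if sep and value and key.lower() in FILTER_KEYS:
            filters.setdefault(key.lower(), []).append(value.lower())
        else:
            keywords.append(bare.lower())
    return filters, phrases, keywords


def _field_text(entry: dict, field: str) -> str:
    """Return a field as lowercase text; list fields are space-joined."""
    value = entry.get(field) or ""
    if isinstance(value, list):
        value = " ".join(value)
    return value.lower()


def score_entry(entry: dict, keywords: List[str], phrases: List[str],
                age_days: float, decay: bool = True) -> float:
    """Weighted field score; 0.0 unless every keyword and phrase matches."""
    texts = {f: _field_text(entry, f) for f in FIELD_WEIGHTS}
    total = 0.0
    for term in keywords + phrases:
        hit = sum(w for f, w in FIELD_WEIGHTS.items() if term in texts[f])
        if hit == 0:
            return 0.0  # ALL terms must be present somewhere
        total += hit
    if decay:
        total *= 0.5 ** (age_days / HALF_LIFE_DAYS)
    return total
```

For example, `parse_query('kind:plan auth flow')` yields a `kind` filter of `["plan"]` plus the keywords `auth` and `flow`, and an entry that scores 14 when fresh scores 7 at 90 days old with decay on.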
Phase F3 — Pre-commit Gate 4 + handoff Step 4.5 wiring
- `scripts/hooks/pre-commit` Gate 4, after Gates 1 (Notion-zero), 2 (frontmatter), 3 (wikilink audit): if any staged `*.md` lives in the indexed paths AND/OR `scripts/build_search_index.py` itself is staged, run the builder and `git add docs/rendered/search/manifest.json docs/rendered/search/manifest.meta.json`. Soft-fail (warn, don't block) on builder errors.
- `.claude/commands/handoff.md` Step 4.5: extend the regen-indexes call site to also invoke `python3 scripts/build_search_index.py --touched <files_touched_this_session>`. Same soft-fail behaviour.
- Cross-platform Bash 3 compat: match the pattern set by Gate 2's pre-commit fix (no `mapfile`).
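Gate 4's trigger condition and the builder's walk both depend on the same question: is this path part of the indexed set? One way to keep them in agreement is a single path classifier that the hook could call through the builder. The sketch below is an assumption about how that might look; `kind_for`, `needs_rebuild`, and the table layout are hypothetical names, and the exclusions are copied from the walk rules in Phase F1.

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional

BUILDER = "scripts/build_search_index.py"
_COMMON = {"INDEX.md", "TEMPLATE.md", "CHANGELOG.md"}

# directory -> (kind, file names excluded in that directory)
INDEXED_DIRS = {
    "plans": ("plan", _COMMON),
    "workstreams": ("workstream", _COMMON),
    "docs/sessions": ("session", {"TEMPLATE.md", "INDEX.md"}),
    "docs/decisions": ("decision", set()),
    "docs/conventions": ("convention", set()),
    "docs/learnings": ("learning", set()),
}


def kind_for(path: str) -> Optional[str]:
    """Return the manifest kind for a repo-relative path, or None if not indexed."""
    p = PurePosixPath(path)
    if p.suffix != ".md":
        return None
    entry = INDEXED_DIRS.get(str(p.parent))
    if entry is None:
        # Also rejects plans/done/x.md and .claude/memory/x.md:
        # only direct children of the listed directories qualify.
        return None
    kind, excluded = entry
    return None if p.name in excluded else kind


def needs_rebuild(staged: Iterable[str]) -> bool:
    """True if any staged path is an indexed source or the builder itself."""
    return any(p == BUILDER or kind_for(p) is not None for p in staged)
```

Because the lookup keys on the immediate parent directory, the `done/` subfolders fall out naturally without a separate exclusion rule.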
Phase F4 — Cross-page search affordance
The search page is useless if nobody knows it exists. Add a small "🔎 Search" link to:
- `scripts/build-rendered-index.py`: top-right of the header, near the filter chips. Add inside the existing P11e bracket if cleanest, or in a new clearly-commented `# === P11f search link ===` block.
- `scripts/render_about.py`: same link in the about-page footer or header (author's call which fits better visually).
- `scripts/render_md.py`: already emits a back-to-index link; extend it to also link to search. Tiny change.
Phase F5 — One-pass spot check + smoke test
After F1–F4 ship:
- Build manifest with `--all`; verify it weighs <1 MB (~200 entries × ~5 KB each ≈ 1 MB; we're not there yet, so expect ~100–300 KB now).
- Spot-check 5 queries against expected results:
  - `stripe` (should return Notion-sync + billing-related artefacts)
  - `P11f` (should return this plan + the vault workstream + the brief-expansion session)
  - `forge` (should return forge-vault-setup workstream + the F1–F7 work)
  - `CONV-30` (should return the legacy CONV-30 trinity if their `body_excerpt` mentions it)
  - `kind:plan canonical` (should return repo-as-canonical-store plan + this preview-search plan)
- Mobile sanity check via Chrome DevTools MCP (responsive viewport): sticky field stays sticky; tap-to-focus works; results are readable at ~360 px wide.
- Idempotency proof: run builder twice, assert `git status` is empty after the second.
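The idempotency proof only holds if serialisation is deterministic regardless of walk order. A minimal sketch of one way to get there, assuming the set-like arrays are sorted while `headings` keeps document order (the plan says "sort arrays" without listing which; that split is an assumption, as is the name `serialise_manifest`):

```python
import json
from typing import List

# Assumption: these arrays are unordered sets; `headings` keeps document order.
ARRAY_FIELDS = ("tags", "plans_touched", "workstreams_touched", "adrs_touched")


def serialise_manifest(entries: List[dict]) -> bytes:
    """Return manifest bytes that depend only on content, not on input order."""
    normalised = []
    for entry in sorted(entries, key=lambda e: e["slug"]):
        entry = dict(entry)  # shallow copy so the caller's data is untouched
        for field in ARRAY_FIELDS:
            entry[field] = sorted(entry.get(field) or [])
        normalised.append(entry)
    text = json.dumps(normalised, indent=2, sort_keys=True,
                      ensure_ascii=False, separators=(",", ": "))
    return (text + "\n").encode("utf-8")
```

Feeding the same entries in a different order, or with keys in a different order, should produce identical bytes, which is the property the unit test for byte-identical re-runs would assert.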
Files
New:
- `plans/preview-search.md`: this plan.
- `scripts/build_search_index.py`: manifest builder (~250 LOC stdlib).
- `docs/rendered/search/index.html`: search page (~400 LOC: HTML + CSS + vanilla JS).
- `docs/rendered/search/manifest.json`: generated; committed (deploys to preview.offplan.online).
- `docs/rendered/search/manifest.meta.json`: generation diagnostics; committed.
- `tests/test_build_search_index.py`: 10 unit tests.
- `workstreams/preview-search-p11f.md`: new P1 workstream that consumes this plan (F1–F5 phases above).
Modified:
- `scripts/hooks/pre-commit`: Gate 4 added (manifest auto-regen on staged source change).
- `scripts/build-rendered-index.py`: add `# === P11f search link ===` block adding the search affordance to the index header.
- `scripts/render_about.py`: add search link in about-page header/footer.
- `scripts/render_md.py`: extend back-to-index link with search.
- `.claude/commands/handoff.md`: Step 4.5 extended to call `build_search_index.py`.
- `workstreams/repo-as-canonical-store-vault.md`: tick P11f when workstream lands; cross-link to `preview-search-p11f` workstream and to this plan.
Dependencies
- P11a (`regen_indexes.py`): already shipped; the search-link affordance composes with the index page produced there.
- P11b (`render_md.py`): already shipped; provides the `<meta>` tag contract that the indexer consumes.
- P11e (`render_about.py`): already shipped; the search results link to `about_url`. If P11e ever changes URL structure, the manifest's `about_url` field follows.
- P12 (`audit_links.py`): already shipped; this plan inherits the pre-commit gate pattern.
No external dependencies (no npm packages, no Python deps beyond stdlib).
Testing
- `tests/test_build_search_index.py`: unit-level coverage of the indexer.
- Spot-check queries (manual, Phase F5): 5 specific queries with expected result shape.
- Cross-platform Bash compat — pre-commit hook tested on Roman's Mac (Bash 3 system) + Forge (Bash 5 via Homebrew).
- Mobile rendering — Chrome DevTools MCP responsive viewport pass.
No end-to-end browser-test framework. The page is small enough to verify by eye + DevTools. If the page grows past ~600 LOC of JS, revisit with a Playwright smoke test.
Workstreams
One workstream: workstreams/preview-search-p11f.md (P1, plan-gated creation per CONV-6780). Consumes all phases F1–F5. Roman or Sergei to /build over 1–2 sessions.
Risks
- Snippet rendering escape. `<mark>` insertion into raw body text must HTML-escape first, then wrap matches. Forgetting that injects markup. Mitigation: explicit `escape_html()` helper in the JS, called before `<mark>` wrapping; unit test the escape path.
- localStorage privacy. Recent queries persist on the device. Documented in the empty-state UI ("Recent: …" line) so users see what's stored. No PII expected at the corpus scale, but worth a one-line note in `docs/conventions/preview-site.md` (if/when that file ships).
- Manifest staleness. If the pre-commit hook fails silently and the manifest doesn't regenerate, search results lag the source by one commit. Mitigation: Gate 4 prints a loud warning on failure; `/handoff` Step 4.5 also calls the builder, providing a second checkpoint.
- Mobile keyboard pop-up on `/` keystroke. The `/` shortcut should NOT fire on mobile (no physical keyboard; the slash is just a character). Mitigation: detect touch device + skip the global listener.
- Date-decay sensitivity. A 90-day half-life is a guess. Roman should be able to tune it (out of scope for v1; exposed only as on/off toggle initially; if Roman wants finer control, expose half-life slider in v1.1).
- Indexing performance. ~200 entries × frontmatter parse × git log per file could be slow if implemented naively. Mitigation: bulk `git log --name-only --format=...` in one shot, build path→date map up front; profile before optimising further.
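For the snippet-escape risk, the ordering matters enough to spell out. The sketch below models the logic in Python (the page itself would do this in JS, and `highlight` is a hypothetical name). It uses a variant of "escape first, then wrap": matches are located on the raw text and each piece is escaped as it is emitted, which avoids a token such as `amp` matching inside an already-escaped `&amp;`.

```python
import html
import re
from typing import List


def highlight(snippet: str, tokens: List[str]) -> str:
    """HTML-escape `snippet`, wrapping case-insensitive token matches in <mark>."""
    # Longest tokens first so 'auth flow' is preferred over 'auth' at the same spot.
    terms = [re.escape(t) for t in sorted(set(tokens), key=len, reverse=True) if t]
    if not terms:
        return html.escape(snippet)
    pattern = re.compile("|".join(terms), re.IGNORECASE)
    out, pos = [], 0
    for m in pattern.finditer(snippet):
        out.append(html.escape(snippet[pos:m.start()]))  # text before the match
        out.append("<mark>" + html.escape(m.group(0)) + "</mark>")
        pos = m.end()
    out.append(html.escape(snippet[pos:]))  # trailing text after the last match
    return "".join(out)
```

Every character of the source text passes through `html.escape` exactly once, and the only unescaped markup in the output is the `<mark>` wrapper itself; a unit test for the escape path could assert exactly that.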
Evaluation
Two weeks after ship (target 2026-05-25):
- Usage signal. Roman + Sergei manually log when they used search in `docs/learnings/` (cheap proxy; no analytics infra). If neither has used it in 2 weeks, revisit the discoverability surface; likely the cross-page link needs to be more prominent.
- Latency check. Run `/check-deploy-latency search` (if/when such a tool exists; otherwise eyeball Chrome DevTools network panel). Should still be <500 ms on LTE.
- False-negative audit. Pick 5 things Roman half-remembers but can't find via search. If any are matches against indexed content that didn't surface, that's a scoring bug, not a feature gap; fix in v1.1.
Out of scope (deliberate)
- Semantic / embedding search (Plan ?, future; corpus would need to grow past ~500 entries before the cost-benefit flips).
- Full-corpus indexing of binary assets (PDFs in `docs/rendered/`, images, design system files).
- Save-search / share-query-URL features beyond the basic `#q=` hash already in scope.
- Notion-side mirror of search results (Plan 2 territory; the Worker syncs source, not derivative data).
- Personalised ranking ("results Roman tends to click rise"). Same calculus as date-decay v1.1.
See Also
- Workstream: repo-as-canonical-store-vault — § P11f bullet + § Search brief (P11f, surfaced 2026-05-11) — original brief that this plan ratifies.
- Plan: repo-as-canonical-store — parent plan; P11f is the search phase of the vault polish workstream.
- Convention: frontmatter — schema this indexer consumes.
- Convention: obsidian — link-shape rule; pre-commit gate pattern (Gate 4 of this plan inherits from Gate 3 there).