Law Firm Wiki Compiler
Compile institutional PI practice knowledge from FirmVault activity logs into a structured Obsidian wiki using Karpathy's LLM Knowledge Base architecture. Use when adding new cases, recompiling, querying, or linting the law firm wiki.
When to Use
- Adding new cases or old case archives to the wiki
- Recompiling after activity log updates
- Querying the wiki for institutional knowledge
- Running lint/health checks on wiki articles
- Generating Hermes skills from wiki articles
Excel Ingestion (FileVine Activity Exports)
When Aaron sends an Excel spreadsheet of activity logs from FileVine:
Expected format
- Sheet columns: Project Name | Note Text | Created At | (empty)
- Project Name = "Client Name CaseType MM/DD/YYYY" (e.g., "Amy Stich WC 01/17/2024")
- Note Text = markdown-formatted activity notes (may contain FileVine links, strikethroughs)
- Created At = datetime
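As an illustration of the Project Name format above, here is a minimal sketch that splits a value into client, case type, and date. The function name `parse_project_name` is hypothetical, and the sketch assumes the case type is a single token with no spaces (as in "WC"), which the export format does not guarantee.

```python
import re
from datetime import date, datetime

# Assumption: "<client name> <single-token case type> <MM/DD/YYYY>"
_PROJECT_RE = re.compile(
    r"^(?P<client>.+?)\s+(?P<case_type>\S+)\s+(?P<date>\d{2}/\d{2}/\d{4})$"
)

def parse_project_name(project_name: str) -> dict:
    """Split a FileVine Project Name into client, case type, and date."""
    match = _PROJECT_RE.match(project_name.strip())
    if match is None:
        raise ValueError(f"Unexpected Project Name format: {project_name!r}")
    return {
        "client": match["client"],
        "case_type": match["case_type"],
        "date": datetime.strptime(match["date"], "%m/%d/%Y").date(),
    }
```

Rows that raise `ValueError` are worth logging rather than dropping, since they may indicate a different export layout.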
Conversion steps
- pip install openpyxl if needed
- Load with openpyxl.load_workbook(path, read_only=True)
- Slugify case names per FirmVault rules (lowercase, strip apostrophes/quotes, & → and, non-alnum → hyphens)
- Group entries by case, then by date within each case
- Write to FirmVault/cases/<slug>/Activity Log/<YYYY-MM-DD>.md with frontmatter:
  schema_version: 2
  date: "YYYY-MM-DD"
  category: imported
  subcategory: settlement_activity_export
- Use the subcategory: settlement_activity_export tag to identify imported-from-Excel cases later
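The slug rules listed in the conversion steps can be sketched as follows. This is an approximation of the FirmVault rules, not the canonical implementation: collapsing runs of non-alphanumeric characters into a single hyphen and trimming leading/trailing hyphens are assumptions, as is the handling of curly quotes.

```python
import re

def slugify(name: str) -> str:
    """Approximate FirmVault slug rules for a case name."""
    slug = name.lower()
    # Strip apostrophes and quotes (straight and curly; curly handling is an assumption)
    slug = re.sub(r"['\"\u2018\u2019\u201c\u201d]", "", slug)
    # & becomes "and"; the padding spaces are turned into hyphens below
    slug = slug.replace("&", " and ")
    # Every run of non-alphanumeric characters becomes one hyphen (assumption)
    slug = re.sub(r"[^a-z0-9]+", "-", slug)
    return slug.strip("-")
```

Under these assumptions, "Amy Stich WC 01/17/2024" becomes `amy-stich-wc-01-17-2024`.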
Multiple files in one session
Aaron often sends multiple Excel files in sequence. Process each one fully (convert → batch → compile → rebuild index) before asking for the next. The converter handles deduplication automatically — if a case dir already exists, new logs append; if a log file for that date exists, it appends an "Imported Entries" section.
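The append-on-collision behavior described above might look like the sketch below. The helper name `write_log`, the `---` frontmatter delimiters, and the `## Imported Entries` heading level are assumptions; only the frontmatter fields and the section title come from this document.

```python
from __future__ import annotations
from pathlib import Path

FRONTMATTER = (
    "---\n"
    "schema_version: 2\n"
    'date: "{date}"\n'
    "category: imported\n"
    "subcategory: settlement_activity_export\n"
    "---\n\n"
)

def write_log(vault: Path, slug: str, day: str, notes: list[str]) -> str:
    """Create the daily log, or append an Imported Entries section if it exists."""
    log_path = vault / "cases" / slug / "Activity Log" / f"{day}.md"
    log_path.parent.mkdir(parents=True, exist_ok=True)
    body = "\n\n".join(notes) + "\n"
    if log_path.exists():
        # Existing log for this date: keep it intact and append below it
        with log_path.open("a", encoding="utf-8") as fh:
            fh.write("\n## Imported Entries\n\n" + body)
        return "appended"
    log_path.write_text(FRONTMATTER.format(date=day) + body, encoding="utf-8")
    return "created"
```

Counting the "created" and "appended" return values per case is one way to produce the new-vs-updated report.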
Sizing reference (2026-04-12 imports)
- File 1 (settlement_1): 17,639 rows → 198 cases → 6,221 log files (13.7 MB)
- File 2 (settlement_2): 22,182 rows → 169 cases → 7,341 log files (12.9 MB)
- File 3 (settlement_3): 688 rows → 8 cases → 158 log files (small)
- File 4 (closing): 9,363 rows → 125 cases → 2,924 log files
- Conversion takes ~2 seconds per file
- Duplicate detection: compare row count + first/last row to identify resends
Batch size decisions
- >50 cases: 3 parallel subagents (split evenly by log count)
- 10-50 cases: 1-2 subagents depending on log volume
- <10 cases: Single subagent with targeted article updates only. Do NOT have it read all existing articles — point it at the 5-6 most likely articles to update. Set max_iterations=30 to avoid running out of turns on reading.
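One way to "split evenly by log count" is a greedy assignment: take cases from largest to smallest and give each to the batch with the fewest logs so far. The sketch below assumes case records shaped like `{"slug": ..., "logs": ...}`; the function name `split_by_log_count` is hypothetical, and the number of batches still comes from the thresholds above.

```python
from __future__ import annotations

def split_by_log_count(cases: list[dict], n_batches: int) -> list[list[dict]]:
    """Greedy split: largest cases first, each into the currently lightest batch."""
    batches: list[list[dict]] = [[] for _ in range(n_batches)]
    totals = [0] * n_batches
    for case in sorted(cases, key=lambda c: c["logs"], reverse=True):
        lightest = totals.index(min(totals))
        batches[lightest].append(case)
        totals[lightest] += case["logs"]
    return batches
```

Greedy assignment is not guaranteed to be optimal, but it keeps any single subagent from receiving all of the large cases.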
Reusable converter script
Save to /tmp/convert_excel.py, swap the path for each new file. The script:
- Uses openpyxl (pip install if missing)
- Slugifies per FirmVault rules
- Groups by case → date → writes markdown with frontmatter
- Reports new vs updated case dirs
Architecture
Karpathy's 3-layer pattern: raw sources → LLM compiler → structured wiki
Layer 1: Raw (immutable)
cases/*/Activity Log/*.md — 21K+ activity logs
cases/*/*.md — case files
Layer 2: Wiki (LLM-maintained)
wiki/
Home.md — Obsidian dashboard
index.md — master catalog
log.md — compilation history
concepts/*.md — atomic knowledge articles (63 as of 2026-04-12)
connections/*.md — cross-cutting insights (26 as of 2026-04-12)
AGENTS.md — compiler schema (the spec)
SPEC.md — architecture doc
Layer 3: Consumers
Hermes semantic skills, OpenClaw agents, Aaron via Hermes
Compilation Process
Batch Processing (for bulk cases)
- Group cases into batches of ~80K tokens
- Delegate 3 batches in parallel
- Each subagent reads AGENTS.md, existing articles, case files + sampled logs
- Subagents UPDATE existing articles (evidence_count++) or CREATE new ones
- Do NOT let subagents rewrite index.md (race condition) — rebuild after
- Rebuild index.md from all articles on disk after all batches complete
Key Instructions for Compiler Subagents
- Read AGENTS.md for full schema
- Read ALL existing concept + connection articles before writing (exception: small batches of <10 cases read only the 5-6 most likely articles)
- ANONYMIZE all PII (use "Case A", "Case B", etc.)
- UPDATE existing > CREATE new (upgrading confidence is the goal)
- Confidence: low (<5 cases), medium (5-9), high (10+)
- Use [[wikilinks]] between articles
- Append to log.md, do NOT rewrite index.md
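The confidence thresholds in the list above reduce to a small mapping. This sketch assumes `evidence_count` counts the cases supporting an article; the function name `confidence_for` is hypothetical.

```python
def confidence_for(evidence_count: int) -> str:
    """Map an article's evidence count to a confidence tier."""
    if evidence_count >= 10:
        return "high"
    if evidence_count >= 5:
        return "medium"
    return "low"
```

Because parallel subagents can leave `evidence_count` slightly off, a tier computed near a boundary (4-5 or 9-10) is worth rechecking after the batches finish.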
Sampling Strategy
- Large cases (400+ logs): first 40 + last 40 chronologically
- Medium cases (100-400): first 25 + last 25
- Small cases (<100): first 10 + last 10, or all
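The sampling tiers above can be sketched as a single function over a case's log filenames. Two choices here are assumptions: a case with exactly 400 logs is treated as large (the tiers overlap at 400), and any case with too few logs to fill both windows is returned whole. The name `sample_logs` is hypothetical.

```python
from __future__ import annotations

def sample_logs(log_names: list[str]) -> list[str]:
    """Return the first N and last N logs chronologically, sized by case volume."""
    # YYYY-MM-DD.md filenames sort chronologically as plain strings
    ordered = sorted(log_names)
    total = len(ordered)
    if total >= 400:
        window = 40
    elif total >= 100:
        window = 25
    else:
        window = 10
    if total <= 2 * window:
        return ordered  # small enough to read everything
    return ordered[:window] + ordered[-window:]
```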
Subagent Prompt Template
Law Firm Wiki compiler. Read /opt/data/FirmVault/wiki/AGENTS.md.
Read existing articles in wiki/concepts/ and wiki/connections/.
Compile cases: [LIST]. For each: read cases/<slug>/<slug>.md and
sample first N + last N activity logs. UPDATE existing articles
(increment evidence_count, upgrade confidence: 5=medium, 10=high).
CREATE new only for genuinely new patterns. ANONYMIZE PII.
Write to wiki/. Do NOT rewrite index.md. Append to wiki/log.md.
Adapt prompts to data category
Different Excel exports contain different types of data. Add a focus hint:
- Settlement files: "Focus on: settlement patterns, negotiation tactics, treatment timelines, SOL management, adjuster behavior, lien resolution"
- Closing files: "These are CLOSING cases -- look especially for: case closure workflows, decline reasons, final disbursement, file archival, post-closing obligations, client termination patterns"
- Intake files: Focus on onboarding, insurance verification, initial treatment referrals
This dramatically improves pattern extraction quality.
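A minimal sketch of combining the subagent prompt template with a focus hint. The template and hint text are copied from this document; the category keys ("settlement", "closing", "intake") and the function name `build_prompt` are hypothetical.

```python
from __future__ import annotations

BASE_PROMPT = """Law Firm Wiki compiler. Read /opt/data/FirmVault/wiki/AGENTS.md.
Read existing articles in wiki/concepts/ and wiki/connections/.
Compile cases: {cases}. For each: read cases/<slug>/<slug>.md and
sample first N + last N activity logs. UPDATE existing articles
(increment evidence_count, upgrade confidence: 5=medium, 10=high).
CREATE new only for genuinely new patterns. ANONYMIZE PII.
Write to wiki/. Do NOT rewrite index.md. Append to wiki/log.md."""

# Keys are hypothetical labels for the export type
FOCUS_HINTS = {
    "settlement": "Focus on: settlement patterns, negotiation tactics, treatment "
                  "timelines, SOL management, adjuster behavior, lien resolution",
    "closing": "These are CLOSING cases -- look especially for: case closure "
               "workflows, decline reasons, final disbursement, file archival, "
               "post-closing obligations, client termination patterns",
    "intake": "Focus on onboarding, insurance verification, initial treatment referrals",
}

def build_prompt(slugs: list[str], category: str | None = None) -> str:
    """Fill the template with case slugs and append the matching focus hint, if any."""
    prompt = BASE_PROMPT.format(cases=", ".join(slugs))
    hint = FOCUS_HINTS.get(category)
    return f"{prompt}\n{hint}" if hint else prompt
```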
Index rebuild
Always rebuild index.md as a separate delegate_task after all compilation batches complete. Even for small batches. The subagent just needs to parse YAML frontmatter from all .md files in concepts/ + connections/ and generate the index per the schema in AGENTS.md. Takes ~60 seconds, max_iterations=15.
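The frontmatter-gathering half of the rebuild might look like this sketch. It assumes flat `key: value` frontmatter between `---` delimiters and deliberately avoids a YAML dependency; the actual index layout is defined in AGENTS.md and is not reproduced here. The names `read_frontmatter` and `collect_articles` are hypothetical.

```python
from __future__ import annotations
from pathlib import Path

def read_frontmatter(text: str) -> dict:
    """Parse flat key: value frontmatter; values are returned as strings."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        key, sep, value = line.partition(":")
        # Skip nested or list lines; this sketch only handles top-level scalars
        if sep and key.strip() and not line.startswith((" ", "\t", "-")):
            meta[key.strip()] = value.strip().strip("\"'")
    return {}  # no closing delimiter: treat as no frontmatter

def collect_articles(wiki: Path) -> list[dict]:
    """Gather frontmatter from every article in concepts/ and connections/."""
    articles = []
    for kind in ("concepts", "connections"):
        for path in sorted((wiki / kind).glob("*.md")):
            meta = read_frontmatter(path.read_text(encoding="utf-8"))
            articles.append({**meta, "slug": path.stem, "kind": kind})
    return articles
```

Building the index from what is on disk, rather than from what subagents report, is what makes the rebuild safe to run after parallel batches.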
Obsidian Vault
The wiki/ directory IS an Obsidian vault:
- .obsidian/ config with graph colors (blue=concepts, orange=connections)
- Home.md as landing page
- [[wikilinks]] use slug names (NOT path-prefixed)
- Graph view shows article interconnections
Wikilink Rules
- Use [[slug-name]] not [[concepts/slug-name]]
- Obsidian resolves by filename, paths break links
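If subagents emit path-prefixed links anyway, a post-processing pass can normalize them. The sketch below keeps any alias (`|label`) or heading (`#section`) suffix and drops everything before the last slash in the target; the function name and the example slugs are hypothetical.

```python
import re

# group 1: link target (up to | or #), group 2: optional alias/heading suffix
_WIKILINK = re.compile(r"\[\[([^\]|#]+)([^\]]*)\]\]")

def strip_wikilink_paths(text: str) -> str:
    """Rewrite [[concepts/slug]] style links to [[slug]]."""
    def fix(match: re.Match) -> str:
        target = match.group(1).rsplit("/", 1)[-1]
        return f"[[{target}{match.group(2)}]]"
    return _WIKILINK.sub(fix, text)
```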
Filtering Cases for Compilation
Two approaches — use the Excel file directly (preferred) or scan the vault:
Preferred: Extract slugs from the Excel file itself
# Parse Excel → get unique Project Names → slugify → batch
from collections import Counter
import openpyxl

wb = openpyxl.load_workbook(path, read_only=True)
rows = wb.active.iter_rows(min_row=2, values_only=True)  # min_row=2 skips the header row
cases = Counter(str(r[0]).strip() for r in rows if r and r[0])
slugs = [{"slug": slugify(name), "logs": count} for name, count in cases.items()]  # slugify: converter's helper
This is precise — only compiles what was just imported.
Fallback: Scan vault by subcategory tag
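import os

new_slugs = []
for slug in os.listdir(cases_dir):
    log_dir = os.path.join(cases_dir, slug, "Activity Log")
    for logfile in (os.listdir(log_dir) if os.path.isdir(log_dir) else []):
        with open(os.path.join(log_dir, logfile), encoding="utf-8") as fh:
            if "settlement_activity_export" in fh.read(200):  # tag sits in the frontmatter
                new_slugs.append(slug)
                break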
Do NOT use mtime-based filtering — it picks up every case in the vault (including old ones whose dirs were touched during conversion).
Pitfalls
- Parallel subagents cause race conditions on evidence_count — accept ±3 variance
- Don't let subagents rewrite index.md — rebuild it yourself after all batches
- Large cases (1000+ logs) must be truncated — sample strategically
- Wikilinks with path prefixes break in Obsidian; strip concepts/ etc.
- The compile.py script generates prompts but doesn't call the LLM directly; use delegate_task
- Some articles reference aspirational links (articles not yet created) — that's OK, they'll be created as more cases are compiled
- mtime-based vault scanning doesn't work for identifying "just imported" cases — conversion touches existing dirs too. Always extract the case list from the Excel file itself.
- Closing cases are mostly declines, not post-settlement closures. The decline/close workflow gets the biggest evidence boost from closing data, not the settlement disbursement workflow.
- Small batches (<10 cases) exhaust subagent iterations if you have them read all 89 articles. Point them at specific articles instead.
Multiple-File Workflow
When the user sends multiple Excel files, convert all of them first, then compile:
- Reuse /tmp/convert_excel.py — just patch the filename for each file
- After all files are converted, batch the NEW cases only (use slugify + check existence)
- Compile in 3 parallel batches, then rebuild index once at the end
Duplicate Detection
User may send the same file twice (same name, different doc ID). Compare row counts + first/last row to detect dupes before converting.
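The row-count plus first/last-row comparison can be sketched as a fingerprint computed in a single pass, so it also works on a read-only openpyxl row iterator. The function name `fingerprint` is hypothetical; two different files could share a fingerprint in principle, so treat a match as a prompt to confirm rather than proof.

```python
from typing import Iterable, Optional, Tuple

Fingerprint = Tuple[int, Optional[tuple], Optional[tuple]]

def fingerprint(rows: Iterable[tuple]) -> Fingerprint:
    """Return (row count, first row, last row) for a sheet's rows."""
    count = 0
    first = None
    last = None
    for row in rows:
        if count == 0:
            first = tuple(row)
        last = tuple(row)
        count += 1
    return (count, first, last)
```

In practice the rows would come from something like `wb.active.iter_rows(values_only=True)`, with each file's fingerprint kept in a set for the session and checked before converting the next file.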
Sizing from Imports
- File 1 (settlement_1): 17.6K entries, 198 cases, 6.2K log files
- File 2 (settlement_2): 22.1K entries, 169 cases, 7.3K log files
- File 3 (settlement_3): 688 entries, 8 cases (small — single-batch)
- File 4 (closing): 9.3K entries, 125 cases, 2.9K log files
- Files 5-7 (archived 2,3,4): 64.3K entries, 692 cases, 21K log files
Total ingested: ~114K entries, 1,170 cases, ~56K log files → 93 wiki articles
Preferred Batching
- <20 cases: single subagent, no batching
- 20-300 cases: 3 parallel subagents
- >300 cases: 3 parallel subagents with aggressive sampling (first 10 + last 10)
- Always rebuild index.md AFTER all batches complete (never let subagents touch it)
Pitfall: mtime-based filtering unreliable
Don't use file mtime to find "new" cases — convert_excel.py touches existing files too. Instead, extract case names from the Excel directly and slugify to get the target list.
Files
- FirmVault: /opt/data/FirmVault
- Wiki: /opt/data/FirmVault/wiki/
- Schema: wiki/AGENTS.md
- Converter: /tmp/convert_excel.py (patch filename between runs)
- Article counts: 65 concepts + 28 connections = 93 total (as of 2026-04-12)
- Decisions: /opt/data/FirmVault/decisions/ (ADR-000 through ADR-006)
- Audit report: /opt/data/FirmVault/wiki/reports/workflow-vs-wiki-audit.md
- v2 proposal: /opt/data/FirmVault/wiki/reports/PHASE_DAG_v2_proposal.md
Workflow Auditing
After a major compilation round, audit the wiki against the PHASE_DAG:
- Read PHASE_DAG.yaml (prescribed workflow)
- Read all wiki articles (observed reality)
- Compare: contradictions, gaps, redundancies
- Write audit report to wiki/reports/
- If changes warranted, draft PHASE_DAG v2 proposal
- Document decisions as ADRs in decisions/ (cherry-picked from stirps-ai/stirps-gov)
This audit is what turned 93 wiki articles into actionable architectural decisions. The wiki is evidence; the ADRs are commitments.