← Back
Researching

Working title

Catena

An agent-native authority graph of Delaware case law. Westlaw and Lexis were built for human eyes; this one is built for humans and agents as peer first-class users.

Authority cockpit search interface showing Celera Corporation Shareholder Litigation results Authority field view with 3D node visualization of related cases Citation lab interface showing authority cone for a Delaware case

UI explorations from the research phase — authority cockpit, authority field, citation lab

The Thesis

Westlaw and Lexis were built decades ago for lawyers running keyword search on a desk computer. Both have been retrofitted for the LLM era — Westlaw AI-Assisted Research, Lexis+ AI, CoCounsel — but the underlying data substrate is still organized around pages of prose and human-readable headnotes. The strategic claim of this project is that there's a real opening for a tool designed from the ground up to serve human and machine consumers of legal authority as peer first-class users — neither retrofitted on top of the other.

Two pieces of evidence anchor that claim.

The first is that hallucination in legal AI looks architectural, not solvable through better retrieval. Stanford's Dahl et al. found that general-purpose LLMs hallucinate on 58–88% of legal queries; Magesh et al. showed that purpose-built legal RAG tools still hallucinate 17–33% of the time, and the dominant failure is misgrounding — a real-looking citation pointing to a real case that doesn't actually stand for the cited proposition. As of April 2026, U.S. courts have sanctioned attorneys in over 1,300 cases for AI-fabricated citations. The architectural answer is to surface authentic source passages from a structured graph rather than generate analytical text. The system shouldn't be able to fabricate, by construction.

The second is that access to justice is structurally aligned with the commercial thesis. Roughly 92% of low-income civil legal problems in the United States receive no legal help. The same architectural commitments that make a tool defensible for a litigation partner — zero content generation, source-anchored atomic operations, an open-data substrate — make it safe to put in front of a pro se litigant who has no lawyer to verify outputs. Same system, two audiences.

The Vision

The product is a structured knowledge graph of Delaware case law. Every decision broken into paragraphs. Every paragraph linked to every other paragraph in the corpus that relates to it. Every link carrying two pieces of information: what kind of relationship it is, and how strong the relationship is.

Direct verbatim quotes become heavy edges. Distantly related concepts become light edges. Citations, treatments (overruling, distinguishing, criticizing), doctrinal applications, factual similarities, statutory constructions, and embedding-based semantic similarities all live in the same graph as different edge types. A human researcher or an autonomous agent can navigate by following links by weight and by type.

"See all of the law" — every paragraph of every Delaware decision, navigable by following the links.

Both faces of the product — the human-facing research interface and the autonomous-agent client — hit the same backend through the same Model Context Protocol server. No parallel stacks. The atomic operation set is small and on purpose: find all cases citing X with treatment Y, trace the evolution of doctrine D over a date range, return the authority cone for this paragraph at weight ≥ 0.3.

Why Delaware

The proof of concept is scoped tight: Delaware case law, all courts, via the Caselaw Access Project. About 39,278 opinions from 1776 to 2023, currently sitting on my machine as Parquet from Harvard's COLD Cases dataset. Not federal, not multi-state, not the whole country.

Delaware is the right corpus for a first build. It's the practice area I know best as a litigator, which means I can hand-label the gold-set evaluation pairs without outsourcing judgment. The citation density in Delaware Chancery is unusually high. And it's the corpus that matters for the AI legal-tech companies actually building toward this space. Architecture proven on Delaware extends naturally to A2J-canonical jurisdictions — housing, family, immigration, small claims — in a future phase.

Where It Stands

This is a serious research project, not yet a build. The discipline I'm holding myself to is: do the research before the spec; do the spec before the code. Every technical decision has to be traceable to a research output or explicitly noted as a deliberate exception.

17
Research Outputs
7
Architecture Decisions
39,278
Opinions On Disk

What's already settled, from research and synthesis:

Treatment Taxonomy

16/5 hierarchical schema from Demir & Canbaz (2025), derived from Hellyer (2018). Not invented; adopted with reason.

Granularity

Paragraph as the primary unit, with sentence-level offsets where finer granularity helps. Document-level was rejected.

Edge Schema

One treatment edge type plus 19 non-treatment edges. Multi-label is required; ~19% of citing relationships have more than one acceptable label.

Evaluation

Severity-weighted error metric, not standard accuracy. A 40-relationship Delaware gold set is being hand-labeled for the canonical benchmark.

Storage

Postgres primary with Neo4j mirror. Append-only temporal edges with as_of_date so the graph is queryable as of any point in time.

Interface

MCP-primary, thin REST secondary, stateless. Three-tier auth: anonymous A2J, API-key developer, OAuth commercial — all hitting the same server.

What's still pending: technical-spec sign-off, the Delaware gold-set annotation pass, and the M1 build itself.

The Slow Way

This is the project on my desk where I've deliberately slowed down twice. Most of what I build is faster — a weekend prototype, a working app within a month. Catena isn't that. The whole strategic thesis depends on getting the substrate right, and the substrate is a typed knowledge graph with a benchmark and a governance model and an open-data path. That's worth not rushing.

The research-before-spec posture is also a hedge against my own enthusiasm. There's substantial prior art here — Hellyer's taxonomy, Qura's graph architecture (now part of Legora), the LKIF and KeyCite/Shepard's heritage, the CourtListener data model. The fastest way to move past it confidently is to read it first.

The name Catena is a working title. In legal usage, a "catena of cases" is a chain of authority — a sequence of decisions that together support a proposition. It seemed like the right name for a system whose core operation is linking authority to authority by typed weight.