OpenAgreements

Why OpenAgreements Exists

Toward verifiable legal AI, and why we think it's a standards problem.

Editor
, OpenAgreements editor
License
CC BY 4.0

Software can check itself. Law can't.

In software, you usually find out when you are wrong. You run the test; it passes or it fails. That one fact, that correctness is checkable, is the quiet reason AI got so good at code so quickly. There is a tight loop between an attempt and a verdict, and a model can run that loop a million times a day.

Law has no comparable verification loop. Hand the same deal to two excellent lawyers and you get two different agreements, both correct, and they might still disagree over which is better. There is no software-style test you can run to score which one is more conformant to a shared standard. We think that gap, more than model size or context length, is much of why legal AI lags coding AI. The bottleneck is not that the answers differ, since they differ in code too; it is that the legal field has not built a shared, fast-running test that returns pass or fail.

This is the problem OpenAgreements exists to work on. This document explains how we think about it.

Correctness by convention

Law is not the only field that faces a verification challenge. Many fields that began with no verifiably right answer solved it the same way: they agreed on one solution by convention and wrote that agreement down in a form precise enough to check against.

There is no cosmically correct side of the road to drive on. The US drives right, the UK drives left, both perfectly safe; what would be catastrophic is renegotiating it at every intersection. There is no uniquely correct way to cut a tapered pipe thread, so American industry simply decided. For a given nominal size, NPT fixes the form (a 60° angle, the taper, the pitch), not because it is the only design that holds pressure, but so a plumber in Ohio and a supplier in Texas mean the same thing by a quarter-inch. In computing, the same holds for HTML and USB: a page or a device conforms to the published spec, or it does not.

Some of what looks like legal disagreement is just the absence of a convention. A great deal of transactional law could be made more uniform this way: the precise wording of common boilerplate, or which terms a given kind of agreement should contain at all.

Standard forms like Bonterms, Common Paper, and the YC SAFE go further, standardizing whole agreements at the level of the template: a shared "we drive on the right" for an entire category. A checklist works one level up, as a spec: the set of things a document must contain to count as its kind, the way a vehicle counts as roadworthy once it passes inspection. That leaves a lawyer free to keep their own form while still checking it against a shared standard, and it asks for less trust than adopting someone else's template. Wherever a category already has a stable SALI LMSS or FOLIO identifier, it can have a spec.

One of the most complex specs we ever agreed on

One of the most intricate systems we coordinate on every day is the Internet. The conformance suite that web browsers are tested against carries on the order of two million individual checks: not the spec itself, but its executable shadow, a vast set of checkable assertions about what a browser must do. And yet you type a URL on any device, any browser, any operating system, and it works. That is a staggering act of coordination, and it was built, not discovered.

We have spent a little time inside that machine. UseJunior, the maintainer of OpenAgreements, is a verified participant in the WHATWG, the body that maintains the core web standards, and contributed a small, merged change to the Document Object Model (DOM) specification. The change resolved an issue raised by Olli Pettay, a longtime Mozilla engineer who works on the DOM and Firefox's Gecko engine, and it was reviewed and approved by the standard's editors. It went through the ordinary review process like any other change, but it was enough to make the rigor of that machine concrete to us rather than theoretical. A living standard that large stays coherent because practice, tests, and written requirements keep correcting one another.

The standards bodies did not invent the web from theory. They reverse-engineered what browsers already did, found the consensus, and formalized it, turning de-facto practice into a written specification precise enough to test against, such as the DOM living standard. The common law and the standard forms already encode an enormous amount of settled practice in much the same way. Nobody, to our knowledge, has reverse-engineered it into an internet-style spec yet. The claim that the law is too messy to standardize is no longer an excuse; it is a to-do list.

MUST, SHOULD, MAY, and the two kinds of verifiable

The web solved a subtle problem with three words. Not every requirement is equally binding, so in 1997 the community published RFC 2119, written by Scott Bradner, fixing the meaning of a small set of capitalized key words, above all MUST, SHOULD, and MAY (alongside their negations and synonyms such as SHALL and OPTIONAL), and specifications since then read against that grammar. It is how millions of checks stay organized instead of collapsing into mush.

Legal requirements can be sorted into the same three registers. Some rules are best characterized as MUSTs: an employer's agreement with an employee or contractor that governs trade secrets must include the whistleblower-immunity notice the federal Defend Trade Secrets Act requires, or the employer cannot recover exemplary damages or attorney fees in a trade-secret action against a worker who never received it. Others are state-specific: in Wyoming, a non-compete cannot bind a physician, no matter what the parties sign. A careful drafter SHOULD follow market practice, though courts do not generally compel it. A provision MAY be included: optional rather than expected. Every MAY carries a trigger: when the triggering facts are present, the option becomes relevant; when they are absent, it is left out.
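The three registers, and the trigger on each, can be sketched as a small data model. This is a minimal Python sketch, not OpenAgreements' schema: it assumes a trigger can be modeled as a set of fact labels that must all be present, and every slug and fact label in it is hypothetical (only `address-tolling-during-breach` appears elsewhere in this essay).

```python
from dataclasses import dataclass, field
from enum import Enum


class Modal(Enum):
    """The three RFC 2119 registers used in this essay."""
    MUST = "MUST"
    SHOULD = "SHOULD"
    MAY = "MAY"


@dataclass(frozen=True)
class Requirement:
    slug: str  # hypothetical identifier
    modal: Modal
    # Hypothetical trigger: fact labels that must all be present for the
    # requirement to become relevant. An empty trigger means "always".
    trigger: frozenset = field(default_factory=frozenset)

    def applies(self, facts: set) -> bool:
        # Subset test: every triggering fact must appear in the deal facts.
        return self.trigger <= facts


def relevant(requirements, facts):
    """Keep only the requirements whose triggers the deal facts satisfy."""
    return [r for r in requirements if r.applies(facts)]


checklist = [
    Requirement("dtsa-immunity-notice", Modal.MUST,
                frozenset({"governs-trade-secrets"})),
    Requirement("no-physician-noncompete-wyoming", Modal.MUST,
                frozenset({"noncompete", "physician", "wyoming"})),
    Requirement("address-tolling-during-breach", Modal.SHOULD,
                frozenset({"noncompete"})),
    Requirement("hypothetical-optional-clause", Modal.MAY,
                frozenset({"hypothetical-trigger"})),
]

deal_facts = {"governs-trade-secrets", "noncompete", "texas"}
for req in relevant(checklist, deal_facts):
    print(req.modal.value, req.slug)
```

For this deal, the Wyoming rule and the MAY drop out because their triggering facts are absent; a MAY whose trigger is unmet is simply left out rather than flagged.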

Those levels do more than tidy things up. They split verifiable into two meaningfully different problems:

  • For a MUST, correctness means the clause faithfully tracks binding primary law. You verify it against the statute, regulation, or case-law holding, with an addressable citation, one a person or an AI can resolve. This is not a convention anyone gets to choose; it is the unglamorous, jurisdiction-by-jurisdiction work of sourcing and checking.
  • For a SHOULD or a MAY, there is often no single right answer, which is where a convention that arose in practice needs to be written down, the way the web standards bodies formalized how authors were already using HTML into a written standard. Our goal here is not to invent a convention but to document the one practice has already established and make it legible.

Even genuinely discretionary questions are not a dead end. When the law calls for a "totality of the circumstances" analysis, that sometimes means the factors are real but their relative weights are hard to discern, or are still being worked out case by case. We can still name the factors, cite the sources, and preserve the uncertainty, so a lawyer sees what the AI saw. A regulator or a counterparty may still disagree, but the disagreement now has a surface: a factor, a source, a check. Where an area is settled, we structure it; where it is unsettled, we mark it with a caution and move on, rather than pretend to resolve it.

Keep the registers separate and you can say, clause by clause, what conformant even means.

Standing on the standards that already work

A standard that ignores the standards already in use is not a standard; it is a divergence, or as engineers say, a fork. That risk applies to us as much as to anyone: if we reinvented what already exists, we would only add to the noise we set out to reduce. Before you can check whether a provision conforms to a requirement, you first have to agree on what kind of provision it is, and there is already open infrastructure for that. SALI's LMSS and the FOLIO ontology provide open legal taxonomies with stable identifiers for legal concepts, including matter and document classifications. We build on them rather than reinventing them, the way many later internet standards reference RFC 2119 (BCP 14) rather than minting new modal verbs.

None of this is the first attempt to make law machine-readable. Legal informatics has decades of serious work behind it: Akoma Ntoso, LegalXML, LegalRuleML, SALI/FOLIO, and templating systems like the Accord Project. The gap we care about is narrower: contract drafting still lacks an open, practical conformance layer usable by AI, a way to check a contract clause against both the primary law it must track and the conventions the field has settled on. By a conformance layer we mean two things: a set of cited requirements, and a repeatable way to check a clause against them, marking each requirement as satisfied, missing, in conflict, or needing human judgment. A practical way to build a useful standard is to build on the ones that already work. We are assembling, not inventing.
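The second half of that definition, the repeatable check, can be sketched in Python. This is a minimal sketch under our own assumptions: each requirement carries a checker that returns True when a clause meets it, False when the clause contradicts it, and None when the clause is silent, and unsettled areas go straight to a human. The keyword checker and the `sec-reporting-carve-out` slug are hypothetical stand-ins; a real checker would need far more than string matching.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Status(Enum):
    SATISFIED = "satisfied"
    MISSING = "missing"
    CONFLICT = "in conflict"
    NEEDS_JUDGMENT = "needs human judgment"


@dataclass(frozen=True)
class CitedRequirement:
    slug: str
    citation: str
    # Returns True (met), False (contradicted), or None (clause is silent).
    check: Callable[[str], Optional[bool]]
    settled: bool = True  # unsettled law is never scored automatically


def evaluate(req: CitedRequirement, clause: str) -> Status:
    """Map one requirement and one clause to one of the four statuses."""
    if not req.settled:
        return Status.NEEDS_JUDGMENT
    result = req.check(clause)
    if result is None:
        return Status.MISSING
    return Status.SATISFIED if result else Status.CONFLICT


def sec_carve_out(clause: str) -> Optional[bool]:
    """Toy keyword checker (hypothetical); real clauses vary far more."""
    text = clause.lower()
    if "shall not communicate with any government agency" in text:
        return False
    if ("nothing in this agreement prohibits" in text
            and "securities and exchange commission" in text):
        return True
    return None


req = CitedRequirement("sec-reporting-carve-out",
                       "17 C.F.R. § 240.21F-17(a)", sec_carve_out)

clauses = {
    "carve-out": "Nothing in this Agreement prohibits Employee from "
                 "reporting to the Securities and Exchange Commission.",
    "gag": "Employee shall not communicate with any government agency.",
    "silent": "Employee shall keep Confidential Information secret.",
}
for name, text in clauses.items():
    print(f"{name}: {evaluate(req, text).value}")
```

Keeping "missing" separate from "in conflict" matters: a silent clause needs an addition, while a contradicting one needs a rewrite.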

What we found when we tried it

We picked two fairly messy areas of law, privacy notices and restrictive covenants, and tried to turn the practice into a primary-law-backed, RFC-2119-style spec. At one point we tried to skip the hard part: instead of authoring the structure by hand, we asked whether the data could hand it to us.

We embedded clauses drawn from a corpus of more than 100,000 public agreements on EDGAR and ran a battery of unsupervised methods, the kind meant to surface latent structure in a corpus, to see whether the distinctions lawyers actually negotiate would emerge on their own. For the broad subject of a clause, they partly did. But the fine drafting knobs, such as whether a covenant's clock keeps running during a breach or how its geographic reach is drawn, did not reliably surface; with the tools and the sample we had, the signals stayed weak and diffuse (the tolling signal, for instance, came in around a correlation of 0.14, barely above noise). We do not take that as proof the structure is absent from the data, only that we could not recover it this way with today's frontier tools. When we instead supervised the process, naming each concept and pointing to examples, the distinctions resolved, and we used that, together with editorial judgment, to author the requirements by hand.
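For a sense of scale, suppose (the essay does not name the statistic) that the 0.14 figure is a Pearson correlation r. Its square is the share of variance that a linear fit on that one signal would explain:

```latex
r = 0.14 \quad\Longrightarrow\quad r^{2} = (0.14)^{2} = 0.0196 \approx 2\%
```

Under that assumption, about 98% of the variation in the target is left unexplained by that signal alone.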

We do not claim this is impossible to automate. AI keeps getting more sample-efficient, and one day a method may recover these distinctions directly. We could not, at the scale we had, so we did the labeling work by hand and iteratively checked, with the assistance of AI, how well our categories explained the observed variance in the corpus. That is the unglamorous part, and it is the kind of work that does not get done when you are optimizing for a demo.

The expensive step is distilling primary law into a labeled, cited requirement. Once that distillation is public and free, checking a given contract against it is cheap and fast. OpenAgreements' focus is that slow pass from law to requirement, so that downstream legal AI tools can do the many quick passes from requirement to document. That asymmetry is much of why the distillation should be shared rather than rebuilt in private, and why we welcome contributions from other experienced lawyers.

The checklist

What comes out of that slow work is mundane: a checklist. Each requirement carries its level (MUST, SHOULD, or MAY), the condition under which it applies, and a citation back to the authority it came from. That is the core architecture, and its plainness is the point. Primary law, the statutes, regulations, and cases themselves, stays outside our system; we are not aiming to be the system of record for primary law, and we point back to the public sources that already serve that role, such as the Free Law Project's CourtListener. The checklist is the key distillation we add, and every claim on it points back to a source a person or a machine can resolve. The AI applies the checklist; the lawyer owns the call. This mirrors a chain of custody a careful lawyer already keeps: from primary law to the requirement, from the requirement to the document, from the document to the decision. The link from each requirement to its source is inspectable, and a user who wants it can keep the link from a document back to the requirement as well.
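The chain of custody can be made concrete as linked records. This is a minimal Python sketch, assuming one finding per document and requirement; the field names, the `dtsa-immunity-notice` slug, the file name, and the placeholder URL are hypothetical. An AI would fill in the document and requirement links; the `decision` field is where the lawyer's call is recorded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    citation: str  # the primary-law authority
    url: str       # where a person or machine can resolve it


@dataclass(frozen=True)
class Requirement:
    slug: str
    level: str      # "MUST", "SHOULD", or "MAY"
    condition: str  # when the requirement applies
    source: Source


@dataclass(frozen=True)
class Finding:
    document: str
    requirement: Requirement
    decision: str  # recorded by the lawyer, not generated by the AI


def custody_chain(finding: Finding) -> list:
    """Walk a decision back to primary law, one link per line."""
    req = finding.requirement
    return [
        f"decision: {finding.decision}",
        f"document: {finding.document}",
        f"requirement: {req.slug} ({req.level}; applies if {req.condition})",
        f"source: {req.source.citation} <{req.source.url}>",
    ]


dtsa = Requirement(
    slug="dtsa-immunity-notice",
    level="MUST",
    condition="the agreement governs trade secrets",
    # Placeholder URL; a real record would point at a public primary source.
    source=Source("18 U.S.C. § 1833(b)", "https://example.org/placeholder"),
)
finding = Finding("employee-nda-draft.docx", dtsa, "add the notice before signing")
for link in custody_chain(finding):
    print(link)
```

Primary law stays outside the records: the `Source` only cites and points, which keeps the checklist from becoming a system of record for the law itself.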

The checklist also gives any connected AI something it usually lacks: an open "definition of done" for a given clause or contract type. Legal AI does not only fail by hallucinating; it can also draft fluently while leaving out something the law requires. A confidentiality agreement can read perfectly and still omit a required whistleblower protection: in SEC-regulated contexts, Rule 21F-17(a) makes it unlawful to impede someone from communicating directly with SEC staff about a possible securities-law violation, and the SEC has been enforcing it actively, with seven public companies paying more than $3 million combined in September 2024. With an explicit checklist, an AI draft can be compared point by point against the requirements, so the gap between it and a high-quality human draft becomes a specific list of missing or weaker requirements rather than a vague sense that something is off. An agent pointed at the checklist can say which MUSTs apply, which the draft satisfies, where it deviates, and where a human still has to decide.
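The point-by-point comparison between an AI draft and a human draft can be sketched in Python. This minimal sketch assumes each draft has already been checked against the checklist and reduced to the set of requirement slugs it satisfies; it captures only missing requirements, not weaker ones, and the slugs and levels are hypothetical.

```python
LEVEL_ORDER = {"MUST": 0, "SHOULD": 1, "MAY": 2}


def gap_list(checklist: dict, ai_met: set, human_met: set) -> list:
    """Requirements the human draft satisfies but the AI draft does not.

    checklist maps slug -> level; results are sorted MUSTs first, then by slug.
    """
    gaps = [slug for slug in checklist
            if slug in human_met and slug not in ai_met]
    return sorted(((checklist[s], s) for s in gaps),
                  key=lambda pair: (LEVEL_ORDER[pair[0]], pair[1]))


checklist = {
    "dtsa-immunity-notice": "MUST",
    "sec-reporting-carve-out": "MUST",
    "address-tolling-during-breach": "SHOULD",
}
ai_met = {"dtsa-immunity-notice"}
human_met = {"dtsa-immunity-notice", "sec-reporting-carve-out",
             "address-tolling-during-breach"}

for level, slug in gap_list(checklist, ai_met, human_met):
    print(level, slug)
```

Sorting MUSTs first puts the gaps with legal consequences at the top of the list a reviewing lawyer reads.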

One requirement, many spokes

A jurisdiction's law shows up in four places at once: the practice note that explains it, the checklist that tests it, the template that drafts around it, and the example phrasings that show how others wrote it. The tempting way to keep them consistent is to wire each to the others — note to template, checklist to template, note to checklist. That is a mesh, and a mesh rots: every pair is a private contract no one remembers, and renaming a clause silently drops a note three files away.

So we don't. Each legal requirement is one thin record — a stable slug, an RFC-2119 statement, a taxonomy category, and nothing else. Everything else is a spoke that names the requirement by its slug. The practice-note paragraph that proves the law, the drafting note that tells you how to comply, the template clause, the preferred wording, the phrasings mined from public filings — each one points in at the requirement. The requirement points at nothing. Add a whole new corpus tomorrow and the hub doesn't change; the new records just carry the slug.
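The hub-and-spoke rule fits in a few lines. This is a minimal Python sketch under our own assumptions: the record fields follow the description above, while the statement text, the `kind` values, and the misspelled slug are hypothetical. Spokes whose slug names no requirement are reported as dangling rather than silently dropped, which is exactly the failure a mesh hides.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    """The hub: a thin record that points at nothing."""
    slug: str
    statement: str  # one RFC 2119 sentence
    category: str   # taxonomy category (a hypothetical label here)


@dataclass(frozen=True)
class Spoke:
    """Anything that names a requirement: note, clause, phrasing."""
    kind: str
    slug: str  # the only link, pointing in at the hub
    body: str


def build_index(hubs, spokes):
    """Group spokes under their hub; collect spokes whose slug names no hub."""
    known = {h.slug for h in hubs}
    index = defaultdict(list)
    dangling = []
    for spoke in spokes:
        if spoke.slug in known:
            index[spoke.slug].append(spoke)
        else:
            dangling.append(spoke)
    return dict(index), dangling


hubs = [Requirement(
    "narrow-tailoring-to-legitimate-interest",
    "A covenant's duration SHOULD be tied to a specific legitimate interest.",
    "restrictive-covenants",
)]
spokes = [
    Spoke("practice-note", "narrow-tailoring-to-legitimate-interest",
          "Wyoming paragraph and its citation"),
    Spoke("template-clause", "narrow-tailoring-to-legitimate-interest",
          "Restricted Period clause"),
    Spoke("mined-phrasing", "narrow-tailoring-to-legit-interest",  # typo'd slug
          "phrasing from a public filing"),
]

index, dangling = build_index(hubs, spokes)
print({slug: [s.kind for s in group] for slug, group in index.items()})
print([s.kind for s in dangling])
```

Adding a new corpus means appending spokes; the `hubs` list never changes.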

The substance lives where its evidence lives. A drafting note that says "tie the duration to a specific interest" belongs next to the Wyoming paragraph and the case it rests on, not copied into the spec as a free-floating assertion, where the next person to update the law will never see it. The practice note is the source of truth for what the law is; the requirement is only a distilled, machine-checkable handle on it.

And a requirement has to earn its place. We don't let one exist without backing: a MUST has to point to a cited paragraph that proves it is the law; a SHOULD needs that, or at least two real example phrasings; a MAY needs at least one — because you cannot call something an option if no one, anywhere, actually drafts it that way. The discipline runs the other direction too: you cannot assert the law without a citation, so requiring every requirement to rest on a paragraph buys citation-checking for free.
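The backing rule reduces to a small predicate. This is a minimal Python sketch; the field names are hypothetical, and we read "a MAY needs at least one" as one cited paragraph or one example phrasing, by analogy with the SHOULD rule (a stricter reading would insist on the phrasing).

```python
from dataclasses import dataclass, field


@dataclass
class Requirement:
    slug: str
    modal: str  # "MUST", "SHOULD", or "MAY"
    cited_paragraphs: list = field(default_factory=list)
    example_phrasings: list = field(default_factory=list)


def has_backing(req: Requirement) -> bool:
    """Apply the per-register evidence rule described above."""
    cited = bool(req.cited_paragraphs)
    phrasings = len(req.example_phrasings)
    if req.modal == "MUST":
        return cited                    # the law itself, with a citation
    if req.modal == "SHOULD":
        return cited or phrasings >= 2  # the law, or a visible convention
    if req.modal == "MAY":
        return cited or phrasings >= 1  # someone actually drafts it this way
    raise ValueError(f"unknown modal: {req.modal!r}")


# A MUST backed only by phrasings fails: convention cannot prove the law.
print(has_backing(Requirement("x-must", "MUST", example_phrasings=["a", "b"])))
print(has_backing(Requirement("x-should", "SHOULD", example_phrasings=["a", "b"])))
```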

The slug does one more thing. It is long enough to mean something (narrow-tailoring-to-legitimate-interest, not req-0042), which makes it a checksum for the age of AI. The check is semantic, not textual: you can reword the paragraph all you like and the slug never moves, but the day the law flips from "courts will narrow an overbroad covenant" to "courts will void it," the slug stops describing the paragraph it sits beside. That mismatch is the alarm. A meaning change is a supersede (retire the old slug, add its opposite), never a quiet rename. And because every read of a paragraph-with-its-slug silently asks "does this still fit?", the corpus is checked thousands of times a day by whoever is reading it, instead of once by a job that someone has to remember to run. Software can check itself; this is one of the few places law can too.
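Supersede-not-rename can be enforced mechanically. This is a minimal Python sketch of a registry under our own assumptions (the class, its methods, and both slugs are hypothetical): slugs are never renamed or deleted, a meaning change retires the old slug and adds a new one, and an old reference can still be followed to its successor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    slug: str
    statement: str
    superseded_by: Optional[str] = None  # set once, never cleared


class Registry:
    def __init__(self):
        self._entries = {}

    def add(self, slug, statement):
        if slug in self._entries:
            raise ValueError(f"slug already exists: {slug}")
        self._entries[slug] = Entry(slug, statement)

    def supersede(self, old, new, statement):
        """A meaning change: retire the old slug and add a new one; never rename."""
        if self._entries[old].superseded_by is not None:
            raise ValueError(f"{old} is already retired")
        self.add(new, statement)  # fails before touching `old` if `new` exists
        self._entries[old].superseded_by = new

    def active(self):
        return sorted(s for s, e in self._entries.items() if e.superseded_by is None)

    def resolve(self, slug):
        """Follow supersede links so an old reference still finds its successor."""
        while self._entries[slug].superseded_by is not None:
            slug = self._entries[slug].superseded_by
        return slug


reg = Registry()
reg.add("courts-narrow-overbroad-covenant",
        "Courts will narrow an overbroad covenant.")
reg.supersede("courts-narrow-overbroad-covenant",
              "courts-void-overbroad-covenant",
              "Courts will void an overbroad covenant.")
print(reg.active())
print(reg.resolve("courts-narrow-overbroad-covenant"))
```

Because the retired slug stays in the registry, any spoke still pointing at it is detectably stale rather than silently broken.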

Why we give it away

We concede there is no near-term business in building this. It is slow, unglamorous, shared infrastructure: jurisdiction-by-jurisdiction sourcing, one labeled requirement at a time, maintained in public as the law changes. Built once, in the open, it makes things a little better for the whole community. We are doing it because we think it should exist, and because we would like a future in which sound legal infrastructure is more widely accessible than it is today. We are grateful to the lawyers who contribute their time to this project, even just by filing an issue against the open-agreements/open-agreements repo.

This is not legal advice, and not a consumer compliance oracle. The artifact is a practitioner reference, something a lawyer or legal team points an AI system at, inspects, and verifies against the cited primary law. The judgment stays with the lawyer.

We have started narrowly and on purpose, with privacy notices and restrictive covenants, built on SALI/FOLIO and RFC 2119, with requirements backed by addressable citations, free and open. If you find this kind of work interesting, turning a body of law into something a little more checkable, we would genuinely like to hear from you at hello@openagreements.org. We expect that most lawyers who come across this will simply use it as a reference, and that is fine.

What it looks like

The goal is plain: a contract should be checkable against a citable standard. For each requirement we publish the rule, the primary-law citation it tracks, and real examples, in a form a person and an AI can read together.

```json
{
  "requirement_id": "address-tolling-during-breach",
  "modal": "SHOULD",
  "criterion": "If the agreement contains a noncompete or non-solicit, it SHOULD state whether the Restricted Period is tolled while the worker is in breach.",
  "sources": [{ "type": "common_law", "text": "Tolling or extension of restrictive-covenant periods", "url": "..." }],
  "checkers": "./checkers/address-tolling-during-breach.json",
  "examples": "./examples/address-tolling-during-breach.json"
}
```

The criterion is the rule a person reads and the rule the machine checks. Each requirement links to its checkers and to worked examples anyone can inspect or run. We have started publishing this shape.
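Because the record is plain JSON, a consumer can sanity-check its shape before running anything. This is a minimal Python sketch that validates only the fields shown above and does not run the linked checkers; the particular rules (required keys, a known modal, the criterion restating its own modal keyword, at least one source) are our assumptions, not a published schema.

```python
import json

REQUIRED_KEYS = {"requirement_id", "modal", "criterion",
                 "sources", "checkers", "examples"}
MODALS = {"MUST", "SHOULD", "MAY"}


def validate(record: dict) -> list:
    """Return a list of problems; an empty list means the shape looks right."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - set(record))]
    modal = record.get("modal")
    if modal not in MODALS:
        problems.append(f"unknown modal: {modal!r}")
    elif modal not in record.get("criterion", ""):
        # The human-readable rule should state the register it is filed under.
        problems.append("criterion does not state its own modal keyword")
    if not record.get("sources"):
        problems.append("no sources")
    return problems


record = json.loads("""
{
  "requirement_id": "address-tolling-during-breach",
  "modal": "SHOULD",
  "criterion": "If the agreement contains a noncompete or non-solicit, it SHOULD state whether the Restricted Period is tolled while the worker is in breach.",
  "sources": [{"type": "common_law", "text": "Tolling or extension of restrictive-covenant periods", "url": "..."}],
  "checkers": "./checkers/address-tolling-during-breach.json",
  "examples": "./examples/address-tolling-during-breach.json"
}
""")
print(validate(record))
```

Checking that the criterion restates its own modal keeps the sentence a person reads and the level a machine filters on from drifting apart.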

We think the most useful thing we can publish is not just a better answer, but a standard that helps lawyers check their AI's answers.

Steven Obiajulu, OpenAgreements

Sources

Scholarship

1 Convention: A Philosophical Study

A community facing no uniquely correct answer can coordinate on one convention and rely on it.

Different coordination equilibria do not have to be equally good—only good enough so that everyone is ready to do his part if the others do.

See David K. Lewis, Convention: A Philosophical Study 70 (Harvard University Press 1969).

Scholarship

2 The Strategy of Conflict

Focal points let people coordinate their expectations without communicating.

Most situations—perhaps every situation for people who are practiced at this kind of game—provide some clue for coordinating behavior, some focal point for each person’s expectation of what the other expects him to expect to be expected to do.

See Thomas C. Schelling, The Strategy of Conflict (Harvard University Press 1960).

Technical standard

3 NPT Taper Pipe Threads (ANSI/ASME B1.20.1)

NPT fixes a 60-degree thread angle and a 3/4-inch-per-foot (1-in-16) taper, per ANSI/ASME B1.20.1.

Angle between sides of thread is 60 degrees.

See ANSI/ASME B1.20.1, Pipe Threads, General Purpose (Inch); NPT specifications via Engineers Edge.

Scholarship

4 Standardization and Innovation in Corporate Contracting

Standardized contract terms gain value through learning and network effects as more parties use them.

Network products become more valuable as their use becomes more widespread.

See Marcel Kahan & Michael Klausner, Standardization and Innovation in Corporate Contracting, 83 Va. L. Rev. 713, 725 (1997).

Data provider

5 WPT: An Overview and History

The web-platform-tests conformance suite carries on the order of 1.8 million subtests across more than 56,000 test files.

WPT has over 56,552 tests and counting (with 1.8 million subtests) at the time of this writing.

See Bocoup, WPT: An Overview and History.

Primary law

6 18 U.S.C. § 1833(b) (Defend Trade Secrets Act)

An employer must include the DTSA whistleblower-immunity notice in agreements governing trade secrets, or it forfeits exemplary damages and attorney fees against an employee not given notice.

An employer shall provide notice of the immunity set forth in this subsection in any contract or agreement with an employee that governs the use of a trade secret or other confidential information.

See 18 U.S.C. § 1833(b).

Primary law

7 Wyo. Stat. § 1-23-108(b)

In Wyoming, a non-compete that restricts a physician's right to practice medicine is void.

Any covenant not to compete provision of an employment, partnership or corporate agreement between physicians that restricts the right of a physician to practice medicine as defined in W.S. 33-26-102(a)(xi), upon termination of the physician's employment, partnership or corporate affiliation, is void

See Wyo. Stat. § 1-23-108(b) (2025) (enacted by 2025 Wyo. Sess. Laws, SF0107, Enrolled Act No. 87).

Scholarship

8 Computable Contracts

Computable contract terms represent contractual obligations as computer-processable instructions.

The basic idea behind a computable contract term is to create a series of actionable, computer-processable instructions that approximate what it is that the parties are intending to do in their contractual arrangement.

See Harry Surden, Computable Contracts, 46 U.C. Davis L. Rev. 629, 658 (2012).

Vendor documentation

9 Accord Project Cicero

Cicero makes natural-language contract and clause templates machine-readable and executable.

Cicero allows you to define natural language contract and clause templates that can be executed by a computer.

See Accord Project, Cicero (project README).

Regulation

10 17 CFR § 240.21F-17(a) (SEC Rule 21F-17)

No person may impede an individual from communicating with SEC staff about a possible securities-law violation, including by enforcing a confidentiality agreement.

No person may take any action to impede an individual from communicating directly with the Commission staff about a possible securities law violation

See 17 C.F.R. § 240.21F-17(a).