AI token budgeting
Topic Practice Note

Token economics for legal teams

A reader who sees that legal AI spend is driven more by workflow design, pricing asymmetry, and governance overhead than by seat count may choose our managed service to model usage, choose the right vendor structure, and run the program without building that capability in-house.

Editor: OpenAgreements editor
License: CC BY 4.0

How should legal teams budget for AI token usage?

It depends on workflow design more than lawyer headcount. Budget around document-heavy review, long-context use, repeated agent steps, and governance overhead, then treat token counts as one cost input rather than the whole product.

The safest reading of the source set is that 40 million tokens per month is plausible, but not average. It looks more like an active-team or power-user benchmark than a median for legal departments generally. What drives the bill is not team size by itself. It is workflow architecture: long-context review, document-heavy retrieval, repeated agentic steps, and the ratio between cheap input and expensive output. That is why API usage can remain inexpensive for light users and jump quickly for teams running contracts, diligence, investigations, or large-record analysis through production systems. Seat pricing, in turn, is usually buying predictability, workflow packaging, proprietary content, and governance, not just tokens.

There is no token statute in the source set. No court or regulator surfaced here telling legal teams how many tokens they may buy, or what a normal month looks like. The law attaches one step later, when token usage represents confidential legal work moving through third-party systems. At that point the familiar duties reappear: competence, confidentiality, supervision, client communication, and billing discipline.

That matters economically. More tokens usually mean more prompts, more model outputs, more logs, and more opportunities for privileged or client-sensitive text to leave the lawyer's desktop and enter a vendor environment. The source set's ethics materials do not convert that into a numeric ceiling. They do make clear that scaling usage does not weaken the underlying duties. It widens the surface area on which those duties have to hold.

Sources for this answer

Vendor documentation

A.1 OpenAI API Pricing

OpenAI imposes additional costs for regional data processing and specific storage services, including a 10% uplift for data residency endpoints and daily fees for file storage.

Regional processing (data residency) endpoints are charged a 10% uplift for

See OpenAI API Pricing.

Vendor documentation

A.2 Claude API Docs, Pricing

Anthropic's pricing structure for the Claude API incorporates various cost-optimization features, including prompt caching, batch processing discounts, and specific multipliers for data residency and managed agent sessions.

Prompt caching reduces costs and latency by reusing previously processed portions of your prompt across API calls.

See Claude API Docs, Pricing.

Commentary

A.4 Metronome, Harvey Pricing Index

Harvey utilizes a high-touch, opaque enterprise pricing model that positions its AI platform as a labor cost substitute for large law firms and legal departments.

Harvey operates a fully opaque enterprise pricing model with per-seat billing confirmed through official legal documentation.

See Metronome, Harvey Pricing Index.

Commentary

A.7 New York City Bar Association, Current Ethics Opinions and Reports Related to Generative Artificial Intelligence (May 2025)

The New York City Bar Association's report on generative AI ethics concludes that existing Rules of Professional Conduct are generally sufficient to govern the use of AI in legal practice, provided lawyers adhere to core ethical duties such as competence, confidentiality, and supervision.

Lawyers’ use of generative artificial intelligence in connection with the practice of law triggers numerous ethical duties that may arise depending on the nature of the use.

See New York City Bar Association, Current Ethics Opinions and Reports Related to Generative Artificial Intelligence (May 2025).

Is 40 million tokens per month normal for a legal team?

Probably not as an average, but it is a realistic planning case for an active legal team. Treat it as a benchmark for heavy workflows, not as a default for every department.

The law-firm commentary is more useful on workflow than on averages. Freshfields describes litigation and investigations work as a large-record synthesis problem. In its framing, generative AI can generate a detailed chronology from a lengthy complaint or thousands of pages of client documents. That is almost a definition of high-input token burn. It is not hard to see how a single investigation week could consume more tokens than a month of casual chat use.

Buchalter and Perkins Coie describe the transactional side differently, but the economic implication is similar. Buchalter says a single AI-enabled product or business practice can implicate intellectual property ownership, data rights, privacy compliance and several other legal domains at once. Perkins Coie's point is that generative AI changes synthesis inside negotiations and deal work, not just the drafting margin. The result is steadier, repeated consumption: not one giant burst, but many medium-sized document reviews, summaries, clause comparisons, and iterative redlines.

On pricing, the law-firm material is thinner than the vendor and provider material, but the market commentary is consistent. Thomson Reuters' 2026 legal-market work says nearly 90% of legal spend still moves through hourly billing, and client-side commentary says buyers are resistant to explicit AI surcharges. That means token economics are not yet flowing neatly through the legal supply chain. In-house teams may see their own AI costs clearly while outside-counsel savings appear later, or in different form.

The first consequence is that budgeting by lawyer is too crude. The right unit is workflow. A lawyer using AI as faster autocomplete will barely register. A lawyer using long-context tools to summarize a complaint, compare a 50-page contract to a playbook, extract issues from a diligence room, and then draft a memo from the result is running a small compute pipeline whether anyone calls it that or not.
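To make the workflow unit concrete, here is a minimal sketch of a per-workflow token budget. Every workflow name, token count, and run frequency below is a hypothetical placeholder, not a figure from the source set; the point is only that the total is driven by a few document-heavy workflows rather than by headcount.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    name: str
    input_tokens_per_run: int   # documents, playbooks, and prompts sent to the model
    output_tokens_per_run: int  # summaries, memos, and redlines returned
    runs_per_month: int

    def monthly_tokens(self) -> int:
        per_run = self.input_tokens_per_run + self.output_tokens_per_run
        return per_run * self.runs_per_month


# Hypothetical figures for illustration only; none come from the source set.
workflows = [
    Workflow("autocomplete-style drafting help", 500, 200, 400),
    Workflow("complaint summary", 60_000, 3_000, 20),
    Workflow("contract vs. playbook comparison", 45_000, 4_000, 150),
    Workflow("diligence issue extraction", 120_000, 6_000, 200),
]

total = sum(w.monthly_tokens() for w in workflows)

# Print workflows from largest to smallest share of the monthly total.
for w in sorted(workflows, key=Workflow.monthly_tokens, reverse=True):
    share = w.monthly_tokens() / total
    print(f"{w.name:35s} {w.monthly_tokens():>12,d} tokens  {share:6.1%}")
print(f"{'total':35s} {total:>12,d} tokens")
```

On these made-up inputs the autocomplete-style workflow accounts for under 1% of the monthly total, even though it has the most runs. A real budget would replace the placeholders with measured usage per workflow.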

The second consequence is that adoption curves translate into spend curves faster than many legal budgets assume. LawSites reported in March 2026 that legal-professional AI adoption had risen to 69%, up from 31% a year earlier. A separate 2026 study reported 81% adoption in-house against 55% in private practice. Once usage becomes normal, the question stops being whether the team uses AI and becomes what kind of AI use it has normalized. That is where 40 million tokens per month becomes a useful anchor. Not because every small team will land there, but because an active team could get there without doing anything exotic.

The source set does not justify treating 40 million as a median. It does justify treating it as a real planning case. Harvey's own materials describe a minority of power users producing outsized value and time savings. Freshfields, Buchalter, and Mayer Brown describe the kinds of work that create those users: investigations, document-heavy synthesis, iterative transaction support, and large matter files. So 40 million is probably best understood as an active-team benchmark that can be too high for light users and too low for burst months in litigation, diligence, or investigation-heavy periods.

The source set supports 40 million tokens per month as a plausible planning case for a small but active team. It does not support treating that figure as a cross-market average. With only one canonical report on this topic, and with some of that report's distributional claims resting on thin support, we think the honest framing is narrower: 40 million is real enough to budget around, but too high to call normal without more evidence.

Sources for this answer

Law-firm commentary

B.1 Freshfields commentary

Supports the cited proposition. (Freshfields commentary)

generate a detailed chronology from a lengthy complaint or thousands of pages of client documents

See Freshfields, Three Key Ways that Generative AI is Changing Litigation and Investigations Work Today.

Commentary

B.2 Buchalter commentary

Supports the cited proposition. (Buchalter commentary)

a single AI-enabled product or business practice can implicate intellectual property ownership, data rights, privacy compliance

See Buchalter, Artificial Intelligence Issues for Transactional Law Practice.

Law-firm commentary

B.3 Perkins Coie commentary

While generative AI can enhance efficiency in legal practice, practitioners must exercise caution, verify outputs for accuracy, and consider ethical implications regarding client confidentiality and the limitations of the technology.

one clearly needs to be an experienced practitioner to identify when the model is going off course or is providing a "close, but not quite correct" or fictitious response.

See Perkins Coie, The Surprising Impact of Generative AI on Transactional Lawyer Practices.

Commentary

B.4 Thomson Reuters, 2026 Report on the State of the US Legal Market

The 2026 US legal market is experiencing a period of high demand and profitability driven by instability, which historically precedes significant industry downturns.

The legal industry has a peculiar historical habit of surging just before it stumbles.

See Thomson Reuters, 2026 Report on the State of the US Legal Market.

Commentary

B.5 Thomson Reuters, Couples Counseling at Legalweek 2026: Firms and Clients Confront the AI Value Divide

Law firms face a growing conflict with clients regarding the quantification of AI-driven cost savings and the resulting impact on traditional billing models.

Client expectations around AI have shifted from curiosity to accountability — Law firms are now being asked not just whether they use GenAI, but to prove how it delivers measurable cost savings on specific matters

See Thomson Reuters, Couples Counseling at Legalweek 2026: Firms and Clients Confront the AI Value Divide.

Commentary

B.8 PR Newswire, New Study Finds AI is Addressing The Human Issues in Legal

Artificial intelligence is increasingly utilized in the legal profession to enhance operational efficiency, mitigate burnout, and facilitate strategic career development.

96% of those using AI said it has helped achieve business objectives more efficiently, with a majority of respondents saying it has improved the speed (72%) and quality (60%) of their work.

See PR Newswire, New Study Finds AI is Addressing The Human Issues in Legal.

Law-firm commentary

B.9 Mayer Brown commentary

Recent judicial decisions demonstrate that the application of attorney-client privilege and work product doctrine to AI-generated content remains unsettled, necessitating proactive contractual protections for deal practitioners.

The development of jurisprudence concerning AI’s implications for the attorney-client privilege and attorney work product doctrines lags behind the dramatic rise of AI’s use in connection with deal-making.

See Mayer Brown, M&A Discovery in the AI Era: Generative AI Communications and Outputs May Become Litigation Ammunition.

Why can API tokens cost less than enterprise legal AI seats?

Usually because enterprise seats bundle workflow, security, support, content, permissions, and predictable procurement. Raw API token math can look cheap while the operational build still carries real legal and governance cost.

The third consequence is that raw compute is often the wrong comparison point. Provider pricing in the source set shows the shape clearly. OpenAI prices cached input far below fresh input, and both far below output. Anthropic also charges materially more for output than input. On the report's simplified blend, 40 million tokens works out to roughly $160 of monthly raw compute. The exact figure will move with model choice, output ratio, and caching, but even a materially higher real-world mix leaves a wide gap between inference cost and a handful of legal-AI enterprise seats. Enterprise seat pricing is usually not a token break-even story. It is a bundle story: verified workflow, legal content, permissions, auditability, customer support, and data-handling commitments.
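The arithmetic behind that figure can be sketched as follows. The $160 and 40 million numbers are the report's; the per-category prices and token mixes are hypothetical values chosen only to show how the input/output asymmetry moves the total, and they are not quoted from any provider's price list.

```python
TOKENS_PER_MONTH = 40_000_000
REPORT_MONTHLY_COST_USD = 160.0  # the report's simplified figure

# Blended rate implied by the report, in dollars per million tokens.
implied_rate = REPORT_MONTHLY_COST_USD / (TOKENS_PER_MONTH / 1_000_000)


def monthly_cost(total_tokens, mix, price_per_million):
    """Dollar cost of a month of usage.

    mix: share of tokens in each category (shares must sum to 1).
    price_per_million: dollars per million tokens for each category.
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("token mix must sum to 1")
    return sum(
        total_tokens * share / 1_000_000 * price_per_million[kind]
        for kind, share in mix.items()
    )


# Hypothetical prices and mixes (assumptions, not vendor quotes).
prices = {"cached_input": 0.25, "fresh_input": 2.50, "output": 10.00}
input_heavy = {"cached_input": 0.50, "fresh_input": 0.40, "output": 0.10}
output_heavy = {"cached_input": 0.10, "fresh_input": 0.50, "output": 0.40}

print(f"implied blended rate: ${implied_rate:.2f} per million tokens")
print(f"input-heavy mix:  ${monthly_cost(TOKENS_PER_MONTH, input_heavy, prices):,.2f}")
print(f"output-heavy mix: ${monthly_cost(TOKENS_PER_MONTH, output_heavy, prices):,.2f}")
```

The report's figure implies a blended rate of $4 per million tokens. Under the assumed prices, shifting the same 40 million tokens from an input-heavy to an output-heavy mix more than doubles the bill, which is why the output ratio matters as much as the headline volume.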

That is why the build-versus-buy argument gets distorted when it is framed as API cost versus seat cost. The cheaper side in raw tokens can still be the more expensive side in governance if the in-house team has to assemble retrieval, prompt controls, review checkpoints, logging, vendor management, and support on its own. The opposite is also true. Buying seats for a lightly engaged team can convert a cheap variable cost into a fixed annual spend that never gets used. In both cases the economic mistake is the same: treating tokens as the whole product.

Pure compute math pushes the build-versus-buy break-even point much farther out than vendor pricing suggests. But seat products are not selling raw inference; they are selling workflow, content, security, and predictability. The unsettled piece is how much of that bundle an in-house team really needs once usage becomes routine rather than experimental.
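One way to frame the build-versus-buy comparison is to ask how much governance spend the build side can absorb before seats become cheaper. The sketch below uses the report's $160 monthly compute figure; the seat count and seat price are hypothetical, since the enterprise pricing discussed in the source set is opaque.

```python
def annual_seat_cost(seats, price_per_seat_per_year):
    """Fixed annual spend on enterprise seats."""
    return seats * price_per_seat_per_year


def annual_build_cost(monthly_compute_usd, annual_governance_usd):
    """Raw API compute plus everything the team must assemble itself
    (retrieval, prompt controls, review checkpoints, logging, support)."""
    return 12 * monthly_compute_usd + annual_governance_usd


def break_even_governance(seats, price_per_seat_per_year, monthly_compute_usd):
    """Annual governance spend at which building costs the same as buying seats."""
    return annual_seat_cost(seats, price_per_seat_per_year) - 12 * monthly_compute_usd


# Hypothetical inputs: 10 seats at $12,000 per seat per year (an assumption),
# against the report's $160 per month of raw compute.
seats, seat_price, compute = 10, 12_000, 160

print(f"seats:                 ${annual_seat_cost(seats, seat_price):,}")
print(f"compute only:          ${annual_build_cost(compute, 0):,}")
print(f"break-even governance: ${break_even_governance(seats, seat_price, compute):,}")
```

With these assumed numbers, raw compute is a small fraction of the seat bill, so the decision turns almost entirely on whether the in-house governance build costs more or less than the break-even figure.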

Sources for this answer

Vendor documentation

C.1 OpenAI API Pricing

OpenAI imposes additional costs for regional data processing and specific storage services, including a 10% uplift for data residency endpoints and daily fees for file storage.

Regional processing (data residency) endpoints are charged a 10% uplift for

See OpenAI API Pricing.

Vendor documentation

C.2 Claude API Docs, Pricing

Anthropic's pricing structure for the Claude API incorporates various cost-optimization features, including prompt caching, batch processing discounts, and specific multipliers for data residency and managed agent sessions.

Prompt caching reduces costs and latency by reusing previously processed portions of your prompt across API calls.

See Claude API Docs, Pricing.

Commentary

C.3 Metronome, Harvey Pricing Index

Harvey utilizes a high-touch, opaque enterprise pricing model that positions its AI platform as a labor cost substitute for large law firms and legal departments.

Harvey operates a fully opaque enterprise pricing model with per-seat billing confirmed through official legal documentation.

See Metronome, Harvey Pricing Index.

Do long-context AI tools save money or waste tokens in legal work?

Unclear, and it likely varies by task. Full-document context may improve some legal analysis, while oversized prompts can waste spend or degrade reasoning.

The fourth consequence is that architecture silently changes the bill. Thomson Reuters' benchmarking work suggests that full-document context can outperform fragmented retrieval for some legal tasks. Other technical material in the source set argues that overstuffed prompts degrade reasoning and waste spend. The practical point is not that one side has won. It is that token budgets are downstream from system design. Two legal teams can buy the same model and end the month with very different bills depending on how much text they retrieve, how often they re-run tasks, and how many invisible steps the system inserts in the background.
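A rough sketch of how those three design choices compound is shown below. The multiplicative model and all of the numbers are assumptions for illustration; in particular, it assumes every background step resends the full context with no caching, which is a worst case rather than a description of any particular product.

```python
def monthly_input_tokens(tasks, context_tokens, reruns_per_task, hidden_steps_per_run):
    """Input tokens consumed in a month under a simple multiplicative model.

    Each task is run (1 + reruns_per_task) times, and each run triggers
    (1 + hidden_steps_per_run) model calls that each resend the context.
    """
    runs = tasks * (1 + reruns_per_task)
    calls = runs * (1 + hidden_steps_per_run)
    return calls * context_tokens


# Two hypothetical teams using the same model on the same 1,000 tasks.
retrieval_team = monthly_input_tokens(
    1_000, context_tokens=8_000, reruns_per_task=1, hidden_steps_per_run=0
)
full_document_agent_team = monthly_input_tokens(
    1_000, context_tokens=60_000, reruns_per_task=2, hidden_steps_per_run=3
)

print(f"retrieval team:           {retrieval_team:>13,d} input tokens")
print(f"full-document agent team: {full_document_agent_team:>13,d} input tokens")
print(f"ratio: {full_document_agent_team / retrieval_team:.0f}x")
```

Under these assumptions the second team consumes 45 times the input tokens of the first for the same task count. Whether the extra context buys better legal analysis is a separate question that this arithmetic does not answer.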

Thomson Reuters leans toward more context for document-centric legal tasks. Technical materials cited in the source set argue that longer prompts can degrade reasoning and inflate cost. Perhaps both are true. For some legal workflows, more context could improve accuracy up to a point and then turn into noise. The break point is not settled in this source set.

Sources for this answer

Commentary

D.3 arXiv, Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models

Large Language Models exhibit a significant and consistent degradation in reasoning performance as input length increases, even when the input remains well within the model's technical maximum capacity.

Our findings show a notable degradation in LLMs’ reasoning performance at much shorter input lengths than their technical maximum.

See arXiv, Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models.

How much does privilege risk change legal AI architecture costs?

It depends, but enough to affect vendor and architecture choices. Consumer tools, weak retention terms, hidden agent steps, and poor auditability can make a cheap deployment more expensive once privilege and discovery risk are priced in.

The source set also points to early 2026 privilege and work-product disputes, but only through secondary summaries. Dentons reports that one court treated enterprise AI systems as tools, not persons, and rejected discovery into a party's internal AI-assisted analysis as a fishing expedition. The same Dentons summary says a consumer-grade platform with weaker confidentiality commitments was treated less favorably. Because those cases do not appear in this source set through direct opinions or a second corroborating research run, we think they are better read as directional signals than stable doctrine. But even as signals, they point to the same economic conclusion: cheaper or more convenient architecture can carry a different privilege story than enterprise-grade deployment.

The discovery consequence shows up most clearly in Mayer Brown. Its warning is simple: prompts and outputs generated during deals can become later litigation material. Baker Donelson and Skadden extend the same concern into agentic systems and governance. Once legal teams move from chat interfaces to tools that orchestrate multiple steps, pull data across systems, and act with partial autonomy, the spend question stops being just about inference cost. It becomes a question about liability allocation, auditability, and whether the company or the vendor owns the bad outcome when the chain misfires.

The fifth consequence is that official ledgers can understate spend at both ends. They can understate usage because lawyers still use consumer tools outside procurement. They can also understate compute because agentic products perform retries, checks, summaries, and logging that the end user never experiences as separate actions. So the quietest token cost may be the least visible one: shadow use on the low end, orchestration overhead on the high end.
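A simple reconciliation can make the two gaps visible. In the sketch below, the function name, its three inputs, and the sample figures are all hypothetical: it assumes the team can obtain a token count for user-visible requests, a provider-billed total, and a survey-style estimate of off-ledger consumer use.

```python
def usage_gap(ledger_tokens, provider_billed_tokens, estimated_shadow_tokens):
    """Compare what the official ledger records with what was likely consumed.

    ledger_tokens: tokens attributed to user-visible requests.
    provider_billed_tokens: tokens the provider actually billed, including
        retries, checks, and other background steps.
    estimated_shadow_tokens: estimated use of tools outside procurement.
    """
    true_usage = provider_billed_tokens + estimated_shadow_tokens
    return {
        "orchestration_overhead": provider_billed_tokens - ledger_tokens,
        "shadow_use": estimated_shadow_tokens,
        "true_usage": true_usage,
        "ledger_coverage": ledger_tokens / true_usage,
    }


# Hypothetical month: the ledger shows 25M tokens, the provider bills 36M,
# and an internal survey suggests about 4M tokens of consumer-tool use.
gap = usage_gap(25_000_000, 36_000_000, 4_000_000)
for key, value in gap.items():
    print(f"{key:24s} {value:,.3f}" if isinstance(value, float) else f"{key:24s} {value:,}")
```

In this made-up month the ledger captures 62.5% of likely usage. The shadow-use estimate is the weakest input, so it is better treated as a range than as a point value.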

The early 2026 case summaries in the source set suggest that enterprise deployment and consumer deployment may not be treated the same way. If that holds, then the cheapest apparent path could be more expensive once confidentiality, retention, and discovery consequences are priced in. We think that remains unsettled enough to hedge, but not unsettled enough to ignore.

Sources for this answer

Law-firm commentary

E.1 Dentons commentary

See Dentons, Landmark AI Rulings Impacting All.

Law-firm commentary

E.2 Mayer Brown commentary

Recent judicial decisions demonstrate that the application of attorney-client privilege and work product doctrine to AI-generated content remains unsettled, necessitating proactive contractual protections for deal practitioners.

The development of jurisprudence concerning AI’s implications for the attorney-client privilege and attorney work product doctrines lags behind the dramatic rise of AI’s use in connection with deal-making.

See Mayer Brown, M&A Discovery in the AI Era: Generative AI Communications and Outputs May Become Litigation Ammunition.

Commentary

E.6 Zylo, AI Pricing: What's the True AI Cost for Businesses in 2026?

The proliferation of consumption-based and bundled AI pricing models in SaaS applications creates significant budget volatility and unpredictability for organizations.

Consumption-based pricing and AI add-ons make budgets harder to predict and control.

See Zylo, AI Pricing: What's the True AI Cost for Businesses in 2026?.