How should legal teams budget for AI token usage?
It depends on workflow design more than lawyer headcount. Budget around document-heavy review, long-context use, repeated agent steps, and governance overhead, then treat token counts as one cost input rather than the whole product.
The safest reading of the source set is that 40 million tokens per month is plausible, but not average. It looks more like an active-team or power-user benchmark than a median for legal departments generally. What drives the bill is not team size by itself. It is workflow architecture: long-context review, document-heavy retrieval, repeated agentic steps, and the ratio between cheap input and expensive output. That is why API usage can remain inexpensive for light users and jump quickly for teams running contracts, diligence, investigations, or large-record analysis through production systems. Seat pricing, in turn, is usually buying predictability, workflow packaging, proprietary content, and governance, not just tokens.
There is no token statute in the source set. No court or regulator surfaced here telling legal teams how many tokens they may buy, or what a normal month looks like. The law attaches one step later, when token usage represents confidential legal work moving through third-party systems. At that point the familiar duties reappear: competence, confidentiality, supervision, client communication, and billing discipline.
That matters economically. More tokens usually mean more prompts, more model outputs, more logs, and more opportunities for privileged or client-sensitive text to leave the lawyer's desktop and enter a vendor environment. The source set's ethics materials do not convert that into a numeric ceiling. They do make clear that scaling usage does not weaken the underlying duties. It widens the surface area on which those duties have to hold.
Sources for this answer
Vendor documentation
A.1 OpenAI API Pricing: OpenAI imposes additional costs for regional data processing and specific storage services, including a 10% uplift for data residency endpoints and daily fees for file storage.
Regional processing (data residency) endpoints are charged a 10% uplift for
See OpenAI API Pricing.
Vendor documentation
A.2 Claude API Docs, Pricing: Anthropic's pricing structure for the Claude API incorporates various cost-optimization features, including prompt caching, batch processing discounts, and specific multipliers for data residency and managed agent sessions.
Prompt caching reduces costs and latency by reusing previously processed portions of your prompt across API calls.
See Claude API Docs, Pricing.
Commentary
A.3 Thomson Reuters, Legal AI Benchmarking: Evaluating Long Context Performance for LLMs: Thomson Reuters' internal benchmarking indicates that long-context LLMs often outperform RAG for complex legal document analysis, though effective context windows are frequently smaller than advertised, necessitating rigorous, skill-specific testing protocols.
In our internal testing (more on this later), we found that inputting the full document text into the LLM’s input window (and chunking extremely long documents when necessary) generally outperformed RAG for most of our document-based skills.
See Thomson Reuters, Legal AI Benchmarking: Evaluating Long Context Performance for LLMs.
Commentary
A.4 Metronome, Harvey Pricing Index: Harvey utilizes a high-touch, opaque enterprise pricing model that positions its AI platform as a labor cost substitute for large law firms and legal departments.
Harvey operates a fully opaque enterprise pricing model with per-seat billing confirmed through official legal documentation.
See Metronome, Harvey Pricing Index.
Commentary
A.5 Elephas, Legal AI Tools Pricing Comparison 2026: What Every Tool Actually Costs: Legal AI pricing varies significantly across the market, with enterprise-grade tools often requiring opaque sales-qualified pricing and per-seat licensing that creates substantial cost disparities compared to flat-rate, local-processing alternatives.
The reality is that legal AI pricing in 2026 ranges from $9.99/month to $1,000+/month—a 100x difference.
See Elephas, Legal AI Tools Pricing Comparison 2026: What Every Tool Actually Costs.
Commentary
A.6 Thomson Reuters Legal Solutions, ABA Ethics Rules Related to Generative AI: Lawyers are ethically obligated to supervise generative AI tools used in legal practice and maintain technological competence regarding the benefits and risks of such technology under the ABA Model Rules of Professional Conduct.
The effect of this change was to expand the ethical obligation to non-human assistance, including the work generated by technology such as legal AI that’s used in the provision of legal services.
See Thomson Reuters Legal Solutions, ABA Ethics Rules Related to Generative AI.
Commentary
A.7 New York City Bar Association, Current Ethics Opinions and Reports Related to Generative Artificial Intelligence (May 2025): The New York City Bar Association's report on generative AI ethics concludes that existing Rules of Professional Conduct are generally sufficient to govern the use of AI in legal practice, provided lawyers adhere to core ethical duties such as competence, confidentiality, and supervision.
Lawyers’ use of generative artificial intelligence in connection with the practice of law triggers numerous ethical duties that may arise depending on the nature of the use.
See New York City Bar Association, Current Ethics Opinions and Reports Related to Generative Artificial Intelligence (May 2025).
Is 40 million tokens per month normal for a legal team?
Probably not as an average, but it is a realistic planning case for an active legal team. Treat it as a benchmark for heavy workflows, not as a default for every department.
The law-firm commentary is more useful on workflow than on averages. Freshfields describes litigation and investigations work as a large-record synthesis problem. In its framing, generative AI can “generate a detailed chronology from a lengthy complaint or thousands of pages of client documents”. That is almost a definition of high-input token burn. It is not hard to see how a single investigation week could consume more tokens than a month of casual chat use.
Buchalter and Perkins Coie describe the transactional side differently, but the economic implication is similar. Buchalter says “a single AI-enabled product or business practice can implicate intellectual property ownership, data rights, privacy compliance” and several other legal domains at once. Perkins Coie's point is that generative AI changes synthesis inside negotiations and deal work, not just the drafting margin. The result is steadier, repeated consumption: not one giant burst, but many medium-sized document reviews, summaries, clause comparisons, and iterative redlines.
On pricing, the law-firm material is thinner than the vendor and provider material, but the market commentary is consistent. Thomson Reuters' 2026 legal-market work says nearly 90% of legal spend still moves through hourly billing, and client-side commentary says buyers are resistant to explicit AI surcharges. That means token economics are not yet flowing neatly through the legal supply chain. In-house teams may see their own AI costs clearly while outside-counsel savings appear later, or in different form.
The first consequence is that budgeting by lawyer is too crude. The right unit is workflow. A lawyer using AI as faster autocomplete will barely register. A lawyer using long-context tools to summarize a complaint, compare a 50-page contract to a playbook, extract issues from a diligence room, and then draft a memo from the result is running a small compute pipeline whether anyone calls it that or not.
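To make the workflow-as-unit point concrete, here is a minimal sketch of how the four steps in that example might add up. Every number in it is a hypothetical planning assumption (tokens per page, page counts, re-run counts), not a figure from the source set, and `step_tokens` is an illustrative helper rather than any vendor's API.

```python
# Hypothetical planning assumptions, not measurements from the source set.
TOKENS_PER_PAGE = 700  # rough guess for dense legal text


def step_tokens(pages_in: int, pages_out: int, runs: int = 1) -> int:
    """Tokens for one workflow step: (input pages + output pages) x runs."""
    return (pages_in + pages_out) * TOKENS_PER_PAGE * runs


# One matter pipeline, mirroring the four steps described above.
pipeline = {
    "summarize_complaint": step_tokens(pages_in=80, pages_out=3),
    # 50-page contract plus an assumed 20-page playbook as input.
    "contract_vs_playbook": step_tokens(pages_in=50 + 20, pages_out=5),
    "diligence_issue_extraction": step_tokens(pages_in=1_000, pages_out=20),
    # Memo drafting assumed to take three iterations.
    "draft_memo": step_tokens(pages_in=30, pages_out=10, runs=3),
}

tokens_per_matter = sum(pipeline.values())
matters_to_reach_40m = 40_000_000 / tokens_per_matter

print(f"Tokens per matter pipeline: {tokens_per_matter:,}")
print(f"Pipelines per month to reach 40M tokens: {matters_to_reach_40m:.1f}")
```

Under these assumed inputs, a few dozen such pipelines per month would reach the 40 million mark. Change the page counts or re-run assumptions and the answer moves substantially, which is the reason to budget by workflow rather than by lawyer.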
The second consequence is that adoption curves translate into spend curves faster than many legal budgets assume. LawSites reported in March 2026 that legal-professional AI adoption had risen to 69%, up from 31% a year earlier. A separate 2026 study reported 81% adoption in-house against 55% in private practice. Once usage becomes normal, the question stops being whether the team uses AI and becomes what kind of AI use it has normalized. That is where 40 million tokens per month becomes a useful anchor. Not because every small team will land there, but because an active team could get there without doing anything exotic.
The source set does not justify treating 40 million as a median. It does justify treating it as a real planning case. Harvey's own materials describe a minority of power users producing outsized value and time savings. Freshfields, Buchalter, and Mayer Brown describe the kinds of work that create those users: investigations, document-heavy synthesis, iterative transaction support, and large matter files. So 40 million is probably best understood as an active-team benchmark that can be too high for light users and too low for burst months in litigation, diligence, or investigation-heavy periods.
The source set supports it as a plausible planning case for a small but active team. It does not support treating it as a cross-market average. With only one canonical report on this topic, and with some of that report's distributional claims resting on thin support, we think the honest framing is narrower: 40 million is real enough to budget around, but too high to call normal without more evidence.
Sources for this answer
Law-firm commentary
B.1 Freshfields commentary: Supports the cited proposition.
generate a detailed chronology from a lengthy complaint or thousands of pages of client documents
See Freshfields, Three Key Ways that Generative AI is Changing Litigation and Investigations Work Today.
Commentary
B.2 Buchalter commentary: Supports the cited proposition.
a single AI-enabled product or business practice can implicate intellectual property ownership, data rights, privacy compliance
See Buchalter, Artificial Intelligence Issues for Transactional Law Practice.
Law-firm commentary
B.3 Perkins Coie commentary: While generative AI can enhance efficiency in legal practice, practitioners must exercise caution, verify outputs for accuracy, and consider ethical implications regarding client confidentiality and the limitations of the technology.
one clearly needs to be an experienced practitioner to identify when the model is going off course or is providing a "close, but not quite correct" or fictitious response.
See Perkins Coie, The Surprising Impact of Generative AI on Transactional Lawyer Practices.
Commentary
B.4 Thomson Reuters, 2026 Report on the State of the US Legal Market: The 2026 US legal market is experiencing a period of high demand and profitability driven by instability, which historically precedes significant industry downturns.
The legal industry has a peculiar historical habit of surging just before it stumbles.
See Thomson Reuters, 2026 Report on the State of the US Legal Market.
Commentary
B.5 Thomson Reuters, Couples Counseling at Legalweek 2026: Firms and Clients Confront the AI Value Divide: Law firms face a growing conflict with clients regarding the quantification of AI-driven cost savings and the resulting impact on traditional billing models.
Client expectations around AI have shifted from curiosity to accountability — Law firms are now being asked not just whether they use GenAI, but to prove how it delivers measurable cost savings on specific matters
See Thomson Reuters, Couples Counseling at Legalweek 2026: Firms and Clients Confront the AI Value Divide.
Commentary
B.6 Thomson Reuters, Legal AI Benchmarking: Evaluating Long Context Performance for LLMs: Thomson Reuters' internal benchmarking indicates that long-context LLMs often outperform RAG for complex legal document analysis, though effective context windows are frequently smaller than advertised, necessitating rigorous, skill-specific testing protocols.
In our internal testing (more on this later), we found that inputting the full document text into the LLM’s input window (and chunking extremely long documents when necessary) generally outperformed RAG for most of our document-based skills.
See Thomson Reuters, Legal AI Benchmarking: Evaluating Long Context Performance for LLMs.
Commentary
B.7 LawSites, AI Adoption Among Legal Professionals Has More Than Doubled in a Year: While individual adoption of generative AI among legal professionals has increased significantly, a majority of law firms still lack formal AI policies or training programs.
Nearly seven in 10 legal professionals now use generative AI tools for work — a figure that more than doubled in a single year
See LawSites, AI Adoption Among Legal Professionals Has More Than Doubled in a Year.
Commentary
B.8 PR Newswire, New Study Finds AI is Addressing The Human Issues in Legal: Artificial intelligence is increasingly utilized in the legal profession to enhance operational efficiency, mitigate burnout, and facilitate strategic career development.
96% of those using AI said it has helped achieve business objectives more efficiently, with a majority of respondents saying it has improved the speed (72%) and quality (60%) of their work.
See PR Newswire, New Study Finds AI is Addressing The Human Issues in Legal.
Law-firm commentary
B.9 Mayer Brown commentary: Recent judicial decisions demonstrate that the application of attorney-client privilege and work product doctrine to AI-generated content remains unsettled, necessitating proactive contractual protections for deal practitioners.
The development of jurisprudence concerning AI’s implications for the attorney-client privilege and attorney work product doctrines lags behind the dramatic rise of AI’s use in connection with deal-making.
See Mayer Brown, M&A Discovery in the AI Era: Generative AI Communications and Outputs May Become Litigation Ammunition.
Why can API tokens cost less than enterprise legal AI seats?
Usually because enterprise seats bundle workflow, security, support, content, permissions, and predictable procurement. Raw API token math can look cheap while the operational build still carries real legal and governance cost.
The third consequence is that raw compute is often the wrong comparison point. Provider pricing in the source set shows the shape clearly. OpenAI prices cached input far below fresh input, and both far below output. Anthropic also charges materially more for output than input. On the report's simplified blend, 40 million tokens works out to roughly $160 of monthly raw compute. The exact figure will move with model choice, output ratio, and caching, but even a materially higher real-world mix leaves a wide gap between inference cost and a handful of legal-AI enterprise seats. Enterprise seat pricing is usually not a token break-even story. It is a bundle story: verified workflow, legal content, permissions, auditability, customer support, and data-handling commitments.
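The blended figure is simple arithmetic, and a short sketch shows how sensitive it is to the inputs named above. The per-million-token rates and the 80/20 input/output split below are illustrative assumptions, chosen only so that the baseline reproduces the report's roughly $160 figure. They are not quoted from any OpenAI or Anthropic price sheet, and `monthly_cost` is a hypothetical helper.

```python
def monthly_cost(total_tokens: int, output_share: float,
                 input_rate: float, output_rate: float,
                 cached_share: float = 0.0, cached_rate: float = 0.0) -> float:
    """Estimate monthly spend in USD.

    Rates are USD per million tokens. `cached_share` is the fraction of
    input tokens billed at the (cheaper) cached-input rate.
    """
    output_tokens = total_tokens * output_share
    input_tokens = total_tokens - output_tokens
    cached_tokens = input_tokens * cached_share
    fresh_tokens = input_tokens - cached_tokens
    return (fresh_tokens * input_rate
            + cached_tokens * cached_rate
            + output_tokens * output_rate) / 1_000_000


# Assumed rates (USD per million tokens): illustrative only.
INPUT_RATE, CACHED_RATE, OUTPUT_RATE = 2.50, 1.25, 10.00

baseline = monthly_cost(40_000_000, 0.20, INPUT_RATE, OUTPUT_RATE)
heavy_output = monthly_cost(40_000_000, 0.40, INPUT_RATE, OUTPUT_RATE)
with_caching = monthly_cost(40_000_000, 0.20, INPUT_RATE, OUTPUT_RATE,
                            cached_share=0.5, cached_rate=CACHED_RATE)

for label, usd in [("baseline", baseline),
                   ("heavier output mix", heavy_output),
                   ("half of input cached", with_caching)]:
    print(f"{label}: ${usd:,.2f}")
```

Under these assumed rates, doubling the output share raises the bill and caching half the input lowers it, but every scenario stays in the low hundreds of dollars. That is the gap the seat-pricing comparison has to explain.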
That is why the build-versus-buy argument gets distorted when it is framed as API cost versus seat cost. The cheaper side in raw tokens can still be the more expensive side in governance if the in-house team has to assemble retrieval, prompt controls, review checkpoints, logging, vendor management, and support on its own. The opposite is also true. Buying seats for a lightly engaged team can convert a cheap variable cost into a fixed annual spend that never gets used. In both cases the economic mistake is the same: treating tokens as the whole product.
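A rough way to see the distortion is to put governance overhead into the comparison explicitly. The sketch below is a toy model under stated assumptions: the $160 compute figure comes from the simplified blend discussed above, while the governance overhead, seat count, and seat price are hypothetical placeholders that a team would replace with its own quotes and staffing costs.

```python
def build_monthly(api_usd: float, governance_usd: float) -> float:
    """In-house path: raw API spend plus self-assembled governance overhead."""
    return api_usd + governance_usd


def buy_monthly(seats: int, seat_usd: float) -> float:
    """Vendor path: fixed seat spend, regardless of how much each seat is used."""
    return seats * seat_usd


def cheaper_option(api_usd: float, governance_usd: float,
                   seats: int, seat_usd: float) -> str:
    build = build_monthly(api_usd, governance_usd)
    buy = buy_monthly(seats, seat_usd)
    return "build" if build < buy else "buy"


def break_even_governance(api_usd: float, seats: int, seat_usd: float) -> float:
    """Governance overhead at which building and buying cost the same."""
    return seats * seat_usd - api_usd


# Hypothetical inputs: $160 compute, $6,000/month of engineering, logging,
# and vendor-management effort, and seats priced at an assumed $1,000/month.
print(cheaper_option(160, 6_000, seats=5, seat_usd=1_000))
print(cheaper_option(160, 6_000, seats=10, seat_usd=1_000))
print(break_even_governance(160, seats=5, seat_usd=1_000))
```

With these placeholder numbers the answer flips between five and ten seats even though token spend never changes. The model deliberately ignores seat utilization; a lightly engaged team would make the buy side look worse per active user than this comparison shows.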
Pure compute math pushes the threshold much farther out than vendor pricing suggests. But seat products are not selling raw inference; they are selling workflow, content, security, and predictability. The unsettled piece is how much of that bundle an in-house team really needs once usage becomes routine rather than experimental.
Sources for this answer
Vendor documentation
C.1 OpenAI API Pricing: OpenAI imposes additional costs for regional data processing and specific storage services, including a 10% uplift for data residency endpoints and daily fees for file storage.
Regional processing (data residency) endpoints are charged a 10% uplift for
See OpenAI API Pricing.
Vendor documentation
C.2 Claude API Docs, Pricing: Anthropic's pricing structure for the Claude API incorporates various cost-optimization features, including prompt caching, batch processing discounts, and specific multipliers for data residency and managed agent sessions.
Prompt caching reduces costs and latency by reusing previously processed portions of your prompt across API calls.
See Claude API Docs, Pricing.
Commentary
C.3 Metronome, Harvey Pricing Index: Harvey utilizes a high-touch, opaque enterprise pricing model that positions its AI platform as a labor cost substitute for large law firms and legal departments.
Harvey operates a fully opaque enterprise pricing model with per-seat billing confirmed through official legal documentation.
See Metronome, Harvey Pricing Index.
Commentary
C.4 Elephas, Legal AI Tools Pricing Comparison 2026: What Every Tool Actually Costs: Legal AI pricing varies significantly across the market, with enterprise-grade tools often requiring opaque sales-qualified pricing and per-seat licensing that creates substantial cost disparities compared to flat-rate, local-processing alternatives.
The reality is that legal AI pricing in 2026 ranges from $9.99/month to $1,000+/month—a 100x difference.
See Elephas, Legal AI Tools Pricing Comparison 2026: What Every Tool Actually Costs.
Commentary
C.5 Ironclad, The Reality of AI Agents in Legal Operations Today: Legal teams should implement AI agents in contract workflows by utilizing tiered human oversight, immutable audit trails, and strict data governance controls to mitigate risks and ensure accountability.
Implement tiered human oversight that allows agents to handle routine contract tasks autonomously while requiring attorney review for flagged deviations or high-risk agreements.
See Ironclad, The Reality of AI Agents in Legal Operations Today.
Do long-context AI tools save money or waste tokens in legal work?
Unclear, and it likely varies by task. Full-document context may improve some legal analysis, while oversized prompts can waste spend or degrade reasoning.
The fourth consequence is that architecture silently changes the bill. Thomson Reuters' benchmarking work suggests that full-document context can outperform fragmented retrieval for some legal tasks. Other technical material in the source set argues that overstuffed prompts degrade reasoning and waste spend. The practical point is not that one side has won. It is that token budgets are downstream from system design. Two legal teams can buy the same model and end the month with very different bills depending on how much text they retrieve, how often they re-run tasks, and how many invisible steps the system inserts in the background.
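The three drivers named above (text retrieved, re-runs, and background steps) multiply rather than add, which a small model makes visible. Everything here is a hypothetical sketch: the `Architecture` class, the token counts, and the simplifying assumption that each hidden step re-sends the full context are ours, not a description of how any particular product works.

```python
from dataclasses import dataclass


@dataclass
class Architecture:
    context_tokens_per_call: int   # text sent to the model on each call
    output_tokens_per_call: int    # text the model returns on each call
    reruns_per_task: int           # how many times a task is typically run
    hidden_steps_per_task: int     # background calls the user never sees

    def tokens_per_task(self) -> int:
        # Assumption: each hidden step is a full call with the same context.
        calls = self.reruns_per_task * (1 + self.hidden_steps_per_task)
        return calls * (self.context_tokens_per_call + self.output_tokens_per_call)


def monthly_tokens(arch: Architecture, tasks_per_month: int) -> int:
    return arch.tokens_per_task() * tasks_per_month


# Team A: narrow retrieval, single pass, no background orchestration.
retrieval = Architecture(8_000, 1_000, reruns_per_task=1, hidden_steps_per_task=0)
# Team B: full-document context, two passes, two background checks per pass.
full_doc = Architecture(60_000, 2_000, reruns_per_task=2, hidden_steps_per_task=2)

for name, arch in [("retrieval", retrieval), ("full-document", full_doc)]:
    print(f"{name}: {monthly_tokens(arch, 500):,} tokens/month")
```

Both hypothetical teams run the same 500 tasks on the same model, yet under these assumptions their monthly token counts differ by well over an order of magnitude. Whether the larger bill buys better answers is the separate, unsettled question.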
Thomson Reuters leans toward more context for document-centric legal tasks. Technical materials cited in the source set argue that longer prompts can degrade reasoning and inflate cost. Perhaps both are true. For some legal workflows, more context could improve accuracy up to a point and then turn into noise. The break point is not settled in this source set.
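Because the break point is unsettled, one cautious design is to make the context cap a tunable setting rather than a fixed choice. The sketch below chunks a long document against a configurable token budget, in the spirit of the chunking Thomson Reuters mentions for extremely long documents. It is an assumption-laden illustration: `chunk_by_budget` is a hypothetical helper, and the word-to-token ratio is a crude stand-in for a real tokenizer.

```python
def chunk_by_budget(text: str, max_tokens: int,
                    tokens_per_100_words: int = 130) -> list[str]:
    """Split text into word-based chunks that stay within a token budget.

    Token counts are estimated from word counts using an assumed ratio;
    a production system would count tokens with the model's own tokenizer.
    """
    words = text.split()
    words_per_chunk = max(1, max_tokens * 100 // tokens_per_100_words)
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


# Usage: try the same document at two different caps and compare results
# (and cost) on a task-specific test set before fixing the budget.
document = "clause " * 1_000
small_chunks = chunk_by_budget(document, max_tokens=130)
large_chunks = chunk_by_budget(document, max_tokens=650)
print(len(small_chunks), len(large_chunks))
```

The point of the parameter is that the right cap can be tested per task instead of argued in the abstract.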
Sources for this answer
Commentary
D.1 Thomson Reuters, Legal AI Benchmarking: Evaluating Long Context Performance for LLMs: Thomson Reuters' internal benchmarking indicates that long-context LLMs often outperform RAG for complex legal document analysis, though effective context windows are frequently smaller than advertised, necessitating rigorous, skill-specific testing protocols.
In our internal testing (more on this later), we found that inputting the full document text into the LLM’s input window (and chunking extremely long documents when necessary) generally outperformed RAG for most of our document-based skills.
See Thomson Reuters, Legal AI Benchmarking: Evaluating Long Context Performance for LLMs.
Commentary
D.2 Law.co, Token Budgeting in Deep Legal Agent Chains: Implementing a token budget for AI agent chains is essential for managing operational costs, mitigating security risks, and ensuring compliance with professional responsibility standards.
The deeper the chain, the more tokens you burn and the more money—plus risk—you take on. Much like a litigation team needs a budget for billable hours, a well-designed AI workflow needs a token budget.
See Law.co, Token Budgeting in Deep Legal Agent Chains.
Commentary
D.3 arXiv, Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models: Large Language Models exhibit a significant and consistent degradation in reasoning performance as input length increases, even when the input remains well within the model's technical maximum capacity.
Our findings show a notable degradation in LLMs’ reasoning performance at much shorter input lengths than their technical maximum.
See arXiv, Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models.
How much does privilege risk change legal AI architecture costs?
It depends, but enough to affect vendor and architecture choices. Consumer tools, weak retention terms, hidden agent steps, and poor auditability can make a cheap deployment more expensive once privilege and discovery risk are priced in.
The source set also points to early 2026 privilege and work-product disputes, but only through secondary summaries. Dentons reports that one court treated enterprise AI systems as tools, not persons, and rejected discovery into a party's internal AI-assisted analysis as a fishing expedition. The same Dentons summary says a consumer-grade platform with weaker confidentiality commitments was treated less favorably. Because those cases do not appear in this source set through direct opinions or a second corroborating research run, we think they are better read as directional signals than stable doctrine. But even as signals, they point to the same economic conclusion: cheaper or more convenient architecture can carry a different privilege story than enterprise-grade deployment.
The discovery consequence shows up most clearly in Mayer Brown. Its warning is simple: prompts and outputs generated during deals can become later litigation material. Baker Donelson and Skadden extend the same concern into agentic systems and governance. Once legal teams move from chat interfaces to tools that orchestrate multiple steps, pull data across systems, and act with partial autonomy, the spend question stops being just about inference cost. It becomes a question about liability allocation, auditability, and whether the company or the vendor owns the bad outcome when the chain misfires.
The fifth consequence is that official ledgers can undercount in two ways. They can understate usage because lawyers still use consumer tools outside procurement. They can also understate compute because agentic products perform retries, checks, summaries, and logging that the end user never experiences as separate actions. So the quietest token cost may be the least visible one: shadow use on the low end, orchestration overhead on the high end.
The early 2026 case summaries in the source set suggest that enterprise deployment and consumer deployment may not be treated the same way. If that holds, then the cheapest apparent path could be more expensive once confidentiality, retention, and discovery consequences are priced in. We think that remains unsettled enough to hedge, but not unsettled enough to ignore.
Sources for this answer
Law-firm commentary
E.1 Dentons commentary
See Dentons, Landmark AI Rulings Impacting All.
Law-firm commentary
E.2 Mayer Brown commentary: Recent judicial decisions demonstrate that the application of attorney-client privilege and work product doctrine to AI-generated content remains unsettled, necessitating proactive contractual protections for deal practitioners.
The development of jurisprudence concerning AI’s implications for the attorney-client privilege and attorney work product doctrines lags behind the dramatic rise of AI’s use in connection with deal-making.
See Mayer Brown, M&A Discovery in the AI Era: Generative AI Communications and Outputs May Become Litigation Ammunition.
Commentary
E.3 Baker Donelson commentary: Organizations must implement robust AI governance, compliance, and risk-mitigation strategies to address evolving legal, regulatory, and ethical challenges associated with generative AI and autonomous systems.
Organizations should audit their use of generative AI tools to distinguish between input risks from data scraping and output risks from generating infringing content.
See Baker Donelson, 2026 AI Legal Forecast: From Innovation to Compliance.
Law-firm commentary
E.4 Skadden commentary: The absence of comprehensive federal AI legislation does not exempt companies from existing regulatory obligations, as regulators are actively applying current laws to AI systems and holding companies accountable for the outcomes of the tools they deploy.
Using artificial intelligence to perform a task doesn’t exempt you from the regulations that already govern that task.
See Skadden, No Loopholes for AI: Putting Legal Guardrails on Your Company's Use of AI.
Commentary
E.5 Ironclad, The Reality of AI Agents in Legal Operations Today: Legal teams should implement AI agents in contract workflows by utilizing tiered human oversight, immutable audit trails, and strict data governance controls to mitigate risks and ensure accountability.
Implement tiered human oversight that allows agents to handle routine contract tasks autonomously while requiring attorney review for flagged deviations or high-risk agreements.
See Ironclad, The Reality of AI Agents in Legal Operations Today.
Commentary
E.6 Zylo, AI Pricing: What's the True AI Cost for Businesses in 2026?: The proliferation of consumption-based and bundled AI pricing models in SaaS applications creates significant budget volatility and unpredictability for organizations.
Consumption-based pricing and AI add-ons make budgets harder to predict and control.
See Zylo, AI Pricing: What's the True AI Cost for Businesses in 2026?
Commentary
E.7 North Carolina Bar Association, Beyond the Ban: Why Your Law Firm Needs a Realistic AI Policy in 2026: Law firms should adopt comprehensive AI governance policies that establish ethical guardrails and verification protocols rather than implementing ineffective blanket bans.
A robust AI use policy is vital for balancing innovation with responsibility.
See North Carolina Bar Association, Beyond the Ban: Why Your Law Firm Needs a Realistic AI Policy in 2026.