AI vendor evaluation as an executable skills file

Which legal rules make AI vendor diligence necessary for regulated data?

Existing privacy, health-data, and cross-border rules make AI vendor diligence necessary when regulated data is involved. The legal work is usually processor contracting, business associate agreement scope, transfer-path review, and proof that the vendor can stay inside contractual limits.

No primary-law source in the research says a company must use a particular AI vendor questionnaire. The pressure comes from older rules that turn certain questions into legal necessities once the tool will process regulated or sensitive data. GDPR Article 28 is the clearest example. A controller may use only processors providing “sufficient guarantees” and must put specific processing terms in writing. That is why questions about subprocessors, deletion, audit support, and reuse of prompts or outputs are not procurement theater. They are how a buyer tests whether Article 28 can be satisfied at all.

HIPAA creates the same structure for health-data use cases. The regulation requires a business-associate agreement to “establish the permitted and required uses and disclosures” of protected health information, along with safeguards, subcontractor flow-downs, and return-or-destruction terms. Once PHI is in scope, “Does the vendor sign a BAA?” is only the opening question. The real questions are whether the offered product is inside the BAA scope, which subprocessors touch PHI, what logs exist, and whether customer-specific artifacts can be deleted when the relationship ends.

Cross-border rules now make some vendor questions more concrete than they looked a year ago. The DOJ bulk-data rules define a vendor agreement broadly enough to include arrangements for goods or services, “including cloud-computing services”. The same regime uses risk-based verification duties for covered vendor relationships. That does not make every AI tool a restricted transaction. It does mean vendor identity, ownership, location, and transfer path are legal facts in some data-heavy deployments, not just diligence preferences.

California privacy law points in the same direction. Cal. Civ. Code section 1798.100(d) and the CPPA regulations require service-provider and contractor terms that limit purpose, require equivalent privacy protection, and preserve oversight rights when personal data is being processed on the company's behalf. That is why a vendor answer like “we do not train on customer data” is not enough by itself. The legal issue is broader: limited purpose, no cross-customer use outside the contract, deletion or return, and enough visibility to tell whether the vendor is still inside those limits.

The notable absence is just as important. The source set did not surface a generally applicable AI-procurement statute telling buyers which diligence questions to ask in ordinary enterprise purchases. Perhaps that will come later for high-risk or public-sector use cases. Today the operative law is mostly processor law, sector law, and contract structure.
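The regimes discussed in this answer can be read as triggers that switch on specific diligence questions. The sketch below is one hypothetical way to encode that mapping; the flag names, the `DataProfile` type, and the `triggered_questions` function are illustrative assumptions, and whether a regime actually applies remains a legal determination that a boolean flag cannot make.

```python
from dataclasses import dataclass

# Hypothetical intake flags. Whether a regime really applies is a legal call;
# these booleans only record the reviewer's preliminary view.
@dataclass(frozen=True)
class DataProfile:
    eu_personal_data: bool = False
    phi: bool = False
    bulk_sensitive_cross_border: bool = False
    california_personal_info: bool = False

# (flag name, regime label, questions the article ties to that regime)
REGIME_QUESTIONS = [
    ("eu_personal_data", "GDPR art. 28", [
        "Which subprocessors are used?",
        "How is data deleted?",
        "What audit support is offered?",
        "Are prompts or outputs reused?",
    ]),
    ("phi", "45 C.F.R. § 164.504(e)", [
        "Is the offered product inside the BAA scope?",
        "Which subprocessors touch PHI?",
        "What logs exist?",
        "Can customer-specific artifacts be deleted at the end?",
    ]),
    ("bulk_sensitive_cross_border", "28 C.F.R. Part 202", [
        "Who is the vendor, and who owns it?",
        "Where is the vendor located?",
        "What is the transfer path for the data?",
    ]),
    ("california_personal_info", "Cal. Civ. Code § 1798.100(d)", [
        "Is processing limited to the contracted purpose?",
        "Is cross-customer use outside the contract excluded?",
        "Is data deleted or returned?",
        "What oversight rights does the buyer keep?",
    ]),
]

def triggered_questions(profile: DataProfile) -> dict[str, list[str]]:
    """Return the question sets for every regime whose flag is set."""
    return {
        regime: questions
        for flag, regime, questions in REGIME_QUESTIONS
        if getattr(profile, flag)
    }

if __name__ == "__main__":
    for regime, questions in triggered_questions(DataProfile(phi=True)).items():
        print(regime)
        for q in questions:
            print("  -", q)
```

A profile with no flags set returns an empty mapping, which matches the point that ordinary enterprise purchases are not governed by a general AI-procurement statute.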

Sources for this answer

Primary law

A.1 Regulation (EU) 2016/679, art. 28

Supports the cited proposition. (Regulation (EU) 2016/679, art. 28)

sufficient guarantees

See Regulation (EU) 2016/679, art. 28.

Primary law

A.2 45 C.F.R. § 164.504(e)(2)

Supports the cited proposition. (45 C.F.R. § 164.504(e)(2))

establish the permitted and required uses and disclosures

See 45 C.F.R. § 164.504(e)(2).

Primary law

A.3 28 C.F.R. § 202.258

Supports the cited proposition. (28 C.F.R. § 202.258)

including cloud-computing services

See 28 C.F.R. § 202.258.

Primary law

A.4 28 C.F.R. § 202.1001 and subpart J

Under 28 C.F.R. Part 202, Subpart J, U.S. persons engaging in restricted data transactions with countries of concern or covered persons must adhere to specific due diligence and audit requirements.

§ 202.1001 Due diligence for restricted transactions.

See 28 C.F.R. § 202.1001 and subpart J.

Primary law

A.5 Cal. Civ. Code § 1798.100(d)

California law requires businesses to enter into specific contractual agreements with third parties, service providers, or contractors when selling, sharing, or disclosing consumer personal information to them.

A business that collects a consumer’s personal information and that sells that personal information to, or shares it with, a third party or that discloses it to a service provider or contractor for a business purpose shall enter into an agreement with the third party, service provider, or contractor

See Cal. Civ. Code § 1798.100(d).

Primary law

A.6 Cal. Code Regs. tit. 11, § 7051

California Code of Regulations section 7051 establishes the specific contract requirements that businesses must include when engaging service providers and contractors under the California Consumer Privacy Act.

Contract Requirements for Service Providers and Contractors.

See Cal. Code Regs. tit. 11, § 7051.

What AI vendor questions matter most for training data and outputs?

The most important AI vendor questions are about training rights, customer-data reuse, output ownership, portability, bias controls, validation, logs, deletion, and audit rights. Those topics matter more than a broad “responsible AI” attestation because they map to the data path and commercial risk.

The law firms are surprisingly aligned on substance. Morgan Lewis says “training data rights remain critical” and treats privacy procedures, data governance, and security as central diligence topics. Its earlier sourcing notes push the same way on ownership of inputs, outputs, analytics, and portability at termination. That is a narrower and more useful frame than generic “responsible AI” diligence. It suggests that the commercial terms and the data path still do most of the legal work.

Cooley adds two useful points. First, the European Commission's model AI procurement clauses are not only for governments; Cooley treats them as instructive for private buyers too, especially around “AI ethics, liability, transparency and compliance”. Second, Cooley's state-law commentary treats diligence on “training data, cybersecurity and measures taken to prevent biased and discriminatory outputs” as part of vendor review. That pulls bias and validation into scope, but only when the use case justifies it.

Wilson Sonsini is perhaps the clearest on the off-the-shelf case. Its EU AI Act note says buyers should “carry out due diligence on the vendor” and, if the vendor says no personal data is used for training, should verify how the vendor ensures this in practice. Its playbook chapter then turns that into operational questions: source of training data, safeguards against bias, privacy controls, reuse of inputs and outputs, and treatment of personal data inside the model lifecycle.

Outside the firms, the public questionnaires mostly confirm the same core. ACC's AI vendor diligence document asks about upstream AI dependencies, rights in training data, customer-data use for training, portability of trained models, validation, transparency, and audit rights. HECVAT and AI-CAIQ push toward evidence, logs, deletion, and supply-chain controls. The real disagreement is about format, not content. NIST says its AI RMF playbook is “neither a checklist nor set of steps to be followed in its entirety”, while Google's VSAQ model favors self-adapting questionnaires that shrink when risk is low. So the consensus is not “use a giant form.” It is “ask a small number of serious questions, then branch.”

Sources for this answer

Law-firm commentary

B.1 Morgan Lewis commentary

Supports the cited proposition. (Morgan Lewis commentary)

training data rights remain critical

See Morgan Lewis, What's New and What's Next: Navigating AI in Technology Transactions.

Law-firm commentary

B.2 Morgan Lewis commentary

When contracting for the use of artificial intelligence in commercial services, parties must address regulatory compliance, intellectual property ownership, and performance standards to effectively allocate risk.

Broadly, responsibility for ensuring an AI tool does not violate applicable laws may fall on the party providing the dataset(s) that train the AI tool.

See Morgan Lewis, Contracting Pointers for Services Incorporating the Use of AI.

Law-firm commentary

B.3 Morgan Lewis, Contract Corner: Ensuring IP Provisions Are Fit for GenAI

Because legislative protections for generative AI outputs are currently inconsistent or absent, parties to software supply agreements should explicitly define IP ownership and indemnity obligations within their contracts.

it is important that contracts relating to the use of GenAI and its outputs address the ownership/licensing of such GenAI outputs in order to document the agreement of the parties in the absence of legislative protections.

See Morgan Lewis, Contract Corner: Ensuring IP Provisions Are Fit for GenAI.

Law-firm commentary

B.4 Cooley commentary

Supports the cited proposition. (Cooley commentary)

AI ethics, liability, transparency and compliance

See Cooley, Model Contractual Clauses for AI Procurement in the EU: Key Takeaways for AI Companies.

Law-firm commentary

B.5 Cooley, Utah, Colorado Pave Way for AI-Specific State Laws: Is Your Company R...

Supports the cited proposition. (Cooley, Utah, Colorado Pave Way for AI-Specific State Laws: Is Your Company R...)

training data, cybersecurity and measures taken to prevent biased and discriminatory outputs

See Cooley, Utah, Colorado Pave Way for AI-Specific State Laws: Is Your Company Ready for the Impending Regulation Wave?.

Law-firm commentary

B.6 Wilson Sonsini, Europe Prepares for a New Era in AI Regulation

Supports the cited proposition. (Wilson Sonsini, Europe Prepares for a New Era in AI Regulation)

carry out due diligence on the vendor

See Wilson Sonsini, Europe Prepares for a New Era in AI Regulation.

Law-firm commentary

B.7 Wilson Sonsini commentary

The EU regulatory landscape for companies using third-party AI tools is governed by a complex framework of overlapping legislation, including the AI Act, the GDPR, and the Digital Services Act, all of which may have extraterritorial reach.

The AI Act introduces a new risk-based legal framework for AI tools that will apply across all industry sectors.

See Wilson Sonsini, L-Suite AI Playbook, Chapter 7.

Commentary

B.8 Association of Corporate Counsel, Vendor Due Diligence Questionnaire

The Association of Corporate Counsel's Vendor Due Diligence Questionnaire provides a framework for assessing AI-related risks to determine the appropriate contractual terms and conditions for third-party vendor agreements.

This questionnaire will also provide Customer with a risk assessment of the Supplier AI Products and Services and proposed use cases in order to determine the appropriate and requisite terms that the Customer must include in its legal agreement with the Supplier

See Association of Corporate Counsel, Vendor Due Diligence Questionnaire.

Commentary

B.9 EDUCAUSE, Higher Education Community Vendor Assessment Toolkit

The HECVAT is a standardized assessment tool provided by EDUCAUSE to assist higher education institutions in evaluating vendor cybersecurity and compliance risks, with specific licensing terms governing its use by institutions, vendors, and third-party platforms.

The Higher Education Community Vendor Assessment Toolkit™ (HECVAT) is a tool designed to help college and university professionals more easily measure vendor risk.

See EDUCAUSE, Higher Education Community Vendor Assessment Toolkit.

Commentary

B.10 Cloud Security Alliance, AI Consensus Assessments Initiative Questionnaire (AI-CAIQ) v1.0.2

The AI-CAIQ provides a structured framework for organizations to evaluate and validate their adherence to AI-specific security, governance, and privacy controls.

The AI-CAIQ (AI Consensus Assessment Initiative Questionnaire) is a structured framework designed to help organizations self-assess and validate their adherence to AI-specific controls across critical domains such as governance, security, privacy, and operational resilience.

See Cloud Security Alliance, AI Consensus Assessments Initiative Questionnaire (AI-CAIQ) v1.0.2.

Agency guidance

B.11 NIST AI RMF Playbook

Supports the cited proposition. (NIST AI RMF Playbook)

neither a checklist nor set of steps to be followed in its entirety

See NIST AI RMF Playbook.

Commentary

B.12 Google, Scalable vendor security reviews

Standardized, open-source questionnaire frameworks can improve the efficiency and scalability of vendor security assessment programs while reducing the administrative burden on vendors.

We scale our efforts through automating much of the initial information gathering and triage portions of the vendor review process.

See Google, Scalable vendor security reviews.

How should legal teams build an AI vendor questionnaire that branches by risk?

Legal teams should make the AI vendor questionnaire executable: intake first, must-pass checks second, and conditional modules for protected health information, retrieval-augmented generation, fine-tuning, or high-impact use. That keeps low-risk purchases moving while preserving escalation for missing evidence or legal blockers.

The first consequence is that the minimum viable questionnaire is not generic procurement. If the tool will touch legal, customer, employee, or health data, the decisive questions are about rights and controls: training, retention, subprocessors, logs, location, portability, indemnity, and scope of contractual paper. A company that gets those wrong can have a beautifully drafted AI policy and still lack a usable legal basis for the deployment it wants.

A further consequence is that the best diligence artifact is executable. Google VSAQ's structure and AI-CAIQ's evidence fields point to the same design: intake first, must-pass conditions second, conditional modules after that, and an output that says more than “approved” or “rejected.” In practice that means the AI is not merely filling in a form. It is collecting clauses, mapping missing evidence, deciding which follow-up module is triggered by PHI, RAG, fine-tuning, or high-impact use, and escalating only the files that actually need human review.
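The intake, must-pass, and conditional-module sequence can be sketched in a few lines. This is a minimal illustration under stated assumptions: the check names, module names, status labels, and the `evaluate` function are all hypothetical, and a real file would carry far more evidence detail than three booleans.

```python
from dataclasses import dataclass, field
from typing import Optional

# All field, check, module, and status names here are hypothetical.
@dataclass
class Intake:
    vendor: str
    product: str
    touches_phi: bool = False
    uses_rag: bool = False
    fine_tunes: bool = False
    high_impact_use: bool = False

# Must-pass checks. In the answers dict: True = evidenced,
# False = failed, None or absent = no evidence collected yet.
MUST_PASS = [
    "no_training_on_customer_data",
    "deletion_at_termination",
    "subprocessor_list_available",
]

# Intake flag -> conditional module that the flag triggers.
CONDITIONAL_MODULES = {
    "touches_phi": "phi",
    "uses_rag": "rag",
    "fine_tunes": "fine_tuning",
    "high_impact_use": "high_impact",
}

@dataclass
class Result:
    status: str
    failed: list[str] = field(default_factory=list)
    missing_evidence: list[str] = field(default_factory=list)
    modules: list[str] = field(default_factory=list)

def evaluate(intake: Intake, answers: dict[str, Optional[bool]]) -> Result:
    """Run must-pass checks, then list the modules the intake triggers."""
    failed = [c for c in MUST_PASS if answers.get(c) is False]
    missing = [c for c in MUST_PASS if answers.get(c) is None]
    modules = [m for flag, m in CONDITIONAL_MODULES.items() if getattr(intake, flag)]
    if failed:
        status = "blocked"      # a legal blocker: stop here
    elif missing:
        status = "escalate"     # evidence gap: needs human review
    elif modules:
        status = "follow_up"    # passes baseline, longer branch applies
    else:
        status = "proceed"      # low-risk purchase keeps moving
    return Result(status, failed, missing, modules)

if __name__ == "__main__":
    intake = Intake("ExampleVendor", "example-assistant", touches_phi=True)
    result = evaluate(intake, {
        "no_training_on_customer_data": True,
        "deletion_at_termination": True,
    })
    print(result.status, result.missing_evidence, result.modules)
```

The output object carries the failed checks, the evidence gaps, and the triggered modules separately, so the reviewer sees why a file was routed rather than only where it landed.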

Another consequence is that a public questionnaire can easily become accidental gatekeeping. NIST warns against mistaking governance material for a complete checklist, and GSA's generative AI acquisition guidance points buyers toward tailored questions and testbeds rather than one frozen form for every purchase. That matters for legal teams adopting a skills-file model. The file becomes most useful when it distinguishes must-pass failures from deferred questions. Otherwise it stops being a diligence tool and starts being a “no.”

Sources for this answer

Law-firm commentary

C.1 Morgan Lewis commentary

Supports the cited proposition. (Morgan Lewis commentary)

training data rights remain critical

See Morgan Lewis, What's New and What's Next: Navigating AI in Technology Transactions.

Commentary

C.2 Association of Corporate Counsel, Vendor Due Diligence Questionnaire

This questionnaire will also provide Customer with a risk assessment of the Supplier AI Products and Services and proposed use cases in order to determine the appropriate and requisite terms that the Customer must include in its legal agreement with the Supplier

See Association of Corporate Counsel, Vendor Due Diligence Questionnaire.

Commentary

C.3 Google, Scalable vendor security reviews

Standardized, open-source questionnaire frameworks can improve the efficiency and scalability of vendor security assessment programs while reducing the administrative burden on vendors.

We scale our efforts through automating much of the initial information gathering and triage portions of the vendor review process.

See Google, Scalable vendor security reviews.

Commentary

C.4 Cloud Security Alliance, AI Consensus Assessments Initiative Questionnaire (AI-CAIQ) v1.0.2

The AI-CAIQ provides a structured framework for organizations to evaluate and validate their adherence to AI-specific security, governance, and privacy controls.

The AI-CAIQ (AI Consensus Assessment Initiative Questionnaire) is a structured framework designed to help organizations self-assess and validate their adherence to AI-specific controls across critical domains such as governance, security, privacy, and operational resilience.

See Cloud Security Alliance, AI Consensus Assessments Initiative Questionnaire (AI-CAIQ) v1.0.2.

Agency guidance

C.5 NIST AI RMF Playbook

Supports the cited proposition. (NIST AI RMF Playbook)

neither a checklist nor set of steps to be followed in its entirety

See NIST AI RMF Playbook.

Agency guidance

C.6 GSA, GSA releases generative AI acquisition resource guide for federal buyers

The GSA has issued a resource guide to assist federal contracting officers in the responsible acquisition and procurement of generative AI technologies.

The guide includes considerations for the responsible acquisition of generative AI and introduces questions that contracting officers should ask to make informed procurement decisions.

See GSA, GSA releases generative AI acquisition resource guide for federal buyers.

Why must AI vendor privacy promises be checked by product and endpoint?

AI vendor privacy promises must be checked by product and endpoint because no-training language, retention settings, zero data retention, business associate agreement scope, and logs can differ across services. A usable diligence record ties each answer to the exact product, endpoint, deployment pattern, and evidence URL.

A further consequence is that vendor answers have to be product-specific. The large providers now publish much better privacy positions than they did in 2023, but the detail matters. OpenAI, Anthropic, Google Cloud, and Microsoft all distinguish, in different ways, between “not used for training,” retention needed to operate the service, zero-retention options, BAA scope, and product-specific logging. That means a one-line answer like “enterprise data is not used to train models” can be directionally true and still miss the actual issue. A useful skills file therefore records product name, endpoint, deployment pattern, and evidence URL for each material answer.
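One way to make that record concrete is a small structure that refuses to count an answer as evidenced until every locating field is filled in. The sketch below is an assumption-laden illustration: the `EvidenceRecord` type, its field names, the `gaps` helper, and the example vendor and URL are all hypothetical placeholders, not drawn from any vendor's documentation.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical record shape: one row per material vendor answer.
@dataclass(frozen=True)
class EvidenceRecord:
    topic: str               # e.g. "training", "retention", "BAA scope"
    product: str             # exact product name, not the vendor name
    endpoint: str            # API or service endpoint the answer covers
    deployment_pattern: str  # e.g. "direct API", "cloud marketplace"
    answer: str              # the vendor's stated position
    evidence_url: str        # where the position is published

def gaps(record: EvidenceRecord) -> list[str]:
    """Return the names of fields that are blank or, for the URL, not http(s)."""
    text_fields = ("topic", "product", "endpoint", "deployment_pattern", "answer")
    missing = [name for name in text_fields if not getattr(record, name).strip()]
    parsed = urlparse(record.evidence_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        missing.append("evidence_url")
    return missing

if __name__ == "__main__":
    # Placeholder values only; example.com is a reserved documentation domain.
    vague = EvidenceRecord(
        topic="training",
        product="ExampleVendor Enterprise",
        endpoint="",
        deployment_pattern="direct API",
        answer="Enterprise data is not used to train models.",
        evidence_url="",
    )
    print(gaps(vague))
```

A record like the one above reports its endpoint and evidence URL as gaps, which is the practical difference between a directionally true answer and a usable one.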

Another consequence is that “external AI vendor” is no longer a proxy for unacceptable controls. That is perhaps the source-set fact that cuts most against easy skepticism. Major vendors now publish DPAs, subprocessor lists, HIPAA pathways, and some form of no-training-by-default language for commercial products. The dividing line is becoming less “third party versus not third party” and more “does this workflow, on this product, with these controls, fit the company's data and use case?”

Sources for this answer

Vendor documentation

D.1 OpenAI, Data controls in the OpenAI platform

OpenAI provides enterprise customers with specific data controls, including options to opt out of model training, configure data retention periods, select data residency regions, and implement customer-managed encryption keys.

As of March 1, 2023, data sent to the OpenAI API is not used to train or improve OpenAI models (unless you explicitly opt in to share data with us).

See OpenAI, Data controls in the OpenAI platform.

Vendor documentation

D.2 Anthropic, Is my data used for model training?

Anthropic's commercial terms specify that user inputs and outputs are not used for model training by default, unless the user explicitly provides feedback or opts into data usage.

By default, we will not use your inputs or outputs from our commercial products (e.g. Claude for Work, Anthropic API, Claude Gov, etc.) to train our models.

See Anthropic, Is my data used for model training?.

Vendor documentation

D.3 Google Cloud, Service Specific Terms

Google Cloud's Service Specific Terms impose strict limitations on the use of Customer Data for model training and prohibit customers from using AI/ML services to reverse engineer, develop competing products, or circumvent Google's proprietary model restrictions.

Google will not use Customer Data to train or fine-tune any AI/ML models without Customer's prior permission or instruction.

See Google Cloud, Service Specific Terms.

Vendor documentation

D.4 Microsoft, Data, privacy, and security for Azure Direct Models in Microsoft Foundry

Microsoft maintains strict data privacy protections for Azure Direct Models, ensuring that customer prompts, completions, and training data are not used to train or improve Microsoft or third-party AI models without explicit customer authorization.

Your prompts (inputs) and completions (outputs), your embeddings, and your training data: - are NOT available to other customers. - are NOT available to OpenAI or other Azure Direct Model providers. - are NOT used by Azure Direct Model providers to improve their models or services.

See Microsoft, Data, privacy, and security for Azure Direct Models in Microsoft Foundry.

Vendor documentation

D.5 OpenAI, Enterprise privacy at OpenAI

OpenAI provides enterprise-level data privacy and security controls, including user ownership of inputs and outputs and a default policy against training models on customer data.

We do not train our models on your data by default

See OpenAI, Enterprise privacy at OpenAI.

Vendor documentation

D.6 Anthropic, Business Associate Agreements (BAA) for Commercial Products

Anthropic's Business Associate Agreement (BAA) coverage is limited to specific HIPAA-ready services and features, requiring administrative activation and compliance with defined configuration requirements.

For Claude Enterprise features to be covered under a BAA, an administrator must activate HIPAA compliance in the HIPAA-ready Claude Enterprise admin settings under “Data & Privacy” and sign Anthropic's BAA.

See Anthropic, Business Associate Agreements (BAA) for Commercial Products.

Vendor documentation

D.7 Google Cloud, Vertex AI and zero data retention

Google Cloud provides mechanisms for customers to restrict the use of their data for model training and to manage or disable data retention and caching features within Vertex AI.

Google won't use your data to train or fine-tune any AI/ML models without your prior permission or instruction.

See Google Cloud, Vertex AI and zero data retention.

Vendor documentation

D.8 Microsoft, Monitor Azure OpenAI in Microsoft Foundry Models

Azure Monitor provides a centralized framework for collecting, analyzing, and alerting on system performance and operational logs for Azure OpenAI resources.

The Azure Monitor service collects and aggregates metrics and logs from every component of your system.

See Microsoft, Monitor Azure OpenAI in Microsoft Foundry Models.

Which AI vendor diligence questions are still unsettled for legal teams?

The unsettled questions are whether a market-standard legal-department AI request for proposal exists, how much high-impact-use diligence belongs in the baseline file, and which popular questions actually predict failure. The current source set supports tiering more strongly than a single mandatory questionnaire.

The public-source record is still thin on one basic issue: whether there is a market-standard legal-department AI RFP. The research surfaced ACC's questionnaire, a California bar RFP, public-sector materials, and vendor-side templates, but not a settled private-market standard for legal teams. Perhaps that is why the better frame is a skills file rather than a canonical questionnaire. The standard seems to be emerging at the level of topics, not document form.

It is also unsettled how much high-impact use belongs in the baseline file. Cooley and Wilson Sonsini both pull bias, validation, and transparency into diligence, especially when regulated or consequential uses are involved. But the same source set suggests those are often conditional modules rather than universal opening questions. We think the honest reading is that a hiring tool, clinical support tool, or other high-impact system belongs on a longer branch than a drafting assistant inside legal ops.

Another open question is which popular questions actually predict failure. The materials support training rights, retention, deletion, logs, subprocessors, and portability as load-bearing. They offer weaker support for broad questions like “Do you follow the NIST AI RMF?” or “Do you have a responsible AI policy?” Those may be useful maturity signals. They do not seem to do the legal work that the smaller set does. That is an inference rather than a rule, but it is a strong one.

And there is a structural uncertainty in publishing the file itself. Once a public questionnaire exists, business teams may treat it as mandatory even for small vendors or pilot tools. The source set leans against that outcome, but perhaps not strongly enough to prevent it without explicit tiering. A skills file that does not mark questions as “must-pass,” “context-dependent,” or “defer for pilot” may end up producing more process than diligence.
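Explicit tiering can be as simple as tagging each question and filtering by deployment context. The sketch below is one hypothetical reading: the assignment of topics to tiers follows this answer's discussion, but the tier boundaries, the `Tier` enum, and the `questions_for` function are illustrative assumptions rather than a settled standard.

```python
from enum import Enum

class Tier(Enum):
    MUST_PASS = "must-pass"
    CONTEXT_DEPENDENT = "context-dependent"
    DEFER_FOR_PILOT = "defer for pilot"

# Illustrative tier assignment. The load-bearing topics come from the text;
# placing maturity signals in the deferrable tier is an assumption.
QUESTION_TIERS = {
    "training rights": Tier.MUST_PASS,
    "retention": Tier.MUST_PASS,
    "deletion": Tier.MUST_PASS,
    "logs": Tier.MUST_PASS,
    "subprocessors": Tier.MUST_PASS,
    "portability": Tier.MUST_PASS,
    "bias controls": Tier.CONTEXT_DEPENDENT,
    "validation": Tier.CONTEXT_DEPENDENT,
    "transparency": Tier.CONTEXT_DEPENDENT,
    "NIST AI RMF alignment": Tier.DEFER_FOR_PILOT,
    "responsible AI policy": Tier.DEFER_FOR_PILOT,
}

def questions_for(is_pilot: bool, high_impact: bool) -> list[str]:
    """Select the topics to ask, given the deployment context."""
    selected = []
    for topic, tier in QUESTION_TIERS.items():
        if tier is Tier.MUST_PASS:
            selected.append(topic)           # always asked
        elif tier is Tier.CONTEXT_DEPENDENT and high_impact:
            selected.append(topic)           # only on the longer branch
        elif tier is Tier.DEFER_FOR_PILOT and not is_pilot:
            selected.append(topic)           # skipped while piloting
    return selected

if __name__ == "__main__":
    print(questions_for(is_pilot=True, high_impact=False))
```

Under these assumptions a low-impact pilot is asked only the six must-pass topics, while a high-impact production deployment is asked all eleven.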

Sources for this answer

Law-firm commentary

E.5 Morgan Lewis commentary

Supports the cited proposition. (Morgan Lewis commentary)

training data rights remain critical

See Morgan Lewis, What's New and What's Next: Navigating AI in Technology Transactions.

Law-firm commentary

E.3 Cooley, Utah, Colorado Pave Way for AI-Specific State Laws: Is Your Company R...

Supports the cited proposition. (Cooley, Utah, Colorado Pave Way for AI-Specific State Laws: Is Your Company R...)

training data, cybersecurity and measures taken to prevent biased and discriminatory outputs

See Cooley, Utah, Colorado Pave Way for AI-Specific State Laws: Is Your Company Ready for the Impending Regulation Wave?.

Law-firm commentary

E.4 Wilson Sonsini commentary

The AI Act introduces a new risk-based legal framework for AI tools that will apply across all industry sectors.

See Wilson Sonsini, L-Suite AI Playbook, Chapter 7.

Agency guidance

E.6 NIST AI RMF Playbook

Supports the cited proposition. (NIST AI RMF Playbook)

neither a checklist nor set of steps to be followed in its entirety

See NIST AI RMF Playbook.

Commentary

E.1 Association of Corporate Counsel, Vendor Due Diligence Questionnaire

This questionnaire will also provide Customer with a risk assessment of the Supplier AI Products and Services and proposed use cases in order to determine the appropriate and requisite terms that the Customer must include in its legal agreement with the Supplier

See Association of Corporate Counsel, Vendor Due Diligence Questionnaire.

Commentary

E.2 State Bar of California, Request for Proposal: Legal Operations, Technology, ...

The State Bar of California is a public corporation within the judicial branch of state government, established by the Legislature and the California Constitution, with a mandate to regulate the legal profession and protect the public.

The State Bar, created in 1927 by the Legislature and adopted as a judicial branch agency by amendment to the California Constitution in 1960, is a public corporation within the judicial branch of state government.

See State Bar of California, Request for Proposal: Legal Operations, Technology, and Artificial Intelligence Consultant.