SOC 2 and AI governance: mapping LLM controls to Trust Services Criteria

SOC 2's Trust Services Criteria predate LLMs by more than a decade. The controls were written for a different shape of system — webapps, batch jobs, databases — but they apply, with careful translation, to AI workflows. The work is mostly translation, not invention.

The mistake I see when AI workflows hit a first SOC 2 audit is that the team treats the AI surface as a special category that needs its own framework. It does not. The TSCs are general enough to cover it. They just need to be mapped explicitly so the auditor does not have to do the mapping in their head — and so the next audit's reviewer can see continuity.

Here is the mapping I use on Build engagements, with the artifacts each control wants to see for an LLM-powered workflow.

CC1 — Control Environment

The auditor wants to see that someone owns AI governance. Not a committee, not a slide. A named human, with a job description that includes the word "AI."

For a small company this can be the CTO, the Head of Engineering, or a fractional AI/CTO if the team has not yet hired one. For a larger org it is usually a dedicated AI governance role inside the security or compliance function. What matters is that there is an accountability chain from a model output back to a person who is responsible for the policies that produced it.

Artifacts: an AI Governance Charter (one to three pages), org chart with AI accountability marked, role description that mentions AI explicitly.

CC2 — Communication and Information

This control is about how the system communicates what it is doing — to users, to operators, to leadership. For AI workflows, that means model behavior is documented and changes are announced.

The artifact auditors look at most often: a model card, or its internal equivalent. What model is in production, what version, what its known limitations are, what its evaluation results were on the most recent run.
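An internal model card does not need special tooling. The sketch below is one possible shape, not a standard: the class name, field names, and example values are all assumptions made for illustration. It records the four things listed above and flags any that are still empty.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass(frozen=True)
class ModelCard:
    """Internal model card: what is in production and how it was last evaluated."""
    model_name: str            # hypothetical identifier
    model_version: str
    deployed_on: str           # ISO date this version went to production
    known_limitations: list[str] = field(default_factory=list)
    eval_results: dict[str, float] = field(default_factory=dict)
    last_eval_run: str = ""    # ISO date of the most recent evaluation run

    def missing_fields(self) -> list[str]:
        """Return the names of fields that are still empty."""
        return [name for name, value in asdict(self).items() if not value]


# Example values are invented for illustration.
card = ModelCard(
    model_name="example-llm",
    model_version="2024-06-01",
    deployed_on="2024-06-15",
    known_limitations=["May invent medication names in free-text drafts"],
    eval_results={"golden_set_pass_rate": 0.96},
    last_eval_run="2024-07-01",
)
print(json.dumps(asdict(card), indent=2))
```

Keeping the card as structured data in the same repo as the prompts means every model or version change shows up in the change log as a reviewable diff.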

Artifacts: internal model card or model selection memo, change log for model and prompt versions, user-facing copy that discloses AI involvement where required.

CC3 — Risk Assessment

The risk register has to include AI-specific risks: prompt injection, training data leakage if you ever fine-tune, hallucination in clinical or compliance-critical outputs, vendor concentration, model deprecation. Each one gets a likelihood, an impact, and a mitigation.

The version of the risk register that is most useful in audit is the one that names the worst plausible outcome of each risk in business terms, not technical terms. "Model hallucinates a medication name in a draft" is sharper than "LLM produces incorrect output."
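A risk register entry with those three attributes can be sketched as a small data structure. The 1-to-5 scales, the likelihood-times-impact score, and every example row below are assumptions for illustration; use whatever scoring scheme your existing register already follows.

```python
from dataclasses import dataclass


@dataclass
class AIRisk:
    name: str
    worst_outcome: str   # stated in business terms, not technical terms
    likelihood: int      # 1 (rare) to 5 (frequent); the scale is an assumption
    impact: int          # 1 (minor) to 5 (severe); the scale is an assumption
    mitigation: str

    @property
    def score(self) -> int:
        """Simple likelihood x impact score used only for ordering."""
        return self.likelihood * self.impact


# Illustrative entries; the numbers are invented, not measured.
register = [
    AIRisk("Prompt injection",
           "Crafted input makes the workflow reveal another patient's data",
           2, 5, "Input filtering; tool allow-list"),
    AIRisk("Hallucination",
           "Model hallucinates a medication name in a draft",
           3, 5, "Clinician review before anything is sent"),
    AIRisk("Model deprecation",
           "Vendor retires the model and the workflow stops",
           3, 3, "Pinned version; tested fallback model"),
]

# Print the register with the highest-scoring risk first.
for risk in sorted(register, key=lambda r: r.score, reverse=True):
    print(f"{risk.score:>2}  {risk.name}: {risk.worst_outcome}")
```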

Artifacts: AI risk register, threat model document, mitigations mapped to controls.

CC4 — Monitoring Activities

This is where AI workflows often have the weakest story. Traditional system monitoring is mature; AI output monitoring is newer. The auditor wants to see that you are watching the model in production the same way you would watch any other production system.

The monitoring shape that works: a small evaluation harness with golden cases that runs on schedule (daily or weekly), a drift dashboard that compares current performance to baseline, and a defined alerting threshold that triggers an investigation. You do not need fancy ML monitoring tools for this. A cron job that runs evals and posts to Slack is often enough for a first audit.
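A minimal sketch of that harness follows. The golden cases, the baseline figure, the alert threshold, and the substring-match pass criterion are all assumptions chosen to keep the example short. The `fake_model` function is a stand-in for your production model call, and the scheduling and Slack posting are left out.

```python
from typing import Callable

# Golden cases: (prompt, substring the answer must contain). Illustrative only.
GOLDEN_CASES = [
    ("Summarize: patient reports mild headache.", "headache"),
    ("Extract the visit date: seen on 2024-03-02.", "2024-03-02"),
    ("Classify urgency: chest pain radiating to left arm.", "high"),
]

BASELINE_PASS_RATE = 1.0   # assumed value recorded when this model version shipped
ALERT_DROP = 0.10          # assumed threshold: investigate past a 10-point drop


def run_evals(model: Callable[[str], str]) -> float:
    """Run every golden case and return the fraction that passed."""
    passed = sum(
        expected.lower() in model(prompt).lower()
        for prompt, expected in GOLDEN_CASES
    )
    return passed / len(GOLDEN_CASES)


def needs_investigation(pass_rate: float) -> bool:
    """True when the drop from baseline exceeds the alerting threshold."""
    return (BASELINE_PASS_RATE - pass_rate) > ALERT_DROP


def fake_model(prompt: str) -> str:
    """Stand-in for the real model call; it just echoes the prompt."""
    return prompt


rate = run_evals(fake_model)
print(f"pass rate {rate:.2f}, investigate: {needs_investigation(rate)}")
```

The echo model passes the first two cases and fails the third, so the run crosses the threshold. In a real setup the output of each run, with its date and the test-case version, is the evidence the auditor samples.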

Artifacts: evaluation harness with versioned test cases, drift dashboard, alerting playbook.

CC5 — Control Activities

Standard SOC 2 control documentation, applied to AI-specific processes. The audit asks for evidence that controls are designed, implemented, and reviewed periodically. AI controls show up as a subset: prompt review, skill file review, evaluation review, governance meeting cadence.

Artifacts: control matrix that includes AI-specific controls, meeting minutes from the governance forum.

CC6 — Logical and Physical Access

The AI orchestration layer is access-controlled like every other production system. Federated where possible. RBAC mapped to clinical, operational, and administrative roles. Logged.

In practice for healthcare workflows this means: the model API key is in a secrets manager, not committed. The orchestration service is behind your existing SSO. Skill files and prompt templates are in a repo with PR-based access control. Logs of who saw what are kept for the same retention period as PHI logs.
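Two of those points, the key living outside the code and access decisions being logged, can be sketched in a few lines. The environment variable name, the role names, and the permission names are hypothetical. How the secret reaches the environment depends on your secrets manager and is not shown.

```python
import logging
import os

logging.basicConfig(level=logging.INFO)
access_log = logging.getLogger("ai.access")

# Role-to-permission map; role and permission names are hypothetical.
ROLE_PERMISSIONS = {
    "clinical": {"view_output", "run_workflow"},
    "operational": {"run_workflow"},
    "administrative": {"view_output", "run_workflow", "edit_prompts"},
}


def load_model_api_key() -> str:
    """Read the key injected at runtime; it is never hard-coded or committed."""
    key = os.environ.get("MODEL_API_KEY")  # variable name is an assumption
    if not key:
        raise RuntimeError("MODEL_API_KEY is not set; check secrets injection")
    return key


def authorize(user: str, role: str, permission: str) -> bool:
    """Check the role's permissions and log the decision either way."""
    allowed = permission in ROLE_PERMISSIONS.get(role, set())
    access_log.info(
        "user=%s role=%s permission=%s allowed=%s",
        user, role, permission, allowed,
    )
    return allowed


print(authorize("alice", "clinical", "view_output"))
print(authorize("bob", "operational", "edit_prompts"))
```

Logging denials as well as grants matters here: the auditor's sample should show that the control both permits and refuses.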

Artifacts: access matrix, secrets management evidence, SSO integration documentation, audit log retention policy.

CC7 — System Operations

You have monitoring, incident response, and change management — applied to prompts and skill files specifically, not just to code.

The lift here is that "code" historically meant compiled artifacts. For an LLM workflow, prompt templates and skill files behave like code: they change output behavior, they need review, they need version control, they need rollback. The control language has to be updated to reflect that.
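To make "prompts behave like code" concrete, here is a toy in-memory registry that gives a prompt template a version fingerprint and a rollback path. It is only an illustration of the properties the control language needs to name. The class, method, and prompt names are hypothetical, and in practice a git repo with tagged releases usually provides the same guarantees.

```python
import hashlib


class PromptRegistry:
    """Keeps every released version of a prompt template so it can be rolled back."""

    def __init__(self) -> None:
        self._history: dict[str, list[str]] = {}

    def release(self, name: str, template: str) -> str:
        """Record a new version and return its content hash for the change log."""
        self._history.setdefault(name, []).append(template)
        return self.fingerprint(template)

    def current(self, name: str) -> str:
        """Return the template text currently in effect."""
        return self._history[name][-1]

    def rollback(self, name: str) -> str:
        """Drop the latest version and return the one now in effect."""
        versions = self._history[name]
        if len(versions) < 2:
            raise ValueError(f"no earlier version of {name!r} to roll back to")
        versions.pop()
        return versions[-1]

    @staticmethod
    def fingerprint(template: str) -> str:
        """Short SHA-256 prefix that identifies the exact template text."""
        return hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]


registry = PromptRegistry()
v1 = registry.release("discharge_summary", "Summarize the visit in plain language.")
v2 = registry.release("discharge_summary", "Summarize the visit. List medications.")
print("released", v1, "then", v2)
print("after rollback:", registry.rollback("discharge_summary"))
```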

Artifacts: incident response runbook that names AI-specific failure modes, change management procedure that covers prompt and skill file changes, monitoring documentation.

CC8 — Change Management

Prompt changes go through PR review. A clinical or domain reviewer is named on changes that affect output behavior. Reviews are tracked in a way the auditor can sample.

This is the control where I see the most pushback from engineering teams. The instinct is that prompt changes are "just text" and shouldn't need a clinical signoff. They do. A prompt change alters what reaches clinicians or compliance officers, so it needs the same gate as the rest of your clinical product.
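The gate itself can be expressed as a small check that runs in CI or a merge hook. This is a sketch under assumptions: the `prompts/` and `skills/` directory names, the `ChangeRequest` shape, and the reviewer handles are invented. Wiring it to your code host's review data is not shown.

```python
from dataclasses import dataclass, field
from typing import Optional

# Paths treated as behavior-changing; the directory names are assumptions.
PROMPT_PATHS = ("prompts/", "skills/")


@dataclass
class ChangeRequest:
    changed_files: list[str]
    approvers: set[str] = field(default_factory=set)
    domain_reviewer: Optional[str] = None   # taken from the PR template field


def gate(change: ChangeRequest) -> list[str]:
    """Return the reasons a change cannot merge; an empty list means it can."""
    problems = []
    touches_prompts = any(f.startswith(PROMPT_PATHS) for f in change.changed_files)
    if not change.approvers:
        problems.append("needs at least one approving review")
    if touches_prompts and not change.domain_reviewer:
        problems.append("prompt or skill file change needs a named domain reviewer")
    elif touches_prompts and change.domain_reviewer not in change.approvers:
        problems.append("named domain reviewer has not approved")
    return problems


# A prompt change approved by an engineer but with no domain reviewer named.
print(gate(ChangeRequest(["prompts/triage.txt"], {"dev1"})))
```

Because the check returns reasons rather than a bare pass or fail, its output can go straight into the review log the auditor samples.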

Artifacts: PR template that includes domain reviewer field, review log, named reviewers.

CC9 — Risk Mitigation

You have insurance, vendor management, and BAA chains documented. For AI workflows, the BAA chain is the one that gets the most scrutiny — see HIPAA-compliant AI automation for the full version of that story.

Artifacts: BAA register, vendor risk assessments, cyber insurance policy.

Confidentiality and Privacy

These criteria apply when you carry confidential or personal data. For healthcare AI, almost every workflow touches them.

The two AI-specific moves here:

PHI minimization at the prompt level. Only the minimum data shape needed for the task goes into the prompt. Build a de-identification or tokenization layer where the task allows.
Confidentiality of prompt and output logs. The log store has the same access controls as the source data. It is not exempt because it is "logs."
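The first move, minimization at the prompt level, can be sketched as a per-task field allow-list plus a keyed token in place of the record identifier. The task names, field names, and the choice of HMAC-SHA-256 for tokenization are assumptions for illustration. Note that free-text fields such as notes can still contain PHI, and this sketch does not de-identify them.

```python
import hashlib
import hmac

# Fields each task may send to the model; task and field names are hypothetical.
TASK_ALLOWED_FIELDS = {
    "appointment_reminder": {"first_name", "appointment_time"},
    "visit_summary": {"patient_token", "visit_notes"},
}


def tokenize(value: str, secret: bytes) -> str:
    """Replace an identifier with a keyed token that cannot be reversed without the key."""
    return hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def minimize(record: dict[str, str], task: str, secret: bytes) -> dict[str, str]:
    """Keep only the fields the task needs, swapping the MRN for a token."""
    enriched = dict(record)
    if "mrn" in record:
        enriched["patient_token"] = tokenize(record["mrn"], secret)
    allowed = TASK_ALLOWED_FIELDS[task]
    return {k: v for k, v in enriched.items() if k in allowed}


# Invented record; the key below is for demonstration only.
record = {
    "mrn": "12345",
    "first_name": "Ana",
    "ssn": "000-00-0000",
    "visit_notes": "Follow-up in two weeks.",
    "appointment_time": "2024-07-01T09:00",
}
prompt_fields = minimize(record, "visit_summary", secret=b"demo-only-key")
print(sorted(prompt_fields))
```

An allow-list fails closed: a new field added to the source record never reaches the prompt until someone deliberately adds it to the task's list.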

Artifacts: data classification policy that includes prompt and inference data, encryption documentation, log access controls.

What this looks like in practice

In my experience, a SOC 2 audit for an AI-powered healthcare workflow ends up being 30–40% longer in the first round than for a non-AI product of the same size. After the first round it normalizes, because the controls are not new, just translated.

The teams that pass cleanly are the ones that did the translation work in advance: they sat down with the TSCs, walked through them with a senior engineer who understood both compliance and the AI architecture, and produced the artifacts above. The teams that struggle are the ones that get to the audit and try to retrofit governance onto a working system.

If you are heading into a SOC 2 audit and the AI surface is the part making you nervous, book a call. The translation is doable. The right time to start is before the auditor's first meeting, not after. The full mapping, including evidence collection cadence, and the broader story on healthcare AI compliance are in the Healthcare AI Automation Playbook.