Hiring a healthcare AI consultant: a CTO's checklist

The "healthcare AI consultant" market doubled in 2024 and doubled again in 2025. Most of the new entrants are real. Some are good. A small share can sit in a room with your security officer, your compliance lead, and your medical director and survive the conversation. Picking from this market without a structured filter is how digital health companies lose a quarter to a consultant who cannot actually do the work.

This is the checklist I use when sourcing AI specialists into companies I advise, and the same one I would use if I were hiring someone like me from the buyer side. It is shaped for engagements where the consultant will touch PHI, ship code, or influence the AI architecture — not for one-off advisory.

Step 1. Define the engagement shape before sourcing

The mistake that wastes the most cycles is starting candidate calls without a clear engagement shape. The conversation becomes a Rorschach test — each consultant tells a different story about what they could do — and the buyer has no way to compare.

Three engagement shapes are common, and each calls for a different kind of consultant:

A Build engagement — ships a working AI workflow into production over 4 to 8 weeks. Fixed scope, fixed price. The consultant is hands-on with code.
A Fractional AI/CTO engagement — embedded senior leadership for the AI surface. Strategy, hiring support, architecture review, governance. 3-month minimum to deliver value.
An advisory engagement — occasional senior judgment without operational responsibility. Office hours, board prep, vendor evaluation. Lowest commitment, lowest leverage.

Write the shape down before the first call. If the answer is "it depends on what they suggest," do that thinking first — the discovery is too valuable to outsource to a vendor pitch.

For published-pricing examples of the first two shapes, see the services page.

Step 2. Source from credible filters, not from LinkedIn outreach

The signal-to-noise ratio on inbound LinkedIn outreach for AI consulting work is poor in 2025–26. Better sources, ranked by signal strength:

Referrals from CTOs who have actually shipped clinical AI. Three referrals from people who have done the work are worth more than thirty profiles you find on your own.
Authors of substantive public writing on healthcare AI. Not LinkedIn carousels. Long-form posts on real architectural questions, with named systems and named decisions. The writing is a sample of the thinking.
Public talks at clinical-AI conferences — HIMSS, HLTH, ViVE, Anthropic / OpenAI dev days when they have healthcare tracks. Not the keynotes; the working sessions where the practitioners talk.
GitHub repos with production-shaped code — evaluation harnesses, skill files, MCP servers. Public artifacts that show the practitioner can actually build, not just narrate.

The implicit filter in all four: someone who is putting their thinking into public, named artifacts has more skin in the game than someone whose entire pitch is a deck.

Step 3. The five questions that separate signal from noise

By the second call, the candidate should be able to answer these without hedging.

"Walk me through the last production AI system you shipped, end to end."

The healthy answer names the model provider, the orchestration layer, the data shape, the BAA chain, the evaluation harness, the governance forum, and what broke in the first month. The unhealthy answer is generic: "we used GPT-4 to summarize documents and it saved 40% of clinician time." That is a marketing claim, not an engineering story.

"How do you handle BAAs in the data path?"

This is the fastest single question to filter for healthcare-AI competence. The candidate should be able to walk through the specific BAA chain they design (provider, cloud, RAG/vector store, observability) and the way they verify it before any prompt goes to production. If they say "we just use Anthropic's enterprise tier and it's all covered," ask a follow-up about the cloud and the orchestration layer. The depth of the answer is the signal.

For the depth this should reach, see HIPAA-compliant AI automation.
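To make the BAA question concrete, here is a minimal sketch of the data-path inventory a buyer might ask a candidate to fill in. It assumes the path can be described as a flat list of hops. The `DataPathComponent` schema, `REQUIRED_ROLES`, and the `missing_baas` and `undocumented_roles` helpers are all hypothetical. The vendor names are placeholders, and a boolean flag stands in for actually reading each signed agreement.

```python
from dataclasses import dataclass

# Hops named in the BAA question above; a real data path may have more (assumption).
REQUIRED_ROLES = {"model provider", "cloud", "orchestration", "vector store", "observability"}


@dataclass(frozen=True)
class DataPathComponent:
    """One hop that PHI passes through (hypothetical schema, not a standard)."""
    role: str
    vendor: str
    touches_phi: bool
    baa_signed: bool  # stand-in for "a signed BAA is on file and has been read"


def missing_baas(chain):
    """Return 'role: vendor' for every PHI-touching hop without a signed BAA."""
    return [f"{c.role}: {c.vendor}" for c in chain if c.touches_phi and not c.baa_signed]


def undocumented_roles(chain):
    """Return required roles that are absent from the inventory, sorted by name."""
    return sorted(REQUIRED_ROLES - {c.role for c in chain})


# Example inventory; vendor names are placeholders.
chain = [
    DataPathComponent("model provider", "ExampleLLM", touches_phi=True, baa_signed=True),
    DataPathComponent("cloud", "ExampleCloud", touches_phi=True, baa_signed=True),
    DataPathComponent("vector store", "ExampleVectorDB", touches_phi=True, baa_signed=False),
    DataPathComponent("observability", "ExampleTracing", touches_phi=True, baa_signed=False),
]

print("Unsigned BAAs:", missing_baas(chain))
print("Undocumented hops:", undocumented_roles(chain))
```

Run against this example inventory, the sketch flags the vector store and observability hops as unsigned. It also reports the orchestration layer as missing from the inventory entirely. An "the enterprise tier covers it" answer should trigger exactly that follow-up.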

"Show me an evaluation harness you've built or contributed to."

A good answer is a public repo, a sanitized internal example, or a working description that covers the test-case shape, the labeling process, the rollup metric, and the cadence of runs. Candidates who cannot describe an eval harness in concrete terms have probably not yet shipped AI in a setting that required one.

For the shape of a healthcare-grade eval harness, see evaluation frameworks for clinical AI.
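For calibration, this sketch shows roughly the smallest thing that deserves the name, under stated assumptions:

- `EvalCase`, `run_eval`, `failing_categories`, `exact_match`, and `always_routine` are hypothetical names.
- The cases are synthetic.
- The clinician-assigned label is assumed to be a single string.
- Exact matching stands in for whatever grader the workflow really needs.

It covers the test-case shape, the rollup metric, and a release gate. The cadence (a nightly job or a CI step) would wrap around it.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class EvalCase:
    """One labeled test case (hypothetical shape)."""
    case_id: str
    category: str   # workflow slice the case exercises
    prompt: str
    expected: str   # label assigned by a clinician reviewer (assumption)


def run_eval(
    cases: List[EvalCase],
    model_fn: Callable[[str], str],
    grade: Callable[[str, str], bool],
) -> Tuple[float, Dict[str, float]]:
    """Run every case; return the overall pass rate and the pass rate per category."""
    if not cases:
        raise ValueError("an eval run needs at least one case")
    passed: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for case in cases:
        total[case.category] += 1
        if grade(model_fn(case.prompt), case.expected):
            passed[case.category] += 1
    per_category = {cat: passed[cat] / total[cat] for cat in total}
    overall = sum(passed.values()) / sum(total.values())
    return overall, per_category


def failing_categories(per_category: Dict[str, float], threshold: float) -> List[str]:
    """Categories below the release threshold, sorted by name; empty means the gate passes."""
    return sorted(cat for cat, rate in per_category.items() if rate < threshold)


def exact_match(output: str, expected: str) -> bool:
    """Simplest possible grader; real workflows usually need something richer."""
    return output.strip().lower() == expected.strip().lower()


def always_routine(prompt: str) -> str:
    """Stub standing in for the model under test."""
    return "routine"


# Synthetic cases; no real patient data.
cases = [
    EvalCase("c1", "triage", "New chest pain radiating to the left arm", "urgent"),
    EvalCase("c2", "triage", "Asking for a statin refill", "routine"),
    EvalCase("c3", "scheduling", "Move follow-up visit to next week", "routine"),
    EvalCase("c4", "scheduling", "Cancel Thursday appointment", "routine"),
]

overall, per_category = run_eval(cases, always_routine, exact_match)
print(overall, per_category)                   # 0.75 {'triage': 0.5, 'scheduling': 1.0}
print(failing_categories(per_category, 0.95))  # ['triage']
```

With a stub model that always answers "routine", the overall pass rate is 0.75 but triage sits at 0.5. That gap is why a per-category rollup tells a reviewer more than a single headline number.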

"What's your model selection rationale and what do you do when a model is deprecated?"

A senior practitioner has a documented model selection process and a plan for deprecation. A less-senior one has a model they like and no plan. The deprecation question is particularly diagnostic because every team that has shipped AI in production has felt it — model versions roll over, prompts that worked yesterday produce different outputs today, and the team needs a deliberate response.
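This sketch shows one way a documented deprecation plan can look in practice, using hypothetical names and placeholder model IDs and dates. Every workflow pins an exact model version in a registry. A scheduled check then flags any pin whose announced retirement falls inside a migration window, which leaves time to re-run evals against a replacement. `PinnedModel`, `needs_migration`, and the 60-day default lead time are assumptions, not any provider's API or recommendation.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class PinnedModel:
    """A model version pinned by one workflow (hypothetical registry entry)."""
    workflow: str
    model_id: str               # exact, dated version string rather than a floating alias
    retires_on: Optional[date]  # provider's announced retirement date, if one exists


def needs_migration(
    registry: List[PinnedModel], today: date, lead_time_days: int = 60
) -> List[Tuple[str, str, date]]:
    """Return (workflow, model_id, retires_on) for pins retiring within the window, soonest first."""
    horizon = today + timedelta(days=lead_time_days)
    due = [m for m in registry if m.retires_on is not None and m.retires_on <= horizon]
    due.sort(key=lambda m: m.retires_on)
    return [(m.workflow, m.model_id, m.retires_on) for m in due]


# Placeholder entries; these are not real model IDs or real retirement dates.
registry = [
    PinnedModel("discharge-summary", "example-model-2024-06-01", date(2026, 3, 1)),
    PinnedModel("prior-auth-draft", "example-model-2025-01-15", None),
    PinnedModel("inbox-triage", "example-model-2024-10-01", date(2026, 2, 1)),
]

for workflow, model_id, retires_on in needs_migration(registry, today=date(2026, 1, 10)):
    print(f"{workflow}: {model_id} retires {retires_on}; re-run evals on a replacement")
```

The point of pinning exact versions rather than floating aliases is that a behavior change then arrives on a known date instead of silently.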

"Tell me about a healthcare AI engagement that went badly."

The healthy answer is specific, names the failure mode, and describes what the consultant changed in their practice afterward. The unhealthy answer is either evasive ("we haven't had one") or blames the client. Both are disqualifying for serious work.

Step 4. The disqualifiers

Any one of these should end the conversation, regardless of how strong the rest of the pitch is.

"AI for everything" positioning. Healthcare AI is a discipline. A consultant who is also doing finance AI, legal AI, and marketing AI is doing none of them at the depth a regulated workflow requires.
No published BAA or compliance posture. The consultant should be able to send you their BAA, their data handling policy, and a security questionnaire response. If those don't exist, they have not done this work at the scale that requires them.
No reference clients in healthcare. Not "advisory clients" — clients where the consultant shipped or directly influenced a production system. Three is a reasonable minimum.
All deliverables are decks. A healthy engagement produces working software, working evals, working governance documents — not just strategy decks. If the case studies are deck-only, the consultant is upstream of the work that has to actually ship.
"We'll figure out compliance later." The single most expensive sentence in healthcare AI. A consultant who says it has either not been audited or is selling a product that will not survive one. See why clinical AI pilots stall for what "figure out compliance later" actually costs.

Step 5. The structural test: who owns the deliverable

The cleanest healthcare AI engagements have an explicit answer to "who owns the code, the evals, the governance documents, and the model selection rationale at the end of the engagement?" The healthy answer is "the client owns all of it; the consultant is replaceable from day one." The unhealthy answer is an engagement structured so that the consultant becomes load-bearing in a way the client cannot exit.

This shows up in:

Where the code lives. Client's repo, client's cloud, client's account. Not "we host it for you and you log in."
Whose name is on the model provider account. Client's, not the consultant's. The BAAs follow from that.
Where the governance documents live. Client's wiki or shared drive, owned by a named client-side person, not the consultant's.
Who runs the evaluation harness. Client's CI or scheduled job, not the consultant's laptop.

A consultant who pushes back on any of these is structuring the engagement for lock-in. That is the wrong shape.

Step 6. The pricing question is the easiest one

Healthcare AI consulting in 2026 has a wide pricing range. The market clears around:

Hourly advisory: $300–$600/hour for senior practitioners.
Retainer fractional engagements: $5K–$25K/month, depending on scope.
Fixed-scope Build engagements: $15K–$60K per project, depending on complexity and PHI surface area.

Pricing significantly above the top of this range is justified only by an uncommon track record, for example a former CTO of a category-defining company. Pricing significantly below the bottom signals that the consultant has not yet been pricing senior work and may not be ready for the engagement you are sourcing.

The most expensive consultant is rarely the wrong choice. The cheapest one almost always is.

What good looks like at the end of the search

You have three to five candidates who have passed the question set. Each has a defensible BAA story, a working production system to point at, references who answer the phone, and a deliverable structure that leaves you with all the assets. You can describe — out loud — what differentiates them. The decision feels uncomfortable in the way real choices do, not arbitrary in the way picking from a list of indistinguishable LinkedIn profiles does.

If you are starting that search and would like a calibration call — not a pitch, just a peer conversation about what to look for — book a call. Thirty minutes. If I am not the right fit for your engagement, I will say so and where possible point you to someone who is. For the broader framing of where AI consulting fits in a healthcare roadmap, the Playbook covers the strategic surface.