Our case study on health-labs.ai covered the business outcome: 60% call reduction, 89% patient satisfaction, 30-day build. This post is for the developers who want to know what the code actually looks like.
We'll cover the database schema for lab result storage, the Express routing architecture, how we designed the Claude prompts to stay medically safe, and the real performance and cost numbers from production.
The Stack
No surprises. No hype.
- Backend: Node.js + Express.js
- Database: PostgreSQL (Neon serverless)
- AI: Claude API (claude-3-5-sonnet) via Anthropic SDK
- PDF parsing: PDF.js for structured lab PDFs
- Deployment: Render (auto-deploys, managed Postgres)
- Auth: Magic links (no passwords, HIPAA-friendly)
If you've built a CRUD app in Node, you can build this. The AI part is smaller than you'd expect. The data modeling part is everything.
Database Schema: Modeling Lab Results
Lab results are the core entity. Getting this wrong early is painful — you're storing structured medical data that needs to be queryable by panel, by date, by patient, and by clinic. We modeled four tables.
-- Clinics own everything downstream
CREATE TABLE clinics (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name TEXT NOT NULL,
protocol JSONB NOT NULL DEFAULT '{}',
-- protocol stores: normal_ranges, dosing_notes, scope_limits
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Patients belong to a clinic
CREATE TABLE patients (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
clinic_id UUID NOT NULL REFERENCES clinics(id) ON DELETE CASCADE,
email TEXT NOT NULL,
name TEXT NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW(),
UNIQUE(clinic_id, email)
);
-- Each PDF upload = one lab_report
CREATE TABLE lab_reports (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
patient_id UUID NOT NULL REFERENCES patients(id) ON DELETE CASCADE,
report_date DATE NOT NULL,
source_lab TEXT, -- 'Quest', 'LabCorp', 'Other'
raw_text TEXT, -- full extracted PDF text (fallback)
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Individual biomarker readings, normalized
CREATE TABLE lab_values (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
report_id UUID NOT NULL REFERENCES lab_reports(id) ON DELETE CASCADE,
marker TEXT NOT NULL, -- 'testosterone', 'hematocrit', 'estradiol'
value NUMERIC,
unit TEXT, -- 'ng/dL', '%', 'pg/mL'
in_range BOOLEAN, -- computed at insert time
reference_low NUMERIC,
reference_high NUMERIC,
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Trend queries join lab_reports (patient_id, report_date) to lab_values (report_id, marker)
CREATE INDEX idx_lab_values_report_marker
ON lab_values (report_id, marker);
CREATE INDEX idx_lab_reports_patient_date
ON lab_reports (patient_id, report_date DESC);
The key design decisions: protocol is JSONB on the clinic (not hardcoded), so each clinic can define its own normal ranges. in_range is computed once at insert time using the clinic's ranges, so it isn't re-derived on every query. raw_text stores the full extracted text as a fallback for PDFs that Claude's vision had to parse.
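A minimal sketch of the insert-time in_range computation, assuming protocol.normal_ranges is shaped like { marker: { low, high, unit } } with either bound optional (an assumed shape). Readings are converted into the clinic's unit before comparing. The conversion factors are standard (testosterone: 1 nmol/L ≈ 28.84 ng/dL; estradiol: 1 pmol/L ≈ 0.2724 pg/mL; hematocrit: a fraction in L/L times 100 gives %), but UNIT_FACTORS, toClinicUnit, and computeInRange are hypothetical names.

```javascript
// Hypothetical conversion table: marker -> "fromUnit->toUnit" -> multiply factor.
const UNIT_FACTORS = {
  testosterone: { 'nmol/L->ng/dL': 28.84 },
  estradiol: { 'pmol/L->pg/mL': 0.2724 },
  hematocrit: { 'L/L->%': 100 },
};

// Convert a reading into the clinic's unit; throw rather than guess.
function toClinicUnit(marker, value, unit, targetUnit) {
  if (unit === targetUnit) return value;
  const factor = UNIT_FACTORS[marker]?.[`${unit}->${targetUnit}`];
  if (factor === undefined) {
    throw new Error(`No conversion for ${marker}: ${unit} -> ${targetUnit}`);
  }
  return value * factor;
}

// Null-safe range check: a missing bound means "no limit on that side".
// Returns null when there is no value or the clinic defines no range for the marker.
function computeInRange(reading, normalRanges) {
  const range = normalRanges[reading.marker];
  if (reading.value == null || !range) return null;
  const value = toClinicUnit(reading.marker, reading.value, reading.unit, range.unit);
  const aboveLow = range.low == null || value >= range.low;
  const belowHigh = range.high == null || value <= range.high;
  return aboveLow && belowHigh;
}

const exampleRanges = {
  testosterone: { low: 600, high: 900, unit: 'ng/dL' },
  hematocrit: { high: 50, unit: '%' }, // one-sided: upper limit only
};
console.log(computeInRange({ marker: 'testosterone', value: 25, unit: 'nmol/L' }, exampleRanges)); // true (about 721 ng/dL)
console.log(computeInRange({ marker: 'hematocrit', value: 52, unit: '%' }, exampleRanges)); // false
```

Throwing on an unknown unit pair is deliberate in this sketch: a silently wrong in_range is worse than a failed upload that someone looks at.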
PDF Ingestion Pipeline
Lab PDFs come in two flavors: machine-readable (Quest, LabCorp structured exports) and scanned images (faxed results, old records). We handle both.
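// POST /api/labs/upload
// Assumes: upload = multer({ storage: multer.memoryStorage() }) so req.file.buffer is set,
// db is a pg Pool, and requireClinicAuth sets req.clinic.
app.post('/api/labs/upload', requireClinicAuth, upload.single('pdf'), async (req, res, next) => {
  const { patientId } = req.body;
  const clinic = req.clinic; // from auth middleware
  if (!req.file || !patientId) {
    return res.status(400).json({ error: 'pdf file and patientId are required' });
  }
  try {
    // Only accept uploads for patients who belong to this clinic
    const owner = await db.query(
      'SELECT 1 FROM patients WHERE id = $1 AND clinic_id = $2',
      [patientId, clinic.id]
    );
    if (owner.rowCount === 0) {
      return res.status(404).json({ error: 'patient not found' });
    }
    // Step 1: Try structured extraction first (no API call)
    let labValues = await extractStructuredPDF(req.file.buffer);
    // Step 2: Fall back to Claude vision for scanned/unstructured PDFs
    if (!labValues || labValues.length === 0) {
      const base64 = req.file.buffer.toString('base64');
      labValues = await extractWithClaude(base64, clinic.protocol);
    }
    // Step 3: Normalize units against clinic protocol
    const normalized = normalizeUnits(labValues, clinic.protocol.normal_ranges);
    // Step 4: Persist the report and its values in one transaction,
    // so a failed insert can't leave a half-written report behind
    const client = await db.connect();
    let reportId;
    try {
      await client.query('BEGIN');
      const report = await client.query(
        `INSERT INTO lab_reports (patient_id, report_date, source_lab, raw_text)
         VALUES ($1, $2, $3, $4) RETURNING id`,
        [patientId, normalized.reportDate, normalized.sourceLab, normalized.rawText]
      );
      reportId = report.rows[0].id;
      for (const v of normalized.values) {
        // Null-safe range check: a missing bound means "no limit on that side"
        const inRange = v.value == null ? null
          : (v.refLow == null || v.value >= v.refLow) && (v.refHigh == null || v.value <= v.refHigh);
        await client.query(
          `INSERT INTO lab_values
             (report_id, marker, value, unit, in_range, reference_low, reference_high)
           VALUES ($1, $2, $3, $4, $5, $6, $7)`,
          [reportId, v.marker, v.value, v.unit, inRange, v.refLow, v.refHigh]
        );
      }
      await client.query('COMMIT');
    } catch (err) {
      await client.query('ROLLBACK');
      throw err;
    } finally {
      client.release();
    }
    res.json({ reportId, valuesExtracted: normalized.values.length });
  } catch (err) {
    next(err); // Express 4 doesn't forward rejected promises on its own
  }
});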
We run structured extraction before Claude to avoid unnecessary API costs. Structured extraction handles ~70% of uploads; Claude only fires for the other 30%. That keeps uploads from pushing the average above $0.02: most uploads never hit the API, and the ones that do cost less than a Q&A session.
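A minimal sketch of the parsing step on the structured path, assuming PDF.js has already produced the page text and that lab lines look like "Testosterone, Total 645 ng/dL 264-916" (real Quest and LabCorp layouts differ and would each need their own patterns). parseLabText, LINE_RE, and MARKER_ALIASES are hypothetical names. An empty result is what sends an upload down the Claude fallback in the route above.

```javascript
// Hypothetical mapping from lab-printed names to normalized marker keys.
const MARKER_ALIASES = {
  'testosterone, total': 'testosterone',
  'hematocrit': 'hematocrit',
  'estradiol': 'estradiol',
};

// "<name> <value> <unit> <low>-<high>", with the reference range optional.
const LINE_RE =
  /^(.+?)\s+(\d+(?:\.\d+)?)\s+(\S+)(?:\s+(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?))?\s*$/;

function parseLabText(text) {
  const readings = [];
  for (const rawLine of text.split('\n')) {
    const match = LINE_RE.exec(rawLine.trim());
    if (!match) continue; // headers, addresses, footers
    const marker = MARKER_ALIASES[match[1].toLowerCase()];
    if (!marker) continue; // a marker we don't track
    readings.push({
      marker,
      value: Number(match[2]),
      unit: match[3],
      refLow: match[4] !== undefined ? Number(match[4]) : null,
      refHigh: match[5] !== undefined ? Number(match[5]) : null,
    });
  }
  return readings; // empty array => caller falls back to Claude
}
```

Unknown names are skipped rather than guessed, so a lab that prints "Testosterone, Serum" simply produces fewer readings until an alias is added.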
The Claude Prompt: Medical Context Injection
This is where most health AI projects go wrong. You can't just throw lab values at a model and ask for interpretation. You get hallucinated normal ranges, generic advice, scope violations. The fix is structured context injection.
function buildMedicalContext(patient, labHistory, clinic) {
// Get latest values for each marker
const latestValues = getLatestByMarker(labHistory);
// Build trend summaries (last 3 readings per marker)
const trends = computeTrends(labHistory);
return `
CLINIC: ${clinic.name}
CLINIC PROTOCOL:
${formatProtocol(clinic.protocol)}
PATIENT LAB HISTORY (most recent first):
${formatLabHistory(latestValues, trends)}
SCOPE RULES (CRITICAL — never violate):
- You provide information only, never treatment recommendations
- Never suggest dose changes — refer to provider for dosing
- If a value is critically abnormal, say "contact your provider immediately"
- You cannot diagnose conditions
- Stay within data you can see — never speculate on values not present
`.trim();
}
// Example of formatProtocol output:
// Normal testosterone: 600–900 ng/dL (this clinic's target range)
// Hematocrit: flag if >50%, suggest provider contact if >52%
// Estradiol: flag if >40 pg/mL on standard protocol
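getLatestByMarker and computeTrends aren't shown in the post. A minimal sketch, assuming labHistory arrives as a flat array of { marker, value, unit, reportDate } rows (for example, lab_values joined to lab_reports) with ISO date strings. The 'rising' / 'falling' / 'stable' labels and the 5% threshold are illustrative choices, not the production logic.

```javascript
// Newest-first comparator; ISO dates (YYYY-MM-DD) compare correctly as strings.
const newestFirst = (a, b) =>
  a.reportDate < b.reportDate ? 1 : a.reportDate > b.reportDate ? -1 : 0;

// Group flat rows by marker, each group sorted newest first.
function groupByMarker(labHistory) {
  const groups = new Map();
  for (const row of labHistory) {
    if (!groups.has(row.marker)) groups.set(row.marker, []);
    groups.get(row.marker).push(row);
  }
  for (const rows of groups.values()) rows.sort(newestFirst);
  return groups;
}

// Most recent reading for each marker.
function getLatestByMarker(labHistory) {
  const latest = {};
  for (const [marker, rows] of groupByMarker(labHistory)) latest[marker] = rows[0];
  return latest;
}

// Direction over the last `windowSize` readings: newest vs. oldest in that window,
// with relative changes within `threshold` treated as stable.
function computeTrends(labHistory, windowSize = 3, threshold = 0.05) {
  const trends = {};
  for (const [marker, rows] of groupByMarker(labHistory)) {
    const values = rows.slice(0, windowSize).map(r => r.value).reverse(); // oldest -> newest
    if (values.length < 2) {
      trends[marker] = { direction: 'insufficient data', values };
      continue;
    }
    const oldest = values[0];
    const newest = values[values.length - 1];
    const change = oldest === 0 ? 0 : (newest - oldest) / Math.abs(oldest);
    const direction = change > threshold ? 'rising' : change < -threshold ? 'falling' : 'stable';
    trends[marker] = { direction, values };
  }
  return trends;
}
```

Listing the values oldest to newest lets formatLabHistory print something like "hematocrit: 47 → 49 → 52 (rising)", which is easier for both the model and a reviewer to read than a bare label.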
// POST /api/chat
// Assumes: anthropic = new Anthropic() from '@anthropic-ai/sdk' (reads ANTHROPIC_API_KEY)
// and requirePatientAuth sets req.patient.
app.post('/api/chat', requirePatientAuth, async (req, res, next) => {
  const { question } = req.body;
  const patient = req.patient;
  if (typeof question !== 'string' || question.trim() === '') {
    return res.status(400).json({ error: 'question is required' });
  }
  try {
    // Fetch context data in parallel
    const [clinic, labHistory] = await Promise.all([
      getClinic(patient.clinicId),
      getLabHistory(patient.id, { limit: 10 }) // last 10 reports
    ]);
    const systemContext = buildMedicalContext(patient, labHistory, clinic);
    const response = await anthropic.messages.create({
      model: 'claude-3-5-sonnet-20241022',
      max_tokens: 512, // intentionally capped: short answers only
      system: systemContext,
      messages: [{ role: 'user', content: question }]
    });
    // Log for audit trail (HIPAA)
    await logQuery({
      patientId: patient.id,
      question,
      responseTokens: response.usage.output_tokens,
      model: response.model
    });
    res.json({ answer: response.content[0].text });
  } catch (err) {
    next(err); // Express 4 doesn't forward rejected promises on its own
  }
});
Why max_tokens: 512?
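Intentional. Medical AI answers should be short. Long answers drift into treatment territory ("here are four things you could do…"), and that's scope creep. We constrain the model structurally, not just through prompt instructions. The cap is a ceiling, not a target: a 2–4 sentence answer, which is what a patient needs, fits well within 512 tokens, and anything longer is cut off rather than condensed (the API reports stop_reason: 'max_tokens' when that happens).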
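Since the cap truncates instead of shortening, it's worth knowing when it fired. A minimal sketch, assuming the Messages API response shape (content as an array of typed blocks, stop_reason set to 'max_tokens' when the cap is hit); extractAnswer is a hypothetical helper, and what to do with a truncated answer is a product decision the post doesn't cover.

```javascript
// Pull the answer text out of a Messages API response and report whether
// the max_tokens cap cut it off. Joins every text block instead of
// assuming content[0] is text.
function extractAnswer(response) {
  const answer = response.content
    .filter(block => block.type === 'text')
    .map(block => block.text)
    .join('')
    .trim();
  return { answer, truncated: response.stop_reason === 'max_tokens' };
}
```

A caller could log truncated answers for review, or append a line pointing the patient to their provider so a cut-off answer never loses its escalation cue.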
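Cost math: ~1,200 input tokens of context plus at most 512 output tokens. Input and output are priced separately, so the two can't simply be summed: at Claude 3.5 Sonnet list pricing ($3 per million input tokens, $15 per million output tokens), that's about $0.0036 + $0.0077 ≈ $0.011 per query even when an answer uses the full cap. We budget $0.02 per query to account for variance.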
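The same arithmetic as a small helper, assuming Claude 3.5 Sonnet list prices of $3 per million input tokens and $15 per million output tokens (check current pricing before relying on these). queryCostUSD and SONNET_PRICES are hypothetical names.

```javascript
// Per-million-token list prices (USD) for Claude 3.5 Sonnet; verify against current pricing.
const SONNET_PRICES = { inputPerMTok: 3.0, outputPerMTok: 15.0 };

// Cost of one request in USD. Input and output tokens are priced separately.
function queryCostUSD(inputTokens, outputTokens, prices = SONNET_PRICES) {
  return (inputTokens * prices.inputPerMTok + outputTokens * prices.outputPerMTok) / 1_000_000;
}

// Worst case for the chat route: ~1,200 context tokens and a full 512-token answer.
const worstCase = queryCostUSD(1200, 512);
console.log(worstCase.toFixed(4)); // 0.0113
```

Output tokens cost five times as much as input here, so the max_tokens cap bounds most of the per-query spend. The response's usage field reports input_tokens and output_tokens, so the helper can also run on real numbers if the audit log records both.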
Prompt Design: The Scope Rules Block
The SCOPE RULES (CRITICAL — never violate) block is the most important part of the prompt. We tested without it. Claude was helpful and wrong — suggesting dose increases, inferring diagnoses, speculating on values not in the data.
The rules block does three things:
- Defines what the AI is (information provider, not clinician)
- Defines hard stops (no dose changes, no diagnoses)
- Defines escalation behavior (critical values → contact provider)
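One way to make the escalation behavior depend less on the model doing threshold comparisons itself is to compute tiers in code and inject them into the prompt. A minimal sketch, assuming tiered thresholds shaped like the hematocrit line in the formatProtocol example (flag above 50%, provider contact above 52%). THRESHOLDS, escalationTier, and formatEscalations are hypothetical names; the post doesn't say whether production does this.

```javascript
// Hypothetical tiered thresholds, mirroring the protocol example:
// hematocrit: flag if >50%, suggest provider contact if >52%; estradiol: flag if >40 pg/mL.
const THRESHOLDS = {
  hematocrit: { flagAbove: 50, contactAbove: 52 },
  estradiol: { flagAbove: 40 },
};

// Returns 'contact_provider', 'flag', 'normal', or 'unknown' for one reading.
function escalationTier(marker, value, thresholds = THRESHOLDS) {
  const t = thresholds[marker];
  if (!t || value == null) return 'unknown';
  if (t.contactAbove != null && value > t.contactAbove) return 'contact_provider';
  if (t.flagAbove != null && value > t.flagAbove) return 'flag';
  return 'normal';
}

// Render tiers as lines the prompt can quote instead of re-deriving them.
function formatEscalations(readings, thresholds = THRESHOLDS) {
  return readings
    .map(r => `${r.marker} ${r.value}: ${escalationTier(r.marker, r.value, thresholds)}`)
    .join('\n');
}
```

With these thresholds, the 52% hematocrit in the example below lands in the flag tier rather than contact_provider, which lines up with the "discuss at your next appointment" answer.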
We also tested Llama 2 7B with the same prompt. The rules block worked inconsistently: fine on common patterns, but failing on edge cases (unusual units, borderline values, multi-marker correlations). Claude's instruction-following is more reliable here, and the cost difference is at most about $0.02 per query. It's not a hard call.
Patient: "My hematocrit is 52%. Is that bad?"
Without scope rules:
"Your hematocrit of 52% is slightly elevated. You may want to consider
reducing your testosterone dosage or donating blood to bring it down.
Common side effects of high hematocrit include..."
[SCOPE VIOLATION: suggested dose reduction, gave treatment advice]
With scope rules:
"Your hematocrit is 52%, which is slightly above your clinic's target
range of under 50%. This is worth discussing with your provider at your
next appointment. If you experience shortness of breath or unusual fatigue,
contact your provider sooner."
[CORRECT: informational only, escalation cue included]
Performance Numbers
Measured over 1,000 production queries at the pilot clinic:
- p50 response time: 410ms (context build + Claude round-trip)
- p95 response time: 780ms
- p99 response time: 1,240ms (PDF upload queries with vision fallback)
- Error rate: 0.3% (all Anthropic API timeouts, retried successfully)
- Hallucination rate: 0% (none in 1K queries by the definition below; we also saw zero out-of-scope responses)
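For reference, a nearest-rank percentile is the simplest way to turn raw latency samples into p50/p95/p99 figures like these. This is a hypothetical helper, not the production metrics code; many monitoring tools interpolate between samples, so their numbers can differ slightly.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b); // numeric sort, input untouched
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}
```

With 1,000 samples, p99 is the 990th-fastest response, so it reflects the slowest ten requests, which is why a handful of vision-fallback calls can dominate it.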
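The 0% hallucination number needs context: we define hallucination as "AI mentioned a value or range not in the patient's data or clinic protocol." With structured context injection, every value and range the model needs is already in the system prompt, so it has no reason to reach for generic ranges from its training data; in these 1,000 queries it never did. That makes hallucination rare, not impossible, so it's still worth checking.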
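The definition above is mechanical enough to screen in code. A minimal sketch, assuming a crude rule: every number in the answer must also appear somewhere in the system context. findUngroundedNumbers is hypothetical; it won't catch hallucinated wording and can false-positive on harmless numbers like "2–4 weeks", so it's a filter for human review rather than proof.

```javascript
// Pull numeric tokens ("52", "52.1", "1,240") out of text, normalized so
// "52", "52.0", and "1,240"/"1240" compare equal.
function extractNumbers(text) {
  const matches = text.match(/\d[\d,]*(?:\.\d+)?/g) || [];
  return matches.map(m => String(Number(m.replace(/,/g, ''))));
}

// Numbers mentioned in the answer that never appear in the supplied context.
function findUngroundedNumbers(answer, systemContext) {
  const allowed = new Set(extractNumbers(systemContext));
  return [...new Set(extractNumbers(answer))].filter(n => !allowed.has(n));
}
```

Running this over logged question/answer pairs (with the context rebuilt from the same lab history) gives a repeatable version of the 0% measurement.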
What We'd Do Differently
Streaming responses
We're not streaming yet. For 512-token responses, the latency is acceptable, but there's a visible 400ms blank state before the answer appears. We'll add streaming via anthropic.messages.stream() in the next iteration. Not hard — just not in scope for a 30-day build.
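A minimal sketch of the server side of streaming, assuming the events come from iterating the Anthropic SDK's stream (for example, anthropic.messages.create({ ..., stream: true })) and the browser reads Server-Sent Events. To keep it testable without the SDK or Express, pipeTextAsSSE takes any async iterable of strings and a write function; textDeltas, pipeTextAsSSE, and the "done" event name are hypothetical.

```javascript
// Yield only the text from raw Messages API stream events
// (content_block_delta events whose delta is a text_delta).
async function* textDeltas(events) {
  for await (const event of events) {
    if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
      yield event.delta.text;
    }
  }
}

// Write each text chunk as a Server-Sent Event. JSON-encoding the chunk keeps
// newlines inside it from breaking the "data:" line framing.
async function pipeTextAsSSE(chunks, write) {
  let fullText = '';
  for await (const chunk of chunks) {
    fullText += chunk;
    write(`data: ${JSON.stringify({ text: chunk })}\n\n`);
  }
  write('event: done\ndata: {}\n\n'); // tells the client the answer is complete
  return fullText; // full answer, e.g. for the audit log
}
```

In an Express handler this would be wrapped with res.setHeader('Content-Type', 'text/event-stream') before the first write and res.end() after. Returning the accumulated text keeps the audit log working once answers are no longer available as a single response object.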
Cached clinic protocol
We fetch clinic protocol from Postgres on every query. For a single clinic, fine. At 10+ clinics, we'd cache the protocol object in memory with a 5-minute TTL. node-cache or a simple Map with timestamps. One of those "obvious in hindsight" things.
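The "simple Map with timestamps" option as a minimal sketch. TtlCache and its injectable clock are hypothetical, and each server process keeps its own copy, so a protocol edit can take up to the TTL (five minutes here) to show up on any instance that has it cached.

```javascript
// Tiny in-memory TTL cache: entries expire ttlMs after they were stored.
class TtlCache {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, handy for tests
    this.entries = new Map(); // key -> { value, expiresAt }
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // lazily evict stale entries
      return undefined;
    }
    return entry.value;
  }

  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Return the cached value, or load it and cache the result.
  async getOrLoad(key, loader) {
    const cached = this.get(key);
    if (cached !== undefined) return cached;
    const value = await loader(key);
    this.set(key, value);
    return value;
  }
}

// Usage in the chat route, assuming getClinic(clinicId) from the original code:
// const clinicCache = new TtlCache(5 * 60 * 1000);
// const clinic = await clinicCache.getOrLoad(patient.clinicId, getClinic);
```

Two requests that miss at the same moment will both call the loader; at this scale that's harmless, and caching the pending promise instead of the value would deduplicate them if it ever matters.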
Lab value embeddings
For patients with 12+ months of history, the context window gets heavy. A better architecture would embed lab values and retrieve only semantically relevant history for a given question. We didn't need this at pilot scale. At 500+ patients with 2+ years of data, it becomes relevant.
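A minimal sketch of the retrieval step, assuming each stored item (a reading or a per-marker summary) already has an embedding vector and the question has been embedded with the same model. The embedding call itself is out of scope, and topKRelevant plus the toy 3-dimensional vectors in the usage are hypothetical.

```javascript
// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k stored items whose embeddings are closest to the question's.
// items: [{ id, text, embedding }], questionEmbedding: number[]
function topKRelevant(items, questionEmbedding, k = 5) {
  return items
    .map(item => ({ ...item, score: cosineSimilarity(item.embedding, questionEmbedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

const history = [
  { id: 'hct', text: 'hematocrit history', embedding: [1, 0, 0] },
  { id: 't', text: 'testosterone history', embedding: [0, 1, 0] },
  { id: 'e2', text: 'estradiol history', embedding: [0.7, 0.7, 0] },
];
console.log(topKRelevant(history, [0.9, 0.1, 0], 2).map(item => item.id)); // [ 'hct', 'e2' ]
```

Only the top-k items would go into formatLabHistory, so context size stays flat as history grows. A brute-force scan like this is fine per patient; an index such as pgvector only matters once the search spans many patients.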
The Part That Isn't Code
We spent more time on the clinic onboarding flow than on the AI code. Getting clinicians to upload their protocol document, define normal ranges, and validate the AI responses on sample questions — that's the product work that makes the AI safe.
The AI is only as reliable as the data you feed it. Garbage protocol → garbage guardrails. The best prompt engineering in the world can't compensate for an incomplete clinic protocol object.
If you're building health AI, your first investment is in the data model and the content pipeline, not the model itself. The model is a commodity. The structured, clinic-specific context is the moat.
Building an AI app for your vertical?
We've shipped production AI in healthcare, logistics, and operations. Same stack, different context injection. Talk to us before you start the build.