Building an AI Lab Portal — The Full Stack Breakdown

Our case study on health-labs.ai covered the business outcome: 60% call reduction, 89% patient satisfaction, 30-day build. This post is for the developers who want to know what the code actually looks like.

We'll cover the database schema for lab result storage, the Express routing architecture, how we designed the Claude prompts to stay medically safe, and the real performance and cost numbers from production.

$0.02 Cost per Query
<800ms p95 Response
0 Hallucinations (1K queries)
30 days Build Time

The Stack

No surprises. No hype.

Backend: Node.js + Express.js
Database: PostgreSQL (Neon serverless)
AI: Claude API (claude-3-5-sonnet) via Anthropic SDK
PDF parsing: PDF.js for structured lab PDFs
Deployment: Render (auto-deploys, managed Postgres)
Auth: Magic links (no passwords, HIPAA-friendly)

If you've built a CRUD app in Node, you can build this. The AI part is smaller than you'd expect. The data modeling part is everything.

Database Schema: Modeling Lab Results

Lab results are the core entity. Getting this wrong early is painful — you're storing structured medical data that needs to be queryable by panel, by date, by patient, and by clinic. We modeled four tables.

SQL — Core Schema

-- Clinics own everything downstream
CREATE TABLE clinics (
  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  name        TEXT NOT NULL,
  protocol    JSONB NOT NULL DEFAULT '{}',
  -- protocol stores: normal_ranges, dosing_notes, scope_limits
  created_at  TIMESTAMPTZ DEFAULT NOW()
);

-- Patients belong to a clinic
CREATE TABLE patients (
  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  clinic_id   UUID REFERENCES clinics(id) ON DELETE CASCADE,
  email       TEXT NOT NULL,
  name        TEXT NOT NULL,
  created_at  TIMESTAMPTZ DEFAULT NOW(),
  UNIQUE(clinic_id, email)
);

-- Each PDF upload = one lab_report
CREATE TABLE lab_reports (
  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  patient_id  UUID REFERENCES patients(id) ON DELETE CASCADE,
  report_date DATE NOT NULL,
  source_lab  TEXT,            -- 'Quest', 'LabCorp', 'Other'
  raw_text    TEXT,            -- full extracted PDF text (fallback)
  created_at  TIMESTAMPTZ DEFAULT NOW()
);

-- Individual biomarker readings, normalized
CREATE TABLE lab_values (
  id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  report_id     UUID REFERENCES lab_reports(id) ON DELETE CASCADE,
  marker        TEXT NOT NULL,  -- 'testosterone', 'hematocrit', 'estradiol'
  value         NUMERIC,
  unit          TEXT,           -- 'ng/dL', '%', 'pg/mL'
  in_range      BOOLEAN,        -- computed at insert time
  reference_low NUMERIC,
  reference_high NUMERIC,
  created_at    TIMESTAMPTZ DEFAULT NOW()
);

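-- Index for trend queries: lab_values has no patient_id, so trends join
-- through lab_reports (patient + date), then look up by report and marker
CREATE INDEX idx_lab_values_report_marker
  ON lab_values (report_id, marker);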

CREATE INDEX idx_lab_reports_patient_date
  ON lab_reports (patient_id, report_date DESC);

The key design decisions: protocol is JSONB on the clinic rather than hardcoded, because each clinic defines its own normal ranges. in_range is computed at insert using the clinic's ranges, so you don't re-derive it on every query. raw_text keeps the full extracted text as a fallback, which matters most for PDFs that Claude's vision had to parse.
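
To make the insert-time in_range computation concrete, here is a minimal sketch. The shape of normal_ranges (keyed by marker, with optional low and high bounds) and the helper name computeInRange are assumptions for illustration, not the production code.

```javascript
// Hypothetical shape for clinic.protocol.normal_ranges (assumed for this sketch).
const exampleProtocol = {
  normal_ranges: {
    testosterone: { low: 600, high: 900, unit: 'ng/dL' },
    hematocrit: { high: 50, unit: '%' }, // one-sided: only an upper bound
  },
};

// Returns true/false when the clinic defines a range for the marker, or null
// when the value or the range is missing, so "unknown" is never stored as
// "out of range".
function computeInRange(protocol, marker, value) {
  const range = protocol.normal_ranges?.[marker];
  if (range == null || value == null) return null;
  const { low = null, high = null } = range;
  if (low === null && high === null) return null;
  const aboveLow = low === null || value >= low;
  const belowHigh = high === null || value <= high;
  return aboveLow && belowHigh;
}
```

Storing NULL for unknown cases keeps the BOOLEAN column honest: a marker the clinic never configured does not show up as a red flag.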

PDF Ingestion Pipeline

Lab PDFs come in two flavors: machine-readable (Quest, LabCorp structured exports) and scanned images (faxed results, old records). We handle both.

JavaScript — PDF Ingestion Route

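// POST /api/labs/upload
app.post('/api/labs/upload', requireClinicAuth, upload.single('pdf'), async (req, res, next) => {
  try {
    const { patientId } = req.body;
    const clinic = req.clinic; // from auth middleware (assumed to include clinic.id)

    if (!req.file) {
      return res.status(400).json({ error: 'A PDF file is required' });
    }

    // Guard against cross-clinic writes: the patient must belong to this clinic
    const owner = await db.query(
      'SELECT 1 FROM patients WHERE id = $1 AND clinic_id = $2',
      [patientId, clinic.id]
    );
    if (owner.rows.length === 0) {
      return res.status(404).json({ error: 'Patient not found' });
    }

    // Step 1: Try structured extraction first
    let labValues = await extractStructuredPDF(req.file.buffer);

    // Step 2: Fall back to Claude vision for unstructured
    if (!labValues || labValues.length === 0) {
      const base64 = req.file.buffer.toString('base64');
      labValues = await extractWithClaude(base64, clinic.protocol);
    }

    // Step 3: Normalize units against clinic protocol
    const normalized = normalizeUnits(labValues, clinic.protocol.normal_ranges);

    // Step 4: Persist
    const report = await db.query(
      `INSERT INTO lab_reports (patient_id, report_date, source_lab, raw_text)
       VALUES ($1, $2, $3, $4) RETURNING id`,
      [patientId, normalized.reportDate, normalized.sourceLab, normalized.rawText]
    );
    const reportId = report.rows[0].id;

    // in_range stays NULL when the value or both bounds are missing;
    // comparing against null would otherwise coerce it to 0
    const inRange = (v) => {
      if (v.value == null || (v.refLow == null && v.refHigh == null)) return null;
      return (v.refLow == null || v.value >= v.refLow) &&
             (v.refHigh == null || v.value <= v.refHigh);
    };

    const valueInserts = normalized.values.map(v =>
      db.query(
        `INSERT INTO lab_values
         (report_id, marker, value, unit, in_range, reference_low, reference_high)
         VALUES ($1, $2, $3, $4, $5, $6, $7)`,
        [reportId, v.marker, v.value, v.unit, inRange(v), v.refLow, v.refHigh]
      )
    );
    await Promise.all(valueInserts);

    res.json({ reportId, valuesExtracted: normalized.values.length });
  } catch (err) {
    next(err); // hand off to the Express error handler
  }
});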

Architecture decision

We run structured extraction before Claude to avoid unnecessary API costs. Structured extraction handles ~70% of uploads, so Claude only fires for the other ~30%. That keeps ingestion spend low: most uploads never touch the API. The $0.02 figure in the header is the per-question cost of chat, which is worked out in the cost math further down.
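
As a rough illustration of what the structured path does once text has been pulled out of the PDF, the sketch below parses one result row per line. The column layout (name, value, unit, reference range, separated by runs of spaces) and the names parseLabLine and extractStructuredText are hypothetical; real Quest and LabCorp exports differ and would need their own patterns.

```javascript
// Matches rows like "Hematocrit  52.1  %  38.5-50.0" (hypothetical layout):
// name, two or more spaces, numeric value, unit, then a low-high range.
const LAB_LINE =
  /^(.+?)\s{2,}(-?\d+(?:\.\d+)?)\s+(\S+)\s+(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)\s*$/;

// Parse a single line; returns null for headers, footers, and anything else
// that does not look like a result row.
function parseLabLine(line) {
  const match = LAB_LINE.exec(line);
  if (!match) return null;
  const [, name, value, unit, low, high] = match;
  return {
    marker: name.trim().toLowerCase(),
    value: Number(value),
    unit,
    refLow: Number(low),
    refHigh: Number(high),
  };
}

// Parse every line of extracted text and keep only the rows that matched.
// An empty result is the signal to fall back to the vision path.
function extractStructuredText(text) {
  return text.split('\n').map(parseLabLine).filter((row) => row !== null);
}
```

Returning an empty array rather than throwing fits the route's fallback check, which treats "no values found" as the cue to call Claude.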

The Claude Prompt: Medical Context Injection

This is where most health AI projects go wrong. You can't just throw lab values at a model and ask for interpretation. You get hallucinated normal ranges, generic advice, scope violations. The fix is structured context injection.

JavaScript — Context Builder

function buildMedicalContext(patient, labHistory, clinic) {
  // Get latest values for each marker
  const latestValues = getLatestByMarker(labHistory);

  // Build trend summaries (last 3 readings per marker)
  const trends = computeTrends(labHistory);

  return `
CLINIC: ${clinic.name}
CLINIC PROTOCOL:
${formatProtocol(clinic.protocol)}

PATIENT LAB HISTORY (most recent first):
${formatLabHistory(latestValues, trends)}

SCOPE RULES (CRITICAL — never violate):
- You provide information only, never treatment recommendations
- Never suggest dose changes — refer to provider for dosing
- If a value is critically abnormal, say "contact your provider immediately"
- You cannot diagnose conditions
- Stay within data you can see — never speculate on values not present
`.trim();
}

// Example of formatProtocol output:
// Normal testosterone: 600–900 ng/dL (this clinic's target range)
// Hematocrit: flag if >50%, suggest provider contact if >52%
// Estradiol: flag if >40 pg/mL on standard protocol

JavaScript — Query Handler

// POST /api/chat
app.post('/api/chat', requirePatientAuth, async (req, res, next) => {
  try {
    const { question } = req.body;
    const patient = req.patient;

    if (typeof question !== 'string' || question.trim() === '') {
      return res.status(400).json({ error: 'question is required' });
    }

    // Fetch context data
    const [clinic, labHistory] = await Promise.all([
      getClinic(patient.clinicId),
      getLabHistory(patient.id, { limit: 10 }) // last 10 reports
    ]);

    const systemContext = buildMedicalContext(patient, labHistory, clinic);

    const response = await anthropic.messages.create({
      model: 'claude-3-5-sonnet-20241022',
      max_tokens: 512,  // intentionally capped: short answers only
      system: systemContext,
      messages: [{ role: 'user', content: question }]
    });

    // Log for audit trail (HIPAA)
    await logQuery({
      patientId: patient.id,
      question,
      responseTokens: response.usage.output_tokens,
      model: response.model
    });

    // Join the text blocks rather than assuming content[0] is text
    const answer = response.content
      .filter((block) => block.type === 'text')
      .map((block) => block.text)
      .join('');

    res.json({ answer });
  } catch (err) {
    next(err); // async errors must be forwarded explicitly in Express 4
  }
});

Why `max_tokens: 512`?

Intentional. Medical AI answers should be short. Long answers drift into treatment territory ("here are four things you could do…"), and that's scope creep. We constrain the model structurally, not just through prompt instructions. The cap is a ceiling, not a target: 512 tokens is roughly 380 words, and a response that reaches it is cut off rather than condensed. In practice Claude's answers run 2–4 sentences, well under the cap. That's what a patient needs.

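Cost math: input and output tokens are billed at different rates, so price them separately. About 1,200 context tokens in, plus at most 512 tokens out, at Claude 3.5 Sonnet list pricing ($3 and $15 per million tokens respectively) comes to roughly $0.004 + $0.008 ≈ $0.011 per query at the cap. We quote $0.02 to leave headroom for longer lab histories and variance.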
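
Because input and output tokens are priced differently, the estimate is easier to check with the two sides computed separately. The sketch below assumes list prices of $3 per million input tokens and $15 per million output tokens (an assumption to verify against current Anthropic pricing); estimateQueryCost is a hypothetical helper.

```javascript
// Assumed list prices in USD per million tokens. Check current pricing
// before relying on these numbers.
const PRICE_PER_MTOK = { input: 3.0, output: 15.0 };

// Price input and output tokens separately, then sum.
function estimateQueryCost(inputTokens, outputTokens, prices = PRICE_PER_MTOK) {
  const inputCost = (inputTokens / 1e6) * prices.input;
  const outputCost = (outputTokens / 1e6) * prices.output;
  return { inputCost, outputCost, total: inputCost + outputCost };
}

// Worst case for one chat query: ~1,200 context tokens in, full 512-token cap out.
const worstCase = estimateQueryCost(1200, 512);
console.log(worstCase.total.toFixed(4)); // prints "0.0113"
```

Under those assumed prices, a response that uses the whole 512-token cap costs about $0.011, so the rounded $0.02 figure leaves room for longer lab histories.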

Prompt Design: The Scope Rules Block

The SCOPE RULES (CRITICAL — never violate) block is the most important part of the prompt. We tested without it. Claude was helpful and wrong — suggesting dose increases, inferring diagnoses, speculating on values not in the data.

The rules block does three things:

Defines what the AI is (information provider, not clinician)
Defines hard stops (no dose changes, no diagnoses)
Defines escalation behavior (critical values → contact provider)

We also tested Llama 2 7B with the same prompt. The rules block worked inconsistently — fine on common patterns, failure on edge cases (unusual units, borderline values, multi-marker correlations). Claude's instruction-following is more reliable here, and the cost delta is $0.02. It's not a hard call.

Example Exchange

Patient: "My hematocrit is 52%. Is that bad?"

Without scope rules:
"Your hematocrit of 52% is slightly elevated. You may want to consider
reducing your testosterone dosage or donating blood to bring it down.
Common side effects of high hematocrit include..."
[SCOPE VIOLATION: suggested dose reduction, gave treatment advice]

With scope rules:
"Your hematocrit is 52%, which is slightly above your clinic's target
range of under 50%. This is worth discussing with your provider at your
next appointment. If you experience shortness of breath or unusual fatigue,
contact your provider sooner."
[CORRECT: informational only, escalation cue included]

Performance Numbers

Measured over 1,000 production queries across the pilot clinic:

p50 response time: 410ms (context build + Claude round-trip)
p95 response time: 780ms
p99 response time: 1,240ms (PDF upload queries with vision fallback)
Error rate: 0.3% (all Anthropic API timeouts, retried successfully)
Hallucination rate: 0% (zero out-of-scope responses in 1K queries)

The 0% hallucination number needs context: we define hallucination as "AI mentioned a value or range not in the patient's data or clinic protocol." With structured context injection, every value and range the model needs is already in the system prompt, and the scope rules tell it not to go beyond them, so it has little reason to invent one.

What We'd Do Differently

Streaming responses

We're not streaming yet. For 512-token responses, the latency is acceptable, but there's a visible 400ms blank state before the answer appears. We'll add streaming via anthropic.messages.stream() in the next iteration. Not hard — just not in scope for a 30-day build.
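
A minimal sketch of what that could look like, using the SDK's messages.stream() helper with its 'text' event and finalMessage() promise. The wrapper name streamAnswer and the injected client and onText parameters are assumptions made so the function can be exercised without a network call.

```javascript
// `client` is an Anthropic SDK instance (or any object with the same shape).
// `onText` receives each text delta as it arrives, for example to write it
// to a chunked or server-sent-events HTTP response.
async function streamAnswer(client, params, onText) {
  const stream = client.messages.stream(params);
  stream.on('text', (delta) => onText(delta));
  // Resolves once the stream ends; includes usage for the audit log.
  const finalMessage = await stream.finalMessage();
  return finalMessage;
}
```

In the chat route, onText would forward each delta to the response as it arrives, and the returned message would feed the existing logQuery call, so the audit trail is unchanged.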

Cached clinic protocol

We fetch clinic protocol from Postgres on every query. For a single clinic, fine. At 10+ clinics, we'd cache the protocol object in memory with a 5-minute TTL. node-cache or a simple Map with timestamps. One of those "obvious in hindsight" things.
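
The "simple Map with timestamps" option fits in a dozen lines. This is a sketch under assumptions: createTtlCache is a hypothetical helper, and the clock is injectable only to make expiry testable.

```javascript
// Wrap an async loader with an in-memory cache.
// `ttlMs` defaults to 5 minutes; `now` is injectable for testing.
function createTtlCache(loader, ttlMs = 5 * 60 * 1000, now = Date.now) {
  const entries = new Map(); // key -> { value, expiresAt }
  return async function get(key) {
    const hit = entries.get(key);
    if (hit && hit.expiresAt > now()) return hit.value;
    const value = await loader(key); // cache miss or expired entry
    entries.set(key, { value, expiresAt: now() + ttlMs });
    return value;
  };
}
```

Wrapping the existing lookup would then be a one-liner, something like const getClinicCached = createTtlCache(getClinic), with the trade-off that a protocol edit can take up to five minutes to reach the prompt.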

Lab value embeddings

For patients with 12+ months of history, the context window gets heavy. A better architecture would embed lab values and retrieve only semantically relevant history for a given question. We didn't need this at pilot scale. At 500+ patients with 2+ years of data, it becomes relevant.
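
The retrieval half of that idea is small once embeddings exist. The sketch below assumes each history entry already carries a precomputed embedding vector from some embedding model (not shown, and not something the pilot used); cosineSimilarity and topKRelevant are hypothetical names.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// entries: [{ text, embedding }] with embeddings computed ahead of time.
// Returns the k entries most similar to the question's embedding.
function topKRelevant(queryEmbedding, entries, k = 5) {
  return entries
    .map((entry) => ({ ...entry, score: cosineSimilarity(queryEmbedding, entry.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

Only the selected entries would then be formatted into the system prompt, which bounds context size regardless of how many years of history a patient has.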

The Part That Isn't Code

We spent more time on the clinic onboarding flow than the AI code. Getting clinicians to upload their protocol document, define normal ranges, and validate the AI responses on sample questions — that's the product work that makes the AI safe.

The AI is only as reliable as the data you feed it. Garbage protocol → garbage guardrails. The best prompt engineering in the world can't compensate for an incomplete clinic protocol object.

If you're building health AI, your first investment is in the data model and the content pipeline, not the model itself. The model is a commodity. The structured, clinic-specific context is the moat.

Building an AI app for your vertical?

We've shipped production AI in healthcare, logistics, and operations. Same stack, different context injection. Talk to us before you start the build.
