DermoAI is an AI-powered skin disease triage platform designed to bridge the gap between laboratory AI and real-world smartphone screening.
Models that shine on curated clinical photography collapse the moment they meet a real consumer device. This is the wall DermoAI was built to break.
A 51-point collapse in accuracy. Most AI models fail the moment they leave the laboratory.
A five-phase R&D cycle. Each phase was built to uncover and remediate a specific failure mode found during validation — moving from a fragile academic model to a deployment-ready triage platform.
A pretrained ResNet-50 established the floor. It captured broad structure but missed the fine-grained texture and micro-vascular patterns that separate malignancy from benign lookalikes.
The backbone moved to ConvNeXt-Base (88M parameters), a transformer-inspired design that uses LayerNorm and GELU while retaining CNN training efficiency. Texture extraction improved sharply.
Train–validation contamination was discovered: images from the same patient spanned both splits, inflating every metric. The validation pipeline was rebuilt with GroupShuffleSplit for true patient-level separation.
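A minimal sketch of a patient-level split with scikit-learn's `GroupShuffleSplit`. The toy image IDs, labels, and patient IDs are hypothetical stand-ins, and the 20% hold-out and seed are assumptions, not the project's actual settings.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical toy metadata: 10 images from 5 patients (two images each).
image_ids = np.arange(10)
labels = np.array([0, 0, 1, 1, 0, 0, 1, 1, 0, 0])
patient_ids = np.array(["p1", "p1", "p2", "p2", "p3", "p3", "p4", "p4", "p5", "p5"])

# Hold out roughly 20% of the patients (not 20% of the images).
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, val_idx = next(splitter.split(image_ids, labels, groups=patient_ids))

train_patients = set(patient_ids[train_idx])
val_patients = set(patient_ids[val_idx])

# Every patient lands entirely on one side of the split.
assert train_patients.isdisjoint(val_patients)
```

Because the split is drawn over groups, all images of a given patient move together, which is what removes the same-patient leakage.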
Testing on the Midas consumer-photo set exposed the real wall: 78.65% lab accuracy collapsed to 27.59% on phones — a 51-point drop traced to lighting, focus, resolution, and background skin texture.
2,308 smartphone images were folded into training alongside camera-specific augmentations — color jitter, Gaussian noise, motion blur, and perspective warps — forcing camera-invariant representations.
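The augmentations named above can be sketched in plain NumPy. This is an illustrative approximation, not the project's pipeline: the function names, parameter ranges, and the horizontal-only blur kernel are assumptions, and the perspective warp is left out because it needs an image-processing library.

```python
import numpy as np

def color_jitter(img, rng, max_shift=0.2):
    """Randomly scale overall brightness and each colour channel's gain."""
    brightness = 1.0 + rng.uniform(-max_shift, max_shift)
    channel_gain = 1.0 + rng.uniform(-max_shift, max_shift, size=(1, 1, 3))
    return np.clip(img * brightness * channel_gain, 0.0, 1.0)

def gaussian_noise(img, rng, sigma=0.03):
    """Add zero-mean sensor-like noise, then clip back to the valid range."""
    return np.clip(img + rng.normal(0.0, sigma, size=img.shape), 0.0, 1.0)

def motion_blur(img, length=5):
    """Horizontal motion blur: average each pixel with neighbours along the width axis.

    Assumes `length` is odd so the window is centred on the pixel.
    """
    pad = length // 2
    padded = np.pad(img, ((0, 0), (pad, pad), (0, 0)), mode="edge")
    shifted = [padded[:, i:i + img.shape[1], :] for i in range(length)]
    return np.mean(shifted, axis=0)

def simulate_phone_capture(img, rng):
    """Chain the three degradations to mimic a handheld phone photo."""
    img = color_jitter(img, rng)
    img = motion_blur(img, length=5)
    return gaussian_noise(img, rng)

rng = np.random.default_rng(0)
clinical_image = rng.uniform(0.0, 1.0, size=(64, 64, 3))  # stand-in for a real photo
phone_like = simulate_phone_capture(clinical_image, rng)
```

Applying a fresh random draw of these transforms to each training image exposes the model to the kind of variation a consumer camera introduces.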
Supervised contrastive learning pulled embeddings of same-class lesions together and pushed lookalikes apart, sharpening the malignant–benign decision boundary.
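For intuition, here is a small NumPy sketch of a supervised contrastive loss in its common form: each anchor is pulled toward every other sample with the same label and contrasted against all remaining samples. The temperature value and the toy embeddings are assumptions for illustration.

```python
import numpy as np

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Mean over anchors of the negative average log-probability of their positives."""
    labels = np.asarray(labels)
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)

    # Pairwise similarities; an anchor is never contrasted against itself.
    sim = np.where(not_self, z @ z.T / temperature, -np.inf)

    # Row-wise log-softmax over all other samples (numerically stable).
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))

    # Positives: same label, different sample.
    positive = (labels[:, None] == labels[None, :]) & not_self
    pos_count = positive.sum(axis=1)
    has_pos = pos_count > 0  # skip anchors whose class appears only once
    mean_log_prob_pos = np.where(positive, log_prob, 0.0).sum(axis=1)[has_pos] / pos_count[has_pos]
    return float(-mean_log_prob_pos.mean())

# Toy embeddings: two tight clusters.
embeddings = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
loss_aligned = supervised_contrastive_loss(embeddings, [0, 0, 1, 1])     # labels match clusters
loss_misaligned = supervised_contrastive_loss(embeddings, [0, 1, 0, 1])  # labels cut across clusters
```

The loss is low when same-class embeddings already sit together and high when lookalikes from different classes are the nearest neighbours, which is the pressure that sharpens the boundary.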
Test-time augmentation (TTA) averaged predictions over augmented views, and ensembling combined complementary backbones, which stabilized predictions on noisy inputs.
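A rough sketch of how test-time augmentation and ensembling combine: every model scores every augmented view, and the probability vectors are averaged. The two "models" below are hypothetical toy functions standing in for trained backbones, and the flip-only view set is an assumption.

```python
import numpy as np

def softmax(logits):
    shifted = np.exp(logits - np.max(logits))
    return shifted / shifted.sum()

def tta_views(img):
    """Identity, horizontal flip, vertical flip, and both flips."""
    return [img, img[:, ::-1], img[::-1, :], img[::-1, ::-1]]

# Hypothetical stand-ins for trained backbones: each maps an image to 2-class probabilities.
def model_a(img):
    h, w = img.shape[0] // 2, img.shape[1] // 2
    return softmax(np.array([img[:h, :w].mean(), img.mean()]) * 5.0)

def model_b(img):
    return softmax(np.array([img[:4].mean(), img.std()]) * 5.0)

def predict_tta_ensemble(models, img):
    """Average class probabilities over all (model, view) pairs."""
    probs = [model(view) for model in models for view in tta_views(img)]
    return np.mean(probs, axis=0)

rng = np.random.default_rng(0)
image = rng.uniform(size=(32, 32, 3))  # stand-in for a lesion photo
prediction = predict_tta_ensemble([model_a, model_b], image)
flipped_prediction = predict_tta_ensemble([model_a, model_b], image[:, ::-1])
```

Because the four flip views form a closed set, the averaged prediction does not change when the input itself is mirrored, a small example of the stability that averaging buys.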
A safety-first ConvNeXt-Base triage engine, calibrated for real-world smartphone input, validated across six diverse datasets including consumer photography.
Each card maps to a real failure mode uncovered during validation — and the engineering decision that resolved it.
Same-patient images across train and validation silently inflated every metric. Solved with GroupShuffleSplit.
Clinical photography and phone cameras are different visual languages. Bridged with 2,308 real photos and camera-specific augmentations.
Rare malignancies were drowned by common benign lesions. Addressed with class balancing, focal loss, and melanoma-specific handling.
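Focal loss scales the usual cross-entropy term by `(1 - p_t) ** gamma`, so confidently correct examples contribute little and hard examples dominate. The sketch below uses the standard formulation; `gamma=2.0` and the optional per-class `alpha` weights are illustrative assumptions, not the project's settings.

```python
import math

def focal_loss(probs, target, gamma=2.0, alpha=None):
    """Focal loss for one example: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    p_t = probs[target]
    weight = 1.0 if alpha is None else alpha[target]
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)

# An easy example (true class predicted at 0.9) versus a hard one (0.1).
easy = focal_loss([0.9, 0.1], target=0)
hard = focal_loss([0.1, 0.9], target=0)

# With gamma=0 the loss reduces to plain cross-entropy.
cross_entropy_easy = focal_loss([0.9, 0.1], target=0, gamma=0.0)
```

With `gamma=2.0`, the easy example's loss is 1% of its cross-entropy value, so abundant, well-classified benign lesions stop swamping the gradient from rare malignancies.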
88M-parameter backbones and heavy augmentation pushed memory limits. Managed with gradient accumulation and mixed-precision training.
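Gradient accumulation works because the gradient of a mean loss over a large batch equals the average of the gradients over equally sized micro-batches. The NumPy sketch below checks this on a toy linear-regression problem. The model, sizes, and learning rate are assumptions for illustration, and mixed precision is not shown because it depends on the training framework.

```python
import numpy as np

def mse_gradient(w, X, y):
    """Gradient of mean squared error for a linear model y ~ X @ w."""
    residual = X @ w - y
    return 2.0 * X.T @ residual / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))
y = rng.normal(size=32)
w = np.zeros(4)

# Gradient computed on the full batch of 32 (what we cannot fit in memory).
full_batch_grad = mse_gradient(w, X, y)

# Same gradient built from 4 micro-batches of 8, one at a time.
accum_steps = 4
accumulated = np.zeros_like(w)
for X_micro, y_micro in zip(np.split(X, accum_steps), np.split(y, accum_steps)):
    accumulated += mse_gradient(w, X_micro, y_micro) / accum_steps

# A single optimizer step is taken only after all micro-batches are processed.
learning_rate = 0.1
w_new = w - learning_rate * accumulated
```

Only one micro-batch needs to be resident at a time, so the effective batch size can exceed what the accelerator could hold in a single pass.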
Missing a cancer is far worse than a false alarm. We deliberately biased toward sensitivity — 87.30% — over strict top-1 accuracy.
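One way to bias a classifier toward sensitivity is to lower the decision threshold until a target recall on malignant cases is met. The source does not say how the operating point was chosen, so this is a hypothetical procedure on toy scores with assumed function names.

```python
def sensitivity(scores, is_malignant, threshold):
    """Fraction of malignant cases flagged at the given threshold."""
    flagged = [score >= threshold for score, malignant in zip(scores, is_malignant) if malignant]
    return sum(flagged) / len(flagged)

def highest_threshold_meeting(scores, is_malignant, target):
    """Return the largest threshold whose sensitivity reaches the target."""
    for threshold in sorted(set(scores), reverse=True):
        if sensitivity(scores, is_malignant, threshold) >= target:
            return threshold
    return min(scores)  # only reached if there are no candidate thresholds

# Toy validation scores: predicted probability of malignancy per image.
scores = [0.95, 0.80, 0.60, 0.35, 0.30, 0.70, 0.20, 0.10]
is_malignant = [True, True, True, True, False, False, False, False]

default_sensitivity = sensitivity(scores, is_malignant, threshold=0.5)
threshold_75 = highest_threshold_meeting(scores, is_malignant, target=0.75)
threshold_100 = highest_threshold_meeting(scores, is_malignant, target=1.0)
```

Raising the sensitivity target pushes the threshold down, which flags more benign lesions as a side effect. That is the trade against top-1 accuracy described above.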
Raw softmax outputs were overconfident. Label smoothing (ε=0.1) and a conservative triage threshold made the reported probabilities more trustworthy.
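Label smoothing replaces the one-hot training target with a slightly softened distribution, so the loss stops rewarding predictions that approach 100% confidence. The sketch below assumes the common uniform-mixing convention (ε spread evenly over all classes) with ε=0.1 and 12 classes. The helper names are hypothetical.

```python
import math

def smooth_targets(true_class, num_classes, epsilon=0.1):
    """Mix the one-hot target with a uniform distribution over all classes."""
    off_value = epsilon / num_classes
    targets = [off_value] * num_classes
    targets[true_class] = 1.0 - epsilon + off_value
    return targets

def cross_entropy(targets, probs):
    return -sum(t * math.log(p) for t, p in zip(targets, probs))

NUM_CLASSES = 12
targets = smooth_targets(true_class=3, num_classes=NUM_CLASSES)

# A prediction that is almost certain of the true class.
overconfident = [0.001 / (NUM_CLASSES - 1)] * NUM_CLASSES
overconfident[3] = 0.999

loss_matched = cross_entropy(targets, targets)              # prediction equals the smoothed target
loss_overconfident = cross_entropy(targets, overconfident)  # penalized for excess certainty
```

Under the smoothed target, the loss is minimized when the true class is predicted at about 0.908 rather than 1.0, so extreme confidence is penalized instead of rewarded.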
A modular, high-throughput pipeline — from a patient's phone to a structured, safety-calibrated clinical report.
Evaluated on a clean, patient-separated set of 4,555 images across 12 disease classes — including real-world smartphone photography.
From smartphone capture to a safety-calibrated clinical report — end to end.
Architecture alone gets you to 77%. The real gains come from data integrity, augmentation, and calibration — the unglamorous work.
Metrics without patient-level splitting are fiction. The 9-point jump to 82.79% only counted once leakage was eliminated.
A model isn't deployed until it survives a phone camera. Domain adaptation is not optional — it is the product.
In medicine, a missed cancer is not a metric — it's a person. Safety-first design beats leaderboard accuracy every time.
Localized attention heatmaps for clinician trust.
Expand the smartphone training corpus.
INT8/FP16 quantization via ONNX for on-device inference.
External dermatologist-supervised trials.
Swin Transformer ensemble & self-attention backbones.