What my caries detector gets wrong
- mAP50 0.813
- F1 0.97
- recall 0.78
This is an example post. It exists to show the format — every element below is something you can use in your own writing. Replace it with a real post, or delete it once you’ve seen how it looks.
The headline number for this model is an mAP50 of 0.813, which sounds healthy. But an aggregate score hides what matters most in a clinical tool: the cases it misses. This post walks through those.
The failures that matter
A false negative — a real caries lesion the model calls “sound” — is the expensive mistake here. Recall, not precision, is the number I watch.
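To make the distinction concrete, here is a minimal sketch of how recall and precision fall out of the confusion counts. The labels are toy values invented for illustration, not output from this model, and `recall_and_precision` is a hypothetical helper rather than part of the evaluation code.

```python
def recall_and_precision(y_true, y_pred):
    """Return (recall, precision) for binary labels: 1 = caries, 0 = sound."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # lesions found
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # lesions missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # sound teeth flagged
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


# Toy labels, made up for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0]

recall, precision = recall_and_precision(y_true, y_pred)
print(f"recall: {recall:.2f}, precision: {precision:.2f}")
# recall: 0.75, precision: 1.00
```

In this toy case precision is perfect and one lesion in four still goes unreported, which is why precision alone says little about the expensive mistake.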
The model is most confident exactly where I am least: small interproximal lesions on low-contrast radiographs.
That quote above is a pull-quote. In Markdown it’s just a blockquote (a line starting with >).
Where it slips
- Early interproximal lesions under 1 mm
- Radiographs with heavy noise or low contrast
- Restoration margins it mistakes for decay
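One way to check a list like this is to compute recall per slice instead of over the whole test set. The following is a minimal sketch, assuming each test case carries a slice tag. The tag names, the `recall_by_slice` helper, and the toy records are all hypothetical, not the real evaluation data.

```python
from collections import defaultdict


def recall_by_slice(records):
    """records: iterable of (slice_name, y_true, y_pred); 1 = caries, 0 = sound."""
    tp = defaultdict(int)
    fn = defaultdict(int)
    for slice_name, t, p in records:
        if t != 1:
            continue  # only real lesions count toward recall
        if p == 1:
            tp[slice_name] += 1
        else:
            fn[slice_name] += 1
    names = set(tp) | set(fn)
    return {name: tp[name] / (tp[name] + fn[name]) for name in names}


# Toy records, made up for illustration only.
records = [
    ("small_interproximal", 1, 0),
    ("small_interproximal", 1, 1),
    ("low_contrast", 1, 0),
    ("low_contrast", 1, 0),
    ("low_contrast", 1, 1),
    ("low_contrast", 1, 1),
    ("other", 1, 1),
    ("other", 1, 1),
    ("other", 0, 0),
]

result = recall_by_slice(records)
for name in sorted(result):
    print(f"{name}: {result[name]:.2f}")
```

The first two items in the list are false-negative patterns, so they would show up here. The third, restoration margins mistaken for decay, is a false-positive pattern and would show up in per-slice precision instead.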
A bit of the evaluation code
Code blocks are monospace and bordered. Here’s the snippet that produced the recall figure:
from sklearn.metrics import recall_score
# y_true / y_pred are 1 for "caries", 0 for "sound"
recall = recall_score(y_true, y_pred)
print(f"recall: {recall:.2f}")  # recall: 0.78
Inline code like recall_score is styled too.
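Since the misses matter more than the score, it can also help to pull out the false negatives themselves for review. This is a minimal sketch in plain Python. The labels are toy values, and `image_ids` is a hypothetical list of identifiers that is not part of the snippet above.

```python
# Toy values, made up for illustration only: 1 = caries, 0 = sound.
y_true = [1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 0]
image_ids = ["img_a", "img_b", "img_c", "img_d", "img_e", "img_f"]  # hypothetical IDs

# A false negative is a real lesion (1) that the model called sound (0).
missed_ids = [
    image_id
    for image_id, t, p in zip(image_ids, y_true, y_pred)
    if t == 1 and p == 0
]
print(missed_ids)  # ['img_b', 'img_f']
```

Reviewing the images behind those IDs is what turns a recall figure into a list of specific failure cases.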
That teal box is a callout: write it as a blockquote whose first line is > [!NOTE]. I use it for the “what I’d do differently” reflection at the end of each post.
Where this goes next
The next iteration focuses on the low-contrast set. I’ll post the results here when the run finishes — including the cases it still gets wrong.