What my caries detector gets wrong

This is an example post. It exists to show the format — every element below is something you can use in your own writing. Replace it with a real post, or delete it once you’ve seen how it looks.

The headline number for this model is an mAP50 of 0.813, which sounds healthy. But an aggregate score hides what matters most in a clinical tool: the cases it misses. This post walks through those.

The failures that matter

A false negative — a real caries lesion the model calls “sound” — is the expensive mistake here. Recall, not precision, is the number I watch.
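To make the recall-versus-precision distinction concrete, here is a minimal sketch. The counts are made up for illustration and are not from this model's validation run; the helper names `precision` and `recall` are my own. It shows how a detector can look precise while still missing a large share of real lesions.

```python
def precision(tp: int, fp: int) -> float:
    """Share of predicted caries that are real: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of real caries the model found: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts for illustration only, not from the validation run.
tp, fp, fn = 60, 5, 40

p = precision(tp, fp)
r = recall(tp, fn)
print(f"precision: {p:.2f}")
print(f"recall:    {r:.2f}")
```

With these invented numbers, precision is about 0.92 while recall is 0.60, so 40 of 100 real lesions would go undetected even though almost every positive call is correct.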

The model is most confident exactly where I am least: small interproximal lesions on low-contrast radiographs.

That quote above is a pull-quote — in Markdown it’s just a blockquote (a line starting with >).

Where it slips

Early interproximal lesions under 1 mm
Radiographs with heavy noise or low contrast
Restoration margins it mistakes for decay
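One way to check whether errors really cluster in these three groups is to tag each validation case and count errors per tag. The sketch below assumes such tags exist. The tag names, the records, and the `slice_errors` helper are all hypothetical, not part of the actual evaluation. Note that the first two groups show up as false negatives, while restoration margins show up as false positives, so recall alone would not surface the third.

```python
from collections import defaultdict

# Hypothetical records: (slice_tag, y_true, y_pred), 1 = caries, 0 = sound.
records = [
    ("small_interproximal", 1, 0),
    ("small_interproximal", 1, 1),
    ("low_contrast", 1, 0),
    ("low_contrast", 1, 0),
    ("low_contrast", 1, 1),
    ("restoration_margin", 0, 1),
    ("restoration_margin", 0, 0),
]


def slice_errors(rows):
    """Count true positives, false negatives and false positives per tag."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0})
    for tag, truth, pred in rows:
        if truth == 1 and pred == 1:
            counts[tag]["tp"] += 1
        elif truth == 1 and pred == 0:
            counts[tag]["fn"] += 1
        elif truth == 0 and pred == 1:
            counts[tag]["fp"] += 1
        # truth == 0 and pred == 0 is a true negative and is not tallied here.
    return dict(counts)


report = slice_errors(records)
for tag, c in report.items():
    positives = c["tp"] + c["fn"]
    # Recall is undefined for a slice with no real lesions.
    rec = c["tp"] / positives if positives else None
    print(tag, c, "recall:", rec)
```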

Figure: Validation-set confusion matrix. False negatives (FN) are the clinically costly cell.
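For readers who want to see how the four cells of a binary confusion matrix are tallied, here is a small standard-library sketch. The labels are invented for illustration and do not reproduce the validation run; the convention (1 for "caries", 0 for "sound") matches the one used elsewhere in this post.

```python
from collections import Counter

# Hypothetical labels for illustration: 1 = "caries", 0 = "sound".
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 0, 0]

# Count each (truth, prediction) pair once.
cells = Counter(zip(y_true, y_pred))

tp = cells[(1, 1)]  # caries, called caries
fn = cells[(1, 0)]  # caries, called sound: the clinically costly cell
fp = cells[(0, 1)]  # sound, called caries
tn = cells[(0, 0)]  # sound, called sound

print("              pred sound  pred caries")
print(f"true sound    {tn:10d}  {fp:11d}")
print(f"true caries   {fn:10d}  {tp:11d}")
```

Keeping FN as its own named variable makes it easy to track that one cell across runs, rather than only watching a summary metric.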

A bit of the evaluation code

Code blocks are monospace and bordered. Here’s the snippet that produced the recall figure:

from sklearn.metrics import recall_score

# y_true / y_pred are 1 for "caries", 0 for "sound"
recall = recall_score(y_true, y_pred)
print(f"recall: {recall:.2f}")  # recall: 0.78

Inline code like recall_score is styled too.

That teal box is a callout — write it as a blockquote whose first line is > [!NOTE]. I use it for the “what I’d do differently” reflection at the end of each post.

Where this goes next

The next iteration focuses on the low-contrast set. I’ll post the results here when the run finishes — including the cases it still gets wrong.