Medical AI benchmarks — what it actually means when a model 'passes the USMLE'
When a headline says a model outperformed physicians on an exam, it appears to settle the entire debate with a single number. But a benchmark is not clinical practice, and a high test score does not establish real-world reliability.