LONDON WALLET

Op-ed: How well can AI chatbots mimic doctors in a treatment setting? We put 5 to the test

By Robert Frost
July 18, 2024
in Industries

Dr. Scott Gottlieb is a physician and served as the 23rd Commissioner of the U.S. Food and Drug Administration. He is a CNBC contributor and serves on the boards of Pfizer and several startups in health and tech. He is also a partner at the venture capital firm New Enterprise Associates. Shani Benezra is a senior research associate at the American Enterprise Institute and a former associate producer at CBS News’ Face the Nation.

Many consumers and medical providers are turning to chatbots, powered by large language models, to answer medical questions and inform treatment choices. We decided to see whether there were major differences between the leading platforms when it came to their clinical aptitude.

To secure a medical license in the United States, aspiring doctors must successfully navigate three stages of the U.S. Medical Licensing Examination (USMLE), with the third and final installment widely regarded as the most challenging. It requires candidates to answer about 60% of the questions correctly, and historically, the average passing score hovered around 75%.

When we subjected the major large language models (LLMs) to the same Step 3 examination, they performed markedly better, with scores that significantly outpaced those of many doctors.

But there were some clear differences between the models.

Typically taken after the first year of residency, the USMLE Step 3 gauges whether medical graduates can apply their understanding of clinical science to the unsupervised practice of medicine. It assesses a new doctor’s ability to manage patient care across a broad range of medical disciplines and includes both multiple-choice questions and computer-based case simulations.

We isolated 50 questions from the 2023 USMLE Step 3 sample test to evaluate the clinical proficiency of five different leading large language models, feeding the same set of questions to each of these platforms — ChatGPT, Claude, Google Gemini, Grok and Llama.

Other studies have gauged these models for their medical proficiency, but to our knowledge, this is the first time these five leading platforms have been compared in a head-to-head evaluation. These results could give consumers and providers some insights on where they should be turning.

Here’s how they scored:

  • ChatGPT-4o (OpenAI) — 49/50 questions correct (98%)
  • Claude 3.5 (Anthropic) — 45/50 (90%)
  • Gemini Advanced (Google) — 43/50 (86%)
  • Grok (xAI) — 42/50 (84%)
  • HuggingChat (Llama) — 33/50 (66%)
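
The percentages above follow directly from the raw counts. As a minimal sketch, the Python snippet below recomputes them and checks each against the roughly 60% passing mark mentioned earlier. The names `PASS_THRESHOLD_PCT` and `summarize` are illustrative, and treating "about 60%" as an exact cutoff is an assumption made only for this example.

```python
# Correct-answer counts as reported in the list above (out of 50 questions).
TOTAL_QUESTIONS = 50

# Assumption: the article says candidates need "about 60%"; this sketch
# treats that as an exact cutoff purely for illustration.
PASS_THRESHOLD_PCT = 60

correct_answers = {
    "ChatGPT-4o": 49,
    "Claude 3.5": 45,
    "Gemini Advanced": 43,
    "Grok": 42,
    "HuggingChat (Llama)": 33,
}


def summarize(counts, total, threshold_pct):
    """Return (name, percent, cleared_threshold) rows, best score first."""
    rows = []
    for name, n_correct in counts.items():
        pct = round(100 * n_correct / total)
        rows.append((name, pct, pct >= threshold_pct))
    return sorted(rows, key=lambda row: row[1], reverse=True)


results = summarize(correct_answers, TOTAL_QUESTIONS, PASS_THRESHOLD_PCT)

for name, pct, cleared in results:
    status = "above" if cleared else "below"
    print(f"{name}: {pct}% ({status} the assumed {PASS_THRESHOLD_PCT}% mark)")
```

Under that assumed cutoff, all five models clear the passing mark, with the lowest score six percentage points above it.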

In our experiment, OpenAI’s ChatGPT-4o emerged as the top performer, achieving a score of 98%. It provided detailed medical analyses, employing language reminiscent of a medical professional. It not only delivered answers with extensive reasoning, but also contextualized its decision-making process, explaining why alternative answers were less suitable.

Claude, from Anthropic, came in second with a score of 90%. It provided more human-like responses with simpler language and a bullet-point structure that might be more approachable to patients. Gemini, which scored 86%, gave answers that weren’t as thorough as those of ChatGPT or Claude, making its reasoning harder to decipher, but its answers were succinct and straightforward.

Grok, the chatbot from Elon Musk’s xAI, scored a respectable 84% but didn’t provide descriptive reasoning during our analysis, making it hard to understand how it arrived at its answers. While HuggingChat — an open-source website built from Meta’s Llama — scored the lowest at 66%, it nonetheless showed good reasoning for the questions it answered correctly, providing concise responses and links to sources.

One question that most of the models got wrong related to a 75-year-old woman with a hypothetical heart condition. The question asked the physicians which was the most appropriate next step as part of her evaluation. Claude was the only model that generated the correct answer.

Another notable question focused on a 20-year-old male patient presenting with symptoms of a sexually transmitted infection. It asked physicians which of five choices was the appropriate next step as part of his workup. ChatGPT correctly determined that the patient should be scheduled for HIV serology testing in three months, but the model went further, recommending a follow-up examination in one week to ensure that the patient’s symptoms had resolved and that the antibiotics covered his strain of infection. To us, the response highlighted the model’s capacity for broader reasoning, expanding beyond the binary choices presented by the exam.

These models weren’t designed for medical reasoning; they’re products of the consumer technology sector, crafted to perform tasks like language translation and content generation. Despite their non-medical origins, they’ve shown a surprising aptitude for clinical reasoning.

Newer platforms are being purposely built to solve medical problems. Google recently introduced Med-Gemini, a refined version of its previous Gemini models that’s fine-tuned for medical applications and equipped with web-based searching capabilities to enhance clinical reasoning.

As these models evolve, their skill in analyzing complex medical data, diagnosing conditions and recommending treatments will sharpen. They may offer a level of precision and consistency that human providers, constrained by fatigue and error, might sometimes struggle to match, and open the way to a future where treatment portals can be powered by machines, rather than doctors.


