
Say hello to AI, goodbye to your doctor?

BouncerGuy

The Path to Medical Superintelligence

The Microsoft AI team shares research that demonstrates how AI can sequentially investigate and solve medicine’s most complex diagnostic challenges—cases that expert physicians struggle to answer.

Benchmarked against real-world case records published each week in the New England Journal of Medicine, we show that the Microsoft AI Diagnostic Orchestrator (MAI-DxO) correctly diagnoses up to 85% of NEJM case proceedings, a rate more than four times higher than a group of experienced physicians. MAI-DxO also gets to the correct diagnosis more cost-effectively than physicians.

As demand for healthcare continues to grow, costs are rising at an unsustainable pace, and billions of people face multiple barriers to better health – including inaccurate and delayed diagnoses. Increasingly, people are turning to digital tools for medical advice and support. Across Microsoft’s AI consumer products like Bing and Copilot, we see over 50 million health-related sessions every day. From a first-time knee-pain query to a late-night search for an urgent-care clinic, search engines and AI companions are quickly becoming the new front line in healthcare.

We want to do more to help, and we believe generative AI can be transformational. That’s why, at the end of 2024, we launched a dedicated consumer health effort at Microsoft AI, led by clinicians, designers, engineers, and AI scientists. This effort complements Microsoft’s broader health initiatives and builds on our longstanding commitment to partnership and innovation. Existing solutions include RAD-DINO, which helps accelerate and improve radiology workflows, and Microsoft Dragon Copilot, our pioneering voice-first AI assistant for clinicians.

For AI to make a difference, clinicians and patients alike must be able to trust its performance. That’s where our new benchmarks and AI orchestrator come in.

Medical Case Challenges and Benchmarks

To practice medicine in the United States, physicians need to pass the United States Medical Licensing Examination (USMLE), a rigorous and standardized assessment of clinical knowledge and decision making. USMLE questions were among the earliest benchmarks used to evaluate AI systems in medicine, offering a structured way to compare model performance – both against each other and against human clinicians.

In just three years, generative AI has advanced to the point of achieving near-perfect scores on the USMLE and similar exams. But these tests primarily rely on multiple-choice questions, which favor memorization over deep understanding. By reducing medicine to one-shot answers on multiple-choice questions, such benchmarks overstate the apparent competence of AI systems and obscure their limitations.

At Microsoft AI, we’re working to advance and evaluate clinical reasoning capabilities. To move beyond the limitations of multiple-choice questions, we’ve focused on sequential diagnosis, a cornerstone of real-world medical decision making.  In this process, a clinician begins with an initial patient presentation and then iteratively selects questions and diagnostic tests to arrive at a final diagnosis. For example, a patient presenting with cough and fever may lead the clinician to order and review blood tests and a chest X-ray before they feel confident about diagnosing pneumonia.

Each week, the New England Journal of Medicine (NEJM) – one of the world’s leading medical journals – publishes a Case Record of the Massachusetts General Hospital, presenting a patient’s care journey in a detailed, narrative format. These cases are among the most diagnostically complex and intellectually demanding in clinical medicine, often requiring multiple specialists and diagnostic tests to reach a definitive diagnosis.

How does AI perform? To answer this, we created interactive case challenges drawn from the NEJM case series – what we call the Sequential Diagnosis Benchmark (SD Bench). This benchmark transforms 304 recent NEJM cases into stepwise diagnostic encounters where models – or human physicians – can iteratively ask questions and order tests. As new information becomes available, the model or clinician updates their reasoning, gradually narrowing toward a final diagnosis. This diagnosis can then be compared to the gold-standard outcome published in the NEJM.
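The stepwise protocol described above can be pictured as a simple request-and-reveal loop. This is a minimal sketch, not the SD Bench code: the `Case` and `run_episode` names, the exact-string comparison against the gold diagnosis, and the step limit are all assumptions made for illustration, and the toy agent just replays the cough-and-fever example from earlier.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Case:
    presentation: str          # initial presentation shown to the agent
    findings: dict[str, str]   # answers and test results, revealed only on request
    gold_diagnosis: str        # final diagnosis published with the case

# Hypothetical agent interface: given the presentation and everything revealed
# so far, return ("request", item) or ("diagnose", diagnosis).
Agent = Callable[[str, dict[str, str]], tuple[str, str]]

def run_episode(case: Case, agent: Agent, max_steps: int = 20) -> tuple[bool, list[str]]:
    """Run one stepwise encounter; return (correct?, items requested in order)."""
    revealed: dict[str, str] = {}
    requested: list[str] = []
    for _ in range(max_steps):
        kind, value = agent(case.presentation, revealed)
        if kind == "diagnose":
            # Assumption: a case-insensitive exact match stands in for real adjudication.
            correct = value.strip().lower() == case.gold_diagnosis.strip().lower()
            return correct, requested
        # Any action other than "diagnose" is treated as a request for more information.
        requested.append(value)
        revealed[value] = case.findings.get(value, "not available")
    return False, requested  # step limit reached without a committed diagnosis

def toy_agent(presentation: str, revealed: dict[str, str]) -> tuple[str, str]:
    """Hypothetical rule-based agent for the cough-and-fever example."""
    for test in ("blood tests", "chest X-ray"):
        if test not in revealed:
            return ("request", test)
    if "consolidation" in revealed["chest X-ray"]:
        return ("diagnose", "pneumonia")
    return ("diagnose", "viral infection")

case = Case(
    presentation="cough and fever",
    findings={"blood tests": "raised white cell count",
              "chest X-ray": "right lower lobe consolidation"},
    gold_diagnosis="Pneumonia",
)
print(run_episode(case, toy_agent))  # (True, ['blood tests', 'chest X-ray'])
```

Keeping the agent behind a single callable means a human, a rule set, or a language model could be plugged into the same loop and scored the same way.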

Each requested investigation also incurs a (virtual) cost, reflecting real-world healthcare expenditures. This allows us to evaluate performance across two key dimensions: diagnostic accuracy and resource expenditure.  You can watch how an AI system progresses through one of these challenges in this short video.
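Scoring along the two dimensions can be sketched as follows. The prices are invented placeholders in arbitrary units, not the benchmark's actual cost table, and treating unpriced items as free history questions is likewise an assumption. The point is only that every agent or physician is charged from the same table, so accuracy and average spend can be compared side by side.

```python
# Hypothetical price table (arbitrary units); not SD Bench's real figures.
PRICES = {"blood tests": 50, "chest X-ray": 100, "CT chest": 400}

def episode_cost(requested: list[str]) -> int:
    """Sum the virtual cost of every requested item.
    Assumption: items missing from the table are history questions and cost 0."""
    return sum(PRICES.get(item, 0) for item in requested)

def summarize(results: list[tuple[bool, list[str]]]) -> tuple[float, float]:
    """Return (diagnostic accuracy, mean cost per case) over all cases."""
    n = len(results)
    accuracy = sum(1 for correct, _ in results if correct) / n
    mean_cost = sum(episode_cost(items) for _, items in results) / n
    return accuracy, mean_cost

results = [
    (True, ["blood tests", "chest X-ray"]),                 # cost 150
    (False, ["recent travel?", "blood tests", "CT chest"]),  # cost 450
]
print(summarize(results))  # (0.5, 300.0)
```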

Getting to a Correct Diagnosis

We evaluated a comprehensive suite of frontier generative AI models against the 304 NEJM cases. The foundation models tested included GPT, Llama, Claude, Gemini, Grok, and DeepSeek.

Beyond baseline benchmarking, we also developed the Microsoft AI Diagnostic Orchestrator (MAI-DxO), a system designed to emulate a virtual panel of physicians with diverse diagnostic approaches collaborating to solve diagnostic cases.  We believe that orchestrating multiple language models will be critical to managing complex clinical workflows. Orchestrators can integrate diverse data sources more effectively than individual models, while also enhancing safety, transparency, and adaptability in response to evolving medical needs. This model-agnostic approach promotes auditability and resilience, key attributes in high-stakes, fast-evolving clinical environments.

MAI-DxO boosted the diagnostic performance of every model we tested.  The best performing setup was MAI-DxO paired with OpenAI’s o3, which correctly solved 85.5% of the NEJM benchmark cases. For comparison, we also evaluated 21 practicing physicians from the US and UK, each with 5-20 years of clinical experience. On the same tasks, these experts achieved a mean accuracy of 20% across completed cases.
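For reference, the "more than four times" figure quoted at the top follows from the two accuracies reported here:

```latex
\frac{85.5\%}{20\%} = 4.275 \approx 4.3
```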

MAI-DxO is configurable, enabling it to operate within defined cost constraints. This allows for explicit exploration of the cost-value trade-offs inherent in diagnostic decision making. Without such constraints, an AI system might otherwise default to ordering every possible test – regardless of cost, patient discomfort, or delays in care. Importantly, we found that MAI-DxO delivered both higher diagnostic accuracy and lower overall testing costs than physicians or any individual foundation model tested.
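One simple way to make a cost-value trade-off explicit is to rank candidate tests by expected usefulness per unit cost and stop ordering once a budget is exhausted. The sketch below uses a swapped-in textbook heuristic (greedy value-per-cost selection), not how MAI-DxO actually enforces its constraints, and the costs and "value" scores are invented for illustration.

```python
def choose_tests(candidates: list[tuple[str, float, float]], budget: float) -> tuple[list[str], float]:
    """Greedily pick tests with the best value-per-cost that still fit the budget.

    candidates: (test name, cost, hypothetical value score); costs must be > 0.
    Returns the chosen test names and the total amount spent.
    """
    chosen: list[str] = []
    spent = 0.0
    for name, cost, value in sorted(candidates, key=lambda c: c[2] / c[1], reverse=True):
        if spent + cost <= budget:
            chosen.append(name)
            spent += cost
    return chosen, spent

tests = [
    ("blood tests", 50, 3.0),
    ("chest X-ray", 100, 5.0),
    ("CT chest", 400, 6.0),
    ("PET scan", 3000, 7.0),
]
print(choose_tests(tests, budget=500))           # (['blood tests', 'chest X-ray'], 150.0)
print(choose_tests(tests, budget=float("inf")))  # (['blood tests', 'chest X-ray', 'CT chest', 'PET scan'], 3550.0)
```

With no budget the same routine orders everything on the list, which is the failure mode the paragraph describes.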

What’s Next?

Physicians are typically characterized by the breadth or depth of their expertise. Generalists, like family physicians, manage a wide array of conditions across ages and organ systems. Specialists, such as rheumatologists, focus deeply on a single system, disease area or even condition. No single physician, however, can span the full complexity of the NEJM case series. AI, on the other hand, doesn’t face this trade-off. It can blend both breadth and depth of expertise, demonstrating capabilities that, across many aspects of clinical reasoning, exceed those of any individual physician.

This kind of reasoning has the potential to reshape healthcare. AI could empower patients to self-manage routine aspects of care and equip clinicians with advanced decision support for complex cases. Our findings also suggest that AI could reduce unnecessary healthcare costs. U.S. health spending is nearing 20% of US GDP, and up to 25% of that spending is estimated to be wasted, having little influence on patient outcomes.
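Combining the two quoted figures gives a rough upper bound on that waste as a share of GDP (taking 20% as the rounded spending share):

```latex
\underbrace{0.20}_{\text{health spending / GDP}} \times \underbrace{0.25}_{\text{share wasted}} = 0.05 \;\Rightarrow\; \text{up to about } 5\% \text{ of US GDP}
```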

Of course, our research has important limitations. Although MAI-DxO excels at tackling the most complex diagnostic challenges, further testing is needed to assess its performance on more common, everyday presentations. Clinicians in our study worked without access to colleagues, textbooks, or even generative AI, which may feature in their normal clinical practice.  This was done to enable a fair comparison to raw human performance.

A novel aspect of this work is its attention to cost. While real-world health costs vary across geographies and systems, and include many downstream factors that we don’t account for, we apply a consistent methodology across all agents and physicians evaluated to help quantify high level trade-offs between diagnostic accuracy and resource use.

For us, this is just the first step. We’re energized by the opportunities ahead. Important challenges remain before generative AI can be safely and responsibly deployed across healthcare. We need evidence drawn from real clinical environments, alongside appropriate governance and regulatory frameworks to ensure reliability, safety, and efficacy. That’s why we’re partnering with leading health organizations to rigorously test and validate these approaches, an essential step before any broader rollout.

Together with our partners, we strongly believe that the future of healthcare will be shaped by augmenting human expertise and empathy with the power of machine intelligence. We are excited to take the next steps in making that vision a reality.

 
It is good because it can save costs and time.

The OP is a bit misleading: at the moment, all AI is doing is providing help with certain diagnostics. You can't replace surgeons with AI.

Lastly, imo, AI feels like a bit of a fad. I remember 10 years ago when Silicon Valley was hyping VR and AR as the next big thing, but these fell off.

No one wants to play video games with a machine on their head, or go to the cinema wearing 3D glasses 24/7 and have their chairs move to simulate 4D 🤣

I remember when everyone was like, "It is the end of the 2D era, mobile phones are toast, and from now on we will walk around with Google Glass stuck to our heads."

Even now, AI has failed to:

A) Replace PhD researchers
B) Replace actual physicians who are able to run proper diagnostics
C) Replace any doctors
D) Replace university lecturers and school teachers
E) Replace video game makers; all ChatGPT AI video games are hilariously bad
F) Replace movie directors and actors, heck, even replace video game voice actors
G) Replace graphic designers and artists
H) Replace manufacturers or personal drivers? Heck, Tesla's self-driving AI cars are terrible. It nearly killed someone by crashing him into a moving train?
I) Replace waiters and diners at restaurants? Japan tried this and failed miserably 🤣.

Just like how VR and AR failed to kill 2D, kill mobile phones, replace classic couch gaming, etc.

AI won't replace physicians. Humans have made advancements to human society, but they have always come in the guise of:

A) Simplicity: mobile phones worked because they made our lives easier.

B) Human hands: every machine and technological invention has always been backed by human hands.

The concept of AI isn't new; the concept has been around since the 1600s.

AI doesn't exist, that's the simple truth. Computer models going off information given by humans aren't actually a self-aware second coming of the Terminator.
 
ChatGPT is nothing more than a personalised version of Google.

It ain't replacing physicians, biggest lie of all time.
 
This is a very simplistic view. AI has made HUMUNGOUS strides already and will take over jobs. Comparing the 1600s to 2025 is like comparing Travis Head's away record with Inzamam's.
 
AI will take over jobs like copywriting and other stuff. Haven't denied that.

In the same way, Google took over some jobs as well. No need to travel to your local courier and ask for mail to be sent to loved ones when you have Google Meet.

That doesn't mean it will replace physicians lol.
 
I don't agree bro. Right now AI is like Travis Head away from home in the Physician field. It can work great in favourable conditions but not much otherwise. Very soon it will develop into Inzamam/YK - reliable everywhere. You need to check out the recent advancements particularly in your home country of Australia.
 
I'm already aware of Canva's AI advancements and how far they came in image generation after acquiring Leonardo AI, thank you very much.

AI will defo make advancements, but people are imagining a reality that's still a good 200 years away.
 
You are contradicting your initial post bro. Feel free to let me know if you need me to share some links about advancement of AI in medicine especially specific to the English speaking world that you may not be aware of. Always happy to help.
 
ChatGPT is nothing more than a personalised version of Google.

It ain't replacing physicians, biggest lie of all time.
It will happen eventually.

Robotic surgery is going to be the main way of performing surgery by 2030. We already have it, just not as prevalent.
 
Let's wait and see. But I remember the VR and AR trend and how it failed miserably.
 
Generative AI is getting better and better.

For most people, AI means LLMs, which are only a tiny part of what AI is. AI is not just about crunching data and spitting out the results. It can analyze and reason too. Now it can also self-improve.

We are at the doorstep of AGI. People like Sam Altman and Zuckerberg are not talking about AGI. Their focus is on superintelligence. It will happen by 2030, I believe.
 
It will happen eventually.

Robotic surgery is going to be the main way of performing surgery by 2030. We already have it. But not as prevalent.
I believe he misunderstood AI and thought ChatGPT would be performing surgeries. My grandmother is the same, still learning the basics of AI.
 
Most people think AI is just an LLM that answers some simple questions in text form. Basically something like ChatGPT or Grok…

A lot is happening and it is hard to keep up. Google Gemini, Claude… everyone is trying to outgun the others. Whoever achieves superintelligence first will rule the world forever, basically either the US or China. The world will drastically change once ASI is achieved. I am no expert in AI, but I do follow what is happening, and it is exciting and sometimes scary. Machines will 100% replace human labor. Most of these overpopulated countries are not ready for it.
 
Generative AI is getting better and better.

For most people, AI means LLMs, which are only a tiny part of what AI is. AI is not just about crunching data and spitting out the results. It can analyze and reason too. Now it can also self-improve.

We are at the doorstep of AGI. People like Sam Altman and Zuckerberg are not talking about AGI. Their focus is on superintelligence. It will happen by 2030, I believe.
Zuckerberg has never done anything on his own. He is the biggest con artist of all time.

He is intelligent yes, but everything about him is exaggerated.

A) He is not a self taught coder, he had a personal tutor who frequently visited.

B) He didn't create the Synapse music player that was sold to Microsoft; his friend Adam did, and he was simply along for the ride.

C) He got 1600 on the SAT because his dad assigned him a tutor and gave him a 3-year head start to get it perfect, and he specifically went to a school designed to get you into the Ivy League.

D) After being given the idea for Facebook, he copied MySpace's OG prototype code but used a bit of CSS to change the layout to blue and white.

E) He then had Eduardo do all of his marketing across campus, and his roommates find him employees.

F) After that, he had Sean get all his investments for him, and eventually he got rid of both when he found Sandy, who was the perfect COO and could make amazing decisions.

G) All of his ideas, like Poke and Meta AI, failed, and he needed his brilliant acquisition team (set up by Peter Thiel, not him) to buy Insta, WhatsApp and now Scale AI (49%).

The only thing I will give him credit for is that he was never money-oriented and hence refused to sell his company no matter what, despite being pressured to. Most people would drool over a billion dollars when running a company that wasn't making any money at the time and was relying on investment money.

But everything from Meta ads, Instagram copying Snapchat's Stories, all the acquisitions, none of that was his doing.

All he did was copy and edit a prototype, luck out on brilliant networking, and know when to get rid of said network once it became a burden to the company.

What he says means nothing, because he has frequently failed to invent anything or usher in any new era.
 
He is spending millions to assemble a super team to achieve ASI. It’s the AI arms race between Meta, Google, OpenAI and Tesla.
Of course there is China too. They always pull out surprises.
 
When has Meta ever actually achieved anything compared to the others?

Let's look at all their OG products that weren't stolen.

1) Messaging: massive failure; they had to acquire Beacon and change the name to Facebook Messenger.

2) Facebook Phone with HTC? Massive failure. They launched this phone back when Apple was in its infancy and Samsung wasn't established. They could have easily dominated the market alongside Apple, but the phone was bad and the design team incompetent.

3) Poke? Massive failure, to the point that Zuckerberg had to beg Evan to sell Snapchat for $3B. After Evan refused, one of the teammates recommended just stealing Snap's Stories feature and putting it into Instagram.

4) Their original photo app? Massive failure; they decided to buy Insta instead.

5) The OG Meta Pay? Massive failure; they decided to acquire a startup that was competing with payment providers.

6) Even when it came to integrating search engines into their team, they failed miserably and had to buy out Blake Ross (the OG creator of Firefox) to build it for them.

7) Even now, Meta AI? Massive failure; they had to acquire Scale AI to do it for them.
 
By Apple being in its infancy, I mean Apple had just launched its first smartphone. Had Zuckerberg been smart, he could have turned his empire into a Samsung as well.

It would have been beneficial for him, since he would not only control the ads but also the devices they were sold on.

That would have given him massive leverage over Apple, and he could have easily blackmailed them for money in order to run FB on iOS.
 