AI Is Coming for Healthcare, and It Terrifies Me.

I spent a decade at a community pharmacy, deciphering scribbled prescriptions and building long-term relationships with my patients. I've shared the joy of women picking up their prenatal vitamins for the first time, and the sorrow of relatives coming back, after their loved ones were gone, to thank me for the support I provided during those difficult journeys.
I am not practicing at the moment (my white coat now serves as my son's Halloween costume), but I remember these stories vividly. These shared moments still matter to me because they were unmistakably human.
And yet here we are, in 2025, and the headlines scream that AI is coming for every job that involves a brain. Doctors, lawyers, artists — apparently none of us are safe. So let me admit the uncomfortable truth:
Knowing everything I know today, I believe that AI is absolutely going to outperform humans in the near future.
Yes, you read that right. I believe that AI will outperform humans in many domains, including healthcare. AI won’t get sloppy because it’s multitasking under pressure to hit business metrics. And it won’t forget that Mr. Jones doesn’t like blue-colored pills even though he only mentioned it once in 2017.
And that terrifies me. But not for the reasons you think. I’m not afraid that AI will replace clinicians. I’m afraid it will lock in the worst parts of our healthcare system at scale, while quietly eroding the parts that make care human.
So here’s the AI memo I’ve written for you — part systems thinking, part computer science, part “healthcare professional who still remembers human stories.”
From Rule-Based Systems to Machine Learning
Debates around AI typically use the phrase Artificial Intelligence as a broad umbrella term, causing a lot of confusion. So let’s clear a few things up first.
For the purposes of this discussion, I am going to distinguish between two fundamental types of systems: Symbolic Systems and Machine Learning (ML).
Let’s start with Symbolic Systems. These types of systems are basically “if–then” machines. Humans write the rules — the system executes them.
Imagine you bring a prescription for Warfarin (a blood thinner) to CVS. The pharmacy tech gathers your information and enters the prescription into the system. The system then checks if this medication interacts with any other medications you are currently taking. Let’s also assume that your med list includes Aspirin. The computer system sees “Warfarin + Aspirin” and displays a Major Interaction Warning to alert the pharmacist because it was programmed to flag this particular combination as dangerous.
We’ve had these kinds of systems for a while now. In fact, most major retailers implemented clinical decision support systems (CDSSs) in the 90s following the Omnibus Budget Reconciliation Act.
Yet in all these years, you haven’t seen headlines screaming that automation is going to take over pharmacist jobs. Why? Because these systems are rigid and still rely on humans to resolve every alert they raise.
A rule-based system sees “Drug A + Drug B = Bad.” But in contrast to a human, it cannot look at the patient’s lab results, realize that the patient has been stable on this particular mix for years, and decide to suppress the warning. Similarly, it cannot clinically assess high/low dose alerts.
If a doctor prescribes high doses of painkillers to a cancer patient who’s developed a tolerance, the system only sees “Prescribed Dose > Max” and triggers an alert. It cannot understand the context of palliative care and decide that the higher dose is warranted here (unless a human explicitly wrote a rule for it).
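If you have never seen one of these rule engines, here is a minimal sketch of the idea in Python. The drugs, the dose ceiling, and the rule table are all made up for illustration; real CDSS rule sets are far larger and vendor-maintained.

```python
# A toy rule-based interaction and dose checker, illustrative only.
# Real CDSS rule tables are far larger and vendor-specific.

MAJOR_INTERACTIONS = {frozenset({"warfarin", "aspirin"})}
MAX_DAILY_DOSE_MG = {"oxycodone": 80}  # hypothetical ceiling for illustration

def check_prescription(drug, daily_dose_mg, current_meds):
    alerts = []
    # Rule 1: flag any pair that appears in the interaction table.
    for med in current_meds:
        if frozenset({drug, med}) in MAJOR_INTERACTIONS:
            alerts.append(f"MAJOR INTERACTION: {drug} + {med}")
    # Rule 2: flag any dose above the hard-coded maximum.
    limit = MAX_DAILY_DOSE_MG.get(drug)
    if limit is not None and daily_dose_mg > limit:
        alerts.append(f"HIGH DOSE: {drug} {daily_dose_mg} mg/day > {limit} mg/day")
    return alerts

# The rules fire the same way regardless of context: a palliative-care
# patient with opioid tolerance looks identical to a dosing error, and a
# patient stable on warfarin + aspirin for years looks identical to a
# brand-new, dangerous combination.
print(check_prescription("warfarin", 5, ["aspirin", "metformin"]))
print(check_prescription("oxycodone", 120, []))
```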
Because these systems follow strict logic without judgment, they flag all rule violations, even trivial ones, leading to a well-established problem in healthcare — alert fatigue. In other words, the more rules we add to “keep patients safe,” the noisier the system becomes for the humans who still have to make the real decisions. Instead of increasing human productivity, we are wearing down healthcare providers’ precious attention.
That’s why big retailers like CVS and Walmart, along with hospital systems, are now racing to layer another kind of AI (Machine Learning models) on top of the good old-fashioned symbolic systems.
In contrast to symbolic systems, ML models do not follow a fixed script written by a human. Instead, they are trained on millions of real-world examples (e.g., past prescriptions, lab values, previous pharmacist overrides, etc.) and gradually adjust internal parameters until they can predict what a human would likely do next.
For instance, a machine learning model would be able to predict what a pharmacist would do with any given alert. If the model is 99% confident the clinician would click “override warning” anyway (based on historical data), the alert would either be suppressed or shown to the pharmacist in soft gray instead of fire-engine red.
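As a rough sketch (not any vendor's actual pipeline), the idea looks something like this: train a classifier on historical alerts labeled with whether the pharmacist overrode them, then use the predicted probability to decide how loudly to display the next alert. The features, the synthetic data, and the 99% threshold below are all assumptions made up for illustration; scikit-learn is assumed to be available.

```python
# A toy alert-triage model on synthetic data, illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

# Synthetic history of past alerts:
# [interaction_severity, years_stable_on_combo, abnormal_lab_flag]
X = rng.random((n, 3))
# Pretend pharmacists historically overrode alerts for mild interactions
# in patients long stable on the combination with unremarkable labs.
score = -2.0 * X[:, 0] + 3.0 * X[:, 1] - 3.0 * X[:, 2]
y = (score + rng.normal(0, 0.3, n) > 0).astype(int)  # 1 = alert was overridden

model = LogisticRegression().fit(X, y)

def triage_alert(features, threshold=0.99):
    """Downgrade the alert if the model is confident it would be overridden."""
    p_override = model.predict_proba([features])[0, 1]
    if p_override >= threshold:
        return f"P(override)={p_override:.3f} -> suppress / show in soft gray"
    return f"P(override)={p_override:.3f} -> show as a loud warning"

print(triage_alert([0.1, 0.9, 0.1]))  # mild interaction, long stable, normal labs
print(triage_alert([0.9, 0.1, 0.9]))  # severe interaction, new combo, abnormal labs
```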
Sounds amazing! In theory. But here’s the catch. In reality, the first and foremost risk we encounter when trying to implement ML-based systems is “the black box problem.”
The black box problem in AI refers to the lack of transparency, where a system’s internal decision-making process is opaque. This means that while we can see the input data and the AI’s final output, we cannot understand how the model reached that conclusion.
To see why that opacity is dangerous, we need to look at how accountability in healthcare has shifted — from one person to a maze of systems and incentives.
Evolution of Accountability in Healthcare
When we had family doctors practicing as solo entities, the responsibility rested solely with the individual. Clinical judgment, diagnosis, prescribing, dispensing — all were traceable to one person. If something went wrong, we all looked straight at that person.
Within this framework, clinicians felt a deep personal duty to their patients. You might remember your grandparents reminiscing about the good old days when “their doctor or pharmacist knew them by name.” Patient trust was relational.
As practices consolidated and healthcare systems grew, accountability began to fan out. Clinical decisions are still made by individuals, but they are now influenced by standard operating procedures, integrated delivery networks, and backstage payment systems.
Consider the following scenario: a doctor prescribes a med that is not covered by the patient’s insurance plan; the pharmacy takes 2 days to figure out a substitute that is covered; the pharmacy submits the request for the covered med to the doctor’s office; the doctor’s office is closed for the holidays; the patient is harmed because of the treatment delay.
Whose fault is that?
To be fair, systematization brought real gains: standardized guidelines, broader access, safety checks, electronic records. But it also spread responsibility across so many hands that it has become hard to see where things are breaking.
The larger and more complex the healthcare system, the more constraints individual clinicians are operating under (e.g., time limits, staffing issues, insurance denials, reimbursement rates, business metrics, etc.), and the more patients start falling through the cracks.
The Black Box Problem
What happens when we add AI to the mix?
We’re layering new technology on top of a system that is already fragmented, opaque, and full of misaligned incentives.
In the U.S., we spend more on healthcare than any other high-income country and still end up with worse outcomes on many basic measures. According to a 2023 report by the Commonwealth Fund, the U.S. has the highest infant and maternal mortality rates, as well as the highest death rates for avoidable or treatable conditions when compared to other high-income countries.
That gap isn’t due to a lack of data or automation. It comes from how the system is structured. I fear AI will amplify what is already broken. At scale. In ways that are impossible to see.
When the Algorithm Says “Discharge”
Various AI systems have already been implemented in healthcare. One common example is predictive modeling used to estimate hospital or long-term care facility length of stay. These tools can help health systems manage staffing, bed availability, and equipment needs more effectively. But similar systems have also been adopted by insurers — where the incentives are very different — and this has drawn significant scrutiny and multiple class-action lawsuits.
Traditional symbolic systems are clunky but explainable. We know exactly what logic the system follows in decision-making. ML-based AI works differently. A model looks at various data points — patient’s lab values, medical conditions, prior hospitalizations, past clinician notes, zip code, gender, race, age — and spits out a prediction: 16 days. And no one knows how the model arrived at that conclusion.
We know what data went in and what number came out, but we cannot trace a human-understandable path between the two. Even the engineers who built the model cannot say, “Here’s exactly why it did that.”
What we do know is that the model is not analyzing data the way a clinician would. It is not reasoning about a specific patient in a specific clinical context. It is simply relying on correlations buried in its training data.
Maybe patients from certain neighborhoods historically had shorter stays because of social factors. Maybe patients with certain insurance plans stayed longer because coverage allowed it. The model doesn’t understand any of that, and it doesn’t care. It just learns that “when the pattern looks like this, the outcome is usually that.” And those learned patterns are not necessarily clinically relevant or fair.
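To see how a model can pick up patterns like these, here is a toy length-of-stay predictor on synthetic data. The zip-code and insurance effects are deliberately planted to mimic social patterns rather than clinical ones, and scikit-learn is assumed; nothing here reproduces any real insurer's model.

```python
# A toy length-of-stay predictor on synthetic data, illustrative only.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
n = 5000

severity = rng.normal(0, 1, n)      # a rough clinical signal
zip_group = rng.integers(0, 2, n)   # 1 = historically under-resourced area
insurance = rng.integers(0, 2, n)   # 1 = generous coverage

# Historical stays: driven partly by severity, partly by who could afford
# or access longer care, not by medical need alone.
los_days = 10 + 3 * severity - 2 * zip_group + 2 * insurance + rng.normal(0, 1, n)

X = np.column_stack([severity, zip_group, insurance])
model = GradientBoostingRegressor().fit(X, los_days)

# Two patients with identical severity get different predictions purely
# because of where they live and what insurance they carry.
same_severity = 0.0
print(model.predict([[same_severity, 1, 0]]))  # shorter predicted stay
print(model.predict([[same_severity, 0, 1]]))  # longer predicted stay
```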
What happens when a system like that is implemented?
You can find yourself discharged early or denied coverage because a model somewhere pushed out a number no one can explain.
This is exactly what happened in the case of 85-year-old Frances Walter, whose insurer, Security Health Plan, cut off payment for her care because an algorithm used by the company estimated she would be ready to leave the nursing home in 16.6 days — despite the fact that she could not dress herself or push a walker without help. While she fought the denial, she spent down her life savings to continue receiving care and ultimately had to enroll in Medicaid.
A similar story happened to Dolores Millam, whose insurer, UnitedHealthcare, terminated payments for her care prematurely because an algorithm predicted she would need only a 15-day stay.
Sadly, these cases are not anomalies.
UnitedHealth Group was sued over its use of the nH Predict algorithm. The lawsuit alleges that “the insurance company’s reliance on artificial intelligence (AI) tools to deny certain medical claims under Medicare Advantage plans constituted breach of contract, breach of the implied covenant of good faith and fair dealing, unjust enrichment, and insurance bad faith.”
According to The Guardian, “nH Predict has a 90% error rate, meaning nine out of 10 denials are reversed upon appeal.” Humana and Cigna are also facing class-action lawsuits alleging that their algorithms systematically denied care.
Accuracy on Paper, Disaster in Real Life?
If you look at vendor decks, AI in healthcare often sounds like a miracle. 95–99% accuracy! Superior performance! But there’s a gap between test-set performance and real-world deployment.
What is rarely emphasized is that models are usually evaluated on carefully curated data that is clean, labeled, and neatly formatted. In contrast, real life is messy. Clinical notes can be incomplete or contain errors. Lab results can be missing or delayed. Workflows vary between hospitals, and even between units.
Clinicians cope with this effortlessly. We ignore noise, ask clarifying questions, and combine weak signals with years of pattern recognition. AI, for now, cannot do that. A model that looks unbeatable on paper can underperform severely when dropped into the messiness of clinical reality.
Take the infamous case of Epic’s Sepsis Prediction Model, one of the clearest examples of this gap in action.
Epic marketed the model aggressively, claiming superior performance and the ability to flag sepsis up to six hours before clinical onset. Hundreds of U.S. hospitals implemented it. Then, in 2021, researchers at the University of Michigan decided to measure what actually happened when the model ran live.
The results, published in JAMA Internal Medicine, were sobering. The observed performance was substantially worse than what Epic marketed. The model identified only 7% of patients with sepsis who were missed by a clinician. It also failed to identify 67% of patients with sepsis despite generating alerts on 18% of all hospitalized patients.
Epic released its model in 2017 but has never published the model’s architecture, training data, or performance metrics in the peer-reviewed literature. So these shortcomings had to be discovered the hard way.
This fiasco was highlighted by Princeton computer scientists Arvind Narayanan and Sayash Kapoor in their book AI Snake Oil. It turned out there was a major flaw in how Epic evaluated its prediction tool.
The model’s inputs included whether antibiotics had already been administered — something that happens after a patient has been diagnosed with sepsis. As a result, the model could “predict” sepsis in cases where sepsis was already known. Despite this, Epic developers counted these post-diagnosis cases as successful predictions during internal validation, inflating reported performance.
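Epic has not published its code or data, so nothing below reproduces its model. But the general failure mode, known as label leakage, is easy to demonstrate on synthetic data: include a feature that only exists after the diagnosis, and the validation metrics soar. This sketch assumes scikit-learn and uses made-up numbers.

```python
# A toy demonstration of label leakage on synthetic data, illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n = 5000

sepsis = rng.integers(0, 2, n)                # ground-truth label
vitals = sepsis * 0.5 + rng.normal(0, 1, n)   # weak early signal
# Antibiotics are usually given *after* sepsis is recognized,
# so this feature is almost a copy of the label itself.
antibiotics_given = (sepsis * (rng.random(n) < 0.9)).astype(int)

honest_X = vitals.reshape(-1, 1)
leaky_X = np.column_stack([vitals, antibiotics_given])

clf = LogisticRegression()
print("AUC without leakage:",
      cross_val_score(clf, honest_X, sepsis, scoring="roc_auc", cv=5).mean())
print("AUC with post-diagnosis feature:",
      cross_val_score(clf, leaky_X, sepsis, scoring="roc_auc", cv=5).mean())
```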
It feels reasonable to implement AI solutions that have demonstrated performance superior to clinicians. After all, we all want better outcomes for patients. But someone must ask for receipts: peer-reviewed validation on held-out data. Otherwise, we risk rushing to implement systems that not only underperform relative to humans but may be closer to a coin flip in accuracy — as turned out to be the case with Epic’s Sepsis Model.
Scaling Old Biases With New Efficiency
Healthcare data is not a neutral reflection of human biology. It is a historical record of how we treated people in the past — who got access, who got listened to, who got documented thoroughly, and who got dismissed.
Did you know that Black patients are less likely to be prescribed adequate pain medication than white patients with the same condition? Or that women presenting with symptoms of heart disease are significantly less likely than men to receive a referral or treatment, often being misdiagnosed with stress or anxiety instead?
When we train models on that data, we’re not just teaching them about medical care. We’re also baking in health inequities. If certain patient populations correlate with fewer medical interventions, the model might quietly learn to recommend less aggressive care for these patients. It wouldn’t “know” it’s being unfair. It would simply be compressing patterns from the past into predictions about the future. And those predictions would drive automated triage, outreach, and resource allocation.
And that is horrifying. Because individual biases — while still harmful — can be exposed and challenged. AI is a black box. What can’t be exposed cannot be challenged.
And this kind of bias is not hypothetical; researchers keep documenting it.
A recent study from Cornell University, for instance, found that various LLMs (e.g., GPT-4o, Claude 3.5, etc.) often suggest significantly lower salary expectations to women compared to their male counterparts. And a study published in JAMA Dermatology revealed that the majority of AI-based melanoma prediction models are trained on data sets that are heavily composed of light-skinned images from patients in the US, Europe, and Australia. As a result, they show suboptimal performance for images of darker skin tones in real-world settings.
But the clearest real-world example of how algorithmic decision-making can ruin tens of thousands of lives is the Dutch childcare benefits scandal.
The Dutch Tax and Customs Administration used a self-learning algorithm to detect supposed fraud in childcare benefit claims. Approximately 35,000 parents were accused. It was later revealed that having a second (non-Dutch) nationality was automatically treated as a risk indicator. The algorithm was trained on past fraud cases, in which minority families were already overrepresented.
Once a family was flagged, the system demanded immediate repayment in full, sometimes tens of thousands of euros, plus fines. Interest and penalties snowballed. Many families were thrown into severe poverty, often unable to pay for necessities such as food and heating. In some cases, parents lost custody of their children. Some committed suicide.
The scandal led to the resignation of Prime Minister Mark Rutte’s third cabinet in 2021. In 2022, the Dutch state acknowledged that its actions were an example of institutional racism.
And here are some worrisome numbers from the U.S.
According to a study published in Health Affairs, 65% of U.S. hospitals use AI-assisted predictive models. But only 44% of hospitals that use predictive models evaluated them for bias using data from their own health system.
All of this is why the black box problem terrifies me. Not because AI is inherently evil, and not because I believe humans are infallible. But because we are plugging opaque decision-making engines into a healthcare system that already struggles with transparency and equity. And we are asking clinicians and patients to simply “trust the algorithm” without giving them the tools to question it.
The Cost of Chasing Shiny Tech
Deploying AI at scale is not cheap. Health systems spend a lot of money on data infrastructure, integration with electronic health records, and workflow redesign. And every dollar has an opportunity cost. Money poured into AI is money not spent on hiring more nurses, doctors, and pharmacists, or on fixing basic scheduling and follow-up logistics.
If we’re not careful, “AI transformation” becomes a way of investing in technical complexity instead of the simple, boring interventions that we already know save lives.
As more decisions are automated, clinical encounters become even thinner. Instead of a conversation about quality-adjusted life years and attainable goals, you get a risk score, a templated care plan, and a checklist of “adherence” items. Ironically, we are racing to deploy AI to “personalize” care, yet fewer clinicians feel like they are providing care, and more patients walk away feeling like data points.
To be clear, I am not dismissing AI in healthcare entirely. I know that AI is already doing genuinely impressive work in fields like radiology and drug discovery. But here’s an important nuance I want you to see clearly. When an algorithm flags a potential tumor on a CT scan or predicts a protein structure, it is performing a specific, verifiable task. A radiologist can look at the highlighted pixels and say, “Yes, that is a mass.” There is still a human in the loop.
And this is fundamentally different from a black box that tells an insurer your grandmother should be discharged in 16 days based on hidden correlations. The former amplifies clinical perception. The latter automates systemic denial.
AI will shape the future of healthcare. The question is not whether we use it, but how. If we treat AI as a shortcut to reduce staffing costs or outsource judgment, we will harden the very failures patients already struggle with. But if we use it to support clinicians and strengthen transparency, we might actually build a more humane system. We need to stop asking if AI is smart enough to replace us, and start asking if it is transparent enough to be trusted.


