Using AI in Healthcare? Be Cautious!
- Dr Gillie Gabay

The medical and nursing communities expect AI to deliver algorithmic fairness and to help overcome global challenges in healthcare. So far, these expectations remain unmet.
Research from 2024–2026 highlights several critical biases of AI in healthcare. First, AI models reflect historical patterns in which women's symptoms are dismissed or under-studied. Cardiology algorithms trained predominantly on male data can consequently fail to recognize how heart attacks present differently in women, leading to higher rates of misdiagnosis and mortality. A 2026 study noted that many of the clinical guidelines used to train AI do not account for sex differences. For example, some models used for liver disease were found to have a 44% false-negative rate for women compared to 23% for men. Computational 3-D heart models have likewise been built from male bodies rather than female ones. Some large language models were also found to describe similar symptoms with terms such as "complex" or "disabled" while downplaying the severity of the same symptoms in women.
AI can inadvertently prioritize care for certain groups over others. Because less money has historically been spent on minority patients due to systemic barriers, cost-based algorithms have falsely concluded that these patients were healthier than equally sick white patients, resulting in fewer care resources being allocated to them. Moreover, skin cancer detection AI trained primarily on images of light-skinned patients was shown to be significantly less accurate for patients with darker skin tones, potentially delaying life-saving treatment. AI can also discriminate based on socioeconomic characteristics that correlate with geography or identity: models trained on data from high-resource urban hospitals often fail when applied to rural or low-income populations whose living conditions and access to care differ. There is also a notable lack of representative data for trans and gender-diverse individuals, which can lead to AI-driven insensitivity to their specific care needs.
Furthermore, a major study published in late 2024 investigated how AI assesses cardiovascular risk. Researchers found a disturbing trend: when a patient's profile included a psychiatric comorbidity (e.g., anxiety, depression), the AI was significantly more likely to downplay physical cardiac symptoms in women than in men. The AI effectively mimicked the historical human bias in which women's physical pain is attributed to emotional or mental distress, potentially delaying life-saving cardiac interventions. Another study, from Harvard Medical School in 2025, revealed that AI models used in pathology can infer patients' self-reported race and gender directly from tissue slides, even when that information is hidden from the model. Because the AI can infer these demographics, it may apply different diagnostic thresholds. For example, some models were inaccurate at differentiating lung cancer subtypes in African American patients because the relevant genetic mutations were underrepresented in the training data.
Conversely, new multimodal AI models are being designed specifically to close these gaps by training on diverse datasets, improving outcome prediction for African American men, who historically face worse prognoses. One of the biggest hurdles in skin cancer detection has been the lack of images showing melanoma on darker skin.
Thus, AI carries gender and other systemic biases in patient care, which often arise because AI models learn from historical healthcare data that already contains human prejudices, structural inequalities, and gaps in medical research. Table 1 presents biases by process in healthcare.
| Process | Bias in the Process |
| --- | --- |
| Data Collection | Historical underrepresentation of minorities in clinical trials. |
| Model Design | Choosing proxies such as cost reflects social inequality rather than biological reality. |
| Deployment | Automation bias, where clinicians trust the AI output over their own observation of the patient. |

Table 1. The Process and Bias Creation
How to Protect Systems and Patients from AI Biases
Developers are called to: actively curate data that includes underrepresented groups; allow clinicians to see why the AI made a certain recommendation; and detect potential bias. Health systems should mandate that AI models be tested for performance disparities, for example whether a model is as accurate for women as it is for men, before approving them for clinical use. Researchers need to move away from older, limited skin-tone scales and adopt scales with ten shades that better represent human diversity. Newer models reporting over 99% accuracy combine multiple types of data so that the AI does not fail where older algorithms did. The proactive approach in healthcare holds that AI should not just be checked for bias at the end of the R&D process but built with diverse inputs from the start. As AI moves from being experimental to operational in 2026, healthcare systems are adopting rigorous frameworks to ensure these tools are safe and equitable. Several auditing frameworks now represent a gold standard for clinical governance of AI.
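To make the idea of a pre-approval disparity test concrete, here is a minimal sketch of a subgroup audit in Python. The column names, the toy validation data, and the false-negative-rate tolerance are illustrative assumptions rather than part of any specific regulatory framework; a real audit would use the hospital's own held-out data and clinically justified thresholds.

```python
# Minimal sketch of a subgroup performance audit (hypothetical column names).
# Assumes a validation DataFrame with ground-truth labels, model predictions,
# and a recorded sex attribute; flags any subgroup whose false-negative rate
# exceeds an (illustrative) tolerance above the overall rate.
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix

def false_negative_rate(y_true, y_pred) -> float:
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return fn / (fn + tp) if (fn + tp) else 0.0

def subgroup_report(df: pd.DataFrame, group_col: str = "sex",
                    y_true: str = "label", y_pred: str = "prediction",
                    fnr_tolerance: float = 0.05) -> pd.DataFrame:
    overall_fnr = false_negative_rate(df[y_true], df[y_pred])
    rows = []
    for group, sub in df.groupby(group_col):
        fnr = false_negative_rate(sub[y_true], sub[y_pred])
        rows.append({
            "group": group,
            "n": len(sub),
            "accuracy": accuracy_score(sub[y_true], sub[y_pred]),
            "false_negative_rate": fnr,
            "flagged": fnr - overall_fnr > fnr_tolerance,
        })
    return pd.DataFrame(rows)

# Example usage with a toy validation set:
validation = pd.DataFrame({
    "sex": ["F", "F", "F", "M", "M", "M"],
    "label": [1, 1, 0, 1, 1, 0],
    "prediction": [0, 1, 0, 1, 1, 0],
})
print(subgroup_report(validation))
```

The key design choice is reporting the false-negative rate per group rather than overall accuracy alone, since the liver-disease example above (44% versus 23%) is exactly the kind of gap an aggregate metric can hide.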
The Coalition for Health AI (CHAI) and The Joint Commission, the primary US hospital-accrediting body, have released guidelines for the responsible use of AI in healthcare. The guidelines require multidisciplinary oversight: before an AI tool is deployed, it must be examined by a committee that includes IT staff, clinicians, legal experts, and patient advocates from underrepresented populations. Since an AI tool that works in a large urban teaching hospital may be biased in rural community clinics, the framework requires hospitals to validate AI performance on their own local patient data rather than relying solely on the vendor's claims. And since bias is not a one-time fix, systems must use real-time dashboards to monitor data drift that can erode an algorithm's accuracy.
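The continuous monitoring these guidelines call for can be approximated by periodically comparing the distribution of incoming patient data with the data the model was validated on. The sketch below does this with a per-feature two-sample Kolmogorov–Smirnov test; the feature names, sample sizes, and alert threshold are assumptions for illustration, not requirements of CHAI or The Joint Commission.

```python
# Minimal sketch of a data-drift monitor (illustrative features and threshold).
# Compares each numeric feature's distribution in recent production data
# against the reference (validation-time) data with a two-sample KS test.
import numpy as np
from scipy.stats import ks_2samp

def drift_report(reference: dict[str, np.ndarray],
                 current: dict[str, np.ndarray],
                 p_threshold: float = 0.01) -> list[dict]:
    """Return one row per feature, flagging distributions that have shifted."""
    report = []
    for feature, ref_values in reference.items():
        result = ks_2samp(ref_values, current[feature])
        report.append({
            "feature": feature,
            "ks_statistic": round(result.statistic, 3),
            "p_value": result.pvalue,
            "drift_alert": result.pvalue < p_threshold,
        })
    return report

# Synthetic example: 'age' is stable, 'creatinine' has shifted upward.
rng = np.random.default_rng(0)
reference = {"age": rng.normal(60, 12, 5000), "creatinine": rng.normal(1.0, 0.2, 5000)}
current = {"age": rng.normal(60, 12, 800), "creatinine": rng.normal(1.4, 0.2, 800)}
for row in drift_report(reference, current):
    print(row)
```

In practice such a check would run on a schedule and feed the dashboard the guidelines describe, so that a drift alert triggers revalidation rather than silent degradation.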
As for the European Union, the EU AI Act sets strict legal mandates for high-risk systems, which include clinical diagnostic tools. Its data-governance provisions require developers to prove that their training data is representative and to conduct specific bias testing before the product reaches the market. The Act also mandates that all clinical AI keep a human in the loop with a clear override mechanism: a doctor must always be able to disregard the AI's output if it seems biased or incorrect. Systems must maintain immutable audit trails for 5–10 years, allowing regulators to rewind and understand why an AI made a specific, biased recommendation in a past case.
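The Act does not prescribe a storage format, but one common way to make an audit trail tamper-evident is hash chaining, where each record embeds the hash of the previous one. The sketch below is a simplified illustration of that idea; the field names and the in-memory list are assumptions, and a production system would persist entries to write-once storage.

```python
# Minimal sketch of an append-only, hash-chained audit record (illustrative fields).
# Each entry embeds the hash of the previous entry, so any later tampering
# breaks the chain and is detectable on review.
import hashlib
import json
import time

def append_entry(log: list[dict], payload: dict) -> list[dict]:
    prev_hash = log[-1]["entry_hash"] if log else "0" * 64
    body = {"timestamp": time.time(), "payload": payload, "prev_hash": prev_hash}
    entry_hash = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    log.append({**body, "entry_hash": entry_hash})
    return log

def verify_chain(log: list[dict]) -> bool:
    prev = "0" * 64
    for entry in log:
        body = {k: entry[k] for k in ("timestamp", "payload", "prev_hash")}
        recomputed = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if entry["prev_hash"] != prev or recomputed != entry["entry_hash"]:
            return False
        prev = entry["entry_hash"]
    return True

# Example: record a model recommendation and the clinician's override.
log = []
append_entry(log, {"model": "cardio-risk-v2", "patient": "pseudonym-123",
                   "output": "low risk", "clinician_override": "ordered troponin"})
print(verify_chain(log))  # True unless an entry has been altered
```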
Many institutions are now employing specialized teams of hackers and ethicists who deliberately try to trick the AI into showing bias during the testing phase. Finally, a voluntary but highly influential framework is used by many health-tech startups. It focuses on four functions: establishing a Chief AI Officer role to own accountability; identifying which patient groups might be harmed by the AI; using specific metrics to quantify bias; and deciding whether a biased risk is acceptable or the tool should be decommissioned. Table 2 presents a practical audit checklist for health systems.
Table 2. Practical Audit Checklist for Health Systems (2026)

| Process | Action Item |
| --- | --- |
| Data Lineage | Trace where the training data came from. |
| Subgroup Analysis | Test accuracy for women, the elderly, and ethnic minorities separately to ensure that a 95% overall accuracy is not hiding 60% accuracy for one group. |
| Explainability | Use transparent tools that show why a decision was made and reveal whether the AI is relying on proxies for race or gender (see the sketch after this table). |
| Adverse Event Reporting | Use blinded safety-reporting systems so clinicians can report biased AI behavior without fear of litigation. |
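For the Explainability row, a simple first check for proxies is to ask how well the model's own input features can predict a sensitive attribute that is supposedly not used; if a small probe classifier recovers it far above chance, the feature set likely encodes a proxy. The sketch below illustrates that probe; the features, data, and attribute encoding are synthetic and purely illustrative, not a specific vendor tool.

```python
# Minimal sketch of a proxy check (hypothetical, synthetic features).
# If the clinical model's input features predict a sensitive attribute well
# out of sample, the feature set likely encodes proxies for that attribute,
# and the clinical model may be using them indirectly.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def proxy_leakage_score(X: np.ndarray, sensitive: np.ndarray) -> float:
    """Cross-validated accuracy of predicting the sensitive attribute from X."""
    probe = LogisticRegression(max_iter=1000)
    return cross_val_score(probe, X, sensitive, cv=5, scoring="accuracy").mean()

# Synthetic example: 'zip_income' correlates with the sensitive attribute,
# so the probe recovers it far better than chance.
rng = np.random.default_rng(1)
sensitive = rng.integers(0, 2, 1000)                     # recorded demographic (0/1)
zip_income = sensitive * 1.5 + rng.normal(0, 1, 1000)    # proxy feature
lab_value = rng.normal(0, 1, 1000)                       # unrelated clinical feature
X = np.column_stack([zip_income, lab_value])

print(f"Probe accuracy: {proxy_leakage_score(X, sensitive):.2f} (chance is about 0.50)")
```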
With the EU AI Act, hidden costs now include massive penalties for non-compliance and the reputational damage of biased algorithms. There is also a shift toward agentic AI: autonomous agents that handle multi-step tasks such as prior authorizations and clinical summarization. For executives, this means that AI is a collaborator to the workforce, NOT a tool and NOT a replacement. Organizations that fail to invest in bias mitigation risk creating a two-tiered health system, in which marginalized populations receive lower-quality care while others receive vetted, high-touch care. To lead a responsible AI healthcare organization, executives should focus on three pillars: governance, diversity, and transparency.
Conclusions
For healthcare executives, the rise of AI-driven care is a high-stakes transition. Moving into 2026, the focus should shift from "can we build it?" to "can we trust it and scale it responsibly?" The implications for leadership involve navigating new legal liabilities, clinical risks, and the fundamental restructuring of care workflows. Traditional return on investment models are being replaced by a more holistic framework that values clinical and ethical returns alongside financial gains.
Additional Reading
Chinta SV, Wang Z, Palikhe A, Zhang X, Kashif A, Smith MA, Liu J, Zhang W. AI-driven healthcare: A review on ensuring fairness and mitigating bias. PLOS Digital Health. 2025 May 20;4(5):e0000864.
Hussain SA, Bresnahan M, Zhuang J. The bias algorithm: how AI in healthcare exacerbates ethnic and racial disparities–a scoping review. Ethnicity & Health. 2025 Feb 17;30(2):197-214.
Sasseville M, Ouellet S, Rhéaume C, Sahlia M, Couture V, Després P, Paquette JS, Darmon D, Bergeron F, Gagnon MP. Bias mitigation in primary health care artificial intelligence models: scoping review. Journal of Medical Internet Research. 2025 Jan 7;27:e60269.



