Unmasking AI Bias: How Large Language Models Talk AboutΓÇª

Overview
Stigmatizing vs. Person-first language
Why it's harmful
Solutions
Conclusion

The most dangerous bias in addiction care doesnΓÇÖt shout; it slips in through a noun. Call someone an ΓÇ£addictΓÇ¥ and you prime punishment; say ΓÇ£a person with a substance use disorderΓÇ¥ and you open a clinical door. This is why public-health guidance retired ΓÇ£dirty/cleanΓÇ¥ and identity-first labels in favor of precise, person-first terms: because word choice measurably shapes trust, engagement, and whether people stay in treatment. This isnΓÇÖt about etiquette; itΓÇÖs in the name of outcomes.

Now scale that to artificial intelligence (AI). A study published in the Journal of Addiction Medicine in July 2025 ran 60 patient-style prompts across 14 large language models. In default mode, 35.4% of the generated answers used stigmatizing language. ^[1]

A simple instruction to avoid certain words cut that rate to 6.3%, with alcohol-associated liver disease questions being the most likely to trip models up. ^[1]Here is the translation: Bedside manners can be engineered. Without guardrails, you automate stigma along with answers.

The study: LLMs and stigmatizing language

Researchers evaluated 14 large language models on 60 patient-style questions about alcohol abuse and substance abuseΓÇö20 questions each on alcohol use disorder (AUD), alcohol-associated liver disease (ALD), and substance use disorder (SUD). Two physicians scored every answer against NIDA/NIAAA guidance, with a third resolving ties. ^[1]

They found that 35.4% of responses included at least one stigmatizing term. The most common were legacy phrases like ΓÇ£alcohol/substance abuse,ΓÇ¥ identity-first labels, such as ΓÇ£addictΓÇ¥ or ΓÇ£alcoholic,ΓÇ¥ and terms like ΓÇ£alcoholic cirrhosisΓÇ¥ or ΓÇ£dirty/cleanΓÇ¥ for test results. All 14 models showed this tendency unless explicitly instructed not to.^[1]

The risk was uneven. ALD prompts were more than twice as likely to elicit stigmatizing language compared with AUD, while SUD responses were similar to AUD. Longer answers also contained more problematic terms, about 2.5 stigma terms per 1,000 words, simply because there was more space to slip. Some models were bigger offenders than others, but none were spotless by default. ^[1]

(The team later showed that a brief ΓÇ£donΓÇÖt-sayΓÇ¥ instruction slashed the rate, but the baseline problem is real.) Models were evaluated in versions available as of mid-September 2024; prompts reduce but donΓÇÖt eliminate the risks. ^[1]

How LLMs work and why they may mirror bias

Large language models donΓÇÖt judge; they autocomplete. Trained on billions of words, they treat whatever appears most often as the ΓÇ£normalΓÇ¥ next word. If the record is full of identity-first or moralizing terms, the model repeats themΓÇöunless told otherwise. ThatΓÇÖs what the Journal of Addiction Medicine study showed: Default answers often echoed stigma, while explicit instructions cut it sharply. ^[1]

Why this happens

Data imprint. Models mirror the distribution of their training text, biases included. ^[2]
Lag in the adoption of a new language. Guidance now favors person-first language, but older terms like ΓÇ£substance abuseΓÇ¥ still dominate the record.^[3]
Echoing the phrasing in the question. Prompts shape responses; if legacy phrasing is in the question, the model often amplifies it. ^[1]
Length effect. Longer answers mean more chances for stigma. ^[1]

Examples of stigmatizing vs. Person-first language

Labels like ΓÇ£junkie,ΓÇ¥ ΓÇ£alcoholic,ΓÇ¥ or ΓÇ£dirtyΓÇ¥ create barriers to care. Person-first terms center the person, not the disorder, and reduce stigma. To see this in action, consider some common addiction-related phrases and their more inclusive alternatives.

Stigmatizing term	Problem	Preferred alternative
ΓÇ£Addicted babyΓÇ¥	Labels and blames the infant	Newborn with neonatal withdrawal syndrome
ΓÇ¥AddictΓÇ¥/ΓÇ£AlcoholicΓÇ¥	Defines the person by their disorder	Person with a substance use disorder; person with alcohol use disorder
Alcoholism	Outdated, nonclinical term	Alcohol use disorder
Drunk (noun)	Pejorative	An individual engaging in unhealthy alcohol use
Alcoholic cirrhosis	Alcoholic assigns blame	Alcohol-associated cirrhosis
ΓÇ£JunkieΓÇ¥ (or ΓÇ£drug abuserΓÇ¥)	Derogatory and judgmental slang	Person who uses drugs; person with an SUD
ΓÇ£CleanΓÇ¥/ΓÇ£DirtyΓÇ¥ (in drug tests)	Implies moral judgment	Tested negative / tested positive (on a drug screen)

Source: Adapted from Wang et al. (2025) and NIDA (2021) recommendations. ^[1][4]

Why is stigmatizing language harmful?

Stigmatizing language harms care. At an individual level, labels like ΓÇ£addictΓÇ¥ or ΓÇ£junkieΓÇ¥ imply moral failure and instill shame, leaving people feeling isolated and exhibiting internalized stigma. ^[5] Feeling judged makes someone less likely to seek help; one survey found 16% of people with a substance use disorder skipped treatment for fear of community judgment or social stigma.^[3][6]┬á

This stigma contributes to the grim reality that only about 7% of Americans with SUD receive treatment. National surveys show that nearly 1 in 6 adults with SUD who wanted help did not seek treatment because of fear of judgment or discrimination, a barrier rooted in stigma.^[5][7]┬á

Stigma also skews how others treat those with addiction. Even healthcare professionals are influenced; simply hearing stigmatizing terms may bias a providerΓÇÖs perceptions and thus the care they offer. ^[4]For instance, a doctor might subconsciously (implicit bias) take a patient labeled an ΓÇ£opioid abuserΓÇ¥ less seriously than one described as having an opioid use disorder, leading to subpar treatment.^[8]┬á

Conversely, using respectful, person-first language helps improve the therapeutic relationship, and reducing AI and addiction stigma is part of that shift. Patients who feel respected are more likely to engage in care and stick with treatment.^[1]┬á

Solutions: Prompt engineering and responsible language use

LLMs tend to parrot the data theyΓÇÖre fed, so we must actively guide them to speak supportively. One proven technique is prompt engineeringΓÇöcarefully crafting the instructions we give the model to steer its output toward non-stigmatizing language.^[8] You tell the AI how to talk before it responds. By explicitly instructing the model to avoid certain words (like ΓÇ£alcoholicΓÇ¥ or ΓÇ£addictΓÇ¥) and to use clinical, person-first terms, we can often get a stigma-free answer.

Wang and colleagues showed how powerful this approach can be. The researchers refined their prompts with lists of forbidden terms and preferred phrasing. This slashed stigmatizing language in the AIΓÇÖs output from 35% of responses to just 6%, which is an 88% reduction. Crucially, every model tested became far less stigmatizing when given these tailored prompts.^[1]

Beyond prompt tweaks, there are other steps to promote responsible language in digital health:

Build stigma filters into AI: Developers can program models to flag or avoid derogatory terms. For example, a Drexel University team created a system that uses LLMs to detect stigmatizing words in online forums and suggest alternatives (like a spell-checker for stigma).^[5]
Human oversight and education: Ultimately, human judgment must guide these tools. Healthcare providers using AI should always double-check generated text for stigmatizing language before it reaches patients. ^[9] Staying up to date on preferred terminology (e.g., using ΓÇ£person with SUDΓÇ¥ instead of ΓÇ£addictΓÇ¥) ensures that both AI and human communications remain respectful.

Empowering change in digital health communication

If left unchecked, biased outputs from an LLM could amplify existing disparities in care ^[10], reinforcing long-standing AI in healthcare bias that already shapes patient outcomes. But if guided correctly, these tools could advance health equity by making respectful, patient-centered communication scalable to everyone. By refining LLMs to be culturally sensitive and stigma-free, we can ensure all patients receive the same compassionate standard of communication. That holds whether their ΓÇÿproviderΓÇÖ is a human or a machine.

Making that happen will require effort on multiple fronts: ^[9]

AI developers should bake inclusive language into their models and rigorously test for bias.
Healthcare professionals should enforce person-first language policies and review AI outputs for appropriate tone.
And patients and advocates deserve a seat at the table; we should involve people with lived experience of addiction and people who use drugs to help define respectful language for these technologies.

Teaching AI to speak with compassion teaches us to treat people with dignity.

Unmasking AI Bias: How Large Language Models Talk About Addiction

The study: LLMs and stigmatizing language

How LLMs work and why they may mirror bias

Why this happens

Examples of stigmatizing vs. Person-first language

Why is stigmatizing language harmful?

Solutions: Prompt engineering and responsible language use

Empowering change in digital health communication

Related Articles

Tailored Recovery: The Promise of Precision Medicine for Sub...

Anxiety Pens Under Scrutiny: What&#039;s Real and What&#039;...

🍪 Cookie Preferences

Anxiety Pens Under Scrutiny: What's Real and What'...