Does ChatGPT Provide Higher Quality and More Empathetic Responses to Patient Questions Compared to Physician Responses?
BACKGROUND AND PURPOSE:
- Artificial intelligence (AI) chatbots such as ChatGPT can generate responses to patient questions
- Ayers et al. (JAMA Internal Medicine, 2023) evaluated the ability of ChatGPT to provide quality and empathetic responses to patient questions
METHODS:
- Cross-sectional study
- Dataset
- Patient questions from a public social media forum (Reddit’s r/AskDocs)
- Interventions
- Chatbot responses
- Generated by entering the original question into a fresh session (without prior questions having been asked in the session)
- Physician responses
- Original physician replies posted to the forum
- Study design
- Responses were anonymized and randomly ordered, and were then evaluated in triplicate by a team of health care professionals
- Evaluators judged
- Quality: Very poor | Poor | Acceptable | Good | Very good
- Empathy: Not empathetic | Slightly empathetic | Moderately empathetic | Empathetic | Very empathetic
- Primary outcomes
- Mean ratings on an ordinal 1-to-5 scale, compared between chatbot and physician responses
RESULTS:
- 195 questions and responses
- Percentage of chatbot responses preferred by evaluators: 78.6% (95% CI, 75.0 to 81.8)
- Compared with chatbot responses, physician responses were significantly
- Shorter (P<0.001)
- Physician: 52 (IQR, 17 to 62) words | Chatbot: 211 (IQR, 168 to 245) words
- Lower in quality (P<0.001)
- The proportion of responses rated as good or very good quality (≥ 4) was higher for chatbot than physicians
- Chatbot: 78.5% (95% CI, 72.3 to 84.1)
- Physicians: 22.1% (95% CI, 16.4 to 28.2)
- Chatbot responses were also evaluated as significantly more empathetic than physician responses (P<0.001)
- The proportion of responses rated empathetic or very empathetic (≥4) was higher for chatbot than for physicians
- Chatbot: 45.1% (95% CI, 38.5 to 51.8)
- Physicians: 4.6% (95% CI, 2.1 to 7.7)
CONCLUSION:
- Compared to physician responses, chatbot responses to patient-posed questions had a 3.6 times higher prevalence of being good or very good quality, and a 9.8 times higher prevalence of being empathetic or very empathetic
- Limitations included
- Quality and empathy measures were not pilot tested or validated
- Study evaluators were coauthors, which could introduce bias
- Another significant limitation, as stated by the authors, was the lack of clinical context
- "The main study limitation was the use of the online forum question and answer exchanges. Such messages may not reflect typical patient-physician questions. For instance, we only studied responding to questions in isolation, whereas actual physicians may form answers based on established patient-physician relationships."
Learn More – Primary Sources:
- Ayers JW, et al. Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum (JAMA Internal Medicine, 2023)