A Comparative Analysis of Large Language Models in Managing Disorders of Sex Development: Evaluation Based on Clinical Guidelines

Saime Sündüs Uygun, Fatma Özcan Sıkı

  • Year : 2025
  • Vol : 41
  • Issue : 4
  • Page : 201-204
Objective: This study aims to compare the guideline compliance of two widely used AI-based chatbot systems, ChatGPT and Bing AI, with the clinical recommendations outlined in the Disorders of Sex Development (DSD) guideline published by the Turkish Neonatal Society.
Materials and Methods: A standardized evaluation set comprising 40 questions based on the DSD guideline was utilized. The questions were grouped under six main categories reflecting clinical decision-making processes and were presented to both ChatGPT and Bing AI. Responses were scored on a 5-point Likert scale by two independent experts, assessing their alignment with the guideline. Mean scores were calculated for each category, and statistical comparisons were made using the Wilcoxon signed-rank test.
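The scoring and comparison steps described above can be sketched in code. The following is a minimal illustration, not the authors' analysis: the ratings are hypothetical placeholder values, and the helper names `average_ranks` and `wilcoxon_signed_rank` are introduced here for illustration only. It averages the two raters' Likert scores per question and computes the Wilcoxon signed-rank sums for the paired scores of two systems.

```python
from statistics import mean


def average_ranks(values):
    """Rank values ascending (1-based), giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus) for paired samples, dropping zero differences."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical ratings (NOT study data): one (rater 1, rater 2) pair per
# question, each on the 5-point Likert scale.
system_a = [(5, 5), (5, 4), (4, 5), (5, 5), (5, 5), (4, 4)]
system_b = [(3, 4), (3, 3), (4, 5), (2, 3), (4, 4), (3, 2)]

# Average the two raters for each question, then compare the paired scores.
a_scores = [mean(pair) for pair in system_a]
b_scores = [mean(pair) for pair in system_b]
w_plus, w_minus = wilcoxon_signed_rank(a_scores, b_scores)

print("Mean score A:", round(mean(a_scores), 2))
print("Mean score B:", round(mean(b_scores), 2))
print("W+ =", w_plus, "W- =", w_minus)
```

The sketch stops at the rank sums; turning them into a p-value needs the test's null distribution, for which a library routine such as `scipy.stats.wilcoxon` would normally be used.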
Results: ChatGPT demonstrated high consistency with the guideline across all categories (mean score: 4.88), while Bing AI showed lower compliance in several areas (mean score: 3.25). The differences in scores between the two systems were statistically significant across all categories (p < 0.05), with Bing AI performing particularly poorly in the areas of diagnosis/laboratory testing and multidisciplinary approach.
Conclusion: ChatGPT demonstrated higher accuracy and consistency than Bing AI in providing guideline-based clinical support regarding DSD. The use of AI-supported systems aligned with current guidelines holds significant potential in supporting complex, multidisciplinary decision-making processes. Therefore, the selection of AI tools in clinical settings should be informed by such systematic evaluations.
Cite this Article As : Uygun SS, Ozcan Siki F. A Comparative Analysis of Large Language Models in Managing Disorders of Sex Development: Evaluation Based on Clinical Guidelines. Selcuk Med J 2025;41(4): 201-204


Description : None of the authors has a material interest in any product, device, or drug mentioned in this article. The research was not supported by any external organization. The authors agree to grant full access to the primary data and, if requested by the journal, to allow examination of the data.
Received : 10.05.2025, Accepted : 19.10.2025, Published Online : 11.12.2025
Selçuk Tıp Dergisi
ISSN: 1017-6616
E-ISSN: 2149-8059