The 10th International Congress on Information and Communication Technology, held in conjunction with the ICT Excellence Awards (ICICT 2025), will take place in London, United Kingdom, February 18-21, 2025.
Tuesday February 18, 2025 12:15pm - 12:30pm GMT
Authors - Pouyan Nahed, Sepideh Farivar, Kazem Taghva
Abstract - This paper presents a large-scale biomedical Named Entity Recognition (NER) dataset automatically annotated using a Large Language Model (LLM) applied to the eligibility criteria from ClinicalTrials.gov. The dataset comprises over 4.6 million named entities, covering categories such as diseases, interventions, outcomes, and participants. A pseudo-labeling approach was employed to generate annotations with soft labels, providing confidence scores for each entity. We address challenges related to entity ambiguity and label inconsistency by introducing a structured mapping strategy to ensure uniformity across the dataset. The resulting dataset is a valuable resource for advancing tasks such as NER, information extraction, and text classification in biomedical research. By making this dataset publicly available, we aim to support the development of AI-driven healthcare applications.
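The abstract does not spell out how the structured mapping or the soft labels are implemented. The following is a minimal, hypothetical sketch of the general idea, assuming the LLM returns each entity as a tuple of (text, start offset, end offset, raw label, confidence). The synonym table `LABEL_MAP`, the `Entity` record, the `normalize()` function, and the example data are all illustrative assumptions, not the authors' code. The sketch maps inconsistent raw labels onto the four categories the abstract names, keeps a confidence score per entity as its soft label, and merges duplicate predictions for the same span.

```python
from dataclasses import dataclass

# Hypothetical synonym table (an assumption, not taken from the paper):
# maps a lowercased raw label emitted by the LLM to one canonical category.
LABEL_MAP = {
    "disease": "Disease",
    "condition": "Disease",
    "disorder": "Disease",
    "intervention": "Intervention",
    "drug": "Intervention",
    "treatment": "Intervention",
    "outcome": "Outcome",
    "endpoint": "Outcome",
    "participant": "Participant",
    "population": "Participant",
}


@dataclass(frozen=True)
class Entity:
    text: str
    start: int         # character offset, inclusive
    end: int           # character offset, exclusive
    label: str         # canonical category
    confidence: float  # soft label in [0, 1]


def normalize(raw_entities):
    """Map raw (text, start, end, raw_label, confidence) tuples to Entity objects."""
    best = {}
    for text, start, end, raw_label, conf in raw_entities:
        label = LABEL_MAP.get(raw_label.strip().lower())
        if label is None:
            continue  # unmapped label: drop it rather than guess a category
        conf = min(max(float(conf), 0.0), 1.0)  # clamp the score to [0, 1]
        key = (start, end, label)
        # The same span mapped to the same category twice: keep the higher score.
        if key not in best or conf > best[key].confidence:
            best[key] = Entity(text, start, end, label, conf)
    return sorted(best.values(), key=lambda e: (e.start, e.end, e.label))


# Hypothetical eligibility-criterion text and raw LLM output.
criterion = "Adults with type 2 diabetes receiving metformin"
raw = [
    ("Adults", 0, 6, "Population", 0.91),
    ("type 2 diabetes", 12, 27, "Condition", 0.88),
    ("type 2 diabetes", 12, 27, "disease", 0.95),
    ("metformin", 38, 47, "Drug", 0.97),
    ("receiving", 28, 37, "verb", 0.40),
]

entities = normalize(raw)
for e in entities:
    print(e.text, e.label, e.confidence)
```

Here "Condition" and "disease" collapse into one `Disease` entity that keeps the higher score, and the unmappable "verb" label is discarded. Dropping unmapped labels is one possible design choice; a real pipeline might instead route them to manual review.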
Paper Presenters
Kazem Taghva

United States of America
Ludgate Suite - 1F America Square Conference Centre, London, United Kingdom