The 10th International Congress on Information and Communication Technology, held concurrently with the ICT Excellence Awards (ICICT 2025), will take place in London, United Kingdom, February 18–21, 2025.
Authors - Eduardo Puraivan, Patricio Tapia, Miguel Rodriguez, Steffanie Kloss, Connie Cofre-Morales, Pablo Ormeno-Arriagada, Karina Huencho-Iturra

Abstract - This study provides empirical evidence on the effectiveness of large language models (LLMs), particularly ChatGPT, for automating the identification and analysis of cognitive demand levels in reading comprehension assessment tasks, using Barrett's Taxonomy. Manual classification of these tasks is challenging even for experienced teachers, owing to their complexity and the time required. To address this, a three-step methodology was developed: selection of reading comprehension activities, automatic classification by ChatGPT, and comparison with the classifications of a group of experts. The experiment included 25 questions based on four readings taken from a fourth-grade primary education teacher's guide. The results showed variable agreement between ChatGPT's classifications and those of the experts: 77% in Activity 1, 50% in Activity 2, 52% in Activity 3, and 67% in Activity 4. At the question level, agreement ranged from 0% to 100%, with discrepancies even among the expert evaluators, underscoring the inherent subjectivity of the task. Despite these divergences, the results highlight the potential of LLMs to streamline the classification of educational activities at scale, as well as the need to continue refining these models to improve their performance on more complex pedagogical tasks.