Abstract
In large-scale programming courses, providing learners with immediate and effective feedback is a significant challenge. This study explores the potential of Large Language Models (LLMs) to generate feedback on code assignments and to address the gaps in Automated Test-based Feedback (ATF) tools commonly employed in programming courses. We applied dedicated metrics in a Massive Open Online Course (MOOC) on programming to assess the correctness of feedback generated by two models, GPT-3.5-turbo and GPT-4, using a reliable ATF as a benchmark. The findings indicate that both models detect errors effectively, yet the feedback they generate is often inaccurate, with GPT-4 outperforming GPT-3.5-turbo. We used the insights gained from our prompting practices to develop Gipy, an application for submitting course assignments and obtaining LLM-generated feedback. Learners participating in a field experiment perceived the feedback provided by Gipy as moderately valuable, while at the same time recognizing its potential to complement ATF. Given the learners' critique and their awareness of the limitations of LLM-generated feedback, the studied implementation may be able to combine the strengths of both ATF and LLMs as feedback resources. Further research is needed to assess the impact of LLM-generated feedback on learning outcomes and to explore the capabilities of more advanced models.