Location
https://www.kennesaw.edu/ccse/events/computing-showcase/fa24-cday-program.php
Document Type
Event
Start Date
December 19, 2024, 4:00 PM
Description
Crafting quiz questions that effectively assess students’ understanding of lectures and course materials, such as textbooks, poses significant challenges. Recent AI-based quiz generation efforts have concentrated predominantly on static resources such as textbooks and slides, often overlooking the dynamic and interactive elements of live lectures (contextual cues, discussions, and interactions) that shape the learning experience. In this work, we propose a Retrieval-Augmented Generation (RAG) model that processes multimodal input, combining text, audio, and video to produce quizzes that capture a fuller context. Our method uses Whisper for audio transcription and a Large Vision-Language Model (LVLM) to extract essential visual information from lecture videos. By integrating both spoken and visual elements, the model generates quizzes that more closely reflect the lecture environment. We evaluate the model’s impact on quiz relevance, diversity, and engagement, showing that this multimodal approach fosters a more dynamic and immersive learning experience. Performance is assessed with hit rate and mean reciprocal rank (MRR): a high hit rate indicates that the model reliably produces pertinent questions, while a high MRR shows that relevant questions appear early in the ranking. Strong results on both metrics confirm the model’s effectiveness. Current limitations include difficulty handling abstract concepts absent from the lecture material, a gap we aim to bridge in future work by integrating external knowledge sources.
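To make the described pipeline concrete, the sketch below shows one plausible way to wire it together in Python. The specific choices (openai-whisper for transcription, sentence-transformers for the retrieval embeddings, a placeholder for the LVLM captioner, and the prompt wording) are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of a multimodal RAG quiz pipeline, assuming openai-whisper
# for transcription and sentence-transformers for retrieval; the LVLM step and
# the generator LLM are left as placeholders because the abstract does not
# name specific models.
import whisper                                          # pip install openai-whisper
from sentence_transformers import SentenceTransformer, util

def transcribe_lecture(video_path: str) -> str:
    """Transcribe lecture audio with Whisper (model size chosen arbitrarily)."""
    model = whisper.load_model("base")
    return model.transcribe(video_path)["text"]

def caption_frames(video_path: str) -> list[str]:
    """Placeholder for the LVLM step: return short text descriptions of key frames."""
    raise NotImplementedError("Plug in a vision-language captioning model here.")

def build_index(chunks: list[str], embedder: SentenceTransformer):
    """Embed transcript segments and frame captions for similarity search."""
    return embedder.encode(chunks, convert_to_tensor=True)

def retrieve(topic: str, chunks: list[str], index, embedder, k: int = 5) -> list[str]:
    """Return the k chunks most similar to the quiz topic."""
    query = embedder.encode(topic, convert_to_tensor=True)
    scores = util.cos_sim(query, index)[0]
    top = scores.argsort(descending=True)[:k]
    return [chunks[int(i)] for i in top]

def quiz_prompt(topic: str, context: list[str]) -> str:
    """Assemble the prompt handed to a generator LLM (model choice left open)."""
    joined = "\n".join(f"- {c}" for c in context)
    return ("Using only the lecture context below, write one multiple-choice "
            f"question about '{topic}' with four options and mark the correct answer.\n"
            f"Context:\n{joined}")
```

Hit rate and MRR can then be computed from per-query relevance judgments over the ranked retrievals. The helpers below follow the standard definitions; how relevance labels were obtained for the study is not specified in the abstract, so they are treated as given.

```python
# Standard hit-rate and MRR computations, assuming per-query boolean relevance
# labels aligned with the ranked retrievals.
def hit_rate(ranked_relevance: list[list[bool]], k: int = 5) -> float:
    """Fraction of queries with at least one relevant item in the top k."""
    hits = sum(any(labels[:k]) for labels in ranked_relevance)
    return hits / len(ranked_relevance)

def mean_reciprocal_rank(ranked_relevance: list[list[bool]]) -> float:
    """Mean of 1/rank of the first relevant item per query (0 if none is relevant)."""
    reciprocal_ranks = []
    for labels in ranked_relevance:
        rank = next((i + 1 for i, rel in enumerate(labels) if rel), None)
        reciprocal_ranks.append(1.0 / rank if rank else 0.0)
    return sum(reciprocal_ranks) / len(reciprocal_ranks)
```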
Included in
GPR-185 A Multimodal Approach to Quiz Generation: Leveraging RAG Models for Educational Assessments
https://www.kennesaw.edu/ccse/events/computing-showcase/fa24-cday-program.php