GPR-185 A Multimodal Approach to Quiz Generation: Leveraging RAG Models for Educational Assessments

Presenter Information

Mourya Teja Kunuku

Location

https://www.kennesaw.edu/ccse/events/computing-showcase/fa24-cday-program.php

Document Type

Event

Start Date

19-12-2024 4:00 PM

Description

Crafting quiz questions that effectively assess students’ understanding of lectures and course materials, such as textbooks, poses significant challenges. Recent AI-based quiz generation efforts have predominantly concentrated on static resources, like textbooks and slides, often overlooking the dynamic and interactive elements of live lectures (contextual cues, discussions, and interactions) that contribute to the learning experience. In this work, we propose a Retrieval-Augmented Generation (RAG) model that processes multimodal inputs, combining text, audio, and video to produce quizzes that capture a fuller context. Our method incorporates Whisper for audio transcription and a Large Vision-Language Model (LVLM) to extract essential visual information from lecture videos. By integrating both spoken and visual elements, our model generates quizzes that more closely represent the lecture environment. We evaluate the model’s impact on quiz relevance, diversity, and engagement, showing that this multimodal approach fosters a more dynamic and immersive learning experience. Performance metrics, including hit rate and mean reciprocal rank (MRR), are used to assess question relevance and accuracy. A high hit rate indicates the model’s reliability in producing pertinent questions, while MRR reflects ranking quality, showing how early relevant questions appear in the results. Strong results on these metrics confirm our model’s effectiveness, though current limitations include difficulty handling abstract concepts absent from the lecture material, a gap we aim to bridge in future work by integrating external knowledge sources.
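As a rough illustration of the pipeline and metrics described above (not the authors’ implementation), the sketch below assumes the open-source openai-whisper and sentence-transformers packages, uses a clearly marked placeholder for the LVLM frame-description step, and computes the hit rate and MRR retrieval metrics mentioned in the abstract. Model choices, chunking strategy, and the quiz-generation prompt are illustrative assumptions.

```python
from typing import List
import numpy as np
import whisper                                           # openai-whisper, audio transcription
from sentence_transformers import SentenceTransformer    # stand-in embedding model for retrieval


def transcribe_lecture(path: str) -> str:
    """Transcribe lecture audio with Whisper (the transcription component named in the abstract)."""
    model = whisper.load_model("base")                   # model size is an assumption
    return model.transcribe(path)["text"]


def describe_frames(frame_paths: List[str]) -> List[str]:
    """Placeholder for the LVLM step: each lecture frame/slide would be turned into a short
    textual description. The specific LVLM and prompt are not stated in the abstract."""
    raise NotImplementedError("swap in an LVLM of your choice")


def build_index(chunks: List[str], embedder: SentenceTransformer) -> np.ndarray:
    """Embed text chunks (transcript segments + frame descriptions) for retrieval."""
    return embedder.encode(chunks, normalize_embeddings=True)


def retrieve(query: str, chunks: List[str], index: np.ndarray,
             embedder: SentenceTransformer, k: int = 5) -> List[str]:
    """Return the top-k chunks most similar to the query (cosine similarity on normalized vectors).
    The retrieved context would then be passed to an LLM prompt that drafts quiz questions."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    top = np.argsort(index @ q)[::-1][:k]
    return [chunks[i] for i in top]


# Retrieval metrics from the abstract: hit rate and mean reciprocal rank (MRR).
def hit_rate(ranked_ids: List[List[str]], relevant_ids: List[str]) -> float:
    """Fraction of queries whose relevant chunk appears anywhere in the ranked list."""
    return float(np.mean([rel in ranked for ranked, rel in zip(ranked_ids, relevant_ids)]))


def mrr(ranked_ids: List[List[str]], relevant_ids: List[str]) -> float:
    """Average of 1/rank of the first relevant chunk (0 if it never appears)."""
    scores = [1.0 / (ranked.index(rel) + 1) if rel in ranked else 0.0
              for ranked, rel in zip(ranked_ids, relevant_ids)]
    return float(np.mean(scores))
```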
