Location
https://www.kennesaw.edu/ccse/events/computing-showcase/fa25-cday-program.php
Document Type
Event
Start Date
24-11-2025 4:00 PM
Description
Traditional keyword search struggles with the scale, complexity, and contextual depth of clinical data. This project develops and evaluates semantic search systems that better understand medical language, enabling physicians and researchers to retrieve contextually relevant information through a Retrieval-Augmented Generation (RAG) framework. We integrate privacy-preserving methods, including differential privacy and homomorphic encryption, to protect sensitive clinical transcriptions. To improve speed and accuracy, we enhance the baseline RAG architecture with Hierarchical Navigable Small World (HNSW) indexing and Maximal Marginal Relevance (MMR)-based reranking. To ensure scalability, clinical documents are ingested using PySpark and stored in a vector database optimized for high-dimensional queries, enabling fast, accurate, and privacy-aware retrieval of medical transcriptions.
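The description names MMR reranking but does not give the project's implementation. As a rough illustration only, the sketch below shows one common form of MMR: it greedily picks the candidate that best balances relevance to the query against similarity to results already selected. The function names, the use of cosine similarity, and the trade-off parameter `lam` are all assumptions made for this example, not details taken from the project.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr_rerank(query_vec, doc_vecs, k=3, lam=0.7):
    """Return indices of up to k documents chosen by Maximal Marginal Relevance.

    lam (hypothetical name) weights relevance against redundancy:
    lam=1.0 ranks purely by relevance, lower values favour diversity.
    """
    candidates = list(range(len(doc_vecs)))
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    selected = []

    while candidates and len(selected) < k:

        def score(i):
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)

    return selected


if __name__ == "__main__":
    # Toy 2-D "embeddings": the first two documents are near-duplicates.
    docs = [[1.0, 0.9], [1.0, 0.8], [0.2, 1.0]]
    print(mmr_rerank([1.0, 1.0], docs, k=2, lam=0.5))
```

In a pipeline like the one described, the candidate vectors would presumably come from the HNSW index's nearest-neighbour results, with MMR applied to that short list before passages are handed to the generator.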
Included in
GC-1215 ClinicalRAG: A Scalable Benchmark of Privacy, Relevance, and Speed in Semantic Retrieval for clinical transcriptions