Location

https://www.kennesaw.edu/ccse/events/computing-showcase/fa25-cday-program.php

Document Type

Event

Start Date

24-11-2025 4:00 PM

Description

Traditional keyword search struggles with the scale, complexity, and contextual depth of clinical data. This project develops and evaluates semantic search systems that better understand medical language, enabling physicians and researchers to retrieve contextually relevant information through a Retrieval-Augmented Generation (RAG) framework. We integrate privacy-preserving methods, including differential privacy and homomorphic encryption, to protect sensitive clinical transcriptions. To improve speed and accuracy, we enhance the baseline RAG architecture with Hierarchical Navigable Small World (HNSW) indexing and Maximal Marginal Relevance (MMR) reranking. For scalability, clinical documents are ingested with PySpark and stored in a vector database optimized for high-dimensional queries, enabling fast, accurate, and privacy-aware retrieval of medical transcriptions.
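
The MMR reranking step mentioned above can be illustrated with a minimal sketch. It is not the project's implementation: the function names (`cosine`, `mmr_rerank`), the use of cosine similarity, and the default trade-off weight `lambda_` are all assumptions made for illustration. The idea is to pick documents one at a time, rewarding similarity to the query and penalizing similarity to documents already selected.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def mmr_rerank(query_vec, doc_vecs, k, lambda_=0.7):
    """Return indices of up to k documents chosen greedily by Maximal Marginal Relevance.

    lambda_ (hypothetical default) balances relevance against redundancy:
    1.0 ranks purely by relevance, lower values favor diversity.
    """
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    remaining = list(range(len(doc_vecs)))
    selected = []

    while remaining and len(selected) < k:
        def score(i):
            # Redundancy is the highest similarity to any already selected document.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_ * relevance[i] - (1 - lambda_) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)

    return selected
```

In a pipeline like the one described, the candidate vectors would typically come from an approximate nearest-neighbor lookup (for example, an HNSW index), and MMR would reorder that shortlist before the passages are handed to the generator.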

GC-1215 ClinicalRAG: A Scalable Benchmark of Privacy, Relevance, and Speed in Semantic Retrieval for Clinical Transcriptions
