Uncovering Genetic Patterns in Salmonella enterica Using K-Means Clustering
Disciplines
Artificial Intelligence and Robotics | Bacteria | Databases and Information Systems | Medicinal and Pharmaceutical Chemistry | Numerical Analysis and Scientific Computing | Other Pharmacy and Pharmaceutical Sciences
Abstract
Clustering techniques play a crucial role in genomic data analysis by uncovering hidden patterns and relationships within large datasets. This study applies k-means clustering to Salmonella enterica genomic data to group strains and analyze the genetic variation among them. Salmonella enterica is a significant pathogen responsible for foodborne illness worldwide, and understanding its genetic diversity can aid in tracking outbreaks and improving public health responses. Our approach involves preprocessing genomic sequence data, extracting relevant features, and applying k-means clustering to group strains by genetic similarity. The results reveal distinct groupings that may correspond to variations in virulence, antibiotic resistance, or geographic origin. These insights contribute to a deeper understanding of Salmonella enterica population structure and could enhance epidemiological surveillance efforts. This mentored research leverages unsupervised machine learning to generate new knowledge in bacterial genomics. By applying computational clustering methods to pathogen data, the study offers an approach to grouping Salmonella enterica strains that may have implications for public health monitoring and outbreak prevention.
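The three steps named in the abstract (preprocessing, feature extraction, k-means clustering) can be sketched as follows. This is a minimal illustration, not the study's actual pipeline. The abstract does not say which features were extracted, so normalized dinucleotide (k-mer) frequencies are assumed here. The function names kmer_profile and kmeans are hypothetical, and the four short sequences are invented toy data, not real Salmonella enterica genomes.

```python
from itertools import product
from math import dist


def kmer_profile(seq, k=2):
    """Return the normalized k-mer frequency vector of a DNA sequence.

    Assumption: k-mer frequencies stand in for the unspecified
    "relevant features" mentioned in the abstract.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()  # preprocessing: normalize case
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows with N or other ambiguity codes
            counts[word] += 1
    total = sum(counts.values()) or 1  # avoid division by zero
    return [counts[w] / total for w in kmers]


def kmeans(points, n_clusters, n_iter=100):
    """Lloyd's algorithm with deterministic farthest-first initialization."""
    centroids = [points[0]]
    while len(centroids) < n_clusters:
        # Next seed: the point farthest from its nearest existing centroid.
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(n_clusters), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Invented toy sequences (illustrative only): two AT-rich, two GC-rich.
toy_sequences = {
    "strain_A": "ATATATATTAATATATATAT",
    "strain_B": "TATATAATATATTATATATA",
    "strain_C": "GCGCGCGGCCGCGCGCGCGC",
    "strain_D": "CGCGCCGCGCGGCGCGCGCG",
}
profiles = [kmer_profile(s) for s in toy_sequences.values()]
labels, centroids = kmeans(profiles, n_clusters=2)

for name, label in zip(toy_sequences, labels):
    print(name, "-> cluster", label)
```

On real data the number of clusters is not known in advance and would have to be chosen, for example by comparing within-cluster variance across several values. Whole-genome inputs would also call for longer k-mers or other features such as SNP or gene presence/absence profiles.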
Academic department under which the project should be listed
SPCEET - Robotics and Mechatronics Engineering
Primary Investigator (PI) Name
Razvan Voicu