Presentation Type

Article

Location

Kennesaw, Georgia

Start Date

April 1, 2026, 1:45 PM

End Date

April 1, 2026, 3:00 PM

Description

Cigarette smoking remains a major public health issue in the United States, causing more than 480,000 deaths annually and substantial healthcare and economic costs. Understanding how smoking behavior changes over the life course is important for developing effective public health interventions. In this study, smoking trajectories were generated with the Cancer Intervention and Surveillance Modeling Network (CISNET) Smoking History Generator (SHG), which simulates individual life histories of smoking and mortality in the United States. A total of 200,000 individuals (100,000 males and 100,000 females) from the 1960 birth cohort were simulated. An unsupervised distance-based clustering algorithm (k-means) was used to learn patterns in the smoking trajectories and to group individuals with similar smoking behaviors. The optimal number of clusters was determined using the elbow method and silhouette analysis. Cluster validity was evaluated using the silhouette score, Davies-Bouldin index (DBI), and Calinski-Harabasz index (CHI), which yielded values of 0.38, 0.95, and 41,680.45, respectively, indicating acceptable cluster separation and compactness. The results revealed several distinct smoking trajectory patterns that resemble those observed in real populations, including light, moderate, heavy, early-quitter, and long-term smoking behaviors. Identifying these trajectory groups provides insight into how smoking behavior evolves over the life course, information that can help public health agencies better target prevention and smoking cessation strategies.
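The abstract does not describe the implementation, so the following is only a minimal, self-contained sketch of the kind of pipeline it outlines: k-means on trajectory vectors, the elbow method (inertia) and silhouette analysis to choose k, then silhouette, DBI, and CHI as validity measures. It assumes each trajectory is a fixed-length vector of cigarettes per day at ages 15, 20, ..., 60, Euclidean distance, and plain random initialization with restarts (simpler than the k-means++ initialization scikit-learn uses by default). The toy data and all function names (`toy_trajectories`, `kmeans`, `silhouette`, `davies_bouldin`, `calinski_harabasz`) are hypothetical and are not the SHG output format or the authors' code; in practice scikit-learn's `KMeans`, `silhouette_score`, `davies_bouldin_score`, and `calinski_harabasz_score` would normally be used instead.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two equal-length trajectory vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_vec(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans(points, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's k-means with n_init random restarts; returns (labels, centroids, inertia)
    for the restart with the lowest inertia (within-cluster sum of squared distances)."""
    rng = random.Random(seed)
    best = None
    for _run in range(n_init):
        centroids = [list(p) for p in rng.sample(points, k)]
        labels = None
        for _it in range(n_iter):
            # Assignment step: each trajectory joins its nearest centroid.
            new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
            if new_labels == labels:
                break  # converged: no trajectory changed cluster
            labels = new_labels
            # Update step: move each centroid to the mean of its members.
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                if members:  # an empty cluster keeps its previous centroid
                    centroids[j] = mean_vec(members)
        inertia = sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
        if best is None or inertia < best[2]:
            best = (labels, centroids, inertia)
    return best

def silhouette(points, labels):
    """Mean silhouette in [-1, 1] (higher is better); needs at least two clusters.
    Singleton clusters score 0, following scikit-learn's convention. O(n^2) distances."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = labels[i]
        if labels.count(own) == 1:
            scores.append(0.0)
            continue
        # Mean distance from p to each cluster's members, excluding p itself.
        mean_d = {}
        for c in clusters:
            ds = [dist(p, q) for j, q in enumerate(points) if labels[j] == c and j != i]
            mean_d[c] = sum(ds) / len(ds)
        a = mean_d[own]  # cohesion: mean distance within p's own cluster
        b = min(mean_d[c] for c in clusters if c != own)  # separation from nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

def davies_bouldin(points, labels, centroids):
    """Davies-Bouldin index (lower is better): mean over clusters of the worst ratio
    (s_i + s_j) / d(c_i, c_j), where s_i is the mean member-to-centroid distance."""
    clusters = sorted(set(labels))
    s = {}
    for c in clusters:
        ds = [dist(p, centroids[c]) for p, lab in zip(points, labels) if lab == c]
        s[c] = sum(ds) / len(ds)
    worst = [max((s[i] + s[j]) / dist(centroids[i], centroids[j]) for j in clusters if j != i)
             for i in clusters]
    return sum(worst) / len(worst)

def calinski_harabasz(points, labels, centroids):
    """Calinski-Harabasz index (higher is better): between-cluster dispersion over
    within-cluster dispersion, each divided by its degrees of freedom."""
    n, clusters = len(points), set(labels)
    k = len(clusters)
    overall = mean_vec(points)
    between = sum(labels.count(c) * dist(centroids[c], overall) ** 2 for c in clusters)
    within = sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return (between / (k - 1)) / (within / (n - k))

def toy_trajectories(n_per_group=40, seed=1):
    """Hypothetical cigarettes-per-day values at ages 15, 20, ..., 60 for three made-up groups."""
    rng = random.Random(seed)
    shapes = [lambda age: 5, lambda age: 30, lambda age: 20 if age < 30 else 0]  # light, heavy, early quitter
    return [[max(0.0, shape(age) + rng.gauss(0, 2)) for age in range(15, 65, 5)]
            for shape in shapes for _ in range(n_per_group)]

if __name__ == "__main__":
    data = toy_trajectories()
    # Elbow method: look for the k after which inertia stops falling sharply.
    # Silhouette analysis: prefer a k with a higher mean silhouette.
    for k in range(2, 7):
        labels, _, inertia = kmeans(data, k)
        print(f"k={k}  inertia={inertia:10.1f}  silhouette={silhouette(data, labels):.3f}")
    labels, centroids, _ = kmeans(data, 3)
    print(f"DBI={davies_bouldin(data, labels, centroids):.3f}  "
          f"CHI={calinski_harabasz(data, labels, centroids):.1f}")
```

Running the script prints inertia and mean silhouette for k = 2 to 6, followed by DBI and CHI at k = 3. The silhouette computation here is quadratic in the number of individuals, so for a cohort of 200,000 a random subsample would normally be scored instead (scikit-learn's `silhouette_score` accepts a `sample_size` argument for this purpose).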


Distance-Based Clustering for Identifying Patterns in Lifetime Smoking Trajectories
