Classifying Variable Stars from Stellar Light Curve Data

Presenters

Ryan Parker

Disciplines

Applied Statistics | Data Science | Multivariate Analysis | Numerical Analysis and Scientific Computing | Stars, Interstellar Medium and the Galaxy | Statistical Models

Abstract

Due to advances in collection techniques, variable star light curve data are being produced faster than existing light curves can be classified. Automated classification has been attempted, but most efforts rely on sophisticated techniques to extract high-level variables, and many produce inconsistent results, often finding that low-level variables carry the greatest predictive impact. Here, several of the more successful methods were compared using these low-level features from the OGLE4 variable star catalogue. In addition, a probability-based, multi-level classifier was developed to increase classification accuracy for the underrepresented classes and improve user confidence. Random Forest and Gradient Boosting Trees achieved the highest accuracy among the compared methods, and the multi-level classifier outperformed even these. Not only could these models accurately predict the classes using easier-to-calculate variables, but the multi-level framework also increased accuracy further and functions as a trustworthy system, rejecting low-confidence samples based on a user-determined confidence threshold.
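The rejection behavior described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the levels are arranged, so the cascade below is an assumption: each level returns class probabilities, the first level whose top probability meets the threshold decides, and the sample is rejected if no level is confident enough. The names `classify_with_rejection`, `level_one`, `level_two`, the class labels, and the toy feature rules are all hypothetical and do not come from the project.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# A "level" is any function that maps a feature vector to class probabilities.
# In practice this could wrap a trained model; here it is a hypothetical stand-in.
Level = Callable[[Sequence[float]], Dict[str, float]]


def classify_with_rejection(
    features: Sequence[float],
    levels: List[Level],
    threshold: float,
) -> Tuple[Optional[str], float]:
    """Return (label, confidence) from the first level that is confident enough.

    If no level reaches the threshold, return (None, best_confidence_seen),
    meaning the sample is rejected rather than labeled.
    """
    best_seen = 0.0
    for predict_proba in levels:
        probs = predict_proba(features)
        label, confidence = max(probs.items(), key=lambda item: item[1])
        if confidence >= threshold:
            return label, confidence
        best_seen = max(best_seen, confidence)
    return None, best_seen


# Toy levels with made-up rules, used only to exercise the cascade.
def level_one(features: Sequence[float]) -> Dict[str, float]:
    p = 0.9 if features[0] < 1.0 else 0.55
    return {"class_a": p, "class_b": 1 - p}


def level_two(features: Sequence[float]) -> Dict[str, float]:
    p = 0.8 if features[1] > 0.5 else 0.5
    return {"class_b": p, "class_a": 1 - p}


if __name__ == "__main__":
    for sample in ([0.5, 0.1], [2.0, 0.9], [2.0, 0.1]):
        print(sample, classify_with_rejection(sample, [level_one, level_two], 0.75))
```

Raising the threshold in this sketch trades coverage for reliability: more samples are rejected, and the ones that are labeled carry a higher stated confidence.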

Academic department under which the project should be listed

CCSE - Data Science and Analytics

Primary Investigator (PI) Name

Ramazan Aygun
