Finding Undervalued Pitch Metrics at Coors Field Using a Random Forest Classifier
Disciplines
Sports Management
Abstract (300 words maximum)
It has been said that baseball has changed more in the last decade than in the 100 years prior. Nearly all of this change has been driven by the growing use of statistics and analytics. Baseball operations departments now use data to aid decision-making and development across all facets of the game. So far, pitching has outpaced hitting in identifying which data matter most and how to incorporate them. Dozens of metrics are recorded on every pitch thrown, and teams have raced to figure out how to put the puzzle together. One team that has struggled with pitching is the Colorado Rockies. The Rockies play at a much higher elevation than other teams, and the thinner air affects the way the ball moves out of a pitcher's hand and after it is hit. As a result, the Rockies have struggled to find pitchers who succeed at home.
I set out to measure how the importance of these pitch metrics differs at the Rockies’ home park, Coors Field, compared with parks at lower elevation. To do this, I separated the data by pitch type and, for each type, trained two Random Forest classifiers to predict swinging strikes: one on pitches thrown at Coors Field and one on pitches thrown at other MLB stadiums. I then computed the feature importances for each model and subtracted them to get the difference between the two.
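The comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual code: it assumes scikit-learn's `RandomForestClassifier` and its impurity-based `feature_importances_`, the feature names are hypothetical (the abstract does not list the metrics used), and the data are synthetic placeholders standing in for real pitch-tracking data for a single pitch type.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical pitch metrics; the abstract does not name the features used.
FEATURES = ["velocity", "spin_rate", "horizontal_break", "vertical_break"]


def make_synthetic_pitches(n, weights, seed):
    """Generate placeholder data for one pitch type (not real pitch data).

    Each row is one pitch; y is 1 for a swinging strike, 0 otherwise.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, len(FEATURES)))
    logits = X @ np.asarray(weights) - 1.5
    prob = 1.0 / (1.0 + np.exp(-logits))
    y = (rng.random(n) < prob).astype(int)
    return X, y


def fit_importances(X, y, seed=0):
    """Fit a random forest and return its impurity-based feature importances."""
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X, y)
    return model.feature_importances_


def importance_difference(coors, elsewhere):
    """Coors importance minus other-park importance, keyed by feature name."""
    return {name: float(c - e) for name, c, e in zip(FEATURES, coors, elsewhere)}


# One model for pitches at Coors Field, one for pitches at all other parks.
X_coors, y_coors = make_synthetic_pitches(2000, [0.5, 0.2, 1.0, 0.3], seed=1)
X_other, y_other = make_synthetic_pitches(2000, [0.5, 0.2, 0.3, 1.0], seed=2)

imp_coors = fit_importances(X_coors, y_coors)
imp_other = fit_importances(X_other, y_other)
diff = importance_difference(imp_coors, imp_other)

# Positive values mark metrics that matter more at Coors Field in this toy setup.
for name, d in sorted(diff.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>18}: {d:+.3f}")
```

In practice the same steps would be repeated once per pitch type. Because each model's importances sum to one, the differences sum to zero, so they are best read as shifts in relative weight between the two settings.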
The differences in feature importance between the models were small, but over the course of a season, small differences can have a large impact. These models can be used to determine which metrics have a greater impact on generating swings and misses in Colorado than in other stadiums. With this knowledge, the Rockies can target pitchers in free agency and the draft whose specific characteristics may be undervalued by other teams.
Academic department under which the project should be listed
CCSE - Data Science and Analytics
Primary Investigator (PI) Name
Dr. Joe DeMaio
Additional Faculty
Bob Vanderheyden, Applied Statistics and Analytics, rvanderh@kennesaw.edu