Finding Undervalued Pitch Metrics at Coors Field Using a Random Forest Classifier

Presenters

Andrew Plant

Disciplines

Sports Management

Abstract

It has been said that baseball has changed more in the last decade than in the 100 years prior. Nearly all of this change has been driven by the growing use of statistics and analytics. Baseball operations departments now use data to aid decision-making and development across all facets of the game. So far, pitching has outpaced hitting in identifying which data matter most and how to incorporate them. Dozens of metrics are recorded on every pitch thrown, and teams have raced to figure out how to put the puzzle together. One team that has struggled with pitching is the Colorado Rockies. Because the Rockies play their home games at a much higher elevation than other teams, the thinner air affects the way the ball moves out of a pitcher's hand and after it is hit. As a result, the Rockies have struggled to find pitchers who succeed at home.

I wanted to research how the importance of these pitch metrics differs at the Rockies’ home park, Coors Field, versus at lower elevations. To do this, I separated the data by pitch type and, for each type, fit two Random Forest classifiers predicting swinging strikes: one on pitches thrown at Coors Field and one on pitches thrown at other MLB stadiums. I then computed the feature importances for each model and subtracted one set from the other to get the difference between them.
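The comparison described above can be illustrated with a minimal sketch for a single pitch type. It is not the code used in this project. It assumes scikit-learn's RandomForestClassifier and its impurity-based feature_importances_ attribute (the abstract does not say which importance measure was used), the feature names are hypothetical placeholders, and the pitches are synthetic stand-ins for real tracking data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical pitch metrics; the abstract does not list the features used.
FEATURES = ["release_speed", "spin_rate", "horizontal_break", "vertical_break"]


def fit_importances(X, y, seed=0):
    """Fit a random forest on one venue's pitches and return its importances."""
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X, y)
    # Impurity-based importances; they sum to 1 across features.
    return model.feature_importances_


def importance_difference(X_coors, y_coors, X_other, y_other, seed=0):
    """Return (Coors importance - other-park importance) per feature."""
    coors = fit_importances(X_coors, y_coors, seed)
    other = fit_importances(X_other, y_other, seed)
    return dict(zip(FEATURES, coors - other))


def make_synthetic_pitches(n, weights, rng):
    """Generate fake pitches whose swinging-strike odds depend on `weights`."""
    X = rng.normal(size=(n, len(FEATURES)))
    logits = X @ np.asarray(weights) - 1.5
    prob = 1.0 / (1.0 + np.exp(-logits))
    y = (rng.random(n) < prob).astype(int)  # 1 = swinging strike
    return X, y


rng = np.random.default_rng(42)
# Synthetic assumption: spin matters at "Coors", vertical break elsewhere.
X_coors, y_coors = make_synthetic_pitches(2000, [0.5, 1.5, 0.2, 0.0], rng)
X_other, y_other = make_synthetic_pitches(2000, [0.5, 0.0, 0.2, 1.5], rng)

diff = importance_difference(X_coors, y_coors, X_other, y_other)
for name, delta in sorted(diff.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>18}: {delta:+.3f}")
```

A positive value marks a metric that carries more weight in the Coors Field model than in the model for other parks. Repeating the procedure once per pitch type would mirror the separation by pitch type described above.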

The differences in feature importance between the models were small, but over the course of a season, small differences can have a large impact. These models can be used to determine which metrics have a greater impact on generating swings and misses in Colorado than at other stadiums. Knowing this, the Rockies can target pitchers in free agency and the draft whose specific characteristics may be undervalued by other teams.

Academic department under which the project should be listed

CCSE - Data Science and Analytics

Primary Investigator (PI) Name

Dr. Joe DeMaio

Additional Faculty

Bob Vanderheyden, Applied Statistics and Analytics, rvanderh@kennesaw.edu

This document is currently not available here.