Developing an Empirical Model to Forecast United States Presidential Elections: A Machine Learning Approach

Authors

  • Noah Loewy Student, Paul D Schreiber High School, 101 Campus Drive Port Washington, NY, USA
  • Ashok Singh University of Nevada, Las Vegas
  • Tina Marie Gallagher Co-Coordinator of Math Research Program, Paul D Schreiber High School, 101 Campus Drive Port Washington, NY, USA, 11050

DOI:

https://doi.org/10.14738/assrj.710.9210

Keywords:

Presidential Election; Electoral College; Forecast; XGBoost; Multiple Linear Regression

Abstract

In this paper, we develop and compare two models for forecasting the 2020 U.S. presidential election: multiple linear regression (MLR) and the machine learning method of Extreme Gradient Boosting (XGBoost). We predict each state’s Republican vote share using seven continuous predictors from 1976-2016, as well as dummy columns for each state. After computing 95% confidence intervals for each prediction, we determine each candidate’s probability of winning the Electoral College. XGBoost appears to be a very strong predictor, accounting for 98.6% of the variance with a root mean square error (RMSE) of 3.34%, whereas the MLR accounts for only 71.8% of the variance with an RMSE of 6.35%. We observe that 1) both models predict a Democratic Electoral College landslide in the 2020 election, 2) Georgia, Iowa, Florida, North Carolina, and Ohio are crucial for a Republican win, and 3) Extreme Gradient Boosting is an attractive alternative to MLR in election forecasting.
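
The step from per-state confidence intervals to Electoral College probabilities can be illustrated with a minimal sketch. It is not the authors' procedure, which the abstract does not detail. It assumes that each state prediction is normally distributed, that the 95% interval is symmetric, that a state is won when the Republican share exceeds 50%, and that states are independent. The state names, vote shares, interval widths, and electoral vote counts are hypothetical placeholders, and the function names are illustrative only.

```python
import random
from statistics import NormalDist

# Hypothetical inputs, NOT results from the paper:
# state -> (predicted Republican share in %, 95% CI half-width in %, electoral votes)
FORECASTS = {
    "State A": (55.0, 6.0, 10),
    "State B": (49.0, 6.0, 20),
    "State C": (42.0, 6.0, 15),
}

# Two-sided 95% critical value of the standard normal (about 1.96).
Z_95 = NormalDist().inv_cdf(0.975)


def state_win_probability(mean, half_width, threshold=50.0):
    """P(share > threshold), assuming a normally distributed prediction."""
    sd = half_width / Z_95  # recover the standard deviation from the interval
    return 1.0 - NormalDist(mean, sd).cdf(threshold)


def electoral_win_probability(forecasts, n_sims=20000, seed=0):
    """Fraction of simulated elections won with a majority of electoral votes.

    Assumes states are independent, which real elections are not.
    """
    rng = random.Random(seed)
    total = sum(ev for _, _, ev in forecasts.values())
    probs = {s: state_win_probability(m, h) for s, (m, h, _) in forecasts.items()}
    wins = 0
    for _ in range(n_sims):
        # Award each state's electoral votes with its win probability.
        votes = sum(ev for s, (_, _, ev) in forecasts.items()
                    if rng.random() < probs[s])
        if 2 * votes > total:
            wins += 1
    return wins / n_sims


if __name__ == "__main__":
    for name, (mean, half_width, _) in FORECASTS.items():
        print(name, round(state_win_probability(mean, half_width), 3))
    print("Electoral majority:", electoral_win_probability(FORECASTS))
```

Because state outcomes tend to move together, the independence assumption would likely understate the spread of real outcomes; a fuller treatment would draw correlated state errors.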

 

Published

2020-10-17

How to Cite

Loewy, N., Singh, A., & Marie Gallagher, T. (2020). Developing an Empirical Model to Forecast United States Presidential Elections: A Machine Learning Approach. Advances in Social Sciences Research Journal, 7(10), 186–198. https://doi.org/10.14738/assrj.710.9210