
Using Machine Learning to Predict National Hockey League Average Home Game Attendance

Barry E King, Jennifer L Rice, Julie Vaughan


This paper presents research on predicting average home game attendance in the National Hockey League. The data span the 2013 hockey season through the beginning of the 2017 hockey season. Multiple linear regression and three machine learning algorithms (random forest, M5 prime, and extreme gradient boosting) are used to predict out-of-sample average home game attendance. Extreme gradient boosting produced the lowest out-of-sample root mean square error. The top four predictor variables were the team identifier (team name), the number of Twitter followers (a surrogate for team popularity), median ticket price, and arena capacity.
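The comparison criterion named in the abstract, out-of-sample root mean square error, can be illustrated with a minimal sketch. The attendance figures and the model names `model_a` and `model_b` below are hypothetical placeholders, not data or results from the paper; the sketch only shows how candidate models might be ranked by error on held-out observations.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical held-out average attendance values (not from the paper).
actual = [18000.0, 17500.0, 15000.0, 19000.0]

# Hypothetical predictions from two candidate models on the same held-out rows.
predictions = {
    "model_a": [17000.0, 17500.0, 16000.0, 19000.0],
    "model_b": [18000.0, 17000.0, 15000.0, 19500.0],
}

# Score each model out of sample and pick the one with the lowest RMSE.
scores = {name: rmse(actual, preds) for name, preds in predictions.items()}
best = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: RMSE = {score:.1f}")
print("lowest out-of-sample RMSE:", best)
```

Because RMSE is expressed in the same units as the target (here, attendees per game), the scores can be read directly as a typical prediction error.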


National Hockey League; linear regression; random forests; M5 prime; extreme gradient boosting






