Abstract
Predicting the box-office revenue of a movie before its theatrical release is an important but challenging problem that demands sophisticated artificial intelligence techniques. Social media has shown predictive power in various domains, which motivates us to exploit social media content to predict box-office revenues. In this study, we employ both linear and non-linear regression models, built on the crowd wisdom of social media and especially on user posts, to predict movie box-office revenues. More specifically, the attention and popularity of the movie, the purchase intention of users, and the comments of users are automatically mined from social media data. We explore the use of Linear Regression and Support Vector Regression to predict the box-office revenue of a movie before its theatrical release. To evaluate the effectiveness of the proposed approach, we conduct a cross-validation experiment. The experimental results show that large-scale social media content is correlated with movie box-office revenues and that the purchase intention of users leads to more accurate revenue predictions. Both the linear and the non-linear models prove effective at predicting movie grosses in our experiments.
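The evaluation protocol named in the abstract (a regression model scored by cross-validation) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the study: the data are synthetic, the single feature `intent_posts` (a count of purchase-intention posts) and the `revenue` values are hypothetical, the fold assignment is a simple interleaving, and mean absolute error is used as a stand-in metric. Only the linear model is sketched; Support Vector Regression is not shown.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def k_fold_mae(xs, ys, k=5):
    """Mean absolute error on held-out data, averaged over k folds."""
    indices = list(range(len(xs)))
    fold_errors = []
    for fold in range(k):
        # Every k-th sample (starting at `fold`) is held out for testing.
        test_idx = indices[fold::k]
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        intercept, slope = fit_line(
            [xs[i] for i in train_idx], [ys[i] for i in train_idx]
        )
        errors = [abs(ys[i] - (intercept + slope * xs[i])) for i in test_idx]
        fold_errors.append(statistics.fmean(errors))
    return statistics.fmean(fold_errors)


# Synthetic, hypothetical data: revenue grows linearly with the number of
# purchase-intention posts, plus Gaussian noise. Not the study's data.
rng = random.Random(0)
intent_posts = [rng.uniform(1.0, 100.0) for _ in range(50)]
revenue = [2.0 + 0.5 * x + rng.gauss(0.0, 1.0) for x in intent_posts]

intercept, slope = fit_line(intent_posts, revenue)
cv_mae = k_fold_mae(intent_posts, revenue, k=5)
print(f"slope={slope:.3f} intercept={intercept:.3f} cv_mae={cv_mae:.3f}")
```

Holding out each fold in turn means every prediction is scored on a movie the model did not see during fitting, which mirrors the setting of predicting revenue before release.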
Notes
The Baidu trends of the director
The Baidu trends of the main actors
Acknowledgment
Ting Liu: model building, experiment design, paper writing
Xiao Ding: model building, experiment design, paper writing
Yiheng Chen: model building, experiment design
Haochen Chen: data collection, experiment design
Maosheng Guo: data collection
Cite this article
Liu, T., Ding, X., Chen, Y. et al. Predicting movie Box-office revenues by exploiting large-scale social media content. Multimed Tools Appl 75, 1509–1528 (2016). https://doi.org/10.1007/s11042-014-2270-1
DOI: https://doi.org/10.1007/s11042-014-2270-1