Abstract
To determine a suitable automobile insurance policy premium, one needs to take into account three factors: the risk associated with the drivers and cars on the policy, the operational costs associated with managing the policy, and the desired profit margin. The premium should then be some function of these three values. We focus on risk assessment using a data science approach. Instead of using the traditional frequency and severity metrics, we predict the total claims that will be made by a new customer using historical data on current and past policies. Given multiple features of the policy (age and gender of drivers, value of car, previous accidents, etc.), one can try to provide personalized insurance policies based specifically on these features as follows. We can compute the average claims made per year for each past and current policy with identical features and then take an average over these claim rates. Unfortunately, there may not be sufficient samples to obtain a robust average. We can instead include policies that are “similar” to obtain sufficient samples for a robust average. We therefore face a trade-off between personalization (only using closely similar policies) and robustness (extending the domain far enough to capture sufficient samples). This is known as the bias–variance trade-off. We model this problem, determine the optimal trade-off between the two (i.e., the balance that provides the highest prediction accuracy), and apply it to the claim rate prediction problem. We demonstrate our approach using real data.
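The personalization versus robustness trade-off described in the abstract can be illustrated with a minimal sketch. This is not the paper's model: the policy records, the two features, the `distance` function and the neighbourhood size `k` are all hypothetical choices made for illustration only. The sketch computes a per-policy claim rate (claims per year) and averages it over the `k` most similar historical policies; a small `k` gives a highly personalized but noisy estimate, while a large `k` gives a more stable but less specific one.

```python
from statistics import mean

# Hypothetical historical policies (illustrative values, not real data):
# (driver_age, car_value_in_thousands, total_claims, years_insured)
POLICIES = [
    (22, 15.0, 3000.0, 2.0),
    (24, 18.0, 1200.0, 3.0),
    (35, 20.0, 500.0, 5.0),
    (37, 22.0, 0.0, 4.0),
    (52, 30.0, 400.0, 8.0),
    (60, 12.0, 0.0, 6.0),
]


def claim_rate(policy):
    """Average claims per year for a single policy."""
    _, _, total_claims, years = policy
    return total_claims / years


def distance(policy, age, car_value):
    """Assumed dissimilarity measure: scaled absolute feature differences."""
    return abs(policy[0] - age) / 10.0 + abs(policy[1] - car_value) / 10.0


def predict_claim_rate(age, car_value, k):
    """Average the claim rates of the k most similar historical policies."""
    nearest = sorted(POLICIES, key=lambda p: distance(p, age, car_value))[:k]
    return mean(claim_rate(p) for p in nearest)


if __name__ == "__main__":
    # Widening the neighbourhood moves the estimate from the closest
    # policy's claim rate towards the portfolio-wide average.
    for k in (1, 2, len(POLICIES)):
        print(k, round(predict_claim_rate(23, 16.0, k), 2))
```

In practice, the neighbourhood size would be chosen by comparing prediction accuracy on held-out policies, which is the balance the abstract refers to as the optimal trade-off.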
Availability of data and materials
The data used for this publication are confidential, and hence, we are only permitted to provide results but cannot share the data.
Code Availability
The code used to generate results is also proprietary to the company, but we hope that our pseudo-code can be used if one wishes to apply the model to their datasets.
Funding
The authors did not receive support from any organization for the submitted work.
Author information
Contributions
The sole author performed the research, wrote the code for evaluating the solution, and wrote the entire paper.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
About this article
Cite this article
Hosein, P. A data science approach to risk assessment for automobile insurance policies. Int J Data Sci Anal 17, 127–138 (2024). https://doi.org/10.1007/s41060-023-00392-x
DOI: https://doi.org/10.1007/s41060-023-00392-x