Algoritmos de machine learning para la detección del fraude en el seguro de automóviles [Machine learning algorithms for fraud detection in automobile insurance]
- Badal Valero, Elena (Universitat de València)
- Sanjuán Díaz, Andrés (Universitat de València)
- Segura Gisbert, Jorge (Universitat de València)
ISSN: 0534-3232
Year of publication: 2020
Issue: 26
Pages: 23-46
Type: Article
Published in: Anales del Instituto de Actuarios Españoles
Abstract
Fraud in automobile insurance has increased considerably in recent years, undoubtedly driven by the economic crisis. This significant rise in the number of fraudulent claims, together with the new requirements associated with Solvency II, is leading insurers to exert greater control over fraud and to allocate more resources to fighting it. For these reasons, the importance of using advanced predictive techniques to detect suspicious accidents is more than justified.
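As a purely illustrative companion to the abstract, the sketch below shows in R the kind of pipeline the reference list points to: a rare fraud label, rebalancing of the training set, a tree-ensemble classifier, and an AUC evaluation. The data are simulated and the variable names, parameter values and model settings are assumptions for illustration only, not the authors' dataset or method; the two packages used, ROSE (Lunardon, Menardi and Torelli, 2014) and randomForest (Breiman, 2001; Liaw and Wiener, 2002), are among those cited below.

```r
# Minimal, self-contained sketch on simulated claims data (hypothetical
# predictors and settings, not the paper's data or model).
library(ROSE)          # rebalancing of binary imbalanced data
library(randomForest)  # random forest classifier

set.seed(2020)
n <- 5000
claims <- data.frame(
  claim_amount   = rlnorm(n, meanlog = 8, sdlog = 1),
  days_to_report = rpois(n, lambda = 5),
  prior_claims   = rpois(n, lambda = 1)
)
# Simulate a rare fraud label (a few percent of claims), loosely tied to the predictors.
p_fraud <- plogis(-4 + 0.3 * as.numeric(scale(claims$claim_amount)) + 0.4 * claims$prior_claims)
claims$fraud <- factor(rbinom(n, size = 1, prob = p_fraud))

# Train/test split.
idx   <- sample(n, 0.7 * n)
train <- claims[idx, ]
test  <- claims[-idx, ]

# Rebalance the training set with ROSE so the classifier is not overwhelmed
# by the majority class of legitimate claims.
train_bal <- ROSE(fraud ~ ., data = train, seed = 1)$data

# Fit a random forest on the rebalanced data and score the untouched test set.
rf   <- randomForest(fraud ~ ., data = train_bal, ntree = 500)
prob <- predict(rf, newdata = test, type = "prob")[, "1"]

# AUC as an imbalance-robust performance measure.
roc.curve(test$fraud, prob)
```

The same split could instead be rebalanced with SMOTE (Chawla et al., 2002) or fed to gradient boosting via the xgboost package (Chen et al., 2018); the ROSE/random-forest pairing is only one of the combinations covered by the references.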
Bibliographic References
- Artís, M., Ayuso, M., Guillen, M. (1999). Técnicas cuantitativas para la detección del fraude en el seguro del automóvil. Anales del Instituto de Actuarios Españoles, 5, 51-84.
- APESEG (2020). Glosario de términos. https://www.apeseg.org.pe/glosario-de-terminos/
- Ayuso, M., Guillén, M. (1999). Modelos de detección de fraude en el seguro de automóvil. Cuadernos Actuariales, 8, 135-149.
- Badal-Valero, E., Alvarez-Jareño, J.A. and Pavía, J.M. (2018). Combining Benford's Law and Machine Learning to detect Money Laundering. An actual Spanish court case. Forensic Science International, 282, 24-34.
- Belhadji, B., Dionne, G. (1997). Development of an expert system for automatic detection of automobile insurance fraud. Working Paper 97-06. École des Hautes Études Commerciales. Université de Montréal.
- Ben-Hur, A., Horn, D., Siegelmann, H., Vapnik, V. (2001). Support Vector Clustering. Journal of Machine Learning Research, 2, 125-137.
- Bolton, R.J. and Hand, D.J. (2002). Statistical Fraud Detection: A Review. Statistical Science, 17 (3), 235-255.
- Breiman, L. (2001). Random Forests. Machine Learning, 45 (1), 5-32.
- Brockett, P.L., Xia, X. and Derrig, R. (1998). Using Kohonen's Self-Organizing Feature Map to Uncover Automobile Bodily Injury Claims Fraud. Journal of Risk and Insurance, 65 (2), 245-274.
- Burez, J. and Van den Poel, D. (2009). Handling class imbalance in customer churn prediction. Expert Systems with Applications, 36, 4626-4636.
- Cestnik, B, Kononenko, I, Bratko, I. (1987). A knowledge elicitation tool for sophisticated users. Progress in Machine Learning, 31-45, Sigma Press.
- Chawla, N. V., Bowyer, K. W., Hall, L. O., Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321-357.
- Chawla, N. V. (2005). Data mining for imbalanced datasets: An overview. Data mining and knowledge discovery handbook, 853-867. Springer US.
- Chen, C., Liaw, A. and Breiman, L. (2004). Using random forest to learn imbalanced data. Technical Report 666, Statistics Department, University of California at Berkeley.
- Chen, T., He, T., Benesty, M., Khotilovich, V., Tang, Y., Cho, H., Chen, K., Mitchell, R., Cano, I., Zhou, T., Li, M., Xie, J., Lin, M., Geng, Y. and Li, Y. (2018). xgboost: Extreme Gradient Boosting. R package version 0.71.2. https://CRAN.R-project.org/package=xgboost
- Crocker, K.J. y S. Tennyson (2002). Insurance Fraud and Optimal Claims Settlement Strategies. Journal of Law & Economics, 45(2), 469-507.
- Cummins, J.D. y Tennyson, S. (1996). Moral Hazard in Insurance Claiming: Evidence from Automobile Insurance. Journal of Risk and Uncertainty, 12 (1), 29-50.
- Derrig, R.A. and Ostaszewski, K.M. (1995). Fuzzy Techniques of Pattern Recognition in Risk and Claim Classification. Journal of Risk and Insurance, 62 (3), 447-482.
- Friedman, Jerome H. (2001). Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29 (5), 1189–1232.
- Guo, H., Li, Y., Shang, J., Mingyun, G., Yuanyue, H. and Bing, G. (2017). Learning from class-imbalanced data: Review of methods and applications. Expert Systems with Applications, 73, 220-239.
- Hastie T, Rosset S, Tibshirani R, Zhu J (2004). The Entire Regularization Path for the Support Vector Machine. Journal of Machine Learning Research, 5, 1391–1415.
- He, H. y Garcia, E. A. (2009). Learning from imbalanced data. IEEE Transactions on knowledge and data engineering, 21(9), 1263-1284.
- Hidalgo Ruiz-Capillas, S (2014). Random Forests para detección de fraude en medios de pago. Trabajo Final de Máster. Universidad Autónoma de Madrid.
- Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the national academy of sciences, 79 (8), 2554-2558.
- Huigevoort, Chantine (2015). Customer churn prediction for an insurance company. Master Thesis, Eindhoven University of Technology. https://pure.tue.nl/ws/portalfiles/portal/47019808 [Last accessed: 23 September 2018]
- ICEA (2018). El Fraude al Seguro Español. Estadística a diciembre. Año 2017. Madrid, España.
- Karatzoglou, A., Meyer, D. and Hornik, K. (2006). Support Vector Machines in R. Journal of Statistical Software, 15 (9).
- Kaymak, U.; Ben-David, A. y Potharst, R. (2012). The AUK: A simple alternative to the AUC. Engineering Applications of Artificial Intelligence, 25 (5), pp. 1082-1089.
- Keramati, A., Jafari-Marandi, R., Aliannejadi, M., Ahmadian, I., Mozaffari, M., y Abbasi, U. (2014). Improved churn prediction in telecommunication industry using data mining techniques. Applied Soft Computing, 24, pp. 994-1012.
- Kohavi, R. and Provost, F. (1998). On Applied Research in Machine Learning. Editorial for the Special Issue on Applications of Machine Learning and the Knowledge Discovery Process, Columbia University, New York, 30.
- Kursa, M.B. y Rudnicki, W.R. (2010). Feature Selection with the Boruta Package. Journal of Statistical Software, 36(11), 1-13. URL: http://www.jstatsoft.org/v36/i11/
- Liaw, A. y Wiener M. (2002). Classification and Regression by Random Forest. R News 2 (3), pp 18-22.
- López, V., Fernández, A., García, S., Palade, V. and Herrera, F. (2013). An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics. Information Sciences, 250, 113-141.
- Lunardon, N., Menardi, G. y Torelli, N. (2014). ROSE: a Package for Binary Imbalanced Learning. The R Journal, 6 (1), 82-92.
- Picard, P. (2000). Economic analysis of insurance fraud. Handbook of insurance. 315-362. Springer, Dordrecht.
- Shmueli, G., Patel, N.R. and Bruce, P.C. (2011). Data Mining for Business Intelligence: Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner. John Wiley and Sons, second edition.
- Silver, N. (2014). La Señal y el Ruido. Ediciones Península, Barcelona.
- Swets, J.A. (1988). Measuring the accuracy of diagnostic systems. Science, 240, 1285-1293.
- Therneau, T. and Atkinson, B. (2018). rpart: Recursive Partitioning and Regression Trees. R package version 4.1-13. URL: https://CRAN.R-project.org/package=rpart
- Van Vlasselaer, V., Eliassi-Rad, T., Akoglu, L., Snoeck, M. and Baesens, B. (2017). Network-based Fraud Detection for Social Security Fraud. Management Science, 63 (9), 3090-3110.
- Yen, S.J y Lee, Y.S (2009). Cluster-based under-sampling approaches for imbalanced data distributions. Expert Systems with Applications, 36 (3), 5718-5727.