Designing a health insurance fraud detection system using artificial intelligence algorithms

Document Type: Original Article

Authors

1 Assistant Professor, Department of Operations Management and Information Technology, Faculty of Management, Kharazmi University, Tehran, Iran.

2 Assistant Professor, Department of Banking, Insurance and Customs, Faculty of Management, Kharazmi University, Tehran, Iran.

3 Assistant Professor, Insurance Research Institute, Tehran, Iran.

4 Assistant Professor, Department of Industrial Management and Information Technology, Faculty of Management and Accounting, Shahid Beheshti University, Tehran, Iran.

5 Master’s Student, Department of Operations Management and Information Technology, Faculty of Management, Kharazmi University, Tehran, Iran.

DOI: 10.48308/jimp.16.1.133

Abstract

Introduction: With the rapid expansion of healthcare services, fraud in health insurance systems has become a serious challenge. This study aims to design and develop an intelligent and modular framework for fraud detection in health insurance. The framework is designed to identify abusive and fraudulent behaviors regardless of the type of service or actor involved, and to adapt effectively to dynamic and complex environments. The primary objective is to provide a flexible solution that enhances the accuracy of fraud detection while reducing human error in the decision-making process.
Methods: The proposed framework consists of four key modules. First, a knowledge-based module leverages insights from insurance and medical experts to build a simulation framework for fraud detection, enabling the medical-insurance team to describe and visualize abnormal behaviors based on the actions of different actors. Second, a two-stage data warehouse is designed to efficiently process large volumes of insurance data. In the first-stage warehouse, the ETL (extract–transform–load) process ingests claims data, cleanses data quality issues, and removes inconsistencies and errors to prepare the data for feature extraction required for fraud detection. In the second-stage warehouse, in collaboration with insurance and medical experts, relevant features for fraud detection are extracted and selected. To this end, a framework for simulating the fraud-detection process is built to enable the medical-insurance team to describe, analyze, and visualize abnormal behaviors based on the actions of different actors. Accordingly, a list of twenty key features for fraud detection was extracted and documented, covering information about actors, products/services, and related features for each type of fraud. Third, the fraud detection engine is based on a proposed algorithm called K-IF, which first clusters data using Isolation Forest (IF) and then identifies suspicious samples using K-Means. Fourth, visualization tools and a dynamic management dashboard are developed to support interactive analysis and real-time updates by users.
Results and discussion: Experimental results on labeled datasets demonstrate that the proposed algorithm, by leveraging the discriminative power of IF and the clustering precision of K-Means, outperforms common algorithms such as LOF, OCSVM, EE, DBSCAN, AE, and K-Means across multiple metrics and in computation time. Furthermore, results from applying the proposed algorithm to real data from a health insurance company indicate that this approach depends less on the contamination rate, detects edge cases more accurately, and shows strong anomaly-detection capabilities. Ultimately, the framework has been developed as a software package for private insurance companies, offering advanced analytical tools that significantly enhance decision-making and reduce the need for human intervention.
Conclusion: This study highlights that success in detecting insurance fraud is directly tied to the quality and precision of features extracted from healthcare transaction data. The synergy between demographic, financial, and service-related data plays a crucial role in increasing the sensitivity of machine learning models to anomalous behaviors. However, the lack of accurate and structured data remains a major challenge in developing effective fraud detection software. The developed framework, designed as a software package for managing health insurance claims, integrates machine learning models, a modular architecture, and a modern user interface to deliver high scalability and rapid responsiveness to organizational needs. It is recommended that insurance company managers adopt this solution as part of their digital strategy for claims management. By integrating with existing systems and utilizing secure databases and interactive dashboards, they can achieve improved efficiency, greater transparency, and reduced fraud-related costs.
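
The detection step described in the Methods can be illustrated with a minimal sketch. It assumes one plausible reading of the IF-then-K-Means pipeline: an Isolation Forest assigns each claim an anomaly score, and a two-cluster K-Means on those scores separates the higher-scoring group from the rest, so no contamination rate has to be fixed in advance. This is not the authors' K-IF implementation. The function names, the synthetic claims, and the two features (claim amount, number of services) are hypothetical, and the forest is a simplified pure-Python version written for readability.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful binary-tree search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Split rows on a random feature at a random threshold until isolated."""
    if depth >= max_depth or len(rows) <= 1:
        return ("leaf", len(rows))
    f = rng.randrange(len(rows[0]))
    lo = min(r[f] for r in rows)
    hi = max(r[f] for r in rows)
    if lo == hi:
        return ("leaf", len(rows))
    t = rng.uniform(lo, hi)
    left = [r for r in rows if r[f] < t]
    right = [r for r in rows if r[f] >= t]
    return ("node", f, t,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(row, tree, depth=0):
    """Depth at which a row lands, adjusted for unsplit points left in the leaf."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, f, t, left, right = tree
    return path_length(row, left if row[f] < t else right, depth + 1)


def isolation_scores(rows, n_trees=100, sample_size=64, seed=0):
    """Anomaly score in (0, 1); shorter average paths give higher scores.

    Assumes at least two rows.
    """
    rng = random.Random(seed)
    size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(size))
    trees = [build_tree(rng.sample(rows, size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(size)
    return [2 ** (-sum(path_length(r, t) for t in trees) / (n_trees * norm))
            for r in rows]


def two_means_1d(values, iters=50):
    """K-Means with k=2 on scalar scores; True marks the high-score cluster."""
    lo_c, hi_c = min(values), max(values)
    for _ in range(iters):
        hi_group = [v for v in values if abs(v - hi_c) < abs(v - lo_c)]
        lo_group = [v for v in values if abs(v - hi_c) >= abs(v - lo_c)]
        new_lo = sum(lo_group) / len(lo_group)
        new_hi = sum(hi_group) / len(hi_group) if hi_group else hi_c
        if (new_lo, new_hi) == (lo_c, hi_c):
            break
        lo_c, hi_c = new_lo, new_hi
    return [abs(v - hi_c) < abs(v - lo_c) for v in values]


# Hypothetical synthetic claims: (claim amount, number of services).
data_rng = random.Random(42)
claims = [(data_rng.gauss(100, 15), data_rng.gauss(3, 1)) for _ in range(200)]
# Five deliberately extreme claims appended at the end.
claims += [(800, 20), (1200, 35), (950, 28), (1500, 22), (1100, 40)]

scores = isolation_scores(claims)
flags = two_means_1d(scores)
print(f"{sum(flags)} of {len(claims)} claims fall in the high-score cluster")
```

In practice a library implementation would replace the hand-written forest. The high-score cluster is best read as a review queue for experts rather than a final verdict, since it can also contain unusual but legitimate claims.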

