Designing a health insurance fraud detection system using artificial intelligence algorithms

Farrokh, Mojtaba; Sharifi, Sirous; Hozarmoghadam, Nasrin; Raad, Abas; Norouzi, Alireza

doi:10.48308/jimp.2025.241325.1658


Articles in Press

Document Type : Original Article

Authors

¹ Kharazmi University

² Assistant Professor, Department of Banking, Insurance, and Customs, Faculty of Management, Kharazmi University

³ Assistant Professor, Insurance Research Center, Tehran, Iran

⁴ Assistant Professor, Department of Industrial Management and Information Technology, Faculty of Management and Accounting, Shahid Beheshti University

⁵ Master’s Student in Business Administration, Department of Operations Management and Information Technology, Faculty of Management, Kharazmi University


Abstract

Introduction: With the rapid expansion of healthcare services, fraud in health insurance systems has become a serious challenge. This study aims to design and develop an intelligent and modular framework for fraud detection in health insurance. The framework is designed to identify abusive and fraudulent behaviors regardless of the type of service or actor involved, and to adapt effectively to dynamic and complex environments. The primary objective is to provide a flexible solution that enhances the accuracy of fraud detection while reducing human error in the decision-making process.

Methods: The proposed framework consists of four key modules. First, a knowledge-based module draws on the insights of insurance and medical experts to build a simulation framework for fraud detection, enabling the medical-insurance team to describe, analyze, and visualize abnormal behaviors based on the actions of different actors. Second, a two-stage data warehouse is designed to process large volumes of insurance data efficiently. In the first-stage warehouse, an ETL (extract–transform–load) process ingests claims data, resolves data quality issues, and removes inconsistencies and errors so that the data are ready for feature extraction. In the second-stage warehouse, the features relevant to fraud detection are extracted and selected in collaboration with insurance and medical experts, using the simulation framework of the knowledge-based module. As a result, a list of twenty key fraud-detection features was extracted and documented, covering actors, products/services, and the attributes associated with each type of fraud. Third, the fraud detection engine is based on a proposed algorithm called K-IF, which first clusters the data using K-Means and then identifies suspicious samples using Isolation Forest (IF). Fourth, visualization tools and a dynamic management dashboard are developed to support interactive analysis and real-time updates by users.
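The K-IF engine is only summarized in the abstract. The sketch below is one plausible reading of it, not the authors' implementation: it assumes that K-Means first partitions the claims, that a separate Isolation Forest then scores the samples inside each cluster, and that the top `contamination` fraction of scores is flagged as suspicious. The function names (`kmeans`, `iforest_scores`, `k_if`) and parameter values (100 trees, a subsample of 256, the choice of k) are illustrative assumptions. It uses only the Python standard library; a production system would more likely rely on an established library.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-Means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append(
                [sum(col) / len(members) for col in zip(*members)] if members else centers[c]
            )
        if new_centers == centers:  # assignments have stabilized
            break
        centers = new_centers
    return labels


def c_factor(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Grow one isolation tree with random axis-aligned splits."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    # Only split on dimensions that still vary inside this node.
    dims = [d for d in range(len(points[0]))
            if min(p[d] for p in points) < max(p[d] for p in points)]
    if not dims:
        return {"size": len(points)}
    dim = rng.choice(dims)
    split = rng.uniform(min(p[dim] for p in points), max(p[dim] for p in points))
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([p for p in points if p[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[dim] >= split], depth + 1, max_depth, rng),
    }


def path_length(x, node):
    """Depth at which x reaches a leaf, plus the expected remaining depth there."""
    depth = 0
    while "size" not in node:
        node = node["left"] if x[node["dim"]] < node["split"] else node["right"]
        depth += 1
    return depth + c_factor(node["size"])


def iforest_scores(points, n_trees=100, sample_size=256, seed=0):
    """Isolation Forest anomaly scores in (0, 1); higher means easier to isolate."""
    rng = random.Random(seed)
    psi = min(sample_size, len(points))
    if psi <= 1:
        return [0.5] * len(points)  # too few points to isolate anything
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(points, psi), 0, max_depth, rng) for _ in range(n_trees)]
    norm = c_factor(psi)
    return [2.0 ** (-(sum(path_length(x, t) for t in trees) / n_trees) / norm) for x in points]


def k_if(points, k=3, contamination=0.05, seed=0):
    """Hypothetical K-IF reading: K-Means clusters first, then one forest per cluster."""
    labels = kmeans(points, k, seed=seed)
    scores = [0.0] * len(points)
    for c in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        for i, s in zip(idx, iforest_scores([points[i] for i in idx], seed=seed + c)):
            scores[i] = s
    # Flag the top `contamination` fraction of samples as suspicious.
    n_flag = max(1, round(contamination * len(points)))
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:n_flag]), scores
```

Fitting one forest per cluster judges each claim against peers with a similar profile rather than against the whole population. A limitation of this reading is that a sample forming its own tiny cluster receives an uninformative score of 0.5, so k has to be chosen with care.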

Results and discussion: Experimental results on labeled datasets show that the proposed algorithm, by combining the discriminative power of IF with the clustering precision of K-Means, outperforms common algorithms such as Local Outlier Factor (LOF), One-Class SVM (OCSVM), Elliptic Envelope (EE), DBSCAN, autoencoders (AE), and K-Means on multiple evaluation metrics and in computation time. Furthermore, applying the proposed algorithm to real data from a health insurance company indicates that the approach detects anomalies effectively, is less dependent on the assumed contamination rate, and identifies edge cases more accurately. Ultimately, the framework has been developed as a software package for private insurance companies, offering advanced analytical tools that significantly improve decision-making and reduce the need for human intervention.
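The abstract reports comparisons on "multiple metrics" and a reduced dependence on the contamination rate without naming the metrics. As an assumption, the sketch below uses precision, recall, and F1 with fraud as the positive class, and treats the contamination rate as the fraction of samples flagged as suspicious. Sweeping that rate on hypothetical toy scores shows how a fixed flagging fraction shifts the metrics. The function names and toy data are illustrative only.

```python
def flag_by_contamination(scores, contamination):
    """Flag the top `contamination` fraction of samples (1 = suspicious)."""
    n_flag = max(1, round(contamination * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n_flag])
    return [1 if i in flagged else 0 for i in range(len(scores))]


def precision_recall_f1(y_true, y_pred):
    """Binary precision, recall and F1 with fraud (1) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy data: anomaly scores from any detector plus ground-truth labels.
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.2, 0.35, 0.45, 0.5, 0.1]
toy_labels = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]

# Sweep the assumed contamination rate to see how sensitive the metrics are to it.
for rate in (0.1, 0.2, 0.3, 0.4):
    p, r, f = precision_recall_f1(toy_labels, flag_by_contamination(toy_scores, rate))
    print(f"contamination={rate:.1f}  precision={p:.2f}  recall={r:.2f}  F1={f:.2f}")
```

A detector that ranks true frauds near the top keeps reasonable scores across a range of rates, which is one way to interpret a reduced dependence on the contamination rate.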

Conclusion: This study highlights that success in detecting insurance fraud is directly tied to the quality and precision of features extracted from healthcare transaction data. The synergy between demographic, financial, and service-related data plays a crucial role in increasing the sensitivity of machine learning models to anomalous behaviors. However, the lack of accurate and structured data remains a major challenge in developing effective fraud detection software. The developed framework, designed as a software package for managing health insurance claims, integrates machine learning models, a modular architecture, and a modern user interface to deliver high scalability and rapid responsiveness to organizational needs. It is recommended that insurance company managers adopt this solution as part of their digital strategy for claims management. By integrating with existing systems and utilizing secure databases and interactive dashboards, they can achieve improved efficiency, greater transparency, and reduced fraud-related costs.

Keywords