Intelligent Anomaly Detection in Unbalanced Industrial Data Using the Xgboost Model and Genetic Algorithm (GA) To Optimize Performance in Identifying Defective Products in the Production Line

Nematnia, Rasoul; Khademi, Maryam; Fathi, Kiamars; Sardar, Soheila

doi:10.48308/jimp.15.3.170

Document Type : Original Article

Authors

¹ Ph.D. Student, Department of Industrial Management, North Tehran Branch, Islamic Azad University, Tehran, Iran.

² Associate Professor, Department of Applied Mathematics, South Tehran Branch, Islamic Azad University, Tehran, Iran.

³ Assistant Professor, Department of Industrial Management, South Tehran Branch, Islamic Azad University, Tehran, Iran.

⁴ Assistant Professor, Department of Industrial Management, North Tehran Branch, Islamic Azad University, Tehran, Iran.

Abstract

Introduction and Objectives: Designing and sequencing production lines is fundamental to planning the mass production of industrial products. Poor line planning, and the lack of suitable methods for optimizing production and assembly systems, increases production time and machine downtime and consequently reduces both output quantity and production rate. Inefficient use of allocated resources raises system costs, and together these effects lead to low productivity and wasted resources. The main objective of this research is therefore to identify anomalies in the semiconductor wafer production process using machine learning methods. The data comprise various features of produced wafers, collected from a major semiconductor manufacturer, and describe the status of wafers during the production process. To improve model performance and reduce the negative effect of outliers, winsorizing was used to adjust extreme values in some features. The features were also standardized so that the model would not be sensitive to differences in scale between features.
Method: In this research, data preprocessing and simulation in Python were used to increase the model's accuracy in identifying anomalies. The first step was data preparation and the removal or adjustment of outliers. Because some features contained extreme values that could skew the model, winsorizing was employed. Winsorizing limits the very large and very small values of each feature to fixed thresholds, which reduces their influence on model performance. Another key step was dimensionality reduction: the dataset includes 1,558 features, so processing and analyzing all of them requires significant computational resources and may complicate the model unnecessarily. Linear Discriminant Analysis (LDA) was therefore used to project the data into a lower-dimensional space that better separates the normal and anomalous classes. This reduction helps the model classify data more accurately while simplifying computation.
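The winsorizing and standardization steps described above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the 10th/90th percentile limits, the helper names `winsorize` and `standardize`, and the toy feature values are hypothetical and are not taken from the study, which does not report its thresholds here.

```python
import statistics


def winsorize(values, lower_pct=10.0, upper_pct=90.0):
    """Clip values to the given percentiles (limits are assumed, not the paper's)."""
    ordered = sorted(values)

    def percentile(p):
        # Linear interpolation between the two closest ranks.
        k = (len(ordered) - 1) * p / 100.0
        lo = int(k)
        hi = min(lo + 1, len(ordered) - 1)
        return ordered[lo] + (ordered[hi] - ordered[lo]) * (k - lo)

    low, high = percentile(lower_pct), percentile(upper_pct)
    return [min(max(v, low), high) for v in values]


def standardize(values):
    """Rescale a feature to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical feature with one extreme reading (250.0).
feature = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 250.0]
clipped = winsorize(feature)      # extreme values pulled toward the limits
scaled = standardize(clipped)     # zero mean, unit variance
```

Winsorizing before standardizing keeps a single extreme reading from inflating the standard deviation and compressing all other values toward zero.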
Findings: After data preparation, the standard table of orthogonal arrays in the Taguchi method was used to standardize the data. The L9(3^4) orthogonal array was selected as the most suitable design for models three to six. The research data were then used to identify anomalies with the XGBoost model and the genetic algorithm, and the two models were compared. Model performance was evaluated using the confusion matrix, the ROC curve, and the efficiency of the algorithm. The results showed that the model has a high ability to identify anomalies, with an area under the ROC curve (AUC) of 0.97. Next, to further optimize the model and manage the challenge of data imbalance, a Genetic Algorithm (GA) was used as an evolutionary approach to adjust the feature weights and the classification threshold. These results indicate that the model can distinguish healthy and defective samples with high accuracy. This research shows that appropriate data preprocessing techniques and machine learning models can successfully identify production anomalies and defective parts and prevent defective products from entering the market.
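To make the two evaluation tools named above concrete, the following standard-library sketch computes a confusion matrix at a fixed threshold and the AUC as the probability that a randomly chosen defective sample scores higher than a randomly chosen healthy one. The function names, the 0.5 threshold, and the eight toy scores are assumptions for illustration; they are not the study's data or code.

```python
def confusion_counts(scores, labels, threshold=0.5):
    """Return (tp, fp, fn, tn), treating label 1 as 'defective'."""
    tp = fp = fn = tn = 0
    for s, y in zip(scores, labels):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def roc_auc(scores, labels):
    """AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model scores: five healthy samples (0) and three defective (1).
scores = [0.1, 0.2, 0.3, 0.35, 0.6, 0.4, 0.8, 0.9]
labels = [0, 0, 0, 0, 0, 1, 1, 1]
counts = confusion_counts(scores, labels)
auc = roc_auc(scores, labels)
```

Unlike the confusion matrix, the AUC does not depend on the chosen threshold, which is one reason it is commonly reported for imbalanced data. The pairwise loop is quadratic in the number of samples, so a rank-based formulation would be preferable on a large dataset.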
Conclusion: The results of this study showed that the XGBoost method has a high ability to detect anomalies. The genetic algorithm also improved performance metrics such as precision (92.4%), recall (0.924), and score (0.913), and converged stably across generations. Combining XGBoost with the genetic algorithm (GA) allows anomalies to be detected more accurately and shows that this approach can serve as a practical framework for improving quality control, reducing waste, and increasing the efficiency of production lines.
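As a rough illustration of how a genetic algorithm can tune a classification threshold on imbalanced data, the sketch below evolves a population of candidate thresholds to maximize F1. It is not the authors' implementation: the study's GA also adjusts feature weights, and its fitness function, population size, and operators are not given here. The choice of F1 as fitness, all parameter values, the names `f1_at` and `ga_threshold`, and the synthetic scores are assumptions.

```python
import random


def f1_at(threshold, scores, labels):
    """F1 of the 'defective' class when scores >= threshold are flagged."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def ga_threshold(scores, labels, pop_size=20, generations=30,
                 mutation_sd=0.05, seed=0):
    """Evolve a threshold in [0, 1]; returns (best_threshold, best_f1_per_generation)."""
    rng = random.Random(seed)

    def fitness(t):
        return f1_at(t, scores, labels)

    # Seed the population with the default 0.5 so the result is never worse than it.
    population = [0.5] + [rng.random() for _ in range(pop_size - 1)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        elite = ranked[: pop_size // 4]            # elitism: best quarter survives
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(ranked[: pop_size // 2], 2)   # parents from better half
            child = (a + b) / 2 + rng.gauss(0.0, mutation_sd)  # crossover + mutation
            children.append(min(max(child, 0.0), 1.0))
        population = elite + children
    return max(population, key=fitness), history


# Synthetic, imbalanced scores: 190 healthy samples and 10 defective ones.
data_rng = random.Random(1)
scores = [data_rng.betavariate(2, 5) for _ in range(190)] + \
         [data_rng.betavariate(5, 2) for _ in range(10)]
labels = [0] * 190 + [1] * 10
best_threshold, history = ga_threshold(scores, labels)
```

Because the best candidates are carried over unchanged, the best fitness recorded in `history` cannot decrease from one generation to the next, which is one simple way to obtain the kind of stable convergence the conclusion refers to.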

Main Subjects

Heuristic and Metaheuristic Algorithms

Journal of Industrial Management Perspective

Volume 15, Issue 3 - Serial Number 59
September 2025
Pages 170-194
