Document Type: Original Article
Authors
1
Ph.D. Student, Department of Industrial Management, North Tehran Branch, Islamic Azad University, Tehran, Iran.
2
Associate Professor, Department of Applied Mathematics, South Tehran Branch, Islamic Azad University, Tehran, Iran.
3
Assistant Professor, Department of Industrial Management, South Tehran Branch, Islamic Azad University, Tehran, Iran.
4
Assistant Professor, Department of Industrial Management, North Tehran Branch, Islamic Azad University, Tehran, Iran.
10.48308/jimp.15.3.170
Abstract
Introduction and Objectives: Designing and sequencing production lines is a fundamental part of planning the mass production of industrial goods. Poor line planning, and the lack of suitable methods for optimizing production and assembly systems, increases production time and machine downtime and consequently reduces both the number of units produced and the production rate. Inefficient use of allocated resources raises system costs, and together these effects lead to low productivity and wasted resources. The main objective of this research is therefore to identify anomalies in the semiconductor wafer production process using machine learning methods. The data comprise various features of produced wafers, collected from a major semiconductor manufacturer, and describe the status of wafers during production. To improve model performance and reduce the negative effect of outliers, winsorizing was used to adjust extreme values in some features. In addition, the features were standardized so that the model would not be sensitive to differences in scale between features.
Method: In this research, data preprocessing and simulation in Python were used to increase the model's accuracy in identifying anomalies. The first step was data preparation, including the removal or adjustment of outliers. Because some features contained extreme values that could skew the model, winsorizing was employed: very large and very small values of each feature are limited to fixed thresholds to reduce their influence on the model. Another key step was dimensionality reduction. The dataset includes 1,558 features, and processing and analyzing all of them requires significant computational resources and may complicate the model unnecessarily. Linear Discriminant Analysis (LDA) was therefore used to project the data into a lower-dimensional space that better separates the normal and anomalous classes. This reduction helps the model classify the data more accurately while lowering the computational cost.
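The winsorizing and standardization steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the abstract does not give the thresholds used, so the 10th/90th percentile cutoffs, the helper names (`percentile`, `winsorize`, `standardize`), and the toy feature values are all hypothetical. The LDA step is not covered here.

```python
import statistics


def percentile(sorted_vals, q):
    """Linearly interpolated percentile (q in [0, 100]) of a pre-sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def winsorize(values, lower_q=10.0, upper_q=90.0):
    """Clip values to the [lower_q, upper_q] percentile range.

    The cutoffs are assumptions; the source does not state which were used.
    """
    ordered = sorted(values)
    low = percentile(ordered, lower_q)
    high = percentile(ordered, upper_q)
    return [min(max(v, low), high) for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(v - mean) / std for v in values]


# Hypothetical single feature with one extreme reading.
feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 1000.0]
clipped = winsorize(feature)    # the extreme value is pulled toward the bulk
scaled = standardize(clipped)   # zero mean, unit variance
```

Winsorizing before standardizing matters here: a single extreme value would otherwise inflate the standard deviation and compress all the other readings toward zero.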
Findings: After data preparation, the standard table of orthogonal arrays from the Taguchi method was used to standardize the data. The L9(3^4) orthogonal array was selected as the most suitable design for models three to six. The research data were then used to identify anomalies with an XGBoost model and a genetic algorithm, and the two models were compared. Model performance was evaluated using the confusion matrix, the ROC curve, and the efficiency of the algorithm. The results showed that the model has a high ability to identify anomalies, with an area under the ROC curve (AUC) of 0.97. Next, to further optimize the model and manage the challenge of data imbalance, a Genetic Algorithm (GA) was used as an evolutionary approach to adjust the feature weights and the classification threshold. These results indicate that the model can distinguish healthy from defective samples with high accuracy. This research shows that, with appropriate data preprocessing techniques and machine learning models, production anomalies and defective parts can be identified successfully and defective products can be prevented from entering the market.
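To make the evaluation measures named above concrete, the following sketch computes a confusion matrix and the AUC for a binary classifier. It is an illustration only: the labels and scores are made-up toy values, not the study's data, and the function names and the 0.5 cutoff are assumptions. The AUC is computed through its rank interpretation, the probability that a randomly chosen anomalous sample receives a higher score than a randomly chosen normal one.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, where 1 marks an anomaly."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return tn, fp, fn, tp


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the sample count,
    which is acceptable for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and classifier scores (hypothetical, for illustration only).
y_true = [0, 0, 0, 0, 1, 1, 0, 1]
scores = [0.10, 0.20, 0.30, 0.60, 0.70, 0.80, 0.40, 0.35]

y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cutoff
matrix = confusion_matrix(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Unlike the confusion matrix, the AUC does not depend on the chosen cutoff, which is why the two are usually reported together.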
Conclusion: The results of this study showed that the XGBoost method has a high ability to detect anomalies. The genetic algorithm also improved performance metrics such as precision (0.924), recall (0.924), and score (0.913), and it converged stably over successive generations. Combining XGBoost with a genetic algorithm (GA) allows more accurate detection of anomalies and shows that this approach can serve as a practical framework for improving quality control, reducing waste, and increasing the efficiency of production lines.
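As a rough illustration of how a genetic algorithm can tune a classification threshold on imbalanced data, consider the sketch below. It is not the study's implementation: the abstract says the GA adjusted both feature weights and the threshold, whereas this sketch evolves the threshold alone, uses the F1 score as an assumed fitness function, and runs on made-up scores. The population size, number of generations, mutation scale, and all function names are hypothetical.

```python
import random


def f1_score(y_true, scores, threshold):
    """F1 of the rule 'predict anomaly when score >= threshold'."""
    tp = fp = fn = 0
    for t, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and t == 1:
            tp += 1
        elif pred == 1 and t == 0:
            fp += 1
        elif pred == 0 and t == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def tune_threshold(y_true, scores, pop_size=20, generations=30,
                   mutation_sd=0.05, seed=0):
    """Evolve a threshold in [0, 1] that maximizes F1 (elitist GA sketch)."""
    rng = random.Random(seed)

    def fitness(th):
        return f1_score(y_true, scores, th)

    # Seed the population with the default 0.5 cutoff plus random candidates.
    population = [0.5] + [rng.random() for _ in range(pop_size - 1)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        elite = ranked[: pop_size // 4]          # survivors kept unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)          # pick two distinct parents
            child = (a + b) / 2 + rng.gauss(0.0, mutation_sd)  # blend + mutate
            children.append(min(max(child, 0.0), 1.0))
        population = elite + children
    best = max(population, key=fitness)
    return best, fitness(best), history


# Imbalanced toy data (hypothetical): 16 normal samples, 4 anomalies.
y_true = [0] * 16 + [1] * 4
scores = [0.02, 0.05, 0.08, 0.10, 0.12, 0.15, 0.18, 0.20,
          0.22, 0.25, 0.03, 0.07, 0.11, 0.14, 0.17, 0.21,
          0.30, 0.35, 0.45, 0.80]

best_threshold, best_f1, history = tune_threshold(y_true, scores)
baseline_f1 = f1_score(y_true, scores, 0.5)
```

Because the best candidates are carried over unchanged, the best fitness per generation in `history` never decreases, which is one simple way to obtain the kind of stable convergence the conclusion describes.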