Automated Sensor Anomaly Detection for Wafer Yield Improvement

Project Status: Active | Role: Independent Process Data Consultant

1. Executive Summary

In semiconductor manufacturing, process excursions can lead to significant yield loss (scrap). This project analyzed the SECOM dataset (1567 wafers, 591 sensors) to identify the root cause of a 6.6% yield loss.

Key Findings:

- Data Integrity: Identified and removed 144 sensors with zero variance or >50% missing data to reduce noise.
- Root Cause Analysis: A Random Forest classifier ranked Sensor 59 as the primary driver of failure (importance score 0.030, roughly 2x the next-ranked feature).
- Failure Mode: Distribution analysis reveals a "Right Shift" anomaly: wafers with higher values on Sensor 59 are significantly more likely to fail.
- Recommendation: Implement a tighter Upper Control Limit (UCL) on Sensor 59 based on the "Golden Path" baseline of passing wafers.
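
The recommendation above does not state how the tighter UCL would be derived. As a minimal sketch, assuming a conventional mean-plus-3-sigma limit computed from passing wafers only, and assuming the SECOM label convention (-1 = pass, 1 = fail), it could look like this. The function names and the `n_sigma` default are illustrative, not taken from the notebook.

```python
import pandas as pd


def golden_path_ucl(values: pd.Series, labels: pd.Series,
                    pass_label: int = -1, n_sigma: float = 3.0) -> float:
    """Upper control limit estimated from passing wafers only.

    Assumption: UCL = mean + n_sigma * sample standard deviation.
    The 3-sigma default is a common SPC convention, not a project result.
    """
    passing = values[labels == pass_label].dropna()
    if len(passing) < 2:
        raise ValueError("need at least two passing wafers to estimate spread")
    return float(passing.mean() + n_sigma * passing.std(ddof=1))


def flag_excursions(values: pd.Series, ucl: float) -> pd.Series:
    """Boolean mask of readings above the limit (missing readings are not flagged)."""
    return values > ucl
```

Estimating the limit from passing wafers alone keeps failed wafers from inflating the spread, which is what makes the limit tighter than one computed over the whole population.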

Note: GitHub cannot render the interactive HTML report directly. To see the full analysis, including visualizations, view the Exploratory Analysis Notebook (notebooks/01_exploratory_analysis.ipynb).

2. Technical Methodology

Data Source: UCI Machine Learning Repository (SECOM Data).
Stack: Python, Pandas, Scikit-Learn, Matplotlib/Seaborn.
Key Techniques:
- Preprocessing: Variance Thresholding (removing "dead" sensors) and Null Value Imputation.
- Feature Selection: Boruta / Random Forest feature importance to rank sensor impact.
- Modeling: Logistic Regression for baseline classification of Pass/Fail wafers.
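
The techniques listed above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the notebook's code: the median imputation, the `class_weight="balanced"` setting, the tree count, and all function names are choices made for this example. Boruta is not shown; only plain Random Forest importance is used.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def drop_dead_sensors(X: pd.DataFrame, max_missing: float = 0.5) -> pd.DataFrame:
    """Drop columns with zero variance or more than `max_missing` missing values."""
    missing_frac = X.isna().mean()
    variance = X.var(skipna=True)  # NaN when a column has fewer than two readings
    keep = (missing_frac <= max_missing) & (variance > 0)
    return X.loc[:, keep]


def rank_sensors(X: pd.DataFrame, y, random_state: int = 0) -> pd.Series:
    """Rank sensors by Random Forest impurity-based importance, highest first.

    Expects the output of drop_dead_sensors, so no column is entirely missing.
    """
    imputed = SimpleImputer(strategy="median").fit_transform(X)
    forest = RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=random_state
    )
    forest.fit(imputed, y)
    importances = pd.Series(forest.feature_importances_, index=X.columns)
    return importances.sort_values(ascending=False)


def baseline_classifier():
    """Impute, scale, then fit a logistic regression as the Pass/Fail baseline."""
    return make_pipeline(
        SimpleImputer(strategy="median"),
        StandardScaler(),
        LogisticRegression(class_weight="balanced", max_iter=1000),
    )
```

Because failures make up only a small share of the wafers, class weighting (or another imbalance strategy) matters for both models. Impurity-based importances describe association with the label, so a top-ranked sensor is a candidate for investigation rather than a confirmed cause.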

3. Getting Started

Note on Data: The raw data files (raw_wafer_data.csv) are not included in this repository to maintain a lightweight footprint.

To generate the data locally:

Clone the repository:

git clone https://github.com/berlinsudduth/semiconductor-yield-optimization.git

Run the data loader script:
```
python src/data_loader.py
```
The CSV files will appear in the main directory, ready for analysis.
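
For readers who want to see what such a merge involves, here is a minimal sketch. It is not the repository's `src/data_loader.py`. It assumes the UCI distribution of SECOM as two whitespace-separated files (sensor readings, and a label plus quoted timestamp per wafer); the URLs, the `sensor_<i>` column names, and the `merge_secom` function are assumptions made for this example.

```python
import pandas as pd

# Assumed UCI file locations; verify them before relying on this sketch.
SECOM_DATA_URL = (
    "https://archive.ics.uci.edu/ml/machine-learning-databases/secom/secom.data"
)
SECOM_LABELS_URL = (
    "https://archive.ics.uci.edu/ml/machine-learning-databases/secom/secom_labels.data"
)


def merge_secom(data_src, labels_src) -> pd.DataFrame:
    """Combine sensor readings and labels into one table, one row per wafer.

    Both arguments may be URLs, local paths, or file-like objects.
    """
    # Readings: no header row; the literal "NaN" is parsed as a missing value.
    sensors = pd.read_csv(data_src, sep=r"\s+", header=None)
    sensors.columns = [f"sensor_{i}" for i in sensors.columns]  # hypothetical names

    # Labels: -1 = pass, 1 = fail, followed by a double-quoted timestamp.
    labels = pd.read_csv(labels_src, sep=" ", header=None,
                         names=["label", "timestamp"])

    if len(sensors) != len(labels):
        raise ValueError("sensor and label files have different row counts")
    return pd.concat([labels, sensors], axis=1)


# Example (requires network access):
# merge_secom(SECOM_DATA_URL, SECOM_LABELS_URL).to_csv("raw_wafer_data.csv", index=False)
```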

4. Project Structure

├── notebooks
│   └── 01_exploratory_analysis.ipynb   <- Main analysis and visualization
├── reports
│   └── 01_exploratory_analysis.html    <- Rendered HTML report
├── src
│   ├── data_loader.py                  <- Script to download and merge web data
│   └── generate_report.py              <- Utility to build the HTML report
├── .gitignore                          <- Standard python ignore file
├── README.md                           <- The top-level documentation
└── requirements.txt                    <- Python dependencies
