Dimensionality reduction in machine learning is the process of reducing the number of input dimensions (columns/features) in a dataset while preserving as much important information as possible.
Original Dataset (High Dimensions)
Suppose we collect the following data for each student:
1. IQ
2. Study Hours per Day
3. Attendance (%)
4. Internal Exam Marks
5. Assignment Score
6. Project Marks
7. Mid-Sem Marks
8. End-Sem Marks
- 8 features (8 dimensions)
Problem
- Many features are correlated (exam marks, assignments, projects); the correlation check sketched below makes this redundancy visible
- More features → a more complex model, higher computation, and a greater risk of overfitting
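As an illustration of the first point, the redundancy can be checked with a correlation matrix before any reduction is applied. This is a minimal sketch, assuming the eight features are stored in a pandas DataFrame; the column names and the small set of values are hypothetical:

```python
import pandas as pd

# Hypothetical student data; columns follow the eight features listed above.
students = pd.DataFrame({
    "IQ":              [105, 98, 112, 120, 95],
    "StudyHours":      [3.0, 1.5, 4.0, 5.0, 2.0],
    "Attendance":      [88, 72, 95, 97, 70],
    "InternalMarks":   [34, 22, 40, 45, 20],
    "AssignmentScore": [18, 12, 19, 20, 11],
    "ProjectMarks":    [27, 18, 29, 30, 16],
    "MidSemMarks":     [38, 25, 44, 48, 23],
    "EndSemMarks":     [72, 51, 81, 90, 48],
})

# Pairwise Pearson correlations; values close to 1 indicate redundant features.
print(students.corr().round(2))
```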
Apply PCA (Principal Component Analysis)
PCA combines related features into principal components:
- PC1 (Academic Performance)
  - Internal Marks, Assignment Score, Project Marks, Mid-Sem Marks, End-Sem Marks
- PC2 (Effort & Consistency)
  - Study Hours, Attendance
- PC3 (Cognitive Ability)
  - IQ
- Reduced from 8 dimensions to 3 dimensions
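The grouping of features into PC1–PC3 above is an interpretation of what the components tend to capture; numerically, PCA is fit to the data and the loadings show how each original feature contributes to each component. A minimal sketch using scikit-learn (one common implementation, not prescribed by the text), continuing from the hypothetical `students` DataFrame defined in the correlation sketch:

```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Standardize so marks (0-100) and study hours (0-6) contribute comparably.
X_scaled = StandardScaler().fit_transform(students)

# Keep 3 principal components: 8 dimensions -> 3 dimensions.
pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X_scaled)

print(X_reduced.shape)                # (n_students, 3)
print(pca.explained_variance_ratio_)  # share of variance kept by PC1, PC2, PC3
print(pca.components_.round(2))       # loadings of each original feature per component
```

In practice the loadings in `pca.components_` rarely separate as cleanly as the three named groups above, but highly correlated marks-related features do tend to load heavily on the same leading component.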
Uses of Dimensionality Reduction
- Reduces computational cost
- Removes irrelevant or redundant features
- Avoids overfitting
- Improves model performance
- Enables data visualization (e.g., reducing to 2D or 3D, as sketched below)
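For the visualization point, the same idea with two components gives coordinates that can be plotted directly. A short sketch with matplotlib, continuing from `X_scaled` in the PCA sketch above:

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Project the standardized data onto the first two principal components.
X_2d = PCA(n_components=2).fit_transform(X_scaled)

plt.scatter(X_2d[:, 0], X_2d[:, 1])
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("Students projected onto the first two principal components")
plt.show()
```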
