Dimensionality Reduction

Dimensionality reduction in machine learning is the process of reducing the number of input features (also called dimensions or columns) in a dataset while preserving as much of the important information as possible.

Original Dataset (High Dimensions)

Suppose we collect the following data for each student:

1.   IQ

2.   Study Hours per Day

3.   Attendance (%)

4.   Internal Exam Marks

5.   Assignment Score

6.   Project Marks

7.   Mid-Sem Marks

8.   End-Sem Marks

  • Total: 8 features (8 dimensions)

Problem

  • Many features are correlated (for example, exam marks, assignment scores, and project marks), so they carry overlapping information
  • More features → a more complex model, higher computational cost, and a greater risk of overfitting
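
To make the correlation problem concrete, here is a minimal sketch in Python with NumPy. The student data is synthetic: the `make_students` helper, the sample size, and all the numeric scales are assumptions made up for illustration, not real measurements. The five marks columns are deliberately generated from one shared "academic" factor so that they come out correlated.

```python
import numpy as np

FEATURES = ["IQ", "Study Hours", "Attendance", "Internal Exam",
            "Assignment", "Project", "Mid-Sem", "End-Sem"]

def make_students(n=300, seed=0):
    """Generate synthetic (made-up) data for n students with 8 features."""
    rng = np.random.default_rng(seed)
    # Three hidden factors that drive the observed features (an assumption).
    ability = rng.normal(size=n)
    effort = rng.normal(size=n)
    academic = rng.normal(size=n)

    def noise():
        return rng.normal(scale=0.4, size=n)

    columns = [
        100 + 15 * ability,             # IQ
        4 + 1.5 * effort + noise(),     # Study Hours per Day
        80 + 5 * effort + 2 * noise(),  # Attendance (%)
    ]
    # Five marks columns, all built from the same academic factor plus noise.
    columns += [60 + 12 * (academic + noise()) for _ in range(5)]
    return np.column_stack(columns)

X = make_students()

# Correlation matrix of the columns (rowvar=False: each column is a feature).
corr = np.corrcoef(X, rowvar=False)

# Show only the block for the five marks features.
marks = slice(3, 8)
print(FEATURES[marks])
print(np.round(corr[marks, marks], 2))
```

In this synthetic setup the marks columns correlate strongly with each other, which is the kind of redundancy dimensionality reduction tries to exploit.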

Apply PCA (Principal Component Analysis)

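PCA builds new features called principal components. Each component is a weighted combination of all the original features, and correlated features tend to receive large weights in the same component. In this example, the components might be interpreted as follows, based on which features weigh most heavily in each one (an illustrative outcome, not a guaranteed one):

  • PC1 (Academic Performance)
    • Mainly Internal Exam Marks, Assignment Score, Project Marks, Mid-Sem Marks, End-Sem Marks
  • PC2 (Effort & Consistency)
    • Mainly Study Hours, Attendance
  • PC3 (Cognitive Ability)
    • Mainly IQ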

  • Result: reduced from 8 dimensions to 3 dimensions
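
The reduction from 8 to 3 dimensions can be sketched with plain NumPy. This is one common way to compute PCA (standardize the features, then take a singular value decomposition), not the only one. The data is again synthetic, and the `make_students` and `pca` helpers are hypothetical names introduced just for this example.

```python
import numpy as np

def make_students(n=300, seed=0):
    """Generate synthetic (made-up) data for n students with 8 features."""
    rng = np.random.default_rng(seed)
    ability = rng.normal(size=n)
    effort = rng.normal(size=n)
    academic = rng.normal(size=n)

    def noise():
        return rng.normal(scale=0.4, size=n)

    columns = [
        100 + 15 * ability,             # IQ
        4 + 1.5 * effort + noise(),     # Study Hours per Day
        80 + 5 * effort + 2 * noise(),  # Attendance (%)
    ]
    # Five marks columns sharing one academic factor.
    columns += [60 + 12 * (academic + noise()) for _ in range(5)]
    return np.column_stack(columns)

def pca(X, k):
    """Project X onto its first k principal components.

    Returns (scores, components, explained_ratio):
      scores          - the data in the new k-dimensional space
      components      - k rows of weights, one per original feature
      explained_ratio - fraction of total variance kept by each component
    """
    # Standardize so features on large scales (e.g. IQ) do not dominate.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:k]
    explained_ratio = (S ** 2) / np.sum(S ** 2)
    return Z @ components.T, components, explained_ratio[:k]

X = make_students()
scores, components, ratio = pca(X, k=3)

print(X.shape, "->", scores.shape)  # (300, 8) -> (300, 3)
print("variance kept per component:", np.round(ratio, 3))
print("weights of PC1 on the 8 features:", np.round(components[0], 2))
```

Note that every row of `components` has a weight for all 8 features. Labels such as "Academic Performance" come from reading which weights are largest, so they are an interpretation added afterwards rather than something PCA outputs.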

Uses of Dimensionality Reduction

  • Reduces computational cost
  • Removes irrelevant or redundant information
  • Helps reduce overfitting
  • Can improve model performance
  • Enables data visualization (e.g., reducing to 2D or 3D)
