Unsupervised Machine Learning

 

2) Unsupervised Machine Learning

In unsupervised learning the data contains only inputs and no outputs; such data is called unlabeled data. When we identify patterns or groups in this data without any output to guide us, it is called Unsupervised Machine Learning.

In short: when data has only inputs and no outputs, we use unsupervised machine learning.

Example: Student data (input features only, no output)

IQ     CGPA
110    8.5
120    9.1
100    7.8
115    8.8
105    8.0

In Unsupervised Learning there is no output, only input data. So we perform tasks such as Clustering, Dimensionality Reduction, Anomaly Detection, and Association Rule Learning.

2.1 Types of Unsupervised Machine Learning:

******************** >< ********************

I.      Clustering:

Clustering is an unsupervised machine learning technique that groups similar data points into clusters based on their features, without using any labeled data.

For example, suppose we have a dataset of IQ and CGPA. We plot this data on a 2D coordinate system, where the X-axis represents IQ and the Y-axis represents CGPA. A clustering algorithm detects groups of students such as high IQ–high CGPA, high IQ–low CGPA, low IQ–high CGPA, and low IQ–low CGPA. In this way, students are grouped into categories. When a new student arrives, the algorithm places the student into one of these groups, and we can assign labels such as 1, 2, 3, or 4 to the groups.
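A minimal sketch of this idea in Python using scikit-learn's KMeans (the post does not name a specific algorithm, so KMeans is an assumption). The rows are the five IQ/CGPA samples from the table above; with only five points, two clusters are fitted here instead of the four groups described.

```python
# Minimal clustering sketch with scikit-learn's KMeans (illustrative only).
import numpy as np
from sklearn.cluster import KMeans

# The five sample rows from the student table above: [IQ, CGPA]
X = np.array([
    [110, 8.5],
    [120, 9.1],
    [100, 7.8],
    [115, 8.8],
    [105, 8.0],
])

# With only five points we fit two clusters; with more data,
# four clusters would match the four groups described above.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)                 # cluster number assigned to each student

# A new student is simply placed into the nearest cluster.
new_student = np.array([[112, 8.6]])
print(kmeans.predict(new_student))
```

The cluster numbers produced by the algorithm play the role of the group labels 1, 2, 3, 4 mentioned above.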

******************** >< ********************

II.      Dimensionality Reduction:

Dimensionality Reduction in machine learning is the process of reducing the number of input dimensions (columns/features) in a dataset while preserving as much important information as possible.

Original Dataset (High Dimensions)

Suppose we collect the following data for each student:

1.   IQ

2.   Study Hours per Day

3.   Attendance (%)

4.   Internal Exam Marks

5.   Assignment Score

6.   Project Marks

7.   Mid-Sem Marks

8.   End-Sem Marks

→ 8 features (8 dimensions)

Problem

  • Many features are correlated (exam marks, assignments, projects)
  • More features → complex model, higher computation, risk of overfitting

Apply PCA (Principal Component Analysis)

PCA combines related features into principal components:

  • PC1 (Academic Performance)
    • Internal Marks, Assignment Score, Project Marks, Mid-Sem, End-Sem
  • PC2 (Effort & Consistency)
    • Study Hours, Attendance
  • PC3 (Cognitive Ability)
    • IQ

→ Reduced from 8 dimensions to 3 dimensions
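A minimal PCA sketch in Python with scikit-learn. The 8-column matrix below is made-up illustrative data (an assumption) for the eight features listed above; the point is only to show the 8 → 3 reduction.

```python
# Minimal dimensionality-reduction sketch with PCA (illustrative data only).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical rows for four students, one column per feature listed above:
# [IQ, study hours, attendance %, internal, assignment, project, mid-sem, end-sem]
X = np.array([
    [110, 4.0, 85, 38, 18, 45, 40, 75],
    [120, 5.5, 92, 44, 19, 48, 46, 88],
    [100, 3.0, 70, 30, 14, 35, 32, 60],
    [115, 4.5, 88, 40, 17, 44, 42, 80],
])

# Standardize so no single feature dominates, then keep 3 principal components.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X_scaled)

print(X_reduced.shape)                # (4, 3): 8 columns reduced to 3
print(pca.explained_variance_ratio_)  # share of information each component keeps
```

Each principal component is a weighted combination of the original features, which is how correlated columns such as the various exam marks can be summarized together.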

Use of Dimensionality Reduction

  • Reduces computational cost
  • Removes irrelevant or redundant features
  • Helps avoid overfitting
  • Improves model performance
  • Enables data visualization (e.g., reducing to 2D or 3D)

******************** >< ********************

III.    Anomaly Detection:

Anomaly Detection is a machine learning technique used to identify rare, unusual, or abnormal data points that are different from normal data patterns.

 

Simply put, anomaly detection is the process of finding data points that do not follow the expected pattern in a dataset.
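A minimal sketch using scikit-learn's IsolationForest, one common anomaly-detection algorithm (an assumption, since the post does not name one). The transaction amounts are made up, with one value that clearly breaks the pattern.

```python
# Minimal anomaly-detection sketch with IsolationForest (illustrative data only).
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical daily transaction amounts; 9000 does not follow the usual pattern.
amounts = np.array([[120], [135], [110], [150], [125], [9000], [140], [130]])

model = IsolationForest(contamination=0.1, random_state=0).fit(amounts)
labels = model.predict(amounts)   # +1 = normal, -1 = anomaly

for value, label in zip(amounts.ravel(), labels):
    if label == -1:
        print(f"Possible anomaly: {value}")
```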



Use of Anomaly Detection:

·        Fraud

·        System failures

·        Security attacks

·        Medical abnormalities

·        Data errors

******************** >< ********************

IV.      Association Rule Learning:

·        Association Rule Learning is a machine learning technique used to discover relationships, patterns, or associations between items in large datasets.

·        In other words, it finds hidden relationships between items that frequently occur together in a dataset.

Example

If customers buy:  Bread

Then they are also likely to buy:  Milk
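A small worked sketch in plain Python showing how the support and confidence of the rule Bread → Milk could be computed from a handful of made-up baskets (in practice, algorithms such as Apriori automate this over large datasets).

```python
# Toy association-rule sketch: support and confidence of "Bread -> Milk"
# computed over a few made-up shopping baskets.
transactions = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
]

n = len(transactions)
bread = sum(1 for t in transactions if "bread" in t)
bread_and_milk = sum(1 for t in transactions if {"bread", "milk"} <= t)

support = bread_and_milk / n         # how often bread and milk occur together
confidence = bread_and_milk / bread  # given bread is bought, how often milk is too

print(f"support(Bread -> Milk)    = {support:.2f}")     # 3/5 = 0.60
print(f"confidence(Bread -> Milk) = {confidence:.2f}")  # 3/4 = 0.75
```

A rule with high support and confidence, like this one, is exactly the kind of "frequently occur together" relationship that association rule learning looks for.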

 

******************** >< ********************

