Unsupervised Machine Learning

 

2) Unsupervised Machine Learning

In unsupervised learning the data contains only inputs and no outputs; such data is called unlabeled data. When we identify patterns or groups in this data without any output to guide us, it is called Unsupervised Machine Learning.

In short: when data has only inputs and no outputs, we use unsupervised machine learning.

Example: Student data (input features only, no output)

IQ     CGPA
110    8.5
120    9.1
100    7.8
115    8.8
105    8.0

In Unsupervised Learning there is no output, only input data. So we perform tasks such as Clustering, Dimensionality Reduction, Anomaly Detection, and Association Rule Learning.

2.1 Types of Unsupervised Machine Learning:

******************** >< ********************

I.      Clustering:

Clustering is an unsupervised machine learning technique that groups similar data points into clusters based on their features, without using any labeled data.

For example, suppose we have a dataset of IQ and CGPA. We plot this data on a 2D coordinate system, where the X-axis represents IQ and the Y-axis represents CGPA. A clustering algorithm detects groups of students such as high IQ–high CGPA, high IQ–low CGPA, low IQ–high CGPA, and low IQ–low CGPA. In this way, students are grouped into categories. When a new student arrives, the algorithm places the student into one of these groups, and we can assign labels such as 1, 2, 3, or 4 to the groups.
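A minimal sketch of this idea in Python using scikit-learn's KMeans (the post does not name a specific algorithm, so KMeans is an assumption). The rows are the five IQ/CGPA samples from the table above; with only five points, two clusters are fitted here instead of the four groups described.

```python
# Minimal clustering sketch with scikit-learn's KMeans (illustrative only).
import numpy as np
from sklearn.cluster import KMeans

# The five sample rows from the student table above: [IQ, CGPA]
X = np.array([
    [110, 8.5],
    [120, 9.1],
    [100, 7.8],
    [115, 8.8],
    [105, 8.0],
])

# With only five points we fit two clusters; with more data,
# four clusters would match the four groups described above.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)                 # cluster number assigned to each student

# A new student is simply placed into the nearest cluster.
new_student = np.array([[112, 8.6]])
print(kmeans.predict(new_student))
```

The cluster numbers produced by the algorithm play the role of the group labels 1, 2, 3, 4 mentioned above.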

******************** >< ********************

II.      Dimensionality Reduction:

Dimensionality Reduction in machine learning is the process of reducing the number of input dimensions (columns/features) in a dataset while preserving as much important information as possible.

Original Dataset (High Dimensions)

Suppose we collect the following data for each student:

1.   IQ

2.   Study Hours per Day

3.   Attendance (%)

4.   Internal Exam Marks

5.   Assignment Score

6.   Project Marks

7.   Mid-Sem Marks

8.   End-Sem Marks

→ 8 features (8 dimensions)

Problem

  • Many features are correlated (exam marks, assignments, projects)
  • More features → complex model, higher computation, risk of overfitting

Apply PCA (Principal Component Analysis)

PCA combines related features into principal components:

  • PC1 (Academic Performance)
    • Internal Marks, Assignment Score, Project Marks, Mid-Sem, End-Sem
  • PC2 (Effort & Consistency)
    • Study Hours, Attendance
  • PC3 (Cognitive Ability)
    • IQ

→ Reduced from 8 dimensions to 3 dimensions
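A minimal PCA sketch in Python with scikit-learn. The 8-column matrix below is made-up illustrative data (an assumption) for the eight features listed above; the point is only to show the 8 → 3 reduction.

```python
# Minimal dimensionality-reduction sketch with PCA (illustrative data only).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical rows for four students, one column per feature listed above:
# [IQ, study hours, attendance %, internal, assignment, project, mid-sem, end-sem]
X = np.array([
    [110, 4.0, 85, 38, 18, 45, 40, 75],
    [120, 5.5, 92, 44, 19, 48, 46, 88],
    [100, 3.0, 70, 30, 14, 35, 32, 60],
    [115, 4.5, 88, 40, 17, 44, 42, 80],
])

# Standardize so no single feature dominates, then keep 3 principal components.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X_scaled)

print(X_reduced.shape)                # (4, 3): 8 columns reduced to 3
print(pca.explained_variance_ratio_)  # share of information each component keeps
```

Each principal component is a weighted combination of the original features, which is how correlated columns such as the various exam marks can be summarized together.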

Use of Dimensionality Reduction

  • Reduces computational cost
  • Removes irrelevant or redundant features
  • Helps avoid overfitting
  • Improves model performance
  • Enables data visualization (e.g., reducing to 2D or 3D)

******************** >< ********************

III.    Anomaly Detection:

Anomaly Detection is a machine learning technique used to identify rare, unusual, or abnormal data points that are different from normal data patterns.

 

Simply put, anomaly detection is the process of finding data points that do not follow the expected pattern in a dataset.
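A minimal sketch using scikit-learn's IsolationForest, one common anomaly-detection algorithm (an assumption, since the post does not name one). The transaction amounts are made up, with one value that clearly breaks the pattern.

```python
# Minimal anomaly-detection sketch with IsolationForest (illustrative data only).
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical daily transaction amounts; 9000 does not follow the usual pattern.
amounts = np.array([[120], [135], [110], [150], [125], [9000], [140], [130]])

model = IsolationForest(contamination=0.1, random_state=0).fit(amounts)
labels = model.predict(amounts)   # +1 = normal, -1 = anomaly

for value, label in zip(amounts.ravel(), labels):
    if label == -1:
        print(f"Possible anomaly: {value}")
```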



Use of Anomaly Detection:

·        Fraud

·        System failures

·        Security attacks

·        Medical abnormalities

·        Data errors

******************** >< ********************

IV.      Association Rule Learning:

·        Association Rule Learning is a machine learning technique used to discover relationships, patterns, or associations between items in large datasets.

·        In other words, it finds hidden relationships between items that frequently occur together in a dataset.

Example

If customers buy:  Bread

Then they are also likely to buy:  Milk
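A small worked sketch in plain Python showing how the support and confidence of the rule Bread → Milk could be computed from a handful of made-up baskets (in practice, algorithms such as Apriori automate this over large datasets).

```python
# Toy association-rule sketch: support and confidence of "Bread -> Milk"
# computed over a few made-up shopping baskets.
transactions = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
]

n = len(transactions)
bread = sum(1 for t in transactions if "bread" in t)
bread_and_milk = sum(1 for t in transactions if {"bread", "milk"} <= t)

support = bread_and_milk / n         # how often bread and milk occur together
confidence = bread_and_milk / bread  # given bread is bought, how often milk is too

print(f"support(Bread -> Milk)    = {support:.2f}")     # 3/5 = 0.60
print(f"confidence(Bread -> Milk) = {confidence:.2f}")  # 3/4 = 0.75
```

A rule with high support and confidence, like this one, is exactly the kind of "frequently occur together" relationship that association rule learning looks for.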

 

******************** >< ********************

