Machine learning
•
Machine learning is a field of computer science that
uses statistical techniques to give computer systems the ability to
"learn" from data, without being explicitly programmed.
•
Machine learning is a technology that enables computers
to learn automatically from past data, build a model, and predict outputs for
future inputs.
•
In simple words, machine learning is all about learning
from data.
•
Now, in the definition, "explicitly programmed"
means that we write a program for every scenario to handle that scenario.
•
But in machine learning, we don’t do that. Instead, we have
some data and an algorithm, and we instruct the algorithm to go over the data
and identify patterns between the input and the output. Once the patterns are
learned, we provide new input data to the trained algorithm, and it
generates the output.
•
In a conventional or traditional program, you write a
program based on the logic you have created for a given problem or scenario.
Then, you provide data to the program, and the computer generates the output.
But in machine learning, things are
different. Here, you provide data that includes both inputs and outputs.
However, you don’t write a program or create logic manually. Instead, the logic
is automatically generated by the machine learning algorithm.
So, the good part is that you don’t
have to write a program and logic for every condition or scenario, as the
machine learning algorithm derives the logic from the data.
•
Example,
•
If you write a program to add two numbers, whenever you
provide two numbers to it, the program will return their sum.
•
But in machine learning, you can provide data in an Excel
sheet where each row contains numbers and their sum. The machine learning
algorithm trains on this data and learns the pattern, so that in the future,
when you provide new numbers, it can predict their sum. If the rows contain
more than two numbers, the same learning procedure still applies. The program
you wrote to add two numbers, however, cannot handle more than two numbers as
input because it is explicitly coded to add only two. This is the main difference.
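The contrast can be sketched in a few lines of Python. This is only an illustration, not a recommended way to add numbers: the "Excel sheet" is a hypothetical list of rows, and the learner is ordinary least squares on two weights, a technique chosen here for simplicity. The names add_two, fit_weights, and predict_sum are made up for this sketch.

```python
def add_two(a, b):
    """Traditional program: the rule (a + b) is written by hand."""
    return a + b


def fit_weights(rows):
    """Learn w1 and w2 in  total ~ w1*a + w2*b  from (a, b, total) rows
    using ordinary least squares (2x2 normal equations, Cramer's rule)."""
    saa = sum(a * a for a, _, _ in rows)
    sab = sum(a * b for a, b, _ in rows)
    sbb = sum(b * b for _, b, _ in rows)
    say = sum(a * y for a, _, y in rows)
    sby = sum(b * y for _, b, y in rows)
    det = saa * sbb - sab * sab
    w1 = (say * sbb - sab * sby) / det
    w2 = (saa * sby - sab * say) / det
    return w1, w2


def predict_sum(weights, a, b):
    """Apply the learned weights to a new pair of numbers."""
    w1, w2 = weights
    return w1 * a + w2 * b


# Hypothetical "Excel sheet": each row is (number, number, their sum).
sheet = [(1, 2, 3), (4, 5, 9), (10, 3, 13), (7, 7, 14), (2, 9, 11)]

weights = fit_weights(sheet)
print(add_two(20, 22))               # explicit rule
print(predict_sum(weights, 20, 22))  # learned rule, close to 42
```

Nobody typed the rule "a + b" into fit_weights; the weights come out close to 1 and 1 only because the rows in the sheet follow that pattern.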
•
So, from this example, you can understand why ML is so
popular nowadays
•
Now, we will discuss in which scenarios ML is more useful
than normal software or traditional programming.
•
1) In some scenarios, you can't write a program or define
all possible cases, and that's where Machine Learning helps.
Example – You are trying to build a spam
classifier that identifies whether a given email is spam or not. As a software
developer, what can you do? First, you can analyze a large number of spam
emails and try to identify some patterns. For example, if the word "discount"
appears more than three times, or the word "sale" is used frequently,
or if the email contains too many images, these could be indicators of spam.
Based on these patterns, you could write a long list of if-else
conditions to develop spam detection software.
But if an advertising company finds out
that your code marks an email as spam when the word "discount"
appears more than three times, they can bypass this by using synonyms like
"offer." As a result, your program will no longer detect this
condition, and the email won’t be identified as spam.
As a software developer, you would need
to keep updating the code and refining the logic to account for every such
variation, or the software stops working properly.
In Machine Learning, this manual rewriting is not
needed because the system learns from data. If the data changes, the
algorithm can be retrained on the new examples, and its logic adapts without
anyone editing the rules by hand. That is the beauty of Machine
Learning.
In Machine Learning, you only need to write
one program or algorithm for a given scenario, and it can handle various cases
without requiring constant manual updates to the code.
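The spam example can be sketched as follows. The hand-written rule is the "discount more than three times" check from the text. The learned filter is a very small Naive Bayes style word scorer, picked here only for illustration; the training emails and all function names are hypothetical.

```python
import math
from collections import Counter


def rule_based_is_spam(text):
    """Hand-written rule: 'discount' appears more than three times."""
    return text.lower().split().count("discount") > 3


def train(emails):
    """Count word occurrences per class from (text, is_spam) pairs."""
    counts = {True: Counter(), False: Counter()}
    for text, is_spam in emails:
        counts[is_spam].update(text.lower().split())
    return counts


def learned_is_spam(counts, text):
    """Sum log-likelihood ratios of known words (add-one smoothing).
    A positive total means the words look more like spam than not."""
    vocab = set(counts[True]) | set(counts[False])
    total_spam = sum(counts[True].values()) + len(vocab)
    total_ham = sum(counts[False].values()) + len(vocab)
    score = 0.0
    for word in text.lower().split():
        if word not in vocab:
            continue  # words never seen in training carry no evidence
        p_spam = (counts[True][word] + 1) / total_spam
        p_ham = (counts[False][word] + 1) / total_ham
        score += math.log(p_spam / p_ham)
    return score > 0


# Hypothetical labelled emails: (text, is_spam).
emails = [
    ("discount discount discount discount buy now", True),
    ("big sale discount today", True),
    ("meeting agenda for monday", False),
    ("please review the project report", False),
]
new_spam = "offer offer offer offer on our new collection"

before = learned_is_spam(train(emails), new_spam)

# The advertiser switched to "offer": add one labelled example and retrain.
emails.append(("offer offer limited offer", True))
after = learned_is_spam(train(emails), new_spam)

print(rule_based_is_spam(new_spam), before, after)
```

The rule never catches the "offer" email unless someone edits the code. The learned filter misses it at first as well, but catches it after retraining on one new labelled example, with no change to the program itself.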
2) Machine Learning is used for complex
tasks where there are countless possible cases that are difficult to
anticipate. In such scenarios, traditional programming may not be effective.
Example – Image classification: deciding whether a dog
is present in an image or not.
Types of machine learning
Machine learning types depend on
three different factors, but today we will focus on the amount of supervision
needed for a machine learning algorithm to be trained.
As shown in the figure, based on the
amount of supervision, Machine Learning is divided into four categories. The
first category is Supervised Machine Learning, the second is unsupervised
Machine Learning, the third is Semi-Supervised Machine Learning, and the last
is reinforcement learning.
This is a famous categorization of
machine learning types. If you read any book or watch a YouTube video, you will
see these categories: supervised learning, unsupervised learning,
semi-supervised learning, and reinforcement learning.
So, we will discuss each one in detail,
one by one, to understand the logic behind it.
Machine learning is all about
learning from data and training on it. If both input and output (labeled data)
are present in the dataset, and the task
is to find the relationship between them so that, given a new input, the output can be predicted, then this
type of learning is called supervised machine learning.
Let me explain with an example. Suppose
we have data on 5,000 students, which includes two pieces of information:
first, the student's IQ, and second, their CGPA. Additionally, we have one more
piece of information—whether the student was placed or not.
Sr. No | IQ  | CGPA | Placement (Y/N)
1      | 120 | 8.5  | Y
2      | 110 | 7.8  | Y
3      | 130 | 9.1  | Y
4      | 105 | 7.0  | N
...    | ... | ...  | ...
4997   | 102 | 6.9  | N
4998   | 138 | 9.3  | Y
4999   | 99  | 6.2  | N
5000   | 125 | 8.7  | Y
Now, if someone asks which of these
three columns of information is the input and which is the output, you can
easily say that IQ and CGPA are the input columns, while the column indicating
whether the student was placed or not is the output column. So, here, the
output column (student was placed or not) depends on the other two input
columns.
Now, in this data, both Input and
Output are present. When you apply a Machine Learning algorithm, it identifies
the mathematical relationship between Input and Output. This allows the model
to predict whether a student will be placed or not based on their IQ and CGPA
in the future. This is known as Supervised Machine Learning.
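As a rough sketch of this idea, the snippet below uses only the eight rows visible in the table above and a k-nearest-neighbours vote, which is one simple supervised method chosen for illustration (the text does not name a specific algorithm). The function name predict_placement and the choice k=3 are assumptions.

```python
from collections import Counter

# The eight rows visible in the table: (IQ, CGPA, placed).
rows = [
    (120, 8.5, "Y"), (110, 7.8, "Y"), (130, 9.1, "Y"), (105, 7.0, "N"),
    (102, 6.9, "N"), (138, 9.3, "Y"), (99, 6.2, "N"), (125, 8.7, "Y"),
]


def predict_placement(rows, iq, cgpa, k=3):
    """Predict Y/N by majority vote among the k most similar students.
    Each input is divided by its range so IQ does not dominate CGPA."""
    iq_span = max(r[0] for r in rows) - min(r[0] for r in rows)
    cgpa_span = max(r[1] for r in rows) - min(r[1] for r in rows)

    def distance(row):
        return ((row[0] - iq) / iq_span) ** 2 + ((row[1] - cgpa) / cgpa_span) ** 2

    nearest = sorted(rows, key=distance)[:k]
    votes = Counter(label for _, _, label in nearest)
    return votes.most_common(1)[0][0]


print(predict_placement(rows, 128, 9.0))  # a new student similar to placed ones
print(predict_placement(rows, 100, 6.5))  # a new student similar to unplaced ones
```

The inputs (IQ, CGPA) and the output (Y/N) are both present in the rows, which is exactly what makes this a supervised setup.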
Supervised Machine Learning has two
types: Regression and Classification. To understand these, we
first need to know the types of data. Generally, data is categorized into two
types:
- Numerical data – Examples: age, weight, IQ, CGPA, etc.
- Categorical data – Examples: gender, nationality, blood group, and education
level (e.g., Bachelor's, Master's, PhD).
Now, let's understand what Regression
is. If you are working on a Supervised Machine Learning problem, it
means you have a dataset where both input and output columns are present. If
the output column contains numerical values, then the Supervised Machine
Learning problem is called Regression.
Example :
Student Placement Data
Sr. No. | IQ  | CGPA | Package (in LPA)
1       | 120 | 8.5  | 10.5
2       | 110 | 7.8  | 6.8
3       | 130 | 9.1  | 15.2
4       | 105 | 7.0  | 5.5
...     | ... | ...  | ...
4997    | 102 | 6.9  | 4.8
4998    | 138 | 9.3  | 18.0
4999    | 99  | 6.2  | 3.9
5000    | 125 | 8.7  | 12.5
This table contains 5000
records of students' data with four columns:
- Sr. No. – Serial number of the record.
- IQ – The intelligence quotient of
the student.
- CGPA – The cumulative grade point
average.
- Package (in LPA) – The
salary package offered in Lakhs Per Annum (LPA).
The data can be used for Regression-based
Supervised Machine Learning, where IQ and CGPA are inputs, and the package
is the numerical output. This helps in predicting salary packages based on
academic performance and intelligence.
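A minimal regression sketch on the eight visible rows is shown below. It fits a straight-line model, package ~ w0 + w1*IQ + w2*CGPA, by ordinary least squares. The linear form, the helper names (solve, fit, predict_package), and the use of only eight rows are assumptions made for illustration; a real model would be trained on all 5000 records.

```python
def solve(a, b):
    """Solve the linear system a*x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit(rows):
    """Least-squares fit on (iq, cgpa, package) rows.
    Returns [intercept, weight_iq, weight_cgpa]."""
    xs = [[1.0, iq, cgpa] for iq, cgpa, _ in rows]
    ys = [package for _, _, package in rows]
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(3)] for i in range(3)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(3)]
    return solve(xtx, xty)


def predict_package(w, iq, cgpa):
    """Numerical output: the predicted package in LPA."""
    return w[0] + w[1] * iq + w[2] * cgpa


# The eight rows visible in the table: (IQ, CGPA, package in LPA).
rows = [
    (120, 8.5, 10.5), (110, 7.8, 6.8), (130, 9.1, 15.2), (105, 7.0, 5.5),
    (102, 6.9, 4.8), (138, 9.3, 18.0), (99, 6.2, 3.9), (125, 8.7, 12.5),
]

weights = fit(rows)
print(predict_package(weights, 115, 8.0))
```

The prediction is a number rather than a label, which is what distinguishes this from the placement (Y/N) problem.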
If you understand Regression,
then Classification is easy to understand. In Supervised Machine
Learning, if the output column is categorical instead of numerical,
the problem is called Classification.
Example –
Sr. No | IQ  | CGPA | Placement (Y/N)
1      | 120 | 8.5  | Y
2      | 110 | 7.8  | Y
3      | 130 | 9.1  | Y
4      | 105 | 7.0  | N
...    | ... | ...  | ...
4997   | 102 | 6.9  | N
4998   | 138 | 9.3  | Y
4999   | 99  | 6.2  | N
5000   | 125 | 8.7  | Y
Student Placement Data
This table contains 5000 student
records with a serial number, two input columns (IQ and CGPA), and one output
column. It is used for Classification-based
Supervised Machine Learning, where the goal is to predict whether a student
will be placed or not based on their IQ and CGPA.
Columns Explanation:
- Sr. No. – A unique serial number for
each student in the dataset.
- IQ – The Intelligence Quotient
of the student, which measures cognitive ability.
- CGPA – The Cumulative Grade
Point Average, representing academic performance.
- Placement (Y/N) – The output
label (target variable):
- Y (Yes) → The student was placed in a
job.
- N (No) → The student was not placed.
We will discuss
some examples to determine whether they are Regression or Classification
problems:
1) Given house data, you need to predict the price of a house → Regression Problem
2) Given email data, you need to predict whether an email is spam or not → Classification Problem
3) Given weather data, you need to predict whether it will rain today or not → Classification Problem
4) Given an image, you need to determine whether a dog is present or not → Classification Problem
5) Given an image, you need to predict how many dogs are present → Regression Problem
6) Predicting a person's age based on their height and weight → Regression Problem
7) Predicting the number of tickets sold for a concert based on past data → Regression Problem
8) Determining whether a loan applicant will default or not → Classification Problem
9) Estimating the fuel efficiency (miles per gallon) of a car based on engine size → Regression Problem
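The rule used in all of these examples (numerical output means Regression, categorical output means Classification) can be written as a small helper. This is a simplified sketch: the function name problem_type is hypothetical, and it only looks at the Python type of the values, so categories stored as numeric codes (for example 0 and 1) would be misread as numerical.

```python
def problem_type(target_values):
    """Return 'regression' if every output value is a number,
    otherwise 'classification'."""
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in target_values
    )
    return "regression" if is_numeric else "classification"


print(problem_type([10.5, 6.8, 15.2, 5.5]))  # package in LPA
print(problem_type(["Y", "Y", "Y", "N"]))    # placement (Y/N)
```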