[Day 9] Supervised Machine Learning Algorithms

Before diving into deep learning, let’s master the basics—real-world ML tools like regression, trees, SVMs, and more that solve big problems simply.

Akshay Seth

17 Dec 2024 • 7 min read

The biggest mistake many people make when diving into AI/ML is jumping straight into complex topics like deep learning, building Generative AI applications, or other advanced techniques. While these areas are fascinating and valuable, it’s crucial to understand that many real-world problems can be solved with simpler algorithms. Mastering these foundational techniques not only builds a strong base but also helps you tackle challenges effectively without overcomplicating the solution.

In the next couple of days, I will guide you step by step, focusing on practical, Python-based projects that mirror real-life scenarios. By starting small and gradually advancing, you'll gain not just theoretical knowledge but also hands-on experience in applying AI/ML to solve tangible problems.

What sets this approach apart is that you won't just learn concepts—you’ll implement them. From understanding basic algorithms to experimenting with data, you’ll develop real-world applications and see your learning come to life. This isn’t just about studying—it’s about creating, exploring, and truly mastering the art of machine learning.

Machine learning is broadly classified into three types: Supervised Learning, Unsupervised Learning, and Reinforcement Learning.

Let's start with the first one.

1- Supervised Machine Learning Algorithms

Supervised machine learning is a cornerstone of modern artificial intelligence, empowering systems to make predictions or decisions based on labeled data.

From predicting stock prices to diagnosing diseases, supervised algorithms are used across diverse fields. This article provides an in-depth explanation of popular supervised learning algorithms, complete with their mathematical formulas and real-life use cases in finance, healthcare, and other industries.

1. Linear Models

a) Linear Regression

Overview

Linear Regression is one of the simplest supervised algorithms for predicting continuous outcomes. It assumes a linear relationship between input features (independent variables) and the target variable (dependent variable).

Formula

y = w0 + w1 * x1 + w2 * x2 + ... + wn * xn

Explanation

y: Predicted value (target variable).
x1, x2, ..., xn: Input features.
w0, w1, ..., wn: Model parameters (weights).
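
To make the formula concrete, here is a minimal sketch in plain Python. The helper names (predict, fit_one_feature) and the tiny house-price dataset are made up for illustration; a real project would more likely use a library such as scikit-learn.

```python
def predict(weights, features):
    """Return w0 + w1*x1 + ... + wn*xn; weights[0] is the intercept w0."""
    w0, rest = weights[0], weights[1:]
    return w0 + sum(w * x for w, x in zip(rest, features))


def fit_one_feature(xs, ys):
    """Ordinary least squares for a single feature: returns [w0, w1]."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w1 = cov_xy / var_x
    w0 = mean_y - w1 * mean_x
    return [w0, w1]


# Made-up data that lies exactly on the line y = 1 + 2x
# (house size in units of 100 m^2, price in units of $100k).
sizes = [1.0, 2.0, 3.0, 4.0]
prices = [3.0, 5.0, 7.0, 9.0]

weights = fit_one_feature(sizes, prices)   # [1.0, 2.0]
predicted_price = predict(weights, [5.0])  # 1 + 2 * 5 = 11.0
```

Because the toy data sits perfectly on a line, the fitted weights recover it exactly. With noisy real data, the same formulas give the line that minimizes the squared error.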

Real-Life Use Cases

Finance: Predicting house prices based on size, location, and amenities.
Healthcare: Estimating patient expenses based on treatment type and duration.
Retail: Forecasting sales based on historical data and market trends.

Graphical Representation

A straight line plotted on a 2D graph showing the relationship between a feature (x-axis) and the target variable (y-axis).

Image: Greenkiwibird, CC0, via Wikimedia Commons

b) Logistic Regression

Overview

Logistic Regression is a classification algorithm used to predict binary outcomes. It uses the logistic (sigmoid) function to map predictions to probabilities.

Formula

P(y=1|x) = 1 / (1 + exp(-(w0 + w1 * x1 + w2 * x2 + ... + wn * xn)))

Explanation

P(y=1|x): Probability of the positive class.
x1, x2, ..., xn: Input features.
w0, w1, ..., wn: Model parameters.
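
The formula can be sketched in a few lines of Python. The weights below are hypothetical values chosen for illustration, not fitted to any data, and the 0.5 decision threshold is a common default rather than a rule.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, features):
    """P(y=1|x) for weights [w0, w1, ..., wn] and features [x1, ..., xn]."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return sigmoid(z)


# Hypothetical weights for illustration only (not learned from data).
weights = [-3.0, 2.0, 1.0]

p_low = predict_proba(weights, [0.5, 0.5])   # z = -1.5, probability below 0.5
p_mid = predict_proba(weights, [1.0, 1.0])   # z = 0.0, probability exactly 0.5
p_high = predict_proba(weights, [2.0, 1.5])  # z = 2.5, probability above 0.5

# Turn the probability into a class label with a 0.5 threshold.
predicted_class = 1 if p_high >= 0.5 else 0
```

Note that math.exp can overflow for very large negative z, so production code usually relies on a numerically stable library implementation.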

Real-Life Use Cases

Finance: Credit risk analysis (predicting loan default).
Healthcare: Diagnosing diseases (e.g., whether a tumor is malignant or benign).
E-commerce: Predicting customer churn.

Graphical Representation

A sigmoid curve demonstrating how input values are mapped to probabilities between 0 and 1.

Image: https://commons.wikimedia.org/wiki/File:Logistic-curve.png

2. Decision Trees and Ensembles

a) Decision Trees Algorithm

Overview

Decision Trees split the data into subsets based on feature values, creating a tree-like structure. They are widely used for both classification and regression tasks.

Splitting Criteria

Gini Impurity:

G = 1 - Σ(pi^2)

Entropy:

H = -Σ(pi * log(pi))

Explanation

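pi: Proportion of samples in class i. The logarithm in H is usually taken base 2, which gives entropy in bits. For both measures, lower values mean a purer node.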
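
Both criteria are easy to compute by hand. The sketch below assumes base-2 logarithms for entropy; the function names and the toy label lists are made up for illustration.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Return the proportion pi of each class present in the node."""
    counts = Counter(labels)
    total = len(labels)
    return [count / total for count in counts.values()]


def gini(labels):
    """Gini impurity: G = 1 - sum(pi^2)."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """Entropy in bits: H = -sum(pi * log2(pi))."""
    return -sum(p * math.log2(p) for p in class_proportions(labels))


mixed_node = ["fraud", "fraud", "ok", "ok"]  # 50/50 split: maximally impure
pure_node = ["ok", "ok", "ok", "ok"]         # a single class: perfectly pure

gini_mixed = gini(mixed_node)        # 1 - (0.25 + 0.25) = 0.5
entropy_mixed = entropy(mixed_node)  # 1.0 bit
gini_pure = gini(pure_node)          # 0.0
```

A tree-building algorithm tries candidate splits and keeps the one whose child nodes have the lowest weighted impurity.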

Real-Life Use Cases

Finance: Fraud detection.
Healthcare: Deciding treatment plans based on patient conditions.
Marketing: Segmenting customers.

Graphical Representation

A tree diagram showing decision splits based on feature thresholds.

Image attribution : Gilgoldm, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons

b) Random Forest

Overview

Random Forest builds multiple decision trees and aggregates their predictions for improved accuracy and robustness.

Formula

Final Prediction = (1/n) * Σ(Ti(x))

Explanation

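Ti(x): Prediction from the i-th tree.
n: Number of trees.
This average applies to regression. For classification, the forest typically takes a majority vote over the classes predicted by the trees.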
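
The aggregation step can be sketched as follows. This only shows how predictions are combined, not how the trees are trained; the "trees" here are hypothetical one-split rules written by hand, and the function names are made up for illustration.

```python
from collections import Counter


def forest_average(trees, x):
    """Regression: Final Prediction = (1/n) * sum of Ti(x)."""
    return sum(tree(x) for tree in trees) / len(trees)


def forest_majority_vote(trees, x):
    """Classification: return the class predicted by the most trees."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


# Hand-written stand-ins for trained trees (illustration only).
regression_trees = [
    lambda x: 10.0 if x[0] < 5 else 20.0,
    lambda x: 12.0 if x[0] < 4 else 22.0,
    lambda x: 11.0 if x[0] < 6 else 18.0,
]
classification_trees = [
    lambda x: "default" if x[0] < 5 else "repay",
    lambda x: "default" if x[0] < 4 else "repay",
    lambda x: "default" if x[0] < 6 else "repay",
]

sample = [4.5]
average_prediction = forest_average(regression_trees, sample)    # (10 + 22 + 11) / 3
voted_class = forest_majority_vote(classification_trees, sample)  # two of three vote "default"
```

In a real Random Forest, each tree is trained on a different random sample of the rows and considers a random subset of features at each split, which is what makes the individual trees disagree in useful ways.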

Real-Life Use Cases

Finance: Credit scoring.
Healthcare: Predicting patient readmission.
E-commerce: Product recommendations.

Graphical Representation

A forest of decision trees with aggregated outputs.

Image attribution: TseKiChun, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons

c) Gradient Boosting Machines (GBM)

Overview

GBM sequentially builds trees, with each tree correcting the errors of its predecessors. It’s highly effective for complex problems.

Formula

Fm(x) = Fm-1(x) + γ * hm(x)

Explanation

Fm(x): Updated model after m-th iteration.
Fm-1(x): Previous model.
hm(x): New tree trained on the residuals of the previous model (more generally, on the negative gradient of the loss).
γ: Learning rate.
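
Here is a minimal sketch of the update rule for regression with squared error, where each new tree is a one-split "stump" fitted to the current residuals. The data, function names, and settings (20 rounds, learning rate 0.5) are assumptions for illustration; libraries such as scikit-learn, XGBoost, or LightGBM implement far more complete versions.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree to the residuals (needs >= 2 distinct xs)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the largest value: right side would be empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, split, low_value, high_value = best
    return lambda x: low_value if x <= split else high_value


def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Build F_m(x) = F_{m-1}(x) + learning_rate * h_m(x), starting from the mean."""
    base = sum(ys) / len(ys)  # F_0: a constant prediction
    stumps = []
    predictions = [base] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)  # h_m: trained on the residuals
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x)
                       for p, x in zip(predictions, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mean_squared_error(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up one-feature dataset.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.2, 3.1, 2.9, 5.2, 5.0]

mean_y = sum(ys) / len(ys)
baseline_mse = mean_squared_error(lambda x: mean_y, xs, ys)
boosted_model = gradient_boost(xs, ys)
boosted_mse = mean_squared_error(boosted_model, xs, ys)
```

On the training data, boosted_mse ends up below baseline_mse because every round removes part of the remaining residual. A smaller learning rate makes each correction more cautious, which usually needs more rounds but tends to generalize better.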

Real-Life Use Cases

Finance: Predicting loan default likelihood.
Healthcare: Survival analysis.
Retail: Customer segmentation.

Graphical Representation

A series of trees, each one correcting the residual errors left by the trees before it.

3. Support Vector Machines (SVM)

Overview

SVM finds the optimal hyperplane that maximizes the margin between data points of different classes. It’s used for both linear and non-linear classification tasks.

Formula

Linear SVM:

f(x) = w^T * x + b

Kernel Transformation:

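φ(x): Maps data into a higher-dimensional space where a linear separator may exist. A kernel K(x, z) = φ(x)^T * φ(z) computes the inner product in that space without building φ(x) explicitly.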

Explanation

w: Weight vector.
x: Input feature vector.
b: Bias term.
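
The sketch below shows the decision function and the kernel idea; it does not train an SVM. The weight vector, bias, and sample points are hypothetical, and the feature map phi is one standard choice that matches the degree-2 polynomial kernel (x . z)^2 for 2D inputs.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def decision_function(w, b, x):
    """f(x) = w^T x + b."""
    return dot(w, x) + b


def predict_class(w, b, x):
    """Class +1 if the point is on the positive side of the hyperplane, else -1."""
    return 1 if decision_function(w, b, x) >= 0 else -1


def distance_to_hyperplane(w, b, x):
    """Geometric distance from x to the hyperplane f(x) = 0."""
    return abs(decision_function(w, b, x)) / math.sqrt(dot(w, w))


def phi(x):
    """Explicit feature map for 2D input that matches the kernel (x . z)^2."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]


def poly_kernel(x, z):
    """Degree-2 polynomial kernel, computed without building phi."""
    return dot(x, z) ** 2


# Hypothetical hyperplane parameters (not learned from data).
w, b = [2.0, -1.0], -1.0
point = [3.0, 1.0]
score = decision_function(w, b, point)        # 2*3 - 1*1 - 1 = 4.0
label = predict_class(w, b, point)            # +1
margin_distance = distance_to_hyperplane(w, b, point)

# The kernel gives the same number as the inner product of the mapped points.
k_direct = poly_kernel([1.0, 2.0], [3.0, 4.0])      # (1*3 + 2*4)^2 = 121.0
k_mapped = dot(phi([1.0, 2.0]), phi([3.0, 4.0]))    # same value, up to rounding
```

Training an SVM means choosing w and b so that the smallest distance from any training point to the hyperplane is as large as possible.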

Real-Life Use Cases

Finance: Stock market prediction.
Healthcare: Classification of medical images (e.g., MRI scans).
Security: Intrusion detection systems.

Graphical Representation

A 2D graph with data points and a separating hyperplane maximizing the margin.

Image: https://freesvg.org/svm-support-vector-machines-diagram-vector-image

4. k-Nearest Neighbors (k-NN)

Overview

k-NN is an instance-based algorithm that classifies data points based on the majority class of their k nearest neighbors.

Formula

d(p, q) = √Σ((pi - qi)^2)

Explanation

p, q: Data points.
d(p, q): Euclidean distance.
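
A minimal sketch of k-NN classification using this distance follows. The patient-style data points, labels, and the choice k=3 are made up for illustration.

```python
import math
from collections import Counter


def euclidean(p, q):
    """d(p, q) = sqrt(sum((pi - qi)^2))."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def knn_classify(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: euclidean(pair[0], query),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Made-up two-feature data: two well-separated groups.
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["low risk", "low risk", "low risk",
                "high risk", "high risk", "high risk"]

distance_example = euclidean((0, 0), (3, 4))                       # 5.0
near_low = knn_classify(train_points, train_labels, (2, 2))        # "low risk"
near_high = knn_classify(train_points, train_labels, (6.5, 6.5))   # "high risk"
```

Because distances depend on the scale of each feature, it is common to standardize features before using k-NN.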

Real-Life Use Cases

Finance: Identifying similar investment patterns.
Healthcare: Classifying patients into risk groups.
Retail: Product recommendations.

Graphical Representation

A scatterplot with nearest neighbors highlighted for a new data point.

5. Naive Bayes

Overview

Naive Bayes is a probabilistic classifier based on Bayes' Theorem. It makes the "naive" assumption that features are conditionally independent of each other given the class.

Formula

P(C|X) = (P(X|C) * P(C)) / P(X)

Explanation

P(C|X): Posterior probability of class C given features X.
P(X|C): Likelihood of features given class C.
P(C): Prior probability of class C.
P(X): Evidence.
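
Here is a small worked sketch of the formula for a spam filter. All priors and likelihoods are invented numbers for illustration; in practice they are estimated from labeled training data. The naive independence assumption shows up as a simple product of per-feature likelihoods.

```python
def posterior(priors, likelihoods, observed_features):
    """Return P(C|X) for every class C, assuming independent features given C."""
    joint = {}
    for cls, prior in priors.items():
        score = prior  # P(C)
        for feature in observed_features:
            score *= likelihoods[cls][feature]  # multiply in P(feature|C)
        joint[cls] = score  # P(X|C) * P(C)
    evidence = sum(joint.values())  # P(X), summed over all classes
    return {cls: score / evidence for cls, score in joint.items()}


# Invented probabilities (illustration only).
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.5, "meeting": 0.1},
    "ham": {"free": 0.05, "meeting": 0.4},
}

# A message that contains both the word "free" and the word "meeting".
result = posterior(priors, likelihoods, ["free", "meeting"])
# spam: 0.4 * 0.5 * 0.1 = 0.02, ham: 0.6 * 0.05 * 0.4 = 0.012
# P(spam|X) = 0.02 / 0.032 = 0.625
predicted = max(result, key=result.get)
```

Since P(X) is the same for every class, it can be skipped when only the most likely class is needed; it matters only if you want actual probabilities.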

Real-Life Use Cases

Email: Spam detection.
Healthcare: Predicting diseases based on symptoms.
Text Analysis: Sentiment analysis of reviews.

Graphical Representation

A bar chart comparing probabilities for different classes.

Image attribution : Sebastian Raschka, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons

6. Neural Networks

Neural networks are algorithms loosely inspired by the brain that learn patterns from data to make predictions.

Image source: https://www.geeksforgeeks.org/artificial-neural-networks-and-its-applications/

How: Data flows through layers of connected nodes (input → hidden → output). Errors are corrected through backpropagation.
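
The flow described above can be sketched with a tiny network: 2 inputs, 2 hidden nodes, and 1 output, all using the sigmoid function. The starting weights, the single training example, and the learning rate are arbitrary choices for illustration; real networks are built with libraries such as PyTorch or TensorFlow.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, hidden_w, hidden_b, out_w, out_b):
    """Input -> hidden -> output. Returns the hidden activations and the output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(hidden_w, hidden_b)]
    output = sigmoid(sum(w * h for w, h in zip(out_w, hidden)) + out_b)
    return hidden, output


def train_step(x, target, hidden_w, hidden_b, out_w, out_b, lr=0.5):
    """One backpropagation step for the loss 0.5 * (output - target)^2."""
    hidden, output = forward(x, hidden_w, hidden_b, out_w, out_b)
    # Error signal at the output node (chain rule through the sigmoid).
    delta_out = (output - target) * output * (1.0 - output)
    # Push the error signal back through the output weights to the hidden nodes.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(out_w, hidden)]
    # Gradient-descent updates for every weight and bias.
    new_out_w = [w - lr * delta_out * h for w, h in zip(out_w, hidden)]
    new_out_b = out_b - lr * delta_out
    new_hidden_w = [[w - lr * d * xi for w, xi in zip(ws, x)]
                    for ws, d in zip(hidden_w, delta_hidden)]
    new_hidden_b = [b - lr * d for b, d in zip(hidden_b, delta_hidden)]
    return new_hidden_w, new_hidden_b, new_out_w, new_out_b


# Arbitrary starting parameters and one made-up training example.
params = ([[0.1, -0.2], [0.4, 0.3]], [0.0, 0.0], [0.2, -0.1], 0.0)
x, target = [1.0, 0.0], 1.0

_, output_before = forward(x, *params)
for _ in range(200):
    params = train_step(x, target, *params)
_, output_after = forward(x, *params)

loss_before = 0.5 * (output_before - target) ** 2
loss_after = 0.5 * (output_after - target) ** 2
```

After the 200 updates, output_after is closer to the target than output_before, which is the "errors are corrected" part in action.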

Examples:
- Image recognition (e.g., identifying faces).
- Speech recognition (e.g., Alexa, Siri).
- Disease diagnosis (e.g., detecting cancer in X-rays).
- Fraud detection (e.g., flagging unusual transactions).

7. Ensembles and Meta-Algorithms

No single model is perfect. Ensembles combine multiple models to improve prediction accuracy and reduce errors, leveraging the strengths of each.

Types and Examples

Bagging
- What: Trains models on random data subsets and averages results.
- Example: Random Forest predicting loan defaults.
Boosting
- What: Sequentially trains models, each correcting the previous one's errors.
- Example: Gradient Boosting for customer churn prediction.
Stacking
- What: Combines outputs of different models using a final model.
- Example: Blending SVM, decision trees, and logistic regression to detect fraud.
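
As a rough sketch of bagging, the code below draws random samples with replacement (bootstrap samples), fits one very simple "model" per sample, and averages the results. The base model here just predicts the mean of its sample, a deliberately trivial stand-in for a decision tree; the data and function names are made up for illustration.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]


def fit_mean_model(sample):
    """Trivial stand-in for a real model: always predicts the sample mean."""
    mean = sum(sample) / len(sample)
    return lambda: mean


def bagged_prediction(data, n_models=25, seed=0):
    """Train n_models on bootstrap samples and average their predictions."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    models = [fit_mean_model(bootstrap_sample(data, rng))
              for _ in range(n_models)]
    return sum(model() for model in models) / n_models


# Made-up target values (for example, loan amounts in thousands).
loan_amounts = [12.0, 15.0, 9.0, 20.0, 14.0, 11.0, 18.0]
prediction = bagged_prediction(loan_amounts)
```

Each bootstrap sample sees a slightly different version of the data, so the individual models differ, and averaging them smooths out their individual errors.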

8. Specialized Algorithms

Linear Discriminant Analysis (LDA)

Purpose: Projects data onto a lower-dimensional space for classification.
Example: Recognizing different handwritten letters.

Quadratic Discriminant Analysis (QDA)

Purpose: Similar to LDA but allows a separate covariance matrix for each class, which gives quadratic rather than linear decision boundaries.
Example: Medical diagnosis.

Note: LDA and QDA are specialized for small, well-separated datasets but rely on strict assumptions like data normality, limiting their real-world use. Modern algorithms like Random Forests, SVMs, and Neural Networks are more versatile and effective for complex tasks. Focus on these broader, more applicable methods instead.

Note: If you are finding it difficult to understand these complex formulas, please ignore them. I will discuss them in detail in the next couple of articles.

These supervised machine-learning algorithms offer powerful tools for solving a wide range of problems. By understanding their working principles and real-life applications, you can choose the right algorithm for your specific needs.
