Core ML concepts: types of learning, ensemble methods, and where each algorithm fits.
Machine learning (ML) is a branch of artificial intelligence in which systems learn patterns from data instead of being explicitly programmed. Performance typically improves as more data is provided.
The key idea: instead of writing rules by hand, you feed data to an algorithm and let it discover the rules itself.
**Supervised learning.** You train the model on labeled data (input + correct output). Two main categories:
- **Regression**: predicting continuous values.
- **Classification**: predicting categories/labels.
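A minimal regression sketch, with made-up numbers: fit a line y = a·x + b to labeled (input, output) pairs using the closed-form least-squares solution. All data and names here are illustrative.

```python
# Supervised regression sketch: fit y = a*x + b by least squares
# on a small hypothetical labeled dataset (pure Python).

def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a, b

# Labeled data: inputs with their correct outputs (roughly y = 2x).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.0, 8.1]
a, b = fit_line(xs, ys)
print(round(a, 2), round(b, 2))
```

The model "learns the rule" (slope and intercept) from examples rather than from hand-written logic; classification works the same way, with categorical instead of continuous outputs.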
**Unsupervised learning.** Train on unlabeled data (no known outputs). Two main tasks:
- **Clustering**: grouping similar items.
- **Dimensionality reduction**: reducing the number of features.
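Clustering can be sketched in a few lines with 1-D k-means (k = 2) on hypothetical points; the data and starting centroids are made up for illustration.

```python
# Unsupervised clustering sketch: 1-D k-means with two centroids.
# No labels are given; the algorithm finds the groups on its own.

def kmeans_1d(points, c1, c2, iters=10):
    for _ in range(iters):
        # assignment step: each point joins its nearest centroid
        g1 = [p for p in points if abs(p - c1) <= abs(p - c2)]
        g2 = [p for p in points if abs(p - c1) > abs(p - c2)]
        # update step: centroids move to the mean of their group
        c1 = sum(g1) / len(g1)
        c2 = sum(g2) / len(g2)
    return c1, c2

points = [1.0, 1.2, 0.8, 8.0, 8.3, 7.7]  # two obvious groups
c1, c2 = kmeans_1d(points, 0.0, 10.0)
print(c1, c2)  # centroids settle near the two group means
```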
**Reinforcement learning.** An agent learns by taking actions in an environment and receiving rewards, gradually improving its decision policy.
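The action/reward loop can be sketched with tabular Q-learning on a hypothetical 4-state corridor: the agent starts at state 0, action 1 moves right, action 0 moves left, and reaching state 3 gives reward 1. The environment and hyperparameters are illustrative.

```python
import random

# Reinforcement-learning sketch: tabular Q-learning on a tiny corridor.
N_STATES, ACTIONS = 4, (0, 1)
alpha, gamma, eps = 0.5, 0.9, 0.2   # learning rate, discount, exploration
Q = [[0.0, 0.0] for _ in range(N_STATES)]
random.seed(0)

def step(s, a):
    s2 = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else 0.0), done   # reward only at the goal

for _ in range(200):                          # episodes
    s = 0
    while True:
        # epsilon-greedy: mostly exploit the best known action, sometimes explore
        a = random.choice(ACTIONS) if random.random() < eps else Q[s].index(max(Q[s]))
        s2, r, done = step(s, a)
        # Q-learning update from the observed reward and next state
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2
        if done:
            break

# Learned policy: move right in every non-terminal state.
print([q.index(max(q)) for q in Q[:3]])
```

Note that no labeled examples are involved: the agent discovers the right behavior purely from the reward signal.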
**Semi-supervised learning.** Uses a small amount of labeled data plus a large amount of unlabeled data. Useful when labeling is expensive.
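One common semi-supervised recipe is self-training, sketched here on made-up 1-D data: fit a simple threshold classifier on the few labeled points, pseudo-label the unlabeled points the model is confident about, then refit. The margin value and all numbers are illustrative.

```python
# Semi-supervised "self-training" sketch on hypothetical 1-D data.

def fit_threshold(labeled):
    neg = [x for x, y in labeled if not y]
    pos = [x for x, y in labeled if y]
    return (max(neg) + min(pos)) / 2     # midpoint between the two classes

labeled = [(1.0, False), (4.0, True)]    # expensive hand labels
unlabeled = [0.8, 1.2, 3.6, 4.2]         # cheap unlabeled data

t = fit_threshold(labeled)               # initial threshold from labels alone
# pseudo-label only points far from the boundary (confidence margin of 1.0)
for x in unlabeled:
    if abs(x - t) > 1.0:
        labeled.append((x, x > t))
t = fit_threshold(labeled)               # refit on labels + pseudo-labels
print(t)
```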
Ensemble learning combines multiple models to create a better one. It typically improves accuracy and reduces overfitting.
Bagging reduces variance (overfitting). The classic example is Random Forest = bagging of decision trees.
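Bagging can be sketched with depth-1 "decision stumps" (single-threshold classifiers, a stand-in for the decision trees in a Random Forest) trained on bootstrap resamples of a toy 1-D dataset and combined by majority vote. Data and helper names are illustrative.

```python
import random

# Bagging sketch: bootstrap-sample the data, fit one weak model per sample,
# aggregate by majority vote. Models are trained independently (in parallel).

def fit_stump(sample):
    # choose the threshold (among the sample's x values) with most correct labels
    return max((x for x, _ in sample),
               key=lambda t: sum((x > t) == y for x, y in sample))

def bagged_predict(stumps, x):
    votes = sum(x > t for t in stumps)
    return votes * 2 > len(stumps)       # majority vote of all stumps

random.seed(0)
data = [(0.5, False), (1.0, False), (3.0, True), (3.5, True)]
# bootstrap: sample with replacement, one stump per resample
stumps = [fit_stump([random.choice(data) for _ in data]) for _ in range(25)]
print(bagged_predict(stumps, 0.2), bagged_predict(stumps, 4.5))  # → False True
```

Averaging many high-variance models trained on different resamples is what smooths out the overfitting of any single one.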
Boosting reduces bias: weak models are trained sequentially, each one focusing on the errors of its predecessors, so the combined model grows strong. Classic examples are AdaBoost and gradient boosting (XGBoost, LightGBM).
Bagging trains models in parallel (independent). Boosting trains models sequentially (each depends on previous errors). This is the fundamental difference.
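That sequential dependence can be sketched as tiny L2 gradient boosting on a toy 1-D regression problem: each stage fits a regression stump to the current residuals, so every stump depends on the errors left by the ones before it. Data and names are illustrative.

```python
# Boosting sketch: L2 gradient boosting with regression stumps.

def fit_stump(xs, rs):
    # best single split minimizing squared error of the left/right means
    best = None
    for t in xs:
        left = [r for x, r in zip(xs, rs) if x <= t]
        right = [r for x, r in zip(xs, rs) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost(xs, ys, n_stages=20, lr=0.5):
    pred = [0.0] * len(xs)
    stumps = []
    for _ in range(n_stages):
        residuals = [y - p for y, p in zip(ys, pred)]   # errors so far
        stump = fit_stump(xs, residuals)                # fit the errors
        stumps.append(stump)
        pred = [p + lr * stump(x) for p, x in zip(pred, xs)]
    return lambda x: sum(lr * s(x) for s in stumps)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 1.0, 3.0, 3.0]            # a step function
model = boost(xs, ys)
print(round(model(1.5), 2), round(model(3.5), 2))  # → 1.0 3.0
```

Each stage only corrects what remains of the error, which is exactly why the stages cannot be trained in parallel the way bagged models can.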
| Scenario | Approach |
|---|---|
| Predict a number (price, temperature) | Regression (supervised) |
| Predict a category (spam/not spam) | Classification (supervised) |
| Find groups in data | Clustering (unsupervised) |
| Reduce features for visualization | Dimensionality Reduction |
| Train an agent (game, robot) | Reinforcement Learning |
| High-variance model (overfitting) | Bagging (Random Forest) |
| High-bias model (underfitting) | Boosting (XGBoost, LightGBM) |