
Introduction to Machine Learning

Core ML concepts: types of learning, ensemble methods, and where each algorithm fits.

What is Machine Learning?

Machine learning (ML) is a branch of artificial intelligence where systems learn patterns from data instead of being explicitly programmed. Performance improves automatically as more data is provided.

The key idea: instead of writing rules by hand, you feed data to an algorithm and let it discover the rules itself.
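A toy sketch of that idea in Python: instead of hard-coding a rule like "messages longer than 100 characters are spam", we let a few lines of code pick the cutoff that best fits some labeled examples (all numbers here are made up for illustration):

```python
def learn_threshold(lengths, labels):
    """Pick the cutoff that classifies the most examples correctly."""
    best_cut, best_correct = None, -1
    for cut in sorted(set(lengths)):
        correct = sum((length >= cut) == label
                      for length, label in zip(lengths, labels))
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return best_cut

lengths = [12, 15, 90, 110, 8, 130]   # feature: message length
labels  = [0, 0, 1, 1, 0, 1]          # 1 = spam, 0 = not spam
cut = learn_threshold(lengths, labels)
print(cut)  # prints 90: a rule discovered from the data, not written by hand
```

The "learning" here is trivial, but the shape is the same as in real ML: data in, rule out.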

Types of Machine Learning

1. Supervised Learning

You train the model on labeled data (each input paired with its correct output). There are two main categories:

Regression

Predicting continuous values.

  • Linear Regression
  • Ridge / Lasso
  • Decision Tree Regressor
  • Random Forest Regressor
  • SVR (Support Vector Regression)
  • KNN Regressor
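As a minimal sketch of the first item, here is ordinary least squares for a single feature in plain Python, using the closed-form slope/intercept formulas (illustrative numbers; in practice you would reach for a library like scikit-learn):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b, closed form."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - a * mean_x
    return a, b

xs = [1, 2, 3, 4]
ys = [2.1, 4.0, 6.2, 7.9]     # roughly y = 2x
a, b = fit_line(xs, ys)
print(a, b)                   # slope close to 2, intercept close to 0
```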

Classification

Predicting categories/labels.

  • Logistic Regression
  • Decision Tree
  • Random Forest
  • KNN
  • SVM
  • Naive Bayes
  • Gradient Boosting (XGBoost, LightGBM)
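A minimal sketch of one of these, KNN, which simply lets the k closest training examples vote on the label (toy 1-D data):

```python
from collections import Counter

def knn_predict(train_x, train_y, x, k=3):
    """Label x by majority vote of the k nearest training points (1 feature)."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train_x = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]   # e.g. animal weight in kg
train_y = ["cat", "cat", "cat", "dog", "dog", "dog"]
print(knn_predict(train_x, train_y, 5.1))  # prints: dog
```

Note KNN has no training step at all: the "model" is the data itself.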

2. Unsupervised Learning

The model trains on unlabeled data (no known outputs) and must discover structure on its own.

Clustering

Grouping similar items.

  • K-Means
  • DBSCAN
  • Hierarchical Clustering
  • Mean Shift
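K-Means, the first item above, alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. A minimal 1-D sketch with toy numbers:

```python
def kmeans_1d(points, centroids, iters=10):
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

points = [1.0, 1.5, 1.2, 8.0, 8.5, 7.9]
centroids = kmeans_1d(points, centroids=[0.0, 10.0])
print(centroids)  # two cluster centers, near 1.23 and 8.13
```

No labels were given: the two groups emerge from the data alone.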

Dimensionality Reduction

Reducing number of features.

  • PCA
  • t-SNE
  • LDA (Linear Discriminant Analysis)
  • Autoencoders
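A minimal sketch of PCA's core computation: the first principal component is the dominant eigenvector of the data's covariance matrix, found here by power iteration on 2-D toy data:

```python
def first_component(data, iters=100):
    """First principal component of 2-D points via power iteration."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    pts = [(x - mx, y - my) for x, y in data]     # center the data
    # Entries of the 2x2 covariance matrix.
    cxx = sum(x * x for x, _ in pts) / n
    cyy = sum(y * y for _, y in pts) / n
    cxy = sum(x * y for x, y in pts) / n
    vx, vy = 1.0, 0.0                             # arbitrary start vector
    for _ in range(iters):                        # power iteration:
        vx, vy = cxx * vx + cxy * vy, cxy * vx + cyy * vy   # v <- C v
        norm = (vx * vx + vy * vy) ** 0.5
        vx, vy = vx / norm, vy / norm             # renormalize
    return vx, vy

data = [(1, 1), (2, 2.1), (3, 2.9), (4, 4.2)]     # roughly along y = x
vx, vy = first_component(data)
print(vx, vy)  # close to (0.707, 0.707): the diagonal direction
```

Projecting each point onto this direction would compress the two features into one while keeping most of the variance.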

3. Reinforcement Learning

An agent learns by taking actions in an environment and receiving rewards.
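A minimal sketch of one RL algorithm, tabular Q-learning: an agent on a five-cell corridor learns, from reward alone, that walking right reaches the goal (all the constants here are arbitrary toy choices):

```python
import random

random.seed(0)
N_STATES, ACTIONS = 5, [-1, +1]       # a 5-cell corridor; move left or right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.2     # learning rate, discount, exploration

for _ in range(500):                  # episodes, each starting at the left end
    s = 0
    while s != N_STATES - 1:
        # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda a: Q[(s, a)])
        s2 = min(max(s + a, 0), N_STATES - 1)   # walls at both ends
        r = 1.0 if s2 == N_STATES - 1 else 0.0  # reward only at the goal
        # Q-learning update: nudge Q toward reward + discounted best future value.
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS)
                              - Q[(s, a)])
        s = s2

policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # the learned greedy policy: move right in every state
```

No one told the agent which action is correct; the reward signal alone shaped the policy.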

4. Semi-Supervised Learning

Uses a small amount of labeled data + large amounts of unlabeled data. Useful when labeling is expensive.
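One simple semi-supervised approach is self-training: fit a model on the labeled data, let it pseudo-label the unlabeled points it is most confident about, and refit. A toy 1-D sketch (the nearest-class-mean classifier and the confidence threshold are simplistic choices for illustration):

```python
# Two hand-labeled points and several unlabeled ones on a 1-D feature.
labeled   = [(1.0, 0), (9.0, 1)]
unlabeled = [1.5, 2.0, 8.0, 8.5, 5.2]

for _ in range(3):  # a few self-training rounds
    # Nearest-class-mean "classifier" fit on the currently labeled points.
    m0 = sum(x for x, y in labeled if y == 0) / sum(1 for _, y in labeled if y == 0)
    m1 = sum(x for x, y in labeled if y == 1) / sum(1 for _, y in labeled if y == 1)
    confident, rest = [], []
    for x in unlabeled:
        pred = int(abs(x - m1) < abs(x - m0))
        margin = abs(abs(x - m0) - abs(x - m1))   # crude confidence score
        (confident if margin > 3.0 else rest).append((x, pred))
    labeled += confident                  # adopt only confident pseudo-labels
    unlabeled = [x for x, _ in rest]

print(sorted(labeled), unlabeled)  # 5.2 stays unlabeled: too ambiguous
```

Two labels turned into six, while the genuinely ambiguous point was left alone.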

Ensemble Learning

Ensemble learning combines multiple models to create a better one. It typically improves accuracy and reduces overfitting.

Bagging (Bootstrap Aggregating)

How Bagging Works
  1. Create random subsets of the training data (with replacement)
  2. Train a separate model on each subset independently (in parallel)
  3. Combine predictions — average for regression, majority vote for classification

Bagging reduces variance (overfitting). The classic example is Random Forest: bagging of decision trees, plus a random subset of features considered at each split.
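The three steps above can be sketched in a few lines: each weak model is a one-threshold decision stump trained on its own bootstrap sample, and the ensemble takes a majority vote (toy 1-D data):

```python
import random
from collections import Counter

random.seed(0)

def fit_stump(sample):
    """Find the cutoff that best separates labels 0 and 1 in this sample."""
    best_cut, best_correct = None, -1
    for cut, _ in sample:
        correct = sum((x >= cut) == y for x, y in sample)
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return best_cut

data = [(1, 0), (2, 0), (3, 0), (7, 1), (8, 1), (9, 1)]  # (feature, label)

# Steps 1 and 2: bootstrap samples (with replacement), one independent stump each.
stumps = [fit_stump([random.choice(data) for _ in data]) for _ in range(25)]

# Step 3: majority vote across the stumps.
def bagged_predict(x):
    votes = Counter(int(x >= cut) for cut in stumps)
    return votes.most_common(1)[0][0]

print(bagged_predict(2.5), bagged_predict(7.5))
```

Individual stumps vary with their bootstrap samples; the vote averages that variation away, which is exactly the variance reduction bagging is for.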

Boosting

How Boosting Works
  1. Train a weak model (e.g., a small decision tree)
  2. Identify the errors made by that model
  3. Train the next model to specifically correct those errors
  4. Repeat sequentially — each new model focuses on the hardest cases

Boosting reduces bias and builds strong predictive models.

Bagging trains models in parallel (independent). Boosting trains models sequentially (each depends on previous errors). This is the fundamental difference.
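A minimal boosting sketch for regression, in the spirit of gradient boosting: each round fits a decision stump to the current residuals and adds a shrunken copy of it to the ensemble (toy data; real libraries like XGBoost add regularization, second-order steps, and much more):

```python
def fit_stump(xs, rs):
    """Best single-threshold split: predict the mean residual on each side."""
    best = None
    for cut in xs:
        left  = [r for x, r in zip(xs, rs) if x < cut]
        right = [r for x, r in zip(xs, rs) if x >= cut]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - (lmean if x < cut else rmean)) ** 2
                  for x, r in zip(xs, rs))
        if best is None or err < best[0]:
            best = (err, cut, lmean, rmean)
    _, cut, lmean, rmean = best
    return lambda x: lmean if x < cut else rmean

xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 1.1, 3.0, 3.2, 3.1]
models, lr = [], 0.5                  # lr shrinks each model's contribution
pred = [0.0] * len(xs)
for _ in range(30):                   # sequential: each round targets the
    residuals = [y - p for y, p in zip(ys, pred)]   # current errors
    stump = fit_stump(xs, residuals)
    models.append(stump)
    pred = [p + lr * stump(x) for p, x in zip(pred, xs)]

def boosted(x):
    return sum(lr * m(x) for m in models)

print(round(boosted(2), 2), round(boosted(5), 2))  # close to 1.2 and 3.2
```

Contrast with the bagging sketch: here each stump depends on all the stumps trained before it, because it is fit to the ensemble's remaining errors.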

When to Use What

  • Predict a number (price, temperature) → Regression (supervised)
  • Predict a category (spam / not spam) → Classification (supervised)
  • Find groups in data → Clustering (unsupervised)
  • Reduce features for visualization → Dimensionality Reduction
  • Train an agent (game, robot) → Reinforcement Learning
  • High-variance model (overfitting) → Bagging (Random Forest)
  • High-bias model (underfitting) → Boosting (XGBoost, LightGBM)
