Core ML concepts: types of learning, ensemble methods, and where each algorithm fits.
Machine learning (ML) is a branch of artificial intelligence in which systems learn patterns from data instead of being explicitly programmed. Performance typically improves as more data is provided.
The key idea: instead of writing rules by hand, you feed data to an algorithm and let it discover the rules itself.
**Supervised learning.** You train the model on labeled data (input + correct output). Two main categories:
- **Regression**: predicting continuous values.
- **Classification**: predicting categories/labels.
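A minimal regression sketch, with made-up numbers: fit a line y = a·x + b to labeled (input, output) pairs using the closed-form least-squares solution. All data and names here are illustrative.

```python
# Supervised regression sketch: fit y = a*x + b by least squares
# on a small hypothetical labeled dataset (pure Python).

def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a, b

# Labeled data: inputs with their correct outputs (roughly y = 2x).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.0, 8.1]
a, b = fit_line(xs, ys)
print(round(a, 2), round(b, 2))
```

The model "learns the rule" (slope and intercept) from examples rather than from hand-written logic; classification works the same way, with categorical instead of continuous outputs.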
**Unsupervised learning.** Train on unlabeled data (no known outputs). Two main tasks:
- **Clustering**: grouping similar items.
- **Dimensionality reduction**: reducing the number of features.
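Clustering can be sketched in a few lines with 1-D k-means (k = 2) on hypothetical points; the data and starting centroids are made up for illustration.

```python
# Unsupervised clustering sketch: 1-D k-means with two centroids.
# No labels are given; the algorithm finds the groups on its own.

def kmeans_1d(points, c1, c2, iters=10):
    for _ in range(iters):
        # assignment step: each point joins its nearest centroid
        g1 = [p for p in points if abs(p - c1) <= abs(p - c2)]
        g2 = [p for p in points if abs(p - c1) > abs(p - c2)]
        # update step: centroids move to the mean of their group
        c1 = sum(g1) / len(g1)
        c2 = sum(g2) / len(g2)
    return c1, c2

points = [1.0, 1.2, 0.8, 8.0, 8.3, 7.7]  # two obvious groups
c1, c2 = kmeans_1d(points, 0.0, 10.0)
print(c1, c2)  # centroids settle near the two group means
```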
**Reinforcement learning.** An agent learns by taking actions in an environment and receiving rewards, gradually improving its decision policy.
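The action/reward loop can be sketched with tabular Q-learning on a hypothetical 4-state corridor: the agent starts at state 0, action 1 moves right, action 0 moves left, and reaching state 3 gives reward 1. The environment and hyperparameters are illustrative.

```python
import random

# Reinforcement-learning sketch: tabular Q-learning on a tiny corridor.
N_STATES, ACTIONS = 4, (0, 1)
alpha, gamma, eps = 0.5, 0.9, 0.2   # learning rate, discount, exploration
Q = [[0.0, 0.0] for _ in range(N_STATES)]
random.seed(0)

def step(s, a):
    s2 = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else 0.0), done   # reward only at the goal

for _ in range(200):                          # episodes
    s = 0
    while True:
        # epsilon-greedy: mostly exploit the best known action, sometimes explore
        a = random.choice(ACTIONS) if random.random() < eps else Q[s].index(max(Q[s]))
        s2, r, done = step(s, a)
        # Q-learning update from the observed reward and next state
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2
        if done:
            break

# Learned policy: move right in every non-terminal state.
print([q.index(max(q)) for q in Q[:3]])
```

Note that no labeled examples are involved: the agent discovers the right behavior purely from the reward signal.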
**Semi-supervised learning.** Uses a small amount of labeled data plus a large amount of unlabeled data. Useful when labeling is expensive.
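One common semi-supervised recipe is self-training, sketched here on made-up 1-D data: fit a simple threshold classifier on the few labeled points, pseudo-label the unlabeled points the model is confident about, then refit. The margin value and all numbers are illustrative.

```python
# Semi-supervised "self-training" sketch on hypothetical 1-D data.

def fit_threshold(labeled):
    neg = [x for x, y in labeled if not y]
    pos = [x for x, y in labeled if y]
    return (max(neg) + min(pos)) / 2     # midpoint between the two classes

labeled = [(1.0, False), (4.0, True)]    # expensive hand labels
unlabeled = [0.8, 1.2, 3.6, 4.2]         # cheap unlabeled data

t = fit_threshold(labeled)               # initial threshold from labels alone
# pseudo-label only points far from the boundary (confidence margin of 1.0)
for x in unlabeled:
    if abs(x - t) > 1.0:
        labeled.append((x, x > t))
t = fit_threshold(labeled)               # refit on labels + pseudo-labels
print(t)
```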
Ensemble learning combines multiple models to create a better one. It typically improves accuracy and reduces overfitting.
Bagging reduces variance (overfitting). The classic example is Random Forest = bagging of decision trees.
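Bagging can be sketched with depth-1 "decision stumps" (single-threshold classifiers, a stand-in for the decision trees in a Random Forest) trained on bootstrap resamples of a toy 1-D dataset and combined by majority vote. Data and helper names are illustrative.

```python
import random

# Bagging sketch: bootstrap-sample the data, fit one weak model per sample,
# aggregate by majority vote. Models are trained independently (in parallel).

def fit_stump(sample):
    # choose the threshold (among the sample's x values) with most correct labels
    return max((x for x, _ in sample),
               key=lambda t: sum((x > t) == y for x, y in sample))

def bagged_predict(stumps, x):
    votes = sum(x > t for t in stumps)
    return votes * 2 > len(stumps)       # majority vote of all stumps

random.seed(0)
data = [(0.5, False), (1.0, False), (3.0, True), (3.5, True)]
# bootstrap: sample with replacement, one stump per resample
stumps = [fit_stump([random.choice(data) for _ in data]) for _ in range(25)]
print(bagged_predict(stumps, 0.2), bagged_predict(stumps, 4.5))  # → False True
```

Averaging many high-variance models trained on different resamples is what smooths out the overfitting of any single one.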
Boosting reduces bias: weak models are trained sequentially, each one focusing on the errors of its predecessors, so the combined model grows strong. Classic examples are AdaBoost and gradient boosting (XGBoost, LightGBM).
Bagging trains models in parallel (independent). Boosting trains models sequentially (each depends on previous errors). This is the fundamental difference.
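That sequential dependence can be sketched as tiny L2 gradient boosting on a toy 1-D regression problem: each stage fits a regression stump to the current residuals, so every stump depends on the errors left by the ones before it. Data and names are illustrative.

```python
# Boosting sketch: L2 gradient boosting with regression stumps.

def fit_stump(xs, rs):
    # best single split minimizing squared error of the left/right means
    best = None
    for t in xs:
        left = [r for x, r in zip(xs, rs) if x <= t]
        right = [r for x, r in zip(xs, rs) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost(xs, ys, n_stages=20, lr=0.5):
    pred = [0.0] * len(xs)
    stumps = []
    for _ in range(n_stages):
        residuals = [y - p for y, p in zip(ys, pred)]   # errors so far
        stump = fit_stump(xs, residuals)                # fit the errors
        stumps.append(stump)
        pred = [p + lr * stump(x) for p, x in zip(pred, xs)]
    return lambda x: sum(lr * s(x) for s in stumps)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 1.0, 3.0, 3.0]            # a step function
model = boost(xs, ys)
print(round(model(1.5), 2), round(model(3.5), 2))  # → 1.0 3.0
```

Each stage only corrects what remains of the error, which is exactly why the stages cannot be trained in parallel the way bagged models can.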
| Scenario | Approach |
|---|---|
| Predict a number (price, temperature) | Regression (supervised) |
| Predict a category (spam/not spam) | Classification (supervised) |
| Find groups in data | Clustering (unsupervised) |
| Reduce features for visualization | Dimensionality Reduction |
| Train an agent (game, robot) | Reinforcement Learning |
| High-variance model (overfitting) | Bagging (Random Forest) |
| High-bias model (underfitting) | Boosting (XGBoost, LightGBM) |