Chapter 6 Machine Learning Models
References: Stanford CS229
6.1 Introduction
6.1.1 What is Machine Learning
General Definition:
> [Machine Learning is the] field of study that gives computers the ability to learn without being explicitly programmed.
> <footer>--- Arthur Samuel, 1959</footer>
Engineering-oriented Definition:
> A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
> <footer>--- Tom Mitchell, 1997</footer>
6.1.2 Why do we use Machine Learning?
Machine learning techniques are great for:
- Problems whose existing solutions require a lot of fine-tuning or a long list of rules: one ML algorithm can often simplify code and perform better than the traditional approach.
- Complex problems for which the traditional approach yields no good solution.
- Fluctuating environments: an ML system can adapt to new data.
- Getting insights about complex problems and large amounts of data (data mining).
6.1.3 Examples of ML applications in industries
| ML technique | Example problems |
|---|---|
| Classification | Analyzing images of products on a production line to classify them automatically (CNNs); classifying news articles (NLP + RNNs/CNNs/Transformers); automatically flagging offensive comments on discussion forums (text classification with NLP tools); market segmentation / social network analysis |
| Regression | Forecasting company revenue based on performance metrics (Linear/Polynomial Regression, a regression SVM, a regression Random Forest, or an RNN) |
| Clustering | Segmenting clients based on their purchases to derive a marketing strategy for each segment |
| Anomaly detection | Detecting credit card fraud |
| Recommender system | Recommending a product that a client may be interested in, based on past purchases |
6.1.4 Types of Machine Learning Systems
Criteria:
- Trained with human supervision? —> supervised, unsupervised, semisupervised, or Reinforcement Learning
- Can learn incrementally on the fly? —> online vs. batch learning
- Works by simply comparing new data points to known data points, or by detecting patterns in the training data and building a predictive model? —> instance-based vs. model-based learning (a minimal sketch contrasting the two follows this list)
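A minimal sketch contrasting the two approaches, assuming NumPy and scikit-learn are available (neither library is named in these notes) and using toy data invented for illustration: the instance-based k-NN model predicts by averaging the closest known points, while the model-based linear regression fits parameters and predicts from the model alone.

```python
# Instance-based vs. model-based learning on toy 1-D data (assumed libraries).
import numpy as np
from sklearn.neighbors import KNeighborsRegressor   # instance-based
from sklearn.linear_model import LinearRegression   # model-based

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))                  # toy 1-D inputs
y = 2.0 * X.ravel() + rng.normal(scale=1.0, size=50)  # noisy linear target

# Instance-based: prediction = average target of the k closest known points.
knn = KNeighborsRegressor(n_neighbors=3).fit(X, y)

# Model-based: fit the parameters of a model, then predict from the model alone.
lin = LinearRegression().fit(X, y)

x_new = np.array([[4.2]])
print(knn.predict(x_new), lin.predict(x_new))         # both should be near 8.4
```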
6.1.4.1 Supervised Learning
The most important Supervised Learning algorithms (a minimal example follows this list):
- Regression
  - Linear Regression
  - Logistic Regression (despite its name, commonly used for classification)
- Classification
  - k-Nearest Neighbors
  - Support Vector Machines (SVMs)
  - Decision Trees and Random Forests
  - Neural Networks
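A minimal supervised-learning sketch, assuming scikit-learn and its built-in iris dataset (assumptions; the notes only list the algorithms): the classifier learns from labeled \((x, y)\) pairs and is evaluated on held-out labeled data.

```python
# Supervised learning: fit a Decision Tree on labeled data (assumed library/dataset).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)                     # labeled data: features X, targets y
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(X_train, y_train)                             # learn from (x, y) pairs
print(clf.score(X_test, y_test))                      # accuracy on held-out labeled data
```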
6.1.4.2 Unsupervised Learning
—> unlabeled data (only \(x\), no \(y\))

The most important Unsupervised Learning algorithms:
- Clustering —> e.g., run a clustering algorithm to detect groups of similar observations; we do not need to tell the algorithm which group an observation belongs to: it finds those connections without human supervision (see the sketch after this list).
  - K-Means
  - DBSCAN
  - Hierarchical Cluster Analysis
- Anomaly detection and novelty detection
  - One-class SVM
  - Isolation Forest
- Visualization and dimensionality reduction —> visualization algorithms take complex, unlabeled data and output a 2D or 3D representation that can easily be plotted. This helps us understand how the data is organized and perhaps identify unsuspected patterns. Dimensionality reduction aims at simplifying the data without losing too much information, for example by merging correlated features into one (feature extraction).
  - Principal Component Analysis (PCA)
  - Kernel PCA
  - Locally Linear Embedding (LLE)
  - t-Distributed Stochastic Neighbor Embedding (t-SNE)
- Association rule learning —> dig into large amounts of data and discover interesting relations between attributes.
  - Apriori
  - Eclat
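A minimal unsupervised-learning sketch, again assuming scikit-learn and the iris data (assumptions): K-Means groups the observations and PCA reduces them to 2D for plotting, with no labels \(y\) involved at any point.

```python
# Unsupervised learning: clustering and dimensionality reduction (assumed libraries).
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)             # pretend the labels do not exist

clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
X_2d = PCA(n_components=2).fit_transform(X)   # dimensionality reduction to 2-D

print(clusters[:10])                          # cluster index assigned to each observation
print(X_2d[:3])                               # first few points in the reduced space
```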
6.1.4.3 Semisupervised Learning
Since labeling data is usually time-consuming and costly, we will often have plenty of unlabeled instances and only a few labeled instances. Some algorithms can deal with data that is partially labeled.
Most semisupervised learning algorithms are combinations of unsupervised and supervised algorithms. For example, deep belief networks (DBNs) are based on unsupervised components called restricted Boltzmann machines (RBMs) stacked on top of one another. RBMs are trained sequentially in an unsupervised manner, and then the whole system is fine-tuned using supervised learning techniques.
6.1.4.4 Reinforcement Learning
In RL, the learning system, called an agent in this context, can observe the environment, select and perform actions, and get rewards in return (or penalties in the form of negative rewards). It must then learn by itself what the best strategy, called a policy, is to get the most reward over time. A policy defines what action the agent should choose when it is in a given situation.
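A minimal sketch of the agent/policy/reward loop, using tabular Q-learning on a toy 1-D "corridor" environment invented here for illustration (the notes do not prescribe any particular RL algorithm):

```python
# Tabular Q-learning on a toy environment: states 0..4, reward +1 at state 4.
import random

N_STATES, ACTIONS = 5, [-1, +1]          # -1 = move left, +1 = move right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.1, 0.9, 0.1    # learning rate, discount, exploration rate

for episode in range(500):
    s = 0
    while s != N_STATES - 1:             # episode ends at the rightmost (goal) state
        # epsilon-greedy policy: mostly exploit the current Q-values, sometimes explore
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s_next == N_STATES - 1 else 0.0   # reward only at the goal
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        best_next = max(Q[(s_next, act)] for act in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s_next

# The learned policy: pick the action with the highest Q-value in each state.
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)}
print(policy)   # expected: every non-terminal state maps to +1 (move toward the reward)
```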
6.1.4.5 Batch and Online Learning
Batch Learning
In batch learning, the system is incapable of learning incrementally: it must be trained using all the available data.
Online Learning
In online learning, you train the system incrementally by feeding it data instances sequentially, either individually or in small groups called mini-batches. Each learning step is fast and cheap, so the system can learn about new data on the fly, as it arrives.
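A minimal online-learning sketch, assuming scikit-learn's SGDRegressor and its partial_fit method (an assumption; the notes do not name a specific library): each call is one cheap learning step on a freshly arrived mini-batch of invented data.

```python
# Online learning: incremental training on mini-batches as they arrive (assumed library).
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
model = SGDRegressor(learning_rate="constant", eta0=0.01)

# Data arrives in mini-batches; each partial_fit call is one fast, cheap learning step.
for _ in range(100):
    X_batch = rng.uniform(0, 10, size=(32, 1))
    y_batch = 3.0 * X_batch.ravel() + rng.normal(scale=0.5, size=32)
    model.partial_fit(X_batch, y_batch)

print(model.coef_, model.intercept_)     # coefficient should end up close to 3.0
```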
6.2 Supervised Learning
Terminology:
- Features: \(x^{(i)}\) denotes the “input” variables
- Target: \(y^{(i)}\) denotes the “output” (i.e. the outcome to predict)
- Training example: a pair \((x^{(i)}, y^{(i)})\)
- Training set: a list of \(n\) training examples \(\{(x^{(i)}, y^{(i)});\ i = 1, \dots, n\}\)
- Spaces: \(\mathcal{X}\) denotes the space of input values and \(\mathcal{Y}\) the space of output values. Usually, \(\mathcal{X} = \mathcal{Y} = \mathbb{R}\).
6.2.0.1 Formal description of the supervised learning problem
Given a training set, the goal is to learn a function \(h: \mathcal{X} \mapsto \mathcal{Y}\) so that \(h(x)\) is a “good” predictor for the corresponding value of \(y\). For historical reasons, this function \(h\) is called a hypothesis.
Continuous target —> regression; discrete target —> classification
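For instance, a common concrete choice of hypothesis (the linear model from the CS229 linear regression lectures) is

\[
h_\theta(x) = \theta_0 + \theta_1 x_1 + \dots + \theta_d x_d = \theta^\top x \quad (\text{taking } x_0 = 1),
\]

where the parameters \(\theta\) are chosen so that \(h_\theta(x^{(i)})\) is close to \(y^{(i)}\) on the training set.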