# Tools

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lukeconibear/intro_ml/blob/main/docs/02_tools.ipynb)

In [None]:
# if you're using colab, then install the required modules
import sys

IN_COLAB = "google.colab" in sys.modules
if IN_COLAB:
    %pip install --quiet --upgrade pytorch-lightning

## Overview

There is a huge variety of machine learning and deep learning tools.

In this course, we'll focus on:

- [scikit-learn](scikit-learn)
- [TensorFlow](tensorflow)
- [PyTorch](pytorch)

The tool you choose depends on many considerations, for example:

- Your research problem
- Model availability (e.g., pre-trained, state-of-the-art)
- Ecosystem (e.g., compatibility with other tools)
- Personal preferences
- Deployment (e.g., hardware)

There are many discussions of the different choices, e.g., [1](https://www.assemblyai.com/blog/pytorch-vs-tensorflow-in-2022/), [2](https://keras.io/why_keras/).

(scikit-learn)=
### [scikit-learn](https://scikit-learn.org/stable/)

Scikit-learn has a wide range of simple and efficient classic machine learning tools.  

- [Documentation](https://scikit-learn.org/stable/user_guide.html)
- [Tutorials](https://scikit-learn.org/stable/tutorial/index.html)
- [Examples](https://scikit-learn.org/stable/auto_examples/index.html)

It includes tools for:

- [Linear Models](https://scikit-learn.org/stable/modules/linear_model.html) ([examples](https://scikit-learn.org/stable/auto_examples/index.html#generalized-linear-models))
    - A set of methods where the output is a linear combination of the inputs.
    - For example, fitting a straight line to the data using [Linear Regression](https://scikit-learn.org/stable/modules/linear_model.html#ordinary-least-squares) (also known as ordinary least squares).
- [Nearest Neighbours](https://scikit-learn.org/stable/modules/neighbors.html) ([examples](https://scikit-learn.org/stable/auto_examples/index.html#nearest-neighbors))
    - Find a (pre-defined) number of training samples closest in distance to the new point, and predict the label from these.
    - The number of samples can be defined in different ways.
    - There are various measures of distance.
    - For example, classifying labels based on their closeness to other samples in [Nearest Neighbor Classification](https://scikit-learn.org/stable/modules/neighbors.html#nearest-neighbors-classification).
- [Support Vector Machines](https://scikit-learn.org/stable/modules/svm.html) ([examples](https://scikit-learn.org/stable/auto_examples/index.html#support-vector-machines))
    - Place a _decision boundary_ between data points to classify, regress, or find outliers. The _support vectors_ are the training samples closest to this boundary.
    - For example, find the two hardest to categorise samples and place a decision boundary between them in [Support Vector Classification](https://scikit-learn.org/stable/modules/svm.html#classification).
- [Decision trees](https://scikit-learn.org/stable/modules/tree.html) ([examples](https://scikit-learn.org/stable/auto_examples/index.html#decision-trees))
    - Predict the value of a target variable by learning simple _decision_ rules inferred from the data features.
    - Many decisions are grouped together into a _tree_.
    - For example, many decision trees together in an ensemble is a [Random Forest](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html).
- And [many more](https://scikit-learn.org/stable/index.html#).
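
As a rough illustration of the nearest neighbours idea from the list above, here is a minimal NumPy sketch. The function name `knn_predict` and the toy data are made up for this example, and it assumes Euclidean distance with a simple majority vote. It is not how scikit-learn implements it.

```python
import numpy as np


def knn_predict(x_train, y_train, x_new, k=3):
    """Predict a label for x_new by majority vote among its k nearest training samples."""
    # Euclidean distance from the new point to every training sample
    distances = np.linalg.norm(x_train - x_new, axis=1)
    # indices of the k closest training samples
    nearest = np.argsort(distances)[:k]
    # majority vote over the labels of those samples
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]


# toy data (made up for illustration): two clusters with labels 0 and 1
x_toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(x_toy, y_toy, np.array([0.5, 0.5])))
print(knn_predict(x_toy, y_toy, np.array([5.5, 5.5])))
```

Changing `k` or the distance measure changes which samples get a vote, which is what the options in scikit-learn's nearest neighbours tools control.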

(tensorflow)=
### [TensorFlow](https://www.tensorflow.org/)

TensorFlow is an end-to-end open source machine learning platform.

- [Documentation](https://www.tensorflow.org/guide)
- [Tutorials](https://www.tensorflow.org/tutorials)
- [Examples](https://keras.io/examples/)

TensorFlow has a user-friendly, high-level API (Application Programming Interface) called [Keras](https://keras.io/).

Keras includes a wide range of high-level objects ([tutorials](https://keras.io/guides/)) including:

- [Models](https://keras.io/api/models/)
- [Layers](https://keras.io/api/layers/)
    - [Activations](https://keras.io/api/layers/activations) e.g., `tf.keras.activations.sigmoid`
    - [Regularisers](https://keras.io/api/layers/regularizers/) e.g., `tf.keras.regularizers.l2`
    - [Convolutional](https://keras.io/api/layers/convolution_layers/) e.g., `tf.keras.layers.Conv2D`
    - [Recurrent](https://keras.io/api/layers/recurrent_layers/) e.g., `tf.keras.layers.LSTM` (long short-term memory)
    - [Preprocessing](https://keras.io/api/layers/preprocessing_layers/) e.g., `tf.keras.layers.Normalization`
- [Optimisers](https://keras.io/api/optimizers/) e.g., `tf.keras.optimizers.Adam`
- [Losses](https://keras.io/api/losses/) e.g., `tf.keras.losses.MeanSquaredError`
- [Metrics](https://keras.io/api/metrics/) e.g., `tf.keras.metrics.Accuracy`

You can always go lower level when required (e.g., custom objects).

Through Keras and TensorFlow you create models and layers using any of the [following APIs](https://blog.tensorflow.org/2019/01/what-are-symbolic-and-imperative-apis.html):

| | [Sequential](https://keras.io/guides/sequential_model/) | [Functional](https://keras.io/guides/functional_api/) | [Subclassing](https://www.tensorflow.org/guide/keras/custom_layers_and_models/) |
| --- | --- | --- | --- |
| Data structure | Graph: Linear stack of layers. | Graph: Non-linear DAG (directed acyclic graph) of layers. | Object-orientated. Write the forward pass (backward pass is automatic). | 
| Shared layers and multiple inputs/outputs | No. Each layer has one input and one output. | Yes. Each layer can have multiple inputs and outputs. | Yes. |
| Main benefits and drawbacks | Simplest, (re)usability (easily saved), model checks to catch errors early, static. | Similar to sequential, but more flexible. | Maximum flexibility, no model checks, more complex, dynamic. |
| Show model graph? | Yes. | Yes. | Can add via the guidance [here](https://github.com/tensorflow/tensorflow/issues/31647#issuecomment-692586409). |

There are many [libraries and extensions](https://www.tensorflow.org/resources/libraries-extensions) including:

- [TensorFlow Extended](https://www.tensorflow.org/tfx) for deployment.
- [TensorFlow Lite](https://www.tensorflow.org/lite/guide) for mobile and IoT (internet of things) devices.
- [TensorBoard](https://www.tensorflow.org/tensorboard) for visualising the experiment results.
- [And many more (including projects, papers, and experiments)](https://github.com/jtoy/awesome-tensorflow).

(pytorch)=
### [PyTorch](https://pytorch.org/)

PyTorch is an end-to-end open source machine learning platform.

- [Documentation](https://pytorch.org/docs/stable/index.html)
- [Tutorials](https://pytorch.org/tutorials/)

PyTorch has user-friendly APIs:

- [PyTorch Lightning](https://pytorch-lightning.readthedocs.io/en/latest/)
    - High-level.
    - Helps write boilerplate code, scale out to multiple devices, and other helpful things.
    - [Tutorials](https://pytorchlightning.github.io/lightning-tutorials/index.html).
- [PyTorch Lightning-Flash](https://lightning-flash.readthedocs.io/en/latest/quickstart.html)
    - Even higher-level.
    - Abstractions above PyTorch Lightning for fast prototyping.

PyTorch (and its extensions) include a wide range of high-level objects including:

- [Models](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module)
- [Layers](https://pytorch.org/docs/stable/nn.html)
    - [Activations](https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity) e.g., `torch.nn.Sigmoid`
    - [Regularisers](https://pytorch.org/docs/stable/nn.html#dropout-layers) e.g., `torch.nn.Dropout`
    - [Convolutional](https://pytorch.org/docs/stable/nn.html#convolution-layers) e.g., `torch.nn.Conv2d`
    - [Recurrent](https://pytorch.org/docs/stable/nn.html#recurrent-layers) e.g., `torch.nn.LSTM`
    - [Preprocessing](https://pytorch.org/vision/stable/transforms.html#compositions-of-transforms) e.g., `torchvision.transforms.Normalize`
- [Optimisers](https://pytorch.org/docs/stable/optim.html) e.g., `torch.optim.Adam`
- [Losses](https://pytorch.org/docs/stable/nn.html#loss-functions) e.g., `torch.nn.MSELoss`
- [Metrics](https://torchmetrics.readthedocs.io/en/latest/) e.g., `torchmetrics.Accuracy`

You can always go lower level when required (e.g., custom objects).

Similar to TensorFlow/Keras, you can create models and layers in PyTorch using either Sequential or Subclassing APIs (or in combination). These have similar features to the table above, where the Sequential API is simpler and the Subclassing API enables flexibility.

There are many [libraries and extensions](https://pytorch.org/ecosystem/) including:

- [TorchServe](https://pytorch.org/serve/) for deployment.
- [PyTorch Live](https://pytorch.org/live/) for mobile and IoT devices.
- [And many more (including projects, papers, and experiments)](https://github.com/bharathgs/Awesome-pytorch-list)

## Example - Linear regression

Let's start with an introductory example: fitting a straight line to data.

Don't worry too much about some of the details, as we'll cover them in later lessons.

For now, focus on the general workflow.

We'll see how this is done in each of the three key tools we cover here: scikit-learn, TensorFlow, and PyTorch.

Let's create some (noisy) data to train on:

In [None]:
import numpy as np

In [None]:
def create_noisy_linear_data(num_points):
    x = np.arange(num_points)
    noise = np.random.normal(0, 1, num_points)
    y = 2 * x + noise
    # convert to 2D arrays
    x, y = x.reshape(-1, 1), y.reshape(-1, 1)
    return x, y

In [None]:
x_train, y_train = create_noisy_linear_data(10)

```{caution} 

Input arrays to models need to be two-dimensional (2D), i.e., a column of rows.

For example, instead of one row:

`>>> np.arange(10)`  
`array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])`  

Convert this to a column of rows using `.reshape(-1, 1)`:  

`>>> np.arange(10).reshape(-1, 1)`  
`array([[0],`  
`       [1],`  
`       [2],`  
`       [3],`  
`       [4],`  
`       [5],`  
`       [6],`  
`       [7],`  
`       [8],`  
`       [9]])`  

```
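
To check this for yourself, a quick sketch comparing the shapes before and after the reshape. The `np.newaxis` indexing is shown only as an equivalent alternative; the variable names are just for this example.

```python
import numpy as np

x_1d = np.arange(10)
x_2d = x_1d.reshape(-1, 1)  # -1 lets NumPy infer the number of rows
x_2d_alt = x_1d[:, np.newaxis]  # equivalent, using indexing

print(x_1d.shape)  # (10,)
print(x_2d.shape)  # (10, 1)
print(np.array_equal(x_2d, x_2d_alt))  # True
```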

### scikit-learn

First, let's try with [scikit-learn](https://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html):

In [None]:
from sklearn import linear_model

In [None]:
model_sklearn = linear_model.LinearRegression()

When `fit` is called for Linear Regression, the model minimises a _loss_: the _mean squared error_ between the predictions and the actual values.

This determines what parameters the model learns.
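
To make the idea of minimising the mean squared error concrete, here is a minimal NumPy sketch that solves the same least-squares problem directly with `np.linalg.lstsq`. The seed and variable names are arbitrary choices for this example, and scikit-learn's own implementation may differ in its details.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily for reproducibility
x = np.arange(10, dtype=float)
y = 2 * x + rng.normal(0, 1, size=10)

# design matrix with a column of ones for the intercept
A = np.column_stack([x, np.ones_like(x)])

# least-squares solution: the slope and intercept that minimise the squared error
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)

mse = np.mean((A @ np.array([slope, intercept]) - y) ** 2)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, MSE = {mse:.3f}")
```

Any other choice of slope and intercept (for example, exactly 2 and 0) gives a mean squared error on this data that is at least as large.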

In [None]:
model_sklearn.fit(x_train, y_train)

The data was from the line `y = 2x`, so the gradient was 2.

Let's see what the model estimated it to be:

In [None]:
model_sklearn.coef_[0]

Pretty close, considering there were only 10 training data points.

### TensorFlow

Now, for **TensorFlow**:

In [None]:
import tensorflow as tf

Create the model (using the simpler sequential API).

Note, it's helpful to name the layers in the model.

In [None]:
model_tf = tf.keras.Sequential(
    [
        tf.keras.Input(shape=(1,), name="inputs"),
        tf.keras.layers.Dense(units=1, name="outputs"),
    ],
    name="sequential",
)

For reference, here's what this would have looked like using the functional and subclassing APIs:

In [None]:
inputs = tf.keras.Input(shape=(1,), name="inputs")
outputs = tf.keras.layers.Dense(units=1, name="outputs")(inputs)
model_tf_functional = tf.keras.Model(inputs, outputs, name="functional")

In [None]:
class MyModel(tf.keras.Model):
    def __init__(self, **kwargs):
        super(MyModel, self).__init__(**kwargs)  # handles standard arguments e.g., name
        self.outputs = tf.keras.layers.Dense(units=1, name="outputs")

    def call(self, inputs):  # inputs are passed to call(), rather than defined with tf.keras.Input
        x = self.outputs(inputs)
        return x


model_tf_subclassing = MyModel(name="subclassing")

You can now show the model summary.

Note, this only shows layers (not the `Input` object).

In [None]:
model_tf.summary()

You can also show the model graph:

In [None]:
tf.keras.utils.plot_model(model_tf, show_shapes=True)

Now, compile the model.

The keyword arguments to `optimizer`, `loss`, and `metrics` can either be strings (e.g., `mean_squared_error`) or TensorFlow objects (e.g., `tf.keras.losses.MeanSquaredError()`)

In [None]:
model_tf.compile(
    optimizer="sgd",
    loss="mean_squared_error",
    metrics=["mean_absolute_error"],  # accuracy is not meaningful for regression
)

And, train the model.

[Epochs](https://developers.google.com/machine-learning/glossary/#epoch) are how many passes over the whole training set.
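
As a sketch of what passes over the whole training set means in practice, here is a plain NumPy gradient descent loop for a single weight. The batch size, learning rate, seed, and the lack of a bias term are all assumptions made to keep the example short. This is not how Keras is implemented internally.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
x = np.arange(10, dtype=float)
y = 2 * x + rng.normal(0, 1, size=10)

w = 0.0  # single weight (no bias, to keep the sketch short)
learning_rate = 0.01
batch_size = 5
num_epochs = 10
num_updates = 0

for epoch in range(num_epochs):
    # one epoch = one full pass over the training set, in batches
    for start in range(0, len(x), batch_size):
        x_batch = x[start : start + batch_size]
        y_batch = y[start : start + batch_size]
        # gradient of the mean squared error with respect to w
        gradient = 2 * np.mean((w * x_batch - y_batch) * x_batch)
        w -= learning_rate * gradient
        num_updates += 1

print(f"w after {num_epochs} epochs ({num_updates} updates): {w:.3f}")
```

With 10 samples and a batch size of 5, each epoch makes two updates to the weight.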

In [None]:
model_tf.fit(
    x_train,
    y_train,
    epochs=10,
    verbose=False,  # set to True to print out the metrics per epoch
);

And, let's see what this model thought the gradient was:

In [None]:
model_tf.weights[0].numpy()

### PyTorch

In [None]:
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

[Create the dataset and dataloader](https://pytorch.org/tutorials/beginner/introyt/trainingyt.html#dataset-and-dataloader):

In [None]:
x_train_tensor = torch.from_numpy(x_train).type(torch.float32)
y_train_tensor = torch.from_numpy(y_train).type(torch.float32)

In [None]:
ds_train = TensorDataset(x_train_tensor, y_train_tensor)

In [None]:
dataloader_train = DataLoader(ds_train)

[Create the model](https://pytorch.org/tutorials/beginner/introyt/trainingyt.html#the-model) (using the simpler sequential API):

In [None]:
model_torch = nn.Sequential(nn.Linear(in_features=1, out_features=1))

In [None]:
print(model_torch)

For reference, here's what this would have looked like using the subclassing API:

In [None]:
class NeuralNetwork(nn.Module):
    def __init__(self):  # model definition
        super(NeuralNetwork, self).__init__()  # instantiate the nn.Module
        self.outputs = nn.Linear(in_features=1, out_features=1)

    def forward(self, x):  # the computations for the forward pass, not called directly
        logits = self.outputs(x)
        return logits


model_torch_subclassing = NeuralNetwork()
print(model_torch_subclassing)

```{note}
The backward propagation is calculated automatically, though you can do it manually if you like.
```

Define the [loss](https://pytorch.org/tutorials/beginner/introyt/trainingyt.html#loss-function) and [optimiser](https://pytorch.org/tutorials/beginner/introyt/trainingyt.html#optimizer):

In [None]:
loss_function = nn.MSELoss()
optimiser = torch.optim.SGD(model_torch.parameters(), lr=1e-3)

Define a [single training step](https://pytorch.org/tutorials/beginner/introyt/trainingyt.html#the-training-loop):

In [None]:
def train(dataloader, model, loss_function, optimiser):
    size = len(dataloader.dataset)
    model.train()  # set the model in training mode, rather than in evaluation mode i.e., `model.eval()`

    # for each batch of data
    for batch, (X, y) in enumerate(dataloader):

        # step 1: make a prediction for these inputs
        prediction = model(X)

        # step 2: compute the loss for that prediction
        loss = loss_function(prediction, y)

        # step 3: first, clean the gradients
        optimiser.zero_grad()

        # step 4: backpropagate the gradients for that loss
        loss.backward()

        # step 5: update the parameters accordingly
        optimiser.step()

Note, that testing doesn't need the gradients (i.e., steps 3-5).  

Hence, the test function would look something like:

```python
def test(dataloader, model, loss_function):
    size = len(dataloader.dataset)
    model.eval()  # set the model in evaluation mode
    ...
    
    with torch.no_grad():  # don't track gradients
        for batch, (X, y) in enumerate(dataloader):
            # step 1: make a prediction for these inputs
            prediction = model(X)

            # step 2: compute the loss for that prediction
            loss = loss_function(prediction, y)
            ...
```

We'll see more examples of testing later.

Run the [training step over multiple epochs](https://pytorch.org/tutorials/beginner/introyt/trainingyt.html#per-epoch-activity):

In [None]:
NUM_EPOCHS = 5

for epoch in range(NUM_EPOCHS):
    train(dataloader_train, model_torch, loss_function, optimiser)

And, let's see what this model thought the gradient was:

In [None]:
# to check parameter names
for name, parameter in model_torch.named_parameters():
    print(name)

In [None]:
model_torch[0].weight

Now, we can see how well these models fit a line to the data.

First, grab the predictions of each model (from the training data for plotting purposes).

In [None]:
y_pred_sklearn = model_sklearn.predict(x_train)

In [None]:
y_pred_tf = model_tf.predict(x_train)

In [None]:
y_pred_torch = model_torch(x_train_tensor).detach().numpy()

Then, show these lines on a plot:

In [None]:
import matplotlib.gridspec as gridspec
import matplotlib.pyplot as plt

In [None]:
colors = {"data": "#1b9e77", "sklearn": "#d95f02", "tf": "#7570b3", "torch": "#66a61e"}

In [None]:
def make_plot(ax, y_pred, label, title):
    ax.scatter(x_train, y_train, color=colors["data"])
    ax.plot(x_train, y_pred, color=colors[label], linewidth=3)
    ax.set_title(title)
    ax.set_ylim([0, 18])
    ax.set_xlim([0, 9])
    ax.set_facecolor("whitesmoke")

In [None]:
fig = plt.figure(1, figsize=(12, 4))
ax1, ax2, ax3 = fig.subplots(1, 3)

make_plot(ax1, y_pred_sklearn, "sklearn", "scikit-learn")
make_plot(ax2, y_pred_tf, "tf", "TensorFlow")
make_plot(ax3, y_pred_torch, "torch", "PyTorch")

plt.show()

They all did a good job of fitting a function to the data.

In other words, they found the association in the data.

However, this was a very simple example that probably didn't require machine learning (let alone deep learning).

Though it demonstrates what they do.

Now, let's look at something a little more suitable.

## Example - Digit classification

Let's train a model to recognise handwritten digits using the classic [MNIST dataset](https://en.wikipedia.org/wiki/MNIST_database).

This is a classification task.

### scikit-learn

First, with [scikit-learn](https://scikit-learn.org/stable/auto_examples/classification/plot_digits_classification.html):

In [None]:
from sklearn import datasets, linear_model, metrics, svm
from sklearn.model_selection import train_test_split

#### Load the data

In [None]:
digits = datasets.load_digits()

Take a look at the labelled data:

In [None]:
_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))
for ax, image, label in zip(axes, digits.images, digits.target):
    ax.set_axis_off()
    ax.imshow(image, cmap=plt.cm.gray_r, interpolation="nearest")
    ax.set_title(f"Label: {label}")

#### Preprocess and split the data

In [None]:
def preprocess_data(digits):
    # the data comes as 2D 8x8 pixels
    # flatten the images to 1D 64 pixels
    n_samples = len(digits.images)
    data = digits.images.reshape((n_samples, -1))
    return n_samples, data

In [None]:
n_samples, data = preprocess_data(digits)

In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.5, shuffle=False
)

#### Create a model

Here, we will use a [Support Vector Classifier](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC).

Don't worry about what gamma is for now (if you're interested, read the documentation).

In [None]:
model = linear_model.LogisticRegression()  # a simpler alternative; replaced by the SVC in the next cell

In [None]:
model = svm.SVC(gamma=0.001)

#### Fit the model to the training data

In [None]:
model.fit(X_train, y_train)

#### Use the model to predict the test data

In [None]:
y_pred = model.predict(X_test)

Take a look at the predictions for these test digits:

In [None]:
_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))
for ax, image, prediction in zip(axes, X_test, y_pred):
    ax.set_axis_off()
    image = image.reshape(8, 8)  # 1D 64 pixels to 2D 8*8 pixels for plotting
    ax.imshow(image, cmap=plt.cm.gray_r, interpolation="nearest")
    ax.set_title(f"Prediction: {prediction:.0f}")

Looking good. The predicted labels match the ground truth images.

#### How well did our model do overall?

In [None]:
overall_accuracy = metrics.accuracy_score(y_test, y_pred)
overall_accuracy

97% accuracy is good.

Let's do some quick error analysis using a [confusion matrix](https://scikit-learn.org/stable/modules/model_evaluation.html#confusion-matrix).

This shows how well the classification model did for each category.

The predictions are on the x-axis and the true labels from the test data are on the y-axis.

A perfect score would be where the predictions always match the true labels (i.e., all values are on the diagonal line).
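
To see how such a matrix is put together, here is a minimal sketch that counts the (true label, predicted label) pairs by hand. The helper `confusion_matrix_counts` and the toy labels are made up for this example; scikit-learn's own functions do this for you.

```python
import numpy as np


def confusion_matrix_counts(y_true, y_pred, num_classes):
    """Count how often each true label (row) was predicted as each label (column)."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label, predicted_label] += 1
    return matrix


# toy labels (made up for illustration)
y_true_toy = np.array([0, 0, 1, 1, 2, 2])
y_pred_toy = np.array([0, 0, 1, 2, 2, 2])

matrix = confusion_matrix_counts(y_true_toy, y_pred_toy, num_classes=3)
print(matrix)

# accuracy is the fraction of samples on the diagonal
accuracy = np.trace(matrix) / matrix.sum()
print(f"accuracy = {accuracy:.2f}")
```

Here, one sample with a true label of 1 was predicted as 2, so it sits off the diagonal.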

In [None]:
confusion_matrix = metrics.ConfusionMatrixDisplay.from_predictions(y_test, y_pred)
confusion_matrix.figure_.suptitle("Confusion Matrix")
plt.show()

We can see that although the model did well, it struggled with 3's by confusing them with 5's, 7's, and 8's.

This points us in the direction of how we might improve the model.

We could also use cross-validation to find the variation in the training score:

In [None]:
from sklearn.model_selection import KFold, cross_val_score

In [None]:
cv = KFold(n_splits=5, shuffle=False)

In [None]:
test_scores = cross_val_score(model, X_train, y_train, cv=cv)

In [None]:
test_scores

In [None]:
print(f"CV accuracy = {test_scores.mean():0.2f} (+/- {test_scores.std():0.2f})")
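
To make the K-fold splitting used above more concrete, here is a minimal sketch of how consecutive (unshuffled) folds can be built from sample indices. The helper `kfold_indices` is hypothetical and only mimics the idea; `KFold` is what you would use in practice.

```python
import numpy as np


def kfold_indices(n_samples, n_splits):
    """Yield (train_indices, validation_indices) for consecutive, unshuffled folds."""
    indices = np.arange(n_samples)
    folds = np.array_split(indices, n_splits)
    for i, validation_indices in enumerate(folds):
        # every fold except the i-th is used for training
        train_indices = np.concatenate(folds[:i] + folds[i + 1 :])
        yield train_indices, validation_indices


for train_indices, validation_indices in kfold_indices(n_samples=10, n_splits=5):
    print("train:", train_indices, "validation:", validation_indices)
```

Each sample is used for validation exactly once, and the model is trained and scored once per fold, which is where the spread of scores comes from.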

#### [Save the model](https://scikit-learn.org/stable/modules/model_persistence.html)

You can save models using `joblib`:

In [None]:
from joblib import dump

In [None]:
import os
from pathlib import Path

path_models = f"{os.getcwd()}/models"
Path(path_models).mkdir(parents=True, exist_ok=True)

You can then save the model using:

```python
dump(model, f"{path_models}/mnist_model_sklearn.joblib")
```

You could then load this model back using:

```python
from joblib import load

reloaded_model = load(f'{path_models}/mnist_model_sklearn.joblib')
```

### TensorFlow

Now, with [TensorFlow](https://www.tensorflow.org/datasets/keras_example).

Check whether there are any [GPUs (Graphical Processing Units)](https://www.tensorflow.org/guide/gpu) available.

Note, the [device](https://developers.google.com/machine-learning/glossary/#device) is the hardware that TensorFlow runs on (e.g., CPUs (Central Processing Units), GPUs).

In [None]:
print("Num GPUs Available: ", len(tf.config.list_physical_devices("GPU")))

#### Load and split the data

In [None]:
(train_images, train_labels), (
    test_images,
    test_labels,
) = tf.keras.datasets.mnist.load_data()

Take a look at some of the training data:

In [None]:
_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))
for ax, image, label in zip(axes, train_images, train_labels):
    ax.set_axis_off()
    image = image.reshape(28, 28)  # images are already 2D 28*28 pixels, so this is a no-op
    ax.imshow(image, cmap=plt.cm.gray_r, interpolation="nearest")
    ax.set_title(f"Label: {label}")

#### Create the model

You can use any of the sequential, functional, or subclassing APIs.

Let's use the simpler [Sequential API](https://keras.io/guides/sequential_model/) for now.

You could also use many `.add()` calls instead of the list.

```{note}
You could make the final layer a softmax (to output probabilities directly), though this is [discouraged](https://www.tensorflow.org/tutorials/quickstart/beginner#build_a_machine_learning_model) for numerical stability reasons.
```

```{tip}
It's often useful to place pre-processing steps into the model pipeline too.

For example, here we flatten the 2D image to a 1D tensor and [normalise](https://developers.google.com/machine-learning/glossary/#normalization) the pixel values (i.e., rescale them from 0-255 to between 0 and 1).
```
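
As a sketch of what these two pre-processing steps do to the data, here is the same flatten and rescale in plain NumPy. The random array is only a stand-in for a batch of 28x28 images with 8-bit pixel values, not real MNIST data.

```python
import numpy as np

# stand-in for a batch of two 28x28 images with 8-bit pixel values (random, for illustration)
rng = np.random.default_rng(seed=0)
images = rng.integers(0, 256, size=(2, 28, 28), dtype=np.uint8)

flattened = images.reshape(len(images), -1)  # (2, 28, 28) -> (2, 784)
rescaled = flattened.astype(np.float32) / 255  # 0-255 -> 0-1

print(flattened.shape)
print(rescaled.min() >= 0.0, rescaled.max() <= 1.0)
```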

In [None]:
model = tf.keras.Sequential(
    [
        tf.keras.Input(shape=(28, 28), name="inputs"),
        tf.keras.layers.Flatten(name="flatten"),
        tf.keras.layers.Rescaling(1.0 / 255, name="normalise"),
        tf.keras.layers.Dense(128, activation="relu", name="layer1"),
        tf.keras.layers.Dense(128, activation="relu", name="layer2"),
        tf.keras.layers.Dense(10, name="outputs"),  # 1 unit per class
    ]
)

model.summary()

We can now also visualise the architecture:

In [None]:
tf.keras.utils.plot_model(model, show_shapes=True)

#### Compile the model

It's useful to name the metrics, especially if there's more than one.

Here, we'll use the Adam optimiser, sparse categorical crossentropy loss, and a metric of accuracy.
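
To give a feel for what this loss computes, here is a minimal NumPy sketch of cross-entropy with integer ("sparse") labels, calculated directly from logits. The function name and toy values are made up for this example, and the Keras implementation will differ in its details. Subtracting the row maximum before exponentiating is one common way to keep the calculation numerically stable.

```python
import numpy as np


def sparse_categorical_crossentropy_from_logits(logits, labels):
    """Mean cross-entropy for integer class labels, computed directly from logits."""
    # subtract the row maximum before exponentiating to avoid overflow
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probabilities = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # pick out the log-probability of the true class for each sample
    true_class_log_probabilities = log_probabilities[np.arange(len(labels)), labels]
    return -true_class_log_probabilities.mean()


# toy logits for 2 samples and 3 classes (made up for illustration)
logits = np.array([[2.0, 0.5, -1.0], [1000.0, 0.0, 0.0]])
labels = np.array([0, 0])  # integer ("sparse") labels, not one-hot vectors

loss = sparse_categorical_crossentropy_from_logits(logits, labels)
print(loss)
```

The second sample has a very large logit, which would overflow if it were exponentiated directly. Working from the shifted logits avoids that.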

In [None]:
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(
        from_logits=True
    ),  # the model outputs logits (no softmax layer), so the loss is computed from logits
    metrics=["accuracy"],
)

#### Fit the model to the training data

The `fit()` call returns a `history` object.

```{note}
The `validation_split` keyword argument can only be used for NumPy training data.
```
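
As far as I understand from the Keras documentation, the validation data is taken from the last samples of the arrays you pass in, before any shuffling. Here is a rough NumPy sketch of that idea. The helper `split_last_fraction` is hypothetical and is not part of Keras.

```python
import numpy as np


def split_last_fraction(x, y, validation_split):
    """Hold out the last `validation_split` fraction of the samples for validation."""
    split_at = int(len(x) * (1 - validation_split))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])


x_toy = np.arange(10).reshape(-1, 1)
y_toy = np.arange(10)

(x_fit, y_fit), (x_val, y_val) = split_last_fraction(x_toy, y_toy, validation_split=0.2)
print(len(x_fit), len(x_val))  # 8 2
```

If your data is ordered (e.g., by class), it is worth shuffling it yourself before relying on a split like this.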

In [None]:
BATCH_SIZE = 32

history = model.fit(
    train_images,
    train_labels,
    epochs=2,
    batch_size=BATCH_SIZE,
    verbose=False,  # set to True to print the output from each epoch
    validation_split=0.2,  # automatically set apart a validation set: 0.2 means 20% for validation
);

The `history.history` dictionary then contains the loss and metrics per epoch:

In [None]:
history.history

#### Predictions

Use the model for predictions with [`model.predict()`](https://www.tensorflow.org/api_docs/python/tf/keras/Model#predict) (i.e., inference).

This model returns [logits or log-odds](logits_and_log_odds). If you'd like these to be probabilities, add a softmax layer:

In [None]:
probability_model = tf.keras.Sequential([model, tf.keras.layers.Softmax()])

In [None]:
y_pred = probability_model.predict(test_images)

Each prediction has a probability per category:

In [None]:
y_pred[0]

The most likely category can be found by finding the maximum of these (using [`np.argmax`](https://numpy.org/doc/stable/reference/generated/numpy.argmax.html)):

In [None]:
np.argmax(y_pred[0])

So, the model thinks the first digit is a 7.

Let's see if that's right by plotting the first four test digits with their predictions:

In [None]:
_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))
for ax, image, prediction in zip(axes, test_images, y_pred):
    ax.set_axis_off()
    image = tf.reshape(image, (28, 28))  # images are already 2D 28*28 pixels, so this keeps the shape
    ax.imshow(image, cmap=plt.cm.gray_r, interpolation="nearest")
    ax.set_title(f"Prediction: {np.argmax(prediction):.0f}")

#### Let's now evaluate the model overall

In [None]:
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"Test accuracy: {test_acc}")

Similar to scikit-learn, an overall test accuracy of 97% is good.

Note, that the training accuracy and validation accuracy were both 97% too.

As before, let's have a look at a confusion matrix for some quick error analysis.

_Note, TensorFlow does have its own [`confusion_matrix`](https://www.tensorflow.org/api_docs/python/tf/math/confusion_matrix) function, though I'll use the scikit-learn one here again as it has a nice plot feature._

In [None]:
confusion_matrix = metrics.ConfusionMatrixDisplay.from_predictions(
    test_labels, np.argmax(y_pred, axis=1)
)
confusion_matrix.figure_.suptitle("Confusion Matrix")
plt.show()

This model did well for most digits, though struggled a bit with 5's.

#### [Save the model](https://www.tensorflow.org/tutorials/keras/save_and_load)

A model includes:

- Architecture
- Weights (i.e., state)
- Configuration (e.g., optimiser, loss, metrics)

You can save the whole or parts.

The different formats are:

- [TensorFlow SavedModel](https://www.tensorflow.org/guide/saved_model): single archive (recommended)
    - Save: `model.save()` or `tf.keras.models.save_model()`
    - Load: `tf.keras.models.load_model()`
    - Note, Keras H5 was the older format.
- Architecture only (JSON)
    - Save: `get_config()` and `model.to_json()`
    - Load: `from_config()` and `tf.keras.models.model_from_json()`
- Weights only

In [None]:
model.save(f"{path_models}/model_tf_mnist")

In [None]:
!ls {path_models}/model_tf_mnist

#### Load the model

Reload the saved model and evaluate it on the test data.

In [None]:
new_model = tf.keras.models.load_model(f"{path_models}/model_tf_mnist")
new_model.summary()

In [None]:
loss, acc = new_model.evaluate(test_images, test_labels, verbose=2)
print("Restored model, accuracy: {:5.2f}%".format(100 * acc))

### [PyTorch (Lightning)](https://colab.research.google.com/github/PyTorchLightning/lightning-tutorials/blob/publication/.notebooks/lightning_examples/mnist-hello-world.ipynb)

Here, we'll do a [simple example](https://colab.research.google.com/drive/1F_RNcHzTfFuQf-LeKvSlud6x7jXYkG31#scrollTo=nbQAcRna5e_q) using PyTorch Lightning.

This avoids creating some of the boilerplate code needed for pure PyTorch.

This will just include training for now (i.e., no validation or testing).

In [None]:
import os

import pytorch_lightning as pl
import torch
import torch.nn.functional as F
from pytorch_lightning.callbacks.progress import TQDMProgressBar
from torch import nn
from torch.utils.data import DataLoader, random_split
from torchmetrics import Accuracy
from torchvision import transforms
from torchvision.datasets import MNIST

```{note}
`torch.nn.functional` contains functions for neural networks, while `torch.nn` defines them as modules.
```

In [None]:
BATCH_SIZE = 32
PATH_DATASETS = f"{os.getcwd()}/data"

#### Prepare the data

In [None]:
train_dataloader = DataLoader(
    MNIST(PATH_DATASETS, train=True, download=True, transform=transforms.ToTensor()),
    batch_size=BATCH_SIZE,
)

#### Create the model

This includes the loss, optimiser, and training steps.

`pl.LightningModule` is an `nn.Module` with more features.

For more information on how to convert a PyTorch model to a PyTorch Lightning model see:

- [PyTorch Lightning MasterClass](https://www.youtube.com/playlist?list=PLaMu-SDt_RB5NUm67hU2pdE75j6KaIOv2) videos.
- [Tutorial](https://pytorch-lightning.readthedocs.io/en/stable/starter/converting.html)
- [Demonstration video](https://youtu.be/QHww1JH7IDU)

In [None]:
class MNISTModel(pl.LightningModule):
    def __init__(self):
        super(MNISTModel, self).__init__()
        self.layer1 = torch.nn.Linear(in_features=28 * 28, out_features=10)

    def forward(self, x):
        x = x.view(x.size(0), -1)  # flatten inputs
        x = self.layer1(x)  # pass inputs through the single linear layer
        output = torch.relu(x)  # run activation function on the layer outputs
        return output

    def training_step(self, batch, batch_index):
        x, y = batch
        y_hat = self(x)  # predicted y output
        loss = F.cross_entropy(y_hat, y)
        tensorboard_logs = {"train_loss": loss}
        return {"loss": loss, "log": tensorboard_logs}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.02)

In [None]:
mnist_model = MNISTModel()
print(mnist_model)

#### Create the trainer

```{warning}
The progress bar can be too fast for Colab / Kaggle. If developing in these platforms, be sure to slow the refresh rate by increasing the value in: `callbacks=TQDMProgressBar(refresh_rate=20)`.
```

In [None]:
trainer = pl.Trainer(gpus=0, callbacks=TQDMProgressBar(refresh_rate=20), max_epochs=5)

#### Fit the model

In [None]:
if IN_COLAB:
    trainer.fit(mnist_model, train_dataloader)

We can see the loss reduce at the right of the progress bar.

You can change what is logged by editing the `training_step` method.

#### (Optional) [Adding](https://colab.research.google.com/drive/1F_RNcHzTfFuQf-LeKvSlud6x7jXYkG31#scrollTo=gjo55nA549pU) in validation and testing to the model creation

Note, DataLoaders are now incorporated into the model creation.

In [None]:
class MNISTModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer1 = torch.nn.Linear(in_features=28 * 28, out_features=10)

    def forward(self, x):
        x = x.view(x.size(0), -1)  # flatten inputs
        x = self.layer1(x)  # pass inputs through the linear layer
        output = torch.relu(x)  # apply the activation function
        return output

    def training_step(self, batch, batch_index):
        x, y = batch
        y_hat = self(x)  # predicted y output
        loss = F.cross_entropy(y_hat, y)
        self.log("train_loss", loss, prog_bar=True)  # log the loss and show it in the progress bar
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.02)

    # -------------------------
    # same as above up to here
    # new stuff below

    def validation_step(self, batch, batch_index):
        x, y = batch
        y_hat = self(x)
        val_loss = F.cross_entropy(y_hat, y)
        # in validation, logged values are averaged over the epoch by default
        self.log("val_loss", val_loss, prog_bar=True)
        return val_loss

    def test_step(self, batch, batch_index):
        x, y = batch
        y_hat = self(x)
        test_loss = F.cross_entropy(y_hat, y)
        # in testing, logged values are averaged over the epoch by default
        self.log("test_loss", test_loss, prog_bar=True)
        return test_loss

    # also added in the dataloaders below

    def train_dataloader(self):
        return DataLoader(
            MNIST(
                PATH_DATASETS,
                train=True,
                download=True,
                transform=transforms.ToTensor(),
            ),
            batch_size=BATCH_SIZE,
        )

    def val_dataloader(self):
        # note: for simplicity this reuses the training set; use a held-out split in practice
        return DataLoader(
            MNIST(
                PATH_DATASETS,
                train=True,
                download=True,
                transform=transforms.ToTensor(),
            ),
            batch_size=BATCH_SIZE,
        )

    def test_dataloader(self):
        return DataLoader(
            MNIST(
                PATH_DATASETS,
                train=False,
                download=True,
                transform=transforms.ToTensor(),
            ),
            batch_size=BATCH_SIZE,
        )

In [None]:
mnist_model = MNISTModel()
print(mnist_model)

In [None]:
trainer = pl.Trainer(accelerator="cpu", callbacks=TQDMProgressBar(refresh_rate=20), max_epochs=5)

Note that the trainer now only requires the model as input, because the dataloaders are part of the model.

In [None]:
if IN_COLAB:
    trainer.fit(mnist_model)

#### [Evaluation](https://pytorch-lightning.readthedocs.io/en/stable/common/evaluation.html)

The model can now be tested by running:

In [None]:
if IN_COLAB:
    trainer.test(mnist_model)

#### Save the model

Lightning automatically saves the model to `lightning_logs/`.

Each run gets its own incrementally numbered directory, e.g., `version_0`.

By default, a checkpoint is saved at the end of each epoch and overwrites the previous one, so only the latest epoch is kept.

To [save a model in PyTorch](https://pytorch.org/tutorials/beginner/saving_loading_models.html) (without Lightning):

```python
state_dict = model.state_dict()  # extract the parameters
torch.save(state_dict, "my_model_weights.pth")  # save the parameters
```

#### [Load the model](https://pytorch-lightning.readthedocs.io/en/latest/starter/introduction.html#checkpointing)

```python
import os

path_checkpoints = f"{os.getcwd()}/lightning_logs/version_0/checkpoints"
path_model = f"{path_checkpoints}/{os.listdir(path_checkpoints)[0]}"

reloaded_model = MNISTModel.load_from_checkpoint(path_model)
```
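
The snippet above assumes the checkpoint of interest is in `version_0`. As a minimal sketch, the helper below looks for the most recent checkpoint across all runs. It assumes the default layout `lightning_logs/version_N/checkpoints/*.ckpt`, and `latest_checkpoint` is a hypothetical name, not a Lightning function.

```python
import re
from pathlib import Path


def latest_checkpoint(log_dir="lightning_logs"):
    """Return the most recently modified .ckpt file from the highest-numbered
    version directory that contains one, or None if no checkpoint is found."""
    versions = []
    for path in Path(log_dir).glob("version_*"):
        match = re.fullmatch(r"version_(\d+)", path.name)
        if match and path.is_dir():
            versions.append((int(match.group(1)), path))

    # sort numerically so that version_10 comes after version_2
    versions.sort(key=lambda item: item[0], reverse=True)

    for _, version_dir in versions:
        checkpoints = sorted(
            (version_dir / "checkpoints").glob("*.ckpt"),
            key=lambda p: p.stat().st_mtime,
        )
        if checkpoints:
            return checkpoints[-1]
    return None
```

The returned path could then be passed to `MNISTModel.load_from_checkpoint(...)` in place of the hard-coded `path_model`.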

To [load a model in PyTorch](https://pytorch.org/tutorials/beginner/saving_loading_models.html) (without Lightning):

```python
new_state_dict = torch.load("my_model_weights.pth")  # load the parameters
new_model = MNISTModel()  # instantiate a model with the same architecture
new_model.load_state_dict(new_state_dict)  # set up the new model with these parameters
```
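
To check that saving and loading preserve the parameters, the two steps can be combined into a round trip. This is a sketch under stated assumptions: a single `torch.nn.Linear` layer stands in for the full model, and the weights are written to a temporary directory so that nothing is left on disk.

```python
import os
import tempfile

import torch

# a small stand-in model (assumption: any torch.nn.Module is handled the same way)
model = torch.nn.Linear(in_features=28 * 28, out_features=10)

with tempfile.TemporaryDirectory() as tmp_dir:
    path_weights = os.path.join(tmp_dir, "my_model_weights.pth")
    torch.save(model.state_dict(), path_weights)  # save the parameters

    new_model = torch.nn.Linear(in_features=28 * 28, out_features=10)  # same architecture
    new_model.load_state_dict(torch.load(path_weights))  # copy the saved parameters in

new_model.eval()  # switch to evaluation mode before inference

# the reloaded model should reproduce the original model's outputs
x = torch.rand(4, 28 * 28)
with torch.no_grad():
    outputs_match = torch.equal(model(x), new_model(x))
print(outputs_match)
```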

## Questions

```{admonition} Question 1

If you were looking to do classic machine learning, what tool is a good choice?

```

```{admonition} Question 2

If you were looking to do deep learning using a high-level API, what tools are a good choice?

```

```{admonition} Question 3

What are good reasons for choosing a high-level or a low-level API?

```

```{admonition} Question 4

When creating a model, which API is simpler to use?

- Sequential
- Subclassing

```

```{admonition} Question 5

Put these general steps in order:

- Compile the model
- Preprocess the data
- Test the model
- Fit the model to the training data
- Create the model
- Download the data

```

```{admonition} Question 6

Which machine learning library is the best?

```

## {ref}`Solutions <tools>`

## Key Points

```{important}

- [x] _scikit-learn is great for classic machine learning problems._
- [x] _TensorFlow and PyTorch are both great for deep learning problems._
- [x] _Keras (high-level API for TensorFlow) and PyTorch Lightning (high-level API for PyTorch) have many high-level objects to help you create deep learning models._
- [x] _You can use low-level APIs for any custom objects._
- [x] _Explore your data before using it._
- [x] _Check your model before fitting it to the training data._
- [x] _Evaluate your model and analyse the errors it makes._

```

## Further information

### Good practices

- Many decisions around model architecture are based on previous work, literature, and trial-and-error.
- Debugging: 
    - Test each part individually, before testing the whole.
    - Check the model summary and visualise the architecture.
    - Use debug modes:
        - Pass `run_eagerly=True` to `compile()` in Keras.
        - Use `Trainer(fast_dev_run=True)` in PyTorch Lightning.
    - Tips for [Keras](https://keras.io/examples/keras_recipes/debugging_tips/) and [PyTorch Lightning](https://pytorch-lightning.readthedocs.io/en/stable/common/debugging.html).
- Offloading computations to a GPU may not be beneficial for small models.
- Tips for optimising GPU performance from [TensorFlow](https://www.tensorflow.org/guide/gpu_performance_analysis), [NVIDIA](https://docs.nvidia.com/deeplearning/performance/index.html).
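
As an example of testing one part on its own, the forward pass and loss can be checked on a fake batch before any training is run. This is a minimal sketch that assumes MNIST-sized inputs and uses a single linear layer as a stand-in for the model under test.

```python
import torch
import torch.nn.functional as F

# stand-in for the model under test (assumption: MNIST-sized inputs, 10 classes)
layer = torch.nn.Linear(in_features=28 * 28, out_features=10)


def forward(x):
    x = x.view(x.size(0), -1)  # flatten each image
    return layer(x)


# a fake batch with the same shapes as the real data
x = torch.rand(4, 1, 28, 28)
y = torch.randint(low=0, high=10, size=(4,))

y_hat = forward(x)
loss = F.cross_entropy(y_hat, y)

assert y_hat.shape == (4, 10), "expected one score per class for each sample"
assert torch.isfinite(loss), "loss should be a finite number"
```

Checks like these catch shape and data-type mistakes quickly, before time is spent on a full training run.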

### Other options

There are [many other tools for machine learning](https://github.com/josephmisiti/awesome-machine-learning), including:

- [JAX](https://jax.readthedocs.io/en/latest/#)
    - A library for GPU-accelerated NumPy with automatic differentiation.
- [Flax](https://github.com/google/flax)
    - A neural network library and ecosystem for JAX that is designed for flexibility.
- [Haiku](https://dm-haiku.readthedocs.io/en/latest/)
    - Built on top of JAX to provide simple, composable abstractions for machine learning research.
- [XGBoost](https://xgboost.readthedocs.io/en/stable/)
    - Gradient boosting library.
- [Caffe](https://caffe.berkeleyvision.org/)
    - Deep learning framework.
- [Sonnet](https://sonnet.readthedocs.io/en/latest/)
    - High-level API for TensorFlow.
- [fastai](https://docs.fast.ai/)
    - High-level API for PyTorch.

### Resources

- [Machine Learning Glossary](https://developers.google.com/machine-learning/glossary)
- [Project template for PyTorch Lightning](https://github.com/PyTorchLightning/deep-learning-project-template)