When we create a Keras model and start training, we generally specify the values of certain hyperparameters (such as the learning rate) to regulate the training process, and these values have a great influence on the results of training. Therefore, a very important step in the machine learning workflow is to determine the optimal values of the model's hyperparameters, that is, hyperparameter tuning. In TensorFlow, we can do this easily with the HParams plugin.

What is a hyperparameter

Machine learning and deep learning models often contain tens of thousands of parameters. Some of them, such as the weights and biases in the model, can be optimized through training and the back-propagation algorithm; these are called parameters. Others cannot be optimized through training, such as the learning rate, the number of hidden layers in a deep neural network, and the number of units in each hidden layer. These are called hyperparameters.

Hyperparameters regulate the training process of the whole model. They are only configuration variables and do not directly participate in training, so we need to adjust them repeatedly until the model performs optimally. Note that during a training run the parameters are constantly updated while the hyperparameters remain constant.

In general, the optimal combination of hyperparameters is selected based on the model's loss on the training and validation sets, together with predefined evaluation metrics, and this combination is then used for formal training and online serving.

Hyperparameter tuning strategy

A model generally has many hyperparameters, and each hyperparameter has many candidate values, which together form a large parameter space. Searching it by hand would cost an enormous amount of time and effort, so many automatic tuning algorithms have been proposed.

At present, the commonly used automatic tuning strategies for hyperparameters include grid search and random search.

Grid search

  1. Grid search tries every combination of the candidate hyperparameter values and selects the combination that achieves the best model performance as the final solution (a minimal sketch follows this list).

  2. For example, if the model has 2 hyperparameters and each has 3 candidate values, there are 3 × 3 = 9 combinations in total. Grid search tries all 9 groups of hyperparameters and selects the best combination among them.

  3. The disadvantage of grid search is that the number of training and evaluation runs grows exponentially with the number of hyperparameters and the number of candidate values, resulting in high compute and time costs. It is therefore unsuitable when there are many hyperparameters or many candidate values.
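To make the idea concrete, here is a minimal grid-search sketch. The candidate values and the train_and_evaluate function are hypothetical placeholders, not part of HParams:

import itertools

# Hypothetical candidate values for two hyperparameters.
candidates = {
    "deep_layers": [1, 2, 3],
    "learning_rate": [0.001, 0.01, 0.1],
}

best_score, best_hparams = float("-inf"), None
for combo in itertools.product(*candidates.values()):
    hparams = dict(zip(candidates.keys(), combo))
    score = train_and_evaluate(hparams)  # hypothetical: trains and returns a validation score
    if score > best_score:
        best_score, best_hparams = score, hparams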

Random search

  1. Random search tries randomly selected combinations of hyperparameters. The number of attempts can be set manually, which avoids traversing the entire hyperparameter space and thus reduces the cost of the search.

  2. Compared with grid search, random search tunes hyperparameters more efficiently, but it cannot guarantee finding the optimal values, so there is some uncertainty. A sketch follows this list.
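Reusing the hypothetical candidates dict and train_and_evaluate function from the grid-search sketch, random search with a manually set budget might look like this:

import random

best_score, best_hparams = float("-inf"), None
for _ in range(8):  # the number of attempts is set manually
    hparams = {name: random.choice(values) for name, values in candidates.items()}
    score = train_and_evaluate(hparams)  # hypothetical: trains and returns a validation score
    if score > best_score:
        best_score, best_hparams = score, hparams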

HParams hyperparameter tuning steps

TensorFlow provides the HParams plugin (part of TensorBoard) to assist with hyperparameter tuning. It supports both grid search and random search. Using Keras model training as an example, the steps for tuning with HParams are described below.

Defining hyperparameters

  1. Define each hyperparameter with the HParam class initializer and specify its domain. There are three domain types: IntInterval (consecutive integer values), Discrete (discrete values, which can be integers, floats, or strings), and RealInterval (consecutive floating-point values).

  2. For example, HP_DEEP_LAYERS = hp.HParam("deep_layers", hp.IntInterval(1, 3)) defines a hyperparameter over consecutive integer values, where 1 is the minimum and 3 is the maximum; the value range is [1, 3].

  3. HP_DEEP_LAYER_SIZE = hp.HParam("deep_layer_size", hp.Discrete([32, 64, 128])) defines a Discrete hyperparameter, which can take any element of the list.

  4. HP_LEARNING_RATE = hp.HParam("learning_rate", hp.RealInterval(0.001, 0.1)) defines a hyperparameter over consecutive floating-point values, with 0.001 as the minimum and 0.1 as the maximum; the value range is [0.001, 0.1]. All three definitions are gathered below.
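Gathering the three definitions from the list above (the same lines appear in the full example later in this article):

from tensorboard.plugins.hparams import api as hp

HP_DEEP_LAYERS = hp.HParam("deep_layers", hp.IntInterval(1, 3))
HP_DEEP_LAYER_SIZE = hp.HParam("deep_layer_size", hp.Discrete([32, 64, 128]))
HP_LEARNING_RATE = hp.HParam("learning_rate", hp.RealInterval(0.001, 0.1))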

Defining evaluation metrics

  1. Use the Metric class to define the metrics that will be used to determine the optimal combination of hyperparameters.

  2. For example, hp.Metric("epoch_auc", group="validation", display_name="auc (val.)") declares that the evaluation metric is epoch_auc. Metrics must either be recorded by the TensorBoard callback or be custom scalars; they are stored in log files and are read and displayed by the HPARAMS panel during visualization.

  3. The first Metric constructor parameter, tag, is the metric name. For metrics recorded by the TensorBoard callback, the name is the metric's tag prefixed with epoch_ or batch_, such as epoch_auc. For a custom metric, tag is the name set in tf.summary.scalar("test_auc", auc, step=1), i.e. test_auc.

  4. The second Metric constructor argument, group, is the subdirectory where the metric is stored. For example, training metrics are stored in the train directory, validation metrics in the validation directory, and custom metrics can be stored in, say, a test directory, as in the sample program.

  5. The third Metric constructor argument, display_name, is the name shown for the metric in the HPARAMS panel. The two kinds of definitions are sketched after this list.
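As a sketch, the two kinds of metric definitions described above look like this; the full METRICS list appears in the complete example later:

METRICS = [
    # Recorded by the TensorBoard callback during validation.
    hp.Metric("epoch_auc", group="validation", display_name="auc (val.)"),
    # A custom scalar written with tf.summary.scalar into the test directory.
    hp.Metric("test_auc", group="test", display_name="auc (test)"),
]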

Configuring HParams

  1. hp.hparams_config(hparams=hparams, metrics=metrics) sets the hyperparameters to be searched and the metrics to be used for evaluation.

  2. The hparams_config method takes two arguments: the list of all candidate HParam hyperparameters and the list of all Metric objects used for evaluation.

  3. Without this global configuration, HParams by default records all hyperparameters used in the model and all metric values the model outputs, and displays them in the HPARAMS panel. A sketch of the call follows this list.
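The configuration is written once into the root log directory. This sketch is taken from the run_all function of the full example below, where logdir, HPARAMS, and METRICS are defined:

import tensorflow as tf

with tf.summary.create_file_writer(logdir).as_default():
    hp.hparams_config(hparams=HPARAMS, metrics=METRICS)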

Building the model with hyperparameters

  1. Typically, the hyperparameters are passed to the model-building function as a dict to construct the model.

  2. The code for the model-building function looks like this:

    def model_fn(hparams):
        model = keras.models.Sequential()

        for _ in range(hparams[HP_DEEP_LAYERS]):
            model.add(
                keras.layers.Dense(
                    units=hparams[HP_DEEP_LAYER_SIZE],
                    activation="relu",
                    use_bias=True,
                ))
        model.add(keras.layers.Dense(units=1, activation="sigmoid"))
        model.compile(
            optimizer=tf.keras.optimizers.Adam(
                learning_rate=hparams[HP_LEARNING_RATE]),
            loss=tf.keras.losses.BinaryCrossentropy(),
            metrics=["AUC"],
        )
        return model
  3. The keys of the hyperparameter dict are the HParam objects defined above, and the values are plain values of basic data types.

  4. The model construction process is the same as for a normal model, except that fixed parameter values are replaced by values from the hyperparameter dict. Of course, you can also define a configurable subclassed model in advance and then pass in the hyperparameters to build the model more conveniently.

Model training

  1. During training, the fit method's callbacks parameter must include not only the TensorBoard callback but also the hp.KerasCallback callback.

  2. The first callback records the values of the loss and metrics; the second records the hyperparameter combination used in this run and the final loss and metric values.

  3. The first parameter of hp.KerasCallback(logdir, hparams) is the directory where the hparams logs are written, and the second is the hyperparameter dict, the same one passed to the model-building function.

  4. If multiple sessions are run with the same set of hyperparameters, the HPARAMS panel displays the average of the multiple evaluations. A sketch of the fit call follows this list.
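As a sketch, the training call with both callbacks looks like the following; it mirrors the run function in the full example below, where logdir and the data vary per session:

tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=logdir)
hparams_callback = hp.KerasCallback(logdir, hparams)  # records this session's hyperparameters

model.fit(
    x=x_train,
    y=y_train,
    epochs=2,
    validation_data=(x_val, y_val),
    callbacks=[tensorboard_callback, hparams_callback],
)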

Visualizing tuning results

  1. If two hyperparameter combinations are tried, the log root directory mlp has the following structure:

    mlp
    ├── 0
    │   ├── events.out.tfevents.1589257272.alexander.4918.34.v2
    │   ├── test
    │   │   └── events.out.tfevents.1589257274.alexander.4918.2418.v2
    │   ├── train
    │   │   └── events.out.tfevents.1589257272.alexander.4918.95.v2
    │   └── validation
    │       └── events.out.tfevents.1589257273.alexander.4918.1622.v2
    ├── 1
    │   ├── events.out.tfevents.1589257274.alexander.4918.2575.v2
    │   ├── test
    │   │   └── events.out.tfevents.1589257275.alexander.4918.4958.v2
    │   ├── train
    │   │   └── events.out.tfevents.1589257274.alexander.4918.2636.v2
    │   └── validation
    │       └── events.out.tfevents.1589257274.alexander.4918.4162.v2
    └── events.out.tfevents.1589257272.alexander.4918.5.v2
  2. The 0 and 1 directories each store the logs from training and validation with one hyperparameter combination.

  3. The model's training and validation results (including loss and metrics) are stored in the events.out.tfevents files in the directories specified by the TensorBoard callback; in this example, the train and validation directories under 0 and 1.

  4. The logs recorded by HParams are stored in the events.out.tfevents file directly under the 0 or 1 directory.

  5. Start TensorBoard with its logdir parameter set to mlp (for example, tensorboard --logdir mlp), then open the HPARAMS panel to see the visualized tuning results.

Complete hyperparameter tuning example

Grid search example

Grid search needs to traverse all hyperparameter combinations, so the Discrete domain type is the most convenient choice when initializing HParam objects. IntInterval and RealInterval domains can also be traversed by choosing a step size.

If a hyperparameter object's domain is IntInterval or RealInterval, the minimum and maximum candidate values can be read from the object's domain.min_value and domain.max_value attributes. If it is Discrete, the list of all candidate values is available through the domain.values attribute, as sketched below.
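For instance, with the definitions from earlier:

lo = HP_LEARNING_RATE.domain.min_value      # 0.001
hi = HP_LEARNING_RATE.domain.max_value      # 0.1
sizes = HP_DEEP_LAYER_SIZE.domain.values    # [32, 64, 128]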

Example code for a grid search looks like this (see the run_all function for the search steps):

import os
import tensorflow as tf
from tensorflow import keras
from tensorboard.plugins.hparams import api as hp
from absl import app, flags
import shutil
import numpy as np

FLAGS = flags.FLAGS
flags.DEFINE_string("logdir", "mlp", "logs dir")

HP_DEEP_LAYERS = hp.HParam("deep_layers", hp.IntInterval(1, 3))
HP_DEEP_LAYER_SIZE = hp.HParam("deep_layer_size", hp.Discrete([32, 64, 128]))
HP_LEARNING_RATE = hp.HParam("learning_rate", hp.RealInterval(0.001, 0.1))

HPARAMS = [
    HP_DEEP_LAYERS,
    HP_DEEP_LAYER_SIZE,
    HP_LEARNING_RATE,
]

METRICS = [
    hp.Metric(
        "epoch_auc",
        group="validation",
        display_name="auc (val.)",
    ),
    hp.Metric(
        "epoch_loss",
        group="validation",
        display_name="loss (val.)",
    ),
    hp.Metric(
        "batch_auc",
        group="train",
        display_name="auc (train)",
    ),
    hp.Metric(
        "batch_loss",
        group="train",
        display_name="loss (train)",
    ),
    hp.Metric(
        "test_auc",
        group="test",
        display_name="auc (test)",
    ),
]

def model_fn(hparams):
    model = keras.models.Sequential()

    for _ in range(hparams[HP_DEEP_LAYERS]):
        model.add(
            keras.layers.Dense(
                units=hparams[HP_DEEP_LAYER_SIZE],
                activation="relu",
                use_bias=True,
            ))
    model.add(keras.layers.Dense(units=1, activation="sigmoid"))

    model.compile(
        optimizer=tf.keras.optimizers.Adam(
            learning_rate=hparams[HP_LEARNING_RATE]),
        loss=tf.keras.losses.BinaryCrossentropy(),
        metrics=["AUC"],)return model

def run(data, hparams, base_logdir, session_id):
    model = model_fn(hparams)
    logdir = os.path.join(base_logdir, session_id)

    tensorboard_callback = tf.keras.callbacks.TensorBoard(
        log_dir=logdir,
        update_freq=10,
        profile_batch=0,
    )
    hparams_callback = hp.KerasCallback(logdir, hparams)

    ((x_train, y_train), (x_val, y_val), (x_test, y_test)) = data

    model.fit(
        x=x_train,
        y=y_train,
        epochs=2,
        batch_size=128,
        validation_data=(x_val, y_val),
        callbacks=[tensorboard_callback, hparams_callback],
    )

    test_dir = os.path.join(logdir, "test")
    with tf.summary.create_file_writer(test_dir).as_default():
        _, auc = model.evaluate(x_test, y_test)
        tf.summary.scalar("test_auc", auc, step=1)

def prepare_data():
    x_train, y_train = (
        np.random.rand(6000, 32),
        np.random.randint(2, size=(6000, 1)),
    )

    x_val, y_val = (
        np.random.rand(1000, 32),
        np.random.randint(2, size=(1000, 1)),
    )

    x_test, y_test = (
        np.random.rand(1000, 32),
        np.random.randint(2, size=(1000, 1)),
    )
    return ((x_train, y_train), (x_val, y_val), (x_test, y_test))

def run_all(logdir):
    data = prepare_data()
    with tf.summary.create_file_writer(logdir).as_default():
        hp.hparams_config(hparams=HPARAMS, metrics=METRICS)

    session_index = 0
    for deep_layers in range(HP_DEEP_LAYERS.domain.min_value,
                             HP_DEEP_LAYERS.domain.max_value + 1):  # max_value is inclusive
        for deep_layer_size in HP_DEEP_LAYER_SIZE.domain.values:
            for learning_rate in np.arange(HP_LEARNING_RATE.domain.min_value,
                                           HP_LEARNING_RATE.domain.max_value,
                                           0.01):
                hparams = {
                    HP_DEEP_LAYERS: deep_layers,
                    HP_DEEP_LAYER_SIZE: deep_layer_size,
                    HP_LEARNING_RATE: learning_rate,
                }
                session_id = str(session_index)
                session_index += 1
                print("--- Running training session %d" % (session_index))
                hparams_string = str(hparams)
                print(hparams_string)
                run(
                    data=data,
                    hparams=hparams,
                    base_logdir=logdir,
                    session_id=session_id,
                )

def main(argv):
    del argv  # Unused args
    logdir = FLAGS.logdir
    shutil.rmtree(logdir, ignore_errors=True)
    print("Saving output to %s." % logdir)
    run_all(logdir=logdir)
    print("Done. Output saved to %s." % logdir)

if __name__ == "__main__":
    app.run(main)

Random search example

By calling the sample_uniform() method of a hyperparameter object's domain attribute, you can randomly select one value from the candidate values of that hyperparameter, and then train with the randomly generated combination of hyperparameters.

The sample_uniform method also accepts a seeded pseudo-random number generator, such as random.Random(seed), which matters for hyperparameter tuning in distributed training: with the same seeded generator, all worker nodes draw the same hyperparameter combination each time, so distributed training can proceed normally. A sketch follows.
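A minimal sketch of seeded sampling, assuming the HPARAMS list from the grid-search example:

import random

rng = random.Random(0)  # the same seed on every worker node
hparams = {h: h.domain.sample_uniform(rng) for h in HPARAMS}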

The code for the random-search run_all function is shown below; the rest of the code is the same as for grid search.

def run_all(logdir):
    data = prepare_data()
    with tf.summary.create_file_writer(logdir).as_default():
        hp.hparams_config(hparams=HPARAMS, metrics=METRICS)

    session_index = 0
    for _ in range(8):
        hparams = {h: h.domain.sample_uniform() for h in HPARAMS}
        hparams_string = str(hparams)
        session_id = str(session_index)
        session_index += 1
        print("--- Running training session %d" % (session_index))
        print(hparams_string)
        run(
            data=data,
            hparams=hparams,
            base_logdir=logdir,
            session_id=session_id,
        )

HPARAMS panel

After starting TensorBoard, you can see the HPARAMS option at the top of the page; clicking it opens the HPARAMS panel.

The HPARAMS panel has two panes: the left pane provides filtering, and the right pane provides visualization of the evaluation results. Their functions are described below.

Filter pane

The filter pane controls what is rendered in the right pane. It can select which hyperparameters and metrics to display, filter their values, and sort the visualized results.

Visualization pane

The visualization pane contains three views, each containing different information.

The Table View lists all hyperparameter combinations and the corresponding metric values in a table. Clicking Show Metrics displays charts of the metrics changing over batches or epochs.

The Parallel Coordinates View consists of a series of vertical axes, one per hyperparameter and metric, with a line linking each hyperparameter value to the corresponding metric values. Clicking any line highlights that combination. You can also drag out a region on an axis with the mouse so that only the runs passing through it are displayed, which is very helpful for determining which group of hyperparameters matters most.

The Scatter Plot View consists of a series of scatter plots relating hyperparameters and metrics. It helps reveal potential relationships between hyperparameters, or between hyperparameters and metrics.

Notes

  1. If a RealInterval starts at 0, its lower bound must be written as the float 0.0, not the integer 0 (see the snippet after this list).

  2. Each hyperparameter combination requires its own training run, so the log files for different combinations should be written to different directories.

  3. For hyperparameter tuning, set the metrics argument of the model's compile method to either a string (such as "AUC") or a single global Metric object. This keeps the metric names recorded by TensorBoard consistent across multiple training runs, e.g. epoch_auc rather than epoch_auc_1 and epoch_auc_2, so that the HPARAMS panel can find and display the metric values.

  4. When doing random search in distributed training, specify a seeded pseudo-random number generator so that every worker node selects the same random values, ensuring distributed training proceeds normally.
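Regarding item 1, a quick illustration; assuming RealInterval type-checks its bounds, the commented-out line would be rejected:

hp.RealInterval(0.0, 0.1)  # correct: both bounds are floats
# hp.RealInterval(0, 0.1)  # 0 is an int, not a float, and is rejected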

References

  1. Hyperparameter Tuning with the HParams Dashboard
  2. HParams demo