This is the third article in a series of source notes on Keras. In the previous two installments, we looked at how Keras treats concepts like Tensor and Layer, and how they combine to form a directed acyclic graph. This article focuses on the model level of abstraction, the level nearest to the user. The source files are keras/engine/training.py and keras/models.py, and the classes to examine are Model and Sequential.

Tensor, Node and Layer: Keras' Container

Model: a Container with training information added

Model.compile() mainly configures the optimizer, loss, metrics, and so on; the machinery behind fit(), evaluate(), and the rest is not built during compile().

def compile(self, optimizer, loss, metrics=None, loss_weights=None,
            sample_weight_mode=None, **kwargs):
    loss = loss or {}
    self.optimizer = optimizers.get(optimizer)
    self.sample_weight_mode = sample_weight_mode
    self.loss = loss
    self.loss_weights = loss_weights

    # One loss function per output (abridged: the real code also
    # accepts a list or dict mapping outputs to losses).
    loss_function = losses.get(loss)
    loss_functions = [loss_function for _ in range(len(self.outputs))]
    self.loss_functions = loss_functions

    # Default every output's loss weight to 1 (abridged).
    if loss_weights is None:
        loss_weights_list = [1. for _ in range(len(self.outputs))]
    else:
        loss_weights_list = loss_weights

    # Prepare targets of model: one placeholder per output tensor.
    self.targets = []
    self._feed_targets = []
    for i in range(len(self.outputs)):
        shape = self.internal_output_shapes[i]
        name = self.output_names[i]
        target = K.placeholder(ndim=len(shape),
                               name=name + '_target',
                               sparse=K.is_sparse(self.outputs[i]),
                               dtype=K.dtype(self.outputs[i]))
        self.targets.append(target)
        self._feed_targets.append(target)

    # Prepare metrics.
    self.metrics = metrics
    self.metrics_names = ['loss']
    self.metrics_tensors = []

    # Compute total loss as the weighted sum of the per-output losses.
    total_loss = None
    for i in range(len(self.outputs)):
        y_true = self.targets[i]
        y_pred = self.outputs[i]
        loss_weight = loss_weights_list[i]
        output_loss = K.mean(loss_functions[i](y_true, y_pred))
        if total_loss is None:
            total_loss = loss_weight * output_loss
        else:
            total_loss += loss_weight * output_loss

    # Add layer-level losses such as regularization penalties.
    for loss_tensor in self.losses:
        total_loss += loss_tensor

    self.total_loss = total_loss
    self.sample_weights = sample_weights  # built from sample_weight_mode (omitted here)
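To make the bookkeeping above concrete, here is a minimal usage sketch (the model, layer names, and numbers are invented for illustration): a two-output functional model compiled with one loss per output and explicit loss_weights, so that compile() builds one target placeholder and one loss function per output and sums them with the given weights.

# Minimal usage sketch (hypothetical model, Keras 2-era functional API).
from keras.layers import Input, Dense
from keras.models import Model

inp = Input(shape=(16,))
h = Dense(32, activation='relu')(inp)
out_a = Dense(1, name='out_a')(h)                         # regression head
out_b = Dense(10, activation='softmax', name='out_b')(h)  # classification head

model = Model(inputs=inp, outputs=[out_a, out_b])
# total_loss = 1.0 * mse(out_a) + 0.5 * categorical_crossentropy(out_b)
# plus any regularization losses collected in model.losses.
model.compile(optimizer='rmsprop',
              loss=['mse', 'categorical_crossentropy'],
              loss_weights=[1.0, 0.5])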

The fit() method of the Model object wraps the internal _fit_loop() method, and the key ingredient of _fit_loop() is the train_function built by _make_train_function(). fit() returns a History object produced by the callback machinery.

def fit(self, x=None, y=None, ...):
    ...
    self._make_train_function()
    f = self.train_function
    ...
    return self._fit_loop(f, ins, ...)

In _fit_loop(), the callbacks take care of monitoring and recording the training process, while the train_function is applied batch by batch to the incoming data:

def _fit_loop(self, f, ins, out_labels=None, batch_size=32,
              epochs=100, verbose=1, callbacks=None,
              val_f=None, val_ins=None, shuffle=True,
              callback_metrics=None, initial_epoch=0):
    self.history = cbks.History()
    callbacks = [cbks.BaseLogger()] + (callbacks or []) + [self.history]
    callbacks = cbks.CallbackList(callbacks)
    out_labels = out_labels or []
    callbacks.set_model(callback_model)
    callbacks.set_params({
        'batch_size': batch_size,
        'epochs': epochs,
        'samples': num_train_samples,
        'verbose': verbose,
        'do_validation': do_validation,
        'metrics': callback_metrics or [],
    })
    callbacks.on_train_begin()
    callback_model.stop_training = False

    for epoch in range(initial_epoch, epochs):
        callbacks.on_epoch_begin(epoch)
        batches = _make_batches(num_train_samples, batch_size)
        epoch_logs = {}
        for batch_index, (batch_start, batch_end) in enumerate(batches):
            batch_ids = index_array[batch_start:batch_end]
            batch_logs = {}
            batch_logs['batch'] = batch_index
            batch_logs['size'] = len(batch_ids)
            callbacks.on_batch_begin(batch_index, batch_logs)
            # Apply the train_function to this batch
            # (building ins_batch from ins and batch_ids is omitted here).
            outs = f(ins_batch)
            callbacks.on_batch_end(batch_index, batch_logs)
        callbacks.on_epoch_end(epoch, epoch_logs)
    callbacks.on_train_end()
    return self.history
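One visible consequence of this loop: the History object appended to the callback list is what fit() ultimately returns, so per-epoch metrics can be read from history.history. A minimal sketch with random data (the model and shapes are invented for illustration):

# Minimal sketch: History is the last callback registered in _fit_loop(),
# so fit() returns it with one metrics entry per epoch.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(1, input_shape=(16,)))
model.compile(optimizer='sgd', loss='mse')

x = np.random.rand(128, 16)
y = np.random.rand(128, 1)
history = model.fit(x, y, batch_size=32, epochs=3, verbose=0)
print(history.history['loss'])  # three loss values, one per epoch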

The _make_train_function() method obtains the weight-update operations from the optimizer and builds the callable function object via the backend:

def _make_train_function(self):
    if self.train_function is None:
        inputs = self._feed_inputs + self._feed_targets + self._feed_sample_weights
        training_updates = self.optimizer.get_updates(
            self._collected_trainable_weights,
            self.constraints,
            self.total_loss)
        updates = self.updates + training_updates
        # Gets loss and metrics. Updates weights at each call.
        self.train_function = K.function(inputs,
                                         [self.total_loss] + self.metrics_tensors,
                                         updates=updates,
                                         name='train_function',
                                         **self._function_kwargs)
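The essence of train_function is the semantics of K.function: it returns a callable that maps input values to output values and applies updates on every call, which is exactly how the weights change once per batch. Below is a standalone sketch using only the public backend API; the placeholder, variable, and decay factor are invented for illustration.

# Standalone sketch of K.function semantics (illustrative variables).
import numpy as np
from keras import backend as K

x = K.placeholder(shape=(None, 4))
w = K.variable(np.ones((4, 1)))
y = K.dot(x, w)

# `updates` is applied on every call -- here we shrink w by 10% per call,
# the same mechanism the optimizer uses to apply gradient updates.
f = K.function([x], [y], updates=[(w, w * 0.9)])

print(f([np.ones((2, 4))])[0])  # uses w, then w := 0.9 * w
print(f([np.ones((2, 4))])[0])  # smaller outputs on the second call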

The other Model methods, such as evaluate(), have a structure similar to fit()'s.
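For example, evaluate() relies on _make_test_function(), which mirrors _make_train_function() almost line for line. The sketch below is abridged in the same way as the excerpts above, and reconstructed from the same era of the codebase, so details may differ slightly; the decisive difference is the updates argument, which contains only the network's state updates (for example batch-normalization statistics), not the optimizer's weight updates.

def _make_test_function(self):
    if self.test_function is None:
        inputs = self._feed_inputs + self._feed_targets + self._feed_sample_weights
        # Same loss and metric outputs as train_function, but no
        # optimizer updates: evaluation must not change the weights.
        self.test_function = K.function(inputs,
                                        [self.total_loss] + self.metrics_tensors,
                                        updates=self.state_updates,
                                        name='test_function',
                                        **self._function_kwargs)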

Sequential: the outermost model-building interface

A Sequential object is a further encapsulation of a Model object and is what the user interacts with directly. Its compile(), fit(), and predict() methods are almost identical to Model's; the genuinely new method is add(), the most basic operation we use to build networks.

The source code for the Sequential.add() method is as follows:

def add(self, layer):
    if not self.outputs:
        # First layer: wrap it behind an implicitly created Input.
        if not layer.inbound_nodes:
            x = Input(batch_shape=layer.batch_input_shape,
                      dtype=layer.dtype,
                      name=layer.name + '_input')
            layer(x)
        self.outputs = [layer.inbound_nodes[0].output_tensors[0]]
        self.inputs = topology.get_source_inputs(self.outputs[0])
        topology.Node(outbound_layer=self, ...)
    else:
        # Apply the new layer to the current output tensor.
        output_tensor = layer(self.outputs[0])
        self.outputs = [output_tensor]
        self.inbound_nodes[0].output_tensors = self.outputs
    self.layers.append(layer)

As you can see, add() always ensures that the first layer of the network is backed by an InputLayer object, and applies each newly added layer to the current outputs to replace them. In essence, adding a new layer to the model means updating the model's outputs.
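A short sketch of this behavior: after every add() call, model.outputs holds exactly one tensor, the output of the most recently added layer, while model.inputs stays pinned to the implicit input created by the first add(). (The layer sizes below are invented for illustration.)

# Sketch: outputs is rewired on every add(); inputs is fixed after the first.
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(32, activation='relu', input_shape=(16,)))  # implicit InputLayer created here
print(len(model.outputs))  # 1 -- the first Dense layer's output tensor

model.add(Dense(1))
print(len(model.outputs))  # still 1 -- now the second Dense layer's output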

@ddlee


This article follows the Creative Commons Attribution-ShareAlike 4.0 International License.

This means you may reprint this article, provided you credit the author and attach the same license.

If you want regular updates on my blog posts, feel free to subscribe to the Dongdong monthly report.

Link to this article: blog.ddlee.cn/posts/ddc5b…


Related articles

  • Keras source code analysis: Container

  • Keras source code analysis: Layer, Tensor, and Node
