Paddle Fluid lets users write and execute programs in the style of PyTorch and TensorFlow Eager Execution. In these systems there is no longer a separate notion of a model: the application does not contain a symbolic description of an Operator graph or a series of layers, but instead describes the training or prediction process as a general-purpose program. This column presents a series of technical articles that compare TensorFlow and Paddle Fluid in terms of framework concepts and usage, as guidance for readers interested in PaddlePaddle. This first article covers the design ideas and basic concepts of Paddle Fluid.

Evolution of deep learning platforms

Today, deep learning has become the de facto most popular machine learning technology. Years of academic research and long-term industrial practice have produced effective basic modeling units (fully connected layers, convolution, recurrent neural networks, and so on), a range of training techniques (initialization methods, cross-layer connections, various normalization techniques, and so on), and new optimization algorithms such as Adadelta and Adam. Established network structures such as highway, residual, and attention have emerged in numbers too large to enumerate. These contributions from academia and industry are what give deep learning methods their impact today.

A large body of knowledge has accumulated in academic research and production practice, and it explains well the learning ability and characteristics of each basic module in a neural network. Combining basic modules and training techniques yields a kaleidoscope of neural network models. The modules and techniques are finite, but their combinations are endlessly varied, which is both the beauty and the difficulty of deep learning methods.

It is this highly modular nature that has led researchers and engineers to improve research and production efficiency by avoiding reinventing the wheel. That effort produced deep learning platforms, which have become an important part of AI infrastructure. From Theano to DistBelief to TensorFlow, from Caffe to Caffe2, from Torch to PyTorch, and from PaddlePaddle to PaddleFluid, deep learning platform technology has gone through two generations of evolution and is moving towards a third.

At this point in that history, when we are about to switch to a new deep learning platform as a tool for our own study and research, it is worth asking how platform technology has evolved and what conveniences it brings. Let’s look at the three major problems that deep learning frameworks solve:

  • How can computation be described so that it supports new models that may emerge in the future?
  • How can the computing power of heterogeneous devices be used as efficiently as possible?
  • How can computers on a network be used for distributed computing to process quadrillion levels of data?

The first of these three questions is the most relevant to users and researchers. In this article, we analyze the different design concepts of PaddleFluid and TensorFlow to understand how a deep learning framework abstracts a deep learning model, and to see how experience transfers between deep learning platforms.

How to describe computation

Let’s start by looking at PaddleFluid and TensorFlow’s respective choices on how to describe machine learning models.

The computation graph of TensorFlow

TensorFlow uses a dataflow graph to describe all the computations and states involved in a machine learning model. A TensorFlow model has a single computation graph, which contains the mathematical operations and the objects they operate on (parameters), and even parameter initialization, the optimization algorithm (the update rules for learnable parameters), data preprocessing, and so on.

This computation graph can be explained further:

  • A machine learning model is represented in TensorFlow by a directed acyclic graph;
  • Each node in the graph corresponds to a specific operation in the machine learning model, which TensorFlow calls an Operation;
  • The edges in the graph are the flow of input and output data between Operations.

In TensorFlow, the inputs and outputs of an Operation are represented by Tensors, which simply means that Tensors form the edges of the graph.

Summary:

1. Leaving aside TensorFlow’s design for distributed and heterogeneous computing, TensorFlow uses a computation graph (a directed acyclic graph) to describe a machine learning model, and the definition and optimization process of any model is transformed into such a graph. The nodes of the graph are Operations, which indicate how to compute; the edges are the flow of input and output data between Operations, represented in TensorFlow by Tensors;

2. TensorFlow graphs follow deferred execution. The graph that represents the network topology of the neural network must be declared in advance and cannot be changed at runtime.
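To make the idea of deferred execution concrete, here is a minimal sketch in plain Python. It is not TensorFlow code: the `Node` class and the `placeholder`, `add`, `mul`, and `run` helpers are hypothetical names invented for this illustration. The point is only that the graph is declared first and computed later, once concrete values are fed in.

```python
# Toy dataflow graph: illustrative only, not TensorFlow's implementation.

class Node:
    """An operation in the graph; its inputs are the edges (tensors) feeding it."""

    def __init__(self, op, inputs=(), name=None):
        self.op = op            # a Python callable, or None for a placeholder
        self.inputs = list(inputs)
        self.name = name


def placeholder(name):
    # A node with no computation; its value is supplied at run time.
    return Node(op=None, name=name)


def add(a, b):
    return Node(lambda x, y: x + y, [a, b], name="add")


def mul(a, b):
    return Node(lambda x, y: x * y, [a, b], name="mul")


def run(node, feed):
    """Evaluate `node` by recursively evaluating the nodes it depends on."""
    if node.op is None:
        return feed[node.name]
    args = [run(i, feed) for i in node.inputs]
    return node.op(*args)


# Phase 1: declare the graph. Nothing is computed here.
x = placeholder("x")
w = placeholder("w")
b = placeholder("b")
y = add(mul(w, x), b)

# Phase 2: execute the already-declared graph with concrete values.
result = run(y, {"x": 3.0, "w": 2.0, "b": 1.0})
print(result)
```

The same graph object `y` can be run again with a different feed, but its structure cannot depend on the fed values, which is the limitation the summary describes.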

The Program of PaddleFluid

How computation is described largely determines how complete the computing capability of a neural network framework is. After more than 20 years of development of deep learning, the existing models and methods can no longer satisfy the imagination of researchers and of tens of thousands of framework users.

The design goal of PaddleFluid is to create a new way of describing computation that can describe not only the mainstream neural network models known so far, but also any model that appears in the future.

How does PaddleFluid achieve this goal? Its design choice is to present the machine learning model to the user as a Program (which is converted inside PaddleFluid into a description language called ProgramDesc) rather than as a computation graph. A Program provides a new description language that can express arbitrarily complex machine learning models in a way that is intuitive to users.

The first lesson of learning a programming language, for any computer science student, is to understand the three execution structures of programming languages: sequential execution, conditional selection, and loops. All computable logic in the computer world can be represented by these three structures, and any logic described by them is computable. In the same way, a neural network framework that supports all three execution structures, as a programming language does, can describe arbitrarily complex, computable machine learning models. PaddleFluid supports these three execution structures, and this is how it enables the description of arbitrarily complex models.

To be specific:

1. The core design concept of Fluid can be likened to a programming language. If you already have programming experience, building a neural network model with Fluid will feel very similar to writing a program.

2. In PaddleFluid, users do not explicitly deal with the concept of a “computation graph”. A machine learning model is described as a Fluid Program (called ProgramDesc inside Fluid);

  • A Fluid Program consists of a set of nested Blocks. A Block is analogous to a pair of curly braces in C++ or Java, or an indented block in Python;
  • The computation in a Block is composed of sequential execution, conditional selection, and loops, which together form complex computation logic.

3. A Fluid Program contains descriptions of computations and of the objects being computed. The description of a computation is called an Operator; the objects being computed (the inputs and outputs of an Operator) are uniformly represented as Tensors.

  • On this point, how to describe computation and the objects it acts on, the deep learning frameworks make the same choice, so experience with one platform migrates easily to another.
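The notion of a Program built from nested Blocks and the three execution structures can be sketched with a tiny interpreter. This is a hypothetical illustration in plain Python, not the ProgramDesc format: the tuple layout and the `run_block` function are assumptions made up for this example.

```python
# Toy "Program" made of nested blocks; hypothetical, not the ProgramDesc format.

def run_block(block, scope):
    """Execute the statements of a block in order, mutating `scope`."""
    for stmt in block:
        kind = stmt[0]
        if kind == "op":                      # sequential execution
            _, out, fn, ins = stmt
            scope[out] = fn(*(scope[name] for name in ins))
        elif kind == "if":                    # conditional selection
            _, cond, then_block, else_block = stmt
            run_block(then_block if scope[cond] else else_block, scope)
        elif kind == "while":                 # loop
            _, cond, body = stmt
            while scope[cond]:
                run_block(body, scope)
        else:
            raise ValueError("unknown statement: %r" % (kind,))
    return scope


# Sum the integers 1..n, then label the result, using all three structures.
program = [
    ("op", "i", lambda: 1, []),
    ("op", "total", lambda: 0, []),
    ("op", "go", lambda i, n: i <= n, ["i", "n"]),
    ("while", "go", [                          # nested block
        ("op", "total", lambda t, i: t + i, ["total", "i"]),
        ("op", "i", lambda i: i + 1, ["i"]),
        ("op", "go", lambda i, n: i <= n, ["i", "n"]),
    ]),
    ("op", "big", lambda t: t > 10, ["total"]),
    ("if", "big",
     [("op", "label", lambda: "large", [])],   # nested block
     [("op", "label", lambda: "small", [])]),  # nested block
]

scope = run_block(program, {"n": 5})
print(scope["total"], scope["label"])
```

Unlike the fixed graph of the previous section, the number of loop iterations here depends on a value (`n`) that is only known when the program runs.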

Conclusion

The table below summarizes the different design choices made by TensorFlow and PaddleFluid to describe machine learning models. You can see that Operator and Tensor, the basic elements that make up a model, are similar on both platforms. If you have experience with either platform, you can quickly carry these concepts over to the other.


Core usage concepts

In the following sections, we’ll look in more detail at how core usage concepts are used on both platforms.

Objects for data representation and calculation: Tensor

A Tensor is a generalization of vectors and matrices. It is the basic object of computation in a neural network model, and it is the common choice of all the major deep learning platforms today.

You can simply think of a Tensor as an N-dimensional array that can have any number of dimensions. A Tensor has two basic characteristics:

1. Data type: all the elements of a Tensor have the same, known data type;

2. Size (or shape): the number of dimensions and the length of each dimension.

  • The length of some dimensions of a Tensor may not be known when the model is defined, but only when the algorithm is actually executed. Examples are the number of samples in a mini-batch (the batch size) and the maximum sequence length in a mini-batch.
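The two characteristics above, and the idea of a dimension left open until run time, can be sketched as follows. `ToyTensorSpec` and its `accepts` method are hypothetical names for this illustration and do not correspond to an API of either framework; `None` is used to mark an unknown dimension, mirroring the convention used later in this article.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ToyTensorSpec:
    """Hypothetical description of a tensor: element type plus shape."""
    dtype: str
    shape: Tuple[Optional[int], ...]   # None marks a dimension unknown until run time

    def accepts(self, dtype, shape):
        """Return True if concrete data with `dtype` and `shape` fits this spec."""
        if dtype != self.dtype or len(shape) != len(self.shape):
            return False
        return all(want is None or want == got
                   for want, got in zip(self.shape, shape))


# A batch of flattened 28x28 images; the batch size is only known when data arrives.
images = ToyTensorSpec(dtype="float32", shape=(None, 784))

print(images.accepts("float32", (128, 784)))  # any batch size fits
print(images.accepts("float32", (128, 100)))  # wrong feature length
print(images.accepts("int64", (128, 784)))    # wrong element type
```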

Tensor in TensorFlow

In TensorFlow’s internal implementation, a Tensor is stored as an N-dimensional array in which each element is a specific value, such as an integer or a floating-point number. As the name TensorFlow suggests, Tensors are what flows through the computation in TensorFlow: in an algorithm, the inputs of an Operation are Tensors, the intermediate results are Tensors, and the final result is a Tensor.

TensorFlow has some special kinds of Tensor. The more common ones are:

1. tf.Variable: a Variable represents a parameter of a machine learning algorithm and is globally visible. Unlike an ordinary Tensor, a Variable is not bound to a Session, so in distributed computing multiple compute units can see the same Variable;

2. tf.placeholder: a Tensor whose concrete data must be supplied at execution time, usually used to bring in input data;

3. tf.constant: a constant Tensor, used to create common constants such as all zeros or all ones.

Tensor in PaddleFluid

PaddleFluid also uses the Tensor as the unified representation of input and output data in neural networks. The Tensor concept is exactly the same across today’s mainstream deep learning platforms, so it carries over seamlessly from one framework to another.

There are also three special kinds of Tensor in Fluid:

1. Learnable parameters in the model

The lifetime of the learnable parameters in a model is the same as that of the whole training task, and they are updated by the optimization algorithm. In PaddleFluid they are also represented as Variables;

In most cases, users do not need to create the learnable parameters in the network themselves, because Fluid provides wrappers for almost all common basic computing modules in neural networks. Taking the simplest fully connected model as an example, the following code snippet creates the two learnable parameters of the fully connected layer, the weight W and the bias, directly, without explicitly calling any Variable-related interface.

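import paddle.fluid as fluid

# x is the input Tensor; the 784-element shape is only an example.
x = fluid.layers.data(name="x", shape=[784], dtype="float32")
y = fluid.layers.fc(input=x, size=128, bias_attr=True)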

2. Input and output Tensor

The input data of the whole neural network is also a special Tensor. Some of its dimensions cannot be determined when the model is defined (usually the batch size; if the data varies between mini-batches, also the maximum sequence length, the width and height of an image, and so on), so a placeholder is needed when defining the model;

fluid.layers.data needs to be given the shape of the input Tensor. When a dimension cannot be determined in advance, specify it as None, as shown in the code snippet below:

import paddle.fluid as fluid

x = fluid.layers.data(name="x", shape=[2, None, 3], dtype="int64")

3. Constant Tensor: in PaddleFluid, a constant Tensor is implemented by combining a Tensor with fluid.layers.assign.

Conclusion

1. In both TensorFlow and PaddleFluid, Tensors are used to describe the inputs and outputs of a neural network and its intermediate results.

2. For learnable parameters, a special kind of Tensor:

  • In TensorFlow, a learnable parameter is represented by tf.Variable (assuming import tensorflow as tf has been executed);
  • In Fluid, a learnable parameter is represented by fluid.Variable (assuming import paddle.fluid as fluid has been executed);
  • With either TensorFlow or PaddleFluid, you can usually use the higher-level API directly. It already wraps almost all common neural network units, such as fully connected layers, LSTM, and CNN, and these wrappers correctly create the learnable parameters the module needs. You do not usually need to create learnable parameters yourself.

3. For input data, another special kind of Tensor:

  • TensorFlow uses tf.placeholder as the placeholder for input data;
  • For users, it can be considered logically equivalent to fluid.layers.data in PaddleFluid;
  • Note, however, that the implementation mechanisms inside the two frameworks differ. tf.placeholder is a Tensor, while fluid.layers.data creates an output Tensor and also creates an Operator that feeds the data.

Calculation primitives: Operation/Operator

The Tensor is the unified data representation of all the major deep learning frameworks today (inputs, outputs, intermediate results, and the learnable parameters of a model are all Tensors). Likewise, operations on data are highly unified as Operator/Operation in the mainstream frameworks. In Chinese, the customary name for this concept translates as “operator”.

Note: The official TensorFlow documentation uses Operation for computations on and changes to Tensors, while PaddleFluid uses Operator. There is no essential difference between the two. The terms are used interchangeably below, and they refer to the same concept.

An Operation/Operator takes a number of Tensors as input and produces a number of Tensors as output; it is what turns inputs into outputs.

The Operation of TensorFlow

An Operation takes a number of Tensors as input and produces a number of Tensors as output. The Operation is a node of the graph: it takes the Tensors on the edges entering the node, performs its computation, and the resulting Tensors become the edges leaving the node. A typical Operation is tf.matmul, which takes two Tensors as input, multiplies them, and outputs a Tensor as the result. All Operations provided by TensorFlow can be found in the API documentation.

The Operator in PaddleFluid

An Operator in PaddleFluid is exactly equivalent to an Operation in TensorFlow. All Operators supported by PaddleFluid can be found in the API documentation.

For user convenience, the Operators in Fluid are further wrapped on the Python side into modules such as paddle.fluid.layers and paddle.fluid.networks. This is because some common operations on Tensors are composed of more basic operations. For example, the L2 norm is computed internally by combining several Operators such as reduce, elementwise_add, and scale. To make the framework more convenient to use, it wraps the basic Operators, including creating the learnable parameters an Operator depends on and handling the initialization details of those parameters, which reduces the cost of repeated development.

All deep learning frameworks provide the same kind of wrapping. In most cases, users rarely deal directly with the low-level Operators of a framework, and instead use the layers, networks, and other modules the framework provides to reduce the amount of code they write. Whatever these concepts are called, their essence and role are the same in every framework: transformations of Tensors.
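As a rough illustration of how a convenience function can be assembled from more basic operators, the sketch below builds an L2 norm from three primitives in plain Python. The primitive names (`elementwise_mul`, `reduce_sum`, `sqrt`) and this particular decomposition are chosen for the example only; the operators Fluid actually combines internally may differ.

```python
import math


# Primitive "operators": each maps input tensors (here, flat lists) to an output.
def elementwise_mul(a, b):
    return [x * y for x, y in zip(a, b)]


def reduce_sum(a):
    return sum(a)


def sqrt(x):
    return math.sqrt(x)


# A "layer": a convenience wrapper that chains several operators.
def l2_norm(a):
    return sqrt(reduce_sum(elementwise_mul(a, a)))


print(l2_norm([3.0, 4.0]))
```

A user who calls `l2_norm` never needs to know which primitives sit underneath, which is the kind of wrapping the paragraph above describes.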

Conclusion

Whether they are called Operation, Operator, or layers, they mean the same thing on every deep learning platform: transformations of Tensors. They are the basic computing capability that a deep learning platform provides, and they can be found in each platform’s API documentation.

Today, many deep learning platforms have joined the ONNX project, and the basic operators provided by different platforms are converging. At the same time, each platform has its own characteristics and provides some unique operators to make a certain type of task easier to develop.

Build the model and execute

At this point we have seen that the building blocks of a model, the Tensor and the Operator, carry over directly between the two frameworks. As the final step, let’s look at how an entire training task runs on each platform.

Graph and Session in TensorFlow

1. TensorFlow describes a machine learning model with a computation graph whose nodes are Operations and whose edges are Tensors. In TensorFlow, tf.Graph maintains the topology information of the entire graph;

  • One more point to note: a TensorFlow graph has an associated concept called a collection. A collection is a key-value table in the context of a graph that maintains graph-level (that is, global) data. For example, a variable can be made globally visible, in which case all information related to it is maintained in the corresponding collection of the graph.

2. On top of the graph, TensorFlow uses the Session mechanism to actually execute the model.

  • For a defined TensorFlow graph (that is, a defined neural network model), create a tf.Session to perform initialization, running, and other process-level operations on the model.
  • More precisely, a TensorFlow Session connects the user program, where the user defines and builds the machine learning model, to the back-end runtime, which is the program that actually performs computing tasks such as training and testing. This “connection” is also a kind of “isolation”: it shields the user from the details of the distributed computing involved in the actual computation, which makes the framework easier to use.

Program and Executor in Fluid

1. PaddleFluid uses a Program to describe the neural network model. For users, there is no concept of a computation graph. All user-defined Tensors and the operations on them are placed into a Program;

  • A Program consists of nested Blocks, but the user does not have to create Blocks explicitly or even be aware of their existence;
  • In a PaddleFluid Program, Blocks are created by special Operators such as while_op, if_op, and parallel_do when they are called;
  • For the user, it is enough to know that you are adding variables and Operators to a Fluid Program.

2. PaddleFluid uses an Executor to execute a Fluid Program.

  • To understand the role of the Executor in Fluid, we first need to explain how a Fluid Program is executed. The following figure shows the execution flow of Fluid on a single machine:

▲ Fig. Execution flow of a local Fluid training task

1. The design ideas behind Fluid are very close to those of a programming language, and using it closely resembles programming in a high-level compiled language such as C++ or Java. The execution of a Fluid Program is divided into two important stages: compile time and run time;

2. At compile time, the user adds variables and Operators (or Layers) to a Program by calling the operators provided by Fluid. The user only needs to describe the core forward computation and does not need to care about backward computation, distributed computing, or heterogeneous devices;

3. The original Program is converted inside the platform into an intermediate description language: ProgramDesc;

4. One of the most important functional modules at compile time is the Transpiler. A Transpiler takes a ProgramDesc and outputs a transformed ProgramDesc, which is the Fluid Program that the back-end Executor finally executes.

The most common Transpilers include:

  • Memory optimization Transpiler: analyzes the read and write dependencies of variables and inserts memory recycling Operators to keep the memory overhead small at run time;
  • Distributed Transpiler: takes a user-defined local Program and generates the two Programs that are executed by the Parameter Client and the Parameter Server.

5. The back-end Executor receives the Program output by the Transpiler and executes its Operators one by one (analogous to the instructions of a programming language), creating and managing the inputs and outputs that the Operators need during execution.
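The compile-time and run-time split described above can be sketched in a few lines of plain Python. Everything here is a hypothetical simplification: the program is a list of `(output, function, inputs)` tuples, `memory_transpiler` inserts a `("delete", name)` op after the last read of each variable, and `execute` plays the role of the Executor. Fluid's real Transpilers operate on ProgramDesc and are considerably more involved.

```python
# A toy program: a list of (output, function, inputs) operators. Hypothetical format.
program = [
    ("h", lambda x, w: x * w, ["x", "w"]),
    ("y", lambda h, b: h + b, ["h", "b"]),
    ("loss", lambda y, t: (y - t) ** 2, ["y", "t"]),
]


def memory_transpiler(ops, keep):
    """Return a new op list with a ("delete", name) op after each variable's last read."""
    last_read = {}
    for idx, (_, _, ins) in enumerate(ops):
        for name in ins:
            last_read[name] = idx
    out = []
    for idx, op in enumerate(ops):
        out.append(op)
        for name in sorted(n for n, i in last_read.items() if i == idx):
            if name not in keep:            # parameters must survive the run
                out.append(("delete", name))
    return out


def execute(ops, scope):
    """Run the ops in order; also track the largest number of live variables."""
    peak = len(scope)
    for op in ops:
        if op[0] == "delete":
            del scope[op[1]]
        else:
            out, fn, ins = op
            scope[out] = fn(*(scope[n] for n in ins))
        peak = max(peak, len(scope))
    return scope, peak


feed = {"x": 2.0, "w": 3.0, "b": 1.0, "t": 5.0}
_, peak_before = execute(program, dict(feed))
optimized = memory_transpiler(program, keep={"w", "b"})
scope, peak_after = execute(optimized, dict(feed))
print(scope["loss"], peak_before, peak_after)
```

The transformed program computes the same loss while holding fewer variables alive at once, which is the effect the memory optimization Transpiler aims for.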

In short, executing a Fluid Program is divided into defining the Program at compile time and running the Program with an Executor. The process by which an Executor runs a Program is non-interactive and cannot be interrupted. In PaddleFluid you can create additional Programs. By default, a PaddleFluid program contains two Programs:

1. fluid.framework.default_startup_program: defines operations such as creating model parameters, inputs, and outputs, and initializing the learnable parameters of the model;

  • default_startup_program is generated automatically by the framework and can be used without being created explicitly;
  • If a call changes the default initialization of a parameter, the framework automatically adds the change to default_startup_program.

2. fluid.framework.default_main_program: defines the neural network model, the forward and backward computation, and the updates that the optimization algorithm applies to the learnable parameters of the network;

  • The core of using Fluid is building default_main_program.

3. Scope in PaddleFluid is similar to the concept of a collection in TensorFlow, but in Fluid the Scope is a back-end concept of the framework that users cannot operate on directly, so there is no need to care about it when using the framework.
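The division of labor between the startup Program and the main Program can be pictured with the following plain-Python sketch. The functions `startup_program` and `main_program` and the dictionary `scope` are hypothetical stand-ins invented for this example; they only mimic the roles described above (initialize parameters once, then run forward, backward, and update repeatedly) for a one-parameter linear model.

```python
import random

scope = {}  # hypothetical stand-in for the place where variables live


def startup_program(scope, seed=0):
    """Run once: create and initialize the learnable parameters."""
    rng = random.Random(seed)
    scope["w"] = rng.uniform(-0.1, 0.1)
    scope["b"] = 0.0


def main_program(scope, x, target, lr=0.1):
    """Run once per step: forward pass, gradients, parameter update."""
    y = scope["w"] * x + scope["b"]          # forward
    loss = (y - target) ** 2
    grad_y = 2.0 * (y - target)              # backward
    scope["w"] -= lr * grad_y * x            # update
    scope["b"] -= lr * grad_y
    return loss


startup_program(scope)                        # analogous to running the startup Program
losses = [main_program(scope, x=1.0, target=2.0) for _ in range(20)]
print(losses[0], losses[-1])
```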

Conclusion

To the user of the framework, the Graph in TensorFlow can be considered equivalent to the Program in PaddleFluid. Their role in the framework is exactly the same: they hold the definition of the model.

A Session in TensorFlow is very similar in usage to an Executor in PaddleFluid.

TensorFlow uses a Session to perform initialization, computation, and other run-time logic on the computation graph, connecting the front end and back end of TensorFlow.

PaddleFluid executes a user-defined Fluid Program through an Executor.

1. The Executor connects the front end and back end of PaddleFluid;

2. The Executor accepts the original user-defined model (a Program) and optimizes it by calling Transpilers with different functions in the system.

Full example: How to train a machine learning model

In this section we take MNIST handwritten digit recognition, the “Hello World” of machine learning tasks, as an example. Through a complete working example, we show how the concepts described above are used on TensorFlow and PaddleFluid and how they transfer between the two platforms.

TensorFlow instance

The following uses TensorFlow to define a basic MLP (a neural network with a single hidden layer) for this problem, to illustrate the basic usage of TensorFlow.

Step 1: Define the data

x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.int32, shape=[None,])

As mentioned earlier, tf.placeholder is a special Tensor used to bring in data. Here x and y_ represent the features and the labels of the data, respectively.

Step 2: Define the model

# define the model
y = tf.layers.dense(inputs=x, units=10)
# define the loss computation
cross_entropy = tf.losses.sparse_softmax_cross_entropy(labels=y_, logits=y)
# define the training operation
train_op = tf.train.AdamOptimizer().minimize(cross_entropy)

This code has three parts:

1. Parameter definition: a single-hidden-layer MLP. In TensorFlow’s computation graph abstraction, for an input $x$, the output $y$ is computed as $y = \omega x + b$, where $x$ and $y$ are the input and output Tensors and $\omega$ and $b$ are parameter Tensors.

tf.layers provides the computation logic of common operations; tf.layers.dense is used here to compute the fully connected layer of the neural network.

2. Loss definition: during model training, the loss measures the gap between the current model output and the target, and it is the basis on which the optimization algorithm iterates the model. Commonly used losses are defined in tf.losses. Cross entropy, a common loss for multi-class classification, is used here. The parameter y_ is the target label defined in the data definition step above.

3. Operation construction: in addition to the parameters and the loss defined above, the optimization algorithm to use during iteration must be specified. The same operation can be executed with different optimization algorithms.
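To show what the first two parts compute, here is a small worked sketch in plain Python for a single sample. The helper names `dense` and `sparse_softmax_cross_entropy` are borrowed from the text for readability, but these are hand-written stand-ins rather than the TensorFlow functions, and the weights and inputs are made-up values.

```python
import math


def dense(x, weights, bias):
    """Fully connected layer: y[j] = sum_i x[i] * weights[i][j] + bias[j]."""
    return [sum(xi * row[j] for xi, row in zip(x, weights)) + bias[j]
            for j in range(len(bias))]


def sparse_softmax_cross_entropy(logits, label):
    """Cross entropy between softmax(logits) and an integer class label."""
    m = max(logits)                                   # subtract max for stability
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - logits[label]


# Two input features, three classes; all values are made up for illustration.
x = [1.0, 2.0]
weights = [[0.1, 0.2, 0.3],
           [0.0, -0.1, 0.4]]
bias = [0.0, 0.0, 0.0]

logits = dense(x, weights, bias)
loss = sparse_softmax_cross_entropy(logits, label=2)
print(logits, loss)
```

The loss is small when the logit of the labeled class dominates and grows as probability mass moves to other classes, which is what the optimizer in the third part tries to reduce.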

Step 3: Initialize the parameters

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

The training process of the model is managed by tf.Session. Calling sess.run(init) runs the initialization operation on the graph, which initializes the parameters and prepares the model for training.

The default parameter initialization is used here. In fact, there are many options for parameter initialization of neural network training, which is beyond the scope of this article and will be discussed in detail in later chapters.

Step 4: Data input + perform model training

train_reader = data_iterator()
test_lbl, test_img = load_MNIST("testing")
for step in range(100):
    images_batch, labels_batch = next(train_reader)
    _, loss_val = sess.run(
        [train_op, cross_entropy],
        feed_dict={
            x: images_batch,
            y_: labels_batch.astype("int32")
        })
    print("Cur Cost : %f" % loss_val)

Model iteration usually means feeding data to the model in batches and then updating the model parameters according to the specified loss and optimization method. The core is the call to sess.run() (that is, tf.Session.run()), which takes the previously defined operations, including the optimization step, and the input data.

The input data comes from the following function:

def data_iterator(dataset="training", path="data", batch_size=128):
    batch_idx = 0
    lbl, img = load_MNIST(dataset, path)
    while True:
        # shuffle labels and features
        idxs = np.arange(0, len(lbl))
        np.random.shuffle(idxs)
        shuf_features = img[idxs]
        shuf_labels = lbl[idxs]
        for batch_idx in range(0, len(lbl), batch_size):
            images_batch = shuf_features[batch_idx:
                                         batch_idx + batch_size] / 255.
            images_batch = images_batch.astype("float32")
            labels_batch = shuf_labels[batch_idx:
                                       batch_idx + batch_size].astype("int32")
            yield images_batch, labels_batch

The load_MNIST function from tf_load_MNIST.py used in this program reads the data from files. The data is shuffled, and then each batch is assembled according to batch_size.

Step 5: Observe the effect of the model

The steps above build a complete training program for a TensorFlow model. The loss is printed once per batch, so the effect of iteration on the model can be seen directly:


▲ Fig. TensorFlow MNIST handwritten digit recognition task cost reduction curve

Attached: complete code

import numpy as np
import tensorflow as tf

from tf_load_MNIST import load_MNIST


def data_iterator(dataset="training", path="data", batch_size=128):
    batch_idx = 0
    lbl, img = load_MNIST(dataset, path)
    while True:
        # shuffle labels and features
        idxs = np.arange(0, len(lbl))
        np.random.shuffle(idxs)
        shuf_features = img[idxs]
        shuf_labels = lbl[idxs]
        for batch_idx in range(0, len(lbl), batch_size):
            images_batch = shuf_features[batch_idx:
                                         batch_idx + batch_size] / 255.
            images_batch = images_batch.astype("float32")
            labels_batch = shuf_labels[batch_idx:
                                       batch_idx + batch_size].astype("int32")
            yield images_batch, labels_batch


def main():
    # define the network topology.
    x = tf.placeholder(tf.float32, shape=[None, 784])
    y_ = tf.placeholder(
        tf.int32, shape=[
            None,
        ])

    y = tf.layers.dense(inputs=x, units=10)
    cross_entropy = tf.losses.sparse_softmax_cross_entropy(labels=y_, logits=y)
    train_op = tf.train.AdamOptimizer().minimize(cross_entropy)

    # define the initializer.
    init = tf.global_variables_initializer()

    sess = tf.Session()
    sess.run(init)

    train_reader = data_iterator()
    for step in range(100):
        images_batch, labels_batch = next(train_reader)
        _, loss_val = sess.run(
            [train_op, cross_entropy],
            feed_dict={
                x: images_batch,
                y_: labels_batch.astype("int32")
            })
        print("Cur Cost : %f" % loss_val)


if __name__ == "__main__":
    main()

Complete code of tf_load_MNIST.py:

import os
import struct
import numpy as np

def load_MNIST(dataset="training", path="."):
    """
    Python function for importing the MNIST data set.  It returns a 2-tuple
    (labels, images): a 1D numpy array of labels and a 2D numpy.uint8 array
    in which each row holds the flattened pixel data of one image.
    """
    # note: this overrides the path argument and always reads from ./data
    path = os.path.join(os.path.abspath('.'), "data")

    # compare strings with ==, not with the identity operator
    if dataset == "training":
        fname_img = os.path.join(path, "train-images.idx3-ubyte")
        fname_lbl = os.path.join(path, "train-labels.idx1-ubyte")
    elif dataset == "testing":
        fname_img = os.path.join(path, "t10k-images.idx3-ubyte")
        fname_lbl = os.path.join(path, "t10k-labels.idx1-ubyte")
    else:
        raise ValueError("dataset must be 'testing' or 'training'")

    # Load everything in some numpy arrays
    with open(fname_lbl, "rb") as flbl:
        magic, num = struct.unpack(">II", flbl.read(8))
        lbl = np.fromfile(flbl, dtype=np.int8)

    with open(fname_img, "rb") as fimg:
        magic, num, rows, cols = struct.unpack(">IIII", fimg.read(16))
        img = np.fromfile(fimg, dtype=np.uint8).reshape(len(lbl), rows * cols)

    return lbl, img

PaddleFluid instance

Step 1: Define the data

PaddleFluid receives input data through fluid.layers.data.

import numpy as np

import paddle.fluid as fluid
import paddle.v2 as paddle

# define the input layers for the network.
x = fluid.layers.data(name="img", shape=[1, 28, 28], dtype="float32")
y_ = fluid.layers.data(name="label", shape=[1], dtype="int64")

In Fluid, dimension 0 of a Tensor is fixed as the batch size. In the code snippet above, the shape of the image input x is [1, 28, 28]. The three dimensions are the number of channels, the height, and the width of the image, respectively.

In fact, inside the Fluid framework an image input is a 4-D Tensor, and dimension 0 of every Tensor is fixed as the batch size. The framework automatically fills in a placeholder for the batch size dimension, so you do not need to specify one yourself.

If, apart from the batch size, the Tensor has a dimension that can only be determined at run time, you can specify None at that position.

Step 2: Define the model

A neural network with one hidden layer is defined by calling the operators provided by Fluid. A Fluid model has two parts: the model structure and the optimization method. This is very similar to a TensorFlow program, so the concepts carry over directly.

# define the network topology.
y = fluid.layers.fc(input=x, size=10, act="softmax")
loss = fluid.layers.cross_entropy(input=y, label=y_)
avg_loss = fluid.layers.mean(loss)

# define the optimization algorithm.
optimizer = fluid.optimizer.Adam(learning_rate=1e-3)
optimizer.minimize(avg_loss)

Fluid uses Program to describe the model instead of computational diagrams. In general, the user does not need to care about the details of Program. When calling layers above, a global Program is presented: Fluid.framework.default_main_program insert variables and your operations on variables (layers and Optimzier in the above snippet).

Step 3: Initialize the parameters

As described above, the Executor is the interface that connects the front end and the back end of a Fluid program.

By default, a Fluid model has at least two Programs. The one that initializes the learnable parameters of the network is fluid.default_startup_program().

Only an Executor can run a Fluid Program, so you need to create an Executor before initializing the learnable parameters of the network.

# define the executor.
place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())

In the code snippet above, place tells the Executor on which device a Fluid Program should run. Common choices are fluid.CPUPlace() and fluid.CUDAPlace().
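
The split between a startup Program and a main Program can be sketched with a toy executor. `ToyExecutor` and the two step functions are hypothetical names invented for this illustration, and the "programs" are plain lists of Python functions; the sketch only shows why parameters must be initialized before the main program can use them.

```python
class ToyExecutor:
    """Illustrative executor (not Fluid's): runs a program's steps in order
    against a shared scope that holds variables."""

    def __init__(self, place):
        self.place = place  # device label only; this toy always runs in Python
        self.scope = {}

    def run(self, program, feed=None, fetch_list=()):
        self.scope.update(feed or {})
        for step in program:
            step(self.scope)
        return [self.scope[name] for name in fetch_list]


def init_weight(scope):
    scope["w"] = 0.5


def scale_input(scope):
    scope["out"] = scope["w"] * scope["x"]


startup_program = [init_weight]  # initializes the learnable parameter
main_program = [scale_input]     # reads the parameter and the fed input

exe = ToyExecutor(place="cpu")
exe.run(startup_program)
result = exe.run(main_program, feed={"x": 4.0}, fetch_list=["out"])
print(result)
```

If the main program were run on a fresh executor without the startup step, the lookup of `w` would fail, which is the toy analogue of training before initialization.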

Step 4: Data input + perform model training

The neural network model defined in Step 2 is ultimately inserted into a Fluid Program called fluid.framework.default_main_program.

Once the learnable parameters of the network are initialized, the model can be trained by having the Executor run fluid.framework.default_main_program.

BATCH_SIZE = 128  # same value as in the complete code below

train_reader = paddle.batch(
        paddle.reader.shuffle(paddle.dataset.mnist.train(), buf_size=5000),
        batch_size=BATCH_SIZE)
feeder = fluid.DataFeeder(place=place, feed_list=[x, y_])

for pass_id in range(100):
    for batch_id, data in enumerate(train_reader()):
        loss = exe.run(
            fluid.framework.default_main_program(),
            feed=feeder.feed(data),
            fetch_list=[avg_loss])
        print("Cur Cost : %f" % (np.array(loss[0])[0]))

As the code snippet above shows, the Fluid training process closely resembles the TensorFlow one: inside a for loop, the program reads a mini-batch of data and calls the Executor to run fluid.framework.default_main_program, which receives the mini-batch as input and performs the forward pass, the backward pass, and the parameter update.
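
The reader pipeline that feeds this loop can be approximated in plain Python. The functions `toy_shuffle` and `toy_batch` below are hypothetical re-implementations written for illustration; they assume that a reader is a zero-argument callable returning an iterator, that shuffling happens within a buffer of `buf_size` samples, and that the final partial batch is kept. The real paddle.reader.shuffle and paddle.batch may differ in details.

```python
import random


def toy_shuffle(reader, buf_size, seed=0):
    """Buffered shuffle: fill a buffer of buf_size samples, shuffle, emit."""
    def shuffled_reader():
        rng = random.Random(seed)
        buf = []
        for sample in reader():
            buf.append(sample)
            if len(buf) >= buf_size:
                rng.shuffle(buf)
                for item in buf:
                    yield item
                buf = []
        rng.shuffle(buf)  # flush whatever is left in the buffer
        for item in buf:
            yield item
    return shuffled_reader


def toy_batch(reader, batch_size):
    """Group consecutive samples into lists of at most batch_size."""
    def batch_reader():
        batch = []
        for sample in reader():
            batch.append(sample)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:
            yield batch
    return batch_reader


def toy_dataset():
    # Ten (feature, label) pairs standing in for MNIST samples.
    for i in range(10):
        yield (float(i), i % 2)


train_reader = toy_batch(toy_shuffle(toy_dataset, buf_size=4), batch_size=3)
batch_sizes = [len(batch) for batch in train_reader()]
seen = sorted(sample for batch in train_reader() for sample in batch)
print(batch_sizes)
```

Because each decorator returns a new reader, the training loop can call `train_reader()` once per pass to get a fresh iteration over the data.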

Note: The above program uses the MNIST data built into Fluid, which is exactly the same as the MNIST data we provided to the TensorFlow sample program.

Step 5: Observe the effect of the model

The steps above form a complete training program for a PaddleFluid model. Printing the loss once per batch gives a direct view of how training progresses:


Fig. Cost reduction curve of Fluid MNIST handwritten digit recognition task
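
When reading such a curve, a quick sanity check is the value the cost should have before any learning has happened. The sketch below is plain Python, not Fluid code, and assumes an untrained model whose softmax output is uniform over the 10 digit classes; under that assumption the average cross-entropy is ln(10), about 2.30, so a curve that starts near that value and decreases is behaving as expected.

```python
import math


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    # Negative log-probability assigned to the correct class.
    return -math.log(probs[label])


# Two samples with all-zero logits over 10 classes: a uniform guess.
batch_logits = [[0.0] * 10, [0.0] * 10]
batch_labels = [3, 7]
losses = [cross_entropy(softmax(logits), label)
          for logits, label in zip(batch_logits, batch_labels)]
avg_loss = sum(losses) / len(losses)
print(avg_loss)
```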

Attached: complete code

import numpy as np

import paddle.fluid as fluid
import paddle.v2 as paddle


def main():
    BATCH_SIZE = 128

    # define the input layers for the network.
    x = fluid.layers.data(name="img", shape=[1, 28, 28], dtype="float32")
    y_ = fluid.layers.data(name="label", shape=[1], dtype="int64")

    # define the network topology.
    y = fluid.layers.fc(input=x, size=10, act="softmax")
    loss = fluid.layers.cross_entropy(input=y, label=y_)
    avg_loss = fluid.layers.mean(loss)

    optimizer = fluid.optimizer.Adam(learning_rate=5e-3)
    optimizer.minimize(avg_loss)

    # define the executor.
    place = fluid.CPUPlace()
    exe = fluid.Executor(place)
    exe.run(fluid.default_startup_program())

    train_reader = paddle.batch(
        paddle.reader.shuffle(paddle.dataset.mnist.train(), buf_size=5000),
        batch_size=BATCH_SIZE)
    feeder = fluid.DataFeeder(place=place, feed_list=[x, y_])

    for pass_id in range(100):
        for batch_id, data in enumerate(train_reader()):
            loss = exe.run(
                fluid.framework.default_main_program(),
                feed=feeder.feed(data),
                fetch_list=[avg_loss])
            print("Cur Cost : %f" % (np.array(loss[0])[0]))

if __name__ == "__main__":
    main()

Conclusion

In this section, using the MNIST handwritten digit recognition dataset, we implemented a fully connected neural network with a single hidden layer in both TensorFlow and PaddleFluid as a complete, runnable example. The example demonstrates the core concepts, user interfaces, and user-experience design choices of mainstream deep learning frameworks.

As you can see, although the internal implementations differ greatly, the core concepts of a deep learning model that users work with, including Tensor, Operation, Optimizer, and network initialization, have counterparts in every mainstream deep learning framework. If you have experience with one framework, that experience transfers easily to the others.

In terms of training results, the simple model in this article fits the training data as expected, but the results are not outstanding. The reason is that the input is raw image pixel values and the neural network here is very simple, with limited fitting capacity. In the following sections, we will use more complex and practical examples to further compare how different deep learning platforms train the same neural network, and how experience can be transferred between frameworks, so that we can choose the most suitable tool and improve the efficiency of research and production.

About PaperWeekly

PaperWeekly is an academic platform for recommending, interpreting, discussing, and reporting on cutting-edge papers in artificial intelligence. If you research or work in the field of AI, you are welcome to click "Communication Group" in the menu of the official account, and the assistant will add you to PaperWeekly's communication group.

Join the community: paperweek.ly

WeChat official account: PaperWeekly

Sina Weibo: @paperweekly