

TensorFlow is a widely used library for implementing machine learning and other algorithms that involve a lot of math. Developed by Google, TensorFlow is one of the most popular machine learning libraries on GitHub. Google uses TensorFlow for machine learning in almost all of its applications. For example, if you use Google Photos or Google Voice Search, you are indirectly using TensorFlow models, which run on large clusters of Google hardware and are powerful at perception tasks.

The main purpose of this article is to give beginners an introduction to TensorFlow, and I assume you already have some familiarity with Python. The core components of TensorFlow are the computational graph and tensors, which traverse all the nodes along the graph's edges. Let's take a brief look at each of them.

Tensor:



Mathematically, a tensor is an N-dimensional vector, meaning that a tensor can be used to represent an N-dimensional data set. The diagram above is a little difficult to understand, so let's look at a simplified version of it.



The figure above shows some simple tensors of the smallest dimensions. As more dimensions are added, the data representation becomes more and more complex. For example, if we take a tensor of shape (3, 3), I can simply call it a matrix with 3 rows and 3 columns. If I choose a tensor of another shape, (1000, 3, 3), I can treat it as a vector or collection of 1000 3×3 matrices. Here we call (1000, 3, 3) the shape or dimension of the resulting tensor. Tensors can be constants or variables.
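To make the shapes concrete, here is a minimal sketch (assuming TensorFlow is installed and imported as tf; the particular values are illustrative, not from the original article):

```python
import tensorflow as tf

# A rank-2 tensor of shape (3, 3): a matrix with 3 rows and 3 columns,
# defined as a constant.
matrix = tf.constant([[1.0, 2.0, 3.0],
                      [4.0, 5.0, 6.0],
                      [7.0, 8.0, 9.0]])

# A rank-3 tensor of shape (1000, 3, 3): a collection of 1000 3x3 matrices,
# defined as a variable and initialized with zeros.
batch = tf.Variable(tf.zeros([1000, 3, 3]))

print(matrix.shape)  # (3, 3)
print(batch.shape)   # (1000, 3, 3)
```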

Computational graph (flow):

Now that we know what tensors really are, it's time to understand flow. Flow refers to a computational graph, or simply a graph. The graph cannot be cyclic; each node in the graph represents an operation such as addition or subtraction, and each operation produces a new tensor.



The figure above shows a simple calculation. The expression it represents is:

e = (a + b) × (b + 1)

A computational graph has the following properties:

Leaf vertices, or start vertices, are always tensors. This means that an operation can never occur at the beginning of the graph, and we can infer that every operation in the graph accepts a tensor and produces a new tensor. Similarly, tensors cannot appear as non-leaf nodes, which means they are always supplied as inputs to operations/nodes.

Computational graphs always represent complex operations in a hierarchical order. The expression above can be organized hierarchically by representing a + b as c and b + 1 as d. So we can write e as:

e = (c) × (d), where c = a + b and d = b + 1.
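As a rough sketch of how this graph could be written in TensorFlow (the values a = 2 and b = 3 are my own illustrative choice):

```python
import tensorflow as tf

# Leaf tensors: the starting vertices of the graph.
a = tf.constant(2.0, name="a")
b = tf.constant(3.0, name="b")

# Each operation takes tensors and produces a new tensor.
c = tf.add(a, b, name="c")        # c = a + b
d = tf.add(b, 1.0, name="d")      # d = b + 1

# The final expression e = (c) x (d).
e = tf.multiply(c, d, name="e")

print(e)  # (2 + 3) * (3 + 1) = 20.0
```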

Traversing the graph in reverse order produces subexpressions that combine to form the final expression.

As we traverse forward, each vertex we encounter depends on the vertices before it: you can't get c without a and b, and you can't get e without solving for c and d.

Operations at sibling nodes are independent of each other. This is one of the important properties of computational graphs. When we construct a graph in the way shown in the figure, it is natural that nodes on the same level, such as c and d, are independent of each other, meaning that it is not necessary to know c before evaluating d. So they can be executed in parallel.

Parallelism in computational graphs:

The last property mentioned above is certainly one of the most important. It makes it clear that sibling nodes are independent, meaning that d does not need to sit idle until c is evaluated; it can be evaluated in parallel while c is being evaluated. TensorFlow makes great use of this property.

Distributed execution:

TensorFlow allows users to perform operations faster by using parallel computing devices. The nodes, or operations, of a computation are automatically scheduled for parallel execution. This all happens internally; for example, in the figure above, operation c can be scheduled on the CPU and operation d on the GPU. The following diagram shows two perspectives on distributed execution:



The first is distributed execution on a single system, where a single TensorFlow session (explained later) creates a single worker, which is responsible for scheduling tasks across the various devices. In the second case, there are multiple workers, which can be on the same machine or on different machines, and each worker runs in its own context. In the figure above, worker process 1 runs on a separate machine and schedules operations on all of the devices available to it.
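Returning to the single-machine case mentioned earlier (operation c on the CPU, operation d on the GPU), here is a minimal sketch of explicit device placement; the device names assume a machine with at least one GPU, and soft placement lets TensorFlow fall back to whatever devices are actually available:

```python
import tensorflow as tf

# Fall back to an available device if, for example, no GPU is present.
tf.config.set_soft_device_placement(True)

a = tf.constant(2.0)
b = tf.constant(3.0)

# The independent sibling operations c and d can be pinned to
# different devices and evaluated in parallel.
with tf.device("/CPU:0"):
    c = a + b          # c = a + b on the CPU

with tf.device("/GPU:0"):
    d = b + 1.0        # d = b + 1 on the GPU (if one is available)

e = c * d
print(e)
```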

Computational subgraphs:

A subgraph is a part of the main graph and is itself a computational graph. For example, from the figure above we can obtain a number of subgraphs, one of which is shown below.



The graph above is part of the main graph, and from property 2 we can say that a subgraph always represents a subexpression, just as c is a subexpression of e. Subgraphs also satisfy the last property: subgraphs at the same level are independent of each other and can be executed in parallel. Thus an entire subgraph can be scheduled on a single device.



The diagram above illustrates parallel execution of subgraphs. There are two matrix multiplications, and because they are both at the same level, they are independent of each other, consistent with the last property. Because of this independence, the nodes are placed on different devices, gpu_0 and gpu_1.

Exchanging data between workers:

We now know that TensorFlow distributes all of its operations across different devices managed by workers. It is common for data, in the form of tensors, to be exchanged between workers. For example, in the graph e = (c) * (d), once c is calculated it needs to be passed on to e, so the tensor flows upward from node to node. This movement is shown below:



Here the tensor from device A is passed to device B, which introduces some performance delay in a distributed system. The delay depends on one important property: the size of the tensor. Device B sits idle until it receives the input from device A.

The need for compression:

Clearly, tensors flow between nodes in the computational graph. It is important to reduce the delay caused by this flow before the tensor reaches the node where it will be processed. One idea for reducing the delay is to reduce the size of the tensor, for example with lossy compression.

Tensor data types play an important role here, so let's see why. Obviously we want the highest possible precision in machine learning operations. For example, if we use float32 as the tensor data type, every value is represented as a 32-bit floating-point number and takes up 32 bits of space (and likewise 64 bits for float64). Consider a tensor of shape (1000, 440, 440, 3): it contains 1000 * 440 * 440 * 3 values, so if the data type is 32 bits, the total size is 32 times that large number. This takes up significant space in memory and therefore delays the flow. Compression techniques can be used to reduce the size.
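As a back-of-the-envelope check of that claim (plain Python arithmetic, nothing TensorFlow-specific):

```python
# Number of values in a tensor of shape (1000, 440, 440, 3).
num_values = 1000 * 440 * 440 * 3          # 580,800,000 values

# Approximate memory footprint at different precisions.
float32_bytes = num_values * 4              # 32 bits = 4 bytes per value
float16_bytes = num_values * 2              # 16 bits = 2 bytes per value

print(f"float32: {float32_bytes / 1e9:.2f} GB")   # ~2.32 GB
print(f"float16: {float16_bytes / 1e9:.2f} GB")   # ~1.16 GB
```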

Lossy compression:

Lossy compression is concerned with the size of the compressed data rather than its exact value, meaning that values may be corrupted or become inexact during compression. However, if we have a 32-bit floating-point number like 1.01010E-12, the least significant digits matter far less; changing or dropping them won't make much difference to our calculations. So TensorFlow automatically converts 32-bit floating-point numbers to a 16-bit representation, ignoring the negligible digits and cutting the size nearly in half; if the value is 64-bit, converting it to 16 bits is almost a 75% reduction. In this way the space taken up by tensors can be minimized.

Once the tensor reaches its destination node, the 16-bit representation can be brought back to its original form by appending zeros. Thus a 32- or 64-bit representation is restored once the tensor reaches the node for processing.
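A minimal sketch of that round trip using tf.cast (the sample values are my own; TensorFlow's actual transfer mechanics are internal, so this only illustrates the precision trade-off):

```python
import tensorflow as tf

x = tf.constant([1.01010e-12, 3.1415927], dtype=tf.float32)

# Lossy step: drop to 16 bits before the tensor is moved.
compressed = tf.cast(x, tf.float16)

# At the receiving node, widen back to the original 32-bit form.
restored = tf.cast(compressed, tf.float32)

print(compressed)  # tiny values and extra digits are lost
print(restored)    # back to float32, but with reduced precision
```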


This is the end of this section; I hope it helps you.


This article was recommended by @Love coco – Love life, a teacher at Beijing University of Posts and Telecommunications, and translated by the Alibaba Cloud Yunqi Community.

Original article: Beginner Introduction to TensorFlow (Part-1), by Narasimha Prasanna HN, web and Android developer and hobbyist programmer (Python and JavaScript). Translator: Dong Zhaonan.

This article is an abridged translation. For more details, please refer to the original PDF.