Planning Editor | Natalie
Author | Jacob Buckman
Translators | Wang Qiang, Wuming
Editors | Vincent, Debra
AI Front Line introduction: “My name is Jacob, and I’m a scholar in the Google AI Residency program. When I entered the program in the summer of 2017, I had plenty of programming experience and a solid understanding of machine learning, but I had never used Tensorflow before. I figured I could pick it up quickly given my background, but I didn’t expect the road to be so bumpy. Even a few months into the program, I was still occasionally confused about how to turn my new ideas into Tensorflow code.

This blog post is like a message in a bottle to my past self: looking back, I wish I had had an introduction like this when I started learning. I hope this article will also be helpful to my peers.” AI Front Line has translated this Google Brain engineer’s post on some of the stumbling blocks involved in learning Tensorflow.

What was missing from past tutorials?

Tensorflow has been around for three years and is now a cornerstone of the deep learning ecosystem. However, it is not easy for beginners to understand, especially compared with define-by-run neural network libraries like PyTorch or DyNet.

There are many introductory Tensorflow tutorials covering linear regression, MNIST classification, and even machine translation. These concrete, practical guides can help people get Tensorflow projects up and running quickly, and they can serve as starting points for similar projects. But getting started with Tensorflow can still be confusing for developers building applications that have no good tutorial to follow, or exploring new routes (which is common in research).

I wrote this article to fill that gap. Instead of focusing on a specific task, it takes a more general approach and explains the underlying abstractions of Tensorflow. With a good grasp of these concepts, deep learning with Tensorflow becomes much more intuitive.

The target audience

This tutorial is intended for practitioners who have some experience with programming and machine learning and want to get started with Tensorflow. They might be: CS students who want to use Tensorflow in the final project of a deep learning course; software engineers who have just been transferred to a project involving deep learning; or a confused new Google AI resident (give Jacob a big shout-out). If you need a primer on the basics, see the following resources. With that out of the way, let’s get started!

Understanding Tensorflow

Tensorflow is not a normal Python library.

Most Python libraries are written as natural extensions of Python. When you import a library, you get a set of variables, functions, and classes that complement and extend your code toolbox, and you know what to expect when you use them. I think these expectations should be discarded when it comes to Tensorflow, because they are fundamentally at odds with what Tensorflow is and with the way TF interacts with the rest of your code.

The connection between Python and Tensorflow is analogous to the relationship between Javascript and HTML. Javascript is a full-featured programming language that can achieve all sorts of wonderful effects. HTML is a framework for representing some type of useful computational abstraction (in this case, something that can be rendered by a Web browser). The role of Javascript in an interactive web page is to assemble HTML objects that the browser sees and then interact with them by updating them to the new HTML as needed.

Similar to HTML, Tensorflow is a framework for representing a certain type of computational abstraction, called a “computational graph.” When we work with Tensorflow in Python, the first thing our Python code does is assemble a computational graph. Our second task is then to interact with it (using Tensorflow’s “session”). But it is important to remember that the graph does not live inside any of our variables; it lives in the global namespace. As Shakespeare might have put it, “All the RAM’s a stage, and all the variables are merely pointers.”

The first key abstraction: the computational graph

As we browse the Tensorflow documentation, we sometimes find references to “graphs” and “nodes.” If you read carefully and dig deep enough, you may even have found this page, which covers the material I’m about to explain in a more precise and technical style. This section starts from the top, grasping the key intuitive concepts while skimming over some of the technical details.

So what is a computational graph? Essentially, it is a global data structure: a directed graph that captures instructions about how to perform a computation.

Let’s look at how to build one through an example, comparing the code we run and its output with the resulting computational graph.

Obviously, importing Tensorflow alone doesn’t give us an interesting graph, just a lonely, empty global variable. But what happens when we invoke a Tensorflow operation?
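For example, a minimal sketch assuming the TensorFlow 1.x API described in this article (the node name in the printed output may vary):

```python
import tensorflow as tf

two_node = tf.constant(2)
print(two_node)
# output looks something like: Tensor("Const:0", shape=(), dtype=int32)
```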

Look! We get a node containing the constant 2, courtesy of the function we called, tf.constant. When we print this variable, we see that it returns a tf.Tensor object, which is a pointer to the node we just created. To emphasize the point, here’s another example:
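A sketch of such an example (again assuming TensorFlow 1.x):

```python
import tensorflow as tf

two_node = tf.constant(2)
another_two_node = tf.constant(2)  # functionally identical, but a brand-new node
two_node = tf.constant(2)          # reassigning the same Python variable: yet another node
tf.constant(3)                     # never assigned to a variable: still a new node
```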

Every time we call tf.constant, we create a new node in the graph. This is true even if the node is functionally identical to an existing node, even if we reassign a node to the same variable, and even if we don’t assign it to a variable at all; the result is the same.

Conversely, if you create a new Python variable and set it equal to an existing node, you are simply copying a pointer to that node, and nothing is added to the graph:
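For example (a sketch, TensorFlow 1.x assumed):

```python
import tensorflow as tf

two_node = tf.constant(2)
another_pointer_at_two_node = two_node  # copies the pointer; no new node is created
two_node = None                         # the original Python name can even be dropped
print(another_pointer_at_two_node)
# still prints something like: Tensor("Const:0", shape=(), dtype=int32)
```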

Ok, let’s go one step further.
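A sketch of the step being described (TensorFlow 1.x assumed):

```python
import tensorflow as tf

two_node = tf.constant(2)
three_node = tf.constant(3)
sum_node = two_node + three_node  # the overloaded + adds an addition node to the graph
```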

Now we’re getting somewhere: that’s a real computational graph! Note that the + operation is overloaded in Tensorflow, so adding two tensors together adds a node to the graph, even though it doesn’t look much like a Tensorflow operation.

Ok, so two_node points to a node containing 2, three_node points to a node containing 3, and sum_node points to a node containing… +? What’s going on? Shouldn’t it contain 5?

As it turns out, no. The computational graph contains only the steps of the computation, not the results. At least… not yet!

The second key abstraction: sessions

If Tensorflow abstractions held a March Madness tournament (the frenzied championship season of American college basketball) for “most misunderstood,” sessions would be the number-one seed every year. This awkward distinction is due to the fact that sessions are named counter-intuitively yet used ubiquitously: almost every Tensorflow program calls tf.Session() at least once.

The purpose of a session is to handle memory allocation and optimization, allowing us to actually perform the computations specified by a graph. Think of the computational graph as a “template” for the computation we want to run: it lays out all the steps. To make use of the graph, we also need to launch a session, which, for example, traverses the template’s nodes and allocates memory to store intermediate results. To do any computation with Tensorflow, we need both a graph and a session.

The session contains a pointer to the global graph, which is constantly updated with pointers to all of its nodes. This means that it doesn’t matter whether the session is created before or after the nodes are created.

After the session object is created, sess.run(node) can be used to return the value of the node, and Tensorflow performs all the calculations needed to determine the value.
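A minimal sketch of this pattern (TensorFlow 1.x assumed):

```python
import tensorflow as tf

two_node = tf.constant(2)
three_node = tf.constant(3)
sum_node = two_node + three_node

sess = tf.Session()        # the session handles memory allocation for the graph
print(sess.run(sum_node))  # 5
```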

Wonderful! We can also pass a list, sess.run([node1, node2, ...]), and have it return multiple outputs:
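For example, continuing the sketch above:

```python
print(sess.run([two_node, sum_node]))  # [2, 5]
```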

In general, the sess.run() call tends to be one of the biggest TensorFlow bottlenecks, so the fewer times it is called, the better. Instead of making multiple calls, return multiple items in a single sess.run() call if possible.

Placeholders and feed_dict

The calculations we’ve done so far have been tedious: there’s no chance of getting input, so they always output the same thing. A practical application might involve building a computational graph that takes input, processes it in some (consistent) way, and returns an output.

The most straightforward way is to use placeholders. A placeholder is a node used to accept external input.
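A sketch of the failing example being described (TensorFlow 1.x assumed; the exact error text may differ):

```python
import tensorflow as tf

input_placeholder = tf.placeholder(tf.int32)
sess = tf.Session()
print(sess.run(input_placeholder))
# raises an error: the placeholder must be fed a value via feed_dict
```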

… This is a bad example, because it throws an exception. The placeholder expected to be fed a value, but we didn’t provide one, so Tensorflow crashed.

To provide a value, we use the feed_dict argument of sess.run().
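A sketch of the working version (TensorFlow 1.x assumed):

```python
import tensorflow as tf

input_placeholder = tf.placeholder(tf.int32)
sess = tf.Session()
print(sess.run(input_placeholder, feed_dict={input_placeholder: 2}))  # 2
```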

Much better. Notice the format of the dict passed to feed_dict. Its keys should be the variables corresponding to placeholder nodes in the graph (which, as discussed earlier, really means pointers to placeholder nodes in the graph). The corresponding values are the data elements to assign to each placeholder, usually scalars or Numpy arrays. Here is another example of using placeholders:
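A sketch of the kind of example discussed below (TensorFlow 1.x assumed):

```python
import tensorflow as tf

input_placeholder = tf.placeholder(tf.int32)
three_node = tf.constant(3)
sum_node = input_placeholder + three_node

sess = tf.Session()
print(sess.run(three_node))  # 3 -- works fine
print(sess.run(sum_node))    # raises an error: input_placeholder was never fed
```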

Why does the second call to sess.run() fail? We’re not evaluating input_placeholder, so why do we get an error related to input_placeholder? The answer lies in the final key Tensorflow abstraction: the computational path. Fortunately, this abstraction is pretty straightforward.

When we call sess.run() on a node that depends on other nodes in the graph, we need to compute the values of those nodes as well. If those nodes have their own dependencies, we need to compute those values too (and so on…), until we reach the “top” of the graph: nodes that have no predecessors.

Consider the computational path of sum_node:

All three nodes need to be evaluated in order to compute the value of sum_node. Crucially, that includes our unfilled placeholder, which explains the exception!

Now, instead, look at the computational path of three_node:

Thanks to the structure of the graph, we only need to evaluate the nodes we actually depend on, not every node! Running sess.run(three_node) doesn’t raise an exception, because evaluating three_node doesn’t require evaluating placeholder_node.

The fact that Tensorflow automatically routes computation only through the required nodes is a huge advantage. It can save a lot of running time if the graph is very large and has many unnecessary nodes. It allows us to build large “multi-purpose” graphs that use a single shared set of core nodes to do different things depending on which computational path is taken. For almost any application, it is important to think about sess.run() calls in terms of the computational paths they take.

Variables and side effects

So far, we’ve seen two types of “ancestorless” nodes: tf.constant (the same on every run) and tf.placeholder (different on every run). There is a third type of node: one that usually holds the same value but can also be updated to a new value. That’s where variables come in.

Understanding variables is critical to doing deep learning with Tensorflow, because a model’s parameters are variables. During training, you want to update the parameters at every step via gradient descent, but during evaluation, you want to keep the parameters fixed and feed a large number of different test inputs into the model. All of a model’s trainable parameters will most likely be variables.

To create a variable, use tf.get_variable(). The first two arguments of tf.get_variable() are required; the rest are optional: tf.get_variable(name, shape). name is a string that uniquely identifies the variable object. It must be unique in the global graph, so make sure you don’t create duplicate names. shape is a list of integers corresponding to the shape of a tensor, and its syntax is straightforward: one integer per dimension, in order. For example, a 3×8 matrix would have shape [3, 8]. To create a scalar, use an empty list as the shape: [].
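A sketch of creating a variable and (prematurely) evaluating it, as discussed next (TensorFlow 1.x assumed):

```python
import tensorflow as tf

count_variable = tf.get_variable("count", [])
sess = tf.Session()
print(sess.run(count_variable))
# raises an error: the variable has not been given a value yet
```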

We’ve run into another exception. When a variable node is first created, its value is essentially “null,” and any attempt to evaluate it will throw this exception. We must assign a value to a variable before we can evaluate it. There are two main ways to assign values to variables: initializers and tf.assign(). Let’s look at tf.assign() first:
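A sketch of the tf.assign() approach (TensorFlow 1.x assumed):

```python
import tensorflow as tf

count_variable = tf.get_variable("count", [])
zero_node = tf.constant(0.)
assign_node = tf.assign(count_variable, zero_node)

sess = tf.Session()
sess.run(assign_node)            # side effect: count_variable now holds 0
print(sess.run(count_variable))  # 0.0
```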

tf.assign(target, value) has some properties that are unique compared to the nodes we’ve seen so far:

  • Identity operation. tf.assign(target, value) does no computation of its own; it is always equal to value.

  • Side effects. When computation “flows through” assign_node, side effects happen to other things in the graph. In this case, the side effect is replacing the value of count_variable with the value stored in zero_node.

  • Non-dependent edges. Even though count_variable and assign_node are connected in the graph, neither depends on the other. This means computation does not flow back through that edge when either node is evaluated. assign_node does depend on zero_node, however; it needs to know what value to assign.

“Side effect” nodes are everywhere in most Tensorflow deep learning workflows, so make sure you understand them. When we call sess.run(assign_node), the computational path goes through assign_node and zero_node.

When computation flows through any node in the graph, it also triggers the side effects controlled by that node. Due to tf.assign’s particular side effect, the memory associated with count_variable (previously “null”) is now permanently set to 0. This means that the next time we call sess.run(count_variable), no exception will be thrown; instead, we get 0.

Next, let’s look at initializers:
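A sketch of the failing initializer attempt being described (TensorFlow 1.x assumed):

```python
import tensorflow as tf

const_init_node = tf.constant_initializer(0.)
count_variable = tf.get_variable("count", [], initializer=const_init_node)

sess = tf.Session()
print(sess.run(count_variable))
# still raises an error: the variable has not actually been initialized
```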

What’s going on here? Why doesn’t the initializer work?

The problem is the separation between sessions and graphs. We’ve pointed the initializer property of get_variable at const_init_node, but that just adds a new connection between nodes in the graph. We haven’t done anything about the cause of the exception: the memory associated with the variable node (which lives in the session, not the graph) is still “null.” We need the session to get const_init_node to actually update the variable.

To do this, we add another special node: init = tf.global_variables_initializer(). Like tf.assign(), this is a node with side effects. Unlike tf.assign(), we don’t actually need to specify its inputs! tf.global_variables_initializer() inspects the global graph at the moment it is created and automatically adds a dependency on every initializer in the graph. When we then call sess.run(init), it tells each initializer to do its job and initialize its variable, so that calling sess.run(count_variable) no longer raises an error.
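A sketch of the full initialization pattern (TensorFlow 1.x assumed):

```python
import tensorflow as tf

const_init_node = tf.constant_initializer(0.)
count_variable = tf.get_variable("count", [], initializer=const_init_node)
init = tf.global_variables_initializer()  # side-effect node depending on every initializer

sess = tf.Session()
sess.run(init)                   # tells each initializer to do its job
print(sess.run(count_variable))  # 0.0
```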

Variable sharing

You may run into Tensorflow code that uses variable sharing, which involves creating a scope and setting “reuse=True”. I strongly recommend that you do not use variable sharing in your own code. If you want to use a single variable in multiple places, simply keep a pointer to that variable’s node and use it wherever needed. In other words, tf.get_variable() should be called exactly once for each parameter you intend to keep in memory.

The optimizer

Finally, it’s time to do some real deep learning! If you’ve followed everything up to this point, the remaining concepts should feel quite simple.

In deep learning, typical “inner loop” training is as follows:

  • Get the input and true_output

  • Calculate a “guess” based on the input and parameters

  • Calculate a “loss” based on the difference between guess and true_output

  • Update parameters according to the gradient of loss

Let’s put everything in a script and solve a simple linear regression problem:
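A minimal sketch of such a script (TensorFlow 1.x assumed; the parameter names m and b and the synthetic-data setup are illustrative, not the author’s exact code):

```python
import random
import tensorflow as tf

## build the graph: parameters, computation, loss, optimizer
m = tf.get_variable("m", [], initializer=tf.constant_initializer(0.))
b = tf.get_variable("b", [], initializer=tf.constant_initializer(0.))
init = tf.global_variables_initializer()

input_placeholder = tf.placeholder(tf.float32)
output_placeholder = tf.placeholder(tf.float32)

guess = m * input_placeholder + b             # (2) compute a guess
loss = tf.square(output_placeholder - guess)  # (3) compute a loss

optimizer = tf.train.GradientDescentOptimizer(1e-3)
train_op = optimizer.minimize(loss)           # (4) update the parameters, as a side effect

## training loop on synthetic data
true_m, true_b = random.random(), random.random()

sess = tf.Session()
sess.run(init)
for step in range(100000):
    x = random.random()
    y = true_m * x + true_b                   # (1) get the input and true_output
    _loss, _ = sess.run([loss, train_op],
                        feed_dict={input_placeholder: x, output_placeholder: y})
    if step % 10000 == 0:
        print(step, _loss)

print("estimated:", sess.run([m, b]), "true:", (true_m, true_b))
```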

As you can see, the loss drops to essentially nothing, and we end up with a good estimate of the true parameters. Only one or two lines of this code should be new to you:

Now that you have a good understanding of the basic concepts of Tensorflow, this code should be easy to interpret! The first line, optimizer = tf.train.GradientDescentOptimizer(1e-3), does not add a node to the graph. It simply creates a Python object that contains some useful functions. The second line, train_op = optimizer.minimize(loss), does add a node to the graph and assigns a pointer to it to the variable train_op. The train_op node has no output, but it has a very complicated side effect:

train_op traces back through the computational path of its input to find variable nodes. For each variable node it finds, it computes the gradient of the loss with respect to that variable. It then computes a new value for the variable: the current value minus the gradient times the learning rate. Finally, it performs an assignment operation to update the variable’s value.

Basically, when we call sess.run(train_op), it performs one step of gradient descent on all of our variables. Of course, we also need to use feed_dict to fill in the input and output placeholders, and we also want to print the loss, since that makes debugging easier.

Use tf.Print for debugging

As you start to do more complicated things with Tensorflow, you’ll need to debug. In general, it is hard to inspect what is happening inside a computational graph. You can’t use a regular Python print statement, because you never have access to the values you want to print: they’re locked away inside the sess.run() call. For example, suppose you want to inspect an intermediate value of a computation; the intermediate value doesn’t exist until sess.run() is called, but by the time sess.run() returns, it’s gone!

Let’s look at a simple example.
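A sketch of the simple example (TensorFlow 1.x assumed):

```python
import tensorflow as tf

two_node = tf.constant(2)
three_node = tf.constant(3)
sum_node = two_node + three_node

sess = tf.Session()
print(sess.run(sum_node))  # 5
```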

And we see that the result is 5. But what if we want to inspect the intermediate values two_node and three_node? One way is to add extra return arguments to sess.run(), pointing at each intermediate node we want to inspect, and then print them after it returns:
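For example (a sketch; sess.run() accepts nested lists of fetches):

```python
import tensorflow as tf

two_node = tf.constant(2)
three_node = tf.constant(3)
sum_node = two_node + three_node

sess = tf.Session()
answer, inspection = sess.run([sum_node, [two_node, three_node]])
print(inspection)  # [2, 3]
print(answer)      # 5
```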

This is usually fine, but it can get awkward as the code becomes more complex. A more convenient approach is to use a tf.Print statement. Confusingly, tf.Print is itself a Tensorflow node, with an output and side effects! It has two required arguments: a node to copy and a list of things to print. The “node to copy” can be any node in the graph; tf.Print is an identity operation with respect to that node, meaning it outputs a copy of its input. As a side effect, though, it prints all the values in the print list.
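A sketch of using tf.Print this way (TensorFlow 1.x assumed; the printed values typically go to stderr):

```python
import tensorflow as tf

two_node = tf.constant(2)
three_node = tf.constant(3)
sum_node = two_node + three_node
print_sum_node = tf.Print(sum_node, [two_node, three_node])  # identity op plus a print side effect

sess = tf.Session()
print(sess.run(print_sum_node))
# side-effect output (roughly): [2][3]
# 5
```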

An important but somewhat subtle point about tf.Print: printing really is only a side effect. Like all other side effects, printing happens only if the computation flows through the tf.Print node. If the tf.Print node is not on the computational path, nothing gets printed. Even if the node that tf.Print is copying is on the computational path, the tf.Print node itself may not be. Watch out for this! When it bites you, it can be immensely frustrating to figure out what is going wrong. As a rule of thumb, it is best to create the tf.Print node immediately after creating the node it copies.

There is an excellent resource here (https://wookayin.github.io/tensorflow-talk-debugging/#1) that provides more practical debugging advice.

Conclusion

Hopefully this article has helped you better understand Tensorflow: what it is, how it works, and how to use it. The concepts presented here are important for all Tensorflow programs, but they only scratch the surface. During your Tensorflow adventures, you’ll likely run into all kinds of other interesting things you’ll need or want to use: conditionals, loops, distributed Tensorflow, variable scopes, saving and loading models, multiple graphs, multiple sessions, multi-core data-loader queues, and more.

Original link:

https://jacobbuckman.com/post/tensorflow-the-confusing-parts-1/#understanding-tensorflow