  • NumPy on GPU/TPU
  • Sambasivarao. K
  • The Nuggets Translation Project
  • Permanent link to this article: github.com/xitu/gold-m…
  • Translator: samyu2000
  • Proofreader: PingHGao, Kimhooo

NumPy performance on GPU/TPU

NumPy is by far the most widely used array computation library and is the foundation of many machine learning and data science libraries. It provides a large set of high-level array operations and is well known for its speed: NumPy can process arrays up to 50 times faster than Python's built-in lists. It also supports vectorization, which replaces explicit Python loops.

Can NumPy run even faster? The answer is yes.

TensorFlow 2.4 introduced a subset of the NumPy API in the form of tf.experimental.numpy. With it, NumPy-style code can run faster by executing on a GPU/TPU.

The benchmark

Before delving further, let's compare the performance of NumPy with TensorFlow NumPy. For workloads made up of small operations (less than about 10 microseconds), where TensorFlow's op dispatch overhead dominates the runtime, NumPy performs better. For everything else, TensorFlow generally performs better.

The TensorFlow team provides a benchmark for this comparison: it computes a sigmoid function with both NumPy and TensorFlow NumPy and runs it many times on the CPU and on the GPU. The results of the test are as follows:

As the results show, NumPy performs better for small workloads, TensorFlow NumPy performs better for larger ones, and running on a GPU is faster than running on a CPU.
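The original benchmark script is not reproduced here; the following is a minimal sketch of such a comparison (the input size and repeat count are illustrative, and TensorFlow NumPy will place the work on a GPU automatically when one is visible):

```python
import timeit

import numpy as np
import tensorflow.experimental.numpy as tnp


def np_sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def tnp_sigmoid(x):
    return 1.0 / (1.0 + tnp.exp(-x))


# Illustrative input size and repeat count.
x_np = np.random.randn(1_000_000).astype(np.float32)
x_tnp = tnp.asarray(x_np)

print("NumPy:   ", timeit.timeit(lambda: np_sigmoid(x_np), number=100))
print("TF NumPy:", timeit.timeit(lambda: tnp_sigmoid(x_tnp), number=100))
```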

TensorFlow NumPy ND array

Now that we have seen that TensorFlow NumPy can outperform NumPy, let's dig into the API.

An ND array is an instance of tf.experimental.numpy.ndarray and represents a multidimensional dense array. It wraps an immutable tf.Tensor, so it interoperates with tf.Tensor. It also implements the __array__ interface, which lets these objects be passed to environments that expect NumPy or array-like objects (such as Matplotlib). Interoperation does not copy data, even for data placed on accelerators or remote devices.

Conversely, tf.Tensor objects can be passed directly to tf.experimental.numpy APIs, again without copying data.
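A minimal sketch of this interoperability (shapes and values are illustrative):

```python
import numpy as np
import tensorflow as tf
import tensorflow.experimental.numpy as tnp

a = tnp.ones([2, 3], dtype=tnp.float32)          # ND array, backed by an immutable tf.Tensor
print(a.shape, a.dtype)

t = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tnp.add(a, t)                                # a tf.Tensor passed straight to a tnp API
c = np.asarray(b)                                # __array__ interface: usable wherever np.ndarray is expected
print(type(c), c)
```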

Operator precedence: TensorFlow NumPy defines an __array_priority__ higher than NumPy's. This means that for operators involving both an ND array and an np.ndarray, the former takes precedence: the np.ndarray input is converted to an ND array, and the TensorFlow NumPy implementation of the operator is invoked.
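For example, mixing a plain np.ndarray with an ND array hands the operation to TensorFlow NumPy (a small illustrative sketch):

```python
import numpy as np
import tensorflow.experimental.numpy as tnp

x = np.ones([2, 2], dtype=np.float32)     # plain NumPy array
y = tnp.ones([2, 2], dtype=tnp.float32)   # ND array

# Because of the higher __array_priority__, TensorFlow NumPy implements
# the addition and the result is an ND array, not an np.ndarray.
z = x + y
print(type(z))
```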

Types: ND array supports a range of NumPy data types, and type promotion follows NumPy semantics. Broadcasting and indexing also work the same way as with NumPy arrays.
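A small sketch of promotion, broadcasting, and indexing (the experimental_enable_numpy_behavior() call is needed in newer TensorFlow releases to switch tf.Tensor operators to NumPy-style semantics):

```python
import tensorflow.experimental.numpy as tnp

# Opt in to NumPy-style type promotion on operators (newer TF releases).
tnp.experimental_enable_numpy_behavior()

a = tnp.ones([2, 1], dtype=tnp.int32)
b = tnp.ones([3], dtype=tnp.float32)

c = a + b                  # promotion follows NumPy: int32 + float32 -> float64
print(c.shape, c.dtype)    # broadcasting: (2, 1) + (3,) -> (2, 3)
print(c[0, 1:])            # NumPy-style indexing and slicing
```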

Device support: Because an ND array is backed by a tf.Tensor, ND arrays are supported on both GPU and TPU. We can choose which device the code runs on with tf.device, as in the sketch below.
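A minimal sketch of explicit placement (device names are illustrative; it falls back to the CPU when no GPU is visible):

```python
import tensorflow as tf
import tensorflow.experimental.numpy as tnp

gpus = tf.config.list_logical_devices("GPU")
device = gpus[0].name if gpus else "/device:CPU:0"   # fall back to CPU if no GPU

with tf.device(device):
    x = tnp.random.randn(1024, 1024)
    y = tnp.matmul(x, x)

print("result placed on:", y.device)
```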

Graph and eager modes: Eager execution runs op by op, like ordinary Python code, so ND arrays can be used step by step just like NumPy. The same code can also be executed in graph mode by wrapping it in tf.function. Here is an example of how to do this.
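The article's original listing is not reproduced here; this is a minimal sketch using tnp ops (the function and shapes are illustrative):

```python
import tensorflow as tf
import tensorflow.experimental.numpy as tnp


def dense_layer(x, w, b):
    # A ReLU dense layer written entirely with tnp operations.
    return tnp.maximum(tnp.matmul(x, w) + b, 0.0)


x = tnp.random.randn(4, 3)
w = tnp.random.randn(3, 2)
b = tnp.zeros(2)

# Eager mode: executes op by op, just like NumPy.
print(dense_layer(x, w, b))

# Graph mode: the same function, traced and compiled by tf.function.
dense_layer_graph = tf.function(dense_layer)
print(dense_layer_graph(x, w, b))
```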

Limitations

  • Some dtypes are not supported.
  • Mutation is not supported: an ND array wraps an immutable tf.Tensor (see the sketch after this list).
  • Fortran ordering, views, and stride tricks are not supported.
  • The NumPy C API is not supported, nor are NumPy's Cython and Swig integrations.
  • Only a subset of functions and modules are supported.
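A small sketch of the mutation restriction (values are illustrative); instead of assigning in place, you build a new array:

```python
import tensorflow.experimental.numpy as tnp

a = tnp.arange(5)
try:
    a[0] = 10                                  # in-place assignment is not supported
except TypeError as err:
    print("mutation is not allowed:", err)

# Build a new array instead of mutating the old one.
b = tnp.where(tnp.arange(5) == 0, 10, a)
print(b)
```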

That’s all for now. We looked at TensorFlow NumPy and some of its features. TF NumPy interoperates with both TensorFlow and plain NumPy code, so it can be used in either. You can also use it to run heavy NumPy workloads on a GPU.

In the next article, we will build a neural network from scratch using TensorFlow NumPy and train it on a GPU using TensorFlow's automatic differentiation. We will also look at TensorFlow-related acceleration techniques.
