Tensor2Tensor: Organizing the world's models and data. Our speaker is Laurence Moroney.

Reviewing TensorFlow

First, a review of TensorFlow:

TensorFlow can run anywhere. tf.data helps you build an efficient data input pipeline, tf.layers and tf.keras.Model help you build neural networks quickly, and tf.estimator.Estimator and DistributionStrategy help you set up distributed training quickly.

Tensor2Tensor

However, this is not enough for cutting-edge AI. In areas such as image recognition, machine translation, and text analysis, many people lack the knowledge or experience to master the best practices, which makes it hard for them to benefit from the latest research results. Tensor2Tensor was created to give the community a good platform for sharing them.

Tensor2Tensor ships with a collection of datasets, models, and their hyperparameters:

Through extensive experiments, we found that these hyperparameter settings give the best performance for the corresponding models and datasets. Without Tensor2Tensor, you would have to tune the parameters yourself through constant experimentation, which is very inefficient. That is the problem Tensor2Tensor was designed to solve.

To make things work well out of the box, Tensor2Tensor also comes with a set of tools, such as hyperparameter sets and distributed training on GPUs or TPUs.

Tensor2Tensor open source

Tensor2Tensor is fully open source on GitHub:

Tensor2Tensor keeps up with the academic cutting edge

Tensor2Tensor tracks the latest academic research closely.

Here is an amusing example. Someone tweeted:

The AMSGrad algorithm is the latest SGD optimization algorithm.

Then, another user replied:

That is no longer the latest SGD optimizer; the latest is Adafactor, which was implemented in Tensor2Tensor three weeks ago.

That person was soon hired by Google. :-D

Laurence also showed a screenshot of the Adafactor pseudocode for those interested in taking a closer look:
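As a rough companion to that pseudocode, here is a minimal pure-Python sketch of a single Adafactor step for one weight matrix. It follows the update rule described in the Adafactor paper (Shazeer and Stern, 2018) but is simplified: no momentum and no special handling of vector-shaped parameters. The function names, the nested-list representation, and the constants (eps1, eps2, the clipping threshold d, the decay schedule 1 - t^-0.8, and the step-size cap of 1e-2) are assumptions of this sketch, not Tensor2Tensor's implementation. The point to notice is that the second-moment estimate for an m x n matrix is stored as m row statistics plus n column statistics instead of m*n numbers, which is where Adafactor's memory savings come from.

```python
import math


def factored_second_moment(grad, row_acc, col_acc, beta2, eps1):
    """Update row/column accumulators and return the factored estimate V_hat.

    Only len(row_acc) + len(col_acc) numbers are stored; V_hat is rebuilt
    as the outer product of row and column statistics divided by their total.
    """
    m, n = len(grad), len(grad[0])
    squared = [[g * g + eps1 for g in row] for row in grad]
    row_acc = [beta2 * r + (1 - beta2) * sum(squared[i]) for i, r in enumerate(row_acc)]
    col_acc = [
        beta2 * c + (1 - beta2) * sum(squared[i][j] for i in range(m))
        for j, c in enumerate(col_acc)
    ]
    total = sum(row_acc)
    v_hat = [[row_acc[i] * col_acc[j] / total for j in range(n)] for i in range(m)]
    return row_acc, col_acc, v_hat


def rms(matrix):
    """Root-mean-square of all entries of a nested-list matrix."""
    values = [x for row in matrix for x in row]
    return math.sqrt(sum(x * x for x in values) / len(values))


def adafactor_step(params, grad, state, t, eps1=1e-30, eps2=1e-3, d=1.0):
    """Return new params after one simplified Adafactor step (t starts at 1)."""
    beta2 = 1.0 - t ** -0.8  # decay grows over time: 0 at t=1, approaching 1 later
    state["row"], state["col"], v_hat = factored_second_moment(
        grad, state["row"], state["col"], beta2, eps1
    )
    # Normalize the gradient by the square root of the factored second moment.
    update = [[g / math.sqrt(v) for g, v in zip(g_row, v_row)] for g_row, v_row in zip(grad, v_hat)]
    # Update clipping: shrink the whole update if its RMS exceeds d.
    clip = max(1.0, rms(update) / d)
    # Relative step size: proportional to the RMS of the current parameters.
    rho = min(1e-2, 1.0 / math.sqrt(t))
    alpha = max(eps2, rms(params)) * rho
    return [[p - alpha * u / clip for p, u in zip(p_row, u_row)] for p_row, u_row in zip(params, update)]


params = [[1.0, -1.0], [0.5, 2.0]]
state = {"row": [0.0, 0.0], "col": [0.0, 0.0]}  # accumulators start at zero
grad = [[0.1, -0.2], [0.3, 0.4]]
params = adafactor_step(params, grad, state, t=1)
print(params)
```

When the squared gradient matrix has rank one, the factored estimate equals the full elementwise estimate exactly; in general it is an approximation.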

Tensor2Tensor also implements the Transformer model:

The Transformer is a new model proposed by Google in the 2017 paper "Attention Is All You Need". It relies solely on the attention mechanism instead of traditional CNNs and RNNs, and it achieved the best results of its time. It is widely used in NLP tasks such as machine translation, question answering, and text summarization, as well as in speech recognition.
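The core operation is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, as defined in that paper. The minimal pure-Python sketch below computes it for a single head, without masking, learned projections, or multi-head splitting. The function names are illustrative, and this is not Tensor2Tensor's implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of vectors.

    queries: n_q vectors of size d_k; keys: n_k vectors of size d_k;
    values: n_k vectors of size d_v. Returns (outputs, attention_weights).
    """
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        w = softmax(scores)
        # The output is the attention-weighted average of the value vectors.
        out = [sum(w[j] * values[j][c] for j in range(len(values))) for c in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


out, w = scaled_dot_product_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[10.0, 0.0], [0.0, 10.0]],
)
print(w, out)  # the first key matches the query better, so it gets more weight
```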

At the moment, many people contribute to the Tensor2Tensor project:

We strongly encourage researchers to use Tensor2Tensor to help with their studies.

Meet t2t-trainer

Let's take a look at t2t-trainer, a Tensor2Tensor tool that lets people do machine learning without writing code.

With t2t-trainer, you only need to set a few parameters and you're done:

pip install tensor2tensor && t2t-trainer \
  --problem=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --generate_data \
  --data_dir=$DATA_DIR \
  --output_dir=$TRAIN_DIR \
  --train_steps=$TRAIN_STEPS \
  --eval_steps=$EVAL_STEPS

There are three main parameters:

  • --problem: the problem (task)
  • --model: the selected model
  • --hparams_set: the hyperparameter set
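Because these are ordinary command-line flags, a training run can also be launched from Python. The sketch below is a hypothetical helper (the function name, paths, and step counts are made up for illustration) that assembles the same t2t-trainer flags used in this article into an argument list. It only prints the command; actually running it requires Tensor2Tensor to be installed.

```python
import shlex


def t2t_trainer_command(problem, model, hparams_set, data_dir, output_dir,
                        train_steps, eval_steps, generate_data=True):
    """Build a t2t-trainer argument list from the flags used in this article."""
    args = [
        "t2t-trainer",
        f"--problem={problem}",          # which task / dataset
        f"--model={model}",              # which registered model
        f"--hparams_set={hparams_set}",  # which named hyperparameter set
    ]
    if generate_data:
        args.append("--generate_data")
    args += [
        f"--data_dir={data_dir}",
        f"--output_dir={output_dir}",
        f"--train_steps={train_steps}",
        f"--eval_steps={eval_steps}",
    ]
    return args


cmd = t2t_trainer_command("translate_ende_wmt32k", "transformer", "transformer_big",
                          "/tmp/t2t_data", "/tmp/t2t_train", 1000, 100)
print(shlex.join(cmd))
# To launch it (requires tensor2tensor): subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting problems when paths contain spaces.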

The hyperparameter set is easy to explain: starting from a model's existing hyperparameters, we can change some of them to construct a new hyperparameter set.
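In Tensor2Tensor, each hyperparameter set is a named Python function registered with the library, and a derived set typically calls its parent and overrides a few values. The sketch below mimics that pattern in plain Python so it runs without Tensor2Tensor. The registry dictionary, the `_sketch` function names, and the use of plain dicts are hypothetical stand-ins; the real library uses its own `registry.register_hparams` decorator and an HParams object. The values are illustrative, loosely following the base and big Transformer configurations from the paper.

```python
# Hypothetical stand-in for a hyperparameter registry: name -> builder function.
HPARAMS_REGISTRY = {}


def register_hparams(fn):
    """Register a function that builds a hyperparameter set under its own name."""
    HPARAMS_REGISTRY[fn.__name__] = fn
    return fn


@register_hparams
def transformer_base_sketch():
    # Illustrative values, loosely following the base Transformer.
    return {"num_hidden_layers": 6, "hidden_size": 512, "filter_size": 2048, "num_heads": 8}


@register_hparams
def transformer_big_sketch():
    # Start from the base set (a fresh dict each call) and override a few values.
    hparams = transformer_base_sketch()
    hparams.update(hidden_size=1024, filter_size=4096, num_heads=16)
    return hparams


def hparams_for(name):
    """Look up a registered set by name, the way an --hparams_set flag selects one."""
    return HPARAMS_REGISTRY[name]()


print(hparams_for("transformer_big_sketch"))
```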

Here are a few common examples.

Text summarization

Text summarization is the task of extracting the key information from a long text.

Here’s what you can do:

pip install tensor2tensor && t2t-trainer \
  --problem=summarize_cnn_dailymail32k \
  --model=transformer \
  --hparams_set=transformer_big \
  --generate_data \
  --data_dir=$DATA_DIR \
  --output_dir=$TRAIN_DIR \
  --train_steps=$TRAIN_STEPS \
  --eval_steps=$EVAL_STEPS

With just a few lines like this, you'll have a pretty good text summarization model at the end of training!

Image classification

All you need to do is run a command like this:

pip install tensor2tensor && t2t-trainer \
  --problem=image_cifar10 \
  --model=shake_shake \
  --hparams_set=shake_shake_big \
  --generate_data \
  --data_dir=$DATA_DIR \
  --output_dir=$TRAIN_DIR \
  --train_steps=$TRAIN_STEPS \
  --eval_steps=$EVAL_STEPS

The model and hyperparameter set selected here were the state of the art just a year ago!

Translation

To implement an EN-DE (English-German) translation model, all you need is:

pip install tensor2tensor && t2t-trainer \
  --problem=translate_ende_wmt32k \
  --model=transformer \
  --hparams_set=transformer_big \
  --generate_data \
  --data_dir=$DATA_DIR \
  --output_dir=$TRAIN_DIR \
  --train_steps=$TRAIN_STEPS \
  --eval_steps=$EVAL_STEPS

Achieved results:

  • >29 BLEU, the current best result!
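BLEU, reported here on a 0-100 scale, multiplies the geometric mean of the modified n-gram precisions (n = 1 to 4) by a brevity penalty that punishes translations shorter than the reference. The following is a minimal sentence-level sketch with a single reference and no smoothing. Published scores like the one above are usually computed at corpus level over a whole tokenized test set, so this function illustrates how the metric is built rather than reproducing that number. The function names are illustrative.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Unsmoothed BLEU of one tokenized hypothesis against one reference, 0-100 scale."""
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hypothesis, n)
        ref_counts = ngram_counts(reference, n)
        # Modified precision: a hypothesis n-gram is credited at most as often as it occurs in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
        total = sum(hyp_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision makes BLEU zero
        log_precision_sum += math.log(overlap / total) / max_n  # uniform weights 1/max_n
    # Brevity penalty: only hypotheses shorter than the reference are penalized.
    hyp_len, ref_len = len(hypothesis), len(reference)
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision_sum)


reference = "the cat sat on the mat".split()
print(round(sentence_bleu("the cat sat on".split(), reference), 2))  # → 60.65
```

In the usage line every n-gram of the short hypothesis matches, so the whole loss comes from the brevity penalty exp(1 - 6/4).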

Speech recognition

If you want to implement a speech recognition model, all you need is a single command:

pip install tensor2tensor && t2t-trainer \
  --problem=librispeech \
  --model=transformer \
  --hparams_set=transformer_librispeech \
  --generate_data \
  --data_dir=$DATA_DIR \
  --output_dir=$TRAIN_DIR \
  --train_steps=$TRAIN_STEPS \
  --eval_steps=$EVAL_STEPS

Achieved results:

  • <7.5 WER, close to the best result!
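WER (word error rate) is the minimum number of word substitutions, deletions, and insertions needed to turn the recognized transcript into the reference, divided by the number of reference words. Assuming the figure above is a percentage, as WER usually is, <7.5 means fewer than 7.5 errors per 100 reference words. A minimal sketch using the standard Levenshtein dynamic program over whitespace-separated words (the function name is illustrative):

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between two transcripts, as a percentage of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))  # empty reference: j insertions
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)  # empty hypothesis: i deletions
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # delete a reference word
                         cur[j - 1] + 1,      # insert a hypothesis word
                         prev[j - 1] + cost)  # match or substitute
        prev = cur
    return 100.0 * prev[-1] / len(ref)


print(word_error_rate("a b c d", "a x c d"))  # → 25.0
```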

Image generation

pip install tensor2tensor && t2t-trainer \
  --problem=image_cifar10_plain_gen_rev \
  --model=imagetransformer \
  --hparams_set=imagetransformer_cifar10_base \
  --generate_data \
  --data_dir=$DATA_DIR \
  --output_dir=$TRAIN_DIR \
  --train_steps=$TRAIN_STEPS \
  --eval_steps=$EVAL_STEPS

Achieved results:

  • ~2.92 bits/dim, the current best
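Bits per dimension divides an image model's negative log-likelihood by the number of dimensions (pixels x color channels) and converts nats to bits, so lower is better. A minimal sketch, assuming the negative log-likelihood is given in nats per image; the 32x32 RGB image size and the NLL value are just examples:

```python
import math


def bits_per_dim(nll_nats, num_dims):
    """Convert a per-image negative log-likelihood in nats to bits per dimension."""
    return nll_nats / (num_dims * math.log(2))


# A 32x32 RGB image has 32 * 32 * 3 = 3072 dimensions (sub-pixel values).
dims = 32 * 32 * 3
# Hypothetical NLL: the number of nats that corresponds to 2.92 bits/dim.
nll = 2.92 * dims * math.log(2)
print(round(bits_per_dim(nll, dims), 2))  # → 2.92
```

Put the other way round, 2.92 bits/dim means the model needs roughly 2.92 * 3072, or about 8,970 bits, to encode one such image.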

Scale

With a lot of data, training on an ordinary laptop is impractical; we need to train at scale, for example on clusters of GPU machines or even in the cloud. Tensor2Tensor supports this kind of scaled-up training very well.

In a multi-GPU environment, all you need is:

t2t-trainer \
  --worker_gpu=8 \
  --problem=translate_ende_wmt32k \
  --model=transformer \
  --hparams_set=transformer_big \
  --generate_data \
  --data_dir=$DATA_DIR \
  --output_dir=$TRAIN_DIR \
  --train_steps=$TRAIN_STEPS \
  --eval_steps=$EVAL_STEPS

By adding just one flag, --worker_gpu=8, you can train your model in parallel on 8 GPUs!
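A common way to use several GPUs is synchronous data parallelism: each device gets an equal slice of the batch, computes gradients on that slice, and the gradients are averaged before one shared update. The pure-Python sketch below uses a hypothetical one-parameter least-squares model to show why averaging the gradients of equal-sized shards gives exactly the full-batch gradient. The model, data, and function names are made up for illustration, and this is not a description of Tensor2Tensor's internal sharding code.

```python
def grad_mse(w, xs, ys):
    """Gradient with respect to w of the mean squared error for the model y ~ w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def data_parallel_grad(w, xs, ys, num_devices):
    """Split the batch into equal shards, compute one gradient per shard, and average them."""
    shard = len(xs) // num_devices
    if shard * num_devices != len(xs):
        raise ValueError("batch size must divide evenly across devices")
    grads = [
        grad_mse(w, xs[i * shard:(i + 1) * shard], ys[i * shard:(i + 1) * shard])
        for i in range(num_devices)  # on real hardware these run concurrently
    ]
    return sum(grads) / num_devices


xs = [float(i) for i in range(16)]
ys = [3.0 * x + 1.0 for x in xs]
w = 0.5
full = grad_mse(w, xs, ys)
parallel = data_parallel_grad(w, xs, ys, num_devices=8)
print(abs(full - parallel) < 1e-9)  # → True
```

Since every device holds the same parameters and applies the same averaged gradient, the result matches single-device training on the full batch, while each device only processes one eighth of it.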

In the Cloud TPU environment, all you need is:

t2t-trainer \
  --use_tpu --cloud_tpu_name=$TPU_NAME \
  --problem=translate_ende_wmt32k \
  --model=transformer \
  --hparams_set=transformer_big \
  --generate_data \
  --data_dir=$DATA_DIR \
  --output_dir=$TRAIN_DIR \
  --train_steps=$TRAIN_STEPS \
  --eval_steps=$EVAL_STEPS

On Cloud ML Engine with hyperparameter tuning, all you need is:

t2t-trainer \
  --cloud_mlengine --worker_gpu=8 \
  --autotune --autotune_maximize \
  --autotune_objective='metrics/neg_log_perplexity' \
  --autotune_max_trials=100 \
  --autotune_parallel_trials=20 \
  --hparams_range=transformer_base_range \
  --problem=translate_ende_wmt32k \
  --model=transformer \
  --hparams_set=transformer_big \
  --generate_data \
  --data_dir=$DATA_DIR \
  --output_dir=$TRAIN_DIR \
  --train_steps=$TRAIN_STEPS \
  --eval_steps=$EVAL_STEPS
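Conceptually, a tuning run samples hyperparameters from a declared range, trains one model per trial, and keeps the trial with the best objective value (here, maximizing negative log perplexity). The sketch below uses plain random search, launched in batches of parallel trials, to illustrate what the max-trials, parallel-trials, and maximize settings control. Cloud ML Engine's tuning service uses its own search strategy, and the search space, the toy objective, and all names here are hypothetical stand-ins for real training runs.

```python
import math
import random

# Hypothetical search space, standing in for a named range such as transformer_base_range.
SEARCH_SPACE = {
    "learning_rate": (0.01, 0.5),  # continuous, sampled log-uniformly
    "num_heads": [4, 8, 16],       # discrete choices
}


def sample_hparams(rng):
    low, high = SEARCH_SPACE["learning_rate"]
    lr = math.exp(rng.uniform(math.log(low), math.log(high)))
    return {"learning_rate": lr, "num_heads": rng.choice(SEARCH_SPACE["num_heads"])}


def toy_objective(hparams):
    """Stand-in for 'train a model, then report neg_log_perplexity' (higher is better)."""
    lr_term = (math.log(hparams["learning_rate"]) - math.log(0.1)) ** 2
    return -lr_term - abs(hparams["num_heads"] - 8) / 8


def random_search(max_trials, parallel_trials, seed=0):
    rng = random.Random(seed)
    trials = []
    while len(trials) < max_trials:
        # A real service would run up to parallel_trials of these at the same time.
        batch = min(parallel_trials, max_trials - len(trials))
        for _ in range(batch):
            hparams = sample_hparams(rng)
            trials.append((toy_objective(hparams), hparams))
    best_score, best_hparams = max(trials, key=lambda trial: trial[0])  # maximize
    return best_score, best_hparams, len(trials)


score, best, n = random_search(max_trials=100, parallel_trials=20)
print(n, round(score, 3), best)
```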

Want more control?

Tensor2Tensor has a lot of very handy tools, but what if I want finer-grained control?

Datasets

The first thing many people want to control is the dataset. For example, many people don't want to use an entire Tensor2Tensor dataset, only part of it. What should they do?

First, create the corresponding problem, specify a data directory (data_dir), and generate the data.

Once the data has been generated, you can do whatever you want with it, which gives you fine-grained control over the dataset.

Implement the model with Keras

Others want to implement models with Keras layers.

Tensor2Tensor already implements many models. If you want to build a better model on top of one of them, you can do something like this:

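from tensor2tensor.models import bytenet
from tensor2tensor.utils import registry

# Select the hyperparameter set (registry.hparams returns the function that builds it)
hparams = registry.hparams('bytenet_base')()
# Instantiate the model in training mode
model = bytenet.ByteNet(hparams, mode='train')

# Call the model; embedded_inputs and embedded_targets stand for your own embedded tensors
features = {'inputs': embedded_inputs, 'targets': embedded_targets}
outputs, _ = model(features)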


You get the hyperparameters, you build the model, and then you get the output by calling it.

The connection between what the speaker said and this section's title seems a little tenuous; the example above has no direct relationship with Keras.

Implement your own data sets and models

To implement your own data set and model, you can do this:

  • Inherit from Problem or one of its subclasses to create a custom dataset
  • Inherit from T2TModel to implement your own model
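Tensor2Tensor finds custom classes through its registry. Registering a Problem or T2TModel subclass makes it selectable by name from the --problem and --model flags, with the default name derived from the class name in snake_case; custom modules are typically loaded with the --t2t_usr_dir flag. The plain-Python sketch below mimics only that lookup mechanism. The base classes, registries, decorators, and example classes are hypothetical stand-ins, and the method names generate_samples and body echo ones Tensor2Tensor subclasses commonly override but with simplified signatures, so consult the Tensor2Tensor source for the real `registry.register_problem` and `registry.register_model` decorators and required methods.

```python
import re

# Hypothetical stand-ins for Tensor2Tensor's problem and model registries.
PROBLEMS, MODELS = {}, {}


def camelcase_to_snakecase(name):
    """'MyTinyModel' -> 'my_tiny_model' (insert '_' before each inner capital letter)."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def make_register(table):
    """Return a class decorator that stores classes in `table` under their snake_case name."""
    def register(cls):
        table[camelcase_to_snakecase(cls.__name__)] = cls
        return cls
    return register


register_problem = make_register(PROBLEMS)
register_model = make_register(MODELS)


class ProblemSketch:
    """Hypothetical base class: a problem yields training examples."""
    def generate_samples(self):
        raise NotImplementedError


class ModelSketch:
    """Hypothetical base class: a model maps a features dict to an output."""
    def __init__(self, hparams):
        self.hparams = hparams

    def body(self, features):
        raise NotImplementedError


@register_problem
class ReverseWords(ProblemSketch):
    def generate_samples(self):
        for text in ["hello world", "organizing models and data"]:
            yield {"inputs": text, "targets": " ".join(reversed(text.split()))}


@register_model
class EchoModel(ModelSketch):
    def body(self, features):
        return features["inputs"]  # trivial "model" that copies its input


# What flags such as --problem=reverse_words --model=echo_model would resolve to.
problem = PROBLEMS["reverse_words"]()
model = MODELS["echo_model"](hparams={})
for sample in problem.generate_samples():
    print(sample["targets"], "|", model.body(sample))
```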

Conclusion

For now, Tensor2Tensor contains the following:

  • Datasets
  • Models
  • Scripts

Next, we will mainly focus on improving the following areas:

Thank you! Read more tech tips from Google Developer Conference 2018