TensorFlow Serving lets you quickly deploy a TensorFlow model and serve it over gRPC or REST APIs.

The official documentation recommends deploying with Docker, and it also provides a complete tutorial covering training through deployment: Servers: TFX for TensorFlow Serving. This article is simply an exercise that follows that tutorial, to help you understand the TensorFlow workflow from training to deployment.

Prepare the environment

To prepare the TensorFlow environment, import the dependencies:

import sys

# Confirm that we're using Python 3
assert sys.version_info.major == 3, 'Oops, not running Python 3. Use Runtime > Change runtime type'
import tensorflow as tf
from tensorflow import keras

# Helper libraries
import numpy as np
import matplotlib.pyplot as plt
import os
import subprocess

print(f'TensorFlow version: {tf.__version__}')
print(f'TensorFlow GPU support: {tf.test.is_built_with_gpu_support()}')

physical_gpus = tf.config.list_physical_devices('GPU')
print(physical_gpus)
for gpu in physical_gpus:
  # memory growth must be set before GPUs have been initialized
  tf.config.experimental.set_memory_growth(gpu, True)
logical_gpus = tf.config.experimental.list_logical_devices('GPU')
print(len(physical_gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
TensorFlow version: 2.4.1
TensorFlow GPU support: True
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
1 Physical GPUs, 1 Logical GPUs

Create the model

Load the Fashion MNIST dataset:

fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

# scale the values to 0.0 to 1.0
train_images = train_images / 255.0
test_images = test_images / 255.0

# reshape for feeding into the model
train_images = train_images.reshape(train_images.shape[0], 28, 28, 1)
test_images = test_images.reshape(test_images.shape[0], 28, 28, 1)

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

print('\ntrain_images.shape: {}, of {}'.format(train_images.shape, train_images.dtype))
print('test_images.shape: {}, of {}'.format(test_images.shape, test_images.dtype))
train_images.shape: (60000, 28, 28, 1), of float64
test_images.shape: (10000, 28, 28, 1), of float64

Train the model using a very simple CNN:

model = keras.Sequential([
  keras.layers.Conv2D(input_shape=(28,28,1), filters=8, kernel_size=3,
                      strides=2, activation='relu', name='Conv1'),
  keras.layers.Flatten(),
  keras.layers.Dense(10, name='Dense')
])
model.summary()

testing = False
epochs = 5

# The final Dense layer has no activation, so the model outputs logits
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=[keras.metrics.SparseCategoricalAccuracy()])
model.fit(train_images, train_labels, epochs=epochs)

test_loss, test_acc = model.evaluate(test_images, test_labels)
print('\nTest accuracy: {}'.format(test_acc))
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
Conv1 (Conv2D)               (None, 13, 13, 8)         80
_________________________________________________________________
flatten (Flatten)            (None, 1352)              0
_________________________________________________________________
Dense (Dense)                (None, 10)                13530
=================================================================
Total params: 13610
Trainable params: 13610
Non-trainable params: 0
_________________________________________________________________
Epoch 1/5
1875/1875 [==============================] - 3s 722us/step - loss: 0.7387 - sparse_categorical_accuracy: 0.7449
Epoch 2/5
1875/1875 [==============================] - 1s 793us/step - loss: 0.4561 - sparse_categorical_accuracy: 0.8408
Epoch 3/5
1875/1875 [==============================] - 1s 720us/step - loss: 0.4097 - sparse_categorical_accuracy: 0.8566
Epoch 4/5
1875/1875 [==============================] - 1s 718us/step - loss: 0.3899 - sparse_categorical_accuracy: 0.8636
Epoch 5/5
1875/1875 [==============================] - 1s 719us/step - loss: 0.3673 - sparse_categorical_accuracy: 0.8701
313/313 [==============================] - 0s 782us/step - loss: 0.3937 - sparse_categorical_accuracy: 0.8630

Test accuracy: 0.8629999756813049
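
The output shapes and parameter counts in the summary can be checked by hand. The sketch below is only an illustration of that arithmetic. The helper names are made up for this example, and it assumes the Conv2D layer uses 'valid' padding, which is the Keras default when no padding argument is given.

```python
def conv2d_output_size(input_size, kernel_size, stride):
    """Spatial output size of a convolution with 'valid' padding (assumed here)."""
    return (input_size - kernel_size) // stride + 1


def conv2d_params(kernel_size, in_channels, filters):
    """Weights (k * k * in_channels per filter) plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * filters + filters


def dense_params(inputs, units):
    """One weight per input/unit pair plus one bias per unit."""
    return inputs * units + units


side = conv2d_output_size(28, kernel_size=3, stride=2)  # Conv1 output height/width
flat = side * side * 8                                   # size after Flatten
conv_p = conv2d_params(kernel_size=3, in_channels=1, filters=8)
dense_p = dense_params(flat, units=10)
total = conv_p + dense_p

print(side, flat, conv_p, dense_p, total)
```

The values match the summary: a 13 x 13 x 8 feature map, 1352 flattened features, 80 and 13530 parameters per layer, and 13610 in total.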

Save the model

Save the model in the SavedModel format, with a version number in the path, so that TensorFlow Serving can select which model version to serve.

# Fetch the Keras session and save the model
# The signature definition is defined by the input and output tensors,
# and stored with the default serving key
import tempfile

MODEL_DIR = os.path.join(tempfile.gettempdir(), 'tfx')
version = 1
export_path = os.path.join(MODEL_DIR, str(version))
print('export_path = {}\n'.format(export_path))

tf.keras.models.save_model(
    model,
    export_path,
    overwrite=True,
    include_optimizer=True,
    save_format=None,
    signatures=None,
    options=None
)

print('\nSaved model:')
!ls -l {export_path}
export_path = /tmp/tfx/1

INFO:tensorflow:Assets written to: /tmp/tfx/1/assets

Saved model:
total 88
drwxr-xr-x 2 john john  4096 Apr 13 15:10 assets
-rw-rw-r-- 1 john john 78169 Apr 13 15:12 saved_model.pb
drwxr-xr-x 2 john john  4096 Apr 13 15:12 variables
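
TensorFlow Serving treats each numeric subdirectory of the model base path as a model version and, by default, serves the highest one. When exporting again, a new version number is therefore needed. The following is a minimal sketch of one way to pick it. The helper next_version is hypothetical (it is not part of TensorFlow), and the demo uses a throwaway temporary directory rather than /tmp/tfx.

```python
import os
import tempfile


def next_version(model_base_path):
    """Return one more than the highest numeric subdirectory, or 1 if none exist."""
    if not os.path.isdir(model_base_path):
        return 1
    versions = [
        int(name)
        for name in os.listdir(model_base_path)
        if name.isdigit() and os.path.isdir(os.path.join(model_base_path, name))
    ]
    return max(versions, default=0) + 1


# Demo on a temporary directory that mimics a model base path
with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, "1"))
    os.makedirs(os.path.join(base, "2"))
    os.makedirs(os.path.join(base, "notes"))  # non-numeric names are ignored
    demo_next = next_version(base)

print(demo_next)
```

In the export code above, the result could replace the hard-coded version = 1, for example version = next_version(MODEL_DIR).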

Inspect the model

Use the saved_model_cli tool to inspect the model's MetaGraphDefs (the models) and SignatureDefs (the methods you can call).

! saved_model_cli show --dir '/tmp/tfx/1' --all
2021-04-13 15:12:29.433576: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['__saved_model_init_op']:
  The given SavedModel SignatureDef contains the following input(s):
  The given SavedModel SignatureDef contains the following output(s):
    outputs['__saved_model_init_op'] tensor_info:
        dtype: DT_INVALID
        shape: unknown_rank
        name: NoOp
  Method name is:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['Conv1_input'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 28, 28, 1)
        name: serving_default_Conv1_input:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['Dense'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 10)
        name: StatefulPartitionedCall:0
  Method name is: tensorflow/serving/predict

Defined Functions:
  Function Name: '__call__'
    Option #1
      Callable with:
        Argument #1
          Conv1_input: TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name='Conv1_input')
        Argument #2
          DType: bool
          Value: False
        Argument #3
          DType: NoneType
          Value: None
    Option #2
      Callable with:
        Argument #1
          inputs: TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name='inputs')
        Argument #2
          DType: bool
          Value: False
        Argument #3
          DType: NoneType
          Value: None
    Option #3
      Callable with:
        Argument #1
          inputs: TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name='inputs')
        Argument #2
          DType: bool
          Value: True
        Argument #3
          DType: NoneType
          Value: None
    Option #4
      Callable with:
        Argument #1
          Conv1_input: TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name='Conv1_input')
        Argument #2
          DType: bool
          Value: True
        Argument #3
          DType: NoneType
          Value: None
...
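
The serving_default signature reports an input shape of (-1, 28, 28, 1), where -1 stands for a batch dimension of any size. As a rough illustration, a request batch can be checked against that shape before it is sent. The helpers nested_shape and matches_signature below are hypothetical names written for this example, and the check only looks at the first element of each nesting level.

```python
def nested_shape(value):
    """Shape of a nested list, following the first element at each level."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return tuple(shape)


def matches_signature(shape, signature_shape):
    """True if shape fits signature_shape, where -1 matches any size."""
    return len(shape) == len(signature_shape) and all(
        expected == -1 or expected == actual
        for actual, expected in zip(shape, signature_shape)
    )


SIGNATURE_SHAPE = (-1, 28, 28, 1)  # taken from the saved_model_cli output

# A batch of three all-zero images with a trailing channel dimension
good_batch = [[[[0.0] for _ in range(28)] for _ in range(28)] for _ in range(3)]
# The same batch without the channel dimension
bad_batch = [[[0.0 for _ in range(28)] for _ in range(28)] for _ in range(3)]

good_ok = matches_signature(nested_shape(good_batch), SIGNATURE_SHAPE)
bad_ok = matches_signature(nested_shape(bad_batch), SIGNATURE_SHAPE)
print(nested_shape(good_batch), good_ok)
print(nested_shape(bad_batch), bad_ok)
```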

Deploy the model

Install Serving

echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | sudo tee /etc/apt/sources.list.d/tensorflow-serving.list && \
curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | sudo apt-key add -

sudo apt update
sudo apt install tensorflow-model-server

Start Serving

Start TensorFlow Serving, which exposes the REST API:

  • rest_api_port: the port for REST requests.
  • model_name: a custom model name, used in the REST request URL.
  • model_base_path: the directory that contains the model's version subdirectories (here /tmp/tfx).
nohup tensorflow_model_server \
  --rest_api_port=8501 \
  --model_name=fashion_model \
  --model_base_path="/tmp/tfx" >server.log 2>&1 &
$ tail server.log
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-04-13 15:12:10.706648: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:206] Restoring SavedModel bundle.
2021-04-13 15:12:10.726722: I external/org_tensorflow/tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2599990000 Hz
2021-04-13 15:12:10.756506: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:190] Running initialization op on SavedModel bundle at path: /tmp/tfx/1
2021-04-13 15:12:10.759935: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:277] SavedModel load for tags { serve }; Status: success: OK. Took 110653 microseconds.
2021-04-13 15:12:10.760277: I tensorflow_serving/servables/tensorflow/saved_model_warmup_util.cc:59] No warmup data file found at /tmp/tfx/1/assets.extra/tf_serving_warmup_requests
2021-04-13 15:12:10.760486: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: fashion_model version: 1}
2021-04-13 15:12:10.763938: I tensorflow_serving/model_servers/server.cc:371] Running gRPC ModelServer at 0.0.0.0:8500 ...
[evhttp_server.cc : 238] NET_LOG: Entering the event loop ...
2021-04-13 15:12:10.765308: I tensorflow_serving/model_servers/server.cc:391] Exporting HTTP/REST API at:localhost:8501 ...
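
Besides reading the log, the REST API offers a model status endpoint (GET /v1/models/fashion_model) that reports the state of each loaded version. The sketch below shows one way to query and interpret it. The function names are hypothetical, the sample response is hand-written to mirror the documented response layout rather than captured from this server, and fetch_status is defined but not called because it needs the server to be running.

```python
import json
import urllib.request

STATUS_URL = "http://localhost:8501/v1/models/fashion_model"


def available_versions(status):
    """Return the sorted version numbers whose state is AVAILABLE."""
    return sorted(
        int(entry["version"])
        for entry in status.get("model_version_status", [])
        if entry.get("state") == "AVAILABLE"
    )


def fetch_status(url=STATUS_URL, timeout=5):
    """Query the model status endpoint (requires a running server)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hand-written sample in the shape of a status response
sample_status = {
    "model_version_status": [
        {
            "version": "1",
            "state": "AVAILABLE",
            "status": {"error_code": "OK", "error_message": ""},
        }
    ]
}

print(available_versions(sample_status))
```

With the server running, available_versions(fetch_status()) would be expected to include version 1.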

Access the service

Display a random test image:

def show(idx, title):
  plt.figure()
  plt.imshow(test_images[idx].reshape(28,28))
  plt.axis('off')
  plt.title('\n\n{}'.format(title), fontdict={'size': 16})

import random
rando = random.randint(0,len(test_images)-1)
show(rando, 'An Example Image: {}'.format(class_names[test_labels[rando]]))

Create a JSON object containing three images to predict:

import json
data = json.dumps({"signature_name": "serving_default", "instances": test_images[0:3].tolist()})
print('Data: {} ... {}'.format(data[:50], data[len(data)-52:]))
Data: {"signature_name": "serving_default", "instances": ...  [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0]]]]}
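
The request body uses the row format of the predict API: "instances" is a list with one entry per example, and each entry is the nested list for a single image. To make that nesting easier to see, here is a small sketch with a made-up 2 x 2 single-channel image instead of a real 28 x 28 one. The helper build_predict_request is a hypothetical name for this example.

```python
import json


def build_predict_request(images, signature_name="serving_default"):
    """Serialize a list of images (nested lists) into a row-format request body."""
    return json.dumps({"signature_name": signature_name, "instances": images})


# One tiny image: 2 rows x 2 columns x 1 channel (real inputs are 28 x 28 x 1)
tiny_image = [[[0.0], [0.5]],
              [[1.0], [0.25]]]

body = build_predict_request([tiny_image])
decoded = json.loads(body)

print(body)
```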

REST requests

Predict with the latest version of the model:

!pip install -q requests

import requests

headers = {"content-type": "application/json"}
json_response = requests.post('http://localhost:8501/v1/models/fashion_model:predict', data=data, headers=headers)
predictions = json.loads(json_response.text)['predictions']

show(0, 'The model thought this was a {} (class {}), and it was actually a {} (class {})'.format(
  class_names[np.argmax(predictions[0])], np.argmax(predictions[0]), class_names[test_labels[0]], test_labels[0]))
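
Because the model was compiled with from_logits=True and its final Dense layer has no activation, each row of predictions holds raw logits rather than probabilities. Taking the argmax is enough to pick a class, but a softmax is needed to read the values as confidences. The following is a minimal pure-Python sketch. The logits are made-up numbers for illustration, not output from the trained model.

```python
import math

CLASS_NAMES = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']


def softmax(logits):
    """Convert logits to probabilities, subtracting the max for numerical stability."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def decode(logits, class_names=CLASS_NAMES):
    """Return the most likely class name and its softmax probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


# Illustrative logits only (hypothetical values)
example_logits = [-2.0, -5.0, 0.5, -1.0, 0.0, -6.0, 1.0, -7.0, -3.0, 6.0]
label, confidence = decode(example_logits)
print(label, round(confidence, 3))
```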

Specify a model version for prediction:

headers = {"content-type": "application/json"}
json_response = requests.post('http://localhost:8501/v1/models/fashion_model/versions/1:predict', data=data, headers=headers)
predictions = json.loads(json_response.text)['predictions']

for i in range(0,3):
  show(i, 'The model thought this was a {} (class {}), and it was actually a {} (class {})'.format(
    class_names[np.argmax(predictions[i])], np.argmax(predictions[i]), class_names[test_labels[i]], test_labels[i]))
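
The two requests differ only in the URL: omitting the version segment targets the latest version, while /versions/N pins a specific one. A small helper can make that explicit. This is just a sketch, and predict_url is a hypothetical function name, not part of TensorFlow Serving or requests.

```python
def predict_url(model_name, version=None, host="localhost", port=8501):
    """Build the REST predict URL, optionally pinned to a model version."""
    url = f"http://{host}:{port}/v1/models/{model_name}"
    if version is not None:
        url += f"/versions/{version}"
    return url + ":predict"


latest_url = predict_url("fashion_model")
pinned_url = predict_url("fashion_model", version=1)
print(latest_url)
print(pinned_url)
```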
