This is the sixth day of my November challenge.

Today we demonstrate how to use Keras, Redis, Flask, and Apache to serve a deep learning model in a production environment.

The project structure

keras-complete-rest-api
├── helpers.py
├── jemma.png
├── keras_rest_api_app.wsgi
├── run_model_server.py
├── run_web_server.py
├── settings.py
├── simple_request.py
└── stress_test.py

File descriptions:

  • run_web_server.py contains all of our Flask web server code — Apache will load it when we start our deep learning web application.
  • run_model_server.py will:
    • Load our Keras model from disk
    • Constantly poll Redis for new images to classify
    • Classify images (batching them for efficiency)
    • Write the inference results back to Redis so that they can be returned to the client via Flask
  • settings.py contains all the Python-based settings for our deep learning production service, such as the Redis host/port information, image classification settings, the image queue name, etc.
  • helpers.py contains utility functions (namely, base64 encoding and decoding) that both run_web_server.py and run_model_server.py will use.
  • keras_rest_api_app.wsgi contains our WSGI settings so we can serve the Flask application from our Apache server.
  • simple_request.py can be used to programmatically consume the results of our deep learning API service (see the sketch after this list).
  • jemma.png is a picture of my beagle. We'll use her as a sample image when calling the REST API to verify that it actually works.
  • Finally, we will use stress_test.py to stress our server and measure image classification throughput.
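As a rough sketch of what simple_request.py does — the actual script isn't reproduced in this post, so treat the variable names below as assumptions — it submits one image and prints the predictions. The payload and response formats match the /predict endpoint defined later in this post:

# hypothetical sketch of simple_request.py
import requests

KERAS_REST_API_URL = "http://localhost/predict"
IMAGE_PATH = "jemma.png"

# load the input image and construct the payload for the request
image = open(IMAGE_PATH, "rb").read()
payload = {"image": image}

# submit the request and parse the JSON response
r = requests.post(KERAS_REST_API_URL, files=payload).json()

# print the predictions if the request succeeded
if r["success"]:
	for (i, result) in enumerate(r["predictions"]):
		print("{}. {}: {:.4f}".format(i + 1, result["label"],
			result["probability"]))
else:
	print("Request failed")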

Our Flask server exposes a /predict endpoint. The handler, located in run_web_server.py, classifies the input image on demand. Image preprocessing is also handled in run_web_server.py.

To get our server ready for production, I pulled the classification process function from last week's single script and placed it in run_model_server.py. This script is important because it loads our Keras model and grabs images from the image queue in Redis for classification. The results are written back to Redis (the /predict endpoint and corresponding function in run_web_server.py monitor Redis so the results can be sent back to the client).

But what good is a deep learning REST API server unless we know its capabilities and limitations?

In stress_test.py, we test our server by starting 500 concurrent threads that send images to the server for parallel classification. I recommend running it against localhost on the server first, and then running it from a remote client.

Building our deep learning web application

Figure 1: Data flow diagram of a deep learning REST API server built using Python, Keras, Redis, and Flask.

Almost every line of code used in this project comes from our previous article on building extensible deep learning REST APIs — the only change is that we moved some of the code into separate files to facilitate extensibility in production environments.

Setup and Configuration

# initialize Redis connection settings
REDIS_HOST = "localhost"
REDIS_PORT = 6379
REDIS_DB = 0
# initialize constants used to control image spatial dimensions and
# data type
IMAGE_WIDTH = 224
IMAGE_HEIGHT = 224
IMAGE_CHANS = 3
IMAGE_DTYPE = "float32"
# initialize constants used for server queuing
IMAGE_QUEUE = "image_queue"
BATCH_SIZE = 32
SERVER_SLEEP = 0.25
CLIENT_SLEEP = 0.25

In settings.py, you will be able to change server connection, image size + data type, and server queue parameters.

# import the necessary packages
import numpy as np
import base64
import sys
def base64_encode_image(a):
	# base64 encode the input NumPy array
	return base64.b64encode(a).decode("utf-8")
def base64_decode_image(a, dtype, shape):
	# if this is Python 3, we need the extra step of encoding the
	# serialized NumPy string as a byte object
	if sys.version_info.major == 3:
		a = bytes(a, encoding="utf-8")
	# convert the string to a NumPy array using the supplied data
	# type and target shape
	a = np.frombuffer(base64.b64decode(a), dtype=dtype)
	a = a.reshape(shape)
	# return the decoded image
	return a

helpers.py contains two functions — one for base64 encoding and the other for decoding.

Encoding is necessary so that we can serialize + store our images in Redis. Again, decoding is necessary so that we can deserialize the image into NumPy array format prior to preprocessing.
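A quick round-trip sanity check of these two helpers (a sketch; run it from the project directory so helpers is importable):

# encode a random image-shaped array and decode it back; the result
# should be bit-for-bit identical to the original
import numpy as np
import helpers

original = np.random.rand(1, 224, 224, 3).astype("float32")
encoded = helpers.base64_encode_image(original.copy(order="C"))
decoded = helpers.base64_decode_image(encoded, "float32",
	(1, 224, 224, 3))
assert np.array_equal(original, decoded)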

Deep learning web server

# import the necessary packages
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.applications.resnet50 import preprocess_input
from PIL import Image
import numpy as np
import settings
import helpers
import flask
import redis
import uuid
import time
import json
import io
# initialize our Flask application and Redis server
app = flask.Flask(__name__)
db = redis.StrictRedis(host=settings.REDIS_HOST,
	port=settings.REDIS_PORT, db=settings.REDIS_DB)
def prepare_image(image, target):
	# if the image mode is not RGB, convert it
	if image.mode != "RGB":
		image = image.convert("RGB")
	# resize the input image and preprocess it
	image = image.resize(target)
	image = img_to_array(image)
	image = np.expand_dims(image, axis=0)
	image = preprocess_input(image)
	# return the processed image
	return image
@app.route("/")
def homepage():
	return "Welcome to the PyImageSearch Keras REST API!"
@app.route("/predict", methods=["POST"])
def predict():
	# initialize the data dictionary that will be returned from the
	# view
	data = {"success": False}
	# ensure an image was properly uploaded to our endpoint
	if flask.request.method == "POST":
		if flask.request.files.get("image"):
			# read the image in PIL format and prepare it for
			# classification
			image = flask.request.files["image"].read()
			image = Image.open(io.BytesIO(image))
			image = prepare_image(image,
				(settings.IMAGE_WIDTH, settings.IMAGE_HEIGHT))
			# ensure our NumPy array is C-contiguous as well,
			# otherwise we won't be able to serialize it
			image = image.copy(order="C")
			# generate an ID for the classification then add the
			# classification ID + image to the queue
			k = str(uuid.uuid4())
			image = helpers.base64_encode_image(image)
			d = {"id": k, "image": image}
			db.rpush(settings.IMAGE_QUEUE, json.dumps(d))
			# keep looping until our model server returns the output
			# predictions
			while True:
				# attempt to grab the output predictions
				output = db.get(k)
				# check to see if our model has classified the input
				# image
				if output is not None:
					# add the output predictions to our data
					# dictionary so we can return it to the client
					output = output.decode("utf-8")
					data["predictions"] = json.loads(output)
					# delete the result from the database and break
					# from the polling loop
					db.delete(k)
					break
				# sleep for a small amount to give the model a chance
				# to classify the input image
				time.sleep(settings.CLIENT_SLEEP)
			# indicate that the request was a success
			data["success"] = True
	# return the data dictionary as a JSON response
	return flask.jsonify(data)
# for debugging purposes, it's helpful to start the Flask testing
# server (don't use this for production)
if __name__ == "__main__":
	print("* Starting web service...")
	app.run()

In run_web_server.py you'll see predict, the function associated with our REST API's /predict endpoint.

The predict function pushes the encoded image onto the Redis queue and then polls in a loop until it retrieves the prediction data from the model server. We then JSON-encode the data and instruct Flask to send it back to the client.
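While a request is in flight, each queue entry sits in Redis as a JSON document containing the request's UUID and the base64-encoded image. A small debugging sketch for peeking at the queue (assuming the default values from settings.py):

# inspect pending entries in the image queue
import json
import redis

db = redis.StrictRedis(host="localhost", port=6379, db=0)
for raw in db.lrange("image_queue", 0, -1):
	entry = json.loads(raw.decode("utf-8"))
	# each entry carries a UUID "id" and a base64-encoded "image"
	print(entry["id"], len(entry["image"]), "base64 characters")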

Deep learning model server

# import the necessary packages
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import decode_predictions
import numpy as np
import settings
import helpers
import redis
import time
import json
# connect to Redis server
db = redis.StrictRedis(host=settings.REDIS_HOST,
	port=settings.REDIS_PORT, db=settings.REDIS_DB)
def classify_process():
	# load the pre-trained Keras model (here we are using a model
	# pre-trained on ImageNet and provided by Keras, but you can
	# substitute in your own networks just as easily)
	print("* Loading model...")
	model = ResNet50(weights="imagenet")
	print("* Model loaded")
	# continually poll for new images to classify
	while True:
		# attempt to grab a batch of images from the database, then
		# initialize the image IDs and batch of images themselves
		queue = db.lrange(settings.IMAGE_QUEUE, 0,
			settings.BATCH_SIZE - 1)
		imageIDs = []
		batch = None
		# loop over the queue
		for q in queue:
			# deserialize the object and obtain the input image
			q = json.loads(q.decode("utf-8"))
			image = helpers.base64_decode_image(q["image"],
				settings.IMAGE_DTYPE,
				(1, settings.IMAGE_HEIGHT, settings.IMAGE_WIDTH,
					settings.IMAGE_CHANS))
			# check to see if the batch list is None
			if batch is None:
				batch = image
			# otherwise, stack the data
			else:
				batch = np.vstack([batch, image])
			# update the list of image IDs
			imageIDs.append(q["id"])
		# check to see if we need to process the batch
		if len(imageIDs) > 0:
			# classify the batch
			print("* Batch size: {}".format(batch.shape))
			preds = model.predict(batch)
			results = decode_predictions(preds)
			# loop over the image IDs and their corresponding set of
			# results from our model
			for (imageID, resultSet) in zip(imageIDs, results):
				# initialize the list of output predictions
				output = []
				# loop over the results and add them to the list of
				# output predictions
				for (imagenetID, label, prob) in resultSet:
					r = {"label": label, "probability": float(prob)}
					output.append(r)
				# store the output predictions in the database, using
				# the image ID as the key so we can fetch the results
				db.set(imageID, json.dumps(output))
			# remove the set of images from our queue
			db.ltrim(settings.IMAGE_QUEUE, len(imageIDs), -1)
		# sleep for a small amount
		time.sleep(settings.SERVER_SLEEP)
# if this is the main thread of execution start the model server
# process
if __name__ == "__main__":
	classify_process()

The run_model_server.py file contains our classify_process function. This function loads our model and then runs predictions on batches of images. This process is best done on a GPU, but a CPU can also be used.

For simplicity in this example, we use ResNet50, which is pre-trained on the ImageNet dataset. You can modify classify_process to take advantage of your own deep learning model.
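Only the model-loading line and the prediction-decoding step need to change. A hedged sketch of what that substitution might look like — the model path and label list below are hypothetical placeholders, not part of the project:

# hypothetical substitution for the ResNet50 lines in classify_process
from tensorflow.keras.models import load_model

model = load_model("/path/to/your_model.h5")
CLASS_LABELS = ["cat", "dog", "panda"]  # replace with your own labels

def decode_custom(preds, top=3):
	# map each probability vector to (id, label, probability) tuples,
	# mimicking the structure returned by decode_predictions
	results = []
	for pred in preds:
		resultSet = sorted(
			[(str(i), CLASS_LABELS[i], float(p))
				for (i, p) in enumerate(pred)],
			key=lambda r: r[2], reverse=True)[:top]
		results.append(resultSet)
	return results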

WSGI configuration

# add our app to the system path
import sys
sys.path.insert(0, "/var/www/html/keras-complete-rest-api")
# import the application and away we go...
from run_web_server import app as application

The file keras_rest_api_app.wsgi is a new component of our deep learning REST API. This WSGI configuration file adds our server directory to the system path and imports the web application to kick everything off. We point to this file in the Apache server configuration file, /etc/apache2/sites-available/000-default.conf, covered later in this post.

Stress test

# import the necessary packages
from threading import Thread
import requests
import time
# initialize the Keras REST API endpoint URL along with the input
# image path
KERAS_REST_API_URL = "http://localhost/predict"
IMAGE_PATH = "jemma.png"
# initialize the number of requests for the stress test along with
# the sleep amount between requests
NUM_REQUESTS = 500
SLEEP_COUNT = 0.05
def call_predict_endpoint(n):
	# load the input image and construct the payload for the request
	image = open(IMAGE_PATH, "rb").read()
	payload = {"image": image}
	# submit the request
	r = requests.post(KERAS_REST_API_URL, files=payload).json()
	# ensure the request was successful
	if r["success"]:
		print("[INFO] thread {} OK".format(n))
	# otherwise, the request failed
	else:
		print("[INFO] thread {} FAILED".format(n))
# loop over the number of threads
for i in range(0, NUM_REQUESTS):
	# start a new thread to call the API
	t = Thread(target=call_predict_endpoint, args=(i,))
	t.daemon = True
	t.start()
	time.sleep(SLEEP_COUNT)
# insert a long sleep so we can wait until the server is finished
# processing the images
time.sleep(300)

Our stress_test.py script will help us test the server and determine its limits. I always recommend stress-testing your deep learning REST API server so that you know whether (and, more importantly, when) you need to add additional GPUs, CPUs, or RAM. This script starts NUM_REQUESTS threads that POST to the /predict endpoint of our Flask web application.

Compile and install Redis

Redis is an efficient in-memory database that will act as our queue/message broker. Getting and installing Redis is very simple:

$ wget http://download.redis.io/redis-stable.tar.gz
$ tar xvzf redis-stable.tar.gz
$ cd redis-stable
$ make
$ sudo make install

Create your deep learning Python virtual environment

Install the additional packages:

$ workon dl4cv
$ pip install flask
$ pip install gevent
$ pip install requests
$ pip install redis

Install the Apache Web server

Other web servers, such as Nginx, could be used, but since I have more experience with Apache (and am therefore generally more familiar with it), I will use Apache in this example. Apache can be installed as follows:

$ sudo apt-get install apache2

If you created a virtual environment using Python 3, you will need to install the Python 3 WSGI + Apache module:

$ sudo apt-get install libapache2-mod-wsgi-py3
$ sudo a2enmod wsgi

To verify that Apache is installed, open a browser and enter the IP address of the web server. If you do not see the server splash screen, be sure to open ports 80 and 5000. In my case, my server's IP address is 54.187.46.215 (yours will be different). Typing that into a browser brings up the default Apache splash page.

Symlink your Flask + deep learning application

By default, Apache serves content from /var/www/html. I recommend creating a symbolic link from /var/www/html to your Flask web application. I have uploaded my deep learning + Flask application to my home directory, in a directory named keras-complete-rest-api:

$ ls ~
keras-complete-rest-api

I can symlink it to /var/www/html in the following way:

$ cd /var/www/html/
$ sudo ln -s ~/keras-complete-rest-api keras-complete-rest-api

Update your Apache configuration to point to the Flask application

To configure Apache to point to our Flask application, we need to edit /etc/apache2/sites-available/000-default.conf. Open it in your favorite text editor (I'll use vi here):

$ sudo vi /etc/apache2/sites-available/000-default.conf

Provide your WSGIPythonHome (path to the Python bin directory) and WSGIPythonPath (path to the Python site-packages directory) configurations at the top of the file:

WSGIPythonHome /home/ubuntu/.virtualenvs/keras_flask/bin
WSGIPythonPath /home/ubuntu/.virtualenvs/keras_flask/lib/python3.5/site-packages

<VirtualHost *:80>
	...
</VirtualHost>

On Ubuntu 18.04, you may need to change the first line to:

WSGIPythonHome /home/ubuntu/.virtualenvs/keras_flask
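If you're unsure which paths to use, you can print them from inside the activated virtual environment (a quick standard-library check; adapt the output to the two directives above):

# run inside the activated virtual environment
import sys
import sysconfig

print(sys.prefix)                        # basis for WSGIPythonHome
print(sysconfig.get_paths()["purelib"])  # basis for WSGIPythonPath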

Since we are using a Python virtual environment in this example (I named mine keras_flask), we provide the paths to the bin and site-packages directories of that virtual environment. Then, in the body of the <VirtualHost>, below ServerAdmin and DocumentRoot, add:

<VirtualHost *:80>
	...
	
	WSGIDaemonProcess keras_rest_api_app threads=10
	WSGIScriptAlias / /var/www/html/keras-complete-rest-api/keras_rest_api_app.wsgi
	
	<Directory /var/www/html/keras-complete-rest-api>
		WSGIProcessGroup keras_rest_api_app
		WSGIApplicationGroup %{GLOBAL}
		Order deny,allow
		Allow from all
	</Directory>
	
	...
</VirtualHost>

Symlink the CUDA libraries (optional, GPU only)

If you're using your GPU for deep learning and want to leverage CUDA (and why wouldn't you?), Apache unfortunately has no knowledge of CUDA's *.so libraries in /usr/local/cuda/lib64.

I'm not sure what the "right" way to tell Apache where these CUDA libraries live is, but the "total hack" solution is to symbolically link all the files from /usr/local/cuda/lib64 into /usr/lib:

$ cd /usr/lib
$ sudo ln -s /usr/local/cuda/lib64/* ./

Restart the Apache Web server

After editing the Apache configuration file and optionally symlinking the CUDA deep learning libraries, restart the Apache server as follows:

$ sudo service apache2 restart

Test your Apache Web server + deep learning endpoint

To test if Apache is properly configured to serve the Flask + deep learning application, refresh your Web browser:

You should now see the text "Welcome to the PyImageSearch Keras REST API!" in your browser. Once you've reached this stage, your Flask deep learning app should be ready to go. If you run into any problems, be sure to refer to the next section…

Tip: If you encounter problems, monitor the Apache error log

I've been using Python + web frameworks such as Flask and Django for years, and I still make mistakes when configuring environments. While I wish there were a bulletproof way to make sure everything goes smoothly, the truth is that something will likely go wrong along the way. The good news is that WSGI logs Python events, including failures, to the server log. On Ubuntu, the Apache server logs live in /var/log/apache2/:

$ ls /var/log/apache2
access.log error.log other_vhosts_access.log

When debugging, I often keep a terminal open running:

$ tail -f /var/log/apache2/error.log

…so I can see errors the second they roll in. Use the error log to help you get Flask up and running on the server.

Start your deep learning model server

Your Apache server should already be running. If not, you can start it by:

$ sudo service apache2 start

Next, start the Redis store:

$ redis-server
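Before starting the model server, you can confirm Redis is reachable from Python (a quick check using the same client our scripts use, with the default connection settings):

# ping the local Redis instance to confirm it is reachable
import redis

db = redis.StrictRedis(host="localhost", port=6379, db=0)
print(db.ping())  # prints True if Redis is up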

Then start the Keras model server in a separate terminal:

$ python run_model_server.py
* Loading model...
* Model loaded

From there, try submitting a sample image to your deep learning API service:

$ curl -X POST -F image=@jemma.png 'http://localhost/predict'
{
  "predictions": [
    {
      "label": "beagle",
      "probability": 0.9461532831192017
    },
    {
      "label": "bluetick",
      "probability": 0.031958963721990585
    },
    {
      "label": "redbone",
      "probability": 0.0066171870566904545
    },
    {
      "label": "Walker_hound",
      "probability": 0.003387963864952326
    },
    {
      "label": "Greater_Swiss_Mountain_dog",
      "probability": 0.0025766845792531967
    }
  ],
  "success": true
}

If all goes well, you should receive formatted JSON output from the Deep Learning API model server with category prediction + probability.

Stress test your deep learning REST API

Of course, this is just one example. Let’s stress test the deep learning REST API. Open another terminal and execute the following command:

$ python stress_test.py 
[INFO] thread 3 OK
[INFO] thread 0 OK
[INFO] thread 1 OK
...
[INFO] thread 497 OK
[INFO] thread 499 OK
[INFO] thread 498 OK

In the run_model_server.py output, you will begin to see the following lines logged to the terminal:

* Batch size: (4, 224, 224, 3)
* Batch size: (9, 224, 224, 3)
* Batch size: (9, 224, 224, 3)
* Batch size: (8, 224, 224, 3)
...
* Batch size: (2, 224, 224, 3)
* Batch size: (10, 224, 224, 3)
* Batch size: (7, 224, 224, 3)

Even with a new request arriving every 0.05 seconds, our batch size never exceeds roughly 10-12 images per batch. Our model server handles the load without breaking a sweat, and it can easily scale beyond this. If you do overload the server (perhaps your batch size is too large and your GPU runs out of memory with an error message), you should stop the server and clear the queue using the Redis CLI:

$ redis-cli
> FLUSHALL

From there you can adjust the settings in settings.py and /etc/apache2/sites-available/000-default.conf. You can then restart the server.

Suggestions for deploying your own deep learning model into production

One of the best pieces of advice I can give is to keep your data, especially your Redis server, close to the GPU. You may be tempted to spin up a giant Redis server with hundreds of gigabytes of RAM to handle multiple image queues and serve multiple GPU machines, but the problem there will be I/O latency and network overhead.

Assuming 224 x 224 x 3 images represented as float32 arrays, a batch of 32 images amounts to ~19MB of data. This means that for each batch request from the model server, Redis will need to pull 19MB of data and send it to the server. On a fast switch this is not a big deal, but you should consider running both the model server and Redis on the same machine to keep the data close to the GPU.
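The arithmetic behind that figure:

# back-of-the-envelope check of the per-batch payload size
bytes_per_image = 224 * 224 * 3 * 4  # float32 = 4 bytes per value
batch_bytes = 32 * bytes_per_image   # 19,267,584 bytes
print(batch_bytes / 1e6)             # ~19.3 MB per batch request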

Conclusion

In today's post, we learned how to deploy a deep learning model into a production environment using Keras, Redis, Flask, and Apache. Most of the tools used here are interchangeable: you can replace Keras with TensorFlow or PyTorch, Django can stand in for Flask, and Nginx can be swapped in for Apache.

The only tool I would not recommend swapping out is Redis. Redis is arguably the best solution for in-memory data stores. Unless you have a specific reason not to use Redis, I recommend it for your queuing operations. Finally, we stress-tested our deep learning REST API.

We submitted a total of 500 image classification requests to our server with a 0.05-second delay between each request — our server was unfazed (the batch size never filled beyond roughly 37% of its 32-image capacity).