This post records the process of deploying a deep learning model with Flask, the problems encountered, and the corresponding solutions.

1. Project Introduction

This section briefly introduces the recent work:

  • Implement a simple image classification task based on deep learning
  • Deploy it as a Web application using the Flask framework
  • Meet high-concurrency requirements

This was my first time deploying a deep learning model as a Web application. The whole process exposed how narrow my previous knowledge was, and a working version only came together after repeatedly falling into pitfalls and climbing out of them.

2. Project process

This part starts from the process of project implementation and records the work done and the tools used.

2.1 Image classification model

1. Model selection

For image classification, the first instinct is to use a mature, classic classification network, such as the VGG series (VGG16, VGG19), the ResNet series (such as ResNet50), InceptionV3, etc.

Since the task is to classify images of unknown types and no training data is directly available, a model pre-trained on ImageNet basically meets the requirements.

If you have strict requirements on performance (time), a shallower or lighter network structure is advisable, such as VGG16 or MobileNet.

MobileNet is a lightweight deep network architecture designed for deep learning applications on mobile and embedded devices, so that acceptable speed can be achieved on a CPU.

MobileNet was proposed by a Google team in 2017 in the paper MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications.

2. Frame selection

  • Keras is the framework I use most often. Keras runs on top of Theano or TensorFlow, which is called the Keras backend. Keras is a high-level API that is much easier to learn than TensorFlow itself.

  • Most of the classification networks mentioned above are already implemented in Keras (in the keras.applications package).

  • They are easy to use and can be imported directly.

Therefore, Keras was chosen as the deep learning framework.

3. Code examples

Image classification with the Keras framework and the VGG16 network, as an example:

import os

import numpy as np
import tensorflow as tf
import keras.backend.tensorflow_backend as KTF
from keras.models import Model
from keras.applications.vgg16 import VGG16, preprocess_input
from keras.preprocessing import image

os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"  # GPUs visible to this process

# Use GPU memory on demand
gpu_options = tf.GPUOptions(allow_growth=True)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
KTF.set_session(sess)

# Build the model (layer is the name of the output layer, defined elsewhere)
base_model = VGG16(weights='imagenet', include_top=True)
model = Model(inputs=base_model.input,
              outputs=base_model.get_layer(layer).output)

# Load the image (img_name is the image path) and resize it to 224x224
img = image.load_img(img_name, target_size=(224, 224))

# Image preprocessing
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

feature = model.predict(x)  # Extract features

2.2 Model performance test

After running the classification model, we need to test its performance: time consumption, CPU usage, memory usage, and GPU memory usage.

1. Time consumption

Time consumption is the time taken for image classification feature extraction, i.e. the sum of the image preprocessing time and the model prediction time.

Use Python's time module:

import time

t0 = time.time()
# ... image processing and feature extraction ...
print(time.time() - t0)  # elapsed time, in seconds
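A single measurement can be distorted by one-off effects such as model warm-up, so it may help to repeat it. The helper below is only a sketch: time_call and its parameters are hypothetical names that are not part of the original project, and time.perf_counter is used here instead of time.time because it is a monotonic, high-resolution clock.

```python
import time


def time_call(func, *args, repeat=5, **kwargs):
    """Call func(*args, **kwargs) `repeat` times; return (last_result, timings)."""
    timings = []
    result = None
    for _ in range(repeat):
        t0 = time.perf_counter()
        result = func(*args, **kwargs)
        timings.append(time.perf_counter() - t0)  # elapsed seconds for this run
    return result, timings


if __name__ == "__main__":
    # Stand-in workload; in the project this would be preprocessing + model.predict
    _, timings = time_call(sum, range(1_000_000), repeat=3)
    print("min {:.4f}s, mean {:.4f}s".format(min(timings), sum(timings) / len(timings)))
```

Reporting both the minimum and the mean makes it easier to see whether the first call is much slower than the following ones.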

2. GPU memory usage

Use NVIDIA's command-line tool nvidia-smi to view GPU memory usage.
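If the GPU memory figures need to be logged from Python rather than read off the terminal, nvidia-smi can be asked for machine-readable output. The following is a minimal sketch, assuming nvidia-smi is on the PATH; the function names are hypothetical, and the query uses the --query-gpu and --format=csv,noheader,nounits options of nvidia-smi.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse lines of 'used, total' (in MiB) into a list of (used, total) int tuples."""
    usage = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        usage.append((used, total))
    return usage


def query_gpu_memory():
    """Run nvidia-smi and return one (used_mib, total_mib) tuple per GPU."""
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        universal_newlines=True,
    )
    return parse_gpu_memory(out)
```

Calling query_gpu_memory() before and after loading the model gives a rough idea of how much GPU memory the model occupies.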

3. CPU and memory usage

Run the top or htop command to check the CPU usage and memory usage.

Memory usage can also be viewed using the free command:

  • free -h: with the -h option, the output is human-readable, with appropriate units

  • To observe memory continuously, specify an interval in seconds with the -s option: free -h -s 3 (updates every 3 seconds; press Ctrl+C to stop)

The version of free shipped by default with Ubuntu 16.04 has a bug: using the -s option reports an error.
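Where free -s is unusable, the same numbers can be read from /proc/meminfo on Linux. This is a sketch under that assumption; parse_meminfo and memory_used_fraction are hypothetical helper names, and MemAvailable is only reported by reasonably recent kernels.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[name.strip()] = int(fields[0])
    return info


def memory_used_fraction(info):
    """Fraction of memory in use, based on MemTotal and MemAvailable."""
    return 1.0 - info["MemAvailable"] / info["MemTotal"]


def read_memory_used_fraction(path="/proc/meminfo"):
    """Read the file once and return the used fraction (Linux only)."""
    with open(path) as f:
        return memory_used_fraction(parse_meminfo(f.read()))
```

Calling read_memory_used_fraction() in a loop with time.sleep(3) reproduces the effect of free -s 3.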

Adjust the network structure and video memory occupancy options according to the above three test results.

For the specific meaning of the command, please refer to the blog:

Linux checks CPU and memory usage

2.3 Using Redis

Redis (Remote Dictionary Server) is a high-performance key-value storage system written by Salvatore Sanfilippo. It is open source, written in ANSI C, BSD-licensed, network-enabled, memory-based with optional persistence, and provides APIs for multiple languages.

Redis supports the storage types string, list, set, zset (sorted set) and hash, and is widely used in scenarios with large-scale data reads and writes.

1. Basic use

Install the redis Python client

pip install redis

# test
import redis

Basic introduction

redis-py provides two classes, StrictRedis and Redis. StrictRedis implements most of the official commands using the official syntax. Redis is a subclass of StrictRedis kept for backward compatibility with older versions of redis-py. Normally we use StrictRedis.

Usage example

# 1. Import Redis
from redis import StrictRedis

# 2. Connect to the database, specifying host, port number, and database
r = StrictRedis(host='localhost', port=6379, db=2)

# 3. Store in Redis
r.set('test1', 'value1')  # Store a single value
r.set('test2', 'value2')

# 4. Get the value from Redis
r.get('test1')

# 5. Batch operations
r.mset(k1='v1', k2='v2')  # keyword form; newer redis-py versions may accept only the dict form
r.mset({'k1': 'v1', 'k2': 'v2'})
r.mget('k1', 'k2')
r.mget(['k1', 'k2'])

2. Redis stores arrays

Redis cannot store arrays directly. If you store an array directly, the type of the retrieved value changes: as the following example shows, the retrieved value is of type bytes.

import numpy as np
from redis import StrictRedis

r = StrictRedis(host='localhost', port=6379, db=2)
x1 = np.array([[0.2, 0.1, 0.6], [10.2, 4.2, 0.9]])
r.set('test1', x1)
>>> True
r.get('test1')
>>> b'[[ 0.2  0.1  0.6]\n [10.2  4.2  0.9]]'
type(r.get('test1'))  # data type after retrieval
>>> <class 'bytes'>

To keep the data type consistent before and after storage, serialize the array before storing it and deserialize it when retrieving it.

Use Python's pickle module for serialization.

import pickle
r.set('test2', pickle.dumps(x1))
>>> True
pickle.loads(r.get('test2'))
>>> array([[ 0.2,  0.1,  0.6],
           [10.2,  4.2,  0.9]])

In this way, the type of data before and after storage is the same.
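The serialize-on-write, deserialize-on-read pattern can be wrapped in two small helpers. This is a sketch, not the project's code: set_object and get_object are hypothetical names, and DictStore is an in-memory stand-in that only mimics the set/get calls of a Redis client so the example runs without a server.

```python
import pickle


class DictStore:
    """In-memory stand-in exposing only set/get, like a tiny subset of a Redis client."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        if not isinstance(value, bytes):
            raise TypeError("this stub only stores bytes")
        self._data[key] = value
        return True

    def get(self, key):
        return self._data.get(key)  # None if the key is missing


def set_object(client, key, obj):
    """Serialize obj with pickle and store it under key."""
    return client.set(key, pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


def get_object(client, key, default=None):
    """Fetch key and deserialize it; return default if the key is missing."""
    raw = client.get(key)
    return default if raw is None else pickle.loads(raw)
```

With a real connection, the StrictRedis instance r would be passed as client instead of DictStore(). Note that pickle.loads can execute arbitrary code, so it should only be used on data written by a trusted producer.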

2.4 Web development framework – Flask

When I learned Python earlier, I never paid attention to the Web development chapters because they were not part of my job. Now they need a fresh look.

In the early days, software mainly ran on the desktop, while software such as databases ran on a server. This Client/Server mode is referred to as the CS architecture. With the rise of the Internet, the CS architecture turned out to be a poor fit for the Web, mainly because Web applications are modified and upgraded very frequently, and the CS architecture requires every client to upgrade its desktop app one by one. Therefore the Browser/Server mode became popular, referred to as the BS architecture.

In the BS architecture, the client only needs a browser, and the application's logic and data are stored on the server. The browser only needs to send a request to the server, obtain the Web page, and show it to the user. Web pages today are also highly interactive.

Python has been around since before the Web became popular. Because Python is an interpreted scripting language with high development efficiency, it is well suited to Web development.

Python has hundreds of open source Web frameworks, including Flask and Django. This post uses Flask as the example to describe Web deployment.

For an introduction to Web development frameworks, check out this blog post: Three of the most popular Python Web development frameworks right now, and you deserve them!

You can refer to other blog posts on how to use Flask. The following uses specific examples to illustrate:

1. Installation and use

  1. Install Flask

    pip install flask
    
    import flask  # import
    flask.__version__  # check the version
    >>> '1.1.1'  # current version
  2. A simple Flask example

    Flask uses Python decorators to associate URLs with functions.

    # hello.py
    from flask import Flask, request
    
    app = Flask(__name__)  # instance of the Flask class; the first argument is the name of the module or package
    app.config['JSON_AS_ASCII'] = False  # allow non-ASCII (e.g. Chinese) characters in JSON output
    
    @app.route('/', methods=['GET', 'POST'])  # use methods to handle different HTTP methods
    def home():
        return 'Hello, Flask'
    
    if __name__ == '__main__':
        app.run()

    • Use the route() decorator to tell Flask which URL triggers the function;
    • The function name is also used to generate the URL associated with it. The function returns the content to be displayed in the user's browser.

    Run this file and it prints * Running on http://127.0.0.1:5000/. Open this URL in a browser: Flask calls the home function, which returns Hello, Flask, and the page shows Hello, Flask.

    Parameters of app.run

    app.run(host="0.0.0.0", port=5000, debug=True, processes=2, threaded=False)

    • host: set it to 0.0.0.0 to make the server publicly accessible
    • port: the port number; the default is 5000
    • debug: whether to enable debug mode. In debug mode, the server restarts automatically after the application code is modified and provides a useful debugger when the application fails.
    • processes: the number of processes; the default is 1
    • threaded: bool, whether to enable multithreading. Note: multithreading cannot be enabled at the same time as multiple processes.

    Note: never use the debugger in a production environment.

2. Flask responses

The return value of a view function is automatically converted to a response object. If the return value is a string, it is converted to a response object with the string as the response body, a 200 OK status code, and a text/html mimetype. If the return value is a dictionary, jsonify() is called to generate the response. The conversion rules are:

  • If the view returns a response object, return it directly.
  • If a string is returned, a response object is generated for the return based on the string and the default parameters.
  • If a dictionary is returned, call jsonify to create a response object.
  • If a tuple is returned, the items in the tuple can provide additional information. The tuple must contain at least one item and must have the form (response, status), (response, headers), or (response, status, headers). The value of status overrides the status code, and headers is a list or dictionary of additional header values.
  • If none of the above, Flask assumes that the return value is a valid WSGI application and converts it into a response object.

JSON format API

JSON responses are common, and writing such apis in Flask is easy to pick up. If a dict is returned from the view, it is converted to a JSON response.

@app.route("/me")
def me_api():
    user = get_current_user()
    return {
        "username": user.username,
        "theme": user.theme,
        "image": url_for("user_image", filename=user.image),
    }

If dict isn’t enough and you need to create other types of JSON-formatted responses, you can use the jsonify() function. This function serializes any supported JSON data types.

@app.route("/users")
def users_api():
    users = get_all_users()
    return jsonify([user.to_json() for user in users])

3. Run the development server

  1. Use the development server from the command line

    The flask command-line script (command line interface) is highly recommended for development because of its powerful reloading capabilities. The basic usage is as follows:

    $ export FLASK_APP=my_application
    $ export FLASK_ENV=development
    $ flask run

    Doing so starts the development environment (including the interactive debugger and the reloader) and serves the application at http://localhost:5000/.

    You can control individual features of the server with different run parameters. For example, to disable the reloader:

    $ flask run --no-reload

  2. Use the development server through code

    The alternative is to start the application with the Flask.run() method, which immediately runs a local server and has the same effect as using the flask script.

    Example:

    if __name__ == '__main__':
        app.run()

    This works in the general case, but it is not well suited to development, which is why the flask command is the recommended way.

2.5 Using Gunicorn

When we run the application file above directly, the Web service is started with Flask's built-in server. In a production environment, Flask's built-in server cannot meet the performance requirements, so we use Gunicorn as the WSGI container to deploy the Flask application.

Gunicorn (Green Unicorn) is a Python WSGI HTTP server for UNIX, ported from Ruby's Unicorn project. The Gunicorn server acts as a container for WSGI applications, is compatible with various Web frameworks, is very simple to use, and is light on resources. Gunicorn can be started directly from the command line without writing a configuration file, which makes it much easier to use than uWSGI.
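To make "WSGI container" more concrete: a WSGI application is just a callable that takes the request environment and a start_response callback, and the server (Gunicorn here) calls it for every request. The sketch below uses only the standard library; simple_app and call_app are hypothetical names used for illustration.

```python
from wsgiref.util import setup_testing_defaults


def simple_app(environ, start_response):
    """A minimal WSGI application that returns a plain-text greeting."""
    body = "Hello, WSGI at {}".format(environ.get("PATH_INFO", "/")).encode("utf-8")
    headers = [("Content-Type", "text/plain; charset=utf-8"),
               ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]


def call_app(app, path="/"):
    """Call a WSGI app directly, roughly as a server would; return (status, body)."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the remaining required WSGI keys
    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

A Flask instance is itself such a callable, which is why Gunicorn can be pointed at hello:app. Assuming the sketch were saved as wsgi_demo.py (a hypothetical file name), it could be served in the same way with gunicorn wsgi_demo:simple_app.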

In Web development, deployment is similar.

1. Installation and use

pip install gunicorn

If you want Gunicorn to support asynchronous workers, you need to install the following three packages:

pip install gevent
pip install eventlet
pip install greenlet

Specify the number of processes and the port number to start the server:

gunicorn -w 4 -b 127.0.0.1:5001 <module name>:<Flask app instance name>

Taking the hello.py file above as an example:

gunicorn -w 4 -b 127.0.0.1:5001 hello:app

Parameters: -w: the number of worker processes. -b: the IP address and port to bind.

Run gunicorn -h to see all options. The configuration parameters are usually written to a configuration file, such as gunicorn_conf.py.

Important parameters:

  • bind: the address and port to listen on
  • workers: the number of worker processes. Suggested value: 2~4 x NUM_CORES; the default is 1.
  • worker_class: how the worker processes operate. Options: sync (default), eventlet, gevent, gthread, tornado
  • threads: the number of threads in each worker process. Suggested value: 2~4 x NUM_CORES; the default is 1.
  • reload: automatically restart the workers when the code changes. Intended for the development environment. The default is False.
  • daemon: whether to run the application in daemon mode, i.e. start it as a daemon. The default is False.
  • accesslog: access log file path
  • errorlog: error log file path
  • loglevel: log level: debug, info, warning, error, critical.
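Since a Gunicorn configuration file is ordinary Python, the suggested worker count can be computed instead of hard-coded. The following is a sketch of the 2~4 x NUM_CORES rule of thumb from the list above; suggested_workers and its factor parameter are hypothetical names.

```python
import multiprocessing


def suggested_workers(cores=None, factor=2):
    """Return factor x cores, following the 2~4 x NUM_CORES rule of thumb."""
    if cores is None:
        cores = multiprocessing.cpu_count()
    if not 2 <= factor <= 4:
        raise ValueError("factor is expected to be between 2 and 4")
    return factor * cores
```

In gunicorn_conf.py this could be used as workers = suggested_workers(). For a deep learning service, keep in mind that every worker process loads its own copy of the model, so available GPU or system memory may impose a lower limit than the CPU count does.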

An example of parameter configuration:

# gunicorn_conf.py
bind = '0.0.0.0:5000'  # address and port to listen on
workers = 2  # number of worker processes
worker_class = 'sync'  # working mode: sync, gevent, eventlet, gthread, tornado, etc.
threads = 1  # number of threads per worker process; the default is 1
worker_connections = 2000  # maximum number of simultaneous clients
timeout = 30  # timeout in seconds; the default is 30
reload = True  # development mode: restart automatically when the code changes
daemon = False  # whether to run Gunicorn as a daemon; the default is False

accesslog = './logs/access.log'  # access log file
errorlog = './logs/error.log'  # error log file
loglevel = 'debug'  # log level: debug, info, warning, error, critical

Call command:

gunicorn -c gunicorn_conf.py hello:app

Reference: gunicorn/example_config.py at master · benoitc/gunicorn

3. Code examples

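# flask_feature.py
import numpy as np
import tensorflow as tf
from flask import Flask, jsonify, request
from keras.models import Model
from keras.applications.vgg16 import VGG16, preprocess_input
from keras.backend.tensorflow_backend import set_session
from keras.preprocessing import image

app = Flask(__name__)
app.config['JSON_AS_ASCII'] = False

# some_custom_config, model_weights, layer and feature_params are project settings defined elsewhere.
# The model is loaded at import time so that it is also available when Gunicorn imports this module
# (Gunicorn never runs the __main__ block below).
tf_config = some_custom_config
sess = tf.Session(config=tf_config)
set_session(sess)
base_model = VGG16(weights=model_weights, include_top=True)
model = Model(inputs=base_model.input,
              outputs=base_model.get_layer(layer).output)
graph = tf.get_default_graph()


@app.route("/", methods=["GET", "POST"])
def feature():
    # Assumption: the image path arrives as a request parameter named img_name
    img_name = request.values.get("img_name")
    img_feature = extract(img_name)
    return jsonify({'result': 'true', 'msg': 'success'})


def extract(img_name):
    # Image preprocessing
    size = feature_params["size"]
    img = image.load_img(img_name, target_size=(size, size))

    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input(x)

    # Use the graph and session in which the model was loaded
    with graph.as_default():
        set_session(sess)
        res = model.predict(x)

    return res


if __name__ == '__main__':
    app.run()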

Start the service command with Gunicorn:

gunicorn -c gunicorn_conf.py flask_feature:app

4. Problems encountered

Record the problems encountered during the entire deployment and the solutions.

4.1 Flask multithreading and multiprocess problems

Because the algorithm has strict time-performance requirements, we tested the effect of the multithreading and multiprocess options in Flask. In Flask's app.run(), processes specifies how many processes are started, and threaded specifies whether multithreading is enabled.

When Flask is started in debug mode, an additional TensorFlow instance is started along with the service, which leads to graph mismatches when TensorFlow is called.

4.1 Flask and Keras problem

This section records the problems encountered when starting the Flask service, together with references.

Q1: Tensor is not an element of this graph

Error message:

"Tensor Tensor(\"pooling/Mean:0\", shape=(?, 1280), dtype=float32) is not an element of this graph."

Description: the code that uses a pre-trained Keras model for image classification feature extraction runs normally on its own. When the service is started with Flask and the prediction function is called, the above error occurs.

Cause: the graph is resolved dynamically. At prediction time, the graph that gets loaded is not the graph in which the model was first initialized, so it contains none of the model's parameters or nodes.

A commonly suggested solution is the following:

import tensorflow as tf

# After loading the model, keep global references to the model and its graph
graph = tf.get_default_graph()

# When a prediction is needed
with graph.as_default():
    y = model.predict(x)

Q2: When the service is started with Flask, the model is loaded twice and occupies two copies of GPU memory

This happens when debug mode (debug=True) is enabled while starting the service with Flask. In debug mode the application is started a second time, so the model is loaded again. Checking the GPU memory usage then shows two processes, each occupying the same amount of memory.

Solution: turn off debug mode (debug=False).
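If debug mode is still wanted during development, one possible workaround is to load the model only in the process that actually serves requests. The sketch below rests on an assumption that should be verified for the installed version: the duplicate load comes from Werkzeug's reloader, which runs the application in a child process and sets the environment variable WERKZEUG_RUN_MAIN to "true" there. The function names are hypothetical.

```python
import os


def is_reloader_child(environ=None):
    """True inside the child process that the reloader spawns to serve requests."""
    environ = os.environ if environ is None else environ
    return environ.get("WERKZEUG_RUN_MAIN") == "true"


def should_load_model(use_reloader, environ=None):
    """Load once: always when no reloader is used, otherwise only in the reloader child."""
    return (not use_reloader) or is_reloader_child(environ)
```

In the __main__ block, the expensive model loading would then be wrapped in if should_load_model(use_reloader=True): before calling app.run(debug=True).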

References:

[1]:Keras + Flask provides interface services for the pit ~~~

4.2 Problems related to gunicorn service startup

The following problems occur when starting the gunicorn service:

Q1: Failed precondition

Specific issues:

2 root error(s) found.\n 
(0) Failed precondition: Error while reading resource variable block5_conv2/kernel from Container: localhost. This could mean that the variable was uninitialized. Not found: Container localhost does not exist. (Could not find resource: localhost/block5_conv2/kernel)\n\t [[{{node block5_conv2/convolution/ReadVariableOp}}]]\n\t [[fc2/Relu/_7]]\n 
(1) Failed precondition: Error while reading resource variable block5_conv2/kernel from Container: localhost. This could mean that the variable was uninitialized. Not found: Container localhost does not exist. (Could not find resource: localhost/block5_conv2/kernel)\n\t [[{{node block5_conv2/convolution/ReadVariableOp}}]]\n0 successful operations.\n0 derived errors ignored."

Solutions:

Keep a reference to the session that was used to load the model, and set it as the Keras session in every request that needs the model. Details are as follows:

import tensorflow as tf
from tensorflow.python.keras.backend import set_session
from tensorflow.python.keras.models import load_model

tf_config = some_custom_config
sess = tf.Session(config=tf_config)
graph = tf.get_default_graph()

# IMPORTANT: models have to be loaded AFTER SETTING THE SESSION for keras!
# Otherwise, their weights will be unavailable in the threads after the session there has been set
set_session(sess)
model = load_model(...)

# In each request:
global sess
global graph
with graph.as_default():
    set_session(sess)
    model.predict(...)

In TensorFlow 1.x the default graph and session are not shared safely across threads: by default, each thread ends up with a new session that contains none of the loaded weights or models. You can therefore solve the problem by keeping a global session that contains all the models and setting it as the Keras session in each thread.
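The behaviour can be illustrated with a small standard-library analogy. This is not TensorFlow's implementation, and every name in it is hypothetical: a thread-local slot plays the role of the per-thread default session, and a module-level dictionary plays the role of the global session that holds the model.

```python
import threading

_default = threading.local()            # per-thread "default session" slot
GLOBAL_SESSION = {"weights": "loaded"}  # stands in for the session holding the model


def get_default_session():
    """Return this thread's default session, creating an empty one if unset."""
    if not hasattr(_default, "session"):
        _default.session = {}           # a fresh session knows nothing about the model
    return _default.session


def set_session(session):
    """Make `session` the default session of the current thread."""
    _default.session = session


def handle_request(use_global):
    """Simulate a request handler; return True if the model weights are visible."""
    if use_global:
        set_session(GLOBAL_SESSION)
    return "weights" in get_default_session()


def run_in_thread(use_global):
    """Run handle_request in a new thread and return its result."""
    result = []
    t = threading.Thread(target=lambda: result.append(handle_request(use_global)))
    t.start()
    t.join()
    return result[0]
```

A new thread that relies on its own default sees no weights, while a thread that first installs the global session does, which mirrors the role of set_session(sess) in each request.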

Another user proposed an improved variant:

import tensorflow as tf
import keras as k  # assumption: k refers to the keras package

# on thread 1
session = tf.Session(graph=tf.Graph())
with session.graph.as_default():
    k.backend.set_session(session)
    model = k.models.load_model(filepath)

# on thread 2
with session.graph.as_default():
    k.backend.set_session(session)
    model.predict(x, **kwargs)

What is new here is that multiple models can each be loaded once and then used in multiple threads. By default, the "default" session and the "default" graph are used when loading a model, but this approach creates new ones. Also note that the graph is stored in the Session object, which is more convenient.

I tested this variant and it did not seem to work.

Q2: Failed to start the service, CRITICAL WORKER TIMEOUT

When Gunicorn was used to start the Flask service, the server status and log files showed that the service kept trying to start without ever succeeding. The log contained:

CRITICAL WORKER TIMEOUT

This is caused by the Gunicorn configuration parameter timeout. The default value is 30s: a worker that has not responded within 30s is killed and restarted.

When the service initialization takes longer than the timeout value, the service keeps cycling through start, kill, and restart.

Increase the timeout value as required.
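One way to choose the new value is to measure how long the model takes to load and add some headroom. The fragment below is a sketch for gunicorn_conf.py; pick_timeout, the safety margin, and the measured 45 seconds are all hypothetical.

```python
import math


def pick_timeout(startup_seconds, margin=2.0, minimum=30):
    """Return a timeout in whole seconds with headroom over the measured startup time."""
    return max(minimum, math.ceil(startup_seconds * margin))


# gunicorn_conf.py fragment; 45 is a hypothetical measured model-loading time in seconds
timeout = pick_timeout(45)
```

The same timeout also applies to slow requests handled by sync workers, so the slowest expected prediction should fit within it as well.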

References:

tensorflow – GCP ML-engine FailedPreconditionError (code: 2) – Stack Overflow

5. Reference materials

Flask Welcome to the World of Flask — Flask Chinese Documentation (1.1.1)

Gunicorn- Configuration details

At Runtime : “Error while reading resource variable softmax/kernel from Container: localhost” · Issue #28287 · tensorflow/tensorflow

[Resolved] Online environment failed to run Flask via Gunicorn: CRITICAL WORKER TIMEOUT