• How to easily Detect Objects with Deep Learning on Raspberry Pi
  • By Sarthak Jain
  • The Nuggets translation Project
  • Permanent link to this article: github.com/xitu/gold-m…
  • Translator: Starrier
  • Proofreader: Luochen1992, Jasonxia23

The real-world challenge is limited data and small hardware like mobile phones and Raspberry Pis, which cannot run complex deep learning models. This article demonstrates how to use a Raspberry Pi for object detection: cars on the road, oranges in the fridge, signatures on documents, and Tesla in space.

Disclaimer: I’m building Nanonets.com to help build machine learning with less data and no hardware.

If you don’t have the patience to read further, skip straight to the GitHub repositories at the bottom of this article.

Detecting vehicles on the roads of Mumbai.

Why object detection? Why the Raspberry Pi?

The Raspberry Pi is an excellent piece of hardware that has captured hearts and minds, selling over 15 million devices, with hackers using it to build ever cooler projects. Given the popularity of deep learning and the Raspberry Pi camera, we thought it would be great if we could detect any object using deep learning on the Raspberry Pi.

Now you will be able to spot a photobomber in your selfie, someone entering Harambe’s cage, where the Sriracha is, or an Amazon delivery person entering your house.

What is object detection?

Human vision is the result of millions of years of evolution. Thirty percent of the human brain’s neurons are devoted to processing vision (compared to eight percent for touch and three percent for hearing). Humans have two major advantages over machines: stereoscopic vision and an almost unlimited supply of training data (a five-year-old has sampled roughly 2.7 billion images at 30 fps).

To mimic human-level performance, scientists broke the visual perception task down into four different categories:

  1. Classification: assign a label to the entire image
  2. Localization: assign a bounding box to a particular label
  3. Object detection: draw multiple bounding boxes in an image
  4. Image segmentation: mark out the exact regions of the image where particular objects are located

Object detection is good enough for a variety of applications (image segmentation is a more precise result, but it suffers from the complexity of creating training data: it takes a human annotator roughly 12 times longer to segment an image than to draw bounding boxes; this figure is anecdotal and lacks a source). Also, once an object has been detected, it can be separately segmented out of its bounding box afterwards.
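As a small illustration of that last point, once a detector gives you a bounding box you can cut the object out of the frame with plain PIL (the box coordinates below are made-up placeholders, not output from a real model):

from PIL import Image

# Hypothetical bounding box returned by a detector: (xmin, ymin, xmax, ymax) in pixels
xmin, ymin, xmax, ymax = 40, 60, 220, 310

im = Image.open("image1.jpg")
crop = im.crop((xmin, ymin, xmax, ymax))  # cut the detected object out of the frame
crop.save("object_crop.jpg")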

Use cases for object detection:

Object detection is of significant practical importance and has been used across a variety of industries. Here are some examples:

How can I use object detection to solve my own problems?

Object detection can be used to answer a variety of questions. Here’s a rough breakdown (the sketch after this list shows how raw detections can answer several of them):

  1. Is an object present in my image or not? For example, is there an intruder in my house?
  2. Where is the object in the image? For example, when a car is trying to navigate its way through the world, it is important to know where objects are.
  3. How many objects are there, and where are they in the image? Object detection is one of the most efficient ways of counting objects. For example, how many boxes are on a rack in a warehouse?
  4. What are the different types of objects in the image? For example, which animal is in which part of the zoo?
  5. What is the size of an object? Especially with a static camera, it is easy to work out the size of an object. For example, what is the size of a mango?
  6. How do different objects interact with each other? For example, how does the formation on a football field affect the result?
  7. Where is an object with respect to time (tracking)? For example, tracking a moving object like a train and calculating its speed.
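As referenced above, here is a minimal sketch of how several of these questions can be answered once a detector has returned its results. The detection format (a list of dicts with a label, a score, and pixel coordinates) is an assumption made for illustration, not the output format of any particular library:

# Hypothetical detector output: one dict per detected object
detections = [
    {"label": "car",   "score": 0.91, "xmin": 40,  "ymin": 60, "xmax": 220, "ymax": 310},
    {"label": "car",   "score": 0.84, "xmin": 300, "ymin": 80, "xmax": 420, "ymax": 290},
    {"label": "mango", "score": 0.77, "xmin": 10,  "ymin": 15, "xmax": 60,  "ymax": 70},
]

# 1. Is an object present? (e.g. an intruder)
intruder_present = any(d["label"] == "person" and d["score"] > 0.7 for d in detections)

# 3. How many objects of a given type are there?
num_cars = sum(1 for d in detections if d["label"] == "car")

# 5. How big is an object? (in pixels; convert using a known camera scale if needed)
mango = next(d for d in detections if d["label"] == "mango")
mango_area = (mango["xmax"] - mango["xmin"]) * (mango["ymax"] - mango["ymin"])

print(intruder_present, num_cars, mango_area)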

Object detection in 20 lines or less

Visualization of the YOLO algorithm.

There are multiple models/architectures for object detection, with different trade-offs between speed, size, and accuracy. We picked one of the most popular ones, YOLO (You Only Look Once), and show how it works below in under 20 lines of code (if you ignore the comments).

Note: This is pseudocode, not intended as a working example. It has a black-box part, which is a fairly standard CNN, shown below.

You can read the full article here: pjreddie.com/media/files…

Convolutional neural network structure in YOLO.

#this is an Image of size 140x140. We will assume it to be black and white (ie only one channel, it would have been 140x140x3 for rgb)
image = readImage()

#We will break the Image into 7 columns and 7 rows and process each of the 49 different parts independently
NoOfCells = 7

#we will try and predict if an image is a dog, cat, cow or wolf. Therefore the number of classes is 4
NoOfClasses = 4
threshold = 0.7

#step will be the size of step to take when moving across the image. Since the image has 7 cells step will be 140/7 = 20
step = height(image)/NoOfCells

#stores the class for each of the 49 cells, each cell will have 4 values which correspond to the probability of a cell being 1 of the 4 classes
#prediction_class_array[i,j] is a vector of size 4 which would look like [0.5 #cat, 0.3 #dog, 0.1 #wolf, 0.1 #cow]
prediction_class_array = new_array(size(NoOfCells,NoOfCells,NoOfClasses))

#stores 2 bounding box suggestions for each of the 49 cells, each cell will have 2 bounding boxes, with each bounding box having x, y, w, h and c predictions. (x,y) are the coordinates of the center of the box, (w,h) are its height and width and c is its confidence
predictions_bounding_box_array = new_array(size(NoOfCells,NoOfCells,2,5))

#it's a blank array in which we will add the final list of predictions
final_predictions = []

#the threshold defined above is the minimum confidence level we require to make a prediction
for (i=0; i<NoOfCells; i=i+1):
	for (j=0; j<NoOfCells; j=j+1):
		#we will get each "cell" of size 20x20, 140(image height)/7(no of rows)=20 (step) (size of each cell)
		#each cell will be of size (step, step)
		cell = image(i:i+step,j:j+step) 

		#we will first make a prediction on each cell as to what is the probability of it being one of cat, dog, cow, wolf
		#prediction_class_array[i,j] is a vector of size 4 which would look like [0.5 #cat, 0.3 #dog, 0.1 #wolf, 0.1 #cow]
		#sum(prediction_class_array[i,j]) = 1
		#this gives us our prediction as to what each of the different 49 cells are
		#class predictor is a neural network that has 9 convolutional layers that make a final prediction
		prediction_class_array[i,j] = class_predictor(cell)

		#predictions_bounding_box_array is an array of 2 bounding boxes made for each cell
		#size of predictions_bounding_box_array[i,j] is (2, 5)
		#predictions_bounding_box_array[i,j,1] is bounding box1, predictions_bounding_box_array[i,j,2] is bounding box 2
		#predictions_bounding_box_array[i,j,1] has 5 values for the bounding box [x,y,w,h,c]
		#the values x, y (coordinates of the center of the bounding box) are within the cell (values ranging between 0-20 in our case)
		#the values are h, w (height and width of the bounding box) they extend outside the cell and are in the range of [0-140]
		#the value c is a confidence of overlap with an actual bounding box that should be predicted
		predictions_bounding_box_array[i,j] = bounding_box_predictor(cell)

		#predictions_bounding_box_array[i,j,0, 4] is the confidence value for the first bounding box prediction
		best_bounding_box =  [0 if predictions_bounding_box_array[i,j,0, 4] > predictions_bounding_box_array[i,j,1, 4] else 1]

		# We will get the class which has the highest probability, for [0.5 #cat, 0.3 #dog, 0.1 #wolf, 0.1 #cow], 0.5 is the highest probability corresponding to cat which is at position 0. So index_of_max_value will return 0
		predicted_class = index_of_max_value(prediction_class_array[i,j])

		# We will check if the prediction is above a certain threshold (could be something like 0.7)
		if predictions_bounding_box_array[i,j,best_bounding_box, 4] * max_value(prediction_class_array[i,j]) > threshold:

			#the prediction is an array which has the x,y coordinate of the box, the height and the width
			prediction = [predictions_bounding_box_array[i,j,best_bounding_box, 0:4], predicted_class]

			final_predictions.append(prediction)


print final_predictions

YOLO in <20 lines of code.

How do we build a deep learning model for object detection?

The deep learning workflow has six basic steps, divided into three phases:

  1. Collecting training data
  2. Training the model
  3. Predicting on new images


Phase 1 — Collect training data

Step 1: Collect images (at least 100 per object):

For this task, you probably need a few hundred images per object. Try to capture data that is as close as possible to the data you will eventually make predictions on.

Step 2: Annotate (draw bounding boxes on these images manually):

Draw bounding boxes on the images. You can use a tool like labelImg. You will need people to annotate your images; this is a fairly intensive and time-consuming task.
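For reference, labelImg saves annotations in Pascal VOC XML format by default. A minimal sketch of reading the boxes back out of one such file (the path is a placeholder) could look like this:

import xml.etree.ElementTree as ET

# Parse a Pascal VOC annotation file produced by labelImg (path is a placeholder)
root = ET.parse("annotations/image1.xml").getroot()

for obj in root.findall("object"):
    label = obj.find("name").text
    box = obj.find("bndbox")
    xmin, ymin = int(box.find("xmin").text), int(box.find("ymin").text)
    xmax, ymax = int(box.find("xmax").text), int(box.find("ymax").text)
    print(label, xmin, ymin, xmax, ymax)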


Phase 2 — Train the model on the GPU machine

Step 3: Find a pre-trained model to use for transfer learning:

You can read more about this at medium.com/nanonets/na… You need a pre-trained model so that you can reduce the amount of data required for training; without it, you might need a few hundred thousand images to train the model.

You can find some pre-trained models here

Step 4: Train on a GPU (a cloud service like AWS/GCP, or your own GPU machine):

Docker image

The process of training a model is unnecessarily difficult; to simplify it, we created a Docker image that makes training easy.

You can start the training model by running the following:

sudo nvidia-docker run -p 8000:8000 -v `pwd`:data docker.nanonets.com/pi_training -m train -a ssd_mobilenet_v1_coco -e ssd_mobilenet_v1_coco_0 -p '{"batch_size":8,"learning_rate":0.003}'

See this link for more information on how to use it.

The Docker image has a run.sh script that can be called with the following arguments

run.sh [-m mode] [-a architecture] [-h help] [-e experiment_id] [-c checkpoint] [-p hyperparameters]

-h          display this help and exit
-m          mode: should be either `train` or `export`
-p          key value pairs of hyperparameters as json string
-e          experiment id. Used as path inside data folder to run current experiment
-c          applicable when mode is export, used to specify checkpoint to use for export

You can find more details below:

  • NanoNets/RaspberryPi-ObjectDetection-TensorFlow: object detection using TensorFlow on the Raspberry Pi

To train the model, you need to select the correct hyperparameters.

Find the right parameters

The art of “deep learning” involves a little bit of trial and error to figure out which parameters will get the highest accuracy for your model. There is some level of dark magic associated with this, along with a little bit of theory. This is a good resource for finding the right parameters.
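As a rough sketch of what such a search can look like with the Docker image described above, you can loop over candidate values and pass each combination through the -p flag. The specific values here are illustrative guesses, not recommendations:

import itertools, json, subprocess

# Illustrative candidate values; tune these for your own dataset
learning_rates = [0.003, 0.001]
batch_sizes = [8, 16]

for lr, bs in itertools.product(learning_rates, batch_sizes):
    params = json.dumps({"learning_rate": lr, "batch_size": bs})
    experiment_id = "ssd_mobilenet_v1_coco_lr%s_bs%d" % (lr, bs)
    # One training run per hyperparameter combination, using the same docker command as above
    subprocess.call(
        "sudo nvidia-docker run -p 8000:8000 -v `pwd`:data docker.nanonets.com/pi_training "
        "-m train -a ssd_mobilenet_v1_coco -e %s -p '%s'" % (experiment_id, params),
        shell=True)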

Quantize the model (make it smaller to fit on small devices like the Raspberry Pi or mobile phones)

Small devices like mobile phones and the Raspberry Pi have very little memory and computing power.

Training neural networks is done by applying many tiny nudges to the weights, and these small increments typically need floating-point precision to work (though there are research efforts to use quantized representations here too).

Taking a pre-trained model and running inference is very different. One of the magical properties of deep neural networks is that they tend to cope very well with high levels of noise in their inputs.

Why quantize?

Neural network models can take up a lot of disk space; for example, the original AlexNet is over 200 MB in float format. Almost all of that size is taken up by the weights of the neural connections, since there are often many millions of them in a single model.

The nodes and weights of a neural network are originally stored as 32-bit floating-point numbers. The simplest motivation for quantization is to shrink file size by storing the minimum and maximum value for each layer, and then compressing each floating-point value to an 8-bit integer, reducing the file size by 75%.
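As a toy illustration of that min/max idea (not the actual TensorFlow implementation, which is invoked in the next block), linear quantization of a float32 weight array down to uint8 might look like this:

import numpy as np

# Fake float32 "layer weights" for illustration
weights = np.random.randn(1000).astype(np.float32)

# Store only the layer's min/max plus one byte per weight
w_min, w_max = float(weights.min()), float(weights.max())
quantized = np.round((weights - w_min) / (w_max - w_min) * 255).astype(np.uint8)

# Approximate reconstruction used at inference time
dequantized = quantized.astype(np.float32) / 255 * (w_max - w_min) + w_min

print(weights.nbytes, quantized.nbytes)        # 4000 vs 1000 bytes: a 75% reduction
print(np.max(np.abs(weights - dequantized)))   # the quantization error stays small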

Quantization code:

curl -L "https://storage.googleapis.com/download.tensorflow.org/models/inception_v3_2016_08_28_frozen.pb.tar.gz" |
  tar -C tensorflow/examples/label_image/data -xz
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow/examples/label_image/data/inception_v3_2016_08_28_frozen.pb \
  --out_graph=/tmp/quantized_graph.pb \
  --inputs=input \
  --outputs=InceptionV3/Predictions/Reshape_1 \
--transforms='add_default_attributes strip_unused_nodes(type=float, shape="1,299,299,3") remove_nodes(op=Identity, op=CheckNumerics) fold_constants(ignore_errors=true) fold_batch_norms fold_old_batch_norms quantize_weights quantize_nodes strip_unused_nodes sort_by_execution_order'

Phase 3: Predict on new images using the Raspberry Pi

Step 5: Capture a new image via the camera:

You need the Raspberry Pi camera live and working. Then capture a new image:

See this link for installation instructions

import picamera, os
from PIL import Image, ImageDraw
camera = picamera.PiCamera()
camera.capture('image1.jpg')
os.system("xdg-open image1.jpg")

Code to capture a new image.

Step 6: Predict the new image

Download the model

Once you’re done training the model, you can download it onto your Raspberry Pi. Run the following to export the model:

sudo nvidia-docker run -v `pwd`:data docker.nanonets.com/pi_training -m export -a ssd_mobilenet_v1_coco -e ssd_mobilenet_v1_coco_0 -c /data/0/model.ckpt-8998

Then download the model to the raspberry PI.

Install TensorFlow on the Raspberry Pi

Depending on your device, you may need to change the installation slightly

sudo apt-get install libblas-dev liblapack-dev python-dev libatlas-base-dev gfortran python-setuptools libjpeg-dev
sudo pip install Pillow
sudo pip install http://ci.tensorflow.org/view/Nightly/job/nightly-pi-zero/lastSuccessfulBuild/artifact/output-artifacts/tensorflow-1.4.0-cp27-none-any.whl
git clone https://github.com/tensorflow/models.git

sudo apt-get install -y protobuf-compiler

cd models/research
protoc object_detection/protos/*.proto --python_out=.

export PYTHONPATH=$PYTHONPATH:/home/pi/models/research:/home/pi/models/research/slim
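As an optional sanity check (not part of the original steps), you can verify that TensorFlow and the object_detection package import correctly after the installation above:

# Run with the same Python used for the installation above
import tensorflow as tf
from object_detection.utils import label_map_util  # provided by models/research

print(tf.__version__)  # should print 1.4.0 for the wheel installed above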

Run the model to predict new images

python ObjectDetectionPredict.py --model data/0/quantized_graph.pb --labels data/label_map.pbtxt --images /data/image1.jpg /data/image2.jpg
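ObjectDetectionPredict.py comes from the repository linked below. As a rough sketch of what inference on the exported frozen graph involves (the tensor names follow the TensorFlow Object Detection API convention and are an assumption here):

import numpy as np
import tensorflow as tf
from PIL import Image

# Load the frozen (quantized) graph exported earlier
graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile("data/0/quantized_graph.pb", "rb") as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name="")

# A batch of one image with shape (1, height, width, 3)
image = np.expand_dims(np.array(Image.open("image1.jpg")), axis=0)

with tf.Session(graph=graph) as sess:
    boxes, scores, classes, num = sess.run(
        ["detection_boxes:0", "detection_scores:0",
         "detection_classes:0", "num_detections:0"],
        feed_dict={"image_tensor:0": image})
print(scores[0][:5], classes[0][:5])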

Performance benchmarks on the Raspberry Pi

The Raspberry Pi has memory and compute constraints (a version of TensorFlow compatible with the Raspberry Pi GPU is still not available). Therefore, it is important to benchmark how much time each model takes to make a prediction on a new image.

Benchmarks for different object detection models running on the Raspberry Pi.
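If you want to reproduce such a benchmark yourself, one minimal approach (building on the inference sketch above, whose graph and image it reuses) is to time repeated predictions after a warm-up run:

import time
import tensorflow as tf

# Assumes `graph` and `image` from the inference sketch above
outputs = ["detection_boxes:0", "detection_scores:0",
           "detection_classes:0", "num_detections:0"]

with tf.Session(graph=graph) as sess:
    sess.run(outputs, feed_dict={"image_tensor:0": image})  # warm-up run

    runs = 10
    start = time.time()
    for _ in range(runs):
        sess.run(outputs, feed_dict={"image_tensor:0": image})
    print("seconds per image: %.2f" % ((time.time() - start) / runs))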


Our goal at NanoNets is to make deep learning easier. Object detection is a major area of focus for us, and we have developed a workflow to address many of the challenges of implementing deep learning models.

How NanoNets makes the process easier:

1. No annotation required

We have removed the need to annotate images; we have professional annotators who will annotate your images for you.

2. Automatic selection of the best model and hyperparameters

We automatically train the best model for you. To do this, we run a set of models with different parameters and select the best one for your data.

3. No expensive hardware or GPUs required

NanoNets runs entirely in the cloud and requires no hardware. This makes it easier to use.

4. Suitable for small devices like the Raspberry Pi and mobile phones

Because devices like the Raspberry Pi and mobile phones weren’t built to run complex compute tasks, you can outsource the workload to our cloud, which does all the computation for you.

Here is a simple snippet of an image prediction using the NanoNets API

import picamera, json, requests, os, random
from time import sleep
from PIL import Image, ImageDraw

#capture an image
camera = picamera.PiCamera()
camera.capture('image1.jpg')
print('captured image')

#make a prediction on the image
url = 'https://app.nanonets.com/api/v2/ObjectDetection/LabelFile/'
data = {'file': open('image1.jpg', 'rb'), \
    'modelId': ('', 'YOUR_MODEL_ID')}
response = requests.post(url, auth=requests.auth.HTTPBasicAuth('YOUR_API_KEY', ''), files=data)
print(response.text)

#draw boxes on the image
response = json.loads(response.text)
im = Image.open("image1.jpg")
draw = ImageDraw.Draw(im, mode="RGBA")
prediction = response["result"][0]["prediction"]
for i in prediction:
    draw.rectangle((i["xmin"],i["ymin"], i["xmax"],i["ymax"]), fill=(random.randint(1, 255),random.randint(1, 255),random.randint(1, 255),127))
im.save("image2.jpg")
os.system("xdg-open image2.jpg")

Code for predicting on a new image using NanoNets.

Build your own NanoNet

You can try building your own model using:

1. A GUI (which can also auto-annotate your images): Nanonets.com/objectdetec…

2. Our API: Github.com/NanoNets/ob…

Step 1: Clone the repo

git clone https://github.com/NanoNets/object-detection-sample-python.git
cd object-detection-sample-python
sudo pip install requests

Step 2: Get your free API key

Get your free API key from app.nanonets.com/user/api_ke…

Step 3: Set the API key as an environment variable

export NANONETS_API_KEY=YOUR_API_KEY_GOES_HERE

Step 4: Create a new model

python ./code/create-model.py

Note: This will generate the model ID needed for the next step

Step 5: Add the model ID as an environment variable

export NANONETS_MODEL_ID=YOUR_MODEL_ID

Step 6: Upload training data

Collect images of the objects you want to detect. You can annotate them either using our web UI (https://app.nanonets.com/ObjectAnnotation/?appId=YOUR_MODEL_ID) or an open-source tool like labelImg. Once you have the dataset ready in your folders (images and annotations), you’re ready to upload it.

python ./code/upload-training.py

Step 7: Train the model

Once the images are uploaded, start training the model:

python ./code/train-model.py

Step 8: Get the model state

The model takes about 2 hours to train. You will receive an email once the model has been trained. In the meantime, you can check the state of the model:

watch -n 100 python ./code/model-state.py

Step 9: Predict

Once the model is trained, you can use it to make predictions

python ./code/prediction.py PATH_TO_YOUR_IMAGE.jpg

Code (GitHub repository)

GitHub repositories for training models:

  1. TensorFlow code for model training and quantization
  2. NanoNets model training code

GitHub repositories for making predictions on the Raspberry Pi (i.e. detecting new objects):

  1. TensorFlow code for prediction on the Raspberry Pi
  2. NanoNets code for prediction on the Raspberry Pi

Annotated datasets:

  1. Vehicles visible on Indian roads, a dataset of vehicles extracted from images of Indian roads
  2. COCO dataset

The Nuggets Translation Project is a community that translates high-quality technical articles from around the internet and shares them on Juejin (Nuggets). Content covers Android, iOS, front-end, back-end, blockchain, product, design, artificial intelligence, and other fields. For more high-quality translations, please follow the Nuggets Translation Project, its official Weibo, and its Zhihu column.