As a web developer, I find the rapid development of computer vision and machine learning exciting, but I have no background in either. I finally decided to take two months to change that.

Beginning to learn

I began this journey when a book on deep learning and computer vision was published. Its author, Adrian Rosebrock of PyImageSearch.com, has written a three-volume tome covering both the high-level ideas and the low-level applications of computer vision and deep learning. While exploring deep learning, I encountered many algorithms that were new to me: linear regression, Naive Bayes classification, random forests/decision trees, and more.
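
To make those names concrete, here is a minimal sketch of my own (using scikit-learn and its built-in Iris dataset, neither of which the book necessarily uses) that trains three of these classical models side by side:

```python
# Illustrative sketch, assuming scikit-learn is installed.
# Iris is a stand-in toy dataset, not one from the book.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train each classical model on the same split and compare test accuracy.
for model in (GaussianNB(), DecisionTreeClassifier(), RandomForestClassifier()):
    model.fit(X_train, y_train)
    print(type(model).__name__, model.score(X_test, y_test))
```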


I spent a few weeks reading the book, connecting the various posts I had read to its mathematical concepts, its habits of abstract thinking, and their practical programming applications. I read the book quickly, to better understand how to approach the field as a whole. The biggest lesson I learned: you can build your own tools, and even your own hardware, for computer vision software.

Hardware implementation

Inspired, I found a Raspberry Pi and an RPi camera to analyze video streams. I never thought it would take so long just to configure the Raspberry Pi. Initially, I simply wanted to get the Pi up and running with video streaming, then work with the video on my computer. I tried to make the Raspberry Pi operating system behave as well as possible, but before I realized what was wrong, I had accidentally installed the wrong graphics driver and some conflicting software. What I initially thought would be video processing turned into a debugging nightmare of more than an hour.
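
For reference, the streaming part itself is only a few lines once the drivers cooperate. This is a minimal sketch assuming OpenCV and a working camera driver, not the exact script I ran on the Pi:

```python
# Illustrative sketch, assuming OpenCV (cv2) and a correctly configured camera.
import cv2

cap = cv2.VideoCapture(0)  # 0 = first attached camera (e.g. the RPi camera)
if not cap.isOpened():
    raise RuntimeError("Camera not available -- check drivers and connections")

while True:
    ok, frame = cap.read()  # grab one BGR frame from the stream
    if not ok:
        break
    cv2.imshow("stream", frame)  # requires a display/GUI session
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press 'q' to quit
        break

cap.release()
cv2.destroyAllWindows()
```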

By now, I’ve realized that configuring machines is a big part of getting started with machine learning and computer vision.


Aiyprojects.withgoogle.com/vision#list…

My original Raspberry Pi-based inspiration was the idea of building a simple device with a camera and a GPS signal. The idea came from considering how many vehicles in the future, whatever their kind, will need multiple cameras to navigate. Whether for insurance purposes or basic functionality, it is conceivable that an enormous amount of video footage will be created and used. In the process, a vast library of media will be left idle, becoming a vast database for understanding the world.

I ended up exploring the Raspberry Pi’s computer vision abilities without producing any of the interesting results I had hoped for. I found plenty of inexpensive Raspberry Pi-like devices that offer connectivity and camera functionality on a PCB much smaller than an entire Raspberry Pi. I realized that rather than go the hardware route, I would use an old iPhone to develop some software.

This brief attempt at the hardware side of deep learning made me realize that I should stick with software as much as possible. When the software part doesn’t solve the problem, adding a new variable only adds complexity.


Open source tools

In my first month of searching around for machine learning resources, I found a number of open source tools that were very easy to get up and running. I had read that the FANG tech companies offer a lot of proprietary services, but I wasn’t sure how they compared with open source solutions. The image recognition and OCR tools offered as SaaS products by IBM, Google, Amazon, and Microsoft are relatively simple to use. To my surprise, there are also plenty of good open source alternatives that are worth configuring to avoid unnecessary dependencies.
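
As a taste of how little setup the open source route can require, here is a minimal OCR sketch assuming the Tesseract engine and its pytesseract Python wrapper are installed (the filename is hypothetical):

```python
# Illustrative sketch: local OCR with the open source Tesseract engine,
# instead of a paid SaaS endpoint. Requires: tesseract, pytesseract, Pillow.
import pytesseract
from PIL import Image

# "receipt.png" is a hypothetical input image, not a file from the article.
text = pytesseract.image_to_string(Image.open("receipt.png"))
print(text)
```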

For example, a few years ago, I started an iOS app to collect and share graffiti photos. I indexed images from public APIs with geotagged photos, such as Instagram and Flickr, and used basic features like tags and location data to guess whether an image contained graffiti. I started by indexing thousands of images a week, which soon grew to hundreds of thousands a month. I quickly noticed that many of the indexed images were not graffiti at all, and were instead images destructive to the community I was trying to build. I couldn’t stop people from posting low-quality images themselves, or from scraping poorly labeled images from someone else’s feed, which made the service risky to use. Therefore, I decided to shut down the whole project.
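
The metadata filter I relied on was roughly this simple. The sketch below is an illustrative reconstruction; the field names and keyword list are my assumptions, not the app’s actual schema:

```python
# Illustrative reconstruction of a cheap tag/geotag heuristic filter.
# The photo dict shape and GRAFFITI_TAGS keywords are hypothetical.
GRAFFITI_TAGS = {"graffiti", "streetart", "mural", "tagging"}

def looks_like_graffiti(photo: dict) -> bool:
    """Metadata-only pre-filter, run before any human or model review."""
    tags = {t.lower() for t in photo.get("tags", [])}
    has_geo = photo.get("lat") is not None and photo.get("lon") is not None
    return bool(tags & GRAFFITI_TAGS) and has_geo

print(looks_like_graffiti({"tags": ["Graffiti", "NYC"], "lat": 40.7, "lon": -74.0}))
```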


Now, with machine learning services and open source implementations for object detection and human detection, I could launch my own easy-to-moderate image retrieval service. I used to have to pay a service to do this quality check, which could cost hundreds if not thousands of dollars in API fees. Instead, I can now download a “data science” AMI on AWS and stand up my own API to check for unwanted image content.
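
As one open source stand-in for those paid checks, a sketch like the following uses OpenCV’s built-in HOG person detector to flag images containing people. The filename is hypothetical, and this is just one possible detector, not necessarily the one a production service would use:

```python
# Illustrative sketch, assuming OpenCV: flag uploads that contain people
# using the classic HOG + linear SVM pedestrian detector that ships with cv2.
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

img = cv2.imread("upload.jpg")  # hypothetical incoming image
if img is None:
    raise FileNotFoundError("upload.jpg not found")

boxes, weights = hog.detectMultiScale(img, winStride=(8, 8))
if len(boxes) > 0:
    print(f"Found {len(boxes)} person(s) -- flag for review")
```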

From theory to understanding

Before going through this process, I thought I understood most of the principles of image recognition and machine learning, in theory. Once I actually engaged with the machine learning material I needed to use, it became clear which concepts I still had to learn. For example, I no longer just know that linear algebra is important for machine learning; I now understand how a problem gets broken down into multi-dimensional arrays/matrices and batched to find patterns that can be represented numerically. Before, I knew in the abstract that features had to be represented as numbers that could be compared across a series of evaluations. I now have a better understanding of how the dimensions in machine learning correspond to many directly and indirectly interrelated factors. The full multidimensional form of feature detection and evaluation in matrix mathematics is still a mystery to me, but I can follow the higher-level concepts.
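
A minimal NumPy sketch of that idea, with shapes of my own choosing: a batch of feature vectors becomes a 2-D matrix, and a single matrix multiply applies the same linear map to every example at once:

```python
# Illustrative sketch: batching examples as rows of a matrix so one
# matrix multiply evaluates a linear model on the whole batch.
import numpy as np

batch = np.random.rand(32, 4)    # 32 examples, 4 features each (hypothetical)
weights = np.random.rand(4, 3)   # maps 4 features to 3 output scores
bias = np.zeros(3)

scores = batch @ weights + bias  # shape (32, 3): one row of scores per example
print(scores.shape)
```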


A previously illegible diagram of the network architecture now looks clear

Adrian Rosebrock’s book gave me an epiphany about decoding the schematics of machine learning algorithms. The breakdown of deep learning network architectures is now a little more understandable, and I am familiar with the differences between the benchmark datasets used for various image recognition models (MNIST, CIFAR-10, ImageNet) and the image recognition models themselves (VGG-16, Inception, etc.).
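
To make that connection concrete, here is a minimal sketch, assuming TensorFlow/Keras, of a tiny CNN on MNIST. It is far smaller than VGG-16 or Inception, but it is built from the same convolution/pooling/dense blocks those architecture diagrams describe:

```python
# Illustrative sketch, assuming TensorFlow/Keras: a toy CNN on MNIST.
import tensorflow as tf

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0  # add channel dim, scale to [0, 1]

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, batch_size=128)
```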

Practice makes perfect

Looking back at the material I had been reading, I found that I still hadn’t mastered most of the knowledge, so I decided to combine reading relevant blogs with more hands-on work. Perhaps only the combination of theory and practice can produce a faster rate of growth. To that end, I bought a GPU, so I am no longer constrained when training models and manipulating datasets.
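
A small sketch, assuming PyTorch (my choice of framework here, not something the article specifies), of the first sanity check worth running with a new GPU: confirming that the model and data actually land on the device:

```python
# Illustrative sketch, assuming PyTorch: verify GPU availability and
# that a toy model and batch actually run on it.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Training on:", device)

model = torch.nn.Linear(4, 3).to(device)   # toy model
batch = torch.randn(32, 4, device=device)  # toy batch placed on the device
print(model(batch).shape)                  # torch.Size([32, 3])
```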

If you want to get better at machine learning, don’t just read papers and blogs; try things yourself, and you’ll get more out of it.


This article was recommended by @Love Coco – Love Life, a teacher at Beijing University of Posts and Telecommunications, and translated by the Alibaba Cloud Yunqi Community.

Two Months Exploring Deep Learning and Computer Vision

By Leonard Bogdonoff

Translator: Mags, edited by Yuan Hu.

This article is an abridged translation. For more details, please refer to the original text.