The Heart of the Machine by Tony Peng.

This is not Shi Jianping’s first visit to CVPR. In the past eight years she has barely missed one, and she has grown used to applying for a US visa every year. Over those years, though, her role at CVPR has changed: from undergraduate, to doctoral student, to researcher, and now to research director at Sensetime.

She brought five CVPR papers this year, one oral and two spotlights, an impressive count. Sensetime also had a record 44 papers accepted this year, against Google’s 45. In the past it would have been unthinkable for a single company to have more than 40 papers accepted at CVPR.

Over the past decade, CVPR has changed a lot. Computer vision, once an academic darling, has emerged from the ivory tower to become the brightest technological star in the spotlight. The star of the conference has shifted from the support vector machine of those days to today’s ubiquitous deep learning; the ImageNet challenge ILSVRC, born in 2010, came to an end last year…

Ten years ago, only 1,500 people attended the conference. This year, more than 6,500 did. Many of the students who attended back then are now senior researchers or chief scientists in industry…

These changes were partly driven by the committee, but they are also the inevitable result of changing times. Most people welcome them, though many worry about what is being gained and lost. The story is worth retracing.


The inexperienced Chinese scholar

In 2011, Shi Jianping, then a senior about to graduate from Zhu Kezhen College of Zhejiang University, received an email from the CVPR committee: congratulations, your paper has been selected for an oral presentation this year. She was thrilled.

CVPR, the IEEE Conference on Computer Vision and Pattern Recognition, is one of the three top conferences in computer vision, alongside ICCV and ECCV. Computer science moves fast and its frontiers shift constantly, so academics tend to submit to annual conferences rather than to journals, where a paper can take one or two years to appear.

Being selected for an oral means not only that the committee rated your paper highly, but also that you get to give a 15-20 minute presentation (shorter now) to the conference attendees. CVPR’s overall acceptance rate is between 25% and 30%, and the oral acceptance rate is no more than 5% (3.5% in 2011). Most of the remaining papers are presented as posters; a third format, the spotlight, which is a short talk, only appeared in 2016.

According to incomplete statistics, Shi Jianping may have been the first undergraduate in China to have a paper selected as a CVPR oral. Her paper was titled A Non-convex Relaxation Approach to Sparse Dictionary Learning. Sparse dictionary learning, also known as sparse coding, was a fairly mainstream research topic at the time: it seeks to represent input data as a sparse linear combination of basic elements while also learning those basic elements themselves. With the advent of deep learning, however, these methods have largely lost their place.
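For readers who have not met the technique, the idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: it uses the common convex formulation (a least-squares fit plus an L1 penalty on the codes), not the non-convex relaxation proposed in Shi's paper, and every name in it (`learn_dictionary`, `lam`, `n_atoms` and so on) is made up for the example.

```python
import math
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frob_sq(A):
    """Squared Frobenius norm; an upper bound on the squared spectral norm."""
    return sum(v * v for row in A for v in row)


def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def project_columns(D):
    """Rescale any column (atom) whose norm exceeds 1 back onto the unit ball."""
    norms = [math.sqrt(sum(v * v for v in col)) for col in zip(*D)]
    return [[v / max(1.0, n) for v, n in zip(row, norms)] for row in D]


def residual(X, D, A):
    """Reconstruction error D A - X, entry by entry."""
    return [[p - x for p, x in zip(rp, rx)] for rp, rx in zip(matmul(D, A), X)]


def objective(X, D, A, lam):
    """0.5 * ||X - D A||_F^2 + lam * ||A||_1."""
    return 0.5 * frob_sq(residual(X, D, A)) + lam * sum(abs(v) for row in A for v in row)


def learn_dictionary(X, n_atoms, lam=0.1, n_iter=50, seed=0):
    """Alternate one ISTA step on the sparse codes A with one projected-gradient
    step on the dictionary D. X has shape (n_features, n_samples)."""
    rng = random.Random(seed)
    D = project_columns([[rng.gauss(0, 1) for _ in range(n_atoms)] for _ in X])
    A = [[0.0] * len(X[0]) for _ in range(n_atoms)]
    history = [objective(X, D, A, lam)]
    for _ in range(n_iter):
        # Sparse coding: gradient step on the fit term, then soft-threshold.
        step = 1.0 / max(frob_sq(D), 1e-12)
        G = matmul(transpose(D), residual(X, D, A))
        A = [[soft_threshold(a - step * g, step * lam) for a, g in zip(ra, rg)]
             for ra, rg in zip(A, G)]
        # Dictionary update: gradient step, then project atoms to norm <= 1.
        step = 1.0 / max(frob_sq(A), 1e-12)
        G = matmul(residual(X, D, A), transpose(A))
        D = project_columns([[d - step * g for d, g in zip(rd, rg)]
                             for rd, rg in zip(D, G)])
        history.append(objective(X, D, A, lam))
    return D, A, history
```

Calling `learn_dictionary(X, n_atoms=3)` on a small matrix `X` returns the learned atoms `D`, the sparse codes `A`, and the objective value after each round. The step sizes are deliberately conservative, so the objective should never increase from one round to the next.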

CVPR was Shi Jianping’s first contact with a top international conference. With her visa in order, she set off for the United States.

That year, CVPR had moved from San Francisco the year before to Colorado Springs, an inland US city, and attendance had dropped from 2,000 to 1,000. Unlike today’s CVPR, which is held in a convention center, a Crowne Plaza hotel was enough that year. Two of the hotel’s ballrooms hosted the orals, three other rooms on the ground floor displayed posters, and booths from industry were dotted around the hotel, each just a few small tables pushed together.

For years CVPR has followed the same schedule: Tuesday to Thursday are main conference days, while Monday, Friday, and sometimes Saturday are for workshops and tutorials.

Shi Jianping’s oral was scheduled for noon on Tuesday, and everything went well. She spent the rest of her time at the posters. Not many papers were accepted back then, so she could get through more than 40 posters in an hour and a half, but since it was her first time there, she could not understand most of them.

Screenshot of Shi Jianping’s oral video in CVPR 2011

Before 2011, even China’s top academic institutions, including Tsinghua University, Zhejiang University and the Institute of Automation of the Chinese Academy of Sciences, struggled to get papers accepted at CVPR. This was not for lack of research ability; submitting a CVPR paper took a lot of know-how: Is the topic appropriate? Is the English up to standard? How should the experiments be run? Domestic institutions at the time lacked returnees from overseas and the academic environment that comes with them, so they often failed to read the pulse of international conferences.

Li Fuxin, an area chair of CVPR 2017 who now teaches at Oregon State University, recalled his days as a doctoral student at the Institute of Automation before 2008: “Back then I was the one who revised the English in many classmates’ papers. Without professional academic training, people did not know how to handle all the details of experimental design and paper writing.”

Li Fuxin also gave an example of such details: “When I first went abroad, my ‘get’ would be changed to ‘obtain’, and ‘to do something’ to ‘in order to do something’. They look the same, but it’s the difference between spoken and written English.”

Another reason worth mentioning is that domestic academic institutions did not pay much attention to CVPR until the China Computer Federation (CCF) designated it an A-level conference around 2010 (the exact date could not be verified).

Microsoft Research Asia (MSRA) and the Multimedia Laboratory of the Chinese University of Hong Kong (the CUHK lab) were the two major centers of computer vision in China at that time. The CVPR 2009 best paper came out of these two institutions. Its first author, He Kaiming, later introduced the residual network ResNet, which won him the CVPR 2016 best paper, but that is a later story.

Shi Jianping was lucky: Zhejiang University had just brought in a returnee scholar, Zhang Zhihua, from the University of California, Berkeley. According to Shi, Zhang Zhihua focused on pure research and strongly encouraged his students to read math books rather than take on project work. Zhang later taught at Shanghai Jiao Tong University and then at Peking University, where he is now a professor at the School of Mathematical Sciences.

Zhang Zhihua led Shi Jianping to computer vision. Although the teacher who wrote her recommendation letter urged her to study a hotter field, such as data mining, Shi chose the Chinese University of Hong Kong, then the center of computer vision in China, and went to work on computer vision under Professor Jia Jiaya. Jia joined Tencent’s Youtu Lab as a Distinguished Scientist in 2017.


Professor Tang Xiaoou’s foresight

In 2012, University of Toronto professor Geoffrey Hinton, known as the “father of deep learning”, and his students took AlexNet to first place in the ImageNet ILSVRC challenge. Its top-5 error rate was about 10 percentage points lower than the runner-up’s. The paper, published at NIPS 2012, is widely considered the milestone that started the deep learning boom.
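The top-5 error rate counts a prediction as wrong only when the true label is missing from the model's five highest-scoring classes. A minimal sketch of the metric follows; the function name and the toy scores are invented for illustration and are not taken from the ILSVRC evaluation code.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of samples whose true label is not among the k top-scoring classes.

    scores: one list of per-class scores for each sample.
    labels: the true class index for each sample.
    """
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score, cut to the first k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Two toy samples over six classes; class 5 always gets the lowest score.
toy_scores = [
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.0],
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.0],
]
toy_labels = [5, 0]  # the first sample falls outside the top 5, the second does not
print(top_k_error(toy_scores, toy_labels))  # 0.5
```

Setting `k=1` gives the ordinary top-1 error, under which both toy samples count as misses.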

At the time, however, the Hinton group was not the only one applying deep learning to computer vision. In 2011, Andrew Ng, then at Stanford, joined forces with Jeff Dean and Greg Corrado at Google on a Google X project that used 16,000 CPU cores and deep neural networks to let a system learn to recognize cats simply by watching large numbers of YouTube videos.

In China, a lab at the Chinese University of Hong Kong led by Professor Tang Xiaoou had been exploring the potential of deep learning for face recognition since 2011.

Lin Dahua is now the director of the laboratory at the Chinese University of Hong Kong. After graduating from the University of Science and Technology of China in 2005, Lin studied for his master’s degree in the CUHK lab, where he first got to know Professor Tang. In 2007, he received a full scholarship to study for a PhD in computer science at MIT. In 2014, at Professor Tang’s invitation, Lin returned to CUHK to teach and became a founding member of Sensetime.

He recalled that the CUHK lab’s research on face recognition dates back to 2000. “When I started my master’s program in 2005, we used subspace analysis, a linear model, to do face recognition, and we had some success, but before deep learning the performance never reached a commercially viable level.”

The breakthrough came in 2011, when Li Deng, a professor at Microsoft Research Asia and now chief artificial intelligence officer at US financial giant Citadel, pioneered the application of deep learning to speech recognition and achieved significant performance improvements. This led Professor Tang, who had also worked at Microsoft Research Asia, to sense an opportunity for deep learning to revolutionize the field of vision.

However, the transition was not easy. The academic world at that time was very skeptical of deep learning. Neural networks had emerged as early as the 1980s but went largely unused for many years because they offered no performance advantage, and their black-box nature made it difficult for researchers to understand how a network learns and makes decisions.

Moreover, overturning his own previous work to embrace deep learning fully, spending heavily on GPUs to build a parallel computing cluster, and deciding to develop a deep learning platform entirely in-house were all risks for Professor Tang. Many scholars have since described Tang as “a forward-thinking scholar whose exploration of deep learning laid the foundation for much of the work that followed.”

The huge investment soon paid off. From 2011 to 2013, Professor Tang’s team published a total of 14 deep learning papers at ICCV and CVPR, two top conferences in computer vision, nearly half of the 29 deep learning papers published at those two conferences worldwide.

In June 2014, Professor Tang led the Multimedia Lab of CUHK in publishing the DeepID series of algorithms, which achieved 98.52% accuracy in face recognition, surpassing Facebook and, for the first time in the world, exceeding the recognition ability of the human eye. This paper was also accepted at CVPR 2014.

These achievements were still far from enough, but they let academics see the commercial potential of face recognition.

CVPR also fully embraced deep learning in 2014 and 2015. At CVPR 2016, according to incomplete statistics, nearly 60% of papers were related to deep learning, as were nearly 100% of the orals.

Shi Jianping, then a PhD student at CUHK, saw the difference between deep learning and earlier algorithms. “Deep learning really has achieved a lot. In the past we might try many different technical approaches and everyone would still be stuck at the same level without a breakthrough, but deep learning directly lifted the accuracy on many problems to a higher level.”

Sensing the commercial opportunity, Professor Tang founded Sensetime at the end of 2014 with his former student Wang Xiaogang and a group of members of the CUHK lab. In the summer of 2015, Shi Jianping graduated with her doctorate and followed her senior labmates into Sensetime Research Institute. The most senior of them, Xu Li, is now CEO of Sensetime.


From academia to industry

Stanford University professor Andrew Ng, founder of Deeplearning.ai, has said that supervised learning is behind 99% of today’s AI applications. Computer vision was among the first fields to benefit in recent years, thanks to easily annotated two-dimensional image data, improved computing power and advances in deep learning algorithms.

From the moment you open your eyes, computer vision begins its day: it recognizes your face to unlock your phone or log you into your bank account; cameras at traffic lights watch for jaywalkers and magnify their faces on street screens; instead of showing your ID card when entering an office building, you look straight at the camera at the door to verify your identity…

Deep learning has brought the potential for large-scale commercial use to computer vision, and it has also given CVPR a new look.

In the past, research problems in vision could rarely be applied to practical scenarios, and research was mostly confined to the laboratory. Now that the field is so closely tied to practice, new problems keep emerging, and industry needs research to push the boundaries of its own business, so its results naturally flow to international conferences like CVPR.

In recent years, the most visible contributors have been Chinese A.I. companies such as Sensetime. After having 23 papers accepted at CVPR in 2017, Sensetime contributed another 44 this year (counting Sensetime itself, the CUHK-Sensetime Joint Laboratory and other Sensetime joint laboratories), 3 of them orals (the CVPR oral acceptance rate is 1.88% this year). The papers cover more than a dozen topics, including large-scale distributed training, human understanding and person re-identification, scene understanding and analysis for autonomous driving, low-level vision algorithms, joint understanding of vision and natural language, object detection, recognition and tracking, deep generative models, and video and behavior understanding.

In addition, Tencent AI Lab had 21 papers accepted, Alibaba 18, Tencent Youtu 10, and Megvii 8.

Shi Jianping says Sensetime does not specifically encourage people to write papers. As a commercial company, after all, it mostly works on things tied to real products and projects. The main reason for the large number of papers, she says, is the atmosphere.

“More and more students are joining. Once they enter Sensetime or the CUHK joint laboratory, the classmates around them are all doing similar things, so it is easier to get started. In building real products we come up with a lot of ideas, and we can hand some of the experiments to students, who improve their skills quickly that way.”

From the best paper won by the Multimedia Lab of CUHK in 2009 to the 44 Sensetime papers accepted in 2018, the results follow one continuous line.

“The input of resources, the emergence of new problems, certainly has a very positive driving effect on the development of this field,” Lin said.

“Of course, it also brings some problems. Research in the field is more trend-driven than it was ten years ago: people chase work that can be deployed quickly and pay off immediately, while the attention given to fundamental problems has declined in relative terms. The top conferences themselves have shown this tendency in recent years.”


“We always wanted to sponsor CVPR one year”

After attending CVPR several times, Shi Jianping no longer feels the same excitement, but she still has new goals to pursue.

“When I talked with my senior labmates, they all said they had a dream: after attending CVPR for so many years, they had always wanted to come back one year as sponsors.”

Before 2015, even companies that sponsored CVPR year after year got no more than a small booth outside the poster area. And over the years, CVPR’s sponsor list was always the same American companies: Google, Microsoft, Amazon’s A9, IBM…

At CVPR 2015, an A.I. company co-founded by Hou Xiaodi, now CTO of the self-driving company TuSimple, bought the Platinum Sponsor slot and had its logo printed on every attendee’s badge.

Sponsoring CVPR also serves industry’s own interests: it reflects how quickly computer vision has heated up in industry and how hungry companies are for talent. In China alone, more than 30 facial recognition companies were founded between 2011 and 2015, and more than 70 in 2016. They include A.I. companies such as Sensetime, Kuang Shi (Megvii), Yitu, Yuncong, Yu Shi, Ge Ling Shen Tong (DeepGlint) and Malong Technologies; the BAT giants (Baidu, Alibaba and Tencent) also have labs in A.I. and computer vision: Alibaba has set up iDST, Tencent has set up Youtu, and Baidu has set up an artificial intelligence lab.

According to IDC’s 2018 China Computer Vision Application Market Research (I) report, released in May this year, China’s computer vision application market had reached 1.545 billion yuan by the end of December 2017, an increase of 184.0% over 2016. Government, finance and the Internet are the three industries that spend the most on computer vision technology. Safe-city projects in government and face authentication in finance are the two largest spending scenarios.

At a time when A.I. talent is scarce, a conference like CVPR is like a giant job fair. The scholars from universities and the capable people from industry who attend may well become companies’ core researchers in the future.

Also in 2015, the newly established Sensetime became a top-tier sponsor of CVPR. For Shi Jianping’s senior labmates at Sensetime, sponsoring CVPR was not only brand promotion but also a wish fulfilled. Sensetime has been on the sponsor list every year since.

In 2016, CVPR ushered in its first ever Expo. Nearly 100 companies participated that year, and this year there are more than 115, bringing in $2 million in sponsorship revenue for the committee.

CVPR 2016, held at Caesars Palace in Las Vegas, attracted 3,500 people, an all-time high at the time. The expo, held in the Octavius Ballroom at Caesars Palace, brought together nearly 100 companies. A Platinum Sponsor booth measured 20 by 20 ft. As at industry trade shows, companies set up screens to show demos or products.

On that year’s sponsor list, besides Sensetime, there were more Chinese faces: TuSimple, DJI, Baidu, Didi… These companies have also become CVPR regulars in recent years.

Back then, Sensetime’s booth was relatively simple: a poster on each side, three screens in the middle for demos, and tables behind.

In 2018, Sensetime’s CVPR booth kept its usual red background but was otherwise completely redesigned: it was enclosed by four display walls, each with a screen on both sides showing one category of Sensetime products. With Sensetime’s business spanning a dozen areas, from SensePortrait, a facial-recognition system, to SenseAR, an augmented reality engine and platform, to SenseDrive, an in-car driver monitoring system, there are simply too many demos to show.

Sensetime’s booth at CVPR 2018

It is a stroke of luck for computer vision scholars that papers which appeared at CVPR over the past few years are now back at CVPR as demos or even products.


In closing

“I want to experience CVPR once.” This is the wish of many young computer vision students who have never attended.

Lin still remembers his first CVPR in 2005, when he too had an oral. “I prepared for two or three weeks and rehearsed it in front of my advisor. Orals were also longer at that time, nearly 20 minutes.”

“When I was a student, I always came to the conference full of curiosity, but now it is quite different.”

Shi Jianping feels the same way. When she first came in 2011, she was still a student and curious about everything. This year she brought several Sensetime interns, some attending for the first time, and in them she saw a lot of her younger self.