So far, this is all you need to know about autonomous driving datasets. This article belongs to our eight-part series on shared autonomous driving datasets:

“Highlights of This Issue”

  • Tsinghua University has launched the world’s first vehicle-infrastructure cooperative autonomous driving research dataset

  • The Nexar video dataset covers more than 1,400 cities in more than 70 countries

  • Pedestrian detection dataset overview: KAIST, ETH, Daimler, Tsinghua-Daimler, Caltech, NightOwls, ECP

  • Clearly visible even at night: the KAIST pedestrian dataset, the FLIR thermal imaging dataset, and the University of Tokyo infrared dataset

“Eight Series Overview”

Autonomous Driving Dataset Sharing is a new series launched by Integer Intelligence. In this series, we introduce all of the open autonomous driving datasets released to date by research institutions and companies. The datasets are grouped into eight series:

  • Series 1: Object detection datasets 🔗

  • Series 2: Semantic segmentation datasets

  • Series 3: Lane detection datasets

  • Series 4: Optical flow datasets

  • Series 5: Stereo datasets

  • Series 6: Localization and mapping datasets

  • Series 7: Driving behavior datasets

  • Series 8: Simulation datasets

This article is the second of three articles in the object detection series.

It covers the following 15 datasets:

01 “DAIR-V2X Dataset”

  • Published by: Institute for AI Industry Research, Tsinghua University (AIR); Beijing High-Level Autonomous Driving Demonstration Zone; Beijing Che Technology Development Co., Ltd.; Baidu Apollo; Beijing Academy of Artificial Intelligence

  • Download address:

    thudair.baai.ac.cn/cooptest

  • Release date: 2022

  • Overview: DAIR-V2X is the world’s first large-scale, multi-modal, multi-view dataset for vehicle-infrastructure cooperative autonomous driving research. All data are collected from real scenes and include both 2D and 3D annotations

  • Features:

    • A total of 71,254 frames of image data and 71,254 frames of point cloud data

      • The DAIR-V2X cooperative dataset (DAIR-V2X-C) contains 38,845 image frames and 38,845 point cloud frames

      • The DAIR-V2X infrastructure-side dataset (DAIR-V2X-I) contains 10,084 image frames and 10,084 point cloud frames

      • The DAIR-V2X vehicle-side dataset (DAIR-V2X-V) contains 22,325 image frames and 22,325 point cloud frames

    • The first dataset to provide spatio-temporally synchronized annotations across cooperating vehicle and roadside sensors

    • Rich sensor suite: vehicle-side camera, vehicle-side LiDAR, roadside camera, and roadside LiDAR

    • Comprehensive 3D annotation attributes, covering 10 common classes of road obstacles

    • Data were collected from 10 km of urban roads, 10 km of expressways, and 28 intersections in the Beijing High-Level Autonomous Driving Demonstration Zone

    • The data cover diverse scenarios: sunny/rainy/foggy weather, day/night, and urban roads/expressways

    • Complete data, including desensitized raw images and point clouds, annotations, timestamps, and calibration files

    • The training and validation sets have been released; test sets will follow with subsequent challenge events
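
Since the release bundles calibration files with the images and point clouds, a common first step with such multi-sensor data is projecting LiDAR points into a camera image. The sketch below is a generic pinhole-camera projection under assumed matrix conventions (the function name and calibration values are illustrative, not DAIR-V2X’s actual file format):

```python
import numpy as np

def project_lidar_to_image(points_xyz, lidar_to_cam, K):
    """Project Nx3 LiDAR points into pixel coordinates.

    points_xyz   : (N, 3) points in the LiDAR frame
    lidar_to_cam : (4, 4) extrinsic transform (e.g. read from a calibration file)
    K            : (3, 3) camera intrinsic matrix
    Returns (M, 2) pixel coordinates for the points in front of the camera.
    """
    # Homogeneous coordinates, then move points into the camera frame
    homo = np.hstack([points_xyz, np.ones((len(points_xyz), 1))])
    cam = (lidar_to_cam @ homo.T).T[:, :3]
    # Keep only points with positive depth (in front of the camera)
    cam = cam[cam[:, 2] > 0]
    # Pinhole projection: apply intrinsics, then divide by depth
    pix = (K @ cam.T).T
    return pix[:, :2] / pix[:, 2:3]
```

With an identity extrinsic, a point on the optical axis lands at the principal point, which is a quick sanity check for a freshly parsed calibration file.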

02 “Argoverse”

  • Published by: Argo AI, Carnegie Mellon University, and Georgia Institute of Technology

  • Download address:

    www.argoverse.org/av1.html

  • Paper address:

    arxiv.org/pdf/1911.02…

  • Release date: 2019

  • Description: The Argoverse dataset comprises a 3D Tracking and a Motion Forecasting component. It differs from Waymo in that, while it also contains LiDAR and camera data, it covers only 113 scenes recorded in Miami and Pittsburgh. What makes it special is that it was the first dataset to include high-definition map data

  • Features:

    • The first dataset to include HD map data: 290 km of lane maps of Pittsburgh and Miami, with location, connectivity, traffic signal, and elevation information

    • Sensors: 2 LiDARs, 7 high-resolution ring cameras (1920×1200), and 2 stereo cameras (2056×2464)

    • Argoverse 3D Tracking

      • Contains 3D tracking annotations for 113 scenes, each 15–30 seconds long, with a total of 11,052 tracked objects

      • Objects within 5 meters are annotated, using 15 class labels

      • Vehicles account for 70 percent; pedestrians, bicycles, and motorcycles account for 30 percent

    • Argoverse Motion Forecasting

      • Selected from 1,006 hours of driving recorded in Miami and Pittsburgh, totaling 320 hours

      • Contains 324,557 scenes, each 5 seconds long, with a 2D bird’s-eye view of each tracked object sampled at 10 Hz

03 “KAIST Multispectral Pedestrian”

  • Publisher: Korea Advanced Institute of Science and Technology (KAIST)

  • Download address 1:

    sites.google.com/site/pedest…

  • Download address 2:

    sites.google.com/site/pedest…

  • Paper address:

    openaccess.thecvf.com/content_cv…

  • Release date: 2015

  • Summary: This is a multispectral pedestrian detection dataset that provides color-thermal image pairs for both day and night. It combines the complementary advantages of color and thermal imaging to improve pedestrian detection accuracy, overcoming the pedestrian occlusion, background clutter, and poor nighttime imaging found in earlier pedestrian detection data

  • Features:

    • Provides 95,328 day-and-night color-thermal image pairs, aligned with a beam splitter to eliminate parallax between the two images

    • Data were collected in Seoul, South Korea; image resolution is 640×480

    • 103,128 manual 2D box annotations covering 1,182 unique pedestrians

    • Four annotation classes: person, people (indistinct figures), cyclist, and person? (uncertain whether a pedestrian)

    • Acquisition equipment: thermal camera, RGB camera, beam splitter, etc.
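
Because the color and thermal images are pixel-aligned, pixel-level fusion needs no registration step. Here is a minimal sketch of overlaying a thermal channel on an aligned RGB frame (the function name and alpha-blending scheme are my own illustration, not part of any KAIST toolkit):

```python
import numpy as np

def blend_rgbt(rgb, thermal, alpha=0.5):
    """Alpha-blend an aligned single-channel thermal image onto an RGB image.

    rgb     : (H, W, 3) uint8 color image
    thermal : (H, W) uint8 thermal image, pixel-aligned with `rgb`
    alpha   : weight given to the RGB image in the blend
    """
    # Replicate the thermal channel so its shape matches the RGB image
    t3 = np.repeat(thermal[:, :, None], 3, axis=2).astype(np.float32)
    # Weighted sum in float, then clip back into the uint8 range
    blended = alpha * rgb.astype(np.float32) + (1.0 - alpha) * t3
    return blended.clip(0, 255).astype(np.uint8)
```

A blend like this is mainly useful for visually inspecting alignment; detection models usually consume the two modalities as separate input channels instead.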

04 “ETH Pedestrian”

  • Published by ETH Zurich

  • Download address:

    icu.ee.ethz.ch/research/da…

  • Paper address:

    www.vision.rwth-aachen.de/media/paper…

  • Release date: 2009

  • Description: ETH is a pedestrian detection dataset consisting of three video clips captured with a camera; the dataset contains only one label class, pedestrian

  • Features:

    • The test set contains 3 video clips, 4,800 frames in total at 15 fps, with 1,894 labels

    • 2.5D annotation was used; every fourth extracted frame was annotated

    • Collected in a crowded district of Zurich, Switzerland

    • Captured with a video camera

05 “Daimler Pedestrian”

  • Publisher: Daimler AG

  • Download address:

    www.lookingatpeople.com/download-da…

  • Paper address:

    gavrila.net/pami09.pdf

  • Release date: 2008

  • Size: 8.5 GB

  • Introduction: The Daimler pedestrian detection dataset was collected in urban environments during the daytime. It is split into a training set and a test set; the training set contains both pedestrian images and pedestrian-free images

  • Features:

    • 27 minutes of video footage

    • 15,560 pedestrian images (cropped to 48×96 resolution) and 6,744 pedestrian-free images

    • 21,790 images (640×480 resolution) with 56,492 manual 2D annotations

    • Video was captured from a camera in a moving vehicle, entirely on daytime urban roads

06 “Tsinghua-Daimler Cyclist”

  • Published by: Daimler AG and Tsinghua University

  • Download address:

    www.lookingatpeople.com/download-ts…

  • Paper address:

    www.gavrila.net/Publication…

  • Release date: 2016

  • Introduction: This dataset aims to enrich cyclist data and improve the accuracy of autonomous driving algorithms for cyclist detection. Before its release, no dataset dedicated to cyclist detection existed

  • Features:

    • Nearly 6 hours of video data at 2048×1024 resolution

    • 14,674 labeled frames with 32,361 labeled objects, including cyclists, pedestrians, and other riders

    • The dataset is split into a partially annotated set and a fully annotated set. The former includes only complete, clearly visible cyclists, while the latter also includes pedestrians, bicycles, tricycles, wheelchairs, motorcycles, and other riders

    • Collected with a vehicle-mounted stereo camera in the Haidian and Chaoyang districts of Beijing

07 “Caltech Data Set”

  • Publisher: California Institute of Technology

  • Download address:

    www.vision.caltech.edu/Image_Data…

  • Paper address:

    www.vision.caltech.edu/Image_Data…

  • Release date: 2009

  • Introduction: The Caltech pedestrian dataset was collected on urban roads in Los Angeles, with video captured from a vehicle-mounted camera

  • Features:

    • Contains nearly 10 hours of 640×480, 30 Hz video

    • The dataset is split into a training set of 6 subsets and a test set of 4 subsets, each about 1 GB in size

    • Contains approximately 250,000 annotated frames, 350,000 2D boxes, and 2,300 unique pedestrians, totaling roughly 137 minutes

    • Annotations distinguish the visible and occluded parts of each pedestrian

    • The video was captured in heavily trafficked Los Angeles neighborhoods, including LAX, Santa Monica, Hollywood, Pasadena, and Little Tokyo

08 “NightOwls”

  • Publisher: Visual Geometry Group, University of Oxford

  • Download address:

    www.nightowls-dataset.org/download/

  • Paper address:

    www.robots.ox.ac.uk/~vgg/public…

  • Release date: 2018

  • Overview: The NightOwls dataset provides pedestrian data at night. Nighttime pedestrian detection is more challenging than daytime detection because of poor illumination and variation in reflections, blur, and contrast

  • Features:

    • 279,000 frames, 1024×640 resolution, 15 fps

    • All frames are annotated with 2D boxes and tracking information, covering 42,273 pedestrians

    • Four label classes: pedestrian, bicycle driver, motorbike driver, and ignore areas

    • Four annotation attributes: pose, difficulty, occlusion, and truncation

    • Diversity: three countries (Germany, UK, Netherlands), all four seasons, dawn and night, and varied weather conditions such as rain and snow

09 “EuroCity Persons Dataset”

  • Published by Delft University of Technology (TU Delft)

  • Download address:

    eurocity-dataset.tudelft.nl/eval/user/l…

  • Paper address:

    arxiv.org/pdf/1805.07…

  • Release date: 2018

  • Overview: EuroCity Persons (ECP) is a diverse pedestrian detection dataset collected with on-board cameras across a range of European countries

  • Features:

    • A large, diverse dataset: 4 seasons, 12 countries, 31 cities, 47,300 images, 238,200 person instances

    • Labels are divided into pedestrians and riders; riders are further subdivided by vehicle into bicycles, strollers, motorcycles, scooters, tricycles, wheelchairs, and so on

    • Rider annotations have two parts: an annotation of the person and an annotation of the riding device

    • In addition to 2D boxes, the annotations include position information

10 “Urban Object Detection”

  • Published by: The Robotics and Tridimensional Vision Group (RoViT, University of Alicante)

  • Download address:

    www.rovit.ua.es/dataset/tra…

  • Paper address:

    www.mdpi.com/2079-9292/7…

  • Release date: 2018

  • Summary: The data in this dataset come from existing datasets such as PASCAL VOC, Udacity, and the Swedish traffic sign dataset, while a portion of the data (around 1%) was collected with HD cameras mounted on a vehicle. The dataset adds label categories to these public datasets. Some of the data are weakly labeled and can be used to evaluate weakly supervised learning techniques

  • Features:

    • The dataset has two parts: traffic objects and traffic signs

    • The traffic objects part is annotated in 2D and includes cars, motorcycles, people, traffic lights, buses, bicycles, and traffic signs

    • The traffic signs part contains 43 traffic sign classes common on European streets, based on data from GTSRB and the Swedish dataset

    • It contains 12,000 traffic sign instances

11 “Road Damage Dataset 2018-2020”

  • Published by: University of Tokyo

  • Download address:

    github.com/sekilab/Roa…

  • Paper addresses:

    arxiv.org/abs/1801.09…

    www.sciencedirect.com/science/art…

  • Release date: 2018–2020

  • Introduction:

    • Road Damage Dataset 2018: the first large-scale road damage dataset, covering more than 40 hours of data from seven Japanese cities. It comprises 9,053 road images taken by smartphones mounted on cars; these images contain 15,435 instances of road damage covering 8 damage types. Each image is annotated with the location and type of damage

    • Road Damage Dataset 2020: contains 26,336 road images from India, Japan, and the Czech Republic, with more than 31,000 road damage instances, captured with in-vehicle smartphones. The dataset covers four damage types: longitudinal cracks, transverse cracks, alligator cracks, and potholes

12 “FLIR Thermal Sensing”

  • Published by: Teledyne FLIR

  • Download address:

    www.flir.eu/oem/adas/ad…

  • Release date: 2018

  • Size: 17 GB

  • Summary: The ability to sense thermal infrared radiation, or heat, offers complementary and unique advantages over existing sensor technologies such as visible-light cameras, LiDAR, and radar systems. This dataset provides thermal images that can detect and distinguish pedestrians, cyclists, animals, and motor vehicles in challenging conditions such as total darkness, smoke, fog, inclement weather, and glare, facilitating the development of visible-plus-thermal sensor fusion (“RGBT”) algorithms

  • Features:

    • 26,442 fully annotated frames with 520,000 2D boxes, comprising 9,711 thermal images and 9,233 RGB images

    • 15 label classes: pedestrian, bicycle, car, motorcycle, bus, train, truck, traffic light, fire hydrant, etc.

    • Thermal camera specifications: Teledyne FLIR Tau 2, 640×512, 13 mm f/1.0

13 “TuSimple Lane Line Detection Dataset”

  • Publisher: TuSimple

  • Download address:

    github.com/TuSimple/tu…

  • Release date: 2017

  • Summary: TuSimple held a lane detection competition using camera image data and released part of the data with its annotations

  • Features:

    • 7,000 one-second video clips, 20 frames each

    • Environment: daytime, good to moderate weather, highways

    • The training set contains 3,626 video clips; the test set contains 2,782

    • Line annotation is used: each lane is stored as a sequence of point coordinates rather than a region mask
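
In the commonly documented TuSimple label format, these point sequences are stored one JSON record per line: each record lists, for every lane, the x-coordinates sampled at a shared set of y-values (`h_samples`), with -2 marking rows where the lane is absent. A small sketch of decoding one record into per-lane (x, y) points (the function name is my own):

```python
import json

def decode_tusimple_label(label_line):
    """Turn one TuSimple label record (a JSON line) into per-lane point lists.

    Each record stores, for every lane, x-coordinates sampled at the shared
    y-values in "h_samples"; an x of -2 means the lane is absent at that row.
    Returns a list of lanes, each a list of (x, y) pixel coordinates.
    """
    record = json.loads(label_line)
    lanes = []
    for xs in record["lanes"]:
        # Pair each x with its shared y, dropping the absent (-2) entries
        points = [(x, y) for x, y in zip(xs, record["h_samples"]) if x >= 0]
        lanes.append(points)
    return lanes
```

Storing lanes as point sequences rather than region masks keeps labels compact and makes them easy to fit with a polynomial or spline per lane.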

14 “NEXET”

  • Publisher: Nexar

  • Download address:

    www.kaggle.com/solesensei/…

  • Release date: 2017

  • Size: 11 GB

  • Overview: NEXET contains rich and diverse road data captured with dashcams and phone cameras and, at release, was the autonomous driving dataset covering the most countries and cities to date

  • Features:

    • Over 2.5 million hours of source video; a training set of 50,000 images with 2D box labels and a test set of 41,190 images

    • Diversity: more than 1,400 cities in 77 countries; three lighting conditions (day, night, dusk); four seasons; varied road types (urban, rural, highway, residential, and even desert roads); varied weather (sunny, fog, rain, snow)

    • Labeling: 2D boxes are used; the boxes do not tightly fit the vehicles

15 “Multi-Spectral Object Detection”

  • Published by: University of Tokyo

  • Download address:

    drive.google.com/drive/folde…

  • Paper address:

    dl.acm.org/doi/pdf/10….

  • Release date: 2017

  • Size: 6.85 GB

  • Description: The dataset consists of RGB, near-infrared, mid-infrared, and far-infrared images taken in a campus environment

  • Features:

    • 7,512 images in total: 3,740 daytime and 3,772 nighttime

    • Capture: images were taken with RGB, near-infrared, mid-infrared, and far-infrared cameras mounted on a cart to simulate driving conditions

    • Environment: a university campus in Tokyo, including both day and night data

    • Labels: 2D boxes and class labels for obstacles such as pedestrians, bicycles, and vehicles

“Contact Us”

Integer Intelligence hopes to use its professional data processing capabilities to serve more than 1,000 AI enterprises over the next three years and become their trusted data partner. We look forward to further communication with you, the reader of this article, and welcome you to contact us to explore possibilities for cooperation. Our contact information is as follows:

Contact person: Mr. Qi

Telephone: 13456872274

More details can be found at www.molardata.com