MIT has released ADE20K, a new large dataset for a variety of tasks, including scene perception and semantic understanding

MIT has released ADE20K, a dataset for scene perception, parsing, segmentation, multi-object recognition, and semantic understanding. The full dataset (all images and segmentations) is 3.8 GB. MIT gives an overview of the data in terms of download, description, browsing, and evaluation. Heart of the Machine has compiled the original text; see the article for the dataset download address and a link to the original.

The address of the project: http://groups.csail.mit.edu/vision/datasets/ADE20K/



Data set download page

Description

Images and annotations

Each folder contains images grouped by scene category. For each image, the object and part segmentations are stored as two different PNG files. All object instances and parts are annotated separately.
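As a minimal sketch of this file layout, the paths of the two segmentation PNGs can be derived from an image path, assuming the `*_seg.png` / `*_seg_parts.png` suffixes that the article describes below (the example filename is hypothetical):

```python
import os

def mask_paths(image_path):
    """Given an ADE20K image path, derive the paths of its two
    segmentation PNGs: the object masks and the part masks.
    The suffixes follow the *_seg.png / *_seg_parts.png
    convention described in this article."""
    stem, _ = os.path.splitext(image_path)
    return stem + "_seg.png", stem + "_seg_parts.png"

# Hypothetical example path inside a scene-category folder:
seg, parts = mask_paths("bedroom/ADE_train_00000001.jpg")
```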

Browsing

The annotated images cover the scene categories of the SUN and Places datasets. Below are some examples showing the images with their object segmentations and part segmentations. You can browse more images with the ADE20K browser.

The visualization below lists the objects, parts, and the number of annotated instances of each. The tree shows only objects with more than 250 annotated instances and parts with more than 10 annotated instances.

Some categories can be both objects and parts. For example, a “door” can be an object (in an indoor picture) or a part (when it is the door of a car). Some objects are usually parts (a leg, a hand), although in some cases they appear independent of a whole (a car wheel in a garage, for example); some objects are never parts (a person, a truck, etc.). Depending on the object a part belongs to, the same name category (such as door) can correspond to several visual categories: a car door is visually different from a cabinet door, although they share some affordances. The value of proportionClassIsPart(c) can be used to decide whether a class is primarily an object or a part. When an object is not a part of another object, its segmentation mask appears in *_seg.png; when it is a part, its mask appears in *_seg_parts.png. Correctly detecting an object requires distinguishing whether it appears as an independent object or as a part of another object.
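A minimal sketch of how proportionClassIsPart(c) might be used to make this object-versus-part decision; the 0.5 threshold is an assumption for illustration, not a value documented by the dataset:

```python
def primarily_part(proportion_class_is_part, threshold=0.5):
    """Return True when a class occurs as a part of another object
    more often than as an independent object.

    `proportion_class_is_part` is the fraction of instances of the
    class annotated as parts; the threshold is an assumed cut-off,
    not a value documented by ADE20K."""
    return proportion_class_is_part > threshold

# A class like "person" essentially never appears as a part,
# while a class like "leg" almost always does.
primarily_part(0.02)   # object-like class
primarily_part(0.95)   # part-like class
```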

Evaluation

Evaluate your algorithm on the validation set. You can use the evaluation kit for the scene parsing challenge.

Dataset bias

In the training set:

  • The median aspect ratio of the images is 4/3.
  • The median image size is 307,200 pixels. The average image size is 1.3M pixels.
  • The mode of the object segmentations is shown below and consists of four classes (from top to bottom): sky, wall, building, and floor.

  • The mode of the part segmentations consists of two classes: windows and doors.
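The statistics above can be reproduced from a list of image sizes; a minimal sketch with stdlib tools (the three example sizes are made up, chosen so each has the 4/3 aspect ratio mentioned above):

```python
import statistics

def image_stats(sizes):
    """Compute aspect-ratio and pixel-count statistics from a
    list of (width, height) pairs."""
    ratios = [w / h for w, h in sizes]
    pixels = [w * h for w, h in sizes]
    return {
        "median_aspect_ratio": statistics.median(ratios),
        "median_pixels": statistics.median(pixels),
        "mean_pixels": statistics.mean(pixels),
    }

# A 640x480 image has aspect ratio 4/3 and 307,200 pixels,
# matching the medians quoted for the training set.
stats = image_stats([(640, 480), (800, 600), (1280, 960)])
```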

In the test set:

  • Simply using the segmentation mode to segment the images correctly labels, on average, 20.3% of the pixels of each image in the validation set.
  • On the validation set, the intersection over union (IoU) of the four classes represented in the segmentation mode is as follows:
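Whatever the per-class numbers are, IoU itself has a simple definition; a minimal pure-Python sketch for one class over flat label maps (the example arrays are made up):

```python
def class_iou(pred, gt, cls):
    """Intersection over Union for one class between a predicted
    and a ground-truth label map, given as flat lists of class ids:
    |pred == cls AND gt == cls| / |pred == cls OR gt == cls|."""
    inter = sum(1 for p, g in zip(pred, gt) if p == cls and g == cls)
    union = sum(1 for p, g in zip(pred, gt) if p == cls or g == cls)
    return inter / union if union else 0.0

pred = [0, 0, 1, 1, 2, 2]
gt   = [0, 1, 1, 1, 2, 0]
class_iou(pred, gt, 1)  # intersection 2, union 3 -> 2/3
```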



Annotation noise analysis

To analyze the consistency of the annotations, we took a subset of 64 randomly selected images from the validation set and had them annotated again. Twenty of these images were annotated by two external annotators. We would expect some differences between the two annotations even when the task is done by the same person. Typically, 82% of the pixels get the same annotation. The figure below shows one image and two segmentations done by the same annotator.
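The 82% agreement figure is just the fraction of pixels with identical labels across the two annotations; a minimal sketch over flat label maps (the example arrays are made up):

```python
def pixel_agreement(seg_a, seg_b):
    """Fraction of pixels receiving the same label in two
    annotations of the same image, given as flat lists of
    class ids of equal length."""
    if len(seg_a) != len(seg_b):
        raise ValueError("annotations must cover the same pixels")
    same = sum(1 for a, b in zip(seg_a, seg_b) if a == b)
    return same / len(seg_a)

# 3 of 4 pixels agree -> 0.75
pixel_agreement([1, 1, 2, 2], [1, 2, 2, 2])
```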
