
1. Introduction

When you start out in deep learning, you inevitably need some public datasets. Now that I have some time, I want to document how to quickly download a few of the classic ones. Reading the official documentation is one of the most common ways to learn, so this post starts there.

Because I work on computer vision (CV), I use the torchvision library as the example. From the official website: "This library is part of the [PyTorch](http://pytorch.org/) project. PyTorch is an open source machine learning framework."

"The torchvision package consists of popular datasets, model architectures, and common image transformations for computer vision."

It includes many popular datasets, such as CIFAR, COCO, and MNIST, that most of us are already familiar with. I will document my process below, using CIFAR as the example.

2. What the official documentation says

  1. First, let’s look at the documentation for the CIFAR10 class:

    Parameters:

    root: the directory in which the downloaded dataset is stored.

    root (string): Root directory of dataset where directory ``cifar-10-batches-py`` exists or will be saved to if download is set to True.

    train: whether to load the training set (otherwise the test set is loaded).

    train (bool, optional): If True, creates dataset from training set, otherwise creates from test set.

    transform: a function that preprocesses an image and returns the transformed result.

    transform (callable, optional): A function/transform that takes in an PIL image and returns a transformed version.

    download: whether to download the dataset.

    download (bool, optional): If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
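
The `root` and `download` descriptions above suggest that the loader looks for a `cifar-10-batches-py` folder under `root`. As a rough sketch, a script could check for that folder to see whether a download is likely to be needed. The helper name `cifar10_is_present` is hypothetical and not part of torchvision, and torchvision runs its own verification of the files, so treat this only as a quick check.

```python
from pathlib import Path


def cifar10_is_present(root):
    """Return True if `root` contains the extracted CIFAR-10 folder.

    Hypothetical helper: it only checks that the directory named in the
    torchvision docstring exists, not that its contents are complete.
    """
    return (Path(root) / "cifar-10-batches-py").is_dir()


# Example call, with "./dataset" as the root directory used in this post:
# cifar10_is_present("./dataset")
```

If the function returns False, passing `download=True` lets torchvision fetch the data into `root`.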

3. Start coding

  1. The sample code

    # Import the torchvision package
    import torchvision
    
    # Transform applied to each original image: convert the PIL image to a tensor
    dataset_transform = torchvision.transforms.Compose([
        torchvision.transforms.ToTensor()
    ])
    
    # Create the training and test datasets.
    # Training set: stored in the ./dataset folder, train=True, downloaded if missing
    train_set = torchvision.datasets.CIFAR10(root="./dataset", train=True, transform=dataset_transform, download=True)
    # Test set: stored in the same ./dataset folder, train=False, downloaded if missing
    test_set = torchvision.datasets.CIFAR10(root="./dataset", train=False, transform=dataset_transform, download=True)
    
    # Each item is an (image, label) pair
    print(test_set[0])
  2. Then run the script to start the download

    You can see that the dataset has started to download, but it’s slow because it comes from toronto.edu. Here’s a faster way: abort the run, copy the download link from the console output, and fetch it with a download manager such as Xunlei (Thunder), which finishes quickly. Then extract the downloaded .tar.gz file and put the result in the dataset directory.

  3. Run the script again, and you can use the dataset as normal.
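
The manual extraction step can also be scripted. The following is a minimal sketch using only the standard library; the function name `extract_cifar10_archive` is hypothetical, and it assumes the downloaded file is a gzip-compressed tar archive (for example `cifar-10-python.tar.gz`) whose top-level folder is `cifar-10-batches-py`, as the docstring quoted earlier suggests.

```python
import tarfile
from pathlib import Path


def extract_cifar10_archive(archive_path, root="./dataset"):
    """Extract a manually downloaded CIFAR-10 archive into `root`.

    Hypothetical helper, not part of torchvision.
    """
    root = Path(root)
    # Create the dataset directory if it does not exist yet.
    root.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(root)
    # Assumption: the archive unpacks into this folder name.
    return root / "cifar-10-batches-py"
```

A call such as `extract_cifar10_archive("cifar-10-python.tar.gz")` would leave the extracted folder under `./dataset`, which is the `root` passed to `CIFAR10` in the sample code.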

4. How to visualize

I visualized the images with TensorBoard. If you’re interested, the tensorboard library is worth exploring further.

import ssl

import torchvision
from torch.utils.tensorboard import SummaryWriter

# Workaround for the certificate error described below:
# skip HTTPS certificate verification for the download
ssl._create_default_https_context = ssl._create_unverified_context

# Convert each PIL image to a tensor
dataset_transform = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor()
])

# Create the training and test datasets
train_set = torchvision.datasets.CIFAR10(root="./dataset", train=True, transform=dataset_transform, download=True)
test_set = torchvision.datasets.CIFAR10(root="./dataset", train=False, transform=dataset_transform, download=True)

print(test_set[0])

# Write the first 10 test images to the "p10" log directory
writer = SummaryWriter("p10")
for i in range(10):
    img, target = test_set[i]
    writer.add_image("test_set", img, i)

writer.close()

To see the images in your browser, start TensorBoard with tensorboard --logdir=p10 and open the address it prints.
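
In the loop above, `target` is an integer label, and every image is logged under the same tag. As a small sketch, the label could be turned into a readable tag so that images are grouped by class in TensorBoard. The helper `image_tag` and the constant `CIFAR10_CLASSES` are hypothetical names introduced here; the list simply spells out the ten CIFAR-10 classes in label order.

```python
# CIFAR-10 class names in label order (index 0 to 9).
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]


def image_tag(target, prefix="test_set"):
    """Build a tag such as 'test_set/cat' from an integer label."""
    return f"{prefix}/{CIFAR10_CLASSES[target]}"
```

It could then be used as `writer.add_image(image_tag(target), img, i)`, assuming the same `writer`, `img`, and `target` as in the loop above.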

Problem: ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1131)

If you encounter the same problem during the download, import ssl and disable certificate verification before creating the datasets:

import ssl
ssl._create_default_https_context = ssl._create_unverified_context
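
The two lines above switch off certificate verification for every HTTPS request made afterwards in the same process. One possible refinement, sketched below under the assumption that only the dataset download needs the workaround, is to limit it to a `with` block and restore the original setting afterwards. The name `unverified_https` is hypothetical, and `ssl._create_default_https_context` is a private attribute of the standard library, so this may not hold across Python versions.

```python
import contextlib
import ssl


@contextlib.contextmanager
def unverified_https():
    """Temporarily disable HTTPS certificate verification (hypothetical helper)."""
    original = ssl._create_default_https_context
    ssl._create_default_https_context = ssl._create_unverified_context
    try:
        yield
    finally:
        # Restore the original behaviour even if the download fails.
        ssl._create_default_https_context = original
```

The dataset constructors with `download=True` would then go inside `with unverified_https():`, and code that runs later keeps normal certificate checks.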
