This post is being written as I work through the assignment 😄; it will be updated in the near future, so feel free to bookmark it first.

Homework official link

Companion course recordings on Bilibili: 2017 old edition and 2020 new edition

1. setup

The official link above offers two ways to do the assignment: the first uses Google's cloud servers, the second runs everything locally; I chose the second. Download the folder containing the Jupyter notebooks and starter code, then run the pip command below to install the required packages. If this step errors out, read the error message carefully; there may be conflicts between package dependencies.

pip install -r requirements.txt

Next, run the script that downloads the dataset; it is small, a bit over 100 MB.

cd cs231n/datasets
./get_datasets.sh

Run Jupyter Notebook.

jupyter notebook

Then open the corresponding notebook.

2. KNN (20 points)

Just run through the Jupyter notebook.

Set up the notebook first. This cell makes generated figures appear directly in the notebook output, which is very convenient.
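For reference, the setup cell looks roughly like this (a sketch based on the standard starter notebook; your copy may differ slightly):

import random
import numpy as np
import matplotlib.pyplot as plt
from cs231n.data_utils import load_CIFAR10

# Embed generated figures directly in the notebook output
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0)
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# Reload edited modules (e.g. k_nearest_neighbor.py) without restarting the kernel
%load_ext autoreload
%autoreload 2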

This cell loads the dataset and, to allow repeated runs, automatically clears any previously loaded data before printing the shapes. There are 50,000 training images and 10,000 test images. “32, 32, 3” means 32 × 32 pixels with 3 RGB channel values per pixel. The labels are a one-dimensional array of class indices (plane, car, bird, cat, and so on).
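A rough sketch of that cell, assuming the starter's load_CIFAR10 helper and the default dataset path:

cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'

# Clean up any previously loaded data so the cell can be re-run safely
try:
    del X_train, y_train, X_test, y_test
    print('Clear previously loaded data.')
except NameError:
    pass

X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

print('Training data shape: ', X_train.shape)    # (50000, 32, 32, 3)
print('Training labels shape: ', y_train.shape)  # (50000,)
print('Test data shape: ', X_test.shape)         # (10000, 32, 32, 3)
print('Test labels shape: ', y_test.shape)       # (10000,)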

A simple visualization of the dataset.

Shrink the dataset; running on tens of thousands of images is still a struggle for a laptop.
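The subsampling cell is roughly as follows; the 5000 train / 500 test split matches the dists shape printed later, and each 32 × 32 × 3 image is flattened into a 3072-dimensional row vector:

num_training = 5000
mask = list(range(num_training))
X_train = X_train[mask]
y_train = y_train[mask]

num_test = 500
mask = list(range(num_test))
X_test = X_test[mask]
y_test = y_test[mask]

# Flatten each image: (N, 32, 32, 3) -> (N, 3072)
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
print(X_train.shape, X_test.shape)  # (5000, 3072) (500, 3072)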

Completing the code in k_nearest_neighbor.py

KNN’s “training” is simply storing the training set.
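A minimal sketch of what the train method boils down to (the starter code already provides this part):

def train(self, X, y):
    # "Training" a KNN classifier just memorizes the data
    self.X_train = X
    self.y_train = y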

compute_distances_two_loops() needs to be completed. The input is the test set and the output is a two-dimensional array dists, where dists[i][j] is the Euclidean distance between test image i and training image j.
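One possible way to fill it in (a sketch; the Euclidean distance is the square root of the sum of squared pixel differences):

def compute_distances_two_loops(self, X):
    num_test = X.shape[0]
    num_train = self.X_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in range(num_test):
        for j in range(num_train):
            # L2 distance between test image i and training image j
            dists[i][j] = np.sqrt(np.sum((X[i] - self.X_train[j]) ** 2))
    return dists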

Instantiate a KNearestNeighbor classifier, train it, call the function completed above, and print the shape of dists. If it prints 500 by 5000, the result is correct.
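The corresponding notebook cell looks roughly like this (assuming the class is imported from cs231n.classifiers as in the starter code):

from cs231n.classifiers import KNearestNeighbor

classifier = KNearestNeighbor()
classifier.train(X_train, y_train)

dists = classifier.compute_distances_two_loops(X_test)
print(dists.shape)  # (500, 5000)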

Visualize this array. I could not view the figure here, so I ran it locally.

It is a 500 × 5000 array; as noted below, the larger the Euclidean distance, the lighter the color.
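The visualization is essentially a single imshow call on dists; each pixel of the image is one entry of the matrix, so brighter means a larger distance:

plt.imshow(dists, interpolation='none')
plt.show()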

Let’s zoom in.

You can see that the visualization is a field of bright and dark points, with quite a few clearly visible bright lines. Here is the question:

Inline Question 1

Notice the structured patterns in the distance matrix, where some rows or columns are visibly brighter. (Note that with the default color scheme black indicates low distances while white indicates high distances.)

  • What in the data is the cause behind the distinctly bright rows?
  • What causes the columns?

1. A bright horizontal row means the Euclidean distance between that test image and all the training images is large, i.e., the test image is unlike all the training data.

2. A bright vertical column means the Euclidean distance between that training image and all the test images is large, i.e., the training image is unlike all the test data.

At first I did not run the code in the Jupyter notebook but as a local script instead, and I hit the following two pitfalls, recorded here for reference. If you run everything in Jupyter, they do not occur.

Pitfall 1:

python knn.py
Traceback (most recent call last):
  File "knn.py", line 5, in <module>
    from cs231n.data_utils import load_CIFAR10
  File "/Users/cherish/Documents/assignment1/cs231n/data_utils.py", line 7, in <module>
    from imageio import imread
ImportError: No module named imageio

My local environment has both Python 2.7 and Python 3.7; the former is invoked as python and the latter as python3. Running instead

python3 knn.py

Problem solved.

Pitfall 2:

python3 knn.py
Training data shape:  (50000, 32, 32, 3)
Training labels shape:  (50000,)
Test data shape:  (10000, 32, 32, 3)
Test labels shape:  (10000,)
[1]    49571 segmentation fault  python3 knn.py

plt.show() causes a segmentation fault. The solution is to add two lines before importing pyplot:

import matplotlib
matplotlib.use('TkAgg')  # set the backend before importing pyplot
import matplotlib.pyplot as plt