The goal of the numpy exercises is to serve as a reference as well as to get you to apply numpy beyond the basics. The questions are of 4 levels of difficulties with L1 being the easiest to L4 being the hardest.

If you want a quick refresher on numpy, the numpy basics and the advanced numpy tutorials might be what you are looking for.

1. Import numpy as np and see the version

Difficulty Level: L1

Q. Import numpy as `np` and print the version number.

Import numpy as NP print(np.__version__) #> 1.13.3Copy the code

You must import numpy as np for the rest of the codes in this exercise to work.

To install numpy its recommended to use the installation provided by anaconda.

2. How to create a 1D array?

Difficulty Level: L1

Q. Create a 1D array of numbers from 0 to 9

Desired output:

#> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Copy the code
arr = np.arange(10)
arr
#> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Copy the code

3. How to create a boolean array?

Difficulty Level: L1

Q. Create a 3 x 3 numpy array of all True’s

np.full((3, 3), True, dtype=bool) #> array([[ True, True, True], #> [ True, True, True], #> [ True, True, True]], Dtype =bool) # Alternate method: np.ones((3,3), dtype=bool)Copy the code

4. How to extract items that satisfy a given condition from 1D array?

Difficulty Level: L1

Q. Extract all odd numbers from arr

Input:

arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])`
Copy the code

Desired output:

#> array([1, 3, 5, 7, 9])
Copy the code
# Input
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# Solution
arr[arr % 2 == 1]
#> array([1, 3, 5, 7, 9])
Copy the code

5. How to replace items that satisfy a condition with another value in numpy array?

Difficulty Level: L1

Q. Replace all odd numbers in arr with -1

Input:

arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Copy the code

Desired Output:

#> array([ 0, -1,  2, -1,  4, -1,  6, -1,  8, -1])
Copy the code
arr[arr % 2 == 1] = -1
arr
#> array([ 0, -1,  2, -1,  4, -1,  6, -1,  8, -1])
Copy the code

6. How to replace items that satisfy a condition without affecting the original array?

Difficulty Level: L2

Q. Replace all odd numbers in arr with -1 without changing arr

Input:

arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Copy the code

Desired Output:

out
#> array([ 0, -1,  2, -1,  4, -1,  6, -1,  8, -1])
arr
#> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Copy the code
arr = np.arange(10)
out = np.where(arr % 2 == 1, -1, arr)
print(arr)
out
#> [0 1 2 3 4 5 6 7 8 9]
array([ 0, -1,  2, -1,  4, -1,  6, -1,  8, -1])
Copy the code

7. How to reshape an array?

Difficulty Level: L1

Q. Convert a 1D array to a 2D array with 2 rows

Input:

np.arange(10)

#> array([ 0, -1,  2, -1,  4, -1,  6, -1,  8, -1])
Copy the code

Desired Output:

#> array([[0, 1, 2, 3, 4],
#>        [5, 6, 7, 8, 9]])
Copy the code
arr = np.arange(10) arr.reshape(2, -1) # Setting to -1 automatically decides the number of cols #> array([[0, 1, 2, 3, 4], #> [5, 6, 7, 8, 9]]Copy the code

8. How to stack two arrays vertically?

Difficulty Level: L2

Q. Stack arrays a and b vertically

Input

a = np.arange(10).reshape(2,-1)
b = np.repeat(1, 10).reshape(2,-1)
Copy the code

Desired Output:

#> array([[0, 1, 2, 3, 4],
#>        [5, 6, 7, 8, 9],
#>        [1, 1, 1, 1, 1],
#>        [1, 1, 1, 1, 1]])`
Copy the code
a = np.arange(10).reshape(2,-1)
b = np.repeat(1, 10).reshape(2,-1)

# Answers
# Method 1:
np.concatenate([a, b], axis=0)

# Method 2:
np.vstack([a, b])

# Method 3:
np.r_[a, b]
#> array([[0, 1, 2, 3, 4],
#>        [5, 6, 7, 8, 9],
#>        [1, 1, 1, 1, 1],
#>        [1, 1, 1, 1, 1]])
Copy the code

9. How to stack two arrays horizontally?

Difficulty Level: L2

Q. Stack the arrays a and b horizontally.

Input

a = np.arange(10).reshape(2,-1)

b = np.repeat(1, 10).reshape(2,-1)
Copy the code

Desired Output:

#> array([[0, 1, 2, 3, 4, 1, 1, 1, 1, 1],
#>        [5, 6, 7, 8, 9, 1, 1, 1, 1, 1]])
Copy the code
a = np.arange(10).reshape(2,-1)
b = np.repeat(1, 10).reshape(2,-1)

# Answers
# Method 1:
np.concatenate([a, b], axis=1)

# Method 2:
np.hstack([a, b])

# Method 3:
np.c_[a, b]
#> array([[0, 1, 2, 3, 4, 1, 1, 1, 1, 1],
#>        [5, 6, 7, 8, 9, 1, 1, 1, 1, 1]])
Copy the code

10. How to generate custom sequences in numpy without hardcoding?

Difficulty Level: L2

Q. Create the following pattern without hardcoding. Use only numpy functions and the below input array a.

Input:

A = np. Array ([1, 2, 3]) `Copy the code

Desired Output:

#> array([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3])
Copy the code
np.r_[np.repeat(a, 3), np.tile(a, 3)]
#> array([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3])
Copy the code

11. How to get the common items between two python numpy arrays?

Difficulty Level: L2

Q. Get the common items between a and b

Input:

A = np. Array (,2,3,2,3,4,3,4,5,6 [1]) b = np. Array (,2,10,2,7,4,9,4,9,8 [7])Copy the code

Desired Output:

array([2, 4])Copy the code
A = np. Array (,2,3,2,3,4,3,4,5,6 [1]) b = np. Array (,2,10,2,7,4,9,4,9,8 [7]) np. Intersect1d (a, b) # > array ([2, 4])Copy the code

12. How to remove from one array those items that exist in another?

Difficulty Level: L2

Q. From array a remove all items present in array b

Input:

A = np. Array ([1, 2, 3, 4, 5]) b = np. Array (,6,7,8,9 [5])Copy the code

Desired Output:

Array ([1, 2, 3, 4])Copy the code
A = np. Array ([1, 2, 3, 4, 5]) b = np. Array (,6,7,8,9 [5]) # From 'a'. Remove all of the 'b' np setdiff1d (a, b) # > array ([1, 2, 3, 4])Copy the code

13. How to get the positions where elements of two arrays match?

Difficulty Level: L2

Q. Get the positions where elements of a and b match

Input:

A = np. Array (,2,3,2,3,4,3,4,5,6 [1]) b = np. Array (,2,10,2,7,4,9,4,9,8 [7])Copy the code

Desired Output:

#> (array([1, 3, 5, 7]),)
Copy the code
A = np. Array (,2,3,2,3,4,3,4,5,6 [1]) b = np. Array (,2,10,2,7,4,9,4,9,8 [7]) np. Where # (a = = b) > (array ([1, 3, 5, 7)),)Copy the code

14. How to extract all numbers between a given range from a numpy array?

Difficulty Level: L2

Q. Get all items between 5 and 10 from a.

Input:

a = np.arange(15)Copy the code

Desired Output:

(array([ 5,  6,  7,  8,  9, 10]),)Copy the code
a = np.arange(15)

# Method 1
index = np.where((a >= 5) & (a <= 10))
a[index]

# Method 2:
index = np.where(np.logical_and(a>=5, a<=10))
a[index]
#> (array([ 5,  6,  7,  8,  9, 10]),)
Copy the code

# Method 3: (thanks loganzk!)

a[(a >= 5) & (a <= 10)] [/expand]

15. How to make a python function that handles scalars to work on numpy arrays?

Difficulty Level: L2

Q. Convert the function maxx that works on two scalars, to work on two arrays.

Input:

def maxx(x, y):
    """Get the maximum of two items"""
    if x >= y:
        return x
    else:
        return y


maxx(1, 5)
#> 5
Copy the code

Desired Output:

a = np.array([5, 7, 9, 8, 6, 4, 5]) b = np.array([6, 3, 4, 8, 9, 7, 1]) pair_max(a, b) #> array([ 6., 7., 9., 8., 9., 7, 5.)Copy the code
def maxx(x, y):
    """Get the maximum of two items"""
    if x >= y:
        return x
    else:
        return y

pair_max = np.vectorize(maxx, otypes=[float])

a = np.array([5, 7, 9, 8, 6, 4, 5])
b = np.array([6, 3, 4, 8, 9, 7, 1])

pair_max(a, b)
#> array([ 6.,  7.,  9.,  8.,  9.,  7.,  5.])
Copy the code

16. How to swap two columns in a 2d numpy array?

Difficulty Level: L2

Q. Swap columns 1 and 2 in the array arr.

Arr = np. Arange (9). Reshape arr (3, 3)Copy the code
# Input arr = np. Arange (9). Reshape arr # (3, 3) Solution arr [:,0,2 [1]] # > array ([[1, 0, 2), # > [4, 3, 5], # > [7, 6, 8]])Copy the code

17. How to swap two rows in a 2d numpy array?

Difficulty Level: L2

Q. Swap rows 1 and 2 in the array arr:

Arr = np. Arange (9). Reshape arr (3, 3)Copy the code
# Input arr = np. Arange (9). Reshape # (3, 3) Solution arr [,0,2 [1], :] # > array ([[3, 4, 5], # > [0, 1, 2], # > [6, 7, 8]])Copy the code

18. How to reverse the rows of a 2D array?

Difficulty Level: L2

Q. Reverse the rows of a 2D array arr.

# Input arr = np.arange(9). 0Copy the code
0 # Input ARr = np.Arange (9). 0 # Solution ArR [::-1]Copy the code
array([[6, 7, 8],
       [3, 4, 5],
       [0, 1, 2]])
Copy the code

19. How to reverse the columns of a 2D array?

Difficulty Level: L2

Q. Reverse the columns of a 2D array arr.

# Input arr = np.arange(9). 0Copy the code
# Input arr = np. Arange (9). Reshape # (3, 3) Solution arr [: : : 1] # > array ([[2, 1, 0], # > [5, 4, 3], # > [8, 7, 6]])Copy the code

20. How to create a 2D array containing random floats between 5 and 10?

Difficulty Level: L2

Q. Create a 2D array of shape 5×3 to contain random decimal numbers between 5 and 10.

0 # Input arr = np.Arange (9). 0 # Solution Method 1: 0 rand_arr = np.random.randint(low=5, high=10, Size =(5,3)) + np.random. Random ((5,3)) # print(rand_arr) # Solution Method 2: Rand_arr = np. Random. Uniform (5, 10, Size =(5,3)) print(rand_arr) #> [[8.50061025 9.10531502 6.85867783] #> [9.76262069 9.87717411 7.13466701] #> [ 7.48966403 8.33409158 6.16808631] #> [7.75010551 9.94535696 5.27373226] #> [8.0850361 5.56165518 7.31244004]]Copy the code

21. How to print only 3 decimal places in python numpy array?

Difficulty Level: L1

Q. Print or show only 3 decimal places of the numpy array rand_arr.

Input:

Rand_arr = np. Random. The random () (5, 3)Copy the code
Random ((5,3)) # Create the random array rand_arr = np.random. Random ([5,3]) # Limit to 3 Decimal Places Np.set_printoptions (precision=3) rand_arr[:4] #> array([[0.443, 0.109, 0.97], #> [0.388, 0.447, 0.191], # > [0.891, 0.474, 0.212], # > [0.609, 0.518, 0.403]])Copy the code

22. How to pretty print a numpy array by suppressing the scientific notation (like 1e10)?

Difficulty Level: L1

Q. Pretty print rand_arr by suppressing the scientific notation (like 1e10)

Input:

Seed (100) rand_arr = np.random.random([3,3])/1e3 rand_arr #> array([[5.434049e-04, 2.783694E-04, 4.245176E-04], #> [8.447761E-04, 4.718856e-06, 1.215691E-04], #> [6.707491E-04, 8.258528e-04, 1.367066 e-04]])Copy the code

Desired Output:

#> Array ([[0.000543, 0.000278, 0.000425], #> [0.0045, 0.000005, 0.000122], #> [0.000671, 0.0026, 0.000137]])Copy the code
# Reset printoptions to default np.set_printoptions(suppress=False) # Create the random array np.random.seed(100) Rand_arr = np.random. Random ([3,3])/1e3 rand_arr #> array([[5.434049e-04, 2.783694e-04, 4.245176e-04], #> [8.447761E-04, 4.718856E-06, 1.215691E-04], #> [6.707491E-04, 8.258528E-04, 1.367066e-04]])Copy the code
Np. set_printoptions(suppress=True, precision=6) # is optional rand_arr #> array([[0.000543, 0.000278, 0.000425], #> [0.000.0045, 0.000005, 0.000122], #> [0.000671, 0.0026, 0.000137]])Copy the code

23. How to limit the number of items printed in output of numpy array?

Difficulty Level: L1

Q. Limit the number of items printed in python numpy array a to a maximum of 6 elements.

Input:

a = np.arange(15)
#> array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
Copy the code

Desired Output:

#> array([ 0,  1,  2, ..., 12, 13, 14])
Copy the code
np.set_printoptions(threshold=6)
a = np.arange(15)
a
#> array([ 0,  1,  2, ..., 12, 13, 14])
Copy the code

24. How to print the full numpy array without truncating

Difficulty Level: L1

Q. Print the full numpy array a without truncating.

Input:

np.set_printoptions(threshold=6)
a = np.arange(15)
a
#> array([ 0,  1,  2, ..., 12, 13, 14])
Copy the code

Desired Output:

a
#> array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
Copy the code
# Input np.set_printoptions(threshold=6) a = np.arange(15) # Solution np.set_printoptions(threshold=np.nan) a #> array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)Copy the code

25. How to import a dataset with numbers and texts keeping the text intact in python numpy?

Difficulty Level: L2

Q. Import the iris dataset keeping the text intact.

# Solution url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris = np.genfromtxt(url, delimiter=',', dtype='object') names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species') # Print the first 3 rows of iris [3] : # > array ([[b' 5.1 ', b '3.5', b '1.4', b '0.2', b 'iris - setosa], # > [b' 4.9 ', B '3.0', b '1.4', b '0.2', b 'Iris - setosa], # > [b' 4.7 ', b '3.2', b '1.3', b '0.2', b 'Iris - setosa]], dtype = object)Copy the code

Since we want to retain the species, a text field, I have set the dtype to object. Had I set dtype=None, a 1d array of tuples would have been returned.

26. How to extract a particular column from 1D array of tuples?

Difficulty Level: L2

Q. Extract the text column species from the 1D iris imported in previous question.

Input:

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_1d = np.genfromtxt(url, delimiter=',', dtype=None)
Copy the code
# Input:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_1d = np.genfromtxt(url, delimiter=',', dtype=None)
print(iris_1d.shape)

# Solution:
species = np.array([row[4] for row in iris_1d])
species[:5]
#> (150,)
#> array([b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa',
#>        b'Iris-setosa'],
#>       dtype='|S18')
Copy the code

27. How to convert a 1d array of tuples to a 2d numpy array?

Difficulty Level: L2

Q. Convert the 1D iris to 2D array iris_2d by omitting the species text field.

Input:

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_1d = np.genfromtxt(url, delimiter=',', dtype=None)
Copy the code
# Input: url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris_1d = np.genfromtxt(url, delimiter=',', dtype=None) # Solution: # Method 1: Convert each row to a list and get the first 4 items iris_2d = np.array([row.tolist()[:4] for row in iris_1d]) iris_2d[:4] # Alt Method 2: Import only the first 4 columns from source url iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', ,1,2,3 usecols = [0]) iris_2d [: 4] # > array ([[5.1, 3.5, 1.4, 0.2], # > [4.9, 3., 1.4, 0.2], # > [4.7, 3.2, 1.3, 0.2]. #> [4.6, 3.1, 1.5, 0.2]]Copy the code

28. How to compute the mean, median, standard deviation of a numpy array?

Difficulty: L1

Q. Find the mean, median, standard deviation of iris’s sepallength (1st column)

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
Copy the code
# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])

# Solution
mu, med, sd = np.mean(sepallength), np.median(sepallength), np.std(sepallength)
print(mu, med, sd)
#> 5.84333333333 5.8 0.825301291785
Copy the code

29. How to normalize an array so the values range exactly between 0 and 1?

Difficulty: L2

Q. Create a normalized form of iris‘s sepallength whose values range exactly between 0 and 1 so that the minimum has value 0 and maximum has value 1.

Input:

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])
Copy the code
# Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0]) # Solution Smax, Smin = sepallength.max(), Sepallength-min () S = (sepallength-smin)/(smax-smin) print(S) # 0.028 0.167 #> 0.306 0.139 0.139 0. 0.417 0.389 0.306 0.222 0.389 0.222 #> 0.306 0.222 0.139 0.194 0.194 0.25 0.25 0.111 #> 0.139 0.306 0.25 0.333 0.167 0.194 0.333 0.167 0.028 0.222 #> 0.194 0.056 0.028 0.194 0.222 0.139 0.222 0.083 0.278 0.194 #> 0.75 0.583 0.722 0.333 0.611 0.389 0.556 0.167 0.639 0.25 #> 0.194 0.444 0.472 0.5 0.361 0.667 0.361 0.417 0.528 0.361 #> 0.444 0.5 0.556 0.5 0.583 0.639 0.694 0.667 0.472 0.389 #> 0.333 0.333 0.417 0.472 0.306 0.472 0.667 0.556 0.361 0.333 #> 0.333 0.5 0.417 0.194 0.361 0.389 0.389 0.528 0.222 0.389 #> 0.556 0.417 0.778 0.556 0.611 0.917 0.167 0.833 0.667 0.806 #> 0.611 0.583 0.694 0.389 0.417 0.583 0.611 0.944 0.944 0.472 #> 0.722 0.361 0.944 0.556 0.667 0.806 0.528 0.5 0.583 0.806 #> 0.861 1. 0.583 0.556 0.5 0.944 0.556 0.583 0.472 0.722 #> 0.667 0.722 0.417 0.694 0.667 0.667 0.556 0.611 0.528 0.444]Copy the code

30. How to compute the softmax score?

Difficulty Level: L3

Q. Compute the softmax score of sepallength.

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])
Copy the code
# Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris = np.genfromtxt(url, delimiter=',', dtype='object') sepallength = np.array([float(row[0]) for row in iris]) # Solution def softmax(x): """Compute softmax values for each sets of scores in x. https://stackoverflow.com/questions/34968722/how-to-implement-the-softmax-function-in-python""" e_x = np.exp(x - Np.max (x)) return e_x/e_x.sum(axis=0) print(softmax(sepallength)) #> [0.002 0.002 0.000.000.002 0.003 0.000.002 0.002 0.002 0.002 #> 0.003 0.002 0.003 #> 0.004 0.002 0.002 #> 0.002 0.002 0.002 0.002 #> 0.002 0.002 0.002 0.002 0.003 0.003 0.002 #> 0.002 0.003 0.003 0.003 0.003 0.003 0.003 0.004 0.004 0.003 0.004 0.004 #> 0.005 0.006 0.006 0.008 0.01 0.012 0.011 0.005 0.004 #> 0.003 0.003 0.004 0.002 0.002 0.003 #> 0.003 0.003 0.004 0.004 #> 0.004 0.004 0.004 0.007 0.008 0.0012 0.004 0.004 0.008 0.009 0.003 0.03 0.005 #> 0.0013 0.004 0.007 0.007 0.006 0.008 0.018 #> 0.022 0.037 0.007 0.006 0.007 0.008 0.005 0.013 #> 0.011 0.0012 0.0011 0.0011 0.007 0.009 0.007 0.005]Copy the code

31. How to find the percentile scores of a numpy array?

Difficulty Level: L1

Q. Find the 5th and 95th percentile of iris’s sepallength

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])
Copy the code
# Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0]) # Solution np.percentile(sepallength, q=[5, #> array([4.6, 7.255])Copy the code

32. How to insert values at random positions in an array?

Difficulty Level: L2

Q. Insert np.nan values at 20 random positions in iris_2d dataset

# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='object')
Copy the code
# Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris_2d = np.genfromtxt(url, delimiter=',', dtype='object') # Method 1 i, j = np.where(iris_2d) # i, j contain the row numbers and column numbers of 600 elements of iris_x np.random.seed(100) iris_2d[np.random.choice((i),  20), np.random.choice((j), 20)] = np.nan # Method 2 np.random.seed(100) iris_2d[np.random.randint(150, size=20), np.random.randint(4, Size = 20)] = np. Nan # Print first 10 rows Print (iris_2d [10]) # > [[b '5.1' b '3.5' b '1.4' b '0.2' b 'Iris - setosa] # > [b' 4.9 ' B '3.0' b '1.4' b '0.2' b 'Iris - setosa] # > [b' 4.7 'b' 3.2 'b' 1.3 'b' 0.2 'b' Iris - setosa] # > [b '4.6' b '3.1' b '1.5' b '0.2' B 'Iris - setosa] # > [b' 5.0 'b' 3.6 'b' 1.4 'b' 0.2 'b' Iris - setosa] # > [b '5.4' b '3.9' b '1.7' b '0.4' b 'Iris - setosa] # > [b '4.6' b '3.4' b '1.4' b '0.3' b 'Iris - setosa] # > [b' 5.0 'b' 3.4 'b' 1.5 'b' 0.2 'b' Iris - setosa] # > [b '4.4' nan b '1.4' B '0.2' b 'Iris - setosa] # > [b' 4.9 'b' 3.1 'b' 1.5 'b' 0.1 'b' Iris - setosa]]Copy the code

33. How to find the position of missing values in numpy array?

Difficulty Level: L2

Q. Find the number and position of missing values in iris_2d‘s sepallength (1st column)

# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float')
iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan
Copy the code
# Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris_2d = np.genfromtxt(url, Delimiter =',', dtype='float', usecols=[0,1,2,3]) iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan # Solution print("Number of missing values: \n", np.isnan(iris_2d[:, 0]).sum()) print("Position of missing values: \n", np.where(np.isnan(iris_2d[:, 0]))) #> Number of missing values: #> 5 #> Position of missing values: #> (array([ 39, 88, 99, 130, 147]),)Copy the code

34. How to filter a numpy array based on two or more conditions?

Difficulty Level: L3

Q. Filter the rows of iris_2d that has petallength (3rd column) > 1.5 and sepallength (1st column) < 5.0

# Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris_2d = np.genfromtxt(url, = ', 'delimiter, dtype =' float ', usecols =,1,2,3 [0])Copy the code
# Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris_2d = np.genfromtxt(url, = ', 'delimiter, dtype =' float ', usecols = #,1,2,3 [0]) Solution condition = (iris_2d [:, 2) > 1.5) & (iris_2d [:, 0] < 5.0) iris_2d [condition] # > array ([[4.8, 3.4, 1.6, 0.2], # > [4.8, 3.4, 1.9, 0.2], # > [4.7, 3.2, 1.6, 0.2]. # > [4.8, 3.1, 1.6, 0.2], # > [4.9, 2.4, 3.3, 1], # > [4.9, 2.5, 4.5, 1.7]])Copy the code

35. How to drop rows that contain a missing value from a numpy array?

Difficulty Level: L3:

Q. Select the rows of iris_2d that does not have any nan value.

# Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris_2d = np.genfromtxt(url, = ', 'delimiter, dtype =' float ', usecols =,1,2,3 [0])Copy the code
# Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris_2d = np.genfromtxt(url, Delimiter =',', dtype='float', usecols=[0,1,2,3]) iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan # Solution # No direct numpy function for this. any_nan_in_row = np.array([~np.any(np.isnan(row)) for The row in iris_2d]) iris_2d [any_nan_in_row] [5] # > array ([[4.9, 3., 1.4, 0.2], # > [4.7, 3.2, 1.3, 0.2], # > [4.6, 3.1, # > 1.5, 0.2], [5., 3.6, 1.4, 0.2], # > [5.4, 3.9, 1.7, 0.4]])Copy the code

36. How to find the correlation between two columns of a numpy array?

Difficulty Level: L2

Q. Find the correlation between SepalLength(1st column) and PetalLength(3rd column) in iris_2d

# Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris_2d = np.genfromtxt(url, = ', 'delimiter, dtype =' float ', usecols =,1,2,3 [0])Copy the code
# Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris = np.genfromtxt(url, = ', 'delimiter, dtype =' float ', usecols =,1,2,3 [0]) # 1 np Solution. Corrcoef (iris [0] :,, iris [:, 2]) [0, 1] # Solution 2 from scipy.stats.stats import pearsonr corr, p_value = pearsonr(iris[:, 0], iris[:, 2]) print(corr) # Correlation coef indicates the degree of linear relationship between two numeric variables. # It can range between -1 to +1. # The p-value roughly indicates the probability of an uncorrelated system producing # datasets That have a correlation at least as extreme as the one computed. # the lower the p-value (<0.01), It is not an indicator of the strength. #> 0.871754157305Copy the code

37. How to find if a given array has any null values?

Difficulty Level: L2

Q. Find out if iris_2d has any missing values.

# Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris_2d = np.genfromtxt(url, = ', 'delimiter, dtype =' float ', usecols =,1,2,3 [0])Copy the code
# Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris_2d = np.genfromtxt(url, = ', 'delimiter, dtype =' float ', usecols = np,1,2,3 [0]). The isnan (iris_2d.) any () # > FalseCopy the code

38. How to replace all missing values with 0 in a numpy array?

Difficulty Level: L2

Q. Replace all ccurrences of nan with 0 in numpy array

# Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris_2d = np.genfromtxt(url, Delimiter =',', dtype='float', usecols=[0,1,2,3]) iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nanCopy the code
# Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris_2d = np.genfromtxt(url, Delimiter =',', dtype='float', usecols=[0,1,2,3]) iris_2d[np.random.randint(150, size=20), np.random.randint(4, Size = 20)] = np. Nan # Solution iris_2d [np. Isnan (iris_2d)] = 0 iris_2d [: 4] # > array ([[5.1, 3.5, 1.4, 0.], # > [4.9, 3., 1.4, 0.2], # > [4.7, 3.2, 1.3, 0.2], # > [4.6, 3.1, 1.5, 0.2]])Copy the code

39. How to find the count of unique values in a numpy array?

Difficulty Level: L2

Q. Find the unique values and the count of unique values in iris’s species

# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')
Copy the code
# Import iris keeping the text column intact
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

# Solution
# Extract the species column as an array
species = np.array([row.tolist()[4] for row in iris])

# Get the unique values and the counts
np.unique(species, return_counts=True)
#> (array([b'Iris-setosa', b'Iris-versicolor', b'Iris-virginica'],
#>        dtype='|S15'), array([50, 50, 50]))
Copy the code

40. How to convert a numeric to a categorical (text) array?

Difficulty Level: L2

Q. Bin the petal length (3rd) column of iris_2d to form a text array, such that if petal length is:

  • Less than 3 — > ‘small’
  • 3-5 – > ‘medium’
  • “> = 5 – >” large”
# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')
Copy the code
# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

# Bin petallength 
petal_length_bin = np.digitize(iris[:, 2].astype('float'), [0, 3, 5, 10])

# Map it to respective category
label_map = {1: 'small', 2: 'medium', 3: 'large', 4: np.nan}
petal_length_cat = [label_map[x] for x in petal_length_bin]

# View
petal_length_cat[:4]
<#> ['small', 'small', 'small', 'small']
Copy the code

41. How to create a new column from existing columns of a numpy array?

Difficulty Level: L2

Q. Create a new column for volume in iris_2d, where volume is (pi x petallength x sepal_length^2)/3

# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')
Copy the code
# Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris_2d = np.genfromtxt(url, delimiter=',', dtype='object') # Solution # Compute volume sepallength = iris_2d[:, 0].astype('float') petallength = iris_2d[:, 2].astype('float') volume = (np.pi * petallength * (sepallength**2))/3 # Introduce new dimension to match iris_2d's volume = volume[:, np.newaxis] # Add the new column out = np.hstack([iris_2d, Volume]) # # the View out [: 4] > array ([[b '5.1', b '3.5', b '1.4', b '0.2', b 'Iris - setosa, 38.13265162927291], # > [b' 4.9 ', B '3.0', b '1.4', b '0.2', b 'Iris - setosa, # > 35.200498485922445], [b' 4.7 ', b '3.2', b '1.3', b '0.2', b 'Iris - setosa', [b], 30.0723720777127 # > '4.6', b '3.1', b '1.5', b '0.2', b 'Iris - setosa, 33.238050274980004]], dtype = object)Copy the code

42. How to do probabilistic sampling in numpy?

Difficulty Level: L3

1. Crystal sample Iris’s species such that setose is twice the number of versicolor and virginica

# Import iris keeping the text column intact
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
Copy the code
# Import iris keeping the text column intact url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris = np.genfromtxt(url, delimiter=',', dtype='object') # Solution # Get the species column species = iris[:, 4] # Approach 1: Generate Probablistically np.random.seed(100) a = np.array(['Iris-setosa', 'Iris-versicolor', 'iris-virginica ']) species_out = Np.random. Choice (a, 150, p=[0.5, 0.25, 0.25]) # Approach 2: Seed (100) probs = NP.r_ [NP.linspace (0, 0.500, num=50), NP.linspace (0.501, .750, num=50), np.linspace(.751, 1.0, num=50)] index = np.searchsorted(probs, np.random.random(150)) species_out = species[index] print(np.unique(species_out, return_counts=True)) #> (array([b'Iris-setosa', b'Iris-versicolor', b'Iris-virginica'], dtype=object), array([77, 37, 36]))Copy the code

Approach 2 is preferred because it creates an index variable that can be used to sample 2d tabular data.

43. How to get the second largest value of an array when grouped by another array?

Difficulty Level: L2

Q. What is the value of second longest petallength of species setosa

# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')
Copy the code
# Import iris keeping the text column intact url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris = np.genfromtxt(url, delimiter=',', dtype='object') # Solution # Get the species and petal length columns petal_len_setosa = iris[iris[:, 4] == b' iris-setosa ', [2]. Astype ('float') # Get the second last value np.unique(np.sort(petal_len_setosa))[-2] #> 1.7Copy the code

44. How to sort a 2D array by a column

Difficulty Level: L2

Q. Sort the iris dataset based on sepallength column.

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')
Copy the code
# Sort by column position 0: SepalLength print (iris [iris [: 0] argsort (a)] [20]) # > [[b '4.3' b '3.0' b '1.1' b '0.1' b 'iris - setosa] # > [b' 4.4 'b' 3.2 ' B '1.3' b '0.2' b 'Iris - setosa] # > [b' 4.4 'b' 3.0 'b' 1.3 'b' 0.2 'b' Iris - setosa] # > [b '4.4' b '2.9' b '1.4' b '0.2' B 'Iris - setosa] # > [b' 4.5 'b' 2.3 'b' 1.3 'b' 0.3 'b' Iris - setosa] # > [b '4.6' b '3.6' b '1.0' b '0.2' b 'Iris - setosa] # > [b '4.6' b '3.1' b '1.5' b '0.2' b 'Iris - setosa] # > [b' 4.6 'b' 3.4 'b' 1.4 'b' 0.3 'b' Iris - setosa] # > [b '4.6' b '3.2' b '1.4' B '0.2' b 'Iris - setosa] # > [b' 4.7 'b' 3.2 'b' 1.3 'b' 0.2 'b' Iris - setosa] # > [b '4.7' b '3.2' b '1.6' b '0.2' b 'Iris - setosa] # > [b '4.8' b '3.0' b '1.4' b '0.1' b 'Iris - setosa] # > [b' 4.8 'b' 3.0 'b' 1.4 'b' 0.3 'b' Iris - setosa] # > [b '4.8' b '3.4' b '1.9' B '0.2' b 'Iris - setosa] # > [b' 4.8 'b' 3.4 'b' 1.6 'b' 0.2 'b' Iris - setosa] # > [b '4.8' b '3.1' b '1.6' b '0.2' b 'Iris - setosa] # > [b '4.9' b '2.4' b '3.3' b '1.0' b 'Iris - versicolor'] # > [b '4.9' b '2.5' b '4.5' b '1.7' b 'Iris - virginica] # > [b' 4.9 'b' 3.1 ' B '1.5' b '0.1' b 'Iris - setosa] # > [b' 4.9 'b' 3.1 'b' 1.5 'b' 0.1 'b' Iris - setosa]]Copy the code

45. How to find the most frequent value in a numpy array?

Difficulty Level: L1

Q. Find the most frequent value of petal length (3rd column) in iris dataset.

Input:

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')
Copy the code
# Input: url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris = np.genfromtxt(url, delimiter=',', dtype='object') # Solution: (Iris [:, 3], return_counts=True) print(Iris [:, 3]) #> B 'Copy the code

46. How to find the position of the first occurrence of a value greater than a given value?

Difficulty Level: L2

Q. Find the position of the first occurrence of a value greater than 1.0 in petalwidth 4th column of iris dataset.

# Input:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
Copy the code
# Input: url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris = np.genfromtxt(url, Delimiter =',', dtype='object') # Solution: print(np.argmax(iris[:, 3].astype('float') > 1.0)) #> 50Copy the code

47. How to replace all values greater than a given value to a given cutoff?

Difficulty Level: L2

Q. From the array a, replace all values greater than 30 to 30 and less than 10 to 10.

Input:

Np np. Random. Seed (100). The random. Uniform (1, 50, 20)Copy the code
# Input np.set_printoptions(precision=2) NP.random. Seed (100) a = NP.random. Uniform (1,50, 20) # Solution 1: Using np.clip np.clip(a, a_min=10, a_max=30) # Solution 2: Using np.where print(np.where(a < 10, 10, np.where(a > 30, 30, A))) #> [27.63 14.64 21.8 30.10.10.10.30.30.30.29.18 30. #> 11.25 10.08 10.11.77 30.30.30.14.43]Copy the code

48. How to get the positions of top n values from a numpy array?

Difficulty Level: L2

Q. Get the positions of top 5 maximum values in a given array a.

Np.random. Seed (100) a = np.random. Uniform (1,50, 20)Copy the code
# Input np. Random. Seed (100) a = np. Random. Uniform (1,50, 20) # Solution: Print (a [a.a rgsort (a)] [- 5:]) # > [41. 41.47 42.39 44.67 48.95] # Method 2: print (np) sort (a) / - 5:) # Mthod 3: np.partition(a, kth=-5)[-5:]Copy the code

49. How to compute the row wise counts of all possible values in an array?

Difficulty Level: L4

Q. Compute the counts of unique values row-wise.

Input:

Np.random. Seed (100) ARR = NP.random. Randint (1,11,size=(6, 10)) ARR > array([[9, 9, 4, 8, 8, 1, 5, 3, 6, 3], > [3, 3, 2, 1, 9, 5, 1, 10, 7, 3], > [5, 2, 6, 4, 5, 5, 4, 8, 2, 2], > [8, 8, 1, 3, 10, 10, 4, 3, 6, 9], > [2, 1, 8, 7, 3, 1, > [9, 2, 6, 5, 3, 9, 4, 6, 1, 10]]Copy the code

Desired Output:

> [[1, 0, 2, 1, 1, 1, 0, 2, 2, 0],
>  [2, 1, 3, 0, 1, 0, 1, 0, 1, 1],
>  [0, 3, 0, 2, 3, 1, 0, 1, 0, 0],
>  [1, 0, 2, 1, 0, 1, 0, 2, 1, 2],
>  [2, 2, 2, 0, 0, 1, 1, 1, 1, 0],
>  [1, 1, 1, 1, 1, 2, 0, 0, 2, 1]]
Copy the code

Output contains 10 columns representing numbers from 1 to 10. The values are the counts of the numbers in the respective rows.

For example, Cell(0,2) has the value 2, which means, the number 3 occurs exactly 2 times in the 1st row.

# Input: Np.random. Seed (100) arr = NP.random. Randint (1,11,size=(6, 10)) ARR #> array([[9, 9, 4, 8, 8, 1, 5, 3, 6, 3], #> [3, 3, 2, 1, 9, 5, 1, 10, 7, 3], # > [5, 2, 6, 4, 5, 5, 4, 8, 2, 2], # > [8, 8, 1, 3, 10, 10, 4, 3, 6, 9], # > [2, 1, 8, 7, 3, 1, 9, 3, 6, 2), # > [9, 2, 6, 5, 3, 9, 4, 6, 1, 10]])Copy the code
# Solution def counts_of_all_values_rowwise(arr2d): # Unique values and its counts row wise num_counts_array = [np.unique(row, return_counts=True) for row in arr2d] # Counts of all values row wise return([[int(b[a==i]) if i in a else 0 for i in np.unique(arr2d)] for a, B in num_counts_array]) # Print Print (np.arange(1,11)) #> [1, 2, 3, 4] #> [[1, 0, 2, 1, 1, 1, 0, 2, 2, 0], # > [2, 1, 3, 0, 1, 0, 1, 0, 1, 1], # > [0, 3, 0, 2, 3, 1, 0, 1, 0, 0], # > [1, 0, 2, 1, 0, 1, 0, 2, 1, 2], # > [2, 2, 2, 0, 0, 1, 1, 1, 1, 0], # > [1, 1, 1, 1, 1, 2, 0, 0, 2, 1]]Copy the code
# Example 2: arr = np.array([np.array(list('bill clinton')), np.array(list('narendramodi')), np.array(list('jjayalalitha'))]) print(np.unique(arr)) counts_of_all_values_rowwise(arr) #> [' ' 'a' 'b' 'c' 'd' 'e' 'h'  'i' 'j' 'l' 'm' 'n' 'o' 'r' 't' 'y'] #> [[1, 0, 1, 1, 0, 0, 0, 2, 0, 3, 0, 2, 1, 0, 1, 0], #> [0, 2, 0, 0, 2, 1, 0, 1, 0, 0, 1, 2, 1, 2, 0, 0], # > [0, 4, 0, 0, 0, 0, 1, 1, 2, 2, 0, 0, 0, 0, 1, 1]]Copy the code

50. How to convert an array of arrays into a flat 1d array?

Difficulty Level: 2

Q. Convert array_of_arrays into a flat linear 1d array.

Input:

# Input: Arr1 = np.arange(3) arr2 = np.arange(3) arr3 = np.arange(7,10) array_of_arrays = np.array([arr1, arr2, arr3]) array_of_arrays #> array([array([0, 1, 2]), array([3, 4, 5, 6]), array([7, 8, 9])], dtype=object)Copy the code

Desired Output:

#> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Copy the code
# Input: Arr1 = np.arange(3) arr2 = np.arange(3) arr3 = np.arange(7,10) array_of_arrays = np.array([arr1, arr2, arr3]) print('array_of_arrays: ', array_of_arrays) # Solution 1 arr_2d = np.array([a for arr in array_of_arrays for a in arr]) # Solution 2: arr_2d = np.concatenate(array_of_arrays) print(arr_2d) #> array_of_arrays: [array([0, 1, 2]) array([3, 4, 5, 6]) array([7, 8, 9])] #> [0 1 2 3 4 5 6 7 8 9]Copy the code

51. How to generate one-hot encodings for an array in numpy?

Difficulty Level L4

Q. Compute the one-hot encodings (dummy binary variables for each unique value in the array)

Input:

Np.random. Seed (101) arr = NP.random. Randint (1,4, size=6) ARR #> array([2, 3, 2, 2, 2, 1])Copy the code

Output:

#> array([[ 0.,  1.,  0.],
#>        [ 0.,  0.,  1.],
#>        [ 0.,  1.,  0.],
#>        [ 0.,  1.,  0.],
#>        [ 0.,  1.,  0.],
#>        [ 1.,  0.,  0.]])
Copy the code
# Input: np.random. Seed (101) ARR = NP.random. Randint (1,4, size=6) ARR #> array([2, 3, 2, 2, 2, 1]) # Solution: # Input: NP.random. Seed (101) arR = NP.random. def one_hot_encodings(arr): uniqs = np.unique(arr) out = np.zeros((arr.shape[0], uniqs.shape[0])) for i, k in enumerate(arr): out[i, k-1] = 1 return out one_hot_encodings(arr) #> array([[ 0., 1., 0.], #> [ 0., 0., 1.], #> [ 0., 1., 0.], #> [ 0., 1., 0.], #> [ 0., 1., 0.], #> [ 1., 0., 0.]]) # Method 2: (arr[:, None] == np.unique(arr)).view(np.int8)Copy the code

52. How to create row numbers grouped by a categorical variable?

Difficulty Level: L3

Q. Create row numbers grouped by a categorical variable. Use the following sample from iris species as input.

Input:

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
species = np.genfromtxt(url, delimiter=',', dtype='str', usecols=4)
species_small = np.sort(np.random.choice(species, size=20))
species_small
#> array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
#>        'Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor',
#>        'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
#>        'Iris-versicolor', 'Iris-virginica', 'Iris-virginica',
#>        'Iris-virginica', 'Iris-virginica', 'Iris-virginica',
#>        'Iris-virginica', 'Iris-virginica', 'Iris-virginica'],
#>       dtype='<U15')
Copy the code

Desired Output:

# > [0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 6, 7]Copy the code
# Input:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
species = np.genfromtxt(url, delimiter=',', dtype='str', usecols=4)
np.random.seed(100)
species_small = np.sort(np.random.choice(species, size=20))
species_small
#> array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
#>        'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor',
#>        'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
#>        'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
#>        'Iris-versicolor', 'Iris-virginica', 'Iris-virginica',
#>        'Iris-virginica', 'Iris-virginica', 'Iris-virginica',
#>        'Iris-virginica'],
#>       dtype='<U15')
Copy the code
print([i for val in np.unique(species_small) for i, grp in enumerate(species_small[species_small==val])])
Copy the code
[0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 1, 2, 3, 4, 5]
Copy the code

53. How to create groud ids based on a given categorical variable?

Difficulty Level: L4

Q. Create group ids based on a given categorical variable. Use the following sample from iris species as input.

Input:

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
species = np.genfromtxt(url, delimiter=',', dtype='str', usecols=4)
species_small = np.sort(np.random.choice(species, size=20))
species_small
#> array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
#>        'Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor',
#>        'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
#>        'Iris-versicolor', 'Iris-virginica', 'Iris-virginica',
#>        'Iris-virginica', 'Iris-virginica', 'Iris-virginica',
#>        'Iris-virginica', 'Iris-virginica', 'Iris-virginica'],
#>       dtype='<U15')
Copy the code

Desired Output:

# > [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]Copy the code
# Input:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
species = np.genfromtxt(url, delimiter=',', dtype='str', usecols=4)
np.random.seed(100)
species_small = np.sort(np.random.choice(species, size=20))
species_small
#> array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
#>        'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor',
#>        'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
#>        'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
#>        'Iris-versicolor', 'Iris-virginica', 'Iris-virginica',
#>        'Iris-virginica', 'Iris-virginica', 'Iris-virginica',
#>        'Iris-virginica'],
#>       dtype='<U15')
Copy the code
# Solution: output = [np.argwhere(np.unique(species_small) == s).tolist()[0][0] for val in np.unique(species_small) for s in species_small[species_small==val]] # Solution: For Loop version output = [] uniqs = np.unique(species_small) for val in uniqs: # uniq values in group for s in species_small[species_small==val]: # each element in group groupid = np.argwhere(uniqs == s).tolist()[0][0] # groupid output.append(groupid) print(output) # > [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]Copy the code

54. How to rank items in an array using numpy?

Difficulty Level: L2

Q. Create the ranks for the given numeric array a.

Input:

np.random.seed(10)
a = np.random.randint(20, size=10)
print(a)
#> [ 9  4 15  0 17 16 17  8  9  0]
Copy the code

Desired output:

[4 2 6 0 8 7 9 3 5 1]Copy the code
np.random.seed(10)
a = np.random.randint(20, size=10)
print('Array: ', a)

# Solution
print(a.argsort().argsort())
print('Array: ', a)
#> Array:  [ 9  4 15  0 17 16 17  8  9  0]
#> [4 2 6 0 8 7 9 3 5 1]
#> Array:  [ 9  4 15  0 17 16 17  8  9  0]
Copy the code

55. How to rank items in a multidimensional array using numpy?

Difficulty Level: L3

Q. Create a rank array of the same shape as a given numeric array a.

Input:

Np.random. Seed (10) a = np.random. Randint (20, size=[2,5]) print(a) #> [[9 4 15 0 17] #> [16 17 8 9 0]]Copy the code

Desired output:

#> [[4 2 6 0 8]
#>  [7 9 3 5 1]]
Copy the code
# Input: np.random.seed(10) a = np.random.randint(20, Size = (2, 5)) print # Solution (a) print (a.r avel () argsort () argsort (). Reshape (a.s hape) # > [17] [9 April 15 0 # > [16 17 August 9 0] #> [[4 2 6 0 8] #> [7 9 3 5 1]]Copy the code

56. How to find the maximum value in each row of a numpy array 2d?

DifficultyLevel: L2

Q. Compute the maximum for each row in the given array.

Np, the random seed (100) a = np. Random. Randint (1, 10, [5]) a # > array ([[9, 9, 4], # > [8, 8, 1], # > [5, 3, 6], # > [3, 3, 3]. # > [2, 1, 9]])Copy the code
# Input np.random. Seed (100) a = np.random. Randint (1,10, [5,3]) a # Solution 1 np.amax(a, axis=1) # Solution 2 np.apply_along_axis(np.max, arr=a, axis=1) #> array([9, 8, 6, 3, 9])Copy the code

57. How to compute the min-by-max for each row for a numpy array 2d?

DifficultyLevel: L3

Q. Compute the min-by-max for each row for given 2d numpy array.

Np, the random seed (100) a = np. Random. Randint (1, 10, [5]) a # > array ([[9, 9, 4], # > [8, 8, 1], # > [5, 3, 6], # > [3, 3, 3]. # > [2, 1, 9]])Copy the code
# Input np.random. Seed (100) a = np.random. Randint (1,10, [5,3]) a # Solution np.apply_along_axis(lambda x: Np. Min (x)/np. Max (x), arr = a, axis = 1) # > array ([0.44444444, 0.125, 0.5, 1, 0.11111111])Copy the code

58. How to find the duplicate records in a numpy array?

Difficulty Level: L3

Q. Find the duplicate entries (2nd occurrence onwards) in the given numpy array and mark them as True. First time occurrences should be False.

# Input
np.random.seed(100)
a = np.random.randint(0, 5, 10)
print('Array: ', a)
#> Array: [0 0 3 0 2 4 2 2 2 2]Copy the code

Desired Output:

#> [False  True False  True False False  True  True  True  True]Copy the code
# Input np.random.seed(100) a = np.random.randint(0, 5, 10) ## Solution # There is no direct function to do this as of 1.13.3 # Create an all True array out = np.full(a.shape[0], True) # Find the index positions of unique elements unique_positions = np.unique(a, return_index=True)[1] # Mark those positions as False out[unique_positions] = False print(out) #> [False True False True  False False True True True True]Copy the code

59. How to find the grouped mean in numpy?

Difficulty Level L3

Q. Find the mean of a numeric column grouped by a categorical column in a 2D numpy array

Input:

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')
Copy the code

Desired Solution:

# > [[b 'Iris - setosa, 3.418], # > [b' Iris - versicolor, 2.770], # > [b 'Iris - virginica, 2.974]]Copy the code
# Input url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' iris = np.genfromtxt(url, delimiter=',', dtype='object') names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species') # Solution # No direct way to implement this. Just a version of a workaround. numeric_column = iris[:, 1].astype('float') # sepalwidth grouping_column = iris[:, 4] # species # List comprehension version [[group_val, numeric_column[grouping_column==group_val].mean()] for group_val in np.unique(grouping_column)] # For Loop version output = [] for group_val in np.unique(grouping_column): Append ([group_val, numeric_column[grouping_column==group_val].mean()]) output #> [[b' iris-setosa ', 3.418], # > [b 'Iris - versicolor, 2.770], # > [b' Iris - virginica, 2.974]]Copy the code

60. How to convert a PIL image to numpy array?

Difficulty Level: L3

Q. Import the image from the following URL and convert it to a numpy array.

URL = “https://upload.wikimedia.org/wikipedia/commons/8/8b/Denali_Mt_McKinley.jpg”

from io import BytesIO
from PIL import Image
import PIL, requests

# Import image from URL
URL = 'https://upload.wikimedia.org/wikipedia/commons/8/8b/Denali_Mt_McKinley.jpg'
response = requests.get(URL)

# Read it as Image
I = Image.open(BytesIO(response.content))

# Optionally resize
I = I.resize([150,150])

# Convert to numpy array
arr = np.asarray(I)

# Optionaly Convert it back to an image and show
im = PIL.Image.fromarray(np.uint8(arr))
Image.Image.show(im)
Copy the code

61. How to drop all missing values from a numpy array?

Difficulty Level: L2

Q. Drop all nan values from a 1D numpy array

Input:

Np. Array ([1, 2, 3, np. Nan, 5, 6, np, nan])

Desired Output:

array([ 1.,  2.,  3.,  5.,  6.,  7.])Copy the code
A = np. Array ([1, 2, 3, np. Nan, 5, 6, np, nan]] a [~ np. Isnan (a)] # > array ([1, 2, 3., 5., 6, 7])Copy the code

62. How to compute the euclidean distance between two arrays?

Difficulty Level: L3

Q. Compute the euclidean distance between two arrays a and b.

Input:

A = np. Array ([1, 2, 3, 4, 5]) b = np. Array (,5,6,7,8 [4])Copy the code
# Input a = np.array([1,2,3,4,5]) b = np.array([4,5,6,7,8]) # Solution dist = np.linalg.norm(a-b) #> 6.7082039324993694Copy the code

63. How to find all the local maxima (or peaks) in a 1d array?

Difficulty Level: L4

Q. Find all the peaks in a 1D numpy array a. Peaks are points surrounded by smaller values on both sides.

Input:

a = np.array([1, 3, 7, 1, 2, 6, 0, 1])

Desired Output:

#> array([2, 5])Copy the code

where, 2 and 5 are the positions of peak values 7 and 6.

a = np.array([1, 3, 7, 1, 2, 6, 0, 1])
doublediff = np.diff(np.sign(np.diff(a)))
peak_locations = np.where(doublediff == -2)[0] + 1
peak_locations
#> array([2, 5])
Copy the code

64. How to subtract a 1d array from a 2d array, where each item of 1d array subtracts from respective row?

Difficulty Level: L2

Q. Subtract the 1d array b_1d from the 2d array a_2d, such that each item of b_1d subtracts from respective row of a_2d.

3 filling a_2d = np. Array ([[], [4,4,4], [5,5,5]]) b_1d = np, array (,1,1 [1]Copy the code

Desired Output:

#> [[2 2 2] #> [2 2 2]Copy the code
# Input a_2d = np. Array ([filling [3], [4,4,4], [5,5,5]]) b_1d = np, array ([1, 2, 3]) # Solution print (a_2d - b_1d [: None]) # > [[2 #> [2 2 2] #> [2 2 2]Copy the code

65. How to find the index of n ‘th repetition of an item in an array

Difficulty Level L2

Q. Find the index of 5th repetition of number 1 in x.

x = np.array([1, 2, 1, 1, 3, 4, 3, 1, 1, 2, 1, 1, 2])Copy the code
x = np.array([1, 2, 1, 1, 3, 4, 3, 1, 1, 2, 1, 1, 2])
n = 5

# Solution 1: List comprehension
[i for i, v in enumerate(x) if v == 1][n-1]

# Solution 2: Numpy version
np.where(x == 1)[0][n-1]
#> 8
Copy the code

66. How to convert numpy’s datetime64Object to a datetime ‘sdatetime object?

Difficulty Level: L2

Q. Convert numpy’s datetime64 object to datetime’s datetime object

# Input: a numpy datetime64 object
dt64 = np.datetime64('2018-02-25 22:10:10')Copy the code
# Input: a numpy datetime64 object
dt64 = np.datetime64('2018-02-25 22:10:10')

# Solution
from datetime import datetime
dt64.tolist()

# or

dt64.astype(datetime)
#> datetime.datetime(2018, 2, 25, 22, 10, 10)
Copy the code

67. How to compute the moving average of a numpy array?

Difficulty Level: L3

Q. Compute the moving average of window size 3, for the given 1D array.

Input:

np.random.seed(100)
Z = np.random.randint(10, size=10)
Copy the code
# Solution # Source: https://stackoverflow.com/questions/14313510/how-to-calculate-moving-average-using-numpy def moving_average(a, n=3) : ret = np.cumsum(a, dtype=float) ret[n:] = ret[n:] - ret[:-n] return ret[n - 1:] / n np.random.seed(100) Z = np.random.randint(10, size=10) print('array: ', Z) print('moving average: ', moving_average(Z, n=3).round(2)) #> array: [8 8 3 7 7 0 4 2 5 2] #> Moving Average: [6.33 6.5.67 4.67 3.67 2. 3.67 3]Copy the code

68. How to create a numpy array sequence given only the starting point, length and the step?

Difficulty Level: L2

Q. Create a numpy array of length 10, starting from 5 and has a step of 3 between consecutive numbers

length = 10
start = 5
step = 3

def seq(start, length, step):
    end = start + (step*length)
    return np.arange(start, end, step)

seq(start, length, step)
#> array([ 5,  8, 11, 14, 17, 20, 23, 26, 29, 32])
Copy the code

69. How to fill in missing dates in an irregular series of numpy dates?

Difficulty Level: L3

Q. Given an array of a non-continuous sequence of dates. Make it a continuous sequence of dates, by filling in the missing dates.

Input:

# Input dates = np.arange(np.datetime64('2018-02-01'), np.datetime64('2018-02-25'), 2) print(dates) #> ['2018-02-01' '2018-02-03' '2018-02-05' '2018-02-07' '2018-02-09' #> '2018-02-11' '2018-02-13' '2018-02-15', '2018-02-17' 2018-02-19 '# >' 2018-02-21 ', '2018-02-23')Copy the code
# Input dates = np.arange(np.datetime64('2018-02-01'), np.datetime64('2018-02-25'), 2) print(dates) # Solution --------------- filled_in = np.array([np.arange(date, (date+d)) for date, d in zip(dates, np.diff(dates))]).reshape(-1) # add the last day output = np.hstack([filled_in, dates[-1]]) output # For loop version ------- out = [] for date, d in zip(dates, np.diff(dates)): out.append(np.arange(date, (date+d))) filled_in = np.array(out).reshape(-1) # add the last day output = np.hstack([filled_in, dates[-1]]) output #> ['2018-02-01' '2018-02-03' '2018-02-05' '2018-02-07' '2018-02-09' #> '2018-02-11' '2018-02-13' '2018-02-15' '2018-02-17' '2018-02-19' #> '2018-02-21' '2018-02-23'] #> array(['2018-02-01', '2018-02-02', # > '2018-02-03', '2018-02-04', '2018-02-05', '2018-02-06', '2018-02-07', '2018-02-08', # > '2018-02-09', '2018-02-10', # > '2018-02-11', '2018-02-12', '2018-02-13', '2018-02-14', '2018-02-15', '2018-02-16', # > '2018-02-17', '2018-02-18', '2018-02-19', '2018-02-20', #> '2018-02-21', '2018-02-22', '2018-02-23'], dtype='datetime64[D]')Copy the code

70. How to create strides from a given 1D array?

Difficulty Level: L4

Q. From the given 1d array arr, generate a 2d matrix using strides, with a window length of 4 and strides of 2, ,1,2,3 like [[0], 5-tetrafluorobenzoic [2], [4, 7].]

Input:

arr = np.arange(15) 
arr
#> array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
Copy the code

Desired Output:

#> [0 12 3] #> [2 3 4 5] #> [4 5 6 7] #> [6 7 8 9] #> [8 9 10 11] #> [10 11 12 13]]Copy the code
def gen_strides(arr, stride_len=5, window_len=5):
    n_strides = ((a.size-window_len)//stride_len) + 1
    # return np.array([a[s:(s+window_len)] for s in np.arange(0, a.size, stride_len)[:n_strides]])
    return np.array([a[s:(s+window_len)] for s in np.arange(0, n_strides*stride_len, stride_len)])

print(gen_strides(np.arange(15), stride_len=2, window_len=4))
#> [[ 0  1  2  3]
#>  [ 2  3  4  5]
#>  [ 4  5  6  7]
#>  [ 6  7  8  9]
#>  [ 8  9 10 11]
#>  [10 11 12 13]]
Copy the code

# >