Data Mining Python Basics NumPy Basics

Preface

I'm back after two months. A lot happened in that time, and I even began to doubt my own persistence. But after a slump you have to push forward again, and life goes on, so I can only encourage myself with the line “The mighty pass is said to be hard as iron, yet now we stride across it from the start.” I'm going home to quarantine, so this is a good time to set myself a goal (a “flag”).

We’ve covered some of Python’s basic syntax, including its basic data structures, functions, file reading and writing, and object-oriented features. Next we’ll look at some common packages for data mining, of which NumPy is the most important base package for numerical computation.

ndarray

An ndarray is an efficient multidimensional array that provides convenient array-based arithmetic and flexible broadcasting. We can quickly generate a 2×3 array:

import numpy as np


data = np.random.randn(2, 3)
print(data)

The result is as follows (the values are random, so they differ on every run):

[[ 0.52828809  0.75873811 -0.81223681]
 [ 2.13722235  0.40123476 -0.07276397]]
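
To make "array-based arithmetic" and "broadcasting" concrete, here is a minimal sketch. It uses a fixed array instead of random values so the result is reproducible; the names doubled, row and shifted are only illustrative.

```python
import numpy as np

# A fixed 2x3 array, so the result is the same on every run.
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

# Arithmetic is applied element by element.
doubled = data * 2
summed = data + data

# Broadcasting: the 1-D array `row` is added to each row of `data`.
row = np.array([10.0, 20.0, 30.0])
shifted = data + row

print(doubled)
print(summed)
print(shifted)
```

Here doubled and summed hold the same values, and shifted is data with 10, 20 and 30 added to its first, second and third columns.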

An ndarray is a general-purpose multidimensional container whose elements must all be of the same type. We can check the size of each dimension through its shape attribute and its element type through its dtype attribute. For example:

import numpy as np

data = np.random.randn(2, 3)
print(data.shape)
print(data.dtype)

The results are as follows

(2, 3)
float64
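
As a small supplement, the sketch below shows the same attributes on an integer array, together with ndim (the number of axes) and astype, which returns a copy converted to another type. The variable names ints and floats are only illustrative.

```python
import numpy as np

ints = np.array([[1, 2, 3], [4, 5, 6]])
floats = ints.astype(np.float64)  # converted copy; ints is unchanged

print(ints.ndim)     # number of axes
print(ints.shape)    # size along each axis
print(ints.dtype)    # an integer type; the exact width depends on the platform
print(floats.dtype)
```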

Generate ndarray

The simplest way to create an ndarray is the array function, which takes any sequence-like object and produces a new NumPy array containing the passed data. For example:

import numpy as np

data1 = [1, 2, 3, 4]
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr1 = np.array(data1)
arr2 = np.array(data2)
arr1 = arr1 * 10
arr2 = arr2 + arr1
print(arr1)
print(arr2)

The results are as follows

[10 20 30 40]
[[11 22 33 44]
 [15 26 37 48]]

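We can see that the array function converts Python lists (including nested lists) to ndarrays, and that ndarray arithmetic works element by element, which eliminates a lot of for loops. Adding arr1 to arr2 also shows broadcasting: the one-dimensional arr1 is added to each row of arr2.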
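
To illustrate the point about for loops, the following sketch compares a plain Python list with an ndarray. The names values, scaled_list and scaled_arr are only illustrative.

```python
import numpy as np

values = [1, 2, 3, 4]

# Plain Python: each element has to be visited explicitly.
scaled_list = [v * 10 for v in values]

# NumPy: one expression, no explicit loop.
scaled_arr = np.array(values) * 10

# Caution: multiplying a list repeats it instead of scaling it.
repeated = values * 10

print(scaled_list)
print(scaled_arr)
print(len(repeated))
```

Note that the * operator means repetition for lists but element-wise multiplication for ndarrays, which is one reason to convert data to an ndarray before doing numerical work.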

We can also create all-zero arrays with zeros, all-one arrays with ones, and uninitialized arrays with empty. For example:

import numpy as np

arr1 = np.zeros(10)
arr2 = np.zeros((5, 2))
print(arr1)
print(arr2)

The results are as follows

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[[0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]]

Note that empty does not initialize the memory it allocates, so the returned array may contain arbitrary garbage values rather than zeros.

For example:

import numpy as np

arr1 = np.empty(10)
arr2 = np.empty((5, 2))
print(arr1)
print(arr2)

The results are as follows

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[[6.95006917e-310 1.29189234e-316]
 [5.39246171e-317 5.39246171e-317]
 [6.95006798e-310 ...]
 [5.39246171e-317 5.39246171e-317]
 [5.39247752e-317 6.95006795e-310]]
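
The text mentions ones without showing it, so here is a short sketch of ones, of choosing the element type with the dtype argument, and of a safe way to use empty: fill every element before reading it. The names int_zeros and buf are only illustrative.

```python
import numpy as np

ones = np.ones((2, 3))                    # 2x3 array filled with 1.0
int_zeros = np.zeros(4, dtype=np.int64)   # the element type can be chosen explicitly

# np.empty only allocates memory, so assign every element before reading.
buf = np.empty((2, 2))
buf[:] = 7.0

print(ones)
print(int_zeros)
print(buf)
```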

Basic indexing and slicing

Indexing and slicing one-dimensional arrays is simple and works much like slicing Python lists.

For example:

import numpy as np

arr1 = np.arange(10)
arr2 = arr1[5:8]
print(arr1)
print(arr2)

Note that an array slice is a view of the original array: the data is not copied, and any change made through the view is reflected in the original array. The reason is that ndarray is designed to handle large arrays, where copying the data on every slice would be expensive.

For example:

import numpy as np

arr1 = np.arange(10)
arr1[5:8] = 12
print(arr1)

The results are as follows

[ 0  1  2  3  4 12 12 12  8  9]
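
The view behaviour is easier to see when the slice is stored in its own variable. The sketch below also shows copy, which gives an independent array when shared data is not wanted. The names view and copied are only illustrative.

```python
import numpy as np

arr = np.arange(10)

view = arr[5:8]            # a view: shares memory with arr
view[0] = 100              # this also changes arr[5]

copied = arr[5:8].copy()   # an independent copy
copied[1] = -1             # arr is not affected

print(arr)
print(np.shares_memory(arr, view))
print(np.shares_memory(arr, copied))
```

After running this, arr[5] is 100 because it was written through the view, while arr[6] is still 6 because the write to copied went to separate memory.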

Slicing a multidimensional array works the same way as slicing a one-dimensional array, except that each element along the first axis is itself an array (one-dimensional or multidimensional) rather than a number.

For example:

import numpy as np

arr1 = np.random.randn(3, 3)
print(arr1)
print(arr1[:2])

The results are as follows

[[ 0.60673463 -0.84261761 -0.55674384]
 [ 1.49376061 -1.23850612 -0.10686775]
 [ 1.3516511  -0.65024839 -1.68451601]]
[[ 0.60673463 -0.84261761 -0.55674384]
 [ 1.49376061 -1.23850612 -0.10686775]]

We can treat the three rows [ 0.60673463 -0.84261761 -0.55674384], [ 1.49376061 -1.23850612 -0.10686775] and [ 1.3516511 -0.65024839 -1.68451601] as three elements; arr1[:2] takes the first two of them.

For multidimensional arrays we can also slice along several axes at once, giving one slice per axis.

For example:

import numpy as np

arr1 = np.random.randn(3, 3)
print(arr1)
print(arr1[1:, :2])

The results are as follows

[[ 1.51132511 -0.16890946 -0.78987301]
 [ 0.41426026 -0.09105493  1.44744887]
 [ 1.79046674  0.27690028  1.31201169]]
[[ 0.41426026 -0.09105493]
 [ 1.79046674  0.27690028]]
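
Because random values make the slices hard to follow, here is the same kind of indexing on a small fixed array. This is only a sketch; the names row, elem, block and col are illustrative.

```python
import numpy as np

arr = np.arange(9).reshape((3, 3))
# [[0 1 2]
#  [3 4 5]
#  [6 7 8]]

row = arr[1]          # second row: [3 4 5]
elem = arr[1, 2]      # single element, same as arr[1][2]: 5
block = arr[1:, :2]   # rows from index 1 on, first two columns: [[3 4] [6 7]]
col = arr[:, 0]       # first column as a 1-D array: [0 3 6]

print(row)
print(elem)
print(block)
print(col)
```

A bare colon, as in arr[:, 0], selects everything along that axis.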

We can also index an array with a Boolean array for more flexibility; the length of the Boolean array must match the length of the axis it indexes.

For example:

import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = np.random.randn(7, 4)
print(names == 'Bob')
print(data[names == 'Bob'])

The results are as follows

[ True False False  True False False False]
[[ 1.20875931  0.54870492 -0.45572233 -0.58897014]
 [-1.42004058 -0.81150623  1.03740228  0.91427144]]

From the above results we can see that indexing with a Boolean array returns the rows at the positions where the Boolean value is True.

We can also combine a Boolean array with a slice on another axis.

For example:

import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = np.random.randn(7, 4)
print(names == 'Bob')
print(data[names == 'Bob', :3])

The results are as follows

[ True False False  True False False False]
[[ 1.08094968 -0.29838004  0.80950847]
 [ 0.10917791  0.79569972  0.47027354]]

It’s important to note that the Python keywords and and or do not work with Boolean arrays; use & and | instead.
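
A minimal sketch of combining conditions follows. It uses a fixed data array so the selected rows are easy to identify; the names mask, selected, not_bob and both are only illustrative. Each comparison needs its own parentheses, because & and | bind more tightly than ==. Writing names == 'Bob' or names == 'Will' instead raises a ValueError, since the truth value of a whole array is ambiguous.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = np.arange(28).reshape((7, 4))   # row i starts with the value 4 * i

# Rows where the name is Bob OR Will.
mask = (names == 'Bob') | (names == 'Will')
selected = data[mask]

# Rows where the name is NOT Bob: ~ inverts a Boolean array.
not_bob = data[~(names == 'Bob')]

# AND of two conditions that can never both hold: all False.
both = (names == 'Bob') & (names == 'Will')

print(mask)
print(selected)
print(not_bob.shape)
print(both.any())
```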

Array transposition and axis swapping

Arrays can be transposed with the T attribute, and matrix products (inner products of rows and columns) can be computed with dot.

For example:

import numpy as np

arr = np.arange(15).reshape((3, 5))
print(arr)
print(arr.T)
print(np.dot(arr, arr.T))

The results are as follows

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]
[[ 0  5 10]
 [ 1  6 11]
 [ 2  7 12]
 [ 3  8 13]
 [ 4  9 14]]
[[ 30  80 130]
 [ 80 255 430]
 [130 430 730]]
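
As a supplement, the sketch below shows how the shapes combine in a matrix product and that the @ operator gives the same result as np.dot for two-dimensional arrays. The names gram_rows and gram_cols are only illustrative.

```python
import numpy as np

arr = np.arange(15).reshape((3, 5))

gram_rows = np.dot(arr, arr.T)   # (3, 5) times (5, 3) gives (3, 3)
gram_cols = arr.T @ arr          # (5, 3) times (3, 5) gives (5, 5)

print(gram_rows.shape)
print(gram_cols.shape)
print(np.array_equal(gram_rows, arr @ arr.T))

# The transpose is a view of the same data, not a copy.
print(np.shares_memory(arr, arr.T))
```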

For higher-dimensional arrays, the transpose method accepts a sequence of axis numbers that specifies the new order of the axes.

For example:

import numpy as np

arr = np.arange(16).reshape((2, 2, 4))
print(arr)
print(arr.transpose(1, 0, 2))

The results are as follows

[[[ 0  1  2  3]
  [ 4  5  6  7]]

 [[ 8  9 10 11]
  [12 13 14 15]]]
[[[ 0  1  2  3]
  [ 8  9 10 11]]

 [[ 4  5  6  7]
  [12 13 14 15]]]
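
The effect of the axis order is easier to check when the axes have different lengths. The sketch below, using an illustrative 2×3×4 array, compares transpose with swapaxes, which exchanges exactly two axes.

```python
import numpy as np

arr = np.arange(24).reshape((2, 3, 4))

# New axis order: old axis 1, then old axis 0, then old axis 2.
t = arr.transpose(1, 0, 2)

# swapaxes exchanges two axes; here it gives the same result.
s = arr.swapaxes(0, 1)

print(t.shape)
print(np.array_equal(t, s))

# The element at [i, j, k] in arr is found at [j, i, k] in t.
print(t[2, 1, 3] == arr[1, 2, 3])

# With no arguments, T reverses the order of all axes.
print(arr.T.shape)
```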

Finally

More confusion, less gain. For more content, follow the WeChat official account QStack: pursuing the purest technology and enjoying the joy of programming.
