Structured data types in Python NumPy

Preface

NumPy is the foundation of scientific computing in Python. As we learned earlier, a NumPy ndarray requires its elements to be homogeneous: every element has the same data type and occupies the same amount of memory, and the underlying storage is laid out in either C order (row-major) or Fortran order (column-major). This is what makes NumPy arrays faster and more memory-efficient than Python lists.

In the earlier posts on NumPy indexing and slicing and on NumPy advanced indexing, we saw that arrays support indexing just as Python lists do. NumPy arrays can be indexed with tuples of non-negative integers, Booleans, other arrays, or integers.

NumPy requires all elements of an array to share one data type, so how do we work with records that mix different types of data?

With Python's built-in types this is easy: a dictionary of key-value pairs handles the scenario above.

If NumPy could not handle this scenario at all, how could it be central to the scientific Python and PyData ecosystem?

In fact it can: NumPy has a concept called the structured array.

In this installment, we will learn about structured arrays in the NumPy library. Let's go~

1. Overview of Numpy structured arrays

What is a structured array?

A NumPy structured array is an ndarray whose elements are made up of named fields, and those fields are defined by its dtype.

In plain Python, records like the following are easy to define with dictionaries of key-value pairs, or with lists and tuples.

Python dict implementation:

stu1 = {"name": "Tom", "age": 10, "weight": 50}
stu2 = {"name": "Anne", "age": 12, "weight": 42}
stu_list = [stu1, stu2]

Python list + tuple implementation:

stu_list = [("Tom", 10, 50), ("Anne", 12, 42)]

In a NumPy structured array, the data type of each field is defined by the dtype, as follows:

>>> import numpy as np
>>> stu_list = np.array([("Tom", 10, 50), ("Anne", 12, 42)], dtype=[("name", "U10"), ("age", "i4"), ("weight", "i8")])
>>> stu_list
array([('Tom', 10, 50), ('Anne', 12, 42)],
      dtype=[('name', '<U10'), ('age', '<i4'), ('weight', '<i8')])

For any NumPy array, the dtype attribute tells us the data type of its elements.

In the example above, we define the name field as a Unicode string of up to 10 characters, the age field as int32, and the weight field as int64.
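
As a small illustration of what these field definitions give us, the sketch below rebuilds the same array and reads individual fields by name. The records are the ones from the example above; the mean calculation and the helper variable names are only illustrative.

```python
import numpy as np

# Same records as above: (name, age, weight)
stu_list = np.array(
    [("Tom", 10, 50), ("Anne", 12, 42)],
    dtype=[("name", "U10"), ("age", "i4"), ("weight", "i8")],
)

# The dtype lists the field names in order
print(stu_list.dtype.names)

# Indexing by field name gives a homogeneous array for that field
ages = stu_list["age"]
print(ages.dtype)          # int32
mean_age = ages.mean()     # (10 + 12) / 2

# Indexing by position gives one record, whose fields are also named
first = stu_list[0]
first_name = first["name"]
print(first_name, mean_age)
```

Each field behaves like an ordinary homogeneous array, which is how a structured array reconciles mixed-type records with NumPy's homogeneity requirement.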

The data types supported by NumPy correspond closely to those of the C language. Common data types and their built-in codes are listed below.

Data type	Built-in code	Meaning
int8	i1	Integer, 8 bits (-128 to 127)
int16	i2	Integer, 16 bits
int32	i4	Integer, 32 bits
int64	i8	Integer, 64 bits
float16	f2	Floating point, 16 bits
float32	f4	Floating point, 32 bits
float64	f8	Floating point, 64 bits
bool_	?	Boolean
Unicode	U	Unicode string
String	S	Byte string
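
To check how these codes behave, the short sketch below turns each code into a dtype and records its size in bytes. The list of codes and the length 10 used for the string types are illustrative choices. Note that the code for the Boolean type is `?`, while a bare `b` means a signed byte (int8).

```python
import numpy as np

# Type codes from the table; "U10" and "S10" use an illustrative length of 10
codes = ["i1", "i2", "i4", "i8", "f2", "f4", "f8", "?", "U10", "S10"]

sizes = {}
for code in codes:
    dt = np.dtype(code)
    sizes[code] = dt.itemsize      # size of one element in bytes
    print(code, dt.name, dt.itemsize)

# A bare "b" is a signed byte, not a Boolean
signed_byte = np.dtype("b")
boolean = np.dtype("?")
```

The digit in an integer or float code is the size in bytes, so `i4` is a 32-bit integer. For `U` the digit counts characters (4 bytes each), and for `S` it counts bytes.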

Structured array features
- Structured data types are designed to mimic C structs and share a similar memory layout
- Structured arrays are meant for low-level work: interfacing with C code and manipulating structured buffers
- They support nested data types, unions, and control over the memory layout
- Because of their C-struct-like memory layout, structured arrays can have poor cache behavior, so they are not well suited to manipulating tabular data
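
The C-interface use case in the list above can be sketched as follows. This is a minimal example under stated assumptions: the two-field record layout (a 32-bit id followed by a 64-bit value, packed without padding) is hypothetical, and Python's `struct` module stands in for a C program that produced the bytes.

```python
import struct
import numpy as np

# Hypothetical packed C record: int32 id followed by float64 value,
# little-endian, no padding between fields
record = struct.Struct("<id")
raw = record.pack(1, 2.5) + record.pack(2, 7.25)

# A structured dtype with the same field order, sizes, and byte order
rec_dtype = np.dtype([("id", "<i4"), ("value", "<f8")])

# Interpret the raw bytes as an array of records
records = np.frombuffer(raw, dtype=rec_dtype)
print(records["id"], records["value"])
```

Because the dtype describes the byte layout exactly, `np.frombuffer` can read the buffer directly. The resulting array is read-only, since it shares memory with the immutable `bytes` object.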

2. Types of structured data types

The main difference between a NumPy structured array and an ordinary array is that its dtype defines the type of each data field in the array.

A structured data type can therefore be thought of as a sequence of bytes of a certain length (the itemsize), which is interpreted as a collection of fields.

Each field consists of three parts: a field name, a data type, and a byte offset. The offset is optional when defining the type.
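
These three parts can be inspected on any structured dtype. The sketch below reuses the student dtype from section 1 and prints the name, data type, and byte offset of each field; the variable names are illustrative.

```python
import numpy as np

dt = np.dtype([("name", "U10"), ("age", "i4"), ("weight", "i8")])

# Total number of bytes occupied by one element
print(dt.itemsize)

# dt.fields maps each field name to a tuple that starts with (dtype, offset)
offsets = {}
for field_name in dt.names:
    field_dtype, offset = dt.fields[field_name][:2]
    offsets[field_name] = offset
    print(field_name, field_dtype, offset)
```

Here the offsets were not given explicitly, so NumPy packs the fields one after another: `name` takes 40 bytes (10 characters of 4 bytes each), so `age` starts at byte 40 and `weight` at byte 44.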

3. Create structured data types

In the NumPy library, we use numpy.dtype to create structured data types.

Method 1: A list of tuples

A structured data type can be defined with a list of tuples.
- Each tuple represents one field, in the form (name, datatype, shape)
- The shape entry in the tuple is optional
- datatype can be anything that NumPy can convert to a dtype
```
>>> np.dtype([("address", "S5"), ("family", "U10", (2, 2))])
dtype([('address', 'S5'), ('family', '<U10', (2, 2))])
```
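
To see what the optional shape does, the sketch below creates an array with the dtype from Method 1 and fills one record. The array length of 3 and the sample values are made up for illustration.

```python
import numpy as np

dt = np.dtype([("address", "S5"), ("family", "U10", (2, 2))])

# Three zero-initialized records
people = np.zeros(3, dtype=dt)

# A field with a shape holds a sub-array in every record,
# so the field view gains two extra dimensions
print(people["family"].shape)

# Fill the first record field by field (illustrative values)
people["address"][0] = b"Main"
people["family"][0] = [["Ann", "Bob"], ["Cat", "Dan"]]
print(people[0])
```
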
Method 2: A comma-separated string

NumPy also accepts a string of comma-separated basic formats to define a dtype.
- The string has a form such as "i8, f4, U10"
- Field names are generated automatically as f0, f1, and so on
- Field offsets are determined automatically
```
>>> np.dtype("i8,S4,f4")
dtype([('f0', '<i8'), ('f1', 'S4'), ('f2', '<f4')])
>>> np.dtype("i8, S4, (5, 3)f4")
dtype([('f0', '<i8'), ('f1', 'S4'), ('f2', '<f4', (5, 3))])
```
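
The automatically generated names are used like any other field names. The sketch below builds a small array from a comma-separated dtype and accesses its fields as f0, f1, and f2; the array length and the values are illustrative.

```python
import numpy as np

# In a comma-separated string, a sub-array shape is written before the type code
dt = np.dtype("i8, S4, (5, 3)f4")
print(dt.names)       # generated names
print(dt.itemsize)    # 8 + 4 + 5 * 3 * 4 bytes

arr = np.zeros(2, dtype=dt)
arr["f0"] = [1, 2]
arr["f1"] = [b"ab", b"cd"]
print(arr["f2"].shape)
```

This form is compact, but the generated names carry no meaning, so the tuple-list form of Method 1 is usually easier to read in real code.
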
Method 3: A dictionary of field parameter lists

Each field parameter is given as a key-value pair of a Python dictionary.
- The dictionary has the form {"names": [...], "formats": [...], "offsets": [...], "itemsize": ...}
- names: list of field names, with the same length as formats
- formats: list of dtype specifications
- offsets: list of byte offsets, optional
- itemsize: total size of the dtype in bytes, optional

The dictionary form lets you control the field offsets and the itemsize.
```
>>> np.dtype({"names": ["name", "age"], "formats": ["S6", "i4"]})
dtype([('name', 'S6'), ('age', '<i4')])
>>> np.dtype({"names": ["name", "age"], "formats": ["S6", "i4"], "offsets": [2, 3], "itemsize": 12})
dtype({'names': ['name', 'age'], 'formats': ['S6', '<i4'], 'offsets': [2, 3], 'itemsize': 12})
```
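
The effect of the optional keys can be seen by comparing a default layout with an explicit one. In the sketch below, the offsets [0, 8] and the itemsize of 16 are illustrative values chosen to leave padding between and after the fields; they are not taken from the example above.

```python
import numpy as np

# Default layout: fields are packed back to back
packed = np.dtype({"names": ["name", "age"], "formats": ["S6", "i4"]})

# Explicit layout: "age" starts at byte 8 and each record takes 16 bytes
padded = np.dtype({
    "names": ["name", "age"],
    "formats": ["S6", "i4"],
    "offsets": [0, 8],
    "itemsize": 16,
})

for label, dt in (("packed", packed), ("padded", padded)):
    age_offset = dt.fields["age"][1]
    print(label, "age offset:", age_offset, "itemsize:", dt.itemsize)

# The itemsize determines how much memory an array of records uses
total_bytes = np.zeros(2, dtype=padded).nbytes
```

Explicit offsets are mainly useful when the layout must match an existing binary format or C struct.
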
Method 4: A dictionary keyed by field name

In this method, each dictionary key is a field name, and its value is a tuple giving the field's type and offset.

Before Python 3.6, dictionaries did not preserve insertion order, yet the order of fields in a NumPy structured dtype is meaningful. For this reason, the official documentation discourages this form.
```
>>> np.dtype({"name": ("S6", 0), "age": ("i8", 1)})
dtype({'names': ['name', 'age'], 'formats': ['S6', '<i8'], 'offsets': [0, 1], 'itemsize': 9})
```
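
Since this form is discouraged, the same layout is better written in one of the other forms. The sketch below compares a Method 4 dtype with its Method 1 equivalent; the offsets 0 and 6 are illustrative values chosen so that the fields do not overlap.

```python
import numpy as np

# Method 4: field name -> (type, offset); illustrative non-overlapping offsets
by_field = np.dtype({"name": ("S6", 0), "age": ("i8", 6)})

# Method 1: list of tuples, where the field order is explicit
list_form = np.dtype([("name", "S6"), ("age", "i8")])

for label, dt in (("by_field", by_field), ("list_form", list_form)):
    offsets = [dt.fields[n][1] for n in dt.names]
    print(label, dt.names, offsets, dt.itemsize)
```

Both definitions describe the same fields at the same offsets, but the list form states the field order directly instead of relying on the offsets to imply it.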

Conclusion

In this installment, we looked at what NumPy structured arrays are and at four ways to create structured data types.

A structured data type is made up of fields, each with three parts: a field name, a data type, and an offset.

That's all for this installment. Likes and comments are welcome. See you next time!
