“This is the 8th day of my participation in the First Challenge 2022. For details: First Challenge 2022”

Preface

As we all know, the NumPy library is the foundation of scientific computing in Python. We learned earlier that a NumPy ndarray requires all of its elements to have the same data type. Each element also occupies the same amount of memory, and the data is laid out in either C order (row-major) or Fortran order (column-major). This is what makes NumPy arrays faster and more memory-efficient than Python lists.

In the earlier installments on NumPy indexing and slicing and on advanced indexing, we saw that arrays can be indexed much like Python lists. A NumPy array can be indexed by a tuple of non-negative integers, by booleans, by another array, or by integers.
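As a quick refresher on the indexing forms listed above, here is a minimal sketch. The 3x4 array `a` is made up purely for illustration.

```python
import numpy as np

# A small 3x4 integer array used only for illustration:
# [[0 1 2 3]
#  [4 5 6 7]
#  [8 9 10 11]]
a = np.arange(12).reshape(3, 4)

# Tuple of non-negative integers: row 1, column 2
print(a[1, 2])       # 6

# Boolean array: keep only the elements greater than 8
print(a[a > 8])      # the elements 9, 10 and 11

# Integer array: select rows 0 and 2
print(a[[0, 2]])
```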

Since NumPy requires every element of an array to have the same data type, how do we work with records that mix different types of data?

With Python's built-in types, such as key-value dictionaries, this scenario is easy to handle.

If NumPy simply gave up on this scenario, how could it be at the core of the scientific Python and PyData ecosystem?

For this purpose, NumPy provides structured arrays.

In this installment, we will learn about structured arrays in the NumPy library. Let's go~

1. Overview of NumPy structured arrays

  • What is a structured array?

    A NumPy structured array is an ndarray whose dtype is a composition of simpler data types, organized as a sequence of named fields.

    In plain Python, records like the following are easy to represent with dicts (key-value pairs) or with a list of tuples.

    Using Python dicts:

    stu1 = {"name": "Tom", "age": 10, "weight": 50}
    stu2 = {"name": "Anne", "age": 12, "weight": 42}
    stu_list = [stu1, stu2]

    Using a Python list of tuples:

    stu_list = [("Tom", 10, 50), ("Anne", 12, 42)]

    In a NumPy structured array, the data type of each field is declared in the dtype:

    >>> import numpy as np
    >>> stu_list = np.array([("Tom", 10, 50), ("Anne", 12, 42)],
    ...                     dtype=[("name", "U10"), ("age", "i4"), ("weight", "i8")])
    >>> stu_list
    array([('Tom', 10, 50), ('Anne', 12, 42)],
          dtype=[('name', '<U10'), ('age', '<i4'), ('weight', '<i8')])

    The dtype attribute of an array tells us the data type of its elements; for a structured array, it lists every field.

    In the example above, the name field is a Unicode string of up to 10 characters (U10), age is a 32-bit integer (i4), and weight is a 64-bit integer (i8).

    NumPy's numeric data types largely correspond to C data types. Common types and their type codes are:

    Data type   Type code   Meaning
    int8        i1          8-bit signed integer (-128 to 127)
    int16       i2          16-bit signed integer
    int32       i4          32-bit signed integer
    int64       i8          64-bit signed integer
    float16     f2          Half-precision float (16 bits)
    float32     f4          Single-precision float (32 bits)
    float64     f8          Double-precision float (64 bits)
    bool_       ?           Boolean
    str_        U           Fixed-length Unicode string, e.g. U10
    bytes_      S           Fixed-length byte string, e.g. S5
  • Features of structured arrays

    • Structured data types are designed to mimic C structs and share a similar memory layout
    • They are intended for interfacing with C code and for low-level manipulation of structured buffers, such as interpreting binary data
    • They support subarrays, nested data types, and unions, and allow control over the memory layout
    • Because of this C-struct memory layout, structured arrays have relatively poor cache behavior, so they are not well suited to general manipulation of tabular data
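To see how the fields of a structured array are used in practice, here is a minimal sketch built on the stu_list array from above. Indexing with a field name returns a view of that field across all records, and indexing with an integer returns a single record. The mean-age and weight-update steps are only illustrations.

```python
import numpy as np

stu_list = np.array(
    [("Tom", 10, 50), ("Anne", 12, 42)],
    dtype=[("name", "U10"), ("age", "i4"), ("weight", "i8")],
)

# Indexing by field name returns a view of that field for every record
ages = stu_list["age"]
print(ages.mean())         # 11.0

# Indexing by position returns one record (a numpy.void scalar)
first = stu_list[0]
print(first["name"])       # Tom

# Field views share memory with the array, so writing through one updates it
stu_list["weight"] += 1
print(stu_list["weight"])  # [51 43]
```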

2. Structured data types

The main difference between a NumPy structured array and an ordinary array is that its dtype defines a set of named fields, each with its own data type.

A structured data type can therefore be thought of as a sequence of bytes of a fixed length (its itemsize) that is interpreted as a collection of fields.

Each field consists of three parts: a field name, a data type, and a byte offset within the structure. The offset is optional when defining the dtype; if it is omitted, NumPy computes it.
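The three parts of a field can be inspected directly on a dtype. Here is a minimal sketch using the same layout as stu_list. It relies on the names, fields and itemsize attributes of numpy.dtype. Since a U character takes 4 bytes, U10 occupies 40 bytes, and NumPy packs the fields without padding by default.

```python
import numpy as np

# The same three-field layout used for stu_list earlier
dt = np.dtype([("name", "U10"), ("age", "i4"), ("weight", "i8")])

print(dt.names)     # ('name', 'age', 'weight')
print(dt.itemsize)  # 52 = 40 (U10) + 4 (i4) + 8 (i8)

# dt.fields maps each name to (field dtype, byte offset)
for name in dt.names:
    field_dtype, offset = dt.fields[name][:2]
    print(name, field_dtype, offset)  # offsets: name 0, age 40, weight 44
```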

3. Creating structured data types

In NumPy, structured data types are created with numpy.dtype. There are four ways to specify the fields.

  • Method 1: A list of tuples

    A structured data type can be defined as a list of tuples, one tuple per field.

    • Each tuple describes one field, in the form (name, datatype, shape)

    • The shape element is optional; when given, the field is a subarray of that shape

    • datatype can be any object that NumPy can convert to a data type

    >>> np.dtype([("address", "S5"), ("family", "U10", (2, 2))])
    dtype([('address', 'S5'), ('family', '<U10', (2, 2))])
  • Method 2: A comma-separated string of formats

    NumPy also accepts a string of comma-separated basic formats as a dtype.

    • The string looks like "i8, f4, U10"

    • Field names are generated automatically as f0, f1, f2, ...

    • The itemsize and the byte offsets of the fields are determined automatically

    >>> np.dtype("i8,S4,f4")
    dtype([('f0', '<i8'), ('f1', 'S4'), ('f2', '<f4')])
    >>> np.dtype("i8, S4, (5, 3)f4")
    dtype([('f0', '<i8'), ('f1', 'S4'), ('f2', '<f4', (5, 3))])
  • Method 3: A dictionary of field parameter lists

    The field parameters are given as lists stored under fixed keys of a Python dictionary.

    • The dictionary has the form {"names": [...], "formats": [...], "offsets": [...], "itemsize": ...}
    • names: a list of field names
    • formats: a list of dtype formats, with the same length as names
    • offsets: a list of integer byte offsets, one per field (optional)
    • itemsize: the total size of the dtype in bytes (optional; it must be large enough to hold all fields)

    This form lets you control the byte offset of each field and the itemsize of the whole structure. Offsets may even overlap: in the second example below, name occupies bytes 2 to 7 and age occupies bytes 3 to 6, so writing to one field overwrites part of the other.

    >>> np.dtype({"names": ["name", "age"], "formats": ["S6", "i4"]})
    dtype([('name', 'S6'), ('age', '<i4')])
    >>> np.dtype({"names": ["name", "age"], "formats": ["S6", "i4"], "offsets": [2, 3], "itemsize": 12})
    dtype({'names': ['name', 'age'], 'formats': ['S6', '<i4'], 'offsets': [2, 3], 'itemsize': 12})
  • Method 4: A dictionary keyed by field name

    In this form, each dictionary key is a field name, and each value is a tuple giving the field's data type and byte offset.

    The official documentation discourages this form: before Python 3.6, dictionaries did not preserve insertion order, yet the order of the fields in a structured dtype is significant.

    >>> np.dtype({"name": ("S6", 0), "age": ("i8", 1)})
    dtype({'names': ['name', 'age'], 'formats': ['S6', '<i8'], 'offsets': [0, 1], 'itemsize': 9})

Conclusion

In this installment, we looked at what NumPy structured arrays are and at four ways to create structured data types.

A structured data type is made up of fields, and each field has three parts: a field name, a data type, and a byte offset.

That's all for this installment. Likes and comments are welcome. See you next time!