“This is the ninth day of my participation in the First Challenge 2022. For details: First Challenge 2022”

Preface

The core object of the NumPy library is ndarray, which represents an n-dimensional array. All elements of an ndarray share the same data type, so each element occupies the same number of bytes, and the elements are stored in one contiguous block of memory. This layout is a large part of why NumPy is so widely used in scientific computing.

However, for cases where each element must hold several values of different types, NumPy borrows the struct concept from the C language and provides “structured arrays”. In our previous article on NumPy structured data types, we learned that a structured array is an ndarray whose elements are made up of named fields, each with its own data type, and that fields can also share memory space.

A NumPy structured array differs from a regular array in its data type. In the last article, we learned that a structured data type describes a sequence of bytes of a fixed length (the itemsize), which is interpreted as a collection of fields.

Each field of a structured data type has a field name, a data type, and a byte offset. Structured data types are created with numpy.dtype(), which accepts four alternative forms of specification.

Now that we have learned how to create structured data types, in this installment we will continue by learning how to query them.

1. Field names of a structured data type

A structured data type is composed of a series of fields. Each field is described mainly by its name, data type, and offset.

  • The field names of a structured data type can be read through the names attribute
  • names is an attribute of the dtype object
  • names returns a tuple of strings, one for each field
  • Field names can be changed by assigning to names a sequence of strings of the same length

>>> import numpy as np
>>> stu_type = np.dtype({"names": ["name", "age"], "formats": ["S6", "i4"], "offsets": [2, 3], "itemsize": 12})
>>> stu_type.names
('name', 'age')
>>> stu_type.names = ("name1", "age1")
>>> stu_type
dtype({'names': ['name1', 'age1'], 'formats': ['S6', '<i4'], 'offsets': [2, 3], 'itemsize': 12})

A dtype object describes the data type of the elements of an array.

The dtype object has the following commonly used attributes:

Attribute         Description
dtype.name        Name of the data type, for example 'int32'
dtype.names       Tuple of field names, or None if the data type has no fields
dtype.ndim        Number of dimensions of the sub-array if the data type describes a sub-array, otherwise 0
dtype.shape       Shape of the sub-array if the data type describes a sub-array, otherwise ()
dtype.flags       Bit flags describing how the data type is to be interpreted
dtype.fields      Dictionary of the named fields, or None if the data type has no fields
dtype.base        Data type of the elements of the sub-array; for other data types, the data type itself
dtype.alignment   Required alignment of the data type in bytes, as determined by the compiler

A dtype object has a fields attribute, which is a dictionary-like mapping: each key is a field name, and each value is a tuple holding the data type and the byte offset of that field.

>>> stu_type
dtype({'names': ['name1', 'age1'], 'formats': ['S6', '<i4'], 'offsets': [2, 3], 'itemsize': 12})
>>> stu_type.fields
mappingproxy({'name1': (dtype('S6'), 2), 'age1': (dtype('int32'), 3)})

For a regular (unstructured) array, names and fields behave the same way: both are None. For example, if we query a one-dimensional integer array:

>>> y = np.array([1, 2, 3, 4])
>>> y.dtype.names  # None is not echoed by the interpreter
>>> print(y.dtype.names)
None
>>> print(y.dtype.fields)
None

So, to determine whether an array is structured, we can check dtype.names.

If the array is structured, names is a tuple of field names; otherwise it is None.
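
The check described above can be wrapped in a small helper. This is only a minimal sketch: is_structured and describe_fields are hypothetical names introduced here for illustration, not NumPy functions, and the students array is made-up sample data.

```python
import numpy as np

def is_structured(arr):
    """Hypothetical helper: True if the array's dtype has named fields."""
    return arr.dtype.names is not None

def describe_fields(dt):
    """Hypothetical helper: (name, data type, offset) per field, in declaration order."""
    return [(name, dt.fields[name][0], dt.fields[name][1]) for name in dt.names]

plain = np.array([1, 2, 3, 4])
students = np.zeros(2, dtype=[("name", "S6"), ("age", "i4")])

print(is_structured(plain))      # False
print(is_structured(students))   # True
for name, field_dtype, offset in describe_fields(students.dtype):
    print(name, field_dtype, offset)
```

Iterating over names rather than over fields keeps the fields in declaration order.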

2. Automatic byte offset and alignment

Whether NumPy chooses field offsets with automatic alignment depends on the align argument of numpy.dtype.

The default value of align is False. In that case:

  • NumPy packs the fields together
  • Fields are stored consecutively in memory
  • Each field starts at the byte where the previous field ends

>>> d = np.dtype("S1,S1,U1,i4,S1,i8")
>>> [d.fields[name][1] for name in d.names]
[0, 1, 2, 6, 10, 11]
>>> d.itemsize
19
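
The packed layout can be reproduced by hand, as a rough sketch: keep a running byte position and advance it by the size of each field. The variable names below are chosen only for this illustration.

```python
import numpy as np

d = np.dtype("S1,S1,U1,i4,S1,i8")

# Each field starts where the previous one ends, with no padding.
offsets = []
position = 0
for name in d.names:
    field_dtype = d.fields[name][0]
    offsets.append(position)
    position += field_dtype.itemsize

numpy_offsets = [d.fields[name][1] for name in d.names]
print(offsets == numpy_offsets)   # True
print(position == d.itemsize)     # True
```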

When align is True, the fields are aligned automatically:

  • NumPy pads the structure in the same way that C compilers usually pad a C struct
  • Padding bytes are inserted between fields so that the offset of each field is a multiple of that field's alignment
  • For simple data types, the alignment is usually equal to the size of the field in bytes
  • Trailing padding is also added if the itemsize would otherwise not be a multiple of the largest field alignment
  • Aligning fields can improve performance, but the padding increases the overall size of the data type

>>> d = np.dtype("S1,S1,U1,i4,S1,i8", align=True)
>>> [d.fields[name][1] for name in d.names]
[0, 1, 4, 8, 12, 16]
>>> d.itemsize
24
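
The padding rules above can also be sketched by hand and compared with what NumPy reports. This is an approximation of the rule as described in this section, not NumPy's actual implementation, and round_up is a hypothetical helper. The alignment of each field is read from dtype.alignment, so the result follows whatever the current platform uses.

```python
import numpy as np

d = np.dtype("S1,S1,U1,i4,S1,i8", align=True)

def round_up(value, multiple):
    """Hypothetical helper: smallest multiple of `multiple` that is >= value."""
    return -(-value // multiple) * multiple

offsets = []
position = 0
max_alignment = 1
for name in d.names:
    field_dtype = d.fields[name][0]
    # Pad so the field starts on a multiple of its own alignment.
    position = round_up(position, field_dtype.alignment)
    offsets.append(position)
    position += field_dtype.itemsize
    max_alignment = max(max_alignment, field_dtype.alignment)

# Trailing padding: the itemsize is a multiple of the largest field alignment.
itemsize = round_up(position, max_alignment)

print(offsets == [d.fields[name][1] for name in d.names])   # True
print(itemsize == d.itemsize)                               # True
```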

If the offsets are given explicitly through the optional offsets key of the dictionary form of dtype, setting align=True makes NumPy check that the offset of each field is a multiple of its size and that the itemsize is a multiple of the size of the largest field. If either check fails, an exception is raised.
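
A small sketch of this check follows. The field names flag and count are hypothetical and used only for illustration, and the exception type is assumed to be ValueError.

```python
import numpy as np

# Hypothetical field names, chosen only for this illustration.
names = ["flag", "count"]
formats = ["i1", "i4"]

# Offsets that respect the 4-byte alignment of "count" are accepted.
good = np.dtype({"names": names, "formats": formats, "offsets": [0, 4]}, align=True)
print(good.itemsize)     # 8

# Without align=True no alignment check is made, so offset 1 is allowed.
packed = np.dtype({"names": names, "formats": formats, "offsets": [0, 1]})
print(packed.itemsize)   # 5

# With align=True the same offsets are rejected (assumed here to be a ValueError).
rejected = False
try:
    np.dtype({"names": names, "formats": formats, "offsets": [0, 1]}, align=True)
except ValueError as exc:
    rejected = True
    print("rejected:", exc)
```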

3. Field title

In addition to its field name, a field can have an associated title.

  • A field title serves as an additional description of, or an alias for, the field
  • Field titles can be used for indexing in the same way as field names

How a title is added depends on the form used to create the dtype:

  • When the dtype is given as a list of tuples, replace the field name with a tuple of the form (field title, field name). Note that the title comes first

    >>> stu_type = np.dtype([(("nickname", "name"), "S1")])
    >>> stu_type
    dtype([(('nickname', 'name'), 'S1')])
  • When the dtype is given as a dictionary of field parameters (names, formats, and so on), add a titles key

    >>> stu_type = np.dtype({"names": ["name", "age"], "titles": ["nickname", "age1"], "formats": ["S6", "i4"], "offsets": [2, 3], "itemsize": 12})
    >>> stu_type
    dtype({'names': ['name', 'age'], 'formats': ['S6', '<i4'], 'offsets': [2, 3], 'titles': ['nickname', 'age1'], 'itemsize': 12})
  • When the dtype is given as a dictionary keyed by field name, give each value in the form (datatype, offset, title)

    >>> stu_type = np.dtype({"name": ("S6", 0, "nickname"), "age": ("i8", 1)})
    >>> stu_type
    dtype({'names': ['name', 'age'], 'formats': ['S6', '<i8'], 'offsets': [0, 1], 'titles': ['nickname', None], 'itemsize': 9})

If a structured data type has titles, the fields dictionary contains each titled field twice: once under its field name and once under its title. Each of these entries is a tuple with the title as a third element.

The names attribute, however, lists only the field names, so iterating over names visits each field exactly once.

>>> [stu_type.fields[name][:2] for name in stu_type.names]
[(dtype('S6'), 0), (dtype('int64'), 1)]
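
To illustrate how titles behave on an actual array, here is a minimal sketch. The titles nickname and years and the sample values are made up for this example.

```python
import numpy as np

# Hypothetical titles "nickname" and "years", chosen only for this illustration.
dt = np.dtype({"names": ["name", "age"],
               "titles": ["nickname", "years"],
               "formats": ["S6", "i4"]})

arr = np.zeros(2, dtype=dt)
arr["name"] = [b"Ann", b"Bob"]

# A title indexes the same data as the field name.
print(arr["nickname"])

# names lists only the field names; fields also has one key per title.
print(dt.names)            # ('name', 'age')
print(sorted(dt.fields))   # ['age', 'name', 'nickname', 'years']

# Both keys map to the same (data type, offset, title) tuple.
print(dt.fields["name"] == dt.fields["nickname"])   # True
```

This is why code that walks over every field is usually written against names rather than against the keys of fields.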

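Conclusion

In this installment, we learned how to query the field names of a structured data type through dtype.names and dtype.fields, and how align=True in numpy.dtype controls automatic byte offsets and alignment.

Also, a field can have an alternative name, the field title. When a field has a title, the fields dictionary lists it twice: once under the name and once under the title.

That’s all for this installment. Please leave a like and a comment. See you next time!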