This is the 14th day of my participation in the First Challenge 2022.

Preface

Following on from the previous installment on NumPy record array helper methods, we have already learned a little about these helpers. They are provided by the recfunctions module in numpy.lib as a set of functions for creating and manipulating structured arrays.

  • The apply_along_fields() method applies a reduction function across the fields of a structured array
  • The append_fields() method adds new fields to a structured array
  • The drop_fields() method removes the specified fields from a structured array and returns a new array
  • The join_by() method joins two arrays on a key
  • The merge_arrays() method merges a sequence of arrays field by field
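As a quick refresher, here is a minimal sketch of two of these helpers. The two-field input array is made up for illustration, and usemask=False is passed so that both calls return plain ndarrays rather than masked arrays.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Illustrative input: a structured array with an integer and a float field
base = np.array([(1, 2.0), (3, 4.0)], dtype=[("A", "i8"), ("B", "f8")])

# append_fields(): add a new field "C" (usemask=False -> plain ndarray)
with_c = rfn.append_fields(base, "C", np.array([10, 20], dtype="i8"), usemask=False)

# drop_fields(): remove field "B" and get a new array back
without_b = rfn.drop_fields(with_c, "B", usemask=False)

print(with_c.dtype.names)     # ('A', 'B', 'C')
print(without_b.dtype.names)  # ('A', 'C')
```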

When merge_arrays() is called with usemask=False and the arrays have different lengths, the shorter arrays are automatically padded with a missing value that depends on the field type:

Field type       Fill value
Integer          -1
Floating point   -1.0
Character        '-'
String           '-1'
Boolean          True
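A minimal sketch of this padding, assuming two plain (unstructured) arrays of different lengths made up for the example. merge_arrays() names the resulting fields f0 and f1, and the missing integer slot should receive -1.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

ints = np.array([1, 2], dtype="i8")                # shorter array: 2 elements
floats = np.array([10.0, 20.0, 30.0], dtype="f8")  # longer array: 3 elements

# Without a mask, the missing integer entry is filled with -1
merged = rfn.merge_arrays((ints, floats), usemask=False)
print(merged)
print(merged.dtype.names)  # ('f0', 'f1')
```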

In this installment, we will continue to look at how the recfunctions module manipulates structured arrays. Let's go~

1. Getting the field names of a structured array

The recfunctions module also provides functions for retrieving the field names of structured data types.

1.1 Return the field names as a dictionary

The recfunctions module provides the get_fieldstructure() method, which returns a dictionary that maps each field of a structured data type to the list of its parent fields.

The get_fieldstructure() method is mainly used to simplify access to fields that are nested inside other fields.

get_fieldstructure(adtype, lastname=None, parents=None)

Parameter Description:

Parameter   Description
adtype      Input structured data type, such as an np.dtype object
lastname    Name of the last processed field (used internally during recursion); optional
parents     Dictionary of parent fields (used internally during recursion)

>>> import numpy as np
>>> from numpy.lib import recfunctions as rfn
>>> arr_dtype = np.dtype([("A", "i8"), ("B", "i8")])
>>> rfn.get_fieldstructure(arr_dtype)
{'A': [], 'B': []}

The get_fieldstructure() method is especially useful for nested structured data: each nested field is listed together with its parent fields.

>>> import numpy as np
>>> from numpy.lib import recfunctions as rfn
>>> stu_dtype = np.dtype([("school", "S16"), ("class", [("classA", "S6"), ("classB", "S6")])])
>>> rfn.get_fieldstructure(stu_dtype)
{'school': [], 'class': [], 'classA': ['class'], 'classB': ['class']}
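The parent lists returned by get_fieldstructure() are what make nested access simpler: to reach a leaf field, you index through its parents in order. The helper get_nested_field() below is hypothetical (it is not part of recfunctions), and the student records are invented for illustration.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

def get_nested_field(arr, name):
    """Hypothetical helper: return a (possibly nested) field by its own name."""
    parents = rfn.get_fieldstructure(arr.dtype)[name]  # e.g. ['class'] for 'classB'
    view = arr
    for parent in parents:  # walk down through the parent fields in order
        view = view[parent]
    return view[name]

stu_dtype = np.dtype([("school", "S16"),
                      ("class", [("classA", "S6"), ("classB", "S6")])])
students = np.array([(b"No.1", (b"Tom", b"Amy")),
                     (b"No.2", (b"Bob", b"Eve"))], dtype=stu_dtype)

print(get_nested_field(students, "classB"))  # [b'Amy' b'Eve']
print(get_nested_field(students, "school"))  # [b'No.1' b'No.2']
```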

1.2 Return the field names as a tuple

The recfunctions module provides the get_names() method, which returns the field names of a structured data type as a tuple; a nested field appears as a (parent, children) tuple.

get_names(adtype)

Parameter Description:

Parameter   Description
adtype      Input data type

>>> arr_dtype = np.dtype([("A", "i8"), ("B", "i8"), ("C", [("C1", "i8"), ("C2", "i4")])])
>>> rfn.get_names(arr_dtype)
('A', 'B', ('C', ('C1', 'C2')))
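Because get_names() keeps nested fields as (parent, children) tuples, its result can be walked recursively. The sketch below uses a hypothetical helper, field_tree(), to render the names as an indented outline.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

def field_tree(names, depth=0):
    """Hypothetical helper: turn get_names() output into indented lines."""
    lines = []
    for item in names:
        if isinstance(item, tuple):  # nested field: (parent, children)
            parent, children = item
            lines.append("  " * depth + parent)
            lines.extend(field_tree(children, depth + 1))
        else:                        # plain field name
            lines.append("  " * depth + item)
    return lines

arr_dtype = np.dtype([("A", "i8"), ("B", "i8"),
                      ("C", [("C1", "i8"), ("C2", "i4")])])
print("\n".join(field_tree(rfn.get_names(arr_dtype))))
```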

1.3 Return the field names as a flat tuple

The recfunctions module provides get_names_flat(), which returns the field names of a (possibly nested) structured data type as a flat tuple.

get_names_flat(adtype)

Parameter Description:

Parameter   Description
adtype      Structured data type

In contrast to get_names(), get_names_flat() does not preserve the nesting: each nested field name simply follows its parent's name in a single flat tuple.

>>> arr_dtype = np.dtype([("A", "i8"), ("B", "i8"), ("C", [("C1", "i8"), ("C2", "i4")])])
>>> rfn.get_names(arr_dtype)
('A', 'B', ('C', ('C1', 'C2')))
>>> rfn.get_names_flat(arr_dtype)
('A', 'B', 'C', 'C1', 'C2')
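One practical consequence: dtype.names only lists the top-level fields, so checking whether a nested field exists is easier against get_names_flat(). A minimal sketch, reusing the same example dtype:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

arr_dtype = np.dtype([("A", "i8"), ("B", "i8"),
                      ("C", [("C1", "i8"), ("C2", "i4")])])

print(arr_dtype.names)                        # ('A', 'B', 'C'): top level only
print("C1" in arr_dtype.names)                # False
print("C1" in rfn.get_names_flat(arr_dtype))  # True
```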

2. Find duplicates in a structured array

The recfunctions module provides the find_duplicates() method, which finds the duplicate records in a structured array along a given key.

find_duplicates(a, key=None, ignoremask=True, return_index=False)

Parameter Description:

Parameter      Description
a              Input array
key            Name of the field used to check for duplicates; the default None compares whole records
ignoremask     Whether masked data should be discarded (True, the default) or considered as duplicates
return_index   Whether to also return the indices of the duplicated values

Note that find_duplicates() requires a masked array, such as one created with np.ma.array():

>>> arr = np.ma.array([1, 1, 2, 3, 1, 3, 1], dtype=[("A", "i8")])
>>> rfn.find_duplicates(arr)
masked_array(data=[(1,), (1,), (1,), (1,), (3,), (3,)],
             mask=[(False,), (False,), (False,), (False,), (False,),
                   (False,)],
       fill_value=(999999,),
            dtype=[('A', '<i8')])
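The ignoremask and return_index parameters can be seen together in the sketch below, which uses made-up values with the entries at positions 2 and 6 masked. With ignoremask=True the masked entries are left out of the comparison, and return_index=True returns the positions of the duplicated values alongside them. Because the result is produced from a sort, the indices are sorted before printing.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Made-up data: the entries at positions 2 and 6 are masked
a = np.ma.array([1, 1, 1, 2, 2, 3, 3],
                mask=[0, 0, 1, 0, 0, 0, 1], dtype=[("a", "i8")])

# The unmasked 3 at position 5 has only a masked partner, so it is not reported
dups, idx = rfn.find_duplicates(a, ignoremask=True, return_index=True)
print(dups["a"])             # [1 1 2 2]
print(sorted(idx.tolist()))  # [0, 1, 3, 4]
```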

If the array is a plain ndarray created with np.array(), find_duplicates() raises an AttributeError, because it calls the masked-array method filled() internally:

>>> arr = np.array([1, 1, 2, 3, 1, 3, 1], dtype=[("A", "i8")])
>>> rfn.find_duplicates(arr)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<__array_function__ internals>", line 6, in find_duplicates
  File "C:\Users\user\AppData\Roaming\Python\Python37\site-packages\numpy\lib\recfunctions.py", line 1388, in find_duplicates
    sorteddata = sortedbase.filled()
AttributeError: 'numpy.ndarray' object has no attribute 'filled'

3. Assign field values by name

The recfunctions module also provides the assign_fields_by_name() method, which assigns the field values of one structured array (src) to another (dst).

  • assign_fields_by_name() copies by field name: each field of the destination array is assigned from the field with the same name in the source array.
  • The method works recursively, so it also suits structured arrays with nested fields.
assign_fields_by_name(dst, src, zero_unassigned=True)

Parameter Description:

Parameter         Description
dst               The destination array, which receives the values
src               The source array, which provides the values
zero_unassigned   Optional, default True; if True, fields in dst that have no matching field in src are filled with 0
>>> arr = np.array([1, 1, 2, 3, 1, 3, 1], dtype=[("A", "i8")])
>>> arr2 = np.array([10, 10, 20, 30, 1, 3, 1], dtype=[("A", "i8")])
>>> rfn.assign_fields_by_name(arr, arr2)
>>> arr
array([(10,), (10,), (20,), (30,), ( 1,), ( 3,), ( 1,)],
      dtype=[('A', '<i8')])
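The difference from ordinary structured assignment shows up when the two arrays list the same fields in a different order. In NumPy 1.14 and later, dst[...] = src copies fields by position, while assign_fields_by_name() matches them by name. A minimal sketch with made-up arrays, which also shows what zero_unassigned does to a dst field ("z") that has no counterpart in src:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

src = np.array([(1, 2)], dtype=[("x", "i8"), ("y", "i8")])

# Plain structured assignment copies fields by position
by_position = np.zeros(1, dtype=[("y", "i8"), ("x", "i8")])
by_position[...] = src       # y <- 1, x <- 2

# assign_fields_by_name() matches fields by name instead
by_name = np.zeros(1, dtype=[("y", "i8"), ("x", "i8")])
rfn.assign_fields_by_name(by_name, src)   # y <- 2, x <- 1

# Field "z" has no match in src: zeroed by default ...
dst = np.array([(7, 7, 7)], dtype=[("x", "i8"), ("y", "i8"), ("z", "i8")])
rfn.assign_fields_by_name(dst, src)
# ... or left untouched with zero_unassigned=False
kept = np.array([(7, 7, 7)], dtype=[("x", "i8"), ("y", "i8"), ("z", "i8")])
rfn.assign_fields_by_name(kept, src, zero_unassigned=False)

print(by_position, by_name, dst, kept)
```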

Note that dst and src must have matching shapes (more precisely, src must be broadcastable to the shape of dst); otherwise a ValueError is raised:

>>> arr = np.array([1, 1, 2, 3, 1, 3, 1], dtype=[("A", "i8")])
>>> arr2 = np.array([(1, 2), (3, 3), (1, 2)], dtype=[("A", "i8"), ("B", "i8")])
>>> rfn.assign_fields_by_name(arr2, arr)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<__array_function__ internals>", line 6, in assign_fields_by_name
  File "C:\Users\user\AppData\Roaming\Python\Python37\site-packages\numpy\lib\recfunctions.py", line 1200, in assign_fields_by_name
    zero_unassigned)
  File "<__array_function__ internals>", line 6, in assign_fields_by_name
  File "C:\Users\user\AppData\Roaming\Python\Python37\site-packages\numpy\lib\recfunctions.py", line 1191, in assign_fields_by_name
    dst[...] = src
ValueError: could not broadcast input array from shape (7) into shape (3)
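The error comes from the dst[...] = src assignment visible at the bottom of the traceback, so ordinary NumPy broadcasting rules apply to each field. A minimal sketch with made-up arrays: a one-record source is broadcast to every record of the destination, while a seven-record source cannot fit into three records.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

dst = np.zeros(3, dtype=[("A", "i8"), ("B", "i8")])

# A single-record source broadcasts to all three destination records
one_record = np.array([(5,)], dtype=[("A", "i8")])
rfn.assign_fields_by_name(dst, one_record)
print(dst["A"], dst["B"])  # [5 5 5] [0 0 0]  (B has no match in src, so it is zeroed)

# Seven records cannot be broadcast into three
too_long = np.ones(7, dtype=[("A", "i8")])
try:
    rfn.assign_fields_by_name(dst, too_long)
except ValueError as exc:
    print("ValueError:", exc)
```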

4. Stack arrays field by field

The recfunctions module provides the stack_arrays() method, which stacks a sequence of arrays field by field and returns a new array.

stack_arrays(arrays, defaults=None, usemask=True, asrecarray=False,
                 autoconvert=False)

Parameter Description:

Parameter     Description
arrays        An array or a sequence of arrays
defaults      Dictionary mapping field names to the corresponding default values
usemask       Whether to return a masked array
asrecarray    Whether to return a record array
autoconvert   Whether to automatically upcast a field's type to the largest type among the inputs

>>> arr = np.array([1, 1, 2, 3, 1, 3, 1], dtype=[("A", "i8")])
>>> arr2 = np.array([(1, 2), (3, 3), (1, 2)], dtype=[("A", "i8"), ("B", "i8")])
>>> new_arr = rfn.stack_arrays((arr, arr2))
>>> new_arr
masked_array(data=[(1, --), (1, --), (2, --), (3, --), (1, --), (3, --), (1, --),
                   (1, 2), (3, 3), (1, 2)],
             mask=[(False, True), (False, True), (False, True),
                   (False, True), (False, True), (False, True),
                   (False, True), (False, False), (False, False),
                   (False, False)],
       fill_value=(999999, 999999),
            dtype=[('A', '<i8'), ('B', '<i8')])

Note: when stacking arrays whose fields differ, the entries of a field that is missing from an input array are masked, and masked entries are displayed as "--".
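A short sketch of working with that result and of the autoconvert parameter, reusing the arrays above plus a made-up array small whose field A is only 4 bytes wide. The masked entries of field B can be replaced with a value of your choice via filled(), and stacking a shared field with different integer widths is rejected with a TypeError unless autoconvert=True.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

arr = np.array([1, 1, 2, 3, 1, 3, 1], dtype=[("A", "i8")])
arr2 = np.array([(1, 2), (3, 3), (1, 2)], dtype=[("A", "i8"), ("B", "i8")])

stacked = rfn.stack_arrays((arr, arr2))  # usemask=True by default
print(stacked["B"].mask)        # True for the 7 rows that came from arr
print(stacked["B"].filled(0))   # show the masked "--" entries as 0 instead

# A shared field with different integer widths is rejected ...
small = np.array([(7, 8)], dtype=[("A", "i4"), ("B", "i8")])
try:
    rfn.stack_arrays((arr2, small))
except TypeError as exc:
    print("TypeError:", exc)

# ... unless autoconvert=True, which upcasts field A to the wider type
upcast = rfn.stack_arrays((arr2, small), autoconvert=True)
print(upcast.dtype)
```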

5. Converting between structured and unstructured arrays

The recfunctions module also provides methods for converting between structured and unstructured arrays.

5.1 From structured to unstructured

The recfunctions module provides the structured_to_unstructured() method, which converts a structured array into an unstructured array.

  • structured_to_unstructured() converts an n-D structured array into an (n+1)-D unstructured array
  • The new array gets a new last dimension whose size equals the number of field elements of the input array
  • If no output dtype is provided, it is determined by applying the NumPy type promotion rules to all the field dtypes
structured_to_unstructured(arr, dtype=None, copy=False, casting='unsafe')

Parameter Description:

Parameter   Description
arr         Structured array
dtype       The dtype of the output unstructured array
copy        Default False; if True, a copy is always returned, otherwise a view is returned when possible
casting     One of "no", "equiv", "safe", "same_kind", "unsafe"; controls what kind of data casting may occur

>>> arr = np.array([1, 1, 2, 3, 1, 3, 1], dtype=[("A", "i8")])
>>> arr
array([(1,), (1,), (2,), (3,), (1,), (3,), (1,)], dtype=[('A', '<i8')])
>>> rfn.structured_to_unstructured(arr)
array([[1],
       [1],
       [2],
       [3],
       [1],
       [3],
       [1]], dtype=int64)
>>> arr2 = np.array([(1, 2), (3, 3), (1, 2)], dtype=[("A", "i8"), ("B", "i8")])
>>> rfn.structured_to_unstructured(arr2)
array([[1, 2],
       [3, 3],
       [1, 2]], dtype=int64)
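The field-element counting and type promotion described above are easier to see with mixed field types. A minimal sketch, assuming a made-up array with an int32 field and a two-element float32 subarray field: the subarray contributes two columns, and int32 combined with float32 promotes to float64. Once the data is unstructured, ordinary reductions such as a row-wise mean work across the former fields.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

rec = np.array([(1, [2.0, 3.0]), (4, [5.0, 6.0])],
               dtype=[("a", "i4"), ("c", "f4", (2,))])

flat = rfn.structured_to_unstructured(rec)
print(flat.shape)          # (2, 3): 1-D input -> 2-D output, 3 field elements
print(flat.dtype)          # float64, promoted from int32 and float32
print(flat.mean(axis=-1))  # row-wise mean across the fields: [2. 5.]
```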

5.2 From unstructured to structured

The recfunctions module also provides the unstructured_to_structured() method, which converts an unstructured array into a structured array.

  • This method converts an n-D unstructured array into an (n-1)-D structured array
  • The last dimension of the input array is turned into a structure whose number of field elements equals the size of that last dimension
  • By default, all output fields have the dtype of the input array
  • Alternatively, you can supply an output structured dtype with the same number of field elements
unstructured_to_structured(arr, dtype=None, names=None, align=False,
                               copy=False, casting='unsafe')

Parameter Description:

Parameter   Description
arr         Unstructured array
dtype       The structured dtype of the output array
names       List of field names for the output, used when dtype is not given
align       Whether to create an aligned memory layout
copy        Whether to return a copy
casting     Controls what kind of data casting may occur

>>> arr = np.array([(1, 2), (3, 3), (1, 2)])
>>> arr_dtype = np.dtype([("A", "i8"), ("B", "i8")])
>>> rfn.unstructured_to_structured(arr)
array([(1, 2), (3, 3), (1, 2)], dtype=[('f0', '<i4'), ('f1', '<i4')])
>>> rfn.unstructured_to_structured(arr, arr_dtype)
array([(1, 2), (3, 3), (1, 2)], dtype=[('A', '<i8'), ('B', '<i8')])
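The names parameter offers a lighter alternative to a full dtype when only the field names matter: each field keeps the input array's dtype. A minimal sketch with a made-up int64 input, which also converts the result back with structured_to_unstructured() to recover the original 2-D array:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

arr = np.array([[1, 2], [3, 3], [1, 2]], dtype="i8")

# names= gives the fields custom names; each field keeps the input dtype (int64)
named = rfn.unstructured_to_structured(arr, names=["A", "B"])
print(named.dtype.names)  # ('A', 'B')

# Converting back recovers the original 2-D array
back = rfn.structured_to_unstructured(named)
print(np.array_equal(back, arr))  # True
```

Passing both dtype and names is not allowed; unstructured_to_structured() raises a ValueError in that case.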

Conclusion

In this installment, we looked at more structured array operations provided by the recfunctions module: get_fieldstructure(), get_names() and get_names_flat() retrieve field names, find_duplicates() finds duplicate records, assign_fields_by_name() copies values by field name, stack_arrays() stacks a sequence of arrays field by field into a new array, and unstructured_to_structured() and structured_to_unstructured() convert between unstructured and structured arrays.

The recarray helper functions help us work with structured arrays more effectively, so they are worth using in practice.

That's all for this installment. Please give us a thumbs up and leave a comment. See you next time!