Introduction

If a dataset contains many NaN values, storing every one of them wastes space. To address this, pandas provides sparse data structures that store such data efficiently.

Sparse data example

We create an array, set most of its values to NaN, and then use it to build a SparseArray (np and pd are the usual aliases for numpy and pandas):

In [1]: arr = np.random.randn(10)

In [2]: arr[2:-2] = np.nan

In [3]: ts = pd.Series(pd.arrays.SparseArray(arr))

In [4]: ts
Out[4]: 
0    0.469112
1   -0.282863
2         NaN
3         NaN
4         NaN
5         NaN
6         NaN
7         NaN
8   -0.861849
9   -2.104569
dtype: Sparse[float64, nan]

The dtype here is Sparse[float64, nan], which means that the NaN values are not actually stored in the array. Only the non-NaN values are stored, and their type is float64.
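To make the space saving concrete, here is a minimal sketch that compares the bytes used by a dense Series with those used by its sparse counterpart. The array length of 1000 and the variable names are illustrative choices, not part of the original example.

```python
import numpy as np
import pandas as pd

# Illustrative data: 1000 floats, of which only the first 10 are not NaN.
arr = np.random.randn(1000)
arr[10:] = np.nan

dense = pd.Series(arr)
sparse = pd.Series(pd.arrays.SparseArray(arr))

# nbytes counts the underlying values only (the Series index is excluded).
dense_bytes = dense.nbytes
sparse_bytes = sparse.nbytes

print(f"dense:  {dense_bytes} bytes")
print(f"sparse: {sparse_bytes} bytes")
```

The sparse version only needs room for the ten stored values and their integer positions, so it should come out far smaller than the dense one.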

SparseArray

arrays.SparseArray is an ExtensionArray for storing sparse data.

In [13]: arr = np.random.randn(10)

In [14]: arr[2:5] = np.nan

In [15]: arr[7:8] = np.nan

In [16]: sparr = pd.arrays.SparseArray(arr)

In [17]: sparr
Out[17]: 
[-1.9556635297215477, -1.6588664275960427, nan, nan, nan, 1.1589328886422277, 0.14529711373305043, nan, ...]
Fill: nan
IntIndex
Indices: array([0, 1, 5, 6, 8, 9], dtype=int32)

We can convert it back to a regular (dense) array with numpy.asarray():

In [18]: np.asarray(sparr)
Out[18]: 
array([-1.9557, -1.6589,     nan,     nan,     nan,  1.1589,  0.1453,
           nan,  0.606 ,  1.3342])
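The following sketch shows what a SparseArray keeps internally and checks that the dense round trip is lossless. It uses small fixed values instead of random ones so the result is predictable; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Fixed illustrative data: three real values, three NaNs.
arr = np.array([1.5, np.nan, np.nan, -0.5, np.nan, 2.0])
sparr = pd.arrays.SparseArray(arr)

stored = sparr.sp_values    # only the non-fill values are kept
npoints = sparr.npoints     # how many values are stored
dense = np.asarray(sparr)   # dense NumPy array with the NaNs restored

print(stored)
print(npoints)
print(dense)
```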

SparseDtype

SparseDtype represents the dtype of sparse data. It holds two pieces of information: the dtype of the stored (non-fill) values, and the fill value, that is, the constant that is not stored, such as NaN:

In [19]: sparr.dtype
Out[19]: Sparse[float64, nan]

A SparseDtype can be constructed as follows:

In [20]: pd.SparseDtype(np.dtype('datetime64[ns]'))
Out[20]: Sparse[datetime64[ns], NaT]

You can also specify the fill value explicitly:

In [21]: pd.SparseDtype(np.dtype('datetime64[ns]'),
   ....:                fill_value=pd.Timestamp('2017-01-01'))
   ....: 
Out[21]: Sparse[datetime64[ns], Timestamp('2017-01-01 00:00:00')]
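A SparseDtype can also be used to convert existing dense data. The sketch below assumes integer data in which 0 is the value worth leaving out; the data and variable names are illustrative.

```python
import numpy as np
import pandas as pd

# A sparse integer dtype that treats 0 as the value not to store.
dtype = pd.SparseDtype(np.dtype("int64"), fill_value=0)

# Convert an ordinary (dense) Series to that sparse dtype.
s = pd.Series([0, 0, 0, 5, 0, 7]).astype(dtype)

print(s.dtype)           # the sparse dtype of the converted Series
print(dtype.subtype)     # dtype of the stored values
print(dtype.fill_value)  # value that is left out of storage
```

Choosing the fill value to match the most common value in the data is what makes the sparse representation pay off.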

Sparse attributes

Sparse-specific attributes can be accessed through the .sparse accessor:

In [23]: s = pd.Series([0, 0, 1, 2], dtype="Sparse[int]")

In [24]: s.sparse.density
Out[24]: 0.5

In [25]: s.sparse.fill_value
Out[25]: 0
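Besides density and fill_value, the accessor exposes a few more helpers. The short sketch below reuses the same Series and shows npoints and to_dense(); the variable names are illustrative.

```python
import pandas as pd

s = pd.Series([0, 0, 1, 2], dtype="Sparse[int]")

density = s.sparse.density   # fraction of values that are actually stored
npoints = s.sparse.npoints   # number of stored (non-fill) values
dense = s.sparse.to_dense()  # ordinary Series with the fill values restored

print(density, npoints)
print(dense)
```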

Sparse calculations

NumPy ufuncs can be applied directly to a SparseArray, and they return a SparseArray:

In [26]: arr = pd.arrays.SparseArray([1., np.nan, np.nan, -2., np.nan])

In [27]: np.abs(arr)
Out[27]: 
[1.0, nan, nan, 2.0, nan]
Fill: nan
IntIndex
Indices: array([0, 3], dtype=int32)
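One detail worth knowing, as described in the pandas user guide, is that the ufunc is applied to the fill value as well as to the stored values. The sketch below uses an illustrative fill value of -1.0 to make this visible.

```python
import numpy as np
import pandas as pd

# Illustrative array where -1.0 (not NaN) is the fill value.
arr = pd.arrays.SparseArray([1.0, -1.0, -1.0, -2.0, -1.0], fill_value=-1.0)

result = np.abs(arr)

print(type(result).__name__)  # still a SparseArray
print(result.fill_value)      # the fill value has gone through abs() too
print(np.asarray(result))     # dense view of the result
```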

SparseSeries and SparseDataFrame

SparseSeries and SparseDataFrame were removed in pandas 1.0.0. Their role is now filled by ordinary Series and DataFrame objects that hold the more powerful SparseArray.

Take a look at the differences in usage:

# Previous way
>>> pd.SparseDataFrame({"A": [0, 1]})
# New way
In [31]: pd.DataFrame({"A": pd.arrays.SparseArray([0, 1])})
Out[31]: 
   A
0  0
1  1
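A DataFrame built this way offers sparse helpers through its own .sparse accessor. The following sketch, with illustrative column data, reads the overall density and converts the frame back to dense columns.

```python
import pandas as pd

# Two sparse integer columns; for integers the default fill value is 0.
df = pd.DataFrame({
    "A": pd.arrays.SparseArray([0, 1, 0, 0]),
    "B": pd.arrays.SparseArray([0, 0, 0, 2]),
})

density = df.sparse.density      # share of values that are actually stored
dense_df = df.sparse.to_dense()  # same data in ordinary dense columns

print(df.dtypes)
print(density)
print(dense_df.dtypes)
```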

For a SciPy sparse matrix, DataFrame.sparse.from_spmatrix() can be used:

# Previous way
>>> from scipy import sparse
>>> mat = sparse.eye(3)
>>> df = pd.SparseDataFrame(mat, columns=['A', 'B', 'C'])
# New way
In [32]: from scipy import sparse

In [33]: mat = sparse.eye(3)

In [34]: df = pd.DataFrame.sparse.from_spmatrix(mat, columns=['A', 'B', 'C'])

In [35]: df.dtypes
Out[35]: 
A    Sparse[float64, 0]
B    Sparse[float64, 0]
C    Sparse[float64, 0]
dtype: object
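The conversion also works in the other direction. As a minimal sketch, assuming SciPy is installed, df.sparse.to_coo() turns a sparse DataFrame back into a SciPy COO matrix; the variable name back is illustrative.

```python
import numpy as np
import pandas as pd
from scipy import sparse

mat = sparse.eye(3)
df = pd.DataFrame.sparse.from_spmatrix(mat, columns=["A", "B", "C"])

# Convert the sparse DataFrame back to a SciPy COO matrix.
back = df.sparse.to_coo()

print(type(back).__name__)
print(back.toarray())
```

Note that to_coo() expects the columns to use a fill value of 0, which is what from_spmatrix() produces.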

This article has been included in http://www.flydean.com/13-python-pandas-sparse-data/
