Pandas’ 10 indexes

Public account: You and the cabin by: Peter Editor: Peter

Hello, I’m Peter

Here are 10 Pandas index types you should know.

Indexes are common in daily life. For example:

  • A book has a table of contents that lists its chapters. To find a particular topic, we turn to the corresponding chapter.
  • Books in a library are classified into literature, history, technology, fiction and so on, and each book carries a number, so we can quickly find the one we want.
  • A restaurant menu is organized into staples, drinks and soups, cold dishes and so on, down to the names of specific dishes.

Each of the different usages above can be considered a specific indexing application.

Indexes built around actual requirements are therefore a valuable guide in day-to-day work, and creating an appropriate index in Pandas makes data processing easier.

Official documentation: pandas.pydata.org/docs/refere…

Here are 10 common indexes in Pandas and how to create them.

pd.Index

pd.Index is the general-purpose constructor in Pandas for building various types of indexes.

pandas.Index(
  data=None,  # one-dimensional array or array-like data
  dtype=None,  # NumPy data type (default: object)
  copy=False,  # whether to make a copy
  name=None,  # index name
  tupleize_cols=True,  # if True, try to create a MultiIndex when possible
  **kwargs
)

Import the two required libraries:

import pandas as pd
import numpy as np

For integer data, the default data type is int64:

In [2]:

pd.Index([1, 2, 3, 4])

Out[2]:

Int64Index([1, 2, 3, 4], dtype='int64')

At creation time, you can also specify the data type directly:

In [3]:

pd.Index([1, 2, 3, 4], dtype="float64")

Out[3]:

Float64Index([1.0, 2.0, 3.0, 4.0], dtype='float64')

Specify both the name and the dtype at creation time:

In [4]:

pd.Index([1, 2, 3, 4], dtype="float64", name="Peter")

Out[4]:

Float64Index([1.0, 2.0, 3.0, 4.0], dtype='float64', name='Peter')

In [5]:

pd.Index(list("ABCD"))

Out[5]:

Index(['A', 'B', 'C', 'D'], dtype='object')

Use tuples to create:

In [6]:

pd.Index(("a", "b", "c", "d"))

Out[6]:

Index(['a', 'b', 'c', 'd'], dtype='object')
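The tupleize_cols parameter from the signature only comes into play when the elements themselves are tuples. A minimal sketch of the difference, with sample tuples made up for illustration:

```python
import pandas as pd

pairs = [("a", 1), ("b", 2)]  # illustrative data, not from the article

# By default (tupleize_cols=True) a list of tuples becomes a MultiIndex.
multi = pd.Index(pairs)
print(type(multi).__name__)  # MultiIndex

# With tupleize_cols=False each tuple is kept as a single object label.
flat = pd.Index(pairs, tupleize_cols=False)
print(flat[0])  # ('a', 1)
```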

Use a set to create an index. A set is unordered, so the result is not necessarily in the order the elements were given:

In [7]:

pd.Index({"x", "y", "z"})

Out[7]:

Index(['z', 'x', 'y'], dtype='object')
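If a reproducible order matters, one option is to sort the set before building the index. A small sketch using Python's built-in sorted:

```python
import pandas as pd

labels = {"x", "y", "z"}

# sorted() returns a list in a deterministic order, so the index is stable across runs.
idx = pd.Index(sorted(labels))
print(idx)
```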

pd.RangeIndex

pd.RangeIndex generates an index over a range of integers, modeled on Python's range function. The syntax is:

pandas.RangeIndex(
  start=None,  # start value, default 0
  stop=None,  # end value
  step=None,  # step size, default 1
  dtype=None,  # data type
  copy=False,  # whether to make a copy
  name=None  # index name
)

Here are several examples:

In [8]:

pd.RangeIndex(8)  # default start is 0, step is 1

The start value defaults to 0, the end value is 8 (not included), and the step is 1:

Out[8]:

RangeIndex(start=0, stop=8, step=1)

In [9]:

pd.RangeIndex(0, 8)

Out[9]:

RangeIndex(start=0, stop=8, step=1)

Change the step size to 2:

In [10]:

pd.RangeIndex(0, 8, 2)

Out[10]:

RangeIndex(start=0, stop=8, step=2)

In [11]:

list(pd.RangeIndex(0, 8, 2))

Displayed as a list, the result does not include the stop value 8:

Out[11]:

[0, 2, 4, 6]

Change the step size to -1 in the following example:

In [12]:

pd.RangeIndex(8, 0, -1)

Out[12]:

RangeIndex(start=8, stop=0, step=-1)

In [13]:

list(pd.RangeIndex(8, 0, -1))

Out[13]:

[8, 7, 6, 5, 4, 3, 2, 1]
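RangeIndex is also what Pandas assigns by default when a DataFrame or Series is built without an explicit index. A short sketch with made-up data:

```python
import pandas as pd

# No index is passed, so Pandas falls back to a RangeIndex starting at 0.
df = pd.DataFrame({"value": [10, 20, 30]})
print(df.index)  # RangeIndex(start=0, stop=3, step=1)

# A RangeIndex exposes its defining start, stop and step values.
idx = pd.RangeIndex(0, 8, 2)
print(idx.start, idx.stop, idx.step)  # 0 8 2
```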

pd.Int64Index

The data type is fixed to 64-bit integers (int64).

pandas.Int64Index(
  data=None,  # data used to build the index
  dtype=None,  # data type, int64 by default
  copy=False,  # whether to make a copy
  name=None  # index name
)

In [14]:

pd.Int64Index([1, 2, 3, 4])

Out[14]:

Int64Index([1, 2, 3, 4], dtype='int64')

In [15]:

pd.Int64Index([1, 2.0, 3, 4])

Out[15]:

Int64Index([1, 2, 3, 4], dtype='int64')

In [16]:

pd.Int64Index([1, 2, 3, 4], name="Peter")

Out[16]:

Int64Index([1, 2, 3, 4], dtype='int64', name='Peter')

An error is reported if the data contains decimals:

In [17]:

# pd.Int64Index([1, 2, 3, 4.4])

pd.UInt64Index

The data type is an unsigned 64-bit integer (uint64).

pandas.UInt64Index(
  data=None,
  dtype=None,
  copy=False,
  name=None
)

In [18]:

pd.UInt64Index([1, 2, 3, 4])

Out[18]:

UInt64Index([1, 2, 3, 4], dtype='uint64')

In [19]:

pd.UInt64Index([1, 2, 3, 4], name="Tom")

Out[19]:

UInt64Index([1, 2, 3, 4], dtype='uint64', name='Tom')

In [20]:

pd.UInt64Index([1, 2.0, 3, 4], name="Tom")

Out[20]:

UInt64Index([1, 2, 3, 4], dtype='uint64', name='Tom')

In [21]:

pd.UInt64Index([1, 2.4, 3, 4], name="Tom")

pd.Float64Index

The data type is Float64, allowing decimals:

pandas.Float64Index(
  data=None,  # data
  dtype=None,  # data type
  copy=False,  # whether to make a copy
  name=None  # index name
)

In [22]:

pd.Float64Index([1, 2, 3, 4])

Out[22]:

Float64Index([1.0, 2.0, 3.0, 4.0], dtype='float64')

In [23]:

pd.Float64Index([1.5, 2.4, 3.7, 4.9])

Out[23]:

Float64Index([1.5, 2.4, 3.7, 4.9], dtype='float64')

In [24]:

pd.Float64Index([1.5, 2.4, 3.7, 4.9], name="Peter")

Out[24]:

Float64Index([1.5, 2.4, 3.7, 4.9], dtype='float64', name='Peter')

Note: as of Pandas 1.4.0, these three classes (Int64Index, UInt64Index and Float64Index) are deprecated and unified under NumericIndex.
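Because the dedicated numeric classes were removed in Pandas 2.0, a version-independent way to get the same integer, unsigned or float index is to pass dtype to pd.Index. A minimal sketch; the class name in the printed repr depends on the installed Pandas version, so only the dtype and name are printed here:

```python
import pandas as pd

int_idx = pd.Index([1, 2, 3, 4], dtype="int64")
uint_idx = pd.Index([1, 2, 3, 4], dtype="uint64", name="Tom")
float_idx = pd.Index([1.5, 2.4, 3.7, 4.9], dtype="float64", name="Peter")

for idx in (int_idx, uint_idx, float_idx):
    print(idx.dtype, idx.name)
```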

pd.IntervalIndex

pd.IntervalIndex(
  data,  # data to be indexed (one-dimensional)
  closed=None,  # {'left', 'right', 'both', 'neither'}, default 'right'
  dtype=None,  # data type
  copy=False,  # whether to make a copy
  name=None,  # index name
  verify_integrity=True  # check that the index is valid
)

A new IntervalIndex is usually constructed using the interval_range() function.

In [24]:

pd.interval_range(start=0, end=6)

Out[24]:

IntervalIndex([(0, 1], (1, 2], (2, 3], (3, 4], (4, 5], (5, 6]],
              closed='right',  # closed on the right by default
              dtype='interval[int64]')

In [25]:

pd.interval_range(start=0, end=6, closed="neither")

Out[25]:

IntervalIndex([(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)],
              closed='neither',
              dtype='interval[int64]')

In [26]:

pd.interval_range(start=0, end=6, closed="both")

Out[26]:

IntervalIndex([[0, 1], [1, 2], [2, 3], [3, 4], [4, 5], [5, 6]],
              closed='both',
              dtype='interval[int64]')

In [27]:

pd.interval_range(start=0, end=6, closed="left")

Out[27]:

IntervalIndex([[0, 1), [1, 2), [2, 3), [3, 4), [4, 5), [5, 6)],
              closed='left',
              dtype='interval[int64]')

In [28]:

pd.interval_range(start=0, end=6, name="peter")

Out[28]:

IntervalIndex([(0, 1], (1, 2], (2, 3], (3, 4], (4, 5], (5, 6]],
              closed='right',
              name='peter',
              dtype='interval[int64]')
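An IntervalIndex is useful for checking which interval a value falls into. A brief sketch using contains and get_loc on the default right-closed intervals:

```python
import pandas as pd

intervals = pd.interval_range(start=0, end=6)  # (0, 1], (1, 2], ..., (5, 6]

# contains() returns one boolean per interval.
mask = intervals.contains(2.5)
print(mask)

# get_loc() returns the position of the interval holding the value.
print(intervals.get_loc(2.5))  # 2, i.e. the interval (2, 3]
print(intervals.get_loc(2))    # 1, because (1, 2] is closed on the right
```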

pd.CategoricalIndex

pandas.CategoricalIndex(
  data=None,  # data
  categories=None,  # the categories
  ordered=None,  # whether the categories are ordered
  dtype=None,  # data type
  copy=False,  # whether to make a copy
  name=None  # index name
)

In the following examples, we use a batch of clothing sizes as sample data:

In [29]:

c1 = pd.CategoricalIndex(["S", "M", "L", "XS", "M", "L", "S", "M", "L", "XL"])  # data only
c1

Out[29]:

CategoricalIndex(
    ['S', 'M', 'L', 'XS', 'M', 'L', 'S', 'M', 'L', 'XL'],  # data
    categories=['L', 'M', 'S', 'XL', 'XS'],  # the distinct values, sorted by default
    ordered=False,
    dtype='category'  # data type
)

In [30]:

c2 = pd.CategoricalIndex(
    ["S", "M", "L", "XS", "M", "L", "S", "M", "L", "XL"],
    categories=["XS", "S", "M", "L", "XL"]  # specify the categories explicitly
)
c2

Out[30]:

CategoricalIndex(
    ['S', 'M', 'L', 'XS', 'M', 'L', 'S', 'M', 'L', 'XL'],
    categories=['XS', 'S', 'M', 'L', 'XL'],
    ordered=False,
    dtype='category'
)

In [31]:

c3 = pd.CategoricalIndex(
    ["S", "M", "L", "XS", "M", "L", "S", "M", "L", "XL"],  # data
    categories=["XS", "S", "M", "L", "XL"],  # category names
    ordered=True  # treat the categories as ordered
)

c3

Out[31]:

CategoricalIndex(
    ['S', 'M', 'L', 'XS', 'M', 'L', 'S', 'M', 'L', 'XL'],
    categories=['XS', 'S', 'M', 'L', 'XL'],
    ordered=True,  # the categories are now ordered
    dtype='category'
)

In [32]:

c4 = pd.CategoricalIndex(
    ["S", "M", "L", "XS", "M", "L", "S", "M", "L", "XL"],  # data
    categories=["XS", "S", "M", "L", "XL"],  # category order
    ordered=True,  # treat the categories as ordered
    name="category"  # index name
)
c4

Out[32]:

CategoricalIndex(
    ['S', 'M', 'L', 'XS', 'M', 'L', 'S', 'M', 'L', 'XL'],
    categories=['XS', 'S', 'M', 'L', 'XL'],
    ordered=True,
    name='category',
    dtype='category'
)
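One practical effect of the declared categories is sort order: sort_values follows the category order rather than alphabetical order. A small sketch reusing the clothing sizes (the variable names are chosen for illustration):

```python
import pandas as pd

sizes = pd.CategoricalIndex(
    ["S", "M", "L", "XS", "M", "L", "S", "M", "L", "XL"],
    categories=["XS", "S", "M", "L", "XL"],
    ordered=True,
)

# Sorting uses the declared category order XS < S < M < L < XL.
ordered_sizes = sizes.sort_values()
print(list(ordered_sizes))

# The integer codes give each label's position in `categories`.
print(list(sizes.codes))
```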

A CategoricalIndex can also be created from a pd.Categorical object:

In [33]:

c5 = pd.Categorical(["a", "b", "c", "c", "b", "c", "a"])

pd.CategoricalIndex(c5)

Out[33]:

CategoricalIndex(
    ['a', 'b', 'c', 'c', 'b', 'c', 'a'],
    categories=['a', 'b', 'c'],
    ordered=False,
    dtype='category'
)

In [34]:

pd.CategoricalIndex(c5, ordered=True)  # treat the categories as ordered

Out[34]:

CategoricalIndex(
    ['a', 'b', 'c', 'c', 'b', 'c', 'a'],
    categories=['a', 'b', 'c'],
    ordered=True,  # ordered
    dtype='category'
)

pd.DatetimeIndex

A DatetimeIndex uses dates and times as the index and is usually generated with the date_range function. The syntax is:

pd.DatetimeIndex(
  data=None,  # data
  freq=NoDefault.no_default,  # frequency
  tz=None,  # time zone
  normalize=False,  # whether to normalize
  closed=None,  # whether the interval is closed
  # 'infer', bool-ndarray, 'NaT', default 'raise'
  ambiguous='raise',
  dayfirst=False,  # parse the day first
  yearfirst=False,  # parse the year first
  dtype=None,  # data type
  copy=False,  # whether to make a copy
  name=None  # index name
)

The date_range function generates an index of dates and times, as in the following examples:

In [35]:

pd.date_range("2022-01-01", periods=6)

Out[35]:

DatetimeIndex(
    ['2022-01-01', '2022-01-02', '2022-01-03',
     '2022-01-04', '2022-01-05', '2022-01-06'],
    dtype='datetime64[ns]',
    freq='D'  # daily frequency by default
)

In [36]:

d1 = pd.date_range("2022-01-01", periods=6, freq="D")
d1

Out[36]:

DatetimeIndex(
    ['2022-01-01', '2022-01-02',
     '2022-01-03', '2022-01-04',
     '2022-01-05', '2022-01-06'],
    dtype='datetime64[ns]',
    freq='D')

In [37]:

pd.date_range("2022-01-01", periods=6, freq="H")

Out[37]:

DatetimeIndex(
    ['2022-01-01 00:00:00', '2022-01-01 01:00:00',
     '2022-01-01 02:00:00', '2022-01-01 03:00:00',
     '2022-01-01 04:00:00', '2022-01-01 05:00:00'],
    dtype='datetime64[ns]',
    freq='H')

In [38]:

pd.date_range("2022-01-01", periods=6, freq="3M")

Out[38]:

DatetimeIndex(
    ['2022-01-31', '2022-04-30',
     '2022-07-31', '2022-10-31',
     '2023-01-31', '2023-04-30'],
    dtype='datetime64[ns]',
    freq='3M')

In [39]:

pd.date_range("2022-01-01", periods=6, freq="Q")

The results are shown at quarterly frequency, with one quarter equal to three months:

Out[39]:

DatetimeIndex(
    ['2022-03-31', '2022-06-30',
     '2022-09-30', '2022-12-31',
     '2023-03-31', '2023-06-30'],
    dtype='datetime64[ns]',
    freq='Q-DEC')

In [40]:

pd.date_range("2022-01-01", periods=6, tz="Asia/Calcutta")

Out[40]:

DatetimeIndex(
    ['2022-01-01 00:00:00+05:30', '2022-01-02 00:00:00+05:30',
     '2022-01-03 00:00:00+05:30', '2022-01-04 00:00:00+05:30',
     '2022-01-05 00:00:00+05:30', '2022-01-06 00:00:00+05:30'],
    dtype='datetime64[ns, Asia/Calcutta]', freq='D')
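Once a Series or DataFrame carries a DatetimeIndex, rows can be selected with date strings. A minimal sketch with made-up values:

```python
import pandas as pd

dates = pd.date_range("2022-01-01", periods=6, freq="D")
s = pd.Series(range(6), index=dates)

# Label-based slicing with date strings includes both endpoints.
subset = s.loc["2022-01-02":"2022-01-04"]
print(subset)

# Components such as the weekday name are available directly on the index.
print(dates.day_name()[0])  # Saturday
```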

pd.PeriodIndex

PeriodIndex is an index dedicated to periodic data, which is convenient for processing data with a certain period. Its usage is as follows:

pd.PeriodIndex(
  data=None,  # data
  ordinal=None,  # ordinal
  freq=None,  # frequency
  dtype=None,  # data type
  copy=False,  # whether to make a copy
  name=None,  # index name
  **fields
)

Method 1 of generating a pd.PeriodIndex object: specify the start time and the period frequency

In [41]:

pd.period_range('2022-01-01 09:00', periods=5, freq='H')

Out[41]:

PeriodIndex(
['2022-01-01 09:00', '2022-01-01 10:00', 
'2022-01-01 11:00','2022-01-01 12:00', '2022-01-01 13:00'],
dtype='period[H]', freq='H')

In [42]:

pd.period_range('2022-01-01 09:00', periods=6, freq='2D')

Out[42]:

PeriodIndex(
['2022-01-01', '2022-01-03', 
'2022-01-05', '2022-01-07',
'2022-01-09', '2022-01-11'],
dtype='period[2D]', 
freq='2D')

In [43]:

pd.period_range('2022-01', periods=5, freq='M')

Out[43]:

PeriodIndex(
['2022-01', '2022-02', 
'2022-03', '2022-04', '2022-05'], 
dtype='period[M]', freq='M')

In [44]:

p1 = pd.DataFrame(
    {"name": ["xiaoming", "xiaohong", "Peter", "Mike", "Jimmy"]},
    index=pd.period_range('2022-01-01 09:00', periods=5, freq='3H')
)
p1

Method 2 of generating a pd.PeriodIndex object: call pd.PeriodIndex directly

In [45]:

pd.PeriodIndex(
['2022-01-01', '2022-01-02', 
'2022-01-03', '2022-01-04'], 
freq = '2H')

Out[45]:

PeriodIndex(
['2022-01-01 00:00', '2022-01-02 00:00', 
'2022-01-03 00:00','2022-01-04 00:00'],
dtype='period[2H]', freq='2H')

In [46]:

pd.PeriodIndex(
['2022-01', '2022-02', 
'2022-03', '2022-04'], 
freq = 'M')

Out[46]:

PeriodIndex(
['2022-01', '2022-02', 
'2022-03', '2022-04'], 
dtype='period[M]', 
freq='M')

In [47]:

pd.PeriodIndex(['2022-01', '2022-07'], freq = 'Q')

Out[47]:

PeriodIndex(
['2022Q1', '2022Q3'], 
dtype='period[Q-DEC]', 
freq='Q-DEC')

Method 3 of generating a pd.PeriodIndex object: create a DatetimeIndex with the date_range function, then pass it to pd.PeriodIndex

In [48]:

data = pd.date_range("2022-01-01",periods=6)
data

Out[48]:

DatetimeIndex(
['2022-01-01', '2022-01-02', 
'2022-01-03', '2022-01-04',
'2022-01-05', '2022-01-06'],
dtype='datetime64[ns]', 
freq='D')

In [49]:

pd.PeriodIndex(data=data)

Out[49]:

PeriodIndex(
['2022-01-01', '2022-01-02', 
'2022-01-03', '2022-01-04',
'2022-01-05', '2022-01-06'],
dtype='period[D]', freq='D')
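A closely related route from a DatetimeIndex to a PeriodIndex is the to_period method, which lets the target frequency be chosen explicitly. A short sketch:

```python
import pandas as pd

data = pd.date_range("2022-01-01", periods=6)

daily = data.to_period("D")    # one period per day
monthly = data.to_period("M")  # every date collapses to its month

print(daily)
print(monthly.unique())
```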

In [50]:

p2 = pd.DataFrame(
    np.random.randn(400, 1),
    columns=['number'],
    index=pd.period_range('2021-01-01 8:00', periods=400, freq='D')
)
p2

pd.TimedeltaIndex

pd.TimedeltaIndex(
  data=None,  # data
  unit=None,  # unit of the input values
  freq=NoDefault.no_default,  # frequency
  closed=None,  # which side is closed
  dtype=dtype('<m8[ns]'),  # data type
  copy=False,  # whether to make a copy
  name=None  # index name
)

Creation method 1: Specify the data and its unit

In [51]:

pd.TimedeltaIndex([12, 24, 36, 48], unit='s')

Out[51]:

TimedeltaIndex(
	['0 days 00:00:12', '0 days 00:00:24', 
	'0 days 00:00:36','0 days 00:00:48'],
	dtype='timedelta64[ns]', 
	freq=None)

In [52]:

pd.TimedeltaIndex([1, 2, 3, 4], unit='h')

Out[52]:

TimedeltaIndex(
    ['0 days 01:00:00', '0 days 02:00:00',
     '0 days 03:00:00', '0 days 04:00:00'],
    dtype='timedelta64[ns]',
    freq=None)

In [53]:

pd.TimedeltaIndex([12, 24, 36, 48], unit='h')

Out[53]:

TimedeltaIndex(
    ['0 days 12:00:00', '1 days 00:00:00',
     '1 days 12:00:00', '2 days 00:00:00'],
    dtype='timedelta64[ns]',  # data type
    freq=None)

In [54]:

pd.TimedeltaIndex([12, 24, 36, 48], unit='D')

Out[54]:

TimedeltaIndex(
	['12 days', '24 days', '36 days', '48 days'], 
	dtype='timedelta64[ns]', freq=None)
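The same kind of index can also be built with pd.to_timedelta, which accepts a unit argument and returns a TimedeltaIndex for list input. A small sketch mirroring the first example above:

```python
import pandas as pd

td = pd.to_timedelta([12, 24, 36, 48], unit="s")
print(td)

# Each element is a pd.Timedelta.
print(td[0] == pd.Timedelta(seconds=12))  # True
```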

Creation method 2: Use the timedelta_range function to generate data indirectly

In [55]:

data1 = pd.timedelta_range(start='1 day', periods=4)
data1

Out[55]:

TimedeltaIndex(['1 days', '2 days', '3 days', '4 days'], dtype='timedelta64[ns]', freq='D')

In [56]:

pt1 = pd.TimedeltaIndex(data1)

pt1

Out[56]:

TimedeltaIndex(
	['1 days', '2 days', '3 days', '4 days'], 
	dtype='timedelta64[ns]', freq='D')
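A TimedeltaIndex represents offsets, so adding it to a fixed timestamp yields a DatetimeIndex. A minimal sketch, with the base date chosen arbitrarily for illustration:

```python
import pandas as pd

offsets = pd.timedelta_range(start="1 day", periods=4)
base = pd.Timestamp("2022-01-01")  # arbitrary reference point

# Timestamp + TimedeltaIndex -> DatetimeIndex
shifted = base + offsets
print(shifted)
```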

In [57]:

data2 = pd.timedelta_range(start='1 day', end='3 days', freq='6H')
data2

Out[57]:

TimedeltaIndex(
	['1 days 00:00:00', '1 days 06:00:00', '1 days 12:00:00',
  '1 days 18:00:00', '2 days 00:00:00', '2 days 06:00:00',
  '2 days 12:00:00', '2 days 18:00:00', '3 days 00:00:00'],
  dtype='timedelta64[ns]', freq='6H')

In [58]:

pt2 = pd.TimedeltaIndex(data2)

pt2

Out[58]: