“This is the 27th day of my participation in the Gwen Challenge in November. See details of the event: The Last Gwen Challenge in 2021”

Pandas timestamp index: DatetimeIndex

pd.DatetimeIndex() and time series

pd.DatetimeIndex() builds a timestamp index directly and accepts strings as well as datetime.datetime objects. A single timestamp has type Timestamp, and a collection of timestamps has type DatetimeIndex, as the following example shows:

import numpy as np
import pandas as pd

rng = pd.DatetimeIndex(['12/1/2017', '12/2/2017', '12/3/2017', '12/4/2017', '12/5/2017'])
print(rng, type(rng))
print(rng[0], type(rng[0]))
>>>
DatetimeIndex(['2017-12-01', '2017-12-02', '2017-12-03', '2017-12-04', '2017-12-05'],
              dtype='datetime64[ns]', freq=None) <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
2017-12-01 00:00:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'>

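The example above only passes strings. As a minimal sketch of the datetime.datetime case mentioned earlier, the following builds the same kind of index from datetime objects and compares it with the string version. The variable names are illustrative and not from the original article.

```python
from datetime import datetime

import pandas as pd

# Build an index from datetime.datetime objects instead of strings.
dates = [datetime(2017, 12, 1), datetime(2017, 12, 2), datetime(2017, 12, 3)]
idx_from_datetime = pd.DatetimeIndex(dates)

# The same dates written as strings.
idx_from_str = pd.DatetimeIndex(['12/1/2017', '12/2/2017', '12/3/2017'])

# Element-wise comparison: both inputs describe the same timestamps.
print((idx_from_datetime == idx_from_str).all())

# Each element of the index is a pandas Timestamp.
print(type(idx_from_datetime[0]))
```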
What is a time series?

A time series is a Series whose index is a DatetimeIndex:

st = pd.Series(np.random.rand(len(rng)), index = rng)
print(st, type(st))
print(st.index)
>>>
2017-12-01    0.081920
2017-12-02    0.921781
2017-12-03    0.489779
2017-12-04    0.257632
2017-12-05    0.805373
dtype: float64 <class 'pandas.core.series.Series'>
DatetimeIndex(['2017-12-01', '2017-12-02', '2017-12-03', '2017-12-04', '2017-12-05'],
              dtype='datetime64[ns]', freq=None)

pd.date_range(): building a date range

pd.date_range() generates a date range in two ways (the default frequency is daily):

  • Start time (start) + end time (end)
  • Start time or end time + number of periods (periods)

Here’s an example:

date1 = pd.date_range('2017/1/1', '2017/10/1', normalize=True)
print(date1)
date2 = pd.date_range(start = '1/1/2017', periods = 10)
print(date2)
date3 = pd.date_range(end = '1/30/2017 15:00:00', periods = 10, normalize=True)  # end includes a time of day; normalize=True resets it to midnight
print(date3)
>>>
DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04',
               '2017-01-05', '2017-01-06', '2017-01-07', '2017-01-08',
               '2017-01-09', '2017-01-10',
               ...
               '2017-09-22', '2017-09-23', '2017-09-24', '2017-09-25',
               '2017-09-26', '2017-09-27', '2017-09-28', '2017-09-29',
               '2017-09-30', '2017-10-01'],
              dtype='datetime64[ns]', length=274, freq='D')
DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04', '2017-01-05', '2017-01-06', '2017-01-07', '2017-01-08', '2017-01-09', '2017-01-10'],
              dtype='datetime64[ns]', freq='D')
DatetimeIndex(['2017-01-21', '2017-01-22', '2017-01-23', '2017-01-24', '2017-01-25', '2017-01-26', '2017-01-27', '2017-01-28', '2017-01-29', '2017-01-30'],
              dtype='datetime64[ns]', freq='D')

The full signature is:

pd.date_range(start=None, end=None, periods=None, freq='D', tz=None, normalize=False, name=None, closed=None, **kwargs)

The common parameters are as follows:

  • start: the start time
  • end: the end time
  • periods: the number of periods to generate
  • freq: the frequency; pd.date_range() defaults to calendar days ('D'), while pd.bdate_range() defaults to business days ('B')
  • tz: the time zone
  • normalize: resets the time of day in start/end to midnight before generating the range
  • closed: which endpoints to include. The default None includes both; 'left' includes only the start, and 'right' includes only the end (pandas 1.4 and later provide the inclusive parameter for this, and pandas 2.0 removed closed)
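To illustrate the freq entry in the list above, here is a small sketch comparing the default frequencies of pd.date_range() and pd.bdate_range() over the same ten days. The variable names are illustrative only.

```python
import pandas as pd

# 2017-01-01 is a Sunday, so the first business day in the range is 2017-01-02.
calendar_days = pd.date_range('2017/1/1', '2017/1/10')
business_days = pd.bdate_range('2017/1/1', '2017/1/10')

# date_range defaults to calendar days, bdate_range to business days.
print(len(calendar_days), calendar_days.freqstr)
print(len(business_days), business_days.freqstr)
print(business_days)
```

The business-day range skips the weekend days (January 1, 7, and 8), which is why it is shorter than the calendar-day range.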

Here’s an example of the normalize parameter in action:

rng4 = pd.date_range(start = '1/1/2017 15:30', periods = 10, name = 'hello world!', normalize = True)  # any time of day works here; normalize=True resets it to midnight
print(rng4)
>>>
DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04', '2017-01-05', '2017-01-06', '2017-01-07', '2017-01-08', '2017-01-09', '2017-01-10'],
              dtype='datetime64[ns]', name='hello world!', freq='D')

Using freq (1): generating fixed-frequency time series

Basic usage is as follows:

print(pd.date_range('2017/1/1', '2017/1/4'))  # default freq = 'D': every calendar day
print(pd.date_range('2017/1/1', '2017/1/4', freq = 'B'))  # B: every business day
print(pd.date_range('2017/1/1', '2017/1/2', freq = 'H'))  # H: every hour
print(pd.date_range('2017/1/1 12:00', '2017/1/1 12:10', freq = 'T'))  # T/min: every minute
print(pd.date_range('2017/1/1 12:00:00', '2017/1/1 12:00:10', freq = 'S'))  # S: every second
print(pd.date_range('2017/1/1 12:00:00', '2017/1/1 12:00:10', freq = 'L'))  # L: every millisecond (thousandth of a second)
print(pd.date_range('2017/1/1 12:00:00', '2017/1/1 12:00:10', freq = 'U'))  # U: every microsecond (millionth of a second)
# Note: pandas 2.2 and later prefer the lowercase spellings 'h', 'min', 's', 'ms', 'us'

Advanced use is as follows:

print(pd.date_range('2017/1/1', '2017/2/1', freq = 'W-MON'))
# W-MON: weekly, anchored on the given day of the week (here, every Monday)
# Day abbreviations: MON/TUE/WED/THU/FRI/SAT/SUN

print(pd.date_range('2017/1/1', '2017/5/1', freq = 'WOM-2MON'))
# WOM-2MON: a given weekday in a given week of each month (here, the second Monday)

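As a quick check of the two anchored frequencies above, the following sketch confirms that every generated date is a Monday and shows which day of the month the second Mondays fall on. The variable names are illustrative.

```python
import pandas as pd

weekly = pd.date_range('2017/1/1', '2017/2/1', freq='W-MON')
second_mondays = pd.date_range('2017/1/1', '2017/5/1', freq='WOM-2MON')

# dayofweek is 0 for Monday, so both checks should hold for every entry.
print((weekly.dayofweek == 0).all())
print((second_mondays.dayofweek == 0).all())

# The second Monday of a month always falls on day 8 through 14.
print(list(second_mondays.day))
```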
Using freq (2): more ways to generate the time series you need

Generate calendar days at a specified frequency:

print(pd.date_range('2017', '2018', freq = 'M'))
print(pd.date_range('2017', '2020', freq = 'Q-DEC'))
print(pd.date_range('2017', '2020', freq = 'A-DEC'))
print('------')
# M: last calendar day of each month
# Q-month: quarters end in the given month; yields the last calendar day of each quarter's final month
# A-month: last calendar day of the given month, once a year
# Month abbreviations: JAN/FEB/MAR/APR/MAY/JUN/JUL/AUG/SEP/OCT/NOV/DEC
# So Q-month has only three distinct cases: 1-4-7-10, 2-5-8-11, 3-6-9-12
# Note: pandas 2.2 and later spell these aliases 'ME', 'QE-DEC', and 'YE-DEC'

Generate business days at a specified frequency:

print(pd.date_range('2017', '2018', freq = 'BM'))
print(pd.date_range('2017', '2020', freq = 'BQ-DEC'))
print(pd.date_range('2017', '2020', freq = 'BA-DEC'))
print('------')
# BM: last business day of each month
# BQ-month: quarters end in the given month; yields the last business day of each quarter's final month
# BA-month: last business day of the given month, once a year

Generate period-start dates at a specified frequency:

print(pd.date_range('2017', '2018', freq = 'MS'))
print(pd.date_range('2017', '2020', freq = 'QS-DEC'))
print(pd.date_range('2017', '2020', freq = 'AS-DEC'))
print('------')
# MS: first calendar day of each month
# QS-month: quarters end in the given month; yields the first calendar day of each quarter's final month
# AS-month: first calendar day of the given month, once a year

print(pd.date_range('2017', '2018', freq = 'BMS'))
print(pd.date_range('2017', '2020', freq = 'BQS-DEC'))
print(pd.date_range('2017', '2020', freq = 'BAS-DEC'))
print('------')
# BMS: first business day of each month
# BQS-month: quarters end in the given month; yields the first business day of each quarter's final month
# BAS-month: first business day of the given month, once a year

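The difference between calendar-day and business-day variants is easiest to see in a month that starts and ends on a weekend. The sketch below uses April 2017 and the offset classes from pd.offsets in place of the string aliases; using the classes here is a choice made for this illustration and is not taken from the original article.

```python
import pandas as pd

start, end = '2017-04-01', '2017-04-30'

# Offset objects corresponding to the 'M', 'BM', 'MS', and 'BMS' aliases.
last_calendar = pd.date_range(start, end, freq=pd.offsets.MonthEnd())
last_business = pd.date_range(start, end, freq=pd.offsets.BMonthEnd())
first_calendar = pd.date_range(start, end, freq=pd.offsets.MonthBegin())
first_business = pd.date_range(start, end, freq=pd.offsets.BMonthBegin())

results = [
    ('last calendar day', last_calendar),
    ('last business day', last_business),
    ('first calendar day', first_calendar),
    ('first business day', first_business),
]
for label, idx in results:
    # Each range contains exactly one date for this single month.
    print(label, idx[0].date(), idx[0].day_name())
```

April 1, 2017 is a Saturday and April 30 is a Sunday, so the business-day variants move to April 3 and April 28 respectively.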
Using freq (3): compound frequencies

Generate a time series at a specified compound frequency:

print(pd.date_range('2017/1/1', '2017/2/1', freq = '7D'))  # every 7 days
print(pd.date_range('2017/1/1', '2017/1/2', freq = '2h30min'))  # every 2 hours 30 minutes
print(pd.date_range('2017', '2018', freq = '2M'))  # last calendar day of every second month

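To confirm what a compound frequency produces, this sketch measures the gaps between consecutive timestamps for the '2h30min' case above. The names idx and steps are illustrative.

```python
import pandas as pd

idx = pd.date_range('2017/1/1', '2017/1/2', freq='2h30min')

# Differences between consecutive timestamps should all equal the compound step.
steps = idx[1:] - idx[:-1]
print(len(idx))
print((steps == pd.Timedelta(hours=2, minutes=30)).all())

# The end point 2017-01-02 00:00 is not a multiple of the step, so it is not included.
print(idx[-1])
```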
asfreq: converting the frequency of a time series

How can a daily time series be converted to one with a finer interval?

ts = pd.Series(np.random.rand(4),
              index = pd.date_range('20170101', '20170104'))
print(ts)
print(ts.asfreq('4H', method = 'ffill'))
# Change the frequency; here D is changed to 4H
# method: None leaves the new rows as NaN, ffill fills with the previous value, bfill fills with the next value

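The three method options are easier to compare side by side on fixed values. This is a minimal sketch with made-up data; it uses the lowercase '12h' spelling of the hourly alias, which is an assumption about the installed pandas accepting that spelling.

```python
import pandas as pd

ts_daily = pd.Series([1.0, 2.0], index=pd.date_range('20170101', '20170102'))

no_fill = ts_daily.asfreq('12h')                    # new rows are left as NaN
forward = ts_daily.asfreq('12h', method='ffill')    # new rows take the previous value
backward = ts_daily.asfreq('12h', method='bfill')   # new rows take the next value

# Show the three results as columns of one table for comparison.
print(pd.DataFrame({'none': no_fill, 'ffill': forward, 'bfill': backward}))
```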
How do we shift data forward or backward in time?

shift() moves the values forward or backward while leaving the index in place:

ts = pd.Series(np.random.rand(4),
              index = pd.date_range('20170101', '20170104'))
print(ts)
print(ts.shift(2))
print(ts.shift(-2))
print('------')
# Positive: values move later in time (lag); negative: values move earlier (lead)
>>>
2017-01-01    0.575076
2017-01-02    0.514981
2017-01-03    0.221506
2017-01-04    0.410396
Freq: D, dtype: float64
2017-01-01         NaN
2017-01-02         NaN
2017-01-03    0.575076
2017-01-04    0.514981
Freq: D, dtype: float64
2017-01-01    0.221506
2017-01-02    0.410396
2017-01-03         NaN
2017-01-04         NaN
Freq: D, dtype: float64

Passing freq to shift() shifts the index timestamps instead of the values:

print(ts.shift(2, freq = 'D'))
print(ts.shift(2, freq = 'T'))
# With the freq parameter, the timestamps are shifted instead of the values
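A small sketch with fixed values makes the contrast between the two forms of shift() explicit. The series ts_small and its values are made up for illustration.

```python
import pandas as pd

ts_small = pd.Series([10, 20, 30], index=pd.date_range('20170101', periods=3))

# Without freq: the index stays the same and the values slide down one row.
shifted_values = ts_small.shift(1)

# With freq: the values stay attached to their rows and the index moves one day later.
shifted_index = ts_small.shift(1, freq='D')

print(shifted_values)
print(shifted_index)
```

Shifting the values introduces NaN at the vacated position, while shifting the index keeps every value.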

Pandas periods: Period

pd.Period(): creating a period

Create a period with monthly frequency, starting at 2017-01:

p = pd.Period('2017', freq = 'M')
print(p, type(p))
>>>
2017-01 <class 'pandas._period.Period'>

We can shift the period as a whole by adding and subtracting integers:

p = pd.Period('2017', freq = 'M')
print(p, type(p))
print(p + 1)
print(p - 2)
>>>
2017-01 <class 'pandas._period.Period'>
2017-02
2016-11

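Unlike a Timestamp, a Period covers a span of time. The following sketch, with illustrative values, shows the span boundaries through the start_time and end_time attributes and repeats the integer arithmetic across a year boundary.

```python
import pandas as pd

p = pd.Period('2017-01', freq='M')

# A period has both a first and a last instant.
print(p.start_time)
print(p.end_time)

# Integer arithmetic moves in units of the period's frequency (months here).
print(p + 12)
print(p - 2)
```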
pd.period_range(): creating a period range

Create a range of periods:

prng = pd.period_range('1/1/2011', '1/1/2012', freq='M')
print(prng, type(prng))
>>>
PeriodIndex(['2011-01', '2011-02', '2011-03', '2011-04', '2011-05', '2011-06', '2011-07', '2011-08', '2011-09', '2011-10', '2011-11', '2011-12', '2012-01'],
            dtype='int64', freq='M') <class 'pandas.tseries.period.PeriodIndex'>

Use the period range above as the index of a time series:

ts = pd.Series(np.random.rand(len(prng)), index = prng)
print(ts, type(ts))
print(ts.index)
>>>
2011-01    0.342571
2011-02    0.826151
2011-03    0.370505
2011-04    0.137151
2011-05    0.679976
2011-06    0.265928
2011-07    0.416502
2011-08    0.874078
2011-09    0.112801
2011-10    0.112504
2011-11    0.448408
2011-12    0.851046
2012-01    0.370605
Freq: M, dtype: float64 <class 'pandas.core.series.Series'>
PeriodIndex(['2011-01', '2011-02', '2011-03', '2011-04', '2011-05', '2011-06', '2011-07', '2011-08', '2011-09', '2011-10', '2011-11', '2011-12', '2012-01'],
            dtype='int64', freq='M')

Period.asfreq(): frequency conversion

The .asfreq(freq, method=None, how=None) method converts a period from its current frequency to another one:

p = pd.Period('2017', 'A-DEC')
print(p)
print(p.asfreq('M', how = 'start'))  # same as how = 's'
print(p.asfreq('D', how = 'end'))  # same as how = 'e'
>>>
2017
2017-01
2017-12-31

asfreq can also convert the index of a time series:

prng = pd.period_range('2017', '2018', freq = 'M')
ts1 = pd.Series(np.random.rand(len(prng)), index = prng)
ts2 = pd.Series(np.random.rand(len(prng)), index = prng.asfreq('D', how = 'start'))
print(ts1.head(), len(ts1))
print(ts2.head(), len(ts2))
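Since the example above uses random values, here is a smaller sketch that only looks at the converted index. It shows how the how argument picks the first or last day of each month. The variable names are illustrative.

```python
import pandas as pd

months = pd.period_range('2017-01', '2017-03', freq='M')

# Convert each monthly period to a daily period at the start or end of the month.
as_first_day = months.asfreq('D', how='start')
as_last_day = months.asfreq('D', how='end')

print(as_first_day)
print(as_last_day)

# The conversion relabels each period; it does not add new rows.
print(len(months), len(as_first_day), len(as_last_day))
```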

Converting between timestamps and periods

.to_period() converts timestamps to periods, and .to_timestamp() converts periods back to timestamps:

rng = pd.date_range('2017/1/1', periods = 10, freq = 'M')
prng = pd.period_range('2017', '2018', freq = 'M')

ts1 = pd.Series(np.random.rand(len(rng)), index = rng)
print(ts1.head())
print(ts1.to_period().head())
# Last day of each month, converted to the month

ts2 = pd.Series(np.random.rand(len(prng)), index = prng)
print(ts2.head())
print(ts2.to_timestamp().head())
# Each month, converted to its first day
>>>
2017-01-31    0.125288
2017-02-28    0.497174
2017-03-31    0.573114
2017-04-30    0.665665
2017-05-31    0.263561
Freq: M, dtype: float64
2017-01    0.125288
2017-02    0.497174
2017-03    0.573114
2017-04    0.665665
2017-05    0.263561
Freq: M, dtype: float64
2017-01    0.748661
2017-02    0.095891
2017-03    0.280341
2017-04    0.569813
2017-05    0.067677
Freq: M, dtype: float64
2017-01-01    0.748661
2017-02-01    0.095891
2017-03-01    0.280341
2017-04-01    0.569813
2017-05-01    0.067677
Freq: MS, dtype: float64
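The round trip can also be sketched with fixed values. In this illustration the month-end frequency is passed as a pd.offsets.MonthEnd() object and the period frequency is given explicitly as 'M'; both are choices made for this sketch, and how='end' is shown as an addition to the default conversion used above.

```python
import pandas as pd

# Three month-end timestamps: Jan 31, Feb 28, Mar 31.
month_ends = pd.date_range('2017-01-31', periods=3, freq=pd.offsets.MonthEnd())
ts_stamp = pd.Series([1, 2, 3], index=month_ends)

ts_period = ts_stamp.to_period('M')             # timestamps -> monthly periods
back_start = ts_period.to_timestamp()           # periods -> first day of each month
back_end = ts_period.to_timestamp(how='end')    # periods -> last instant of each month

print(ts_period.index)
print(back_start.index)
print(back_end.index.normalize())               # drop the time of day for display
```

Converting to periods and back with the default settings does not restore the original month-end dates, because to_timestamp() picks the start of each period unless how='end' is given.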

Indexing and slicing time series

Indexing

The indexing methods for time series also apply to DataFrame. Because a time series is ordered by time, there is no need to worry about ordering.

Basic positional indexing works much like indexing a list:


from datetime import datetime

rng = pd.date_range('2017/1', '2017/3')
ts = pd.Series(np.random.rand(len(rng)), index = rng)
print(ts.head())

print(ts.iloc[0])   # positional access; plain ts[0] on a datetime index is deprecated in recent pandas
print(ts.iloc[:2])
>>>
2017-01-01    0.107736
2017-01-02    0.887981
2017-01-03    0.712862
2017-01-04    0.920021
2017-01-05    0.317863
Freq: D, dtype: float64
0.107735945027
2017-01-01    0.107736
2017-01-02    0.887981
Freq: D, dtype: float64

In addition to positional indexing, time series support label-based indexing by date:

from datetime import datetime

rng = pd.date_range('2017/1', '2017/3')
ts = pd.Series(np.random.rand(len(rng)), index = rng)
print(ts['2017/1/2'])
print(ts['20170103'])
print(ts['1/10/2017'])
print(ts[datetime(2017, 1, 20)])
>>>
0.887980757812
0.712861778966
0.788336674948
0.93070380011

Slicing

Positional slicing appeared in the indexing section above. Label-based slices also work, and they include both endpoints:

rng = pd.date_range('2017/1', '2017/3', freq = '12H')
ts = pd.Series(np.random.rand(len(rng)), index = rng)
print(ts['2017/1/5':'2017/1/10'])
>>>
2017-01-05 00:00:00    0.462085
2017-01-05 12:00:00    0.778637
2017-01-06 00:00:00    0.356306
2017-01-06 12:00:00    0.667964
2017-01-07 00:00:00    0.246857
2017-01-07 12:00:00    0.386956
2017-01-08 00:00:00    0.328203
2017-01-08 12:00:00    0.260853
2017-01-09 00:00:00    0.224920
2017-01-09 12:00:00    0.397457
2017-01-10 00:00:00    0.158729
2017-01-10 12:00:00    0.501266
Freq: 12H, dtype: float64


# Here we can pass in a month and get the slice for the whole month directly
print(ts['2017/2'].head())
>>>
2017-02-01 00:00:00    0.243932
2017-02-01 12:00:00    0.220830
2017-02-02 00:00:00    0.896107
2017-02-02 12:00:00    0.476584
2017-02-03 00:00:00    0.515817
Freq: 12H, dtype: float64

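The same selections can be written with .loc, which makes the label-based intent explicit. This is a sketch under a few assumptions: it uses np.arange instead of random values so the row counts are easy to verify, ISO-style date strings, and the lowercase '12h' spelling of the frequency.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2017-01-01', '2017-03-01', freq='12h')
ts = pd.Series(np.arange(len(rng)), index=rng)

# Label slices include both endpoints, and a date-only endpoint covers that whole day.
jan_5_to_10 = ts.loc['2017-01-05':'2017-01-10']
print(len(jan_5_to_10))   # 6 days x 2 rows per day
print(jan_5_to_10.index[-1])

# A year-month string selects every row in that month.
february = ts.loc['2017-02']
print(len(february))      # 28 days x 2 rows per day
```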
Time series with duplicate index values

dates = pd.DatetimeIndex(['1/1/2015', '1/2/2015', '1/3/2015', '1/4/2015', '1/1/2015', '1/2/2015'])
ts = pd.Series(np.random.rand(6), index = dates)
print(ts)
# is_unique reports whether the values, or the index, are free of duplicates
print(ts.is_unique, ts.index.is_unique)
>>>
2015-01-01    0.300286
2015-01-02    0.603865
2015-01-03    0.017949
2015-01-04    0.026621
2015-01-01    0.791441
2015-01-02    0.526622
dtype: float64
True False

The result shows that the index contains duplicates (ts.index.is_unique is False) while the values do not (ts.is_unique is True).

We can resolve duplicate index entries by grouping on the index and averaging the values that share a timestamp:

print(ts.groupby(level = 0).mean())
# groupby(level = 0) groups by the index; values with the same timestamp are averaged
>>>
2015-01-01    0.545863
2015-01-02    0.565244
2015-01-03    0.017949
2015-01-04    0.026621
dtype: float64
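Averaging is one option. As an alternative that is not part of the original article, the sketch below keeps only the first observation for each timestamp using Index.duplicated(), and shows the averaged result next to it for comparison. The data is made up for illustration.

```python
import pandas as pd

dates = pd.DatetimeIndex(['1/1/2015', '1/2/2015', '1/3/2015', '1/1/2015'])
ts_dup = pd.Series([1.0, 2.0, 3.0, 5.0], index=dates)

# duplicated(keep='first') marks every repeat after the first occurrence;
# negating the mask keeps one row per timestamp.
first_only = ts_dup[~ts_dup.index.duplicated(keep='first')]

# Averaging the values that share a timestamp, as in the text above.
averaged = ts_dup.groupby(level=0).mean()

print(first_only)
print(averaged)
```

Which approach fits depends on the data: dropping repeats discards information, while averaging changes the recorded values.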