Author: Saurabh Jaju | Compiled by: Flin | Source: Analytics Vidhya

Introduction

Time series prediction and modeling play an important role in data analysis. Time series analysis is a branch of statistics, which is widely used in econometrics and operations research. This skill test is designed to test your knowledge of time series concepts.

A total of 1,094 people registered for this skill test, which was designed to assess your knowledge of time series. If you missed the live test, read on for the questions and solutions and see how many you could have answered correctly.

This is a leaderboard of all the contestants

  • Datahack.analyticsvidhya.com/contest/avd…

Overall scores

Below is the score distribution, which will help you evaluate your performance.

You can check out the scores at the link below.

  • Datahack.analyticsvidhya.com/contest/avd…

More than 300 people took the skills test, with the highest score being 38. Here are some statistics about the distribution.

  • Average score: 17.13
  • Median: 19
  • Mode: 19

1) Which of the following is an example of a time series problem?

  1. Estimated number of hotel room reservations for the next 6 months.
  2. Estimate the total sales of the insurance company over the next three years.
  3. Estimate the number of calls for the next week.

A) only 3 B) 1 and 2 C) 2 and 3 D) 1 and 3 E) 1,2 and 3

Solution :(E)

All of the above choices have to do with time series.

2) Which of the following is not an example of a time series model?

A) The naive method

B) Exponential smoothing

C) Moving average

D) None of the above

Solution :(D)

Naive method: an estimation technique in which the actual value of the last period is used as the forecast for the next period, without adjusting it or attempting to establish causal factors. It is applicable to fairly stable series and is used mainly as a benchmark for forecasts generated by more sophisticated techniques.

In exponential smoothing, the relative importance of old data gradually decreases while that of new data gradually increases.

In time series analysis, the moving average (MA) model is a common approach for modeling univariate time series. The moving average model specifies that the output variable depends linearly on the current and various past values of a stochastic (imperfectly predictable) term.

3) Which of the following cannot be a component of a time series plot?

A) Seasonality

B) the trend

C) Periodicity

D) noise

E) None of the above

Solution :(E)

A seasonal pattern exists when a series is influenced by seasonal factors (for example, the quarter of the year, the month, or the day of the week). Seasonality is always of a fixed and known period, which is why seasonal time series are sometimes called periodic time series.

A cyclic pattern exists when the data rise and fall over irregular, non-fixed periods.

A trend is defined as the “long-term” movement in a time series, separate from calendar-related and irregular effects. It is the result of influences such as population growth, price inflation, and general economic change. The chart below depicts a series with a clear upward trend over time.

Noise: in discrete time, white noise is a discrete signal whose samples are a sequence of uncorrelated random variables with zero mean and finite variance.

Therefore, all of the above can be components of a time series plot.

4) In time series modeling, which of the following is easier to estimate?

A) Seasonality

B) Periodicity

C) There is no difference between seasonal and cyclical

Solution :(A)

As we saw in the previous solution, seasonality is easier to estimate because it has a fixed and known period, whereas cyclic behavior is irregular.

5) The time series chart below contains both cyclical and seasonal components

A) true B) false

Solutions: (B)

The chart repeats similar patterns at regular intervals, so it is purely seasonal (there is no cyclic component).

6) Adjacent observations in time series data (excluding white noise) are independent and identically distributed (IID)

A) true B) false

Solution :(B)

Adjacent observations tend to be strongly correlated, increasingly so as the time interval between them becomes shorter. Time series prediction is based on this dependence on previous observations, so the data are not independent in the way observations for classification or regression are assumed to be.

7) Smoothing parameters close to 1 will give greater weight or influence to the latest observations in the prediction

A) true

B) false

Solution :(A)

It makes sense to give more weight to recent observations than to observations from the distant past. This is the idea behind simple exponential smoothing: forecasts are computed using weighted averages whose weights decrease exponentially as the observations get older, with the smallest weights attached to the earliest observations.

8) The sum of the weights of exponential smoothing is _____

A) less than 1 B) approximately 1 C) greater than 1 D) none of the above

Solutions: (B)

Table 7.1 shows the weights placed on observations for four different values of α when forecasts are computed using simple exponential smoothing. Note that for any reasonable sample size, even for a small α, the weights sum to approximately 1.

Observation   Alpha = 0.2     Alpha = 0.4     Alpha = 0.6     Alpha = 0.8
y_T           0.2             0.4             0.6             0.8
y_T-1         0.16            0.24            0.24            0.16
y_T-2         0.128           0.144           0.096           0.032
y_T-3         0.1024          0.0864          0.0384          0.0064
y_T-4         (0.2)(0.8)^4    (0.4)(0.6)^4    (0.6)(0.4)^4    (0.8)(0.2)^4
y_T-5         (0.2)(0.8)^5    (0.4)(0.6)^5    (0.6)(0.4)^5    (0.8)(0.2)^5
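As a sanity check on the table above: the weight that simple exponential smoothing places on the observation j steps in the past is α(1 − α)^j, a geometric series whose total converges to 1. A minimal Python sketch (the function name is ours, not from the article):

```python
def smoothing_weights(alpha, n):
    """Weights placed on y_T, y_T-1, ..., y_T-n+1 by simple exponential smoothing."""
    return [alpha * (1 - alpha) ** j for j in range(n)]

# Reproduce the columns of the table and confirm the weights sum to about 1.
for alpha in (0.2, 0.4, 0.6, 0.8):
    w = smoothing_weights(alpha, 50)
    print(alpha, [round(x, 4) for x in w[:4]], "total:", round(sum(w), 4))
```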

9) The forecast for the previous period was 70, while actual demand turned out to be 60. Using simple exponential smoothing with α = 0.4, what is the forecast for the next period?

A) 63.8 B) 65 C) 62 D) 66

Solution :(D)

Forecast for the previous period = 70, actual demand = 60, α = 0.4

Substitute in, get:

0.4 * 60 + 0.6 * 70 = 24 + 42 = 66
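The calculation above can be sketched in a few lines of Python (an illustrative helper of ours, not part of the original article):

```python
def ses_forecast(alpha, actual, prev_forecast):
    """Next-period forecast under simple exponential smoothing:
    alpha * last actual + (1 - alpha) * last forecast."""
    return alpha * actual + (1 - alpha) * prev_forecast

print(ses_forecast(0.4, actual=60, prev_forecast=70))  # 66.0
```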

10) What does autocovariance measure?

A) The linear correlation between multiple points on different sequences observed at different times

B) The quadratic correlation between two points on the same sequence observed at different times

C) Linear relationship between two points of different sequences observed at the same time

D) The linear relationship between two points on the same sequence observed at different times

Solution :(D)

Option D is the definition of autocovariance.

11) Which of the following is not a necessary condition for weakly stationary time series?

A) The average value is constant and does not depend on time

B) The autocovariance function depends on s and t only through their difference |s - t| (where t and s are time points)

C) The time series considered is a process of finite variance

D) Time series are Gaussian

Solution :(D)

A Gaussian assumption is not required for weak stationarity; a Gaussian time series would in fact imply strict stationarity.

12) Which of the following is not a technique for smoothing time series?

A) Nearest neighbor regression

B) Locally weighted scatterplot smoothing

C) Tree-based models such as CART

D) Smoothing splines

Solution :(C)

Time series smoothing and filtering can be expressed as a local regression model. Polynomials and regression splines also provide important smoothing techniques. CART-based models do not produce an equation that can be superimposed on a time series and therefore cannot be used for smoothing. All the other techniques are well-documented smoothing methods.

13) If October 2016 demand is 100, November 2016 demand is 200, December 2016 demand is 300 and January 2017 demand is 400. What is the 3-month moving average for February 2017?

A)300

B) 350

C) 400

D) More information is needed

Solution :(A)

X’ = (xt-3 + xt-2 + xt-1) / 3

(200 + 300 + 400) / 3 = 900 / 3 = 300
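The three-month moving average above can be sketched as follows (the function name is ours, for illustration):

```python
def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    return sum(series[-window:]) / window

demand = [100, 200, 300, 400]  # Oct 2016 through Jan 2017
print(moving_average_forecast(demand, 3))  # 300.0 -> forecast for Feb 2017
```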

14) Looking at the ACF diagram below, do you recommend using AR or MA for ARIMA modeling techniques?

A) AR B) MA C) Both D) Can't say

Solution :(A)

Consider using an MA model if the autocorrelation function (ACF) of the differenced series shows a sharp cutoff while the partial autocorrelations tail off; the lag at which the ACF cuts off gives the MA order.

Here, however, the ACF shows no obvious cutoff, so an AR model should be used.

15) Suppose you’re a data scientist at Analytics Vidhya. You notice that traffic to articles increases from January to March and declines between November and December.

Do these statements represent seasonality in the data?

A) true B) false C) impossible to judge

Solution :(A)

Yes, this is clearly seasonality, since the traffic changes at particular times of the year.

Keep in mind that “seasonality” refers to variation that occurs within specific, regular time intervals.

16) Which of the following graphs can be used to detect seasonality in time series data?

1. Multiple box plots 2. Autocorrelation plot

A) only 1 B) only 2 C) 1 and 2 D) None of these

Solution :(C)

Seasonality is the existence of variations within specific periodic intervals.

Changes in distribution can be observed in multiple box plots. Therefore, seasonality can be easily detected. Autocorrelation plots should show peaks at a lag equal to the period.

17) Stationarity is an ideal property of time series processes.

A) True B) False

Solution :(A)

When the following conditions are met, the time series is stationary.

  1. The average is constant and does not depend on time
  2. The autocovariance function depends on s and t only through their difference |s - t| (where t and s are time points)
  3. The time series considered is a finite-variance process

These conditions are necessary prerequisites for the mathematical representation of time series to be used for analysis and prediction. So, stationarity is an ideal property.

18) Suppose you get a Time series dataset with only 4 columns (ID, Time, X, Target)

Given window size 2, what is the moving average of feature X?

Note: the X’ column represents the moving average.

A)

B)

C)

D) None of the above

Solutions: (B)

X’ = (Xt-2 + Xt-1) / 2

According to the formula above: (100 + 200) / 2 = 150; (200 + 300) / 2 = 250; and so on.

19) Imagine that you are working with a time series data set. Your manager has asked you to build a highly accurate model. You start building two types of models.

Model 1: Decision tree model

Model 2: Time series regression model

At the end of your evaluation of the two models, you find that Model 2 performs better than Model 1. Which of the following could be the reason?

A) Model 1 does not capture linear relationships as well as Model 2 B) Model 1 is always better than Model 2 C) You cannot compare a decision tree with a time series regression model D) None of these

Solutions: (A)

A time series model is similar to a regression model. Therefore, it is good at finding simple linear relationships. Tree-based models, while effective, are not as good at discovering and exploiting linear relationships.

20) Which type of analysis is most effective for temperature prediction based on the following type of data?

A) time series analysis B) classification C) clustering D) none of the above

Solution :(A)

This problem captures data over several consecutive days, so the most effective type of analysis is time series analysis.

21) What is the first difference of temperature/precipitation variables?

A) 15, 12.2, -43.2, -23.2, 14.3, -7 B) 38.17, -46.11, -4.98, 14.29, -22.61 C) 35, 38.17, -46.11, -4.98, 14.29, -22.61 D) 36.21, -43.23, 5.43, 17.44, 22.61

Solution :(B)

73.17 - 35 = 38.17; 27.05 - 73.17 ≈ -46.11; and so on, down to 13.75 - 36.36 = -22.61.

22) Consider the following data sets:

{23.32, 32.33, 32.88, 28.98, 33.16, 26.33, 29.88, 32.69, 18.98, 21.23, 26.66, 29.89}

What is the autocorrelation of the time series at lag 1?

A) 0.26 B) 0.52 C) 0.13 D) 0.07

Solution :(C)

ρ1 = Σ(t=2..T) (x(t-1) - x̄)(x(t) - x̄) / Σ(t=1..T) (x(t) - x̄)²

where x̄ is the mean of the series, 28.0275.

= [(23.32 - x̄)(32.33 - x̄) + (32.33 - x̄)(32.88 - x̄) + …] / Σ(t=1..T) (x(t) - x̄)²

= 0.130394786
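The lag-1 autocorrelation can be verified numerically; below is a minimal sketch (the helper name `autocorr` is ours):

```python
def autocorr(x, lag=1):
    """Sample autocorrelation at the given lag: cross-products of deviations
    from the mean, divided by the total sum of squared deviations."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t - lag] - mean) * (x[t] - mean) for t in range(lag, n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

data = [23.32, 32.33, 32.88, 28.98, 33.16, 26.33,
        29.88, 32.69, 18.98, 21.23, 26.66, 29.89]
print(round(autocorr(data, 1), 2))  # 0.13
```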

23) Any stationary time series can be approximated as a random superposition of sines and cosines oscillating at various frequencies.

A) true B) false

Solution :(A)

A weakly stationary time series xt is a finite-variance process such that

  • (i) the mean function µt is constant and does not depend on time t; and (ii) the autocovariance function γ(s, t) depends on s and t only through their difference |s - t|.

A random superposition of sines and cosines oscillating at various frequencies is white noise. White noise is weakly stationary; if the white noise variates are also normally (Gaussian) distributed, the series is strictly stationary as well.

24) The autocovariance function of weakly stationary time series does not depend on _______?

A) the separation of xs and xt B) h = |s - t| C) the particular locations of the time points s and t

Solution :(C)

This follows from the definition of weak stationarity described in the previous question.

25) If _____, then the two time series are jointly stationary.

A) they are both stationary B) the cross-covariance function is a function of the lag h only C) only A D) both A and B

Solution :(D)

Joint stationarity is defined in terms of the above two conditions.

26) In the autoregressive model _______

A) the current value of the dependent variable is affected by the current value of the independent variable B) the current value of the dependent variable is affected by the current and past values of the independent variable C) the current value of the dependent variable is affected by past values of the dependent variable and the independent variable D) none of the above

Solution :(C)

Autoregressive models are based on the idea that the current value xt of a series can be explained as a function of its p past values xt-1, xt-2, …, xt-p, where p determines the number of past steps needed to predict the current value. For example, xt = xt-1 - 0.90xt-2 + wt,

where xt-1 and xt-2 are past values of the dependent variable and wt is white noise.

The example can be extended to include multiple sequences similar to multiple linear regression.
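The example AR(2) above can be simulated directly. A minimal sketch, with the seed and burn-in length being arbitrary choices of ours:

```python
import random

def simulate_ar2(n, phi1=1.0, phi2=-0.90, burn=100, seed=0):
    """Simulate x_t = phi1 * x_t-1 + phi2 * x_t-2 + w_t with Gaussian white
    noise w_t, discarding an initial burn-in so start-up effects fade."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n + burn):
        x.append(phi1 * x[-1] + phi2 * x[-2] + rng.gauss(0, 1))
    return x[-n:]

series = simulate_ar2(200)
print(len(series))  # 200
```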

27) For an MA (moving average) model, σ² = 1 with θ = 5 produces the same autocovariance function as σ² = 25 with θ = 1/5.

A) true B) false

Solution :(A)

True: the autocovariance function does not uniquely identify an MA model's parameters.

Note that for an MA(1) model, ρ(h) is the same for θ and 1/θ (invertibility is the convention used to choose between the two).
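The claim can be checked from the MA(1) autocovariances γ(0) = σ²(1 + θ²) and γ(1) = σ²θ (a quick verification of ours, not from the article):

```python
def ma1_autocov(theta, sigma2):
    """Nonzero autocovariances of x_t = w_t + theta * w_t-1."""
    gamma0 = sigma2 * (1 + theta ** 2)  # variance
    gamma1 = sigma2 * theta             # lag-1 autocovariance
    return gamma0, gamma1

print(ma1_autocov(5, 1))       # theta = 5,   sigma^2 = 1
print(ma1_autocov(1 / 5, 25))  # theta = 1/5, sigma^2 = 25
```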

28) By looking at the ACF and PACF graphs below, how many AR and MA terms can be included in the time series?

A) AR(1) MA(0) B) AR(0) MA(1) C) AR(2) MA(1) D) AR(1) MA(2) E) Can't say

Solution :(B)

The strong negative correlation at lag 1 indicates an MA term with a single significant lag. Read this article for a better understanding:

  • www.analyticsvidhya.com/blog/2015/1…

29) Which of the following is true of white noise?

A) mean = 0 B) autocovariance = 0 at all lags C) autocovariance = 0 except at lag zero D) quadratic variance

Solution :(C)

A white noise process must have constant mean, constant variance, and zero autocovariance at every lag except lag zero (where the autocovariance equals the variance).

30) Consider the following MA(3) process: yt = μ + εt + θ1·εt-1 + θ2·εt-2 + θ3·εt-3, where εt is a zero-mean white noise process with variance σ². Which of the following is true?

A) ACF = 0 at lag 3 B) ACF = 0 at lag 5 C) ACF = 1 at lag 1 D) ACF = 0 at lag 2 E) ACF = 0 at lags 3 and 5

Solution :(B)

Recall that an MA(q) process has a memory of only q periods: all autocorrelation coefficients beyond lag q are zero. This can be seen from the MA equation, in which only the past q disturbance terms appear.

Therefore, if we iterate the equation forward more than q periods, the current disturbance no longer affects y. Since the autocorrelation at lag zero is the correlation of y at time t with itself (that is, of y_t with itself), it must, by definition, be 1.

31) Consider the AR(1) model below, whose disturbance term has zero mean and unit variance: yt = 0.4 + 0.2yt-1 + ut. The (unconditional) variance of y is given by ____.

A) 1.5 B) 1.04 C) 0.5 D) 2

Solution :(B)

The unconditional variance is the variance of the disturbance divided by 1 minus the squared autoregressive coefficient.

In this case: 1 / (1 - 0.2^2) = 1 / 0.96 ≈ 1.04

32) PACF (partial autocorrelation function) is necessary to distinguish ______.

A) AR and MA models B) AR and ARMA models C) MA and ARMA models D) different models within the ARMA family

Solution :(B)

The ACF tails off for both a pure AR(p) process and an ARMA(p, q) process, so it cannot separate the two. The PACF cuts off after lag p for a pure AR(p) process but tails off for an ARMA process, which makes the distinction possible.

33) Which trend can quadratic difference of time series help eliminate?

A) quadratic B) linear C) both A and B D) none of the above

Solution :(A)

The first difference is expressed as ∇xt = xt - xt-1. (1)

As we can see, the first difference eliminates a linear trend. The second difference (the difference of (1)) eliminates a quadratic trend, and so on.
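The effect of differencing can be demonstrated on a pure quadratic trend (an illustrative sketch; the helper name is ours):

```python
def diff(series):
    """First difference: x_t - x_t-1."""
    return [b - a for a, b in zip(series, series[1:])]

quadratic = [t ** 2 for t in range(8)]  # 0, 1, 4, 9, ...
once = diff(quadratic)   # linear trend remains: 1, 3, 5, ...
twice = diff(once)       # quadratic trend eliminated: constant
print(twice)  # [2, 2, 2, 2, 2, 2]
```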

34) Which of the following cross-validation techniques is more suitable for time series data?

C) Stratified Shuffle Split D) Forward chain Split

Solution :(D)

Time series are ordered data, so the validation data must come after the training data in time. Forward chaining ensures this. It works as follows:

  • Fold 1: training [1], testing [2]
  • Fold 2: training [1 2], testing [3]
  • Fold 3: training [1 2 3], testing [4]
  • Fold 4: training [1 2 3 4], testing [5]
  • Fold 5: training [1 2 3 4 5], testing [6]
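The folds above can be generated in a few lines; scikit-learn's TimeSeriesSplit implements the same idea. A minimal sketch (the helper name is ours):

```python
def forward_chain_folds(n_samples, min_train=1):
    """Yield (train, test) pairs whose training window grows forward in
    time, so the validation point always comes after the training data."""
    for t in range(min_train, n_samples):
        yield list(range(1, t + 1)), t + 1  # 1-indexed, matching the folds above

for train, test in forward_chain_folds(6):
    print("training", train, "testing", test)
```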

35) BIC penalizes complex models better than AIC.

A) true B) false

Solution :(A)

AIC = -2 * LN (likelihood) + 2 * K,

BIC = -2 * LN (likelihood) + LN (N) * K,

where:

k = number of free parameters (degrees of freedom) of the model

N = number of observations

BIC is more tolerant of free parameters than AIC when N is relatively low (about 7 and below), but less tolerant when N is higher (because the natural logarithm of N then exceeds 2).
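The crossover between the two penalties can be checked directly from the formulas above (a sketch of ours, with the log-likelihood set to 0 so that only the penalty terms differ):

```python
import math

def aic(log_likelihood, k):
    # AIC = -2 * ln(likelihood) + 2 * k
    return -2 * log_likelihood + 2 * k

def bic(log_likelihood, k, n):
    # BIC = -2 * ln(likelihood) + ln(n) * k
    return -2 * log_likelihood + math.log(n) * k

# The penalty terms cross where ln(n) = 2, i.e. n = e^2 ~ 7.39:
print(bic(0, 3, 7) < aic(0, 3))    # True: BIC penalizes less for small n
print(bic(0, 3, 100) > aic(0, 3))  # True: BIC penalizes more for large n
```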

36) The figure below shows the estimated and partial autocorrelation of the time series with n = 60 observations. Based on these pictures, we should ____.

A) transform the data by taking logs B) difference the series to obtain stationary data C) fit an MA(1) model to the time series

Solution :(B)

The autocorrelations show a definite trend while the partial autocorrelations fluctuate; taking logarithms would not help here. The only option is to difference the series to obtain stationary data.

Questions 37 and 38 are based on the following model output.

37) Using the exponential smoothing estimates given above, predict the temperature for the next 3 years (1998-2000).

These results summarize a simple exponential smoothing fit to the time series.

A) 0.2, 0.32, 0.6 B) 0.33, 0.33, 0.33 C) 0.27, 0.27, 0.27 D) 0.4, 0.3, 0.37

Solution :(B)

Exponential smoothing predicts the same value for all three years, so we only need the next year's value. The smoothing recursion is

Smooth(t) = α·y(t) + (1 - α)·Smooth(t-1)

Thus the next smoothed value (the forecast of the next observation) is

Smooth(n) = α·y(n) + (1 - α)·Smooth(n-1)

= 0.3968 × 0.43 + (1 - 0.3968) × 0.2638

≈ 0.3297

38) Find the 95% prediction interval for the 1999 temperature forecast.

These results summarize the fitting of time series by simple exponential smoothing.

A) 0.3297 ± 2 × 0.1125 B) 0.3297 ± 2 × 0.121 C) 0.3297 ± 2 × 0.129 D) 0.3297 ± 2 × 0.22

Solution :(B)

The standard deviation of the prediction error is 0.1125 one period ahead.

Two periods ahead it is 0.1125 × sqrt(1 + α²) = 0.1125 × sqrt(1 + 0.3968²) ≈ 0.121.
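The arithmetic can be reproduced as follows (values taken from the solution above):

```python
import math

alpha = 0.3968   # estimated smoothing parameter
s1 = 0.1125      # one-step-ahead standard error of prediction
point = 0.3297   # point forecast for 1999

# Two steps ahead, the forecast-error standard deviation grows by sqrt(1 + alpha^2):
s2 = s1 * math.sqrt(1 + alpha ** 2)
print(round(s2, 3))                      # 0.121
lower, upper = point - 2 * s2, point + 2 * s2
print(round(lower, 3), round(upper, 3))  # approximate 95% interval
```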

39) Which of the following statements is true?

  1. If the autoregressive parameter (p) in ARIMA model is 1, it means that there is no autocorrelation in the sequence.
  2. If the moving average component (q) in ARIMA model is 1, it indicates that there is a lag 1 autocorrelation in the sequence
  3. If the integral component (d) in the ARIMA model is 0, it means that the series is not stationary.

A) only 1 B) both 1 and 2 C) only 2 D) All statements

Solution :(C)

Autoregressive component: AR stands for autoregressive, and the autoregressive parameter is denoted p. When p = 0, there is no autocorrelation in the series. When p = 1, the series is autocorrelated up to one lag.

Integration: in ARIMA time series analysis, integration is denoted by d and is the inverse of differencing.

  • When d is equal to 0, that means the series is stationary, and we don’t have to take the difference.
  • When d = 1, that means that the sequence is not stationary, and to make it stationary, we need to find the first difference.
  • When d = 2, it means that the sequence requires quadratic difference.
  • In general, more than two differences are unreliable.

Moving average component: MA stands for moving average, denoted by q. In ARIMA, q = 1 means the model includes one lagged error term, i.e. there is lag-1 autocorrelation in the errors.

40) In a time series forecasting problem, the seasonal indices for the first, second, and third quarters are 0.80, 0.90, and 0.95 respectively. What can you say about the seasonal index of the fourth quarter?

A) less than 1 B) greater than 1 C) equal to 1 D) seasonality does not exist E) insufficient data

Solution :(B)

Since there are four seasons, the seasonal indices must sum to 4. The first three quarters sum to 0.80 + 0.90 + 0.95 = 2.65, so the fourth-quarter index is 4 - 2.65 = 1.35, which is greater than 1.

The original link: www.analyticsvidhya.com/blog/2017/0…
