The original:tecdat.cn/?p=3726

This time, we will use the K-Shape time series clustering method to check the time series of stock returns of companies with which we have business relations.

 

Business-to-business transactions and stock prices

In this study, we will study the similarity of time series of price change rates of firms with trading relationships. Because of the larger sales volume of a particular customer compared to that of the supplier company, the response to the supplier company’s stock price is perceived to be greater when the customer company’s stock price changes.

 k-Shape

K-shape [Paparrizos and Gravano, 2015] is a time series clustering method that focuses on the Shape of time series. Before we get into K-Shape, let’s talk about the invariance of time series and the distance measure between common time series.

 

Time series distance measure

Euclidean distance (ED) and dynamic time warping (DTW) are commonly used as distance measurements for comparison between time series.

Two time series x = (x1… , xm) and y = (y1… , YM) ED is as follows.

DTW is an extension of ED, allowing local and nonlinear alignment.

  

K-shape proposes a distance called shape-based distance (SBD).

K – Shape algorithm

K-shape clustering focuses on invariance of scaling and shift. K-shape has two main features: Shape-based distance (SBD) and time series Shape extraction.

SBD

Cross-correlation is a metric often used in the field of signal processing. FFT (+α) was used instead of DFT to improve calculation efficiency.

  

Normalized cross-correlation (coefficient normalization) NCCc is the geometric mean of the cross-correlation column divided by the autocorrelation of a single series. Detect the maximum position of NCCc, ω.

  

SBD takes the value between 0 and 2, and the closer the two time series get to 0, the more similar they are.

Extract the shape

The centroid vector of time series clustering is found by SBD.

  

The whole algorithm of K-shape is as follows.

K-shape allocates clusters for each time series through an iterative process like K-means.

  1. Each time series is compared with the centroid vector of each cluster and assigned to the cluster of the nearest centroid vector
  2. Update the cluster centroid vector

Repeat steps 1 and 2 above until there are no changes in the cluster members or the maximum number of iterations is reached.

 R k – Shape language

 

> start <- "2014-01-01" > df_7974 %>% + filter(date > as.Date(start)) # A tibble: 1,222 x 10 date open high low close volume close_adj change rate_of_change code 1 2014-01-06 14000 14330 13920 14320 1013000 14320 310 0.0221 7974 2 2014-01-07 14200 14380 14060 14310 887900 14310-10-0.000698 7974 3 2014-01-08 14380 16050 14380 15850 3030500 15850 1540 0.108 7974 4 2014-01-09 15520 15530 15140 15420 1817400 15420-430-0.0271 7974 5 2014-01-10 15310 16150 15230 16080 2124100 16080 660 0.0428 7974 6 2014-01-14 15410 15755 15370 15500 1462200 15500-580 -0.0361 7974 7 2014-01-15 15750 15880 15265 15360 1186800 15360-140 -0.00903 7974 8 2014-01-16 15165 15410 14940 15060 1606600 15060-300-0.0195 7974 9 2014-01-17 15100 15270 14575 14645 1612600 14645-415-0.0276 7974 10 2014-01-20 11945 13800 11935 13745 10731500 13745-9Copy the code

The missing measure is supplemented with the value of the previous business day. (K-Shape allows some deviation, but just in case)

The stock price of each stock and the rate of change of stock price.

Taking zscore as “preproc”, “SBD” as distance, and Centroid = “shape”, k-shape clustering results are as follows.

> df_res %>% + Arrange (cluster) cluster centroid_dist Code name 1 1 0.1897561 1928 Sewold weather hijack 21 0.2196533 6479 miku ネ Corattract miku 3 1 0.1481051 8411 construct ずほ 4 2 0.3468301 6658 SitAL - Electronics 5 2 0.2158674 6804 ホ sital デ 6 2 0.2372485 7974 NintendoCopy the code

Nintendo, Hosiden, and Siray Electronics Industries are assigned to the same cluster. Hosiden accounted for 50.5 percent of nintendo’s sales in 2016, suggesting that business relationships between companies can also affect stock price movements. MinebeaMitsumi, on the other hand, became another cluster, but in 2017 Mitsumi merged with Minebea in 2017, which did not cope with the share price spike when Pokemon Go was launched in July 2016.

 

If you have any questions, please comment below.


Most welcome insight

1.R language K-Shape algorithm stock price time series clustering

2. Comparison of different types of clustering methods in R language

3. K-medoids clustering modeling and GAM regression are performed for time series data of electricity load using R language

4. Hierarchical clustering of IRIS data set of R. language

5.Python Monte Carlo K-means clustering

6. Use R to conduct website comment text mining clustering

7. Python for NLP: Multi-label text LSTM neural network using Keras

8.R language for MNIST data set analysis and exploration of handwritten digital classification data

9.R language deep learning image classification based on Keras small data sets