Original post: cf020031308.github.io/…/2017…

Traffic forecast

Traffic time-series data has a graph structure: at time $t$ there is a graph $G_t = (V_t, E, W)$, where $V_t$ is the node set (whose values differ across time steps), $E$ is the edge set, and $W$ is the weight matrix.

Traffic prediction is the task of predicting the most likely traffic measurements $H$ time steps into the future, given the previous $M$ traffic observations.

Given $W$, estimate $v_{t+H}$ from $v = (v_{t-M+1}, \cdots, v_t)$.
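
As a concrete illustration of this formulation, here is a minimal sketch (my own code, not the paper's) of slicing a traffic series into pairs of an M-step history window and a target frame H steps ahead; the sizes below are placeholders.

```python
import numpy as np

def make_windows(series, M=12, H=3):
    """Slice a (T, n_nodes) traffic series into training pairs.

    Each sample uses the M past frames v_{t-M+1}, ..., v_t to predict
    the frame H steps ahead, v_{t+H}.  Illustrative sketch only.
    """
    X, y = [], []
    for t in range(M - 1, series.shape[0] - H):
        X.append(series[t - M + 1 : t + 1])  # history window, shape (M, n_nodes)
        y.append(series[t + H])              # target frame,   shape (n_nodes,)
    return np.stack(X), np.stack(y)

# Example: 288 five-minute frames (one day) for 228 stations
v = np.random.rand(288, 228).astype(np.float32)
X, y = make_windows(v, M=12, H=3)
print(X.shape, y.shape)  # (274, 12, 228) (274, 228)
```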

Existing methods

Before this paper, medium- and long-term traffic forecasting methods (horizons over 30 minutes) fell into the following categories:

  • Dynamic modeling, e.g. differential equations
    • Computationally expensive
    • Relies heavily on expert modeling; improper assumptions and simplifications hurt model performance
  • Data-driven
    • Classical statistics, e.g. ARIMA
      • Ignores spatio-temporal correlations
    • Machine learning

Due to the high nonlinearity and complexity of traffic flow, traditional methods struggle to meet the needs of medium- and long-term prediction, and they often ignore spatial correlations.

Spatio-temporal Graph Convolutional Networks (STGCN)

Spatial graph convolution and gated temporal convolution

By the way, what kind of tool is suitable for drawing this kind of graph?

  • Spatial graph convolution

    • Applies graph convolution to the graph frame by frame along the time axis
    • ChebNet and its first-order approximation (i.e. GCN) are each used for the graph convolution
    • Activation: $\mathrm{ReLU}(x) = \max(0, x)$
  • Gated temporal convolution (see the sketch after this list)

    • Applies a one-dimensional convolution along the time axis, node by node
    • The convolution kernel has size $K_t \times 2C_o$ per channel, with no padding, so the number of time frames shrinks after each layer
    • Activation: $\mathrm{GLU}([x, y]) = x \otimes \sigma(y)$
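
A minimal PyTorch sketch of the gated temporal convolution described above (illustrative code under my own naming, not the authors' implementation): a 2-D convolution with kernel $(K_t, 1)$ slides along the time axis for every node, produces $2C_o$ channels, and splits them into a linear part and a gate for the GLU.

```python
import torch
import torch.nn as nn

class GatedTemporalConv(nn.Module):
    """1-D temporal convolution with a GLU activation, applied node by node."""

    def __init__(self, c_in, c_out, k_t=3):
        super().__init__()
        # Kernel (k_t, 1): convolve over time, leave the node axis untouched.
        self.conv = nn.Conv2d(c_in, 2 * c_out, kernel_size=(k_t, 1))

    def forward(self, x):
        # x: (batch, c_in, T, n_nodes); no padding, so T shrinks by k_t - 1
        p, q = self.conv(x).chunk(2, dim=1)
        return p * torch.sigmoid(q)          # GLU: x ⊗ σ(y)

# Example: 12 input frames become 10 output frames
out = GatedTemporalConv(1, 64, k_t=3)(torch.randn(8, 1, 12, 228))
print(out.shape)  # torch.Size([8, 64, 10, 228])
```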

The overall architecture

  • An ST-Conv block is composed of two gated temporal convolutions with a spatial graph convolution in between (a sketch follows this list)
  • Each ST-Conv block is followed by layer normalization: $y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{D}[x] + \epsilon}} \cdot \gamma + \beta$
  • The output layer is a fully connected layer that maps the hidden state at the target time to the output
  • There is a direct connection inside the temporal gated convolution. The original paper mentions that a residual connection is used, but the shape changes after the convolution, so the input cannot be added directly
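
Putting the pieces together, here is a hedged sketch of one ST-Conv block as described above: two gated temporal convolutions sandwiching a spatial graph convolution, followed by layer normalization. The class and argument names are mine, and the spatial step uses the first-order (GCN) form with a pre-normalized adjacency matrix `a_hat`; this is an illustration, not the reference implementation.

```python
import torch
import torch.nn as nn

class STConvBlock(nn.Module):
    """Gated temporal conv -> spatial graph conv (GCN form) -> gated temporal
    conv, followed by layer normalization.  Illustrative sketch only."""

    def __init__(self, c_in, c_mid, c_out, n_nodes, k_t=3):
        super().__init__()
        self.t1 = nn.Conv2d(c_in, 2 * c_mid, (k_t, 1))    # first gated temporal conv
        self.theta = nn.Linear(c_mid, c_mid, bias=False)  # GCN weights
        self.t2 = nn.Conv2d(c_mid, 2 * c_out, (k_t, 1))   # second gated temporal conv
        self.norm = nn.LayerNorm([n_nodes, c_out])

    @staticmethod
    def _glu(z):
        p, q = z.chunk(2, dim=1)
        return p * torch.sigmoid(q)

    def forward(self, x, a_hat):
        # x: (batch, c_in, T, n_nodes); a_hat: normalized adjacency, (n_nodes, n_nodes)
        x = self._glu(self.t1(x))                          # time shrinks by k_t - 1
        x = torch.einsum("nm,bctm->bctn", a_hat, x)        # neighbour aggregation, frame by frame
        x = torch.relu(self.theta(x.permute(0, 2, 3, 1)))  # A_hat X Theta, then ReLU
        x = self._glu(self.t2(x.permute(0, 3, 1, 2)))      # time shrinks again
        return self.norm(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)

# Example: 12 input frames shrink to 12 - 2*(k_t - 1) = 8 output frames
block = STConvBlock(c_in=1, c_mid=16, c_out=64, n_nodes=228)
out = block(torch.randn(8, 1, 12, 228), torch.eye(228))
print(out.shape)  # torch.Size([8, 64, 8, 228])
```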

The experimental setup

Data

  • Two real data sets were used:
    1. BJER4, East Fourth Ring Road, Beijing
    2. California highway system PeMSD7
      • Two subsets of large and small size, named PeMSD7(L) and PeMSD7(M), were randomly selected.
  • Use only weekday data
  • Use the distance between observation stations to construct the weight matrix: the closer two stations are, the larger the weight of the edge between them (if the weight falls below a threshold, it is set to 0 and the stations are considered unconnected); a sketch follows this list
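
A small sketch of how such a distance-based weight matrix could be built. The Gaussian-kernel form and the threshold values here are my assumptions for illustration, not necessarily the paper's exact formula.

```python
import numpy as np

def weight_matrix(dist, sigma2=1e5, eps=0.5):
    """Build edge weights from pairwise station distances (dist: (n, n)).

    Closer stations get larger weights via a Gaussian kernel; weights below
    `eps` are zeroed, i.e. those station pairs are treated as unconnected.
    sigma2 and eps are placeholder values.
    """
    w = np.exp(-dist ** 2 / sigma2)
    w[w < eps] = 0.0              # too far apart -> not connected
    np.fill_diagonal(w, 0.0)      # no self-loops
    return w
```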

Error metrics

  • Mean Absolute Errors, MAE
  • Mean Absolute Percentage Errors, MAPE
  • Root Mean Squared Errors, RMSE
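
For reference, a straightforward NumPy computation of these three metrics (my own sketch):

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Return MAE, MAPE (%) and RMSE for arrays of the same shape."""
    err = y_pred - y_true
    mae = np.mean(np.abs(err))
    mape = np.mean(np.abs(err / y_true)) * 100.0
    rmse = np.sqrt(np.mean(err ** 2))
    return mae, mape, rmse
```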

Baselines

  • HA: historical average
  • LSVR: linear support vector regression
  • ARIMA: time-series model
  • FNN: feed-forward neural network
  • FC-LSTM: fully connected LSTM
  • GCGRU: the name suggests a GRU is used along the time axis; otherwise similar to STGCN

Model settings

  • STGCN(Cheb): the STGCN variant whose graph convolution uses an order-3 ChebNet (see the Chebyshev sketch after this list)
  • STGCN(1st): the STGCN variant whose graph convolution uses the first-order approximation, i.e. GCN
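
To make the difference between the two variants concrete, here is a sketch of an order-K Chebyshev graph convolution, the spatial operator of STGCN(Cheb); its first-order approximation (with the usual renormalization trick) gives the GCN form used by STGCN(1st). Names and shapes are my own.

```python
import torch

def cheb_conv(x, l_tilde, theta):
    """Order-K Chebyshev graph convolution (illustrative sketch).

    x:       (n_nodes, c_in) node features for one time frame
    l_tilde: scaled Laplacian 2L/lambda_max - I, shape (n_nodes, n_nodes)
    theta:   (K, c_in, c_out) Chebyshev coefficients

    Recurrence: T_0 = I, T_1 = L~, T_k = 2 L~ T_{k-1} - T_{k-2}.
    """
    K = theta.shape[0]
    t_prev = torch.eye(x.shape[0], dtype=x.dtype)  # T_0
    t_cur = l_tilde                                # T_1
    out = (t_prev @ x) @ theta[0]
    if K > 1:
        out = out + (t_cur @ x) @ theta[1]
    for k in range(2, K):
        t_prev, t_cur = t_cur, 2 * l_tilde @ t_cur - t_prev
        out = out + (t_cur @ x) @ theta[k]
    return out

# Example: 228 nodes, 16 input channels, 64 output channels, K = 3
out = cheb_conv(torch.randn(228, 16), torch.eye(228), torch.randn(3, 16, 64))
print(out.shape)  # torch.Size([228, 64])
```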

The experimental results

Error comparison

BJER4 shows similar results, so I won't reproduce them here.

  • Some GCGRU results are marked with *, because memory usage during training was too high (an RNN issue) and the batch size had to be halved, which differs from the other experimental settings.
  • STGCN makes full use of spatial information
    • This allows STGCN to perform better than traditional methods that typically do not use spatial information, such as the worst-performing ARIMA
    • On the other hand, the sensor network of PeMSD7 is more complex and carries more spatial information than BJER4's, so STGCN's advantage on PeMSD7 is more pronounced than on BJER4

Peak forecast

Several methods' predictions around the morning and evening rush hours were compared.

Because GCGRU is based on an RNN (GRU), historical data has a stronger influence on its output, so its predictions show a larger time lag.

Speed comparison

  • RNNs cannot be parallelized and are time-consuming, which is the main reason the authors use CNNs along the time axis
  • GCN is faster than ChebNet on large data sets while delivering similar results