This post is part of day 24 of the November Gwen Challenge; see the event details here: 2021 Last Gwen Challenge

  1. This is the PyTorch version of a reading note for Hands-on Deep Learning. More articles in this series can be found here: Juejin.

  2. Github address: DeepLearningNotes/d2l(github.com)

Still being updated…


The term used here is a translation of the English word "sequence". So what is a sequence? It is a run of consecutive values in which each value is related to the ones before it: for example, speech data, text data, video data, and other data with this kind of sequential dependence.

Another example is stock prices. Tomorrow's price is certainly influenced by today's price and by the prices of the past few days. Let $x_t$ be the stock price on day $t$; to predict the price on day $t$, we would model $x_t \sim P(x_t \mid x_{t-1}, \ldots, x_1)$.

However, as more data come in, this conditional probability becomes harder and harder to compute, because the number of inputs keeps growing with $t$. We therefore need an approximation that makes the computation tractable.
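To see why the problem grows, recall that by the chain rule of probability the whole sequence factorizes as

$$P(x_1, \ldots, x_T) = \prod_{t=1}^{T} P(x_t \mid x_{t-1}, \ldots, x_1),$$

so each new term conditions on everything that came before it.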

There are two ideas:

Autoregressive model / Markov model

The idea is simple: only look back over a time window of length $\tau$.

Think of the stock price again: tomorrow's price may be related to today's price, to yesterday's, perhaps to the day before yesterday's, but it has very little to do with the price a year ago. So when forecasting, we do not need to take last year's prices into account; we only look back $\tau$ days.

A model that estimates $x_t$ from only the sequence $x_{t-1}, \ldots, x_{t-\tau}$ is called an autoregressive model, because it literally performs regression on its own past values.

That is, we use $x_{t-1}, \ldots, x_{t-\tau}$ instead of the full history $x_{t-1}, \ldots, x_1$ to estimate $x_t$. As long as this approximation is accurate, the sequence is said to satisfy the Markov condition, and the resulting model is also called a Markov model.
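Written as a formula, the Markov approximation with window length $\tau$ is

$$P(x_t \mid x_{t-1}, \ldots, x_1) \approx P(x_t \mid x_{t-1}, \ldots, x_{t-\tau}).$$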

Latent variable autoregressive model

Keep a summary $h_t$ of the past observations, and at each step update both the prediction $\hat{x}_t$ and the summary $h_t$. The prediction is produced via $\hat{x}_t = P(x_t \mid h_t)$, and the summary is updated via $h_t = g(h_{t-1}, x_{t-1})$. Models of this form are known as latent autoregressive models.
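As a rough illustration (a minimal sketch, not code from the book: the class name `LatentAR`, the hidden size, and the choice of a linear layer with `tanh` as the update function $g$ are all made up here), such a model keeps a hidden summary and rolls it forward one step at a time:

```python
import torch
from torch import nn

class LatentAR(nn.Module):
    """Hypothetical latent autoregressive model, for illustration only."""
    def __init__(self, hidden_size=16):
        super().__init__()
        # h_t = g(h_{t-1}, x_{t-1}): combine the previous summary and previous value
        self.g = nn.Linear(hidden_size + 1, hidden_size)
        # \hat{x}_t is predicted from the summary h_t alone
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, seq):
        # seq: 1-D tensor of observations x_1, ..., x_T
        h = torch.zeros(self.g.out_features)
        preds = []
        for x_prev in seq:
            h = torch.tanh(self.g(torch.cat([h, x_prev.reshape(1)])))
            preds.append(self.head(h))
        # preds[i] is the one-step-ahead prediction made after seeing seq[i]
        return torch.stack(preds).squeeze(-1)
```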

Now let's implement the Markov (autoregressive) model in code.

import torch
from torch import nn
from d2l import torch as d2l
T = 1000  # A total of 1000 points are generated

# Generate the data set: a sine function plus some noise
time = torch.arange(1, T + 1, dtype=torch.float32)
x = torch.sin(0.01 * time) + torch.normal(0, 0.2, (T,))

d2l.plot(time, [x], 'time', 'x', xlim=[1, 1000], figsize=(5, 3))

We generate the data set from a sine function and add some noise to it.

The raw data visualization looks like this:

So the data set is $y = \sin(x) + \text{noise}$.
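In the notation of the code above, each point is generated as

$$x_t = \sin(0.01\,t) + \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0,\ 0.2^2), \qquad t = 1, \ldots, 1000.$$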

tau = 4

features = torch.zeros((T - tau, tau))
for i in range(tau):
    features[:, i] = x[i: T - tau + i]
labels = x[tau:].reshape((-1, 1))
  • $\tau = 4$: each training example uses the previous 4 values of the sequence as its input.

  • features is our input x, and labels is our y.

    Here we slice x into features: each row of features holds $\tau$ consecutive values, and the corresponding entry of labels is the value that comes right after them.

  • If that is hard to picture, let me write out a small example:

    T = 10
    x = torch.arange(T)
    tau = 4
    features = torch.zeros((T - tau, tau))
    for i in range(tau):
        features[:, i] = x[i: T - tau + i]
    labels = x[tau:].reshape((-1, 1))
    print(x)
    print(features)
    print(x[tau:])
    >>
    tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    tensor([[0., 1., 2., 3.],
            [1., 2., 3., 4.],
            [2., 3., 4., 5.],
            [3., 4., 5., 6.],
            [4., 5., 6., 7.],
            [5., 6., 7., 8.]])
    tensor([4, 5, 6, 7, 8, 9])

    Here x is the list 0 to 9 and $\tau = 4$, so labels starts at position 4. In other words, each label depends on the four values before it, and those four values form the corresponding row of features.

batch_size, n_train = 16, 600
# Only the first `n_train` examples are used for training
train_iter = d2l.load_array((features[:n_train], labels[:n_train]), batch_size, is_train=True)

We use only the first 600 examples as the training set.
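For reference, `d2l.load_array` is just a thin wrapper around PyTorch's data utilities. The following is a sketch of roughly what it does (paraphrased from the d2l source, so treat the details as an assumption rather than the exact implementation):

```python
from torch.utils import data

def load_array(data_arrays, batch_size, is_train=True):
    """Construct a PyTorch data iterator from in-memory tensors."""
    dataset = data.TensorDataset(*data_arrays)
    return data.DataLoader(dataset, batch_size, shuffle=is_train)
```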

# Function to initialize network weights
def init_weights(m):
    if type(m) == nn.Linear:
        nn.init.xavier_uniform_(m.weight)

# A simple multilayer perceptron
def get_net():
    net = nn.Sequential(nn.Linear(4, 10),
                        nn.ReLU(),
                        nn.Linear(10, 1))
    net.apply(init_weights)
    return net

# Squared loss
loss = nn.MSELoss()

The training process can use a simple multilayer perceptron.

def train(net, train_iter, loss, epochs, lr):
    trainer = torch.optim.Adam(net.parameters(), lr)
    for epoch in range(epochs):
        for X, y in train_iter:
            trainer.zero_grad()
            l = loss(net(X), y)
            l.backward()
            trainer.step()
        print(f'epoch {epoch + 1}, '
              f'loss: {d2l.evaluate_loss(net, train_iter, loss):f}')

net = get_net()
train(net, train_iter, loss, 5, 0.01)
print(train)
>>
epoch 1, loss: 0.069232
epoch 2, loss: 0.056763
epoch 3, loss: 0.054675
epoch 4, loss: 0.050940
epoch 5, loss: 0.048722
<function train at 0x0000017DFBD29F70>

That's it. The loss ends up around 0.05, which is not especially low, but the model is trained.
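A quick way to see what the trained model has learned is one-step-ahead prediction: feed the network the same feature matrix and compare its outputs with the original series. This is a minimal sketch that reuses the tensors and the `d2l.plot` helper from above (the legend labels and figure size are illustrative choices):

```python
# One-step-ahead prediction: each prediction sees the 4 true previous values
onestep_preds = net(features)
d2l.plot([time, time[tau:]],
         [x.detach().numpy(), onestep_preds.detach().numpy()],
         'time', 'x', legend=['data', '1-step preds'],
         xlim=[1, 1000], figsize=(5, 3))
```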