Machine learning has been used to predict stock returns with considerable success, for example in "Empirical Asset Pricing via Machine Learning." Can the same approach be applied to bonds? According to "Bond Risk Premia with Machine Learning," published in the Review of Financial Studies in 2020, bond excess returns can indeed be predicted with machine learning methods. The authors are Daniele Bianchi of Queen Mary University of London, Matthias Buchner of the University of Warwick, and Andrea Tamoni of Rutgers Business School.

1 Study Design

1.1 Experimental Framework

Two experiments were conducted in this paper:

  1. Using yield curve data (the annualized yields to maturity of zero-coupon bonds maturing in 1-10 years, 10 yields in total), forecast the excess return over the coming year of a bond with a given maturity (i.e., the holding-period return minus the risk-free rate);

  2. Add another 128 macroeconomic variables to the yield curve and make the same forecast. These macroeconomic variables are divided into several groups: real output and income; employment and hours; retail, manufacturing, and trade sales; international trade; consumer spending; housing starts; inventories and inventory-sales ratios; orders and unfilled orders; compensation and labor costs; price indexes; interest rates and spreads; the stock market; and foreign exchange measures.

The yields come from the zero-coupon yield curve of Liu and Wu (2020), which is available at daily frequency and provides the annualized yield to maturity of zero-coupon bonds maturing in 1-360 months at each point in time.

The training and validation sets are 85% and 15% of the past data, split in time-series order, and the test set is the excess return realized over the coming year. After each test, the training and validation sets expand forward recursively by one month as a whole, keeping their ratio fixed and the test set length at one year. The full sample runs from 1971.08 to 2018.12 (data for 10-year US bonds begin in 1971.09), and the first test is conducted in 1990.01. The schematic diagram is as follows:

For each test, the predictions are recorded, and at the end the out-of-sample $R^2$ is calculated:


$$R^2_\text{OOS}=1-\dfrac{\sum_{t=1}^{T-1}\left(xr_{t+1}^{(n)}-\widehat{xr}_{t+1}^{(n)}(\mathcal{M}_s)\right)^2}{\sum_{t=1}^{T-1}\left(xr_{t+1}^{(n)}-\overline{xr}_{t+1}^{(n)}\right)^2}$$

where $xr_{t+1}^{(n)}$ is the excess return from $t$ to $t+1$ on a zero-coupon bond maturing at $t+n$, $\overline{xr}_{t+1}^{(n)}$ is the historical-mean benchmark forecast, and $\mathcal{M}_s$ is the machine learning model used.

The paper forecasts only the zero-coupon bonds with maturities of 2, 3, 4, 5, 7, and 10 years. An equal-weight portfolio of these six bonds is also constructed, and its $R^2$ is reported as well.

Statistical significance is assessed with the MSPE-adjusted statistic of Clark and West (2007), under the null hypothesis that $R^2_\text{OOS}\le 0$.
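For concreteness, here is a minimal sketch of the recursive evaluation scheme and the two statistics above, assuming `X` and `y` are monthly arrays of predictors and realized excess returns and `make_model` returns a fresh (already tuned) estimator; hyperparameter tuning on the validation slice is omitted.

```python
import numpy as np
from scipy import stats


def expanding_window_forecasts(X, y, first_test_idx, make_model, val_frac=0.15):
    """Recursive scheme described above: all data before the test month are
    split 85/15 into training and validation (the validation slice would be
    used for tuning, omitted here); the benchmark is the historical mean."""
    actual, pred, bench = [], [], []
    for t in range(first_test_idx, len(y)):
        split = int((1.0 - val_frac) * t)      # end of the training slice
        model = make_model()
        model.fit(X[:split], y[:split])
        pred.append(model.predict(X[t:t + 1]).ravel()[0])
        bench.append(y[:t].mean())             # historical-mean benchmark
        actual.append(y[t])
    return np.array(actual), np.array(pred), np.array(bench)


def r2_oos(actual, predicted, benchmark):
    """Out-of-sample R^2 relative to the benchmark forecast, as in the formula above."""
    ss_model = np.sum((actual - predicted) ** 2)
    ss_bench = np.sum((actual - benchmark) ** 2)
    return 1.0 - ss_model / ss_bench


def clark_west(actual, predicted, benchmark):
    """MSPE-adjusted statistic of Clark and West (2007): one-sided test of the
    null that the (nested) benchmark forecasts at least as accurately.
    A plain standard error is used here; a HAC version is a refinement."""
    f = (actual - benchmark) ** 2 - (
        (actual - predicted) ** 2 - (benchmark - predicted) ** 2
    )
    tstat = np.mean(f) / (np.std(f, ddof=1) / np.sqrt(len(f)))
    pval = 1.0 - stats.norm.cdf(tstat)
    return tstat, pval
```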

1.2 Machine learning algorithms

Here is a brief overview of the machine learning algorithms used in the paper. First, principal component regression (PCR) and partial least squares (PLS) are used, followed by several penalized regressions: LASSO, ridge regression, and the elastic net. In addition, three extensions of regression trees are used: gradient boosted trees, random forests, and extremely randomized trees (extra trees), as shown below:
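For concreteness, the non-neural model set could be assembled in scikit-learn roughly as follows; the hyperparameter values are placeholders, not the paper's choices (those are tuned on the validation slice of each window).

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import LinearRegression, Lasso, Ridge, ElasticNet
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              ExtraTreesRegressor)

# One estimator per model class named above; placeholder hyperparameters.
models = {
    "PCR":           make_pipeline(StandardScaler(), PCA(n_components=3),
                                   LinearRegression()),
    "PLS":           PLSRegression(n_components=3),
    "LASSO":         make_pipeline(StandardScaler(), Lasso(alpha=1e-3)),
    "Ridge":         make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "Elastic net":   make_pipeline(StandardScaler(), ElasticNet(alpha=1e-3,
                                                                l1_ratio=0.5)),
    "Boosted trees": GradientBoostingRegressor(n_estimators=300, max_depth=3),
    "Random forest": RandomForestRegressor(n_estimators=500, max_features="sqrt"),
    "Extra trees":   ExtraTreesRegressor(n_estimators=500, max_features="sqrt"),
}
```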

Finally, several neural network architectures are used. When only yield curve data are used for prediction, a classical feedforward network is used, as shown in the figure below:
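A minimal sketch of such a shallow network (10 yields in, one hidden layer with 3 nodes, one forecast out) in PyTorch; the activation choice is an assumption, not taken from the paper.

```python
import torch.nn as nn

# Shallow feedforward network for the yields-only case: 10 -> 3 -> 1.
yields_net = nn.Sequential(
    nn.Linear(10, 3),   # 10 zero-coupon yields as inputs
    nn.ReLU(),          # activation is an assumed choice
    nn.Linear(3, 1),    # one excess-return forecast
)
```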

When both yield curve data and macro data are used for prediction, this paper designs three different network structures, as shown in the figure below:

In the first, the yield data are fed directly into the output layer without passing through the hidden layers (the hybrid network); in the second, two networks (a yield network and a macro-variable network) are combined; and in the third, the macro variables are additionally grouped on top of the second design, with one sub-network per group (group ensembling).
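A rough sketch of the first and third ideas in PyTorch, under assumed layer sizes; `HybridNet` and `GroupEnsembleNet` are illustrative names, not the paper's code.

```python
import torch
import torch.nn as nn


class HybridNet(nn.Module):
    """Hybrid idea: macro variables pass through hidden layers, while the
    yield-curve inputs skip them and enter the output layer directly."""
    def __init__(self, n_macro=128, n_yields=10, hidden=32):
        super().__init__()
        self.macro_body = nn.Sequential(nn.Linear(n_macro, hidden), nn.ReLU())
        self.head = nn.Linear(hidden + n_yields, 1)

    def forward(self, macro, yields):
        h = self.macro_body(macro)
        return self.head(torch.cat([h, yields], dim=-1))


class GroupEnsembleNet(nn.Module):
    """Group ensembling idea: each macro group gets its own sub-network, so
    interactions across groups are ruled out by construction; the group
    outputs and the yields are combined at the output layer."""
    def __init__(self, group_sizes, n_yields=10, hidden=8):
        super().__init__()
        self.groups = nn.ModuleList(
            nn.Sequential(nn.Linear(g, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for g in group_sizes
        )
        self.head = nn.Linear(len(group_sizes) + n_yields, 1)

    def forward(self, macro_groups, yields):
        # macro_groups: list of tensors, one per group of macro variables
        outs = [net(x) for net, x in zip(self.groups, macro_groups)]
        return self.head(torch.cat(outs + [yields], dim=-1))
```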

2 Experimental Results

2.1 Prediction using yield curve only

The experimental results are shown as follows:

In Panel A, principal components (PCs) are first extracted from the yield curve, using 3, 5, and 10 components; the 10-component case corresponds to the experiment in Cochrane and Piazzesi (2005). The resulting $R^2$ is negative, adding more PCs makes it worse, and adding squared PCs as nonlinear terms makes it worse still. Using PLS does not improve performance.

In Panel B, the $R^2$ of ridge regression is negative, while the sparse models (LASSO and the elastic net) achieve positive $R^2$ when predicting bonds with maturities above four years, and their $R^2$ for the portfolio is also positive.

In Panel C, extra trees perform best among the tree models, possibly because they also randomize the split thresholds. For the neural networks, a shallow network (1 hidden layer, 3 nodes) performs as well as a deeper one (2 hidden layers, 7 nodes), and deepening the network further makes the results worse. In addition, Cochrane and Piazzesi (2005) pointed out that forward rates lagged by 1-11 periods contain information about excess returns not found in the current forward rates. Here, the 1-11 period lagged forward rates together with the 10 current forward rates are fed into a network with one hidden layer of 7 nodes and compared with the shallow network. The results are no better, so the lagged yield curve does not improve the predictive ability of the current yield curve.

2.2 Forecast with yield curve and macro data

The result is shown below:

In Panels A and B, for comparison with Ludvigson and Ng (2009), the second row extracts the first eight principal components from the macro variables and then selects a subset of them following their specification. The CP factor (a linear combination of forward rates) of Cochrane and Piazzesi (2005) is used for the yield data in some specifications (rows 4, 5, and 6). The results show that dense models such as ridge regression and data-compression methods perform poorly out of sample, while sparse models perform better and improve significantly on the results in Section 2.1, indicating that adding macro variables helps prediction.

Panel C reports the results for the three network architectures. For the hybrid network, deepening the network improves accuracy, and choosing the network structure with economic prior knowledge markedly affects the predictions: a one-layer group-ensemble network performs as well as a three-layer hybrid network, and does better for the 7- and 10-year bonds. In addition, adding macro variables also improves the tree models, with extra trees performing best.

3 Analysis of the Predictions

Is there a difference in predictability between expansions and recessions? The NBER recession indicator is used to split the sample into expansion and recession periods and compute the out-of-sample $R^2$ for each; the results are shown below:
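For concreteness, a minimal sketch of this regime split, assuming `recession` is a boolean NBER indicator aligned with the out-of-sample months:

```python
import numpy as np


def r2_by_regime(actual, predicted, benchmark, recession):
    """Out-of-sample R^2 computed separately over NBER recession and
    expansion months."""
    out = {}
    rec = recession.astype(bool)
    for name, mask in [("recession", rec), ("expansion", ~rec)]:
        num = np.sum((actual[mask] - predicted[mask]) ** 2)
        den = np.sum((actual[mask] - benchmark[mask]) ** 2)
        out[name] = 1.0 - num / den
    return out
```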

Principal components (PCs) are extracted from the yield curve; the first three represent the level, slope, and curvature of the curve, respectively. A regression can be used to test whether the hidden factors extracted by the neural network predict changes in these first three principal components:


$$PC_{i,t+1}-PC_{i,t}=b_0 + \boldsymbol{b}_1^T \mathcal{P}_t + \boldsymbol{b}_2^T \boldsymbol{x}_t + \epsilon_{i,t+1} \quad \text{for } i=1,2,3$$

Here $\boldsymbol{x}_t$ is the hidden factor extracted by the neural network, that is, the output of the hidden layer. The results are as follows (the first row, which does not include the hidden factors, serves as a reference):
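A sketch of how this regression could be run with statsmodels, treating $\mathcal{P}_t$ as the vector of current principal components (an assumption about the controls) and using HAC standard errors:

```python
import numpy as np
import statsmodels.api as sm


def hidden_factor_regression(pc, hidden, i):
    """Regress PC_{i,t+1} - PC_{i,t} on the current PCs and the hidden-layer
    outputs x_t.  pc: T x 3 array of yield-curve principal components;
    hidden: T x k array of hidden activations; i in {0, 1, 2}."""
    dpc = pc[1:, i] - pc[:-1, i]                               # change in PC_i
    X = sm.add_constant(np.column_stack([pc[:-1], hidden[:-1]]))
    return sm.OLS(dpc, X).fit(cov_type="HAC", cov_kwds={"maxlags": 12})
```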

Which variables are important? The partial derivative of the predicted value with respect to a variable can be calculated:


$$\mathbb{E}\left[\dfrac{\partial}{\partial y_{it}}\, xr_{t+1}^{(n)}\,\bigg|\, y_{it}=\bar{y}_i\right]$$

Taking the absolute value and averaging over all $t$ gives the importance of a variable. The importance of each group is obtained by averaging the importances of the macro variables in that group.
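A sketch of a gradient-based implementation with PyTorch autograd; it averages absolute gradients over the observed inputs rather than evaluating exactly at the variable means, which is a simplification of the expression above. It assumes a model that maps a single input tensor to one forecast per row.

```python
import torch


def variable_importance(model, X):
    """Average absolute partial derivative of the forecast with respect to
    each input variable, averaged over the sample."""
    X = X.detach().clone().requires_grad_(True)
    model(X).sum().backward()        # rows are independent, so the gradient of
                                     # the sum gives each row's own derivatives
    return X.grad.abs().mean(dim=0)  # one importance number per input variable
```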

Panels (a) and (b) in the figure below show the importance of each variable when predicting the excess returns of 2-year and 10-year bonds, respectively; panels (c) and (d) show the corresponding group importances after grouping.

All of these results suggest that nonlinearity is important, but does the nonlinearity that improves performance come from interactions between groups or within groups? The second-order cross partial derivatives across groups can be computed for the fully connected network and the group-ensemble network, respectively:


$$\mathbb{E}\left[\dfrac{\partial^2}{\partial y_{i}\, \partial y_j}\, xr_{t+1}^{(n)}\,\bigg|\, y_{i}\in G_A,\ y_j\in G_B\right]$$
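A sketch of how these cross partials could be estimated with double autograd, assuming a model that takes a single stacked input vector (as the fully connected network does) and evaluating at an assumed mean input `x_bar`:

```python
import torch


def cross_group_interaction(model, x_bar, idx_a, idx_b):
    """Average absolute second-order cross partial between variables in group A
    (indices idx_a) and group B (indices idx_b), evaluated at x_bar (1-D)."""
    x = x_bar.detach().clone().requires_grad_(True)
    grad = torch.autograd.grad(model(x).sum(), x, create_graph=True)[0]
    vals = []
    for i in idx_a:
        # differentiate d(forecast)/d(x_i) once more to get a row of the Hessian
        row = torch.autograd.grad(grad[i], x, retain_graph=True)[0]
        vals.append(row[idx_b].abs().mean())
    return torch.stack(vals).mean()
```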

The results are shown below:

Panel A shows that the interactions between groups are large in the fully connected network, while Panel B shows that the interactions within groups are similar across the two networks. The strong performance of the group-ensemble network therefore comes from the fact that it rules out interactions between groups while allowing interactions within groups.

4 Economic Benefits of Predictability

Does predictability translate into investment gains? The paper considers univariate and multivariate asset allocation experiments. In the univariate case, the investor allocates only between the risk-free bond and a risky bond with a maturity of $n$ years, focusing on $n=2$ or $n=10$. In the multivariate case, the investor considers bonds with maturities of 2 to 10 years together with the risk-free bond. The results are shown in the table below; positive values indicate that the prediction model outperforms the expectations hypothesis (EH) benchmark. The neural network structures used are the best-performing ones from Table 1 and Table 2, respectively.
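The paper's exact portfolio rule is not reproduced here; the sketch below uses a standard mean-variance weight as an assumed stand-in, with the risk-aversion coefficient, rolling variance window, and leverage bounds all being illustrative choices rather than the paper's settings.

```python
import numpy as np


def univariate_allocation(pred_xr, realized_xr, rf, gamma=5.0, var_window=120):
    """One-asset allocation sketch: each month the weight on the n-year bond is
    the mean-variance rule w = forecast / (gamma * variance), with the variance
    estimated from a rolling window of past excess returns."""
    weights, port = [], []
    for t in range(var_window, len(pred_xr)):
        var = np.var(realized_xr[t - var_window:t], ddof=1)
        w = np.clip(pred_xr[t] / (gamma * var), -1.0, 2.0)  # assumed leverage bounds
        weights.append(w)
        port.append(rf[t] + w * realized_xr[t])             # realized portfolio return
    return np.array(weights), np.array(port)
```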

Economic drivers of predictability

In the figure below, panels (a) and (b) plot the predicted excess return on the 10-year bond (solid line) against the growth rate of the Industrial Production (IP) index (dashed line); in panels (c) and (d) the dashed line is replaced by the realized excess return on the 10-year bond.

It is also possible to calculate the Sharpe ratio for the recession and expansion periods:

Taken together, this evidence points to a clearly countercyclical bond risk premium.

Next, the predicted excess return on the 10-year bond is regressed on the key drivers of the bond risk premium suggested by asset pricing theory. The factors are listed below, followed by a sketch of the regression:

  • DiB(g) and DiB(π) measure divergence in beliefs, real and nominal respectively, constructed from four-quarter-ahead forecasts of GDP and CPI in the SPF database.
  • −Surplus is the negative of a weighted average of consumption growth over the past 10 years and proxies for risk aversion. RAbex (Bekaert, Engstrom, and Xu, 2019) is also used as a measure of time-varying risk aversion.
  • UnC(g) and UnC(π) represent uncertainty about economic growth and inflation, respectively.
  • TYVIX is the one-month risk-neutral implied volatility of the 10-year bond, and $\sigma_B^{(n)}$ is the sum of squared monthly changes in the 10-year zero-coupon yield. These two proxies measure bond market volatility.
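A sketch of this driver regression with statsmodels; the assembled `drivers` matrix and the HAC lag length are assumptions for illustration.

```python
import statsmodels.api as sm


def driver_regression(pred_xr10, drivers):
    """Regress the predicted 10-year excess return on the proxies listed above.
    pred_xr10: length-T array of forecasts; drivers: T x k array with columns
    such as DiB, -Surplus, RAbex, UnC, TYVIX, and realized volatility."""
    X = sm.add_constant(drivers)
    return sm.OLS(pred_xr10, X).fit(cov_type="HAC", cov_kwds={"maxlags": 12})
```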

The results are shown in the following table:

Finally, the correlation coefficients between the predicted excess return on the 10-year bond and three subjective risk premium proxies (EBR*, SUBJ_BRP, and GLS) from the Blue Chip Financial Forecasts (BCFF) survey are computed, as shown in the following table:

References

  • Bekaert, Geert, Eric C. Engstrom, and Nancy R. Xu. "The Time Variation in Risk Appetite and Uncertainty." NBER Working Paper No. w25673. National Bureau of Economic Research, 2019.
  • Bianchi, Daniele, Matthias Buchner, and Andrea Tamoni. "Bond Risk Premia with Machine Learning." The Review of Financial Studies (2020).
  • Cochrane, John H., and Monika Piazzesi. "Bond Risk Premia." American Economic Review 95.1 (2005): 138-160.
  • Liu, Yan, and Jing Cynthia Wu. "Reconstructing the Yield Curve." NBER Working Paper No. w27266. National Bureau of Economic Research, 2020.
  • Ludvigson, Sydney C., and Serena Ng. "Macro Factors in Bond Risk Premia." The Review of Financial Studies 22.12 (2009): 5027-5067.