Abstract: This paper introduces applications of deep learning in physical layer signal processing and proposes a location information verification scheme for MIMO systems based on a deep Q network (DQN). The receiver uses the deep Q network to continuously update its detection strategy in a changeable and unknown channel environment.

01 Introduction

With the explosive growth of mobile traffic, communication scenarios demanding high reliability and low latency bring greater complexity and computing challenges to current networks. According to IBM, mobile data volume will exceed 40 trillion Gbits by 2020, a 44-fold increase over 2009, and the total number of connected devices will reach 50 billion. To meet this demand, new communication theories and innovative technologies are needed to satisfy the requirements of 5G systems. In recent years, the development of the deep learning paradigm has prompted academia and industry to study wireless communication technologies based on deep learning. The results confirm that deep learning can improve the performance of wireless communication systems and has potential applications in interference alignment, channel estimation, signal detection, and other physical layer signal processing tasks.

02 Deep learning paradigm

The concept of deep learning originates from research on artificial neural networks (ANNs) and was proposed by Hinton et al. in 2006. As shown in Figure 1, deep learning builds an ANN with a hierarchical structure, typically consisting of an input layer, multiple hidden layers, and an output layer. Each layer connects to its adjacent layers through different weights. By extracting and screening the input information layer by layer, end-to-end supervised learning and unsupervised learning can be realized. Deep neural networks include the feedforward neural network (FNN), recurrent neural network (RNN), convolutional neural network (CNN), generative adversarial network (GAN), deep belief network, and so on. Gating-based RNNs, such as the LSTM network, have a certain memory capability for their inputs, so they are often used in physical layer signal processing and channel state information estimation. In addition, deep learning can also participate in the construction of reinforcement learning (RL) systems to form deep reinforcement learning, such as the deep Q network (DQN) [1], which can be used to optimize signal processing strategies at the physical layer.

1) Long short-term memory network

As a variant of the RNN, the LSTM network can effectively solve the gradient explosion and vanishing gradient problems of simple recurrent neural networks. An RNN stores historical information in its hidden state. In a simple RNN, the hidden state is rewritten at every time step and can therefore be considered a short-term memory. In an LSTM network, however, memory cells hold critical information longer than this short-term memory. LSTM networks introduce a gate mechanism to control the path of information transmission. Each gate takes a value between 0 and 1 to control the fraction of information that passes through. An LSTM network mainly includes three gates: the forget gate controls how much information about the internal state at the previous moment needs to be forgotten; the input gate controls how much information from the candidate state at the current moment is stored; and the output gate controls how much information about the internal state at the current moment needs to be output to the external state.
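
To make the gate mechanism concrete, the following is a minimal NumPy sketch of a single LSTM cell step. The sigmoid/tanh activations and the update equations are the standard LSTM formulation; the variable names, shapes, and weight layout here are illustrative assumptions, not code from any cited paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    x_t: input at time t, shape (d_in,)
    h_prev, c_prev: previous external (hidden) and internal (cell) states, shape (d_h,)
    W, U, b: dicts of weight matrices / bias vectors for the 'f', 'i', 'o', 'c' transforms.
    """
    f = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])        # forget gate: how much of c_prev to keep
    i = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])        # input gate: how much of the candidate to write
    o = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])        # output gate: how much of the cell to expose
    c_tilde = np.tanh(W['c'] @ x_t + U['c'] @ h_prev + b['c'])  # candidate state at the current moment
    c_t = f * c_prev + i * c_tilde                              # internal (cell) state update
    h_t = o * np.tanh(c_t)                                      # external (hidden) state output
    return h_t, c_t
```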

2) Deep Q network

DQN combines a CNN with Q learning: it uses the target value function of Q learning to construct the objective function of deep learning, uses an experience replay mechanism to break the correlation between training samples, and adopts iterative updates (a periodically refreshed target network) to stabilize training. Assume that the state of the environment at time t is s_t; the agent takes an action a_t according to a certain policy and receives a reward r_t; the environment then moves to the next state s_{t+1} according to a transition probability. In DQN, the agent interacts with the environment through a series of actions aimed at maximizing the cumulative reward.
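
Concretely, in the DQN of [1] the network parameters θ are updated to minimize the squared deviation between the Q-learning target, computed with a periodically refreshed copy of the parameters θ⁻, and the current estimate. The notation below is the standard one from that paper rather than anything defined explicitly in this article:

$$L(\theta) = \mathbb{E}_{(s, a, r, s') \sim \mathcal{D}} \left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta^-) - Q(s, a; \theta) \right)^2 \right],$$

where $\mathcal{D}$ is the experience pool and $\gamma$ is the discount factor.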

At the same time, experience replay based on a convolutional neural network is used to continuously approximate the Q function. In experience replay, the agent selects an action using an ε-greedy policy at each step and saves the learning experience of each moment in an experience pool. In each parameter update cycle of the algorithm, samples are drawn from the experience pool by random sampling or mini-batch random sampling, and the model parameters are updated through Q learning. Based on previous experience, the maximum Q value is continuously approximated by the CNN. The loss function of the CNN is the deviation between the approximate Q value and the target Q value, and its value can be continuously reduced by adjusting the weights of the neural network with the gradient descent algorithm.
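
The PyTorch sketch below illustrates the ε-greedy action selection, experience pool, and mini-batch Q-learning update described above. The environment interface (env.reset, env.step returning state/reward/done), the network sizes, and the hyperparameters are illustrative assumptions; for image-like states the two linear layers would be replaced by convolutional layers as in [1].

```python
import random
from collections import deque
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Small fully connected Q network (stand-in for the CNN used in [1])."""
    def __init__(self, n_states, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_states, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))
    def forward(self, s):
        return self.net(s)

def train_dqn(env, n_states, n_actions, episodes=200, gamma=0.99, eps=0.1):
    q, q_target = QNet(n_states, n_actions), QNet(n_states, n_actions)
    q_target.load_state_dict(q.state_dict())
    opt = torch.optim.Adam(q.parameters(), lr=1e-3)
    pool = deque(maxlen=10000)                          # experience pool
    for ep in range(episodes):
        s, done = env.reset(), False
        while not done:
            if random.random() < eps:                   # epsilon-greedy action selection
                a = random.randrange(n_actions)
            else:
                a = q(torch.tensor(s, dtype=torch.float32)).argmax().item()
            s2, r, done = env.step(a)
            pool.append((s, a, r, s2, done))            # store the experience of this moment
            s = s2
            if len(pool) >= 64:
                batch = random.sample(pool, 64)         # mini-batch random sampling
                bs, ba, br, bs2, bd = map(lambda x: torch.tensor(x, dtype=torch.float32),
                                          zip(*batch))
                q_sa = q(bs).gather(1, ba.long().unsqueeze(1)).squeeze(1)
                with torch.no_grad():                   # Q-learning target from the target network
                    target = br + gamma * q_target(bs2).max(1).values * (1 - bd)
                loss = nn.functional.mse_loss(q_sa, target)  # deviation from the target Q value
                opt.zero_grad(); loss.backward(); opt.step()
        if ep % 10 == 0:                                # periodic target-network refresh
            q_target.load_state_dict(q.state_dict())
    return q
```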

03 Applications of deep learning in physical layer signal processing

In recent years, there has been related work in academia and industry on applying deep learning to the physical layer, and the published research results show that deep learning can improve physical layer performance. In this section, from the point of view of physical layer signal processing, examples and explanations are given for existing work in four areas: channel state information (CSI) estimation, signal encoding and decoding, interference alignment, and signal detection.

1) CSI estimation based on deep learning

Accurate CSI acquisition is very important for ensuring the link performance of a wireless communication system. Wireless networks select specific signal control schemes according to the estimated channel state. For example, when the CSI indicates a poor channel, the physical layer adopts a low-order modulation scheme to combat the bad communication state and reduce the bit error rate. 5G communication systems adopt multiple-input multiple-output (MIMO), millimeter wave, and non-orthogonal multiple access (NOMA) technologies, which give the communicating parties more transmission channels and make channel estimation more complicated. Traditional CSI estimation schemes need to perform matrix operations with high complexity, which is limited by computing resources and time delay.

It has been shown that deep learning can improve the efficiency of CSI estimation and reduce the amount of data required for uplink and downlink reference information by exploiting the space-time correlation of CSI and the correlation between uplink and downlink [2]. As shown in Figure 2, the paper [3] proposes to extract frequency feature vectors from historical CSI data through a two-dimensional convolutional neural network, then extract state feature vectors from the frequency feature vectors using a one-dimensional convolutional neural network, and finally use an LSTM network for CSI state prediction. Since 2D convolutional neural networks were originally used to process image data, the authors divide the raw CSI data into cells, with each cell corresponding to one image pixel. The CSI of each band and the pixels corresponding to the auxiliary information form a channel. Therefore, data from N frequency bands are converted into pixel information of N channels and fed into the learning framework.
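
A minimal PyTorch sketch of such a pipeline is shown below, assuming the CSI history is arranged as an "image" with N frequency-band channels per time step. The layer sizes, kernel sizes, and tensor layout are illustrative assumptions and not the exact architecture of [3].

```python
import torch
import torch.nn as nn

class CsiPredictor(nn.Module):
    """Illustrative 2D CNN -> 1D CNN -> LSTM pipeline for CSI state prediction."""
    def __init__(self, n_bands, hidden=64):
        super().__init__()
        # 2D CNN over the per-time-step CSI "image" (N frequency-band channels)
        self.freq_cnn = nn.Sequential(
            nn.Conv2d(n_bands, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))                       # -> frequency feature vector (16,)
        # 1D CNN over the time axis of frequency feature vectors -> state feature vectors
        self.state_cnn = nn.Sequential(
            nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU())
        # LSTM for CSI state prediction
        self.lstm = nn.LSTM(input_size=32, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)                   # predicted CSI state indicator

    def forward(self, x):
        # x: (batch, time, n_bands, H, W) history of CSI "images"
        b, t = x.shape[:2]
        f = self.freq_cnn(x.flatten(0, 1)).flatten(1)      # (b*t, 16)
        f = f.view(b, t, -1).transpose(1, 2)               # (b, 16, t)
        s = self.state_cnn(f).transpose(1, 2)              # (b, t, 32)
        out, _ = self.lstm(s)
        return self.head(out[:, -1])                       # prediction for the next step

# usage: pred = CsiPredictor(n_bands=8)(torch.randn(4, 10, 8, 16, 16))
```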

2) Encoding and decoding based on deep learning

Applications of deep learning in source coding and channel coding have also shown that it can improve coding efficiency and reduce the network's bit error rate. A joint coding scheme based on a deep learning framework can realize source coding (structuring) of text through a recurrent neural network, feed the structured information into a bidirectional LSTM network, and finally output the binary data stream; at the receiving end, an LSTM is used for decoding. The paper [4] proposes an encoder with a fully connected deep neural network to improve the efficiency of belief-propagation-based HDPC decoding. O'Shea et al. [5] modeled the entire physical layer as an autoencoder containing modulation, channel coding, and signal classification functions, and trained the autoencoder using a convolutional neural network. As shown in Figure 3, in this learning framework of multiple dense neural network layers, the input signal is one-hot encoded and the wireless channel is modeled as a noise layer. The cross-entropy loss function and the stochastic gradient descent algorithm are used to train the model, and at the output end the signal with the highest probability is taken as the decoding result.
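
The sketch below shows an end-to-end autoencoder of the kind described in [5]: dense layers map a one-hot message onto n channel uses, an additive Gaussian noise layer models the wireless channel, and a cross-entropy-trained receiver recovers the message. The message set size, layer widths, SNR, and training loop details are illustrative assumptions rather than the exact configuration of [5].

```python
import torch
import torch.nn as nn

M, n = 16, 7          # M possible messages (one-hot encoded), n channel uses (assumed values)

class ChannelAutoencoder(nn.Module):
    def __init__(self, snr_db=7.0):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(M, 32), nn.ReLU(), nn.Linear(32, n))
        self.decoder = nn.Sequential(nn.Linear(n, 32), nn.ReLU(), nn.Linear(32, M))
        self.noise_std = 10 ** (-snr_db / 20)            # AWGN std for unit signal power per channel use

    def forward(self, one_hot):
        x = self.encoder(one_hot)
        x = x * (n ** 0.5) / x.norm(dim=1, keepdim=True)  # normalize to unit average power per channel use
        y = x + self.noise_std * torch.randn_like(x)      # wireless channel modeled as a noise layer
        return self.decoder(y)                            # logits over the M possible messages

model = ChannelAutoencoder()
opt = torch.optim.SGD(model.parameters(), lr=0.01)        # stochastic gradient descent
loss_fn = nn.CrossEntropyLoss()                           # cross-entropy training loss

for step in range(2000):
    msgs = torch.randint(0, M, (256,))                    # random training messages
    logits = model(nn.functional.one_hot(msgs, M).float())
    loss = loss_fn(logits, msgs)
    opt.zero_grad(); loss.backward(); opt.step()

# decoding: the highest-probability output is taken as the recovered message
decoded = model(nn.functional.one_hot(torch.arange(M), M).float()).argmax(dim=1)
```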

3) Interference alignment based on deep learning

Interference alignment in MIMO systems adjusts the transmitted signals through linear precoding so that the interference signals at the receiving end are confined to a reduced-dimension subspace, thus breaking through the throughput limitation caused by interference in MIMO systems. Existing research results show that deep learning can improve the throughput of an interference alignment network and achieve optimized results. He et al. [6] proposed adopting DQN to obtain the optimal user selection strategy under interference alignment. In this mechanism, a central scheduler collects the channel state and cache state of each user and allocates channel resources to the users. The time-varying channel is modeled as a finite-state Markov process, and the system state is defined as the channel state and cache state of each user. The central scheduler is used to train the optimal policy for the system. The corresponding system action is defined as whether to allocate channel resources to each user for data transmission, with the goal of maximizing the throughput of the interference alignment network. DQN can also be used to eliminate interference between secondary users and the primary user in cognitive radio networks, where the secondary user uses frequency hopping and mobility to resist jamming [7].
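
As a concrete illustration of how such a scheduling problem maps onto the DQN elements from Section 02, the snippet below encodes one possible state/action/reward definition: per-user quantized channel and cache states as the state, a binary allocation vector as the action, and throughput as the reward. The encoding and the rate lookup table are illustrative assumptions, not the exact formulation of [6].

```python
import itertools
import numpy as np

N_USERS = 3

def scheduler_state(channel_states, cache_states):
    """System state = channel state and cache state of every user, flattened into one vector."""
    return np.concatenate([channel_states, cache_states]).astype(np.float32)

# Action space: for each user, whether or not to allocate channel resources this slot.
ACTIONS = list(itertools.product([0, 1], repeat=N_USERS))

def throughput_reward(action, channel_states, rate_per_level):
    """Reward = total throughput of the scheduled users; rate_per_level maps a quantized
    channel level to an achievable rate (an assumed lookup table)."""
    return sum(rate_per_level[level]
               for alloc, level in zip(action, channel_states) if alloc == 1)
```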

4) Signal detection based on deep learning

Detection algorithms based on deep learning can significantly improve the performance of communication systems, especially when traditional processing modules need joint optimization or the channel cannot be represented by a common analytical model. The paper [8] proposes embedding a five-layer fully connected DNN in an OFDM receiver for joint channel estimation and signal detection. Using the received signal, together with the corresponding transmitted data and pilots, as input, the DNN can implicitly infer the channel information and be used to predict the transmitted data. In MIMO detection, iterative methods based on Bayesian optimal detectors have been shown to achieve good performance with moderate computational complexity. However, in many more complex environments, unknown channel distributions limit the effectiveness of such detectors. Deep learning algorithms can recover model parameters from input data, thereby improving the adaptability of the detector. In some cases, deep learning can also exploit semantic information, such as the location of the receiver and information about surrounding vehicle nodes, to predict beams and thus improve system performance.
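
A minimal sketch of such a receiver-side network is given below: a five-layer fully connected DNN takes the received pilot and data symbols of one OFDM frame (real and imaginary parts concatenated) and outputs estimates of the transmitted bits. The frame layout, layer widths, and training setup are illustrative assumptions rather than the exact configuration of [8].

```python
import torch
import torch.nn as nn

N_SUBCARRIERS = 64            # assumed OFDM size
N_IN = 2 * 2 * N_SUBCARRIERS  # real+imag of one pilot block and one data block
N_OUT = 16                    # bits recovered per forward pass (assumed grouping)

# Five fully connected layers (input, three hidden, output) for joint
# channel estimation and signal detection.
ofdm_detector = nn.Sequential(
    nn.Linear(N_IN, 500), nn.ReLU(),
    nn.Linear(500, 250), nn.ReLU(),
    nn.Linear(250, 120), nn.ReLU(),
    nn.Linear(120, N_OUT), nn.Sigmoid())   # per-bit probabilities

def detect_bits(received_frame):
    """received_frame: tensor of shape (batch, N_IN) built from pilot + data symbols."""
    with torch.no_grad():
        return (ofdm_detector(received_frame) > 0.5).int()

# Training would minimize a loss (e.g. MSE or BCE) between predicted and transmitted
# bits over frames simulated with known channel models, then deploy the model online.
```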

04 DQN-based signal detection mechanism

In location-based service scenarios, vehicles or users need to continuously send beacon messages to report their locations, thereby improving location-based services and network performance. However, some vehicles or users may choose to send fake locations to obtain more resources, affecting the effectiveness of network services.

In a MIMO system, the transmitted signal often carries rich information (angle of arrival, received power, etc.), which the receiving end can use with signal detection techniques to verify the position claimed in a beacon message. We propose a signal detection mechanism based on DQN that can be used to verify the sender's position information and detect information forgers in MIMO systems. The main idea is that the receiving end uses maximum likelihood estimation to perform a hypothesis test on the received signal. When the received signal passes the detection test, the transmitted signal is considered to come from the position reported by the sender; otherwise, the sender is considered to have reported false location information. To improve detection performance under changeable channel states, DQN is used to predict the benefit of using different detection thresholds at the receiver and to select the optimal detection threshold. The system framework is shown in Figure 4.

1) System model

The null hypothesis of the hypothesis test is that the sending node reports its real location, and the alternative hypothesis is that the sending node reports a false location. At each moment, the signal received by the receiver depends on the real relative position of sender and receiver, the channel state, and the angle of arrival of the signal. Given the transmitted information and the transmit power, the receiver can use maximum likelihood detection to test these hypotheses on the received signal.

2) Maximum likelihood detection

The receiver adopts the maximum likelihood detection algorithm to verify the received signal, and the detection rule is defined as

$$\Lambda(\mathbf{y}) = \frac{p(\mathbf{y} \mid H_1)}{p(\mathbf{y} \mid H_0)} \ \underset{D_0}{\overset{D_1}{\gtrless}} \ \tau ,$$

where $\tau$ is the detection threshold, $D_0$ and $D_1$ denote the test results "the reported location is normal" and "the reported location is false" respectively, and $p(\mathbf{y} \mid H_0)$ and $p(\mathbf{y} \mid H_1)$ are the posterior distributions of the observed signal under the null hypothesis and the alternative hypothesis respectively. According to [9], the results of the hypothesis test (false alarm rate and miss rate) are related to the sender's actual position, the reported position, the channel state, and the detection threshold. For the receiver, the sender's actual location, reported location, and channel state are unknown or only partially known environment variables. In the process of continuous information interaction with the sender, this paper proposes that the receiver continuously optimize the selection of the detection threshold based on DQN, so as to improve the accuracy of signal detection.
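
To make the test concrete, here is a small sketch that evaluates the likelihood ratio for a received-power observation, assuming a simple Gaussian measurement model around the powers expected from the reported position and from an alternative position. The path-loss model, noise level, and parameter values are purely illustrative assumptions, not the signal model of [9].

```python
import numpy as np
from scipy.stats import norm

def expected_rx_power(tx_power_dbm, distance_m, path_loss_exp=3.0):
    """Illustrative log-distance path-loss model (assumption)."""
    return tx_power_dbm - 10 * path_loss_exp * np.log10(distance_m)

def lrt_decision(rx_power_dbm, d_reported, d_alternative, tx_power_dbm=20.0,
                 sigma_db=4.0, threshold=1.0):
    """Return True if the reported location is judged false (alternative hypothesis D1)."""
    p_h0 = norm.pdf(rx_power_dbm, expected_rx_power(tx_power_dbm, d_reported), sigma_db)
    p_h1 = norm.pdf(rx_power_dbm, expected_rx_power(tx_power_dbm, d_alternative), sigma_db)
    return (p_h1 / p_h0) >= threshold

# A received power far stronger than the reported 200 m distance would explain
# is flagged as a likely false location report:
print(lrt_decision(rx_power_dbm=-30.0, d_reported=200.0, d_alternative=50.0))
```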

3) Detection threshold optimization based on DQN

In the mechanism proposed in this paper, the state space of the receiver has two dimensions: the first dimension is the channel state from sender to receiver, and the second dimension is the detection result. The channel state space consists of a series of quantized channel indicators, and the channel state transition is assumed to follow a Markov process, that is, the channel state at the current moment is related only to the state at the previous moment. The result state space includes four cases: real data detected as true, real data detected as false, false data detected as true, and false data detected as false. For each action, the immediate reward of the receiving end is related to the detection result: it is positive when the detection result is correct and negative when it is wrong. The action of the receiving end is defined as the threshold used for signal detection, and the action space consists of a series of quantized detection thresholds. At each moment, the receiver's mixed strategy is the probability of selecting different detection thresholds. Based on the DQN principle introduced in Section 02, after each experience the receiving end stores the selected detection threshold, the corresponding state, result, and reward in the experience pool, and uses a CNN to train and predict the Q function, so as to continuously optimize the selection of the detection threshold.
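
The sketch below shows one way the receiver-side loop described above could be organized, reusing a generic DQN training loop such as the one sketched in Section 02: the state combines a quantized channel indicator with the previous detection outcome, the action is an index into a grid of candidate thresholds, and the reward is +1/-1 according to whether the detection was correct. The quantization levels, threshold grid, and reward magnitudes are illustrative assumptions.

```python
import numpy as np
import torch

THRESHOLDS = np.linspace(0.5, 2.0, 8)        # quantized candidate detection thresholds (action space)
N_CHANNEL_LEVELS = 4                          # quantized channel-state indicators
# detection-result states: (data is real?, detected as real?) -> four combinations
RESULT_STATES = [(True, True), (True, False), (False, True), (False, False)]

def receiver_state(channel_level, last_result):
    """Receiver state = one-hot channel level + one-hot of the previous detection outcome."""
    s = np.zeros(N_CHANNEL_LEVELS + len(RESULT_STATES), dtype=np.float32)
    s[channel_level] = 1.0
    s[N_CHANNEL_LEVELS + RESULT_STATES.index(last_result)] = 1.0
    return s

def step_reward(data_is_real, detected_as_real):
    """Positive immediate reward for a correct detection, negative for a wrong one."""
    return 1.0 if data_is_real == detected_as_real else -1.0

def choose_threshold(q_net, state, eps=0.1):
    """epsilon-greedy selection of a detection threshold using a trained Q network."""
    if np.random.rand() < eps:
        idx = np.random.randint(len(THRESHOLDS))
    else:
        idx = int(q_net(torch.tensor(state)).argmax().item())
    return THRESHOLDS[idx], idx
```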

05 Summary and suggestions for future development

In this paper, we have demonstrated the great application potential of deep learning in physical layer communication through existing work and cases. In addition to the application directions introduced above, deep learning has also been applied to end-to-end communication systems to some extent. However, it has not yet been established whether end-to-end communication systems based on deep learning will eventually surpass the performance of traditional communication systems. In addition, physical layer applications based on deep learning need to be data-driven. To improve the training efficiency of deep learning models, modules requiring long training can be fused, and the tradeoff between good performance and training efficiency needs to be considered. The rise of deep learning applications is largely due to the variety of available data sets, but there are still few data sets available for wireless communication, and data security and privacy issues further limit access to real-world communications data. For deep-learning-based communication applications, open telecom data sets need to be published and shared. Finally, the complex and changeable communication environment of 5G, including MIMO, millimeter-wave communication, and NOMA technology, also brings great potential for the application of deep learning.

References

[1] V. Mnih et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529-533, 2015. www.nature.com/articles/na… .

[2] A. Mousavi and R. G. Baraniuk, "Learning to Invert: Signal Recovery via Deep Convolutional Networks," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP '17), New Orleans, LA, Mar. 2017, pp. 2272-2276.

[3] C. Luo, J. Ji, Q. Wang, X. Chen, and P. Li, "Channel State Information Prediction for 5G Wireless Communications: A Deep Learning Approach," IEEE Transactions on Network Science and Engineering, early access.

[4] E. Nachmani, Y. Be'ery, and D. Burshtein, "Learning to Decode Linear Codes Using Deep Learning," in Proc. Allerton Conf. Communication, Control, and Computing (Allerton), 2016, pp. 341-346.

[5] T. O'Shea and J. Hoydis, "An Introduction to Deep Learning for the Physical Layer," IEEE Transactions on Cognitive Communications and Networking, vol. 3, no. 4, pp. 563-575, Dec. 2017.

[6] Y. He, C. Liang, F. R. Yu, N. Zhao, and H. Yin, "Optimization of Cache-Enabled Opportunistic Interference Alignment Wireless Networks: A Big Data Deep Reinforcement Learning Approach," in Proc. IEEE Int. Conf. Commun. (ICC), May 2017, pp. 1-6.

[7] G. Han, L. Xiao, and H. V. Poor, "Two-Dimensional Anti-Jamming Communication Based on Deep Reinforcement Learning," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, Mar. 2017, pp. 2087-2091.

[8] H. Ye, G. Y. Li, and B.-H. F. Juang, "Power of Deep Learning for Channel Estimation and Signal Detection in OFDM Systems," IEEE Wireless Communications Letters, vol. 7, no. 1, pp. 114-117, Feb. 2018.

[9] L. Bai, J. Choi, and Q. Yu, "Signal Processing at Receivers," in Low Complexity MIMO Receivers, Springer, Cham, 2014, pp. 5-28.

This article is shared from the Huawei Cloud community post "Research on Deep Learning Applications in Physical Layer Signal Processing"; original author: quite suddenly.
