Machine Learning 054 - Blind Source Separation with ICA

(Python libraries and versions used in this article: Python 3.6, NumPy 1.14, scikit-learn 0.19, Matplotlib 2.2)

Blind source separation refers to the process of recovering source signals from mixed signals when neither the mixing model nor the source signals are accurately known. The aim of blind source separation is to obtain the best possible estimate of each source signal. To put it more colloquially: if 10 people are talking at the same time and I record them with a tape recorder, what I get is definitely a jumble of 10 voices, so how do I separate that jumble back into individual voices? Solving this problem is blind source separation (the classic "cocktail party problem").
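To make the mixing process concrete, here is a minimal toy sketch of the linear model x = As that blind source separation tries to invert. This is my own illustration, not the article's data set; the two sources and the mixing matrix are made up:

import numpy as np

time = np.linspace(0, 8, 2000)

s1 = np.sin(2 * time)               # source 1: a sinusoid
s2 = np.sign(np.sin(3 * time))      # source 2: a square wave
S = np.c_[s1, s2]                   # source matrix, one source per column

A = np.array([[1.0, 0.5],           # hypothetical, "unknown" mixing matrix
              [0.5, 1.0]])
X = S.dot(A.T)                      # observed mixtures: each row is x = A s

Given only X, blind source separation tries to recover S without knowing A.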

Independent Component Analysis (ICA) solves the problem of decomposing observed data into its original components and is often used for blind source separation. In my last article, Machine Learning 053 - Data Dimensionality Reduction Techniques - PCA and Kernel PCA, I mentioned that although PCA has various advantages, it also has several disadvantages. For example, it cannot reduce the dimensionality of nonlinearly organized data sets; the remedy for that is to replace PCA with kernel PCA. Another disadvantage is that it does not work well when the data set does not follow a Gaussian distribution. In that case, the decomposition needs to be done by independent component analysis (ICA).

Independent component analysis is a method for finding latent factors or components in multidimensional statistical data. ICA differs from PCA and other dimensionality reduction methods in that it looks for components that are statistically independent and non-Gaussian. The mathematics and reasoning behind it can be found in the blog post Independent Component Analysis ICA Series 2: Concepts, Applications and Estimation Principles.
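As a quick illustration of what "non-Gaussian" means in practice (my own addition, not from the original article): excess kurtosis is one common measure of non-Gaussianity. It is roughly 0 for Gaussian data, positive for super-Gaussian (peaky, heavy-tailed) signals, and negative for sub-Gaussian (flat) signals:

import numpy as np

def excess_kurtosis(x):
    # E[(x - mu)^4] / sigma^4 - 3; approximately 0 for Gaussian data
    x = x - x.mean()
    return np.mean(x ** 4) / np.mean(x ** 2) ** 2 - 3

rng = np.random.RandomState(0)
print(excess_kurtosis(rng.normal(size=100000)))      # ~0    (Gaussian)
print(excess_kurtosis(rng.laplace(size=100000)))     # ~3    (super-Gaussian)
print(excess_kurtosis(rng.uniform(-1, 1, 100000)))   # ~-1.2 (sub-Gaussian)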


1. Load the data set

First, load the data set. This time the data set is in the file mixture_of_signals.txt, which has four columns representing four different mixed signals, with 2000 samples in total.

import pandas as pd

data_path = r"E:\PyProjects\DataSet\FireAI\mixture_of_signals.txt"
df = pd.read_csv(data_path, header=None, sep=' ')
print(df.info())   # check the data to make sure there are no errors
print(df.head())
print(df.tail())
dataset_X = df.values
print(dataset_X.shape)   # (2000, 4)

After plotting, we can see what these mixed signals look like:
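The plotting code is not shown in the original article; a one-liner like the following (my guess at what was used) produces the combined figure:

import matplotlib.pyplot as plt

df.plot(title='Mixed signals')   # plot all four mixed channels on one axes
plt.show()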


2. Use traditional PCA to separate signals

Let's first see what happens if we use PCA for blind source separation. The code is:

# What is the result if PCA is used for separation?
from sklearn.decomposition import PCA

pca = PCA(n_components=4)
pca_dataset_X = pca.fit_transform(dataset_X)
pd.DataFrame(pca_dataset_X).plot(title='PCA_dataset')
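As a side note (my own addition): PCA orders its components by explained variance, which hints at why it mixes the sources; it looks for orthogonal directions of maximum variance, not statistical independence. You can inspect this with:

print(pca.explained_variance_ratio_)   # variance captured by each component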

Although the signals after PCA separation are all drawn in the figure above, they are plotted on top of each other and hard to distinguish, so I wrote a function to display each one separately:

import matplotlib.pyplot as plt

def plot_dataset_X(dataset_X):
    # plot each column of dataset_X in its own subplot
    rows, cols = dataset_X.shape
    plt.figure(figsize=(15, 20))
    for i in range(cols):
        plt.subplot(cols, 1, i + 1)
        plt.title('Signal_' + str(i))
        plt.plot(dataset_X[:, i])
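The separate-panel figure for the PCA result can then be produced with this helper (presumably how the original figure was generated):

plot_dataset_X(pca_dataset_X)
plt.show()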


3. Use ICA to separate signals

Now let's look at the separated signals obtained by independent component analysis:

# If ICA is used for signal separation
from sklearn.decomposition import FastICA
ica = FastICA(n_components=4)
ica_dataset_X = ica.fit_transform(dataset_X)
pd.DataFrame(ica_dataset_X).plot(title='ICA_dataset')

Again, for ease of viewing, each separated signal is drawn in its own panel:
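Reusing the helper function from above (presumably how the original figure was made):

plot_dataset_X(ica_dataset_X)
plt.show()

Note that ICA recovers the sources only up to ordering, sign, and scale, so the separated signals may appear flipped or rescaled relative to the true sources.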

It can be seen that the signals obtained after ICA separation are clean and regular, while the signals obtained after PCA separation are still somewhat chaotic; on the face of it, ICA's blind source separation is clearly better.

######################## Summary ########################

1. ICA can solve the blind source separation problem, and the separation it achieves is far better than PCA's.

2. In fact, most real-world data sets do not follow a Gaussian distribution; they generally follow super-Gaussian or sub-Gaussian distributions. Hence PCA is not ideal for many problems, while ICA can obtain better results.

#########################################################


Note: the code for this article has been uploaded to my GitHub; you are welcome to download it.

References:

1. Classic Examples of Python Machine Learning, by Prateek Joshi, translated by Tao Junjie and Chen Xiaoli