Relative entropy

Last time we saw that the formulas for IV and PSI can both be written as one general expression:

$$\sum_i (p_i - q_i)\ln\frac{p_i}{q_i}$$

We also used the concept of entropy to explain briefly why the logarithmic term can be read as an amount of information. What we did not do was expand the general formula itself, so that is where we start today. In fact, splitting the difference $(p_i - q_i)$ and regrouping gives

$$\sum_i (p_i - q_i)\ln\frac{p_i}{q_i} = \sum_i p_i\ln\frac{p_i}{q_i} + \sum_i q_i\ln\frac{q_i}{p_i}.$$

Readers who are familiar with the entropy family will already see that each term on the right is a relative entropy (also known as KL divergence): PSI is simply the sum of the relative entropies $D_{KL}(P\|Q)$ and $D_{KL}(Q\|P)$ of the two distributions $P$ and $Q$.
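As a quick sanity check of this decomposition, here is a minimal sketch in Python; it only assumes numpy and reuses the example distributions that appear in the code later in this post:

```python
import numpy as np

# The same example distributions used in the code later in this post
p = np.array([0.65, 0.25, 0.07, 0.03])
q = np.array([0.60, 0.25, 0.10, 0.05])

def kl(a, b):
    """KL divergence computed directly from the formula: sum_i a_i * ln(a_i / b_i)."""
    return np.sum(a * np.log(a / b))

# General PSI/IV expression: sum_i (p_i - q_i) * ln(p_i / q_i)
psi = np.sum((p - q) * np.log(p / q))

print(psi)                                   # ~0.0249
print(kl(p, q) + kl(q, p))                   # same value
print(np.isclose(psi, kl(p, q) + kl(q, p)))  # True
```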

An asymmetric metric: KL divergence

KL divergence is also known as relative entropy or information divergence; it is mainly used to measure the difference between two probability distributions.

Assume $P$ and $Q$ are two probability distributions over the same random variable $X$. Then the KL divergence of $P$ with respect to $Q$ is

$$D_{KL}(P\|Q) = \sum_{x} P(x)\ln\frac{P(x)}{Q(x)}$$

KL divergence has the following properties:

  1. KL divergence is asymmetric. Although it is used to measure the similarity, or "distance", between two distributions, it is not itself a distance.
  2. KL divergence does not satisfy the triangle inequality.
  3. KL divergence is non-negative: because $-\ln$ is convex (equivalently, $\ln$ is concave), Jensen's inequality gives $D_{KL}(P\|Q) \ge 0$.

To be clear, KL divergence essentially measures the information lost when $Q$ is used to approximate $P$, not a distance between the two.

In traditional textbooks, a distance measure is defined by four conditions: non-negativity, symmetry, identity of indiscernibles, and the triangle inequality. That definition is largely a matter of convention, and KL divergence fails both symmetry and the triangle inequality.

We know that the biggest difference between KL divergence and a true distance measure is that it is asymmetric. Since it is asymmetric, let us work out what the difference between $D_{KL}(P\|Q)$ and $D_{KL}(Q\|P)$ actually is.

(1) First, the formula $D_{KL}(P\|Q)$ involves two distributions: the information to be transmitted comes from the distribution $P$, while the way the information is encoded and transmitted is determined by $Q$. The more likely an event is under $P$, the greater its impact on $D_{KL}(P\|Q)$. To make $D_{KL}(P\|Q)$ small, give priority to the events that are common under $P$ and make sure they are not particularly rare under $Q$, because once an event $x$ that is common under $P$ is rare under $Q$, its code has not been optimized and the cost of transmitting $x$ becomes very large. Using an encoding scheme built for $Q$ to transmit data that actually follows $P$ then makes the common events expensive, and the overall cost is high (a short numeric sketch follows after point (2)).

(2) Recall the entropy $H(P) = -\sum_x P(x)\ln P(x)$ and the cross entropy $H(P, Q) = -\sum_x P(x)\ln Q(x)$. Comparing them with the definition, we find that $D_{KL}(P\|Q) = H(P, Q) - H(P)$. The second term $H(P)$ depends only on the distribution $P$, so when $P$ is fixed, minimizing the KL divergence is the same as minimizing the cross entropy; seen the other way around, the KL divergence is the cross entropy of $P$ and $Q$ minus the entropy of $P$.

This is a little convoluted, but we can also see directly from the formula that $D_{KL}(P\|Q)$ is not equal to $D_{KL}(Q\|P)$.
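Both points can be illustrated with a minimal sketch; the distributions below are made up for illustration (an event that is common under $P$ is rare under $Q$) and are not from the original post:

```python
import numpy as np

# Event 0 is common under p but rare under q
p = np.array([0.70, 0.20, 0.10])
q = np.array([0.05, 0.50, 0.45])

def entropy(a):
    """Entropy H(A) = -sum_i a_i * ln(a_i)."""
    return -np.sum(a * np.log(a))

def cross_entropy(a, b):
    """Cross entropy H(A, B) = -sum_i a_i * ln(b_i)."""
    return -np.sum(a * np.log(b))

def kl(a, b):
    """KL divergence D_KL(A||B) = sum_i a_i * ln(a_i / b_i)."""
    return np.sum(a * np.log(a / b))

# (1) Encoding data from p with a code built for q is expensive,
#     and the two directions differ: KL divergence is asymmetric.
print(kl(p, q), kl(q, p))

# (2) D_KL(p||q) equals the cross entropy of p and q minus the entropy of p.
print(np.isclose(kl(p, q), cross_entropy(p, q) - entropy(p)))  # True
```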

A symmetric metric: JS divergence

To fix the asymmetry of KL divergence (an obsession of mathematicians, though a symmetric measure really is more convenient and better behaved in practice), a variant of KL was invented, the JS divergence:

$$JS(P\|Q) = \frac{1}{2} D_{KL}\!\left(P \,\Big\|\, \frac{P+Q}{2}\right) + \frac{1}{2} D_{KL}\!\left(Q \,\Big\|\, \frac{P+Q}{2}\right)$$

Its value lies between 0 and 1 when the logarithm is taken in base 2; with the natural logarithm (as in the scipy code below) the upper bound is $\ln 2 \approx 0.693$.

A symmetric metric: PSI

Similarly, we have already seen that

$$PSI = D_{KL}(P\|Q) + D_{KL}(Q\|P).$$

So PSI can also be regarded as a symmetric measure that resolves the asymmetry of KL divergence and is used to quantify the difference between two distributions.

KL divergence python implementation

From the formula you can see that the numbers of samples behind $P$ and $Q$ do not need to be equal; what matters is that the two distributions are defined over the same set of discrete elements. So whether you compute KL divergence or PSI, continuous variables must first be discretized, and the bins must be the same for both distributions (a binning sketch follows after the code below).

```python
import numpy as np
import scipy.stats

p = np.asarray([0.65, 0.25, 0.07, 0.03])
q = np.array([0.6, 0.25, 0.1, 0.05])

def KL_divergence(p, q):
    return scipy.stats.entropy(p, q)

print(KL_divergence(p, q))  # 0.011735745199107783
print(KL_divergence(q, p))  # 0.013183150978050884
```
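As noted above, continuous variables have to be binned with the same bin edges for both samples before the divergence is computed. Here is a minimal sketch of that step; the choice of np.histogram with shared edges, ten bins, and a small smoothing constant eps is mine, for illustration only:

```python
import numpy as np
import scipy.stats

np.random.seed(0)
# Two made-up continuous samples of different sizes
sample_a = np.random.normal(loc=0.0, scale=1.0, size=1000)
sample_b = np.random.normal(loc=0.2, scale=1.1, size=800)

# Shared bin edges over the pooled range, so both distributions
# are defined on the same discrete elements
edges = np.histogram_bin_edges(np.concatenate([sample_a, sample_b]), bins=10)

p, _ = np.histogram(sample_a, bins=edges)
q, _ = np.histogram(sample_b, bins=edges)

# Smooth to avoid empty bins (KL divergence blows up if q is zero where p is not),
# then normalize the counts into proportions
eps = 1e-6
p = (p + eps) / (p + eps).sum()
q = (q + eps) / (q + eps).sum()

print(scipy.stats.entropy(p, q))  # KL divergence of the binned samples
```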

JS divergence python implementation

```python
import numpy as np
import scipy.stats

p = np.asarray([0.65, 0.25, 0.07, 0.03])
q = np.array([0.6, 0.25, 0.1, 0.05])
q2 = np.array([0.1, 0.2, 0.3, 0.4])

def JS_divergence(p, q):
    M = (p + q) / 2
    return 0.5 * scipy.stats.entropy(p, M) + 0.5 * scipy.stats.entropy(q, M)

print(JS_divergence(p, q))   # 0.003093977084273652
print(JS_divergence(p, q2))  # 0.24719159952098618
print(JS_divergence(p, p))   # 0.0
```

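As a quick check of the 0-to-1 bound mentioned earlier: scipy.stats.entropy uses the natural logarithm by default, so JS_divergence as written above is bounded by $\ln 2 \approx 0.693$ (reached for completely disjoint distributions); computing it with base-2 logarithms rescales the bound to exactly 1. A minimal sketch (the disjoint example distributions are my own):

```python
import numpy as np
import scipy.stats

def JS_divergence(p, q, base=None):
    M = (p + q) / 2
    return 0.5 * scipy.stats.entropy(p, M, base=base) + 0.5 * scipy.stats.entropy(q, M, base=base)

# Two completely disjoint distributions: the worst case for JS divergence
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])

print(JS_divergence(a, b))          # ~0.6931, i.e. ln 2 (natural log)
print(JS_divergence(a, b, base=2))  # 1.0 (base-2 log)
```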

PSI python implementation

We will leave this for the next post (along with a bonus Hive implementation of the PSI calculation; stay tuned).

