Transfer: https://blog.csdn.net/shijing_0214/article/details/51757564

What is a norm?

We know that the definition of distance is a broad concept, as long as the non-negative, reflexive, triangle inequality can be called distance. Norm is an enhanced concept of distance, which is defined by one more number times than distance. Sometimes, for the sake of understanding, we can think of the norm as the distance.

In mathematics, norm includes vector norm and matrix norm. Vector norm represents the size of vector in vector space, and matrix norm represents the size of change caused by matrix. One kind of non-rigorous interpretation is that the corresponding vector norm, the vectors in the vector space have a size, and how to measure the size is measured by the norm, and different norms can measure the size, just like meters and feet can measure the distance; And the matrix norm, if you take linear algebra, we know that we can change X to B by operating AX is equal to B, and that’s how the matrix norm measures that change.

1. Like minkowski distance, L-P norm is not a norm, but a group of norms, which are defined as follows:



Lp = ∑ 1 nxpi ‾ ‾ ‾ ‾ ‾) p, x = (x1, x2,…, xn)



According to the changes of P, the norm also has different changes. A classic variation diagram of P norm is shown as follows:






The figure above shows the change of the graph formed by points with a distance (norm) of 1 from the origin in three-dimensional space when P changes from infinity to 0. Take the common L-2 norm (p=2) as an example, where the norm is also known as the Euclidean distance. Points in space with Euclidean distance of 1 from the origin form a sphere.

In fact, when 0≤ P <1, Lp does not satisfy the properties of the triangle inequality, so it is not a strict norm. At p = 0.5, 2 d coordinates (1, 4), (4, 1), (1, 9), for example, (1 + 4) ‾) ‾ ‾ ‾ ‾ ‾ ‾ ‾ ‾) 0.5 + (4 + 1) ‾) ‾ ‾ ‾ ‾ ‾ ‾ ‾ ‾) < 0.5 (1) + 9 ‾) ‾ ‾ ‾ ‾ ‾ ‾ ‾ ‾) 0.5. So the L-P norm here is just a conceptual generalization.

2. L0 norm When P=0, that is, L0 norm, it can be seen from the above that L0 norm is not a real norm, and it is mainly used to measure the number of non-zero elements of a vector. The definition of L-0 can be obtained by using the above l-p definition:



| | x | | = ∑ 1 nx0i ‾ ‾ ‾ ‾ 0 ‾), x = (x1, x2,…, xn)



There’s a little bit of a problem here. We know that a non-zero element to the zero power is 1, but what is L0 to the zero power, or a non-zero number to the zero power? It’s not very clear what L0 means.




| | x | | 0 = # (I | xi indicates zero)



Represents the number of non-zero elements of vector x.

For L0 norm, the optimization problem is as follows:



min||x||0


s.t. Ax=b



In practical application, because L0 norm itself is not easy to have a good mathematical representation, it is difficult to give the formal representation of the above problem, so it is considered to be a NP hard problem. So in practice, the optimization problem of L0 will be relaxed to optimization under L1 or L2.

L1 norm L1 norm is a norm we often see, and its definition is as follows:



| | | x = ∑ I | | 1 xi |



Represents the sum of absolute values of non-zero elements of vector x.

L1 norm has many names, such as Manhattan distance, minimum absolute error and so on. L1 norm can be used to measure the Difference between two vectors, such as the Sum of Absolute Difference:



SAD (x1, x2) = ∑ I | x1i – x2i |


For L1 norm, its optimization problems are as follows:



min||x||1


s.t.Ax=b



Due to the natural nature of L1 norm, the solution to L1 optimization is a sparse solution, so L1 norm is also called sparse regular operator. L1 can be used to achieve sparse features and remove some features without information. For example, when classifying users’ movie hobbies, there are 100 features of users, but only a dozen of them may be useful for classification. Most of the features, such as height and weight, may be useless, so L1 norm can be used to filter them out.

The L2 norm is the most common and commonly used norm. The most metric distance we use is Euclidean distance, which is defined as follows:



| | x | | 2 = ∑ ix2i ‾ ‾ ‾ ‾ ‾ ‾)



Represents the sum of squares squared of vector elements.


Like L1 norm, L2 can also measure the Difference between two vectors, such as the Sum of Squared Difference:




SSD (x1, x2) = ∑ I (x1i – x2i) 2


For L2 norm, its optimization problems are as follows:



min||x||2


s.t.Ax=b



L2 norm is usually used as the regularization term of the optimization objective function to prevent the over-fitting of the model in order to meet the training set, so as to improve the generalization ability of the model.

5. L-∞ norm When P=∞, the L-∞ norm, it is primarily used to measure the maximum value of a vector element. L∞ can be obtained by using the above L-P definition as:



| | x | | up = ∑ 1 nx up I ‾ ‾ ‾ ‾ ‾ ‾ tick up, x = (x1, x2,…, xn)



As with L0, in the usual case, everyone uses:




| | x | | up = Max (| xi |)



L up