This is the 28th day of my participation in the August Text Challenge.

Anyone who has worked on a statistical system will have come across the normal distribution, variance, and standard deviation. PHP has an extension developed specifically for these statistical functions, and the stats extension we'll look at today is it. Of course, I have never built such a system and have only a limited understanding of these concepts, so what follows is based on my own understanding and on material I have touched on before. Python is said to be stronger in this area; it is a general-purpose language, after all, and one that only gradually gained wider acceptance after its success in statistics.

The stats extension is easy to install through the usual extension installation process. It also needs no additional support from other system components, which is very convenient.

Random number between 0 and 1

Let’s start with a function that doesn’t have much to do with statistics.

var_dump(stats_rand_ranf()); // float(0.32371053099632)

The ordinary rand() and mt_rand() functions return integers, from 0 to getrandmax() and mt_getrandmax() respectively. stats_rand_ranf() returns a floating-point number between 0 and 1. Besides this function, there are other functions starting with stats_rand_ that draw random values from distributions such as the normal distribution. If you are familiar with statistics, refer to the documentation for those.
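For comparison, here is a minimal sketch of how a float between 0 and 1 could be produced without the extension, by scaling the integer from mt_rand(). The helper name random_unit_float() is made up for this example, and the result is not guaranteed to match the distribution or the endpoint behaviour of stats_rand_ranf().

```php
<?php
// Hypothetical helper (not part of the stats extension):
// scales the integer from mt_rand() into the range [0, 1].
function random_unit_float(): float
{
    return mt_rand() / mt_getrandmax();
}

var_dump(random_unit_float()); // a float between 0 and 1, different on each run
```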

Variance, standard deviation

The concepts of variance and standard deviation are relatively simple and widely used. My actual major was psychology, and psychological statistics covers the calculation of variance and standard deviation; it is also required content for the exam. Since they are simple, after calling each function we will also write out the calculation by hand to show the formulas for variance and standard deviation.

// Data: 1, 3, 9, 12
// Mean: (1 + 3 + 9 + 12) / 4 = 6.25

// Variance
var_dump(stats_variance([1, 3, 9, 12])); // float(19.6875)
// Variance formula: ((1-6.25)^2 + (3-6.25)^2 + (9-6.25)^2 + (12-6.25)^2) / 4
var_dump((pow(1 - 6.25, 2) + pow(3 - 6.25, 2) + pow(9 - 6.25, 2) + pow(12 - 6.25, 2)) / 4); // float(19.6875)

The mean is useful in many statistical calculations and is the basis of many algorithms, so we compute it first in the comments, mainly for the manual calculation. Variance and standard deviation are in turn the basis for many other calculations.

The stats_variance() function calculates the variance of a set of data. It takes an array and computes the variance of the values in it. The formula is simple: subtract the mean from each number, square the result, add the squares up, and divide by the number of values.

You can see that the calculated result is the same as if we called stats_variance() directly.
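The hard-coded arithmetic above can be generalized to any array. The following is a minimal sketch in plain PHP; population_variance() is a name made up for this example and is not part of the stats extension.

```php
<?php
// Hypothetical helper: population variance (sum of squared deviations divided by n).
function population_variance(array $values): float
{
    $n = count($values);
    if ($n === 0) {
        throw new InvalidArgumentException('At least one value is required.');
    }

    $mean = array_sum($values) / $n;

    $sumOfSquares = 0.0;
    foreach ($values as $value) {
        $sumOfSquares += ($value - $mean) ** 2;
    }

    return $sumOfSquares / $n;
}

var_dump(population_variance([1, 3, 9, 12])); // float(19.6875)
```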

// Standard deviation
var_dump(stats_standard_deviation([1, 3, 9, 12])); // float(4.4370598373247)
var_dump(stats_standard_deviation([1, 3, 9, 12], true)); // float(5.1234753829798)
// Standard deviation: sqrt(((1-6.25)^2 + (3-6.25)^2 + (9-6.25)^2 + (12-6.25)^2) / 4)
// Sample standard deviation: sqrt(((1-6.25)^2 + (3-6.25)^2 + (9-6.25)^2 + (12-6.25)^2) / (4-1))

var_dump(sqrt((pow(1 - 6.25, 2) + pow(3 - 6.25, 2) + pow(9 - 6.25, 2) + pow(12 - 6.25, 2)) / 4)); // float(4.4370598373247)
var_dump(sqrt((pow(1 - 6.25, 2) + pow(3 - 6.25, 2) + pow(9 - 6.25, 2) + pow(12 - 6.25, 2)) / 3)); // float(5.1234753829798)

The standard deviation is simply the square root of the variance. It comes in two forms, depending on whether the sum of squares is divided by the number of values or by that number minus one; these are called the standard deviation and the sample standard deviation. As you can see, passing true as the second parameter of stats_standard_deviation() switches from the first form to the second, which is much more convenient than doing it by hand.
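To make the difference between the two divisors concrete, here is a small sketch in plain PHP. The function name standard_deviation() and its $sample flag are assumptions for this example, loosely mirroring the second parameter used in the listing above; it is not the extension's own implementation.

```php
<?php
// Hypothetical helper: standard deviation with a switch between
// the population form (divide by n) and the sample form (divide by n - 1).
function standard_deviation(array $values, bool $sample = false): float
{
    $n = count($values);
    $divisor = $sample ? $n - 1 : $n;
    if ($divisor < 1) {
        throw new InvalidArgumentException('Not enough values for this form.');
    }

    $mean = array_sum($values) / $n;

    $sumOfSquares = 0.0;
    foreach ($values as $value) {
        $sumOfSquares += ($value - $mean) ** 2;
    }

    return sqrt($sumOfSquares / $divisor);
}

var_dump(standard_deviation([1, 3, 9, 12]));       // about 4.4371
var_dump(standard_deviation([1, 3, 9, 12], true)); // about 5.1235
```

The only difference between the two calls is the divisor, 4 versus 3, which is why the sample form always gives the larger value for the same data.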

Mean deviation, harmonic mean, factorial

The mean deviation generally refers to the arithmetic mean of the absolute values of the differences between each value in a sequence and the sequence's arithmetic mean. That is quite a mouthful; how are the statisticians holding up? Of course, in the stats extension a single function takes care of it.

// Mean deviation
var_dump(stats_absolute_deviation([1, 3, 9, 12])); // 4.25

// ((6.25-1) + (6.25-3) + (9-6.25) + (12-6.25)) / 4
// (5.25 + 3.25 + 2.75 + 5.75) / 4 = 4.25

The stats_absolute_deviation() function calculates the mean deviation: take the absolute value of each number minus the mean, add them up, and divide by the number of values. Isn't the formula much clearer than the definition above? Once again, it comes down to differences and the mean.
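The same calculation can be sketched in plain PHP as follows. mean_absolute_deviation() is a hypothetical name chosen for this example, not a function of the extension.

```php
<?php
// Hypothetical helper: arithmetic mean of the absolute deviations from the mean.
function mean_absolute_deviation(array $values): float
{
    $n = count($values);
    if ($n === 0) {
        throw new InvalidArgumentException('At least one value is required.');
    }

    $mean = array_sum($values) / $n;

    $sumOfDeviations = 0.0;
    foreach ($values as $value) {
        $sumOfDeviations += abs($value - $mean);
    }

    return $sumOfDeviations / $n;
}

var_dump(mean_absolute_deviation([1, 3, 9, 12])); // float(4.25)
```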

// Harmonic mean
var_dump(stats_harmonic_mean([1, 3, 9, 12])); // float(2.6181818181818)
// 4 / (1/1 + 1/3 + 1/9 + 1/12) = 2.6181818181818

stats_harmonic_mean() calculates the harmonic mean of a set of data. Can you see it from the formula in the comment above? The harmonic mean is the number of values divided by the sum of their reciprocals.
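Written out as a small function, the formula looks like this. It is only a sketch: harmonic_mean() is a name invented for this example, and rejecting zero values is a choice made here because a zero has no reciprocal, not documented behaviour of the extension.

```php
<?php
// Hypothetical helper: n divided by the sum of the reciprocals.
function harmonic_mean(array $values): float
{
    $n = count($values);
    if ($n === 0) {
        throw new InvalidArgumentException('At least one value is required.');
    }

    $reciprocalSum = 0.0;
    foreach ($values as $value) {
        if ($value == 0) {
            throw new InvalidArgumentException('Zero has no reciprocal.');
        }
        $reciprocalSum += 1 / $value;
    }

    return $n / $reciprocalSum;
}

var_dump(harmonic_mean([1, 3, 9, 12])); // about 2.6182 (exactly 144/55)
```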

And then finally, let’s do something a little bit easier, a function that directly calculates the factorial.

var_dump(stats_stat_factorial(6)); // float(720)
// 1 * 2 * 3 * 4 * 5 * 6 = 720

This function needs no further explanation.
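For completeness, an equivalent loop in plain PHP is sketched below. The name factorial_float() is made up for this example; it returns a float only to mirror the float(720) shown in the listing above.

```php
<?php
// Hypothetical helper: iterative factorial, returned as a float.
function factorial_float(int $n): float
{
    if ($n < 0) {
        throw new InvalidArgumentException('n must not be negative.');
    }

    $result = 1.0;
    for ($i = 2; $i <= $n; $i++) {
        $result *= $i;
    }

    return $result;
}

var_dump(factorial_float(6)); // float(720)
```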

Kurtosis, skewness, cumulative normal distribution function, probability density

These are concepts I have never actually worked with, so this is only a test that the functions can be called. There are many related functions; here we only touch a few related to the normal distribution, but there are also functions for the F distribution, t distribution, Cauchy distribution, chi-square distribution, and others. I admit I have only heard one or two of these names, and many I have never heard of at all.

// Kurtosis
var_dump(stats_kurtosis([1, 3, 9, 12])); // float(1.6960846560847)

// Skewness
var_dump(stats_skew([1, 3, 9, 12])); // float(0.091222998923078)


// Returns the cumulative distribution function of a normal distribution, its inverse function, or one of its parameters
var_dump(stats_cdf_normal(14, 5, 10, 1));
// Returns the probability density at the first argument
var_dump(stats_dens_normal(14, 5, 10));
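For readers who want to see what is behind these numbers, here is a sketch using the common textbook definitions: skewness and excess kurtosis as standardized central moments (dividing by n), and the normal probability density. All function names are made up for this example. Reading the arguments of stats_dens_normal(14, 5, 10) as x, mean, and standard deviation is an assumption on my part. With these definitions the skewness of 1, 3, 9, 12 matches the value in the listing; the excess kurtosis comes out negative, about -1.696, with the same magnitude as the value shown above, so the sign is worth checking against your installed version.

```php
<?php
// Hypothetical helper: k-th central moment, dividing by n (population form).
function central_moment(array $values, int $k): float
{
    $n = count($values);
    if ($n === 0) {
        throw new InvalidArgumentException('At least one value is required.');
    }

    $mean = array_sum($values) / $n;

    $sum = 0.0;
    foreach ($values as $value) {
        $sum += ($value - $mean) ** $k;
    }

    return $sum / $n;
}

// Skewness: third central moment divided by the variance to the power 1.5.
function population_skewness(array $values): float
{
    return central_moment($values, 3) / central_moment($values, 2) ** 1.5;
}

// Excess kurtosis: fourth central moment over the squared variance, minus 3.
function excess_kurtosis(array $values): float
{
    return central_moment($values, 4) / central_moment($values, 2) ** 2 - 3;
}

// Density of a normal distribution with the given mean and standard deviation at x.
function normal_density(float $x, float $mean, float $sd): float
{
    $z = ($x - $mean) / $sd;

    return exp(-0.5 * $z * $z) / ($sd * sqrt(2 * M_PI));
}

var_dump(population_skewness([1, 3, 9, 12])); // about 0.0912
var_dump(excess_kurtosis([1, 3, 9, 12]));     // about -1.6961
var_dump(normal_density(14, 5, 10));          // about 0.0266
```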

For the other distribution-related functions, please refer to the documentation. I won't force my way through them here; I would probably just drive the car into a ditch.

Conclusion

Before reading the official documentation, I really did not know that PHP already had such an extension. I had assumed that building a statistical system in PHP would be very troublesome, which is why people choose other languages, yet these functions already exist. Admittedly, there are not many examples of statistical systems built with PHP. If you need one, do more research first. Also, these calculations are really just combinations of formulas, so I believe there are many useful libraries on Composer that can be used without installing a separate extension into the system.

Test code:

Github.com/zhangyue050…

Reference Documents:

www.php.net/manual/zh/b…