Original link: tecdat.cn/?p=11040

Original source: Tuoduan Data Tribe WeChat official account

 

Here, I’ll discuss which functions can be used to work with the normal distribution in R: dnorm, pnorm, qnorm, and rnorm.

The distribution functions in R

Every distribution in R has four associated functions. For the normal distribution they are:

  • dnorm: density function of the normal distribution
  • pnorm: cumulative distribution function of the normal distribution
  • qnorm: quantile function of the normal distribution
  • rnorm: random sampling from the normal distribution
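
As a quick orientation, the four functions can be called side by side. This is a minimal sketch using the standard normal distribution (mean 0 and standard deviation 1, which are the defaults); the seed value and the variable names are illustrative assumptions, not part of the original article.

```r
# the four normal-distribution functions with the defaults mean = 0, sd = 1
d <- dnorm(0)      # density at x = 0, which equals 1 / sqrt(2 * pi)
p <- pnorm(0)      # lower-tail probability P[X <= 0]
q <- qnorm(0.5)    # the value x for which P[X <= x] = 0.5
set.seed(42)       # arbitrary seed, chosen only for reproducibility
r <- rnorm(5)      # five random draws

print(c(density = d, cdf = p, quantile = q))
print(r)
```

The prefix letter selects the operation and the suffix (here norm) selects the distribution, so the same pattern carries over to other distributions in R.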

Probability density function: dnorm

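The probability density function (PDF) describes how likely a measurement is to take a particular value, and the integral over the density is always 1. For a value x, the normal density is defined as

$$f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

where $\mu$ is the mean, $\sigma$ is the standard deviation, and $\sigma^2$ is the variance.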
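
The definition of the normal density can be checked against dnorm directly. The sketch below writes the density out by hand, with the normalizing constant 1 / sqrt(2 * pi * sigma^2) and the exponent -(x - mu)^2 / (2 * sigma^2), and compares it with the built-in function. The helper name normal.density and the test values are assumptions made for illustration; the integration range of mean ± 10 standard deviations is chosen because it contains essentially all of the probability mass.

```r
# normal density written out from its definition (hypothetical helper)
normal.density <- function(x, mu, sigma) {
  1 / sqrt(2 * pi * sigma^2) * exp(-(x - mu)^2 / (2 * sigma^2))
}

x <- c(70, 100, 140)
manual  <- normal.density(x, mu = 100, sigma = 15)
builtin <- dnorm(x, mean = 100, sd = 15)
print(all.equal(manual, builtin))

# numerically integrate the density over mean +/- 10 standard deviations
area <- integrate(dnorm, lower = -50, upper = 250, mean = 100, sd = 15)$value
print(area)
```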

 

Using the density, you can estimate the probability of an event. For example, you may wonder: what is the probability that a person has an IQ of exactly 140? Because IQ scores are reported as whole numbers, the density of the IQ distribution at the value 140 approximates this probability. The IQ distribution can be modeled with a mean of 100 and a standard deviation of 15. The corresponding density is:

sample.range <- 50:150
iq.mean <- 100
iq.sd <- 15
iq.dist <- dnorm(sample.range, mean = iq.mean, sd = iq.sd)
iq.df <- data.frame("IQ" = sample.range, "Density" = iq.dist)
library(ggplot2)
ggplot(iq.df, aes(x = IQ, y = Density)) + geom_point()

 

With this data, we can now answer the initial question as well as other questions:

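# helper (assumed here; its definition is not shown in the source):
# print a proportion as a percentage rounded to three decimals
pp <- function(x) print(paste0(round(x * 100, 3), "%"))
# likelihood of IQ == 140?
pp(iq.df$Density[iq.df$IQ == 140])
## [1] "0.076%"
# likelihood of IQ >= 140?
pp(sum(iq.df$Density[iq.df$IQ >= 140]))
## [1] "0.384%"
# likelihood of 50 <= IQ <= 90?
pp(sum(iq.df$Density[iq.df$IQ <= 90]))
## [1] "26.284%"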

Cumulative distribution function: pnorm

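The cumulative distribution function (CDF) is monotonically increasing because it is defined as the integral of the non-negative density:

$$F(x) = P[X \le x] = \int_{-\infty}^{x} f(u)\,du$$

where $f$ is the probability density function.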
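
The relationship between the density and the CDF can be verified numerically. The following sketch integrates dnorm up to an IQ of 90 and compares the result with pnorm; the lower integration bound of mean minus 10 standard deviations stands in for minus infinity and is an assumption made to keep the numerical integration well behaved.

```r
iq.mean <- 100
iq.sd <- 15

# CDF at x = 90, obtained by integrating the density from far below the mean
by.integration <- integrate(dnorm, lower = iq.mean - 10 * iq.sd, upper = 90,
                            mean = iq.mean, sd = iq.sd)$value
by.pnorm <- pnorm(90, mean = iq.mean, sd = iq.sd)
print(c(integrated = by.integration, pnorm = by.pnorm))

# the CDF never decreases as x grows
cdf <- pnorm(50:150, mean = iq.mean, sd = iq.sd)
print(all(diff(cdf) >= 0))
```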

 

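To get a feel for the CDF, let’s plot it for the IQ data:

# pnorm returns the lower-tail probability P[X <= x] by default
iq.df$CDF_LowerTail <- pnorm(sample.range, mean = iq.mean, sd = iq.sd)
ggplot(iq.df, aes(x = IQ, y = CDF_LowerTail)) + geom_point()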

 

As we can see, the CDF shows the probability that the IQ is less than or equal to a given value. This is because pnorm computes the lower tail by default, that is, P[X <= x]. Using this knowledge, we can answer some of the previous questions in a slightly different way:

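# likelihood of IQ <= 90? (the sampled range starts at 50)
pp(pnorm(90, mean = iq.mean, sd = iq.sd))
## [1] "25.249%"
# set lower.tail to FALSE to obtain P[X > x],
# which equals P[X >= x] for a continuous distribution
# likelihood of IQ >= 140? nearly the same value as before using dnorm!
pp(pnorm(140, mean = iq.mean, sd = iq.sd, lower.tail = FALSE))
## [1] "0.383%"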

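Note that the results from pnorm are close to those obtained by manually summing the densities from dnorm (0.383% versus 0.384% for IQ >= 140). They are not identical because summing densities at whole-number values only approximates the area under the curve. In addition, by setting lower.tail = FALSE, pnorm can be used to directly compute a p-value, that is, the probability of observing a value at least as extreme as the one obtained.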
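
The small gap between the two approaches can be made concrete. Each density value at a whole number behaves like a bar of width 1 centered on that number, so the sum over 140 to 150 approximates the area between 139.5 and 150.5 rather than the tail beyond exactly 140. The sketch below compares the three quantities; the half-unit bounds are an illustrative assumption about how to read the sum and do not appear in the original article.

```r
iq.mean <- 100
iq.sd <- 15

# sum of the densities at the whole numbers 140, 141, ..., 150
dens.sum <- sum(dnorm(140:150, mean = iq.mean, sd = iq.sd))

# area under the curve between 139.5 and 150.5
area <- pnorm(150.5, mean = iq.mean, sd = iq.sd) -
  pnorm(139.5, mean = iq.mean, sd = iq.sd)

# upper tail beyond 140, as computed with lower.tail = FALSE
upper.tail <- pnorm(140, mean = iq.mean, sd = iq.sd, lower.tail = FALSE)

# all three values expressed in percent
print(round(100 * c(dens.sum = dens.sum, area = area, upper.tail = upper.tail), 3))
```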

Quantile function: qnorm

The quantile function is simply the inverse of the cumulative distribution function (iCDF). Thus, the quantile function maps from probabilities to values. Let’s look at the quantile function for P[X <= x]:

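# input to qnorm is a vector of probabilities
# (the grid step of 0.001 is an assumption; the source does not show it)
prob.range <- seq(0, 1, 0.001)
icdf.df <- data.frame("Probability" = prob.range, "IQ" = qnorm(prob.range, mean = iq.mean, sd = iq.sd))
ggplot(icdf.df, aes(x = Probability, y = IQ)) + geom_point()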

 

Using quantile functions, we can answer questions about quantiles:

# what is the 25th IQ percentile?
qnorm(0.25, mean = iq.mean, sd = iq.sd)
## [1] 89.88265
# what is the 75th IQ percentile?
qnorm(0.75, mean = iq.mean, sd = iq.sd)
## [1] 110.1173
# note: these are the same results as from the quantile function
quantile(icdf.df$IQ)
##        0%       25%       50%       75%      100%
##      -Inf  89.88265 100.00000 110.11735       Inf
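
Since the quantile function is the inverse of the CDF, applying pnorm and then qnorm should return the original values. The sketch below checks this round trip and also shows that qnorm accepts lower.tail = FALSE, in which case the input is read as an upper-tail probability. The test values and the 2% threshold are arbitrary choices made for illustration.

```r
iq.mean <- 100
iq.sd <- 15
x <- c(70, 90, 100, 125)

# pnorm maps values to probabilities; qnorm maps them back
p <- pnorm(x, mean = iq.mean, sd = iq.sd)
x.back <- qnorm(p, mean = iq.mean, sd = iq.sd)
print(all.equal(x, x.back))

# with lower.tail = FALSE, qnorm takes upper-tail probabilities instead:
# the IQ that only 2% of the modeled population exceed
top.2.percent <- qnorm(0.02, mean = iq.mean, sd = iq.sd, lower.tail = FALSE)
print(top.2.percent)
```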

Random sampling function: rnorm

rnorm is used when you want to draw random samples from a normal distribution. For example, we can use rnorm to simulate random samples from the IQ distribution.

 
# show one facet per random sample of a given size
ggplot() + geom_histogram(data = my.df, aes(x = IQ)) + facet_wrap(.~SampleSize, scales = "free_y")

 

 
ggplot(my.sample.df, aes(x = IQ)) + geom_histogram()

 

Note that we call set.seed to ensure that the random number generator always generates the same sequence of numbers for repeatability.
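
The faceted histogram above expects a data frame my.df with the columns SampleSize and IQ, but its construction is not shown. A minimal sketch of how such a data frame could be built follows; the seed value and the three sample sizes are hypothetical, since the source does not state which ones were used. The last lines illustrate why set.seed matters: resetting the seed reproduces exactly the same draws.

```r
iq.mean <- 100
iq.sd <- 15

# fix the seed so the same numbers are drawn on every run (value is arbitrary)
set.seed(1)

# hypothetical sample sizes; the source does not state which ones were used
n.samples <- c(100, 1000, 10000)
my.df <- do.call(rbind, lapply(n.samples, function(n) {
  data.frame(SampleSize = n, IQ = rnorm(n, mean = iq.mean, sd = iq.sd))
}))

# sample mean per sample size
print(aggregate(IQ ~ SampleSize, data = my.df, FUN = mean))

# resetting the seed reproduces exactly the same draws
set.seed(1)
first <- rnorm(5, mean = iq.mean, sd = iq.sd)
set.seed(1)
second <- rnorm(5, mean = iq.mean, sd = iq.sd)
print(identical(first, second))
```

Stacking the samples in one long data frame, with the sample size as a column, is what allows facet_wrap to draw one panel per sample size.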