Normal distribution

The normal distribution is a type of statistical distribution, also called the Gaussian distribution. When you plot normally distributed data against its frequency (e.g. height on the x-axis and frequency on the y-axis), you get a bell-shaped curve. Data that does not follow a normal distribution is loosely referred to as non-parametric data.

The normal distribution is one of the distributions that numerical data can follow. Categorical data cannot have a normal distribution.
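
For reference, the bell-shaped curve described above is the graph of the normal density. The symbols used here, the mean \mu and the standard deviation \sigma, are introduced only for this formula and do not appear elsewhere in the article:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```

The curve is symmetric about \mu, and \sigma controls how wide the bell is.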

Parametric vs non-parametric data

Statistical tests that are undertaken on parametric data (i.e. data which has a normal distribution) are called parametric tests. Undertaking parametric tests on data that is not parametric can lead to erroneous results. There are specific non-parametric statistical tests for non-parametric data. Therefore, prior to undertaking statistical analysis, it is important to assess whether the data has a normal distribution (i.e. parametric data) or not.

Normality tests

Graph

How do you assess whether the numerical data you have collected follows a normal distribution (i.e. is parametric data)? In statistics, it is always good practice to plot the data in a graph (e.g. a histogram). The shape of the graph gives a good indication of whether the data follows a normal distribution, i.e. whether it resembles a bell shape.
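
As a rough illustration of this visual check, the sketch below draws a text histogram using only the Python standard library. The data are simulated heights, and the function name `text_histogram`, the bin count and the bar width are all assumptions made for this example, not part of any particular statistics package.

```python
import random

def text_histogram(values, bins=10, width=40):
    """Return (bin_start, bin_end, count, bar) rows for a simple text histogram."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value would land one past the last bin, so clamp it.
        idx = min(int((v - lo) / step), bins - 1)
        counts[idx] += 1
    peak = max(counts)
    rows = []
    for i, c in enumerate(counts):
        bar = "#" * round(width * c / peak)  # scale the tallest bar to `width`
        rows.append((lo + i * step, lo + (i + 1) * step, c, bar))
    return rows

random.seed(42)  # fixed seed so the illustration is reproducible
# Hypothetical data: 500 simulated heights (cm) drawn from a normal distribution.
heights = [random.gauss(170, 10) for _ in range(500)]

rows = text_histogram(heights)
for start, end, count, bar in rows:
    print(f"{start:6.1f} to {end:6.1f} | {bar} {count}")
```

For normally distributed data the bars should be longest in the middle bins and taper off towards both ends. In practice a plotting library or a statistics package would normally be used for this step.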

Statistical tests

There are also statistical tests to assess whether the data is parametric. Examples of these tests are:

Anderson-Darling test
D’Agostino-Pearson omnibus normality test
Shapiro-Wilk normality test
Kolmogorov-Smirnov normality test with Dallal-Wilkinson-Lilliefor P-value

The D’Agostino-Pearson omnibus normality test is one of the commonly performed normality tests. If the p-value of the test is > 0.05, the test has found no significant departure from normality, so you could assume that the data follows a normal distribution and undertake parametric tests on it. However, if the p-value of the D’Agostino-Pearson omnibus normality test is < 0.05, then the data is treated as non-parametric and you need to use a non-parametric test on it.
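
The decision rule above can be sketched in Python, assuming NumPy and SciPy are available. `scipy.stats.normaltest` implements the D’Agostino-Pearson omnibus test; the helper name `check_normality` and the two simulated samples are hypothetical and serve only as an illustration.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05  # significance threshold used in the text above

def check_normality(values, alpha=ALPHA):
    """Run the D'Agostino-Pearson omnibus test and apply the p > alpha rule."""
    statistic, p_value = stats.normaltest(values)
    looks_normal = bool(p_value > alpha)
    return statistic, p_value, looks_normal

rng = np.random.default_rng(0)  # fixed seed for reproducibility
# Hypothetical samples: one drawn from a normal distribution, one clearly skewed.
normal_sample = rng.normal(loc=170, scale=10, size=200)
skewed_sample = rng.exponential(scale=1.0, size=200)

for name, sample in [("normal", normal_sample), ("skewed", skewed_sample)]:
    stat, p, ok = check_normality(sample)
    verdict = "no evidence against normality" if ok else "departs from normality"
    print(f"{name}: K2 = {stat:.2f}, p = {p:.4g} -> {verdict}")
```

A p-value above the threshold does not prove normality; it only means that the test found no significant departure from it. The Shapiro-Wilk test from the list above is available in the same library as `scipy.stats.shapiro`.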

What if the data is non-parametric?

If the data does not have a normal distribution, then you would need to use non-parametric tests to analyse the data. However, if you are undertaking a Student’s t-test (which is a parametric test), the results are robust provided that the sample size is large, regardless of whether the data follows a normal distribution.
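
The robustness of the t-test with large samples can be checked with a small simulation, sketched here under stated assumptions: NumPy and SciPy are available, "large" is taken to mean 100 values per group, and the skewed population is exponential. Both groups are drawn from the same population, so any significant result is a false positive, and a robust test should produce these at close to the nominal 5% rate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # fixed seed for reproducibility
n_simulations = 2000
n_per_group = 100  # assumption: what counts as a "large" sample here

# Both groups come from the SAME skewed (exponential) population, so every
# "significant" difference found below is a false positive.
group_a = rng.exponential(scale=1.0, size=(n_simulations, n_per_group))
group_b = rng.exponential(scale=1.0, size=(n_simulations, n_per_group))

# One two-sample t-test per simulated experiment (axis=1 tests each row).
result = stats.ttest_ind(group_a, group_b, axis=1)
false_positive_rate = float(np.mean(result.pvalue < 0.05))
print(f"False positive rate: {false_positive_rate:.3f} (nominal 0.05)")
```

Repeating the simulation with a much smaller `n_per_group` or with unequal group sizes is a simple way to explore where this robustness starts to weaken.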

Normalisation

As mentioned above, if the data is non-parametric you would need to use non-parametric tests. However, in certain circumstances there are also techniques to convert non-parametric data into parametric data. This might involve normalisation calculations or converting the data into logarithmic values. We will review normalisation in another article.
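
A minimal sketch of the logarithmic conversion, using only the Python standard library. The data are simulated from a lognormal distribution, and skewness is used here as a simple stand-in for a full normality test; the helper `sample_skewness` is a hypothetical name introduced for this example.

```python
import math
import random
import statistics

def sample_skewness(values):
    """Moment-based skewness: near 0 for symmetric data, > 0 for a long right tail."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

random.seed(7)  # fixed seed for reproducibility
# Hypothetical right-skewed measurements drawn from a lognormal distribution.
raw = [random.lognormvariate(0.0, 1.0) for _ in range(500)]
# Natural logarithm of each value; only valid for strictly positive data.
logged = [math.log(v) for v in raw]

print(f"skewness of raw values: {sample_skewness(raw):.2f}")
print(f"skewness of log values: {sample_skewness(logged):.2f}")
```

The raw values have a long right tail, while their logarithms are roughly symmetric. After such a conversion, the logged values should still be plotted or put through a normality test before parametric tests are used on them.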

1 thought on “Normal distribution”

Naren says:

April 1, 2020 at 10:41 am

Dear Professor,
I hope you are doing well.

Firstly, I would like to thank you for your statistics videos and
your efforts to make statistics easy for scientists. I have seen your
video about GraphPad Prism.

I have some questions which are:

1) If we want to ensure our data follows a normal distribution, which type should we pursue: a normal or
log-normal distribution?

For example, we found that lognormal is more likely than the normal
distribution, and using lognormal distribution we found that our data
passed the lognormality test. In this case, do we consider our data
normal and use a parametric test to do statistics and verify
significance between groups? Am I right or wrong?

2) When we want to verify normality and for example, we did 3
experiments and in each experiment, we have 4 values. In this case,
for checking normality, should we enter in GraphPad 3
values (Mean of each experiment, we did 3 experiments, so 3 means) or
we should enter 12 values (because we have 4 values in each
experiment)?

Thank you in advance for your help.

————————————————————————————

Thank you for your questions.

1. The best way to assess whether data follows a normal distribution is to plot the data and see whether the curve approximates a bell shape. In statistics it is always good practice to plot the collected data in a graph before proceeding to undertake statistical tests.

2. Normality tests assess whether the sample data that you have comes from a population that has a normal distribution.

3. In biology, almost no data set has a true bell-shaped curve. A true normal distribution has infinitely long tails on both sides. However, experiments have shown that statistical tests are robust for approximately bell-shaped data, i.e. in practice, tests done on near-normal distributions give meaningful results.

4. A number of parametric tests, e.g. the Student’s t-test, are valid even with data that is not normally distributed; however, for the t-test to be valid with non-normally distributed data, the data set must contain a large number of data points.

5. If you have a lognormal distribution, then you could perform standard parametric tests on the data after converting the values to logarithms (https://www.graphpad.com/guides/prism/7/statistics/stat_the_lognormal_distribution.htm)

6. However, be aware there is still controversy about undertaking standard tests on data with a lognormal distribution (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4120293/)

7. Four data points for each test seems small for normality tests (you would need to consult a professional statistician for this).

8. Repeated measures. As I do not know the exact experiment or the data that you have, I cannot advise on how to treat the repeated measures from your study.

Hope you found the advice useful.

Good books on statistics for non-statisticians are those from Harvey Motulsky (the developer of GraphPad Prism).

– ‘Essential biostatistics. A nonmathematical approach’ [shorter version]

– ‘Intuitive Biostatistics. A nonmathematical guide to statistical thinking’. 4th edition [Full version]

Both from Oxford University Press.

Hope you found the advice of some help. Very best wishes with your studies.

Naren

Ganesalingam Narenthiran
g_narenthiran@hotmail.com