Do’s and don’ts of statistics in research

Writing and Reviewing Research Papers

Department of Mathematical Sciences, Aalborg University

Statistics in research

Introduction

  • Statistics is a branch of mathematics that deals with the collection, analysis, interpretation, and presentation of data.

  • Data is sampled from a population and used to make inferences about the population.

  • It is a fundamental tool in research.

  • Statistics is used to summarize data.

  • It is used to make inferences about populations.

  • It is used to make informed decisions.

  • It is used to test hypotheses.

  • It is conventionally divided into descriptive and inferential statistics.

(Descriptive) Statistics

  • Descriptive statistics is used to summarize data.

  • It is used to describe the main features of a dataset.

  • It is used to present data in a meaningful way.

  • It is used to identify patterns in data.

Measures of central tendency

  • Mean: Average value of a dataset.

  • Median: Middle value of a dataset.

  • Mode: Most frequent value in a dataset.

  • It is important to choose the right measure of central tendency.
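
As a minimal sketch of why the choice matters, the three measures can be compared on a small sample with one extreme value. The data below is hypothetical and chosen only for illustration; the functions come from Python's standard `statistics` module.

```python
import statistics

# Hypothetical sample: nine typical values plus one extreme outlier.
values = [2, 3, 3, 4, 5, 5, 5, 6, 7, 100]

mean = statistics.mean(values)      # pulled upward by the outlier
median = statistics.median(values)  # middle of the sorted values
mode = statistics.mode(values)      # most frequent value

print(f"mean={mean}, median={median}, mode={mode}")
```

The single outlier moves the mean well above most of the observations, while the median and the mode stay with the bulk of the data.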

Measures of dispersion

  • Range: Difference between the maximum and minimum values.

  • Variance: Average of the squared differences from the mean (the sample variance divides by n - 1 instead of n).

  • Standard deviation: Square root of the variance.

  • Interquartile range: Difference between the 75th and 25th percentiles.
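
The four measures above can be sketched with Python's standard `statistics` module. The sample is hypothetical, and the quartiles use the module's default interpolation method, which is one of several common conventions.

```python
import statistics

# Hypothetical sample, chosen so the arithmetic is easy to follow.
values = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(values) - min(values)

# Population variance divides by n; sample variance divides by n - 1.
pop_variance = statistics.pvariance(values)
sample_variance = statistics.variance(values)

# The standard deviation has the same units as the data.
pop_std = statistics.pstdev(values)

# Quartiles: the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

print(value_range, pop_variance, sample_variance, pop_std, iqr)
```

Whether to divide by n or n - 1 depends on whether the data is the whole population or a sample from it; statistical software often defaults to the n - 1 version.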

Measures of dispersion for X and Y (computed output)

  Variable   Variance    Std        Range
  X          47.9362     6.9236     28.13
  Y          0.048988    0.221332   1.01

Data visualization

  • Scatter plot: Relationship between two variables.

  • Histogram: Distribution of a variable.

  • Box plot: Distribution of a variable, summarized by its median and quartiles.

  • Density plot: Distribution of a variable, smoothed.
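
To make the box plot concrete, the sketch below computes the quantities such a plot typically draws. The helper `box_plot_summary` and the sample are hypothetical, and the 1.5 × IQR fence for flagging outliers is a common convention (Tukey's rule), not the only one.

```python
import statistics

def box_plot_summary(values):
    """Return the quantities a typical box plot draws (hypothetical helper)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Points beyond 1.5 * IQR from the box are usually drawn individually.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "outliers": outliers,
    }

summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])
print(summary)
```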

(Inferential) Statistics

  • Inferential statistics is used to make inferences about populations.

  • It is used to test hypotheses.

  • It is used to make informed decisions.

  • It is used to estimate parameters.

Hypothesis testing

  • Null and Alternative hypothesis.

  • Types of error (Type I and Type II).

  • P-value.

  • Confidence interval.

Null and Alternative hypothesis

  • Null hypothesis: No effect or no difference.

  • Alternative hypothesis: Effect or difference.

  • Example: Null hypothesis: The vaccine has no effect. Alternative hypothesis: The vaccine has an effect.

Types of error

  • Type I error: Rejecting the null hypothesis when it is true.

  • Type II error: Failing to reject the null hypothesis when it is false.

  • Example (null hypothesis: the defendant is innocent): Type I error: Jail an innocent person. Type II error: Free a guilty person.
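
Both error rates can be estimated by simulation. The sketch below is illustrative only: it assumes normally distributed data with a known standard deviation of 1, uses a two-sided z-test of "the population mean is 0" at the 0.05 level, and the function names, sample size, and effect size are hypothetical choices.

```python
import random
from statistics import NormalDist, mean

def rejects_null(sample, sigma=1.0, alpha=0.05):
    """Two-sided z-test of H0: population mean = 0, with known sigma."""
    z = mean(sample) / (sigma / len(sample) ** 0.5)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha

def rejection_rate(true_mean, n=25, trials=2000, seed=0):
    """Fraction of simulated experiments in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = sum(
        rejects_null([rng.gauss(true_mean, 1.0) for _ in range(n)])
        for _ in range(trials)
    )
    return rejections / trials

# H0 is true: every rejection is a Type I error.
type_i_rate = rejection_rate(true_mean=0.0)

# H0 is false: every failure to reject is a Type II error.
type_ii_rate = 1 - rejection_rate(true_mean=0.5)

print(f"Type I rate: {type_i_rate:.3f}, Type II rate: {type_ii_rate:.3f}")
```

The Type I rate should land close to the chosen threshold, while the Type II rate depends on the sample size and on how large the true effect is.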

P-value

  • The probability of observing data at least as extreme as the data actually observed, assuming that the null hypothesis is true.

  • It is used to test hypotheses.

  • (For historical reasons) It is compared to a threshold, usually 0.05.
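
As a worked example, consider a hypothetical experiment: a coin is flipped 20 times and lands heads 15 times, and the null hypothesis is that the coin is fair. The sketch below computes an exact two-sided p-value by adding up the null probabilities of every outcome at least as far from the expected count as the observed one. The function name is an assumption made for illustration.

```python
from math import comb

def binomial_two_sided_p(successes, trials):
    """Exact two-sided p-value for H0: success probability = 0.5."""
    expected = trials / 2
    observed_distance = abs(successes - expected)
    # Count every outcome at least as far from the expected count
    # as the observed one; under H0 each sequence has probability 2**-trials.
    extreme = sum(
        comb(trials, k)
        for k in range(trials + 1)
        if abs(k - expected) >= observed_distance
    )
    return extreme / 2 ** trials

p_value = binomial_two_sided_p(successes=15, trials=20)
print(f"p-value: {p_value:.4f}")
```

The result is the probability of a result this extreme under a fair coin; it is not the probability that the coin is fair.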

Confidence interval

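  • A range of values, computed from the sample, constructed so that a stated proportion of such intervals (the confidence level) would contain the true value of the parameter under repeated sampling.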

  • It is used to estimate parameters.

  • (For historical reasons) The confidence level is usually set at 95%.
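
The meaning of the 95% can be checked by simulation. The sketch below is a simplified illustration: it assumes normally distributed data with a known standard deviation, so a z-interval applies, and the function names and parameter values are hypothetical.

```python
import random
from statistics import NormalDist, mean

def z_interval(sample, sigma, level=0.95):
    """Confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / len(sample) ** 0.5
    centre = mean(sample)
    return centre - half_width, centre + half_width

def coverage(true_mean=10.0, sigma=2.0, n=30, trials=2000, seed=1):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma)
        hits += low <= true_mean <= high
    return hits / trials

observed_coverage = coverage()
print(f"Fraction of intervals containing the true mean: {observed_coverage:.3f}")
```

Roughly 95% of the intervals contain the true mean. Any single interval either contains it or does not; the 95% describes the procedure, not one particular interval.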

Do’s and don’ts of statistics in research

  • Do use the right measure of central tendency.

  • Don’t use the mean when the data is skewed or has outliers; the median is more robust.

  • Do use the right measure of dispersion.

  • Don’t use the variance when you have outliers; the interquartile range is more robust.

  • Do use standard deviation to preserve the units of the data.

  • Don’t say we proved the hypothesis.

  • Do say the data supports the hypothesis.

  • Do report confidence intervals.

  • Don’t confuse improbability with impossibility.

Biases in statistics

  • Selection bias: When the sample is not representative of the population.

  • Confirmation bias: When we look for evidence that confirms our beliefs.

  • Publication bias: When only significant results are published.

  • Extrapolation bias: When we extrapolate beyond the data.

  • Causation bias: When we confuse correlation with causation.
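
Publication bias in particular can be illustrated with a small simulation. The sketch below is a hypothetical setup, not a model of any real literature: many small studies estimate the same modest true effect, and only those reaching significance at the 0.05 level are "published". The function name and all parameter values are assumptions.

```python
import random
from statistics import NormalDist, mean

def simulate_studies(true_effect=0.2, n=20, studies=2000, alpha=0.05, seed=2):
    """Compare the average estimate over all studies and over 'published' ones."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    all_estimates, published = [], []
    for _ in range(studies):
        # Each study estimates the effect as the mean of n noisy observations.
        estimate = mean(rng.gauss(true_effect, 1.0) for _ in range(n))
        all_estimates.append(estimate)
        # Two-sided z-test with known unit variance: publish only if significant.
        if abs(estimate) * n ** 0.5 > z_crit:
            published.append(estimate)
    return mean(all_estimates), mean(published)

all_mean, published_mean = simulate_studies()
print(f"All studies: {all_mean:.2f}, published only: {published_mean:.2f}")
```

Averaged over all studies, the estimate is close to the true effect, but the average over significant studies alone is noticeably inflated.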

Conclusion

References

Cooper, Kenneth H. 1968. “A Means of Assessing Maximal Oxygen Intake: Correlation Between Field and Treadmill Testing.” JAMA 203 (3): 201–4. https://jamanetwork.com/journals/jama/article-abstract/337382.