Do’s and don’ts of statistics in research

Writing and Reviewing Research Papers

Department of Mathematical Sciences, Aalborg University

Statistics in research

  • Statistics is a branch of mathematics that deals with the collection, analysis, interpretation, and presentation of data.

  • Data is sampled from a population and used to make inferences about the population.

  • It is a fundamental tool in research.

  • Statistics is used to summarize data.

  • It is used to make inferences about populations.

  • It is used to make informed decisions.

  • It is used to test hypotheses.

  • It is conventionally divided into descriptive and inferential statistics.

(Descriptive) Statistics

  • Descriptive statistics is used to summarize data.

  • It is used to describe the main features of a dataset.

  • It is used to present data in a meaningful way.

  • It is used to identify patterns in data.

Measures of central tendency

  • Mean: Average value of a dataset.

  • Median: Middle value of the sorted dataset.

  • Mode: Most frequent value in a dataset.

  • It is important to choose the right measure of central tendency.
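Why the choice of measure matters can be seen in a minimal sketch. The `incomes` list below is a made-up, right-skewed sample (an assumption for illustration, not data from these slides), and the functions come from Python's standard `statistics` module.

```python
import statistics

# Hypothetical right-skewed sample: eight modest values and one very large one.
incomes = [28, 30, 30, 32, 35, 36, 38, 40, 400]

mean_income = statistics.mean(incomes)      # pulled upward by the single outlier
median_income = statistics.median(incomes)  # middle value of the sorted data
mode_income = statistics.mode(incomes)      # most frequent value

print(f"mean   = {mean_income:.2f}")
print(f"median = {median_income}")
print(f"mode   = {mode_income}")
```

In this sample the mean lands far above most of the observations, while the median and mode stay close to the bulk of the data.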

Measures of dispersion

  • Range: Difference between the maximum and minimum values.

  • Variance: Average of the squared differences from the mean.

  • Standard deviation: Square root of the variance.

  • Interquartile range: Difference between the 75th and 25th percentiles.
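The four measures above can be computed with Python's standard `statistics` module, as in the sketch below. The `data` list is a hypothetical sample. Note one assumption made explicit in the code: "average of the squared differences" corresponds to the population variance (dividing by n), whereas many tools default to the sample variance (dividing by n - 1).

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

data_range = max(data) - min(data)

# Population variance: average of the squared differences from the mean.
pop_variance = statistics.pvariance(data)
pop_std = statistics.pstdev(data)            # square root of the variance

# Sample variance divides by n - 1 instead of n.
sample_variance = statistics.variance(data)

# Quartiles; the exact values depend on the interpolation method used.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(f"range = {data_range}, variance = {pop_variance}, std = {pop_std}")
print(f"sample variance = {sample_variance:.3f}, IQR = {iqr}")
```

The standard deviation is in the same units as the data, while the variance is in squared units.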

Measures of dispersion

Measures of dispersion for X and Y:

  Variable   Variance    Std        Range
  X          47.9362     6.9236     28.13
  Y          0.048988    0.221332   1.01

Data visualization

  • Scatter plot: Relationship between two variables.

  • Histogram: Distribution of a variable.

  • Box plot: Distribution of a variable, summarized by its quartiles.

  • Density plot: Smoothed distribution of a variable.
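To show what a histogram actually counts, here is a minimal text-only sketch that needs no plotting library. The `values` list and the `bin_width` of 1.0 are assumptions chosen for illustration.

```python
from collections import Counter

values = [1.2, 1.9, 2.1, 2.4, 2.5, 2.8, 3.1, 3.3, 3.6, 4.7]  # hypothetical
bin_width = 1.0                                              # assumed bin width

# Map each value to the index of its bin, e.g. 2.4 falls in bin 2: [2, 3).
counts = Counter(int(v // bin_width) for v in values)

for bin_index in sorted(counts):
    left = bin_index * bin_width
    print(f"[{left:.0f}, {left + bin_width:.0f}) " + "#" * counts[bin_index])
```

Changing `bin_width` changes the apparent shape of the distribution, which is one reason to look at more than one plot of the same variable.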

Box plot

Density plot

(Inferential) Statistics

  • Inferential statistics is used to make inferences about populations.

  • It is used to test hypotheses.

  • It is used to make informed decisions.

  • It is used to estimate parameters.

Hypothesis testing

  • Null and alternative hypotheses.

  • Types of error (Type I and Type II).

  • P-value.

  • Confidence interval.

Null and Alternative hypothesis

  • Null hypothesis: No effect or no difference.

  • Alternative hypothesis: Effect or difference.

  • Example: Null hypothesis: The vaccine has no effect. Alternative hypothesis: The vaccine has an effect.

Types of error

  • Type I error: Rejecting the null hypothesis when it is true.

  • Type II error: Failing to reject the null hypothesis when it is false.

  • Example: Type I error: Jail an innocent person. Type II error: Free a guilty person.
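A small simulation can make the Type I error concrete. The sketch below assumes a two-sided z-test with known standard deviation, a threshold of 0.05, and simulated data for which the null hypothesis is true by construction. The sample size, number of trials, and seed are arbitrary choices.

```python
import random
from statistics import NormalDist

random.seed(0)            # fixed seed so the run is repeatable
alpha = 0.05              # significance threshold (assumed)
n, trials = 30, 2000      # sample size and number of simulated studies (assumed)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05

false_positives = 0
for _ in range(trials):
    # The null hypothesis is true here: mean 0, standard deviation 1.
    sample = [random.gauss(0, 1) for _ in range(n)]
    z = (sum(sample) / n) * n ** 0.5   # z statistic with known sigma = 1
    if abs(z) > z_crit:
        false_positives += 1           # rejected a true null: Type I error

type_i_rate = false_positives / trials
print(f"Estimated Type I error rate: {type_i_rate:.3f}")
```

Under these assumptions the estimated rate should be close to `alpha`: even when there is no effect, roughly one study in twenty rejects the null hypothesis.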

P-value

  • The probability of observing data at least as extreme as the data actually observed, given that the null hypothesis is true.

  • It is used to test hypotheses.

  • For historical reasons, it is compared to a threshold (the significance level), usually 0.05.
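As an illustration of how a p-value is obtained, the sketch below computes a two-sided p-value from a z statistic using the standard normal distribution. The function name `two_sided_p_value` is hypothetical, and the z-test setting is an assumption; other tests use other reference distributions.

```python
from statistics import NormalDist

def two_sided_p_value(z: float) -> float:
    """Probability, under the null, of a z statistic at least as extreme as z."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

p = two_sided_p_value(1.96)
print(f"p = {p:.4f}")
```

A z statistic of about 1.96 sits right at the conventional 0.05 threshold, which is why that value appears so often in textbooks.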

Confidence interval

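  • A range of values, computed from the sample, constructed so that a stated proportion of such intervals (the confidence level) would contain the true value of a parameter under repeated sampling.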

  • It is used to estimate parameters.

  • For historical reasons, the confidence level is usually set at 95%.
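A confidence interval for a mean can be sketched as follows. This uses the normal approximation as a simplifying assumption (for small samples a t-distribution would usually be preferred), and both the `measurements` list and the function name `mean_confidence_interval` are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    centre = mean(sample)
    std_error = stdev(sample) / n ** 0.5       # sample std (n - 1) over sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    return centre - z * std_error, centre + z * std_error

measurements = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0, 5.1, 4.9]  # hypothetical
low, high = mean_confidence_interval(measurements)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

Raising `level` widens the interval: more confidence costs precision.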

Do’s and don’ts of statistics in research

  • Do use the right measure of central tendency.

  • Don’t rely on the mean when the data are skewed or contain outliers; the median is more robust.

  • Do use the right measure of dispersion.

  • Don’t rely on the variance when you have outliers; the interquartile range is more robust.

  • Do use standard deviation to preserve the units of the data.

  • Don’t say “we proved the hypothesis.”

  • Do say “the data support the hypothesis.”

  • Do report confidence intervals.

  • Don’t confuse improbability with impossibility.

Biases in statistics

  • Selection bias: When the sample is not representative of the population.

  • Confirmation bias: When we look for evidence that confirms our beliefs.

  • Publication bias: When only significant results are published.

  • Extrapolation bias: When we extrapolate beyond the data.

  • Causation bias: When we confuse correlation with causation.




