Writing and Reviewing Research Papers
Department of Mathematical Sciences, Aalborg University
Statistics is a branch of mathematics that deals with the collection, analysis, interpretation, and presentation of data.
Data is sampled from a population and used to make inferences about the population.
It is a fundamental tool in research.
Statistics is used to summarize data.
It is used to make inferences about populations.
It is used to make informed decisions.
It is used to test hypotheses.
Descriptive statistics is used to summarize data.
It is used to describe the main features of a dataset.
It is used to present data in a meaningful way.
It is used to identify patterns in data.
Mean: Average value of a dataset.
Median: Middle value of a dataset.
Mode: Most frequent value in a dataset.
Don’t report the mean as if it were the median.
Do use the median when the data is skewed or has outliers.
Do label the axes in your plots.
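A minimal sketch of why the median is preferred for skewed data, using only Python's standard `statistics` module. The salary figures are hypothetical and chosen so that a single outlier dominates the mean.

```python
from statistics import mean, median, mode

# Hypothetical monthly salaries (in thousands); the last value is an outlier.
salaries = [30, 32, 32, 35, 38, 40, 250]

salary_mean = mean(salaries)      # pulled upward by the single outlier
salary_median = median(salaries)  # middle value of the sorted data
salary_mode = mode(salaries)      # most frequent value

print(f"Mean: {salary_mean:.1f}, Median: {salary_median}, Mode: {salary_mode}")
```

Here six of the seven values lie below the mean, so the mean describes no typical observation, while the median stays in the middle of the data.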
Categorical data: Elementary, Secondary, Higher Education
Don’t use the mean when you have categorical data.
Do use the mode instead, or the median if the categories are ordered.
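As an illustration, the mode and median of the education levels above could be computed as follows. The responses are hypothetical, and the median is only meaningful here because the three categories have a natural order.

```python
from collections import Counter

# Ordered categories from the example above; the responses are hypothetical.
levels = ["Elementary", "Secondary", "Higher Education"]
responses = ["Secondary", "Elementary", "Secondary", "Higher Education",
             "Secondary", "Higher Education", "Elementary"]

# Mode: the most frequent category. It needs no ordering at all.
education_mode = Counter(responses).most_common(1)[0][0]

# Median: sort by position in the ordering and take the middle response.
# With an odd number of responses the middle element is unambiguous.
ranked = sorted(responses, key=levels.index)
education_median = ranked[len(ranked) // 2]

print(f"Mode: {education_mode}, Median: {education_median}")
```

A mean would require assigning numbers to the categories, and the result would depend on that arbitrary coding.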
Range: Difference between the maximum and minimum values.
Interquartile range: Difference between the 75th and 25th percentiles.
Variance: Average of the squared differences from the mean.
Standard deviation: Square root of the variance.
"Range: 2, Interquartile range: 1.0, Variance: 0.5289855072463768"
Do use standard deviation to preserve the units of the data.
Don’t use the variance when you have outliers.
Do use the right measure of dispersion depending on the data.
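The four measures of dispersion can be sketched with Python's standard library. The data are hypothetical. The quartile method is an assumption: different software uses different interpolation rules, so the interquartile range can vary slightly between tools. The variance is shown both divided by n (population) and by n - 1 (sample), since reports should say which one was used.

```python
from statistics import pvariance, quantiles, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements

data_range = max(data) - min(data)

# Quartiles by linear interpolation between data points ("inclusive" method).
q1, _, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

pop_var = pvariance(data)    # average squared deviation, divides by n
sample_var = variance(data)  # sample variance, divides by n - 1
sample_sd = stdev(data)      # square root of the sample variance

print(f"Range: {data_range}, Interquartile range: {iqr}")
print(f"Population variance: {pop_var}, Sample variance: {sample_var:.3f}")
print(f"Sample standard deviation: {sample_sd:.3f}")
```

The standard deviation is in the same units as the data, whereas the variance is in squared units.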
Scatter plot: Relationship between two variables.
Histogram: Distribution of a variable.
Box plot: Distribution of a variable, quartiles.
Density plot: Distribution of a variable, smoothed.
Do think about the units of the variables.
Do summarize the data to make it easier to understand.
Inferential statistics is used to make inferences about populations.
It is used to test hypotheses.
It is used to make informed decisions.
It is used to estimate parameters.
Null and alternative hypotheses
Types of error (Type I and Type II)
P-value
Confidence interval
Null hypothesis: No effect or no difference.
Alternative hypothesis: Effect or difference.
Example:
Null hypothesis: The vaccine has no effect.
Alternative hypothesis: The vaccine has an effect.
Do state the null and alternative hypotheses.
Do make sure that the null hypothesis is the status quo.
Do make sure that the null and alternative hypotheses are mutually exclusive.
Type I error: Rejecting the null hypothesis when it is true.
Type II error: Failing to reject the null hypothesis when it is false.
Example:
Type I error: Jail an innocent person.
Type II error: Free a guilty person.
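The two error rates can be estimated by simulation. The sketch below assumes a two-sided z-test of "mean = 0" with a known standard deviation. The sample size, effect size, number of trials, and random seed are all illustrative assumptions.

```python
import random
from statistics import NormalDist, mean

ALPHA = 0.05   # significance threshold (assumed)
N = 25         # sample size per simulated study (assumed)
SIGMA = 1.0    # known standard deviation (assumed)
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)

def rejects_null(true_mean, rng):
    """Run a two-sided z-test of H0: mean = 0 on one simulated sample."""
    sample = [rng.gauss(true_mean, SIGMA) for _ in range(N)]
    z = mean(sample) / (SIGMA / N ** 0.5)
    return abs(z) > Z_CRIT

rng = random.Random(0)
trials = 2000

# Type I error: the null is true (mean 0) but the test rejects it.
type1_rate = sum(rejects_null(0.0, rng) for _ in range(trials)) / trials

# Type II error: the null is false (mean 0.3) but the test fails to reject it.
type2_rate = sum(not rejects_null(0.3, rng) for _ in range(trials)) / trials

print(f"Type I rate: {type1_rate:.3f}, Type II rate: {type2_rate:.3f}")
```

The Type I rate should land close to the chosen threshold, while the Type II rate depends on the effect size and the sample size.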
P-value: The probability of observing data at least as extreme as the data actually observed, given that the null hypothesis is true.
It is used to test hypotheses.
(For historical reasons) It is compared to a threshold, usually 0.05 or 0.01.
Do report the p-value.
Do state the p-value threshold before the test.
Do use the p-value to make informed decisions.
Don’t use the p-value to make binary decisions.
Don’t change the model to get a p-value below the threshold.
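One way to make the definition of the p-value concrete is an exact permutation test, sketched below. This is one illustrative technique, not the only way to obtain a p-value. The two groups of measurements are hypothetical, and the test statistic (the absolute difference in group means) is an assumption.

```python
from itertools import combinations

treated = [12, 14, 15, 17]  # hypothetical measurements
control = [8, 9, 10, 13]    # hypothetical measurements

def mean_diff(a, b):
    """Difference between the means of two groups."""
    return sum(a) / len(a) - sum(b) / len(b)

observed = mean_diff(treated, control)
pooled = treated + control
indices = range(len(pooled))

# Under the null hypothesis the group labels are arbitrary, so every way of
# splitting the pooled data into two groups of four is equally likely.
extreme = 0
total = 0
for chosen in combinations(indices, len(treated)):
    group_a = [pooled[i] for i in chosen]
    group_b = [pooled[i] for i in indices if i not in chosen]
    total += 1
    if abs(mean_diff(group_a, group_b)) >= abs(observed):
        extreme += 1

p_value = extreme / total
print(f"{extreme} of {total} splits are at least as extreme: p = {p_value:.3f}")
```

With these numbers the p-value is 4/70, just above 0.05. Treating that as a plain "not significant" would discard information, which is why the p-value itself should be reported.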
Confidence interval: A range of values that is likely to contain the true value of a parameter.
It is constructed from the data, hence we cannot guarantee that it contains the true value.
(For historical reasons) The confidence level is usually set at 95%.
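The lack of a guarantee can be seen in a small simulation. The sketch below builds a normal-approximation interval for a mean and counts how often it covers a known true value. The population parameters, sample size, number of trials, and seed are assumptions. For small samples a t-based interval would be more appropriate than this approximation.

```python
import random
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for the population mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(sample) / len(sample) ** 0.5
    center = mean(sample)
    return center - half_width, center + half_width

rng = random.Random(1)
TRUE_MEAN = 10.0  # known only because the data are simulated
trials = 1000
covered = 0
for _ in range(trials):
    sample = [rng.gauss(TRUE_MEAN, 2.0) for _ in range(50)]
    low, high = mean_confidence_interval(sample)
    covered += low <= TRUE_MEAN <= high

coverage = covered / trials
print(f"Intervals containing the true mean: {coverage:.1%}")
```

Roughly 95% of the intervals contain the true mean. Any single interval either contains it or does not, and in real research we cannot tell which.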
Do use the right measure of central tendency.
Don’t use the mean when the data is skewed or has outliers.
Do use the right measure of dispersion.
Don’t use the variance when you have outliers.
Do use standard deviation to preserve the units of the data.
Do say the data supports the hypothesis, not that it proves it.
Don’t confuse improbability with impossibility.
Selection bias: When the sample is not representative of the population.
Confirmation bias: When we look for evidence that confirms our beliefs.
Publication bias: When only significant results are published.
Extrapolation bias: When we extrapolate beyond the data.
Causation bias: When we confuse correlation with causation.
More questions? eduardo@math.aau.dk
Dos and don’ts of statistics in research