Writing and Reviewing Research Papers
Department of Mathematical Sciences, Aalborg University
Statistics is a branch of mathematics that deals with the collection, analysis, interpretation, and presentation of data.
Data is sampled from a population and used to make inferences about the population.
It is a fundamental tool in research.
Statistics is used to summarize data.
It is used to make inferences about populations.
It is used to make informed decisions.
It is used to test hypotheses.
Descriptive statistics is used to summarize data.
It is used to describe the main features of a dataset.
It is used to present data in a meaningful way.
It is used to identify patterns in data.
Mean: Average value of a dataset.
Median: Middle value of a dataset.
Mode: Most frequent value in a dataset.
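The three measures can be compared on a small sample. The following is a minimal sketch using Python's standard `statistics` module; the income figures are made up for illustration, and the last value is a deliberate outlier.

```python
import statistics

# Hypothetical incomes (in thousands); the last value is an outlier.
incomes = [21, 23, 23, 25, 27, 30, 250]

mean_income = statistics.mean(incomes)      # pulled upwards by the outlier
median_income = statistics.median(incomes)  # middle value of the sorted data
mode_income = statistics.mode(incomes)      # most frequent value

print(f"Mean: {mean_income:.1f}")
print(f"Median: {median_income}")
print(f"Mode: {mode_income}")
```

In this sample the single outlier moves the mean well above every other observation, while the median and mode stay close to the bulk of the data.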
Don’t use the mean to describe the middle value of the data; that is the median.
Do use the median when the data is skewed or has outliers.
Do label the axes in your plots.
Categorical data, for example education level: Elementary, Secondary, Higher Education.
Don’t use the mean when you have categorical data.
Do use the mode instead, or the median if the categories are ordered.
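As a sketch of how this might look in practice, the example below computes the mode and an ordinal median for education levels. The responses are invented, and treating the three levels as ordered from Elementary to Higher Education is an assumption of the example.

```python
import statistics

# Assumed ordering of the categories, from lowest to highest.
levels = ["Elementary", "Secondary", "Higher Education"]

# Hypothetical survey responses.
responses = [
    "Secondary", "Elementary", "Secondary", "Higher Education",
    "Secondary", "Higher Education", "Elementary",
]

# The mode is defined for any categorical variable.
most_common = statistics.mode(responses)

# The median needs ordered categories: rank each response, take the
# middle rank, and map it back to its label. median_low always returns
# an observed rank, so the result is a valid category.
ranks = [levels.index(r) for r in responses]
median_level = levels[statistics.median_low(ranks)]

print(f"Mode: {most_common}, Median: {median_level}")
```

Averaging the ranks would instead produce a number with no corresponding category, which is why the mean is avoided here.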
Range: Difference between the maximum and minimum values.
Interquartile range: Difference between the 75th and 25th percentiles.
Variance: Average of the squared differences from the mean.
Standard deviation: Square root of the variance.
Range: 2, Interquartile range: 1.0, Variance: 0.5289855072463768
Do use standard deviation to preserve the units of the data.
Don’t use the variance when you have outliers.
Do use the right measure of dispersion depending on the data.
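The dispersion measures can be computed with Python's standard `statistics` module. This is a minimal sketch on made-up data, not the dataset used in the lecture. Note that `pvariance` divides by n (the "average of squared differences" definition), while `variance` divides by n - 1, the usual choice for a sample.

```python
import statistics

# Hypothetical measurements, e.g. in centimetres.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: maximum minus minimum.
data_range = max(data) - min(data)

# Interquartile range: 75th percentile minus 25th percentile.
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Variance: in squared units (cm^2 here).
pop_var = statistics.pvariance(data)    # divides by n
sample_var = statistics.variance(data)  # divides by n - 1

# Standard deviation: back in the original units (cm here).
pop_std = statistics.pstdev(data)

print(f"Range: {data_range}, IQR: {iqr}")
print(f"Variance (population): {pop_var}, (sample): {sample_var:.3f}")
print(f"Standard deviation (population): {pop_std}")
```

Different software uses slightly different quantile conventions, so the interquartile range of a small sample may not match across tools.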
Scatter plot: Relationship between two variables.
Histogram: Distribution of a variable.
Box plot: Distribution of a variable, quartiles.
Density plot: Distribution of a variable, smoothed.
Do think about the units of the variables.
Do summarize the data to make it easier to understand.
Inferential statistics is used to make inferences about populations.
It is used to test hypotheses.
It is used to make informed decisions.
It is used to estimate parameters.
Null and alternative hypotheses
Types of error (Type I and Type II)
P-value
Confidence interval
Null hypothesis: No effect or no difference.
Alternative hypothesis: Effect or difference.
Example:
Null hypothesis: The vaccine has no effect.
Alternative hypothesis: The vaccine has an effect.
Do state the null and alternative hypotheses.
Do make sure that the null hypothesis is the status quo.
Do make sure that the null and alternative hypotheses are mutually exclusive.
Type I error: Rejecting the null hypothesis when it is true.
Type II error: Failing to reject the null hypothesis when it is false.
Example:
Type I error: Jail an innocent person.
Type II error: Free a guilty person.
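Both error rates can be estimated by simulation. The sketch below assumes normally distributed data with known standard deviation and uses a two-sided z-test; the sample size, effect size, and threshold are arbitrary choices for illustration, and the helper names are hypothetical.

```python
import random
from statistics import NormalDist

def z_test_p_value(sample, mu0, sigma):
    """Two-sided p-value for H0: mean == mu0, assuming known sigma."""
    sample_mean = sum(sample) / len(sample)
    z = (sample_mean - mu0) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def rejection_rate(true_mean, mu0=0.0, sigma=1.0, n=30,
                   alpha=0.05, trials=2000, seed=1):
    """Fraction of simulated samples in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        if z_test_p_value(sample, mu0, sigma) < alpha:
            rejections += 1
    return rejections / trials

# H0 is true: every rejection is a Type I error.
type_1_rate = rejection_rate(true_mean=0.0)

# H0 is false (true mean is 0.5): every non-rejection is a Type II error.
type_2_rate = 1 - rejection_rate(true_mean=0.5)

print(f"Estimated Type I error rate: {type_1_rate:.3f}")
print(f"Estimated Type II error rate: {type_2_rate:.3f}")
```

The estimated Type I error rate should land close to the chosen threshold of 0.05, whereas the Type II error rate depends on the sample size and on how far the true mean is from the null value.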
P-value: The probability of observing data at least as extreme as the data actually observed, given that the null hypothesis is true.
It is used to test hypotheses.
For historical reasons, it is compared to a threshold, usually 0.05 or 0.01.
Do report the p-value.
Do state the p-value threshold before the test.
Do use the p-value to make informed decisions.
Don’t use the p-value to make binary decisions.
Don’t change the model to get a p-value below the threshold.
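To make the p-value concrete, the sketch below computes an exact two-sided p-value for a coin-toss experiment: 60 heads in 100 tosses, with the null hypothesis that the coin is fair. The experiment and the function name are hypothetical. "At least as extreme" is taken here to mean "at least as far from the expected count", which is one common convention and is unambiguous when the null probability is 0.5.

```python
from math import comb

def binomial_two_sided_p(successes, n, p0=0.5):
    """Exact two-sided p-value for H0: success probability == p0.

    Sums the probability, under H0, of every outcome at least as far
    from the expected count as the observed one.
    """
    expected = n * p0
    observed_distance = abs(successes - expected)
    return sum(
        comb(n, k) * p0**k * (1 - p0) ** (n - k)
        for k in range(n + 1)
        if abs(k - expected) >= observed_distance
    )

p_value = binomial_two_sided_p(60, 100)
print(f"p-value: {p_value:.4f}")
```

The result sits just above 0.05, a case where reporting the p-value itself is more informative than a bare "significant" or "not significant".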
Confidence interval: A range of values that is likely to contain the true value of a parameter.
It is constructed from the data, so we cannot guarantee that any particular interval contains the true value.
For historical reasons, the confidence level is usually set at 95%.
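The meaning of the 95% level can be checked by simulation. The sketch below builds a normal-approximation interval for the mean and counts how often it contains a known true mean across repeated samples. The function name, the simulated data, and the use of a normal rather than a t critical value are assumptions of this example.

```python
import random
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for the mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half_width = z * stdev(sample) / len(sample) ** 0.5
    centre = mean(sample)
    return centre - half_width, centre + half_width

# Repeat the experiment many times with a known true mean.
rng = random.Random(42)
true_mean = 10.0
trials = 1000
covered = 0
for _ in range(trials):
    sample = [rng.gauss(true_mean, 2.0) for _ in range(50)]
    low, high = mean_confidence_interval(sample)
    if low <= true_mean <= high:
        covered += 1

coverage = covered / trials
print(f"Intervals containing the true mean: {coverage:.1%}")
```

Roughly 95% of the intervals contain the true mean, but any single interval either does or does not, and from the data alone we cannot tell which.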