Do’s and don’ts of statistics in research

Writing and Reviewing Research Papers

Department of Mathematical Sciences, Aalborg University

Statistics in research

Introduction

Statistics is a branch of mathematics that deals with the collection, analysis, interpretation, and presentation of data.
Data is sampled from a population and used to make inferences about the population.
It is a fundamental tool in research.

Statistics is used to summarize data.
It is used to make inferences about populations.
It is used to make informed decisions.
It is used to test hypotheses.

It is conventionally divided into descriptive and inferential statistics.

(Descriptive) Statistics

Descriptive statistics is used to summarize data.
It is used to describe the main features of a dataset.
It is used to present data in a meaningful way.
It is used to identify patterns in data.

Measures of central tendency

Mean: Average value of a dataset.
Median: Middle value of a dataset.
Mode: Most frequent value in a dataset.

It is important to choose the right measure of central tendency depending on the data.

Measures of central tendency

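“Half the population has a below-average IQ” is a statement about the median, not the mean; the two coincide only when the distribution is symmetric.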
Don’t use the mean to report about the median.
Do use the median when the data is skewed or has outliers.
Do label the axes in your plots.
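
To see how far the mean and the median can drift apart, here is a minimal sketch using Python's standard library. The income figures are hypothetical and chosen only to include one extreme value.

```python
import statistics

# Hypothetical incomes (arbitrary units); the last value is an outlier.
incomes = [21, 23, 24, 25, 26, 27, 29, 250]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

# Count how many observations fall below the mean.
below_mean = sum(x < mean_income for x in incomes)

print(f"mean   = {mean_income}")
print(f"median = {median_income}")
print(f"{below_mean} of {len(incomes)} values lie below the mean")
```

In this made-up sample a single large value pulls the mean above every other observation, while the median stays close to the typical value.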

Measures of central tendency

Categorical data: Elementary, Secondary, Higher Education

Mean of categorical data is not meaningful.
Don’t use the mean when you have categorical data.
Do use the mode instead, or the median when the categories have a natural order (as education levels do).
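
A small sketch of summarizing ordered categories without averaging them. The responses are hypothetical, and the ordering of the levels is supplied by hand because it cannot be inferred from the labels.

```python
import statistics

# Ordered scale, lowest to highest (the order is an assumption we state explicitly).
levels = ["Elementary", "Secondary", "Higher Education"]

# Hypothetical survey responses.
responses = [
    "Secondary", "Elementary", "Secondary", "Higher Education",
    "Secondary", "Higher Education", "Elementary",
]

# The mode only needs counts, so it works for any categorical variable.
mode_level = statistics.mode(responses)

# The median needs an ordering: sort by position on the scale, not alphabetically.
ranked = sorted(responses, key=levels.index)
median_level = ranked[len(ranked) // 2]  # odd sample size, so a middle element exists

print("mode:  ", mode_level)
print("median:", median_level)
```

For categories with no natural order (for example, country of birth), only the mode applies.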

Measures of dispersion

Range: Difference between the maximum and minimum values.
Interquartile range: Difference between the 75th and 25th percentiles.
Variance: Average of the squared differences from the mean.
Standard deviation: Square root of the variance.
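
The four measures above can be computed with Python's standard library, as in this sketch on hypothetical data. Note that `statistics.variance` returns the sample variance (dividing by n - 1), and that quartile conventions differ between software packages; the default method of `statistics.quantiles` is used here.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 11]  # hypothetical measurements

data_range = max(data) - min(data)

# Quartiles with the default ("exclusive") method.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

variance = statistics.variance(data)  # sample variance, divides by n - 1
std_dev = statistics.stdev(data)      # square root of the sample variance

print(f"range = {data_range}, IQR = {iqr}")
print(f"variance = {variance}, standard deviation = {std_dev:.3f}")
```

Use `statistics.pvariance` and `statistics.pstdev` instead when the data are the whole population rather than a sample.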

Measures of dispersion

Range: 2, Interquartile range: 1.0, Variance: 0.529

Variance is not meaningful for categorical data.

Measures of dispersion

Do use standard deviation to preserve the units of the data.
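Don’t rely on the variance when you have outliers: squaring the deviations makes it very sensitive to them. The interquartile range is more robust.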
Do use the right measure of dispersion depending on the data.
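
As an illustration of how outliers affect different measures of dispersion, the sketch below replaces one value in a hypothetical dataset with an extreme one (as a recording error might) and compares the variance with the interquartile range.

```python
import statistics

def iqr(values):
    """Interquartile range using the default quartile method."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

clean = [10, 11, 12, 13, 14, 15, 16]   # hypothetical measurements
with_outlier = clean[:-1] + [160]       # largest value replaced by an extreme one

for label, values in (("clean", clean), ("with outlier", with_outlier)):
    print(f"{label:>12}: variance = {statistics.variance(values):9.2f}, "
          f"IQR = {iqr(values)}")
```

In this example the interquartile range is unchanged, while the variance grows by orders of magnitude.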

Data visualization

Scatter plot: Relationship between two variables.
Histogram: Distribution of a variable.
Box plot: Distribution of a variable, quartiles.
Density plot: Distribution of a variable, smoothed.

Scatter plot

Do think about the units of the variables.
Do summarize the data to make it easier to understand.

Density plot

[Figure: density plot, with axes labelled "X values" and "Y values"]

Do consider whether the data are normally distributed.

(Inferential) Statistics

Inferential statistics is used to make inferences about populations.
It is used to test hypotheses.
It is used to make informed decisions.
It is used to estimate parameters.

Hypothesis testing

Null and alternative hypotheses
Types of error (Type I and Type II)
P-value
Confidence interval

Null and alternative hypotheses

Null hypothesis: No effect or no difference.
Alternative hypothesis: Effect or difference.
Example:
- Null hypothesis: The vaccine has no effect.
- Alternative hypothesis: The vaccine has an effect.

Do state the null and alternative hypotheses.
Do make sure that the null hypothesis is the status quo.
Do make sure that the null and alternative hypotheses are mutually exclusive.

Types of error

Type I error: Rejecting the null hypothesis when it is true.
Type II error: Failing to reject the null hypothesis when it is false.
Example:
- Type I error: Jail an innocent person.
- Type II error: Free a guilty person.
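
Both error rates can be estimated by simulation. The sketch below is only an illustration under simplifying assumptions: normally distributed data, a known standard deviation of 1 (so a two-sided z-test applies), a 5% threshold, and a hypothetical true effect of 0.5 for the Type II case. All names and numbers are made up for the example.

```python
import random
from statistics import NormalDist, mean

ALPHA = 0.05     # significance threshold
N = 30           # observations per simulated study
STUDIES = 2000   # number of simulated studies
rng = random.Random(1)

def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided z-test of H0: mean == mu0, with known sigma."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def rejection_rate(true_mean):
    """Share of simulated studies in which H0: mean == 0 is rejected."""
    rejections = 0
    for _ in range(STUDIES):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(N)]
        if z_test_p_value(sample) < ALPHA:
            rejections += 1
    return rejections / STUDIES

type_i_rate = rejection_rate(0.0)        # H0 is true: every rejection is a Type I error
type_ii_rate = 1 - rejection_rate(0.5)   # H0 is false: every non-rejection is a Type II error

print(f"Type I error rate  ~ {type_i_rate:.3f}")
print(f"Type II error rate ~ {type_ii_rate:.3f}")
```

The Type I rate should land near the chosen threshold, while the Type II rate depends on the effect size and the sample size.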

P-value

The probability of observing data at least as extreme as the data actually observed, given that the null hypothesis is true.
It is used to test hypotheses.
For historical reasons, it is compared to a threshold, usually 0.05 or 0.01.
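
As a worked example, a p-value can be computed exactly for a coin-flipping experiment. In practice the p-value counts every outcome at least as extreme as the one observed, not only the observed outcome itself. The experiment below (15 heads in 20 flips, null hypothesis of a fair coin) is hypothetical.

```python
from math import comb

n, heads = 20, 15   # hypothetical experiment: 15 heads in 20 flips

def prob_under_null(k):
    """Probability of exactly k heads in n flips of a fair coin."""
    return comb(n, k) / 2 ** n

# "At least as extreme" (two-sided): at least as far from the expected
# n / 2 heads as the observed count, in either direction.
observed_distance = abs(heads - n / 2)
p_value = sum(
    prob_under_null(k)
    for k in range(n + 1)
    if abs(k - n / 2) >= observed_distance
)

print(f"two-sided p-value = {p_value:.4f}")
```

Here the p-value is about 0.041: below a 0.05 threshold, but above a 0.01 threshold, which is why the threshold should be fixed before looking at the data.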

P-value

Do report the p-value.
Do state the p-value threshold before the test.
Do use the p-value to make informed decisions.
Don’t use the p-value to make binary decisions.
Don’t change the model to get a p-value below the threshold.
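
One way to see why repeatedly changing the model is a problem: each additional specification is another chance of a false positive. The arithmetic below assumes the tests are independent and that every null hypothesis is true. That is a simplification, since variants of a model fitted to the same data are usually correlated, but the direction of the effect is the same.

```python
ALPHA = 0.05

def chance_of_false_positive(tests, alpha=ALPHA):
    """Probability that at least one of `tests` independent tests
    gives p < alpha when every null hypothesis is true."""
    return 1 - (1 - alpha) ** tests

for k in (1, 5, 20):
    print(f"{k:2d} specifications tried: "
          f"{chance_of_false_positive(k):.1%} chance of a 'significant' result")
```

Under these assumptions, trying 20 specifications gives roughly a 64% chance of at least one result below the 0.05 threshold.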

Confidence interval

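A range of values, computed from the sample, that is likely to contain the true value of a parameter.
It is constructed from the data, so any single interval may miss the true value; the confidence level describes how often the procedure succeeds over repeated samples.
For historical reasons, the confidence level is usually set at 95%.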
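
The repeated-sampling meaning of a confidence level can be checked by simulation. This sketch assumes normally distributed data with a known standard deviation, so a z-interval for the mean applies; the true mean, sample size, and number of repetitions are arbitrary choices for the illustration.

```python
import random
from statistics import NormalDist, mean

rng = random.Random(7)
TRUE_MEAN, SIGMA = 10.0, 2.0   # known here only because we simulate the data
N, REPEATS = 25, 2000

# Half-width of a 95% z-interval for the mean with known sigma.
z = NormalDist().inv_cdf(0.975)
half_width = z * SIGMA / N ** 0.5

covered = 0
for _ in range(REPEATS):
    sample = [rng.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    centre = mean(sample)
    if centre - half_width <= TRUE_MEAN <= centre + half_width:
        covered += 1

coverage = covered / REPEATS
print(f"{coverage:.1%} of the intervals contain the true mean")
```

About 95% of the intervals contain the true mean, and about 5% miss it. With real data we cannot tell which kind of interval we have.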

Do’s and don’ts of statistics in research

Do use the right measure of central tendency.
Don’t use the mean when the data is skewed or has outliers.
Do use the right measure of dispersion.
Don’t use the variance when you have outliers.
Do use standard deviation to preserve the units of the data.

Do’s and don’ts of statistics in research

Don’t say we proved the hypothesis.
Do say the data supports the hypothesis.
Don’t say we accept the null hypothesis.
Do say we fail to reject the null hypothesis.
Do report confidence intervals.
Don’t confuse improbability with impossibility.

Biases in statistics

Selection bias: When the sample is not representative of the population.
Confirmation bias: When we look for evidence that confirms our beliefs.
Publication bias: When only significant results are published.
Extrapolation bias: When we extrapolate beyond the data.
Causation bias: When we confuse correlation with causation.
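
The last item can be illustrated with simulated data in which two variables are strongly correlated although neither causes the other; both are driven by a third, hypothetical variable. Everything in the sketch is made up for the example. Subtracting the common driver works here only because the simulation gives it a known coefficient of 1.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

rng = random.Random(3)
n = 5000

driver = [rng.gauss(0, 1) for _ in range(n)]     # hypothetical common cause
x = [d + rng.gauss(0, 0.5) for d in driver]      # depends on the driver only
y = [d + rng.gauss(0, 0.5) for d in driver]      # depends on the driver only

raw = pearson(x, y)

# Remove the driver's contribution from both variables and correlate what is left.
adjusted = pearson([xi - d for xi, d in zip(x, driver)],
                   [yi - d for yi, d in zip(y, driver)])

print(f"correlation of x and y:              {raw:.2f}")
print(f"after removing the common driver:    {adjusted:.2f}")
```

The raw correlation is high, yet once the common driver is accounted for, the remaining correlation is close to zero.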

Conclusion

Ask questions, use PhD consult: https://www.math.aau.dk/research/phd-consult

More questions? eduardo@math.aau.dk
Thank you.