# Autumn

## Basic

#### Statistics for experimental medical researchers

##### Course director: Erin Gabriel

ECTS: 4 – Language: English

##### Description

This six-day intensive course is aimed at Ph.D. students in biomedical research who work in a laboratory or similar setting, performing experiments on e.g. cells, tissues, mice, or human volunteers. By participating in this course, you will gain a working knowledge of statistical concepts, methods of analysis, and adequate ways of presenting statistical results, as well as hands-on experience in analysing experimental data with the statistical software R. We will also explain some of the most common errors biomedical researchers make in their statistical analyses. In summary, we aim to teach you high-quality statistics suitable for research publications.

##### Learning objectives

A student who has met the objectives of the course will be able to:

• Have a qualified discussion with a statistical consultant, e.g. on how to plan the analyses for a research project or how to answer the concerns raised by a reviewer.
• Interpret basic statistical information from research papers, e.g. descriptive statistics, effect estimates, confidence intervals and p-values.
• Apply the most frequently used statistical analyses to real-life experimental data using the statistical software R (see the contents section for the specific analyses taught in this course).
• Present statistical results in suitable figures, tables, and words.
• Critically assess the validity of the most frequently used statistical analyses by being aware of their modelling assumptions and limitations.

Introduction to R for Basic Statistics (NB: A minimum level of familiarity with basic R is essential, corresponding to that obtained after completing the course “Introduction to R for basic statistics” or the online introduction at https://biostat.ku.dk/r/. The estimated number of hours to complete the online introduction is 10 to 15 hours, depending on your R- and technical skills)

Course webpage: NA

#### Introduction to R for Basic Statistics

##### Course director: Alessandra Meddis

ECTS: 1.4 – Language: English

##### Description

We will explain basic concepts of the statistical software R (installing R and the RStudio interface, loading packages, reading and writing data). We will cover the use of functions in R with the help pages and simple mathematical calculations, basic tools for data manipulation (data structures in R, data frame creation, defining and selecting variables), descriptive statistics in R, and the creation of graphics in base R (scatterplot, box plot and histogram). Half of the course will consist of exercises.

##### Learning objectives

The course aims to give an introduction to the statistical software R through the RStudio user interface. The course is designed for health science researchers who want to become more familiar with R for simple calculations, data management, data exploration and analysis. In particular, the course provides the basic functionality needed for the courses “Basic Statistics for Health Science Researchers” and “Statistics for Experimental Researchers”.

A student who has met the objectives of the course should be able to:

• Use the RStudio interface
• Perform basic calculations in R
• Manipulate data in R
• Create descriptive analyses in R
• Plot graphics in R

The course is for people who have little or no prior knowledge of R.

Course webpage: NA

#### Basic statistics for health science researchers (Danish)

##### Course director: Julie Forman

ECTS: 7.5 – Language: Danish

##### Description

Basic statistical concepts (datatypes, distributions, estimation, confidence intervals). Significance tests (power and sample size calculation, adjustments for multiple testing). Planning and interpretation (exploratory vs confirmatory analyses, randomized vs observational studies, confounding, mediation, effect modification, estimation vs prediction). Analysis of quantitative outcomes (t-tests, ANOVA, linear regression, correlation, ANCOVA, multiple linear regression). Analysis of binary and categorical outcomes (association in two-way tables, logistic regression). Introduction to survival analysis (Kaplan-Meier curves, log-rank test, Cox regression). Introduction to analysis of repeated measurements and clustered data (linear mixed models, simplification).

##### Learning objectives

This course will teach you how to use statistics in a research context by giving you a thorough repetition of basic statistical concepts and models illustrated with case studies from health science.

A student who has met the objectives of the course will be able to:

• Interpret basic statistical information from research papers: descriptive statistics, sample size calculations, estimates of effect or association, confidence intervals, and p-values.
• Understand the basic statistical analyses most commonly used in health science: two-sample and paired t-test, linear regression, correlation, analysis of variance (ANOVA), analysis of covariance (ANCOVA), linear models, risk difference, relative risk, odds ratio, chi-square test, logistic regression, survival analysis and linear mixed models.
• Carry out the most commonly used basic statistical analyses using R statistical software, interpret the results, and present them in appropriate tables and figures.
• Recognize the limitations and potential misinterpretations of statistical analyses related to e.g. model violations, confounding, missing data, lack of power, and multiple testing.
• Follow advanced statistics courses from the Ph.D. school at the faculty of health science.
• Take advice from a statistician, e.g. in the advisory service at the Section of Biostatistics.

Introduction to R for Basic Statistics (NB: A minimum level of familiarity with basic R is essential, corresponding to that obtained after completing the course “Introduction to R for basic statistics” or the online introduction at https://biostat.ku.dk/r/. The estimated number of hours to complete the online introduction is 10 to 15 hours, depending on your R- and technical skills)

Course webpage: NA

#### Basic statistics for health researchers (English)

##### Course director: Paul Blanche

ECTS: 7.5 – Language: English

##### Description

Basic statistical concepts (datatypes, distributions, estimation, confidence intervals). Significance tests (power and sample size calculation, adjustments for multiple testing). Planning and interpretation (exploratory vs confirmatory analyses, randomized vs observational studies, confounding, mediation, effect modification, estimation vs prediction). Analysis of quantitative outcomes (t-tests, ANOVA, linear regression, correlation, ANCOVA, multiple linear regression). Analysis of binary and categorical outcomes (association in two-way tables, logistic regression). Introduction to survival analysis (Kaplan-Meier curves, log-rank test, Cox regression). Introduction to analysis of repeated measurements and clustered data (linear mixed models, simplification).

##### Learning objectives

This course will teach you how to use statistics in a research context by giving you a thorough repetition of basic statistical concepts and models illustrated with case studies from health science.

A student who has met the objectives of the course will be able to:

• Interpret basic statistical information from research papers: descriptive statistics, sample size calculations, estimates of effect or association, confidence intervals, and p-values.
• Understand the basic statistical analyses most commonly used in health science: two-sample and paired t-test, linear regression, correlation, analysis of variance (ANOVA), analysis of covariance (ANCOVA), linear models, risk difference, relative risk, odds ratio, chi-square test, logistic regression, survival analysis and linear mixed models.
• Carry out the most commonly used basic statistical analyses using R statistical software, interpret the results, and present them in appropriate tables and figures.
• Recognize the limitations and potential misinterpretations of statistical analyses related to e.g. model violations, confounding, missing data, lack of power, and multiple testing.
• Follow advanced statistics courses from the Ph.D. school at the faculty of health science.
• Take advice from a statistician, e.g. in the advisory service at the Section of Biostatistics.

Introduction to R for Basic Statistics (NB: A minimum level of familiarity with basic R is essential, corresponding to that obtained after completing the course “Introduction to R for basic statistics” or the online introduction at https://biostat.ku.dk/r/. The estimated number of hours to complete the online introduction is 10 to 15 hours, depending on your R- and technical skills)

Course webpage: NA

#### Statistical analysis of survival data

##### Course director: Thomas Scheike

ECTS: 4.9 – Language: English

##### Description

Kaplan-Meier estimation, log-rank test, stratified analysis, Cox regression. Censoring and truncation. Competing risks. Practical implementation of the techniques through computer labs and home assignments.

##### Learning objectives

The aim of the course is to enable participants to:

• do simple survival analyses
• critically read medical papers using survival analysis techniques
• understand and interpret the outcome of survival analyses

The course is tailored for Ph.D. students in health sciences who have already taken the Ph.D. course “Basic Statistics for Health Researchers” or have a similar knowledge of statistics, and who wish to learn more about the statistical methods underlying the approaches presented in the course.

A basic knowledge of statistics and previous experience with the software program R is expected. However, little or no previous exposure to the topics covered is expected.

Course webpage: NA

#### Targeted Register Analysis

##### Course director: Thomas Gerds

ECTS: 2.8 – Language: English

##### Description

The course runs over four days, each combining lectures on methods with exercises in R:

Lectures: International experts give lectures on recent developments in statistical methods for register analyses. The aim is to inspire: the lectures should cover methods that are as complex as they need to be to solve real-world problems, and should simplify neither the data nor the methods merely for the sake of teaching success. The tentative list of topics is:

• Analysing Danish register data
• The roadmap of targeted statistical learning
• The transition from traditional epidemiological tools (cohort follow-up studies, case-control studies), which produce hazard ratios or odds ratios, to average treatment effects defined in a dynamic causal framework
• Machine learning (random forests/recursive neural networks)
• Longitudinal targeted minimum loss estimation (LTMLE)

Exercises: Participants learn data management with R, especially with respect to working with data from Danish registers. During the computer exercises, participants will learn how to move a data analysis project from the often-encountered situation of a messy one-room apartment to a functional multi-room laboratory that invites collaborators to follow the workflow. All steps of the analysis, from the import of the raw data to the export of the tables and figures, are controlled by the R package targets.

##### Learning objectives

A student who has met the objectives of the course will be able to:

• understand the limitations of logistic regression and Cox regression
• know how to ask causal questions (target parameters) before looking into the register data
• define dynamic treatment regimens and analyse register data using the R package ltmle
• have knowledge of statistical (machine) learning algorithms for register data
• use the R package targets to set up and organize a reproducible analysis

The course is tailored for Ph.D. students in health sciences who have already taken the Ph.D. course “Basic Statistics for Health Researchers” or have a similar knowledge of statistics, and who wish to learn more about the statistical methods underlying the approaches presented in the course.

A basic knowledge of programming with R is expected, and previous experience with register data analysis is a great advantage.

Course webpage: NA

#### Programming and statistical modelling in R

##### Course director: Michael Sachs

ECTS: 2.4 – Language: English

##### Description

The course covers use of the statistical software package R. The aim is to take the intermediate R user to the next level by using programming techniques to work more efficiently in R. A key focus is on introducing core programming principles such as loops and functions. The course consists of four half-day lectures, each followed by exercises. This will give the students a chance to work with different aspects of R and apply the principles to their own research.

##### Learning objectives

A student who has met the objectives of the course will be able to:

• use programming principles (loops and functions) to handle repetitive tasks
• use functions in R
• use loops in R
• do efficient data manipulation, visualization, and aggregation

The course is aimed at Ph.D. students and health researchers with a basic knowledge of statistics corresponding to the course on basic statistics for health researchers and with a good working knowledge of R, e.g., as obtained by having already followed an introductory course on R.

Course webpage: https://sachsmc.github.io/r-programming

#### Advanced statistical analysis of epidemiological studies

##### Course director: Per Kragh Andersen

ECTS: 4.2 – Language: English

##### Description

Repetition of logistic regression, Poisson regression, and Cox regression. Time-dependent exposure variables. Conditional logistic regression for matched case-control studies. Alternative designs of cohort studies: nested case-control and case-cohort studies. The case-crossover and case-time-control designs. Competing risks. Recurrent events. Introduction to causal inference.

##### Learning objectives

The course builds on the Ph.D. course “Epidemiological methods in medical research”. The purpose is to give an introduction to more advanced statistical methods frequently applied in epidemiological studies. After completing the course, the participants will:

• be able to analyse data from classical cohort studies using Poisson or Cox regression and data from case-control studies using ordinary or conditional logistic regression
• know about the advantages of using cohort data sampled as a nested case-control study or a case-cohort study
• know about methods to account for competing risks and recurrent events in follow-up studies
• know about the basic concepts for causal inference

The course is aimed at Ph.D. students with a background corresponding to the course “Epidemiological methods in medical research”.

Course webpage: NA

#### Advanced Statistical Topics in Health Research A

##### Course director: Claus Ekstrøm

ECTS: 2.8 – Language: English

##### Description

• Introduction to statistical methods for high-dimensional data, linear models, regularization methods, and variable selection
  • Big-p small-n problems
  • Multiple testing techniques (inference correction, false discovery rates)
  • Regularization methods such as lasso, ridge regression, and elastic net
  • The correlation vs. causation and prediction vs. hypothesis differences

• Permutation testing, bootstrapping, and cross-validation
  • Parametric and non-parametric bootstrap
  • Cross-validation and the jackknife
  • Randomization testing

• Classification and regression trees
  • Classification and regression trees
  • Random forests
  • Variable importance

• Imputation techniques for handling missing data
  • Imputation and Rubin’s rules
  • Multiple Imputation by Chained Equations

##### Learning objectives

Many modern research projects collect data and use experimental designs that require advanced statistical methods beyond what is taught as part of the curriculum in introductory statistical courses. This course covers some of the more general statistical models and methods suitable for analyzing complex data and experimental designs encountered in health research such as methods for high-dimensional data, classification and regression trees, penalized regression, bootstrapping, cross-validation, imputation, and dimension reduction.

The course will contain equal parts theory and applications and consists of four full days of teaching and computer lab exercises. The intention is that participants will gain a good understanding of the statistical methods presented and be able to apply them in practice after the course. This course is aimed at health researchers with previous knowledge of statistics and the computer language R who need an overview of appropriate analytical methods and discussions with statisticians to be able to solve their problems.

Note that there are two courses entitled “Advanced Statistical Topics in Health Research” (denoted A and B). They have no overlap and can be taken independently of each other.

A student who has met the objectives of the course will be able to:

• Analyze data using the methods presented and be able to draw valid conclusions based on the results obtained.
• Understand the advantages/disadvantages of the methods presented and be able to discuss potential pitfalls from using these methods.

The course is tailored for Ph.D. students in health sciences who have already taken the Ph.D. course “Basic Statistics for Health Researchers” or have a similar knowledge of statistics, and who wish to learn more about the statistical methods underlying the approaches presented in the course.

A basic knowledge of statistics and previous experience with the software program R is expected. However, little or no previous exposure to the topics covered is expected.

Course webpage: NA

# Spring

## Basic

#### Basic statistics for health science researchers (Danish)

##### Course director: Julie Forman

ECTS: 7.5 – Language: Danish

##### Description

Basic statistical concepts (datatypes, distributions, estimation, confidence intervals). Significance tests (power and sample size calculation, adjustments for multiple testing). Planning and interpretation (exploratory vs confirmatory analyses, randomized vs observational studies, confounding, mediation, effect modification, estimation vs prediction). Analysis of quantitative outcomes (t-tests, ANOVA, linear regression, correlation, ANCOVA, multiple linear regression). Analysis of binary and categorical outcomes (association in two-way tables, logistic regression). Introduction to survival analysis (Kaplan-Meier curves, log-rank test, Cox regression). Introduction to analysis of repeated measurements and clustered data (linear mixed models, simplification).

##### Learning objectives

This course will teach you how to use statistics in a research context by giving you a thorough repetition of basic statistical concepts and models illustrated with case studies from health science.

A student who has met the objectives of the course will be able to:

• Interpret basic statistical information from research papers: descriptive statistics, sample size calculations, estimates of effect or association, confidence intervals, and p-values.
• Understand the basic statistical analyses most commonly used in health science: two-sample and paired t-test, linear regression, correlation, analysis of variance (ANOVA), analysis of covariance (ANCOVA), linear models, risk difference, relative risk, odds ratio, chi-square test, logistic regression, survival analysis and linear mixed models.
• Carry out the most commonly used basic statistical analyses using R statistical software, interpret the results, and present them in appropriate tables and figures.
• Recognize the limitations and potential misinterpretations of statistical analyses related to e.g. model violations, confounding, missing data, lack of power, and multiple testing.
• Follow advanced statistics courses from the Ph.D. school at the faculty of health science.
• Take advice from a statistician, e.g. in the advisory service at the Section of Biostatistics.

Introduction to R for Basic Statistics (NB: A minimum level of familiarity with basic R is essential, corresponding to that obtained after completing the course “Introduction to R for basic statistics” or the online introduction at https://biostat.ku.dk/r/. The estimated number of hours to complete the online introduction is 10 to 15 hours, depending on your R- and technical skills)

Course webpage: NA

#### Basic statistics for health researchers (English)

##### Course director: Paul Blanche

ECTS: 7.5 – Language: English

##### Description

Basic statistical concepts (datatypes, distributions, estimation, confidence intervals). Significance tests (power and sample size calculation, adjustments for multiple testing). Planning and interpretation (exploratory vs confirmatory analyses, randomized vs observational studies, confounding, mediation, effect modification, estimation vs prediction). Analysis of quantitative outcomes (t-tests, ANOVA, linear regression, correlation, ANCOVA, multiple linear regression). Analysis of binary and categorical outcomes (association in two-way tables, logistic regression). Introduction to survival analysis (Kaplan-Meier curves, log-rank test, Cox regression). Introduction to analysis of repeated measurements and clustered data (linear mixed models, simplification).

##### Learning objectives

This course will teach you how to use statistics in a research context by giving you a thorough repetition of basic statistical concepts and models illustrated with case studies from health science.

A student who has met the objectives of the course will be able to:

• Interpret basic statistical information from research papers: descriptive statistics, sample size calculations, estimates of effect or association, confidence intervals, and p-values.
• Understand the basic statistical analyses most commonly used in health science: two-sample and paired t-test, linear regression, correlation, analysis of variance (ANOVA), analysis of covariance (ANCOVA), linear models, risk difference, relative risk, odds ratio, chi-square test, logistic regression, survival analysis and linear mixed models.
• Carry out the most commonly used basic statistical analyses using R statistical software, interpret the results, and present them in appropriate tables and figures.
• Recognize the limitations and potential misinterpretations of statistical analyses related to e.g. model violations, confounding, missing data, lack of power, and multiple testing.
• Follow advanced statistics courses from the Ph.D. school at the faculty of health science.
• Take advice from a statistician, e.g. in the advisory service at the Section of Biostatistics.

Introduction to R for Basic Statistics (NB: A minimum level of familiarity with basic R is essential, corresponding to that obtained after completing the course “Introduction to R for basic statistics” or the online introduction at https://biostat.ku.dk/r/. The estimated number of hours to complete the online introduction is 10 to 15 hours, depending on your R- and technical skills)

Course webpage: NA

#### Introduction to R for Basic Statistics

##### Course director: Alessandra Meddis

ECTS: 1.4 – Language: English

##### Description

We will explain basic concepts of the statistical software R (installing R and the RStudio interface, loading packages, reading and writing data). We will cover the use of functions in R with the help pages and simple mathematical calculations, basic tools for data manipulation (data structures in R, data frame creation, defining and selecting variables), descriptive statistics in R, and the creation of graphics in base R (scatterplot, box plot and histogram). Half of the course will consist of exercises.

##### Learning objectives

The course aims to give an introduction to the statistical software R through the RStudio user interface. The course is designed for health science researchers who want to become more familiar with R for simple calculations, data management, data exploration and analysis. In particular, the course provides the basic functionality needed for the courses “Basic Statistics for Health Science Researchers” and “Statistics for Experimental Researchers”.

A student who has met the objectives of the course should be able to:

• Use the RStudio interface
• Perform basic calculations in R
• Manipulate data in R
• Create descriptive analyses in R
• Plot graphics in R

The course is for people who have little or no prior knowledge of R.

Course webpage: NA

#### Epidemiological methods in medical research

##### Course director: Brice Ozenne

ECTS: 7 – Language: English

##### Description

Epidemiological investigations have made critical contributions to public health. Historical examples include establishing adverse effects of tobacco use on health, describing the spread of diseases and infectious etiology of HIV, or assessing the safety of vaccines in large populations. They have also addressed medical controversies using strict design of studies and careful methodological considerations. However, epidemiologic studies have often shown conflicting results, which has given space for criticism of epidemiology. This course aims at providing the methodological foundations of epidemiology and thereby rationalizing decisions about the formulation of the research question, study design, statistical methods, and communication of the results. This should promote scientifically sound epidemiological studies and critical assessment of epidemiological evidence.

This course is spread over 10 full days, during which you will be introduced to key concepts in epidemiology and statistical methods in epidemiological research. You will apply them to analyse historical datasets and reflect upon their usefulness and limitations. Toward the end of the course, you will be asked to give a short presentation, either illustrating the use of concepts/methods seen during the course (e.g. on data from your Ph.D.) or discussing extensions of these concepts/methods based on suggested literature.

The course covers the following topics:

• Purpose and role of epidemiology
• Quantification of disease frequency and its association with an exposure
• Introduction to causal inference: causality, confounding, collider, directed acyclic graphs (DAGs)
• Introduction to various study designs: cohort, case-control, nested case-control, case-cohort
• Statistical methods for handling confounding (stratification, adjustment, standardisation, matching)
• Design and analysis of case-control studies
• Statistical models for binary and time to event outcome (logistic regression, Cox regression, Poisson regression). Handling interactions and performing hypothesis testing.
• Reasoning, illustrated using common fallacies in epidemiology: Simpson’s paradox, Berkson’s paradox, ecological fallacy, immortal time bias.
• Communication of epidemiologic results

##### Learning objectives

On conclusion of the course, participants should be able to conduct a ‘standard’ epidemiology study:

• reformulate a “typical” epidemiology research question in terms of prevalence, rate, or risk.
• define a parameter of interest answering the research question.
• propose a study design relevant for the estimation of the parameter of interest.
• argue about the strengths and weaknesses of a study design.
• argue about the variables to consider in the subsequent statistical analysis.
• propose a statistical method relevant for the estimation of the parameter of interest.
• interpret the results: their plausibility and how they answer the research question
• communicate the methods used and the results obtained

They should also be able to critically assess epidemiology articles:

• describe the methodology used by a study based on the ‘materials and methods’ section of an article and make its implications/assumptions explicit.
• summarize the results of a study based on the ‘results’ section of an article and discuss to what extent they provide evidence to answer the research question.

Acquisition of programming skills:

• use a software program to provide a graphical representation of binary outcome and time to event data.
• use a software program to carry out planned analyses and visualize the results.

Data management is not part of the learning objectives for this course.

The course is tailored for Ph.D. students in health sciences with an interest in epidemiologic research. Students are expected to have a basic knowledge of epidemiology, statistics and programming. Having completed the courses in basic statistics and introduction to R is advantageous but not mandatory.

Course webpage: NA

#### Introduction to validation of patient reported outcome measures

##### Course director: Karl Bang Christensen

ECTS: 2.2 – Language: English

##### Description

The course introduces simple methods for validation of index scales that summarize information from several items. The course covers classical psychometrics, confirmatory factor analysis, and methods for detection of differential item functioning. The computer exercises use SAS or R, but most of the methods discussed are relatively simple and can be done using SPSS or Stata. The course consists of ten hours of classroom teaching supplemented by online elements.

Illustrative examples are drawn from existing patient reported outcome measures (PROMs) used in clinical research.

##### Learning objectives

A student who has met the objectives of the course will be able to:

• Know the basic principles for scale validation.
• Compute simple indicators of the validity of patient reported outcome measures (PROMs).
• Do a simple confirmatory factor analysis to evaluate the quality of PROMs.

The course is aimed at Ph.D. students and researchers within medicine, public health, epidemiology, sociology, and psychology. A very basic knowledge of statistics will be assumed.

Course webpage: NA

#### Statistical data analysis using the computer program SAS

##### Course director: Karl Bang Christensen

ECTS: 2.5 – Language: English

##### Description

The course covers fundamental use of the statistical software package SAS, from data handling through descriptive statistics and standard methods to an introductory description of the regression procedures. Approximately half the time will be reserved for hands-on exercises. Some emphasis will be put on explaining the theoretical foundation and the applicability of the methods in example problems. There will be a take-home exam, which will be evaluated to determine whether you pass the course.

##### Learning objectives

A student who has met the objectives of the course will be able to:

• Use statistical methods for data analysis in SAS
• Use SAS for simple data management
• Generate tables and figures for publications

The course is aimed at Ph.D. students. Some knowledge of basic statistics will be advantageous, but is not required.

Course webpage: NA

#### Advanced Statistical Topics in Health Research B

##### Course director: Claus Ekstrøm

ECTS: 2.8 – Language: English

##### Description

Many modern research projects collect data and use experimental designs that require advanced statistical methods beyond what is taught as part of the curriculum in introductory statistical courses. This course covers some of the more general statistical models based on ideas from Bayesian statistics. These methods are suitable for analyzing complex data and experimental designs encountered in health research such as supervised and non-supervised machine learning methods, principal component analysis and partial least squares, support-vector machines, network analysis, and causal learning.

The course will contain equal parts theory and applications and consists of four full days of teaching and computer lab exercises. The intention is that participants will gain a good understanding of the statistical methods presented and be able to apply them in practice after the course. This course is aimed at health researchers with previous knowledge of statistics and the computer language R who need an overview of appropriate analytical methods and discussions with statisticians to be able to solve their problems.

Note that there are two courses entitled “Advanced Statistical Topics in Health Research”. They have no overlap and can be taken independently of each other.

• Introduction to Bayesian statistics and the difference between frequentist and Bayesian statistics
  ◦ Credibility intervals, prior and posterior distributions
  ◦ Bayesian classifiers
  ◦ Markov-chain Monte Carlo (MCMC) estimation
  ◦ Empirical Bayes estimators
• Network analysis
  ◦ Introduction to graphs and graph theory
  ◦ Visualizing graphs
  ◦ Identifying communities
  ◦ Latent variable models
• Principal component analysis, partial least squares, and support-vector machines
  ◦ Dimension reduction techniques
  ◦ PCA and PLS
  ◦ Sparse PCA and PLS
  ◦ Multiclass and non-linear SVMs
• Causal structure learning
  ◦ Introduction to directed acyclic graphs (DAGs)
  ◦ Causal structure learning
  ◦ Algorithms and assumptions for causal learning

##### Learning objectives

A student who has met the objectives of the course will be able to:

• Analyze data using the methods presented and be able to draw valid conclusions based on the results obtained.
• Understand the advantages/disadvantages of the methods presented and be able to discuss potential pitfalls from using these methods.

The course is tailored for Ph.D.-students in health sciences who already have taken the Ph.D.-course “Basic Statistics for Health Researchers” or have a similar knowledge about statistics, and who wish to have more knowledge about the statistical methods underlying the approaches presented in the course.

A basic knowledge of statistics and previous experience with the software program R is expected. However, little or no previous exposure to the topics covered is expected.

Course webpage: NA

#### Statistical methods in bioinformatics

##### Course director: Claus Ekstrøm

ECTS: 3.5 – Language: English

##### Description

Bioinformatics is concerned with the study of the inherent structure of biological information, and statistical methods are the workhorses in many of these studies. Some of this inherent structure is very obvious and can be observed directly through correlations of patterns in high-dimensional data, while other patterns arise through more complicated underlying relationships. This course covers some of the basic and novel statistical models and methods suitable for analysing high-dimensional data, in particular high-dimensional data that rely heavily on statistical methods. The course will consist of equal parts theory and applications, with five full days of teaching and computer lab exercises. The intention is that participants will have a thorough understanding of the statistical methods and be able to apply them in practice after having followed this course.

• Penalized regression approaches, principal component regression
• Analysis of mapped reads from mRNA data
• General assembly
• Dynamic programming of pairwise alignment
• Alignment methods for mRNA data
• Poisson methods for expression quantification and transcript distribution
• Genome-wide association studies
  ◦ Multiple testing problems
  ◦ Imputation
  ◦ Common variants vs rare variants. Sequence Kernel Association Test
  ◦ Regularization methods, SVM
  ◦ Enrichment approaches, gene-set analyses
• Network biology
  ◦ Quality assessment and heterogeneous data integration
  ◦ Biomedical text mining (named entity recognition & co-occurrence analysis)
  ◦ Network analysis with STRING and Cytoscape
• Integrative data analysis
  ◦ Zero-inflated and hurdle models (microbiome data and RNA-seq revisited)
  ◦ Compositional data analysis
  ◦ Gene expression analyses
  ◦ Combining data and making inference from multiple platforms and experiments

##### Learning objectives

A student who has met the objectives of the course will be able to:

• Analyse data from a bioinformatics experiment using the methods covered in the course and draw valid conclusions based on the results obtained.
• Understand the advantages/disadvantages of the methods presented and be able to discuss potential pitfalls from using these methods.
• Develop new methods that can be used to analyse novel types of bioinformatics data.

The course is tailored for Ph.D.-students with experience in mathematics, statistics, or bioinformatics, who wish to have more knowledge about the statistical methods underlying the approaches used for common problems in bioinformatics. A basic knowledge of statistics including a little exposure to calculus is expected. However, little or no previous exposure to the topics covered is expected. Students from applied fields are welcome on the course but should expect extra focus on the statistical methodology.

Course webpage: NA

#### Statistical analysis of survival data

##### Course director: Frank Eriksson

ECTS: 4.9 – Language: English

##### Description

Kaplan-Meier estimation, log-rank test, stratified analysis, Cox regression. Censoring and truncation. Competing risks. Practical implementation of the techniques through computer labs and home assignments.

##### Learning objectives

The aim of the course is to make the participants able to

• do simple survival analyses
• critically read medical papers using survival analysis techniques
• understand and interpret the outcome of survival analyses

The course is tailored for Ph.D.-students in health sciences who already have taken the Ph.D.-course “Basic Statistics for Health Researchers” or have a similar knowledge about statistics, and who wish to have more knowledge about the statistical methods underlying the approaches presented in the course.

A basic knowledge of statistics and previous experience with the software program R is expected. However, little or no previous exposure to the topics covered is expected.

Course webpage: NA

#### Programming and statistical modelling in R

##### Course director: Michael Sachs

ECTS: 2.4 – Language: English

##### Description

The course covers use of the statistical software package R. The aim is to take the intermediate R user to the next level, and make use of programming techniques for more efficient use of R. A key focus is on introducing core programming principles such as loops and functions. The course will have four half-day lectures after which the students will work on some exercises. This will give the students a chance to use and work with different aspects of R and apply the principles to their own research.

##### Learning objectives

A student who has met the objectives of the course will be able to:

• use programming principles (loops and functions) to handle repetitive tasks
• use functions in R
• use loops in R
• do efficient data manipulation, visualization, and aggregation

Ph.D.-students and health researchers with a basic knowledge of statistics corresponding to the course on basic statistics for health researchers and with a good working knowledge of R, e.g., as obtained by having already followed an introductory course on R.

Course webpage: https://sachsmc.github.io/r-programming

#### Bayesian methods in biomedical research

##### Course director: Paul Blanche

ECTS: 2.4 – Language: English

##### Description

Bayesian analysis is a statistical tool that is becoming increasingly popular in biomedical sciences. Notably, Bayesian approaches have become commonly used in adaptive designs for Phase I/II clinical trials, in meta-analyses, and in transcriptomics analysis. This course provides an introduction to Bayesian tools, with an emphasis on biostatistics applications, in order to familiarize students with such methods and their practical applications. A case study from drug development will be discussed to illustrate some of the methods.

Thanks to its rich and flexible modelling possibilities and intuitive interpretation, the Bayesian framework is appealing, especially when observations are scarce. It can adaptively incorporate information as it becomes available, an important feature for early-phase clinical trials. For example, adaptive Bayesian designs for Phase I/II trials reduce the chances of unnecessarily exposing participants to inappropriate doses and have better decision-making properties than the standard rule-based dose-escalation designs. A Bayesian approach is also appealing in meta-analyses because of i) the often relatively small number of studies available, ii) its flexibility, and iii) its better handling of heterogeneity from aggregated results, especially in network meta-analyses. Finally, Bayesian power provides an interesting opportunity to evaluate the probability of success of a trial or program.

Thanks to modern computing tools, practical Bayesian analysis has become relatively straightforward, which is contributing to its increasing popularity. JAGS is flexible software interfaced with R that makes it easy to specify a Bayesian model and that automatically performs inference on the posterior distributions of the parameters and produces graphical output for monitoring the quality of the analysis.

The aim of the course is to provide insights into Bayesian statistics in the context of medical studies. We will cover the following topics:

• Bayesian modeling (prior, posterior, likelihood, Bayes' theorem);
• Bayesian estimation (Credibility Intervals, Maximum a Posteriori, Bayes factor);
• Bayesian applications to meta-analyses;
• Practical Bayesian analysis with R and JAGS;
• Critical reading of medical publications;
• Evaluating the probability of success of a trial or set of trials.

All concepts will be illustrated with real-life examples from the medical literature.

##### Learning objectives

A student who has met the objectives of the course will be able to:

• understand and assess a Bayesian modelling strategy, and discuss its underlying assumptions
• rigorously describe expert knowledge by a quantitative prior distribution
• perform a Bayesian regression using R, applied to meta-analysis
• put into perspective the results from a Bayesian analysis described in a scientific article
• evaluate the probability of success of a trial or set of trials

This course is targeted towards students in graduate programmes at the Faculty of Health and Medical Sciences. To be able to follow this course, participants need both:

• some knowledge of statistics (most notably some familiarity with the usual probability distributions, probability density functions, confidence intervals and Maximum Likelihood Estimation), and
• a practical knowledge of R programming (especially functional programming, for loops and “if” statements, vector allocation, linear regression).

Course webpage: NA

#### Psychometric validation of patient reported outcome measures

##### Course director: Karl Bang Christensen

ECTS: 2.6 – Language: English

##### Description

The course introduces psychometric models for validation of index scales summarizing information from several items. The course covers confirmatory factor analysis (CFA) models, item response theory (IRT) models, and Rasch measurement models. Detection and modelling of differential item functioning and local dependence is discussed. The computer exercises use R. The course consists of ten hours of classroom teaching supplemented by online elements.

##### Learning objectives

A student who has met the objectives of the course will be able to:

• Know the basic principles for validation of patient reported outcome measures (PROMs) using item response theory (IRT) models and Rasch models.
• Do simple analyses for PROM validation studies using state-of-the-art methods.
• Evaluate the quality of published PROM validation studies.

Ph.D.-students and researchers within medicine, public health, epidemiology, sociology, and psychology. A basic knowledge of statistics will be assumed, as will knowledge of simple methods for scale validation corresponding to the contents of the Ph.D. course ‘Introduction to validation of patient reported outcome measures’.

Course webpage: NA

#### Statistical analysis of repeated measurements and clustered data

##### Course director: Julie Forman

ECTS: 4.2 – Language: English

##### Description

This course is concerned with the analysis of correlated quantitative data arising, for example, when collecting data repeatedly on the same persons, animals, or tissue over time or at different locations of the body, or when observations are clustered, as with patients in a multi-center study, siblings, or pups belonging to the same litter. Appropriate statistical models for analysis will be exemplified, and statistical errors arising with other frequently employed analyses will be discussed. Topics include analysis of baseline follow-up studies, longitudinal data analysis, multi-level and variance component models, analysis of cross-over trials, and reproducibility of measurement methods. We will further discuss the potential biases that occur due to missing data and statistical methods for handling these. A thorough introduction to linear mixed models for quantitative outcomes will be given, while generalized linear mixed models and marginal models (aka generalized estimating equations) for the analysis of binary, ordinal, and count data are touched upon more briefly towards the end of the course. Computer exercises with R statistical software will be given.

##### Learning objectives

This advanced statistics course will give you an introduction to the most common repeated measurement designs used in medical research. The aim of the course is to teach you to:

• understand and interpret the analyses of various repeated measurement designs including baseline follow-up studies, cross-over trials, and reproducibility of measurement methods, as well as analyses of clustered designs (e.g. multi-level models), and of mixed type.
• perform your own analyses using R statistical software.
• use model diagnostics to assess the validity of your analyses.
• make suitable presentations of the results from your analyses.
• understand the statistical consequences of different kinds of study designs.

Ph.D.-students with a basic knowledge of statistics, e.g. corresponding to the course “Basic statistics for health researchers” and R programming at beginner level.

Course webpage: https://absalon.ku.dk/courses/47665

#### Targeted Minimum Loss-based Estimation (TMLE) for Causal Inference

##### Course director: Helene Rytgaard

ECTS: 2.8 – Language: English

##### Description

Targeted minimum loss-based estimation (TMLE) is a general framework for estimation of causal effects that combines semiparametric efficiency theory and machine learning in a two-step procedure. The main focus of the course is to understand the overall concept, the theory, and the application of TMLE. Topics covered include:

• The roadmap of targeted learning.
• Basics of causal inference, including counterfactual notation, hypothetical interventions, the g-formula, and the average treatment effect (ATE).
• Causal effect estimation in nonparametric models: target parameters, nuisance parameters, efficient influence functions, asymptotic linearity, and statistical inference based on the efficient influence function.
• TMLE as a two-step procedure involving initial estimation followed by a targeting step.
• Super learning: combining multiple machine learning algorithms via loss-based cross-validation.
• Extensions to more complex data settings: survival outcome, time-dependent confounding, dynamic treatment regimes.
• Basic usage of existing software in R.

##### Learning objectives

A student who has met the objectives of the course will be able to:

• Explain the fundamental principles of statistical inference using targeted minimum loss-based estimation (TMLE) and its application as a general framework for estimation of causal effects.
• Implement TMLE using R software to estimate average treatment effects and time-varying treatment effects based on simulated data, and assess the accuracy and efficiency of the estimators.
• Compare the assumptions and performance of TMLE to related causal inference tools such as inverse probability weighting and standardization, and discuss the strengths and limitations of each approach.
• Evaluate the suitability of super learning and its application in TMLE, and implement the algorithm to improve estimation accuracy.
• Discuss and evaluate the challenges and opportunities in time-varying settings in causal inference, including time-varying treatments and time-dependent confounding, and how TMLE can be used to address these challenges.

The course is relevant for Ph.D.-students with sufficient background in mathematics and statistics. To participate in the practicals, the participants should have knowledge of the statistical software R.

Course webpage: NA

##### Course director: Thomas Scheike

ECTS: 5.6 – Language: English

##### Description

This is a course aimed at Ph.D.-students in biostatistics/statistics.

The course will describe advanced topics for survival data. The first 4 days give a brief introduction and consider regression models for survival data, including Cox’s regression model and alternative models like the additive intensity model. Goodness-of-fit for these models will be discussed. We will also discuss how to deal with multivariate survival data, including frailty models and marginal models. The last 4 days will consider competing risks, multistate models and recurrent events. The course will consist of lectures and computer sessions (using R/SAS) illustrating how the various models can be applied, with focus on the practical implementation and interpretation of the methods. To pass the course, students must satisfactorily complete a take-home exam. We expect students to bring their own laptops.

##### Learning objectives

The aim of the course is to make the participants able to

• do practical survival analyses using R
• understand the theoretical arguments behind the key methods
• theoretically analyse simple extensions of survival models
• understand how to deal with competing risks and multistate models.