# Ph.D.-courses from Section of Biostatistics

Below is a list of Ph.D. courses offered by the Section of Biostatistics each year.

To sign up you must use the course website of the Ph.D.-school.

Note that a course typically appears on the Ph.D.-school page only a few months before the course starts.

Course secretary for all courses: Susanne Kragskov Laupstad, e-mail: skl@sund.ku.dk

**NB:** ECTS may vary a little from year to year.

## Spring

**Topics covered:**

Basic statistical concepts (probability, distribution, estimation, test of significance). Analysis of quantitative measurements (group comparisons, regression and the general linear model). Sample size determination. Categorical data (association in two-way tables, logistic regression analysis). Survival analysis (Kaplan-Meier, Cox regression). Correlated measurements, longitudinal data.

We have different courses for the different statistical software packages (SPSS, SAS, R).

**Learning objectives**

After finishing the course, the participants will

- have a general feeling for the ideas in a statistical model and the type of conclusions that can be drawn from the subsequent statistical analysis
- be able to understand and interpret the results of basic statistical procedures (t-test, associations in 2x2 tables, linear regression, multiple linear regression, logistic regression, survival analysis)
- know the assumptions involved in the basic statistical procedures, and why these are not all equally important, depending on the aim of the analysis
- be able to carry out these basic statistical procedures using one of the mainstream statistical software packages
- have the tools required to detect gross misfit of the model and to remedy it, for example by transformation of outcome and/or covariates
- have a thorough understanding of the concepts of confounding and interaction, preferably in the context of their own work
- know about estimation of association parameters, statistical significance and power, so that they can write the statistical methods and results sections for their own research reports (limited to basic statistical procedures)
- know when to seek expert help.

**Topics covered:** The course covers several study designs (cohort studies, case-control studies), statistical regression models (logistic, Poisson, Cox), and methods for dealing with confounding (stratification, adjustment, standardization, matching) commonly used in epidemiology. Concepts and the use of statistical models are illustrated using examples drawn from medical research.

**Learning objectives:**

The purpose of the course is to provide an introduction to modern methods in epidemiology as used in medical research and based on biostatistics. On conclusion of the course the participants should be able to read and understand modern epidemiological methods and design and analyse standard studies.

**Content and learning objectives:**

Many modern research projects collect data and use experimental designs that require advanced statistical methods beyond what is taught as part of the curriculum in introductory statistical courses.

This course covers some of the more general statistical models based on ideas from Bayesian statistics. These methods are suitable for analyzing the complex data and experimental designs encountered in health research and include supervised and unsupervised machine learning methods, neural networks, support-vector machines, and network analysis.

The course will contain equal parts theory and applications and consists of four full days of teaching and computer lab exercises. It is the intention that the participants will have a good understanding of the statistical methods presented and be able to apply them in practice after having followed the course. This course is aimed at health researchers with previous knowledge of statistics and the computer language R who need an overview of appropriate analytical methods and discussions with statisticians to be able to solve their problems.

Note that there are two courses entitled “Advanced Statistical Topics in Health Research”. They have no overlap and can be taken independently of each other.

- Introduction to Bayesian statistics and the difference between frequentist and Bayesian statistics.
- Credibility intervals, prior and posterior distributions
- Bayesian classifiers
- Markov-chain Monte Carlo (MCMC) estimation
- Empirical Bayes estimators

- Neural networks
  - Supervised vs unsupervised machine learning methods
  - Logistic regression
  - Fitting neural networks
  - Introduction to deep learning
- Support-vector machines
  - Fitting SVMs
  - Multiclass and non-linear SVMs
- Network analysis
  - Introduction to graphs and graph theory
  - Visualizing graphs
  - Identifying communities
  - Latent variable models

**Learning objectives:**

Bioinformatics is concerned with the study of the inherent structure of biological information, and statistical methods are the workhorses in many of these studies. Some of this inherent structure is very obvious and can be observed directly through correlations of patterns in high-dimensional data, while other patterns arise through more complicated underlying relationships.

This course covers some of the basic and novel statistical models and methods suitable for analysing high-dimensional data, in particular high-dimensional data whose analysis relies heavily on statistical methods. The course will contain equal parts theory and applications and consists of five full days of teaching and computer lab exercises. It is the intention that the participants will have a thorough understanding of the statistical methods and be able to apply them in practice after having followed this course.

A student who has met the objectives of the course will be able to:

- Analyse data from a bioinformatics experiment using the methods described below and draw valid conclusions based on the results obtained.
- Understand the advantages/disadvantages of the methods presented and be able to discuss potential pitfalls from using these methods.
- Develop new methods that can be used to analyse novel types of bioinformatics data.

**Topics covered:**

*Brief overview of molecular data. Introduction to statistical methods for high-dimensional data, linear models and regularization methods*

- Big-p small-n problems
- Multiple testing techniques (inference correction, false discovery rates, q-values)
- The correlation vs. causation and prediction vs. hypothesis differences
- Penalized regression approaches, principal component regression

*Analysis of mapped reads from mRNA data*

- General assembly
- Dynamic programming of pairwise alignment
- Alignment methods for mRNA data
- Poisson methods for expression quantification and transcript distribution

*Genome-wide association studies*

- Multiple testing problems
- Imputation
- Common variants vs rare variants. Sequence Kernel Association Test
- Regularization methods, SVM
- Enrichment approaches, gene-set analyses

*Network biology*

- Quality assessment and heterogeneous data integration
- Biomedical text mining (named entity recognition & co-occurrence analysis)
- Network analysis with STRING and Cytoscape

*Analysis of array data and integrative data analysis*

- Zero-inflated and hurdle models (microbiome data and RNA-seq revisited)
- DNA variant calling
- Gene expression analyses
- Matrix factorization
- Combining data from multiple platforms and experiments
- Inference methods for combined (and simultaneous) data

**Learning objectives:**

The aim of the course is to make the participants able to:

- do simple survival analyses
- critically read medical papers using survival analysis techniques
- understand and interpret the outcome of survival analyses

**Content:**

Kaplan-Meier estimation, log-rank test, stratified analysis, Cox-regression. Censoring and truncation. Competing risks. Practical implementation of the techniques through computer labs and home assignments.

**Learning objectives:**

A student who has met the objectives of the course will be able to:

- Use statistical methods for data analysis in SAS
- Use SAS for simple data management
- Generate tables and figures for publications

**Aim and content:**

The course covers fundamental use of the statistical software package SAS, from data handling through descriptive statistics and standard methods to an introductory description of the regression procedures. Approximately half the time will be reserved for hands-on exercises. Some emphasis will be put on explaining the theoretical foundation and the applicability of the methods in example problems.

There will be a take-home exam, which will be evaluated to determine whether you pass the course.

**Learning objectives:**

The aim of the course is to make the participants able to:

- read simple data into SPSS
- do simple data manipulation in SPSS
- do merging and restructuring
- do reproducible research using SPSS dialogues and syntax

**Content:**

Topics covered include: Introduction to SPSS’s way of dealing with data and output. Definition of variables. Entering data. Recoding of existing variables and calculation of new variables. Producing tables and diagrams. SPSS syntax files. Statistical analyses (e.g. regression analysis and analysis of variance).

**Learning objectives:**

The aim of the course is to make the participants able to:

- use programming principles (loops and functions) to handle repetitive tasks
- use functions in R
- use loops in R
- do efficient data manipulation and aggregation

**Content:**

The course covers use of the statistical software package R. The aim is to take the intermediate R user to the next level by using programming techniques for more efficient use of R. A key focus is on introducing loops and functions. The course will have four half-day lectures, after which the students will work on some exercises. This will give the students a chance to use and work with different aspects of R.

**Learning objectives:**

A student who has met the objectives of the course will be able to:

- understand and assess a Bayesian modelling strategy, and discuss its underlying assumptions
- rigorously describe expert knowledge by a quantitative prior distribution
- perform a Bayesian regression using R, applied to meta-analysis
- put into perspective the results from a Bayesian analysis described in a scientific article

**Content:**

Bayesian analysis is a statistical tool that is becoming increasingly popular in biomedical sciences. Notably, Bayesian approaches have become commonly used in adaptive designs for Phase I/II clinical trials, in meta-analyses, and also in transcriptomics analysis. This course provides an introduction to Bayesian tools, with an emphasis on biostatistics applications, in order to familiarize students with such methods and their practical applications. Thanks to its rich and flexible modelling possibilities and intuitive interpretation, the Bayesian framework is appealing, especially when observations are scarce. It can adaptively incorporate information as it becomes available, an important feature for early phase clinical trials. For example, adaptive Bayesian designs for Phase I/II trials reduce the chances of unnecessarily exposing participants to inappropriate doses and have better decision-making properties compared to the standard rule-based dose-escalation designs. Besides, the use of a Bayesian approach is also very appealing in meta-analyses because of:

- the often relatively small number of studies available,
- its flexibility,
- and its better handling of heterogeneity from aggregated results, especially in network meta-analyses.

Thanks to modern computing tools, practical Bayesian analysis has become relatively straightforward, which is contributing to its increasing popularity. JAGS is flexible software interfaced with R that makes it easy to specify a Bayesian model and that automatically performs inference for the posterior distributions of the parameters, as well as producing graphical output to monitor the quality of the analysis.

The aim of the course is to provide insights into Bayesian statistics in the context of medical studies. We will cover the following topics:

1) Bayesian modeling (prior, posterior, likelihood, Bayes theorem);

2) Bayesian estimation (Credibility Intervals, Maximum a Posteriori, Bayes factor);

3) Bayesian applications to meta-analyses;

4) Practical Bayesian analysis with the R and JAGS software;

5) Critical reading of medical publications.

All concepts will be illustrated with real-life examples from the medical literature.

**Learning objectives:**

A student who has met the objectives of the course will be able to:

- Know the basic principles for validation of patient reported outcome measures (PROMS) using item response theory (IRT) models and confirmatory factor analysis (CFA) models.
- Validate simple PROMS using state-of-the-art methods.
- Evaluate the quality of published PROMS validation studies.

**Content:**

The course introduces psychometric models for validation of index scales summarizing information from several items. The course covers confirmatory factor analysis (CFA) models, item response theory (IRT) models, and Rasch measurement models. Detection and modelling of differential item functioning and local dependence are discussed. The computer exercises use SAS or R.

**Learning objectives:**

A student who has met the objectives of the course will be able to:

- Know the basic principles for scale validation.
- Compute simple indicators of patient reported outcome measures (PROMS) validity.
- Do a simple confirmatory factor analysis to evaluate the quality of PROMS.

**Content:**

The course introduces simple methods for validation of index scales that summarize information from several items. Examples from Patient Reported Outcomes are used. The course covers classical psychometrics, confirmatory factor analysis, and methods for detection of differential item functioning. The computer exercises use SAS or R, but most of the methods discussed are relatively simple and can be done using SPSS or Stata.

Illustrative examples are drawn from existing PROMS used in clinical research.

**Learning objectives:**

To enable the student to work with the statistical software R. R is a free (based on open source principles) statistical software package, which is supported by a very large international research community. The program can be used to do all types of statistical analyses. Due to the open principles and the large supporting research community, the latest statistical techniques are almost always available in R long before they appear in the commercial programs (SAS, SPSS, Stata).

**Content:**

We will explain basic use of R, from data management through descriptive statistics and standard analysis to downloading and using packages with the latest statistical techniques from the web. Roughly half of the course will be spent doing exercises in the computer rooms or using your own laptop. The main focus will be on the use of R, but there will also be time for discussing more statistical aspects of the methods. The whole course will be structured around a few real-life cases.

## Autumn

**Topics covered:**

Basic statistical concepts (probability, distribution, estimation, test of significance). Analysis of quantitative measurements (group comparisons, regression and the general linear model). Sample size determination. Categorical data (association in two-way tables, logistic regression analysis). Survival analysis (Kaplan-Meier, Cox regression). Correlated measurements, longitudinal data.

We have different courses for the different statistical software packages (SPSS, SAS, R).

**Learning objectives:**

After finishing the course, the participants will

- have a general feeling for the ideas in a statistical model and the type of conclusions that can be drawn from the subsequent statistical analysis
- be able to understand and interpret the results of basic statistical procedures (t-test, associations in 2x2 tables, linear regression, multiple linear regression, logistic regression, survival analysis)
- know the assumptions involved in the basic statistical procedures, and why these are not all equally important, depending on the aim of the analysis
- be able to carry out these basic statistical procedures using one of the mainstream statistical software packages
- have the tools required to detect gross misfit of the model and to remedy it, for example by transformation of outcome and/or covariates
- have a thorough understanding of the concepts of confounding and interaction, preferably in the context of their own work
- know about estimation of association parameters, statistical significance and power, so that they can write the statistical methods and results sections for their own research reports (limited to basic statistical procedures)
- know when to seek expert help.

**Learning objectives:**

After finishing the course, the participants will

- have a general feeling for the ideas in a statistical model and the type of conclusions that can be drawn from the subsequent statistical analysis.
- be able to understand and interpret the results of basic statistical procedures (t-test, Wilcoxon rank test, associations in 2x2 tables, linear regression, multiple linear regression, logistic regression, survival analysis).
- know the assumptions involved in the basic statistical procedures, and why these are not all equally important, depending on the aim of the analysis.
- be able to carry out these basic statistical procedures using one of the mainstream statistical software packages.
- have learned about graphical tools for assessing the fit of the statistical model and basic recipes to make remedies such as transformation of outcome and/or covariates.
- have a thorough understanding of the concepts of confounding and interaction, preferably in the context of their own work.
- know about estimation of association parameters, statistical significance and power, so that they can write the statistical methods and results sections for their own research reports (limited to basic statistical procedures).
- know when to seek expert help.

The participants are invited to bring their own data to the exercise sessions, since in quiet moments there will be limited opportunity to discuss these in the light of the topics covered in the course.

**Content**

**Topics covered:**

Descriptive statistics: mean, standard deviation, quantiles, percentages. Basic concepts of statistical inference: parameter estimate, confidence interval, p-value, significance level, power, multiple testing. Analysis of quantitative measurements: group comparisons, regression and the general linear model. Categorical data: association in two-way tables, logistic regression analysis. Survival analysis: Kaplan-Meier, Cox regression. Sample size determination, regression to the mean, confounding and interaction, association versus causation.

**Statistical software:**

The focus of this course is not on how to use statistical software. However, statistical software is needed for all data analyses and examples that illustrate the statistical methods. It is expected that students learn the syntax and semantics of a suitable statistical software program by themselves before and during the course. Note that this will often mean many extra hours of preparation and self-training in addition to the actual teaching hours. The free statistical software R is used to illustrate the practicals and for tutorials throughout the course.

For participants who do not know any statistical package before the course starts, it is strongly recommended that they work with R via RStudio (https://www.rstudio.com/), which is a user-friendly, platform-independent interface to R. They are also expected to start working with R syntax and semantics several weeks before the course starts. A minimum level corresponding to that obtained after completing our online introduction to R at http://r.sund.ku.dk/ is considered a prerequisite. In this introduction, we guide you through how to install R, how to load data, and how to do data manipulation, simple calculations, and plots. Estimated number of hours to complete the introduction: 15 +/- 5 hours, depending on your R and technical skills. You can start working with the introduction now if you have limited time before the course starts.

Participants can also work with one of the following alternative statistical software packages (SAS, Stata, SPSS); however, this is only recommended if they have considerable experience with the software before the course starts. The following should be noted: the teachers will be able to answer statistical questions regarding the output of all statistical software packages, but technical questions (e.g., about coding) only for the software R.

The participants are expected to use their own laptops during the course, to have installed all relevant software and to have downloaded all data for use during the course.

**Aim and learning objectives:**

The course builds on the Ph.D.-course in Epidemiological methods in medical research. The purpose is to give an introduction to more advanced statistical methods frequently applied in epidemiological studies. After completing the course the participants will:

- be able to analyse data from classical cohort studies using Poisson or Cox regression and data from case-control studies using ordinary or conditional logistic regression
- know about the advantages of using cohort data sampled as a nested case-control study or a case-cohort study
- know about methods for analysing clustered data
- know about methods to account for competing risks and recurrent events in follow-up studies
- know about the basic concepts for causal inference

**Content:**

Review of logistic regression, Poisson regression, and Cox regression. Time-dependent exposure variables. Conditional logistic regression for matched case-control studies. Alternative designs of cohort studies: nested case-control and case-cohort studies. Analysis of correlated data. Longitudinal studies. Competing risks. Recurrent events. Introduction to causal inference.

This six-day intensive course is aimed at Ph.D. students in biomedical research who work in a laboratory or similar setting, performing experiments on e.g., cells, tissues, mice, or human volunteers. When participating in this course, you will get a working knowledge of statistical concepts, methods of analysis, and adequate ways of presenting statistical results, as well as hands-on experience in analysing experimental data with R statistical software. We will also explain some of the most common errors biomedical researchers make in their statistical analyses. In summary, we aim at teaching you high-quality statistics suitable for research publications.

**Learning objectives:**

A student who has met the objectives of the course will be able to:

- Have a qualified discussion with a statistical consultant, e.g. on how to plan the analyses for a research project or how to answer the concerns raised by a reviewer.
- Interpret basic statistical information from research papers, e.g. descriptive statistics, effect estimates, confidence intervals and p-values.
- Apply the most frequently used statistical analyses to real life experimental data using the statistical software R (see contents section for the specific analyses taught in this course).
- Present statistical results in suitable figures, tables, and words.
- Critically assess the validity of the most frequently used statistical analyses by being aware of their modelling assumptions and limitations.

**Learning objectives:**

To enable the student to work with the statistical software R. R is a free (based on open source principles) statistical software package, which is supported by a very large international research community. The program can be used to do all types of statistical analyses. Due to the open principles and the large supporting research community, the latest statistical techniques are almost always available in R long before they appear in the commercial programs (SAS, SPSS, Stata).

**Content:**

We will explain basic use of R, from data management through descriptive statistics and standard analysis to downloading and using packages with the latest statistical techniques from the web. Roughly half of the course will be spent doing exercises in the computer rooms or using your own laptop. The main focus will be on the use of R, but there will also be time for discussing more statistical aspects of the methods. The whole course will be structured around a few real-life cases.

**Learning objectives:**

The aim of the course is to make the participants able to:

- use programming principles (loops and functions) to handle repetitive tasks
- use functions in R
- use loops in R
- do efficient data manipulation and aggregation

**Content:**

The course covers use of the statistical software package R. The aim is to take the intermediate R user to the next level by using programming techniques for more efficient use of R. A key focus is on introducing loops and functions. The course will have four half-day lectures, after which the students will work on some exercises. This will give the students a chance to use and work with different aspects of R.

**Learning objectives:**

The aim of the course is to make the participants able to:

- do simple survival analyses
- critically read medical papers using survival analysis techniques
- understand and interpret the outcome of survival analyses

**Content:**

Kaplan-Meier estimation, log-rank test, stratified analysis, Cox-regression. Censoring and truncation. Competing risks. Practical implementation of the techniques through computer labs and home assignments.

**Aim and study objectives:**

This advanced statistics course will give you an introduction to the most common repeated measurement designs used in medical research. The aim of the course is to teach you to:

- understand and interpret the analyses of various repeated measurement designs including baseline follow-up studies, cross-over trials, and reproducibility of measurement methods, as well as analyses of clustered designs (e.g., multi-level models), and of mixed type.
- perform your own analyses using either SAS or R statistical software.
- use model diagnostics to assess the validity of your analyses.
- make suitable presentations of the results from your analyses.
- understand the statistical consequences of different kinds of study designs.

**Content:**

This course is concerned with the analysis of correlated quantitative data arising e.g., when collecting data repeatedly on the same persons, animals, or tissue over time or on different locations of the body, or when observations are clustered, as with patients in a multi-center study, siblings, or pups belonging to the same litter. Appropriate statistical models for analysis will be exemplified, and statistical errors arising with other frequently employed analyses will be discussed. Topics include analysis of baseline follow-up studies, longitudinal data analysis, multi-level and variance component models, analysis of cross-over trials, and reproducibility of measurement methods. We will further discuss the potential biases that occur due to missing data and statistical methods for handling these. A thorough introduction to linear mixed models for quantitative outcomes will be given, while generalized linear mixed models and marginal models (aka generalized estimating equations) for the analysis of binary, ordinal, and count data are touched upon more briefly towards the end of the course. Computer exercises with SAS/R statistical software will be given.

**Statistical software:**

You must bring your own laptop with either SAS or R installed (or access to SAS Studio) to participate in the exercises. Note that if you have never used SAS/R before we strongly recommend that you complete a course on SAS/R programming before attending this course.