Data Analysis in R
Online
DURATION
5 Days
LANGUAGES
English
PACE
Full time
APPLICATION DEADLINE
Request application deadline
EARLIEST START DATE
Request earliest start date
TUITION FEES
EUR 930 *
STUDY FORMAT
Distance Learning
* EUR 930 for professionals | EUR 730 for students and PhD students | EUR 630 for students, PhD students and employees of VU Amsterdam, Amsterdam UMC or an Aurora network partner
Introduction
With the increasing use of statistical languages like R in data analysis, now is the time to get to grips with them!
R is an open-source programming language that has become central to data science because of its versatility in statistics. It is typically used for tasks that call for specialised data analysis, whether on a single machine or in a distributed computing setup, and it is well suited to data exploration. Because it offers many tools and is highly extensible, it can be used for almost any kind of analysis work.
This course focuses on understanding statistical models and analysing the results whilst learning to work with R. As well as introducing the software to newcomers, it presents basic and more advanced statistics.
Course Overview
- Course days: TBA
- Attendance: online only
- Course level: Master's students, PhD candidates, and professionals
- See the course curriculum
- Coordinating lecturer: Andrea Bassi
- Forms of tuition: online lectures and practical sessions
- Language of tuition: English
Please note: students are expected to have a high level of English proficiency for this course (at least equivalent to IELTS 6.5 / Level B2).
- Forms of assessment: written assignment
- Equivalent to 3 ECTS
- Contact hours: 35
- Self-study time: 45 hours
- How to apply: read about our application process
Ideal Students
Who can join?
Final-year Bachelor's students, Master's students, PhD candidates, and professionals who want to learn the basics of R, along with some more advanced skills, and apply them to their own data analysis problems are welcome to apply. A basic knowledge of statistics (at least an undergraduate course) is a prerequisite.
Admissions
Curriculum
Students should apply for Data analysis in R to discover the enormous potential of the open-source programming language R and to develop a series of skills and tools to analyse statistical problems of a diverse nature.
Almost all major organizations and universities use R. Google not only uses R but has also published coding standards for the language that are widely accepted.
We will start with descriptive statistics and visual representation of data, which is the first step for most statistical analyses. We then introduce the linear regression model, a widely used model with two main purposes: modelling relationships among the data and predicting future observations. After that, we will extend the linear model to the generalised linear framework, to analyse non-normally distributed variables.
Each day consists of short lectures with examples and exercises in which you apply what you have learned right away. At the end of the course, there will be a written assignment which will be graded.
Day 1: Introduction
- Introduction to R
- Data handling
- Reading/writing data files
We start by explaining the basics of the R environment and RStudio. You will learn how to work with the main data types in R: vectors, factors, matrices, lists and data frames. You will learn how to create variables, select cases and variables, and produce plots. Simple functions to calculate the mean and the standard deviation are introduced.
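As a rough illustration of the Day 1 topics, the sketch below creates one object of each data type named above, selects cases from a data frame, and computes a mean and a standard deviation. All variable names and values are invented for illustration and are not taken from the course material.

```r
# Hypothetical toy data; names and values are invented for illustration
ages  <- c(23, 35, 41, 29)                        # numeric vector
group <- factor(c("a", "b", "a", "b"))            # factor with two levels
m     <- matrix(1:6, nrow = 2)                    # 2 x 3 integer matrix
info  <- list(name = "survey", n = length(ages))  # list mixing types

# A data frame combines equal-length columns of different types
df <- data.frame(age = ages, group = group)

# Select cases (rows) and variables (columns)
older    <- df[df$age > 30, ]   # rows where age exceeds 30
age_only <- df[, "age"]         # a single column, returned as a vector

# Simple summary functions
mean_age <- mean(df$age)
sd_age   <- sd(df$age)
```

Base R indexing with `[rows, columns]` is used here because it needs no extra packages.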
Day 2: Data & functions
- Data generation
- Data visualization
- Functions and loops
You will read a data file into R and learn how to compute descriptive statistics and frequencies. The functions discussed on the previous day will be applied to a survey dataset. Additionally, we discuss various loop commands that allow you to run complicated tasks on the entire dataset. We introduce vectorization as an alternative to loops: although a loop is often more intuitive, vectorized code is usually much faster. Throughout the course, we will practise these skills by writing functions for the t-test, linear regression and the log-likelihood ratio test.
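To make the contrast between loops and vectorization concrete, here is a minimal sketch that squares every element of a vector in both ways. The data are simulated and the variable names are hypothetical; this is not a course exercise.

```r
set.seed(1)        # make the simulated data reproducible
x <- rnorm(1e5)    # hypothetical numeric variable

# Loop version: handle one element at a time
squares_loop <- numeric(length(x))   # pre-allocate the result vector
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# Vectorized version: the operation is applied to the whole vector at once
squares_vec <- x^2

# Both approaches give the same result
same_result <- isTRUE(all.equal(squares_loop, squares_vec))
```

Wrapping each version in `system.time()` is one way to compare their speed on your own machine.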
Day 3: Simple regression
- Functions and loops
- Simple linear regression
- Comparing means: t-test
We will discuss how the linear model is related to the t-test. You will learn how to interpret the results of a model with a single independent variable, either a dummy or an interval variable, and how to test the assumptions of linear regression.
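The link between the t-test and the linear model can be sketched as follows. The example assumes R's built-in `mtcars` dataset, which is a choice made here for illustration and not necessarily the data used in the course, and it uses the equal-variance form of the t-test, which is the one that matches ordinary linear regression.

```r
# Two-sample t-test assuming equal variances: mpg by transmission type (am = 0/1)
tt <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)

# The same comparison as a linear model with one dummy predictor
fit   <- lm(mpg ~ am, data = mtcars)
coefs <- summary(fit)$coefficients

# The slope equals the difference between the two group means
slope     <- coefs["am", "Estimate"]
mean_diff <- mean(mtcars$mpg[mtcars$am == 1]) - mean(mtcars$mpg[mtcars$am == 0])

# The test statistics agree up to sign; the p-values are identical
t_from_ttest <- abs(unname(tt$statistic))
t_from_lm    <- abs(coefs["am", "t value"])
p_from_ttest <- tt$p.value
p_from_lm    <- coefs["am", "Pr(>|t|)"]
```

The sign of the t statistic differs only because `t.test()` subtracts the groups in the opposite order from the regression slope.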
Day 4: Multiple regression
- Theory and assumptions of regression models
- Multiple linear regression
This day builds on Day 3, which covered simple regression. The multiple regression model adds the concept of ‘ceteris paribus’: the effect of one variable is interpreted while the other variables are held constant. We will also cover confounding and interaction effects, and when and how to use mean centering.
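A small sketch of mean centering in a model with an interaction effect is given below. It again assumes the built-in `mtcars` dataset and hypothetical variable names (`wt_c`, `hp_c`); the choice of predictors is purely illustrative.

```r
d <- mtcars

# Mean-center the two predictors (hypothetical names wt_c and hp_c)
d$wt_c <- d$wt - mean(d$wt)
d$hp_c <- d$hp - mean(d$hp)

# Multiple regression with an interaction, before and after centering
fit_raw      <- lm(mpg ~ wt * hp, data = d)
fit_centered <- lm(mpg ~ wt_c * hp_c, data = d)

# Centering changes what the main effects mean (the effect of one predictor
# when the other is at its mean instead of at zero), but it leaves the
# interaction coefficient and the fitted values unchanged
int_raw      <- unname(coef(fit_raw)["wt:hp"])
int_centered <- unname(coef(fit_centered)["wt_c:hp_c"])
same_fit     <- isTRUE(all.equal(fitted(fit_raw), fitted(fit_centered)))
```

Comparing `coef(fit_raw)` with `coef(fit_centered)` shows which estimates move and which stay put.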
Day 5: Logistic regression
- χ² (chi-squared) test
- Logistic Regression
- Recap
We will introduce logistic regression as part of the generalized linear framework. We will calculate the odds ratio and discuss how it is related to the chi-squared test and logistic regression. Furthermore, we will discuss the log-likelihood ratio test, which is used to compare two or more models.
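The relationships mentioned above can be sketched with two binary variables. The example assumes the built-in `mtcars` dataset and treats `am` as the outcome and `vs` as the predictor; both choices are illustrative assumptions, not course material.

```r
# 2 x 2 table of two binary variables
tab <- table(vs = mtcars$vs, am = mtcars$am)

# Odds ratio computed directly from the cell counts
or_table <- (tab["0", "0"] * tab["1", "1"]) / (tab["0", "1"] * tab["1", "0"])

# Logistic regression: the exponentiated slope is the same odds ratio
fit1   <- glm(am ~ vs, data = mtcars, family = binomial)
or_glm <- exp(coef(fit1)[["vs"]])

# Chi-squared test of independence on the same table (no continuity correction)
chisq <- chisq.test(tab, correct = FALSE)

# Log-likelihood ratio test: compare the model with vs to an intercept-only model
fit0    <- glm(am ~ 1, data = mtcars, family = binomial)
lr_stat <- as.numeric(2 * (logLik(fit1) - logLik(fit0)))
lr_p    <- pchisq(lr_stat, df = 1, lower.tail = FALSE)  # one extra parameter
```

`anova(fit0, fit1, test = "Chisq")` performs the same likelihood ratio comparison in a single call; the statistic is computed by hand here to show where it comes from.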
Assignment(s) and Grading
At the end of the course, you are required to complete a graded assignment. The exercises and the assignment focus on coding in R and on applying and interpreting generalized linear regression models. The deadline for submitting the assignment is Friday 19 January at 23:59 CET.
Program Outcome
By the end of the course, students will be acquainted with various popular R packages and will be able to perform different statistical analyses, write their own functions and use attractive plots to present their data.
Program Tuition Fee
Program Leaders
Why study at VU Amsterdam Graduate Winter School
VU Graduate Winter School offers a selection of high-level short courses with a very specific research focus.
Our intensive winter courses are an excellent opportunity to hone your academic research skills and boost your employability. You will be joined by leading academics and other like-minded students and professionals all eager to dive into the specific research topics of each course.
Our winter school programme is completely online and delivered in English. Each course has its own schedule, details of which can be found on the overview page for each course.
Program delivery
The course runs for five days, from 8 to 12 January, with sessions from 9 AM to 1 PM (CET). Some exercise sessions will be scheduled during the afternoons (self-work with live Q&A in a virtual classroom environment).
The course will be taught through online lectures and exercise classes, and students are expected to dedicate approximately 45 additional hours to self-study.