
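Student’s paired t-test may be used to compare the means of two dependent samples. The test works on the differences between paired observations, so its key assumption is that these differences are approximately normally distributed. In this example we run the same checks as for Student’s t-test for independent samples: that each sample is normally distributed and that the two samples have equal variances.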

The function in R is t.test(x, y, paired = TRUE), where x and y are the vectors containing the samples and paired = TRUE indicates that the samples are dependent. Note that paired = FALSE is the default in t.test(), which is why one does not need to (but can) write paired = FALSE when running the test on independent samples.
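To make the role of paired = TRUE concrete, here is a minimal sketch showing that the paired test gives the same result as a one-sample t-test on the pairwise differences. The vectors before and after are hypothetical illustrative values, not the weather data used later.

```r
# Hypothetical paired measurements (illustrative values only)
before <- c(5.1, 6.3, 4.8, 7.2, 6.0, 5.5)
after  <- c(5.6, 6.5, 5.4, 7.5, 6.6, 5.8)

# Paired t-test on the two vectors
paired_res <- t.test(before, after, paired = TRUE)

# One-sample t-test on the pairwise differences, testing against a mean of 0
diff_res <- t.test(before - after, mu = 0)

# Compare the test statistic and the p-value of the two approaches
print(all.equal(unname(paired_res$statistic), unname(diff_res$statistic)))
print(all.equal(paired_res$p.value, diff_res$p.value))
```

Both comparisons should report TRUE, which is why the order of the values in x and y matters: each element of x must belong to the same pair as the element at the same position in y.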

Let’s take an example. We have access to weather records for the period April 5th to 15th, 2016 at Florida, close to Bergen City Centre. Daily mean temperatures were recorded by two stations relatively close to each other. The data from the first station are provided by GFI and the data from the second station by Yr. Our whole data set thus consists of pairs of measurements, one pair per day, for the same location and the same period. We may thus run Student’s paired t-test, but we still need to check for normal distribution (Shapiro-Wilk test) and equal variances (Fisher’s F test):

## 
##  Shapiro-Wilk normality test
## 
## data:  GFI
## W = 0.96474, p-value = 0.8291

## 
##  Shapiro-Wilk normality test
## 
## data:  Yr
## W = 0.97059, p-value = 0.8924

## 
##  F test to compare two variances
## 
## data:  GFI and Yr
## F = 1.1771, num df = 10, denom df = 10, p-value = 0.8016
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.3166883 4.3749041
## sample estimates:
## ratio of variances 
##           1.177065

All p-values are greater than 0.05, so we find no evidence against normality or against equal variances, and we proceed as if both assumptions hold.
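Checks of this kind can be run with shapiro.test() and var.test() from base R. The sketch below is only an illustration: station_a and station_b are hypothetical placeholder vectors of 11 values, not the actual GFI and Yr records, which would take their place in practice. Since a paired test operates on the pairwise differences, the sketch also checks their normality in the same way.

```r
# Hypothetical placeholder data (NOT the real GFI / Yr temperatures)
station_a <- c(4.2, 5.1, 6.8, 7.4, 5.9, 6.3, 8.1, 7.0, 5.5, 6.6, 4.9)
station_b <- c(4.5, 5.6, 7.2, 8.0, 6.1, 6.8, 8.5, 7.7, 5.8, 7.1, 5.3)

# Shapiro-Wilk normality test on each sample
sw_a <- shapiro.test(station_a)
sw_b <- shapiro.test(station_b)

# Shapiro-Wilk test on the pairwise differences
sw_diff <- shapiro.test(station_a - station_b)

# F test comparing the two variances
f_res <- var.test(station_a, station_b)

print(sw_a)
print(sw_b)
print(sw_diff)
print(f_res)
```

With 11 observations per sample, the F test reports 10 numerator and 10 denominator degrees of freedom, as in the output above.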

Let’s visualize the data with a boxplot:

Here, the medians are relatively close to each other but not equal, and the spread is rather similar. Let’s look at the data with a line plot:

It seems like there is a rather small, but constant gap between the lines. Is that enough to declare that the two sample means are significantly different?
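Plots like the two described above can be drawn with base R graphics. This is a sketch under stated assumptions: station_a and station_b are hypothetical placeholder values standing in for the GFI and Yr vectors, and the axis labels and colours are arbitrary choices.

```r
# Hypothetical placeholder data (NOT the real GFI / Yr temperatures)
station_a <- c(4.2, 5.1, 6.8, 7.4, 5.9, 6.3, 8.1, 7.0, 5.5, 6.6, 4.9)
station_b <- c(4.5, 5.6, 7.2, 8.0, 6.1, 6.8, 8.5, 7.7, 5.8, 7.1, 5.3)
days <- 5:15  # day of the month, matching the April 5th to 15th period

# Boxplot: compares medians and spread of the two samples
bp <- boxplot(list(A = station_a, B = station_b),
              ylab = "Daily mean temperature")

# Line plot: one line per station, so the day-by-day gap becomes visible
plot(days, station_a, type = "b", col = "blue",
     ylim = range(c(station_a, station_b)),
     xlab = "Day of April", ylab = "Daily mean temperature")
lines(days, station_b, type = "b", col = "red")
legend("topleft", legend = c("A", "B"), col = c("blue", "red"), lty = 1)
```

The boxplot discards the pairing, whereas the line plot keeps each day's two measurements side by side, which is the view that matters for a paired test.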

Let’s test it with Student’s paired t-test:

t.test(GFI, Yr, paired = TRUE)

## 
##  Paired t-test
## 
## data:  GFI and Yr
## t = -7.0695, df = 10, p-value = 3.418e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.5619396 -0.2926059
## sample estimates:
## mean of the differences 
##              -0.4272727

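So, the test results in a p-value well below 0.05, indicating that the null hypothesis (the true mean of the differences is 0) may be rejected. The 95 percent confidence interval for the mean difference, from about -0.56 to -0.29, does not include 0, which is consistent with that conclusion.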
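The numbers in this output are tied together by the paired t statistic, t = mean(d) / (sd(d) / sqrt(n)), where d holds the pairwise differences. As a rough check, the p-value and the confidence interval can be recovered from the printed t, df and mean of the differences alone. The variable names below are illustrative, and small discrepancies are expected because the printed values are rounded.

```r
# Values copied from the paired t-test output above
mean_diff <- -0.4272727  # mean of the differences
t_stat    <- -7.0695     # t statistic
df        <- 10          # degrees of freedom, i.e. n - 1 with n = 11 pairs

# Standard error of the mean difference implied by t = mean_diff / se
se_diff <- mean_diff / t_stat

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

# 95 percent confidence interval: mean_diff +/- critical value * se
ci <- mean_diff + c(-1, 1) * qt(0.975, df) * se_diff

print(se_diff)
print(p_value)
print(ci)
```

The recovered p-value and interval should agree with the printed 3.418e-05 and (-0.5619396, -0.2926059) up to rounding.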

Note: If you wonder whether the argument paired = TRUE makes a difference or not, let’s try the test without it:

t.test(GFI, Yr)

## 
##  Welch Two Sample t-test
## 
## data:  GFI and Yr
## t = -0.7122, df = 19.869, p-value = 0.4846
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.6792380  0.8246926
## sample estimates:
## mean of x mean of y 
##  6.236364  6.663636

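This time, the p-value is greater than 0.05 and H0 can no longer be rejected. Without paired = TRUE, R treats the samples as independent (and, by default, runs Welch’s version of the test), so the day-to-day variation in temperature swamps the small but consistent offset between the two stations. It is thus important to know whether your samples are dependent; the conclusion of the test relies on it.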
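One way to see where the discrepancy comes from is to recover the standard error that each test used: both t statistics divide the same mean difference by a different standard error. The sketch below uses only numbers printed in the two outputs above; the variable names are illustrative, and results are approximate because the printed values are rounded.

```r
# Values copied from the two test outputs above
mean_x   <- 6.236364   # mean of x (GFI), unpaired output
mean_y   <- 6.663636   # mean of y (Yr), unpaired output
t_welch  <- -0.7122    # t statistic, Welch two-sample test
t_paired <- -7.0695    # t statistic, paired test

# Both tests estimate the same difference in means
mean_diff <- mean_x - mean_y

# Standard error implied by each test, from t = mean_diff / se
se_welch  <- mean_diff / t_welch
se_paired <- mean_diff / t_paired

print(mean_diff)
print(se_welch)
print(se_paired)
print(se_welch / se_paired)
```

The unpaired standard error reflects the full spread of temperatures across days, while the paired one reflects only the spread of the day-by-day differences, which is much smaller here.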