
Classical Tests

Exercises

Exercises - Solutions

   > sumdf <- data %>%
   +     group_by(Subject,Sex,Age_PRETEST,testid) %>%
   +     summarise(count=n(),
   +               n.corr = sum(Stim.Type=="hit"),
   +               perc.corr = n.corr/count,
   +               mean.ttime = mean(TTime),
   +               sd.ttime = sd(TTime),
   +               se.ttime = sd.ttime/sqrt(count))
   > head(sumdf)
     Subject Sex Age_PRETEST testid count n.corr perc.corr mean.ttime
   1       1   f        3.11  test1    95     63 0.6631579   8621.674
   2       1   f        3.11      1    60     32 0.5333333   9256.367
   3       1   f        3.11      2    59     32 0.5423729   9704.712
   4       1   f        3.11      3    60     38 0.6333333  14189.550
   5       1   f        3.11      4    59     31 0.5254237  13049.831
   6       1   f        3.11      5    59     33 0.5593220  14673.525
   Variables not shown: sd.ttime se.ttime (dbl)

four possible situations

  Conclusion \ Situation    H_0 is true         H_0 is false
  H_0 is not rejected       Correct decision    Type II error
  H_0 is rejected           Type I error        Correct decision
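
A minimal sketch of how both error rates can be estimated by simulation (the sample size, effect size and alpha below are illustrative assumptions, not taken from the course data):

   set.seed(1)
   alpha <- 0.05
   ## H_0 (mu = 0) is true: every rejection is a Type I error
   p.null <- replicate(1000, t.test(rnorm(30, mean = 0),   mu = 0)$p.value)
   mean(p.null < alpha)     # estimated Type I error rate, close to alpha
   ## H_0 is false (true mean 0.5): every non-rejection is a Type II error
   p.alt  <- replicate(1000, t.test(rnorm(30, mean = 0.5), mu = 0)$p.value)
   mean(p.alt >= alpha)     # estimated Type II error rate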

Common symbols

  n       number of observations (sample size)
  K       number of samples (each having n elements)
  alpha   level of significance
  nu      degrees of freedom
  mu      population mean
  xbar    sample mean
  sigma   standard deviation (population)
  s       standard deviation (sample)
  rho     population correlation coefficient
  r       sample correlation coefficient
  Z       standard normal deviate

Alternatives

attachment:twosided.pdf

attachment:greater.pdf

attachment:less.pdf
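
The three figures illustrate the two-sided, greater and less alternatives. As a small sketch (the Z score of 1.6 is purely illustrative), the corresponding tail probabilities can be computed with pnorm():

   z <- 1.6                          # illustrative Z score
   2 * pnorm(-abs(z))                # alternative: two.sided
   pnorm(z, lower.tail = FALSE)      # alternative: greater
   pnorm(z)                          # alternative: less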

The p-value is the probability, under the null hypothesis, of obtaining a sample estimate (of the respective estimator) at least as extreme as the one observed. The p-value is NOT the probability that the null hypothesis is true.
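
In symbols, for a two-sided test based on an observed Z score $z_{obs}$:

$$p = P\left(|Z| \geq |z_{obs}| \mid H_0\right)$$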

Z-test for a population mean

The z-test is similar to a t-test, but it assumes that almost everything about the population is known, i.e. the idealised case. It uses the standard normal distribution as its test statistic and is therefore a good introductory example. It investigates the significance of the difference between an assumed population mean $\mu_0$ and a sample mean $\bar{x}$ when the population variance $\sigma^2$ is known.
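
Under these assumptions the test statistic (the one implemented in the ztest() function below) is

$$Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

which follows a standard normal distribution under the null hypothesis.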

Exercise

Write a function which takes a vector, the population standard deviation and the population mean as arguments and which gives the Z score as result.

Add a line to your function that allows you to also process numeric vectors containing missing values.

The function pnorm(Z) gives the probability of $x \leq Z$. Change your function so that it returns the p-value (for a two-sided test).

Now let the result be a named vector containing the estimated difference, Z, the p-value and n.

You can always test your function using simulated values: rnorm(100, mean = 0) gives you a vector containing 100 normally distributed values with mean 0.

Z-test for a population mean - solution

Write a function which takes a vector, the population standard deviation and the population mean as arguments and which gives the Z score as result.

   > ztest <- function(x,x.sd,mu=0){
   +     sqrt(length(x)) * (mean(x)-mu)/x.sd
   + }
   > set.seed(1)
   > ztest(rnorm(100),x.sd = 1)
   [1] 1.088874

Add a line to your function that allows you to also process numeric vectors containing missing values!

   > ztest <- function(x,x.sd,mu=0){
   +     x <- x[!is.na(x)]
   +     if(length(x) < 3) stop("too few values in x")
   +     sqrt(length(x)) * (mean(x)-mu)/x.sd
   + }

The function pnorm(Z) gives the probability of $x \leq Z$. Change your function so that it returns the p-value (for a two-sided test).

   > ztest <- function(x,x.sd,mu=0){
   +     x <- x[!is.na(x)]
   +     if(length(x) < 3) stop("too few values in x")
   +     z <- sqrt(length(x)) * (mean(x)-mu)/x.sd
   +     2*pnorm(-abs(z))
   + }
   > set.seed(1)
   > ztest(rnorm(100),x.sd = 1)
   [1] 0.2762096

Now let the result be a named vector containing the estimated difference, Z, the p-value and n.

   > ztest <- function(x,x.sd,mu=0){
   +     x <- x[!is.na(x)]
   +     if(length(x) < 3) stop("too few values in x")
   +     est.diff <- mean(x)-mu
   +     z <- sqrt(length(x)) * (est.diff)/x.sd
   +     round(c(diff=est.diff,Z=z,pval=2*pnorm(-abs(z)),n=length(x)),4)
   + }
   > set.seed(1)
   > ztest(rnorm(100),x.sd = 1)
     diff        Z     pval        n 

Requirements

To investigate the statistical significance of the difference between an assumed population mean $\mu_0$ and a sample mean $\bar{x}$ with a z-test, the population variance $\sigma^2$ must be known. There is a ready-made function z.test() in the BSDA package.
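
A small sketch comparing the hand-written ztest() with z.test() from BSDA (assuming the package is installed; sigma.x is its argument for the known population standard deviation):

   library(BSDA)
   set.seed(1)
   x <- rnorm(100)
   ## should report the same Z statistic and two-sided p-value as ztest(x, x.sd = 1)
   z.test(x, mu = 0, sigma.x = 1)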

Simulation Exercises

Simulation Exercises -- Solutions

   > ztest(rnorm(100,mean=10,sd=2),x.sd=2,mu=10)["pval"]
   pval 
   > ztest(rnorm(100,mean=10,sd=2),x.sd=2,mu=10)["diff"]
   diff 
   > ztest(rnorm(100,mean=10,sd=2),x.sd=2,mu=10)[c("pval","diff")]
   pval   diff 

using replicate()

   > res <- replicate(1000, ztest(rnorm(100,mean=10,sd=2),x.sd=2,mu=10))
   > res <- as.data.frame(t(res))
   > head(res)
        diff       Z   pval   n
   1 -0.2834 -1.4170 0.1565 100
   2  0.2540  1.2698 0.2042 100
   3 -0.1915 -0.9576 0.3383 100
   4  0.1462  0.7312 0.4646 100
   5  0.1122  0.5612 0.5747 100
   6 -0.0141 -0.0706 0.9437 100

using replicate() II

   > res <- replicate(1000, ztest(rnorm(100,mean=10,sd=2),x.sd=2,mu=10),
   +                        simplify = F)
   > res <- as.data.frame(Reduce(rbind,res))
   > head(res)
           diff       Z   pval   n
   init -0.0175 -0.0874 0.9304 100

using for()

   > res <- matrix(numeric(2000),ncol=2)
   > for(i in seq.int(1000)){
   +     res[i,] <- ztest(rnorm(100,mean=10,sd=2),x.sd=2,mu=10)[c("pval","diff")] }
   > res <- as.data.frame(res)
   > names(res) <- c("pval","diff")
   > head(res)
       pval    diff
   1 0.0591 -0.3775
   2 0.2466  0.2317
   3 0.6368  0.0944
   4 0.5538 -0.1184
   5 0.9897 -0.0026
   6 0.7748  0.0572

   > table(res$pval < 0.05)
   FALSE  TRUE 
     960    40 
   > tapply(abs(res$diff),res$pval < 0.05,summary)
   > min(abs(res$diff[res$pval<0.05]))
   [1] 0.3928

   > res2 <- replicate(1000, ztest(rnorm(1000,mean=10,sd=2),
   +                              x.sd=2,mu=10))
   > res2 <- as.data.frame(t(res2))
   > head(res2)
        diff       Z   pval    n
   1 -0.0731 -1.1559 0.2477 1000
   2  0.0018  0.0292 0.9767 1000
   3  0.0072  0.1144 0.9089 1000
   4 -0.1145 -1.8100 0.0703 1000
   5 -0.1719 -2.7183 0.0066 1000
   6  0.0880  1.3916 0.1640 1000

   > table(res2$pval < 0.05)
   FALSE  TRUE 
     946    54 
   > tapply(abs(res2$diff),res2$pval < 0.05,summary)

Simulation Exercises Part II

Simulation Exercises -- Solutions

   > res <- rbind(res,res2)
   > require(ggplot2)
   > ggplot(res,aes(x=pval)) +
   +     geom_histogram(binwidth=0.1,fill="forestgreen") +
   +     facet_grid(~ n)
   > ggsave("hist.png")

attachment:hist.png

   > ggplot(res,aes(x=diff,colour=factor(n))) +
   +     geom_density(size=3)
   > ggsave("dens.png")

Simulation Exercises -- Solutions

attachment:dens.png

attachment:point.png

attachment:dens2d.png

   > set.seed(1)
   > x <- rnorm(12)
   > t.test(x,mu=0) ## population mean 0
   t = 1.1478, df = 11, p-value = 0.2754
   alternative hypothesis: true mean is not equal to 0
   95 percent confidence interval:
   sample estimates:
   mean of x 

   > t.test(x,mu=1) ## population mean 1
   t = -3.125, df = 11, p-value = 0.009664
   alternative hypothesis: true mean is not equal to 1
   95 percent confidence interval:
   sample estimates:
   mean of x 

t-tests

A t-test is any statistical hypothesis test in which the test statistic follows a Student's t distribution under the null hypothesis. For a one-sample test of the mean,

$$t = \frac{\bar{x}-\mu_0}{s/\sqrt{n}}$$

where $\bar{x}$ is the sample mean, s is the sample standard deviation and n is the sample size. The number of degrees of freedom used in this test is n - 1.
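
As a small sketch (using the same seed and data as the t.test() call below), the statistic and its two-sided p-value can be computed by hand:

   set.seed(1)
   x <- rnorm(12)
   mu0 <- 0
   t.by.hand <- (mean(x) - mu0) / (sd(x) / sqrt(length(x)))
   t.by.hand                                    # 1.1478, as reported by t.test() below
   2 * pt(-abs(t.by.hand), df = length(x) - 1)  # two-sided p-value, 0.2754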

   > set.seed(1)
   > x <- rnorm(12)
   > t.test(x,mu=0) ## population mean 0
   
           One Sample t-test
   
   data:  x
   t = 1.1478, df = 11, p-value = 0.2754
   alternative hypothesis: true mean is not equal to 0
   95 percent confidence interval:
    -0.2464740  0.7837494
   sample estimates:
   mean of x 
   0.2686377 
   
   > t.test(x,mu=1) ## population mean 1
   
           One Sample t-test
   
   data:  x
   t = -3.125, df = 11, p-value = 0.009664
   alternative hypothesis: true mean is not equal to 1
   95 percent confidence interval:
    -0.2464740  0.7837494
   sample estimates:
   mean of x 
   0.2686377 

Two Sample t-tests

There are two ways to perform a two sample t-test in R: the two vector syntax t.test(x, y) and the formula syntax t.test(x ~ g) (read: x dependent on g).

Two Sample t-tests: two vector syntax

   > set.seed(1)
   > x <- rnorm(12)
   > y <- rnorm(12)
   > g <- sample(c("A","B"),12,replace = T)
   > t.test(x,y)
   t = 0.5939, df = 20.012, p-value = 0.5592
   alternative hypothesis: true difference in means is not equal to 0
   95 percent confidence interval:
   sample estimates:
   mean of x  mean of y 

Two Sample t-tests: formula syntax

   > t.test(x ~ g)
   t = -0.6644, df = 6.352, p-value = 0.5298
   alternative hypothesis: true difference in means is not equal to 0
   95 percent confidence interval:
   sample estimates:
   mean in group A mean in group B 

Welch/Satterthwaite vs. Student

By default t.test() uses the Welch/Satterthwaite approximation, which does not assume equal variances in the two groups (note the non-integer degrees of freedom above). Setting var.equal = TRUE gives the classical Student's t-test with a pooled variance estimate.

Student's t-test

   > t.test(x, y, var.equal = T)
   t = 0.5939, df = 22, p-value = 0.5586
   alternative hypothesis: true difference in means is not equal to 0
   95 percent confidence interval:
   sample estimates:
   mean of x  mean of y 

Requirements

Exercises

   ## run the t-test separately for every Subject/testid combination
   ## (assumes the per-group data frames are produced with split();
   ##  tmp.l collects one result row per group)
   tmp.l <- lapply(split(data, list(data$Subject, data$testid), drop = TRUE),
                   function(x){
                       tob <- t.test(x$TTime ~ x$Stim.Type)
                       data.frame(Subject = unique(x$Subject),
                                  testid = unique(x$testid),
                                  pval = tob$p.value,
                                  alternative = tob$alternative)
                   })
   res <- Reduce(rbind, tmp.l)

Solutions

   > t.test(data$TTime ~ data$Stim.Type)
   t = -6.3567, df = 9541.891, p-value = 2.156e-10
   alternative hypothesis: true difference in means is not equal to 0
   95 percent confidence interval:
   sample estimates:
   mean in group hit mean in group incorrect 
   > ggplot(data,aes(x=Stim.Type,y=TTime)) +
   +     geom_boxplot()

   > t.test(data$TTime[data$Subject==1 & data$testid=="test1"] ~
   +        data$Stim.Type[data$Subject==1 & data$testid=="test1"])
   t = -0.5846, df = 44.183, p-value = 0.5618
   alternative hypothesis: true difference in means is not equal to 0
   95 percent confidence interval:
   sample estimates:
   mean in group hit mean in group incorrect 
   > t.test(data$TTime[data$Subject==1 & data$testid=="test2"] ~
   +        data$Stim.Type[data$Subject==1 & data$testid=="test2"])
   t = -1.7694, df = 47.022, p-value = 0.08332
   alternative hypothesis: true difference in means is not equal to 0
   95 percent confidence interval:
   sample estimates:
   mean in group hit mean in group incorrect 

   > ggplot(data,aes(x=testid,y=TTime)) +
   +    geom_boxplot(aes(fill=Stim.Type)) +
   +    facet_wrap(~Subject)
   > ggplot(data,aes(x=factor(Subject),y=TTime)) +
   +    geom_boxplot(aes(fill=Stim.Type)) +
   +    facet_wrap(~testid)

   > table(res$pval < 0.05)
   FALSE  TRUE 
     165    22 
   > prop.table(table(res$pval < 0.05))
   FALSE      TRUE 

Exercises - Solutions

   ## run the t-test separately for every Subject/testid combination
   ## (assumes the per-group data frames are produced with split();
   ##  tmp.l collects one result row per group)
   tmp.l <- lapply(split(data, list(data$Subject, data$testid), drop = TRUE),
                   function(x){
                       tob <- t.test(x$TTime ~ x$Stim.Type)
                       data.frame(Subject = unique(x$Subject),
                                  testid = unique(x$testid),
                                  pval = tob$p.value,
                                  alternative = tob$alternative)
                   })
   res <- Reduce(rbind, tmp.l)