welcome: please sign in

Seiteninhalt hochladen

Sie können für die unten genannte Seite Inhalt hochladen. Wenn Sie den Seitennamen ändern, können Sie auch Inhalt für eine andere Seite hochladen. Wenn der Seitenname leer ist, leiten wir den Seitennamen vom Dateinamen ab.

Datei, aus der der Seiteninhalt geladen wird
Seitenname
Kommentar

Revision 2 vom 2015-05-01 08:25:45

location: RstatisTik / RstatisTikPortal / RcourSe / CourseOutline / FunctionsInR / ApplyR

Introduction

Implicit Loops

A common application of loops is to apply a function to each element of a set of values and collect the results in a single structure. In R this is done by the functions:

lapply()

   1 > lapply(dat,mean)
   2 [1] 6753.636
   3 [1] 5433.182
   4 > sapply(dat,mean)

apply()

   1 > x<-1:12
   2 > dim(x)<-c(2,2,3)
   3 > apply(x,3,quantile) ## calculate the quantiles 

tapply()

For example in the quine data frame, we can summarize Days classify by Eth and Lrn as follows:

   1 > tapply(Days,list(Eth,Lrn),mean)
   2 AL       SL
   3 A 18.57500 24.89655
   4 N 13.25581 10.82353

* the class() function shows the class of an object use it in combination with lapply() to get the classes of the columns of the quine data frame * do the same with sapply() what is the difference * try to combine this with what you learned about indexing and create a new data frame quine2 only containing the columns which are factors * calculate the row and column means of the below defined matrix m using the apply function PS: in real life application use the rowMeans() and colMeans() function

   1 m <- matrix(rnorm(100),nrow=10)  

* use tapply() to summarise the number of missing days at school per Ethnicity and/or per Sex (three lines) * sometimes the aggregate() function is more convenient; note the use of  #!latex $\sim$; it is read as 'is dependent on'and it is extensively used in modelling

   1 > aggregate(Days ~ Sex + Eth, data=quine,mean)
   2 Sex Eth     Days
   3 1   F   A 20.92105
   4 2   M   A 21.61290
   5 3   F   N 10.07143
   6 4   M   N 14.71429
   7 > aggregate(Days ~ Sex + Eth, data=quine,summary)
   8 Sex Eth Days.Min. Days.1st Qu. Days.Median Days.Mean Days.3rd Qu. Days.Max.
   9 1   F   A      0.00         5.25       13.50     20.92        30.25     81.00
  10 2   M   A      2.00         9.50       16.00     21.61        33.00     57.00
  11 3   F   N      0.00         5.00        7.00     10.07        14.00     37.00
  12 4   M   N      0.00         3.50        8.00     14.71        19.50     69.00  

Functions

Every function in R has three important characteristics:

You can see all three parts if you type the name of the function without primitives. Exceptions are brackets. Primitive functions, like sum(), call C code directly with .Primitive() and contain no R code. Therefore their formals(), body(), and environment() are all NULL.

Functions

   1 > chisq.test
   2 function (x, y = NULL, correct = TRUE, p = rep(1/length(x), length(x)), 
   3 DNAME <- deparse(substitute(x))
   4 if (is.data.frame(x)) 
   5 expected = E, residuals = (x - E)/sqrt(E), stdres = (x - 
   6 > sum
   7 function (..., na.rm = FALSE)  .Primitive("sum")

Function Arguments

Arguments are matched

By default, R function arguments are lazy, they are only evaluated if they are actually used:

   1 > f <- function(x) {
   2 f <- function(x) {
   3 +   10
   4 + }
   5 > f(stop("This is an error!"))
   6 [1] 10
   7 > 

Function Exercises (Verzani)

* Write a function to compute the average distance from the mean for some data vector. * Write a function f() which finds the average of the x values after squaring and substracts the square of the average of the numbers. Verify this output will always be non-negative by computing \texttt{f(1:10)} * An integer is even if the remainder upon dividing it by 2 is 0. This remainder is given by R with the syntax \texttt{ x \%\% 2}. Use this to write a function iseven(). How would you write isodd()? * Write a function isprime() that checks if a number x is prime by dividing x by all values \texttt{$2,\ldots,x-1}}}} then checking to see if there is a remainder of 0.

Function Exercises (Verzani)

   1 > avg.dist <- function(x){
   2 +     xbar <- mean(x)
   3 +     mean(abs(x-xbar))
   4 + }  

Function Exercises (Verzani)

   1 > f <- function(x){
   2 +     mean(x**2) - mean(x)**2
   3 + }
   4 > f(1:10)
   5 [1] 8.25  

Function Exercises (Verzani)

   1 > iseven <- function(x){
   2 +     x %% 2 == 0
   3 + }
   4 > iseven(1:10)
   5 [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE
   6 > isodd <- function(x){
   7 +     !iseven(x)
   8 + }
   9 > isodd(1:10)
  10 [1]  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE

Function Exercises (Verzani)

   1 > isprime <- function(x){
   2 +     if(x == 2) return(TRUE)
   3 +     !(0 %in% (x %% (2:(x-1))))
   4 + }
   5 > isprime(2)
   6 [1] TRUE
   7 > isprime(5)
   8 [1] TRUE
   9 > isprime(15)
  10 [1] FALSE  

Read in the file

   1 > file <- "../session1/session1data/pre001.txt"
   2 > skip <- 3
   3 > tmp <- read.table(file,skip = skip,sep = "\t",
   4 +                   header=T,na.strings = c(" +",""),
   5 +                   fill=T)

Remove empty line

   1 > tmp <- tmp[!is.na(tmp$Subject),] 

Remove spaces

Remove unnecessary spaces from character vectors/factors

   1 > tmp <- lapply(tmp,function(x) {
   2 +         if( class(x) %in% c("character","factor") ){
   3 +             x <- factor(gsub(" ","",as.character(x)))
   4 +             return(x)}else{ return(x) }})
   5 > tmp <- as.data.frame(tmp)

Find/Remove breaks

   1 > if(length(pause)>0){
   2 +     drei <- which(tmp$Code==3 & !is.na(tmp$Code))
   3 +     drei <- drei[drei > pause][1:2]
   4 +     if(pause + 1 < drei[1]){
   5 +         tmp <- tmp[-(pause:drei[2]),]
   6 +     }}
   7 > tmp <- tmp[!(tmp$Event.Type %in% c("Pause","Resume")), ]

Find/Remove first/last rows

   1 > first.pic <- min(which(tmp$Event.Type=="Picture" & 
   2 +                           !is.na(tmp$Event.Type) )) - 1 
   3 > tmp <- tmp[-(1:first.pic),]
   4 > last.pic <- min(which(tmp$Event.Type=="Picture" & 
   5 +                           !is.na(tmp$Event.Type) &
   6 +                           tmp$Code=="Fertig!" & 
   7 +                           !is.na(tmp$Code)))
   8 > tmp <- tmp[-(last.pic:nrow(tmp)),]

Extract Responses

   1 > zeilen <- which(tmp$Event.Type %in% c("Response"))
   2 > zeilen <- sort(unique(c(zeilen,zeilen-1)))
   3 > zeilen <- zeilen[zeilen>0]
   4 > tmp <- tmp[zeilen,]

Extract Responses

   1 > responses <- which(tmp$Code %in% c(1,2))
   2 > events <- responses-1
   3 > tmp$Type <- NA
   4 > tmp$Type[responses] <- as.character(tmp$Event.Type[events])
   5 > head(tmp)
   6 Subject Trial Event.Type     Code   Time TTime Uncertainty Duration
   7 6   PRE001     7    Picture RO09.jpg 168954     0           1    10197
   8 7   PRE001     7   Response        2 178963 10009           1       NA
   9 11  PRE001    12    Picture RO20.jpg 230338     0           1     8398
  10 12  PRE001    12   Response        1 238680  8342           1       NA
  11 16  PRE001    17    Picture RS28.jpg 289723     0           1     8198
  12 17  PRE001    17   Response        2 297789  8066           1       NA
  13 6              2       0   next incorrect          7    <NA>
  14 7             NA      NA   <NA>      <NA>         NA Picture
  15 11             2       0   next incorrect         12    <NA>
  16 12            NA      NA   <NA>      <NA>         NA Picture
  17 16             2       0   next       hit         17    <NA>
  18 17            NA      NA   <NA>      <NA>         NA Picture

Moving Information

Moving all (necessary) information to the response lines.

   1 > tmp$Event.Code <- NA
   2 > tmp$Event.Code[responses] <- as.character(tmp$Code[events])
   3 > tmp$Stim.Type[responses] <- as.character(tmp$Stim.Type[events])
   4 > tmp$Duration[responses] <- as.character(tmp$Duration[events])
   5 > tmp$Uncertainty.1[responses] <- as.character(tmp$Uncertainty.1[events])
   6 > tmp$ReqTime[responses] <- as.character(tmp$ReqTime[events])
   7 > tmp$ReqDur[responses] <- as.character(tmp$ReqDur[events])
   8 > tmp$Pair.Index[responses] <- as.character(tmp$Pair.Index[events])
   9 > tmp$Stim.Type[responses] <- as.character(tmp$Stim.Type[events])

Moving Information

   1 > head(tmp)
   2 Subject Trial Event.Type     Code   Time TTime Uncertainty Duration
   3 6   PRE001     7    Picture RO09.jpg 168954     0           1    10197
   4 7   PRE001     7   Response        2 178963 10009           1    10197
   5 11  PRE001    12    Picture RO20.jpg 230338     0           1     8398
   6 12  PRE001    12   Response        1 238680  8342           1     8398
   7 16  PRE001    17    Picture RS28.jpg 289723     0           1     8198
   8 17  PRE001    17   Response        2 297789  8066           1     8198
   9 6              2       0   next incorrect          7    <NA>       <NA>
  10 7              2       0   next incorrect          7 Picture   RO09.jpg
  11 11             2       0   next incorrect         12    <NA>       <NA>
  12 12             2       0   next incorrect         12 Picture   RO20.jpg
  13 16             2       0   next       hit         17    <NA>       <NA>
  14 17             2       0   next       hit         17 Picture   RS28.jpg

Keep response lines

   1 > tmp <- tmp[tmp$Event.Type=="Response" & !is.na(tmp$Type),]
   2 > tmp <- tmp[tmp$Type=="Picture" & !is.na(tmp$Type),]
   3 > head(tmp)
   4 Subject Trial Event.Type Code   Time TTime Uncertainty Duration
   5 7   PRE001     7   Response    2 178963 10009           1    10197
   6 12  PRE001    12   Response    1 238680  8342           1     8398
   7 17  PRE001    17   Response    2 297789  8066           1     8198
   8 22  PRE001    22   Response    1 351321 10811           1    10997
   9 27  PRE001    27   Response    2 403607   713           1      800
  10 32  PRE001    32   Response    1 467793 23709           1    23794
  11 7              2       0   next incorrect          7 Picture   RO09.jpg
  12 12             2       0   next incorrect         12 Picture   RO20.jpg
  13 17             2       0   next       hit         17 Picture   RS28.jpg
  14 22             2       0   next       hit         22 Picture   AT26.jpg
  15 27             2       0   next       hit         27 Picture   RS23.jpg
  16 32             2       0   next       hit         32 Picture   OF04.jpg

The Function

   1 tmp <- read.table(file,skip = skip,sep = "\t",

The Function (continued)

   1 tmp <- tmp[!is.na(tmp$Subject),] 
   2 tmp <- lapply(tmp,function(x) {
   3 x <- factor(gsub(" ","",as.character(x)))
   4 tmp <- as.data.frame(tmp)

The Function (continued)

   1 pause <- which(tmp$Event.Type=="Picture" & tmp$Code=="Pause")
   2 drei <- which(tmp$Code==3 & !is.na(tmp$Code))
   3 drei <- drei[drei > pause][1:2]
   4 tmp <- tmp[-(pause:drei[2]),]
   5 tmp <- tmp[!(tmp$Event.Type %in% c("Pause","Resume")), ]

The Function (continued)

   1 tmp <- tmp[-(1:first.pic),]
   2 tmp <- tmp[-(last.pic:nrow(tmp)),]

The Function (continued)

   1 zeilen <- which(tmp$Event.Type %in% c("Response"))
   2 zeilen <- sort(unique(c(zeilen,zeilen-1)))
   3 zeilen <- zeilen[zeilen>0]
   4 tmp <- tmp[zeilen,]
   5 responses <- which(tmp$Code %in% c(1,2))
   6 events <- responses-1

The Function (continued)

   1 tmp <- tmp[tmp$Event.Type=="Response" & !is.na(tmp$Type),]
   2 tmp <- tmp[tmp$Type=="Picture" & !is.na(tmp$Type),]

The Function (continued)

The Function Exercise

* run the function using source() * use the function to read in \texttt{../session1/session1data/pre001.txt} and \texttt{data/pretest/pre\_001.txt} * use some summary functions like table() or summary to check if they contain the same information We will learn about a function to compare data frames more exact soon.

The Function (continued)

   1 > file <- "../session1/session1data/pre001.txt"
   2 > pre1 <- read.file(file,skip=3)
   3 [1] "read ../session1/session1data/pre001.txt"
   4 > file <- "data/pretest/pre_001.txt"
   5 > pre1v2 <- read.file(file,skip=0)
   6 [1] "read ../session2/data/pretest/pre_001.txt"

rbind()

   1 > x <- data.frame(id=1:3,score=rnorm(3))
   2 > y <- data.frame(id=13:15,score=rnorm(3))
   3 > rbind(x,y)
   4 id       score
   5 1  1  0.71121163
   6 2  2 -0.62973249
   7 3  3  1.17737595
   8 4 13 -0.45074940
   9 5 14 -0.01044197
  10 6 15 -1.05217176

cbind()

   1 > cbind(x,y)
   2 id      score1      score2     score3
   3 1  1  0.11440705  0.14536778 -1.1773241
   4 2  2 -1.62862651  0.02020604  0.5686415
   5 3  3  0.05335811  0.25462270  0.8844987
   6 4  4 -0.19931734  0.15625511  0.9287316
   7 5  5 -1.15217836 -1.79804503 -0.7550234

merge()

   1 > (d1 <- data.frame(id=LETTERS[c(1,2,3)],day1=sample(10,3)))
   2 id day1
   3 1  A    3
   4 2  B    4
   5 3  C    5
   6 > (d2 <- data.frame(id=LETTERS[c(1,3,5,6)],day2=sample(10,4)))
   7 id day2
   8 1  A    7
   9 2  C   10
  10 3  E    3
  11 4  F    6

inner join

   1 > merge(d1,d2)
   2 id day1 day2
   3 1  A    3    7
   4 2  C    5   10

left outer join

   1 > merge(d1,d2,all.x = T)
   2 id day1 day2
   3 1  A    3    7
   4 2  B    4   NA
   5 3  C    5   10

right outer join

   1 > merge(d1,d2,all.y = T)
   2 id day1 day2
   3 1  A    3    7
   4 2  C    5   10
   5 3  E   NA    3
   6 4  F   NA    6

full outer join

   1 > merge(d1,d2,all = T)
   2 id day1 day2
   3 1  A    3    7
   4 2  B    4   NA
   5 3  C    5   10
   6 4  E   NA    3
   7 5  F   NA    6

merge()

merge() Exercise

merge() Exercise

   1 > persdat <- read.table("../session1/session1data/personendaten.txt",
   2 +                       sep="\t",
   3 +                       header=T)
   4 > pre1 <- merge(persdat,pre1,all.y = T)
   5 > head(pre1)
   6 Subject Sex Age_PRETEST Trial Event.Type Code   Time TTime Uncertainty
   7 1  PRE001   f        3.11     7   Response    2 178963 10009           1
   8 2  PRE001   f        3.11    12   Response    1 238680  8342           1
   9 3  PRE001   f        3.11    17   Response    2 297789  8066           1
  10 4  PRE001   f        3.11    22   Response    1 351321 10811           1
  11 5  PRE001   f        3.11    27   Response    2 403607   713           1
  12 6  PRE001   f        3.11    32   Response    1 467793 23709           1
  13 Duration Uncertainty.1 ReqTime ReqDur Stim.Type Pair.Index    Type Event.Code
  14 1    10197             2       0   next incorrect          7 Picture   RO09.jpg
  15 2     8398             2       0   next incorrect         12 Picture   RO20.jpg
  16 3     8198             2       0   next       hit         17 Picture   RS28.jpg
  17 4    10997             2       0   next       hit         22 Picture   AT26.jpg
  18 5      800             2       0   next       hit         27 Picture   RS23.jpg
  19 6    23794             2       0   next       hit         32 Picture   OF04.jpg

Reduce()

Reduce()

Reduce()

   1 > (d1 <- data.frame(id=LETTERS[c(1,2,3)],day1=sample(10,3)))
   2 id day1
   3 1  A    3
   4 2  B    1
   5 3  C    7
   6 > (d2 <- data.frame(id=LETTERS[c(1,3,5,6)],day2=sample(10,4)))
   7 id day2
   8 1  A    8
   9 2  C    2
  10 3  E    5
  11 4  F    3
  12 > (d3 <- data.frame(id=LETTERS[c(2,4:6)],day3=sample(10,4)))
  13 id day3
  14 1  B    8
  15 2  D    3
  16 3  E    4
  17 4  F   10
  18 > (d4 <- data.frame(id=LETTERS[c(1:5)],day4=sample(10,5)))
  19 id day4
  20 1  A    2
  21 2  B    7
  22 3  C    8
  23 4  D    9
  24 5  E    1

Reduce()

   1 > Reduce(merge,list(d1,d2,d3,d4))
   2 [1] id   day1 day2 day3 day4

Reduce()

   1 > Reduce(function(x,y) { merge(x,y, all=T) },
   2 +        list(d1,d2,d3,d4))
   3 id day1 day2 day3 day4
   4 1  A    3    8   NA    2
   5 2  B    1   NA    8    7
   6 3  C    7    2   NA    8
   7 4  E   NA    5    4    1
   8 5  F   NA    3   10   NA
   9 6  D   NA   NA    3    9

Reduce()

   1 > d4$day <- names(d4)[2]
   2 > names(d4)[2] <- "score"
   3 > Reduce(function(x,y) { y$day <- names(y)[2]
   4 +                        names(y)[2] <- "score"
   5 +                        rbind(x,y) } ,
   6 +        list(d1,d2,d3), init = d4)
   7 id score  day
   8 1   A     2 day4
   9 2   B     7 day4
  10 3   C     8 day4
  11 4   D     9 day4

A second function

dir()

   1 > dir()
   2 [1] "data"                  "function.r"            "function.r~"          
   3 [4] "ggp1.pdf"              "graphics.r"            "linkimage.aux"        
   4 [7] "session2apply.aux"     "session2apply.log"     "session2apply.nav"    
   5 [10] "session2apply.out"     "session2apply.pdf"     "session2apply.snm"    
   6 [13] "session2apply.tex"     "session2apply.tex~"    "#session2apply.tex#"  
   7 [16] "session2apply.toc"     "session2apply.vrb"     "session2hadley.aux"   
   8 [19] "session2hadley.log"    "session2hadley.nav"    "session2hadley.out"   
   9 [22] "session2hadley.pdf"    "session2hadley.snm"    "session2hadley.tex"   
  10 [25] "session2hadley.tex~"   "session2hadley.toc"    "session2hadley.vrb"   
  11 [28] "solutionssession1.r"   "solutionssession1.r~"  "solutionssession2.r"  
  12 [31] "solutionssession2.r~"  "#solutionssession2.r#"

dir()

   1 > dir("data")
   2 [1] "posttest"   "pretest"    "training_1" "training_2" "training_3"
   3 [6] "training_4" "training_5" "training_6" "training_7" "training_8"

dir()

   1 > dir("data",recursive = T)
   2 [1] "posttest/post_001.txt"      "posttest/post_002.txt"     
   3 [3] "posttest/post_003.txt"      "posttest/post_004.txt"     
   4 [5] "posttest/post_005.txt"      "posttest/post_006.txt"     
   5 [7] "posttest/post_007.txt"      "posttest/post_008.txt"     

dir()

   1 > dir("data",recursive = T, full.names = T)
   2 [1] "data/posttest/post_001.txt"      "data/posttest/post_002.txt"     
   3 [3] "data/posttest/post_003.txt"      "data/posttest/post_004.txt"     
   4 [5] "data/posttest/post_005.txt"      "data/posttest/post_006.txt"     

dir()

   1 > dir(pattern = "\\.r$")
   2 [1] "function.r"          "graphics.r"          "solutionssession1.r"
   3 [4] "solutionssession2.r"

dir() Exercise

dir() Exercise

   1 > dir("data",full.names = T, recursive = T,pattern = "txt$"
   2 + )
   3 [1] "data/posttest/post_001.txt"    "data/posttest/post_002.txt"   
   4 [3] "data/posttest/post_003.txt"    "data/posttest/post_004.txt"   
   5 [5] "data/posttest/post_005.txt"    "data/posttest/post_006.txt"   
   6 > files <- dir("data",full.names = T, recursive = T,pattern = "txt$")

Read all files

Now we use lapply() and our function read.file() to read all files in files

   1 > df.list <- lapply(files,read.file,skip=0)
   2 [1] "read data/posttest/post_001.txt"
   3 [1] "read data/posttest/post_002.txt"
   4 [1] "read data/posttest/post_003.txt"
   5 [1] "read data/posttest/post_004.txt"
   6 [1] "read data/posttest/post_005.txt"

Reading all files

   1 > sapply(df.list,class)
   2 [1] "data.frame" "data.frame" "data.frame" "data.frame" "data.frame"
   3 [6] "data.frame" "data.frame" "data.frame" "data.frame" "data.frame"
   4 [11] "data.frame" "data.frame" "data.frame" "data.frame" "data.frame"
   5 [16] "data.frame" "data.frame" "data.frame" "data.frame" "data.frame"
   6 [21] "data.frame" "data.frame" "data.frame" "data.frame" "data.frame"
   7 [26] "data.frame" "data.frame" "data.frame" "data.frame" "data.frame"
   8 [31] "data.frame" "data.frame" "data.frame" "data.frame" "data.frame"
   9 [36] "data.frame" "data.frame" "data.frame" "data.frame" "data.frame"
  10 [41] "data.frame" "data.frame" "data.frame" "data.frame" "data.frame"
  11 [46] "data.frame" "data.frame" "data.frame" "data.frame" "data.frame"
  12 [51] "data.frame" "NULL"       "data.frame" "data.frame" "data.frame"
  13 [56] "data.frame" "data.frame" "data.frame" "data.frame" "data.frame"
  14 [61] "data.frame" "data.frame" "data.frame" "data.frame" "data.frame"
  15 [66] "data.frame" "data.frame" "data.frame" "data.frame" "data.frame"
  16 [71] "data.frame" "data.frame" "data.frame" "data.frame" "data.frame"

The Function 2

   1 > data <- Reduce(rbind,df.list)
   2 > nrow(data)
   3 [1] 12704
   4 > table(data$Subject)
   5 001_test2 002_test2 003_test2 004_test2 005_test2 006_test2 007_test2 008_test2 
   6 93        91        96        93        95        95        93        96 
   7 009_test2 010_test2 011_test2 012_test2 013_test2 014_test2 015_test2 016_test2 
   8 92        94        95        96        96        95        96        94 
   9 017_test2 018_test2 019_test2 020_test2 001_test1 002_test1 003_test1 004_test1 
  10 95        94        96        95        95        95        96        94 
  11 005_test1 006_test1 007_test1 008_test1 009_test1 010_test1 011_test1 012_test1 
  12 96        95        94        90        96        95        91        96 
  13 013_test1 014_test1 015_test1 016_test1 017_test1 018_test1 019_test1 020_test1 
  14 95        96        95        91        96        96        96        96 
  15 001_1     002_1     003_1     004_1     005_1     006_1    CHGU_1    008_1a 
  16 60        59        60        54        60        59        60        60 
  17 009_1     010_1     RMK_1     013_1     014_1     015_1     016_1    IJ2K_1 
  18 60        60        59        59        60        58        59        58 
  19 018_1     019_1     020_1     001_2     002_2     003_2     004_2     005_2 
  20 60        59        60        59        59        57        58        57 
  21 006_2     007_2     008_2     009_2     010_2     011_2     012_2     013_2 
  22 58        58        54        58        58        59        59        56 
  23 014_2     015_2     016_2     017_2     018_2     019_2     020_2     001_3 

The Function no 2

   1 > read.files <- function(filesdir,skip=3,recursive=F,pattern="."){
   2 +     files <- dir(filesdir,
   3 +                  full.names = T,
   4 +                  recursive = recursive,
   5 +                  pattern = pattern)
   6 +     Reduce(rbind,lapply(files,read.file,skip=skip))}
   7 > data <- read.files("data",recursive = T,skip=0,pattern = "\\.txt$")
   8 [1] "read data/posttest/post_001.txt"
   9 [1] "read data/posttest/post_002.txt"
  10 [1] "read data/posttest/post_003.txt"
  11 [1] "read data/posttest/post_004.txt"
  12 [1] "read data/posttest/post_005.txt"

The Subject column

The Subject column

   1 > table(data$Subject)
   2 001_test2 002_test2 003_test2 004_test2 005_test2 006_test2 007_test2 008_test2 
   3 93        91        96        93        95        95        93        96 
   4 009_test2 010_test2 011_test2 012_test2 013_test2 014_test2 015_test2 016_test2 
   5 92        94        95        96        96        95        96        94 
   6 017_test2 018_test2 019_test2 020_test2 001_test1 002_test1 003_test1 004_test1 
   7 95        94        96        95        95        95        96        94 

The Subject column

   1 > data$persid <- sapply(data$Subject,function(x)
   2 +     str_split(x,pattern = "_")[[1]][1])
   3 > data$testid <- sapply(data$Subject,function(x)
   4 +     str_split(x,pattern = "_")[[1]][2])
   5 > data$persid[data$persid=="CHGU"] <- "007"

The Subject column Exercises

The Subject column Exercises

   1 > data$persid[data$persid=="RMK"] <- "011"
   2 > data$persid[data$persid=="IJ2K"] <- "017"
   3 > data$persid[data$persid=="GA3K"] <- "004"
   4 > data$persid[data$persid=="Kj6K"] <- "006"

Merging

* now read in the file subjectsdemographics.txt using the appropriate command * join the demographics with our data data frame (there is a little problem left - compare the persid and Subject columns)

The Subject column Exercises

   1 > persdat <- read.table("data/subjectdemographics.txt",
   2 +                       sep="\t",
   3 +                       header=T)
   4 > persdat$Subject
   5 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
   6 > unique(data$persid)
   7 [1] "001" "002" "003" "004" "005" "006" "007" "008" "009" "010" "011" "012"
   8 [13] "013" "014" "015" "016" "017" "018" "019" "020"
   9 > data$persid <- as.numeric(data$persid)
  10 > unique(data$persid)
  11 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
  12 > data <- merge(persdat,data,by.x = "Subject",by.y = "persid",all=T)
  13 In merge.data.frame(persdat, data, by.x = "Subject", by.y = "persid",  :
  14 column nameSubjectis duplicated in the result
  15 > head(data)
  16 Subject Sex Age_PRETEST   Subject Trial Event.Type Code   Time TTime
  17 1       1   f        3.11 001_test2     7   Response    2 103745  2575
  18 2       1   f        3.11 001_test2    12   Response    2 156493  2737
  19 3       1   f        3.11 001_test2    17   Response    2 214772  6630
  20 4       1   f        3.11 001_test2    22   Response    1 262086  5957
  21 5       1   f        3.11 001_test2    27   Response    2 302589   272
  22 6       1   f        3.11 001_test2    32   Response    1 352703  7197

Summary Graphics

Just run the code and try to understand it. We will cover the ggplot graphics in the next session.

   1 > ggplot(data,aes(x=factor(Subject),fill=..count..)) +
   2 +     geom_bar() +
   3 +     facet_wrap(~testid)

<img alt='sesssion2/graph1.png' src='-1' />

Summary Graphics

   1 > data$testid <- str_replace(data$testid,"[a-z]$","")
   2 > data$testid <- factor(data$testid,
   3 +                       levels=c("test1","1","2","3","4","5","6","7","8","test2"))
   4 > table(data$Subject,data$testid)
   5 test1  1  2  3  4  5  6  7  8 test2
   6 1     95 60 59 60 59 59 60 60 60    93
   7 2     95 59 59 58 60 60 60 60 60    91
   8 3     96 60 57 60 60 60 60 59 58    96
   9 4     94 54 58 60 60 55 53 60 58    93
  10 5     96 60 57 60 60 60 60 60  0    95
  11 6     95 59 58 59 58 59 55 54 55    95
  12 7     94 60 58 60 58 59 60 59 59    93
  13 8     90 60 54 55 60 60 60 59 60    96
  14 9     96 60 58 59 57  0  0 58 56    92
  15 10    95 60 58 58 60 60 58  0  0    94
  16 11    91 59 59 60 60 57 58 60 60    95

Summary Graphics

   1 > ggplot(data,aes(x=factor(Subject),fill=..count..)) +
   2 +     geom_bar() +
   3 +     facet_wrap(~testid)

<img alt='sesssion2/graph2.png' src='-1' />

Summary Graphics

And another one.

   1 > ggplot(data,aes(x=testid,fill=Stim.Type)) +
   2 +     geom_bar(position=position_fill()) +
   3 +     facet_wrap(~Subject)

<img alt='sesssion2/graph3.png' src='-1' />