= Introduction =
== Remember: Implicit Loops ==
A common application of loops is to apply a function to each element of a set of values and collect the results in a single structure.
In R this is done by the functions:
   * lapply() - works on the elements of a list
   * sapply() - same as lapply() but simplifies the result
   * apply() - works on the rows or columns of a matrix or a data frame (or, more generally, on arrays)
   * tapply() - works on groups defined by an index
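
As a reminder, here is a minimal sketch of the four functions on toy data. All object names and values are made up for illustration and do not come from the course data.

{{{#!highlight r
# Toy data, chosen only for illustration
scores <- list(a = c(2, 4, 6), b = c(1, 3))
mat    <- matrix(1:6, nrow = 2)        # 2 rows, 3 columns
values <- c(10, 20, 30, 40)
group  <- c("x", "y", "x", "y")

res.l <- lapply(scores, mean)          # a list: $a is 4, $b is 2
res.s <- sapply(scores, mean)          # the same, simplified to a named vector
res.a <- apply(mat, 1, sum)            # 1 = work on rows: 9 12
res.t <- tapply(values, group, sum)    # sums per group: x = 40, y = 60

print(res.l)
print(res.s)
print(res.a)
print(res.t)
}}}

The second argument of apply() selects the direction: 1 for rows, 2 for columns.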

==  Exercises ==
Given the two R objects m and l below:
   * use lapply() to get the class and the length of each element of l (two steps)
   * use apply() to get the maximum of each column in m
{{{#!highlight r
> m <- matrix(1:100, nrow=10)
> l <- list(a=1:10,b=rep(c(T,F),2),c=letters)  
}}}

== read one file ==
{{{#!highlight r
> file <- "../session1/session1data/pre001.txt"
> pre1 <- read.file(file,skip=3)
[1] "read ../session1/session1data/pre001.txt"
> file <- "data/pretest/pre_001.txt"
> pre1v2 <- read.file(file,skip=0)
[1] "read ../session2/data/pretest/pre_001.txt"
}}}

== read several files ==
   * we used dir() (with the arguments pattern, recursive, full.names) to get a list of the file names we wanted to read in
   * we learned about lapply(), which takes a list l and a function f and applies f to every element of l
   * so now we combine what we learned to read all files at once

== get file names ==
{{{#!highlight r
> files <- dir("../session2/data",full.names = T, 
+                recursive = T,pattern = "[0-9]{3}\\.txt$")
> files
[1] "../session2/data/posttest/post_001.txt"   
[2] "../session2/data/posttest/post_002.txt"   
[3] "../session2/data/posttest/post_003.txt"   
[4] "../session2/data/posttest/post_004.txt"   
[5] "../session2/data/posttest/post_005.txt"   
[6] "../session2/data/posttest/post_006.txt"   
[7] "../session2/data/posttest/post_007.txt"   
}}}

== read in files ==
   * source the file containing our function read.file()
   * use lapply() to use read.file() on every entry of the list of file names
{{{#!highlight r
> source("function.r")
> df.list <- lapply(files,read.file,skip=0)
[1] "read ../session2/data/posttest/post_001.txt"
[1] "read ../session2/data/posttest/post_002.txt"
[1] "read ../session2/data/posttest/post_003.txt"
[1] "read ../session2/data/posttest/post_004.txt"
[1] "read ../session2/data/posttest/post_005.txt"
[1] "read ../session2/data/posttest/post_006.txt"
[1] "read ../session2/data/posttest/post_007.txt"
[1] "read ../session2/data/posttest/post_008.txt"
}}}
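
The combination of dir() and lapply() can be tried out without the course data. The sketch below writes two small files into a temporary directory and reads them back. The directory name, the file contents and the use of read.table() as a stand-in for our own read.file() are assumptions made only for this illustration.

{{{#!highlight r
# Write two small tab-separated files into a temporary directory
tmp <- file.path(tempdir(), "demo_read")
dir.create(tmp, showWarnings = FALSE)
write.table(data.frame(id = 1:2), file.path(tmp, "pre_001.txt"),
            sep = "\t", row.names = FALSE)
write.table(data.frame(id = 3:4), file.path(tmp, "pre_002.txt"),
            sep = "\t", row.names = FALSE)

# Step 1: collect the file names
demo.files <- dir(tmp, full.names = TRUE, pattern = "[0-9]{3}\\.txt$")

# Step 2: read every file; further arguments are passed on to read.table()
demo.list <- lapply(demo.files, read.table, header = TRUE, sep = "\t")

print(length(demo.list))         # one list element per file
print(sapply(demo.list, nrow))   # number of rows in each data frame
}}}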

== Result ==
   * what we get is a list df.list containing the results: every element of the list is a data frame if read.file() successfully read in the respective file
   * so our variable files contains 195 file names
{{{#!highlight r
> length(files)
[1] 195  
}}}
   * so df.list contains 195 elements
{{{#!highlight r
> length(df.list)
[1] 195  
}}}
   * we can check the class of each of these results again with sapply()
{{{#!highlight r
> table(sapply(df.list,class))
192          2   
}}}
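
To find out which files did not yield a data frame, a logical index is handy. This is only a sketch on a hand-made list: toy.files and toy.list are hypothetical stand-ins for files and df.list, and the NULL entry is an assumption about what a failed read looks like.

{{{#!highlight r
# Hypothetical stand-ins for files and df.list
toy.files <- c("a.txt", "b.txt", "c.txt")
toy.list  <- list(data.frame(x = 1:2), NULL, data.frame(x = 3:4))

ok <- sapply(toy.list, is.data.frame)   # TRUE where the element is a data frame
print(table(ok))
print(toy.files[!ok])                   # file names that need a closer look
}}}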

== Remember: Combining Data Frames ==

We learned about three basic functions to combine data frames:
   * rbind() - combines two data frames row-wise
   * cbind() - combines two data frames column-wise
   * merge() - combines two data frames with respect to one or more identifying columns

   * merge() is a strictly binary function: it takes exactly two data frames
   * rbind() and cbind() accept more than two, but every data frame has to be typed out as a separate argument
   * using only these functions, combining 192 data frames would be tedious and boring work
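
A quick refresher on the three functions, sketched on tiny made-up data frames (the names a, b and k and their contents are invented for this example):

{{{#!highlight r
a <- data.frame(id = c("A", "B"), x = 1:2)
b <- data.frame(id = c("C", "D"), x = 3:4)
k <- data.frame(id = c("B", "A"), y = c(20, 10))

stacked <- rbind(a, b)                                  # rows of b below rows of a
side    <- cbind(a, data.frame(z = c(TRUE, FALSE)))     # columns side by side
joined  <- merge(a, k)                                  # matched on the shared column id

print(stacked)
print(side)
print(joined)
}}}

Note that merge() matches rows by the values in id, whereas cbind() simply pairs rows by position.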

== Reduce() ==
   * is also a higher order function (functional)
   * Reduce() uses a binary function (like rbind() or merge()) to successively combine the elements of a given list
   * it can be used if you have not only two but many data frames
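
What "successively" means is easiest to see with simple values. The following sketch uses plain numbers and strings instead of data frames; the argument accumulate=TRUE makes Reduce() return all intermediate results.

{{{#!highlight r
# Reduce(f, list(a, b, c, d)) computes f(f(f(a, b), c), d)
total <- Reduce(`+`, 1:4)                       # ((1 + 2) + 3) + 4
steps <- Reduce(`+`, 1:4, accumulate = TRUE)    # intermediate results: 1 3 6 10
strs  <- Reduce(function(x, y) paste0(x, y),
                c("a", "b", "c"), accumulate = TRUE)

print(total)
print(steps)
print(strs)    # "a" "ab" "abc"
}}}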

=== Example ===
   * first we make up 4 artificial data frames

{{{#!highlight r
> (d1 <- data.frame(id=LETTERS[c(1,2,3)],day1=sample(10,3)))
id day1
1  A    3
2  B    1
3  C    7
> (d2 <- data.frame(id=LETTERS[c(1,3,5,6)],day2=sample(10,4)))
id day2
1  A    8
2  C    2
3  E    5
4  F    3
> (d3 <- data.frame(id=LETTERS[c(2,4:6)],day3=sample(10,4)))
id day3
1  B    8
2  D    3
3  E    4
4  F   10
> (d4 <- data.frame(id=LETTERS[c(1:5)],day4=sample(10,5)))
id day4
1  A    2
2  B    7
3  C    8
4  D    9
5  E    1
}}}


   * now we use Reduce() in combination with merge()
{{{#!highlight r
> Reduce(merge,list(d1,d2,d3,d4))
[1] id   day1 day2 day3 day4
}}}
   * and what we get is an empty data frame
   * well, this isn't exactly what we wanted, so why?
   * the default of merge() is all=F, so we only get rows for ids that occur in every data frame, and in this case there are none
   * so we have to define a wrapper function that changes only this argument to all=T
{{{#!highlight r
> Reduce(function(x,y) { merge(x,y, all=T) },
+        list(d1,d2,d3,d4))
id day1 day2 day3 day4
1  A    3    8   NA    2
2  B    1   NA    8    7
3  C    7    2   NA    8
4  E   NA    5    4    1
5  F   NA    3   10   NA
6  D   NA   NA    3    9
}}}
   * which is exactly what we want
   * a second example in combination with rbind()
{{{#!highlight r
> d4$day <- names(d4)[2]
> names(d4)[2] <- "score"
> Reduce(function(x,y) { y$day <- names(y)[2]
+                        names(y)[2] <- "score"
+                        rbind(x,y) } ,
+        list(d1,d2,d3), init = d4)
id score  day
1   A     2 day4
2   B     7 day4
3   C     8 day4
4   D     9 day4
}}}
   * which is exactly what we want
== Reduce() Exercise ==
   * the list ml contains three vectors
   * use lapply() to get the class of each of them
   * then use Reduce() in combination with c() to coerce them into one vector. Which class does the resulting vector have?
{{{#!highlight r
> ml <- list(vl <- c(TRUE,FALSE),
+            vn <- 1:10,
+            vc <- letters)
}}}
== Reduce() Exercise ==
   * use lapply() to get the class of each of them
{{{#!highlight r
> lapply(ml,class)
[1] "logical"
[1] "integer"
[1] "character"
}}}
   * then use Reduce() in combination with c() to coerce them into one vector. Which class does the resulting vector have?
{{{#!highlight r
> rv <- Reduce(c,ml)
> rv
[1] "1"  "0"  "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "a"  "b"  "c" 
[16] "d"  "e"  "f"  "g"  "h"  "i"  "j"  "k"  "l"  "m"  "n"  "o"  "p"  "q"  "r" 
[31] "s"  "t"  "u"  "v"  "w"  "x"  "y"  "z" 
> class(rv)
[1] "character"
}}}
== Combine all data frames ==
   * We used lapply() and our function read.file() to read all files in files
   * and we got back a list df.list containing 192 data frames
{{{#!highlight r
> df.list <- lapply(files,read.file,skip=0)
[1] "read data/posttest/post_001.txt"
[1] "read data/posttest/post_002.txt"
[1] "read data/posttest/post_003.txt"
[1] "read data/posttest/post_004.txt"
[1] "read data/posttest/post_005.txt"
> table(sapply(df.list,class))
192          3   
}}}
== Combine all data frames - exercise ==
   * now use what we learned about Reduce() and combining data frames using rbind() to combine these 192 data frames.
== Combine all data frames - exercise ==
{{{#!highlight r
> data <- Reduce(rbind,df.list)
> nrow(data)
[1] 12704
> table(data$Subject)
001_test2 002_test2 003_test2 004_test2 005_test2 006_test2 007_test2 008_test2 
93        91        96        93        95        95        93        96 
009_test2 010_test2 011_test2 012_test2 013_test2 014_test2 015_test2 016_test2 
92        94        95        96        96        95        96        94 
017_test2 018_test2 019_test2 020_test2 001_test1 002_test1 003_test1 004_test1 
95        94        96        95        95        95        96        94 
005_test1 006_test1 007_test1 008_test1 009_test1 010_test1 011_test1 012_test1 
96        95        94        90        96        95        91        96 
013_test1 014_test1 015_test1 016_test1 017_test1 018_test1 019_test1 020_test1 
95        96        95        91        96        96        96        96 
001_1     002_1     003_1     004_1     005_1     006_1    CHGU_1    008_1a 
60        59        60        54        60        59        60        60 
009_1     010_1     RMK_1     013_1     014_1     015_1     016_1    IJ2K_1 
60        60        59        59        60        58        59        58 
018_1     019_1     020_1     001_2     002_2     003_2     004_2     005_2 
60        59        60        59        59        57        58        57 
006_2     007_2     008_2     009_2     010_2     011_2     012_2     013_2 
58        58        54        58        58        59        59        56 
014_2     015_2     016_2     017_2     018_2     019_2     020_2     001_3 
}}}
== The Function no 2 ==
   * so it is a good idea to wrap these steps in a function again
{{{#!highlight r
> read.files <- function(filesdir,skip=3,recursive=F,pattern="."){
+     files <- dir(filesdir,
+                  full.names = T,
+                  recursive = recursive,
+                  pattern = pattern)
+     Reduce(rbind,lapply(files,read.file,skip=skip))}
> data <- read.files("data",recursive = T,skip=0,pattern = "\\.txt$")
[1] "read data/posttest/post_001.txt"
[1] "read data/posttest/post_002.txt"
[1] "read data/posttest/post_003.txt"
[1] "read data/posttest/post_004.txt"
[1] "read data/posttest/post_005.txt"
}}}
== The Function no 2 ==
   * by changing the pattern (passed through to dir()) we can limit the files read in to a specific time or person
{{{#!highlight r
> sub1 <- read.files("../session2/data",
+                    skip = 0, recursive = T,pattern="002\\.txt$")
[1] "read ../session2/data/posttest/post_002.txt"
[1] "read ../session2/data/pretest/pre_002.txt"
[1] "read ../session2/data/training_1/train_002.txt"
[1] "read ../session2/data/training_2/train_002.txt"
[1] "read ../session2/data/training_3/train_002.txt"
> test <- read.files("../session2/data",
+                    skip = 0, recursive = T,pattern="p[ro].+\\.txt$")
[1] "read ../session2/data/posttest/post_001.txt"
[1] "read ../session2/data/posttest/post_002.txt"
[1] "read ../session2/data/pretest/pre_001.txt"
[1] "read ../session2/data/pretest/pre_002.txt"
}}}
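
Because dir() treats pattern as a regular expression matched against the file names, a pattern can be tested with grepl() before any file is read. The file names below are hypothetical, written in the style of the listings above.

{{{#!highlight r
# Hypothetical file names in the style of the listings above
fn <- c("post_001.txt", "pre_002.txt", "train_002.txt", "pre_002.csv")

m1 <- grepl("[0-9]{3}\\.txt$", fn)   # three digits directly before ".txt"
m2 <- grepl("002\\.txt$", fn)        # only files of person 002
m3 <- grepl("p[ro].+\\.txt$", fn)    # "pr..." or "po...": pretest and posttest

print(data.frame(fn, m1, m2, m3))
}}}

The double backslash is needed because the R string "\\." stands for the regular expression \. (a literal dot).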
== The Subject column ==
   * table the Subject column again. What is the problem?
== The Subject column ==
{{{#!highlight r
> table(data$Subject)
001_test2 002_test2 003_test2 004_test2 005_test2 006_test2 007_test2 008_test2 
93        91        96        93        95        95        93        96 
009_test2 010_test2 011_test2 012_test2 013_test2 014_test2 015_test2 016_test2 
92        94        95        96        96        95        96        94 
017_test2 018_test2 019_test2 020_test2 001_test1 002_test1 003_test1 004_test1 
95        94        96        95        95        95        96        94 
}}}
   * subject and time are coded in one variable
== The Subject column ==
   * we create two new variables using the str_split() function (stringr package)
   * because str_split() returns a list containing a vector, we have to use it in combination with sapply()
   * then we correct some of the person ids
{{{#!highlight r
> library(stringr)
> data$persid <- sapply(data$Subject,function(x)
+     str_split(x,pattern = "_")[[1]][1])
> data$testid <- sapply(data$Subject,function(x)
+     str_split(x,pattern = "_")[[1]][2])
}}}
== The Subject column ==
   * an alternative is to use regular expressions again, this time with the str_replace() function (also from the stringr package)
   * str_replace() takes three arguments: the string, the pattern to be replaced and the replacement
{{{#!highlight r
> data$testid <- str_replace(data$Subject,"^.+_","")
> data$persid <- str_replace(data$Subject,"_.+$","")
> data$Subject <- NULL
}}}
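
The two patterns can be checked on a few values first. The sketch below uses sub() from base R, which like str_replace() replaces the first match, so that it runs without stringr being installed. This swap is made only for the illustration; the Subject values are taken from the table output shown earlier.

{{{#!highlight r
subject <- c("001_test2", "CHGU_1", "008_1a")

testid <- sub("^.+_", "", subject)   # drop everything up to and including the underscore
persid <- sub("_.+$", "", subject)   # drop the underscore and everything after it

print(testid)   # "test2" "1" "1a"
print(persid)   # "001" "CHGU" "008"
}}}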
== The Subject column ==
   * now table the persid column
   * what is left to do?
== The Subject column ==
{{{#!highlight r
> table(data$persid)
001  002  003  004  005  006  007  008  009  010  011  012  013  014  015  016 
665  662  666  587  608  588  600  654  536  543  600  523  589  669  662  663 
017  018  019  020 CHGU GA3K IJ2K Kj6K  RMK 
604  668  667  656   60   58   58   59   59 
> data$persid[data$persid=="CHGU"] <- "007"
}}}
== The Subject column Exercises ==
   * there are some more wrong person ids: RMK - 011, IJ2K - 017, GA3K - 004, Kj6K - 006. Correct them!
== The Subject column Exercises ==
{{{#!highlight r
> data$persid[data$persid=="RMK"] <- "011"
> data$persid[data$persid=="IJ2K"] <- "017"
> data$persid[data$persid=="GA3K"] <- "004"
> data$persid[data$persid=="Kj6K"] <- "006"
}}}
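
With more than a handful of corrections, a named lookup vector may be less error-prone than one assignment per id. This is a sketch of that alternative, not the approach used in the session; the toy vector persid and the name fix are invented here, while the id pairs are the ones listed in the exercise.

{{{#!highlight r
persid <- c("001", "CHGU", "RMK", "017", "Kj6K")   # toy vector for illustration

# wrong id = correct id
fix <- c(CHGU = "007", RMK = "011", IJ2K = "017", GA3K = "004", Kj6K = "006")

wrong <- persid %in% names(fix)                    # which entries need correcting
persid[wrong] <- unname(fix[persid[wrong]])        # look up the replacement by name

print(persid)   # "001" "007" "011" "017" "006"
}}}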
== Merging ==
   * now read in the file subjectdemographics.txt using the appropriate command
   * join the demographics with our data frame data (there is a little problem left: compare the persid and Subject columns)
== Merging ==
{{{#!highlight r
> persdat <- read.table("data/subjectdemographics.txt",
+                       sep="\t",
+                       header=T)
> persdat$Subject
[1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
> unique(data$persid)
[1] "001" "002" "003" "004" "005" "006" "007" "008" "009" "010" "011" "012"
[13] "013" "014" "015" "016" "017" "018" "019" "020"
> data$persid <- as.numeric(data$persid)
> unique(data$persid)
[1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
> data <- merge(persdat,data,by.x = "Subject",by.y = "persid",all=T)
> head(data)
> summary(data)
Subject      Sex       Age_PRETEST        Trial          Event.Type   
1st Qu.: 5.00   m:5046   1st Qu.:3.110   1st Qu.:112.0   Response:12704  
Median :11.00            Median :4.400   Median :222.0   Sound   :    0  
Mean   :10.53            Mean   :4.154   Mean   :223.1   Pause   :    0  
3rd Qu.:16.00            3rd Qu.:4.600   3rd Qu.:332.0   Resume  :    0  
}}}
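
The key problem and its fix can be reproduced on a small scale. In this sketch the data frames demo and obs and their contents are made up; only the column names Subject and persid follow the session.

{{{#!highlight r
demo <- data.frame(Subject = 1:3, Sex = c("f", "m", "f"))
obs  <- data.frame(persid = c("001", "003", "003"), score = c(5, 7, 9))

# "001" and 1 do not match, so the character ids are converted first
obs$persid <- as.numeric(obs$persid)

both <- merge(demo, obs, by.x = "Subject", by.y = "persid", all = TRUE)
print(both)   # subject 2 has no observations, so its score is NA
}}}

With all=TRUE, subjects without any observation stay in the result, which makes missing data visible.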
== Summary Graphics ==
Just run the code and try to understand it. We will cover the ggplot graphics soon.
{{{#!highlight r
> ggplot(data,aes(x=factor(Subject),fill=..count..)) +
+     geom_bar() +
+     facet_wrap(~testid)
}}}

[[attachment:graph1.png|{{attachment:graph1.png||width=800}}]]
== Summary Graphics ==
   * so there are problems in the coding of the test id
   * we remove the letters at the end using str_replace()
{{{#!highlight r
> data$testid <- str_replace(data$testid,"[a-z]$","")
> data$testid <- factor(data$testid,
+                       levels=c("test1","1","2","3","4","5","6","7","8","test2"))
> table(data$Subject,data$testid)
test1  1  2  3  4  5  6  7  8 test2
1     95 60 59 60 59 59 60 60 60    93
2     95 59 59 58 60 60 60 60 60    91
3     96 60 57 60 60 60 60 59 58    96
4     94 54 58 60 60 55 53 60 58    93
5     96 60 57 60 60 60 60 60  0    95
6     95 59 58 59 58 59 55 54 55    95
7     94 60 58 60 58 59 60 59 59    93
8     90 60 54 55 60 60 60 59 60    96
9     96 60 58 59 57  0  0 58 56    92
}}}
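
The levels argument is what puts the columns of the table into chronological instead of alphabetical order. A minimal sketch on a made-up vector:

{{{#!highlight r
testid <- c("2", "test1", "1", "test2", "1")   # toy values for illustration

f.default <- factor(testid)                    # levels in sorted order
f.ordered <- factor(testid, levels = c("test1", "1", "2", "test2"))

print(levels(f.default))
print(levels(f.ordered))   # "test1" "1" "2" "test2"
print(table(f.ordered))    # counts appear in the order of the levels
}}}

The same level order is used by ggplot2 when it arranges the panels of facet_wrap().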
== Summary Graphics ==
{{{#!highlight r
> ggplot(data,aes(x=factor(Subject),fill=..count..)) +
+     geom_bar() +
+     facet_wrap(~testid)
}}}

[[attachment:graph2.png|{{attachment:graph2.png||width=800}}]]

== Summary Graphics ==
And another one.
{{{#!highlight r
> ggplot(data,aes(x=testid,fill=Stim.Type)) +
+     geom_bar(position=position_fill()) +
+     facet_wrap(~Subject)
}}}

[[attachment:graph3.png|{{attachment:graph3.png||width=800}}]]