Unterschiede zwischen den Revisionen 1 und 3 (über 2 Versionen hinweg)

Introduction

Implicit Loops

A common application of loops is to apply a function to each element of a set of values and collect the results in a single structure. In R this is done by the functions:

lapply()
sapply()
apply()
tapply()

lapply()

The functions lapply and sapply are similar, their first argument can be a list, data frame, matrix or vector, the second argument the function to "apply". The former return a list (hence "l") and the latter tries to simplify the results (hence the "s"). For example:

   1 > lapply(dat,mean)
   2 [1] 6753.636
   3 [1] 5433.182
   4 > sapply(dat,mean)

apply()

apply() this function can be applied to an array. Its argument is the array, the second the dimension/s where we want to apply a function and the third is the function. For example

   1 > x<-1:12
   2 > dim(x)<-c(2,2,3)
   3 > apply(x,3,quantile) ## calculate the quantiles

tapply()

The function tapply() allows you to create tables (hence the "t") of the value of a function on subgroups defined by its second argument, which can be a factor or a list of factors.

For example in the quine data frame, we can summarize Days classify by Eth and Lrn as follows:

   1 > tapply(Days,list(Eth,Lrn),mean)
   2 AL       SL
   3 A 18.57500 24.89655
   4 N 13.25581 10.82353

the class() function shows the class of an object use it in combination with lapply() to get the classes of the columns of the quine data frame * do the same with sapply() what is the difference * try to combine this with what you learned about indexing and create a new data frame quine2 only containing the columns which are factors * calculate the row and column means of the below defined matrix m using the apply function PS: in real life application use the rowMeans() and colMeans() function

   1 m <- matrix(rnorm(100),nrow=10)

use tapply() to summarise the number of missing days at school per Ethnicity and/or per Sex (three lines) * sometimes the aggregate() function is more convenient; note the use of #!latex $\sim$; it is read as 'is dependent on'and it is extensively used in modelling

   1 > aggregate(Days ~ Sex + Eth, data=quine,mean)
   2 Sex Eth     Days
   3 1   F   A 20.92105
   4 2   M   A 21.61290
   5 3   F   N 10.07143
   6 4   M   N 14.71429
   7 > aggregate(Days ~ Sex + Eth, data=quine,summary)
   8 Sex Eth Days.Min. Days.1st Qu. Days.Median Days.Mean Days.3rd Qu. Days.Max.
   9 1   F   A      0.00         5.25       13.50     20.92        30.25     81.00
  10 2   M   A      2.00         9.50       16.00     21.61        33.00     57.00
  11 3   F   N      0.00         5.00        7.00     10.07        14.00     37.00
  12 4   M   N      0.00         3.50        8.00     14.71        19.50     69.00

Functions

Every function in R has three important characteristics:

a body (the code inside the function) - body()
arguments (the list of arguments which controls how you can call the function) - formals()
an environment (the “map” of the location of the function’s variables) - environment()

You can see all three parts if you type the name of the function without primitives. Exceptions are brackets. Primitive functions, like sum(), call C code directly with .Primitive() and contain no R code. Therefore their formals(), body(), and environment() are all NULL.

Functions

   1 > chisq.test
   2 function (x, y = NULL, correct = TRUE, p = rep(1/length(x), length(x)),
   3 DNAME <- deparse(substitute(x))
   4 if (is.data.frame(x))
   5 expected = E, residuals = (x - E)/sqrt(E), stdres = (x -
   6 > sum
   7 function (..., na.rm = FALSE)  .Primitive("sum")

Function Arguments

Arguments are matched

first by exact name (perfect matching)
then by prefix matching
and finally by position.

By default, R function arguments are lazy, they are only evaluated if they are actually used:

   1 > f <- function(x) {
   2 f <- function(x) {
   3 +   10
   4 + }
   5 > f(stop("This is an error!"))
   6 [1] 10
   7 >

Function Exercises (Verzani)

* Write a function to compute the average distance from the mean for some data vector. * Write a function f() which finds the average of the x values after squaring and substracts the square of the average of the numbers. Verify this output will always be non-negative by computing \texttt{f(1:10)} * An integer is even if the remainder upon dividing it by 2 is 0. This remainder is given by R with the syntax \texttt{ x \%\% 2}. Use this to write a function iseven(). How would you write isodd()? * Write a function isprime() that checks if a number x is prime by dividing x by all values \texttt{$2,\ldots,x-1}}}} then checking to see if there is a remainder of 0.

Function Exercises (Verzani)

Write a function to compute the average distance from the mean for some data vector.

   1 > avg.dist <- function(x){
   2 +     xbar <- mean(x)
   3 +     mean(abs(x-xbar))
   4 + }

Function Exercises (Verzani)

Write a function f() which finds the average of the x values aufter squaring and substracts the square of the average of the numbers. Verify this output will always be non-negative by computing \texttt{f(1:10)}

   1 > f <- function(x){
   2 +     mean(x**2) - mean(x)**2
   3 + }
   4 > f(1:10)
   5 [1] 8.25

Function Exercises (Verzani)

An integer is even if the remainder upon dividing it by 2 is 0. This remainder is given by R with the syntax \texttt{ x \%\% 2}. Use this to write a function iseven(). How would you write isodd()?

   1 > iseven <- function(x){
   2 +     x %% 2 == 0
   3 + }
   4 > iseven(1:10)
   5 [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE
   6 > isodd <- function(x){
   7 +     !iseven(x)
   8 + }
   9 > isodd(1:10)
  10 [1]  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE

Function Exercises (Verzani)

Write a function isprime() that checks if a number x is prime by dividing x by all values \texttt{$2,\ldots,x-1}}}} then checking to see if there is a remainder of 0.

   1 > isprime <- function(x){
   2 +     if(x == 2) return(TRUE)
   3 +     !(0 %in% (x %% (2:(x-1))))
   4 + }
   5 > isprime(2)
   6 [1] TRUE
   7 > isprime(5)
   8 [1] TRUE
   9 > isprime(15)
  10 [1] FALSE

Read in the file

   1 > file <- "../session1/session1data/pre001.txt"
   2 > skip <- 3
   3 > tmp <- read.table(file,skip = skip,sep = "\t",
   4 +                   header=T,na.strings = c(" +",""),
   5 +                   fill=T)

Remove empty line

   1 > tmp <- tmp[!is.na(tmp$Subject),]

Remove spaces

Remove unnecessary spaces from character vectors/factors

   1 > tmp <- lapply(tmp,function(x) {
   2 +         if( class(x) %in% c("character","factor") ){
   3 +             x <- factor(gsub(" ","",as.character(x)))
   4 +             return(x)}else{ return(x) }})
   5 > tmp <- as.data.frame(tmp)

Find/Remove breaks

   1 > if(length(pause)>0){
   2 +     drei <- which(tmp$Code==3 & !is.na(tmp$Code))
   3 +     drei <- drei[drei > pause][1:2]
   4 +     if(pause + 1 < drei[1]){
   5 +         tmp <- tmp[-(pause:drei[2]),]
   6 +     }}
   7 > tmp <- tmp[!(tmp$Event.Type %in% c("Pause","Resume")), ]

Find/Remove first/last rows

   1 > first.pic <- min(which(tmp$Event.Type=="Picture" &
   2 +                           !is.na(tmp$Event.Type) )) - 1
   3 > tmp <- tmp[-(1:first.pic),]
   4 > last.pic <- min(which(tmp$Event.Type=="Picture" &
   5 +                           !is.na(tmp$Event.Type) &
   6 +                           tmp$Code=="Fertig!" &
   7 +                           !is.na(tmp$Code)))
   8 > tmp <- tmp[-(last.pic:nrow(tmp)),]

Extract Responses

   1 > zeilen <- which(tmp$Event.Type %in% c("Response"))
   2 > zeilen <- sort(unique(c(zeilen,zeilen-1)))
   3 > zeilen <- zeilen[zeilen>0]
   4 > tmp <- tmp[zeilen,]

Extract Responses

   1 > responses <- which(tmp$Code %in% c(1,2))
   2 > events <- responses-1
   3 > tmp$Type <- NA
   4 > tmp$Type[responses] <- as.character(tmp$Event.Type[events])
   5 > head(tmp)
   6 Subject Trial Event.Type     Code   Time TTime Uncertainty Duration
   7 6   PRE001     7    Picture RO09.jpg 168954     0           1    10197
   8 7   PRE001     7   Response        2 178963 10009           1       NA
   9 11  PRE001    12    Picture RO20.jpg 230338     0           1     8398
  10 12  PRE001    12   Response        1 238680  8342           1       NA
  11 16  PRE001    17    Picture RS28.jpg 289723     0           1     8198
  12 17  PRE001    17   Response        2 297789  8066           1       NA
  13 6              2       0   next incorrect          7    <NA>
  14 7             NA      NA   <NA>      <NA>         NA Picture
  15 11             2       0   next incorrect         12    <NA>
  16 12            NA      NA   <NA>      <NA>         NA Picture
  17 16             2       0   next       hit         17    <NA>
  18 17            NA      NA   <NA>      <NA>         NA Picture

Moving Information

Moving all (necessary) information to the response lines.

   1 > tmp$Event.Code <- NA
   2 > tmp$Event.Code[responses] <- as.character(tmp$Code[events])
   3 > tmp$Stim.Type[responses] <- as.character(tmp$Stim.Type[events])
   4 > tmp$Duration[responses] <- as.character(tmp$Duration[events])
   5 > tmp$Uncertainty.1[responses] <- as.character(tmp$Uncertainty.1[events])
   6 > tmp$ReqTime[responses] <- as.character(tmp$ReqTime[events])
   7 > tmp$ReqDur[responses] <- as.character(tmp$ReqDur[events])
   8 > tmp$Pair.Index[responses] <- as.character(tmp$Pair.Index[events])
   9 > tmp$Stim.Type[responses] <- as.character(tmp$Stim.Type[events])

Moving Information

   1 > head(tmp)
   2 Subject Trial Event.Type     Code   Time TTime Uncertainty Duration
   3 6   PRE001     7    Picture RO09.jpg 168954     0           1    10197
   4 7   PRE001     7   Response        2 178963 10009           1    10197
   5 11  PRE001    12    Picture RO20.jpg 230338     0           1     8398
   6 12  PRE001    12   Response        1 238680  8342           1     8398
   7 16  PRE001    17    Picture RS28.jpg 289723     0           1     8198
   8 17  PRE001    17   Response        2 297789  8066           1     8198
   9 6              2       0   next incorrect          7    <NA>       <NA>
  10 7              2       0   next incorrect          7 Picture   RO09.jpg
  11 11             2       0   next incorrect         12    <NA>       <NA>
  12 12             2       0   next incorrect         12 Picture   RO20.jpg
  13 16             2       0   next       hit         17    <NA>       <NA>
  14 17             2       0   next       hit         17 Picture   RS28.jpg

Keep response lines

   1 > tmp <- tmp[tmp$Event.Type=="Response" & !is.na(tmp$Type),]
   2 > tmp <- tmp[tmp$Type=="Picture" & !is.na(tmp$Type),]
   3 > head(tmp)
   4 Subject Trial Event.Type Code   Time TTime Uncertainty Duration
   5 7   PRE001     7   Response    2 178963 10009           1    10197
   6 12  PRE001    12   Response    1 238680  8342           1     8398
   7 17  PRE001    17   Response    2 297789  8066           1     8198
   8 22  PRE001    22   Response    1 351321 10811           1    10997
   9 27  PRE001    27   Response    2 403607   713           1      800
  10 32  PRE001    32   Response    1 467793 23709           1    23794
  11 7              2       0   next incorrect          7 Picture   RO09.jpg
  12 12             2       0   next incorrect         12 Picture   RO20.jpg
  13 17             2       0   next       hit         17 Picture   RS28.jpg
  14 22             2       0   next       hit         22 Picture   AT26.jpg
  15 27             2       0   next       hit         27 Picture   RS23.jpg
  16 32             2       0   next       hit         32 Picture   OF04.jpg

The Function

it would be a tedious work to every step for all of the files
if we look through the steps the only important thing that we have to change is the file name
so we rather use a canned version of our procedure dependend the file name and the number of lines to skip - we create a function read.file(file):

   1 tmp <- read.table(file,skip = skip,sep = "\t",

The Function (continued)

   1 tmp <- tmp[!is.na(tmp$Subject),]
   2 tmp <- lapply(tmp,function(x) {
   3 x <- factor(gsub(" ","",as.character(x)))
   4 tmp <- as.data.frame(tmp)

The Function (continued)

   1 pause <- which(tmp$Event.Type=="Picture" & tmp$Code=="Pause")
   2 drei <- which(tmp$Code==3 & !is.na(tmp$Code))
   3 drei <- drei[drei > pause][1:2]
   4 tmp <- tmp[-(pause:drei[2]),]
   5 tmp <- tmp[!(tmp$Event.Type %in% c("Pause","Resume")), ]

The Function (continued)

   1 tmp <- tmp[-(1:first.pic),]
   2 tmp <- tmp[-(last.pic:nrow(tmp)),]

The Function (continued)

   1 zeilen <- which(tmp$Event.Type %in% c("Response"))
   2 zeilen <- sort(unique(c(zeilen,zeilen-1)))
   3 zeilen <- zeilen[zeilen>0]
   4 tmp <- tmp[zeilen,]
   5 responses <- which(tmp$Code %in% c(1,2))
   6 events <- responses-1

The Function (continued)

   1 tmp <- tmp[tmp$Event.Type=="Response" & !is.na(tmp$Type),]
   2 tmp <- tmp[tmp$Type=="Picture" & !is.na(tmp$Type),]

The Function (continued)

we can use this function now to read in the file
and get the processed data frame in one step
setting the parameter skip we can read both versions of the file (and should get the same result)

The Function Exercise

* run the function using source() * use the function to read in \texttt{../session1/session1data/pre001.txt} and \texttt{data/pretest/pre\_001.txt} * use some summary functions like table() or summary to check if they contain the same information We will learn about a function to compare data frames more exact soon.

The Function (continued)

   1 > file <- "../session1/session1data/pre001.txt"
   2 > pre1 <- read.file(file,skip=3)
   3 [1] "read ../session1/session1data/pre001.txt"
   4 > file <- "data/pretest/pre_001.txt"
   5 > pre1v2 <- read.file(file,skip=0)
   6 [1] "read ../session2/data/pretest/pre_001.txt"

rbind()

rbind() can be used to combine two dataframes (or matrices) in the sense of adding rows, the column names and types must be the same for the two objects

   1 > x <- data.frame(id=1:3,score=rnorm(3))
   2 > y <- data.frame(id=13:15,score=rnorm(3))
   3 > rbind(x,y)
   4 id       score
   5 1  1  0.71121163
   6 2  2 -0.62973249
   7 3  3  1.17737595
   8 4 13 -0.45074940
   9 5 14 -0.01044197
  10 6 15 -1.05217176

cbind()

cbind() can be used to combine two dataframes (or matrices) in the sense of adding columns, the number of rows must be the same for the two objects

   1 > cbind(x,y)
   2 id      score1      score2     score3
   3 1  1  0.11440705  0.14536778 -1.1773241
   4 2  2 -1.62862651  0.02020604  0.5686415
   5 3  3  0.05335811  0.25462270  0.8844987
   6 4  4 -0.19931734  0.15625511  0.9287316
   7 5  5 -1.15217836 -1.79804503 -0.7550234

it is not recommended to use cbind() to combining data frames

merge()

merge() is the command of choice for merging or joining data frames
it is the equivalent of join in sql
there are four cases
- inner join
- left outer join
- right outer join
- full outer join

   1 > (d1 <- data.frame(id=LETTERS[c(1,2,3)],day1=sample(10,3)))
   2 id day1
   3 1  A    3
   4 2  B    4
   5 3  C    5
   6 > (d2 <- data.frame(id=LETTERS[c(1,3,5,6)],day2=sample(10,4)))
   7 id day2
   8 1  A    7
   9 2  C   10
  10 3  E    3
  11 4  F    6

inner join

inner join means: keep only the cases present in both of the data frames

   1 > merge(d1,d2)
   2 id day1 day2
   3 1  A    3    7
   4 2  C    5   10

left outer join

left outer join means: keep all cases of the left data frame no matter if they are present in the right data frame (all.x=T)

   1 > merge(d1,d2,all.x = T)
   2 id day1 day2
   3 1  A    3    7
   4 2  B    4   NA
   5 3  C    5   10

right outer join

right outer join means: keep all cases of the right data frame no matter if they are present in the left data frame (all.y=T)

   1 > merge(d1,d2,all.y = T)
   2 id day1 day2
   3 1  A    3    7
   4 2  C    5   10
   5 3  E   NA    3
   6 4  F   NA    6

full outer join

full outer join means: keep all cases of both data frames (all=T)

   1 > merge(d1,d2,all = T)
   2 id day1 day2
   3 1  A    3    7
   4 2  B    4   NA
   5 3  C    5   10
   6 4  E   NA    3
   7 5  F   NA    6

merge()

if not stated otherwise R uses the intersect of the names of both data frames, in our case only \textit{id}
you can specify these columns directly by \texttt{by=c("colname1","colname2")} if the columns are named identical or
using\\ \texttt{by.x=c("colname1.x","colname2.x"),

merge() Exercise

now read in the file personendaten.txt using the appropriate command
join the demographics with our pre1 data frame (even though it does not make sense now)

merge() Exercise

   1 > persdat <- read.table("../session1/session1data/personendaten.txt",
   2 +                       sep="\t",
   3 +                       header=T)
   4 > pre1 <- merge(persdat,pre1,all.y = T)
   5 > head(pre1)
   6 Subject Sex Age_PRETEST Trial Event.Type Code   Time TTime Uncertainty
   7 1  PRE001   f        3.11     7   Response    2 178963 10009           1
   8 2  PRE001   f        3.11    12   Response    1 238680  8342           1
   9 3  PRE001   f        3.11    17   Response    2 297789  8066           1
  10 4  PRE001   f        3.11    22   Response    1 351321 10811           1
  11 5  PRE001   f        3.11    27   Response    2 403607   713           1
  12 6  PRE001   f        3.11    32   Response    1 467793 23709           1
  13 Duration Uncertainty.1 ReqTime ReqDur Stim.Type Pair.Index    Type Event.Code
  14 1    10197             2       0   next incorrect          7 Picture   RO09.jpg
  15 2     8398             2       0   next incorrect         12 Picture   RO20.jpg
  16 3     8198             2       0   next       hit         17 Picture   RS28.jpg
  17 4    10997             2       0   next       hit         22 Picture   AT26.jpg
  18 5      800             2       0   next       hit         27 Picture   RS23.jpg
  19 6    23794             2       0   next       hit         32 Picture   OF04.jpg

Reduce()

is a higher order function (functional)
Reduce() uses a binary function (like rbind() or merge()) to combine successively the elements of a given list
it can be used if you have not only two but many data frames

Reduce()

first we make up 4 artifical data frames

Reduce()

   1 > (d1 <- data.frame(id=LETTERS[c(1,2,3)],day1=sample(10,3)))
   2 id day1
   3 1  A    3
   4 2  B    1
   5 3  C    7
   6 > (d2 <- data.frame(id=LETTERS[c(1,3,5,6)],day2=sample(10,4)))
   7 id day2
   8 1  A    8
   9 2  C    2
  10 3  E    5
  11 4  F    3
  12 > (d3 <- data.frame(id=LETTERS[c(2,4:6)],day3=sample(10,4)))
  13 id day3
  14 1  B    8
  15 2  D    3
  16 3  E    4
  17 4  F   10
  18 > (d4 <- data.frame(id=LETTERS[c(1:5)],day4=sample(10,5)))
  19 id day4
  20 1  A    2
  21 2  B    7
  22 3  C    8
  23 4  D    9
  24 5  E    1

Reduce()

now we use Reduce() in combination with merge()

   1 > Reduce(merge,list(d1,d2,d3,d4))
   2 [1] id   day1 day2 day3 day4

and what we get is an empty data frame
well this isn't exactly what we wanted, so why?
it is because the default behavior of merge() is set all=F, so we get only complete lines which is in this case - none
so we have to define a wrapper function which only change this argument to all=T

Reduce()

now we use Reduce() in combination with merge()

   1 > Reduce(function(x,y) { merge(x,y, all=T) },
   2 +        list(d1,d2,d3,d4))
   3 id day1 day2 day3 day4
   4 1  A    3    8   NA    2
   5 2  B    1   NA    8    7
   6 3  C    7    2   NA    8
   7 4  E   NA    5    4    1
   8 5  F   NA    3   10   NA
   9 6  D   NA   NA    3    9

which is exactly what we want

Reduce()

a second example in combination with rbind()

   1 > d4$day <- names(d4)[2]
   2 > names(d4)[2] <- "score"
   3 > Reduce(function(x,y) { y$day <- names(y)[2]
   4 +                        names(y)[2] <- "score"
   5 +                        rbind(x,y) } ,
   6 +        list(d1,d2,d3), init = d4)
   7 id score  day
   8 1   A     2 day4
   9 2   B     7 day4
  10 3   C     8 day4
  11 4   D     9 day4

which is exactly what we want

A second function

well that's better, but it is still boring to do this for every single file
so see what we have learned: the combination of lapply() and Reduce() can do the work
using dir{} we get all the files contained in a given directory
then we use lapply() together with our new function read.file()

dir()

dir() without additional argument shows all files/directories in the working directory

   1 > dir()
   2 [1] "data"                  "function.r"            "function.r~"
   3 [4] "ggp1.pdf"              "graphics.r"            "linkimage.aux"
   4 [7] "session2apply.aux"     "session2apply.log"     "session2apply.nav"
   5 [10] "session2apply.out"     "session2apply.pdf"     "session2apply.snm"
   6 [13] "session2apply.tex"     "session2apply.tex~"    "#session2apply.tex#"
   7 [16] "session2apply.toc"     "session2apply.vrb"     "session2hadley.aux"
   8 [19] "session2hadley.log"    "session2hadley.nav"    "session2hadley.out"
   9 [22] "session2hadley.pdf"    "session2hadley.snm"    "session2hadley.tex"
  10 [25] "session2hadley.tex~"   "session2hadley.toc"    "session2hadley.vrb"
  11 [28] "solutionssession1.r"   "solutionssession1.r~"  "solutionssession2.r"
  12 [31] "solutionssession2.r~"  "#solutionssession2.r#"

dir()

given a path dir() will show the content of resp folder

   1 > dir("data")
   2 [1] "posttest"   "pretest"    "training_1" "training_2" "training_3"
   3 [6] "training_4" "training_5" "training_6" "training_7" "training_8"

dir()

setting recursive to TRUE R will recurse into directories recursively through

   1 > dir("data",recursive = T)
   2 [1] "posttest/post_001.txt"      "posttest/post_002.txt"
   3 [3] "posttest/post_003.txt"      "posttest/post_004.txt"
   4 [5] "posttest/post_005.txt"      "posttest/post_006.txt"
   5 [7] "posttest/post_007.txt"      "posttest/post_008.txt"

dir()

setting full.names to TRUE R will give the full path

   1 > dir("data",recursive = T, full.names = T)
   2 [1] "data/posttest/post_001.txt"      "data/posttest/post_002.txt"
   3 [3] "data/posttest/post_003.txt"      "data/posttest/post_004.txt"
   4 [5] "data/posttest/post_005.txt"      "data/posttest/post_006.txt"

dir()

with pattern we can specify which files to show (regexpr), e.g. all r files

   1 > dir(pattern = "\\.r$")
   2 [1] "function.r"          "graphics.r"          "solutionssession1.r"
   3 [4] "solutionssession2.r"

dir() Exercise

create a variable files containing the names of all text files in the data directory, my editor creates temporary files beginning and ending by a hash key, make sure they are not contained in the list

dir() Exercise

   1 > dir("data",full.names = T, recursive = T,pattern = "txt$"
   2 + )
   3 [1] "data/posttest/post_001.txt"    "data/posttest/post_002.txt"
   4 [3] "data/posttest/post_003.txt"    "data/posttest/post_004.txt"
   5 [5] "data/posttest/post_005.txt"    "data/posttest/post_006.txt"
   6 > files <- dir("data",full.names = T, recursive = T,pattern = "txt$")

Read all files

Now we use lapply() and our function read.file() to read all files in files

   1 > df.list <- lapply(files,read.file,skip=0)
   2 [1] "read data/posttest/post_001.txt"
   3 [1] "read data/posttest/post_002.txt"
   4 [1] "read data/posttest/post_003.txt"
   5 [1] "read data/posttest/post_004.txt"
   6 [1] "read data/posttest/post_005.txt"

Reading all files

the object df.list is a list containing 192 data frames

   1 > sapply(df.list,class)
   2 [1] "data.frame" "data.frame" "data.frame" "data.frame" "data.frame"
   3 [6] "data.frame" "data.frame" "data.frame" "data.frame" "data.frame"
   4 [11] "data.frame" "data.frame" "data.frame" "data.frame" "data.frame"
   5 [16] "data.frame" "data.frame" "data.frame" "data.frame" "data.frame"
   6 [21] "data.frame" "data.frame" "data.frame" "data.frame" "data.frame"
   7 [26] "data.frame" "data.frame" "data.frame" "data.frame" "data.frame"
   8 [31] "data.frame" "data.frame" "data.frame" "data.frame" "data.frame"
   9 [36] "data.frame" "data.frame" "data.frame" "data.frame" "data.frame"
  10 [41] "data.frame" "data.frame" "data.frame" "data.frame" "data.frame"
  11 [46] "data.frame" "data.frame" "data.frame" "data.frame" "data.frame"
  12 [51] "data.frame" "NULL"       "data.frame" "data.frame" "data.frame"
  13 [56] "data.frame" "data.frame" "data.frame" "data.frame" "data.frame"
  14 [61] "data.frame" "data.frame" "data.frame" "data.frame" "data.frame"
  15 [66] "data.frame" "data.frame" "data.frame" "data.frame" "data.frame"
  16 [71] "data.frame" "data.frame" "data.frame" "data.frame" "data.frame"

The Function 2

in a last step we use Reduce{} to combine these 192 data frames

   1 > data <- Reduce(rbind,df.list)
   2 > nrow(data)
   3 [1] 12704
   4 > table(data$Subject)
   5 001_test2 002_test2 003_test2 004_test2 005_test2 006_test2 007_test2 008_test2
   6 93        91        96        93        95        95        93        96
   7 009_test2 010_test2 011_test2 012_test2 013_test2 014_test2 015_test2 016_test2
   8 92        94        95        96        96        95        96        94
   9 017_test2 018_test2 019_test2 020_test2 001_test1 002_test1 003_test1 004_test1
  10 95        94        96        95        95        95        96        94
  11 005_test1 006_test1 007_test1 008_test1 009_test1 010_test1 011_test1 012_test1
  12 96        95        94        90        96        95        91        96
  13 013_test1 014_test1 015_test1 016_test1 017_test1 018_test1 019_test1 020_test1
  14 95        96        95        91        96        96        96        96
  15 001_1     002_1     003_1     004_1     005_1     006_1    CHGU_1    008_1a
  16 60        59        60        54        60        59        60        60
  17 009_1     010_1     RMK_1     013_1     014_1     015_1     016_1    IJ2K_1
  18 60        60        59        59        60        58        59        58
  19 018_1     019_1     020_1     001_2     002_2     003_2     004_2     005_2
  20 60        59        60        59        59        57        58        57
  21 006_2     007_2     008_2     009_2     010_2     011_2     012_2     013_2
  22 58        58        54        58        58        59        59        56
  23 014_2     015_2     016_2     017_2     018_2     019_2     020_2     001_3

The Function no 2

so it is recommended to build again a function out of this

   1 > read.files <- function(filesdir,skip=3,recursive=F,pattern="."){
   2 +     files <- dir(filesdir,
   3 +                  full.names = T,
   4 +                  recursive = recursive,
   5 +                  pattern = pattern)
   6 +     Reduce(rbind,lapply(files,read.file,skip=skip))}
   7 > data <- read.files("data",recursive = T,skip=0,pattern = "\\.txt$")
   8 [1] "read data/posttest/post_001.txt"
   9 [1] "read data/posttest/post_002.txt"
  10 [1] "read data/posttest/post_003.txt"
  11 [1] "read data/posttest/post_004.txt"
  12 [1] "read data/posttest/post_005.txt"

The Subject column

table the Subject column again. What is the problem?

The Subject column

   1 > table(data$Subject)
   2 001_test2 002_test2 003_test2 004_test2 005_test2 006_test2 007_test2 008_test2
   3 93        91        96        93        95        95        93        96
   4 009_test2 010_test2 011_test2 012_test2 013_test2 014_test2 015_test2 016_test2
   5 92        94        95        96        96        95        96        94
   6 017_test2 018_test2 019_test2 020_test2 001_test1 002_test1 003_test1 004_test1
   7 95        94        96        95        95        95        96        94

subject and time coded in one variable

The Subject column

we create two new variables using the str\_split() function (stringr package)
becaus str\_split() has a list containing a vector as result we have to use it in combination with sapply()
then correct some of the person ids

   1 > data$persid <- sapply(data$Subject,function(x)
   2 +     str_split(x,pattern = "_")[[1]][1])
   3 > data$testid <- sapply(data$Subject,function(x)
   4 +     str_split(x,pattern = "_")[[1]][2])
   5 > data$persid[data$persid=="CHGU"] <- "007"

The Subject column Exercises

there are some more wrong person ids: RMK - 011, IJ2K - 017, GA3K - 004, Kj6K - 006. Correct them!

The Subject column Exercises

   1 > data$persid[data$persid=="RMK"] <- "011"
   2 > data$persid[data$persid=="IJ2K"] <- "017"
   3 > data$persid[data$persid=="GA3K"] <- "004"
   4 > data$persid[data$persid=="Kj6K"] <- "006"

Merging

* now read in the file subjectsdemographics.txt using the appropriate command * join the demographics with our data data frame (there is a little problem left - compare the persid and Subject columns)

The Subject column Exercises

   1 > persdat <- read.table("data/subjectdemographics.txt",
   2 +                       sep="\t",
   3 +                       header=T)
   4 > persdat$Subject
   5 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
   6 > unique(data$persid)
   7 [1] "001" "002" "003" "004" "005" "006" "007" "008" "009" "010" "011" "012"
   8 [13] "013" "014" "015" "016" "017" "018" "019" "020"
   9 > data$persid <- as.numeric(data$persid)
  10 > unique(data$persid)
  11 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
  12 > data <- merge(persdat,data,by.x = "Subject",by.y = "persid",all=T)
  13 In merge.data.frame(persdat, data, by.x = "Subject", by.y = "persid",  :
  14 column name ‘Subject’ is duplicated in the result
  15 > head(data)
  16 Subject Sex Age_PRETEST   Subject Trial Event.Type Code   Time TTime
  17 1       1   f        3.11 001_test2     7   Response    2 103745  2575
  18 2       1   f        3.11 001_test2    12   Response    2 156493  2737
  19 3       1   f        3.11 001_test2    17   Response    2 214772  6630
  20 4       1   f        3.11 001_test2    22   Response    1 262086  5957
  21 5       1   f        3.11 001_test2    27   Response    2 302589   272
  22 6       1   f        3.11 001_test2    32   Response    1 352703  7197

Summary Graphics

Just run the code and try to understand it. We will cover the ggplot graphics in the next session.

   1 > ggplot(data,aes(x=factor(Subject),fill=..count..)) +
   2 +     geom_bar() +
   3 +     facet_wrap(~testid)

Summary Graphics

so there are problems in coding of the test id
we remove the letters at the end using str\_replace()

   1 > data$testid <- str_replace(data$testid,"[a-z]$","")
   2 > data$testid <- factor(data$testid,
   3 +                       levels=c("test1","1","2","3","4","5","6","7","8","test2"))
   4 > table(data$Subject,data$testid)
   5 test1  1  2  3  4  5  6  7  8 test2
   6 1     95 60 59 60 59 59 60 60 60    93
   7 2     95 59 59 58 60 60 60 60 60    91
   8 3     96 60 57 60 60 60 60 59 58    96
   9 4     94 54 58 60 60 55 53 60 58    93
  10 5     96 60 57 60 60 60 60 60  0    95
  11 6     95 59 58 59 58 59 55 54 55    95
  12 7     94 60 58 60 58 59 60 59 59    93
  13 8     90 60 54 55 60 60 60 59 60    96
  14 9     96 60 58 59 57  0  0 58 56    92
  15 10    95 60 58 58 60 60 58  0  0    94
  16 11    91 59 59 60 60 57 58 60 60    95

Summary Graphics

   1 > ggplot(data,aes(x=factor(Subject),fill=..count..)) +
   2 +     geom_bar() +
   3 +     facet_wrap(~testid)

Summary Graphics

And another one.

   1 > ggplot(data,aes(x=testid,fill=Stim.Type)) +
   2 +     geom_bar(position=position_fill()) +
   3 +     facet_wrap(~Subject)

RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/FunctionsInR/ApplyR (zuletzt geändert am 2015-05-01 10:48:36 durch mandy.vogel@googlemail.com)

-  ⇤ ← Revision 1 vom 2015-05-01 08:23:34 → 
  Größe: 32159
  Autor: mandy.vogel@googlemail.com
  Kommentar:
+   ← Revision 3 vom 2015-05-01 08:27:34 → ⇥
  Größe: 31972
  Autor: mandy.vogel@googlemail.com
  Kommentar:
-Gelöschter Text ist auf diese Art markiert.
+Hinzugefügter Text ist auf diese Art markiert.
 Zeile 3:
-A common application of loops is  to apply a function to each element of a set of values and collect the results in a single structure.
In R this is done by the functions:
   * lapply()
   * sapply()
   * apply()
   * tapply()
+A common application of loops is  to apply a function to each element of a set of values and collect the results in a single structure. In R this is done by the functions:

 * lapply()
 * sapply()
 * apply()
 * tapply()
-Zeile 10:
+Zeile 11:
-   * The functions lapply and sapply are similar, their first argument can be a list, data frame, matrix or vector, the second argument the function to "apply". The former return a list (hence "l") and the latter tries to simplify the results (hence the "s").  For example:
+ * The functions lapply and sapply are similar, their first argument can be a list, data frame, matrix or vector, the second argument the function to "apply". The former return a list (hence "l") and the latter tries to simplify the results (hence the "s").  For example:
-Zeile 18:
+Zeile 20:
-   * apply() this function can be applied to an array. Its argument is the array, the second the dimension/s where we want to apply a function and the third is the function. For example
+ * apply() this function can be applied to an array. Its argument is the array, the second the dimension/s where we want to apply a function and the third is the function. For example
-Zeile 22:
+Zeile 25:
 > apply(x,3,quantile) ## calculate the quantiles
-Zeile 25:
+Zeile 28:
-   * The function tapply() allows you to create tables (hence the "t") of the value of a function on subgroups defined by its second argument, which can be a factor or a list of factors.
+ * The function tapply() allows you to create tables (hence the "t") of the value of a function on subgroups defined by its second argument, which can be a factor or a list of factors.
-Zeile 27:
+Zeile 31:
-Zeile 33:
+Zeile 38:
-==  ==
*  the class() function shows the class of an object use it in combination with lapply() to get the  classes of the columns of the quine data frame
*  do the same with sapply()  what is the difference
* try to combine this with what you learned about indexing and create a new data frame quine2 only containing the columns which are factors
*  calculate the row and column means of the below defined matrix m using the apply function PS: in real life application use the rowMeans() and colMeans() function 
{{{#!highlight r
m <- matrix(rnorm(100),nrow=10)  
}}}
*  use tapply() to summarise the number of missing days at school per Ethnicity and/or per Sex (three lines)
*  sometimes the aggregate() function is more convenient; note the use of {{{#!latex \sim$; it is read as 'is dependent on'and it is extensively used in modelling
+ *  the class() function shows the class of an object use it in combination with lapply() to get the  classes of the columns of the quine data frame *  do the same with sapply()  what is the difference * try to combine this with what you learned about indexing and create a new data frame quine2 only containing the columns which are factors *  calculate the row and column means of the below defined matrix m using the apply function PS: in real life application use the rowMeans() and colMeans() function

{{{#!highlight r
m <- matrix(rnorm(100),nrow=10)
}}}
 *  use tapply() to summarise the number of missing days at school per Ethnicity and/or per Sex (three lines) *  sometimes the aggregate() function is more convenient; note the use of {{{ #!latex $\sim$;}}} it is read as 'is dependent on'and it is extensively used in modelling
-Zeile 55:
+Zeile 57:
 M   N      0.00         3.50        8.00     14.71        19.50     69.00
-Zeile 59:
+Zeile 61:
-   * a body (the code inside the function) - body()
   * arguments (the list of arguments which controls how you can call the function) - formals()
   * an environment (the “map” of the location of the function’s variables) - environment()
+ * a body (the code inside the function) - body()
 * arguments (the list of arguments which controls how you can call the function) - formals()
 * an environment (the “map” of the location of the function’s variables) - environment()
-Zeile 63:
+Zeile 67:
-Zeile 66:
+Zeile 71:
 function (x, y = NULL, correct = TRUE, p = rep(1/length(x), length(x)),
-Zeile 68:
+Zeile 73:
-if (is.data.frame(x)) 
expected = E, residuals = (x - E)/sqrt(E), stdres = (x -
+if (is.data.frame(x))
expected = E, residuals = (x - E)/sqrt(E), stdres = (x -
-Zeile 74:
+Zeile 79:
-Arguments are matched 
   * first by exact name (perfect matching)
   * then by prefix matching
   * and finally by position.
+Arguments are matched

 * first by exact name (perfect matching)
 * then by prefix matching
 * and finally by position.
-Zeile 79:
+Zeile 86:
-Zeile 86:
+Zeile 94:
 >
-Zeile 89:
+Zeile 97:
-* Write a function to compute the average distance from the mean for some data vector.
* Write a function f() which finds the average of the x values after squaring and substracts the square of the average of the numbers. Verify this output will always be non-negative by computing \texttt{f(1:10)}
* An integer is even if the remainder upon dividing it by 2 is 0. This remainder is given by R with the syntax \texttt{ x \%\% 2}. Use this to write a function iseven(). How would you write isodd()?
* Write a function isprime() that checks if a number x is prime by dividing x by all values \texttt{$2,\ldots,x-1}}}} then checking to see if there is a remainder of 0.
+* Write a function to compute the average distance from the mean for some data vector. * Write a function f() which finds the average of the x values after squaring and substracts the square of the average of the numbers. Verify this output will always be non-negative by computing \texttt{f(1:10)} * An integer is even if the remainder upon dividing it by 2 is 0. This remainder is given by R with the syntax \texttt{ x \%\% 2}. Use this to write a function iseven(). How would you write isodd()? * Write a function isprime() that checks if a number x is prime by dividing x by all values \texttt{$2,\ldots,x-1}}}} then checking to see if there is a remainder of 0.
-Zeile 94:
+Zeile 100:
-   * Write a function to compute the average distance from the mean for some data vector.
+ * Write a function to compute the average distance from the mean for some data vector.
-Zeile 99:
+Zeile 106:
 + }
-Zeile 102:
+Zeile 109:
-   * Write a function f() which finds the average of the x values aufter squaring and substracts the square of the average of the numbers. Verify this output will always be non-negative by computing \texttt{f(1:10)}
+ * Write a function f() which finds the average of the x values aufter squaring and substracts the square of the average of the numbers. Verify this output will always be non-negative by computing \texttt{f(1:10)}
-Zeile 108:
+Zeile 116:
 [1] 8.25
-Zeile 111:
+Zeile 119:
-   * An integer is even if the remainder upon dividing it by 2 is 0. This remainder is given by R with the syntax \texttt{ x \%\% 2}. Use this to write a function iseven(). How would you write isodd()?
+ * An integer is even if the remainder upon dividing it by 2 is 0. This remainder is given by R with the syntax \texttt{ x \%\% 2}. Use this to write a function iseven(). How would you write isodd()?
-Zeile 125:
+Zeile 134:
-   * Write a function isprime() that checks if a number x is prime by dividing x by all values \texttt{$2,\ldots,x-1}}}} then checking to see if there is a remainder of 0.
+ * Write a function isprime() that checks if a number x is prime by dividing x by all values \texttt{$2,\ldots,x-1}}}} then checking to see if there is a remainder of 0.
-Zeile 136:
+Zeile 146:
 [1] FALSE
-Zeile 148:
+Zeile 158:
 > tmp <- tmp[!is.na(tmp$Subject),]
-Zeile 152:
+Zeile 162:
-Zeile 171:
+Zeile 182:
-> first.pic <- min(which(tmp$Event.Type=="Picture" & 
+                           !is.na(tmp$Event.Type) )) - 1
+> first.pic <- min(which(tmp$Event.Type=="Picture" &
+                           !is.na(tmp$Event.Type) )) - 1
-Zeile 174:
+Zeile 185:
 > last.pic <- min(which(tmp$Event.Type=="Picture" &
-Zeile 176:
+Zeile 187:
 +                           tmp$Code=="Fertig!" &
-Zeile 210:
+Zeile 221:
-Zeile 258:
+Zeile 270:
-   * it would be a tedious work to every step for all of the files
   * if we look through the steps the only important thing that we have to change is the file name
   * so we rather use a canned version of our procedure dependend the file name and the number of lines to skip  - we create a function read.file(file):
+ * it would be a tedious work to every step for all of the files
 * if we look through the steps the only important thing that we have to change is the file name
 * so we rather use a canned version of our procedure dependend the file name and the number of lines to skip  - we create a function read.file(file):
-Zeile 266:
+Zeile 279:
 tmp <- tmp[!is.na(tmp$Subject),]
-Zeile 299:
+Zeile 312:
-   * we can use this function now to read in the file 
   * and get the processed data frame in one step
   * setting the parameter skip we can read both versions of the file (and should get the same result)
+ * we can use this function now to read in the file
 * and get the processed data frame in one step
 * setting the parameter skip we can read both versions of the file (and should get the same result)
-Zeile 303:
+Zeile 317:
-* run the function using source()
* use the function to read in \texttt{../session1/session1data/pre001.txt} and \texttt{data/pretest/pre\_001.txt}
* use some summary functions like table() or summary to check if they contain the same information
We will learn about a function to compare data frames more exact soon.
+* run the function using source() * use the function to read in \texttt{../session1/session1data/pre001.txt} and \texttt{data/pretest/pre\_001.txt} * use some summary functions like table() or summary to check if they contain the same information We will learn about a function to compare data frames more exact soon.
-Zeile 317:
+Zeile 329:
-   * rbind() can be used to combine two dataframes (or matrices) in the sense of adding rows, the column names and types must be the same for the two objects
+ * rbind() can be used to combine two dataframes (or matrices) in the sense of adding rows, the column names and types must be the same for the two objects
-Zeile 331:
+Zeile 344:
-   * cbind() can be used to combine two dataframes (or matrices) in the sense of adding columns, the number of rows must be the same for the two objects
+ * cbind() can be used to combine two dataframes (or matrices) in the sense of adding columns, the number of rows must be the same for the two objects
-Zeile 341:
+Zeile 355:
-   * it is not recommended to use cbind() to combining data frames
+ * it is not recommended to use cbind() to combining data frames
-Zeile 343:
+Zeile 358:
-   * merge() is the command of choice for merging or joining data frames
   * it is the equivalent of join in sql
   * there are four cases
      * inner join
      * left outer join
      * right outer join
      * full outer join
+ * merge() is the command of choice for merging or joining data frames
 * it is the equivalent of join in sql
 * there are four cases
  * inner join
  * left outer join
  * right outer join
  * full outer join
-Zeile 364:
+Zeile 380:
-   * inner join means: keep only the cases present in both of the data frames
+ * inner join means: keep only the cases present in both of the data frames
-Zeile 372:
+Zeile 389:
-   * left outer join means: keep all cases of the left data frame no matter if they are present in the right data frame (all.x=T)
+ * left outer join means: keep all cases of the left data frame no matter if they are present in the right data frame (all.x=T)
-Zeile 381:
+Zeile 399:
-   * right outer join means: keep all cases of the right data frame no matter if they are present in the left data frame (all.y=T)
+ * right outer join means: keep all cases of the right data frame no matter if they are present in the left data frame (all.y=T)
-Zeile 391:
+Zeile 410:
-   * full outer join means: keep all cases of both data frames (all=T)
+ * full outer join means: keep all cases of both data frames (all=T)
-Zeile 402:
+Zeile 422:
-   * if not stated otherwise R uses the intersect of the names of both data frames, in our case only \textit{id}
   * you can specify these columns directly by \texttt{by=c("colname1","colname2")} if the columns are named identical or
   * using\\ \texttt{by.x=c("colname1.x","colname2.x"),
+ * if not stated otherwise R uses the intersect of the names of both data frames, in our case only \textit{id}
 * you can specify these columns directly by \texttt{by=c("colname1","colname2")} if the columns are named identical or
 * using\\ \texttt{by.x=c("colname1.x","colname2.x"),
-Zeile 406:
+Zeile 427:
-   * now read in the file personendaten.txt using the appropriate command
   * join the demographics with our pre1 data frame (even though it does not make sense now)
+ * now read in the file personendaten.txt using the appropriate command
 * join the demographics with our pre1 data frame (even though it does not make sense now)
-Zeile 431:
+Zeile 453:
-   * is a higher order function (functional)
   * Reduce() uses a binary function (like rbind() or merge()) to combine successively the elements of a given list
   * it can be used if you have not only two but many data frames
+ * is a higher order function (functional)
 * Reduce() uses a binary function (like rbind() or merge()) to combine successively the elements of a given list
 * it can be used if you have not only two but many data frames
-Zeile 435:
+Zeile 458:
-   * first we make up 4 artifical data frames
+ * first we make up 4 artifical data frames
-Zeile 464:
+Zeile 488:
-   * now we use Reduce() in combination with merge()
+ * now we use Reduce() in combination with merge()
-Zeile 469:
+Zeile 494:
-   * and what we get is an empty data frame
   * well this isn't exactly what we wanted, so why?
   * it is because the default behavior of merge() is set all=F, so we get only complete lines which is in this case - none
   * so we have to define a wrapper function which only change this argument to all=T
+ * and what we get is an empty data frame
 * well this isn't exactly what we wanted, so why?
 * it is because the default behavior of merge() is set all=F, so we get only complete lines which is in this case - none
 * so we have to define a wrapper function which only change this argument to all=T
-Zeile 474:
+Zeile 500:
-   * now we use Reduce() in combination with merge()
+ * now we use Reduce() in combination with merge()
-Zeile 486:
+Zeile 513:
-   * which is exactly what we want
+ * which is exactly what we want
-Zeile 488:
+Zeile 516:
-   * a second example in combination with rbind()
+ * a second example in combination with rbind()
-Zeile 502:
+Zeile 531:
-   * which is exactly what we want
+ * which is exactly what we want
-Zeile 504:
+Zeile 534:
-   * well that's better, but it is still boring to do this for every single file
   * so see what we have learned: the combination of lapply() and Reduce() can do the work 
   * using dir{} we get all the files contained in a given directory
   * then we use lapply() together with our new function read.file()
+ * well that's better, but it is still boring to do this for every single file
 * so see what we have learned: the combination of lapply() and Reduce() can do the work
 * using dir{} we get all the files contained in a given directory
 * then we use lapply() together with our new function read.file()
-Zeile 509:
+Zeile 540:
-   * dir() without additional argument shows all files/directories in the working directory
+ * dir() without additional argument shows all files/directories in the working directory
-Zeile 512:
+Zeile 544:
-[1] "data"                  "function.r"            "function.r~"          
[4] "ggp1.pdf"              "graphics.r"            "linkimage.aux"        
[7] "session2apply.aux"     "session2apply.log"     "session2apply.nav"    
[10] "session2apply.out"     "session2apply.pdf"     "session2apply.snm"    
[13] "session2apply.tex"     "session2apply.tex~"    "#session2apply.tex#"  
[16] "session2apply.toc"     "session2apply.vrb"     "session2hadley.aux"   
[19] "session2hadley.log"    "session2hadley.nav"    "session2hadley.out"   
[22] "session2hadley.pdf"    "session2hadley.snm"    "session2hadley.tex"   
[25] "session2hadley.tex~"   "session2hadley.toc"    "session2hadley.vrb"   
[28] "solutionssession1.r"   "solutionssession1.r~"  "solutionssession2.r"
+[1] "data"                  "function.r"            "function.r~"
[4] "ggp1.pdf"              "graphics.r"            "linkimage.aux"
[7] "session2apply.aux"     "session2apply.log"     "session2apply.nav"
[10] "session2apply.out"     "session2apply.pdf"     "session2apply.snm"
[13] "session2apply.tex"     "session2apply.tex~"    "#session2apply.tex#"
[16] "session2apply.toc"     "session2apply.vrb"     "session2hadley.aux"
[19] "session2hadley.log"    "session2hadley.nav"    "session2hadley.out"
[22] "session2hadley.pdf"    "session2hadley.snm"    "session2hadley.tex"
[25] "session2hadley.tex~"   "session2hadley.toc"    "session2hadley.vrb"
[28] "solutionssession1.r"   "solutionssession1.r~"  "solutionssession2.r"
-Zeile 525:
+Zeile 557:
-   * given a path dir() will show the content of resp folder
+ * given a path dir() will show the content of resp folder
-Zeile 532:
+Zeile 565:
-   * setting recursive to TRUE R will recurse into directories recursively through
+ * setting recursive to TRUE R will recurse into directories recursively through
-Zeile 535:
+Zeile 569:
-[1] "posttest/post_001.txt"      "posttest/post_002.txt"     
[3] "posttest/post_003.txt"      "posttest/post_004.txt"     
[5] "posttest/post_005.txt"      "posttest/post_006.txt"     
[7] "posttest/post_007.txt"      "posttest/post_008.txt"
+[1] "posttest/post_001.txt"      "posttest/post_002.txt"
[3] "posttest/post_003.txt"      "posttest/post_004.txt"
[5] "posttest/post_005.txt"      "posttest/post_006.txt"
[7] "posttest/post_007.txt"      "posttest/post_008.txt"
-Zeile 541:
+Zeile 575:
-   * setting full.names to TRUE R will give the full path
+ * setting full.names to TRUE R will give the full path
-Zeile 544:
+Zeile 579:
-[1] "data/posttest/post_001.txt"      "data/posttest/post_002.txt"     
[3] "data/posttest/post_003.txt"      "data/posttest/post_004.txt"     
[5] "data/posttest/post_005.txt"      "data/posttest/post_006.txt"
+[1] "data/posttest/post_001.txt"      "data/posttest/post_002.txt"
[3] "data/posttest/post_003.txt"      "data/posttest/post_004.txt"
[5] "data/posttest/post_005.txt"      "data/posttest/post_006.txt"
-Zeile 549:
+Zeile 584:
-   * with pattern we can specify which files to show (regexpr), e.g. all r files
+ * with pattern we can specify which files to show (regexpr), e.g. all r files
-Zeile 556:
+Zeile 592:
-   * create a variable files containing the names of all text files in the data directory, my editor creates temporary files beginning and ending by a hash key, make sure they are not contained in the list
+ * create a variable files containing the names of all text files in the data directory, my editor creates temporary files beginning and ending by a hash key, make sure they are not contained in the list
-Zeile 561:
+Zeile 598:
-[1] "data/posttest/post_001.txt"    "data/posttest/post_002.txt"   
[3] "data/posttest/post_003.txt"    "data/posttest/post_004.txt"   
[5] "data/posttest/post_005.txt"    "data/posttest/post_006.txt"
+[1] "data/posttest/post_001.txt"    "data/posttest/post_002.txt"
[3] "data/posttest/post_003.txt"    "data/posttest/post_004.txt"
[5] "data/posttest/post_005.txt"    "data/posttest/post_006.txt"
-Zeile 568:
+Zeile 605:
-Zeile 577:
+Zeile 615:
-   * the object df.list is a list containing 192 data frames
+ * the object df.list is a list containing 192 data frames
-Zeile 597:
+Zeile 636:
-   * in a last step we use Reduce{} to combine these 192 data frames
+ * in a last step we use Reduce{} to combine these 192 data frames
-Zeile 603:
+Zeile 643:
-_test2 002_test2 003_test2 004_test2 005_test2 006_test2 007_test2 008_test2 
93        91        96        93        95        95        93        96 
009_test2 010_test2 011_test2 012_test2 013_test2 014_test2 015_test2 016_test2 
92        94        95        96        96        95        96        94 
017_test2 018_test2 019_test2 020_test2 001_test1 002_test1 003_test1 004_test1 
95        94        96        95        95        95        96        94 
005_test1 006_test1 007_test1 008_test1 009_test1 010_test1 011_test1 012_test1 
96        95        94        90        96        95        91        96 
013_test1 014_test1 015_test1 016_test1 017_test1 018_test1 019_test1 020_test1  95        96        95        91        96        96        96        96 
001_1     002_1     003_1     004_1     005_1     006_1    CHGU_1    008_1a 
60        59        60        54        60        59        60        60 
009_1     010_1     RMK_1     013_1     014_1     015_1     016_1    IJ2K_1 
60        60        59        59        60        58        59        58 
018_1     019_1     020_1     001_2     002_2     003_2     004_2     005_2 
60        59        60        59        59        57        58        57 
006_2     007_2     008_2     009_2     010_2     011_2     012_2     013_2 
58        58        54        58        58        59        59        56 
014_2     015_2     016_2     017_2     018_2     019_2     020_2     001_3
+_test2 002_test2 003_test2 004_test2 005_test2 006_test2 007_test2 008_test2
93        91        96        93        95        95        93        96
009_test2 010_test2 011_test2 012_test2 013_test2 014_test2 015_test2 016_test2
92        94        95        96        96        95        96        94
017_test2 018_test2 019_test2 020_test2 001_test1 002_test1 003_test1 004_test1
95        94        96        95        95        95        96        94
005_test1 006_test1 007_test1 008_test1 009_test1 010_test1 011_test1 012_test1
96        95        94        90        96        95        91        96
013_test1 014_test1 015_test1 016_test1 017_test1 018_test1 019_test1 020_test1
95        96        95        91        96        96        96        96
001_1     002_1     003_1     004_1     005_1     006_1    CHGU_1    008_1a
60        59        60        54        60        59        60        60
009_1     010_1     RMK_1     013_1     014_1     015_1     016_1    IJ2K_1
60        60        59        59        60        58        59        58
018_1     019_1     020_1     001_2     002_2     003_2     004_2     005_2
60        59        60        59        59        57        58        57
006_2     007_2     008_2     009_2     010_2     011_2     012_2     013_2
58        58        54        58        58        59        59        56
014_2     015_2     016_2     017_2     018_2     019_2     020_2     001_3
-Zeile 624:
+Zeile 664:
-   * so it is recommended to build again a function out of this
+ * so it is recommended to build again a function out of this
-Zeile 640:
+Zeile 681:
-   * table the Subject column again. What is the problem?
+ * table the Subject column again. What is the problem?
-Zeile 644:
+Zeile 686:
-_test2 002_test2 003_test2 004_test2 005_test2 006_test2 007_test2 008_test2 
93        91        96        93        95        95        93        96 
009_test2 010_test2 011_test2 012_test2 013_test2 014_test2 015_test2 016_test2  92        94        95        96        96        95        96        94 
017_test2 018_test2 019_test2 020_test2 001_test1 002_test1 003_test1 004_test1  95        94        96        95        95        95        96        94 
}}}
   * subject and time coded in one variable
+_test2 002_test2 003_test2 004_test2 005_test2 006_test2 007_test2 008_test2
93        91        96        93        95        95        93        96
009_test2 010_test2 011_test2 012_test2 013_test2 014_test2 015_test2 016_test2
92        94        95        96        96        95        96        94
017_test2 018_test2 019_test2 020_test2 001_test1 002_test1 003_test1 004_test1
95        94        96        95        95        95        96        94
}}}
 * subject and time coded in one variable
-Zeile 653:
+Zeile 696:
-   * we create two new variables using the str\_split() function (stringr package)
   * becaus str\_split() has a list containing a vector as result we have to use it in combination with sapply()
   * then correct some of the person ids
+ * we create two new variables using the str\_split() function (stringr package)
 * becaus str\_split() has a list containing a vector as result we have to use it in combination with sapply()
 * then correct some of the person ids
-Zeile 664:
+Zeile 708:
-   * there are some more wrong person ids: RMK - 011, IJ2K - 017, GA3K - 004, Kj6K - 006. Correct them!
+ * there are some more wrong person ids: RMK - 011, IJ2K - 017, GA3K - 004, Kj6K - 006. Correct them!
-Zeile 673:
+Zeile 718:
-* now read in the file subjectsdemographics.txt using the appropriate command
* join the demographics with our data data frame (there is a little problem left - compare the persid and Subject columns)
+* now read in the file subjectsdemographics.txt using the appropriate command * join the demographics with our data data frame (there is a little problem left - compare the persid and Subject columns)
-Zeile 702:
+Zeile 747:
-Zeile 708:
+Zeile 754:
-Zeile 709:
+Zeile 756:
-   * so there are problems in coding of the test id
   * we remove the letters at the end using str\_replace()
+ * so there are problems in coding of the test id
 * we remove the letters at the end using str\_replace()
-Zeile 736:
+Zeile 784:
-Zeile 738:
+Zeile 787:

Quick Links

Search Wiki

Page Tools

Introduction

Implicit Loops

lapply()

apply()

tapply()

Functions

Functions

Function Arguments

Function Exercises (Verzani)

Function Exercises (Verzani)

Function Exercises (Verzani)

Function Exercises (Verzani)

Function Exercises (Verzani)

Read in the file

Remove empty line

Remove spaces

Find/Remove breaks

Find/Remove first/last rows

Extract Responses

Extract Responses

Moving Information

Moving Information

Keep response lines

The Function

The Function (continued)

The Function (continued)

The Function (continued)

The Function (continued)

The Function (continued)

The Function (continued)

The Function Exercise

The Function (continued)

rbind()

cbind()

merge()

inner join

left outer join

right outer join

full outer join

merge()

merge() Exercise

merge() Exercise

Reduce()

Reduce()

Reduce()

Reduce()

Reduce()

Reduce()

A second function

dir()

dir()

dir()

dir()

dir()

dir() Exercise

dir() Exercise

Read all files

Reading all files

The Function 2

The Function no 2

The Subject column

The Subject column

The Subject column

The Subject column Exercises

The Subject column Exercises

Merging

The Subject column Exercises

Summary Graphics

Summary Graphics

Summary Graphics

Summary Graphics