= Combining Data Frames =

== rbind() ==
 * rbind() combines two data frames (or matrices) by adding rows; for data frames, both objects must have the same column names (columns are matched by name, not by position) and compatible column types; for matrices, they must have the same number of columns

{{{#!highlight r
> x <- data.frame(id=1:3,score=rnorm(3))
> y <- data.frame(id=13:15,score=rnorm(3))
> rbind(x,y)
  id       score
1  1  0.71121163
2  2 -0.62973249
3  3  1.17737595
4 13 -0.45074940
5 14 -0.01044197
6 15 -1.05217176
}}}

== cbind() ==
 * cbind() combines two data frames (or matrices) by adding columns; the two objects must have the same number of rows
 * rows are paired purely by their position: no key column is checked, and column names that occur in both objects (here ''id'' and ''score'') simply appear twice in the result

{{{#!highlight r
> xy <- cbind(x,y)
> dim(xy)
[1] 3 4
> names(xy)
[1] "id"    "score" "id"    "score"
}}}

 * it is therefore not recommended to use cbind() to combine data frames that belong together via a key column; use merge() instead

== merge() ==
 * merge() is the function of choice for merging (joining) data frames
 * it is the equivalent of a JOIN in SQL
 * there are four cases:
  * inner join
  * left outer join
  * right outer join
  * full outer join

{{{#!highlight r
> (d1 <- data.frame(id=LETTERS[c(1,2,3)],day1=sample(10,3)))
  id day1
1  A    3
2  B    4
3  C    5
> (d2 <- data.frame(id=LETTERS[c(1,3,5,6)],day2=sample(10,4)))
  id day2
1  A    7
2  C   10
3  E    3
4  F    6
}}}

=== inner join ===
 * inner join means: keep only the cases present in both data frames

{{{#!highlight r
> merge(d1,d2)
  id day1 day2
1  A    3    7
2  C    5   10
}}}

=== left outer join ===
 * left outer join means: keep all cases of the left data frame, whether or not they are present in the right data frame (`all.x = TRUE`); missing values are filled with NA

{{{#!highlight r
> merge(d1,d2,all.x = TRUE)
  id day1 day2
1  A    3    7
2  B    4   NA
3  C    5   10
}}}

=== right outer join ===
 * right outer join means: keep all cases of the right data frame, whether or not they are present in the left data frame (`all.y = TRUE`)

{{{#!highlight r
> merge(d1,d2,all.y = TRUE)
  id day1 day2
1  A    3    7
2  C    5   10
3  E   NA    3
4  F   NA    6
}}}

=== full outer join ===
 * full outer join means: keep all cases of both data frames (`all = TRUE`)

{{{#!highlight r
> merge(d1,d2,all = TRUE)
  id day1 day2
1  A    3    7
2  B    4   NA
3  C    5   10
4  E   NA    3
5  F   NA    6
}}}

 * unless stated otherwise, merge() joins on all columns whose names occur in both data frames (`intersect(names(d1), names(d2))`), in our case only ''id''
 * you can specify the key columns explicitly with `by=c("colname1","colname2")` if they are named identically in both data frames, or
 * with `by.x=c("colname1.x","colname2.x")` and `by.y=c("colname1.y","colname2.y")` if their names differ

=== merge() Exercise ===
 * now read in the file personendaten.txt using the appropriate command
 * join the demographics with our pre1 data frame (even though it does not make sense at this point)

=== merge() Solution ===
{{{#!highlight r
> persdat <- read.table("../session1/session1data/personendaten.txt",
+                       sep="\t",
+                       header=TRUE)
> pre1 <- merge(persdat,pre1,all.y = TRUE)
> head(pre1)
  Subject Sex Age_PRETEST Trial Event.Type Code   Time TTime Uncertainty
1  PRE001   f        3.11     7   Response    2 178963 10009           1
2  PRE001   f        3.11    12   Response    1 238680  8342           1
3  PRE001   f        3.11    17   Response    2 297789  8066           1
4  PRE001   f        3.11    22   Response    1 351321 10811           1
5  PRE001   f        3.11    27   Response    2 403607   713           1
6  PRE001   f        3.11    32   Response    1 467793 23709           1
  Duration Uncertainty.1 ReqTime ReqDur Stim.Type Pair.Index    Type Event.Code
1    10197             2       0   next incorrect          7 Picture   RO09.jpg
2     8398             2       0   next incorrect         12 Picture   RO20.jpg
3     8198             2       0   next       hit         17 Picture   RS28.jpg
4    10997             2       0   next       hit         22 Picture   AT26.jpg
5      800             2       0   next       hit         27 Picture   RS23.jpg
6    23794             2       0   next       hit         32 Picture   OF04.jpg
}}}
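To make the difference between rbind() and cbind() concrete, here is a small self-contained sketch. It uses fixed, made-up values instead of rnorm() so the result is reproducible, and the object names x, y and z are illustrative only. It shows that rbind() on data frames matches columns by name, while cbind() pairs rows purely by position:

{{{#!highlight r
# Fixed example values (illustrative only) instead of rnorm(), so the result is reproducible
x <- data.frame(id = 1:3, score = c(0.5, 1.5, 2.5))
# y has the same columns as x, but in a different order
y <- data.frame(score = c(10, 20), id = c(13L, 14L))

# rbind() on data frames matches columns by name, so y's columns are
# lined up with x's before the rows are stacked
stacked <- rbind(x, y)
print(stacked)

# cbind() places columns side by side and pairs rows only by position:
# no key is checked, and the duplicated name "id" is kept as is
z <- data.frame(id = 3:1, grade = c("c", "b", "a"), stringsAsFactors = FALSE)
side_by_side <- cbind(x, z)
print(side_by_side)
# Row 1 now pairs x's id 1 with z's id 3, a silent mismatch
}}}

This silent pairing is the reason to prefer merge() whenever the rows of two data frames belong together through a key column.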
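The four join types can also be checked without random numbers. The sketch below rebuilds d1 and d2 with the values printed in the merge() section (fixed instead of drawn with sample()) and counts how many rows each variant keeps; it is an illustration, not part of the original exercise:

{{{#!highlight r
# d1 and d2 with the values printed above, fixed instead of drawn with sample()
d1 <- data.frame(id = c("A", "B", "C"), day1 = c(3L, 4L, 5L),
                 stringsAsFactors = FALSE)
d2 <- data.frame(id = c("A", "C", "E", "F"), day2 = c(7L, 10L, 3L, 6L),
                 stringsAsFactors = FALSE)

# By default merge() joins on the column names both data frames share
print(intersect(names(d1), names(d2)))   # "id"

inner <- merge(d1, d2)                # ids present in both data frames
left  <- merge(d1, d2, all.x = TRUE)  # every id of d1, NA where d2 has no match
right <- merge(d1, d2, all.y = TRUE)  # every id of d2, NA where d1 has no match
full  <- merge(d1, d2, all = TRUE)    # every id of either data frame

# Number of rows kept by each join type
print(sapply(list(inner = inner, left = left, right = right, full = full), nrow))
}}}

Because merge() sorts the result on the key by default (`sort = TRUE`), the rows come out ordered by ''id'' in every variant.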
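When the key column has different names in the two data frames, by.x and by.y name it on each side. The sketch below uses two hypothetical data frames, scores and demo, with made-up values; the key is called id in one and Subject in the other. In the result, merge() keeps the key under the name used in the first (x) data frame:

{{{#!highlight r
# Hypothetical data: the subject key is "id" in scores but "Subject" in demo
scores <- data.frame(id = c("PRE001", "PRE002"), score = c(12, 15),
                     stringsAsFactors = FALSE)
demo <- data.frame(Subject = c("PRE001", "PRE002", "PRE003"),
                   Sex = c("f", "m", "f"),
                   stringsAsFactors = FALSE)

# by.x / by.y name the key column in each data frame;
# all.y = TRUE keeps PRE003 even though it has no score (right outer join)
joined <- merge(scores, demo, by.x = "id", by.y = "Subject", all.y = TRUE)
print(joined)
}}}

If both data frames used the same key name, `by = "Subject"` alone would be enough. Naming the key explicitly is also a safeguard when the two data frames happen to share other column names that should not be used for matching.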