location: Changes to "RstatisTik/RstatisTikPortal/RcourSe/FinalFunction/CombDataFrames"
Differences between revisions 1 and 2
Revision 1 as of 2015-03-15 11:04:41
Size: 4284
Comment:
Revision 2 as of 2015-03-15 11:06:03
Size: 4283
Comment:
Line 52: Line 52:
== inner join == === inner join ===
Line 60: Line 60:
== left outer join == === left outer join ===
Line 69: Line 69:
== right outer join == === right outer join ===
Line 79: Line 79:
== full outer join == === full outer join ===
Line 90: Line 90:
== merge() ==
Line 94: Line 94:
== merge() Exercise == === merge() Exercise ===
Line 97: Line 97:
== merge() Solution == === merge() Solution ===

Combining Data Frames

rbind()

  • rbind() combines two data frames (or matrices) by adding rows; the column names and types must be the same for the two objects

    > x <- data.frame(id=1:3,score=rnorm(3))
    > y <- data.frame(id=13:15,score=rnorm(3))
    > rbind(x,y)
      id       score
    1  1  0.71121163
    2  2 -0.62973249
    3  3  1.17737595
    4 13 -0.45074940
    5 14 -0.01044197
    6 15 -1.05217176
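The requirement on column names can be checked directly. The following is a minimal sketch with made-up values (the data frames a, b and bad are illustrative and not part of the course data). For data frames, rbind() matches columns by name, so a different column order is accepted, while a differing column name raises an error.

```r
# Illustrative data with fixed values instead of rnorm(), so the result is reproducible
a <- data.frame(id = 1:3, score = c(0.5, 1.2, -0.3))
b <- data.frame(score = c(2.0, 0.1), id = 13:14)   # same names, different order

ab <- rbind(a, b)   # columns are matched by name, not by position
print(ab)

# A data frame with a different column name cannot be appended
bad <- data.frame(id = 16L, points = 1.5)
res <- tryCatch(rbind(a, bad), error = function(e) conditionMessage(e))
print(res)          # the error message raised by rbind()
```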

cbind()

  • cbind() combines two data frames (or matrices) by adding columns; the number of rows must be the same for the two objects (in the output below, x and y are two five-row data frames, not the objects from the rbind() example)

    > cbind(x,y)
      id      score1      score2     score3
    1  1  0.11440705  0.14536778 -1.1773241
    2  2 -1.62862651  0.02020604  0.5686415
    3  3  0.05335811  0.25462270  0.8844987
    4  4 -0.19931734  0.15625511  0.9287316
    5  5 -1.15217836 -1.79804503 -0.7550234
  • using cbind() to combine data frames is not recommended, because rows are paired by position only and no key column is checked
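One reason for this caution can be illustrated with a small sketch. The data frames a and b below are made up for the illustration; they hold the same ids in a different row order. cbind() glues the rows together by position, while merge() pairs them by id.

```r
# Illustrative data: same ids, different row order
a <- data.frame(id = c("A", "B", "C"), day1 = c(3, 4, 5),
                stringsAsFactors = FALSE)
b <- data.frame(id = c("C", "A", "B"), day2 = c(10, 7, 1),
                stringsAsFactors = FALSE)

cb <- cbind(a, b)   # pairs row 1 with row 1, row 2 with row 2, ...
print(cb)           # two id columns that disagree within every row

m <- merge(a, b)    # pairs rows that share the same id
print(m)
```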

merge()

  • merge() is the command of choice for merging or joining data frames
  • it is the equivalent of JOIN in SQL
  • there are four cases
    • inner join
    • left outer join
    • right outer join
    • full outer join

    > (d1 <- data.frame(id=LETTERS[c(1,2,3)],day1=sample(10,3)))
      id day1
    1  A    3
    2  B    4
    3  C    5
    > (d2 <- data.frame(id=LETTERS[c(1,3,5,6)],day2=sample(10,4)))
      id day2
    1  A    7
    2  C   10
    3  E    3
    4  F    6

inner join

  • inner join means: keep only the cases present in both of the data frames

    > merge(d1,d2)
      id day1 day2
    1  A    3    7
    2  C    5   10

left outer join

  • left outer join means: keep all cases of the left data frame, whether or not they are present in the right data frame (all.x=TRUE); values missing from the right data frame are filled with NA

    > merge(d1,d2,all.x = TRUE)
      id day1 day2
    1  A    3    7
    2  B    4   NA
    3  C    5   10
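A left outer join is also a convenient way to find the cases without a partner in the right data frame. The sketch below rebuilds d1 and d2 with the values printed above (fixed instead of drawn with sample()) and assumes that day2 contains no missing values of its own, so that every NA in the result marks an unmatched row.

```r
# d1 and d2 with the values shown in the output above
d1 <- data.frame(id = c("A", "B", "C"), day1 = c(3, 4, 5),
                 stringsAsFactors = FALSE)
d2 <- data.frame(id = c("A", "C", "E", "F"), day2 = c(7, 10, 3, 6),
                 stringsAsFactors = FALSE)

left <- merge(d1, d2, all.x = TRUE)

# Rows of d1 that found no match in d2 carry NA in the d2 column
unmatched <- left[is.na(left$day2), ]
print(unmatched)
```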

right outer join

  • right outer join means: keep all cases of the right data frame, whether or not they are present in the left data frame (all.y=TRUE)

    > merge(d1,d2,all.y = TRUE)
      id day1 day2
    1  A    3    7
    2  C    5   10
    3  E   NA    3
    4  F   NA    6

full outer join

  • full outer join means: keep all cases of both data frames (all=TRUE)

    > merge(d1,d2,all = TRUE)
      id day1 day2
    1  A    3    7
    2  B    4   NA
    3  C    5   10
    4  E   NA    3
    5  F   NA    6
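The four joins can be compared side by side. The following sketch rebuilds d1 and d2 with the values printed above and checks which ids each join keeps against the corresponding set operation. The correspondence is this simple only because every id occurs at most once in each data frame; the vector checks is a name introduced here for the illustration.

```r
# d1 and d2 with the values shown in the output above
d1 <- data.frame(id = c("A", "B", "C"), day1 = c(3, 4, 5),
                 stringsAsFactors = FALSE)
d2 <- data.frame(id = c("A", "C", "E", "F"), day2 = c(7, 10, 3, 6),
                 stringsAsFactors = FALSE)

inner <- merge(d1, d2)                 # ids in both data frames
left  <- merge(d1, d2, all.x = TRUE)   # all ids of d1
right <- merge(d1, d2, all.y = TRUE)   # all ids of d2
full  <- merge(d1, d2, all = TRUE)     # ids of either data frame

# Compare the ids kept by each join with the matching set operation
checks <- c(
  inner = setequal(inner$id, intersect(d1$id, d2$id)),
  left  = setequal(left$id,  d1$id),
  right = setequal(right$id, d2$id),
  full  = setequal(full$id,  union(d1$id, d2$id))
)
print(checks)
```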
  • if not stated otherwise, R joins on the intersection of the column names of both data frames, in our case only id
  • you can specify these columns directly with by=c("colname1","colname2") if the columns are named identically, or
  • with by.x=c("colname1.x","colname2.x") together with the corresponding by.y argument if the names differ
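A short sketch of the case where the key columns are named differently. The data frame d3 and its column subject are hypothetical and only serve the illustration; by.x names the key in the first data frame and by.y the key in the second.

```r
d1 <- data.frame(id = c("A", "B", "C"), day1 = c(3, 4, 5),
                 stringsAsFactors = FALSE)
# Hypothetical second data frame whose key column is called "subject"
d3 <- data.frame(subject = c("A", "C", "E"), day3 = c(7, 10, 3),
                 stringsAsFactors = FALSE)

# Inner join on d1$id and d3$subject
m <- merge(d1, d3, by.x = "id", by.y = "subject")
print(m)   # the key column keeps its name from the first data frame
```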

merge() Exercise

  • now read in the file personendaten.txt using the appropriate command
  • join the demographics with our pre1 data frame (even though it does not make sense now)

merge() Solution

    > persdat <- read.table("../session1/session1data/personendaten.txt",
    +                       sep="\t",
    +                       header=TRUE)
    > pre1 <- merge(persdat,pre1,all.y = TRUE)
    > head(pre1)
      Subject Sex Age_PRETEST Trial Event.Type Code   Time TTime Uncertainty
    1  PRE001   f        3.11     7   Response    2 178963 10009           1
    2  PRE001   f        3.11    12   Response    1 238680  8342           1
    3  PRE001   f        3.11    17   Response    2 297789  8066           1
    4  PRE001   f        3.11    22   Response    1 351321 10811           1
    5  PRE001   f        3.11    27   Response    2 403607   713           1
    6  PRE001   f        3.11    32   Response    1 467793 23709           1
      Duration Uncertainty.1 ReqTime ReqDur Stim.Type Pair.Index    Type Event.Code
    1    10197             2       0   next incorrect          7 Picture   RO09.jpg
    2     8398             2       0   next incorrect         12 Picture   RO20.jpg
    3     8198             2       0   next       hit         17 Picture   RS28.jpg
    4    10997             2       0   next       hit         22 Picture   AT26.jpg
    5      800             2       0   next       hit         27 Picture   RS23.jpg
    6    23794             2       0   next       hit         32 Picture   OF04.jpg
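After a join like this it can be worth checking that no trial rows were lost or duplicated. The sketch below uses two small made-up data frames (persdat_toy and pre1_toy are hypothetical stand-ins, since the course files are not reproduced here). It assumes that each Subject occurs only once in the demographics; under that assumption a right outer join returns exactly as many rows as the trial data.

```r
# Hypothetical stand-ins for the demographics and the trial data
persdat_toy <- data.frame(Subject = c("PRE001", "PRE002", "PRE003"),
                          Sex = c("f", "m", "f"),
                          stringsAsFactors = FALSE)
pre1_toy <- data.frame(Subject = c("PRE001", "PRE001", "PRE002", "PRE004"),
                       Trial = c(7, 12, 7, 7),
                       stringsAsFactors = FALSE)

# The check below only holds if the key is unique in the left data frame
stopifnot(!anyDuplicated(persdat_toy$Subject))

merged <- merge(persdat_toy, pre1_toy, all.y = TRUE)
print(merged)

# Every trial row is kept; subjects without demographics get NA
same_rows <- nrow(merged) == nrow(pre1_toy)
print(same_rows)
```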

RstatisTik/RstatisTikPortal/RcourSe/FinalFunction/CombDataFrames (last edited on 2015-03-15 11:06:03 by mandy.vogel@googlemail.com)