<?xml version="1.0" encoding="utf-8"?><!DOCTYPE article  PUBLIC '-//OASIS//DTD DocBook XML V4.4//EN'  'http://www.docbook.org/xml/4.4/docbookx.dtd'><article><articleinfo><title>RstatisTik/RstatisTikPortal/RcourSe/FinalFunction/CombDataFrames</title><revhistory><revision><revnumber>2</revnumber><date>2015-03-15 11:06:03</date><authorinitials>mandy.vogel@googlemail.com</authorinitials></revision><revision><revnumber>1</revnumber><date>2015-03-15 11:04:41</date><authorinitials>mandy.vogel@googlemail.com</authorinitials></revision></revhistory></articleinfo><section><title>Combining Data Frames</title><section><title>rbind()</title><itemizedlist><listitem><para>rbind() combines two data frames (or matrices) by stacking their rows; the two objects must have the same column names, and corresponding columns should have the same type </para></listitem></itemizedlist><programlisting format="linespecific" language="highlight" linenumbering="numbered" startinglinenumber="1"><![CDATA[> ]]><methodname><![CDATA[x]]></methodname><![CDATA[ <- ]]><methodname><![CDATA[data.frame]]></methodname><![CDATA[(]]><methodname><![CDATA[id]]></methodname><![CDATA[=1:3,]]><methodname><![CDATA[score]]></methodname><![CDATA[=]]><methodname><![CDATA[rnorm]]></methodname><![CDATA[(3))]]>
<![CDATA[> ]]><methodname><![CDATA[y]]></methodname><![CDATA[ <- ]]><methodname><![CDATA[data.frame]]></methodname><![CDATA[(]]><methodname><![CDATA[id]]></methodname><![CDATA[=13:15,]]><methodname><![CDATA[score]]></methodname><![CDATA[=]]><methodname><![CDATA[rnorm]]></methodname><![CDATA[(3))]]>
<![CDATA[> ]]><methodname><![CDATA[rbind]]></methodname><![CDATA[(]]><methodname><![CDATA[x]]></methodname><![CDATA[,]]><methodname><![CDATA[y]]></methodname><![CDATA[)]]>
<methodname><![CDATA[id]]></methodname><![CDATA[       ]]><methodname><![CDATA[score]]></methodname>
<![CDATA[1  1  0.71121163]]>
<![CDATA[2  2 -0.62973249]]>
<![CDATA[3  3  1.17737595]]>
<![CDATA[4 13 -0.45074940]]>
<![CDATA[5 14 -0.01044197]]>
<![CDATA[6 15 -1.05217176]]>
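<![CDATA[# A minimal sketch, assuming two made-up data frames a and b (not part of the course data):
# for data frames, rbind() matches columns by name rather than by position.
a <- data.frame(id = 1:2, score = c(0.5, 1.5))
b <- data.frame(score = c(2.5, 3.5), id = 3:4)   # same names, different order
ab <- rbind(a, b)
ab$id       # 1 2 3 4
ab$score    # 0.5 1.5 2.5 3.5
# A differently named column (here the hypothetical "points") makes rbind() stop with an error.
bad <- data.frame(id = 5L, points = 4.5)
res <- try(rbind(a, bad), silent = TRUE)
inherits(res, "try-error")   # TRUE]]>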
</programlisting></section><section><title>cbind()</title><itemizedlist><listitem><para>cbind() combines two data frames (or matrices) by placing their columns side by side; both objects must have the same number of rows </para></listitem></itemizedlist><programlisting format="linespecific" language="highlight" linenumbering="numbered" startinglinenumber="1"><![CDATA[> ]]><methodname><![CDATA[cbind]]></methodname><![CDATA[(]]><methodname><![CDATA[x]]></methodname><![CDATA[,]]><methodname><![CDATA[y]]></methodname><![CDATA[)]]>
<methodname><![CDATA[id]]></methodname><![CDATA[      ]]><methodname><![CDATA[score1]]></methodname><![CDATA[      ]]><methodname><![CDATA[score2]]></methodname><![CDATA[     ]]><methodname><![CDATA[score3]]></methodname>
<![CDATA[1  1  0.11440705  0.14536778 -1.1773241]]>
<![CDATA[2  2 -1.62862651  0.02020604  0.5686415]]>
<![CDATA[3  3  0.05335811  0.25462270  0.8844987]]>
<![CDATA[4  4 -0.19931734  0.15625511  0.9287316]]>
<![CDATA[5  5 -1.15217836 -1.79804503 -0.7550234]]>
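<![CDATA[# A minimal sketch, assuming two made-up data frames left and right:
# cbind() pairs rows purely by position and ignores any key column, so rows can end up misaligned.
left  <- data.frame(id = 1:3, day1 = c(3, 4, 5))
right <- data.frame(id = c(3L, 1L, 2L), day2 = c(10, 7, 8))
side <- cbind(left, right)
names(side)              # "id" "day1" "id" "day2": the id column appears twice
side[, 1] == side[, 3]   # FALSE FALSE FALSE: no row pairs up matching ids
# merge() aligns the rows on the shared id column instead.
m <- merge(left, right)
m$day2                   # 7 8 10]]>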
</programlisting><itemizedlist><listitem><para>using cbind() to combine data frames is not recommended because rows are paired by position only, not by a key column </para></listitem></itemizedlist></section><section><title>merge()</title><itemizedlist><listitem><para>merge() is the command of choice for merging or joining data frames </para></listitem><listitem><para>it is the equivalent of JOIN in SQL </para></listitem><listitem><para>there are four cases </para><itemizedlist><listitem><para>inner join </para></listitem><listitem><para>left outer join </para></listitem><listitem><para>right outer join </para></listitem><listitem><para>full outer join </para></listitem></itemizedlist></listitem></itemizedlist><programlisting format="linespecific" language="highlight" linenumbering="numbered" startinglinenumber="1"><![CDATA[> (]]><methodname><![CDATA[d1]]></methodname><![CDATA[ <- ]]><methodname><![CDATA[data.frame]]></methodname><![CDATA[(]]><methodname><![CDATA[id]]></methodname><![CDATA[=]]><symbol><![CDATA[LETTERS]]></symbol><methodname><![CDATA[[c]]></methodname><![CDATA[(1,2,3)]]><methodname><![CDATA[]]]></methodname><![CDATA[,]]><methodname><![CDATA[day1]]></methodname><![CDATA[=]]><methodname><![CDATA[sample]]></methodname><![CDATA[(10,3)))]]>
<methodname><![CDATA[id]]></methodname><![CDATA[ ]]><methodname><![CDATA[day1]]></methodname>
<![CDATA[1  ]]><methodname><![CDATA[A]]></methodname><![CDATA[    3]]>
<![CDATA[2  ]]><methodname><![CDATA[B]]></methodname><![CDATA[    4]]>
<![CDATA[3  ]]><methodname><![CDATA[C]]></methodname><![CDATA[    5]]>
<![CDATA[> (]]><methodname><![CDATA[d2]]></methodname><![CDATA[ <- ]]><methodname><![CDATA[data.frame]]></methodname><![CDATA[(]]><methodname><![CDATA[id]]></methodname><![CDATA[=]]><symbol><![CDATA[LETTERS]]></symbol><methodname><![CDATA[[c]]></methodname><![CDATA[(1,3,5,6)]]><methodname><![CDATA[]]]></methodname><![CDATA[,]]><methodname><![CDATA[day2]]></methodname><![CDATA[=]]><methodname><![CDATA[sample]]></methodname><![CDATA[(10,4)))]]>
<methodname><![CDATA[id]]></methodname><![CDATA[ ]]><methodname><![CDATA[day2]]></methodname>
<![CDATA[1  ]]><methodname><![CDATA[A]]></methodname><![CDATA[    7]]>
<![CDATA[2  ]]><methodname><![CDATA[C]]></methodname><![CDATA[   10]]>
<![CDATA[3  ]]><methodname><![CDATA[E]]></methodname><![CDATA[    3]]>
<![CDATA[4  ]]><token><![CDATA[F]]></token><![CDATA[    6]]>
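<![CDATA[# A minimal sketch, assuming made-up data frames e1 and e2 with fixed values instead of sample():
# each of the four join types keeps a different set of ids.
e1 <- data.frame(id = c("A", "B", "C"), day1 = c(3, 4, 5), stringsAsFactors = FALSE)
e2 <- data.frame(id = c("A", "C", "E", "F"), day2 = c(7, 10, 3, 6), stringsAsFactors = FALSE)
inner <- merge(e1, e2)                 # ids in both:        intersect(e1$id, e2$id)
left  <- merge(e1, e2, all.x = TRUE)   # ids of the left:    e1$id
right <- merge(e1, e2, all.y = TRUE)   # ids of the right:   e2$id
full  <- merge(e1, e2, all = TRUE)     # ids in either one:  union(e1$id, e2$id)
c(nrow(inner), nrow(left), nrow(right), nrow(full))   # 2 3 4 5]]>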
</programlisting><section><title>inner join</title><itemizedlist><listitem><para>inner join means: keep only the cases present in both of the data frames </para></listitem></itemizedlist><programlisting format="linespecific" language="highlight" linenumbering="numbered" startinglinenumber="1"><![CDATA[> ]]><methodname><![CDATA[merge]]></methodname><![CDATA[(]]><methodname><![CDATA[d1]]></methodname><![CDATA[,]]><methodname><![CDATA[d2]]></methodname><![CDATA[)]]>
<methodname><![CDATA[id]]></methodname><![CDATA[ ]]><methodname><![CDATA[day1]]></methodname><![CDATA[ ]]><methodname><![CDATA[day2]]></methodname>
<![CDATA[1  ]]><methodname><![CDATA[A]]></methodname><![CDATA[    3    7]]>
<![CDATA[2  ]]><methodname><![CDATA[C]]></methodname><![CDATA[    5   10]]>
</programlisting></section><section><title>left outer join</title><itemizedlist><listitem><para>left outer join means: keep all cases of the left data frame, whether or not they are present in the right data frame (all.x=TRUE) </para></listitem></itemizedlist><programlisting format="linespecific" language="highlight" linenumbering="numbered" startinglinenumber="1"><![CDATA[> ]]><methodname><![CDATA[merge]]></methodname><![CDATA[(]]><methodname><![CDATA[d1]]></methodname><![CDATA[,]]><methodname><![CDATA[d2]]></methodname><![CDATA[,]]><methodname><![CDATA[all.x]]></methodname><![CDATA[ = ]]><token><![CDATA[TRUE]]></token><![CDATA[)]]>
<methodname><![CDATA[id]]></methodname><![CDATA[ ]]><methodname><![CDATA[day1]]></methodname><![CDATA[ ]]><methodname><![CDATA[day2]]></methodname>
<![CDATA[1  ]]><methodname><![CDATA[A]]></methodname><![CDATA[    3    7]]>
<![CDATA[2  ]]><methodname><![CDATA[B]]></methodname><![CDATA[    4   ]]><symbol><![CDATA[NA]]></symbol>
<![CDATA[3  ]]><methodname><![CDATA[C]]></methodname><![CDATA[    5   10]]>
</programlisting></section><section><title>right outer join</title><itemizedlist><listitem><para>right outer join means: keep all cases of the right data frame, whether or not they are present in the left data frame (all.y=TRUE) </para></listitem></itemizedlist><programlisting format="linespecific" language="highlight" linenumbering="numbered" startinglinenumber="1"><![CDATA[> ]]><methodname><![CDATA[merge]]></methodname><![CDATA[(]]><methodname><![CDATA[d1]]></methodname><![CDATA[,]]><methodname><![CDATA[d2]]></methodname><![CDATA[,]]><methodname><![CDATA[all.y]]></methodname><![CDATA[ = ]]><token><![CDATA[TRUE]]></token><![CDATA[)]]>
<methodname><![CDATA[id]]></methodname><![CDATA[ ]]><methodname><![CDATA[day1]]></methodname><![CDATA[ ]]><methodname><![CDATA[day2]]></methodname>
<![CDATA[1  ]]><methodname><![CDATA[A]]></methodname><![CDATA[    3    7]]>
<![CDATA[2  ]]><methodname><![CDATA[C]]></methodname><![CDATA[    5   10]]>
<![CDATA[3  ]]><methodname><![CDATA[E]]></methodname><![CDATA[   ]]><symbol><![CDATA[NA]]></symbol><![CDATA[    3]]>
<![CDATA[4  ]]><token><![CDATA[F]]></token><![CDATA[   ]]><symbol><![CDATA[NA]]></symbol><![CDATA[    6]]>
</programlisting></section><section><title>full outer join</title><itemizedlist><listitem><para>full outer join means: keep all cases of both data frames (all=TRUE) </para></listitem></itemizedlist><programlisting format="linespecific" language="highlight" linenumbering="numbered" startinglinenumber="1"><![CDATA[> ]]><methodname><![CDATA[merge]]></methodname><![CDATA[(]]><methodname><![CDATA[d1]]></methodname><![CDATA[,]]><methodname><![CDATA[d2]]></methodname><![CDATA[,]]><methodname><![CDATA[all]]></methodname><![CDATA[ = ]]><token><![CDATA[TRUE]]></token><![CDATA[)]]>
<methodname><![CDATA[id]]></methodname><![CDATA[ ]]><methodname><![CDATA[day1]]></methodname><![CDATA[ ]]><methodname><![CDATA[day2]]></methodname>
<![CDATA[1  ]]><methodname><![CDATA[A]]></methodname><![CDATA[    3    7]]>
<![CDATA[2  ]]><methodname><![CDATA[B]]></methodname><![CDATA[    4   ]]><symbol><![CDATA[NA]]></symbol>
<![CDATA[3  ]]><methodname><![CDATA[C]]></methodname><![CDATA[    5   10]]>
<![CDATA[4  ]]><methodname><![CDATA[E]]></methodname><![CDATA[   ]]><symbol><![CDATA[NA]]></symbol><![CDATA[    3]]>
<![CDATA[5  ]]><token><![CDATA[F]]></token><![CDATA[   ]]><symbol><![CDATA[NA]]></symbol><![CDATA[    6]]>
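<![CDATA[# A minimal sketch, assuming made-up data frames p1 and p2 whose key columns are named differently
# (the column name "subject" is hypothetical): by.x and by.y pair the key columns explicitly.
p1 <- data.frame(id = c("A", "B", "C"), day1 = c(3, 4, 5))
p2 <- data.frame(subject = c("A", "C", "E"), day2 = c(7, 10, 3))
both <- merge(p1, p2, by.x = "id", by.y = "subject")
names(both)    # "id" "day1" "day2": the key keeps its name from the first data frame
nrow(both)     # 2 (A and C)
# Without any shared column name and without by, merge() returns every combination of rows.
cross <- merge(p1, p2)
nrow(cross)    # 9]]>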
</programlisting><itemizedlist><listitem><para>if not stated otherwise, merge() joins on the columns whose names occur in both data frames, in our case only <emphasis>id</emphasis> </para></listitem><listitem><para>you can specify these columns directly with <literal>by=c(&quot;colname1&quot;,&quot;colname2&quot;)</literal> if the columns are named identically in both data frames, or </para></listitem><listitem><para>with <literal>by.x=c(&quot;colname1.x&quot;,&quot;colname2.x&quot;)</literal> and the corresponding <literal>by.y</literal> argument if the names differ </para></listitem></itemizedlist></section><section><title>merge() Exercise</title><itemizedlist><listitem><para>now read in the file personendaten.txt using the appropriate command </para></listitem><listitem><para>join the demographics with our pre1 data frame (even though it does not make sense now) </para></listitem></itemizedlist></section><section><title>merge() Solution</title><programlisting format="linespecific" language="highlight" linenumbering="numbered" startinglinenumber="1"><![CDATA[> ]]><methodname><![CDATA[persdat]]></methodname><![CDATA[ <- ]]><methodname><![CDATA[read.table]]></methodname><![CDATA[(]]><phrase><![CDATA["]]></phrase><phrase><![CDATA[../session1/session1data/personendaten.txt"]]></phrase><![CDATA[,]]>
<![CDATA[+                       ]]><methodname><![CDATA[sep]]></methodname><![CDATA[=]]><phrase><![CDATA["]]></phrase><phrase><![CDATA[\t"]]></phrase><![CDATA[,]]>
<![CDATA[+                       ]]><methodname><![CDATA[header]]></methodname><![CDATA[=]]><token><![CDATA[TRUE]]></token><![CDATA[)]]>
<![CDATA[> ]]><methodname><![CDATA[pre1]]></methodname><![CDATA[ <- ]]><methodname><![CDATA[merge]]></methodname><![CDATA[(]]><methodname><![CDATA[persdat]]></methodname><![CDATA[,]]><methodname><![CDATA[pre1]]></methodname><![CDATA[,]]><methodname><![CDATA[all.y]]></methodname><![CDATA[ = ]]><token><![CDATA[TRUE]]></token><![CDATA[)]]>
<![CDATA[> ]]><methodname><![CDATA[head]]></methodname><![CDATA[(]]><methodname><![CDATA[pre1]]></methodname><![CDATA[)]]>
<methodname><![CDATA[Subject]]></methodname><![CDATA[ ]]><methodname><![CDATA[Sex]]></methodname><![CDATA[ ]]><methodname><![CDATA[Age_PRETEST]]></methodname><![CDATA[ ]]><methodname><![CDATA[Trial]]></methodname><![CDATA[ ]]><methodname><![CDATA[Event.Type]]></methodname><![CDATA[ ]]><methodname><![CDATA[Code]]></methodname><![CDATA[   ]]><methodname><![CDATA[Time]]></methodname><![CDATA[ ]]><methodname><![CDATA[TTime]]></methodname><![CDATA[ ]]><methodname><![CDATA[Uncertainty]]></methodname>
<![CDATA[1  ]]><methodname><![CDATA[PRE001]]></methodname><![CDATA[   ]]><methodname><![CDATA[f]]></methodname><![CDATA[        3.11     7   ]]><methodname><![CDATA[Response]]></methodname><![CDATA[    2 178963 10009           1]]>
<![CDATA[2  ]]><methodname><![CDATA[PRE001]]></methodname><![CDATA[   ]]><methodname><![CDATA[f]]></methodname><![CDATA[        3.11    12   ]]><methodname><![CDATA[Response]]></methodname><![CDATA[    1 238680  8342           1]]>
<![CDATA[3  ]]><methodname><![CDATA[PRE001]]></methodname><![CDATA[   ]]><methodname><![CDATA[f]]></methodname><![CDATA[        3.11    17   ]]><methodname><![CDATA[Response]]></methodname><![CDATA[    2 297789  8066           1]]>
<![CDATA[4  ]]><methodname><![CDATA[PRE001]]></methodname><![CDATA[   ]]><methodname><![CDATA[f]]></methodname><![CDATA[        3.11    22   ]]><methodname><![CDATA[Response]]></methodname><![CDATA[    1 351321 10811           1]]>
<![CDATA[5  ]]><methodname><![CDATA[PRE001]]></methodname><![CDATA[   ]]><methodname><![CDATA[f]]></methodname><![CDATA[        3.11    27   ]]><methodname><![CDATA[Response]]></methodname><![CDATA[    2 403607   713           1]]>
<![CDATA[6  ]]><methodname><![CDATA[PRE001]]></methodname><![CDATA[   ]]><methodname><![CDATA[f]]></methodname><![CDATA[        3.11    32   ]]><methodname><![CDATA[Response]]></methodname><![CDATA[    1 467793 23709           1]]>
<methodname><![CDATA[Duration]]></methodname><![CDATA[ ]]><methodname><![CDATA[Uncertainty.1]]></methodname><![CDATA[ ]]><methodname><![CDATA[ReqTime]]></methodname><![CDATA[ ]]><methodname><![CDATA[ReqDur]]></methodname><![CDATA[ ]]><methodname><![CDATA[Stim.Type]]></methodname><![CDATA[ ]]><methodname><![CDATA[Pair.Index]]></methodname><![CDATA[    ]]><methodname><![CDATA[Type]]></methodname><![CDATA[ ]]><methodname><![CDATA[Event.Code]]></methodname>
<![CDATA[1    10197             2       0   ]]><methodname><![CDATA[next]]></methodname><![CDATA[ ]]><methodname><![CDATA[incorrect]]></methodname><![CDATA[          7 ]]><methodname><![CDATA[Picture]]></methodname><![CDATA[   ]]><methodname><![CDATA[RO09.jpg]]></methodname>
<![CDATA[2     8398             2       0   ]]><methodname><![CDATA[next]]></methodname><![CDATA[ ]]><methodname><![CDATA[incorrect]]></methodname><![CDATA[         12 ]]><methodname><![CDATA[Picture]]></methodname><![CDATA[   ]]><methodname><![CDATA[RO20.jpg]]></methodname>
<![CDATA[3     8198             2       0   ]]><methodname><![CDATA[next]]></methodname><![CDATA[       ]]><methodname><![CDATA[hit]]></methodname><![CDATA[         17 ]]><methodname><![CDATA[Picture]]></methodname><![CDATA[   ]]><methodname><![CDATA[RS28.jpg]]></methodname>
<![CDATA[4    10997             2       0   ]]><methodname><![CDATA[next]]></methodname><![CDATA[       ]]><methodname><![CDATA[hit]]></methodname><![CDATA[         22 ]]><methodname><![CDATA[Picture]]></methodname><![CDATA[   ]]><methodname><![CDATA[AT26.jpg]]></methodname>
<![CDATA[5      800             2       0   ]]><methodname><![CDATA[next]]></methodname><![CDATA[       ]]><methodname><![CDATA[hit]]></methodname><![CDATA[         27 ]]><methodname><![CDATA[Picture]]></methodname><![CDATA[   ]]><methodname><![CDATA[RS23.jpg]]></methodname>
<![CDATA[6    23794             2       0   ]]><methodname><![CDATA[next]]></methodname><![CDATA[       ]]><methodname><![CDATA[hit]]></methodname><![CDATA[         32 ]]><methodname><![CDATA[Picture]]></methodname><![CDATA[   ]]><methodname><![CDATA[OF04.jpg]]></methodname>
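<![CDATA[# A minimal sketch, assuming made-up stand-ins persons and trials for persdat and pre1
# (all values are hypothetical): merging one row per subject with several rows per subject
# repeats the per-subject columns on every matching trial row.
persons <- data.frame(Subject = c("PRE001", "PRE002"), Sex = c("f", "m"))
trials  <- data.frame(Subject = c("PRE001", "PRE001", "PRE002", "PRE003"),
                      Trial = c(7, 12, 7, 7))
joined <- merge(persons, trials, all.y = TRUE)
nrow(joined)               # 4: all.y = TRUE keeps every trial row
as.character(joined$Sex)   # "f" "f" "m" NA: PRE003 has no demographics]]>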
</programlisting></section></section></section></article>