<?xml version="1.0" encoding="utf-8"?><!DOCTYPE article  PUBLIC '-//OASIS//DTD DocBook XML V4.4//EN'  'http://www.docbook.org/xml/4.4/docbookx.dtd'><article><articleinfo><title>RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels</title><revhistory><revision><revnumber>6</revnumber><date>2015-05-07 07:36:43</date><authorinitials>mandy.vogel@googlemail.com</authorinitials></revision><revision><revnumber>5</revnumber><date>2015-05-07 07:33:57</date><authorinitials>mandy.vogel@googlemail.com</authorinitials></revision><revision><revnumber>4</revnumber><date>2015-05-03 07:13:15</date><authorinitials>mandy.vogel@googlemail.com</authorinitials></revision><revision><revnumber>3</revnumber><date>2015-05-03 06:48:42</date><authorinitials>mandy.vogel@googlemail.com</authorinitials></revision><revision><revnumber>2</revnumber><date>2015-05-03 06:46:31</date><authorinitials>mandy.vogel@googlemail.com</authorinitials></revision><revision><revnumber>1</revnumber><date>2015-05-03 06:44:19</date><authorinitials>mandy.vogel@googlemail.com</authorinitials></revision></revhistory></articleinfo><section><title>Choosing the appropriate method</title><para>It is essential, therefore, that you can answer the following questions: </para><itemizedlist><listitem><para>Which of your variables is the response variable? </para></listitem><listitem><para>Which are the explanatory variables? </para></listitem><listitem><para>Are the explanatory variables continuous or categorical, or a mixture of both? </para></listitem><listitem><para>What kind of response variable do you have: is it a continuous measurement, a count, a proportion, a time at death, or a category? 
</para></listitem></itemizedlist></section><section><title>Choosing the appropriate method</title><informaltable><tgroup cols="2"><colspec colname="col_0"/><colspec colname="col_1"/><tbody><row rowsep="1"><entry colsep="1" rowsep="1"><para>Explanatory Variables are </para></entry><entry colsep="1" rowsep="1"/></row><row rowsep="1"><entry colsep="1" rowsep="1"><para>all continuous </para></entry><entry colsep="1" rowsep="1"><para>Regression </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para>all categorical </para></entry><entry colsep="1" rowsep="1"><para>Analysis of variance (ANOVA) </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para>both continuous and categorical </para></entry><entry colsep="1" rowsep="1"><para>Analysis of covariance (ANCOVA)</para></entry></row></tbody></tgroup></informaltable></section><section><title>Choosing the appropriate method</title><informaltable><tgroup cols="2"><colspec colname="col_0"/><colspec colname="col_1"/><tbody><row rowsep="1"><entry colsep="1" rowsep="1"><para>Response Variables </para></entry><entry colsep="1" rowsep="1"/></row><row rowsep="1"><entry colsep="1" rowsep="1"><para>(a) Continuous </para></entry><entry colsep="1" rowsep="1"><para>Normal regression, ANOVA or ANCOVA</para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para>(b) Proportion </para></entry><entry colsep="1" rowsep="1"><para>Logistic regression </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para>(c) Count </para></entry><entry colsep="1" rowsep="1"><para>Log-linear models </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para>(d) Binary </para></entry><entry colsep="1" rowsep="1"><para>Binary logistic analysis </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para>(e) Time at death</para></entry><entry colsep="1" rowsep="1"><para>Survival analysis </para></entry></row></tbody></tgroup></informaltable><para>The best model is the model that 
produces the least unexplained variation (the minimal residual) </para></section><section><title>Choosing the appropriate method</title><itemizedlist><listitem><para>It is very important to understand that there is not one model; </para></listitem><listitem><para>there will be a large number of different, more or less plausible models that might be fitted to any given set of data. </para></listitem></itemizedlist></section><section><title>Maximum Likelihood</title><para>We define <emphasis>best</emphasis> in terms of maximum likelihood. </para><itemizedlist><listitem><para>given the data, </para></listitem><listitem><para>and given our choice of model, </para></listitem><listitem><para>what values of the parameters of that model make the observed data most likely? </para></listitem></itemizedlist><para>We judge the model on the basis of how likely the data would be if the model were correct. </para></section><section><title>Ockham's Razor</title><para>The principle is attributed to William of Ockham, who insisted that, given a set of equally good explanations for a given phenomenon, the correct explanation is the simplest explanation. The most useful statement of the principle for scientists is: when you have two competing theories that make exactly the same predictions, the simpler one is the better. 
</para></section><section><title>Ockham's Razor</title><para>For statistical modelling, the principle of parsimony means that: </para><itemizedlist><listitem><para>models should have as few parameters as possible; </para></listitem><listitem><para>linear models should be preferred to non-linear models; </para></listitem><listitem><para>experiments relying on few assumptions should be preferred to those relying on many; </para></listitem><listitem><para>models should be pared down until they are minimal adequate; </para></listitem><listitem><para>simple explanations should be preferred to complex explanations. </para></listitem></itemizedlist></section><section><title>Types of Models</title><para>Fitting models to data is the central function of R. There are no fixed rules and no absolutes. The object is to determine a minimal adequate model from a large set of potential models. For this reason we look at the following types of models: </para><itemizedlist><listitem><para>the null model; </para></listitem><listitem><para>the minimal adequate model; </para></listitem><listitem><para>the maximal model; and </para></listitem><listitem><para>the saturated model. 
</para></listitem></itemizedlist></section><section><title>The Null model</title><para><ulink url="https://wiki.init.mpg.de/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels?action=AttachFile&amp;do=get&amp;target=nullmodel.png"><inlinemediaobject><imageobject><imagedata depth="400" fileref="https://wiki.init.mpg.de/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels?action=AttachFile&amp;do=get&amp;target=nullmodel.png" width="800"/></imageobject><textobject><phrase>attachment:nullmodel.png</phrase></textobject></inlinemediaobject></ulink>  </para><itemizedlist><listitem><para>Just one parameter, the overall mean ybar </para></listitem><listitem><para>Fit: none; SSE = SSY </para></listitem><listitem><para>Degrees of freedom: n-1 </para></listitem><listitem><para>Explanatory power of the model: none </para></listitem></itemizedlist></section><section><title>Adding Information</title><para>Minimal adequate model: <ulink url="https://wiki.init.mpg.de/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels?action=AttachFile&amp;do=get&amp;target=minimalmodel.png"><inlinemediaobject><imageobject><imagedata depth="400" fileref="https://wiki.init.mpg.de/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels?action=AttachFile&amp;do=get&amp;target=minimalmodel.png" width="800"/></imageobject><textobject><phrase>attachment:minimalmodel.png</phrase></textobject></inlinemediaobject></ulink>  </para><itemizedlist><listitem><para>model with  </para></listitem></itemizedlist><para><inlinemediaobject><imageobject><imagedata 
fileref="https://wiki.init.mpg.de/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels?action=AttachFile&amp;do=get&amp;target=latex_1a10e421cc633a5322272c8aeda312b672847f2a_p1.png"/></imageobject><textobject><phrase>$0 \le p' \le p$ </phrase></textobject></inlinemediaobject></para><para>parameters </para><itemizedlist><listitem><para>Fit: less than the maximal model, but not significantly so </para></listitem><listitem><para>Degrees of freedom:  </para></listitem></itemizedlist><para><inlinemediaobject><imageobject><imagedata fileref="https://wiki.init.mpg.de/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels?action=AttachFile&amp;do=get&amp;target=latex_22206272d8f01c63eca8897adcdc4c0de9aad492_p1.png"/></imageobject><textobject><phrase>$n-p'-1$</phrase></textobject></inlinemediaobject></para><itemizedlist><listitem><para>Explanatory power of the model:  </para></listitem></itemizedlist><para><inlinemediaobject><imageobject><imagedata fileref="https://wiki.init.mpg.de/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels?action=AttachFile&amp;do=get&amp;target=latex_9c31afa907a3c71bd8a0a393978e15f838c0d7e2_p1.png"/></imageobject><textobject><phrase>$r^2 = \frac{SSR}{SSY}$</phrase></textobject></inlinemediaobject></para></section><section><title>Adding Information</title><para>Minimal adequate model: <ulink url="https://wiki.init.mpg.de/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels?action=AttachFile&amp;do=get&amp;target=minimalmodel2.png"><inlinemediaobject><imageobject><imagedata depth="400" fileref="https://wiki.init.mpg.de/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels?action=AttachFile&amp;do=get&amp;target=minimalmodel2.png" width="800"/></imageobject><textobject><phrase>attachment:minimalmodel2.png</phrase></textobject></inlinemediaobject></ulink>  
</para><itemizedlist><listitem><para>model with  </para></listitem></itemizedlist><para><inlinemediaobject><imageobject><imagedata fileref="https://wiki.init.mpg.de/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels?action=AttachFile&amp;do=get&amp;target=latex_79c4306d72ea5fef9117eced128e89044a50f599_p1.png"/></imageobject><textobject><phrase>$0 \le p' \le p$</phrase></textobject></inlinemediaobject></para><para> parameters </para><itemizedlist><listitem><para>Fit: less than the maximal model, but not significantly so </para></listitem><listitem><para>Degrees of freedom:  </para></listitem></itemizedlist><para><inlinemediaobject><imageobject><imagedata fileref="https://wiki.init.mpg.de/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels?action=AttachFile&amp;do=get&amp;target=latex_22206272d8f01c63eca8897adcdc4c0de9aad492_p1.png"/></imageobject><textobject><phrase>$n-p'-1$</phrase></textobject></inlinemediaobject></para><itemizedlist><listitem><para>Explanatory power of the model:  </para></listitem></itemizedlist><para><inlinemediaobject><imageobject><imagedata fileref="https://wiki.init.mpg.de/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels?action=AttachFile&amp;do=get&amp;target=latex_2c22d156a27d5cb1356e6629c51fb9e377733279_p1.png"/></imageobject><textobject><phrase>$r^2 = \frac{SSA}{SSY}$</phrase></textobject></inlinemediaobject></para></section><section><title>Saturated/Maximal Model</title><para>saturated model </para><itemizedlist><listitem><para>One parameter for every data point </para></listitem><listitem><para>Fit: perfect </para></listitem><listitem><para>Degrees of freedom: none </para></listitem><listitem><para>Explanatory power of the model: none </para></listitem></itemizedlist><para>maximal model </para><itemizedlist><listitem><para>Contains all p factors, interactions and covariates that </para></listitem><listitem><para>Degrees of freedom:  
</para></listitem></itemizedlist><para><inlinemediaobject><imageobject><imagedata fileref="https://wiki.init.mpg.de/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels?action=AttachFile&amp;do=get&amp;target=latex_16c27bfa5fc9ea1a89b98f09d707c7e840c7bc44_p1.png"/></imageobject><textobject><phrase>$n-p-1$</phrase></textobject></inlinemediaobject></para><itemizedlist><listitem><para>Explanatory power of the model: it depends </para></listitem></itemizedlist></section><section><title>How to choose...</title><itemizedlist><listitem><para>models are representations of reality that should be both accurate and convenient </para></listitem><listitem><para>it is impossible to maximize a model’s realism, generality and holism simultaneously </para></listitem><listitem><para>the principle of parsimony is a vital tool in helping to choose one model over another </para></listitem><listitem><para>only include an explanatory variable in a model if it significantly improves the fit of the model (or if there are other strong reasons) </para></listitem><listitem><para>the fact that we went to the trouble of measuring something does not mean we have to have it in our model </para></listitem></itemizedlist></section><section><title>ANOVA</title><itemizedlist><listitem><para>a technique we use when all explanatory variables are categorical (factors) </para></listitem><listitem><para>if there is one factor with three or more levels we use one-way ANOVA (with only two levels a t-test should be preferred; it gives exactly the same answer, since with two levels <inlinemediaobject><imageobject><imagedata fileref="https://wiki.init.mpg.de/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels?action=AttachFile&amp;do=get&amp;target=latex_9fecf66bceb9debfa1a0c51018abc2c7b77a0cf3_p1.png"/></imageobject><textobject><phrase>$F=t^2$</phrase></textobject></inlinemediaobject>) </para></listitem><listitem><para>for more factors 
there is two-way, three-way ANOVA </para></listitem><listitem><para>the central idea is to compare two or more means by comparing variances </para></listitem></itemizedlist></section><section><title>The Garden Data</title><para>A data frame with 14 observations on 2 variables. </para><itemizedlist><listitem><para>ozone: atmospheric ozone concentration </para></listitem><listitem><para>garden: garden id </para></listitem></itemizedlist><informaltable><tgroup cols="15"><colspec colname="col_0"/><colspec colname="col_1"/><colspec colname="col_2"/><colspec colname="col_3"/><colspec colname="col_4"/><colspec colname="col_5"/><colspec colname="col_6"/><colspec colname="col_7"/><colspec colname="col_8"/><colspec colname="col_9"/><colspec colname="col_10"/><colspec colname="col_11"/><colspec colname="col_12"/><colspec colname="col_13"/><colspec colname="col_14"/><tbody><row rowsep="1"><entry colsep="1" rowsep="1"/><entry colsep="1" rowsep="1"><para>1 </para></entry><entry colsep="1" rowsep="1"><para>2 </para></entry><entry colsep="1" rowsep="1"><para>3 </para></entry><entry colsep="1" rowsep="1"><para>4 </para></entry><entry colsep="1" rowsep="1"><para>5 </para></entry><entry colsep="1" rowsep="1"><para>6 </para></entry><entry colsep="1" rowsep="1"><para>7 </para></entry><entry colsep="1" rowsep="1"><para>8 </para></entry><entry colsep="1" rowsep="1"><para>9 </para></entry><entry colsep="1" rowsep="1"><para>10 </para></entry><entry colsep="1" rowsep="1"><para>11 </para></entry><entry colsep="1" rowsep="1"><para>12 </para></entry><entry colsep="1" rowsep="1"><para>13 </para></entry><entry colsep="1" rowsep="1"><para>14 </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para>ozone </para></entry><entry colsep="1" rowsep="1"><para> 9 </para></entry><entry colsep="1" rowsep="1"><para> 7 </para></entry><entry colsep="1" rowsep="1"><para> 6 </para></entry><entry colsep="1" rowsep="1"><para> 8 </para></entry><entry colsep="1" 
rowsep="1"><para> 5 </para></entry><entry colsep="1" rowsep="1"><para>11 </para></entry><entry colsep="1" rowsep="1"><para> 9 </para></entry><entry colsep="1" rowsep="1"><para>11 </para></entry><entry colsep="1" rowsep="1"><para> 9 </para></entry><entry colsep="1" rowsep="1"><para> 6 </para></entry><entry colsep="1" rowsep="1"><para>10 </para></entry><entry colsep="1" rowsep="1"><para> 8 </para></entry><entry colsep="1" rowsep="1"><para> 8 </para></entry><entry colsep="1" rowsep="1"><para>12 </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para>garden </para></entry><entry colsep="1" rowsep="1"><para>a </para></entry><entry colsep="1" rowsep="1"><para>a </para></entry><entry colsep="1" rowsep="1"><para>a </para></entry><entry colsep="1" rowsep="1"><para>b </para></entry><entry colsep="1" rowsep="1"><para>a </para></entry><entry colsep="1" rowsep="1"><para>b </para></entry><entry colsep="1" rowsep="1"><para>b </para></entry><entry colsep="1" rowsep="1"><para>b </para></entry><entry colsep="1" rowsep="1"><para>b </para></entry><entry colsep="1" rowsep="1"><para>a </para></entry><entry colsep="1" rowsep="1"><para>b </para></entry><entry colsep="1" rowsep="1"><para>a </para></entry><entry colsep="1" rowsep="1"><para>a </para></entry><entry colsep="1" rowsep="1"><para>b </para></entry></row></tbody></tgroup></informaltable></section><section><title>Total Sum of Squares</title><itemizedlist><listitem><para>we plot the values in the order in which they were measured </para></listitem></itemizedlist><para><ulink url="https://wiki.init.mpg.de/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels?action=AttachFile&amp;do=get&amp;target=TSS1.png"><inlinemediaobject><imageobject><imagedata depth="400" fileref="https://wiki.init.mpg.de/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels?action=AttachFile&amp;do=get&amp;target=TSS1.png" 
width="800"/></imageobject><textobject><phrase>attachment:TSS1.png</phrase></textobject></inlinemediaobject></ulink>  </para></section><section><title>Total Sum of Squares</title><itemizedlist><listitem><para>there is a lot of scatter, indicating that the variance in ozone is large </para></listitem><listitem><para>to get a feel for the overall variance we plot the overall mean (8.5) and indicate each of the residuals by a vertical line </para></listitem></itemizedlist><para><ulink url="https://wiki.init.mpg.de/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels?action=AttachFile&amp;do=get&amp;target=TSS.png"><inlinemediaobject><imageobject><imagedata depth="400" fileref="https://wiki.init.mpg.de/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels?action=AttachFile&amp;do=get&amp;target=TSS.png" width="800"/></imageobject><textobject><phrase>attachment:TSS.png</phrase></textobject></inlinemediaobject></ulink>  </para><itemizedlist><listitem><para>we refer to this overall variation as the <emphasis>total sum of squares, SSY or TSS</emphasis>  </para></listitem><listitem><para>in this case SSY = 55.5 </para></listitem></itemizedlist><para><ulink url="https://wiki.init.mpg.de/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels?action=AttachFile&amp;do=get&amp;target=TSS2.png"><inlinemediaobject><imageobject><imagedata depth="400" fileref="https://wiki.init.mpg.de/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels?action=AttachFile&amp;do=get&amp;target=TSS2.png" width="800"/></imageobject><textobject><phrase>attachment:TSS2.png</phrase></textobject></inlinemediaobject></ulink>  </para></section><section><title>Group Means</title><itemizedlist><listitem><para>now instead of fitting the overall mean, let us fit the individual garden means 
</para></listitem></itemizedlist><informaltable><tgroup cols="3"><colspec colname="col_0"/><colspec colname="col_1"/><colspec colname="col_2"/><tbody><row rowsep="1"><entry colsep="1" rowsep="1"><para>garden </para></entry><entry colsep="1" rowsep="1"><para>a </para></entry><entry colsep="1" rowsep="1"><para>b  </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para>mean </para></entry><entry colsep="1" rowsep="1"><para> 7 </para></entry><entry colsep="1" rowsep="1"><para>10  </para></entry></row></tbody></tgroup></informaltable></section><section><title>Group Means</title><para><ulink url="https://wiki.init.mpg.de/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels?action=AttachFile&amp;do=get&amp;target=ESS.png">attachment:ESS.png</ulink>  </para><itemizedlist><listitem><para>now we see that the mean ozone concentration is substantially higher in garden b </para></listitem><listitem><para>the aim of ANOVA is to determine  </para><itemizedlist><listitem><para>whether it is significantly higher <emphasis>or</emphasis> </para></listitem><listitem><para>whether this kind of difference could come about by chance alone </para></listitem></itemizedlist></listitem></itemizedlist></section><section><title>Error Sum of Squares</title><itemizedlist><listitem><para>we define the new sum of squares as the <emphasis>error sum of squares</emphasis> (error in the sense of 'residual') </para></listitem><listitem><para>in this case SSE = 24.0 </para></listitem></itemizedlist><para><ulink url="https://wiki.init.mpg.de/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels?action=AttachFile&amp;do=get&amp;target=ESS2.png"><inlinemediaobject><imageobject><imagedata depth="400" 
fileref="https://wiki.init.mpg.de/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels?action=AttachFile&amp;do=get&amp;target=ESS2.png" width="800"/></imageobject><textobject><phrase>attachment:ESS2.png</phrase></textobject></inlinemediaobject></ulink>  </para></section><section><title>Treatment Sum of Squares</title><itemizedlist><listitem><para>then the component of the variation that is explained by the difference of the means is called the <emphasis>treatment sum of squares</emphasis> SSA </para></listitem><listitem><para>analysis of variance is based  on the notion that we break down the total sum of squares into useful and informative components </para><itemizedlist><listitem><para>SSA = explained variation </para></listitem><listitem><para>SSE = unexplained variation </para></listitem></itemizedlist></listitem></itemizedlist></section><section><title>ANOVA table</title><informaltable><tgroup cols="5"><colspec colname="col_0"/><colspec colname="col_1"/><colspec colname="col_2"/><colspec colname="col_3"/><colspec colname="col_4"/><tbody><row rowsep="1"><entry colsep="1" rowsep="1"><para>Source </para></entry><entry colsep="1" rowsep="1"><para>Sum of squares </para></entry><entry colsep="1" rowsep="1"><para>Degrees of freedom </para></entry><entry colsep="1" rowsep="1"><para>Mean square </para></entry><entry colsep="1" rowsep="1"><para>F ratio </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para>Garden </para></entry><entry colsep="1" rowsep="1"><para> 31.5 </para></entry><entry colsep="1" rowsep="1"><para>1 </para></entry><entry colsep="1" rowsep="1"><para> 31.5 </para></entry><entry colsep="1" rowsep="1"><para> 15.75</para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para>Error </para></entry><entry colsep="1" rowsep="1"><para> 24.0 </para></entry><entry colsep="1" rowsep="1"><para>12 </para></entry><entry colsep="1" rowsep="1"><para> s^2=2.0 </para></entry><entry colsep="1" rowsep="1"/></row><row 
rowsep="1"><entry colsep="1" rowsep="1"><para>Total </para></entry><entry colsep="1" rowsep="1"><para>55.5 </para></entry><entry colsep="1" rowsep="1"><para>13 </para></entry><entry colsep="1" rowsep="1"/></row></tbody></tgroup></informaltable></section><section><title>ANOVA</title><itemizedlist><listitem><para>now we need to test whether an F ratio of 15.75 is large or small </para></listitem><listitem><para>we can use a table or a software package </para></listitem><listitem><para>here I use software to calculate the cumulative probability and, from it, the p-value </para></listitem></itemizedlist><programlisting format="linespecific" language="highlight" linenumbering="numbered" startinglinenumber="1"><![CDATA[> 1 - ]]><methodname><![CDATA[pf]]></methodname><![CDATA[(15.75,1,12)]]>
<methodname><![CDATA[[1]]></methodname><methodname><![CDATA[]]]></methodname><![CDATA[ 0.001864103]]>
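#<![CDATA[
# A minimal sketch, not part of the original session: the F ratio and its
# p-value recomputed from the garden data tabulated earlier. The names ssy,
# sse, ssa and f_ratio are chosen here for illustration only.
ozone  <- c(9, 7, 6, 8, 5, 11, 9, 11, 9, 6, 10, 8, 8, 12)
garden <- factor(c("a", "a", "a", "b", "a", "b", "b", "b", "b", "a", "b", "a", "a", "b"))
ssy <- sum((ozone - mean(ozone))^2)         # total sum of squares around the overall mean
sse <- sum((ozone - ave(ozone, garden))^2)  # error sum of squares around the garden means
ssa <- ssy - sse                            # treatment sum of squares
f_ratio <- (ssa / 1) / (sse / 12)           # mean squares on 1 and 12 degrees of freedom
pf(f_ratio, 1, 12, lower.tail = FALSE)      # upper tail, equivalent to 1 - pf(f_ratio, 1, 12)
#]]>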
</programlisting><para><ulink url="https://wiki.init.mpg.de/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels?action=AttachFile&amp;do=get&amp;target=fdens.png"><inlinemediaobject><imageobject><imagedata depth="400" fileref="https://wiki.init.mpg.de/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels?action=AttachFile&amp;do=get&amp;target=fdens.png" width="800"/></imageobject><textobject><phrase>attachment:fdens.png</phrase></textobject></inlinemediaobject></ulink>  </para></section><section><title>ANOVA in R</title><itemizedlist><listitem><para>in R we use the lm() or the aov() command and </para></listitem><listitem><para>the formula syntax a ~ b </para></listitem><listitem><para>we assign the result to a variable </para></listitem></itemizedlist><programlisting format="linespecific" language="highlight" linenumbering="numbered" startinglinenumber="1"><methodname><![CDATA[mm]]></methodname><![CDATA[ <- ]]><methodname><![CDATA[lm]]></methodname><![CDATA[(]]><methodname><![CDATA[ozone]]></methodname><![CDATA[ ~ ]]><methodname><![CDATA[garden]]></methodname><![CDATA[, ]]><methodname><![CDATA[data]]></methodname><![CDATA[=]]><methodname><![CDATA[oneway]]></methodname><![CDATA[)]]>
<![CDATA[7            3  ]]>
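#<![CDATA[
# A minimal sketch, assuming `oneway` is not yet in the workspace: the data
# frame can be rebuilt from the table in "The Garden Data" (this construction
# is an illustration, not the course's original data file).
oneway <- data.frame(
  ozone  = c(9, 7, 6, 8, 5, 11, 9, 11, 9, 6, 10, 8, 8, 12),
  garden = factor(c("a", "a", "a", "b", "a", "b", "b", "b", "b", "a", "b", "a", "a", "b"))
)
mm <- lm(ozone ~ garden, data = oneway)
# the intercept is the mean of garden a; gardenb is the difference between the garden means
coef(mm)
#]]>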
</programlisting><programlisting format="linespecific" language="highlight" linenumbering="numbered" startinglinenumber="1"><![CDATA[> ]]><methodname><![CDATA[summary]]></methodname><![CDATA[(]]><methodname><![CDATA[mm]]></methodname><![CDATA[)]]>
<methodname><![CDATA[Min]]></methodname><![CDATA[     1]]><methodname><![CDATA[Q]]></methodname><![CDATA[ ]]><methodname><![CDATA[Median]]></methodname><![CDATA[     3]]><methodname><![CDATA[Q]]></methodname><![CDATA[    ]]><methodname><![CDATA[Max]]></methodname><![CDATA[ ]]>
<methodname><![CDATA[Estimate]]></methodname><![CDATA[ ]]><methodname><![CDATA[Std.]]></methodname><![CDATA[ ]]><methodname><![CDATA[Error]]></methodname><![CDATA[ ]]><methodname><![CDATA[t]]></methodname><![CDATA[ ]]><methodname><![CDATA[value]]></methodname><![CDATA[ ]]><methodname><![CDATA[Pr]]></methodname><![CDATA[(>|]]><methodname><![CDATA[t]]></methodname><![CDATA[|)    ]]>
<methodname><![CDATA[gardenb]]></methodname><![CDATA[       3.0000     0.7559   3.969  0.00186 ** ]]>
<methodname><![CDATA[Residual]]></methodname><![CDATA[ ]]><methodname><![CDATA[standard]]></methodname><![CDATA[ ]]><methodname><![CDATA[error]]></methodname><![CDATA[: 1.414 ]]><methodname><![CDATA[on]]></methodname><![CDATA[ 12 ]]><methodname><![CDATA[degrees]]></methodname><![CDATA[ ]]><methodname><![CDATA[of]]></methodname><![CDATA[ ]]><methodname><![CDATA[freedom]]></methodname>
<methodname><![CDATA[Multiple]]></methodname><![CDATA[ ]]><methodname><![CDATA[R]]></methodname><![CDATA[-]]><methodname><![CDATA[squared]]></methodname><![CDATA[:  0.5676,    ]]><methodname><![CDATA[Adjusted]]></methodname><![CDATA[ ]]><methodname><![CDATA[R]]></methodname><![CDATA[-]]><methodname><![CDATA[squared]]></methodname><![CDATA[:  0.5315 ]]>
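#<![CDATA[
# A sketch relating the "Multiple R-squared" line to r^2 = SSA/SSY, with the
# garden data re-entered by hand (all object names are illustrative).
oneway <- data.frame(
  ozone  = c(9, 7, 6, 8, 5, 11, 9, 11, 9, 6, 10, 8, 8, 12),
  garden = factor(c("a", "a", "a", "b", "a", "b", "b", "b", "b", "a", "b", "a", "a", "b"))
)
mm  <- lm(ozone ~ garden, data = oneway)
ssy <- sum((oneway$ozone - mean(oneway$ozone))^2)  # total sum of squares
sse <- sum(residuals(mm)^2)                        # error sum of squares
ssa <- ssy - sse                                   # treatment sum of squares
ssa / ssy                                          # share of the variation explained by garden
all.equal(ssa / ssy, summary(mm)$r.squared)        # agrees with the value reported by summary()
#]]>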
</programlisting><programlisting format="linespecific" language="highlight" linenumbering="numbered" startinglinenumber="1"><![CDATA[> ]]><methodname><![CDATA[anova]]></methodname><![CDATA[(]]><methodname><![CDATA[mm]]></methodname><![CDATA[)]]>
<methodname><![CDATA[Analysis]]></methodname><![CDATA[ ]]><methodname><![CDATA[of]]></methodname><![CDATA[ ]]><methodname><![CDATA[Variance]]></methodname><![CDATA[ ]]><methodname><![CDATA[Table]]></methodname>
<methodname><![CDATA[Df]]></methodname><![CDATA[ ]]><methodname><![CDATA[Sum]]></methodname><![CDATA[ ]]><methodname><![CDATA[Sq]]></methodname><![CDATA[ ]]><methodname><![CDATA[Mean]]></methodname><![CDATA[ ]]><methodname><![CDATA[Sq]]></methodname><![CDATA[ ]]><token><![CDATA[F]]></token><![CDATA[ ]]><methodname><![CDATA[value]]></methodname><![CDATA[   ]]><methodname><![CDATA[Pr]]></methodname><![CDATA[(>]]><token><![CDATA[F]]></token><![CDATA[)   ]]>
<methodname><![CDATA[garden]]></methodname><![CDATA[     1   31.5    31.5   15.75 0.001864 **]]>
<methodname><![CDATA[Residuals]]></methodname><![CDATA[ 12   24.0     2.0                    ]]>
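#<![CDATA[
# A sketch of the F = t^2 relation for a two-level factor, with the garden
# data re-entered by hand; t_val and f_val are illustrative names.
oneway <- data.frame(
  ozone  = c(9, 7, 6, 8, 5, 11, 9, 11, 9, 6, 10, 8, 8, 12),
  garden = factor(c("a", "a", "a", "b", "a", "b", "b", "b", "b", "a", "b", "a", "a", "b"))
)
mm    <- lm(ozone ~ garden, data = oneway)
t_val <- coef(summary(mm))["gardenb", "t value"]  # t statistic for the garden effect
f_val <- anova(mm)["garden", "F value"]           # F ratio from the ANOVA table
all.equal(t_val^2, f_val)
# the pooled-variance two-sample t-test yields the same p-value as the F test
t.test(ozone ~ garden, data = oneway, var.equal = TRUE)$p.value
#]]>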
</programlisting><programlisting format="linespecific" language="highlight" linenumbering="numbered" startinglinenumber="1"><![CDATA[> ]]><methodname><![CDATA[m2]]></methodname><![CDATA[ <- ]]><methodname><![CDATA[aov]]></methodname><![CDATA[(]]><methodname><![CDATA[ozone]]></methodname><![CDATA[ ~ ]]><methodname><![CDATA[garden]]></methodname><![CDATA[, ]]><methodname><![CDATA[data]]></methodname><![CDATA[=]]><methodname><![CDATA[oneway]]></methodname><![CDATA[)]]>
<![CDATA[> ]]><methodname><![CDATA[m2]]></methodname>
<methodname><![CDATA[garden]]></methodname><![CDATA[ ]]><methodname><![CDATA[Residuals]]></methodname>
<methodname><![CDATA[Sum]]></methodname><![CDATA[ ]]><methodname><![CDATA[of]]></methodname><![CDATA[ ]]><methodname><![CDATA[Squares]]></methodname><![CDATA[    31.5      24.0]]>
<methodname><![CDATA[Residual]]></methodname><![CDATA[ ]]><methodname><![CDATA[standard]]></methodname><![CDATA[ ]]><methodname><![CDATA[error]]></methodname><![CDATA[: 1.414214]]>
<methodname><![CDATA[Estimated]]></methodname><![CDATA[ ]]><methodname><![CDATA[effects]]></methodname><![CDATA[ ]]><methodname><![CDATA[may]]></methodname><![CDATA[ ]]><methodname><![CDATA[be]]></methodname><![CDATA[ ]]><methodname><![CDATA[unbalanced]]></methodname>
<![CDATA[> ]]><methodname><![CDATA[summary]]></methodname><![CDATA[(]]><methodname><![CDATA[m2]]></methodname><![CDATA[)]]>
<methodname><![CDATA[Df]]></methodname><![CDATA[ ]]><methodname><![CDATA[Sum]]></methodname><![CDATA[ ]]><methodname><![CDATA[Sq]]></methodname><![CDATA[ ]]><methodname><![CDATA[Mean]]></methodname><![CDATA[ ]]><methodname><![CDATA[Sq]]></methodname><![CDATA[ ]]><token><![CDATA[F]]></token><![CDATA[ ]]><methodname><![CDATA[value]]></methodname><![CDATA[  ]]><methodname><![CDATA[Pr]]></methodname><![CDATA[(>]]><token><![CDATA[F]]></token><![CDATA[)   ]]>
<methodname><![CDATA[garden]]></methodname><![CDATA[       1   31.5    31.5   15.75 0.00186 **]]>
<methodname><![CDATA[Residuals]]></methodname><![CDATA[   12   24.0     2.0                   ]]>
<![CDATA[> ]]><methodname><![CDATA[summary.lm]]></methodname><![CDATA[(]]><methodname><![CDATA[m2]]></methodname><![CDATA[)]]>
<methodname><![CDATA[Min]]></methodname><![CDATA[     1]]><methodname><![CDATA[Q]]></methodname><![CDATA[ ]]><methodname><![CDATA[Median]]></methodname><![CDATA[     3]]><methodname><![CDATA[Q]]></methodname><![CDATA[    ]]><methodname><![CDATA[Max]]></methodname><![CDATA[ ]]>
<methodname><![CDATA[Estimate]]></methodname><![CDATA[ ]]><methodname><![CDATA[Std.]]></methodname><![CDATA[ ]]><methodname><![CDATA[Error]]></methodname><![CDATA[ ]]><methodname><![CDATA[t]]></methodname><![CDATA[ ]]><methodname><![CDATA[value]]></methodname><![CDATA[ ]]><methodname><![CDATA[Pr]]></methodname><![CDATA[(>|]]><methodname><![CDATA[t]]></methodname><![CDATA[|)    ]]>
<methodname><![CDATA[gardenb]]></methodname><![CDATA[       3.0000     0.7559   3.969  0.00186 ** ]]>
<methodname><![CDATA[Residual]]></methodname><![CDATA[ ]]><methodname><![CDATA[standard]]></methodname><![CDATA[ ]]><methodname><![CDATA[error]]></methodname><![CDATA[: 1.414 ]]><methodname><![CDATA[on]]></methodname><![CDATA[ 12 ]]><methodname><![CDATA[degrees]]></methodname><![CDATA[ ]]><methodname><![CDATA[of]]></methodname><![CDATA[ ]]><methodname><![CDATA[freedom]]></methodname>
<methodname><![CDATA[Multiple]]></methodname><![CDATA[ ]]><methodname><![CDATA[R]]></methodname><![CDATA[-]]><methodname><![CDATA[squared]]></methodname><![CDATA[:  0.5676,    ]]><methodname><![CDATA[Adjusted]]></methodname><![CDATA[ ]]><methodname><![CDATA[R]]></methodname><![CDATA[-]]><methodname><![CDATA[squared]]></methodname><![CDATA[:  0.5315 ]]>
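<![CDATA[> # Sketch, not part of the original session: how the numbers in the two
> # summaries above relate. The values are copied from the printed tables
> # (an assumption that rounding is harmless), so results agree only up to rounding.
> ss_garden <- 31.5; ss_resid <- 24.0     # column "Sum Sq"
> df_garden <- 1;    df_resid <- 12       # column "Df"
> f_value <- (ss_garden / df_garden) / (ss_resid / df_resid)  # ratio of the mean squares
> f_value                                 # F value of the ANOVA table
> pf(f_value, df_garden, df_resid, lower.tail = FALSE)        # Pr(>F)
> sqrt(f_value)                           # t value of gardenb (t^2 = F for one df)
> sqrt(ss_resid / df_resid)               # residual standard error
> ss_garden / (ss_garden + ss_resid)      # multiple R-squared]]>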
</programlisting><section><title>ANOVA Assumptions</title><itemizedlist><listitem><para>independent, normally distributed errors </para></listitem><listitem><para>equality of variances (homogeneity) </para></listitem></itemizedlist></section><section><title>Welch ANOVA</title><itemizedlist><listitem><para>generalization of the Welch t-test </para></listitem><listitem><para>tests whether the means of the outcome variable differ across the factor levels </para></listitem><listitem><para>assumes a sufficiently large sample (more than 10 times the number of groups in the calculation; groups of size one are excluded) </para></listitem><listitem><para>sensitive to outliers (only a few are tolerated) </para></listitem><listitem><para>the R command is oneway.test() </para></listitem><listitem><para>non-parametric alternative: kruskal.test() </para></listitem></itemizedlist></section></section><section><title>Exercises and Solutions</title><itemizedlist><listitem><para>Look at the help page of the TukeyHSD function. What is its purpose? </para></listitem><listitem><para>Execute the example code near the end of the help page and interpret the results. </para></listitem><listitem><para>Install and load the granovaGG package (a package for visualizing ANOVAs), load the arousal data frame, and use the stack() command to bring the data into long form. Run an ANOVA. Is there a difference between at least two of the groups? If indicated, run a post-hoc test. </para></listitem></itemizedlist><programlisting format="linespecific" language="highlight" linenumbering="numbered" startinglinenumber="1"><![CDATA[> ]]><methodname><![CDATA[require]]></methodname><![CDATA[(]]><methodname><![CDATA[granovaGG]]></methodname><![CDATA[)]]>
<![CDATA[> ]]><methodname><![CDATA[data]]></methodname><![CDATA[(]]><methodname><![CDATA[arousal]]></methodname><![CDATA[)]]>
<![CDATA[> ]]><methodname><![CDATA[datalong]]></methodname><![CDATA[ <- ]]><methodname><![CDATA[stack]]></methodname><![CDATA[(]]><methodname><![CDATA[arousal]]></methodname><![CDATA[)]]>
<![CDATA[> ]]><methodname><![CDATA[m1]]></methodname><![CDATA[ <- ]]><methodname><![CDATA[aov]]></methodname><![CDATA[(]]><methodname><![CDATA[values]]></methodname><![CDATA[ ~ ]]><methodname><![CDATA[ind]]></methodname><![CDATA[, ]]><methodname><![CDATA[data]]></methodname><![CDATA[ = ]]><methodname><![CDATA[datalong]]></methodname><![CDATA[)]]>
<![CDATA[> ]]><methodname><![CDATA[summary]]></methodname><![CDATA[(]]><methodname><![CDATA[m1]]></methodname><![CDATA[)]]>
<methodname><![CDATA[Df]]></methodname><![CDATA[ ]]><methodname><![CDATA[Sum]]></methodname><![CDATA[ ]]><methodname><![CDATA[Sq]]></methodname><![CDATA[ ]]><methodname><![CDATA[Mean]]></methodname><![CDATA[ ]]><methodname><![CDATA[Sq]]></methodname><![CDATA[ ]]><token><![CDATA[F]]></token><![CDATA[ ]]><methodname><![CDATA[value]]></methodname><![CDATA[   ]]><methodname><![CDATA[Pr]]></methodname><![CDATA[(>]]><token><![CDATA[F]]></token><![CDATA[)    ]]>
<methodname><![CDATA[ind]]></methodname><![CDATA[          3  273.4   91.13   10.51 4.17e-05 ***]]>
<methodname><![CDATA[Residuals]]></methodname><![CDATA[   36  312.3    8.68                     ]]>
<![CDATA[> ]]><methodname><![CDATA[TukeyHSD]]></methodname><![CDATA[(]]><methodname><![CDATA[m1]]></methodname><![CDATA[)]]>
<methodname><![CDATA[Tukey]]></methodname><![CDATA[ ]]><methodname><![CDATA[multiple]]></methodname><![CDATA[ ]]><methodname><![CDATA[comparisons]]></methodname><![CDATA[ ]]><methodname><![CDATA[of]]></methodname><![CDATA[ ]]><methodname><![CDATA[means]]></methodname>
<methodname><![CDATA[diff]]></methodname><![CDATA[           ]]><methodname><![CDATA[lwr]]></methodname><![CDATA[        ]]><methodname><![CDATA[upr]]></methodname><![CDATA[     ]]><methodname><![CDATA[p]]></methodname><![CDATA[ ]]><methodname><![CDATA[adj]]></methodname>
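<![CDATA[> # Sketch, not part of the original solution (assumes m1 is the aov fit from above):
> # TukeyHSD() returns a list with one matrix per factor; its columns are the
> # diff, lwr, upr and p adj shown in the header above, with one row per pair of groups.
> tk <- TukeyHSD(m1)
> tk$ind[tk$ind[, "p adj"] < 0.05, , drop = FALSE]  # pairs that differ at the 5% level
> plot(tk)                                          # confidence intervals of all pairwise differences]]>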
</programlisting><para>Visualize your results. </para><programlisting format="linespecific" language="highlight" linenumbering="numbered" startinglinenumber="1"><![CDATA[> ]]><methodname><![CDATA[ggplot]]></methodname><![CDATA[(]]><methodname><![CDATA[datalong]]></methodname><![CDATA[,]]><methodname><![CDATA[aes]]></methodname><![CDATA[(]]><methodname><![CDATA[x]]></methodname><![CDATA[=]]><methodname><![CDATA[ind]]></methodname><![CDATA[,]]><methodname><![CDATA[y]]></methodname><![CDATA[=]]><methodname><![CDATA[values]]></methodname><![CDATA[)) + ]]>
<![CDATA[+        ]]><methodname><![CDATA[geom_boxplot]]></methodname><![CDATA[()  ]]>
</programlisting><para><ulink url="https://wiki.init.mpg.de/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels?action=AttachFile&amp;do=get&amp;target=aovgr1.png"><inlinemediaobject><imageobject><imagedata depth="400" fileref="https://wiki.init.mpg.de/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels?action=AttachFile&amp;do=get&amp;target=aovgr1.png" width="800"/></imageobject><textobject><phrase>attachment:aovgr1.png</phrase></textobject></inlinemediaobject></ulink>  </para><programlisting format="linespecific" language="highlight" linenumbering="numbered" startinglinenumber="1"><![CDATA[> ]]><methodname><![CDATA[granovagg.1w]]></methodname><![CDATA[(]]><methodname><![CDATA[datalong]]></methodname><![CDATA[$]]><methodname><![CDATA[values]]></methodname><![CDATA[,]]><methodname><![CDATA[group]]></methodname><![CDATA[ = ]]><methodname><![CDATA[datalong]]></methodname><![CDATA[$]]><methodname><![CDATA[ind]]></methodname><![CDATA[)]]>
<methodname><![CDATA[group]]></methodname><![CDATA[ ]]><methodname><![CDATA[group.mean]]></methodname><![CDATA[ ]]><methodname><![CDATA[trimmed.mean]]></methodname><![CDATA[ ]]><methodname><![CDATA[contrast]]></methodname><![CDATA[ ]]><methodname><![CDATA[variance]]></methodname><![CDATA[ ]]><methodname><![CDATA[standard.deviation]]></methodname>
<![CDATA[4  ]]><methodname><![CDATA[Placebo]]></methodname><![CDATA[      20.43        20.30    -3.65     5.83               2.41]]>
<![CDATA[3   ]]><methodname><![CDATA[Drug.B]]></methodname><![CDATA[      23.82        23.85    -0.26     7.50               2.74]]>
<![CDATA[1   ]]><methodname><![CDATA[Drug.A]]></methodname><![CDATA[      24.27        24.45     0.19     7.89               2.81]]>
<![CDATA[2 ]]><methodname><![CDATA[Drug.A.B]]></methodname><![CDATA[      27.81        27.52     3.73    13.49               3.67]]>
<![CDATA[4         10]]>
<![CDATA[3         10]]>
<![CDATA[1         10]]>
<![CDATA[2         10]]>
<methodname><![CDATA[Below]]></methodname><![CDATA[ ]]><methodname><![CDATA[is]]></methodname><![CDATA[ ]]><methodname><![CDATA[a]]></methodname><![CDATA[ ]]><methodname><![CDATA[linear]]></methodname><![CDATA[ ]]><methodname><![CDATA[model]]></methodname><![CDATA[ ]]><methodname><![CDATA[summary]]></methodname><![CDATA[ ]]><methodname><![CDATA[of]]></methodname><![CDATA[ ]]><methodname><![CDATA[your]]></methodname><![CDATA[ ]]><methodname><![CDATA[input]]></methodname><![CDATA[ ]]><methodname><![CDATA[data]]></methodname>
<methodname><![CDATA[Min]]></methodname><![CDATA[     1]]><methodname><![CDATA[Q]]></methodname><![CDATA[ ]]><methodname><![CDATA[Median]]></methodname><![CDATA[     3]]><methodname><![CDATA[Q]]></methodname><![CDATA[    ]]><methodname><![CDATA[Max]]></methodname><![CDATA[ ]]>
<methodname><![CDATA[Estimate]]></methodname><![CDATA[ ]]><methodname><![CDATA[Std.]]></methodname><![CDATA[ ]]><methodname><![CDATA[Error]]></methodname><![CDATA[ ]]><methodname><![CDATA[t]]></methodname><![CDATA[ ]]><methodname><![CDATA[value]]></methodname><![CDATA[ ]]><methodname><![CDATA[Pr]]></methodname><![CDATA[(>|]]><methodname><![CDATA[t]]></methodname><![CDATA[|)    ]]>
<methodname><![CDATA[groupPlacebo]]></methodname><![CDATA[   -3.8400     1.3172  -2.915  0.00608 ** ]]>
<methodname><![CDATA[Residual]]></methodname><![CDATA[ ]]><methodname><![CDATA[standard]]></methodname><![CDATA[ ]]><methodname><![CDATA[error]]></methodname><![CDATA[: 2.945 ]]><methodname><![CDATA[on]]></methodname><![CDATA[ 36 ]]><methodname><![CDATA[degrees]]></methodname><![CDATA[ ]]><methodname><![CDATA[of]]></methodname><![CDATA[ ]]><methodname><![CDATA[freedom]]></methodname>
<methodname><![CDATA[Multiple]]></methodname><![CDATA[ ]]><methodname><![CDATA[R]]></methodname><![CDATA[-]]><methodname><![CDATA[squared]]></methodname><![CDATA[:  0.4668,    ]]><methodname><![CDATA[Adjusted]]></methodname><![CDATA[ ]]><methodname><![CDATA[R]]></methodname><![CDATA[-]]><methodname><![CDATA[squared]]></methodname><![CDATA[:  0.4223 ]]>
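<![CDATA[> # Sketch, not part of the original solution: the group variances printed above
> # range from 5.83 to 13.49, so the equal-variance assumption of aov() could be
> # checked and the tests named in the Welch ANOVA section run for comparison.
> # Assumes datalong is the stacked arousal data frame created earlier.
> bartlett.test(values ~ ind, data = datalong)  # H0: equal variances in all groups
> oneway.test(values ~ ind, data = datalong)    # Welch ANOVA (var.equal = FALSE is the default)
> kruskal.test(values ~ ind, data = datalong)   # rank-based, non-parametric alternative]]>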
</programlisting><para><ulink url="https://wiki.init.mpg.de/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels?action=AttachFile&amp;do=get&amp;target=aovgr1.png"><inlinemediaobject><imageobject><imagedata depth="400" fileref="https://wiki.init.mpg.de/IT4Science/RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/BasicModels?action=AttachFile&amp;do=get&amp;target=aovgr1.png" width="800"/></imageobject><textobject><phrase>attachment:aovgr1.png</phrase></textobject></inlinemediaobject></ulink>  </para></section></article>