Revision 6 of 2015-05-07 07:36:43

Choosing the appropriate method

It is essential, therefore, that you can answer the following questions:

Explanatory variables are:

* all continuous: Regression
* all categorical: Analysis of variance (ANOVA)
* both continuous and categorical: Analysis of covariance (ANCOVA)

Response variable is:

* (a) Continuous: Normal regression, ANOVA or ANCOVA
* (b) Proportion: Logistic regression
* (c) Count: Log-linear models
* (d) Binary: Binary logistic analysis
* (e) Time at death: Survival analysis

The best model is the one that leaves the least unexplained variation (the minimal residual variation).

Maximum Likelihood

We define best in terms of maximum likelihood.

We judge the model on the basis of how likely the data would be if the model were correct.
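As a minimal sketch of this idea, assuming independent and normally distributed observations, the log-likelihood of the ozone values from the garden data (shown further down this page) can be evaluated for a few candidate means. The helper `loglik` and the candidate values are illustrative assumptions, not part of the course material.

```r
# Ozone values from the garden data on this page
ozone <- c(9, 7, 6, 8, 5, 11, 9, 11, 9, 6, 10, 8, 8, 12)

# Hypothetical helper: log-likelihood of the data under a normal model
# with mean mu and standard deviation sigma
loglik <- function(mu, sigma) {
  sum(dnorm(ozone, mean = mu, sd = sigma, log = TRUE))
}

# The maximum likelihood estimate of sigma divides by n, not n - 1
sigma_hat <- sqrt(mean((ozone - mean(ozone))^2))

# Compare candidate means; the sample mean gives the largest log-likelihood
candidates <- c(7, 8, mean(ozone), 9, 10)
ll <- sapply(candidates, loglik, sigma = sigma_hat)
print(data.frame(mu = candidates, logLik = ll))

# logLik() reports the same maximum for the intercept-only model
fit0 <- lm(ozone ~ 1)
print(logLik(fit0))
```

The further a candidate mean lies from the sample mean, the less likely the observed data become under that model.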

Ockham's Razor

The principle is attributed to William of Ockham, who insisted that, given a set of equally good explanations for a given phenomenon, the correct explanation is the simplest one. The most useful statement of the principle for scientists is: when two competing theories make exactly the same predictions, the simpler one is the better.

For statistical modelling, the principle of parsimony means that:

Types of Models

Fitting models to data is the central function of R. There are no fixed rules and no absolutes. The object is to determine a minimal adequate model from a large set of potential models. For this reason we look at the following types of models:

The Null model

attachment:nullmodel.png
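A small sketch of the null model in R, assuming the ozone values of the garden data used later on this page: the model has a single parameter, the overall mean, and leaves all variation unexplained. The object name `null_model` is an illustrative choice.

```r
# Ozone values from the garden data on this page
ozone <- c(9, 7, 6, 8, 5, 11, 9, 11, 9, 6, 10, 8, 8, 12)

# Null model: intercept only, no explanatory variables
null_model <- lm(ozone ~ 1)

print(coef(null_model))         # the single parameter is the grand mean
print(mean(ozone))              # same value
print(df.residual(null_model))  # n - 1 residual degrees of freedom
print(deviance(null_model))     # residual sum of squares = total sum of squares
```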

Adding Information

Minimal adequate model: attachment:minimalmodel.png

$0 \le p' \le p$ parameters

Degrees of freedom: $n-p'-1$

$r^2 = \frac{SSR}{SSY}$

Adding Information

Minimal adequate model: attachment:minimalmodel2.png

$0 \le p' \le p$ parameters

Degrees of freedom: $n-p'-1$

$r^2 = \frac{SSA}{SSY}$
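To illustrate the ratio, here is a sketch that computes $r^2$ from the sums of squares of a fitted one-way model and compares it with the value R reports. It assumes the garden data shown further down this page; the way the data frame `oneway` is constructed here is an assumption, since the page does not show how it was created.

```r
# Garden data as listed on this page (construction is an assumption)
oneway <- data.frame(
  ozone  = c(9, 7, 6, 8, 5, 11, 9, 11, 9, 6, 10, 8, 8, 12),
  garden = factor(c("a", "a", "a", "b", "a", "b", "b",
                    "b", "b", "a", "b", "a", "a", "b"))
)

fit <- lm(ozone ~ garden, data = oneway)
tab <- anova(fit)

SSA <- tab[["Sum Sq"]][1]     # variation explained by garden
SSY <- sum(tab[["Sum Sq"]])   # total variation = explained + residual

r2 <- SSA / SSY
print(r2)
print(summary(fit)$r.squared) # value reported by summary()
```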

Saturated/Maximal Model

saturated model

maximal model

Degrees of freedom (maximal model with $p$ parameters): $n-p-1$
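The residual degrees of freedom of these model types can be compared directly in R. The sketch below assumes the garden data from further down this page and uses one factor level per observation as a stand-in for a saturated model; the object names are illustrative.

```r
# Garden data as listed on this page
ozone  <- c(9, 7, 6, 8, 5, 11, 9, 11, 9, 6, 10, 8, 8, 12)
garden <- factor(c("a", "a", "a", "b", "a", "b", "b",
                   "b", "b", "a", "b", "a", "a", "b"))

null_model   <- lm(ozone ~ 1)        # one parameter: the grand mean
garden_model <- lm(ozone ~ garden)   # one additional parameter
saturated    <- lm(ozone ~ factor(seq_along(ozone)))  # one parameter per data point

# Residual degrees of freedom shrink as parameters are added
print(c(null      = df.residual(null_model),
        garden    = df.residual(garden_model),
        saturated = df.residual(saturated)))
```

A saturated model reproduces the data exactly, so it has no residual degrees of freedom left and explains nothing.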

How to choose...

ANOVA

For a single factor with two levels, ANOVA is equivalent to the pooled-variance t test ($F=t^2$).
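The relation $F=t^2$ can be checked numerically. This sketch assumes the garden data shown further down this page and a t test with pooled variance (`var.equal = TRUE`); the construction of the data frame `oneway` is an assumption.

```r
# Garden data as listed on this page (construction is an assumption)
oneway <- data.frame(
  ozone  = c(9, 7, 6, 8, 5, 11, 9, 11, 9, 6, 10, 8, 8, 12),
  garden = factor(c("a", "a", "a", "b", "a", "b", "b",
                    "b", "b", "a", "b", "a", "a", "b"))
)

fit <- lm(ozone ~ garden, data = oneway)

# F value from the ANOVA table and t value of the garden coefficient
F_val <- anova(fit)[["F value"]][1]
t_val <- coef(summary(fit))["gardenb", "t value"]

# Classical two-sample t test with pooled variance
tt <- t.test(ozone ~ garden, data = oneway, var.equal = TRUE)

print(F_val)
print(t_val^2)
print(unname(tt$statistic)^2)
```

All three printed values agree; the sign of the t statistic only reflects the order in which the two groups are compared.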

The Garden Data

A data frame with 14 observations on 2 variables.

         1   2   3   4   5   6   7   8   9  10  11  12  13  14
ozone    9   7   6   8   5  11   9  11   9   6  10   8   8  12
garden   a   a   a   b   a   b   b   b   b   a   b   a   a   b

Total Sum of Squares

attachment:TSS1.png

attachment:TSS.png

attachment:TSS2.png
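The total sum of squares is the sum of the squared deviations of every observation from the overall mean. A minimal sketch with the ozone values of the garden data (the variable names `grand_mean` and `SSY` are illustrative):

```r
# Ozone values from the garden data on this page
ozone <- c(9, 7, 6, 8, 5, 11, 9, 11, 9, 6, 10, 8, 8, 12)

grand_mean <- mean(ozone)              # 8.5
SSY <- sum((ozone - grand_mean)^2)     # total sum of squares
print(SSY)                             # 55.5
```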

Group Means

garden   a   b
mean     7  10
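The group means can be obtained with `tapply()`. The sketch assumes the garden data listed on this page; the object name `group_means` is illustrative.

```r
# Garden data as listed on this page
ozone  <- c(9, 7, 6, 8, 5, 11, 9, 11, 9, 6, 10, 8, 8, 12)
garden <- factor(c("a", "a", "a", "b", "a", "b", "b",
                   "b", "b", "a", "b", "a", "a", "b"))

# Mean ozone concentration per garden
group_means <- tapply(ozone, garden, mean)
print(group_means)   # a: 7, b: 10
```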

Error Sum of Squares

attachment:ESS2.png
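The error sum of squares measures the deviations of the observations from their own group mean. A sketch with the garden data from this page, using `ave()` to attach the group mean to each observation (variable names are illustrative):

```r
# Garden data as listed on this page
ozone  <- c(9, 7, 6, 8, 5, 11, 9, 11, 9, 6, 10, 8, 8, 12)
garden <- factor(c("a", "a", "a", "b", "a", "b", "b",
                   "b", "b", "a", "b", "a", "a", "b"))

# ave() returns, for every observation, the mean of its group
fitted_means <- ave(ozone, garden)

SSE <- sum((ozone - fitted_means)^2)   # error (within-group) sum of squares
print(SSE)                             # 24
```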

Treatment Sum of Squares
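The treatment sum of squares is the part of the total variation that is explained by the differences between the group means. As a sketch with the garden data from this page, it can be computed directly or as the difference between the total and the error sum of squares (variable names are illustrative):

```r
# Garden data as listed on this page
ozone  <- c(9, 7, 6, 8, 5, 11, 9, 11, 9, 6, 10, 8, 8, 12)
garden <- factor(c("a", "a", "a", "b", "a", "b", "b",
                   "b", "b", "a", "b", "a", "a", "b"))

n_j     <- tapply(ozone, garden, length)   # group sizes
means_j <- tapply(ozone, garden, mean)     # group means

# Direct computation: weighted squared deviations of group means from the grand mean
SSA <- sum(n_j * (means_j - mean(ozone))^2)

# Indirect computation: total minus error sum of squares
SSY <- sum((ozone - mean(ozone))^2)
SSE <- sum((ozone - ave(ozone, garden))^2)

print(SSA)         # 31.5
print(SSY - SSE)   # 31.5
```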

ANOVA table

Source   Sum of squares   Degrees of freedom   Mean square   F ratio
Garden   31.5             1                    31.5          15.75
Error    24.0             12                   $s^2=2.0$
Total    55.5             13
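The entries of the table follow from the sums of squares and their degrees of freedom. The sketch below recomputes the mean squares and the F ratio from the table values and compares the ratio with the 5% critical value of the F distribution; the significance level of 0.05 is an assumption made for illustration.

```r
# Values taken from the ANOVA table
SSA  <- 31.5; df_A <- 1    # garden (treatment)
SSE  <- 24.0; df_E <- 12   # error

MSA <- SSA / df_A          # treatment mean square
MSE <- SSE / df_E          # error variance s^2 = 2.0
F_ratio <- MSA / MSE       # 15.75

# Critical value for a test at the (assumed) 5% level
crit <- qf(0.95, df_A, df_E)

print(c(MSA = MSA, MSE = MSE, F = F_ratio, critical = crit))
print(F_ratio > crit)      # TRUE: reject the hypothesis of equal means
```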

ANOVA

   > 1 - pf(15.75, 1, 12)
   [1] 0.001864103

XXX graphic still to be uploaded: attachment:fdens.png

ANOVA in R

   > mm <- lm(ozone ~ garden, data=oneway)
   > mm
   (Intercept)      gardenb  
             7            3  

   > summary(mm)
       Min     1Q Median     3Q    Max 
                 Estimate Std. Error t value Pr(>|t|)    
   gardenb       3.0000     0.7559   3.969  0.00186 ** 
   Residual standard error: 1.414 on 12 degrees of freedom
   Multiple R-squared:  0.5676,    Adjusted R-squared:  0.5315 

   > anova(mm)
   Analysis of Variance Table
             Df Sum Sq Mean Sq F value   Pr(>F)   
   garden     1   31.5    31.5   15.75 0.001864 **
   Residuals 12   24.0     2.0                    

   > m2 <- aov(ozone ~ garden, data=oneway)
   > m2
                   garden Residuals
   Sum of Squares    31.5      24.0
   Residual standard error: 1.414214
   Estimated effects may be unbalanced
   > summary(m2)
               Df Sum Sq Mean Sq F value  Pr(>F)   
   garden       1   31.5    31.5   15.75 0.00186 **
   Residuals   12   24.0     2.0                   
   > summary.lm(m2)
       Min     1Q Median     3Q    Max 
                 Estimate Std. Error t value Pr(>|t|)    
   gardenb       3.0000     0.7559   3.969  0.00186 ** 
   Residual standard error: 1.414 on 12 degrees of freedom
   Multiple R-squared:  0.5676,    Adjusted R-squared:  0.5315 

ANOVA Assumptions
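A classical one-way ANOVA assumes independent observations, approximately normal errors and equal variances in all groups. The following sketch shows one possible way to look at the last two assumptions for the garden data, using `shapiro.test()` on the residuals and `bartlett.test()` on the groups. The choice of these two tests and the construction of `oneway` are assumptions made for illustration, not part of the course material.

```r
# Garden data as listed on this page (construction is an assumption)
oneway <- data.frame(
  ozone  = c(9, 7, 6, 8, 5, 11, 9, 11, 9, 6, 10, 8, 8, 12),
  garden = factor(c("a", "a", "a", "b", "a", "b", "b",
                    "b", "b", "a", "b", "a", "a", "b"))
)

fit <- aov(ozone ~ garden, data = oneway)

# Normality of the errors: Shapiro-Wilk test on the residuals
norm_check <- shapiro.test(residuals(fit))
print(norm_check)

# Homogeneity of variances: group variances and Bartlett's test
group_var <- tapply(oneway$ozone, oneway$garden, var)
print(group_var)
var_check <- bartlett.test(ozone ~ garden, data = oneway)
print(var_check)
```

With only seven observations per garden such tests have little power, so a plot of the residuals is usually a useful complement.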

Welch ANOVA
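When the group variances cannot be assumed equal, Welch's version of the one-way ANOVA is available in R through `oneway.test()` with `var.equal = FALSE`. A sketch with the garden data follows; the construction of `oneway` and the object names are assumptions.

```r
# Garden data as listed on this page (construction is an assumption)
oneway <- data.frame(
  ozone  = c(9, 7, 6, 8, 5, 11, 9, 11, 9, 6, 10, 8, 8, 12),
  garden = factor(c("a", "a", "a", "b", "a", "b", "b",
                    "b", "b", "a", "b", "a", "a", "b"))
)

# Welch ANOVA: does not assume equal variances
welch <- oneway.test(ozone ~ garden, data = oneway, var.equal = FALSE)
print(welch)

# Classical ANOVA for comparison: assumes equal variances
classic <- oneway.test(ozone ~ garden, data = oneway, var.equal = TRUE)
print(classic)
```

Both gardens have the same sample size and the same sample variance, so the two tests give the same F value for these data; they differ when variances or group sizes are unequal.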

Exercises

* Look at the help of the TukeyHSD function. What is its purpose?
* Execute the code of the example near the end of the help page and interpret the results.
* Install and load the granovaGG package (a package for visualizing ANOVAs), load the arousal data frame and use the stack() command to bring the data into long form. Perform an ANOVA. Is there a difference between at least two of the groups? If indicated, do a post-hoc test.
* Visualize your results.

Exercises and Solutions

* Look at the help of the TukeyHSD function. What is its purpose?
* Execute the code of the example near the end of the help page and interpret the results.
* Install and load the granovaGG package (a package for visualizing ANOVAs), load the arousal data frame and use the stack() command to bring the data into long form. Perform an ANOVA. Is there a difference between at least two of the groups? If indicated, do a post-hoc test.

   > require(granovaGG)
   > data(arousal)
   > datalong <- stack(arousal)
   > m1 <- aov(values ~ ind, data = datalong)
   > summary(m1)
               Df Sum Sq Mean Sq F value   Pr(>F)    
   ind          3  273.4   91.13   10.51 4.17e-05 ***
   Residuals   36  312.3    8.68                     
   > TukeyHSD(m1)
     Tukey multiple comparisons of means
              diff           lwr        upr     p adj

* Visualize your results.

   > ggplot(datalong, aes(x = ind, y = values)) +
   +     geom_boxplot()

attachment:aovgr1.png

   > granovagg.1w(datalong$values, group = datalong$ind)
        group group.mean trimmed.mean contrast variance standard.deviation
   4  Placebo      20.43        20.30    -3.65     5.83               2.41
   3   Drug.B      23.82        23.85    -0.26     7.50               2.74
   1   Drug.A      24.27        24.45     0.19     7.89               2.81
   2 Drug.A.B      27.81        27.52     3.73    13.49               3.67
   4         10
   3         10
   1         10
   2         10
   Below is a linear model summary of your input data
       Min     1Q Median     3Q    Max 
                  Estimate Std. Error t value Pr(>|t|)    
   groupPlacebo   -3.8400     1.3172  -2.915  0.00608 ** 
   Residual standard error: 2.945 on 36 degrees of freedom
   Multiple R-squared:  0.4668,    Adjusted R-squared:  0.4223 

attachment:aovgr1.png