
Introduction

Every function in R has three important characteristics:

  • a body (the code inside the function) - body()
  • arguments (the list of arguments which controls how you can call the function) - formals()
  • an environment (the “map” of the location of the function’s variables) - environment()

Typing the name of a function without parentheses prints these parts. Primitive functions are the exception: functions like sum() call C code directly with .Primitive() and contain no R code, so their formals(), body(), and environment() are all NULL.

Functions

    > chisq.test
    function (x, y = NULL, correct = TRUE, p = rep(1/length(x), length(x)),
    DNAME <- deparse(substitute(x))
    if (is.data.frame(x))
    expected = E, residuals = (x - E)/sqrt(E), stdres = (x - ...

    > sum
    function (..., na.rm = FALSE)  .Primitive("sum")
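
The three parts can also be inspected one at a time. The following is a minimal sketch; add_one() is a hypothetical function invented for illustration and is not part of the course material.

```r
# Hypothetical user-defined function, for illustration only
add_one <- function(x, by = 1) {
  x + by
}

formals(add_one)      # the argument list: x (no default) and by = 1
body(add_one)         # the code inside the function
environment(add_one)  # the environment in which the function was defined

# Primitives such as sum() contain no R code, so all three parts are NULL
is.null(formals(sum))
is.null(body(sum))
is.null(environment(sum))
```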

Function Arguments

Arguments are matched

  • first by exact name (perfect matching)
  • then by prefix matching
  • and finally by position.
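
The matching order above can be tried out with a small sketch. The function show_args() and its argument names are hypothetical and only report which value each argument received.

```r
# Hypothetical function that reports which value each argument received
show_args <- function(first, second, other = 0) {
  c(first = first, second = second, other = other)
}

show_args(1, 2, 3)                # matched by position only
show_args(second = 2, first = 1)  # matched by exact name, order is irrelevant
show_args(1, 2, oth = 3)          # "oth" is matched to "other" by prefix
show_args(sec = 2, 1)             # prefix match first, the remaining value by position
```

Prefix matching is convenient at the console, but spelling argument names out in full is usually the safer choice in scripts.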

By default, R function arguments are lazy: they are only evaluated if they are actually used.

    > f <- function(x) {
    +   10
    + }
    > f(stop("This is an error!"))
    [1] 10
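
For contrast, the sketch below shows what happens once the argument is actually evaluated. The function g() is a hypothetical variant added for illustration; force() is used here only to make the evaluation explicit.

```r
# The argument is never used, so the stop() call is never evaluated
f <- function(x) {
  10
}
f(stop("This is an error!"))

# Hypothetical variant: force() evaluates the argument immediately
g <- function(x) {
  force(x)
  10
}

# tryCatch() captures the error message instead of aborting the script
result <- tryCatch(g(stop("This is an error!")),
                   error = function(e) conditionMessage(e))
result
```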

Implicit Loops

Introduction

A common application of loops is to apply a function to each element of a set of values and collect the results in a single structure. In R this is mainly done by the higher-order functions:

  • lapply()
  • sapply()
  • apply()
  • tapply()

lapply()

  • The functions lapply() and sapply() are similar: their first argument can be a list, data frame, matrix or vector, and the second argument is the function to "apply". The former returns a list (hence the "l") and the latter tries to simplify the results (hence the "s"). For example:

    > lapply(dat,mean)
    [1] 6753.636
    [1] 5433.182
    > sapply(dat,mean)
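
The difference in return type is easier to see on a small self-contained example. This is only a sketch: the data frame below is a hypothetical stand-in for dat, whose contents are not shown here.

```r
# Hypothetical data frame standing in for `dat`
dat <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))

res_list <- lapply(dat, mean)  # a list with one element per column
res_vec  <- sapply(dat, mean)  # simplified to a named numeric vector

str(res_list)
res_vec
```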

apply()

  • The function apply() can be applied to an array. Its first argument is the array, the second is the dimension(s) over which we want to apply a function, and the third is the function. For example:

    > x<-1:12
    > dim(x)<-c(2,2,3)
    > apply(x,3,quantile) ## calculate the quantiles
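
The role of the second argument may be clearer on a two-dimensional case. The sketch below uses a hypothetical toy matrix and then checks the shape of the result for the three-dimensional array from the example.

```r
# Hypothetical toy matrix: 2 rows, 3 columns, filled column-wise
mat <- matrix(1:6, nrow = 2)

row_sums <- apply(mat, 1, sum)  # margin 1: one result per row
col_sums <- apply(mat, 2, sum)  # margin 2: one result per column

# The three-dimensional array from the example
x <- 1:12
dim(x) <- c(2, 2, 3)
q <- apply(x, 3, quantile)      # margin 3: one column of quantiles per 2 x 2 slice
dim(q)                          # five quantiles for each of the three slices
```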

tapply()

  • The function tapply() allows you to create tables (hence the "t") of the value of a function on subgroups defined by its second argument, which can be a factor or a list of factors.

For example, in the quine data frame (from the MASS package) we can summarize Days classified by Eth and Lrn as follows:

    > tapply(Days,list(Eth,Lrn),mean)
            AL       SL
    A 18.57500 24.89655
    N 13.25581 10.82353
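
The same pattern can be followed step by step on data small enough to check by hand. This is a sketch with hypothetical toy vectors, not the quine data.

```r
# Hypothetical toy data, not the quine data frame
days  <- c(2, 4, 6, 8, 6, 10)
group <- factor(c("A", "A", "N", "N", "A", "N"))
type  <- factor(c("AL", "SL", "AL", "SL", "AL", "SL"))

by_group <- tapply(days, group, mean)              # one factor: one mean per level
by_both  <- tapply(days, list(group, type), mean)  # two factors: a 2 x 2 table of means

by_group
by_both
```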
  • The class() function shows the class of an object. Use it in combination with lapply() to get the classes of the columns of the quine data frame.
  • Do the same with sapply(). What is the difference?
  • Combine this with what you learned about indexing to create a new data frame quine2 containing only the columns which are factors.
  • Calculate the row and column means of the matrix m defined below using the apply() function. (In real-life applications, use the rowMeans() and colMeans() functions instead.)

    m <- matrix(rnorm(100),nrow=10)

  • Use tapply() to summarise the number of missing days at school per ethnicity and/or per sex (three lines).
  • Sometimes the aggregate() function is more convenient. Note the use of ~: it is read as 'is dependent on' and is used extensively in modelling.

    > aggregate(Days ~ Sex + Eth, data=quine,mean)
      Sex Eth     Days
    1   F   A 20.92105
    2   M   A 21.61290
    3   F   N 10.07143
    4   M   N 14.71429
    > aggregate(Days ~ Sex + Eth, data=quine,summary)
      Sex Eth Days.Min. Days.1st Qu. Days.Median Days.Mean Days.3rd Qu. Days.Max.
    1   F   A      0.00         5.25       13.50     20.92        30.25     81.00
    2   M   A      2.00         9.50       16.00     21.61        33.00     57.00
    3   F   N      0.00         5.00        7.00     10.07        14.00     37.00
    4   M   N      0.00         3.50        8.00     14.71        19.50     69.00
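
The formula interface can be tried on a data frame small enough to verify by hand. The sketch below uses a hypothetical toy data frame; the column names days, sex and eth are invented for illustration.

```r
# Hypothetical toy data, not the quine data frame
toy <- data.frame(
  days = c(2, 4, 6, 8, 6, 10),
  sex  = factor(c("F", "M", "F", "M", "F", "M")),
  eth  = factor(c("A", "A", "N", "N", "A", "N"))
)

# days ~ sex + eth: days "is dependent on" sex and eth
agg <- aggregate(days ~ sex + eth, data = toy, FUN = mean)
agg
```

Unlike tapply(), which returns a table with one dimension per factor, aggregate() returns a data frame with one row per combination of factor levels.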

Function Exercises (Verzani)

  • Write a function to compute the average distance from the mean for some data vector.
  • Write a function f() which finds the average of the x values after squaring and subtracts the square of the average of the numbers. Verify this output will always be non-negative by computing f(1:10).
  • An integer is even if the remainder upon dividing it by 2 is 0. This remainder is given by R with the syntax x %% 2. Use this to write a function iseven(). How would you write isodd()?
  • Write a function isprime() that checks if a number x is prime by dividing x by all values from 2,...,x-1 then checking to see if there is a remainder of 0.

Function Exercises (Verzani) Solutions

  • Write a function to compute the average distance from the mean for some data vector.

    > avg.dist <- function(x){
    +     xbar <- mean(x)
    +     mean(abs(x-xbar))
    + }

  • Write a function f() which finds the average of the x values after squaring and subtracts the square of the average of the numbers. Verify this output will always be non-negative by computing f(1:10).

    > f <- function(x){
    +     mean(x^2) - mean(x)^2
    + }
    > f(1:10)
    [1] 8.25

  • An integer is even if the remainder upon dividing it by 2 is 0. This remainder is given by R with the syntax x %% 2. Use this to write a function iseven(). How would you write isodd()?

    > iseven <- function(x){
    +     x %% 2 == 0
    + }
    > iseven(1:10)
    [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE
    > isodd <- function(x){
    +     !iseven(x)
    + }
    > isodd(1:10)
    [1]  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE

  • Write a function isprime() that checks if a number x is prime by dividing x by all values from 2,...,x-1 then checking to see if there is a remainder of 0.

    > isprime <- function(x){
    +     if(x == 2) return(TRUE)
    +     !(0 %in% (x %% (2:(x-1))))
    + }
    > isprime(2)
    [1] TRUE
    > isprime(5)
    [1] TRUE
    > isprime(15)
    [1] FALSE
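
Because isprime() tests a single number (the if() condition expects one value), checking several numbers at once calls for one of the implicit loops introduced earlier. The following sketch repeats the definition so that it runs on its own; the names candidates and primes are chosen for illustration.

```r
# isprime() as defined in the solution; it expects a single number
isprime <- function(x){
    if(x == 2) return(TRUE)
    !(0 %in% (x %% (2:(x-1))))
}

# sapply() calls isprime() once per element and simplifies to a logical vector
candidates <- 2:20
primes <- candidates[sapply(candidates, isprime)]
primes
```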

RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/FunctionsInR/ApplyR (last modified on 2015-05-01 10:48:36 by mandy.vogel@googlemail.com)