= Introduction =

Every function in R has three important characteristics:

 * a body (the code inside the function), returned by body()
 * arguments (the list of formal arguments that controls how you can call the function), returned by formals()
 * an environment (the “map” of the location of the function’s variables), returned by environment()

You can see all three parts by typing the name of the function without parentheses. Primitives are the exception: primitive functions such as sum() call C code directly via .Primitive() and contain no R code, so their formals(), body(), and environment() are all NULL.

== Functions ==

The chisq.test listing below is heavily abridged.

{{{#!highlight r
> chisq.test
function (x, y = NULL, correct = TRUE, p = rep(1/length(x), length(x)),
    DNAME <- deparse(substitute(x))
    if (is.data.frame(x))
        expected = E, residuals = (x - E)/sqrt(E), stdres = (x - ...
> sum
function (..., na.rm = FALSE)  .Primitive("sum")
}}}

== Function Arguments ==

Arguments are matched

 * first by exact name (perfect matching),
 * then by prefix (partial) matching,
 * and finally by position.

By default, R function arguments are lazy: they are only evaluated if they are actually used. In the example below, x is never used inside f(), so the call to stop() is never evaluated and no error is raised:

{{{#!highlight r
> f <- function(x) {
+   10
+ }
> f(stop("This is an error!"))
[1] 10
}}}

= Implicit Loops =

== Introduction ==

A common application of loops is to apply a function to each element of a set of values and collect the results in a single structure. In R this is mainly done by the higher-order functions:

 * lapply()
 * sapply()
 * apply()
 * tapply()

== lapply() ==

 * The functions lapply() and sapply() are similar: their first argument can be a list, data frame, matrix or vector, and their second argument is the function to "apply". The former returns a list (hence the "l"), while the latter tries to simplify the result to a vector or matrix where possible (hence the "s"). For example:

{{{#!highlight r
> lapply(dat,mean)
[1] 6753.636
[1] 5433.182
> sapply(dat,mean)
}}}

== apply() ==

 * apply() can be applied to an array.
   Its first argument is the array, the second is the dimension(s) over which the function is applied, and the third is the function itself. For example:

{{{#!highlight r
> x<-1:12
> dim(x)<-c(2,2,3)
> apply(x,3,quantile) ## calculate the quantiles of each 2x2 slice along the third dimension
}}}

== tapply() ==

 * The function tapply() allows you to create tables (hence the "t") of the value of a function on subgroups defined by its second argument, which can be a factor or a list of factors. For example, in the quine data frame we can summarise Days classified by Eth and Lrn as follows:

{{{#!highlight r
> tapply(Days,list(Eth,Lrn),mean)
        AL       SL
A 18.57500 24.89655
N 13.25581 10.82353
}}}

 * The class() function shows the class of an object. Use it in combination with lapply() to get the classes of the columns of the quine data frame.
 * Do the same with sapply(). What is the difference?
 * Try to combine this with what you learned about indexing and create a new data frame quine2 containing only the columns which are factors.
 * Calculate the row and column means of the matrix m defined below using the apply() function. PS: in real-life applications use the rowMeans() and colMeans() functions.

{{{#!highlight r
m <- matrix(rnorm(100),nrow=10)
}}}

 * Use tapply() to summarise the number of days missed at school per ethnicity and/or per sex (three lines).
 * Sometimes the aggregate() function is more convenient. Note the use of {{{~}}}: it is read as 'is dependent on' and is used extensively in modelling.

{{{#!highlight r
> aggregate(Days ~ Sex + Eth, data=quine,mean)
  Sex Eth     Days
1   F   A 20.92105
2   M   A 21.61290
3   F   N 10.07143
4   M   N 14.71429
> aggregate(Days ~ Sex + Eth, data=quine,summary)
  Sex Eth Days.Min. Days.1st Qu. Days.Median Days.Mean Days.3rd Qu. Days.Max.
1   F   A      0.00         5.25       13.50     20.92        30.25     81.00
2   M   A      2.00         9.50       16.00     21.61        33.00     57.00
3   F   N      0.00         5.00        7.00     10.07        14.00     37.00
4   M   N      0.00         3.50        8.00     14.71        19.50     69.00
}}}

== Function Exercises (Verzani) ==

 * Write a function to compute the average distance from the mean for some data vector.
 * Write a function f() which finds the average of the x values after squaring and subtracts the square of the average of the numbers. Check that the output is non-negative, as it always should be, by computing f(1:10).
 * An integer is even if the remainder upon dividing it by 2 is 0. This remainder is given by R with the syntax x %% 2. Use this to write a function iseven(). How would you write isodd()?
 * Write a function isprime() that checks if a number x is prime by dividing x by all values 2, ..., x-1 and then checking to see if there is a remainder of 0.

== Function Exercises (Verzani) Solutions ==

 * Write a function to compute the average distance from the mean for some data vector.

{{{#!highlight r
> avg.dist <- function(x){
+   xbar <- mean(x)
+   mean(abs(x-xbar))
+ }
}}}

 * Write a function f() which finds the average of the x values after squaring and subtracts the square of the average of the numbers. Check that the output is non-negative, as it always should be, by computing f(1:10).

{{{#!highlight r
> f <- function(x){
+   mean(x^2) - mean(x)^2
+ }
> f(1:10)
[1] 8.25
}}}

 * An integer is even if the remainder upon dividing it by 2 is 0. This remainder is given by R with the syntax x %% 2. Use this to write a function iseven(). How would you write isodd()?

{{{#!highlight r
> iseven <- function(x){
+   x %% 2 == 0
+ }
> iseven(1:10)
 [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE
> isodd <- function(x){
+   !iseven(x)
+ }
> isodd(1:10)
 [1]  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE
}}}

 * Write a function isprime() that checks if a number x is prime by dividing x by all values 2, ..., x-1 and then checking to see if there is a remainder of 0.

{{{#!highlight r
> isprime <- function(x){
+   if(x == 2) return(TRUE)
+   !(0 %in% (x %% (2:(x-1))))
+ }
> isprime(2)
[1] TRUE
> isprime(5)
[1] TRUE
> isprime(15)
[1] FALSE
}}}
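
The isprime() solution above tests every candidate divisor from 2 to x-1 and needs a special case for 2, because 2:(x-1) counts downwards when x is 2. As a minimal sketch of one possible variant, trial division can stop at sqrt(x), since any composite number has a divisor no larger than its square root. The name isprime2() is hypothetical and not part of the Verzani exercises, and the sketch assumes x is a single non-negative whole number:

{{{#!highlight r
# isprime2() is a hypothetical name used only for this illustration.
isprime2 <- function(x) {
  # Numbers below 2 are not prime by definition.
  if (x < 2) return(FALSE)
  # 2 and 3 are prime; returning early keeps the sequence below ascending.
  if (x < 4) return(TRUE)
  # Any composite x has at least one divisor in 2, ..., floor(sqrt(x)).
  divisors <- 2:floor(sqrt(x))
  all(x %% divisors != 0)
}

# The function is not vectorised, so apply it element by element.
sapply(c(1, 2, 5, 15, 97), isprime2)
# FALSE TRUE TRUE FALSE TRUE
}}}

Like the original solution, this version handles one number at a time, which is why the usage line goes through sapply() rather than passing a vector directly.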