<?xml version="1.0" encoding="utf-8"?><!DOCTYPE article  PUBLIC '-//OASIS//DTD DocBook XML V4.4//EN'  'http://www.docbook.org/xml/4.4/docbookx.dtd'><article><articleinfo><title>RstatisTik/RstatisTikPortal/RcourSe/CourseOutline/FunctionsInR/ApplyR</title><revhistory><revision><revnumber>6</revnumber><date>2015-05-01 10:48:36</date><authorinitials>mandy.vogel@googlemail.com</authorinitials></revision><revision><revnumber>5</revnumber><date>2015-05-01 10:46:07</date><authorinitials>mandy.vogel@googlemail.com</authorinitials></revision><revision><revnumber>4</revnumber><date>2015-05-01 09:10:11</date><authorinitials>mandy.vogel@googlemail.com</authorinitials></revision><revision><revnumber>3</revnumber><date>2015-05-01 08:27:34</date><authorinitials>mandy.vogel@googlemail.com</authorinitials></revision><revision><revnumber>2</revnumber><date>2015-05-01 08:25:45</date><authorinitials>mandy.vogel@googlemail.com</authorinitials></revision><revision><revnumber>1</revnumber><date>2015-05-01 08:23:34</date><authorinitials>mandy.vogel@googlemail.com</authorinitials></revision></revhistory></articleinfo><section><title>Introduction</title><para>Every function in R has three important characteristics: </para><itemizedlist><listitem><para>a body (the code inside the function) - body() </para></listitem><listitem><para>arguments (the list of arguments which controls how you can call the function) - formals() </para></listitem><listitem><para>an environment (the “map” of the location of the function’s variables) - environment() </para></listitem></itemizedlist><para>You can see all three parts if you type the name of the function without brackets. Exceptions are primitives. Primitive functions, like sum(), call C code directly with .Primitive() and contain no R code. Therefore their formals(), body(), and environment() are all NULL. 
</para><section><title>Functions</title><programlisting format="linespecific" language="highlight" linenumbering="numbered" startinglinenumber="1"><![CDATA[> ]]><methodname><![CDATA[chisq.test]]></methodname>
<methodname><![CDATA[function ]]></methodname><![CDATA[(]]><methodname><![CDATA[x]]></methodname><![CDATA[, ]]><methodname><![CDATA[y]]></methodname><![CDATA[ = ]]><symbol><![CDATA[NULL]]></symbol><![CDATA[, ]]><methodname><![CDATA[correct]]></methodname><![CDATA[ = ]]><symbol><![CDATA[TRUE]]></symbol><![CDATA[, ]]><methodname><![CDATA[p]]></methodname><![CDATA[ = ]]><methodname><![CDATA[rep]]></methodname><![CDATA[(1/]]><methodname><![CDATA[length]]></methodname><![CDATA[(]]><methodname><![CDATA[x]]></methodname><![CDATA[), ]]><methodname><![CDATA[length]]></methodname><![CDATA[(]]><methodname><![CDATA[x]]></methodname><![CDATA[)),]]>
<methodname><![CDATA[DNAME]]></methodname><![CDATA[ <- ]]><methodname><![CDATA[deparse]]></methodname><![CDATA[(]]><methodname><![CDATA[substitute]]></methodname><![CDATA[(]]><methodname><![CDATA[x]]></methodname><![CDATA[))]]>
<methodname><![CDATA[if ]]></methodname><![CDATA[(]]><methodname><![CDATA[is.data.frame]]></methodname><![CDATA[(]]><methodname><![CDATA[x]]></methodname><![CDATA[))]]>
<methodname><![CDATA[expected]]></methodname><![CDATA[ = ]]><methodname><![CDATA[E]]></methodname><![CDATA[, ]]><methodname><![CDATA[residuals]]></methodname><![CDATA[ = (]]><methodname><![CDATA[x]]></methodname><![CDATA[ - ]]><methodname><![CDATA[E]]></methodname><![CDATA[)/]]><methodname><![CDATA[sqrt]]></methodname><![CDATA[(]]><methodname><![CDATA[E]]></methodname><![CDATA[), ]]><methodname><![CDATA[stdres]]></methodname><![CDATA[ = (]]><methodname><![CDATA[x]]></methodname><![CDATA[ - ]]><symbol><![CDATA[...]]></symbol>

<![CDATA[> ]]><methodname><![CDATA[sum]]></methodname>
<methodname><![CDATA[function ]]></methodname><![CDATA[(]]><symbol><![CDATA[...]]></symbol><![CDATA[, ]]><methodname><![CDATA[na.rm]]></methodname><![CDATA[ = ]]><symbol><![CDATA[FALSE]]></symbol><![CDATA[)  ]]><methodname><![CDATA[.Primitive]]></methodname><![CDATA[(]]><phrase><![CDATA["]]></phrase><phrase><![CDATA[sum"]]></phrase><![CDATA[)]]>
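<![CDATA[
## A minimal sketch, not part of the original session: the three parts named
## in the introduction can be queried directly. `add_one` and its arguments
## are hypothetical names chosen for illustration.
add_one <- function(x, by = 1) x + by
formals(add_one)       # argument list: x (no default) and by = 1
body(add_one)          # the code inside the function: x + by
environment(add_one)   # where the function was defined
## For a primitive such as sum() all three parts are NULL
is.primitive(sum)
is.null(formals(sum))
is.null(body(sum))
is.null(environment(sum))
]]>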
</programlisting></section><section><title>Function Arguments</title><para>When a function is called, its arguments are matched </para><itemizedlist><listitem><para>first by exact name (perfect matching) </para></listitem><listitem><para>then by prefix matching </para></listitem><listitem><para>and finally by position. </para></listitem></itemizedlist><para>By default, R function arguments are lazy: they are evaluated only if they are actually used. In the following session the call to stop() is never evaluated, because f() never uses x: </para><programlisting format="linespecific" language="highlight" linenumbering="numbered" startinglinenumber="1"><![CDATA[> ]]><methodname><![CDATA[f]]></methodname><![CDATA[ <- ]]><methodname><![CDATA[function]]></methodname><![CDATA[(]]><methodname><![CDATA[x]]></methodname><![CDATA[) {]]>
<![CDATA[+   10]]>
<![CDATA[+ }]]>
<![CDATA[> ]]><methodname><![CDATA[f]]></methodname><![CDATA[(]]><methodname><![CDATA[stop]]></methodname><![CDATA[(]]><phrase><![CDATA["]]></phrase><phrase><![CDATA[This is an error!"]]></phrase><![CDATA[))]]>
<methodname><![CDATA[[1]]></methodname><methodname><![CDATA[]]]></methodname><![CDATA[ 10]]>
<![CDATA[>]]>
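<![CDATA[
## A sketch of the three matching rules listed above; `g`, `h` and the
## argument names are hypothetical and only serve as an illustration.
g <- function(alpha, beta, gamma) c(alpha = alpha, beta = beta, gamma = gamma)
g(1, 2, 3)             # matched purely by position
g(gamma = 3, 1, 2)     # exact name first, the remaining two by position
g(ga = 3, be = 2, 1)   # unique prefixes, the remaining one by position
## All three calls give alpha = 1, beta = 2, gamma = 3.
## force() evaluates an argument even if the body never uses it otherwise,
## so the lazy-evaluation example would fail with this variant:
h <- function(x) {
  force(x)
  10
}
## h(stop("This is an error!"))   # would signal the error
]]>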
</programlisting></section></section><section><title>Implicit Loops</title><section><title>Introduction</title><para>A common application of loops is to apply a function to each element of a set of values and collect the results in a single structure. In R this is mainly done by the higher-order functions: </para><itemizedlist><listitem><para>lapply() </para></listitem><listitem><para>sapply() </para></listitem><listitem><para>apply() </para></listitem><listitem><para>tapply() </para></listitem></itemizedlist></section><section><title>lapply()</title><itemizedlist><listitem><para>The functions lapply() and sapply() are similar: the first argument can be a list, data frame, matrix or vector, and the second argument is the function to &quot;apply&quot;. The former returns a list (hence the &quot;l&quot;), whereas the latter tries to simplify the result to a vector or matrix (hence the &quot;s&quot;). For example: </para></listitem></itemizedlist><programlisting format="linespecific" language="highlight" linenumbering="numbered" startinglinenumber="1"><![CDATA[> ]]><methodname><![CDATA[lapply]]></methodname><![CDATA[(]]><methodname><![CDATA[dat]]></methodname><![CDATA[,]]><methodname><![CDATA[mean]]></methodname><![CDATA[)]]>
<methodname><![CDATA[[1]]></methodname><methodname><![CDATA[]]]></methodname><![CDATA[ 6753.636]]>
<methodname><![CDATA[[1]]></methodname><methodname><![CDATA[]]]></methodname><![CDATA[ 5433.182]]>
<![CDATA[> ]]><methodname><![CDATA[sapply]]></methodname><![CDATA[(]]><methodname><![CDATA[dat]]></methodname><![CDATA[,]]><methodname><![CDATA[mean]]></methodname><![CDATA[)]]>
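<![CDATA[
## `dat` is not defined in this section. As a self-contained sketch, the
## hypothetical two-column data frame `dat_demo` stands in for it here.
dat_demo <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))
res_l <- lapply(dat_demo, mean)   # a list with one element per column
res_s <- sapply(dat_demo, mean)   # simplified to a named numeric vector
str(res_l)
str(res_s)
]]>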
</programlisting></section><section><title>apply()</title><itemizedlist><listitem><para>The function apply() works on arrays (including matrices). Its first argument is the array, the second is the dimension or dimensions over which the function is applied, and the third is the function itself. For example: </para></listitem></itemizedlist><programlisting format="linespecific" language="highlight" linenumbering="numbered" startinglinenumber="1"><![CDATA[> ]]><methodname><![CDATA[x]]></methodname><![CDATA[<-1:12]]>
<![CDATA[> ]]><methodname><![CDATA[dim]]></methodname><![CDATA[(]]><methodname><![CDATA[x]]></methodname><![CDATA[)<-]]><methodname><![CDATA[c]]></methodname><![CDATA[(2,2,3)]]>
<![CDATA[> ]]><methodname><![CDATA[apply]]></methodname><![CDATA[(]]><methodname><![CDATA[x]]></methodname><![CDATA[,3,]]><methodname><![CDATA[quantile]]></methodname><![CDATA[) ]]><lineannotation><![CDATA[## calculate the quantiles]]></lineannotation>
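<![CDATA[
## A sketch on an array built like the one above: the second argument names
## the dimension(s) that are kept, the function is applied over the rest.
## `x_demo` is a hypothetical name so that x above is left untouched.
x_demo <- array(1:12, dim = c(2, 2, 3))
slice_sums <- apply(x_demo, 3, sum)         # one sum per 2 x 2 slice: 10 26 42
cell_sums  <- apply(x_demo, c(1, 2), sum)   # 2 x 2 matrix, summed over dimension 3
slice_sums
cell_sums
]]>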
</programlisting></section><section><title>tapply()</title><itemizedlist><listitem><para>The function tapply() allows you to create tables (hence the &quot;t&quot;) of the value of a function on subgroups defined by its second argument, which can be a factor or a list of factors. </para></listitem></itemizedlist><para>For example, in the quine data frame we can summarize Days classified by Eth and Lrn as follows (the bare column names assume that quine has been attached, e.g. with attach(quine)): </para><programlisting format="linespecific" language="highlight" linenumbering="numbered" startinglinenumber="1"><![CDATA[> ]]><methodname><![CDATA[tapply]]></methodname><![CDATA[(]]><methodname><![CDATA[Days]]></methodname><![CDATA[,]]><methodname><![CDATA[list]]></methodname><![CDATA[(]]><methodname><![CDATA[Eth]]></methodname><![CDATA[,]]><methodname><![CDATA[Lrn]]></methodname><![CDATA[),]]><methodname><![CDATA[mean]]></methodname><![CDATA[)]]>
<methodname><![CDATA[AL]]></methodname><![CDATA[       ]]><methodname><![CDATA[SL]]></methodname>
<methodname><![CDATA[A]]></methodname><![CDATA[ 18.57500 24.89655]]>
<methodname><![CDATA[N]]></methodname><![CDATA[ 13.25581 10.82353]]>
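<![CDATA[
## A self-contained sketch with made-up data (not the quine data frame);
## all names below are hypothetical.
days  <- c(2, 4, 6, 8, 10, 12)
group <- factor(c("A", "A", "N", "N", "A", "N"))
learn <- factor(c("AL", "SL", "AL", "SL", "AL", "SL"))
by_group <- tapply(days, group, mean)                # one mean per level of group
by_both  <- tapply(days, list(group, learn), mean)   # two-way table of means
by_group
by_both
]]>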
</programlisting><itemizedlist><listitem><para>The class() function shows the class of an object. Use it in combination with lapply() to get the classes of the columns of the quine data frame. </para></listitem><listitem><para>Do the same with sapply(). What is the difference? </para></listitem><listitem><para>Try to combine this with what you learned about indexing and create a new data frame quine2 containing only the columns which are factors. </para></listitem><listitem><para>Calculate the row and column means of the matrix m defined below using the apply() function. PS: in real-life applications use the rowMeans() and colMeans() functions instead. </para></listitem></itemizedlist><programlisting format="linespecific" language="highlight" linenumbering="numbered" startinglinenumber="1"><methodname><![CDATA[m]]></methodname><![CDATA[ <- ]]><methodname><![CDATA[matrix]]></methodname><![CDATA[(]]><methodname><![CDATA[rnorm]]></methodname><![CDATA[(100),]]><methodname><![CDATA[nrow]]></methodname><![CDATA[=10)]]>
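<![CDATA[
## One possible approach to the matrix exercise (a sketch, not an official
## solution). The seed and the name `m_demo` are assumptions, added only to
## make the example reproducible and to leave m above untouched.
set.seed(42)
m_demo <- matrix(rnorm(100), nrow = 10)
row_means <- apply(m_demo, 1, mean)   # margin 1: one mean per row
col_means <- apply(m_demo, 2, mean)   # margin 2: one mean per column
all.equal(row_means, rowMeans(m_demo))
all.equal(col_means, colMeans(m_demo))
]]>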
</programlisting><itemizedlist><listitem><para>Use tapply() to summarise the number of days absent from school per ethnicity and/or per sex (three lines). </para></listitem><listitem><para>Sometimes the aggregate() function is more convenient. Note the use of <code>~</code>: it is read as 'is dependent on' and is used extensively in modelling. </para></listitem></itemizedlist><programlisting format="linespecific" language="highlight" linenumbering="numbered" startinglinenumber="1"><![CDATA[> ]]><methodname><![CDATA[aggregate]]></methodname><![CDATA[(]]><methodname><![CDATA[Days]]></methodname><![CDATA[ ~ ]]><methodname><![CDATA[Sex]]></methodname><![CDATA[ + ]]><methodname><![CDATA[Eth]]></methodname><![CDATA[, ]]><methodname><![CDATA[data]]></methodname><![CDATA[=]]><methodname><![CDATA[quine]]></methodname><![CDATA[,]]><methodname><![CDATA[mean]]></methodname><![CDATA[)]]>
<methodname><![CDATA[Sex]]></methodname><![CDATA[ ]]><methodname><![CDATA[Eth]]></methodname><![CDATA[     ]]><methodname><![CDATA[Days]]></methodname>
<![CDATA[1   ]]><token><![CDATA[F]]></token><![CDATA[   ]]><methodname><![CDATA[A]]></methodname><![CDATA[ 20.92105]]>
<![CDATA[2   ]]><methodname><![CDATA[M]]></methodname><![CDATA[   ]]><methodname><![CDATA[A]]></methodname><![CDATA[ 21.61290]]>
<![CDATA[3   ]]><token><![CDATA[F]]></token><![CDATA[   ]]><methodname><![CDATA[N]]></methodname><![CDATA[ 10.07143]]>
<![CDATA[4   ]]><methodname><![CDATA[M]]></methodname><![CDATA[   ]]><methodname><![CDATA[N]]></methodname><![CDATA[ 14.71429]]>
<![CDATA[> ]]><methodname><![CDATA[aggregate]]></methodname><![CDATA[(]]><methodname><![CDATA[Days]]></methodname><![CDATA[ ~ ]]><methodname><![CDATA[Sex]]></methodname><![CDATA[ + ]]><methodname><![CDATA[Eth]]></methodname><![CDATA[, ]]><methodname><![CDATA[data]]></methodname><![CDATA[=]]><methodname><![CDATA[quine]]></methodname><![CDATA[,]]><methodname><![CDATA[summary]]></methodname><![CDATA[)]]>
<methodname><![CDATA[Sex]]></methodname><![CDATA[ ]]><methodname><![CDATA[Eth]]></methodname><![CDATA[ ]]><methodname><![CDATA[Days.Min.]]></methodname><![CDATA[ ]]><methodname><![CDATA[Days.1st]]></methodname><![CDATA[ ]]><methodname><![CDATA[Qu.]]></methodname><![CDATA[ ]]><methodname><![CDATA[Days.Median]]></methodname><![CDATA[ ]]><methodname><![CDATA[Days.Mean]]></methodname><![CDATA[ ]]><methodname><![CDATA[Days.3rd]]></methodname><![CDATA[ ]]><methodname><![CDATA[Qu.]]></methodname><![CDATA[ ]]><methodname><![CDATA[Days.Max.]]></methodname>
<![CDATA[1   ]]><token><![CDATA[F]]></token><![CDATA[   ]]><methodname><![CDATA[A]]></methodname><![CDATA[      0.00         5.25       13.50     20.92        30.25     81.00]]>
<![CDATA[2   ]]><methodname><![CDATA[M]]></methodname><![CDATA[   ]]><methodname><![CDATA[A]]></methodname><![CDATA[      2.00         9.50       16.00     21.61        33.00     57.00]]>
<![CDATA[3   ]]><token><![CDATA[F]]></token><![CDATA[   ]]><methodname><![CDATA[N]]></methodname><![CDATA[      0.00         5.00        7.00     10.07        14.00     37.00]]>
<![CDATA[4   ]]><methodname><![CDATA[M]]></methodname><![CDATA[   ]]><methodname><![CDATA[N]]></methodname><![CDATA[      0.00         3.50        8.00     14.71        19.50     69.00]]>
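<![CDATA[
## A self-contained sketch of the formula interface with made-up data;
## the data frame `absences` and its columns are hypothetical.
absences <- data.frame(days = c(2, 4, 6, 8),
                       sex  = c("F", "M", "F", "M"),
                       eth  = c("A", "A", "N", "N"))
by_sex <- aggregate(days ~ sex, data = absences, FUN = mean)         # days by sex
by_two <- aggregate(days ~ sex + eth, data = absences, FUN = mean)   # days by sex and eth
by_sex
by_two
]]>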
</programlisting></section><section><title>Function Exercises (Verzani)</title><itemizedlist><listitem><para>Write a function to compute the average distance from the mean for some data vector. </para></listitem><listitem><para>Write a function f() which finds the average of the x values after squaring and subtracts the square of the average of the numbers. Verify this output will always be non-negative by computing f(1:10). </para></listitem><listitem><para>An integer is even if the remainder upon dividing it by 2 is 0. This remainder is given by R with the syntax x %% 2. Use this to write a function iseven(). How would you write isodd()? </para></listitem><listitem><para>Write a function isprime() that checks if a number x is prime by dividing x by all values from 2,...,x-1 and then checking to see if there is a remainder of 0. </para></listitem></itemizedlist></section><section><title>Function Exercises (Verzani) Solutions</title><itemizedlist><listitem><para>Write a function to compute the average distance from the mean for some data vector. </para></listitem></itemizedlist><programlisting format="linespecific" language="highlight" linenumbering="numbered" startinglinenumber="1"><![CDATA[> ]]><methodname><![CDATA[avg.dist]]></methodname><![CDATA[ <- ]]><methodname><![CDATA[function]]></methodname><![CDATA[(]]><methodname><![CDATA[x]]></methodname><![CDATA[){]]>
<![CDATA[+     ]]><methodname><![CDATA[xbar]]></methodname><![CDATA[ <- ]]><methodname><![CDATA[mean]]></methodname><![CDATA[(]]><methodname><![CDATA[x]]></methodname><![CDATA[)]]>
<![CDATA[+     ]]><methodname><![CDATA[mean]]></methodname><![CDATA[(]]><methodname><![CDATA[abs]]></methodname><![CDATA[(]]><methodname><![CDATA[x]]></methodname><![CDATA[-]]><methodname><![CDATA[xbar]]></methodname><![CDATA[))]]>
<![CDATA[+ }]]>
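<![CDATA[
## A quick check as a sketch: `avg_dist2` is a hypothetical one-line
## equivalent of the solution above, so this snippet runs on its own.
avg_dist2 <- function(x) mean(abs(x - mean(x)))
avg_dist2(1:10)         # 2.5
avg_dist2(c(5, 5, 5))   # 0, since no value deviates from the mean
]]>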
</programlisting></section><section><title>Function Exercises (Verzani) Solutions</title><itemizedlist><listitem><para>Write a function f() which finds the average of the x values after squaring and subtracts the square of the average of the numbers. Verify this output will always be non-negative by computing f(1:10). </para></listitem></itemizedlist><programlisting format="linespecific" language="highlight" linenumbering="numbered" startinglinenumber="1"><![CDATA[> ]]><methodname><![CDATA[f]]></methodname><![CDATA[ <- ]]><methodname><![CDATA[function]]></methodname><![CDATA[(]]><methodname><![CDATA[x]]></methodname><![CDATA[){]]>
<![CDATA[+     ]]><methodname><![CDATA[mean]]></methodname><![CDATA[(]]><methodname><![CDATA[x]]></methodname><![CDATA[**2) - ]]><methodname><![CDATA[mean]]></methodname><![CDATA[(]]><methodname><![CDATA[x]]></methodname><![CDATA[)**2]]>
<![CDATA[+ }]]>
<![CDATA[> ]]><methodname><![CDATA[f]]></methodname><![CDATA[(1:10)]]>
<methodname><![CDATA[[1]]></methodname><methodname><![CDATA[]]]></methodname><![CDATA[ 8.25]]>
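<![CDATA[
## Why the result cannot be negative (a sketch): the same quantity can be
## written as a mean of squared deviations, i.e. the population variance.
## `f2` and `x_demo` are hypothetical names.
f2 <- function(x) mean((x - mean(x))^2)
x_demo <- 1:10
f2(x_demo)   # 8.25, the value shown above
n <- length(x_demo)
all.equal(f2(x_demo), var(x_demo) * (n - 1) / n)   # sample variance rescaled
]]>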
</programlisting></section><section><title>Function Exercises (Verzani) Solutions</title><itemizedlist><listitem><para>An integer is even if the remainder upon dividing it by 2 is 0. This remainder is given by R with the syntax x %% 2. Use this to write a function iseven(). How would you write isodd()? </para></listitem></itemizedlist><programlisting format="linespecific" language="highlight" linenumbering="numbered" startinglinenumber="1"><![CDATA[> ]]><methodname><![CDATA[iseven]]></methodname><![CDATA[ <- ]]><methodname><![CDATA[function]]></methodname><![CDATA[(]]><methodname><![CDATA[x]]></methodname><![CDATA[){]]>
<![CDATA[+     ]]><methodname><![CDATA[x]]></methodname><![CDATA[ %% 2 == 0]]>
<![CDATA[+ }]]>
<![CDATA[> ]]><methodname><![CDATA[iseven]]></methodname><![CDATA[(1:10)]]>
<methodname><![CDATA[[1]]></methodname><methodname><![CDATA[]]]></methodname><![CDATA[ ]]><symbol><![CDATA[FALSE]]></symbol><![CDATA[  ]]><symbol><![CDATA[TRUE]]></symbol><![CDATA[ ]]><symbol><![CDATA[FALSE]]></symbol><![CDATA[  ]]><symbol><![CDATA[TRUE]]></symbol><![CDATA[ ]]><symbol><![CDATA[FALSE]]></symbol><![CDATA[  ]]><symbol><![CDATA[TRUE]]></symbol><![CDATA[ ]]><symbol><![CDATA[FALSE]]></symbol><![CDATA[  ]]><symbol><![CDATA[TRUE]]></symbol><![CDATA[ ]]><symbol><![CDATA[FALSE]]></symbol><![CDATA[  ]]><symbol><![CDATA[TRUE]]></symbol>
<![CDATA[> ]]><methodname><![CDATA[isodd]]></methodname><![CDATA[ <- ]]><methodname><![CDATA[function]]></methodname><![CDATA[(]]><methodname><![CDATA[x]]></methodname><![CDATA[){]]>
<![CDATA[+     !]]><methodname><![CDATA[iseven]]></methodname><![CDATA[(]]><methodname><![CDATA[x]]></methodname><![CDATA[)]]>
<![CDATA[+ }]]>
<![CDATA[> ]]><methodname><![CDATA[isodd]]></methodname><![CDATA[(1:10)]]>
<methodname><![CDATA[[1]]></methodname><methodname><![CDATA[]]]></methodname><![CDATA[  ]]><symbol><![CDATA[TRUE]]></symbol><![CDATA[ ]]><symbol><![CDATA[FALSE]]></symbol><![CDATA[  ]]><symbol><![CDATA[TRUE]]></symbol><![CDATA[ ]]><symbol><![CDATA[FALSE]]></symbol><![CDATA[  ]]><symbol><![CDATA[TRUE]]></symbol><![CDATA[ ]]><symbol><![CDATA[FALSE]]></symbol><![CDATA[  ]]><symbol><![CDATA[TRUE]]></symbol><![CDATA[ ]]><symbol><![CDATA[FALSE]]></symbol><![CDATA[  ]]><symbol><![CDATA[TRUE]]></symbol><![CDATA[ ]]><symbol><![CDATA[FALSE]]></symbol>
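<![CDATA[
## An alternative sketch: test the remainder directly. In R the result of
## %% takes the sign of the divisor, so this also works for negative
## integers. `isodd2` is a hypothetical name.
isodd2 <- function(x) x %% 2 == 1
isodd2(c(-3, 0, 7))   # TRUE FALSE TRUE
]]>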
</programlisting></section><section><title>Function Exercises (Verzani) Solutions</title><itemizedlist><listitem><para>Write a function isprime() that checks if a number x is prime by dividing x by all values from 2,...,x-1 and then checking to see if there is a remainder of 0. </para></listitem></itemizedlist><programlisting format="linespecific" language="highlight" linenumbering="numbered" startinglinenumber="1"><![CDATA[> ]]><methodname><![CDATA[isprime]]></methodname><![CDATA[ <- ]]><methodname><![CDATA[function]]></methodname><![CDATA[(]]><methodname><![CDATA[x]]></methodname><![CDATA[){]]>
<![CDATA[+     ]]><methodname><![CDATA[if]]></methodname><![CDATA[(]]><methodname><![CDATA[x]]></methodname><![CDATA[ == 2) ]]><methodname><![CDATA[return]]></methodname><![CDATA[(]]><symbol><![CDATA[TRUE]]></symbol><![CDATA[)]]>
<![CDATA[+     !(0 %in% (]]><methodname><![CDATA[x]]></methodname><![CDATA[ %% (2:(]]><methodname><![CDATA[x]]></methodname><![CDATA[-1))))]]>
<![CDATA[+ }]]>
<![CDATA[> ]]><methodname><![CDATA[isprime]]></methodname><![CDATA[(2)]]>
<methodname><![CDATA[[1]]></methodname><methodname><![CDATA[]]]></methodname><![CDATA[ ]]><symbol><![CDATA[TRUE]]></symbol>
<![CDATA[> ]]><methodname><![CDATA[isprime]]></methodname><![CDATA[(5)]]>
<methodname><![CDATA[[1]]></methodname><methodname><![CDATA[]]]></methodname><![CDATA[ ]]><symbol><![CDATA[TRUE]]></symbol>
<![CDATA[> ]]><methodname><![CDATA[isprime]]></methodname><![CDATA[(15)]]>
<methodname><![CDATA[[1]]></methodname><methodname><![CDATA[]]]></methodname><![CDATA[ ]]><symbol><![CDATA[FALSE]]></symbol>
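<![CDATA[
## A sketch of a variant (hypothetical name `isprime2`), not the exercise's
## required method: trial division only up to sqrt(x), with explicit handling
## of x < 2. Like the solution above it expects a single number; use sapply()
## to check several values.
isprime2 <- function(x) {
  if (x < 2) return(FALSE)   # 0, 1 and negative numbers are not prime
  if (x < 4) return(TRUE)    # 2 and 3
  all(x %% (2:floor(sqrt(x))) != 0)
}
sapply(c(1, 2, 5, 15, 97), isprime2)   # FALSE TRUE TRUE FALSE TRUE
]]>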
</programlisting></section></section></article>