= The ggplot2 Package =
 * ggplot2 is, like lattice, based on the grid graphics system (by Paul Murrell)
 * graphics and parts of graphics are objects that can be manipulated

== Structure of a ggplot Object ==
Begin with an empty object to see the structure:
{{{#!highlight r
> library(ggplot2)
> po <- ggplot()
> summary(po)
data: [x]
faceting: facet_null()
}}}
 * what we see are empty placeholders
 * when we use str() to explore the structure of the object, we see that it is a list of length 9
{{{#!highlight r
> str(po)
List of 9
 $ data       : list()
  ..- attr(*, "class")= chr "waiver"
 $ layers     : list()
 $ scales     :Reference class 'Scales' [package "ggplot2"] with 1 fields
  ..$ scales: NULL
  ..and 21 methods, of which 9 are possibly relevant:
  ..  add, clone, find, get_scales, has_scale, initialize, input, n,
  ..  non_position_scales
 $ mapping    : list()
 $ theme      : list()
 $ coordinates:List of 1
  ..$ limits:List of 2
  .. ..$ x: NULL
  .. ..$ y: NULL
  ..- attr(*, "class")= chr [1:2] "cartesian" "coord"
 $ facet      :List of 1
  ..$ shrink: logi TRUE
  ..- attr(*, "class")= chr [1:2] "null" "facet"
 $ plot_env   :
 $ labels     : list()
 - attr(*, "class")= chr [1:2] "gg" "ggplot"
}}}

== Structure of a ggplot Object ==
Now we fill this structure. First, the three main steps:
 * the first argument to ggplot() is the data
 * then specify which graphical shapes (geoms) you are going to use to display the data (e.g. geom_line() or geom_point())
 * specify which features (or aesthetics) will be used (e.g.
   what variables will determine the x- and y-locations) with the aes() function
 * if these aesthetics are intended to be used in all layers, it is more convenient to specify them in the ggplot object

== Feed the Object ==
 * first we create a small sample data frame
{{{#!highlight r
> x1 <- 1:10; y1 <- 1:10; z1 <- 10:1
> l1 <- LETTERS[1:10]
> a <- 10; b <- (0:-9)/10:1
> ## name the columns x1 and y1 so that aes(x=x1,y=y1) refers to the data frame
> ex <- data.frame(x1=x1,y1=y1,z=z1,l=l1,a=a,b=b)
> ex
   x1 y1  z l  a          b
1   1  1 10 A 10  0.0000000
2   2  2  9 B 10 -0.1111111
3   3  3  8 C 10 -0.2500000
4   4  4  7 D 10 -0.4285714
5   5  5  6 E 10 -0.6666667
6   6  6  5 F 10 -1.0000000
7   7  7  4 G 10 -1.5000000
8   8  8  3 H 10 -2.3333333
9   9  9  2 I 10 -4.0000000
10 10 10  1 J 10 -9.0000000
}}}
 * then we create a ggplot object containing the data and some default aesthetics (here we define the x and y positions)
 * then we add one or more geoms; we begin with geom_point()
{{{#!highlight r
> po <- ggplot(ex,aes(x=x1,y=y1))
> summary(po)
> p1 <- po + geom_point()
}}}
[[attachment:ggp1.pdf]]

== Layers ==
 * ggplot() creates an object; every "+" adds something to this object (i.e. changes it)
 * printing a ggplot object draws the plot; at the console this happens automatically when the result is not assigned
 * it is better to store the object, so you can change it later (e.g.
   you can change the data frame)

== Layers ==
 * so we add another layer, which adds a label to the points (using geom_text())
{{{#!highlight r
> p2 <- po +
+     geom_point() +
+     geom_text(aes(label=l), hjust=1.1, vjust=-0.2)
> p2
}}}
 * aes(label=l) maps the l variable to the label aesthetic; hjust and vjust define where the labels are placed
[[attachment:ggp2.pdf]]

== Layers ==
 * imagine you have worked on a plot for some time and then detect a mistake in your data, so the ''real'' data frame looks different
 * you can then replace the old, wrong data with the new data (using %+%)
{{{#!highlight r
> ## the new data
> ex2 <- data.frame(x1=sample(1:20),
+                   y1=sample(1:10),
+                   l=letters[1:20])
> head(ex2,10)
   x1 y1 l
1   3  6 a
2   6  2 b
3  14  1 c
4  19 10 d
5  12  4 e
6  15  8 f
7  20  5 g
8  17  7 h
9  13  3 i
10 16  9 j
}}}
[[attachment:ggp3.pdf]]

== Layers ==
{{{#!highlight r
> p2 %+% ex2
}}}

== Layers ==
 * by using the line geom you can join the points in the order of their x values (we use the new data)
{{{#!highlight r
> pn <- p1 %+% ex2 ## replace data in p1
> pn + geom_line()
}}}
[[attachment:ggp4.pdf]]

== Layers ==
 * you can also join the points in the order of the data frame by using the path geom instead
{{{#!highlight r
> my.text <- geom_text(aes(label=l),
+                      hjust=1.1,
+                      vjust=-0.2)
> pn + geom_path() + my.text
}}}
[[attachment:ggp5.pdf]]

== Layers ==
Adding extra lines:
 * there are three geoms: abline, vline, hline
 * abline adds one or more lines with specified slope and intercept to the plot
{{{#!highlight r
> ## one line
> p1 + geom_abline(intercept=10,slope=-1,
+                  colour=rgb(.5,.5,.9))
> ## two lines
> p1 + geom_abline(intercept=c(10,9),slope=c(-1,-2),
+                  colour=rgb(.5,.5,.9))
> ## more lines
> p1 + geom_abline(intercept=10:1,slope=-(10:1)/10)
}}}
[[attachment:ggp6.png|{{attachment:ggp6.png||width=800}}]]

== Layers ==
 * adding lines that refer to the data frame
{{{#!highlight r
> p1 +
+     geom_abline(aes(slope=b,intercept=a,colour=x1)) +
+     scale_x_continuous(limits=c(0,10))
}}}
[[attachment:ggp7.pdf]]

== Layers ==
 * the same works for the hline and vline geoms, which add horizontal and vertical line(s)
 * arguments: yintercept and xintercept, respectively
 * both setting and mapping are possible
{{{#!highlight r
> p1 + geom_hline(yintercept=1:10)
> p1 + geom_hline(yintercept=1:10) +
+     geom_vline(xintercept=1:10)
}}}
[[attachment:ggp8.pdf]]

== Other Common Layers ==
 * some other layers for 1 continuous variable:
  * geom_boxplot()
  * geom_histogram()
  * geom_density()
 * some other layers for 1 discrete variable:
  * geom_bar()
 * some other layers for 2 or more continuous variables:
  * geom_smooth()
  * geom_density2d()
  * geom_contour()
  * geom_quantile()

== Exercises ==
 * use our data frame or load it: load("20150310data.rdata")
 * create a new variable EC1 containing the first 2 letters of the Event.Code column; use the function str_sub() from the stringr package (type ?str_sub to get help)

=== Solutions ===
{{{#!highlight r
> library(stringr)
> data$EC1 <- factor(str_sub(data$Event.Code,1,2))
> head(data)
  Subject Sex Age_PRETEST Trial Event.Type Code   Time TTime Uncertainty
1       1   f        3.11     7   Response    2 103745  2575           1
2       1   f        3.11    12   Response    2 156493  2737           1
3       1   f        3.11    17   Response    2 214772  6630           1
4       1   f        3.11    22   Response    1 262086  5957           1
5       1   f        3.11    27   Response    2 302589   272           1
6       1   f        3.11    32   Response    1 352703  7197           1
  Duration Uncertainty.1 ReqTime ReqDur Stim.Type Pair.Index    Type Event.Code
1     2599             3       0   next       hit          7 Picture   RO26.jpg
2     2800             2       0   next incorrect         12 Picture   RO19.jpg
3     6798             2       0   next       hit         17 Picture   RS23.jpg
4     5999             2       0   next incorrect         22 Picture   OF22.jpg
5      400             2       0   next       hit         27 Picture   AT08.jpg
6     7398             2       0   next       hit         32 Picture   AT30.jpg
  testid EC1
1  test2  RO
2  test2  RO
3  test2  RS
4  test2  OF
5  test2  AT
6  test2  AT
}}}

== Exercises II ==
Create the five plots and save each of them to a file.
 * create a plot using ggplot, map the variable EC1 to x and use geom_bar()
 * now do the plot again, but add another aesthetic: fill (colour of the filling); map fill to Stim.Type
 * add the position argument to geom_bar() and set it to "fill"
 * now add facet_wrap(~testid) to show the same graph per test time point
 * make a graph faceted per child showing stacked hit/incorrect bars with time on the x axis

=== Solutions ===
 * create a plot using ggplot, map the variable EC1 to x and use geom_bar()
{{{#!highlight r
> ggplot(data,aes(x=EC1)) +
+     geom_bar()
>
> ggsave("plot1.png")
Saving 16 x 9.13 in image
}}}
[[attachment:plot1.png|{{attachment:plot1.png||width=800}}]]
 * now do the plot again, but add another aesthetic: fill (colour of the filling); map fill to Stim.Type
{{{#!highlight r
> ggplot(data,aes(x=EC1,fill=Stim.Type)) +
+     geom_bar()
>
> ggsave("plot2.png")
Saving 16 x 9.13 in image
}}}
[[attachment:plot2.png|{{attachment:plot2.png||width=800}}]]
 * add the position argument to geom_bar() and set it to "fill"
{{{#!highlight r
> ggplot(data,aes(x=EC1,fill=Stim.Type)) +
+     geom_bar(position = "fill")
>
> ggsave("plot3.png")
Saving 16 x 9.13 in image
}}}
[[attachment:plot3.png|{{attachment:plot3.png||width=800}}]]
 * now add facet_wrap(~testid) to show the same graph per test time point
{{{#!highlight r
> ggplot(data,aes(x=EC1,fill=Stim.Type)) +
+     geom_bar(position = "fill") +
+     facet_wrap(~testid)
>
> ggsave("plot4.png")
Saving 16 x 9.13 in image
}}}
[[attachment:plot4.png|{{attachment:plot4.png||width=800}}]]
 * the same graph, but with free scales in each facet (scales = "free")
{{{#!highlight r
> ggplot(data,aes(x=EC1,fill=Stim.Type)) +
+     geom_bar(position = "fill") +
+     facet_wrap(~testid,scales = "free")
>
> ggsave("plot4a.png")
Saving 16 x 9.13 in image
}}}
[[attachment:plot4a.png|{{attachment:plot4a.png||width=800}}]]
 * make a graph faceted per child showing stacked hit/incorrect bars with time on the x axis
{{{#!highlight r
> ggplot(data,aes(x=testid,fill=Stim.Type)) +
+     geom_bar(position = "fill") +
+     facet_wrap(~ Subject)
> ggsave("plot5.png")
Saving 16 x 9.13 in image
}}}
[[attachment:plot5.png|{{attachment:plot5.png||width=800}}]]

== Scales ==
What if we want to change the colours?
 * this leads to another important type of component not yet mentioned
 * when you map a variable to an aesthetic, this is done in a default way; in this case a reddish colour is mapped to hit and a light blue to incorrect; in addition, a discrete range of colours is used automatically
 * these mapping rules are called scales
 * different types of scales exist for axes, colours, shapes etc.; some of them exist in discrete and continuous versions, some in only one of them (as a rule of thumb, wherever there can be a legend, there is a scale)
 * the naming convention is scale_aesthetic_specification, for example scale_x_discrete() for customizing a discrete x axis (e.g. in bar plots)

== Changing a Scale ==
 * to change our discrete colour scale for the filling we type
{{{#!highlight r
> ggplot(data,aes(x=EC1,fill=Stim.Type)) +
+     geom_bar(position = "fill") +
+     facet_wrap(~testid,scales = "free") +
+     scale_fill_manual(values=c("forestgreen","firebrick"))
}}}
[[attachment:ggp10.png|{{attachment:ggp10.png||width=800}}]]

There are other ways to customize discrete colour/fill scales (each also exists as a scale_fill_*() variant, which is what the bar plots here need):
 * scale_colour_grey()
 * scale_colour_hue()
 * scale_colour_brewer()
{{{#!highlight r
> ggplot(data,aes(x=EC1,fill=Stim.Type)) +
+     geom_bar(position = "fill") +
+     facet_wrap(~testid,scales = "free") +
+     scale_fill_grey()
> ggplot(data,aes(x=EC1,fill=Stim.Type)) +
+     geom_bar(position = "fill") +
+     facet_wrap(~testid,scales = "free") +
+     scale_fill_hue(h=c(180,360))
> ggplot(data,aes(x=EC1,fill=Stim.Type)) +
+     geom_bar(position = "fill") +
+     facet_wrap(~testid,scales = "free") +
+     scale_fill_brewer(type = "div",palette = 2)
}}}

== Continuous Scales ==
Colour scales are not the only scales that can be modified:
 * here scale_aesthetic_specification becomes scale_x_continuous() or scale_y_continuous()
 * to transform an axis we can use the trans argument
 * we now create a scatter plot with Trial on the x-axis and TTime on the y-axis
 * the respective geom is geom_point()
 * if we look at the distribution of the y values we see that they are right-skewed
{{{#!highlight r
> ggplot(data,aes(x=Trial,y=TTime)) +
+     geom_point()
> ggsave("ggp11.png")
}}}
[[attachment:ggp11.png|{{attachment:ggp11.png||width=800}}]]
 * to deal with the skewness we can transform the y-axis, for instance with a square root function
{{{#!highlight r
> ggplot(data,aes(x=Trial,y=TTime)) +
+     geom_point() +
+     scale_y_continuous(trans="sqrt")
> ggsave("ggp12.png")
}}}
[[attachment:ggp12.png|{{attachment:ggp12.png||width=800}}]]
 * still a little skewed...
 * ... so maybe we should try the cube root:
{{{#!latex
x^{1/3}
}}}
 * we first have to write a new transformation
{{{#!highlight r
> library(scales)  ## provides trans_new()
> ## trans="xt1_3" looks up a function called xt1_3_trans()
> xt1_3_trans <- function()
+     trans_new("xt1_3", function(x) x^(1/3), function(x) x^3)
> ggplot(data,aes(x=Trial,y=TTime)) +
+     geom_point() +
+     scale_y_continuous(trans="xt1_3")
> ggsave("ggp13.png")
}}}
[[attachment:ggp13.png|{{attachment:ggp13.png||width=800}}]]

For standard transformations there is a shortcut:
{{{#!highlight r
> ggplot(data,aes(x=Trial,y=TTime)) +
+     geom_point() +
+     scale_y_sqrt()
}}}
 * scale_x_log10()
 * scale_x_reverse()
 * scale_x_sqrt()

Other transformations available:
{{{#!highlight r
}}}

== Other Scales ==
 * colour scales also have a continuous version (we have seen it in the first bar plot)
  * scale_colour_gradient()
  * scale_colour_gradient2()
  * scale_colour_gradientn()
 * scale_linetype_continuous() and scale_linetype_discrete()
 * scale_shape_continuous() and scale_shape_discrete()
 * scale_size_continuous() and scale_size_discrete()
 * scale_x_date()
 * scale_discrete()

== Exercises ==
 * Create a scatter plot with Trial on the x-axis and TTime on the y-axis. Map colour to the age column (Age_PRETEST). Looking at the pattern in the graph, is there a relation between age and reaction time?
 * Make a plot which has a facet for each child containing the histogram of TTime; map fill to ..count.. (fill=..count.. inside aes())
 * then add scale_fill_gradient() and set its arguments low and high to, say, green and red respectively (or make your own choice)
 * do the same, but now facet by testid (and, in a second plot, by Stim.Type level). What can you conclude from these graphs?

=== Solutions ===
 * Create a scatter plot with Trial on the x-axis and TTime on the y-axis. Map colour to the age column (Age_PRETEST). Looking at the pattern in the graph, is there a relation between age and reaction time?
{{{#!highlight r
> ggplot(data,aes(x=Trial,y=TTime,colour=Age_PRETEST)) +
+     geom_point() +
+     scale_y_continuous(trans="xt1_3")
> ggsave("ggp14.png")
}}}
[[attachment:ggp14.png|{{attachment:ggp14.png||width=800}}]]
 * Make a plot which has a facet for each child containing the histogram of TTime; map fill to ..count.. (fill=..count.. inside aes())
{{{#!highlight r
> ggplot(data,aes(x=TTime,fill=..count..)) +
+     geom_histogram(aes(y=..density..)) +
+     facet_wrap(~Subject)
}}}
[[attachment:ggp15.png|{{attachment:ggp15.png||width=800}}]]
 * then add scale_fill_gradient() and set its arguments low and high to, say, green and red respectively (or make your own choice)
{{{#!highlight r
> ggplot(data,aes(x=TTime,fill=..count..)) +
+     geom_histogram() +
+     facet_wrap(~Subject) +
+     scale_fill_gradient(low="forestgreen",
+                         high="firebrick3")
}}}
[[attachment:ggp15.png|{{attachment:ggp15.png||width=800}}]]
 * do the same, but now facet by testid. What can you conclude from this graph?
{{{#!highlight r
> ggplot(data,aes(x=TTime,fill=..count..)) +
+     geom_histogram(aes(y=..density..)) +
+     facet_wrap(~testid) +
+     scale_fill_gradient(low="forestgreen",
+                         high="firebrick3")
}}}
[[attachment:ggp16.png|{{attachment:ggp16.png||width=800}}]]
 * and the second version, faceted by Stim.Type level. What can you conclude from this graph?
{{{#!highlight r
> ggplot(data,aes(x=TTime,fill=..count..)) +
+     geom_histogram(aes(y=..density..)) +
+     facet_wrap(~Stim.Type) +
+     scale_fill_gradient(low="forestgreen",
+                         high="firebrick3")
}}}
[[attachment:ggp16.png|{{attachment:ggp16.png||width=800}}]]

== Hadleyverse ==
 * stringr - easy string manipulation
 * lubridate - easy date-time manipulation
 * reshape2, tidyr - data manipulation, melting
 * devtools, testthat - package development
 * etc.
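
As a small illustration of the first two packages in this list, the sketch below extracts a two-letter prefix with str_sub() (the same call used for EC1 in the exercises) and parses a date with lubridate. The vector `codes` and the date string are made up for illustration, and both packages are assumed to be installed.

{{{#!highlight r
library(stringr)
library(lubridate)

## hypothetical file names, modelled on the Event.Code column
codes <- c("RO26.jpg", "AT08.jpg")
prefix <- str_sub(codes, 1, 2)   # characters 1 to 2 of each string
prefix

## hypothetical date string in year-month-day order
d <- ymd("2015-03-10")           # parsed into a Date object
year(d)                          # numeric year component
month(d)                         # numeric month component
}}}

Both packages work on whole vectors at once, so the same calls can be applied directly to a data frame column.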