Revision 1 of 2015-03-15 09:22:15
= Reading Data from Files =
The most convenient way of reading data into R is the function read.table(), which reads plain-text files such as those created with Windows Notepad or any other plain-text editor. The result of read.table() is a data frame. Each line of the data file is expected to hold the information for one subject, with the variables separated by blanks or by another separator symbol (e.g., "," or ";"). The first line of the file can contain a header (header = TRUE) giving the names of the variables, which is highly recommended.

== read.table() ==
As an example, we read in the data contained in the file fishercats.txt:
{{{#!highlight r
> read.table("session1data/fishercats.txt",
+            sep = " ", header = TRUE)
  Sex Bwt Hwt
1   F 2.0 7.0
2   F 2.0 7.4
3   F 2.0 9.5
4   F 2.1 7.2
5   F 2.1 7.3
}}}
These data are the heart and body weights of samples of male and female cats (R. A. Fisher, 1947). The first argument is the data file, the second (sep) is the field separator, and the third (header = TRUE) specifies that the first line is a header with the variable names. Important: in R versions before 4.0.0, character variables are automatically read as factors; since R 4.0.0 they are kept as character vectors unless stringsAsFactors = TRUE is set.

read.table() can also read data directly from a URL:
{{{#!highlight r
> winer <- read.table(
+   "http://socserv.socsci.mcmaster.ca/jfox/Courses/R/ICPSR/Winer.txt",
+   header = TRUE)
}}}
There are other variants of read.table():
 * read.csv() assumes that fields are separated by commas instead of white space.
 * read.csv2() assumes that fields are separated by semicolons and that a comma is used as the decimal point (some programs, e.g., Microsoft Excel, generate this format when running on European systems).
 * scan() is a powerful but less friendly way to read data into R; you may need it to read files with different numbers of values per line.

== Reading Data from the Clipboard ==
With the function read.delim(), or also with read.table(), you can read data directly from the clipboard.
For example, select and copy some columns of an Excel spreadsheet and read them into R (the file name "clipboard" works on Windows; on macOS, use pipe("pbpaste") instead):
{{{#!highlight r
> mydata <- read.delim("clipboard", na.strings = ".")  # cells containing "." become NA
> str(mydata)                                          # structure of the data
}}}

== The Data Editor ==
To edit a data frame interactively in R, you can use the function edit(). For example:
{{{#!highlight r
> data(airquality)
> aq <- edit(airquality)
}}}
This brings up a spreadsheet-like editor with a column for each variable in the data frame. See help(airquality) for the contents of this data set. The function edit() leaves the original data frame unchanged; the changed data frame is assigned to aq. The function fix(x) invokes edit(x) on x '''and assigns''' the new (edited) version back to x.

== Reading Data from Other Programs ==
You can always use the export function of other (statistical) software to write the data to a tab- or comma-delimited file and then read that file with read.table(). However, R also has some direct methods. The foreign package is one of the "recommended" packages in R. It contains routines to read files from SPSS (.sav format), SAS (export libraries), EpiInfo (.rec), Stata, Minitab, and some S-PLUS version 3 dump files. For example,
{{{#!highlight r
> library(foreign)
> mydata <- read.spss("test.sav", to.data.frame = TRUE)
}}}
reads the SPSS data set test.sav and converts it to a data frame.

== Reading Data from Excel Files ==
{{{#!highlight r
> library(XLConnect)
> setwd("/media/TRANSCEND/mpicbs/data/")
> my.wb <- loadWorkbook("Duncan.xls")
> sheets <- getSheets(my.wb)
> content <- readWorksheet(my.wb, sheet = 1)
> head(content)
        Col0 type income education prestige
1 accountant prof     62        86       82
2      pilot prof     72        76       83
3  architect prof     75        92       90
4     author prof     55        90       76
5    chemist prof     64        86       90
6   minister prof     21        84       87
}}}
If someone is really fond of Excel, RExcel (http://rcom.univie.ac.at/download.html) is well worth the effort.
The Hmisc package also provides a function for reading Microsoft Access files, mdb.get().

== Something on Connections ==
The function read.table() opens a connection to a file, reads the file, and closes the connection. For data stored in databases, however, a number of interface packages exist on CRAN. The RODBC package can set up ODBC connections to data stored by common applications, including Excel and Access (its Excel and Access support does not work on Unix, but there it still works well for connections to database systems). There are also more general ways to connect to databases. For up-to-date information on these matters, consult the "R Data Import/Export" manual that comes with R.

= Read Presentation Files =
Log files from the stimulus software Presentation can also be read with read.table(); here the first three lines of the file, which precede the column header, are skipped:
{{{#!highlight r
> x <- read.table(file = "session1data/pre001.txt",
+                 sep = "\t",      # fields are separated by tabs
+                 header = TRUE,   # the first line read holds the column names
+                 skip = 3)        # skip the first three lines of the file
> head(x)
  Subject Trial Event.Type     Code   Time  TTime Uncertainty Duration
1            NA                         NA     NA          NA       NA
2  PRE001     1   Response        3 104975 114605           1       NA
3  PRE001     2   Response        3 117581  12411           1       NA
4  PRE001     4    Picture    B1 T1 125765      0           1     5008
5  PRE001     5    Picture RO09.jpg 130773      0         391    38181
6  PRE001     6      Sound RO09.wav 131273      0           2       NA
1 NA NA NA
2 NA NA NA
3 NA NA NA
4 392 0 next other 0
5 392 0 next other 0
6 NA 0 other 0
}}}

== Indexing ==
 * there are circumstances where we want to select only some of the elements of a vector, array, data frame or list
 * this selection is done using subscripts (also known as indices)
 * subscripts use square brackets, `[2]`, while function calls use round brackets, `(2)`
 * subscripts on vectors, matrices, arrays and data frames have one set of square brackets: `[6]`, `[3,4]` or `[2,3,2,1]`
 * when a subscript is left blank, it means ''all of'' that dimension; thus
  * `[,4]` means all rows in column 4 of an object
  * `[2,]` means all columns in row 2 of an object
 * subscripts on lists (usually) have double square brackets: `[[2]]` or `[[i,j]]`

== Indexing with Positive Integers ==
''A vector of positive integers as index'': the index vector can be of any length, and the result has the same length as the index vector. For example,
{{{#!highlight r
> letters[1:3]
[1] "a" "b" "c"
> letters[c(1:3, 1:3)]
[1] "a" "b" "c" "a" "b" "c"
}}}

== Logical Indexing ==
''A logical vector as index'': values corresponding to TRUE in the index vector are selected and those corresponding to FALSE are omitted; an NA in the index vector yields NA in the result. For example,
{{{#!highlight r
> x <- c(1, 2, 3, NA)
> x[!is.na(x)]
[1] 1 2 3
}}}
creates a vector without missing values. Also
{{{#!highlight r
> x[is.na(x)] <- 0
> x
[1] 1 2 3 0
}}}
replaces the missing values by zeros.

A common operation is to select the rows or columns of a data frame that meet some criteria. For example, to select those rows of the painters data frame (from the MASS package) with Colour ≥ 17:
{{{#!highlight r
> library(MASS)
> painters[painters$Colour >= 17, ]
          Composition Drawing Colour Expression School
Bassano             6       8     17          0      D
Giorgione           8       9     18          4      D
Pordenone           8      14     17          5      D
}}}
We may want to select on more than one criterion. Logical indices can be combined with the 'and', 'or' and 'not' operators `&`, `|` and `!`. For example,
{{{#!highlight r
> painters[painters$Colour >= 17 &
          Composition Drawing Colour
Titian             12      15     18
Rembrandt          15       6     17
Rubens             18      13     17
Van Dyck           15      10     17
}}}
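The three operators can be tried out with the following minimal sketch. It assumes a hypothetical data frame p holding only the seven painters and the three columns printed in the painters outputs (so the MASS package is not needed); the conditions are illustrative choices, not those of the original example. The last lines show that an NA in a logical index produces NA in the result, while which() keeps only the TRUE positions.
{{{#!highlight r
# Hypothetical data frame p: the seven painters printed in the painters
# outputs, with only the columns shown there (not the full MASS::painters data).
p <- data.frame(
  Composition = c(6, 8, 8, 12, 15, 18, 15),
  Drawing     = c(8, 9, 14, 15, 6, 13, 10),
  Colour      = c(17, 18, 17, 18, 17, 17, 17),
  row.names   = c("Bassano", "Giorgione", "Pordenone", "Titian",
                  "Rembrandt", "Rubens", "Van Dyck")
)

# 'and' (&): both conditions must be TRUE
both <- p[p$Colour >= 18 & p$Composition > 10, ]
print(both)       # Titian only

# 'or' (|): at least one condition must be TRUE
either <- p[p$Colour >= 18 | p$Drawing >= 14, ]
print(either)     # Giorgione, Pordenone, Titian

# 'not' (!): negate a condition
low_comp <- p[!(p$Composition > 10), ]
print(low_comp)   # Bassano, Giorgione, Pordenone

# An NA in a logical index gives NA in the result;
# which() returns only the positions that are TRUE
x <- c(5, NA, 20)
print(x[x > 10])         # NA 20
print(x[which(x > 10)])  # 20
}}}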