Differences between revisions 1 and 2

Reading Data from Files

The most convenient way of reading data into R is via the function read.table(). It reads plain-text files such as those created with Windows' NotePad or any other plain-text editor. The result of read.table() is a data frame. Each line of the data file is expected to hold the information for one subject, with the variables separated by blanks or another separator symbol (e.g., "," or ";"). The first line of the file can contain a header (header=T) giving the names of the variables, which is highly recommended.

read.table()

As an example we read in the data contained in the file fishercats.txt:

    > read.table("session1data/fishercats.txt",
    +            sep=" ", header=T)
      Sex Bwt Hwt
    1   F 2.0 7.0
    2   F 2.0 7.4
    3   F 2.0 9.5
    4   F 2.1 7.2
    5   F 2.1 7.3

These data are the heart and body weights of samples of male and female cats (R. A. Fisher, 1947). The first argument is the data file, the second the field separator, and the third, header=T, specifies that the first line is a header with variable names. Important: in R versions before 4.0.0 character variables are automatically read as factors; since R 4.0.0 they stay character vectors unless stringsAsFactors=TRUE is given. There is a variant for reading data from a URL:

    > winer <- read.table(
    +   "http://socserv.socsci.mcmaster.ca/jfox/Courses/R/ICPSR/Winer.txt",
    +   header=T)
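
The arguments described above can be tried without the course data. The following is a minimal sketch that assumes a small made-up file in place of fishercats.txt; the temporary file and its four rows are illustrative only.

```r
# Write a tiny space-separated file with a header line (made-up values).
tmp <- tempfile(fileext = ".txt")
writeLines(c("Sex Bwt Hwt",
             "F 2.0 7.0",
             "F 2.1 7.2",
             "M 2.0 6.5",
             "M 2.2 7.2"), tmp)

# header = TRUE takes the variable names from the first line;
# stringsAsFactors = TRUE requests factors explicitly, whatever the R version.
cats <- read.table(tmp, sep = " ", header = TRUE, stringsAsFactors = TRUE)

str(cats)          # one factor column and two numeric columns
levels(cats$Sex)   # the distinct values of the factor
unlink(tmp)        # remove the temporary file
```

Spelling out TRUE instead of T is a little safer, because T is an ordinary variable that can be overwritten.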

There are several functions similar to read.table():

- read.csv() assumes that fields are separated by commas instead of white space
- read.csv2() assumes that the separator is the semicolon and that a comma is used as the decimal point (some programs, e.g., Microsoft Excel, generate this format when running on European systems)
- scan() is a powerful, but less friendly, way to read data into R; you may need it if you want to read files with different numbers of values per line
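
The three variants can be compared on small made-up files. This is a sketch under the assumption that the files look as written below; the file contents and variable names are invented for illustration.

```r
# Made-up content written to temporary files for illustration.
csv  <- tempfile(fileext = ".csv")
csv2 <- tempfile(fileext = ".csv")
rag  <- tempfile(fileext = ".txt")
writeLines(c("id,weight", "1,2.5", "2,3.1"), csv)
writeLines(c("id;weight", "1;2,5", "2;3,1"), csv2)
writeLines(c("1 2 3", "4 5", "6"), rag)

a <- read.csv(csv)     # comma separator, "." as decimal point
b <- read.csv2(csv2)   # semicolon separator, "," as decimal point
all.equal(a, b)        # compares the two resulting data frames

# scan() ignores the line structure and returns all values as one vector;
# count.fields() reports how many values each line holds.
values  <- scan(rag, quiet = TRUE)
per_row <- count.fields(rag)
rows    <- split(values, rep(seq_along(per_row), per_row))
unlink(c(csv, csv2, rag))
```

Here rows is a list with one numeric vector per input line, which is one way to keep lines of unequal length apart.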

Reading data from the clipboard

With the function read.delim(), or also read.table(), it is possible to read data directly from the clipboard. For example, mark and copy some columns from an Excel spreadsheet and transfer the content to an R data frame:

    > mydata <- read.delim("clipboard", na.strings=".")
    > str(mydata) # structure of the data
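
Because the clipboard content depends on the session, the following sketch substitutes a character string for it, passed through the text argument. The column names and values are invented; only na.strings="." is taken from the call above.

```r
# Tab-separated text as it would arrive from a copied spreadsheet range;
# "." marks a missing value.
copied <- "name\tage\tscore\nanna\t31\t4.5\nben\t.\t3.0\ncarl\t27\t."

mydata <- read.delim(text = copied, na.strings = ".")
str(mydata)              # structure of the data
colSums(is.na(mydata))   # number of missing values per column
```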

The Data Editor

To interactively edit a data frame in R you can use the edit() function. For example:

    > data(airquality)
    > aq <- edit(airquality)

This brings up a spreadsheet-like editor with a column for each variable in the data frame. See help(airquality) for the contents of this data set. The function edit() leaves the original data frame unchanged; the changed data frame is assigned to aq. The function fix(x) invokes edit(x) on x and assigns the new (edited) version back to x.

Reading Data from Other Programs

You can always use the export function of other (statistical) software to write the data to a tab- or comma-delimited file and then read it with read.table(). However, R also has some direct methods. The foreign package is one of the "recommended" packages in R. It contains routines to read files from SPSS (.sav format), SAS (export libraries), EpiInfo (.rec), Stata, Minitab, and some S-PLUS version 3 dump files. For example,

    > library(foreign)
    > mydata <- read.spss("test.sav", to.data.frame=T)

reads the SPSS data set test.sav and converts it to a data frame.
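
Since test.sav is not available here, the following sketch illustrates the same idea with the Stata routines of the foreign package: a made-up data frame is written with write.dta() and read back with read.dta(). The data and the temporary file name are assumptions for illustration.

```r
library(foreign)

# Made-up data written to a temporary Stata file, so the example
# does not depend on an existing data set.
df  <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))
dta <- tempfile(fileext = ".dta")
write.dta(df, dta)

back <- read.dta(dta)   # returns a data frame
str(back)
unlink(dta)
```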

Reading Data from Excel Files

The XLConnect package reads Excel workbooks directly. The following loads the workbook Duncan.xls, lists its sheets, and reads the first sheet into a data frame:

    > library(XLConnect)
    > setwd("/media/TRANSCEND/mpicbs/data/")
    > my.wb <- loadWorkbook("Duncan.xls")
    > sheets <- getSheets(my.wb)
    > content <- readWorksheet(my.wb, sheet=1)
    > head(content)
            Col0 type income education prestige
    1 accountant prof     62        86       82
    2      pilot prof     72        76       83
    3  architect prof     75        92       90
    4     author prof     55        90       76
    5    chemist prof     64        86       90
    6   minister prof     21        84       87

Reading Data from Excel Files

If you are really fond of Excel, RExcel (http://rcom.univie.ac.at/download.html) is worth the effort. There is also a function for reading MS Access files: mdb.get() from the Hmisc package.

Something on Connections

The function read.table() opens a connection to a file, reads the file, and closes the connection. For data stored in databases, there are a number of interface packages on CRAN. The RODBC package can set up ODBC connections to data stored by common applications including Excel and Access (for Excel and Access, RODBC does not work on Unix, but it is great for database connections). There are also more general ways to build connections to databases. For up-to-date information on these matters, consult the "R Data Import/Export" manual that comes with the system.
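
The open, read, close sequence can also be written out by hand. This is a minimal sketch that assumes a made-up file with one free-text line ahead of the table; the file content and variable names are illustrative.

```r
# Made-up file: one free-text line followed by a small table.
tmp <- tempfile(fileext = ".txt")
writeLines(c("recorded 2015-03-15",
             "x y",
             "1 10",
             "2 20"), tmp)

con  <- file(tmp, open = "r")            # open the connection explicitly
note <- readLines(con, n = 1)            # consume the first line
dat  <- read.table(con, header = TRUE)   # continues where readLines() stopped
close(con)                               # we opened it, so we close it
unlink(tmp)
```

When read.table() is handed a connection that is already open, it leaves closing it to the caller, which is why close(con) is needed here.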

Read Presentation Files

    > x <- read.table(file = "session1data/pre001.txt",
    +                 sep = "\t",
    +                 header = T,
    +                 skip = 3)
    > head(x)
      Subject Trial Event.Type     Code   Time  TTime Uncertainty Duration
    1            NA                         NA     NA          NA       NA
    2  PRE001     1   Response        3 104975 114605           1       NA
    3  PRE001     2   Response        3 117581  12411           1       NA
    4  PRE001     4    Picture    B1 T1 125765      0           1     5008
    5  PRE001     5    Picture RO09.jpg 130773      0         391    38181
    6  PRE001     6      Sound RO09.wav 131273      0           2       NA
    1            NA      NA                          NA
    2            NA      NA                          NA
    3            NA      NA                          NA
    4           392       0   next     other          0
    5           392       0   next     other          0
    6            NA       0            other          0
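
The effect of skip and sep="\t" can be reproduced on a small stand-in file. The sketch below assumes a made-up log with three preamble lines before the header; its content is invented and much shorter than a real Presentation log file.

```r
# Made-up log: three preamble lines, then a tab-separated table.
tmp <- tempfile(fileext = ".txt")
writeLines(c("Scenario - demo",
             "Logfile written - 03/15/2015",
             "",
             "Subject\tTrial\tEvent Type\tCode",
             "PRE001\t1\tResponse\t3",
             "PRE001\t2\tPicture\tB1 T1"), tmp)

# skip = 3 drops the preamble; the next line is used as header.
x <- read.table(tmp, sep = "\t", header = TRUE, skip = 3)
names(x)   # "Event Type" is turned into the syntactic name Event.Type
unlink(tmp)
```

With a tab as separator, values that contain blanks, such as "B1 T1", stay in one field.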

Indexing with Positive Integers

- there are circumstances where we want to select only some of the elements of a vector/array/dataframe/list
- this selection is done using subscripts (also known as indices)
- subscripts have square brackets [2] while functions have round brackets (2)
- subscripts on vectors, matrices, arrays and dataframes have one set of square brackets: [6], [3,4] or [2,3,2,1]
- a blank subscript is understood to mean "all of"; thus
  - [,4] means all rows in column 4 of an object
  - [2,] means all columns in row 2 of an object
- subscripts on lists (usually) have double square brackets: [[2]] or [[i,j]]
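
The rules above can be sketched on three small objects; the objects v, m and l are made up for illustration.

```r
v <- c(10, 20, 30, 40)
m <- matrix(1:12, nrow = 3)   # 3 rows, 4 columns, filled column by column
l <- list(a = 1:3, b = "text")

v[2]      # second element of a vector
m[3, 4]   # row 3, column 4
m[, 4]    # blank row subscript: all rows of column 4
m[2, ]    # blank column subscript: all columns of row 2
l[[2]]    # double brackets: the second component itself
l[2]      # single brackets: a list of length one holding that component
```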

Indexing with Positive Integers

A vector of positive integers as index: the index vector can be of any length, and the result has the same length as the index vector. For example,

    > letters[1:3]
    [1] "a" "b" "c"
    > letters[c(1:3,1:3)]
    [1] "a" "b" "c" "a" "b" "c"

Logical Indexing

A logical vector as index: values corresponding to TRUE in the index vector are selected and those corresponding to FALSE are omitted; an NA in the index vector yields an NA in the result. For example,

    > x <- c(1,2,3,NA)
    > x[!is.na(x)]
    [1] 1 2 3

creates a vector without missing values. Also

    > x[is.na(x)] <- 0
    > x
    [1] 1 2 3 0

replaces the missing values with zeros.
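
One detail worth checking is how an NA inside a logical index behaves. A minimal sketch, with a made-up comparison on the same kind of vector:

```r
x   <- c(1, 2, 3, NA)
idx <- x > 1         # the comparison with NA gives NA

x[idx]               # the NA in the index is carried into the result
x[which(idx)]        # which() keeps only the positions that are TRUE
```

Wrapping the condition in which() is a common way to drop such NA entries when selecting.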

Logical Indexing

A common operation is to select rows or columns of a data frame that meet some criteria. For example, to select those rows of the painters data frame with Colour >= 17:

    > library(MASS)
    > painters[painters$Colour >= 17,]
              Composition Drawing Colour Expression School
    Bassano             6       8     17          0      D
    Giorgione           8       9     18          4      D
    Pordenone           8      14     17          5      D

Logical Indexing

We may want to select on more than one criterion. We can combine logical indices with the 'and', 'or' and 'not' operators &, | and !. For example,

    > painters[painters$Colour >= 17 &
              Composition Drawing Colour
    Titian             12      15     18
    Rembrandt          15       6     17
    Rubens             18      13     17
    Van Dyck           15      10     17
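
How the three operators combine conditions can be sketched on a small made-up data frame. The thresholds, the column selection 1:3 and the data below are assumptions for illustration and are not the call shown above.

```r
# Made-up scores; the column names only mimic those of painters.
toy <- data.frame(Composition = c(6, 12, 15, 8),
                  Drawing     = c(8, 15, 6, 14),
                  Colour      = c(17, 18, 17, 10),
                  row.names   = c("A", "B", "C", "D"))

# and: both conditions must hold; 1:3 keeps the first three columns
both   <- toy[toy$Colour >= 17 & toy$Composition > 10, 1:3]

# or: at least one condition must hold
either <- toy[toy$Colour >= 17 | toy$Drawing >= 14, ]

# not: negate a condition
others <- toy[!(toy$Colour >= 17), ]

both
```

Note that & and | work element by element, which is what row selection needs; && and || are meant for single conditions, for example in if statements.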

RstatisTik/RstatisTikPortal/RcourSe/FinalFunction/ReadingFiles (last modified 2015-03-15 09:56:44 by mandy.vogel@googlemail.com)


Revision 1 of 2015-03-15 09:22:15
  Size: 29
  Author: mandy.vogel@googlemail.com
  Comment:
Revision 2 of 2015-03-15 09:29:11
  Size: 8174
  Author: mandy.vogel@googlemail.com
  Comment: