location: Changes to "RstatisTik/RstatisTikPortal/RcourSe/FinalFunction/ReadingFiles"
Differences between revisions 1 and 3 (spanning 2 versions)
Revision 1 as of 2015-03-15 09:22:15
Size: 29
Comment:
Revision 3 as of 2015-03-15 09:31:03
Size: 5827
Comment:

= Reading Data from Files =

<<TableOfContents(3)>>

The most convenient way of reading data into R is via the function read.table(), which reads plain-text files such as those
created with Windows' Notepad or any plain-text editor. The result of read.table() is a
data frame.
It is expected that each line of the data file holds the information for one subject and that the
variables are separated by blanks or another separator symbol (e.g., "," or ";"). The first
line of the file can contain a header (header=TRUE) giving the names of the variables, which is highly recommended.
== read.table() ==
As an example, we read in the data contained in the file fishercats.txt:
{{{#!highlight r
> read.table("session1data/fishercats.txt",
+            sep=" ", header=TRUE)
  Sex Bwt Hwt
1   F 2.0 7.0
2   F 2.0 7.4
3   F 2.0 9.5
4   F 2.1 7.2
5   F 2.1 7.3
}}}

These data are the heart and body weights of samples of male and female cats (R. A. Fisher, 1947).
The first argument is the data file, the second is the field separator, and the third, header=TRUE, specifies that the first line is a header with variable names. Important: in R versions before 4.0.0, character variables are automatically read as factors; since R 4.0.0 they remain character vectors unless stringsAsFactors=TRUE is set.
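The effect of the three arguments and of the factor conversion can be checked on a small file. The following is a minimal sketch, not part of the course material: it writes a few rows to a temporary file (the "M" row uses illustrative values, and the names cats_chr and cats_fac are made up) and reads them back with both settings of stringsAsFactors.

{{{#!highlight r
# Write a small space-separated file to a temporary location.
# The "M" row is illustrative; note that a column containing only "F"
# would be converted to logical (FALSE) by read.table().
tmp <- tempfile(fileext = ".txt")
writeLines(c("Sex Bwt Hwt",
             "F 2.0 7.0",
             "F 2.1 7.2",
             "M 2.0 6.5"), tmp)

# Same file, read once with character columns and once with factors.
cats_chr <- read.table(tmp, sep = " ", header = TRUE, stringsAsFactors = FALSE)
cats_fac <- read.table(tmp, sep = " ", header = TRUE, stringsAsFactors = TRUE)

class(cats_chr$Sex)   # character vector
class(cats_fac$Sex)   # factor
str(cats_fac)         # structure of the resulting data frame

unlink(tmp)           # remove the temporary file
}}}

Setting stringsAsFactors explicitly makes the result independent of the R version in use.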
read.table() can also read data directly from a URL:
{{{#!highlight r
> winer <- read.table(
+ "http://socserv.socsci.mcmaster.ca/jfox/Courses/R/ICPSR/Winer.txt",
+ header=TRUE)
}}}

There are several variants of read.table():
   * read.csv() assumes that fields are separated by commas instead of white space
   * read.csv2() assumes that the separator is the semicolon and uses a comma as the decimal point (some programs, e.g., Microsoft Excel, generate this format when running on European systems)
   * scan() is a powerful but less friendly way to read data into R; you may need it if you want to read files with different numbers of values per line
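The differences between these variants can be sketched on in-memory text instead of files. The example below is an illustration under assumptions: the column names, the values, and the object names (d1, d2, ragged, values) are made up, and the text argument is used only so that the code runs without any data file.

{{{#!highlight r
# The same two-row table in "English" and "European" CSV flavours.
csv_text  <- "id,weight\n1,2.5\n2,3.1"   # comma separator, decimal point
csv2_text <- "id;weight\n1;2,5\n2;3,1"   # semicolon separator, decimal comma

d1 <- read.csv(text = csv_text)
d2 <- read.csv2(text = csv2_text)
all.equal(d1$weight, d2$weight)          # both hold the same numbers

# Lines with different numbers of values: scan() each line separately.
ragged <- c("1 2 3", "4 5")
values <- lapply(ragged, function(line) scan(text = line, quiet = TRUE))
lengths(values)                          # number of values found per line
}}}

Reading line by line is only one way to handle ragged input; for larger files, readLines() combined with strsplit() may be more convenient.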

== Reading data from the clipboard ==
With the function read.delim(), or also read.table(), it is possible to read data directly from the clipboard.
For example, mark and copy some columns from an Excel spreadsheet and transfer the content into an R data frame:
{{{#!highlight r
> mydata <- read.delim("clipboard",na.strings=".")
> str(mydata) # structure of the data
}}}
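Because the clipboard contents depend on the session, the call above cannot be reproduced as is. The sketch below imitates it with the text argument of read.delim(); the tab-separated string stands in for copied spreadsheet cells, and its column names and values are invented for illustration.

{{{#!highlight r
# Tab-separated text as it might arrive from a spreadsheet;
# a single dot marks a missing value.
copied <- "subject\tscore\nA\t1.5\nB\t.\nC\t3.0"

mydata <- read.delim(text = copied, na.strings = ".")
str(mydata)          # structure of the data
is.na(mydata$score)  # the dot was read as NA
}}}

Without na.strings=".", the score column would be read as text because "." is not a number.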

== The Data Editor ==
To edit a data frame interactively in R you can use the edit() function. For example:
{{{#!highlight r
> data(airquality)
> aq <- edit(airquality)
}}}
This brings up a spreadsheet-like editor with a column for each variable in the data frame.
See help(airquality) for the contents of this data set.
The function edit() leaves the original data frame unchanged; the changed data frame is assigned to aq. The function fix(x) invokes edit(x) on x '''and assigns''' the new (edited) version back to x.
== Reading Data from Other Programs ==
You can always use the export function of other (statistical) software to write the data to a tab- or comma-delimited file and then read it with read.table(). However, R also has some direct methods.
The foreign package is one of the "recommended" packages in R. It contains routines to read files from SPSS (.sav format), SAS (export libraries), EpiInfo (.rec), Stata, Minitab, and some S-PLUS version 3 dump files. For example:
{{{#!highlight r
> library(foreign)
> mydata <- read.spss("test.sav", to.data.frame=TRUE)
}}}
This reads the SPSS data set test.sav and converts it to a data frame.
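Since no test.sav file ships with R, the following sketch uses another of the listed formats, Stata, to show a complete round trip. It assumes the foreign package is installed; the data frame df and its columns are made up, and write.dta() is used here only to create a file that read.dta() can read.

{{{#!highlight r
library(foreign)

# A small made-up data frame.
df <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))

# Write it in Stata format to a temporary file, then read it back.
tmp <- tempfile(fileext = ".dta")
write.dta(df, tmp)
back <- read.dta(tmp)

str(back)     # a data frame with the same columns
unlink(tmp)   # remove the temporary file
}}}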
== Reading Data from Excel Files ==
The XLConnect package can read Excel workbooks directly. For example:
{{{#!highlight r
> library(XLConnect)
> setwd("/media/TRANSCEND/mpicbs/data/")
> my.wb <- loadWorkbook("Duncan.xls")
> sheets <- getSheets(my.wb)
> content <- readWorksheet(my.wb, sheet=1)
> head(content)
        Col0 type income education prestige
1 accountant prof     62        86       82
2      pilot prof     72        76       83
3  architect prof     75        92       90
4     author prof     55        90       76
5    chemist prof     64        86       90
6   minister prof     21        84       87
}}}
If you are really fond of Excel, RExcel (http://rcom.univie.ac.at/download.html) is worth the effort. There is also a function for reading MS Access files: mdb.get() from the Hmisc package.
== Something on Connections ==
The function read.table() opens a connection to a file, reads the file, and closes the connection. For data stored in databases, a number of interface packages exist on CRAN.
The RODBC package can set up ODBC connections to data stored by common applications, including Excel and Access (for Excel and Access files RODBC does not work on Unix, but it is well suited to database connections). There are also more general ways to build connections to databases.
For up-to-date information on these matters, consult the "R Data Import/Export" manual that comes with the system.
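What read.table() does behind the scenes can also be done by hand. The sketch below is an illustration, not taken from the course: it opens a file connection explicitly, consumes one free-text line, and then lets read.table() continue from the current position. The file contents and the names con, note, and dat are invented.

{{{#!highlight r
# A temporary file with one free-text line before the actual table.
tmp <- tempfile(fileext = ".txt")
writeLines(c("free-text note line", "x y", "1 2", "3 4"), tmp)

con <- file(tmp, open = "r")             # open the connection explicitly
note <- readLines(con, n = 1)            # consume the first line
dat <- read.table(con, header = TRUE)    # continues after the note line
close(con)                               # we opened it, so we close it

note
dat
unlink(tmp)
}}}

When the connection is already open, read.table() leaves it open, which is why the explicit close() is needed here.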

= Read Presentation Files =
{{{#!highlight r
> x <- read.table(file = "session1data/pre001.txt",
+                 sep = "\t",
+                 header = TRUE,
+                 skip = 3)
> head(x)
  Subject Trial Event.Type     Code   Time  TTime Uncertainty Duration
1            NA                         NA     NA          NA       NA
2  PRE001     1   Response        3 104975 114605           1       NA
3  PRE001     2   Response        3 117581  12411           1       NA
4  PRE001     4    Picture    B1 T1 125765      0           1     5008
5  PRE001     5    Picture RO09.jpg 130773      0         391    38181
6  PRE001     6      Sound RO09.wav 131273      0           2       NA
1            NA      NA                          NA
2            NA      NA                          NA
3            NA      NA                          NA
4           392       0   next     other          0
5           392       0   next     other          0
6            NA       0            other          0
}}}
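The roles of sep = "\t" and skip = 3 in the call above can be tried out on a small stand-in file. The following is a sketch under assumptions: the three preamble lines and the two data rows are invented to resemble a log file with a header block, and only the column names Subject, Trial, Event Type, and Code are borrowed from the output shown above.

{{{#!highlight r
# A made-up log file: three preamble lines, then a tab-separated table.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "preamble line one",
  "preamble line two",
  "",
  "Subject\tTrial\tEvent Type\tCode",
  "PRE001\t1\tResponse\t3",
  "PRE001\t4\tPicture\tB1 T1"
), tmp)

# skip = 3 drops the preamble; sep = "\t" keeps "B1 T1" in one field.
x <- read.table(file = tmp, sep = "\t", header = TRUE, skip = 3)

names(x)   # "Event Type" becomes the syntactic name Event.Type
x
unlink(tmp)
}}}

With the default white-space separator, the value "B1 T1" would be split into two fields and the row would no longer match the header.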


RstatisTik/RstatisTikPortal/RcourSe/FinalFunction/ReadingFiles (last edited 2015-03-15 09:56:44 by mandy.vogel@googlemail.com)