
Reading Data from Files

The most convenient way of reading data into R is via the function read.table(). It reads plain-text files such as those created with Windows' NotePad or any other plain-text editor, and its result is a data frame. Each line of the data file is expected to hold the information for one subject, with the variables separated by blanks or another separator symbol (e.g., ",", ";"). The first line of the file can contain a header (header=T) giving the names of the variables.

As an example, we read in the data contained in the file fishercats.txt:

    > read.table("session1data/fishercats.txt",
    +            sep=" ", header=T)
      Sex Bwt Hwt
    1   F 2.0 7.0
    2   F 2.0 7.4
    3   F 2.0 9.5
    4   F 2.1 7.2
    5   F 2.1 7.3

These data are the heart and body weights of samples of male and female cats (R. A. Fisher, 1947). The first argument is the data file, the second the field separator, and the third, header=T (short for header=TRUE), specifies that the first line is a header with variable names. Important: in R versions before 4.0.0, character variables are automatically read as factors; from R 4.0.0 on they remain character vectors unless stringsAsFactors=TRUE is given. read.table() can also read data directly from a URL:

    > winer <- read.table(
    + "http://socserv.socsci.mcmaster.ca/jfox/Courses/R/ICPSR/Winer.txt",
    + header=T)
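For readers without the course data files, the same kind of call can be tried on a small temporary file. This is only a minimal sketch: the first five rows repeat the fishercats output shown above, the last row (M 3.0 10.0) is made up so that the Sex column is not misread as logical, and stringsAsFactors is set explicitly because its default depends on the R version.

```r
# Write a small space-separated file with a header line to a temporary location.
tmp <- tempfile(fileext = ".txt")
writeLines(c("Sex Bwt Hwt",
             "F 2.0 7.0",
             "F 2.0 7.4",
             "F 2.0 9.5",
             "F 2.1 7.2",
             "F 2.1 7.3",
             "M 3.0 10.0"),   # made-up row, for illustration only
           tmp)

# header = TRUE: the first line holds the variable names.
# stringsAsFactors = TRUE: read character columns as factors.
cats <- read.table(tmp, sep = " ", header = TRUE, stringsAsFactors = TRUE)

str(cats)     # structure of the resulting data frame
unlink(tmp)   # remove the temporary file
```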

There are also variants of read.table() with different defaults, such as read.csv(), read.csv2(), read.delim() and read.delim2().
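As a rough illustration of how such variants differ only in their defaults: read.csv() expects a comma as separator and a point as decimal mark, while read.csv2() expects a semicolon and a decimal comma. The inline text passed through the text argument below is made up for this sketch.

```r
# The same made-up table, once with comma/point and once with semicolon/comma.
a <- read.csv(text = "x,y\n1.5,a\n2.5,b")
b <- read.csv2(text = "x;y\n1,5;a\n2,5;b")

# The equivalent read.table() call spells out the separator and decimal mark.
d <- read.table(text = "x;y\n1,5;a\n2,5;b", sep = ";", dec = ",", header = TRUE)

# Compare the three results.
all.equal(a, b)
all.equal(b, d)
```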

Reading data from the clipboard

With the function read.delim(), or also read.table(), it is possible to read data directly from the clipboard. For example, mark and copy some columns from an Excel spreadsheet and transfer this content to R:

    > mydata <- read.delim("clipboard", na.strings=".")
    > str(mydata) # structure of the data
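How the clipboard is addressed depends on the platform (on macOS, for instance, pipe("pbpaste") is used instead of "clipboard"), so the call cannot be tried everywhere. As a stand-in, the sketch below passes tab-separated text through the text argument; the column names and values are invented for illustration.

```r
# Tab-separated text standing in for cells copied from a spreadsheet.
# A single dot marks a missing value, matching na.strings = ".".
pasted <- "id\tscore\n1\t4.2\n2\t.\n3\t3.9"

mydata <- read.delim(text = pasted, na.strings = ".")
str(mydata)            # structure of the data
is.na(mydata$score)    # shows which entries were read as missing
```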

The Data Editor

To interactively edit a data frame in R you can use the edit function. For example:

    > data(airquality)
    > aq <- edit(airquality)

This brings up a spreadsheet-like editor with a column for each variable in the data frame. See help(airquality) for the contents of this data set. The function edit() leaves the original data frame unchanged; the changed data frame is assigned to aq. The function fix(x) invokes edit(x) on x and assigns the new (edited) version of x back to x.

Reading Data from Other Programs

You can always use the export function of other (statistical) software to write the data to a tab- or comma-delimited file and then read that file with read.table(). However, R also has some direct methods. The foreign package is one of the "recommended" packages in R. It contains routines to read files from SPSS (.sav format), SAS (export libraries), EpiInfo (.rec), Stata, Minitab, and some S-PLUS version 3 dump files. For example:

    > library(foreign)
    > mydata <- read.spss("test.sav", to.data.frame=T)

This reads the test.sav SPSS data set and converts it to a data frame.
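Since no test.sav file accompanies this page, the following sketch illustrates the same workflow with a Stata file instead, because foreign can both write and read that format. The data frame contents are made up; write.dta() and read.dta() are functions of the foreign package.

```r
library(foreign)

# Made-up example data.
df <- data.frame(id = 1:3, weight = c(2.0, 2.1, 2.3))

# Write a Stata .dta file to a temporary location and read it back.
tmp <- tempfile(fileext = ".dta")
write.dta(df, tmp)
back <- read.dta(tmp)

str(back)     # structure of the re-imported data frame
unlink(tmp)   # remove the temporary file
```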

Reading Data from Excel Files

    > library(XLConnect)
    > setwd("/media/TRANSCEND/mpicbs/data/")
    > my.wb <- loadWorkbook("Duncan.xls")
    > sheets <- getSheets(my.wb)
    > content <- readWorksheet(my.wb, sheet=1)
    > head(content)
            Col0 type income education prestige
    1 accountant prof     62        86       82
    2      pilot prof     72        76       83
    3  architect prof     75        92       90
    4     author prof     55        90       76
    5    chemist prof     64        86       90
    6   minister prof     21        84       87

If someone is really fond of Excel, RExcel (http://rcom.univie.ac.at/download.html) is really worth the effort. There is also a function for reading MS Access files (mdb.get() from the Hmisc package).

Something on Connections

The function read.table() opens a connection to a file, reads the file, and closes the connection. For data stored in databases, there are a number of interface packages on CRAN. The RODBC package can set up ODBC connections to data stored by common applications including Excel and Access (for Excel and Access files RODBC does not work on Unix, but it is great for database connections). There are also more general ways to build connections to databases. For up-to-date information on these matters, consult the "R Data Import/Export" manual that comes with the system.
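What read.table() does implicitly with a file name can also be done by hand with an explicit connection, which is useful when part of a file has to be consumed before the table starts. The following is a minimal sketch with a made-up temporary file, not a database connection.

```r
# Made-up file: one note line followed by a small table.
tmp <- tempfile(fileext = ".txt")
writeLines(c("this line is a note", "a b", "1 2", "3 4"), tmp)

con <- file(tmp, open = "r")             # open the connection explicitly
first <- readLines(con, n = 1)           # consume the note line
tab <- read.table(con, header = TRUE)    # continues where readLines() stopped
close(con)                               # connections opened by hand must be closed by hand

first
tab
unlink(tmp)
```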

Reading Presentation Files

    > x <- read.table(file = "session1data/pre001.txt",
    +                 sep = "\t",
    +                 header = T,
    +                 skip = 3)
    > head(x)
    Subject Trial Event.Type     Code   Time  TTime Uncertainty Duration
    1            NA                         NA     NA          NA       NA
    2  PRE001     1   Response        3 104975 114605           1       NA
    3  PRE001     2   Response        3 117581  12411           1       NA
    4  PRE001     4    Picture    B1 T1 125765      0           1     5008
    5  PRE001     5    Picture RO09.jpg 130773      0         391    38181
    6  PRE001     6      Sound RO09.wav 131273      0           2       NA
    1            NA      NA                          NA
    2            NA      NA                          NA
    3            NA      NA                          NA
    4           392       0   next     other          0
    5           392       0   next     other          0
    6            NA       0            other          0
RstatisTik/RstatisTikPortal/RcourSe/FinalFunction/ReadingFiles (last modified on 2015-03-15 09:31:55 by mandy.vogel@googlemail.com)