This section deals with the basic structures R uses to store data and how to assemble them, as well as how to get data into and out of R.
Data structures in R
All R objects have a type or mode, as well as a class, which can be determined with typeof, mode & class.
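As a minimal sketch of how the three functions differ, the following compares their results for a few objects. The names x, f and d are illustrative only.

```r
x <- 1:3                             # an integer vector
f <- factor(c("a", "b"))             # a factor
d <- data.frame(n = 1:2)             # a data frame

c(typeof(x), mode(x), class(x))      # "integer" "numeric" "integer"
c(typeof(f), mode(f), class(f))      # "integer" "numeric" "factor"
c(typeof(d), mode(d), class(d))      # "list" "list" "data.frame"
```

Roughly speaking, typeof reports how the object is stored, mode is a coarser grouping of types, and class determines how generic functions such as print and summary treat the object.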
- Vector
Vectors are the basic structure and come in the following atomic modes (data types):
numeric, integer, character, logical, complex, raw
These modes have corresponding functions which test if an object is of that mode (is, e.g. is.numeric) and convert an object to that mode (as, e.g. as.character).
You can assemble and combine vectors using the often-used function c. Note that vectors must consist of values of the same data type:
    c(1, "a", TRUE)      # all values coerced to character
    list(1, "a", TRUE)   # preserves different types (see below)
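The test and conversion functions, and the coercion that c performs, can be sketched as follows. The values are made up for illustration, and suppressWarnings is used only to keep the failed conversion quiet.

```r
x <- c("1", "2", "three")
is.character(x)                        # TRUE
is.numeric(x)                          # FALSE

# Conversion yields NA (with a warning) where a value cannot be interpreted
y <- suppressWarnings(as.numeric(x))
y                                      # 1 2 NA

# When types are mixed, c() coerces to the most general type present:
# logical -> integer -> numeric -> character
class(c(TRUE, 2L))                     # "integer"
class(c(2L, 3.5))                      # "numeric"
class(c(3.5, "a"))                     # "character"
```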
- Factors
Factors encode categorical data, and are an extremely useful and efficient way of handling categories with multiple entries. Note that versions of R before 4.0.0 coerce character data to a factor type by default (e.g. when using read.table); from R 4.0.0 onwards this only happens if you set stringsAsFactors=TRUE. Also have is.factor & as.factor.
    chars <- strsplit("the cat sat on the mat", "")[[1]]  # create vector of characters
    chars <- factor(chars)                    # convert from character to a factor
    levels(chars)                             # show factor levels (i.e. different letters here)
    plot(chars)                               # show barchart of factor level frequencies
    levels(chars)[levels(chars) == " "] <- "_"  # replace whitespaces with underscores
    paste(chars, collapse="")                 # collapse to a single character string
One thing to watch out for with factors is converting them to numeric mode. Factors are actually stored as a vector of integers, referring to the element number of the factor levels. In the following example, there are 3 levels ("100", "200" & "300"), which are represented as characters, and the numeric values of the factor comprise the integers 1-3, referring to the elements of the vector of levels.
    Xvector <- c(1, 2, 2, 3, 3, 3) * 100
    Xfactor <- factor(Xvector)
    levels(Xfactor)                           # show levels, which are "100" "200" "300"
    as.numeric(Xfactor)                       # reports "1 2 2 3 3 3" - the elements of the levels vector
    x <- as.numeric(levels(Xfactor)[Xfactor]) # retrieve actual numeric values
    identical(x, Xvector)                     # same as original numeric vector
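Another common way to recover the original values is to convert via character first. The sketch below uses a made-up factor f to show why a direct conversion is misleading: the levels are sorted as character strings, so the integer codes bear no relation to the numeric values.

```r
f <- factor(c("10", "5", "10", "20"))
levels(f)                      # "10" "20" "5" (sorted as strings, not as numbers)
as.integer(f)                  # 1 3 1 2 (the codes, i.e. positions in levels(f))
as.numeric(as.character(f))    # 10 5 10 20 (the values you probably wanted)
```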
- Matrix/arrays
Matrices are 2-dimensional arrays, which are themselves generalisations of a vector to more than 1 dimension. Also have is.matrix, is.array & as.matrix, as.array.
    M <- matrix(1:12, nrow=3)          # create a matrix with 3 rows & 4 columns
    dim(M)                             # show dimensions
    M[2, 3]                            # print element in 2nd row & 3rd column
    2 * matrix(rep(1, 12), nrow=3)     # multiply every element by a constant
    A <- array(1:12, dim=c(2, 2, 3))   # create a 3d array
    A[1, 2, 1]                         # print single element
    A[1, , ]                           # print a matrix subset
Arrays are actually stored in a 1 dimensional structure, so you can still access their elements with a single subscript:
    A[5]
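The single-subscript behaviour follows from R storing arrays in column-major order (the first index varies fastest). A minimal sketch, re-creating M and A with the same definitions as above so that it runs on its own:

```r
M <- matrix(1:12, nrow = 3)
M[2, 3]                          # 8
M[8]                             # 8, the same element via a single subscript

# For a matrix, element [i, j] sits at position i + (j - 1) * nrow
i <- 2; j <- 3
M[i + (j - 1) * nrow(M)]         # 8

arrayInd(8, dim(M))              # converts position 8 back to row 2, column 3

A <- array(1:12, dim = c(2, 2, 3))
A[5] == A[1, 1, 2]               # TRUE: position 5 is the first element of the 2nd slice
```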
- List
Lists are used to store data of any type or dimensions in a free-form structure. Also have is.list & as.list.
    l <- list(functions=c(mean, median), chars=month.abb, numbers=rnorm(7))
    l$chars          # print "chars" element
    l[2]             # print 2nd element *as a single-item list*
    l[[2]]           # print element as a *vector*
    l["chars"]       # } compare and
    l[["chars"]]     # } contrast
To assemble a list cumulatively, e.g. in a loop:
    l <- as.list(NULL)    # create empty list
    for ( i in 1:3 ) l[i] <- LETTERS[i]
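When the final length is known in advance, a list can also be created at full size and filled with double brackets, which allows each element to hold more than one value. This is a sketch of one possible approach; the element names are made up for illustration.

```r
n <- 3
l <- vector("list", n)                    # list of n empty (NULL) elements
for (i in seq_len(n)) {
  l[[i]] <- LETTERS[seq_len(i)]           # elements may differ in length
}
names(l) <- c("one", "two", "three")      # hypothetical element names

length(l)                                 # 3
l$two                                     # "A" "B"
l[["three"]]                              # "A" "B" "C"
sapply(l, length)                         # number of values held by each element
```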
- Data frame
Data frames are widely used in R to store data in a variety of formats with related entries in each row and different attributes in each column, much like a table or spreadsheet. A data frame is essentially a special type of list and elements of data frames can be accessed in exactly the same way as for a list. Also have is.data.frame & as.data.frame.
    A <- data.frame(a=LETTERS[1:4], b=1:4, c=c(TRUE, TRUE, FALSE, TRUE))
    sapply(A, class)           # show data types for each column
    A$b^2                      # perform arithmetic on a numeric column as a vector
    dim(A); nrow(A); ncol(A)   # show dimensions of data frame (rows, columns)
    A[1, ]                     # print first row
    A[, 2]                     # print 2nd column
    as.list(A)                 # convert to a list
    # Note that matrices must contain data of the same type, so the following
    # command converts all the values to character format:
    as.matrix(A)               # convert to a matrix
Data frames can have both row and column names (default row names are the row number). Each column then behaves much like a named vector, as seen in the following example:
    # create separate, named vectors of data:
    planets.mass <- c("Mercury"=0.33, "Venus"=4.87, "Earth"=5.98, "Mars"=0.64,
                      "Jupiter"=1899, "Saturn"=569, "Uranus"=87, "Neptune"=102,
                      "Pluto"=0.13) * 1e24
    planets.semimajoraxis <- c("Mercury"=57.9, "Venus"=108, "Earth"=150, "Mars"=228,
                               "Jupiter"=778, "Saturn"=1430, "Uranus"=2870,
                               "Neptune"=4500, "Pluto"=5900) * 1e9
    # Now create a data frame:
    planets <- data.frame(mass=planets.mass, semimajoraxis=planets.semimajoraxis)
    planets["Earth", ]                          # show all data for the Earth
    planets["Mars", "mass"]                     # show the mass of Mars; same as planets[4, 1]
    rownames(planets) <- paste("planet", 1:9)   # change row names
    dimnames(planets); rownames(planets); colnames(planets)   # show info
Working with data frames is very easy:
    subset(planets, mass > mean(mass))
    subset(planets, mass > 1e24 & semimajoraxis < 1e12 )
    # Adding new columns to the data frame:
    planets <- transform(planets, log10mass = log10(mass), wibble = mass * semimajoraxis)
    # You can access the columns without including the data frame name, using with:
    with(planets, mass^2 + 3 * semimajoraxis)
    # which is more convenient than:
    planets$mass^2 + 3 * planets$semimajoraxis
    # Similarly, you can often access column data within other functions
    # e.g. plotting with a data frame:
    plot( semimajoraxis ~ mass, data=planets, log="xy")
Excluding columns from a data frame is also very easy, and can be done by reference to the column number or name:
    # keep the 2 original columns and add an extra (3rd) column:
    A <- transform(planets[, 1:2], dummy = 1:nrow(planets))
    A[, -3]                               # exclude extra column by number
    A[, -c(2:3)]                          # exclude multiple columns by number
    subset(A, select = -dummy)            # exclude extra column by name
    subset(A, select = -c(dummy, mass))   # exclude multiple columns by name
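Because a data frame is a list of equal-length columns, the list access methods apply to it directly. A minimal sketch with a small made-up data frame (df and its column names are illustrative only):

```r
df <- data.frame(id = 1:3, name = c("a", "b", "c"), stringsAsFactors = FALSE)

is.list(df)               # TRUE
length(df)                # 2, the number of columns (list elements)
df$name                   # a column as a vector
df[["name"]]              # the same column, using list-style double brackets
class(df["name"])         # "data.frame": single brackets keep the data frame structure
df[df$id > 1, "name"]     # "b" "c": matrix-style [row, column] indexing also works
```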
Data input/output in R
For a basic introduction, see getting started. See also the R Data Import/Export manual.
R recognises a variety of formats for reading in data. For tabular data, the basic command read.table offers a powerful range of options, which is also used by the shortform commands read.csv and read.delim, for reading in comma-separated values (e.g. output from a spreadsheet) and tab-delimited format data, respectively. Similarly, the command write.table is used to output tabular format data.
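A round trip through these functions might look like the following sketch. The data frame is made up, and temporary files are used so that nothing is left behind; stringsAsFactors=FALSE is set explicitly so the result does not depend on the R version.

```r
df <- data.frame(id = 1:3, label = c("a", "b", "c"), value = c(1.5, 2.5, NA),
                 stringsAsFactors = FALSE)

# Comma-separated: write with write.table, read back with read.csv
csv_file <- tempfile(fileext = ".csv")
write.table(df, file = csv_file, sep = ",", row.names = FALSE)
from_csv <- read.csv(csv_file, stringsAsFactors = FALSE)

# Tab-delimited: write with write.table, read back with read.delim
tsv_file <- tempfile(fileext = ".txt")
write.table(df, file = tsv_file, sep = "\t", row.names = FALSE, quote = FALSE)
from_tsv <- read.delim(tsv_file, stringsAsFactors = FALSE)

all.equal(df, from_csv)     # TRUE
all.equal(df, from_tsv)     # TRUE
unlink(c(csv_file, tsv_file))   # tidy up the temporary files
```

Setting row.names=FALSE stops write.table from adding a column of row names, which would otherwise come back as an extra column when the file is read.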
For fixed-width format data, use read.fwf. A more powerful method is to read in data directly into a vector or list, using scan. The following are useful functions for reading and writing a variety of data types. See their respective help pages for details.
- source: read in R commands from a file (ideal for loading pre-written chunks of code)
- save ; load: write / read R objects to / from a file (see below) (ideal for storing R data)
- scan: basic core function to read in data into a list/vector
- read.table ; write.table: generic table-format data
- read.csv: comma-separated values data (e.g. exported from spreadsheet)
- read.fwf: fixed-width format data
- read.fortran: fixed-format data files using Fortran-style format specifications
- read.DIF: Data Interchange Format (DIF) for data frames from single spreadsheets
- read.dcf: Debian Control File format
- read.ftable / write.ftable: "flat" contingency tables
- readBin ; writeBin: binary data
- readChar ; writeChar: character strings
- readLines ; writeLines: lines of text
- write: write data to a file
- dump: write text representation of an object
- dget ; dput: read or write an ASCII representation of an R object
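A few of these functions are sketched below on small temporary files. The file contents, column widths and column names are invented for illustration; see the help pages for the full range of arguments.

```r
# readLines / writeLines: whole lines of text
txt_file <- tempfile()
writeLines(c("1 2 3", "4 5 6"), txt_file)
lines <- readLines(txt_file)                 # two character strings

# scan: read straight into a vector, or into a list of fields per record
v   <- scan(txt_file, quiet = TRUE)          # numeric vector 1 2 3 4 5 6
rec <- scan(txt_file, what = list(a = 0, b = 0, c = 0), quiet = TRUE)
rec$b                                        # 2 5

# read.fwf: fields defined by character widths (here 2, 3 and 5)
fwf_file <- tempfile()
writeLines(c("AB123  4.5", "CD456 10.0"), fwf_file)
fw <- read.fwf(fwf_file, widths = c(2, 3, 5), col.names = c("code", "n", "x"))
fw$x                                         # 4.5 10.0

# dput / dget: text representation of an R object
obj <- list(id = 1:3, tag = "demo")
obj_file <- tempfile()
dput(obj, file = obj_file)
restored <- dget(obj_file)
all.equal(obj, restored)                     # TRUE

unlink(c(txt_file, fwf_file, obj_file))      # tidy up the temporary files
```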
Other packages for R data input / output
There are a number of separate packages for reading and writing data in different formats. The following are some common examples; see the R Data Import/Export manual for more information.
- foreign package : Minitab, S, SAS, SPSS, Stata, Systat, dBase, Octave formats; see library(help="foreign")
- RODBC package : for database sources supporting an ODBC interface
- gdata package : various tools, e.g. read.xls for reading data from Excel
- xtable package : Export tables to LaTeX or HTML
Entering & editing data within R
- data.entry ; de: convenient GUI tools for entering data
- edit: use text editor to modify an R object
- fix: invoke edit to change & overwrite an R object
Saving & loading R objects
save writes an external representation of R objects to the specified file; these can then be loaded back into R using load, e.g.
    a <- 1:10; b <- a^2
    save(a, b, file="mydata.RData")
    rm(a, b)                       # Remove (delete) objects
    load("mydata.RData")           # Load data into R
    tmp <- load("mydata.RData")
    tmp                            # Lists names of objects in file
    [1] "a" "b"
- At any time you can save the history of commands using:
    savehistory(file="my.Rhistory")
- and you can load such commands using:
    loadhistory(file="my.Rhistory")
- ls & objects: list the objects currently defined
- apropos: finds objects with names containing the specified string, e.g.
    apropos("max")
    [1] "cummax"    "max"       "max.col"   "pmax"      "pmax.int"  "promax"
    [7] "varimax"   "which.max"
For further information, you can find out more about how to access, manipulate, summarise, plot and analyse data using R.
Also, why not check out some of the graphs and plots shown in the R gallery, with the accompanying R source code used to create them.