See also the official "An Introduction to R" manual. There's also an R wiki with lots of useful information, tips etc.

```help.start()              # Load HTML help pages into browser
help(package)             # List help page for "package"
?package                  # Shortform for "help(package)"
help.search("keyword")    # Search help pages for "keyword"
?help                     # For more options
help(package=base)        # List tasks in package "base"
q()                       # Quit R
```
• Some extra packages are not loaded by default, but can be loaded as follows:
```library("package")
library()                 # List available packages to load
library(help="package")   # list package contents
```
• Commands can be separated by a semicolon (;) or a newline
• All characters after # are treated as comments (even on the same line as commands)
• Variables are assigned using <- (although = also works):
```x <- 4.5
y <- 2*x^2            # Use "<<-" for global assignment
```
• Vectors
```x <- c(1, 4, 5, 8)    # Concatenate values & assign to x
y <- 2*x^2            # Evaluate expression for each element in x
> y                   # Typing the name of a variable evaluates it
   2  32  50 128
```
If x is a vector, y will be a vector of the same length. The vector arithmetic in R means that an expression involving vectors of the same length (e.g. x+y, in the above example) will be evaluated element-wise for corresponding elements, without the need to loop explicitly over the values:
```> x+y
   3  36  55 136
> 1:10             # Same as seq(1, 10)
  1  2  3  4  5  6  7  8  9 10
> a <- 1:10
a[-1]              # Print whole vector *except* the first element
a[-c(1, 5)]        # Print whole vector except elements 1 & 5
a[1:5]             # Print elements 1 to 5
a[length(a)]       # Print last element, for any length of vector
head(a, n=3)       # Print the first n elements of a
tail(a, n=3)       # Print the last n elements of a
```
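The same element-wise arithmetic can be written as an explicit loop, but the vectorised form is shorter and faster; a quick sketch comparing the two:

```r
x <- c(1, 4, 5, 8)
y <- 2*x^2                  # Vectorised: one expression, no loop

y2 <- numeric(length(x))    # Pre-allocate a result vector
for (i in seq_along(x)) {   # Equivalent explicit loop (more verbose)
  y2[i] <- 2*x[i]^2
}
identical(y, y2)            # TRUE: both give 2 32 50 128
```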
• Demonstrations of certain topics are available:
```demo(graphics)     # Run graphics demonstration
demo()             # List topics for which demos are available
?demo              # List help for "demo" function
```
• Typing the name of a function lists its contents; typing a variable evaluates it, or you can use print(x).
• A great feature of R functions is that they offer a lot of control to the user, but provide sensible hidden defaults. A good example is the histogram function, which works very well without any explicit control:

• First, generate a set of 10,000 Gaussian distributed random numbers:
```data <- rnorm(1e4)  # Gaussian distributed numbers with mean=0 & sigma=1
hist(data)          # Plots a histogram, with sensible bin choices by default
```
See ?hist for the full range of control parameters available, e.g. to specify 7 bins:
```hist(data, breaks=7)
```
• # Data Input/Output

• Assuming you have a file (file.dat) containing the following data, with either spaces or tabs between fields:
```r       x       y
1       4.2     14.2
2       2.4     64.8
3       8.76    63.4
4       5.9     32.2
5       3.4     89.6
```
Now read this file into R:
```> inp <- read.table("file.dat", header=T)      # Read data into data frame
> inp                       # Print contents of inp
> class(inp)                # Print the type of object that inp is:
 "data.frame"

> colnames(inp)             # Print the column names for inp:
 "r" "x" "y"                # This is simply a vector that can easily be changed:
> colnames(inp) <- c("radius", "temperature", "time")
> colnames(inp)
 "radius"      "temperature" "time"
```
Note that you can refer to R objects, names etc. using the first few letters of the full name, provided that is unambiguous, e.g.:
```> inp$r                     # Print the radius column
```
but note what happens if the information is ambiguous or if the column doesn't exist:
```> inp$t                     # Could match "inp$temperature" or "inp$time"
NULL
> inp$wibble                # "wibble" column doesn't exist
NULL
```
Alternatively, you can refer to the columns by number:
```> inp[[1]]  # Print data as a vector (use "[[" & "]]")
 1 2 3 4 5
> inp[1]    # Print data as a data.frame (only use "[" and "]")
```
• Writing data out to a file (if no filename is specified, which is the default, the output is written to the console):
```> write.table(inp, quote=F, row.names=F, col.names=T)
1 4.2 14.2
2 2.4 64.8
3 8.76 63.4
4 5.9 32.2
5 3.4 89.6
```
By default the columns are separated by whitespace, but you can change this with the sep= option (see ?write.table for details), e.g. use a : with a tab either side:
```> write.table(inp, quote=F, row.names=F, col.names=T, sep="\t:\t")
1       :       4.2     :       14.2
2       :       2.4     :       64.8
3       :       8.76    :       63.4
4       :       5.9     :       32.2
5       :       3.4     :       89.6
```
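For comma-separated files, base R provides the convenience wrappers write.csv() and read.csv(); a minimal round trip (the filename is just an example):

```r
inp <- data.frame(r=1:5, x=c(4.2, 2.4, 8.76, 5.9, 3.4),
                  y=c(14.2, 64.8, 63.4, 32.2, 89.6))
write.csv(inp, file="file.csv", row.names=FALSE)  # Write comma-separated file
inp2 <- read.csv("file.csv")                      # header=TRUE is the default
all.equal(inp, inp2)                              # Round trip preserves data
```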
• Other types of connection
```?connections     # Help page on opening/closing connections
# You can also refer to URLs for files, e.g.
source("http://www.sr.bham.ac.uk/~ajrs/R/scripts/errorbar.R")
```
```source("commands.R")
# Or, to load from a path specified by an environment variable
#    "\$ENV_VAR/commands.R"
source(paste(Sys.getenv("ENV_VAR"),"/commands.R",sep=""))
```
• Saving data
```?save
save(inp, file="data.RData")   # Save R object(s) to a file
load("data.RData")
# - this will load the saved R objects into the current session, with
#   the same properties, i.e. all the variables will have the same names
#   and contents
```
Note that when you quit R (by typing q()), it asks if you want to save the workspace image. If you answer yes (y), it writes two files to the current directory: .RData and .Rhistory. The former contains the contents of the saved session (i.e. all the R objects in memory); the latter is a record of the command history (a simple text file you can look at).

At any time you can save the history of commands using:
```savehistory(file="my.Rhistory")
```
• # Plotting data

• Useful functions:
```?plot           # Help page for plot command
?par            # Help page for graphics parameter control
?Devices        # or "?device"
# (R can output in postscript, PDF, bitmap, PNG, JPEG and more formats)
dev.list()      # list graphics devices
colours()       # or "colors()" List all available colours
?plotmath       # Help page for writing maths expressions in R
```
Tip: to generate png, jpeg etc. files, I find it's best to create pdf versions in R, then convert them in gimp (having specified strong antialiasing, if asked, when loading the pdf file into gimp), otherwise the figures are "blocky" (i.e. suffer from aliasing).
• To create an output file copy of a plot for printing or including in a document etc.
```dev.copy2pdf(file="myplot.pdf")   # } Copy contents of current graphics device
dev.copy2eps(file="myplot.eps")   # }  to a PDF or Postscript file
dev.copy()                        # More general device copying (see ?dev.copy)
```
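Alternatively, open a file-based graphics device directly, plot into it, then close it; all plotting commands go straight to the file (the filename and size are illustrative):

```r
pdf(file="myplot.pdf", width=6, height=5)  # Open a PDF device (size in inches)
plot(1:10, (1:10)^2, type="b")             # This is drawn into the file
dev.off()                                  # Close the device & finish the file
```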
• Adding more datasets to a plot:
```x <- 1:10; y <- x^2
z <- 0.9*x^2
plot(x, y)                     # Plot original data
points(x, z, pch="+")          # Add new data, with different symbols
lines(x,y)                     # Add a solid line for original data
lines(x, z, col="red", lty=2)  # Add a red dashed line for new data
curve(1.1*x^2, add=T, lty=3, col="blue")    # Plot a function as a curve
text(2, 60, "An annotation")   # Write text on plot
abline(h=50)                   # Add a horizontal line
abline(v=3, col="orange")      # Add a vertical line
```
• Adding error bars to a plot, using a custom script:
```source("http://www.sr.bham.ac.uk/~ajrs/R/scripts/errorbar.R")  # source code
# Create some data to plot:
x <- seq(1,10); y <- x^2
xlo <- 0.9*x; xup <- 1.08*x; ylo <- 0.85*y; yup <- 1.2*y
```
Plot graph, but don't show the points (type="n") & plot errors:
```plot(x, y, log="xy", type="n")
errorbar(x, xlo, xup, y, ylo, yup)
```
?plot gives more info on plotting options (as does ?par). To use different line colours, styles, thicknesses & errorbar types for different points:
```errorbar(x, xlo, xup, y, ylo, yup, col=rainbow(length(x)), lty=1:5,
type=c("b", "d", "c", "x"), lwd=1:3)
```
• Plotting a function or equation
```curve(x^2, from=1, to=100)             # Plot a function over a given range
curve(x^2, from=1, to=100, log="xy")   # With log-log axes
```
Plot chi-squared probability vs. reduced chi-squared for 10 and then 100 degrees of freedom
```dof <- 10
curve(pchisq(x*dof,df=dof), from=0.1, to=3, xlab="Reduced chi-squared",
ylab="Chi-sq distribution function")
abline(v=1, lty=2, col="blue")         # Plot dashed line for reduced chisq=1
dof <- 100
curve(pchisq(x*dof, df=dof), from=0.1, to=3, xlab="Reduced chi-squared",
ylab="Chi-sq distribution function")
abline(v=1, lty=2, col="blue")         # Plot dashed line for reduced chisq=1
```
Define & plot custom function:
```myfun <- function(x) x^3 + log(x)*sin(x)

#--Plot dotted (lty=3) red line, with 3x normal line width (lwd=3):
curve(myfun, from=1, to=10, lty=3, col="red", lwd=3)

#--Add a legend, inset slightly from top left of plot:
legend("topleft", inset=0.05, "My function", lty=3, lwd=3, col="red")
```
• Plotting maths symbols:
```demo(plotmath)    # Shows a demonstration of many examples
```
• # Analysis

• Basic stuff
```x <- rnorm(100)    # Create 100 Gaussian-distributed random numbers
hist(x)            # Plot histogram of x
summary(x)         # Some basic statistical info
d <- density(x)    # Compute kernel density estimate
d                  # Print info on density analysis ("?density" for details)
plot(d)            # Plot density curve for x
plot(density(x))   # Plot density curve directly
```
Plot histogram in terms of probability & also overlay density plot:
```hist(x, probability=T)
lines(density(x), col="red")   # overlay smoothed density curve
#--Standard stuff:
mean(x); median(x); max(x); min(x); sum(x); weighted.mean(x)
sd(x)              # Standard deviation
var(x)             # Variance
mad(x)             # median absolute deviation (robust sd)
```
• Testing the robustness of statistical estimators
```?replicate         # Repeatedly evaluate an expression
```
Calculate standard deviation of 100 Gaussian-distributed random numbers (NB default sd is 1; can specify with rnorm(100, sd=3) or whatever)
```sd(rnorm(100))
```
Now, let's run N=5 Monte Carlo simulations of the above test:
```replicate(5, sd(rnorm(100)))
```
and then compute the mean of the N=5 values:
```mean(replicate(5, sd(rnorm(100))))
```
Obviously 5 is rather small for a number of MC sims (repeat the above command & see how the answer fluctuates). Let's try 10,000 instead:
```mean(replicate(1e4, sd(rnorm(100))))
```
With a sample size of 100, the measured sd closely matches the actual value used to create the random data (sd=1). However, see what happens when the sample size decreases:
```mean(replicate(1e4, sd(rnorm(10))))    # almost a 3% bias low
mean(replicate(1e4, sd(rnorm(3))))     # >10% bias low
```
You can see the bias more clearly with a histogram or density plot:
```a <- replicate(1e4, sd(rnorm(3)))
hist(a, breaks=100, probability=T)     # Specify lots of bins
lines(density(a), col="red")           # Overlay kernel density estimate
abline(v=1, col="blue", lwd=3)         # Mark true value with thick blue line
```
This bias is important to consider when estimating the velocity dispersion of a stellar or galaxy system with a small number of tracers. The bias is even worse for the robust sd estimator mad():
```mean(replicate(1e4, mad(rnorm(10))))   # ~9% bias low
mean(replicate(1e4, mad(rnorm(3))))    # >30% bias low
```
However, in the presence of outliers, mad() is robust compared to sd() as seen with the following demonstration. First, create a function to add a 5 sigma outlier to a Gaussian random distribution:
```outlier <- function(N, mean=0, sd=1) {
    a <- rnorm(N, mean=mean, sd=sd)
    a[1] <- mean + 5*sd   # Replace 1st value with 5 sigma outlier
    return(a)
}
```
Now compare the performance of mad() vs. sd():
```mean(replicate(1e4, mad(outlier(10))))
mean(replicate(1e4, sd(outlier(10))))
```
You can also compare the median vs. mean:
```mean(replicate(1e4, median(outlier(10))))
mean(replicate(1e4, mean(outlier(10))))
```
...and without the outlier:
```mean(replicate(1e4, median(rnorm(10))))
mean(replicate(1e4, mean(rnorm(10))))
```
• Kolmogorov-Smirnov test, taken from manual page (?ks.test)
```x <- rnorm(50)   # 50 Gaussian random numbers
y <- runif(30)   # 30 uniform random numbers
# Do x and y come from the same distribution?
ks.test(x, y)
```
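ks.test can also compare a sample against a theoretical distribution, by naming its cumulative distribution function (a one-sample test; see ?ks.test):

```r
set.seed(1)                  # Reproducible random numbers
x <- rnorm(50)
ks.test(x, "pnorm")          # Is x consistent with a standard normal?
ks.test(x, "pnorm", mean=1)  # ...or with a normal of mean 1?
```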
• Correlation tests
Load in some data (see section on data frames, below):
```dfile <- "http://www.sr.bham.ac.uk/~ajrs/papers/sanderson06/table1.txt"
A <- read.table(dfile, header=TRUE)     # Read data into data frame "A"
cor.test(A$z, A$Tx)   # Specify X & Y data separately
cor.test(~ z + Tx, data=A)  # Alternative format when using a data frame ("A")
cor.test(A$z, A$Tx, method="spearman")  # } Alternative methods
cor.test(A$z, A$Tx, method="kendall")   # }
```
• Spline interpolation of data
```x <- 1:20
y <- jitter(x^2, factor=20)       # Add some noise to vector
sf <- splinefun(x, y)             # Perform cubic spline interpolation

# - note that "sf" is a function:
sf(1.5)                           # Evaluate spline function at x=1.5
plot(x,y); curve(sf(x), add=T)    # Plot data points & spline curve
```
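For simple linear interpolation, approxfun works the same way (it also returns a function):

```r
x <- 1:20
y <- x^2
lf <- approxfun(x, y)    # Linear interpolation between the data points
lf(1.5)                  # Halfway between y=1 and y=4, i.e. 2.5
```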
• Scatter plot smoothing
```#--Using above data frame ("A"):
plot(Tx ~ z, data=A)
#--Return X & Y values of locally-weighted polynomial regression:
lowess(A$z, A$Tx)
#--Plot smoothed data as line on graph:
lines(lowess(A$z, A$Tx), col="red")
```
• Numerical integration of a function (see Functions section below)
```fun <- function(x, norm, index) norm*x^index    # Create simple function

# See "?integrate" for details. Note that:
#  1) the names of arguments in "fun" must not match those of arguments
#     in "integrate()" itself.
#  2) fun must be able to return a vector, if supplied with a vector
#     (i.e. not just a single value)
#
i <- integrate(fun, lower=1, upper=10, norm=0.5, index=2)
> i
166.5 with absolute error < 1.8e-12
> class(i)                     # Note that integrate() returns a list
 "integrate"
> names(i)                     # Show named components in list
 "value"        "abs.error"    "subdivisions" "message"      "call"
> i$value                      # If you want the integral value alone
 166.5
```
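As a sanity check, this integral has the analytic answer 0.5*(10^3 - 1^3)/3 = 166.5, so the numerical result can be verified directly:

```r
fun <- function(x, norm, index) norm*x^index
i <- integrate(fun, lower=1, upper=10, norm=0.5, index=2)
analytic <- 0.5*(10^3 - 1^3)/3          # Integral of 0.5*x^2 from 1 to 10
abs(i$value - analytic) < i$abs.error   # TRUE: agrees to within quoted error
```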
• Regression
(see also the more thorough tutorial from Penn State's Center for Astrostatistics.) Load in some data (see section on data frames, below; this dataset is described in more detail in the section on factors, below):
```dfile <- "http://www.sr.bham.ac.uk/~ajrs/papers/sanderson06/table1.txt"
A <- read.table(dfile, header=TRUE)   # Read data into data frame "A"
names(A)                   # Print column names in data frame
m <- lm(Tx ~ z, data=A)    # Fit linear model
summary(m)                 # Show best-fit parameters & errors
```
Specify 4 plots on same page, in 2x2 configuration:
```layout(matrix(1:4, nrow=2, ncol=2))
layout.show(4)             # Show plot layout for 4 plots
plot(m)                    # Shows useful diagnostics of the fit (4 plots)
```
Plot data & best-fit model:
```plot(Tx ~ z, A)
abline(m, col="blue")      # add best-fit straight line model
```
Plot residuals:
```plot(A$z, resid(m))
```
Something fancier: show data, model & residuals in 2 panels:
```layout(matrix(1:2), heights=c(2,1))  # Two panels, top one twice the height
layout.show(2)             # Show plot layout for 2 plots
plot(Tx ~ z, data=A)
abline(m, col="blue")
plot(A$z, resid(m), main="Residuals", col="blue", pch=20)
```
Nonlinear least-squares estimates of the parameters of a nonlinear model:
```?nls
```
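A minimal sketch of nls in action, fitting a power law to simulated noisy data (the model form, noise level and starting values are illustrative assumptions, not from the text above):

```r
set.seed(42)                                # Reproducible random numbers
x <- 1:20
y <- 2*x^1.5 + rnorm(20, sd=2)              # Power-law data with Gaussian noise
m <- nls(y ~ a*x^b, start=list(a=1, b=1))   # Starting guesses for a & b
summary(m)                                  # Best-fit parameters & errors
coef(m)                                     # Should recover a ~ 2, b ~ 1.5
```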
Since R was developed with experimental (rather than observational) sciences in mind, it is not geared up to handle errors in both X and Y directions, so the existing algorithms all perform regression in one direction only. However, unweighted orthogonal regression is equivalent to principal components analysis.
• # Data Frames

• Create a data frame:
```A <- data.frame(a=1:3, b=4:6, c=7:9)
colnames(A)        # List column names for A
rownames(A)        # List row names (will just be numbers if not specified)
```

#### Accessing data frames

```A$a                 # Print the column named "a"
A[c("a", "c")]      # Print the columns listed in the square brackets
A[2]                # Print column 2 of A (type=data frame)
A[[2]]              # Print column 2 of A (type=vector)
A[,2]               # Print column 2 of A (type=vector)
A[2,]               # Print row 2 of A
A[1, ]              # Print the 1st row of data frame A
A[2:3, ]            # Print rows 2 to 3
A[, -1]             # Print everything except the 1st column of A
class(A)            # Print type of object (data frame)
class(A[[2]])       # Print type of column 2 (integer, i.e. a vector)
A[A$a%%2 == 1, ]    # Print rows of A where column "a" has an odd number
A[A$b<6 & A$c>7, ]  # Print rows of A according to a combination of constraints
A[A$b<6 | A$c>7, ]  # ("&" = logical AND; "|" = logical OR)
A$a <- NULL         # Delete a column from a data frame
```
To retrieve a named column, where the column name is itself a variable:
```name <- "b"
A[name]             # Produces a data frame
A[[name]]           # Produces a vector
```

#### Manipulating data frames

```B <- data.frame(d=11:13, e=14:16, f=17:19)
A + 1              # Perform arithmetic on each element of A
A * B              # Operates on each element, if A & B have same dimensions
A[order(A$a, decreasing=T), ] # Sort A according to a specified column
t(A)               # Transpose A (swap rows for columns & vice versa)

# NB the type of t(A) is no longer a data frame, but you can make it so:
as.data.frame(t(A))
```
To join data frame B to A (side by side), you could use:
```A <- data.frame(A, B)     # Can also use "cbind(A, B)"
```
However, in a situation where this operation might be occurring iteratively, you could end up with multiple, repeated instances of B:
```A <- data.frame(A, B)
# To recover the original data, use:
A <- A[, 1:3]
```
A better way might be to use:
```A[colnames(B)] <- B
# - this will replace any existing named columns from B that are already
# present in A
```
To append one data frame below another (instead of side-by-side), use:
```rbind(A, A, A)
```
Note that if column names are specified, they must match in all data.frames for rbind to work:
```rbind(A, B)    # Fails!
colnames(B) <- colnames(A)  # Use same column names for B as for A
rbind(A, B)    # Works
```
• #### Working with data frames

• You can use attach(A) and detach(A) to add/remove data frame to/from the current search path, allowing you to use the column names as if they were standalone vectors. However, this approach is usually best avoided, since it can cause confusion if any objects exist with the same name as a column of an attached data frame. It's better to use with to achieve the same effect within a controlled setting:
• The with function enables you to work with column names without having to prefix the data frame name, as follows
```A <- data.frame(a=1:20, b=rnorm(20))
with(A, a^2 + 2*b)
```
• This principle is used in the transform function, to allow you to construct new columns in the data frame using references to the other column names:
```transform(A, c=a^2 + 2*b)  # Add new column to data frame
```
• Similarly, the subset function works in the same way to enable easy filtering of data frames:
```subset(A, b >= 0 & a%%2 == 0)  # Returns rows with positive b & even numbered a
```
• You can also easily perform database join operations, using merge:
```B <- data.frame(a=1:7, x=runif(7))
merge(A, B)   # Return rows with same "a", combining unique columns from A & B
```
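By default merge keeps only rows whose key appears in both data frames (an inner join); use all=TRUE for an outer join, which keeps unmatched rows and fills the gaps with NA:

```r
A <- data.frame(a=1:20, b=rnorm(20))
B <- data.frame(a=1:7, x=runif(7))
merge(A, B)            # Inner join: only rows with a = 1..7
merge(A, B, all=TRUE)  # Outer join: all 20 rows; x is NA for a > 7
```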
• # Functions

• Basics
```cat            # Type function name without brackets to list contents
args(cat)      # Return arguments of any function
body(cat)      # Return main body of function
formals(fun)   # Get or set the formal arguments of a function
debug(fun); undebug(fun)   # Set or unset the debugging flag on a function
```
```> fun <- function(x, a, b, c) (a*x^2) + (b*x^2) + c
> fun(3, 1, 2, 3)
 30
> fun(5, 1, 2, 3)
 78
```
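Arguments can be given default values, and supplied by name in any order when calling the function:

```r
fun <- function(x, a=1, b=2, c=3) (a*x^2) + (b*x^2) + c
fun(3)           # All defaults: (1*9) + (2*9) + 3 = 30
fun(3, c=0)      # Override only "c", by name: 27
fun(x=3, b=0)    # Named arguments can come in any order: 12
```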
• A more complicated example of a function. First, create some data:
```set.seed(123)                    # allow reproducible random numbers
a <- rnorm(1000, mean=10, sd=1)  # 1000 Gaussian random numbers
b <- rnorm(100, mean=50, sd=15)  # smaller population of higher numbers
x <- c(a, b)                     # Combine datasets
hist(x)                          # Shows outlier population clearly
sd(x)                            # Strongly biased by outliers
mad(x)                           # Robustly estimates sd of main sample
mean(x)                          # biased
median(x)                        # robust
```
Now create a simple sigma clipping function (you can paste the following straight into R):
```sigma.clip <- function(vec, nclip=3, N.max=5) {
    mean <- mean(vec); sigma <- sd(vec)
    clip.lo <- mean - (nclip*sigma)
    clip.up <- mean + (nclip*sigma)
    vec <- vec[vec < clip.up & vec > clip.lo]    # Remove outliers
    if ( N.max > 0 ) {
        N.max <- N.max - 1
        # Note the use of recursion here (i.e. the function calls itself):
        vec <- Recall(vec, nclip=nclip, N.max=N.max)
    }
    return(vec)
}
```
Now apply the function to the test dataset:
```new <- sigma.clip(x)
hist(new)              # Only the main population remains!
length(new)            # } Compare numbers of
length(x)              # }  values before & after clipping
```
• # Factors

Factors are a vector type which treats the values as character strings which form part of a base set of values (called "levels"). The result of plotting a factor is to produce a frequency histogram of the number of entries in each level (see ?factor for more info).

• Basics
```a <- rpois(100, 3)   # Create 100 Poisson distributed random numbers (lambda=3)
hist(a)              # Plot frequency histogram of data
class(a)             # List type of vector ("integer")
b <- as.factor(a)    # Change type to factor
```
List type of vector b ("factor"): also lists the "levels", i.e. all the different types of value:
```b
```
Now, plot b, to get a barchart showing the frequency of entries in each level:
```plot(b)
table(b)             # a tabular summary of the same information
```
• A more practical example. Read in table of data (from Sanderson et al. 2006), which refers to a sample of 20 clusters of galaxies with Chandra X-ray data:
```file <- "http://www.sr.bham.ac.uk/~ajrs/papers/sanderson06/table1.txt"
A <- read.table(file, header=TRUE)   # Read data into data frame "A"
```
By default, read.table() will treat non-numeric values as factors. Sometimes this is annoying, in which case use as.is=T; you can specify the type of all the input columns explicitly -- see ?read.table.
```A[1,]           # Print 1st row of data frame (will also show column names)
plot(A$det)     # Plot the numbers of each detector type
plot(A$cctype)  # Plot the numbers of cool-core and non-cool core clusters
```
The det column is the detector used for the observation of each cluster: S for ACIS-S and I for ACIS-I. The cctype denotes the cool core type: CC for cool core, NCC for non-cool core.

Now, plot one factor against another, to get a plot showing the number of overlaps between the two categories (the cool-core clusters are almost all observed with ACIS-S):
```plot(cctype ~ det, data=A, xlab="ACIS Detector", ylab="Cool-core type")
```
You can easily explore the properties of the different categories, by plotting a factor against a numeric variable. The result is to show a boxplot (see ?boxplot) of the numeric variable for each category. For example, compare the mean temperature of cool-core & non-cool core clusters:
```plot(kT ~ cctype, data=A)
```
and compare the redshifts of clusters observed with ACIS-S & ACIS-I:
```plot(z ~ det, data=A)
```
You can define your own factor categories by partitioning a numeric vector of data into discrete bins using cut(), as shown here:
• Converting a numeric vector into factors
Split cluster redshift range into 2 bins of equal width (nearer & further clusters), and assign each z point accordingly; create new column of data frame with the bin assignments (A\$z.bin):
```A$z.bin <- cut(A$z, 2)
A$z.bin                # Show levels & bin assignments
plot(A$z.bin)          # Show bar chart
```
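cut also accepts explicit break points and labels, which is often clearer than equal-width bins (the break values and labels here are illustrative, not from the cluster sample):

```r
z <- c(0.01, 0.05, 0.12, 0.2, 0.35)                # Example redshifts
z.bin <- cut(z, breaks=c(0, 0.1, 0.3, 1),
             labels=c("near", "mid", "far"))       # Assign each z to a bin
table(z.bin)                                       # Counts: 2 near, 2 mid, 1 far
```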
Plot mean temperatures of nearer & further clusters. Since this is essentially a flux-limited sample of galaxy clusters, the more distant clusters (higher redshift) are hotter (and hence more X-ray luminous - an example of the common selection effect known as Malmquist bias):
```plot(Tx ~ z.bin, data=A)
plot(Tx ~ z, data=A)      # More clearly seen!
```
Check if redshifts of (non) cool-core clusters differ:
```plot(cctype ~ z.bin, data=A)  # Equally mixed
```
Now, let's say you wanted to colour-code the different CC types. First, create a vector of colours by copying the cctype column (you can add this as a column to the data frame, or keep it as a separate vector):
```colour <- A$cctype
```
Now all you need to do is change the names of the levels:
```levels(colour)                        # Show existing levels
levels(colour) <- c("blue", "red")    # specify blue=CC, red=NCC
colour           # Show data (& levels, since it is a factor)
```
Now change the type of the vector to character (i.e. not a factor):
```col <- as.character(colour)
class(col)                # Confirm change of data type
col                          # } Print values and
colour                       # }  note the difference
```
Now plot mean temperature vs. redshift, colour coded by CC type:
```plot(Tx ~ z, A, col=col)
```
Why not add a plot legend:
```legend(x="topleft", legend=c(levels(A$cctype)), text.col=levels(colour))
```
Now for something a little bit fancier:
```plot(Tx ~ z, A, col=col, xlab="Redshift", ylab="Temperature (keV)")
legend(x="topleft", c(levels(A$cctype)), col=levels(colour),
text.col=levels(colour), inset=0.02, pch=1)
```

For further information, you can find out more about how to access, manipulate, summarise, plot and analyse data using R.

Also, why not check out some of the graphs and plots shown in the R Graph Gallery, with the accompanying R source code used to create them.