R provides a variety of methods for summarising data in tabular and other forms.

# View data structure

• Before you do anything else, it is important to understand the structure of your data and that of any objects derived from it.
```r
A <- data.frame(a=LETTERS[1:10], x=1:10)
class(A)          # "data.frame"
sapply(A, class)  # show classes of all columns
typeof(A)         # "list"
names(A)          # show list components
dim(A)            # dimensions of object, if any
head(A)           # extract first few (default 6) parts
tail(A, 1)        # extract last row
head(1:10, -1)    # extract everything except the last element
```
• It is sometimes useful to work with a smaller version of a large data frame, by creating a representative subset of the data, via random sampling:
```r
A.small <- A[sample(nrow(A), 4), ]   # select 4 rows at random
```
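Alongside the functions above, `str()` prints a compact overview of an object's structure (class, dimensions, and the type and first few values of each column). A random subset is only reproducible if the random number generator is seeded first. A minimal sketch combining both, reusing the toy data frame `A` from above; the seed value 42 is arbitrary:

```r
A <- data.frame(a = LETTERS[1:10], x = 1:10)
str(A)                      # compact overview: class, dimensions, column types, first values

set.seed(42)                # arbitrary seed, so the "random" subset can be reproduced
A.small <- A[sample(nrow(A), 4), ]
str(A.small)                # same columns, only 4 rows
```

Re-running the `set.seed(42)` line before `sample()` always selects the same four rows.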

# Basic numerical summaries

• Generate and summarise some random numbers:
```r
a <- rnorm(50)
summary(a)           # gives min, max, mean, median, 1st & 3rd quartiles
min(a); max(a)       # }
range(a)             # } self-explanatory
mean(a); median(a)   # }
sd(a); mad(a)        # standard deviation, median absolute deviation
IQR(a)               # interquartile range
quantile(a)          # quartiles (by default)
quantile(a, c(1, 3)/4)  # specific percentiles (25% & 75% in this case)
```
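Several of these summaries are closely related: `IQR()` is the difference between the 75% and 25% quantiles, and `mad()` by default multiplies the raw median absolute deviation by 1.4826, so that it estimates the standard deviation for normally distributed data. A short sketch checking both relationships (the seed is arbitrary):

```r
set.seed(1)                               # reproducible random numbers
a <- rnorm(50)

q <- quantile(a, c(1, 3)/4)               # 25% and 75% quantiles
iqr.manual <- unname(q[2] - q[1])         # upper minus lower quartile
all.equal(IQR(a), iqr.manual)             # TRUE

raw.mad <- median(abs(a - median(a)))     # unscaled median absolute deviation
all.equal(mad(a), 1.4826 * raw.mad)       # TRUE: default scaling constant is 1.4826
mad(a, constant = 1)                      # the unscaled version, computed directly
```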
• Data frame summaries:
```r
A <- data.frame(a=rnorm(10), b=rpois(10, lambda=10))
summary(A)           # summarise data frame
apply(A, 1, mean)    # calculate row means
apply(A, 2, mean)    # calculate column means: same as colMeans(A)
                     # (mean(A) no longer works on a data frame: it returns NA with a warning)
```
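For numeric data frames and matrices, `rowMeans()` and `colMeans()` (and the corresponding `rowSums()`/`colSums()`) give the same results as the `apply()` calls above and are usually faster. Since a data frame is a list of columns, `sapply(A, mean)` is another way to get column means. A minimal check, assuming all columns are numeric:

```r
set.seed(2)                                   # arbitrary seed
A <- data.frame(a = rnorm(10), b = rpois(10, lambda = 10))

all.equal(apply(A, 2, mean), colMeans(A))     # TRUE: column means
all.equal(apply(A, 1, mean), rowMeans(A))     # TRUE: row means
all.equal(sapply(A, mean), colMeans(A))       # TRUE: mean of each list component
```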
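• which.min and which.max return the index (position) of the first occurrence of the lowest/highest value:
```r
set.seed(123)     # allow reproducible random numbers
x <- sample(10)   # a random permutation of 1:10
which.max(x)      # position of the largest value (the exact result depends on the
                  # R version, as the default sample() algorithm changed in R 3.6.0)
x[which.max(x)]   # the largest value itself: 10
```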
This can be used in a data frame to extract the corresponding row containing the min/max value of one of the columns:
```r
A <- data.frame(x=rnorm(10), y=runif(10))
A[which.min(A$x), ]
#--Alternatively:
subset(A, x == min(x))
```
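The two approaches differ when the extreme value is tied: which.min and which.max report only the first position, whereas `which(x == max(x))` and the `subset()` form return every matching element or row. A small sketch with a hand-made (hypothetical) vector whose maximum occurs twice:

```r
x <- c(4, 9, 2, 9, 1)          # hypothetical data: the maximum (9) appears twice
which.max(x)                   # 2: only the first occurrence
which(x == max(x))             # 2 4: all occurrences

A <- data.frame(x = x, label = letters[1:5])
A[which.max(A$x), ]            # one row (label "b")
subset(A, x == max(x))         # both tied rows (labels "b" and "d")
```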
• Other summaries:
```r
x <- rnorm(100)
fivenum(x)       # Tukey's five number summary, used to construct a boxplot:
boxplot(x)       # see ?boxplot.stats for more details
stem(x)          # a stem-and-leaf plot
```
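The link between `fivenum()` and the boxplot can be checked with `boxplot.stats()`, which `boxplot()` uses to compute what it draws. Its `stats` component shares the hinges and median with `fivenum()`, but the whisker ends are pulled in to the most extreme points lying within 1.5 times the hinge spread of the box (the default `coef = 1.5`); points beyond that are returned in `out`. A minimal sketch, with one artificial outlier appended for illustration:

```r
set.seed(3)                           # arbitrary seed
x <- c(rnorm(100), 6)                 # append one artificial outlier at 6
f <- fivenum(x)                       # min, lower hinge, median, upper hinge, max
b <- boxplot.stats(x)                 # the statistics boxplot() draws

f[2:4]; b$stats[2:4]                  # hinges and median agree
b$out                                 # points beyond the whiskers (includes 6)
b$stats[5] < f[5]                     # TRUE: upper whisker stops short of the outlier
```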
• Matrix summaries:
```r
A <- matrix(rnorm(50), nrow=10)    # create 10x5 random number matrix
colSums(A); rowSums(A); colMeans(A); rowMeans(A)    # self-explanatory
max.col(A)    # column position of the maximum in each row (ties broken at random by default), same as:
which.max(A[1,]); which.max(A[2,])    # etc.
```
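Because `max.col()` breaks ties at random by default, passing `ties.method = "first"` makes it match which.max applied to every row, which `apply()` can do in a single call. A short check on a random matrix (where ties are practically impossible), plus a tiny hand-made matrix with deliberate ties:

```r
set.seed(4)                                   # arbitrary seed
A <- matrix(rnorm(50), nrow = 10)             # 10x5 random number matrix
by.row <- apply(A, 1, which.max)              # which.max() for every row at once
identical(max.col(A, ties.method = "first"), by.row)   # TRUE

B <- rbind(c(1, 5, 5), c(7, 2, 7))           # hypothetical rows with tied maxima
max.col(B, ties.method = "first")             # 2 1
max.col(B, ties.method = "last")              # 3 3
```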

# Tables

• Load some data on a sample of 20 galaxy clusters with a categorical classification status (cctype) indicating whether there is a cool core or not and a factor (det) specifying which of two detectors was used to make the X-ray observation of the cluster:
```r
file <- "http://www.sr.bham.ac.uk/~ajrs/papers/sanderson09/sanderson09_table2.txt"
a <- read.table(file, header=TRUE)   # assumes a whitespace-delimited file with a header row
table(a$cctype)                   # count numbers in each cctype category
table(a$cctype, a$det)            # 2-way table
xtabs(~ cctype + det, data=a)     # alternative (formula) syntax
addmargins(xtabs(~ cctype + det, data=a))   # add row/col summary (default is sum)
prop.table(xtabs(~ cctype + det, data=a))   # show counts as proportions of total
```
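`prop.table()` also accepts a `margin` argument, so proportions can be taken within each row (`margin = 1`) or each column (`margin = 2`) instead of over the whole table. Since the remote file above may not always be reachable, this sketch uses a small hypothetical data frame `d` with made-up `cctype` and `det` values, purely for illustration:

```r
d <- data.frame(
  cctype = c("CC", "CC", "non-CC", "CC", "non-CC", "non-CC", "CC", "non-CC"),
  det    = c("S",  "S",  "I",      "I",  "S",      "I",      "S",  "I")
)  # hypothetical values, not the real cluster sample

tab <- xtabs(~ cctype + det, data = d)
tab                              # 2-way table of counts
prop.table(tab, margin = 1)      # proportions within each cctype (rows sum to 1)
prop.table(tab, margin = 2)      # proportions within each det (columns sum to 1)
margin.table(tab, 1)             # row totals, same counts as table(d$cctype)
```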
• To test whether the input factors are independent of each other:
```r
chisq.test(xtabs(~ det + cctype, data=a), simulate.p.value=TRUE)
```
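There is marginal evidence (simulated p ≈ 0.07; the exact value varies slightly between runs unless a random seed is set) of an association between the two factors: clusters observed with ACIS-S are more likely to have a cool core than not.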
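With `simulate.p.value = TRUE`, the p-value is estimated from `B` randomly generated tables (2000 by default), so setting a seed makes it reproducible and a larger `B` reduces the Monte Carlo error. For small 2x2 tables, Fisher's exact test (`fisher.test()`) is a common alternative. A sketch on a hypothetical 2x2 table of counts, not the real cluster data:

```r
tab <- matrix(c(2, 7, 8, 3), nrow = 2,
              dimnames = list(det = c("I", "S"), cctype = c("CC", "non-CC")))  # hypothetical counts

set.seed(5)
p1 <- chisq.test(tab, simulate.p.value = TRUE, B = 10000)$p.value
set.seed(5)
p2 <- chisq.test(tab, simulate.p.value = TRUE, B = 10000)$p.value
identical(p1, p2)                 # TRUE: same seed gives the same simulated p-value

fisher.test(tab)$p.value          # exact test on the same 2x2 table
```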

# Calculate aggregate statistics

• Calculate numerical summaries for subsets of a data frame (using the dataset loaded above):
```r
> aggregate(kT ~ cctype, data=a, FUN=mean)
  cctype       kT
1     CC 5.121111
2 non-CC 6.146364

# mean cluster redshift of each cctype for each detector:
> aggregate(z ~ cctype + det, data=a, FUN=mean)
  cctype det          z
1     CC   I 0.06070000
2 non-CC   I 0.05137500
3     CC   S 0.04105714
4 non-CC   S 0.03636667

#--Show mean values of a few quantities, for each cctype:
> aggregate(. ~ cctype, data=a[c("cctype", "z", "kT", "Z", "S01", "index")], FUN=mean)
```
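`tapply()` computes the same group means as `aggregate()` but returns a named vector instead of a data frame, and `aggregate()` can summarise several named columns at once with `cbind()` on the left-hand side of the formula. A sketch on a hypothetical stand-in data frame `toy` (random values, not the real cluster measurements) that borrows the column names used above:

```r
set.seed(6)                                          # arbitrary seed
toy <- data.frame(cctype = rep(c("CC", "non-CC"), each = 5),
                  kT = runif(10, 2, 9),
                  z  = runif(10, 0.02, 0.1))         # hypothetical values

m1 <- aggregate(kT ~ cctype, data = toy, FUN = mean) # data frame with columns cctype, kT
m2 <- tapply(toy$kT, toy$cctype, mean)               # named vector: CC, non-CC
all.equal(m1$kT, as.vector(m2))                      # TRUE: same group means

aggregate(cbind(kT, z) ~ cctype, data = toy, FUN = mean)   # two columns at once
```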
• You can also use summary functions that return more than one number, such as range:
```r
> aggregate(index ~ cctype, data=a, FUN=range)
  cctype index.1 index.2
1     CC   0.714   1.120
2 non-CC   0.283   0.944
```
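Any function that returns a fixed-length vector can be used in the same way, and naming the elements gives more readable column headers than `index.1` and `index.2`. The results are stored as a matrix column inside the returned data frame. A sketch using a small anonymous function and a hypothetical stand-in data frame `toy`:

```r
set.seed(7)                                          # arbitrary seed
toy <- data.frame(cctype = rep(c("CC", "non-CC"), each = 6),
                  index  = runif(12, 0.3, 1.1))      # hypothetical values

res <- aggregate(index ~ cctype, data = toy,
                 FUN = function(v) c(min = min(v), max = max(v), n = length(v)))
res                        # printed as index.min, index.max, index.n
res$index[, "max"]         # one summary extracted from the matrix column
```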

You can find out more about how to access, manipulate, summarise, plot and analyse data using R.

Also, why not check out some of the graphs and plots shown in the R gallery, with the accompanying R source code used to create them.