R aggregate
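The aggregate() function splits the data into subsets, computes summary statistics for each subset, and combines the results into a single object: a data frame for the data frame and formula methods, or a time series for the time-series method.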
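The split, compute, combine sequence can be written out by hand with split() and vapply() and compared with aggregate() on the built-in CO2 data used below. This is only a sketch of the idea; the names agg, pieces, means and manual are illustrative, not part of any API.

```r
# Split / apply / combine by hand, then compare with aggregate().
# CO2 is a built-in dataset from the datasets package.
agg <- aggregate(CO2$uptake, by = list(Plant = CO2$Plant), FUN = mean)

# Split: one vector of uptake values per level of Plant.
pieces <- split(CO2$uptake, CO2$Plant)
# Apply: the mean of each piece.
means <- vapply(pieces, mean, numeric(1))
# Combine: a data frame laid out like aggregate()'s result.
manual <- data.frame(Plant = names(means), x = unname(means),
                     stringsAsFactors = FALSE)

print(head(agg, 3))
print(head(manual, 3))
```

Both tables list the plants in the order of the factor levels of CO2$Plant, with the same means.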
aggregate(x, by, FUN, ..., simplify = TRUE, drop = TRUE)   # for data frames
aggregate(formula, data, FUN, ..., subset, na.action = na.omit)   # for formulas
aggregate(x, nfrequency = 1, FUN = sum, ndeltat = 1, ts.eps = getOption("ts.eps"), ...)   # for time-series objects
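x: the object to aggregate: a data frame for the data frame method, a time series for the time-series method.
by: a list of grouping elements, each as long as the variables in the data frame x. The elements are coerced to factors before use.
FUN: a function to compute the summary statistics, which can be applied to all data subsets.
...: further arguments passed to FUN (for example, na.rm = TRUE for mean).
simplify: a logical indicating whether results should be simplified to a vector or matrix if possible.
drop: a logical indicating whether to drop unused combinations of grouping values.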
formula: a formula, such as y ~ x or cbind(y1, y2) ~ x1 + x2, where the y variables are numeric data to be split into groups according to the grouping x variables (usually factors).
data: a data frame (or list) from which the variables in formula should be taken.
subset: an optional vector specifying a subset of observations to be used.
na.action: a function which indicates what should happen when the data contain NA values. The default, na.omit, drops every row that has a missing value in any of the variables used in the formula.
nfrequency: new number of observations per unit of time; must be a divisor of the frequency of x.
ndeltat: new fraction of the sampling period between successive observations; must be a divisor of the sampling interval of x.
ts.eps: tolerance used to decide if nfrequency is a sub-multiple of the original frequency.
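The time-series method can be sketched on a small hypothetical monthly series (the values 1 to 12, made up for illustration) collapsed to quarterly totals with nfrequency and FUN = sum.

```r
# Hypothetical monthly series: the values 1 to 12 for the year 2000.
monthly <- ts(1:12, start = c(2000, 1), frequency = 12)

# nfrequency = 4 requests four observations per year (quarters).
# It must divide frequency(monthly), which is 12, so each quarter
# is built from three consecutive months.
quarterly <- aggregate(monthly, nfrequency = 4, FUN = sum)
print(quarterly)
# quarterly sums: 6, 15, 24, 33
```

The result is again a ts object, now with frequency 4 and the same start year.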
Let's use the built-in dataset CO2 (Carbon Dioxide Uptake in Grass Plants) from the datasets package.
> CO2
> aggregate(CO2$uptake, by = list(Plant = CO2$Plant), FUN = mean)
   Plant        x
1    Qn1 33.22857
2    Qn2 35.15714
3    Qn3 37.61429
4    Qc1 29.97143
5    Qc3 32.58571
6    Qc2 32.70000
7    Mn3 24.11429
8    Mn2 27.34286
9    Mn1 26.40000
10   Mc2 12.14286
11   Mc3 17.30000
12   Mc1 18.00000
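Passing a bare vector as x is why the summary column above is named x. A sketch of an alternative: select uptake as a one-column data frame so its name is kept, and list two grouping factors in by. The variable name by_type_trt is only illustrative.

```r
# CO2["uptake"] is a one-column data frame, so the result keeps the
# column name "uptake"; `by` lists two grouping factors.
by_type_trt <- aggregate(CO2["uptake"],
                         by = list(Type = CO2$Type, Treatment = CO2$Treatment),
                         FUN = mean)
print(by_type_trt)
```

Each row is one combination of Type and Treatment; all four combinations occur in CO2.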
> aggregate(uptake ~ conc, data = CO2, FUN = mean)
  conc   uptake
1   95 12.25833
2  175 22.28333
3  250 28.87500
4  350 30.66667
5  500 30.87500
6  675 31.95000
7 1000 33.58333
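CO2 has no missing values, so na.action makes no difference above. The sketch below uses a small hypothetical data frame (toy, made up for illustration) with one NA to contrast the default na.omit with na.pass, and shows na.rm = TRUE being forwarded to mean through the ... argument.

```r
# Hypothetical toy data with one missing uptake value in group "a".
toy <- data.frame(group  = c("a", "a", "b", "b"),
                  uptake = c(1, NA, 3, 5))

# Default na.action = na.omit: the NA row is dropped before grouping.
dropped <- aggregate(uptake ~ group, data = toy, FUN = mean)

# na.pass keeps the NA row, so mean() returns NA for group "a".
kept <- aggregate(uptake ~ group, data = toy, FUN = mean,
                  na.action = na.pass)

# na.rm = TRUE is passed on to mean() through `...`.
removed <- aggregate(uptake ~ group, data = toy, FUN = mean,
                     na.action = na.pass, na.rm = TRUE)

print(dropped)
print(kept)
print(removed)
```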
> aggregate(cbind(uptake, conc) ~ Plant, data = CO2, FUN = mean)
   Plant   uptake conc
1    Qn1 33.22857  435
2    Qn2 35.15714  435
3    Qn3 37.61429  435
4    Qc1 29.97143  435
5    Qc3 32.58571  435
6    Qc2 32.70000  435
7    Mn3 24.11429  435
8    Mn2 27.34286  435
9    Mn1 26.40000  435
10   Mc2 12.14286  435
11   Mc3 17.30000  435
12   Mc1 18.00000  435
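The cbind() example above returns one statistic per variable. FUN may also return several values per group; with simplify = TRUE (the default) they are stored as a matrix column. In this sketch mean_sd is a hypothetical helper written for illustration.

```r
# Hypothetical helper returning two named statistics per group.
mean_sd <- function(v) c(mean = mean(v), sd = sd(v))

uptake_stats <- aggregate(uptake ~ Treatment, data = CO2, FUN = mean_sd)
print(uptake_stats)

# The uptake column is a matrix with one column per statistic.
print(dim(uptake_stats$uptake))
```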