R aggregate


The aggregate() function splits a dataset into subsets, computes summary statistics for each subset, and combines the results into a single object that it returns.
aggregate(x, by, FUN, ..., simplify=TRUE, drop=TRUE)   # for data frames
aggregate(formula, data, FUN, ..., subset, na.action=na.omit)   # for formulas
aggregate(x, nfrequency=1, FUN=sum, ndeltat=1, ts.eps=getOption("ts.eps"), ...)   # for time-series objects

by: a list of grouping elements, each as long as the variables in the data frame x. The elements are coerced to factors before use.
FUN: a function to compute the summary statistics which can be applied to all data subsets.
simplify: a logical indicating whether results should be simplified to a vector or matrix if possible.
formula: a formula, such as y ~ x or cbind(y1, y2) ~ x1 + x2, where the y variables are numeric data to be split into groups according to the grouping x variables (usually factors). Since R 4.2.0 this argument is named x, so pass the formula unnamed as the first argument.
data: a data frame (or list) from which the variables in formula should be taken.
subset: an optional vector specifying a subset of observations to be used.
na.action: a function which indicates what should happen when the data contain NA values. The default, na.omit, drops observations that have missing values in any of the given variables.
nfrequency: new number of observations per unit of time; must be a divisor of the frequency of x.
ndeltat: new fraction of the sampling period between successive observations; must be a divisor of the sampling interval of x.
ts.eps: tolerance used to decide if nfrequency is a sub-multiple of the original frequency.
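
The split, summarize, and combine steps described above can be reproduced by hand, which may help clarify what aggregate() does internally. The following is only a sketch using the built-in CO2 dataset; the variable names groups, means, manual and builtin are illustrative and not part of any API.

```r
# Manual split-apply-combine on the built-in CO2 dataset.
groups <- split(CO2$uptake, CO2$Plant)   # split: one numeric vector per plant
means  <- sapply(groups, mean)           # apply: one summary value per subset

# combine: put the group labels and summaries back into a data frame
manual <- data.frame(Plant = names(means), x = unname(means))

# The same computation with aggregate()
builtin <- aggregate(CO2$uptake, by = list(Plant = CO2$Plant), FUN = mean)

# Compare the two sets of group means
all.equal(manual$x, builtin$x)
```

Both approaches should produce the same group means; aggregate() simply packages the three steps into one call.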


Let's use the built-in dataset CO2 (Carbon Dioxide Uptake in Grass Plants).
> CO2



> aggregate(CO2$uptake,by=list(Plant=CO2$Plant),FUN=mean)
   Plant        x
1    Qn1 33.22857
2    Qn2 35.15714
3    Qn3 37.61429
4    Qc1 29.97143
5    Qc3 32.58571
6    Qc2 32.70000
7    Mn3 24.11429
8    Mn2 27.34286
9    Mn1 26.40000
10   Mc2 12.14286
11   Mc3 17.30000
12   Mc1 18.00000
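
The by list can hold more than one grouping element, in which case each distinct combination of values forms its own subset. A minimal sketch, assuming the Type and Treatment columns of CO2 as the grouping variables (the result name by_type_treatment is illustrative):

```r
# Group uptake by two variables at once: origin (Type) and Treatment.
by_type_treatment <- aggregate(CO2$uptake,
                               by = list(Type = CO2$Type,
                                         Treatment = CO2$Treatment),
                               FUN = mean)
by_type_treatment
```

The names given inside list() become the column names of the grouping columns in the result, and the summarized values appear in a column named x.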

> aggregate(uptake ~ conc, data=CO2, FUN=mean)
  conc   uptake
1   95 12.25833
2  175 22.28333
3  250 28.87500
4  350 30.66667
5  500 30.87500
6  675 31.95000
7 1000 33.58333
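
The subset and na.action arguments of the formula interface can be sketched in the same way. The example below assumes a modified copy of CO2 (called co2_na here, a hypothetical name) with one uptake value set to NA purely for illustration; it is not part of the original dataset.

```r
# Restrict the aggregation to the Quebec plants via `subset`.
quebec_means <- aggregate(uptake ~ conc, data = CO2, FUN = mean,
                          subset = Type == "Quebec")

# Illustrative copy of CO2 with one missing uptake value.
co2_na <- CO2
co2_na$uptake[1] <- NA

# Default na.action = na.omit: the incomplete row is dropped before grouping.
omit_means <- aggregate(uptake ~ conc, data = co2_na, FUN = mean)

# na.action = na.pass keeps the row; na.rm = TRUE is forwarded to mean() via `...`.
pass_means <- aggregate(uptake ~ conc, data = co2_na, FUN = mean,
                        na.action = na.pass, na.rm = TRUE)
```

With a single response variable the last two calls should give the same means. The choice matters more with cbind() responses, where na.omit discards a whole row even if only one of the response variables is missing.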


> aggregate(cbind(uptake,conc) ~ Plant, data=CO2, FUN=mean)
   Plant   uptake conc
1    Qn1 33.22857  435
2    Qn2 35.15714  435
3    Qn3 37.61429  435
4    Qc1 29.97143  435
5    Qc3 32.58571  435
6    Qc2 32.70000  435
7    Mn3 24.11429  435
8    Mn2 27.34286  435
9    Mn1 26.40000  435
10   Mc2 12.14286  435
11   Mc3 17.30000  435
12   Mc1 18.00000  435
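
The time-series form of aggregate() can be illustrated in a similar way. This sketch assumes the built-in monthly series AirPassengers (frequency 12), which does not appear elsewhere on this page; annual and quarterly are illustrative names.

```r
# AirPassengers is a built-in monthly time series (12 observations per year).
annual    <- aggregate(AirPassengers, nfrequency = 1, FUN = sum)   # yearly totals
quarterly <- aggregate(AirPassengers, nfrequency = 4, FUN = mean)  # quarterly means

frequency(annual)
frequency(quarterly)
```

Both values of nfrequency divide the original frequency of 12, as the nfrequency argument requires.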






