R stabsel.mboostLSS
Selection of influential variables or model components with error control.
stabsel.mboostLSS is located in package gamboostLSS. Please install and load package gamboostLSS before use.
## a method to compute stability selection paths for fitted mboostLSS models
## S3 method for class 'mboostLSS'
stabsel(x, cutoff, q, PFER, mstop = NULL,
folds = subsample(model.weights(x), B = B),
B = ifelse(sampling.type == "MB", 100, 50),
assumption = c("unimodal", "r-concave", "none"),
sampling.type = c("SS", "MB"),
papply = mclapply, verbose = TRUE, FWER, eval = TRUE, ...)
## a method to get the selected parameters
## S3 method for class 'stabsel_mboostLSS'
selected(object, parameter = NULL, ...)
x
a fitted model of class "mboostLSS" or "nc_mboostLSS".
cutoff
cutoff between 0.5 and 1. Preferably a value between 0.6 and 0.9 should be used.
q
number of (unique) variables (or groups of variables, depending on the model) that are selected on each subsample.
PFER
upper bound for the per-family error rate. This specifies the number of falsely selected base-learners that is tolerated. See details.
mstop
mstop value to use. If no value is supplied, the mstop value of the fitted model is used.
folds
a weight matrix with number of rows equal to the number of observations, see cvrisk and subsample. Usually one should not change the default here, as subsampling with a fraction of 1/2 is needed for the error bounds to hold. One scenario where specifying the folds by hand can be useful is dependent data (e.g. clusters), where one wants to draw clusters (i.e., multiple rows together) rather than individuals.
assumption
Defines the type of assumptions on the distributions of the selection probabilities and simultaneous selection probabilities. Only applicable for sampling.type = "SS". For sampling.type = "MB" we always use "none".
sampling.type
use the sampling scheme of Shah & Samworth (2013), i.e., with complementary pairs (sampling.type = "SS"), or the original sampling scheme of Meinshausen & Buehlmann (2010) (sampling.type = "MB").
B
number of subsampling replicates. By default, we use 50 complementary pairs for the error bounds of Shah & Samworth (2013) and 100 subsamples for the error bound derived in Meinshausen & Buehlmann (2010). As we use B complementary pairs in the former case, this leads to 2B subsamples.
papply
(parallel) apply function, defaults to mclapply. Alternatively, parLapply can be used. In the latter case, usually more setup is needed (see example of cvrisk for some details).
verbose
logical (default: TRUE) that determines whether warnings should be issued.
FWER
deprecated. Only for compatibility with older versions, use PFER instead.
eval
logical. Determines whether stability selection is evaluated (eval = TRUE; default) or if only the parameter combination is returned.
object
an object of class "stabsel_mboostLSS".
parameter
select one or multiple effects.
...
additional arguments to parallel apply methods such as mclapply and to cvrisk.
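For the clustered-data scenario mentioned under folds, a weight matrix can be built by drawing whole clusters instead of single rows. The following is a minimal base-R sketch, not part of gamboostLSS: the helper name cluster_folds and the cluster vector id are hypothetical, and each column gives weight 1 to all rows of a randomly chosen half of the clusters and weight 0 to the rest.

```r
## Hypothetical helper (not part of gamboostLSS): draw B subsamples that
## each contain all rows of a random half of the clusters.
cluster_folds <- function(id, B = 50) {
    clusters <- unique(id)
    n_half <- floor(length(clusters) / 2)
    ## one column of 0/1 weights per subsample
    sapply(seq_len(B), function(b) {
        chosen <- sample(clusters, n_half)
        as.numeric(id %in% chosen)
    })
}

set.seed(1907)
id <- rep(1:50, each = 10)   # assumed layout: 50 clusters with 10 rows each
folds <- cluster_folds(id, B = 50)
dim(folds)                   # rows = observations, columns = subsamples
```

The resulting matrix has one row per observation and one column per subsample, which is the shape described for the folds argument. Whether the error bounds carry over to cluster-wise subsampling depends on the data and should be checked for the application at hand.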
install.packages("gamboostLSS", repos = "https://cran.r-project.org", dependencies = TRUE)
library(gamboostLSS)
### Data generating process:
set.seed(1907)
x1 <- rnorm(500)
x2 <- rnorm(500)
x3 <- rnorm(500)
x4 <- rnorm(500)
x5 <- rnorm(500)
x6 <- rnorm(500)
mu <- exp(1.5 +1 * x1 +0.5 * x2 -0.5 * x3 -1 * x4)
sigma <- exp(-0.4 * x3 -0.2 * x4 +0.2 * x5 +0.4 * x6)
y <- numeric(500)
for( i in 1:500)
y[i] <- rnbinom(1, size = sigma[i], mu = mu[i])
dat <- data.frame(x1, x2, x3, x4, x5, x6, y)
### linear model with y ~ . for both components: 400 boosting iterations
model <- glmboostLSS(y ~ ., families = NBinomialLSS(), data = dat,
control = boost_control(mstop = 400),
center = TRUE, method = "noncyclic")
### Do not test the following code per default on CRAN as it takes some time to run:
#run stability selection
(s <- stabsel(model, q = 5, PFER = 1))
#get selected effects
selected(s)
#visualize selection frequencies
plot(s)
### END (don't test automatically)
Return Values:
An object of class stabsel with a special print method. The object has the following elements:
selected
elements with maximal selection probability greater than cutoff.
max
maximum of selection probabilities.
q
average number of selected variables used.
sampling.type
the sampling type used for stability selection.
assumption
the assumptions made on the selection probabilities.
Details: Stability selection should preferably be used with non-cyclic gamboostLSS models, as proposed by Thomas et al. (2018). That publication developed the combination of package gamboostLSS with stability selection and investigated it in depth.
For details on stability selection see stabsel in package stabs and Hofner et al. (2015).
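To make the relationship between cutoff, q and PFER more concrete: for the original sampling scheme (sampling.type = "MB"), Meinshausen & Buehlmann (2010) bound the expected number of false selections, under the assumptions of that paper, by q^2 / ((2 * cutoff - 1) * p), where p is the number of candidate base-learners. The base-R sketch below evaluates this bound. The function names mb_pfer_bound and q_from_pfer and the values of p, q and cutoff are illustrative assumptions; the bounds used for sampling.type = "SS" are different and are not reproduced here.

```r
## Illustrative helper (not part of gamboostLSS or stabs):
## Meinshausen & Buehlmann (2010) upper bound on the per-family error rate.
mb_pfer_bound <- function(p, q, cutoff) {
    stopifnot(cutoff > 0.5, cutoff <= 1)
    q^2 / ((2 * cutoff - 1) * p)
}

## assumed values: 100 candidate base-learners, 10 selected per subsample
mb_pfer_bound(p = 100, q = 10, cutoff = 0.75)   # 2

## the same bound solved for q, given a tolerated PFER
q_from_pfer <- function(p, PFER, cutoff) {
    floor(sqrt(PFER * (2 * cutoff - 1) * p))
}
q_from_pfer(p = 100, PFER = 2, cutoff = 0.75)   # 10
```

The sketch illustrates why fixing two of the three quantities determines the third: a larger q or a lower cutoff both loosen the bound on the number of false selections.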
See Also: stabsel and stabsel_parameters
References:
B. Hofner, L. Boccuto and M. Goeker (2015), Controlling false discoveries in high-dimensional situations: Boosting with stability selection. BMC Bioinformatics, 16:144.
N. Meinshausen and P. Buehlmann (2010), Stability selection. Journal of the Royal Statistical Society, Series B, 72, 417–473.
R.D. Shah and R.J. Samworth (2013), Variable selection with error control: another look at stability selection. Journal of the Royal Statistical Society, Series B, 75, 55–80.
Thomas, J., Mayr, A., Bischl, B., Schmid, M., Smith, A., and Hofner, B. (2018), Gradient boosting for distributional regression - faster tuning and improved variable selection via noncyclical updates. Statistics and Computing. 28: 673-687. DOI 10.1007/s11222-017-9754-6
(Preliminary version: http://arxiv.org/abs/1611.10171).