Cross-validated hierarchical nested regularization for subgroup models
Source: R/cv_hierNest.R
Fits regularization paths for hierarchical, subgroup-specific penalized learning problems, leveraging nested group structure (such as Major Diagnostic Categories [MDC] and Diagnosis-Related Groups [DRG]), with either a lasso or an overlapping group lasso penalty. Cross-validation is used to select the tuning parameters that control penalization and subgroup structure.
This function enables information sharing across related subgroups by reparameterizing covariate effects into overall, group-specific, and subgroup-specific components, and it applies structured shrinkage through hierarchical regularization. Users can choose between the lasso (hierNest-Lasso) and overlapping group lasso (hierNest-OGLasso) frameworks described in Jiang et al. (2024, submitted; see Details).
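To make the reparameterization concrete, here is a minimal base-R sketch for a single covariate in a hypothetical hierarchy with two groups (A and B, standing in for MDCs) and three subgroups (A1, A2, B1, standing in for DRGs). The helpers `subgroup_effect()` and `expand_design()` and all coefficient values are illustrative assumptions, not part of hierNest. The sketch only shows that a subgroup's effect is the sum of overall, group, and subgroup components, and that the same linear predictor arises from an expanded design matrix.

```r
# Hypothetical hierarchy (illustrative only): subgroups nested in groups.
hier <- data.frame(
  subgroup = c("A1", "A2", "B1"),
  group    = c("A",  "A",  "B"),
  stringsAsFactors = FALSE
)

# Illustrative components for one covariate.
theta_overall  <- c(overall = 1.0)
theta_group    <- c(A = 0.5, B = 0.0)
theta_subgroup <- c(A1 = 0.2, A2 = 0.0, B1 = 0.0)

# Subgroup-specific effect = overall + group component + subgroup component.
subgroup_effect <- function(hier, theta_overall, theta_group, theta_subgroup) {
  eff <- unname(theta_overall) +
    unname(theta_group[hier$group]) +
    unname(theta_subgroup[hier$subgroup])
  setNames(eff, hier$subgroup)
}

# Equivalent view: expand one covariate x into overall, group, and subgroup
# columns, so that a single penalized fit can estimate all components.
expand_design <- function(x, sub, hier) {
  grp <- hier$group[match(sub, hier$subgroup)]
  G <- sapply(unique(hier$group), function(g) x * (grp == g))
  S <- sapply(hier$subgroup, function(s) x * (sub == s))
  cbind(overall = x, G, S)
}

beta <- subgroup_effect(hier, theta_overall, theta_group, theta_subgroup)
beta  # A1 = 1.7, A2 = 1.5, B1 = 1.0

x   <- c(2, 1, 3)            # covariate values for three observations
sub <- c("A1", "A2", "B1")   # their subgroup memberships
X   <- expand_design(x, sub, hier)
theta <- c(theta_overall, theta_group, theta_subgroup)
# The expanded-design predictor equals x times each row's subgroup effect.
all.equal(as.vector(X %*% theta), x * unname(beta[sub]))  # TRUE
```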
Usage
cv.hierNest(
x,
y,
group = NULL,
family = c("gaussian", "binomial"),
nlambda = 100,
lambda.factor = NULL,
pred.loss = c("default", "mse", "deviance", "mae", "misclass", "ROC"),
lambda = NULL,
pf_group = NULL,
pf_sparse = NULL,
intercept = FALSE,
asparse1 = c(0.5, 20),
asparse2 = c(0.01, 0.2),
asparse1_num = 4,
asparse2_num = 4,
standardize = TRUE,
lower_bnd = -Inf,
upper_bnd = Inf,
eps = 1e-08,
maxit = 3e+06,
hier_info = NULL,
method = "overlapping",
partition = "subgroup",
cvmethod = "general"
)
Arguments
- x
Matrix of predictors, of dimension \(n \times p\); each row is an observation. Can be a dense or sparse matrix.
- y
Response variable. For `family="gaussian"`, should be numeric. For `family="binomial"`, should be a factor with two levels or a numeric vector with two unique values.
- group
Optional vector or factor indicating group assignments for variables. Used for custom grouping.
- family
Character string specifying the model family. Options are `"gaussian"` (default) for least-squares regression, or `"binomial"` for logistic regression.
- nlambda
Number of lambda values in the regularization path. Default is 100.
- lambda.factor
Factor determining the minimal value of lambda in the sequence, where `min(lambda) = lambda.factor * max(lambda)`.
- pred.loss
Character string indicating loss to minimize during cross-validation. Options include `"default"`, `"mse"`, `"deviance"`, `"mae"`, `"misclass"`, and `"ROC"`.
- lambda
Optional user-supplied sequence of lambda values (overrides `nlambda`/`lambda.factor`).
- pf_group
Optional penalty factors on the groups, as a numeric vector. Default adjusts for group size.
- pf_sparse
Optional penalty factors on the \(\ell_1\) norm (the sparsity component of the penalty), as a numeric vector.
- intercept
Logical; whether to include an intercept in the model. Default is FALSE.
- asparse1
Relative weight(s) for the first (e.g., group) layer of the overlapping group lasso penalty. Default is c(0.5, 20).
- asparse2
Relative weight(s) for the second (e.g., subgroup) layer. Default is c(0.01, 0.2).
- asparse1_num
Number of values in the asparse1 grid (used for grid search). Default is 4.
- asparse2_num
Number of values in the asparse2 grid (used for grid search). Default is 4.
- standardize
Logical; whether to standardize predictors prior to model fitting. Default is TRUE.
- lower_bnd
Lower bound(s) for coefficient values. Default is -Inf.
- upper_bnd
Upper bound(s) for coefficient values. Default is Inf.
- eps
Convergence tolerance for optimization. Default is 1e-8.
- maxit
Maximum number of optimization iterations. Default is 3e6.
- hier_info
Required for `method = "overlapping"`; a matrix describing the hierarchical structure of the subgroups (see Details).
- method
Character; either `"overlapping"` for overlapping group lasso, `"sparsegl"` for sparse group lasso, or `"general"` for other hierarchical regularization. Default is `"overlapping"`.
- partition
Character string; determines subgroup partitioning. Default is `"subgroup"`.
- cvmethod
Cross-validation method. Options include `"general"` (default), `"grid_search"`, or `"user_supply"` (for user-supplied grid).
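The `pred.loss` options name standard held-out loss criteria. The sketch below shows how such losses are commonly computed, assuming `pred` holds fitted means for `family = "gaussian"` and predicted probabilities for `family = "binomial"` with a 0/1 response. `cv_loss()` is a hypothetical helper, not part of hierNest; the package's internal definitions (for example, what `"default"` maps to, or how `"ROC"` is scored) may differ. Note that AUC is a score to maximize, unlike the other losses.

```r
# Hypothetical helper: held-out loss for one fold.
cv_loss <- function(y, pred, type = c("mse", "mae", "deviance", "misclass", "auc")) {
  type <- match.arg(type)
  switch(type,
    # Squared and absolute error (gaussian-style losses).
    mse = mean((y - pred)^2),
    mae = mean(abs(y - pred)),
    # Binomial deviance per observation; probabilities clipped away from 0/1.
    deviance = {
      p <- pmin(pmax(pred, 1e-15), 1 - 1e-15)
      -2 * mean(y * log(p) + (1 - y) * log(1 - p))
    },
    # Misclassification rate at a 0.5 probability threshold.
    misclass = mean(as.numeric(pred > 0.5) != y),
    # Area under the ROC curve via the rank-sum (Mann-Whitney) identity.
    auc = {
      r  <- rank(pred)
      n1 <- sum(y == 1)
      n0 <- sum(y == 0)
      (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
    }
  )
}

cv_loss(c(0, 0, 1, 1), c(0.1, 0.6, 0.4, 0.9), "auc")  # 0.75
```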
Value
An object containing the fitted hierarchical model and cross-validation results, including:
- fit
Fitted model object.
- lambda
Sequence of lambda values considered.
- cv_error
Cross-validation error/loss for each combination of tuning parameters.
- best_params
Best tuning parameters selected.
- ...
Additional diagnostic and output fields.
Details
The hierarchical nested framework decomposes covariate effects into overall, group, and subgroup-specific components, with regularization encouraging fusion or sparsity across these hierarchical levels. The function can fit both the lasso penalty (allowing arbitrary zero/non-zero patterns) and the overlapping group lasso penalty (enforcing hierarchical selection structure), as described in Jiang et al. (2024, submitted).
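The contrast between the two penalties can be sketched for one covariate's components. The lasso charges each component independently, while a tree-structured (overlapping) group lasso places one L2 group on every node together with all of its descendants. This is a standard generic construction used here as an assumption; it is not necessarily the exact penalty of Jiang et al. (2024), and it omits the group weights that `pf_group` controls. The hierarchy, `descendants()`, `lasso_pen()`, and `tree_group_pen()` are all hypothetical.

```r
# Hypothetical hierarchy as a parent lookup; NA marks the root (overall effect).
parent <- c(overall = NA, A = "overall", B = "overall",
            A1 = "A", A2 = "A", B1 = "B")

# All nodes in the subtree rooted at `node` (the node itself included).
descendants <- function(node, parent) {
  kids <- names(parent)[!is.na(parent) & parent == node]
  c(node, unlist(lapply(kids, descendants, parent = parent)))
}

# Lasso: independent absolute-value penalty on every component.
lasso_pen <- function(theta) sum(abs(theta))

# Tree-structured group lasso: one unweighted L2 group per node, containing
# that node and its whole subtree. Zeroing a group zeros the subtree, so
# solutions tend to keep a subgroup component nonzero only when the
# components above it are nonzero.
tree_group_pen <- function(theta, parent) {
  sum(vapply(names(parent), function(node) {
    sqrt(sum(theta[descendants(node, parent)]^2))
  }, numeric(1)))
}

theta <- c(overall = 1, A = 0.5, B = 0, A1 = 0.2, A2 = 0, B1 = 0)
lasso_pen(theta)               # 1.7
tree_group_pen(theta, parent)  # sqrt(1.29) + sqrt(0.29) + 0.2, about 1.874
```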
The argument `hier_info` must be supplied when `method = "overlapping"`; it encodes the hierarchical relationship between groups and subgroups (e.g., MDCs and DRGs).
Cross-validation selects the regularization parameter lambda and, optionally, the hierarchical penalty weights (asparse1, asparse2) over a grid.
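As a rough illustration of this tuning search, the sketch below builds a lambda path from `nlambda` and `lambda.factor`, a grid of (asparse1, asparse2) pairs, and K-fold assignments, then picks the combination with the smallest mean CV error. Everything here is an assumption for illustration: the helper names are hypothetical, `lambda_max` is supplied by hand (in practice it is derived from the data), the log spacing of both grids follows a glmnet-style convention rather than anything documented for `cv.hierNest`, and the model-fitting step that would fill in the CV error matrix is omitted.

```r
# Lambda path: nlambda values, log-spaced from lambda_max down to
# lambda.factor * lambda_max (glmnet-style convention, assumed here).
make_lambda <- function(lambda_max, lambda.factor, nlambda) {
  exp(seq(log(lambda_max), log(lambda.factor * lambda_max),
          length.out = nlambda))
}

# Penalty-weight grid: assumes each two-element asparse vector gives the
# endpoints of a log-spaced grid with asparse1_num / asparse2_num points.
make_asparse_grid <- function(asparse1, asparse2, asparse1_num, asparse2_num) {
  a1 <- exp(seq(log(asparse1[1]), log(asparse1[2]), length.out = asparse1_num))
  a2 <- exp(seq(log(asparse2[1]), log(asparse2[2]), length.out = asparse2_num))
  expand.grid(asparse1 = a1, asparse2 = a2)
}

# Random K-fold assignment with (near-)equal fold sizes.
make_folds <- function(n, K = 10) sample(rep_len(seq_len(K), n))

# Given mean CV errors (rows = grid combinations, columns = lambda values),
# return the tuning combination with the smallest error.
select_best <- function(cv_err, grid, lambda) {
  ij <- arrayInd(which.min(cv_err), dim(cv_err))
  list(asparse1 = grid$asparse1[ij[1]],
       asparse2 = grid$asparse2[ij[1]],
       lambda   = lambda[ij[2]])
}

lam  <- make_lambda(lambda_max = 1, lambda.factor = 1e-3, nlambda = 100)
grid <- make_asparse_grid(c(0.5, 20), c(0.01, 0.2), 4, 4)
nrow(grid)  # 16 candidate (asparse1, asparse2) pairs
```

With the default `asparse1_num = asparse2_num = 4` and `nlambda = 100`, a full grid search of this kind would evaluate 16 x 100 tuning combinations per fold, which is why restricting the grid matters for run time.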