rbartpackages.missBART.missBART2¶
- class rbartpackages.missBART.missBART2(x, y, x_predict=None, *, n_reg_trees=100, n_class_trees=100, burn=1000, iters=1000, thin=2, predict=None, MH_sd=None, tree_prior_params=None, hypers=None, scale=True, include_x=True, include_y=True, show_progress=True, progress_every=10, pdp_range=(-0.5, 0.5), make_pdp=False, mice_impute=True, **hyperparams)[source]¶
Fit BART to outcomes with missing entries, imputing them.
Python interface to R’s missBART::missBART2. It jointly fits a regression BART to the (possibly multivariate) outcome y and a probit BART to its missingness pattern, imputing the missing entries of y during the MCMC run. Missing entries of the predictors x are handled by augmenting x with binary missingness-indicator columns. Arguments left as None are omitted from the R call, so R computes its own defaults, described below.
- Parameters:
x (Float64[ndarray, 'n q']) – Predictor matrix; rows are observations. Missing entries, marked with NaN, augment it with one binary missingness-indicator column per predictor (they are not imputed, unlike those of y).
y (Float64[ndarray, 'n p']) – Outcome matrix (one column per response) or vector; NaN marks the entries to impute.
x_predict (Float64[ndarray, 'm q'] | None, default: None) – Out-of-sample predictors at which to draw the posterior predictive; if omitted, no out-of-sample predictions are made (see Notes).
n_reg_trees (int, default: 100) – Number of trees of the regression BART modeling y.
n_class_trees (int, default: 100) – Number of trees of the probit BART modeling the missingness of y.
burn (int, default: 1000) – Number of burn-in MCMC iterations discarded.
iters (int, default: 1000) – Number of post-burn-in iterations retained after thinning.
thin (int, default: 2) – Thinning interval; the chain runs burn + thin * iters iterations.
predict (bool | None, default: None) – Whether to draw the posterior predictive at x_predict; see Notes for the interaction with x_predict.
MH_sd (float | None, default: None) – Standard deviation of the Metropolis-Hastings proposal updating the missing entries of y; default 0.5 / p.
tree_prior_params (TreePriorParams | None, default: None) – Tree-prior parameters as a TreePriorParams dict; it must be complete, as passing it directly skips tree_list’s own defaults. Setting the individual parameters as keyword arguments instead (see **hyperparams) avoids this.
hypers (Hypers | None, default: None) – Prior hyperparameters as a Hypers dict; it must be complete, as passing it directly skips hypers_list’s own defaults. Setting the individual hyperparameters as keyword arguments instead (see **hyperparams) avoids this.
scale (bool, default: True) – Whether to scale y to [-0.5, 0.5] before fitting.
include_x (bool, default: True) – Whether the missingness probit model uses x as predictors.
include_y (bool, default: True) – Whether the missingness probit model uses y as predictors.
show_progress (bool, default: True) – Whether to display a progress bar in the R console.
progress_every (int, default: 10) – Update the progress bar every this many iterations.
pdp_range (Float64[ndarray, '2'] | tuple[float, float], default: (-0.5, 0.5)) – Range over which the partial dependence plot is evaluated (with make_pdp).
make_pdp (bool, default: False) – Whether to compute partial dependence output; univariate y only.
mice_impute (bool, default: True) – Whether the missing entries of y are initialized with mice::mice; otherwise they start at zero.
**hyperparams (Unpack[Hyperparams]) – Extra keyword arguments, of the Hyperparams keys, forwarded verbatim (R’s ...), which populate the unset entries of tree_prior_params (through tree_list) and hypers (through hypers_list); the intended way to set individual tree-prior parameters and hyperparameters.
- Raises:
ValueError – If predict=True is passed without x_predict.
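The rule behind this error, which the Notes describe in prose, can be sketched in plain Python. This is an illustrative reconstruction from the description on this page, not the wrapper's actual code: the helper name resolve_predict and the one-row zero placeholder are hypothetical.

```python
import numpy as np

def resolve_predict(x, x_predict=None, predict=None):
    """Return the (x_predict, predict) pair handed to R (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    if x_predict is None:
        if predict:
            # Asking for predictions without prediction points is an error.
            raise ValueError("predict=True requires x_predict")
        # Assumption: any well-formed matrix can stand in for x_predict,
        # since predict=False means it is never used for prediction.
        placeholder = np.zeros((1, x.shape[1]))
        return placeholder, False
    # x_predict was given: a predict of None is left out of the R call,
    # so R falls back to its own default (predict = TRUE).
    return np.asarray(x_predict, dtype=float), predict

x = np.zeros((3, 2))
no_pred = resolve_predict(x)                     # placeholder, predict=False
with_pred = resolve_predict(x, np.ones((4, 2)))  # real matrix, predict=None
```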
Notes
If x_predict is not specified, the wrapper passes predict=False and a placeholder x_predict, because the R code crashes on its own default x_predict = c() (as.matrix(NULL) is an error). Explicitly passing predict=True without x_predict raises ValueError.
The R arguments true_trees_data, true_trees_missing, true_change_points and true_change_points_miss are accepted but never used by the upstream implementation, so they are not exposed (they remain reachable through **hyperparams if ever needed).
R documentation
title
-----
Title

name
----
missBART2

alias
-----
missBART2

description
-----------
Title

usage
-----
missBART2(
  x, y, x_predict = c(),
  n_reg_trees = 100, n_class_trees = 100,
  burn = 1000, iters = 1000, thin = 2,
  predict = TRUE, MH_sd = 0.5,
  tree_prior_params = tree_list(...), hypers = hypers_list(...),
  scale = TRUE, include_x = TRUE, include_y = TRUE,
  show_progress = TRUE, progress_every = 10,
  pdp_range = c(-0.5, 0.5), make_pdp = FALSE, mice_impute = TRUE,
  true_trees_data = NA, true_trees_missing = NA, z_true,
  true_change_points = NA, true_change_points_miss = NA,
  ...
)

arguments
---------
x
    covariates
y
    response
x_predict
    out-of-sample covariates. If not specified, the default is set to NA and no out-of-sample predictions will be made.
n_reg_trees
    number of BART trees
n_class_trees
    number of probit BART trees
burn
    burn-in samples
iters
    post-burn-in samples
thin
    thinning
predict
    make out-of-sample predictions?
MH_sd
    standard deviation for MH proposal for missing Y
tree_prior_params
    prior parameters for BART trees
hypers
    prior parameters for BART parameters
scale
    scale data?
include_x
    Include x in probit model?
include_y
    Include y in probit model?
show_progress
    logical
progress_every
    integer value stating how often to update the progress bar.
pdp_range
    range for partial dependence plots
make_pdp
    logical indicating whether to produce a partial dependence plot
mice_impute
    logical indicating whether to impute missing values via mice prior to prior calibration
true_trees_data
    true trees for BART component
true_trees_missing
    true trees for probit BART component
z_true
    true latent variable for probit BART component
true_change_points
    true change points for BART trees
true_change_points_miss
    true change points for probit BART trees
...
    Catches unused arguments

value
-----
a list containing BART predictions and imputed values

examples
--------
# x <- matrix(runif(6), ncol = 2)
# y <- matrix(runif(6), ncol = 2) %*% matrix(rnorm(4), ncol=2)
# bart_out <- missBART2(x, y, n_trees = 2, burn = 2,
#                       iters = 2, thin = 1, scale = FALSE)
- max_y: Float64[ndarray, 'p']¶
Per-output-column maxima of y computed before scaling. Used to invert the [-0.5, 0.5] scaling when reporting predictions.
- min_y: Float64[ndarray, 'p']¶
Per-output-column minima of y computed before scaling.
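How min_y and max_y map y onto [-0.5, 0.5] and back can be sketched as below. The formula is the usual min-max scaling and is an assumption here, since this page does not spell out the upstream expression; the function names scale_y and unscale_y are hypothetical.

```python
import numpy as np

def scale_y(y):
    """Min-max scale each column of y to [-0.5, 0.5], ignoring NaN entries."""
    y = np.asarray(y, dtype=float)
    min_y = np.nanmin(y, axis=0)  # per-column minima, shape (p,)
    max_y = np.nanmax(y, axis=0)  # per-column maxima, shape (p,)
    scaled = (y - min_y) / (max_y - min_y) - 0.5
    return scaled, min_y, max_y

def unscale_y(scaled, min_y, max_y):
    """Invert scale_y, mapping values back to the original scale of y."""
    return (scaled + 0.5) * (max_y - min_y) + min_y

y = np.array([[1.0, 10.0], [3.0, np.nan], [5.0, 30.0]])
scaled, min_y, max_y = scale_y(y)
restored = unscale_y(scaled, min_y, max_y)  # equals y, NaN entries included
```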
- x: Float64[ndarray, 'n q'] | Float64[ndarray, 'n 2*q']¶
Covariate matrix actually used by the sampler. If the input x contained missing values, this is the input augmented column-wise with binary missingness indicators (one per original column). The missingness indicator columns come all together after the value columns.
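The column layout described above can be illustrated with NumPy. This is a sketch of the layout only: the helper name augment_with_indicators is hypothetical, and leaving the NaN entries untouched in the value columns is an assumption, as this page does not say how the sampler treats them.

```python
import numpy as np

def augment_with_indicators(x):
    """Append one 0/1 missingness column per predictor when x has NaNs."""
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    if not missing.any():
        return x  # complete input: shape stays (n, q)
    # Value columns first, then all indicator columns together: (n, 2*q).
    return np.hstack([x, missing.astype(float)])

x = np.array([[1.0, np.nan],
              [2.0, 5.0],
              [np.nan, 6.0]])
x_aug = augment_with_indicators(x)
indicators = x_aug[:, x.shape[1]:]  # the trailing q columns
```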
- y_miss_accept: Bool[ndarray, 'total_iters n_missing']¶
Acceptance flags of the Metropolis-Hastings proposals for the missing y entries. One row per MCMC iteration (including burn-in), one column per missing entry, listed in column-major order of y.
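As a sketch of how these flags might be read, the snippet below recovers the (row, column) position of each missing entry in column-major order and computes a per-entry acceptance rate. The small boolean array is a made-up stand-in for the attribute, not output of a real fit.

```python
import numpy as np

y = np.array([[1.0, np.nan],
              [2.0, 4.0],
              [np.nan, 6.0]])

# Scanning the transposed mask visits y column by column (column-major).
cols, rows = np.nonzero(np.isnan(y).T)

# Stand-in for y_miss_accept: 4 iterations, 2 missing entries.
y_miss_accept = np.array([[True, False],
                          [True, True],
                          [False, True],
                          [True, False]])

# Fraction of accepted proposals for each missing entry.
accept_rate = y_miss_accept.mean(axis=0)
for r, c, rate in zip(rows, cols, accept_rate):
    print(f"y[{r}, {c}]: acceptance rate {rate:.2f}")
```

Rates computed this way include burn-in iterations, because the attribute keeps one row per MCMC iteration.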
- MH_sd: float¶
Standard deviation of the Metropolis-Hastings proposal used to update the missing entries of y. If not supplied at construction, the R code sets it to 0.5 / p.
- thin: int¶
Thinning interval applied to the post-burn-in chain. The total number of MCMC iterations is burn + thin * iters.
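The iteration count is simple arithmetic; the helper below (a hypothetical name, not part of the package) evaluates it for the defaults of this class and for the small settings used in the upstream R example.

```python
def total_iterations(burn=1000, iters=1000, thin=2):
    """Total MCMC iterations: burn-in plus thin iterations per retained draw."""
    return burn + thin * iters

default_total = total_iterations()                      # 1000 + 2 * 1000
small_total = total_iterations(burn=2, iters=2, thin=1)  # 2 + 1 * 2
```

This total is also the number of rows of y_miss_accept, which keeps every iteration, whereas the posterior arrays keep only iters draws.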
- new_y_post: Float64[ndarray, 'iters n_predict p'] | None = None¶
Posterior predictive draws (incl. error term) at the out-of-sample covariates x_predict, on the original scale. None if predict=False or x_predict was not supplied. With scale=False the values are garbled because the upstream code applies the un-scaling anyway.
- pdp_out: Any | None = None¶
Partial dependence plot output. None unless make_pdp=True and y is univariate.
- y_post: Float64[ndarray, 'iters n p']¶
Posterior draws of the BART regression mean for the training rows, on the original (un-scaled) scale of y.
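A common way to summarize an array of this shape is a posterior mean with an equal-tailed interval taken over the draw axis. The sketch below uses random numbers as a stand-in for y_post; only the (iters, n, p) layout is taken from this page.

```python
import numpy as np

rng = np.random.default_rng(0)
y_post = rng.normal(size=(200, 5, 2))  # stand-in with shape (iters, n, p)

# Average over draws: one fitted value per training row and output column.
post_mean = y_post.mean(axis=0)

# 95% equal-tailed interval for the regression mean, per row and column.
lower, upper = np.quantile(y_post, [0.025, 0.975], axis=0)
```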
- z_post: Float64[ndarray, 'iters n p']¶
Posterior draws of the latent probit variables of the missingness model.
- omega_post: Float64[ndarray, 'iters 1 1'] | Float64[ndarray, 'iters p'] | Float64[ndarray, 'iters p p']¶
Posterior draws of the residual variance of the BART regression, on the original scale of y. Shape (iters, 1, 1) for univariate y; for multivariate y the full covariance matrix (iters, p, p) with scale=False, but only its diagonal (iters, p) with scale=True.
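Because the shape depends on both p and scale, downstream code may want a single layout. The hypothetical helper below reduces any of the three documented shapes to per-output variances of shape (iters, p); it is a convenience sketch, not part of the package.

```python
import numpy as np

def residual_variances(omega_post):
    """Return per-output residual variances with shape (iters, p)."""
    omega = np.asarray(omega_post, dtype=float)
    if omega.ndim == 3:
        # (iters, 1, 1) or (iters, p, p): keep the diagonal of each draw.
        return np.diagonal(omega, axis1=1, axis2=2)
    # (iters, p): already the diagonal.
    return omega

univariate = residual_variances(np.ones((10, 1, 1)))
full_cov = residual_variances(np.tile(np.diag([1.0, 2.0, 3.0]), (10, 1, 1)))
diag_only = residual_variances(np.ones((10, 3)))
```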
- y_impute: Float64[ndarray, 'iters n_missing']¶
Posterior draws of the imputed values for the missing entries of y, on the original scale. Columns are ordered as in y_miss_accept.
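To obtain a completed outcome matrix, the posterior means of these draws can be written back into the missing positions of y, taken in column-major order. The y_impute array below is a made-up stand-in with two draws and two missing entries.

```python
import numpy as np

y = np.array([[1.0, np.nan],
              [2.0, 4.0],
              [np.nan, 6.0]])

# Stand-in for y_impute: shape (iters, n_missing).
y_impute = np.array([[2.9, 5.1],
                     [3.1, 4.9]])

# Positions of the missing entries, column by column (column-major order).
cols, rows = np.nonzero(np.isnan(y).T)

# Fill each missing entry with the mean of its posterior draws.
y_filled = y.copy()
y_filled[rows, cols] = y_impute.mean(axis=0)
```

Using the mean is one choice among several; keeping the full set of draws preserves the imputation uncertainty.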
- var_imp: list[Float64[ndarray, '...']]¶
Per-retained-iteration variable importance scores derived from the classification (probit) BART trees that model missingness. The upstream code stores one entry per variable that was actually used as a split during that iteration, so the per-iteration vector length varies and the attribute is left as a list of arrays.
- y_pred: list¶
In-sample posterior predictive draws. Currently always empty in the upstream R implementation.
- reg_trees: list[list[Shaped[ndarray, 'num_nodes']]]¶
Accepted regression-BART tree structures, indexed as reg_trees[i][j] for retained iteration i and tree j. Each tree is a numpy structured array whose records carry the fields parent, lower, upper, split_variable, split_value, depth, direction, NA_direction.
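The nested layout lends itself to per-iteration summaries such as tree size and depth. The sketch below builds toy trees with the documented field names; the field dtypes and the toy values are assumptions made for illustration, and only len() and the depth field are relied upon.

```python
import numpy as np

# Documented field names; the dtypes are placeholders chosen for this sketch.
node_dtype = np.dtype([
    ("parent", "i8"), ("lower", "f8"), ("upper", "f8"),
    ("split_variable", "i8"), ("split_value", "f8"),
    ("depth", "i8"), ("direction", "i8"), ("NA_direction", "i8"),
])

def toy_tree(depths):
    """Build a toy tree with one record per node and the given node depths."""
    tree = np.zeros(len(depths), dtype=node_dtype)
    tree["depth"] = depths
    return tree

# Two retained iterations with two trees each, mimicking reg_trees[i][j].
reg_trees = [
    [toy_tree([0]), toy_tree([0, 1, 1])],
    [toy_tree([0, 1, 1, 2, 2]), toy_tree([0])],
]

# Node counts and maximum depths, each with shape (iterations, trees).
sizes = np.array([[len(t) for t in trees] for trees in reg_trees])
depths = np.array([[t["depth"].max() for t in trees] for trees in reg_trees])
mean_size_per_iter = sizes.mean(axis=1)
```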
- class_trees: list[list[Shaped[ndarray, 'num_nodes']]]¶
Accepted probit-BART tree structures for the missingness model, same layout as reg_trees.