rbartpackages.missBART.missBART2

class rbartpackages.missBART.missBART2(x, y, x_predict=None, *, n_reg_trees=100, n_class_trees=100, burn=1000, iters=1000, thin=2, predict=None, MH_sd=None, tree_prior_params=None, hypers=None, scale=True, include_x=True, include_y=True, show_progress=True, progress_every=10, pdp_range=(-0.5, 0.5), make_pdp=False, mice_impute=True, **hyperparams)[source]

Fit BART to outcomes with missing entries, imputing them.

Python interface to R’s missBART::missBART2. It jointly fits a regression BART to the (possibly multivariate) outcome y and a probit BART to its missingness pattern, imputing the missing entries of y during the MCMC run. Missing entries of the predictors x are handled by augmenting x with binary missingness-indicator columns. Arguments left as None are omitted from the R call, so R computes its own defaults, described below.
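The None convention can be pictured as filtering the keyword arguments before the R call. This is a minimal sketch, not the wrapper's actual code; build_r_kwargs is a hypothetical helper introduced only for illustration.

```python
def build_r_kwargs(**kwargs):
    """Keep only the arguments that were explicitly set (not None).

    Hypothetical helper: anything dropped here would be left to R's defaults.
    """
    return {name: value for name, value in kwargs.items() if value is not None}


call = build_r_kwargs(n_reg_trees=100, MH_sd=None, predict=None, thin=2)
print(call)  # {'n_reg_trees': 100, 'thin': 2}
```

MH_sd and predict are absent from the resulting dict, so in this picture R would fall back to its own defaults for them.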

Parameters:
  • x (Float64[ndarray, 'n q']) – Predictor matrix; rows are observations. Missing entries are marked with NaN. If any are present, x is augmented with one binary missingness-indicator column per predictor; the missing entries themselves are not imputed, unlike those of y.

  • y (Float64[ndarray, 'n p']) – Outcome matrix (one column per response) or vector; NaN marks the entries to impute.

  • x_predict (Float64[ndarray, 'm q'] | None, default: None) – Out-of-sample predictors at which to draw the posterior predictive; if omitted, no out-of-sample predictions are made (see Notes).

  • n_reg_trees (int, default: 100) – Number of trees of the regression BART modeling y.

  • n_class_trees (int, default: 100) – Number of trees of the probit BART modeling the missingness of y.

  • burn (int, default: 1000) – Number of burn-in MCMC iterations discarded.

  • iters (int, default: 1000) – Number of post-burn-in iterations retained after thinning.

  • thin (int, default: 2) – Thinning interval; the chain runs burn + thin * iters iterations.

  • predict (bool | None, default: None) – Whether to draw the posterior predictive at x_predict; see Notes for the interaction with x_predict.

  • MH_sd (float | None, default: None) – Standard deviation of the Metropolis-Hastings proposal updating the missing entries of y; default 0.5 / p.

  • tree_prior_params (TreePriorParams | None, default: None) – Tree-prior parameters as a TreePriorParams dict; it must be complete, as passing it directly skips tree_list’s own defaults. Setting the individual parameters as keyword arguments instead (see **hyperparams) avoids this.

  • hypers (Hypers | None, default: None) – Prior hyperparameters as a Hypers dict; it must be complete, as passing it directly skips hypers_list’s own defaults. Setting the individual hyperparameters as keyword arguments instead (see **hyperparams) avoids this.

  • scale (bool, default: True) – Whether to scale y to [-0.5, 0.5] before fitting.

  • include_x (bool, default: True) – Whether the missingness probit model uses x as predictors.

  • include_y (bool, default: True) – Whether the missingness probit model uses y as predictors.

  • show_progress (bool, default: True) – Whether to display a progress bar in the R console.

  • progress_every (int, default: 10) – Update the progress bar every this many iterations.

  • pdp_range (Float64[ndarray, '2'] | tuple[float, float], default: (-0.5, 0.5)) – Range over which the partial dependence plot is evaluated (with make_pdp).

  • make_pdp (bool, default: False) – Whether to compute partial dependence output; univariate y only.

  • mice_impute (bool, default: True) – Whether the missing entries of y are initialized with mice::mice; otherwise they start at zero.

  • **hyperparams (Unpack[Hyperparams]) – Extra keyword arguments, restricted to the keys of Hyperparams, forwarded verbatim through R’s ... argument. They populate the unset entries of tree_prior_params (through tree_list) and hypers (through hypers_list). This is the intended way to set individual tree-prior parameters and hyperparameters.
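The chain length and the MH_sd default stated in the parameter list reduce to simple arithmetic. The sketch below only restates those two formulas; total_iterations and default_mh_sd are hypothetical names, and p is taken to be the number of response columns of y.

```python
def total_iterations(burn, iters, thin):
    """Total MCMC iterations run by the sampler: burn + thin * iters."""
    return burn + thin * iters


def default_mh_sd(p):
    """Documented default for MH_sd when it is left unset: 0.5 / p."""
    return 0.5 / p


print(total_iterations(burn=1000, iters=1000, thin=2))  # 3000
print(default_mh_sd(p=2))  # 0.25
```

With the default settings the sampler therefore runs 3000 iterations to retain 1000 draws.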

Raises:

ValueError – If predict=True is passed without x_predict.

Notes

If x_predict is not specified, the wrapper passes predict=False and a placeholder x_predict, because the R code crashes on its own default x_predict = c() (as.matrix(NULL) is an error). Explicitly passing predict=True without x_predict raises ValueError.
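The interaction between predict and x_predict can be summarized as a small decision function. This is a sketch of the rules stated above rather than the wrapper's implementation; resolve_predict is a hypothetical name, and the placeholder x_predict that the wrapper sends to R is not modeled.

```python
def resolve_predict(x_predict=None, predict=None):
    """Return the effective predict flag under the rules in the Notes.

    Assumption: when x_predict is given and predict is unset, R's own
    default (predict = TRUE) applies.
    """
    if x_predict is None:
        if predict:
            # Explicit predict=True with nothing to predict at is an error.
            raise ValueError("predict=True requires x_predict")
        return False
    return True if predict is None else bool(predict)


print(resolve_predict())                      # False
print(resolve_predict(x_predict=[[0.1, 0.2]]))  # True
```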

The R arguments true_trees_data, true_trees_missing, true_change_points and true_change_points_miss are accepted but never used by the upstream implementation, so they are not exposed (they remain reachable through **hyperparams if ever needed).

R documentation

title
-----

Title

name
----

missBART2

alias
-----

missBART2

description
-----------

 Title


usage
-----


 missBART2(
   x,
   y,
   x_predict = c(),
   n_reg_trees = 100,
   n_class_trees = 100,
   burn = 1000,
   iters = 1000,
   thin = 2,
   predict = TRUE,
   MH_sd = 0.5,
   tree_prior_params = tree_list(...),
   hypers = hypers_list(...),
   scale = TRUE,
   include_x = TRUE,
   include_y = TRUE,
   show_progress = TRUE,
   progress_every = 10,
   pdp_range = c(-0.5, 0.5),
   make_pdp = FALSE,
   mice_impute = TRUE,
   true_trees_data = NA,
   true_trees_missing = NA,
   z_true,
   true_change_points = NA,
   true_change_points_miss = NA,
   ...
 )


arguments
---------


 x covariates

 y response

 x_predict out-of-sample covariates. If not specified, the default is set to NA and no out-of-sample predictions will be made.

 n_reg_trees number of BART trees

 n_class_trees number of probit BART trees

 burn burn-in samples

 iters post-burn-in samples

 thin thinning

 predict make out-of-sample predictions?

 MH_sd standard deviation for MH proposal for missing Y

 tree_prior_params prior parameters for BART trees

 hypers prior parameters for BART parameters

 scale scale data?

 include_x Include x in probit model?

 include_y Include y in probit model?

 show_progress logical

 progress_every integer value stating how often to update the progress bar.

 pdp_range range for partial dependence plots

 make_pdp logical indicating whether to produce a partial dependence plot

 mice_impute logical indicating whether to impute missing values via mice prior to prior calibration

 true_trees_data true trees for BART component

 true_trees_missing true trees for probit BART component

 z_true true latent variable for probit BART component

 true_change_points true change points for BART trees

 true_change_points_miss true change points for probit BART trees

 ... Catches unused arguments


value
-----


 a list containing BART predictions and imputed values


examples
--------


 # x <- matrix(runif(6), ncol = 2)
 # y <- matrix(runif(6), ncol = 2) %*% matrix(rnorm(4), ncol=2)
 # bart_out <- missBART2(x, y, n_trees = 2, burn = 2,
 #                       iters = 2, thin = 1, scale = FALSE)

max_y: Float64[ndarray, 'p']

Per-output-column maxima of y computed before scaling. Used to invert the [-0.5, 0.5] scaling when reporting predictions.

min_y: Float64[ndarray, 'p']

Per-output-column minima of y computed before scaling.
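How max_y and min_y relate to the [-0.5, 0.5] scaling can be illustrated with a round trip. The exact upstream formula is not given here, so the min-max form below, and the choice to ignore NaN when taking the extrema, are assumptions; scale_y and unscale_y are hypothetical helpers.

```python
import numpy as np


def scale_y(y):
    """Map each column of y to [-0.5, 0.5] (assumed min-max scaling)."""
    y = np.asarray(y, dtype=float)
    min_y = np.nanmin(y, axis=0)  # per-column minima, NaN ignored
    max_y = np.nanmax(y, axis=0)  # per-column maxima, NaN ignored
    scaled = (y - min_y) / (max_y - min_y) - 0.5
    return scaled, min_y, max_y


def unscale_y(scaled, min_y, max_y):
    """Invert scale_y using the stored per-column extrema."""
    return (scaled + 0.5) * (max_y - min_y) + min_y


y = np.array([[1.0, 10.0], [3.0, np.nan], [np.nan, 30.0]])
scaled, min_y, max_y = scale_y(y)
restored = unscale_y(scaled, min_y, max_y)
```

Missing entries stay NaN through both steps, and restored matches y wherever y is observed.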

x: Float64[ndarray, 'n q'] | Float64[ndarray, 'n 2*q']

Covariate matrix actually used by the sampler. If the input x contained missing values, this is the input augmented column-wise with binary missingness indicators (one per original column). The missingness indicator columns come all together after the value columns.
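The augmented layout described above (value columns first, then all indicator columns) can be reproduced on a toy matrix. This sketch only mirrors the documented column order; augment_with_missingness is a hypothetical helper, and leaving the NaN entries in the value columns untouched is an assumption.

```python
import numpy as np


def augment_with_missingness(x):
    """Append one 0/1 missingness-indicator column per column of x.

    Returns x unchanged when it has no missing entries.
    """
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    if not missing.any():
        return x
    # Value columns first, then all indicator columns together.
    return np.hstack([x, missing.astype(float)])


x = np.array([[1.0, np.nan], [2.0, 3.0], [np.nan, 4.0]])
x_aug = augment_with_missingness(x)
print(x_aug.shape)  # (3, 4)
```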

y_miss_accept: Bool[ndarray, 'total_iters n_missing']

Acceptance flags of the Metropolis-Hastings proposals for the missing y entries. One row per MCMC iteration (including burn-in), one column per missing entry, listed in column-major order of y.
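A common use of these flags is a per-entry acceptance rate after burn-in, matched back to the (row, column) position of each missing entry. The sketch below uses toy data and assumes the burn-in rows come first; acceptance_rates and missing_positions are hypothetical helpers.

```python
import numpy as np


def acceptance_rates(y_miss_accept, burn):
    """Fraction of accepted proposals per missing entry, burn-in rows dropped."""
    accept = np.asarray(y_miss_accept, dtype=bool)
    return accept[burn:].mean(axis=0)


def missing_positions(y):
    """(row, column) indices of the NaN entries of y in column-major order."""
    y = np.asarray(y, dtype=float)
    if y.ndim == 1:
        y = y[:, None]
    # argwhere on the transpose walks y column by column.
    return np.argwhere(np.isnan(y.T))[:, ::-1]


y = np.array([[1.0, np.nan], [np.nan, 4.0], [5.0, np.nan]])
accept = np.array([
    [True, True, False],    # burn-in
    [False, False, False],  # burn-in
    [True, False, True],
    [True, True, True],
    [False, True, True],
    [True, False, True],
])
rates = acceptance_rates(accept, burn=2)
positions = missing_positions(y)
```

Column k of rates then refers to the entry of y at positions[k].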

MH_sd: float

Standard deviation of the Metropolis-Hastings proposal used to update the missing entries of y. If not supplied at construction, the R code sets it to 0.5 / p.

burn: int

Number of burn-in MCMC iterations (discarded).

iters: int

Number of post-burn-in MCMC iterations retained after thinning.

thin: int

Thinning interval applied to the post-burn-in chain. The total number of MCMC iterations is burn + thin * iters.

new_y_post: Float64[ndarray, 'iters n_predict p'] | None = None

Posterior predictive draws (including the error term) at the out-of-sample covariates x_predict, on the original scale. None if predict=False or x_predict was not supplied. With scale=False the values are unreliable, because the upstream code applies the un-scaling even though y was never scaled.
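Draws of shape (iters, n_predict, p) are usually reduced over the first axis. The following sketch computes a posterior predictive mean and an equal-tailed interval on simulated draws standing in for new_y_post; summarize_draws is a hypothetical helper, not part of the package.

```python
import numpy as np


def summarize_draws(draws, level=0.95):
    """Mean and equal-tailed interval over the draw axis (axis 0)."""
    draws = np.asarray(draws, dtype=float)
    tail = (1.0 - level) / 2.0
    mean = draws.mean(axis=0)
    lower, upper = np.quantile(draws, [tail, 1.0 - tail], axis=0)
    return mean, lower, upper


rng = np.random.default_rng(0)
toy_draws = rng.normal(size=(200, 5, 2))  # stand-in for new_y_post
mean, lower, upper = summarize_draws(toy_draws)
print(mean.shape, lower.shape, upper.shape)  # (5, 2) (5, 2) (5, 2)
```

Since the attribute may be None, a caller would check for that before summarizing.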

pdp_out: Any | None = None

Partial dependence plot output. None unless make_pdp=True and y is univariate.

y_post: Float64[ndarray, 'iters n p']

Posterior draws of the BART regression mean for the training rows, on the original (un-scaled) scale of y.

z_post: Float64[ndarray, 'iters n p']

Posterior draws of the latent probit variables of the missingness model.

omega_post: Float64[ndarray, 'iters 1 1'] | Float64[ndarray, 'iters p'] | Float64[ndarray, 'iters p p']

Posterior draws of the residual variance of the BART regression, on the original scale of y. Shape (iters, 1, 1) for univariate y; for multivariate y the full covariance matrix (iters, p, p) with scale=False, but only its diagonal (iters, p) with scale=True.
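Because the shape depends on both p and scale, downstream code may want a uniform view. The sketch below reduces each of the three documented shapes to per-response variances of shape (iters, p); residual_variances is a hypothetical helper.

```python
import numpy as np


def residual_variances(omega_post):
    """Per-response residual variances with shape (iters, p).

    Handles (iters, 1, 1), (iters, p, p) and (iters, p) inputs.
    """
    omega = np.asarray(omega_post, dtype=float)
    if omega.ndim == 3:
        # Take the diagonal of each per-draw covariance matrix.
        return np.diagonal(omega, axis1=1, axis2=2).copy()
    if omega.ndim == 2:
        return omega
    raise ValueError("expected an array with 2 or 3 dimensions")


univariate = np.full((4, 1, 1), 2.0)
full_cov = np.tile(np.array([[1.0, 0.5], [0.5, 3.0]]), (4, 1, 1))
diag_only = np.ones((4, 2))

print(residual_variances(univariate).shape)  # (4, 1)
print(residual_variances(full_cov).shape)    # (4, 2)
print(residual_variances(diag_only).shape)   # (4, 2)
```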

y_impute: Float64[ndarray, 'iters n_missing']

Posterior draws of the imputed values for the missing entries of y, on the original scale. Columns are ordered as in y_miss_accept.
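A completed copy of y can be built by writing the posterior mean of each imputed entry back into its position, relying on the column-major ordering noted above. This is a sketch on toy data; fill_with_posterior_mean is a hypothetical helper.

```python
import numpy as np


def fill_with_posterior_mean(y, y_impute):
    """Replace the NaN entries of y by the mean of their imputation draws."""
    y = np.array(y, dtype=float)  # copy, so the input is left untouched
    if y.ndim == 1:
        y = y[:, None]
    flat = y.flatten(order="F")  # column-major copy
    mask = np.isnan(flat)
    means = np.asarray(y_impute, dtype=float).mean(axis=0)
    if means.shape[0] != mask.sum():
        raise ValueError("y_impute columns do not match the missing entries")
    flat[mask] = means
    return flat.reshape(y.shape, order="F")


y = np.array([[1.0, np.nan], [np.nan, 4.0]])
y_impute = np.array([[10.0, 20.0], [12.0, 22.0]])  # 2 draws, 2 missing entries
print(fill_with_posterior_mean(y, y_impute))
# [[ 1. 21.]
#  [11.  4.]]
```

The first column of y_impute belongs to y[1, 0] and the second to y[0, 1], because column 0 of y is traversed first.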

var_imp: list[Float64[ndarray, '...']]

Per-retained-iteration variable importance scores derived from the classification (probit) BART trees that model missingness. The upstream code stores one entry per variable that was actually used as a split during that iteration, so the per-iteration vector length varies and the attribute is left as a list of arrays.

y_pred: list

In-sample posterior predictive draws. Currently always empty in the upstream R implementation.

reg_trees: list[list[Shaped[ndarray, 'num_nodes']]]

Accepted regression-BART tree structures, indexed as reg_trees[i][j] for retained iteration i and tree j. Each tree is a numpy structured array whose records carry the fields parent, lower, upper, split_variable, split_value, depth, direction, NA_direction.
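Structured arrays allow field access by name, which makes per-tree summaries short. The sketch below traces the maximum node depth per tree on toy data. The field names follow the list above, but the dtypes, the toy values, and the depth convention (root at 0) are assumptions, and max_depths is a hypothetical helper.

```python
import numpy as np

# Assumed dtype: field names from the text, numeric types chosen for illustration.
node_dtype = np.dtype([
    ("parent", "i8"), ("lower", "f8"), ("upper", "f8"),
    ("split_variable", "i8"), ("split_value", "f8"),
    ("depth", "i8"), ("direction", "i8"), ("NA_direction", "i8"),
])


def max_depths(trees):
    """Maximum node depth for every tree of every retained iteration."""
    return np.array(
        [[int(tree["depth"].max()) for tree in iteration] for iteration in trees]
    )


stump = np.zeros(1, dtype=node_dtype)  # a single root node
split = np.zeros(3, dtype=node_dtype)  # a root with two children
split["depth"] = [0, 1, 1]
toy_trees = [[stump, split]]           # one iteration, two trees

print(max_depths(toy_trees))  # [[0 1]]
```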

class_trees: list[list[Shaped[ndarray, 'num_nodes']]]

Accepted probit-BART tree structures for the missingness model, same layout as reg_trees.

reg_mu: list[list[Float64[ndarray, 'n_leaves p']]]

Leaf-node parameters of the regression-BART trees, indexed as reg_mu[i][j] for iteration i and tree j. The outer list has length burn + thin * iters (i.e. it includes burn-in iterations, unlike reg_trees); each leaf array has shape (n_leaves, p).

class_mu: list[list[Float64[ndarray, 'n_leaves p']]]

Leaf-node parameters of the probit-BART trees, same layout as reg_mu. The trailing dimension is p (one mean per response column, since the probit trees model the per-column missingness indicators of y).