rbartpackages.BART3.gbart

class rbartpackages.BART3.gbart(x_train, y_train, x_test=None, *, type='wbart', sparse=False, theta=0.0, omega=1.0, a=0.5, b=1.0, augment=False, rho=0.0, grp=None, varprob=None, xinfo=None, usequants=False, rm_const=True, sigest=None, sigdf=3.0, sigquant=0.9, k=2.0, power=2.0, base=0.95, impute_mult=None, impute_prob=None, impute_miss=None, lambda_=None, tau_num=None, offset=None, w=None, ntree=None, numcut=100, ndpost=1000, nskip=100, keepevery=None, printevery=100, transposed=False, probs=(0.025, 0.975), mc_cores=None, nice=19, seed=99, meta=False, verbose=1, shards=1, weight=None)[source]

Fit BART to continuous or binary outcomes with a single MCMC chain.

Python interface to R’s BART3::gbart. Same parameters as mc_gbart, but the fit runs in the current R process: mc_cores (defaulting to 1 here) is only recorded in the chains attribute, and nice, seed and meta are ignored; seed the fit through R’s set.seed instead.

R documentation

title
-----

Generalized BART for continuous and binary outcomes

name
----

gbart

alias
-----

mc.gbart

keyword
-------

nonlinear

description
-----------

 BART is a Bayesian  sum-of-trees  model.
 For a numeric response  y , we have
 y = f(x) + e ,
 where  e ~ N(0, sigma^2) .

 f  is the sum of many tree models.
 The goal is to have very flexible inference for the unknown
 function  f .

 In the spirit of  ensemble models ,
 each tree is constrained by a prior to be a weak learner
 so that it contributes a small amount to the overall fit.


usage
-----


 gbart(
       x.train, y.train,
       x.test=matrix(0,0,0), type='wbart',
       ntype=as.integer(
           factor(type, levels=c('wbart', 'pbart', 'lbart'))),
       sparse=FALSE, theta=0, omega=1,
       a=0.5, b=1, augment=FALSE, rho=0, grp=NULL, varprob=NULL,
       xinfo=matrix(0,0,0), usequants=FALSE,
       rm.const=TRUE,
       sigest=NA, sigdf=3, sigquant=0.90,
       k=2, power=2, base=0.95,
       impute.mult=NULL, impute.prob=NULL,
       impute.miss=NULL,
       lambda=NA, tau.num=c(NA, 3, 6)[ntype],
       offset=NULL, w=rep(1, length(y.train)),
       ntree=c(200L, 50L, 50L)[ntype], numcut=100L,
       ndpost=1000L, nskip=100L,
       keepevery=c(1L, 10L, 10L)[ntype],
       printevery=100L, transposed=FALSE,
       probs=c(0.025, 0.975),
       mc.cores = 1L, ## mc.gbart only
       nice = 19L,    ## mc.gbart only
       seed = 99L,    ## mc.gbart only
       meta = FALSE,  ## mc.gbart only
       TSVS = FALSE,  ## gbart only
       verbose = 1L, shards = 1L, weight=rep(NA, shards)
 )

 mc.gbart(
          x.train=matrix(0,0,0), y.train=NULL,
          x.test=matrix(0,0,0), type='wbart',
          ntype=as.integer(
              factor(type, levels=c('wbart', 'pbart', 'lbart'))),
          sparse=FALSE, theta=0, omega=1,
          a=0.5, b=1, augment=FALSE, rho=0, grp=NULL, varprob=NULL,
          xinfo=matrix(0,0,0), usequants=FALSE,
          rm.const=TRUE,
          sigest=NA, sigdf=3, sigquant=0.90,
          k=2, power=2, base=0.95,
          impute.mult=NULL, impute.prob=NULL,
          impute.miss=NULL,
          lambda=NA, tau.num=c(NA, 3, 6)[ntype],
          offset=NULL, w=rep(1, length(y.train)),
          ntree=c(200L, 50L, 50L)[ntype], numcut=100L,
          ndpost=1000L, nskip=100L,
          keepevery=c(1L, 10L, 10L)[ntype],
          printevery=100L, transposed=FALSE,
          probs=c(0.025, 0.975),
          mc.cores = getOption('mc.cores', 2L), ## mc.gbart only
          nice = 19L,  ## mc.gbart only
          seed = 99L,  ## mc.gbart only
          meta = FALSE,## mc.gbart only
          TSVS = FALSE,  ## gbart only
          verbose = 1L, shards = 1L, weight=rep(NA, shards)
 )



arguments
---------



    x.train  Explanatory variables for training (in sample)
     data.  May be a matrix or a data frame, with (as usual) rows
     corresponding to observations and columns to variables.  If a
     variable is a factor in a data frame, it is replaced with dummies.
     Note that  q  dummies are created if  q>2  and one dummy
     created if  q=2  where  q  is the number of levels of the
     factor.   gbart  will generate draws of  f(x)  for each
      x  which is a row of  x.train .
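
The factor rule above can be sketched in plain Python. This is only an illustration of the stated counts (q dummies for q > 2 levels, one dummy for q = 2); the helper name `expand_factor` is hypothetical, and the choice of which level the single binary dummy marks is an assumption, not something the R documentation specifies.

```python
def expand_factor(values):
    """Expand one categorical column into indicator (dummy) columns.

    A factor with q > 2 levels yields q dummies; a factor with exactly
    two levels yields a single dummy.
    """
    levels = sorted(set(values))
    if len(levels) == 2:
        # Assumption: the single indicator marks the second level in sort order.
        return [[1.0 if v == levels[1] else 0.0] for v in values]
    return [[1.0 if v == lev else 0.0 for lev in levels] for v in values]


binary = expand_factor(["no", "yes", "no"])   # one column
multi = expand_factor(["a", "b", "c", "a"])   # three columns
```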

     y.train
    Continuous or binary dependent variable for training (in sample) data.
 If  y  is numeric, then a continuous BART model is fit (Normal errors).
 If  y  is binary (has only 0's and 1's), then a binary BART model
 with a probit link is fit by default: you can override the default via the
 argument  type  to specify a logit BART model.


     x.test  Explanatory variables for test (out of sample)
    data. Should have same structure as  x.train .
     gbart  will generate draws of  f(x)  for each  x  which
    is a row of  x.test .

   type  You can use this argument to specify the type of fit.
     'wbart'  for continuous BART,  'pbart'  for probit BART or
     'lbart'  for logit BART.

   ntype  The integer equivalent of  type  where
    'wbart'  is 1,  'pbart'  is 2 and
    'lbart'  is 3.
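
Several defaults in the usage section are indexed by  ntype . The following Python sketch mirrors those R expressions ( ntree=c(200L, 50L, 50L)[ntype] ,  keepevery=c(1L, 10L, 10L)[ntype] ,  tau.num=c(NA, 3, 6)[ntype] ). The function name `type_defaults` is hypothetical, and `None` stands in for R's  NA .

```python
TYPES = ("wbart", "pbart", "lbart")
NTREE = (200, 50, 50)
KEEPEVERY = (1, 10, 10)
TAU_NUM = (None, 3.0, 6.0)  # None stands in for R's NA


def type_defaults(type_="wbart"):
    """Return the type-dependent defaults listed in the R usage section."""
    ntype = TYPES.index(type_) + 1  # R's factor codes are 1-based
    i = ntype - 1
    return {
        "ntype": ntype,
        "ntree": NTREE[i],
        "keepevery": KEEPEVERY[i],
        "tau_num": TAU_NUM[i],
    }


probit_defaults = type_defaults("pbart")
```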
     sparse Whether to perform variable selection based on a
      sparse Dirichlet prior rather than simply uniform; see Linero 2016.
     theta Set  theta  parameter; zero means random.
     omega Set  omega  parameter; zero means random.
     a Sparse parameter for  Beta(a, b)  prior:
       0.5<=a<=1  where lower values induce more sparsity.
     b Sparse parameter for  Beta(a, b)  prior; typically,
       b=1 .

     augment Whether data augmentation is to be performed in sparse
      variable selection.
     rho A multiplier for the inverse weights of the Dirichlet prior
      arguments.  For sparsity,  rho=p  where  p  is the
      number of covariates under consideration: the default,  rho=0
      means  rho=p  (computed by  rho=sum(1/grp) ). For more sparsity,
       rho<p , set this argument manually.  See also  grp .
     grp A vector of inverse weights for the Dirichlet prior arguments.
      If all the variables are continuous, then  grp  is a vector of 1s.
      However, for categorical variables (like factors in a data.frame), the
    inverse weights are the number of categories.  See  bartModelMatrix
    for the details of the default automated derivation when  grp=NULL .
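
A small Python sketch of the default  rho=sum(1/grp)  may help show why it equals the number of covariates before factor expansion. The  grp  vector below is a made-up example (three continuous columns plus one 4-level factor expanded into four dummies), and `default_rho` is a hypothetical helper name.

```python
def default_rho(grp):
    """Default rho as described above: the sum of the inverse group weights."""
    return sum(1.0 / g for g in grp)


# Three continuous covariates (weight 1 each) and one 4-level factor whose
# four dummy columns each carry weight 4.
grp = [1, 1, 1, 4, 4, 4, 4]
rho = default_rho(grp)  # 3 * 1 + 4 * (1/4): four original covariates
```

With only continuous covariates,  grp  is all 1s and the default reduces to the column count.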

     varprob  Initial variable selection probabilities:
      the default,  NULL , means  1/p  for each variable.

     xinfo  You can provide the cutpoints to BART or let BART
      choose them for you.  To provide them, use the  xinfo
      argument to specify a list (matrix) where the items (rows) are the
      covariates and the contents of the items (columns) are the
      cutpoints.

     usequants  If  usequants=FALSE , then the
     cutpoints in  xinfo  are generated uniformly; otherwise,
     if  TRUE , uniform quantiles are used for the cutpoints.

     rm.const  Whether or not to remove constant variables.

     sigest  The prior for the error variance
    ( sigma^2 ) is inverted chi-squared (the standard
    conditionally conjugate prior).  The prior is specified by choosing
    the degrees of freedom, a rough estimate of the corresponding
    standard deviation and a quantile to put this rough estimate at.  If
     sigest=NA  then the rough estimate will be the usual least squares
    estimator.  Otherwise the supplied value will be used.
    Not used if  y  is binary.


     sigdf
    Degrees of freedom for error variance prior.
    Not used if  y  is binary.


     sigquant  The quantile of the prior that the rough estimate
    (see  sigest ) is placed at.  The closer the quantile is to 1, the more
    aggressive the fit will be as you are putting more prior weight on
    error standard deviations ( sigma ) less than the rough
    estimate.  Not used if  y  is binary.

     k  For numeric  y ,  k  is the number of prior
      standard deviations  E(Y|x) = f(x)  is away from +/-0.5.
      For binary  y ,  k  is the number of prior standard
    deviations  f(x)  is away from +/-3.  The bigger  k  is, the more
    conservative the fitting will be.

     power
    Power parameter for tree prior.


     base
    Base parameter for tree prior.

   impute.mult  A vector of the columns of  x.train
    which are multinomial indicators that require imputation:
    the default is  NULL .
   impute.prob  A matrix of probabilities for the
    multinomial indicators that require imputation:
    the default is  NULL .
   impute.miss  A vector of missing indicators for
    the multinomial indicators that require imputation:
    the default is  NULL .

     lambda
    The scale of the prior for the variance.  If  lambda  is zero,
      then the variance is to be considered fixed and known at the given
      value of  sigest .  Not used if  y  is binary.


   tau.num  The numerator in the  tau  definition, i.e.,
     tau=tau.num/(k*sqrt(ntree)) .
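
The formula above can be checked numerically. This sketch assumes, as in the usual BART prior, that  tau  is the prior standard deviation of each leaf value and that the  ntree  leaf values summed in  f(x)  are independent, so the prior SD of  f(x)  is  tau*sqrt(ntree) = tau.num/k . The function name `leaf_prior_sd` is hypothetical.

```python
import math


def leaf_prior_sd(tau_num, k=2.0, ntree=50):
    """tau = tau.num / (k * sqrt(ntree)), as given in the argument description."""
    return tau_num / (k * math.sqrt(ntree))


tau_probit = leaf_prior_sd(3.0, k=2.0, ntree=50)  # probit default tau.num=3
tau_logit = leaf_prior_sd(6.0, k=2.0, ntree=50)   # logit default tau.num=6

# Under the independence assumption, k prior SDs of f(x) recover tau.num,
# which matches "k prior standard deviations away from +/-3" for probit.
k_sds_probit = 2.0 * tau_probit * math.sqrt(50)
```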

     offset  Continuous BART operates on  y.train  centered by
     offset  which defaults to  mean(y.train) .  With binary
    BART, the centering is  P(Y=1 | x) = F(f(x) + offset)  where
     offset  defaults to  F^{-1}(mean(y.train)) .  You can use
    the  offset  parameter to override these defaults.
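
A minimal Python sketch of these default offsets, assuming  F  is the standard normal CDF for probit fits and the logistic CDF for logit fits (the documentation above does not spell out  F ). The helper name `default_offset` is hypothetical, and the binary branches require 0 < mean(y) < 1.

```python
import math
from statistics import NormalDist, fmean


def default_offset(y, type_="wbart"):
    """Default centering value: mean(y), or F^{-1}(mean(y)) for binary fits."""
    ybar = fmean(y)
    if type_ == "wbart":
        return ybar
    if type_ == "pbart":
        # Assumption: F is the standard normal CDF (probit link).
        return NormalDist().inv_cdf(ybar)
    if type_ == "lbart":
        # Assumption: F is the logistic CDF (logit link).
        return math.log(ybar / (1.0 - ybar))
    raise ValueError(f"unknown type: {type_!r}")


continuous_offset = default_offset([1.0, 2.0, 3.0])
probit_offset = default_offset([0, 1, 1, 0], "pbart")
logit_offset = default_offset([1, 1, 1, 0], "lbart")
```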

     w  Vector of weights which multiply the standard deviation.
    Not used if  y  is binary.

     ntree
    The number of trees in the sum.


     numcut  The number of possible values of the cutpoint  c  (see
     usequants ).  If a single number is given, this is used for all
    variables.  Otherwise a vector with length equal to
     ncol(x.train)  is required, where the  i^th
    element gives the number of  c  used for the  i^th
    variable in  x.train .  If  usequants  is false,  numcut  equally
    spaced cutoffs are used covering the range of values in the
    corresponding column of  x.train .  If  usequants  is true, then
     min(numcut, the number of unique values in the corresponding
    column of x.train - 1)  values are used.
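
The two rules above can be sketched in plain Python. The count under  usequants=TRUE  follows the text directly. The placement of the equally spaced cutoffs strictly inside the column range is an assumption of this sketch; the R implementation may place them differently. Both helper names are hypothetical.

```python
def n_cutpoints(column, numcut=100, usequants=False):
    """Number of cutpoints used for one column, per the numcut description."""
    if not usequants:
        return numcut
    return min(numcut, len(set(column)) - 1)


def uniform_cutpoints(column, numcut):
    """numcut equally spaced cutoffs covering the range of the column.

    Assumption: the cutoffs are interior points, excluding min and max.
    """
    lo, hi = min(column), max(column)
    step = (hi - lo) / (numcut + 1)
    return [lo + step * j for j in range(1, numcut + 1)]


cuts = uniform_cutpoints([0.0, 1.0, 4.0], numcut=3)
few = n_cutpoints([0, 0, 1, 2], numcut=100, usequants=True)
```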

     ndpost
    The number of posterior draws returned.


     nskip
    Number of MCMC iterations to be treated as burn in.


     printevery
    As the MCMC runs, a message is printed every printevery draws.


     keepevery
    Every keepevery-th draw is kept and returned to the user.


     transposed
    When running  gbart  in parallel, it is more memory-efficient
    to transpose  x.train  and  x.test , if any, prior to
    calling  mc.gbart .
     probs  The lower and upper quantiles to summarize:
      the default is  c(0.025, 0.975) .


      seed
      The random seed, which must be set for reproducible MCMC.


     mc.cores
      Number of cores to employ in parallel.


     nice
      Set the job niceness.  The default
      niceness is 19: niceness goes from 0 (highest) to 19 (lowest).

     TSVS  If  TRUE , then avoid unnecessary calculations
      for speed:  gbart  only.
     verbose  If set to  0L , then compute silently.
     shards  For the Modified LISA method, this is the number of
      shards: the default is  1L .
     weight  For the Modified LISA method, this is a vector of
      weights to combine the shards: the default is  rep(NA, shards) .
     meta  Whether or not to produce meta-analysis-like
      estimates from a sharded analysis (as opposed to a Modified LISA
      approach): default is  FALSE .


details
-------


    BART is a Bayesian MCMC method.
    At each MCMC iteration, we produce a draw from the joint posterior
     (f, sigma) | (x, y)  in the numeric  y  case
    and just  f  in the binary  y  case.

    Thus, unlike a lot of other modelling methods in R, we do not produce
    a single model object from which fits and summaries may be extracted.
    The output consists of values  f*(x)  (and
     sigma*  in the numeric case) where * denotes a
    particular draw.  The  x  is either a row from the training data,
     x.train , or the test data,  x.test .

    For  x.train / x.test  with missing data elements,  gbart
    will singly impute them with hot decking. For one or more missing
    covariates, record-level hot-decking imputation  deWaPann11  is
    employed that is biased towards the null, i.e., nonmissing values
    from another record are randomly selected regardless of the
    outcome  y.train . Since  mc.gbart  runs multiple  gbart  threads in
    parallel,  mc.gbart  performs multiple imputation with hot
    decking, i.e., a separate imputation for each thread.
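
Record-level hot decking as described above can be illustrated with a short Python sketch. It is not the BART3 implementation: the donor-selection details (a single donor per record, chosen uniformly among records that observe all of the missing columns) are assumptions, and `hot_deck` is a hypothetical helper. The outcome is never consulted, which is the sense in which the imputation is biased towards the null.

```python
import math
import random


def hot_deck(rows, rng):
    """Fill NaNs in each record from one randomly chosen donor record."""
    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        missing = [j for j, v in enumerate(row) if math.isnan(v)]
        if not missing:
            continue
        # Candidate donors: other records observed in every missing column.
        # Assumption: at least one such donor exists.
        donors = [
            d for n, d in enumerate(rows)
            if n != i and not any(math.isnan(d[j]) for j in missing)
        ]
        donor = rng.choice(donors)
        for j in missing:
            imputed[i][j] = donor[j]
    return imputed


nan = float("nan")
x = [[1.0, nan], [2.0, 5.0], [nan, 6.0]]
x_imputed = hot_deck(x, random.Random(0))
```

Running the sketch once per chain with different random states would give the "separate imputation for each thread" behaviour attributed to  mc.gbart .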



value
-----



     gbart  returns an object of type  gbart  which is
    essentially a list.
    In the numeric  y  case, the list has components:

     offset The data centering value for the BART prior.
     x.train The training data returned with any updates
      due to missing value imputation, factor expansion, etc.
     yhat.train
    A matrix with  ndpost  rows and  nrow(x.train)  columns.
    Each row corresponds to a draw  f*  from the posterior of  f
    and each column corresponds to a row of  x.train .
    The  (i,j)  value is  f*(x)  for the  i^th  kept draw of  f
    and the  j^th  row of  x.train .
    Burn-in is dropped.

     yhat.test Same as yhat.train but now with  x.test  data.
     yhat.*.mean mean of  yhat.train/test  fit.
     yhat.*.lower lower quantile of  yhat.train/test  fit.
     yhat.*.upper upper quantile of  yhat.train/test  fit.
     prob.train/test produced for dichotomous outcomes.
     prob.*.mean mean of  prob.train/test  fit.
     prob.*.lower lower quantile of  prob.train/test  fit.
     prob.*.upper upper quantile of  prob.train/test  fit.
     sigma all draws of sigma including burn-in.
     sigma. only kept draws of sigma with burn-in discarded.
     sigest
    The rough error standard deviation ( sigma ) used in the prior.

   treedraws A list containing the tree draws.
     varcount A matrix with  ndpost  rows and  ncol(x.train)  columns.
    Each row is for a draw. For each variable (corresponding to the columns),
    the total count of the number of times
    that variable is used in a tree decision rule (over all trees) is given.
     varprob instead of counts, this is the probability that each
    variable is chosen.
   var*.mean The  varcount/prob  mean of its posterior.
   accept the accept probability from Metropolis-Hastings step
    within Gibbs.
   chains The number of MCMC chains.
   grp The grouping variable for grouped variables with the
    DART sparse prior.
   proc.time The time elapsed as returned by  proc.time() .
   LPML The Log Pseudo-Marginal Likelihood.  Beware: for
    nonparametric models like BART, this quantity can be unstable.


seealso
-------


 bartModelMatrix


examples
--------


 ##simulate data (example from Friedman MARS paper)
 f = function(x){
 10*sin(pi*x[,1]*x[,2]) + 20*(x[,3]-.5)^2+10*x[,4]+5*x[,5]
 }
 n = 100      ##number of observations
 set.seed(99)
 x=matrix(runif(n*10),n,10) ##10 variables, only first 5 matter
 y=f(x)+rnorm(n)

 ##test BART with token run to ensure installation works
 set.seed(99)
 bartFit = gbart(x,y,nskip=5,ndpost=5)


 ##run BART
 set.seed(99)
 bartFit = gbart(x,y)
LPML: float | None = None

Log pseudo-marginal likelihood; None without burn-in. Unstable for BART.

impute_miss: Int32[ndarray, 'n'] | None = None

Missingness indicator of each training row (multinomial imputation only).

predict(newdata, *, mc_cores=None, openmp=None, mult_impute=None, seed=None, mu=None, probs=None, dodraws=None, nice=None)[source]

Compute predictions at new covariate points.

Python interface to R’s predict method for the fit, dispatched on the fit type. For continuous (‘wbart’) fits the result is the matrix of posterior latent-function draws (their mean with dodraws=False); for binary (‘pbart’/‘lbart’) fits R returns a list, exposed here as a PredictBinary dict. Arguments left as None are omitted from the R call, so R computes its own defaults, described below; R rejects the arguments marked for specific fit types when used with the others.

Parameters:
  • newdata (Float64[ndarray, 'm p'] | DataFrame) – Covariates to predict at; rows are observations, with one column per kept x_train column (see rm_const). A dataframe’s factor columns are expanded into indicator columns.

  • mc_cores (int | None, default: None) – Number of OpenMP threads or forked R processes (see openmp) computing the predictions; default R’s mc.cores option, or 1.

  • openmp (bool | None, default: None) – Whether mc_cores counts OpenMP threads rather than forked R processes; default whether BART3 was compiled with OpenMP.

  • mult_impute (int | None, default: None) – Number of hot-deck imputations averaged over when newdata has missing values; default 4. Not accepted by ‘lbart’ fits.

  • seed (int | None, default: None) – Seed set in R before imputing missing values (default 99). ‘wbart’ fits only (‘pbart’ accepts but ignores it).

  • mu (float | None, default: None) – Value added to the function draws in place of the fit’s offset. ‘wbart’ fits only.

  • probs (tuple[float, float] | None, default: None) – Lower and upper quantiles of the prob_test_lower/_upper summaries; default (0.025, 0.975). Binary fits only.

  • dodraws (bool | None, default: None) – Whether to return the posterior draws (the default) rather than only their mean. ‘wbart’ fits only.

  • nice (int | None, default: None) – Unix niceness of the forked processes, from 0 (highest priority) to 19 (lowest, the default); ignored unless forking.

Returns:

Float64[ndarray, 'ndpost m'] | Float64[ndarray, 'm'] | PredictBinary – The function draws at newdata for continuous fits (their mean with dodraws=False), or a PredictBinary dict for binary fits.

Notes

The R arguments cutpoints and trees (fallbacks for fits missing treedraws, which gbart fits always carry) and transposed (a pre-transposed newdata cannot pass the method’s own column-count check) are not exposed.

prob_test: None | Float64[ndarray, 'ndpost m'] = None

Test-point success-probability draws (binary outcomes only).

prob_test_lower: Float64[ndarray, 'm'] | None = None

Lower probs quantile of prob_test (default 2.5%).

prob_test_mean: None | Float64[ndarray, 'm'] = None

Posterior mean of prob_test.

prob_test_upper: Float64[ndarray, 'm'] | None = None

Upper probs quantile of prob_test (default 97.5%).

prob_train: None | Float64[ndarray, 'ndpost n'] = None

Training-point success-probability draws (binary outcomes only).

prob_train_mean: None | Float64[ndarray, 'n'] = None

Posterior mean of prob_train.

sigest: float | None = None

Rough residual SD used to set the sigma prior (continuous only).

None for binary outcomes; nan when the mc.gbart mc_cores > 1 bug overwrites it with a logical missing value.

sigma_: Float64[ndarray, 'ndpost'] | None = None

Kept sigma draws with burn-in dropped; None without burn-in.

sigma_mean: float | None = None

Mean of sigma_; falls back to sigest when no draws are kept.

x_test: Float64[ndarray, 'm <=p'] | None = None

Test design matrix as used (imputed, factors expanded, constant columns dropped).

yhat_test_lower: Float64[ndarray, 'm'] | None = None

Lower probs quantile of yhat_test (default 2.5%, continuous only).

yhat_test_mean: Float64[ndarray, 'm'] | None = None

Posterior mean of yhat_test.

yhat_test_upper: Float64[ndarray, 'm'] | None = None

Upper probs quantile of yhat_test (default 97.5%, continuous only).

yhat_train_lower: Float64[ndarray, 'n'] | None = None

Lower probs quantile of yhat_train (default 2.5%, continuous only).

yhat_train_mean: Float64[ndarray, 'n'] | None = None

Posterior mean of yhat_train.

yhat_train_upper: Float64[ndarray, 'n'] | None = None

Upper probs quantile of yhat_train (default 97.5%, continuous only).
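
The mean, lower and upper summaries are column-wise reductions of the corresponding draws matrix. A plain-Python sketch of that reduction follows, assuming the quantiles use linear interpolation between order statistics (the default rule of R's quantile function); the exact rule used by the R code is not stated on this page. The names `quantile` and `summarize` are hypothetical.

```python
import math


def quantile(sorted_xs, p):
    """Linear-interpolation quantile of an already sorted list."""
    h = (len(sorted_xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (h - lo) * (sorted_xs[hi] - sorted_xs[lo])


def summarize(draws, probs=(0.025, 0.975)):
    """Column-wise mean and quantiles of an (ndpost, n) list of rows."""
    columns = [sorted(col) for col in zip(*draws)]
    mean = [sum(col) / len(col) for col in columns]
    lower = [quantile(col, probs[0]) for col in columns]
    upper = [quantile(col, probs[1]) for col in columns]
    return mean, lower, upper


# Five draws (rows) at two points (columns); toy numbers, not model output.
draws = [[0.0, 10.0], [1.0, 11.0], [2.0, 12.0], [3.0, 13.0], [4.0, 14.0]]
mean, lower, upper = summarize(draws, probs=(0.25, 0.75))
```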

chains: int

Number of MCMC chains, i.e. the mc_cores actually used.

grp: Float64[ndarray, 'p']

Group index of each column for the sparse (DART) variable-selection prior.

ndpost: int

Number of posterior draws kept, after burn-in and thinning.

offset: float

Data centering value for the response (link scale for binary).

proc_time: ProcTime

Timing of the fit, from R’s proc.time.

rho: float

Concentration of the sparse (DART) prior; defaults to sum(1/grp).

rm_const: Int32[ndarray, '<=p']

0-based indices of the x_train columns kept (constant columns dropped).
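
Because rm_const holds 0-based indices, it can be used directly to select the matching columns of new data. A minimal sketch with made-up values follows; `keep_columns` is a hypothetical helper and the rm_const value shown is an example, not output from a real fit.

```python
def keep_columns(rows, rm_const):
    """Select the columns listed in rm_const (0-based) from each row."""
    return [[row[j] for j in rm_const] for row in rows]


# Column 1 is constant, so a fit on this data would be expected to keep
# columns 0 and 2 only.
newdata_full = [[0.1, 5.0, 7.0], [0.4, 5.0, 9.0]]
rm_const = [0, 2]
newdata = keep_columns(newdata_full, rm_const)
```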

treedraws: TreeDraws

Sampled trees, as a per-variable cutpoint grid and the serialized ensemble.

varcount: Int32[ndarray, 'ndpost p']

Per-draw count of splits on each variable, summed over trees.

varcount_mean: Float64[ndarray, 'p']

Posterior mean of varcount per variable.

varprob: Float64[ndarray, 'ndpost p']

Per-draw probability assigned to each variable for splitting.

varprob_mean: Float64[ndarray, 'p']

Posterior mean of varprob per variable.

x_train: Float64[ndarray, 'n <=p']

Training design matrix as used (original scale, not binned; constant columns dropped).

yhat_test: Float64[ndarray, 'ndpost m']

Test-point posterior function draws (latent scale for binary).

Always present: R’s cgbart allocates it unconditionally, so without test data it is an empty (ndpost, 0) array rather than None (unlike the derived yhat_test_mean/yhat_test_lower/yhat_test_upper, which R only fills when test data is given).

yhat_train: Float64[ndarray, 'ndpost n']

Training-point posterior function draws (latent scale for binary).

accept: Float64[ndarray, 'nskip+ndpost*keepevery']

Per-iteration Metropolis-Hastings acceptance rate (every MCMC iteration).
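
The length of accept follows from the MCMC schedule: nskip burn-in iterations, then ndpost kept draws spaced keepevery iterations apart. A small sketch of that bookkeeping is below. Which iteration inside each block of keepevery is the one kept (the last, here) is an assumption of the sketch, and `mcmc_schedule` is a hypothetical name.

```python
def mcmc_schedule(nskip=100, ndpost=1000, keepevery=1):
    """Total iteration count and the 1-based iterations whose draws are kept."""
    total = nskip + ndpost * keepevery
    # Assumption: the last iteration of each keepevery-sized block is kept.
    kept = [nskip + keepevery * (d + 1) for d in range(ndpost)]
    return total, kept


total_wbart, kept_wbart = mcmc_schedule(keepevery=1)     # continuous default
total_binary, kept_binary = mcmc_schedule(keepevery=10)  # probit/logit default
```

With the binary defaults the chain runs roughly ten times as many iterations as the continuous default for the same number of returned draws.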

sigma: Float64[ndarray, 'nskip+ndpost'] | None = None

Error-SD draws including burn-in (continuous only).