New interface
- class bartz.Bart(x_train, y_train, *, x_test=None, type='wbart', sparse=False, theta=None, a=0.5, b=1.0, rho=None, xinfo=None, usequants=False, rm_const=True, sigest=None, sigdf=3.0, sigquant=0.9, k=2.0, power=2.0, base=0.95, lamda=None, tau_num=None, offset=None, w=None, ntree=None, numcut=100, ndpost=1000, nskip=100, keepevery=None, printevery=100, num_chains=None, num_chain_devices=None, num_data_devices=None, devices=None, seed=0, maxdepth=6, init_kw=None, run_mcmc_kw=None)
Nonparametric regression with Bayesian Additive Regression Trees (BART) [2].
Regress y_train on x_train with a latent mean function represented as a sum of decision trees. The inference is carried out by sampling the posterior distribution of the tree ensemble with MCMC. A minimal usage sketch is given below, after the parameter reference.

- Parameters:
x_train (Real[Array, 'p n'] | DataFrame) – The training predictors.

y_train (Bool[Array, 'n'] | Float32[Array, 'n'] | Series) – The training responses.

x_test (Real[Array, 'p m'] | DataFrame | None, default: None) – The test predictors.

type (Literal['wbart', 'pbart'], default: 'wbart') – The type of regression. 'wbart' for continuous regression, 'pbart' for binary regression with probit link.

sparse (bool, default: False) – Whether to activate variable selection on the predictors as done in [1].

theta (float | Float[Any, ''] | None, default: None)

a (float | Float[Any, ''], default: 0.5)

b (float | Float[Any, ''], default: 1.0)

rho (float | Float[Any, ''] | None, default: None) – Hyperparameters of the sparsity prior used for variable selection.

The prior distribution on the choice of predictor for each decision rule is

\[(s_1, \ldots, s_p) \sim \operatorname{Dirichlet}(\mathtt{theta}/p, \ldots, \mathtt{theta}/p).\]

If theta is not specified, it is a priori distributed according to

\[\frac{\mathtt{theta}}{\mathtt{theta} + \mathtt{rho}} \sim \operatorname{Beta}(\mathtt{a}, \mathtt{b}).\]

If not specified, rho is set to the number of predictors p. To tune the prior, consider setting a lower rho to prefer more sparsity. If setting theta directly, it should be in the ballpark of p or lower as well.

xinfo (Float[Array, 'p n'] | None, default: None) – A matrix with the cutpoints to use to bin each predictor. If not specified, it is generated automatically according to usequants and numcut.

Each row shall contain a sorted list of cutpoints for a predictor. If there are fewer cutpoints than the number of columns in the matrix, fill the remaining cells with NaN. xinfo shall be a matrix even if x_train is a dataframe.

usequants (bool, default: False) – Whether to use predictor quantiles instead of a uniform grid to bin predictors. Ignored if xinfo is specified.

rm_const (bool | None, default: True) – How to treat predictors with no associated decision rules (i.e., there are no available cutpoints for that predictor). If True (default), they are ignored. If False, an error is raised if there are any. If None, no check is performed, and the output of the MCMC may not make sense if there are predictors without cutpoints. The option None is provided only to allow jax tracing.

sigest (float | Float[Any, ''] | None, default: None) – An estimate of the residual standard deviation on y_train, used to set lamda. If not specified, it is estimated by linear regression (with intercept, and without taking into account w). If y_train has fewer than two elements, it is set to 1. If n <= p, it is set to the standard deviation of y_train. Ignored if lamda is specified.

sigdf (float | Float[Any, ''], default: 3.0) – The degrees of freedom of the scaled inverse chi-squared prior on the noise variance.

sigquant (float | Float[Any, ''], default: 0.9) – The quantile of the prior on the noise variance that shall match sigest to set the scale of the prior. Ignored if lamda is specified.

k (float | Float[Any, ''], default: 2.0) – The inverse scale of the prior standard deviation on the latent mean function, relative to half the observed range of y_train. If y_train has fewer than two elements, k is ignored and the scale is set to 1.

power (float | Float[Any, ''], default: 2.0)

base (float | Float[Any, ''], default: 0.95) – Parameters of the prior on tree node generation. The probability that a node at depth d (0-based) is non-terminal is base / (1 + d) ** power.

lamda (float | Float[Any, ''] | None, default: None) – The prior harmonic mean of the error variance. (The harmonic mean of x is 1/mean(1/x).) If not specified, it is set based on sigest and sigquant.

tau_num (float | Float[Any, ''] | None, default: None) – The numerator in the expression that determines the prior standard deviation of leaves. If not specified, it defaults to (max(y_train) - min(y_train)) / 2 (or to 1 if y_train has fewer than two elements) for continuous regression, and to 3 for binary regression.

offset (float | Float[Any, ''] | None, default: None) – The prior mean of the latent mean function. If not specified, it is set to the mean of y_train for continuous regression, and to Phi^-1(mean(y_train)) for binary regression. If y_train is empty, offset is set to 0. With binary regression, if y_train is all False or all True, it is set to Phi^-1(1/(n+1)) or Phi^-1(n/(n+1)), respectively.

w (Float[Array, 'n'] | Series | None, default: None) – Coefficients that rescale the error standard deviation on each datapoint. Not specifying w is equivalent to setting it to 1 for all datapoints. Note: w is ignored in the automatic determination of sigest, so either the weights should be O(1), or sigest should be specified by the user.

ntree (int | None, default: None) – The number of trees used to represent the latent mean function. By default 200 for continuous regression and 50 for binary regression.

numcut (int, default: 100) –

If usequants is False: the exact number of cutpoints used to bin the predictors, ranging between the minimum and maximum observed values (excluded).

If usequants is True: the maximum number of cutpoints to use for binning the predictors. Each predictor is binned such that its distribution in x_train is approximately uniform across bins. The number of bins is at most the number of unique values appearing in x_train, or numcut + 1.

Before running the algorithm, the predictors are compressed to the smallest integer type that fits the bin indices, so numcut is best set to the maximum value of an unsigned integer type, like 255.

Ignored if xinfo is specified.

ndpost (int, default: 1000) – The number of MCMC samples to save, after burn-in. ndpost is the total number of samples across all chains, and is rounded up to the next multiple of the number of chains.

nskip (int, default: 100) – The number of initial MCMC samples to discard as burn-in. This number of samples is discarded from each chain.

keepevery (int | None, default: None) – The thinning factor for the MCMC samples, after burn-in. By default, 1 for continuous regression and 10 for binary regression.

printevery (int | None, default: 100) – The number of iterations (including thinned-away ones) between each log line. Set to None to disable logging. ^C interrupts the MCMC only every printevery iterations, so with logging disabled it is not possible to interrupt the MCMC conveniently.

num_chains (int | None, default: None) – The number of independent Markov chains to run. By default only one chain is run.

The difference between not specifying num_chains and setting it to 1 is that in the latter case the object attributes and some methods have an explicit chain axis of size 1.

num_chain_devices (int | None, default: None) – The number of devices to spread the chains across. Must be a divisor of num_chains. Each device will run a fraction of the chains.

num_data_devices (int | None, default: None) – The number of devices to split datapoints across. Must be a divisor of n. This is useful only for very large n, roughly above 1,000,000.

If both num_chain_devices and num_data_devices are specified, the total number of devices used is the product of the two.

devices (Device | Sequence[Device] | None, default: None) – One or more devices to run the MCMC on. If not specified, the computation follows the placement of the input arrays. If a list of devices, this argument can be longer than the number of devices needed.

seed (int | Key[Array, ''], default: 0) – The seed for the random number generator.

maxdepth (int, default: 6) – The maximum depth of the trees. This is 1-based, so with the default maxdepth=6, the depths of the levels range from 0 to 5.

init_kw (dict | None, default: None) – Additional arguments passed to bartz.mcmcstep.init.

run_mcmc_kw (dict | None, default: None) – Additional arguments passed to bartz.mcmcloop.run_mcmc.
- Variables:
offset (Float32[Array, '']) – The prior mean of the latent mean function.
sigest (Float32[Array, ''] | None) – The estimated standard deviation of the error used to set lamda.

yhat_test (Float32[Array, 'ndpost m'] | None) – The conditional posterior mean at x_test for each MCMC iteration.
References
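Below is a minimal usage sketch of the class, based only on the signature and attributes documented above; the synthetic data, shapes, and hyperparameter values are illustrative, not prescriptive.

    import jax
    import jax.numpy as jnp
    import bartz

    # Synthetic data for illustration: p predictors, n training points, m test points.
    # Note the (p, n) layout: predictors along the first axis, per Real[Array, 'p n'].
    p, n, m = 5, 100, 50
    kx, ke, kt = jax.random.split(jax.random.key(0), 3)
    x_train = jax.random.uniform(kx, (p, n))
    y_train = (jnp.sin(2 * jnp.pi * x_train[0]) + 0.5 * x_train[1]
               + 0.1 * jax.random.normal(ke, (n,)))
    x_test = jax.random.uniform(kt, (p, m))

    # Continuous regression ('wbart' is the default type); the MCMC runs at construction.
    bart = bartz.Bart(x_train, y_train, x_test=x_test, nskip=100, ndpost=1000, seed=0)

    yhat = bart.yhat_test_mean  # marginal posterior mean at x_test, shape (m,)
    sigma = bart.sigma_mean     # posterior mean of the error standard deviation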
- property ndpost
The total number of posterior samples after burn-in across all chains.
May be larger than the initialization argument ndpost if it was not divisible by the number of chains.
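For illustration, continuing the sketch above (the value shown is what the description implies, stated as an expectation rather than a guarantee):

    # Requesting ndpost=1000 with num_chains=3: 1000 is not divisible by 3, so the
    # saved sample count is rounded up; per the description we would expect 1002.
    bart3 = bartz.Bart(x_train, y_train, ndpost=1000, num_chains=3, seed=0)
    print(bart3.ndpost)  # expected: 1002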
- property prob_test: Float32[Array, 'ndpost m'] | None
The posterior probability of y being True at x_test for each MCMC iteration.
- property prob_test_mean: Float32[Array, 'm'] | None
The marginal posterior probability of y being True at x_test.
- property prob_train: Float32[Array, 'ndpost n'] | None
The posterior probability of y being True at x_train for each MCMC iteration.
- property prob_train_mean: Float32[Array, 'n'] | None
The marginal posterior probability of y being True at x_train.
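A short sketch of binary regression and these probability summaries, reusing the synthetic arrays from the earlier example; the thresholded outcome is purely illustrative.

    # Binary responses: y_train may be boolean; type='pbart' selects probit regression.
    y_binary = y_train > jnp.median(y_train)
    bart_bin = bartz.Bart(x_train, y_binary, x_test=x_test, type='pbart', seed=0)

    p_test = bart_bin.prob_test_mean    # P(y = True | x) at x_test, shape (m,)
    p_train = bart_bin.prob_train_mean  # P(y = True | x) at x_train, shape (n,)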
- property sigma: Float32[Array, 'nskip+ndpost'] | Float32[Array, 'nskip+ndpost/mc_cores mc_cores'] | None
The standard deviation of the error, including burn-in samples.
- property sigma_: Float32[Array, 'ndpost'] | None
The standard deviation of the error, only over the post-burn-in samples and flattened.
- property sigma_mean: Float32[Array, ''] | None
The mean of sigma, only over the post-burn-in samples.
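One possible use of these attributes, continuing the earlier continuous-regression sketch, is a quick trace-plot convergence check; matplotlib is assumed to be available and is not implied by this documentation.

    import matplotlib.pyplot as plt

    # sigma includes the nskip burn-in draws (listed first, per the documented shape
    # 'nskip+ndpost'); sigma_ keeps only the post-burn-in draws, flattened across chains.
    plt.plot(bart.sigma)
    plt.axvline(100, linestyle='--')  # nskip=100 in the sketch above
    plt.xlabel('MCMC iteration')
    plt.ylabel('error standard deviation')
    plt.show()

    print(bart.sigma.shape, bart.sigma_.shape)  # (nskip + ndpost,) vs (ndpost,)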
- property varcount: Int32[Array, 'ndpost p']
Histogram of predictor usage for decision rules in the trees.
- property varprob: Float32[Array, 'ndpost p']
Posterior samples of the probability of choosing each predictor for a decision rule.
- property varprob_mean: Float32[Array, 'p']
The marginal posterior probability of each predictor being chosen for a decision rule.
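A sketch of inspecting the variable-selection summaries when sparse=True, reusing the synthetic arrays from the earlier example.

    # Activate the sparsity prior and look at which predictors the trees use.
    bart_sparse = bartz.Bart(x_train, y_train, sparse=True, seed=0)

    print(bart_sparse.varprob_mean)          # per-predictor selection probabilities, shape (p,)
    print(bart_sparse.varcount.sum(axis=0))  # total decision-rule count per predictor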
- property yhat_test_mean: Float32[Array, 'm'] | None
The marginal posterior mean at x_test.
Not defined for binary regression because it would be error-prone; the quantity to consider is typically prob_test_mean.
- property yhat_train: Float32[Array, 'ndpost n']
The conditional posterior mean at x_train for each MCMC iteration.
- property yhat_train_mean: Float32[Array, 'n'] | None
The marginal posterior mean at x_train.
Not defined for binary regression because it would be error-prone; the quantity to consider is typically prob_train_mean.
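A small in-sample summary sketch using yhat_train_mean, continuing the earlier continuous-regression example.

    # Compare the posterior mean fit at x_train with the observed responses.
    resid = y_train - bart.yhat_train_mean
    rmse = jnp.sqrt(jnp.mean(resid ** 2))
    print(float(rmse), float(bart.sigma_mean))  # in-sample RMSE vs posterior error sd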
- predict(x_test)
Compute the posterior mean at x_test for each MCMC iteration.

- Parameters:
x_test (Real[Array, 'p m'] | DataFrame) – The test predictors.

- Returns:
Float32[Array, 'ndpost m'] – The conditional posterior mean at x_test for each MCMC iteration.

- Raises:
ValueError – If x_test has a different format than x_train.
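A brief sketch of predicting at new points after fitting, continuing the earlier example; x_new is an illustrative array with the same number of predictors as x_train.

    # New test points must use the same (p, ...) layout and format as x_train.
    x_new = jax.random.uniform(jax.random.key(1), (p, 20))
    yhat_new = bart.predict(x_new)         # shape (ndpost, 20): per-iteration posterior means
    yhat_new_mean = yhat_new.mean(axis=0)  # posterior mean at the new points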