rbartpackages.dbarts.dbartsControl

class rbartpackages.dbarts.dbartsControl(*, verbose=False, keepTrainingFits=True, useQuantiles=False, keepTrees=False, n_samples=None, n_cuts=100, n_burn=200, n_trees=75, n_chains=4, n_threads=None, n_thin=1, printEvery=100, printCutoffs=0, rngKind='default', rngNormalKind='default', rngSeed=None, updateState=True)

Configure a dbarts sampler.

Python interface to R’s dbarts::dbartsControl, which bundles the sampler settings into an R S4 object with no components exposed. Pass it as the control argument of dbarts, which also returns it through the dbarts.control property. Arguments left as None are omitted from the R call, so R computes its own defaults, described below.
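
The None-omission rule can be pictured with a small sketch. The helper build_r_kwargs below is hypothetical and is not part of rbartpackages, and the n_ to n. renaming is an assumption inferred from the two signatures on this page. It only illustrates how arguments left as None would drop out of the call so that R's own defaults apply.

```python
def build_r_kwargs(**py_kwargs):
    """Hypothetical helper: translate Python keyword arguments to R names.

    Assumptions: Python names such as n_samples correspond to R names such
    as n.samples, and None means "let R compute its default".
    """
    r_kwargs = {}
    for name, value in py_kwargs.items():
        if value is None:
            continue  # omitted, so R falls back to its own default
        # Only the leading "n_" prefix is rewritten; other names pass through.
        r_name = name.replace("n_", "n.", 1) if name.startswith("n_") else name
        r_kwargs[r_name] = value
    return r_kwargs


kwargs = build_r_kwargs(
    n_samples=None, n_trees=75, n_threads=None, rngSeed=None, verbose=False
)
print(kwargs)  # {'n.trees': 75, 'verbose': False}
```

Note that False is kept: only None is treated as "not supplied", so a boolean setting is always passed through explicitly.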

Parameters:
  • verbose (bool, default: False) – Whether the sampler prints to the R console as it runs.

  • keepTrainingFits (bool, default: True) – Whether the training-point fits are returned when the sampler runs; they are always computed, so disabling only drops them from the output.

  • useQuantiles (bool, default: False) – Whether the tree decision rules use empirical quantiles of each predictor rather than values spaced uniformly over its range.

  • keepTrees (bool, default: False) – Whether the sampled trees are cached as drawn (n_trees * n_samples of them), which dbarts.predict and bart tree extraction require; memory-intensive.

  • n_samples (int | None, default: None) – Default number of samples returned per run; usually set through dbarts and overridable per dbarts.run.

  • n_cuts (int | Integer[ndarray, 'p'], default: 100) – Number of decision rules per predictor (a scalar recycled over the predictors, or one value each); fewer may be used for a predictor with few unique values.

  • n_burn (int, default: 200) – Number of samples discarded at the start of a run.

  • n_trees (int, default: 75) – Number of trees in the sum-of-trees.

  • n_chains (int, default: 4) – Number of independent chains.

  • n_threads (int | None, default: None) – Number of threads for internal calculations and chains; defaults to the detected core count. Single-threaded is often faster below ~10k observations.

  • n_thin (int, default: 1) – Number of tree-only iterations between recorded samples (thinning).

  • printEvery (int, default: 100) – Interval, in post-thinning samples, of the progress messages (with verbose).

  • printCutoffs (int, default: 0) – Number of a variable’s decision rules printed in verbose mode.

  • rngKind (str, default: 'default') – Random-number-generator kind, as in R’s set.seed.

  • rngNormalKind (str, default: 'default') – Random-number-generator normal kind, as in R’s set.seed.

  • rngSeed (int | None, default: None) – Random-number-generator seed; None (R’s NA) seeds from the clock when applicable.

  • updateState (bool, default: True) – Default for whether the methods refresh the object’s cached state, which is only needed to save/load a sampler.
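
As a rough illustration of n_cuts and useQuantiles, the sketch below recycles a short list of cut counts over the predictors and compares evenly spaced cut points with quantile-based ones. The function names are hypothetical, and the exact placement rule used inside dbarts may differ; this only shows the general idea on a skewed predictor.

```python
from itertools import cycle, islice
from statistics import quantiles


def recycle_n_cuts(n_cuts, n_predictors):
    """Expand a scalar or short sequence of cut counts to one per predictor."""
    values = [n_cuts] if isinstance(n_cuts, int) else list(n_cuts)
    return list(islice(cycle(values), n_predictors))


def uniform_cuts(x, n_cuts):
    """Cut points spaced evenly over the range of x (the useQuantiles=False idea)."""
    lo, hi = min(x), max(x)
    step = (hi - lo) / (n_cuts + 1)
    return [lo + step * (i + 1) for i in range(n_cuts)]


def quantile_cuts(x, n_cuts):
    """Cut points at empirical quantiles of x (the useQuantiles=True idea)."""
    return quantiles(x, n=n_cuts + 1, method="inclusive")


x = [0.0, 1.0, 2.0, 3.0, 100.0]  # skewed predictor with one large value

print(recycle_n_cuts([100, 5], 5))  # earlier values are reused in order
print(uniform_cuts(x, 3))           # follows the range, dominated by the outlier
print(quantile_cuts(x, 3))          # follows where the data actually lie
```

With skewed predictors, evenly spaced rules can leave most cut points in regions with no observations, which is the situation where quantile-based rules tend to be more useful.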

R documentation

title
-----

Discrete Bayesian Additive Regression Trees Sampler Control

name
----

dbartsControl

alias
-----

dbartsControl

description
-----------

   Convenience function to create a control object for use with a  dbarts  sampler.


usage
-----


 dbartsControl(
     verbose = FALSE, keepTrainingFits = TRUE, useQuantiles = FALSE,
     keepTrees = FALSE, n.samples = NA_integer_,
     n.cuts = 100L, n.burn = 200L, n.trees = 75L, n.chains = 4L,
     n.threads = dbarts::guessNumCores(), n.thin = 1L, printEvery = 100L,
     printCutoffs = 0L,
     rngKind = "default", rngNormalKind = "default", rngSeed = NA_integer_,
     updateState = TRUE)


arguments
---------


     verbose
      Logical controlling sampler output to console.

     keepTrainingFits
      Logical controlling whether or not training fits are returned when the sampler runs. These are always computed as part of the fitting procedure, so disabling will not substantially impact running time.

     useQuantiles
      Logical to determine if the empirical quantiles of a column of predictors should be used to determine the tree decision rules. If  FALSE , the rules are spaced uniformly throughout the range of covariate values.

     keepTrees
      A logical that determines whether or not trees are cached as they are sampled. In all cases, the current state of the sampler is stored as a single set of  n.trees . When  keepTrees  is  TRUE , a set of  n.trees * n.samples  trees are set aside and populated as the sampler runs. If the sampler is stopped and restarted, samples proceed from the previously stored tree, looping over if necessary.

     n.samples
      A non-negative integer giving the default number of samples to return each time the sampler is run. Generally specified by  dbarts  instead, and can be overridden on a per-use basis whenever the sampler is  run .

     n.cuts
      A positive integer or integer vector giving the number of decision rules to be used for each given predictor. If of length less than the number of predictors, earlier values are recycled. If for any predictor more values are specified than are coherent, fewer may be used. See details for more information.

     n.burn
      A non-negative integer determining how many samples, if any, are thrown away at the beginning of a run of the sampler.

     n.trees
      A positive integer giving the number of trees used in the sum-of-trees formulation.

     n.chains
      A positive integer detailing the number of independent chains for the sampler to use.

     n.threads
      A positive integer controlling how many threads will be used for various internal calculations, as well as the number of chains. Internal calculations are highly optimized so that single-threaded performance tends to be superior unless the number of observations is very large (>10k), so that it is often not necessary to have the number of threads exceed the number of chains.

     n.thin
      A positive integer determining how many iterations the MCMC chain should jump on the decision trees alone before recording a sample. Serves to  thin  the samples against serial correlation.  n.samples  are returned regardless of the value of  n.thin .

     printEvery
      If  verbose  is  TRUE , a progress message is printed every  printEvery  potential samples (after thinning). Must be a positive integer.

     printCutoffs
      A non-negative integer specifying how many of the decision rules for a variable are printed in verbose mode.

     rngKind
      Random number generator kind, as used in  set.seed . For type  "default" , the built-in generator will be used if possible. Otherwise, will attempt to match the built-in generator's type. Success depends on the number of threads.

     rngNormalKind
      Random number generator normal kind, as used in  set.seed . For type  "default" , the built-in generator will be used if possible. Otherwise, will attempt to match the built-in generator's type. Success depends on the number of threads and the  rngKind .

     rngSeed
      Random number generator seed, as used in  set.seed . If the sampler is running single-threaded or has one chain, the behavior will be as any other sequential algorithm. If the sampler is multithreaded, the seed will be used to create an additional pRNG object, which in turn will be used to sequentially seed the thread-specific pRNGs. If equal to  NA , the clock will be used to seed pRNGs when applicable.

     updateState
      Logical setting the default behavior for many  sampler  methods with regards to the immediate updating of the cached state of the object. A current, cached state is only useful when  saving / loading  the sampler.
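
The multithreaded seeding scheme described under  rngSeed  can be sketched in Python as follows. This is an illustration only: Python's random.Random stands in for the sampler's pRNGs, the function name derive_thread_rngs is hypothetical, and the 32-bit seed width is an arbitrary choice for the sketch, not a statement about the dbarts internals.

```python
import random


def derive_thread_rngs(seed, n_threads):
    """Seed one generator per thread from a single master generator."""
    master = random.Random(seed)  # plays the role of the additional pRNG object
    # Seeds are drawn one after another, so thread k always gets the k-th seed.
    return [random.Random(master.getrandbits(32)) for _ in range(n_threads)]


first = [rng.random() for rng in derive_thread_rngs(seed=42, n_threads=4)]
second = [rng.random() for rng in derive_thread_rngs(seed=42, n_threads=4)]

print(first == second)  # True: same master seed, same per-thread streams
```

Seeding the per-thread generators sequentially from one master generator keeps a run reproducible for a fixed seed and thread count while giving each thread its own stream.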



value
-----


   An object of class  dbartsControl .


seealso
-------


    dbarts