rbartpackages.BART.bartModelMatrix

class rbartpackages.BART.bartModelMatrix(X, numcut=0, *, usequants=False, type=7, rm_const=False, cont=False, xinfo=None)

Convert covariates to a matrix and compute the BART cutpoints.

Python interface to R’s BART::bartModelMatrix. With the default numcut=0 the constructor returns the bare design matrix instead of a class instance; otherwise the instance carries the matrix together with the cutpoints metadata.

Parameters:
  • X (Float64[ndarray, 'N p'] | DataFrame) – The covariates to convert; rows are observations. A dataframe’s factor columns are expanded into indicator columns.

  • numcut (int, default: 0) – Maximum number of cutpoints per variable; 0 means return the bare matrix without computing cutpoints.

  • usequants (bool, default: False) – Whether the cutpoints are quantiles of the data rather than uniformly spaced over its range.

  • type (int, default: 7) – The quantile algorithm used with usequants (see R’s quantile).

  • rm_const (bool, default: False) – Whether to remove the constant columns from X; they are detected and reported in the rm_const attribute either way.

  • cont (bool, default: False) – Whether to treat all variables as continuous, spacing numcut cutpoints over the range even when fewer unique values would do.

  • xinfo (Float64[ndarray, 'p numcut'] | None, default: None) – Cutpoints to use, one row per variable; overrides the computed ones.
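To make the difference between the two cutpoint schemes selected by usequants concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the code of BART::bartModelMatrix: the helper names uniform_cutpoints and quantile_cutpoints are hypothetical, the cutpoints are assumed to be interior points that exclude the column's minimum and maximum, and the handling of constant columns, of columns with fewer unique values than numcut, and of cont is left out. Python's statistics.quantiles with method="inclusive" uses the same interpolation rule as R's quantile type 7.

```python
import statistics


def uniform_cutpoints(column, numcut):
    """Return numcut points evenly spaced strictly inside [min, max].

    Hypothetical helper: assumes the endpoints themselves are not cutpoints.
    """
    lo, hi = min(column), max(column)
    step = (hi - lo) / (numcut + 1)
    return [lo + step * i for i in range(1, numcut + 1)]


def quantile_cutpoints(column, numcut):
    """Return numcut quantile cutpoints at probabilities i / (numcut + 1).

    Hypothetical helper: method="inclusive" matches R's quantile type 7.
    """
    return statistics.quantiles(column, n=numcut + 1, method="inclusive")


# A skewed column: one large value stretches the range.
x = [0.0, 1.0, 2.0, 3.0, 10.0]
print(uniform_cutpoints(x, 3))   # [2.5, 5.0, 7.5]
print(quantile_cutpoints(x, 3))  # [1.0, 2.0, 3.0]
```

On this skewed column the uniform grid places two of its three cutpoints in the empty stretch between 3 and 10, while the quantile grid follows the data.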

R documentation

title
-----

Create a matrix out of a vector or data.frame

name
----

bartModelMatrix

alias
-----

bartModelMatrix

keyword
-------

utilities

description
-----------

   The external BART functions operate on matrices in memory.  Therefore,
   if the user submits a vector or data.frame, then this function converts
   it to a matrix.  Also, it determines the number of cutpoints necessary
   for each column when asked to do so.


usage
-----


 bartModelMatrix(X, numcut=0L, usequants=FALSE, type=7,
                 rm.const=FALSE, cont=FALSE, xinfo=NULL)


arguments
---------


    X          A vector or data.frame to create the matrix from.
    numcut     The maximum number of cutpoints to consider.
               If numcut=0, then just return a matrix; otherwise,
               return a list containing a matrix X, a vector numcut
               and a list xinfo.
    usequants  If usequants is FALSE, then the cutpoints in xinfo
               are generated uniformly; otherwise, if TRUE, then
               quantiles are used for the cutpoints.
    type       Determines which quantile algorithm is employed.
    rm.const   Whether or not to remove constant variables.
    cont       Whether or not to assume all variables are continuous.
    xinfo      You can provide the cutpoints to BART or let BART
               choose them for you.  To provide them, use the xinfo
               argument to specify a list (matrix) where the items (rows)
               are the covariates and the contents of the items (columns)
               are the cutpoints.



seealso
-------


     class.ind


examples
--------



 set.seed(99)

 a <- rbinom(10, 4, 0.4)

 table(a)

 x <- runif(10)

 df <- data.frame(a=factor(a), x=x)

 b <- bartModelMatrix(df)

 b

 b <- bartModelMatrix(df, numcut=9)

 b

 b <- bartModelMatrix(df, numcut=9, usequants=TRUE)

 b


 f <- bartModelMatrix(as.character(a))

X: Float64[ndarray, 'N p']

Design matrix, with vectors and data frames coerced to numeric and factors expanded to indicators.

numcut: Int32[ndarray, 'p']

Number of cutpoints chosen per column.

rm_const: Int32[ndarray, '<=p']

0-based indices of the non-constant columns of the expanded design.

The indices refer to the columns of X before removal: rm_const=True removes the constant columns from X, numcut and xinfo, while the default only detects them.

xinfo: Float64[ndarray, 'p numcut']

Per-column cutpoint grid, NaN-padded to the maximum cut count.

grp: Int32[ndarray, 'p'] | Float64[ndarray, '1'] | None

1-based input-column index each output column comes from (factors expand to one indicator column per level); None for matrix input.
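The relationship between numcut and the NaN-padded xinfo grid described above can be illustrated with a short pure-Python sketch. It assumes the per-column cutpoints start out as a ragged list of lists (one list per column, as in the R list form of xinfo); the helper name pad_cutpoints is hypothetical and not part of rbartpackages.

```python
import math


def pad_cutpoints(cutpoints):
    """Pad ragged per-column cutpoint lists with NaN to a rectangular grid.

    Hypothetical helper. Returns (numcut, xinfo): the number of cutpoints
    per column and a list of equal-length rows, one row per column.
    """
    numcut = [len(row) for row in cutpoints]
    width = max(numcut, default=0)
    xinfo = [list(row) + [math.nan] * (width - len(row)) for row in cutpoints]
    return numcut, xinfo


# Column 0 has a single cutpoint (e.g. a 0/1 indicator), column 1 has three.
ragged = [[0.5], [0.2, 0.4, 0.6]]
numcut, xinfo = pad_cutpoints(ragged)

# The valid cutpoints of column j are the first numcut[j] entries of row j;
# the remaining entries are NaN padding.
for j, row in enumerate(xinfo):
    print(j, row[: numcut[j]])
```

Slicing each row by its numcut entry, rather than filtering out NaN values, keeps the two attributes consistent with each other.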