bartz.prepcovars.UniqueQuantileBinner

class bartz.prepcovars.UniqueQuantileBinner(X, *, max_bins=256, max_subsample=100_000, key=None)

Binner with quantile-based cutpoints from observed unique values.

For each predictor, cutpoints are placed between sorted unique values so that the unique values are spread approximately evenly across bins. The number of cutpoints is at most max_bins - 1 and at most one less than the number of unique values, so different predictors may end up with different numbers of cutpoints. Unused trailing entries of the cutpoint matrix are padded with the maximum value representable in the dtype of X.

Note: the quantiles are taken over the unique values, not over the original distribution, so a value that occurs many times counts only once.
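The cutpoint rule described above can be illustrated for a single predictor with a small NumPy sketch. It is not the library's implementation: the helper name unique_quantile_cutpoints, the use of midpoints between adjacent unique values, and the way ranks are rounded are all assumptions made for illustration. Only the two upper bounds on the number of cutpoints and the padding value come from the description.

```python
import numpy as np

def unique_quantile_cutpoints(x, max_bins):
    """Hypothetical helper: padded cutpoints for one predictor."""
    x = np.asarray(x, dtype=np.float64)
    u = np.unique(x)                      # sorted unique values; duplicates count once
    m = u.size
    # Both bounds from the description: max_bins - 1 and (unique values - 1).
    n_cut = min(max_bins - 1, m - 1)
    # Split the ranks 0..m-1 into n_cut + 1 groups of roughly equal size.
    k = np.arange(1, n_cut + 1)
    b = (k * m) // (n_cut + 1)            # index of the first value in each later group
    # Assumption: place each cutpoint halfway between the two neighbouring values.
    cuts = (u[b - 1] + u[b]) / 2
    # Pad unused trailing entries with the largest representable value.
    padded = np.full(max_bins - 1, np.finfo(x.dtype).max)
    padded[:n_cut] = cuts
    return padded, n_cut

# Four unique values allow at most three cutpoints, even though max_bins=6 allows five.
cuts, n_cut = unique_quantile_cutpoints([3.0, 1.0, 2.0, 2.0, 2.0, 2.0, 5.0, 5.0], max_bins=6)
print(n_cut, cuts[:n_cut])
```

In this sketch the repeated value 2.0 has no extra influence on where the cutpoints fall, which is the behaviour the note describes.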

When the number of observations n exceeds max_subsample, the predictor matrix is randomly thinned along the observation axis to max_subsample columns before quantilization. Each predictor row is thinned independently and without replacement. This keeps quantilization tractable on very large datasets, at the cost of making the quantiles approximate.
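The thinning step can be sketched as follows. This is a NumPy stand-in under stated assumptions: the library takes a JAX random key, whereas this sketch uses a numpy.random.Generator, and the helper name thin_rows is hypothetical.

```python
import numpy as np

def thin_rows(X, max_subsample, rng):
    """Hypothetical helper: thin each predictor row independently."""
    p, n = X.shape
    if max_subsample is None or n <= max_subsample:
        return X                          # nothing to do, no randomness needed
    # Each row gets its own draw without replacement, so different rows
    # generally keep different observations.
    return np.stack(
        [rng.choice(row, size=max_subsample, replace=False) for row in X]
    )

rng = np.random.default_rng(0)
X = np.arange(20, dtype=np.float64).reshape(2, 10)   # p=2 predictors, n=10 observations
X_thin = thin_rows(X, max_subsample=4, rng=rng)
print(X_thin.shape)
```

Because rows are thinned independently, the thinned matrix no longer keeps observations aligned across predictors. That is harmless here, since cutpoints are computed separately for each predictor.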

Parameters:
  • X (Real[Array, 'p n']) – Training predictors with p predictors and n observations.

  • max_bins (int, default: 256) – The maximum number of bins per predictor.

  • max_subsample (int | None, default: 100_000) – The maximum number of observations to use when computing quantiles. If None, no subsampling is performed. If n exceeds this, key is required.

  • key (Key[Array, ''] | None, default: None) – Random key for subsampling. Required when X.shape[1] > max_subsample; otherwise unused.

Raises:

ValueError – If subsampling would trigger but key is None.

max_split: UInt[Array, 'p']

The number of cutpoints actually used for each of the p predictors.

bin(X)

Map predictors to bin indices using the cutpoints chosen at construction.

Parameters:

X (Real[Array, 'p n']) – A matrix with p predictors and n observations. Must have the same number of predictors as the training matrix passed to the constructor.

Returns:

UInt[Array, 'p n'] – The binned X, stored in the smallest unsigned integer data type that can hold the bin indices.
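As a rough illustration of what binning against a padded cutpoint matrix involves, the sketch below uses NumPy rather than the library itself. The helper name bin_with_cutpoints, the tie convention (side="left", so a value equal to a cutpoint falls in the lower bin), and the dtype selection loop are assumptions, not the documented behaviour.

```python
import numpy as np

def bin_with_cutpoints(X, cutpoints):
    """Hypothetical helper: map a (p, n) matrix to per-predictor bin indices."""
    p, n = X.shape
    n_bins = cutpoints.shape[1] + 1       # k cutpoints define k + 1 bins
    # Pick the smallest unsigned dtype able to hold the largest bin index.
    for dtype in (np.uint8, np.uint16, np.uint32, np.uint64):
        if n_bins - 1 <= np.iinfo(dtype).max:
            break
    out = np.empty((p, n), dtype=dtype)
    for j in range(p):
        # Assumption: side="left" sends a value equal to a cutpoint to the lower bin.
        out[j] = np.searchsorted(cutpoints[j], X[j], side="left")
    return out

big = np.finfo(np.float64).max            # padding value for unused cutpoints
cutpoints = np.array([[1.5, 2.5, 4.0, big, big]])
X = np.array([[1.0, 2.0, 3.0, 5.0]])
print(bin_with_cutpoints(X, cutpoints))
```

Padding with the largest representable value means that, in this sketch, ordinary data never lands beyond the cutpoints that are actually in use.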