bartz.prepcovars.UniqueQuantileBinner
- class bartz.prepcovars.UniqueQuantileBinner(X, *, max_bins=256, max_subsample=100_000, key=None)
Binner with quantile-based cutpoints from observed unique values.
For each predictor, cutpoints are placed between sorted unique values so that the empirical distribution of the unique values is approximately uniform across bins. The number of cutpoints is at most max_bins - 1 and at most one less than the number of unique values, so different predictors may end up with different effective cutpoint counts. Trailing unused entries of the cutpoint matrix are padded with the maximum value representable in the dtype of X.
Note: the quantiles are over the unique values, not over the original distribution.
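The cutpoint rule above can be sketched in plain Python. This is an illustration under stated assumptions, not bartz's implementation: the helper names, the choice of midpoints between adjacent unique values, the exact rank formula, and the matrix width (the longest row) are all hypothetical, and sys.float_info.max stands in for the maximum value of the dtype of X.

```python
import sys


def unique_quantile_cutpoints(values, max_bins):
    """Cutpoints for one predictor, at evenly spaced ranks of its unique values."""
    uniques = sorted(set(values))
    n_unique = len(uniques)
    # Both caps from the text: at most max_bins - 1 and at most n_unique - 1.
    n_cuts = min(max_bins - 1, n_unique - 1)
    cutpoints = []
    for j in range(1, n_cuts + 1):
        # Number of unique values that should fall below the j-th cutpoint.
        below = j * n_unique // (n_cuts + 1)
        # Assumption: the cutpoint sits midway between two adjacent unique values.
        cutpoints.append((uniques[below - 1] + uniques[below]) / 2)
    return cutpoints


def cutpoint_matrix(X, max_bins):
    """Stack per-predictor cutpoints into rows of equal length, padding the tail."""
    rows = [unique_quantile_cutpoints(row, max_bins) for row in X]
    max_split = [len(row) for row in rows]  # cutpoints actually used per predictor
    width = max(max_split)                  # assumption: width of the longest row
    fill = sys.float_info.max               # stand-in for the dtype's maximum
    padded = [row + [fill] * (width - len(row)) for row in rows]
    return padded, max_split


X = [
    [0, 0, 0, 0, 0, 0, 1, 2, 3],  # 4 unique values, heavily repeated 0
    [5, 5, 5, 5, 9, 9, 9, 9, 9],  # 2 unique values
]
splits, max_split = cutpoint_matrix(X, max_bins=3)
print(splits)
print(max_split)
```

In the first row, six of the nine observations are 0, yet the cutpoints are spread over the unique values 0, 1, 2, 3 rather than over the observations. The second row has only two unique values, so it gets a single cutpoint and one padded entry.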
When n > max_subsample, the predictor matrix is randomly thinned along the observation axis to max_subsample columns before quantilization. Each predictor row is thinned independently and without replacement. This keeps quantilization tractable on very large datasets at the cost of approximate quantiles.
- Parameters:
X (Real[Array, 'p n']) – Training predictors with p predictors and n observations.
max_bins (int, default: 256) – The maximum number of bins per predictor.
max_subsample (int | None, default: 100_000) – The maximum number of observations to use when computing quantiles. If None, no subsampling is performed. If n exceeds this, key is required.
key (Key[Array, ''] | None, default: None) – Random key for subsampling. Required when X.shape[1] > max_subsample; otherwise unused.
- Raises:
ValueError – If subsampling would trigger but key is None.
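The interaction between max_subsample and key can be sketched as follows. This is a hedged illustration in plain Python, not the library's code: thin_rows and its seed argument are hypothetical, and random.Random stands in for the JAX random key.

```python
import random


def thin_rows(X, max_subsample, seed=None):
    """Thin each predictor row to max_subsample observations, if needed."""
    n = len(X[0])
    if max_subsample is None or n <= max_subsample:
        return X  # no subsampling, so the seed is unused
    if seed is None:
        # Mirrors the documented ValueError when subsampling would trigger.
        raise ValueError("subsampling is required but no seed was given")
    rng = random.Random(seed)
    # Each row is sampled independently and without replacement.
    return [rng.sample(row, max_subsample) for row in X]


X = [list(range(10)), list(range(10, 20))]
thinned = thin_rows(X, max_subsample=4, seed=0)
print([len(row) for row in thinned])
```

Because rows are thinned independently, the retained observations generally differ from one predictor to the next.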
- max_split: UInt[Array, 'p']
The number of cutpoints actually used for each of the p predictors.
- bin(X)
Map predictors to bin indices using the cutpoints chosen at construction.
- Parameters:
X (Real[Array, 'p n']) – A matrix with p predictors and n observations. Must have the same number of predictors as the training matrix passed to the constructor.
- Returns:
UInt[Array, 'p n'] – Quantized X with minimal data type.
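A minimal sketch of how values could be mapped to bin indices given a padded cutpoint matrix. The tie-breaking convention (bisect_left, so a value equal to a cutpoint falls in the lower bin), the helper names, and the rule for picking the integer width are assumptions made for illustration; they are not taken from the bartz source.

```python
import bisect
import sys

FILL = sys.float_info.max  # stand-in for the padding value in the cutpoint matrix


def bin_rows(X, splits):
    """Replace each value by the number of cutpoints strictly below it."""
    return [
        [bisect.bisect_left(cuts, x) for x in row]
        for row, cuts in zip(X, splits)
    ]


def minimal_uint_bits(max_index):
    """Smallest unsigned integer width (in bits) that can hold max_index."""
    for bits in (8, 16, 32, 64):
        if max_index < 2 ** bits:
            return bits
    raise ValueError("index does not fit in 64 bits")


# Hypothetical cutpoints: 2 for the first predictor, 1 (plus padding) for the second.
splits = [[0.5, 1.5], [7.0, FILL]]
X_new = [[0, 1, 3, -2], [5, 9, 7, 100]]
binned = bin_rows(X_new, splits)
print(binned)
print(minimal_uint_bits(max(max(row) for row in binned)))
```

The padded entry never produces an extra bin under this convention, because no finite value exceeds the padding value.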