bartz.testing.DGP

class bartz.testing.DGP(x, y, z, mulin_shared, mulin_separate, mulin, muquad_shared, muquad_separate, muquad, mu, error_scale, params)

Output of gen_data / gen_data_from_params: sampled data and parameters.

See Params for the definition of the generative model. The _shared fields are the lambda_=1 limit (common across components); the _separate fields are the lambda_=0 limit (independent across components); the plain names are the realized mix at the sampled params.lambda_.

x: Float[Array, 'p n']

Predictors of shape (p, n), drawn i.i.d. from the standardized family params.x_distr.

y: Float[Array, 'k n'] | Float[Array, 'n']

Noisy outcomes of shape (k, n), or (n,) if gen_data was called with k=None.

z: Float[Array, 'k n'] | Float[Array, 'n']

Latent outcomes (the Z of Params) of shape (k, n), or (n,) if gen_data was called with k=None. y equals z for continuous components and thresholds it at 0 for binary ones.
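The relation between z and y can be sketched with NumPy as follows. This is an illustration only, not the library's code: the mask is_binary is a hypothetical name, and the strict inequality at 0 is an assumption of this sketch.

```python
import numpy as np

# Hypothetical latent outcomes for k=2 components and n=4 observations.
z = np.array([[0.5, -1.2, 2.0, -0.1],
              [1.5, -0.3, 0.0, 0.7]])

# Hypothetical mask: component 0 is continuous, component 1 is binary.
is_binary = np.array([False, True])

# Continuous rows pass through unchanged; binary rows become 0/1 indicators.
# The strict inequality (z > 0) is an assumption of this sketch.
y = np.where(is_binary[:, None], (z > 0).astype(z.dtype), z)
```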

mulin_shared: Float[Array, 'n']

Shared linear mean of shape (n,).

mulin_separate: Float[Array, 'k n'] | None

Separate linear mean of shape (k, n), rows independent. None in univariate mode (k is None).

mulin: Float[Array, 'k n'] | Float[Array, 'n']

Linear part of the latent mean of shape (k, n), or (n,) in univariate mode (k is None), where it equals mulin_shared.

muquad_shared: Float[Array, 'n']

Shared quadratic mean of shape (n,).

muquad_separate: Float[Array, 'k n'] | None

Separate quadratic mean of shape (k, n), rows independent. None in univariate mode (k is None).

muquad: Float[Array, 'k n'] | Float[Array, 'n']

Quadratic part of the latent mean of shape (k, n), or (n,) in univariate mode (k is None), where it equals muquad_shared.

mu: Float[Array, 'k n'] | Float[Array, 'n']

Latent mean mulin + muquad + params.offset[..., None] of shape (k, n), or (n,) in univariate mode (k is None).
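The indexing offset[..., None] lets the same expression work in both modes. The NumPy sketch below illustrates the broadcasting; the arrays are random stand-ins rather than output of gen_data, and the offset shapes, (k,) in multivariate mode and 0-d in univariate mode, are assumptions inferred from the formula.

```python
import numpy as np

k, n = 3, 5
rng = np.random.default_rng(0)

# Multivariate mode: means are (k, n); the offset is assumed to be (k,).
mulin = rng.normal(size=(k, n))
muquad = rng.normal(size=(k, n))
offset = np.array([0.0, 1.0, -2.0])
# offset[..., None] has shape (k, 1), so each row gets its own offset.
mu = mulin + muquad + offset[..., None]

# Univariate mode: means are (n,); the offset is assumed to be 0-d.
mulin_uni = rng.normal(size=n)
muquad_uni = rng.normal(size=n)
offset_uni = np.array(0.5)
# offset_uni[..., None] has shape (1,), which broadcasts against (n,).
mu_uni = mulin_uni + muquad_uni + offset_uni[..., None]
```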

error_scale: Float[Array, 'n'] | Float[Array, 'k n'] | None

Per-datapoint error standard-deviation scale (the W of Params), suitable as the error_scale argument of bartz.mcmcstep.init. Shape (n,) for het_shape='scalar', (k, n) for 'vector', None when homoskedastic.
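Code that consumes error_scale has to handle three cases. One possible way to normalize them to a single (k, n) array is sketched below; broadcast_error_scale is a hypothetical helper, not part of bartz, and representing the homoskedastic case by a constant scale of 1 is an assumption of this sketch.

```python
import numpy as np

def broadcast_error_scale(error_scale, k, n):
    """Return error_scale as a (k, n) array (hypothetical helper)."""
    if error_scale is None:
        # Homoskedastic case: assume a constant unit scale.
        return np.ones((k, n))
    # Shape (n,) is repeated across the k components; (k, n) is unchanged.
    return np.broadcast_to(error_scale, (k, n))

k, n = 2, 3
scalar_like = np.array([1.0, 2.0, 3.0])           # as for het_shape='scalar'
vector_like = np.arange(6.0).reshape(k, n) + 1.0  # as for het_shape='vector'

w_none = broadcast_error_scale(None, k, n)
w_scalar = broadcast_error_scale(scalar_like, k, n)
w_vector = broadcast_error_scale(vector_like, k, n)
```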

params: Params

DGP parameters, see Params.

split(n_train=None)

Split the data into training and test sets.

Parameters:

n_train (int | None, default: None) – Number of training observations. If None, split in half.

Returns:

tuple[DGP, DGP] – Two DGP objects with the train and test splits.
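Since every array field stores observations along its last axis, a split of this kind amounts to slicing that axis. The sketch below shows the idea on plain NumPy arrays; split_columns is a hypothetical helper, and both taking the first n_train observations as the training set and rounding the half down are assumptions, not documented behavior.

```python
import numpy as np

def split_columns(a, n_train=None):
    """Split an array along its last (observation) axis (hypothetical helper)."""
    n = a.shape[-1]
    if n_train is None:
        n_train = n // 2  # assumed rounding for odd n
    # The ellipsis keeps this valid for both (p, n)/(k, n) and (n,) arrays.
    return a[..., :n_train], a[..., n_train:]

x = np.arange(12.0).reshape(2, 6)  # stand-in predictors, p=2, n=6
y = np.arange(6.0)                 # stand-in univariate outcomes, n=6

x_train, x_test = split_columns(x, n_train=4)
y_train, y_test = split_columns(y)
```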

quantize(max_bins=256)

Quantize the predictors into the format expected by bartz.mcmcstep.init.

Parameters:

max_bins (int, default: 256) – Maximum number of levels per predictor.

Returns:

QuantizedData – A QuantizedData with the quantized predictors, y and max_split.
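To give an idea of what capping the number of levels per predictor means, here is a generic quantile-binning sketch in NumPy. It is not the algorithm bartz uses: quantize_predictor is a hypothetical function, and the choice of midpoints between retained values as cut points is an assumption made for illustration.

```python
import numpy as np

def quantize_predictor(x_row, max_bins=256):
    """Map one predictor to integer bin indices (hypothetical helper)."""
    levels = np.unique(x_row)
    if levels.size > max_bins:
        # Too many distinct values: keep max_bins evenly spaced quantiles.
        levels = np.unique(np.quantile(x_row, np.linspace(0, 1, max_bins)))
    # Cut points are midpoints between consecutive retained levels.
    splits = (levels[:-1] + levels[1:]) / 2
    # Each value gets the index of the interval it falls into.
    bins = np.searchsorted(splits, x_row)
    return bins, splits

few = np.array([0.1, 0.5, 0.1, 0.9])
bins_few, splits_few = quantize_predictor(few)

many = np.linspace(0.0, 1.0, 1000)
bins_many, splits_many = quantize_predictor(many, max_bins=16)
```

With few distinct values each value keeps its own level; with many, the indices stay within max_bins levels.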