bedroc package

Submodules

bedroc.containers module

bedroc.core module

Core classes and functions

class bedroc.core.DataContainer(dataframe: DataFrame, *, name: str = 'data', feature_suffix: str = '_feature', feature_std_suffix: str = '_uncertainty', std_scale: float = 1.0, select_features: Iterable[str] | None = None, select_data: Iterable[Any] | None = None, data_column: str = 'ID')

Bases: object

A generic data container

Parameters:

dataframe – A dataframe with columns of feature values and their standard deviations
name – Data container name. Defaults to data.
feature_suffix – Suffix of feature value columns. Defaults to _feature.
feature_std_suffix – Suffix of feature standard deviation columns. Defaults to _uncertainty.
std_scale – Number of standard deviations represented by the uncertainty columns. For example, use 2.0 if the input uncertainties are reported as 2SE. Defaults to 1.0.
select_features – An optional iterable (tuple or list) of bare feature names (without feature_suffix) to select. Defaults to None to select all features.
select_data – An optional iterable (tuple or list) of data to select. Defaults to None to select all data.
data_column – Name of the data column used by select_data. Defaults to ID.
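
As a rough illustration of the column conventions described above, the sketch below pairs value and uncertainty columns by suffix and converts the reported uncertainties to one standard deviation. It uses plain Python rather than bedroc, the column names are invented, and dividing by std_scale is an assumption about how the scale is applied.

```python
# Hypothetical column layout: "<name>_feature" holds values and
# "<name>_uncertainty" holds the reported uncertainties.
columns = {
    "ID": ["s1", "s2"],
    "SiO2_feature": [50.0, 60.0],
    "SiO2_uncertainty": [1.0, 2.0],
    "MgO_feature": [8.0, 4.0],
    "MgO_uncertainty": [0.4, 0.2],
}

feature_suffix = "_feature"
feature_std_suffix = "_uncertainty"
std_scale = 2.0  # uncertainties reported as 2SE

# Bare feature names: strip the value suffix from the matching columns.
feature_names = [
    name[: -len(feature_suffix)] for name in columns if name.endswith(feature_suffix)
]

# Assumption: one standard deviation = reported uncertainty / std_scale.
one_sigma = {
    name: [u / std_scale for u in columns[name + feature_std_suffix]]
    for name in feature_names
}
```

The bare names in feature_names are the form that select_features expects, that is, without feature_suffix.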

classmethod from_csv(filename_path: str | Path, **kwargs) → Self

Creates an instance from a CSV file.

Parameters:

filename_path – Path to the CSV file
**kwargs – Arbitrary keyword arguments for constructor

Returns:

An instance

classmethod from_excel(filename_path: str | Path, sheet_name: Any, **kwargs) → Self

Creates an instance from an Excel file.

Parameters:

filename_path – Path to the Excel file
sheet_name – Sheet name
**kwargs – Arbitrary keyword arguments for constructor

Returns:

An instance

property data_names: list[str]: Sample names

property feature_columns: Index: Index of feature columns

property feature_std_columns: Index: Index of feature uncertainty columns

property feature_names: Index: Index of feature names with the suffix removed

property n_data: int: Number of samples

property n_features: int: Number of features

_compute_scaling_means() → Series: Computes the feature means for scaling

_compute_scaling_stds() → Series: Computes the feature standard deviations for scaling

_compute_standardized_data() → DataFrame: Computes standardized data

get_dataframe(*, standardized: bool = True) → DataFrame: Returns standardized (default) or raw dataframe

get_destandardized_values(standardized_values: ndarray[tuple[Any, ...], dtype[float64]]) → ndarray[tuple[Any, ...], dtype[float64]]

Gets destandardized values.

Parameters:: standardized_values – Standardized values. Must have shape (n_data, n_features) or (n_data, n_features, n_samples).
Returns:: Destandardized values with matching shape
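
The two accepted shapes can be handled with NumPy broadcasting. The following is a minimal sketch, assuming the standardization is a per-feature z-score; the function name destandardize and the use of the sample standard deviation (ddof=1) are assumptions, not the library's implementation.

```python
import numpy as np

def destandardize(standardized_values, means, stds):
    """Undo a z-score: raw = standardized * std + mean (per feature)."""
    values = np.asarray(standardized_values, dtype=np.float64)
    means = np.asarray(means, dtype=np.float64)
    stds = np.asarray(stds, dtype=np.float64)
    if values.ndim == 2:  # (n_data, n_features)
        return values * stds + means
    if values.ndim == 3:  # (n_data, n_features, n_samples)
        return values * stds[:, np.newaxis] + means[:, np.newaxis]
    raise ValueError("expected a 2-D or 3-D array")

rng = np.random.default_rng(0)
raw = rng.normal(loc=[10.0, -3.0], scale=[2.0, 0.5], size=(6, 2))
means, stds = raw.mean(axis=0), raw.std(axis=0, ddof=1)
z = (raw - means) / stds

roundtrip_2d = destandardize(z, means, stds)
# Repeat along a trailing axis to mimic posterior samples.
z_3d = np.repeat(z[:, :, np.newaxis], 4, axis=2)
roundtrip_3d = destandardize(z_3d, means, stds)
```

In both cases the output has the same shape as the input, matching the documented return value.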

get_feature_values(*, standardized: bool = True) → ndarray[tuple[Any, ...], dtype[float64]]

Returns standardized (default) or raw feature values

Parameters:: standardized – Whether to return standardized feature values. Defaults to True.
Returns:: Feature values

get_feature_stds(*, standardized: bool = True) → ndarray[tuple[Any, ...], dtype[float64]]

Returns standardized (default) or raw feature standard deviations

Parameters:: standardized – Whether to return standardized standard deviations. Defaults to True.
Returns:: Feature standard deviations

get_covariance_matrix(*, standardized: bool = True) → ndarray[tuple[Any, ...], dtype[float64]]

Gets the covariance matrix.

Parameters:: standardized – Whether to use standardized feature values. Defaults to True.
Returns:: Covariance matrix
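
The effect of the standardized flag can be seen with NumPy alone: the covariance matrix of z-scored features has a unit diagonal and coincides with the Pearson correlation matrix of the raw features. This is a sketch under the assumption that standardization uses the sample standard deviation (ddof=1); it does not call bedroc.

```python
import numpy as np

rng = np.random.default_rng(1)
raw = rng.normal(size=(50, 3))
raw[:, 1] += 0.8 * raw[:, 0]  # make two features correlated

# z-score each feature (sample standard deviation, ddof=1)
z = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)

cov_raw = np.cov(raw, rowvar=False)    # (n_features, n_features)
cov_std = np.cov(z, rowvar=False)      # unit diagonal
corr = np.corrcoef(raw, rowvar=False)  # Pearson correlation of the raw data
```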

plot_pearson_correlation_coefficient(*, standardized: bool = True) → Axes

Plots a heatmap of the Pearson correlation coefficient.

Parameters:: standardized – Whether to use standardized feature values. Defaults to True.
Returns:: Figure axes

train_test_split(test_size: float | None = 0.2, random_state: int | None = None, shuffle: bool = True, stratify: ArrayLike | None = None, *, standardized: bool = True) → dict[str, Any]

Splits the data into training and test sets.

Parameters:

test_size – Proportion of the dataset to include in the test split. Defaults to 0.2.
random_state – Controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls. Defaults to None.
shuffle – Whether or not to shuffle the data before splitting. Defaults to True.
stratify – The target variable for stratification. Defaults to None.
standardized – Whether to use standardized feature values. Defaults to True.

Returns:

Dictionary containing train-test split
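
The parameter names match those of scikit-learn's train_test_split, although this page does not state whether the method delegates to it. The sketch below shows the idea of a seeded, shuffled split with NumPy only; the function simple_split and the dictionary keys "train" and "test" are hypothetical, and stratification is omitted.

```python
import numpy as np

def simple_split(values, test_size=0.2, random_state=None, shuffle=True):
    """Split rows into train and test subsets (no stratification)."""
    values = np.asarray(values)
    n_data = values.shape[0]
    n_test = int(np.ceil(test_size * n_data))
    indices = np.arange(n_data)
    if shuffle:
        # A fixed random_state gives the same permutation on every call.
        indices = np.random.default_rng(random_state).permutation(n_data)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    # Hypothetical keys; the real method's dictionary layout may differ.
    return {"train": values[train_idx], "test": values[test_idx]}

data = np.arange(20, dtype=float).reshape(10, 2)
split = simple_split(data, test_size=0.2, random_state=42)
again = simple_split(data, test_size=0.2, random_state=42)
```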

bedroc.core.trim_samples(samples: ndarray[tuple[Any, ...], dtype[_ScalarT]]) → ndarray[tuple[Any, ...], dtype[float64]]

Trims samples.

Parameters:: samples – Samples to trim
Returns:: Trimmed samples

bedroc.core.resolve_path(p: Traversable | Path) → Path

Resolve a Traversable or Path to a concrete filesystem path.

This function ensures that resources packaged using importlib.resources (e.g., files inside wheels or zipped packages) are converted into a real Path object. If p is already a Path, it is returned unchanged. Otherwise, the underlying resource is extracted to a temporary location and its path is returned.

Note

The temporary file extracted for Traversable objects is valid only for the duration of the context in which it is created. Since this function returns the resolved Path inside the context manager, the file is guaranteed to exist when the function returns.

Parameters:: p – A filesystem Path or an importlib.resources.Traversable object.
Returns:: A concrete filesystem path pointing to the resolved resource
Return type:: Path
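
For comparison, the standard-library pattern for obtaining a concrete path from a packaged resource is importlib.resources.as_file. The sketch below is not bedroc's implementation; the helper read_resource_text is hypothetical, and it keeps all file access inside the with block so that it does not depend on the lifetime of any temporary file.

```python
from importlib.resources import as_file
from pathlib import Path
import tempfile

def read_resource_text(resource) -> str:
    """Read a text resource given a Path or a Traversable."""
    # as_file yields a real filesystem path; an existing Path is yielded
    # unchanged, otherwise the resource is extracted to a temporary file.
    with as_file(resource) as concrete_path:
        return Path(concrete_path).read_text(encoding="utf-8")

# Usage with a plain Path. A packaged resource obtained from
# importlib.resources.files(...) would be passed in the same way.
with tempfile.TemporaryDirectory() as tmp:
    example = Path(tmp) / "example.csv"
    example.write_text("ID,a_feature\ns1,1.0\n", encoding="utf-8")
    text = read_resource_text(example)
```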

bedroc.hierarchical module

Utilities for building and working with Bayesian hierarchical models

bedroc.hierarchical.RANDOM_SEED: int | None = 123: Random seed for reproducibility. Set to None for random behavior.

bedroc.hierarchical.SAVEFIG_KWARGS: dict[str, Any] = {'bbox_inches': 'tight', 'dpi': 300, 'format': 'pdf'}: Default savefig options

bedroc.hierarchical.get_coords(X: ndarray[tuple[Any, ...], dtype[float64]], X_group_idx: ndarray[tuple[Any, ...], dtype[int64]], *, sample_names: Iterable | None = None, feature_names: Iterable | None = None, group_names: Iterable | None = None) → dict[str, list]

Utility function to generate group and feature names with defaults.

Parameters:

X – Observations (n_samples, n_features)
X_group_idx – Group ID of observations (n_samples,)
sample_names – Sample names. Defaults to None to generate sequential sample names.
feature_names – Feature names. Defaults to None to generate sequential feature names.
group_names – Group names. Defaults to None to generate generic names.

Returns:

Dictionary of coordinates used for PyMC models
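
A minimal sketch of how such defaults could be generated is shown below. The function default_coords, the dictionary keys, and the naming patterns (sample_0, feature_0, group_0) are all hypothetical; only the idea of filling in sequential names when None is passed comes from the description above.

```python
import numpy as np

def default_coords(X, X_group_idx, *, sample_names=None,
                   feature_names=None, group_names=None):
    """Build a coordinate dictionary, filling in sequential default names."""
    n_samples, n_features = X.shape
    n_groups = int(np.max(X_group_idx)) + 1
    if sample_names is None:
        sample_names = [f"sample_{i}" for i in range(n_samples)]
    if feature_names is None:
        feature_names = [f"feature_{j}" for j in range(n_features)]
    if group_names is None:
        group_names = [f"group_{g}" for g in range(n_groups)]
    # Hypothetical keys; bedroc's actual dimension names may differ.
    return {
        "sample": list(sample_names),
        "feature": list(feature_names),
        "group": list(group_names),
    }

X = np.zeros((4, 3))
X_group_idx = np.array([0, 0, 1, 1], dtype=np.int64)
coords = default_coords(X, X_group_idx, feature_names=("SiO2", "MgO", "FeO"))
```

A dictionary of this form is what PyMC accepts through the coords argument of pm.Model.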

bedroc.hierarchical.zero_difference_model(X: ndarray[tuple[Any, ...], dtype[float64]], X_group_idx: ndarray[tuple[Any, ...], dtype[int64]], *, group_names: Iterable | None = None, feature_names: Iterable | None = None, X_sigma: ndarray[tuple[Any, ...], dtype[float64]] | None = None, draws: int = 2000, tune: int = 1000, target_accept: float = 0.95, random_seed: int | None = None) → tuple[Model, DataTree]

Model assuming no difference between two groups.

This model is a “null”-like version of the group-centric hierarchical model: it assumes that the feature-wise means of Group B are identical to those of Group A (i.e., delta = 0). Each feature has its own observation noise, shared across groups. Observations are modeled as independent given their feature means and noise.

Parameters:

X – Observations (n_samples, n_features)
X_group_idx – Group ID of observations, must be 0 or 1 (n_samples,)
group_names – Group names. Defaults to unique values in X_group_idx.
feature_names – Feature names. Defaults to sequential feature names.
X_sigma – Sigma of observations (n_samples, n_features). Defaults to None.
draws – Number of posterior draws. Defaults to 2000.
tune – Number of tuning (warm-up) steps. Defaults to 1000.
target_accept – Target acceptance probability for the sampler. Defaults to 0.95.
random_seed – Seed for random number generation to enable reproducibility. Defaults to None.

Returns:

PyMC model object
InferenceData containing posterior samples

Return type:

tuple

bedroc.hierarchical.feature_centric_hierarchical_model(X: ndarray[tuple[Any, ...], dtype[float64]], X_group_idx: ndarray[tuple[Any, ...], dtype[int64]], *, group_names: Iterable | None = None, feature_names: Iterable | None = None, X_sigma: ndarray[tuple[Any, ...], dtype[float64]] | None = None, draws: int = 2000, tune: int = 1000, target_accept: float = 0.95, random_seed: int | None = None) → tuple[Model, DataTree]

Bayesian hierarchical model for feature-centered group comparisons.

This model estimates feature-wise latent structure shared across groups, while allowing group-specific deviations that are partially pooled across features.

The model is feature-centric: each feature has a global baseline mean, and each group expresses deviations from this baseline with hierarchical shrinkage controlled at the feature level.

This structure allows:

feature-specific heterogeneity in group effects
partial pooling of group deviations across features
stable estimation of group differences in high-dimensional settings

Note

The variable names in the model are fixed: they are propagated downstream and are expected by helper functions and by analysis and plotting utilities.

Parameters:

X – Observations (n_samples, n_features)
X_group_idx – Group ID of observations (n_samples,)
group_names – Group labels. Defaults to unique values in X_group_idx.
feature_names – Feature names. Defaults to sequential feature labels.
X_sigma – Measurement noise per observation (n_samples, n_features). If None, noise is inferred.
draws – Number of posterior samples.
tune – Number of warm-up steps.
target_accept – NUTS target acceptance probability.
random_seed – RNG seed for reproducibility.

Returns:

PyMC model
ArviZ InferenceData

Return type:

tuple

class bedroc.hierarchical.SyntheticDataGenerator(n_samples: int = 100, *, n_features: int = 5, feature_offsets: ArrayLike = 1.0, feature_sigma: ArrayLike = 0.5, random_seed: int | None = None, output_directory: Path | None = None)

Bases: object

Generates synthetic multivariate data for two types (A & B) with configurable parameters.

Parameters:

n_samples – Number of samples per type. Defaults to 100.
n_features – Number of features per sample. Defaults to 5.
feature_offsets – Optional shift to apply to the Type B feature means relative to Type A. May be either a scalar (applied to every feature) or an array of shape (n_features,) specifying per-feature offsets. Defaults to 1.0.
feature_sigma – Standard deviation of the feature noise. May be either a scalar (applied to every feature) or an array of shape (n_features,) specifying per-feature noise. Defaults to 0.5.
random_seed – Optional seed for reproducibility. Defaults to None.
output_directory – Optional path to save generated data. Defaults to None (no saving).
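
The role of feature_offsets and feature_sigma can be sketched with NumPy as follows. This is not the generator's code: drawing the Type A means from a standard normal, stacking the two types into one array, and coding the groups as 0 and 1 are assumptions made for illustration.

```python
import numpy as np

n_samples, n_features = 100, 5

# Scalars are broadcast to one value per feature.
feature_offsets = np.broadcast_to(np.asarray(1.0, dtype=np.float64), (n_features,))
feature_sigma = np.broadcast_to(np.asarray(0.5, dtype=np.float64), (n_features,))

rng = np.random.default_rng(123)
mu_A = rng.normal(size=n_features)  # assumption: Type A means drawn from N(0, 1)
mu_B = mu_A + feature_offsets       # Type B means shifted by the offsets

X_A = rng.normal(mu_A, feature_sigma, size=(n_samples, n_features))
X_B = rng.normal(mu_B, feature_sigma, size=(n_samples, n_features))

# Stack both types and record group membership (0 = A, 1 = B).
X = np.vstack([X_A, X_B])
X_group_idx = np.repeat(np.array([0, 1], dtype=np.int64), n_samples)
```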

n_samples: int

n_features: int

feature_offsets: ndarray[tuple[Any, ...], dtype[float64]]

feature_sigma: ndarray[tuple[Any, ...], dtype[float64]]

random_seed: int | None

output_directory: Path | None

mu_A: ndarray[tuple[Any, ...], dtype[float64]]

mu_B: ndarray[tuple[Any, ...], dtype[float64]]

property X: ndarray[tuple[Any, ...], dtype[_ScalarT]]: Type A data (n_samples, n_features)

property X_group_idx: ndarray[tuple[Any, ...], dtype[int64]]: Group index of each observation

generate() → None: Generates multivariate data for 2 types (A & B) and stores internally.

generate_out_of_sample_data(n_samples: int = 100) → tuple[ndarray, ndarray]

Generates out-of-sample synthetic data using previously-sampled true parameters.

Parameters:

n_samples – Number of out-of-sample points per type. Defaults to 100.

Returns:

Type A data (n_samples, n_features)
Type B data (n_samples, n_features)

Return type:

tuple

bedroc.pca module

Bayesian PCA/latent factor models

bedroc.pca.bayesian_pca(feature_values: ndarray[tuple[Any, ...], dtype[float64]], feature_stds: ndarray[tuple[Any, ...], dtype[float64]], data_labels: Iterable[str] | None = None, feature_labels: Iterable[str] | None = None, n_components: int = 2, draws: int = 2000, tune: int = 1000, target_accept: float = 0.95, random_seed: int | None = None) → tuple[Model, DataTree]

Bayesian PCA model

Parameters:

feature_values – Feature values
feature_stds – Feature standard deviations
data_labels – Labels for the data points. Defaults to None.
feature_labels – Labels for the features. Defaults to None.
n_components – Number of latent factors. Defaults to 2.
draws – Number of posterior draws. Defaults to 2000.
tune – Number of tuning (warm-up) steps. Defaults to 1000.
target_accept – Target acceptance probability for the sampler. Defaults to 0.95.
random_seed – Seed for random number generation to enable reproducibility. Defaults to None.

Returns:

PyMC model object
InferenceData containing posterior samples

Return type:

tuple

class bedroc.pca.PCAFactorAnalyzer(latent_factors: ndarray[tuple[Any, ...], dtype[float64]], loading_matrix: ndarray[tuple[Any, ...], dtype[float64]])

Bases: object

PCA factor analyzer

Helper class to compute outputs associated with a PCA/factor analysis. This is useful for computing output quantities of a Bayesian PCA so that they can be compared with those of a deterministic PCA.

Note

This class assumes that the observed data have been standardized (z-scored) before performing Bayesian PCA. In particular, observed_variance_by_feature() returns ones for all features, which is only correct when the input data are normalized.

Parameters:

latent_factors – Latent factors, which represent the projections or scores onto the latent space. Should be of shape (n_data, n_components, n_samples).
loading_matrix – Loading matrix, which contains the factor loadings. Should be of shape (n_components, n_features, n_samples).

property n_components: int: Number of components

property n_data: int: Number of data points

property n_features: int: Number of features

property n_samples: int: Number of samples

explained_variance_ratio_by_factor() → ndarray[tuple[Any, ...], dtype[float64]]

Explained variance ratio by latent factor

This has been compared with sklearn.decomposition.PCA, specifically the attribute explained_variance_ratio_, and gives the same result.

Returns:: Explained variance ratio by latent factor with shape (n_components, n_samples)

explained_variance_ratio_by_feature(latent_factors: ndarray[tuple[Any, ...], dtype[float64]] | None = None, loading_matrix: ndarray[tuple[Any, ...], dtype[float64]] | None = None) → ndarray[tuple[Any, ...], dtype[float64]]

Explained variance ratio by feature

Parameters:

latent_factors – Latent factors. Defaults to None to use all values.
loading_matrix – Loading matrix. Defaults to None to use all values.

Returns:

Explained variance ratio by feature with shape (n_features, n_samples)

explained_variance_ratio_total() → ndarray[tuple[Any, ...], dtype[float64]]

Total explained variance ratio across all features and components

The total is computed directly because, in factor analysis, the components are generally not orthogonal, so the total cannot be obtained by summing the contributions of the individual latent factors. This method sums the variance across all features and latent factors, assuming that the reconstructed data includes all factors.

Returns:: Total explained variance ratio across all features and latent factors
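
The reason the per-factor contributions do not add up to the total can be checked numerically. The sketch below uses a single set of correlated latent factors and random loadings, so the ratios themselves are only illustrative; the per-factor definition used here (variance of each factor's own contribution, summed over features) is an assumption rather than the library's formula.

```python
import numpy as np

rng = np.random.default_rng(2)
n_data, n_components, n_features = 200, 2, 4

# Deliberately correlated latent factors (one posterior draw for brevity).
z1 = rng.normal(size=n_data)
z2 = 0.7 * z1 + 0.3 * rng.normal(size=n_data)
Z = np.column_stack([z1, z2])                        # (n_data, n_components)
alpha = rng.normal(size=(n_components, n_features))  # loadings

observed_variance = np.ones(n_features)  # standardized data: unit variance

# Total: variance of the full reconstruction, summed over features.
reconstruction = Z @ alpha
total_ratio = reconstruction.var(axis=0, ddof=1).sum() / observed_variance.sum()

# Per factor: variance of each factor's own contribution.
per_factor = np.array([
    np.outer(Z[:, k], alpha[k]).var(axis=0, ddof=1).sum() / observed_variance.sum()
    for k in range(n_components)
])

# The gap between the two is the covariance (cross) term.
cross_term = total_ratio - per_factor.sum()
expected_cross = 2.0 * np.cov(z1, z2)[0, 1] * (alpha[0] @ alpha[1]) / n_features
```

The cross term vanishes only when the factor covariance or the inner product of the loading rows is zero, which is the orthogonal case.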

observed_variance_by_feature() → ndarray[tuple[Any, ...], dtype[float64]]

Variance of each feature in the standardized observed data

This is unity by construction for standardized (z-score) data.

Returns:: Variance of each feature in the standardized observed data with shape (n_features,)

reconstruct_data(latent_factors: ndarray[tuple[Any, ...], dtype[float64]] | None = None, loading_matrix: ndarray[tuple[Any, ...], dtype[float64]] | None = None) → ndarray[tuple[Any, ...], dtype[float64]]

Reconstructs data.

This uses the model’s latent structure (mu = Z * alpha), without sampling likelihood noise.

Parameters:

latent_factors – Latent factors. Defaults to None to use all values.
loading_matrix – Loading matrix. Defaults to None to use all values.

Returns:

Reconstructed data, usually with shape (n_data, n_features, n_samples)
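
With the shapes documented for this class, the product mu = Z * alpha can be written as a single einsum over the component axis, applied independently to each posterior sample. The sketch below uses random arrays in place of real posterior draws and is an illustration of the shape handling, not the method's source.

```python
import numpy as np

rng = np.random.default_rng(3)
n_data, n_components, n_features, n_samples = 10, 2, 4, 5

latent_factors = rng.normal(size=(n_data, n_components, n_samples))
loading_matrix = rng.normal(size=(n_components, n_features, n_samples))

# mu[d, f, s] = sum_k Z[d, k, s] * alpha[k, f, s], one product per draw s
mu = np.einsum("dks,kfs->dfs", latent_factors, loading_matrix)

# Same result for a single draw with an ordinary matrix product.
mu_first = latent_factors[:, :, 0] @ loading_matrix[:, :, 0]

# Variance of each reconstructed feature across data points, per draw.
variance_by_feature = mu.var(axis=0, ddof=1)  # (n_features, n_samples)
```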

_reconstruct_variance_by_feature(latent_factors: ndarray[tuple[Any, ...], dtype[float64]] | None = None, loading_matrix: ndarray[tuple[Any, ...], dtype[float64]] | None = None) → ndarray[tuple[Any, ...], dtype[float64]]

Variance of each feature in the reconstructed data

Parameters:

latent_factors – Latent factors. Defaults to None to use all values.
loading_matrix – Loading matrix. Defaults to None to use all values.

Returns:

Variance of each feature in the reconstructed data with shape (n_features, n_samples)

class bedroc.pca.Analyzer(model: Model, idata: DataTree)

Bases: object

Analyzer for the Bayesian PCA

Parameters:

model – PyMC model object
idata – Trace data from sampling

property component_names: ndarray[tuple[Any, ...], dtype[_ScalarT]]

property feature_names: ndarray[tuple[Any, ...], dtype[_ScalarT]]

plot_explained_variance_by_feature(ax: Axes | None = None) → tuple[Axes, DataFrame]

Plots explained variance by feature and calculates summary statistics.

Parameters:

ax – Pre-existing axes for the plot. Defaults to None to create the axes.

Returns:

Plot axes
Summary statistics

Return type:

tuple

plot_explained_variance_by_factor(ax: Axes | None = None) → tuple[Axes, DataFrame]

Plots explained variance by factor and calculates summary statistics.

Parameters:

ax – Pre-existing axes for the plot. Defaults to None to create the axes.

Returns:

Plot axes
Summary statistics

Return type:

tuple

bedroc.type_aliases module

Common type aliases

This module centralizes type definitions for NumPy arrays and scalar values. Having a single place for these aliases improves readability and consistency across the codebase, whilst also simplifying type checking and documentation.

Module contents

Defines package-level variables and initializes the package logger

bedroc.complex_formatter() → Formatter: Complex formatter for logging

bedroc.simple_formatter() → Formatter

Simple formatter for logging

Returns:: Formatter for logging

bedroc.debug_logger() → Logger

Sets up debug logging to the console.

Returns:: A logger

bedroc.debug_file_logger() → Logger

Sets up info logging to the console and debug logging to a file.

Returns:: A logger
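
The combination of info-level console output and debug-level file output can be built with the standard logging module. The sketch below is one way to do it; the function make_debug_file_logger, the logger name, the log file name, and both format strings are hypothetical and are not taken from bedroc.

```python
import logging
import tempfile
from pathlib import Path

def make_debug_file_logger(name: str, log_path: Path) -> logging.Logger:
    """INFO and above to the console, DEBUG and above to a file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)  # let the handlers do the filtering
    logger.handlers.clear()
    logger.propagate = False

    console = logging.StreamHandler()
    console.setLevel(logging.INFO)
    console.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))

    file_handler = logging.FileHandler(log_path, mode="w", encoding="utf-8")
    file_handler.setLevel(logging.DEBUG)
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    )

    logger.addHandler(console)
    logger.addHandler(file_handler)
    return logger

log_path = Path(tempfile.mkdtemp()) / "example.log"
logger = make_debug_file_logger("example_logger", log_path)
logger.debug("written to the file only")
logger.info("written to the console and the file")
for handler in logger.handlers:
    handler.flush()
log_text = log_path.read_text(encoding="utf-8")
```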
