aiqclib.common.utils package

Submodules

aiqclib.common.utils.config module

A set of utilities for handling YAML configuration files.

This module provides utility functions for locating, reading, and parsing configuration files, typically in YAML format. It facilitates easy retrieval of specific items within the parsed configuration data.

aiqclib.common.utils.config.get_config_file(config_file)[source]

Determine the absolute path for a configuration file.

If the provided path does not exist, a FileNotFoundError is raised. If config_file is None, a ValueError is raised.

Parameters:

config_file (Optional[str]) – The path to the configuration file, or None.

Raises:

ValueError – If config_file is None.
FileNotFoundError – If the path specified by config_file does not exist.

Returns:

The resolved absolute path to the configuration file.

Return type:

str
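
The contract described above can be sketched with the standard library alone. This is an illustration of the documented behaviour, not aiqclib's source: the function name resolve_config_path is hypothetical, and using os.path.abspath to "resolve" the path is an assumption.

```python
import os
from typing import Optional


def resolve_config_path(config_file: Optional[str]) -> str:
    """Hypothetical stand-in mirroring the documented contract of get_config_file()."""
    if config_file is None:
        # Documented: a missing argument is a ValueError, not a silent default.
        raise ValueError("config_file must not be None")
    if not os.path.exists(config_file):
        # Documented: a path that does not exist is a FileNotFoundError.
        raise FileNotFoundError(f"Config file not found: {config_file}")
    # Assumption: "resolved absolute path" means os.path.abspath().
    return os.path.abspath(config_file)
```

Separating the two error cases lets a caller distinguish "no file was given" from "the given file is missing".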

aiqclib.common.utils.config.get_config_item(config, section, name)[source]

Retrieve a specific item from a section of a configuration dictionary.

This function iterates through a list of items within a specified section of the configuration, looking for an item where the "name" key matches the given name.

Parameters:

config (Dict[str, Any]) – The configuration dictionary, e.g., from read_config().
section (str) – The top-level key in config that contains a list of items.
name (str) – The value of the “name” key to match within the item.

Raises:

KeyError – If the section does not exist in the config dictionary.
TypeError – If the value at config[section] is not iterable.
ValueError – If no item with the specified name is found in the section.

Returns:

The dictionary of the matching configuration item.

Return type:

Dict[str, Any]
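
The lookup described above amounts to a linear search over a list of dictionaries. The following is a minimal sketch of that behaviour, not the library's code: find_config_item is a hypothetical name, and the item contents in the sample configuration are made up for illustration.

```python
from typing import Any, Dict


def find_config_item(config: Dict[str, Any], section: str, name: str) -> Dict[str, Any]:
    """Hypothetical equivalent of the documented lookup in get_config_item()."""
    # config[section] raises KeyError if the section is missing, and the
    # for-loop raises TypeError if the value is not iterable.
    for item in config[section]:
        if item.get("name") == name:
            return item
    raise ValueError(f"No item named {name!r} in section {section!r}")


# Illustrative configuration; only the "name" key matters for the lookup.
config = {
    "feature_stats_sets": [
        {"name": "stats_a", "value": 1},
        {"name": "stats_b", "value": 2},
    ]
}
item = find_config_item(config, "feature_stats_sets", "stats_b")
```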

aiqclib.common.utils.config.read_config(config_file)[source]

Read and parse a YAML configuration file.

This function uses the provided config_file path to locate, read, and parse a YAML file into a Python dictionary.

Parameters:

config_file (Optional[str]) – Full path to the config file, or None to indicate no specific file was provided.

Raises:

ValueError – If config_file is None (propagated from get_config_file()).
FileNotFoundError – If no file is found at the resolved path (propagated from get_config_file()).
yaml.YAMLError – If the configuration file is not valid YAML.

Returns:

A dictionary representing the parsed YAML configuration.

Return type:

Dict[str, Any]

aiqclib.common.utils.file module

This module provides utility functions for reading various file formats into Polars DataFrames.

It supports common data formats like Parquet, TSV (tab-separated values), and CSV (comma-separated values), including their gzipped versions, and allows for automatic file type inference based on file extensions.

aiqclib.common.utils.file.read_input_file(input_file, file_type=None, options=None)[source]

Read an input file into a Polars DataFrame, supporting formats such as Parquet, TSV (optionally gzipped), and CSV (optionally gzipped).

Parameters:

input_file (str) – The full path to the file to be read.
file_type (Optional[str]) –
The file format. Must be one of “parquet”, “tsv”, “tsv.gz”, “csv”, or “csv.gz”.

If set to None or an empty string, the file type is inferred from the file extension. Defaults to None.
options (Optional[Dict[str, Any]]) – A dictionary of additional keyword arguments to pass to the Polars reading function (e.g., “has_header”, “infer_schema_length”). Defaults to None.

Raises:

FileNotFoundError – If the specified input_file does not exist.
ValueError – If the file type cannot be inferred or is not supported.

Returns:

A Polars DataFrame containing the contents of the file.

Return type:

DataFrame
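
How the file type might be inferred from the extension can be sketched as follows. This is an assumption about the matching rule (a plain suffix comparison), not the library's implementation, and infer_file_type is a hypothetical helper name.

```python
from typing import Optional

# The formats listed in the documentation above.
SUPPORTED_TYPES = ("parquet", "tsv", "tsv.gz", "csv", "csv.gz")


def infer_file_type(input_file: str, file_type: Optional[str] = None) -> str:
    """Return an explicit file_type if given, else infer it from the file name."""
    if file_type:  # None and "" both mean "infer from the extension"
        if file_type not in SUPPORTED_TYPES:
            raise ValueError(f"Unsupported file type: {file_type}")
        return file_type
    for candidate in SUPPORTED_TYPES:
        # Assumption: the extension must match exactly, including case.
        if input_file.endswith("." + candidate):
            return candidate
    raise ValueError(f"Cannot infer file type from: {input_file}")
```

Passing file_type explicitly is the way to read a file whose name does not carry one of the recognised extensions.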

Example Usage:

>>> from aiqclib.common.utils.file import read_input_file
>>> # Assumes 'data.parquet' and 'data.tsv.gz' exist in the working directory.
>>> df = read_input_file("data.parquet")
>>> df2 = read_input_file("data.tsv.gz", file_type="tsv.gz", options={"has_header": True})

aiqclib.common.utils.input_preprocess module

Automatic creation of the profile_no and observation_no identifier columns.

Some raw inputs do not carry the sequential identifiers aiqclib needs. When enabled in the configuration, this module derives them from other columns, following the documented preprocessing recipe:

sort the rows so observations of one profile are grouped and ordered (by pressure);
build a temporary profile_key from the columns that together identify a profile (by default platform_code, profile_timestamp, longitude and latitude);
profile_no is the dense rank of that key within each platform_code;
observation_no is the 1-indexed running count within each key;
the temporary key is dropped.

The set of columns to create, the key columns and the sort columns are all configurable, so the inference can be tuned to a dataset (or disabled).

Warning

The profile key must genuinely identify a profile. Slightly jittered coordinates would split one profile into several; identical timestamps at the same coordinates would merge distinct profiles. Choose the key columns accordingly.
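
The numbering recipe described in this module can be illustrated on plain Python dictionaries. The library itself operates on Polars DataFrames; this sketch only mirrors the logic, number_rows is a hypothetical name, and starting profile_no at 1 is an assumption.

```python
from typing import Dict, List, Tuple

KEY_COLUMNS = ["platform_code", "profile_timestamp", "longitude", "latitude"]
SORT_COLUMNS = KEY_COLUMNS + ["pres"]


def number_rows(rows: List[dict]) -> List[dict]:
    """Add profile_no and observation_no to each row, following the recipe."""
    # Step 1: sort so that rows of one profile are adjacent and ordered by pressure.
    ordered = sorted(rows, key=lambda r: tuple(r[c] for c in SORT_COLUMNS))
    ranks: Dict[str, Dict[Tuple, int]] = {}  # platform -> {profile key: dense rank}
    counts: Dict[Tuple, int] = {}            # profile key -> rows seen so far
    numbered = []
    for row in ordered:
        key = tuple(row[c] for c in KEY_COLUMNS)  # Step 2: temporary profile key
        per_platform = ranks.setdefault(row["platform_code"], {})
        if key not in per_platform:
            # Step 3: keys arrive in sorted order, so first-seen order is the dense rank.
            per_platform[key] = len(per_platform) + 1
        counts[key] = counts.get(key, 0) + 1      # Step 4: 1-indexed running count
        # Step 5: the key is never stored on the row, so nothing needs dropping.
        numbered.append({**row, "profile_no": per_platform[key],
                         "observation_no": counts[key]})
    return numbered


rows = [
    {"platform_code": "A", "profile_timestamp": "2024-01-01T00:00",
     "longitude": 10.0, "latitude": 50.0, "pres": 20.0},
    {"platform_code": "A", "profile_timestamp": "2024-01-02T00:00",
     "longitude": 10.0, "latitude": 50.0, "pres": 5.0},
    {"platform_code": "B", "profile_timestamp": "2024-01-01T00:00",
     "longitude": 11.0, "latitude": 51.0, "pres": 5.0},
    {"platform_code": "A", "profile_timestamp": "2024-01-01T00:00",
     "longitude": 10.0, "latitude": 50.0, "pres": 5.0},
]
numbered = number_rows(rows)
```

Because the key includes the coordinates, changing the longitude of one of platform A's first-day rows by a tiny amount would produce a third profile for that platform, which is the failure mode the warning describes.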

aiqclib.common.utils.input_preprocess.DEFAULT_CREATED_COLUMNS: List[str] = ['profile_no', 'observation_no']: Identifier columns created by default.

aiqclib.common.utils.input_preprocess.DEFAULT_KEY_COLUMNS: List[str] = ['platform_code', 'profile_timestamp', 'longitude', 'latitude']: Columns that, combined, identify a single profile.

aiqclib.common.utils.input_preprocess.DEFAULT_SORT_COLUMNS: List[str] = ['platform_code', 'profile_timestamp', 'longitude', 'latitude', 'pres']: Columns to sort by before numbering (the trailing pres orders observations within a profile).

aiqclib.common.utils.input_preprocess.PLATFORM_COLUMN: str = 'platform_code': Column over which profile_no is ranked.

aiqclib.common.utils.input_preprocess.create_identifier_columns(df, key_columns=None, sort_columns=None, columns=None, platform_column='platform_code')[source]

Create profile_no and/or observation_no from other columns.

Parameters:

df (DataFrame) – The input data (typically right after column renaming).
key_columns (Optional[List[str]]) – Columns whose combination identifies a profile. Defaults to DEFAULT_KEY_COLUMNS.
sort_columns (Optional[List[str]]) – Columns to sort by before numbering. Defaults to DEFAULT_SORT_COLUMNS.
columns (Optional[List[str]]) – Which identifier columns to create; any subset of ["profile_no", "observation_no"]. Defaults to both. Listed columns are (re)generated, overwriting any existing column of the same name.
platform_column (str) – Column over which profile_no is ranked.

Raises:

ValueError – If a required source column is missing.

Returns:

The DataFrame with the requested identifier columns added.

Return type:

DataFrame

aiqclib.common.utils.input_validation module

Validation and automatic type correction for mandatory input columns.

aiqclib requires every input dataset to provide a small set of identity and coordinate columns with specific data types. This module centralises:

REQUIRED_INPUT_COLUMNS, the editable table of mandatory columns and their expected logical types; and
validate_and_convert_input_columns(), which checks that those columns are present and, where a column has the wrong type, attempts to convert it.

The validation is intended to run immediately after column renaming, so it sees the final column names. Automatic conversion is especially useful for TSV/CSV inputs, where numeric and datetime columns are frequently read as strings. As noted below, datetime conversion can only be done automatically for genuine date/datetime values (or string representations of them); numeric epoch encodings are ambiguous and must be converted up front (see the data-preprocessing guide).

aiqclib.common.utils.input_validation.required_column_names()[source]

Return the list of mandatory input column names.

Returns:

Names from REQUIRED_INPUT_COLUMNS, in definition order.

Return type:

List[str]

aiqclib.common.utils.input_validation.validate_and_convert_input_columns(df, required_columns=None)[source]

Validate mandatory input columns and convert mismatched types in place.

For each entry in required_columns this checks that the column exists and that its dtype matches the expected category. Columns with the wrong type are converted where possible (e.g. numeric strings from CSV/TSV become floats/integers, and date/datetime strings become datetimes).

Parameters:

df (DataFrame) – The input data, typically immediately after column renaming.
required_columns (Optional[Dict[str, str]]) – The mandatory-column table to validate against. Defaults to REQUIRED_INPUT_COLUMNS.

Raises:

ValueError – If any required column is missing, or if a column’s type cannot be converted to the expected type.

Returns:

The validated DataFrame, with any necessary conversions applied.

Return type:

DataFrame

aiqclib.common.utils.metric_plots module

This module provides functions for generating and saving performance metric plots, specifically Receiver Operating Characteristic (ROC) curves and Precision-Recall (PR) curves. It supports plotting for individual models across multiple cross-validation folds (with mean and standard deviation) or comparing multiple models/methods on a single plot. Plots are saved as SVG files.

aiqclib.common.utils.metric_plots.create_metric_plots(model)[source]

Create and save ROC and Precision-Recall plots as an SVG file for a single model.

Generates a figure with two subplots (ROC on the left, PR on the right) based on the data in model.model_scores. If the model-scores table contains multiple unique ‘k’ values (folds), it plots the individual fold curves and then the mean curve with a shaded band showing the standard deviation across folds.

The output file path is determined by model.output_file_names['metric_plot'].

Parameters:

model (object) –

An object containing evaluation results and output configuration. It is expected to have the following attributes:

model_scores (dict[str, polars.DataFrame]): A dictionary where keys are target names and values are Polars DataFrames. Each DataFrame must contain at least ‘k’ (fold identifier), ‘label’ (true binary labels), and ‘score’ (prediction probabilities/scores) columns.
output_file_names (dict[str, dict[str, str]]): A dictionary containing output file paths. Specifically, output_file_names['metric_plot'][target_name] should provide the full path where the plot for a given target will be saved.

Raises:

ValueError – If model.model_scores is empty.

Returns:

None

Return type:

None

aiqclib.common.utils.metric_plots.create_multi_method_metric_plots(model)[source]

Create and save ROC and Precision-Recall plots for multiple methods overlaid on the same figure. Assumes the model-scores tables have a ‘method’ column.

The output file path is determined by model.output_file_names['metric_plot'].

Parameters:

model (object) –

An object containing evaluation results and output configuration. It is expected to have the following attributes:

model_scores (dict[str, polars.DataFrame]): A dictionary where keys are target names and values are Polars DataFrames. Each DataFrame must contain at least ‘method’ (method identifier), ‘label’ (true binary labels), and ‘score’ (prediction probabilities/scores) columns. It aggregates results across all folds/runs for each method.
output_file_names (dict[str, dict[str, str]]): A dictionary containing output file paths. Specifically, output_file_names['metric_plot'][target_name] should provide the full path where the plot for a given target will be saved.

Raises:

ValueError – If model.model_scores is empty.

Returns:

None

Return type:

None

Code Issue

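The area under the Precision-Recall curve is computed as pr_auc = auc(rec[::-1], prec[::-1]). The sklearn.metrics.precision_recall_curve function returns recall values in decreasing order (the last element is 0), so the reversal puts the points into increasing-recall order before integration. Because sklearn.metrics.auc accepts x values that are either monotonically increasing or monotonically decreasing, auc(rec, prec) yields the same area, which makes the reversal redundant rather than incorrect. Note, however, that this trapezoidal area is not the same quantity as Average Precision (AP): sklearn.metrics.average_precision_score uses a step-wise sum and can give a different value, so a number reported as AP should be computed with that function.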

aiqclib.common.utils.normalization module

Normalization utilities.

This module centralises the logic shared by every feature class and the feature extraction step when applying normalization. It supports four normalization “types”, each selected per-feature via stats_set.type in the feature_param_sets section of a configuration file:

raw: no normalization (the default).
min_max: min-max scaling using values supplied by hand in the feature_stats_sets section of the config. This is the historical behaviour and is kept unchanged.
auto_min_max: min-max scaling using min/max values derived automatically from the dataset’s summary statistics.
standard: standard scaling (x - mean) / sd using mean/sd values derived automatically from the dataset’s summary statistics.

For auto_min_max and standard the derived values are written to a YAML normalization file during dataset preparation and re-loaded during classification, so the same fitted normalization is applied at classification time without re-entering any values (and without access to the original training data).

The helpers here are deliberately small and pure so they can be unit-tested with synthetic Polars frames, independently of the wider pipeline.

aiqclib.common.utils.normalization.AUTO_SCALING_TYPES = ('auto_min_max', 'standard'): Normalization types whose values are derived from data (and therefore must be persisted to a normalization file for reuse at classification time), as opposed to min_max whose values are supplied directly in the config.

aiqclib.common.utils.normalization.SCALING_TYPES = ('min_max', 'auto_min_max', 'standard'): Normalization types that actually transform feature values. raw is intentionally excluded because it is a no-op.

aiqclib.common.utils.normalization.aggregate_profile_stats(summary_stats, variables=None, exclude=['longitude', 'latitude'])[source]

Aggregate per-profile summary statistics across profiles.

This reshapes the long per-profile rows of a summary_stats table (i.e. the rows whose platform_code is not "all") into one row per (variable, stats) pair, computing the distribution of each per-profile statistic across profiles: its min, mean, pct97.5, max and sd.

The across-profile sd is the only addition relative to the historical SummaryStatsBase.create_summary_stats_profile output; it is required to standard-scale profile_summary_stats features (whose columns are themselves per-profile statistics).

Parameters:

summary_stats (DataFrame) – The combined summary statistics table produced by SummaryStatsBase.calculate_stats().
variables (Optional[List[str]]) – Optional list of variables to keep. None keeps all.
exclude (List[str]) – Variables to drop before aggregating (location variables have no meaningful per-profile spread).

Returns:

A long-form frame with columns variable, stats, min, mean, pct97.5, max and sd.

Return type:

DataFrame

aiqclib.common.utils.normalization.build_scaling_expr(col_name, params, stats_type)[source]

Build a Polars expression that normalizes a single column.

The formula depends on stats_type:

min_max / auto_min_max: (x - min) / (max - min)
standard: (x - mean) / sd

A zero denominator (a constant column, e.g. a per-profile location whose standard deviation is zero) is handled gracefully by only subtracting the centre, which yields 0 for the constant value rather than inf/nan.

Parameters:

col_name (str) – Name of the column to scale (the output keeps the name).
params (Dict) – The statistics for this column. For min-max types this is {"min": ..., "max": ...}; for standard it is {"mean": ..., "sd": ...}.
stats_type (str) – One of SCALING_TYPES.

Returns:

A Polars expression aliased back to col_name.

Return type:

Expr
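
The two formulas and the zero-denominator rule can be illustrated on a single float. The library builds a Polars expression instead; this is only a sketch of the arithmetic, scale_value is a hypothetical name, and treating min as the "centre" for the min-max types is an assumption.

```python
from typing import Dict


def scale_value(x: float, params: Dict[str, float], stats_type: str) -> float:
    """Scale one value using the documented formulas."""
    if stats_type in ("min_max", "auto_min_max"):
        centre = params["min"]
        spread = params["max"] - params["min"]
    elif stats_type == "standard":
        centre = params["mean"]
        spread = params["sd"]
    else:
        raise ValueError(f"Not a scaling type: {stats_type}")
    if spread == 0:
        # Constant column: only subtract the centre to avoid inf/nan.
        return x - centre
    return (x - centre) / spread


quarter = scale_value(5.0, {"min": 0.0, "max": 20.0}, "auto_min_max")
z_score = scale_value(22.8, {"mean": 18.8, "sd": 2.0}, "standard")
constant = scale_value(7.0, {"mean": 7.0, "sd": 0.0}, "standard")
```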

aiqclib.common.utils.normalization.derive_observation_stats(summary_stats, variables, stats_type)[source]

Derive flat per-variable normalization stats from the global (“all”) rows.

For each requested variable this reads its global summary row (platform_code == "all") and extracts either {min, max} (for auto_min_max) or {mean, sd} (for standard).

Parameters:

summary_stats (DataFrame) – The combined summary statistics table.
variables (List[str]) – The variables (column names) to derive stats for.
stats_type (str) – "auto_min_max" or "standard".

Returns:

A mapping of the form {variable: {"min": ..., "max": ...}} for auto_min_max, or {variable: {"mean": ..., "sd": ...}} for standard.

Return type:

Dict[str, Dict]

aiqclib.common.utils.normalization.derive_profile_stats(profile_stats_long, variables, summary_stats_names, stats_type)[source]

Derive nested per-(variable, stat) normalization stats across profiles.

Used for profile_summary_stats features. profile_stats_long is the output of aggregate_profile_stats(); for each requested variable and each requested per-profile statistic, this extracts {min, max} (for auto_min_max) or {mean, sd} (for standard).

Parameters:

profile_stats_long (DataFrame) – The across-profile aggregation.
variables (List[str]) – The variables (e.g. ["temp", "psal", "pres"]).
summary_stats_names (List[str]) – The per-profile statistics that become feature columns (e.g. ["mean", "median", "sd"]).
stats_type (str) – "auto_min_max" or "standard".

Returns:

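A nested mapping of the form {variable: {stat: {"min": ..., "max": ...}}} for auto_min_max, or {variable: {stat: {"mean": ..., "sd": ...}}} for standard.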

Return type:

Dict[str, Dict]

aiqclib.common.utils.normalization.is_scaling_type(stats_type)[source]

Return whether a given stats_set type performs a value transformation.

Parameters:

stats_type (Optional[str]) – The normalization type (e.g. "min_max", "raw").

Returns:

True for min_max, auto_min_max and standard; False for raw, None and any unknown value.

Return type:

bool

aiqclib.common.utils.normalization.read_normalization_file(input_file)[source]

Read a normalization YAML file written by write_normalization_file().

Parameters:

input_file (str) – Path to the YAML normalization file.

Raises:

FileNotFoundError – If the file does not exist.

Returns:

A dictionary shaped like a feature_stats_set entry (i.e. with name plus auto_min_max / standard lists).

Return type:

Dict

aiqclib.common.utils.normalization.scale_flat_columns(df, stats, stats_type)[source]

Apply scaling to a frame whose stats are keyed directly by column name.

Used by features that operate on raw observed variables (e.g. basic_values, flank_up, flank_down, location), where stats looks like {"temp": {"min": ..., "max": ...}, ...}.

Columns present in stats but absent from df are skipped, so a single shared stats set can be reused across features that expose different subsets of columns.

Parameters:

df (DataFrame) – The frame to transform.
stats (Dict[str, Dict]) – Mapping of column name to its statistics.
stats_type (str) – One of SCALING_TYPES.

Returns:

A new frame with the relevant columns scaled.

Return type:

DataFrame

aiqclib.common.utils.normalization.scale_nested_columns(df, stats, stats_type)[source]

Apply scaling to a frame whose stats are keyed by variable then stat.

Used by profile_summary_stats, whose feature columns are named {variable}_{stat} (e.g. temp_mean) and whose stats looks like {"temp": {"mean": {"min": ..., "max": ...}, ...}, ...}.

Columns derived from the nested keys but absent from df are skipped.

Parameters:

df (DataFrame) – The frame to transform.
stats (Dict[str, Dict]) – Nested mapping {variable: {stat: stats_dict}}.
stats_type (str) – One of SCALING_TYPES.

Returns:

A new frame with the relevant columns scaled.

Return type:

DataFrame
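
The mapping from nested statistics to {variable}_{stat} column names can be sketched without Polars. This is an illustration of the naming rule only, and nested_stats_to_columns is a hypothetical helper, not part of aiqclib.

```python
from typing import Dict, List


def nested_stats_to_columns(
    stats: Dict[str, Dict[str, Dict[str, float]]],
    available_columns: List[str],
) -> Dict[str, Dict[str, float]]:
    """Flatten {variable: {stat: params}} into {"variable_stat": params}."""
    flat = {}
    for variable, per_stat in stats.items():
        for stat, params in per_stat.items():
            column = f"{variable}_{stat}"
            # Columns derived from the nested keys but absent from the frame are skipped.
            if column in available_columns:
                flat[column] = params
    return flat


stats = {
    "temp": {
        "mean": {"min": 0.0, "max": 20.0},
        "sd": {"min": 0.0, "max": 5.0},
    }
}
flat = nested_stats_to_columns(stats, ["temp_mean", "psal_mean"])
```

Once flattened this way, each entry has the same shape as the flat statistics consumed by scale_flat_columns().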

aiqclib.common.utils.normalization.write_normalization_file(output_file, stats_set_name, resolved)[source]

Write derived normalization values to a YAML file.

The file mirrors the structure of a single feature_stats_sets entry so it can be loaded straight back into a configuration’s feature_stats_set and consumed by the existing stats-injection machinery. For example:

name: feature_set_1_stats_set_1
auto_min_max:
  - name: basic_values3
    stats: {temp: {min: 0.0, max: 20.0}, ...}
standard:
  - name: location
    stats: {longitude: {mean: 18.8, sd: 2.0}, ...}

Parameters:

output_file (str) – Destination path. Parent directories are created.
stats_set_name (str) – The name recorded at the top of the file.
resolved (Dict[str, Dict[str, Dict]]) – {stats_type: {entry_name: stats_dict}} to serialise.

Returns:

None

Return type:

None
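
The relationship between the resolved argument and the file layout shown above can be sketched as a plain dictionary transformation. This is inferred from the documented example rather than taken from the source; build_normalization_document is a hypothetical name, and the sample values are the non-elided ones from the example.

```python
from typing import Any, Dict


def build_normalization_document(
    stats_set_name: str,
    resolved: Dict[str, Dict[str, Dict]],
) -> Dict[str, Any]:
    """Turn {stats_type: {entry_name: stats}} into the documented file layout."""
    document: Dict[str, Any] = {"name": stats_set_name}
    for stats_type, entries in resolved.items():
        # Each stats type becomes a list of {"name": ..., "stats": ...} entries.
        document[stats_type] = [
            {"name": entry_name, "stats": stats}
            for entry_name, stats in entries.items()
        ]
    return document


resolved = {
    "auto_min_max": {"basic_values3": {"temp": {"min": 0.0, "max": 20.0}}},
    "standard": {"location": {"longitude": {"mean": 18.8, "sd": 2.0}}},
}
document = build_normalization_document("feature_set_1_stats_set_1", resolved)
```

Serialising the resulting dictionary with a YAML library would then produce a file of the shape shown in the example.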

aiqclib.common.utils.qc_flags module

QC flag constants and aggregation helpers for the NRT QC module.

Defines the subset of the IOC/Argo flag scheme used by the NRT QC items (1 = good, 3 = probably bad, 4 = bad) and a polars helper to aggregate several per-item flag columns into the most severe flag per observation.

aiqclib.common.utils.qc_flags.FLAG_BAD: int = 4: Flag value for bad data (default failure).

aiqclib.common.utils.qc_flags.FLAG_GOOD: int = 1: Flag value for data that passed a QC item.

aiqclib.common.utils.qc_flags.FLAG_PROBABLY_BAD: int = 3: Flag value for probably bad data (softened failure).

aiqclib.common.utils.qc_flags.FLAG_SEVERITY_ORDER: tuple = (1, 3, 4): Flag values ordered by ascending severity. Severity coincides with the numeric order, which is what worst_flag() relies on.

aiqclib.common.utils.qc_flags.FlagExpr

A flag column referenced by name, or any polars expression yielding flags.

alias of str | Expr

aiqclib.common.utils.qc_flags.worst_flag(*flags)[source]

Element-wise most severe flag across several flag columns/expressions.

Because severity coincides with the numeric order for the flag scheme used here (1 < 3 < 4), the most severe flag is the horizontal maximum. Null entries are ignored, so a column that does not apply to a given observation cannot degrade the result.

Parameters:

flags (Union[str, Expr]) – Column names or polars expressions of flag values.

Returns:

An expression yielding the most severe flag per row.

Return type:

Expr
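
The row-wise rule can be illustrated on plain integers. The library returns a polars expression; this sketch only shows the semantics for one observation, worst_flag_row is a hypothetical name, and returning None when every input is null is an assumption.

```python
from typing import Optional

FLAG_GOOD = 1
FLAG_PROBABLY_BAD = 3
FLAG_BAD = 4


def worst_flag_row(*flags: Optional[int]) -> Optional[int]:
    """Most severe flag for one observation, ignoring missing (None) entries."""
    present = [flag for flag in flags if flag is not None]
    # Severity coincides with numeric order (1 < 3 < 4), so max() is enough.
    return max(present) if present else None


softened = worst_flag_row(FLAG_GOOD, FLAG_PROBABLY_BAD, None)
failed = worst_flag_row(FLAG_GOOD, None, FLAG_BAD)
```

This shortcut would stop working if a flag value were added whose numeric order did not match its severity, which is why FLAG_SEVERITY_ORDER documents the assumption.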

aiqclib.common.utils.seawater module

EOS-80 seawater routines (UNESCO 1983) for the NRT QC module.

Implements the equation of state of seawater from Fofonoff & Millard (1983), UNESCO Technical Papers in Marine Science #44: the adiabatic lapse rate, potential temperature, density at atmospheric pressure, and the potential density anomaly sigma-0 used by the RTQC14 density inversion test.

All functions follow the UNESCO argument order (salinity, temperature, pressure) with practical salinity (PSS-78), in-situ temperature in degrees Celsius (IPTS-68), and pressure in decibars. Inputs may be scalars, numpy arrays, or polars Series; computation is vectorised with numpy and NaN values propagate through the results.

aiqclib.common.utils.seawater.adiabatic_lapse_rate(s, t, p)[source]

Adiabatic temperature gradient of seawater (Bryden 1973, UNESCO 1983).

Check value: adiabatic_lapse_rate(40, 40, 10000) = 3.255976e-4 °C/dbar.

Parameters:

s (Union[float, list, ndarray, Series]) – Practical salinity (PSS-78).
t (Union[float, list, ndarray, Series]) – In-situ temperature in °C (IPTS-68).
p (Union[float, list, ndarray, Series]) – Pressure in decibars.

Returns:

Adiabatic lapse rate in °C/dbar.

Return type:

ndarray

aiqclib.common.utils.seawater.density_at_surface(s, t)[source]

Density of seawater at atmospheric pressure (Millero & Poisson 1981).

The one-atmosphere International Equation of State of Seawater (IES 80) as given in UNESCO 1983.

Check values: density_at_surface(0, 5) = 999.96675 kg/m³, density_at_surface(35, 5) = 1027.67547 kg/m³, density_at_surface(35, 25) = 1023.34306 kg/m³.

Parameters:

s (Union[float, list, ndarray, Series]) – Practical salinity (PSS-78).
t (Union[float, list, ndarray, Series]) – Temperature in °C (IPTS-68).

Returns:

Density in kg/m³.

Return type:

ndarray
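
For reference, the one-atmosphere equation of state can be written out for scalar inputs using the published Millero & Poisson (1981) coefficients. This is an independent scalar sketch, not aiqclib's vectorised implementation, and eos80_density_surface is a hypothetical name; it is compared here against the check values quoted above.

```python
def eos80_density_surface(s: float, t: float) -> float:
    """Density (kg/m^3) at atmospheric pressure for salinity s and temperature t."""
    # Density of pure water as a fifth-order polynomial in temperature.
    rho_w = (999.842594 + 6.793952e-2 * t - 9.095290e-3 * t**2
             + 1.001685e-4 * t**3 - 1.120083e-6 * t**4 + 6.536332e-9 * t**5)
    # Temperature-dependent coefficients of the salinity terms.
    b = (8.24493e-1 - 4.0899e-3 * t + 7.6438e-5 * t**2
         - 8.2467e-7 * t**3 + 5.3875e-9 * t**4)
    c = -5.72466e-3 + 1.0227e-4 * t - 1.6546e-6 * t**2
    d = 4.8314e-4
    return rho_w + b * s + c * s**1.5 + d * s**2


# Check values quoted in the documentation above: (s, t) -> kg/m^3.
CHECK_VALUES = {(0, 5): 999.96675, (35, 5): 1027.67547, (35, 25): 1023.34306}
max_error = max(abs(eos80_density_surface(s, t) - expected)
                for (s, t), expected in CHECK_VALUES.items())
```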

aiqclib.common.utils.seawater.potential_temperature(s, t, p, p_ref=0.0)[source]

Potential temperature of seawater (Fofonoff & Millard 1983).

Integrates the adiabatic lapse rate from the in-situ pressure to the reference pressure with the standard Runge-Kutta 4 scheme of the UNESCO PTMP routine.

Check value: potential_temperature(40, 40, 10000, 0) = 36.89073 °C.

Parameters:

s (Union[float, list, ndarray, Series]) – Practical salinity (PSS-78).
t (Union[float, list, ndarray, Series]) – In-situ temperature in °C (IPTS-68).
p (Union[float, list, ndarray, Series]) – Pressure in decibars.
p_ref (float) – Reference pressure in decibars, defaults to 0 (surface).

Returns:

Potential temperature in °C referenced to p_ref.

Return type:

ndarray
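
The integration scheme can be sketched for scalar inputs as follows, using the Bryden (1973) lapse-rate polynomial and the Runge-Kutta steps of the UNESCO PTMP routine as they are commonly published. This is an independent illustration, not aiqclib's vectorised code; lapse_rate and potential_temp are hypothetical names.

```python
import math


def lapse_rate(s: float, t: float, p: float) -> float:
    """Adiabatic lapse rate in degC/dbar (Bryden 1973 polynomial)."""
    ds = s - 35.0
    return (
        3.5803e-5 + (8.5258e-6 + (-6.836e-8 + 6.6228e-10 * t) * t) * t
        + (1.8932e-6 - 4.2393e-8 * t) * ds
        + ((1.8741e-8 + (-6.7795e-10 + (8.733e-12 - 5.4481e-14 * t) * t) * t)
           + (-1.1351e-10 + 2.7759e-12 * t) * ds) * p
        + (-4.6206e-13 + (1.8676e-14 - 2.1687e-16 * t) * t) * p * p
    )


def potential_temp(s: float, t: float, p: float, p_ref: float = 0.0) -> float:
    """Potential temperature in degC, integrating from p to p_ref in one RK4 step."""
    sqrt2 = math.sqrt(2.0)
    dp = p_ref - p

    # Stage 1: slope at the in-situ pressure.
    dth = dp * lapse_rate(s, t, p)
    th = t + 0.5 * dth
    q = dth

    # Stage 2: first midpoint evaluation.
    dth = dp * lapse_rate(s, th, p + 0.5 * dp)
    th += (1.0 - 1.0 / sqrt2) * (dth - q)
    q = (2.0 - sqrt2) * dth + (-2.0 + 3.0 / sqrt2) * q

    # Stage 3: second midpoint evaluation.
    dth = dp * lapse_rate(s, th, p + 0.5 * dp)
    th += (1.0 + 1.0 / sqrt2) * (dth - q)
    q = (2.0 + sqrt2) * dth + (-2.0 - 3.0 / sqrt2) * q

    # Stage 4: slope at the reference pressure, then the final combination.
    dth = dp * lapse_rate(s, th, p + dp)
    return th + (dth - 2.0 * q) / 6.0


theta = potential_temp(40.0, 40.0, 10000.0, 0.0)
```

Evaluating both functions at (40, 40, 10000) allows a direct comparison with the check values quoted for adiabatic_lapse_rate() and potential_temperature().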

aiqclib.common.utils.seawater.sigma0(s, t, p)[source]

Potential density anomaly sigma-0 of seawater.

The density the water parcel would have at atmospheric pressure after an adiabatic move to the surface, minus 1000 kg/m³: rho(s, theta(s, t, p, 0), 0) - 1000. This is the quantity compared at consecutive profile levels by the RTQC14 density inversion test.

Parameters:

s (Union[float, list, ndarray, Series]) – Practical salinity (PSS-78).
t (Union[float, list, ndarray, Series]) – In-situ temperature in °C (IPTS-68).
p (Union[float, list, ndarray, Series]) – Pressure in decibars.

Returns:

Potential density anomaly in kg/m³.

Return type:

ndarray

aiqclib.common.utils.shap_io module

Utilities for importing the SHAP score files produced by aiqclib.

During the testing (training) and classification phases, aiqclib can write per-instance SHAP values to a Parquet file. Each such file has three metadata columns — label, predicted_label and score — followed by one column per feature, each suffixed with _shap (e.g. temp_mean_shap, longitude_shap).

This module reads such a file back into a Polars DataFrame and, by default, strips the _shap suffix from the feature columns so the result can be fed straight into common SHAP visualizations (mean-importance bar charts, summary plots, dependence plots, etc.).

aiqclib.common.utils.shap_io.SHAP_COLUMN_SUFFIX = '_shap': Suffix used by aiqclib for columns that hold SHAP values.

aiqclib.common.utils.shap_io.read_shap_scores(file_name, file_type=None, options=None, strip_suffix=True, suffix='_shap')[source]

Import a SHAP score file written by aiqclib.

The file is read via aiqclib.common.utils.file.read_input_file(), so Parquet, TSV and CSV (optionally gzipped) inputs are all supported and the format is inferred from the extension when file_type is not given.

By default, the _shap suffix is removed from every SHAP-value column, so each feature column is named by the feature itself (e.g. temp_mean_shap becomes temp_mean). The metadata columns (label, predicted_label, score) do not carry the suffix and are returned unchanged.

Parameters:

file_name (str) – Path to the SHAP score file.
file_type (Optional[str]) – Explicit file format ("parquet", "tsv", "tsv.gz", "csv", "csv.gz"). Inferred from the extension when None.
options (Optional[Dict[str, Any]]) – Extra keyword arguments forwarded to the underlying Polars reader.
strip_suffix (bool) – Whether to strip the SHAP suffix from column names. Defaults to True.
suffix (str) – The suffix identifying SHAP-value columns. Defaults to SHAP_COLUMN_SUFFIX.

Raises:

FileNotFoundError – If file_name does not exist.
ValueError – If the file type is unsupported, or if stripping the suffix would produce duplicate column names.

Returns:

A Polars DataFrame of SHAP scores, with feature columns renamed unless strip_suffix is False.

Return type:

DataFrame

Example

>>> import aiqclib as aq
>>> shap = aq.read_shap_scores("classify_shap_values_temp.parquet")
>>> # Feature columns (everything that carried the _shap suffix):
>>> features = [c for c in shap.columns
...             if c not in ("label", "predicted_label", "score")]
>>> # Matrix of SHAP values for plotting (e.g. mean importance, summary,
>>> # or dependence plots):
>>> values = shap.select(features).to_numpy()