aiqclib.train.step4_build_model package

Submodules

aiqclib.train.step4_build_model.build_model module

This module defines the BuildModel class, a specialized component for building and testing machine learning models.

It inherits from aiqclib.train.step4_build_model.build_model_base.BuildModelBase and orchestrates the training and evaluation of models for specified targets using Polars DataFrames.

class aiqclib.train.step4_build_model.build_model.BuildModel(config, training_sets=None, test_sets=None)

Bases: BuildModelBase

A subclass of BuildModelBase designed to build and test models using provided training and test sets for each target.

This class sets its expected_class_name to "BuildModel", which must match the base_class in the YAML configuration for the class to be instantiated within that framework. It extends the base functionality to manage training and testing workflows, including data preparation (dropping columns that should not be used as model input) and aggregation of results.

Parameters:
  • config (ConfigBase)

  • training_sets (Dict[str, DataFrame] | None)

  • test_sets (Dict[str, DataFrame] | None)

build(target_name)

Build (train) a test model for the specified target, storing it in models.

This method:

  1. Reloads the base model via load_base_model().

  2. Attaches the training data for the target (dropping the k_fold column and common identifying columns).

  3. Calls base_model.build().

  4. Stores the built model in models[target_name].
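The four steps above can be sketched with plain-Python stand-ins. This is a minimal sketch under stated assumptions, not the aiqclib implementation: rows are lists of dicts rather than Polars DataFrames, and StubModel, drop_columns, and the column names in ID_COLUMNS are hypothetical.

```python
from typing import Dict, List, Optional

Row = Dict[str, object]

# Hypothetical column names; the real identifying columns come from the aiqclib configuration.
ID_COLUMNS = ["platform_code", "profile_no"]
K_FOLD_COLUMN = "k_fold"


class StubModel:
    """Hypothetical stand-in for a ModelBase instance returned by load_base_model()."""

    def __init__(self) -> None:
        self.training_set: Optional[List[Row]] = None
        self.is_built = False

    def build(self) -> None:
        # A real model would be fitted here; the stub only checks that data is attached.
        if not self.training_set:
            raise ValueError("No training data attached.")
        self.is_built = True


def drop_columns(rows: List[Row], columns: List[str]) -> List[Row]:
    """Return copies of the rows without the given columns."""
    return [{k: v for k, v in row.items() if k not in columns} for row in rows]


def build(target_name: str, training_sets: Dict[str, List[Row]],
          models: Dict[str, StubModel]) -> None:
    if not training_sets:
        raise ValueError("training_sets is empty; no data is available for model building.")
    base_model = StubModel()  # 1. fresh model, standing in for load_base_model()
    base_model.training_set = drop_columns(  # 2. drop k_fold and identifying columns
        training_sets[target_name], [K_FOLD_COLUMN, *ID_COLUMNS]
    )
    base_model.build()  # 3. train
    models[target_name] = base_model  # 4. store under the target name
```

Reloading the base model for each target presumably keeps state from one target from leaking into the next. With Polars DataFrames, step 2 would typically be a single `df.drop([...])` call.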

Parameters:

target_name (str) – The target variable name, used to index training_sets and locate the training data.

Raises:

ValueError – If training_sets is empty, indicating no corresponding data is available for model building.

Return type:

None

build_final_model(target_name)

Build (train) a model for the specified target, storing it in final_models.

This method:

  1. Reloads the base model via load_base_model().

  2. Attaches the training data for the target (dropping the k_fold column and common identifying columns).

  3. Attaches the test data for the target (dropping common identifying columns).

  4. Calls base_model.build().

  5. Stores the built model in final_models[target_name].

Parameters:

target_name (str) – The target variable name, used to index training_sets and test_sets and locate the training and test data.

Raises:

ValueError – If training_sets or test_sets is empty, indicating no corresponding data is available for model building.

Return type:

None

expected_class_name: str = 'BuildModel'

test(target_name)

Test the model for the given target, storing the outputs in reports, model_scores, shap_values, and predictions.

This method:

  1. Retrieves the previously built model from models[target_name].

  2. Resets the model’s model-scores table to ensure no data duplication from previous runs.

  3. Attaches the appropriate test set from test_sets[target_name], dropping common identifying columns.

  4. Calls base_model.test().

  5. Stores the test report in reports[target_name].

  6. Stores the model-scores table in model_scores[target_name].

  7. Stores the SHAP values in shap_values[target_name].

  8. Stores the test predictions, augmented with identifying information and the true label, in predictions[target_name].
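The eight steps above can be sketched as follows. This is a minimal sketch, not the aiqclib code: rows are lists of dicts instead of Polars DataFrames, the toy classifier and placeholder SHAP values are hypothetical, and so are the column names and the outputs dictionary that stands in for the reports, model_scores, shap_values, and predictions attributes.

```python
from typing import Dict, List, Optional

Row = Dict[str, object]

# Hypothetical column names; the real ones come from the aiqclib configuration.
ID_COLUMNS = ["platform_code", "profile_no"]
LABEL_COLUMN = "label"


class StubModel:
    """Hypothetical stand-in for a built ModelBase instance."""

    def __init__(self) -> None:
        self.model_scores: List[Row] = []
        self.test_set: Optional[List[Row]] = None
        self.report: Optional[Row] = None
        self.shap_values: Optional[List[Row]] = None
        self.predictions: Optional[List[Row]] = None

    def test(self) -> None:
        # Toy classifier: predict class 1 when feature x exceeds 0.5.
        preds = [{"class": int(r["x"] > 0.5), "score": r["x"]} for r in self.test_set]
        self.predictions = preds
        self.model_scores.extend(preds)  # accumulates across calls, hence the reset in step 2
        correct = sum(p["class"] == r[LABEL_COLUMN] for p, r in zip(preds, self.test_set))
        self.report = {"accuracy": correct / len(preds)}
        self.shap_values = [{"x": 0.0} for _ in preds]  # placeholder, not real SHAP values


def run_test(target_name: str, models: Dict[str, StubModel],
             test_sets: Dict[str, List[Row]], outputs: Dict[str, Dict[str, object]]) -> None:
    model = models[target_name]  # 1. previously built model
    model.model_scores = []  # 2. reset so repeated runs do not duplicate rows
    full_rows = test_sets[target_name]
    model.test_set = [  # 3. attach the test set without identifying columns
        {k: v for k, v in r.items() if k not in ID_COLUMNS} for r in full_rows
    ]
    model.test()  # 4. run the test
    outputs["reports"][target_name] = model.report  # 5.
    outputs["model_scores"][target_name] = list(model.model_scores)  # 6.
    outputs["shap_values"][target_name] = model.shap_values  # 7.
    # 8. re-attach identifying columns and the true label to each prediction
    outputs["predictions"][target_name] = [
        {**{c: r[c] for c in ID_COLUMNS}, LABEL_COLUMN: r[LABEL_COLUMN], **p}
        for r, p in zip(full_rows, model.predictions)
    ]
```

Calling run_test twice for the same target leaves exactly one copy of the scores, which is the purpose of the reset in step 2.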

Parameters:

target_name (str) – The target variable name, used to index both models and test_sets.

Return type:

None

aiqclib.train.step4_build_model.build_model_base module

Provides an abstract base class, aiqclib.train.step4_build_model.build_model_base.BuildModelBase, for building and testing machine learning models using structured training and test datasets.

This module establishes a framework for model development within a larger data quality control (DMQC) system, integrating with configuration management and model loading utilities. Subclasses are expected to implement specific model building and testing logic tailored to different modeling paradigms or frameworks.

class aiqclib.train.step4_build_model.build_model_base.BuildModelBase(config, training_sets=None, test_sets=None, step_name='build')

Bases: DataSetBase

An abstract base class to build and test models, using training/test sets and a YAML-based configuration.

Inherits from aiqclib.common.base.dataset_base.DataSetBase (with step name "build") to ensure that the provided configuration matches the expected fields for model building. Subclasses must implement the abstract methods build(), build_final_model(), and test(), potentially for different modeling frameworks.
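The subclassing contract can be illustrated with Python's abc module. This is a simplified, hypothetical sketch: BuildModelBaseSketch and MyBuildModel are illustrative names, and the assumption that the configuration is a dict with a "base_class" key only approximates how the YAML configuration is checked against expected_class_name.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class BuildModelBaseSketch(ABC):
    """Hypothetical, simplified stand-in for BuildModelBase."""

    expected_class_name: str = ""

    def __init__(self, config: Dict[str, str]) -> None:
        # Assumption: the config names the class to use under "base_class".
        if config.get("base_class") != self.expected_class_name:
            raise ValueError(
                f"Config base_class {config.get('base_class')!r} does not match "
                f"{self.expected_class_name!r}"
            )
        self.models: Dict[str, Optional[object]] = {}
        self.final_models: Dict[str, Optional[object]] = {}

    @abstractmethod
    def build(self, target_name: str) -> None: ...

    @abstractmethod
    def build_final_model(self, target_name: str) -> None: ...

    @abstractmethod
    def test(self, target_name: str) -> None: ...


class MyBuildModel(BuildModelBaseSketch):
    """Hypothetical concrete subclass implementing all three abstract methods."""

    expected_class_name = "MyBuildModel"

    def build(self, target_name: str) -> None:
        self.models[target_name] = f"model for {target_name}"

    def build_final_model(self, target_name: str) -> None:
        self.final_models[target_name] = f"final model for {target_name}"

    def test(self, target_name: str) -> None:
        pass  # a real subclass would evaluate models[target_name] here
```

Because the base class declares its methods abstract, Python refuses to instantiate it (or any subclass that misses one of the three methods) with a TypeError.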

Parameters:
  • config (ConfigBase)

  • training_sets (Dict[str, DataFrame] | None)

  • test_sets (Dict[str, DataFrame] | None)

  • step_name (str)

base_model: ModelBase | None

The base model instance loaded from load_base_model(); can be overridden for each target.

abstractmethod build(target_name)

Build a test model for the specified target name.

This abstract method must be implemented by subclasses to perform the steps necessary for initializing, training, and storing the model in models.

Parameters:

target_name (str) – The identifier for this target’s model in training_sets.

Return type:

None

abstractmethod build_final_model(target_name)

Build a final model for the specified target name.

This abstract method must be implemented by subclasses to perform the steps necessary for initializing, training, and storing the model in final_models.

Parameters:

target_name (str) – The identifier for this target’s model in training_sets.

Return type:

None

build_final_model_targets()

Iterate over all targets from the configuration, calling build_final_model() for each target.

Return type:

None

build_targets()

Iterate over all targets from the configuration, calling build() for each target.

Return type:

None

create_metric_plots()

Create and save ROC and Precision-Recall plots as an SVG file for each target.

Calls the common utility function aiqclib.common.utils.metric_plots.create_metric_plots().

Return type:

None

default_file_names: Dict[str, str]

Default names for model files and test reports, with placeholders for the target name.

default_model_file_name: str

final_models: Dict[str, ModelBase | None]

A dictionary to store final model objects keyed by target name.

load_base_model()

Load the base model class from the configuration.

The loaded model is stored in base_model and may be cloned, specialized, or reloaded for each target in the building process.

Return type:

None

model_file_names: Dict[str, str]

A dictionary mapping “model” to target-specific file paths.

model_scores: Dict[str, DataFrame]

A dictionary to store model-scores tables keyed by target name.

models: Dict[str, ModelBase | None]

A dictionary to store model objects keyed by target name.

output_file_names: Dict[str, Dict[str, str]]

A dictionary mapping result type (e.g., “report”, “prediction”) to target-specific file paths.
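The nested shape of output_file_names can be sketched by expanding per-result-type templates for every target. The template strings, the "{target_name}" placeholder syntax, and the expand_output_file_names helper are all hypothetical; aiqclib defines its own defaults in default_file_names.

```python
from typing import Dict, Iterable

# Hypothetical templates; aiqclib defines its own defaults and placeholder syntax.
DEFAULT_OUTPUT_TEMPLATES: Dict[str, str] = {
    "report": "test_report_{target_name}.tsv",
    "prediction": "test_prediction_{target_name}.parquet",
}


def expand_output_file_names(
    templates: Dict[str, str], targets: Iterable[str], output_dir: str
) -> Dict[str, Dict[str, str]]:
    """Map each result type to a {target name: file path} dictionary."""
    targets = list(targets)  # allow generators to be iterated once per result type
    return {
        result_type: {
            t: f"{output_dir}/{template.format(target_name=t)}" for t in targets
        }
        for result_type, template in templates.items()
    }
```

For example, expanding these templates for targets "temp" and "psal" under "out" maps "report" to {"temp": "out/test_report_temp.tsv", "psal": "out/test_report_psal.tsv"}.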

predictions: Dict[str, DataFrame]

A dictionary to store prediction results keyed by target name.

read_models()

Read and restore each target’s model from disk, storing the loaded model in models.

Raises:
  • FileNotFoundError – If a model file does not exist for a particular target.

  • RuntimeError – If the base_model is not loaded, which is required to update model thread settings.

Return type:

None
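The two failure modes of read_models() can be sketched as guard clauses. This is a hypothetical sketch: the assumption that models are stored with pickle, the dict-based models, and the "n_threads" key are illustrative only, since the source does not state the serialization format or how thread settings are applied.

```python
import pickle
from pathlib import Path
from typing import Dict, Iterable, Optional


def read_models(
    targets: Iterable[str],
    model_file_names: Dict[str, str],
    base_model: Optional[Dict[str, int]],
) -> Dict[str, dict]:
    if base_model is None:
        # The base model supplies settings (here, a thread count) for the restored models.
        raise RuntimeError("base_model is not loaded; call load_base_model() first.")
    models: Dict[str, dict] = {}
    for target in targets:
        path = Path(model_file_names[target])
        if not path.exists():
            raise FileNotFoundError(f"No model file for target {target!r}: {path}")
        with path.open("rb") as fh:
            model = pickle.load(fh)  # assumption: models were pickled; load only trusted files
        model["n_threads"] = base_model["n_threads"]  # hypothetical thread setting
        models[target] = model
    return models
```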

reports: Dict[str, DataFrame]

A dictionary to store test reports keyed by target name.

shap_values: Dict[str, DataFrame]

A dictionary to store SHAP values keyed by target name.

abstractmethod test(target_name)

Test a model for the specified target name.

Typically, this includes running predictions, evaluating performance metrics, and storing results in reports.

Parameters:

target_name (str) – The identifier for this target’s test set in test_sets and its built model in models.

Return type:

None

test_sets: Dict[str, DataFrame] | None

A dictionary containing test data keyed by target name.

test_targets()

Iterate over all targets, ensuring that a model has been built before calling test().

Raises:

ValueError – If a target has no corresponding entry in models.

Return type:

None
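The guard in test_targets() amounts to a membership check before each call. In this minimal sketch, run_test_targets and its test_fn callback are hypothetical names; test_fn stands in for the subclass's test() method.

```python
from typing import Callable, Dict, Iterable


def run_test_targets(
    targets: Iterable[str], models: Dict[str, object], test_fn: Callable[[str], None]
) -> None:
    """Call test_fn(target) for each target, refusing targets that were never built."""
    for target in targets:
        if target not in models:
            raise ValueError(f"No model built for target {target!r}; run build_targets() first.")
        test_fn(target)
```

Because the check runs inside the loop, targets earlier in the list are tested before a missing model is detected; validating every target up front would be an alternative design if partial runs are undesirable.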

training_sets: Dict[str, DataFrame] | None

A dictionary containing training data keyed by target name.

write_model_scores()

Write each target’s model-scores table to a Parquet file.

Raises:

ValueError – If model_scores is empty, indicating no tests have been carried out or no tables stored.

Return type:

None

write_models()

Serialize and write each target’s model to disk.

Raises:

ValueError – If models is empty, indicating no models have been built for writing.

Return type:

None

write_predictions()

Serialize and write each target’s predictions to disk.

Raises:

ValueError – If predictions is empty, indicating no predictions have been generated for writing.

Return type:

None

write_reports()

Write each target’s test reports to a TSV file.

Raises:

ValueError – If reports is empty, indicating no tests have been carried out or no reports stored.

Return type:

None
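The empty-check followed by a per-target TSV write can be sketched with the standard csv module. This is a hypothetical sketch in which each report is a list of dicts and report_file_names maps targets to paths; with Polars, `df.write_csv(path, separator="\t")` would replace the DictWriter loop.

```python
import csv
from pathlib import Path
from typing import Dict, List

Row = Dict[str, object]


def write_reports(reports: Dict[str, List[Row]], report_file_names: Dict[str, str]) -> None:
    if not reports:
        raise ValueError("reports is empty; run the tests before writing reports.")
    for target, rows in reports.items():
        path = Path(report_file_names[target])
        path.parent.mkdir(parents=True, exist_ok=True)  # create the output folder if needed
        with path.open("w", newline="") as fh:
            # Column order follows the keys of the first row; delimiter "\t" yields TSV.
            writer = csv.DictWriter(fh, fieldnames=list(rows[0]), delimiter="\t")
            writer.writeheader()
            writer.writerows(rows)
```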

write_shap_values()

Write each target’s SHAP values to a Parquet file.

This method checks if SHAP values are enabled in the base model. If not, it returns without writing.

Raises:

ValueError – If shap_values is empty while SHAP is enabled, indicating no SHAP values were computed or stored.

Return type:

None

aiqclib.train.step4_build_model.build_model_suite module

This module defines the BuildModelSuite class, a specialized component for building and testing multiple machine learning models side by side using a model suite (e.g., ModelSuite).

It inherits from aiqclib.train.step4_build_model.build_model_base.BuildModelBase and aggregates the results across all methods into single output files per target.

class aiqclib.train.step4_build_model.build_model_suite.BuildModelSuite(config, training_sets=None, test_sets=None)

Bases: BuildModelBase

A subclass of aiqclib.train.step4_build_model.build_model_base.BuildModelBase designed to build and test models using a model suite (multi-model configuration).

This class iterates through all ML methods defined in the provided base model. It saves individual models with composite keys, but aggregates test reports, predictions, and model-scores tables into single datasets per target name by introducing a ‘method’ column.
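Storing one model per method under a composite key while stacking results into one table with a 'method' column can be sketched as follows. The (target, method) tuple key, the method names, and both helper functions are hypothetical; aiqclib defines its own key format. The int/float coercion is a plain-Python analogue of the Int64/Float64 standardization that the documentation of test() describes for Polars.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]
Key = Tuple[str, str]  # hypothetical composite key: (target name, method name)


def build_suite(target_name: str, methods: Iterable[str], models: Dict[Key, str]) -> None:
    for method in methods:
        # A real suite would train one model per method; a label stands in here.
        models[(target_name, method)] = f"{method} model for {target_name}"


def aggregate_predictions(
    target_name: str, methods: Iterable[str], per_method: Dict[Key, List[Row]]
) -> List[Row]:
    """Stack per-method predictions into one table with a 'method' column."""
    combined: List[Row] = []
    for method in methods:
        for row in per_method[(target_name, method)]:
            combined.append({
                **row,
                "method": method,
                # Normalize types so rows from different libraries agree.
                "class": int(row["class"]),
                "score": float(row["score"]),
            })
    return combined
```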

Parameters:
  • config (ConfigBase)

  • training_sets (Dict[str, DataFrame] | None)

  • test_sets (Dict[str, DataFrame] | None)

build(target_name)

Build (train) models for the specified target across all configured methods, storing them in models with composite keys.

Parameters:

target_name (str) – The name of the target variable to build models for.

Raises:

ValueError – If training_sets is empty.

Return type:

None

build_final_model(target_name)

Build (train) final models for the specified target across all configured methods, storing them in final_models with composite keys.

Parameters:

target_name (str) – The name of the target variable to build models for.

Raises:

ValueError – If training_sets or test_sets is empty.

Return type:

None

create_metric_plots()

Override parent method to call the multi-method metric plotter.

Return type:

None

expected_class_name: str = 'BuildModelSuite'

read_models()

Read and restore each target’s models from disk for all methods in the suite, storing the loaded models in models.

Raises:

FileNotFoundError – If a model file path does not exist on disk.

Return type:

None

test(target_name)

Test the models for the given target across all methods, appending a ‘method’ column and aggregating the results into single datasets.

Data types for model outputs (class, score, etc.) are standardized to Int64 and Float64 to prevent a Polars SchemaError when concatenating results from different ML libraries (e.g., XGBoost vs. scikit-learn).

Parameters:

target_name (str) – The name of the target variable to test models for.

Return type:

None

test_targets()

Iterate over all targets, ensuring that models have been built for all configured methods before calling test().

Raises:

ValueError – If a target/method combination has no corresponding entry in models.

Return type:

None