aiqclib.train.step4_build_model package
Submodules
aiqclib.train.step4_build_model.build_model module
This module defines the BuildModel class, a specialized component
for building and testing machine learning models.
It inherits from aiqclib.train.step4_build_model.build_model_base.BuildModelBase
and orchestrates the training and evaluation of models for specified targets
using Polars DataFrames.
- class aiqclib.train.step4_build_model.build_model.BuildModel(config, training_sets=None, test_sets=None)
Bases: BuildModelBase
A subclass of BuildModelBase designed to build and test models using the provided training and test sets for each target.
This class sets its expected_class_name to "BuildModel", which must match the YAML configuration’s base_class if it is to be instantiated within that framework. It extends the base functionality to manage training and testing workflows, including data preparation steps such as dropping columns before model input, and result aggregation.
- Parameters:
config (ConfigBase)
training_sets (Dict[str, DataFrame] | None)
test_sets (Dict[str, DataFrame] | None)
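The match between expected_class_name and the configuration’s base_class can be pictured with a minimal sketch. The dictionary layout, the check_base_class helper, and the choice of ValueError are assumptions for illustration; this reference does not show how aiqclib performs the check.

```python
class BuildModelSketch:
    """Hypothetical stand-in; only expected_class_name mirrors the documented class."""

    expected_class_name: str = "BuildModel"


def check_base_class(step_config: dict, cls: type) -> None:
    """Raise ValueError if the configured base_class differs from the class's expected name."""
    configured = step_config.get("base_class")
    if configured != cls.expected_class_name:
        raise ValueError(
            f"base_class {configured!r} does not match {cls.expected_class_name!r}"
        )


# Hypothetical parsed YAML fragment for the build step.
step_config = {"base_class": "BuildModel"}
check_base_class(step_config, BuildModelSketch)  # passes silently
```

A configuration naming a different class (for example "BuildModelSuite") would be rejected by this sketch before any training starts.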
- build(target_name)
Build (train) a test model for the specified target, storing it in models.
This method:
  - Reloads the base model via load_base_model().
  - Attaches the training data for the target (dropping the k_fold column and common identifying columns).
  - Calls base_model.build().
  - Stores the built model in models[target_name].
- Parameters:
target_name (str) – The target variable name, used to index training_sets and locate the training data.
- Raises:
ValueError – If training_sets is empty, indicating no corresponding data is available for model building.
- Return type:
None
- build_final_model(target_name)
Build (train) a final model for the specified target, storing it in final_models.
This method:
  - Reloads the base model via load_base_model().
  - Attaches the training data for the target (dropping the k_fold column and common identifying columns).
  - Attaches the test data for the target (dropping common identifying columns).
  - Calls base_model.build().
  - Stores the built model in final_models[target_name].
- Parameters:
target_name (str) – The target variable name, used to index training_sets and test_sets and locate the data.
- Raises:
ValueError – If training_sets or test_sets is empty, indicating no corresponding data is available for model building.
- Return type:
None
- expected_class_name: str = 'BuildModel'
- test(target_name)
Test the model for the given target, storing the outputs in reports, model_scores, shap_values, and predictions.
This method:
  - Retrieves the previously built model from models[target_name].
  - Resets the model’s model-scores table so that results from previous runs are not duplicated.
  - Attaches the appropriate test set from test_sets[target_name], dropping common identifying columns.
  - Calls base_model.test().
  - Stores the test report in reports[target_name].
  - Stores the model-scores table in model_scores[target_name].
  - Stores the SHAP values in shap_values[target_name].
  - Stores the test predictions, augmented with identifying information and the true label, in predictions[target_name].
- Parameters:
target_name (str) – The target variable name, used to index both models and test_sets.
- Return type:
None
aiqclib.train.step4_build_model.build_model_base module
Provides an abstract base class, aiqclib.train.step4_build_model.build_model_base.BuildModelBase,
for building and testing machine learning models using structured training and test datasets.
This module establishes a framework for model development within a larger data quality control (DMQC) system, integrating with configuration management and model loading utilities. Subclasses are expected to implement specific model building and testing logic tailored to different modeling paradigms or frameworks.
- class aiqclib.train.step4_build_model.build_model_base.BuildModelBase(config, training_sets=None, test_sets=None, step_name='build')
Bases: DataSetBase
An abstract base class to build and test models, using training/test sets and a YAML-based configuration.
Inherits from aiqclib.common.base.dataset_base.DataSetBase (with step name "build") to ensure that the provided configuration matches the expected fields for model building. Subclasses must define their own logic in the build(), build_final_model(), and test() abstract methods, potentially for different modeling frameworks.
- Parameters:
config (ConfigBase)
training_sets (Dict[str, DataFrame] | None)
test_sets (Dict[str, DataFrame] | None)
step_name (str)
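The abstract-method contract can be illustrated with a simplified analogue built on Python's abc module. BuilderSketch and MeanBuilder are hypothetical and omit configuration handling entirely; they show only how a subclass supplies build() and test() and why the base class cannot be instantiated directly.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class BuilderSketch(ABC):
    """Hypothetical, simplified analogue of BuildModelBase (no config handling)."""

    def __init__(self, training_sets: Optional[Dict[str, Any]] = None) -> None:
        self.training_sets = training_sets
        self.models: Dict[str, Any] = {}

    @abstractmethod
    def build(self, target_name: str) -> None:
        """Train a model for one target and store it in self.models."""

    @abstractmethod
    def test(self, target_name: str) -> None:
        """Evaluate the model for one target."""


class MeanBuilder(BuilderSketch):
    """Toy subclass: the 'model' is just the mean of the training values."""

    def build(self, target_name: str) -> None:
        if not self.training_sets:
            raise ValueError("training_sets is empty")
        values = self.training_sets[target_name]
        self.models[target_name] = sum(values) / len(values)

    def test(self, target_name: str) -> None:
        print(target_name, self.models[target_name])


builder = MeanBuilder({"temp": [1.0, 2.0, 3.0]})
builder.build("temp")
```

Instantiating BuilderSketch itself raises TypeError, which is the standard abc behavior for classes with unimplemented abstract methods.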
- base_model: ModelBase | None
The base model instance loaded by load_base_model(); can be overridden for each target.
- abstractmethod build(target_name)
Build a test model for the specified target name.
This abstract method must be implemented by subclasses to perform the steps necessary for initializing, training, and storing the model in models.
- Parameters:
target_name (str) – The target name identifying this target’s data in training_sets.
- Return type:
None
- abstractmethod build_final_model(target_name)
Build a final model for the specified target name.
This abstract method must be implemented by subclasses to perform the steps necessary for initializing, training, and storing the model in final_models.
- Parameters:
target_name (str) – The target name identifying this target’s data in training_sets.
- Return type:
None
- build_final_model_targets()
Iterate over all targets from the configuration, calling build_final_model() for each target.
- Return type:
None
- build_targets()
Iterate over all targets from the configuration, calling build() for each target.
- Return type:
None
- create_metric_plots()
Create and save ROC and Precision-Recall plots as an SVG file for each target.
Calls the common utility function aiqclib.common.utils.metric_plots.create_metric_plots().
- Return type:
None
- default_file_names: Dict[str, str]
Default names for model files and test reports, with placeholders for the target name.
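One way to picture target-name placeholders is with str.format templates, as in the sketch below. The template strings, the {target_name} placeholder syntax, the file extensions, and the target names are all hypothetical; the actual defaults used by aiqclib are not shown in this reference.

```python
from typing import Dict, Iterable

# Hypothetical templates; the real defaults and placeholder syntax may differ.
default_file_names: Dict[str, str] = {
    "model": "model_{target_name}.joblib",
    "report": "test_report_{target_name}.tsv",
}


def resolve_file_names(
    templates: Dict[str, str], target_names: Iterable[str]
) -> Dict[str, Dict[str, str]]:
    """Return {result_type: {target_name: file_name}} by filling each template."""
    names = list(target_names)
    return {
        kind: {t: template.format(target_name=t) for t in names}
        for kind, template in templates.items()
    }


file_names = resolve_file_names(default_file_names, ["temp", "psal"])
```

The nested result has the same Dict[str, Dict[str, str]] shape as the documented output_file_names attribute.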
- default_model_file_name: str
- load_base_model()
Load the base model class from the configuration.
The loaded model is stored in base_model and may be cloned, specialized, or reloaded for each target in the building process.
- Return type:
None
- model_file_names: Dict[str, str]
A dictionary mapping “model” to target-specific file paths.
- model_scores: Dict[str, DataFrame]
A dictionary to store model-scores tables keyed by target name.
- output_file_names: Dict[str, Dict[str, str]]
A dictionary mapping result type (e.g., “report”, “prediction”) to target-specific file paths.
- predictions: Dict[str, DataFrame]
A dictionary to store prediction results keyed by target name.
- read_models()
Read and restore each target’s model from disk, storing the loaded model in models.
- Raises:
FileNotFoundError – If a model file does not exist for a particular target.
RuntimeError – If base_model is not loaded, which is required to update model thread settings.
- Return type:
None
- reports: Dict[str, DataFrame]
A dictionary to store test reports keyed by target name.
- shap_values: Dict[str, DataFrame]
A dictionary to store SHAP values keyed by target name.
- abstractmethod test(target_name)
Test a model for the specified target name.
Typically, this includes running predictions, evaluating performance metrics, and storing results in reports.
- test_sets: Dict[str, DataFrame] | None
A dictionary containing test data keyed by target name.
- test_targets()
Iterate over all targets, ensuring that a model has been built before calling test().
- Raises:
ValueError – If a target has no corresponding entry in models.
- Return type:
None
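The guard described for test_targets() can be sketched as a plain loop. The function name, its parameters, and the error message are hypothetical; only the behavior (raise ValueError when a target has no entry in models, otherwise call test()) follows the description above.

```python
from typing import Any, Callable, Dict, Iterable


def run_tests_for_targets(
    target_names: Iterable[str],
    models: Dict[str, Any],
    test_one: Callable[[str], None],
) -> None:
    """Call test_one for each target, refusing targets without a built model."""
    for name in target_names:
        if name not in models:
            raise ValueError(f"No model found for target {name!r}; build it first.")
        test_one(name)


tested = []
run_tests_for_targets(["temp"], {"temp": object()}, tested.append)
```

Because the check happens inside the loop, targets listed before a missing one are still tested before the error is raised.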
- training_sets: Dict[str, DataFrame] | None
A dictionary containing training data keyed by target name.
- write_model_scores()
Write each target’s model-scores table to a Parquet file.
- Raises:
ValueError – If model_scores is empty, indicating no tests have been carried out or no tables stored.
- Return type:
None
- write_models()
Serialize and write each target’s model to disk.
- Raises:
ValueError – If models is empty, indicating no models have been built for writing.
- Return type:
None
- write_predictions()
Serialize and write each target’s predictions to disk.
- Raises:
ValueError – If predictions is empty, indicating no predictions have been generated for writing.
- Return type:
None
- write_reports()
Write each target’s test report to a TSV file.
- Raises:
ValueError – If reports is empty, indicating no tests have been carried out or no reports stored.
- Return type:
None
- write_shap_values()
Write each target’s SHAP values to a Parquet file.
This method checks whether SHAP values are enabled in the base model. If not, it returns without writing.
- Raises:
ValueError – If shap_values is empty while SHAP is enabled, indicating no SHAP values were computed or stored.
- Return type:
None
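The public methods of BuildModelBase suggest a build, test, then write sequence. The driver below is a sketch under that assumption: the call order and the run_build_and_test helper are not prescribed by this reference, and RecordingBuilder is a stand-in that records calls instead of training anything.

```python
def run_build_and_test(builder) -> None:
    """Drive a builder through an assumed build -> test -> write sequence."""
    builder.build_targets()
    builder.test_targets()
    builder.write_models()
    builder.write_reports()
    builder.write_predictions()
    builder.write_model_scores()
    builder.write_shap_values()


class RecordingBuilder:
    """Stand-in that records method names so the sketch runs without aiqclib."""

    def __init__(self) -> None:
        self.calls = []

    def __getattr__(self, name):
        # Only reached for attributes that do not exist, i.e. the builder methods.
        def method() -> None:
            self.calls.append(name)

        return method


recorder = RecordingBuilder()
run_build_and_test(recorder)
```

With a real BuildModel instance in place of the recorder, each write_* call would raise ValueError if the corresponding results dictionary were still empty, as documented above.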
aiqclib.train.step4_build_model.build_model_suite module
This module defines the BuildModelSuite class, a specialized component
for building and testing multiple machine learning models concurrently using
a model suite (e.g., ModelSuite).
It inherits from aiqclib.train.step4_build_model.build_model_base.BuildModelBase
and aggregates the results across all methods into single output files per target.
- class aiqclib.train.step4_build_model.build_model_suite.BuildModelSuite(config, training_sets=None, test_sets=None)
Bases: BuildModelBase
A subclass of aiqclib.train.step4_build_model.build_model_base.BuildModelBase designed to build and test models using a model suite (multi-model configuration).
This class iterates through all ML methods defined in the provided base model. It saves individual models under composite keys, but aggregates test reports, predictions, and model-scores tables into a single dataset per target name by adding a ‘method’ column.
- Parameters:
config (ConfigBase)
training_sets (Dict[str, DataFrame] | None)
test_sets (Dict[str, DataFrame] | None)
- build(target_name)
Build (train) models for the specified target across all configured methods, storing them in models with composite keys.
- Parameters:
target_name (str) – The name of the target variable to build models for.
- Raises:
ValueError – If training_sets is empty.
- Return type:
None
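Storing one model per target and method can be sketched with a dictionary keyed by both names. The tuple key format, the build_suite helper, and the toy "trainers" are hypothetical; this reference says only that composite keys are used, not how they are formed.

```python
from typing import Any, Callable, Dict, List, Tuple

Trainer = Callable[[List[float]], Any]


def build_suite(
    target_name: str,
    methods: Dict[str, Trainer],
    training_data: List[float],
    models: Dict[Tuple[str, str], Any],
) -> None:
    """Train one model per method and store it under a (target, method) key."""
    for method_name, train in methods.items():
        models[(target_name, method_name)] = train(training_data)


# Toy "training" functions standing in for real ML methods.
methods: Dict[str, Trainer] = {
    "mean": lambda xs: sum(xs) / len(xs),
    "max": max,
}

models: Dict[Tuple[str, str], Any] = {}
build_suite("temp", methods, [1.0, 2.0, 3.0], models)
```

Keying on both names keeps the models for different methods separate even though they share a target.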
- build_final_model(target_name)
Build (train) final models for the specified target across all configured methods, storing them in final_models with composite keys.
- Parameters:
target_name (str) – The name of the target variable to build models for.
- Raises:
ValueError – If training_sets or test_sets is empty.
- Return type:
None
- create_metric_plots()
Override parent method to call the multi-method metric plotter.
- Return type:
None
- expected_class_name: str = 'BuildModelSuite'
- read_models()
Read and restore each target’s models from disk for all methods in the suite, storing the loaded models in models.
- Raises:
FileNotFoundError – If a model file path does not exist on disk.
- Return type:
None
- test(target_name)
Test the models for the given target across all methods, appending a ‘method’ column and aggregating the results into single datasets.
Data types for model outputs (class, score, etc.) are standardized to Int64 and Float64 to prevent Polars SchemaErrors when concatenating results from different ML libraries (e.g., XGBoost vs Scikit-Learn).
- Parameters:
target_name (str) – The name of the target variable to test models for.
- Return type:
None