aiqclib.train.step2_validate_model package
Submodules
aiqclib.train.step2_validate_model.kfold_validation module
This module provides the KFoldValidation class, an implementation of k-fold cross-validation for model training and evaluation. It extends ValidationBase to perform iterative model building and testing across defined data folds, accumulating performance reports.
- class aiqclib.train.step2_validate_model.kfold_validation.KFoldValidation(config, training_sets=None)
Bases: ValidationBase
A subclass of ValidationBase that performs k-fold cross-validation on training sets.
This class iterates over the specified number of folds, trains (builds) the model on all folds except one, then tests it on the held-out fold. Results are accumulated in reports.
- Parameters:
config (ConfigBase)
training_sets (Dict[str, DataFrame] | None)
- default_k_fold: int
The default number of folds if none is specified in the config.
- expected_class_name: str = 'KFoldValidation'
- get_k_fold()
Retrieve the number of folds to use for cross-validation from the validate section of the YAML config, or fall back to default_k_fold.
- Returns:
The number of folds for k-fold cross-validation.
- Return type:
int
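The fallback described for get_k_fold() can be sketched as a lookup with a default. This is a minimal illustration, not the library's implementation: the config is modelled as a plain dictionary, and the key name "k_fold" and the default value of 10 are assumptions made only for the example.

```python
from typing import Any, Dict

# Hypothetical default; the actual value of default_k_fold is not stated here.
DEFAULT_K_FOLD = 10


def get_k_fold(config: Dict[str, Any], default: int = DEFAULT_K_FOLD) -> int:
    """Return the fold count from the "validate" section, or the default."""
    # Treat a missing or empty "validate" section as an empty mapping.
    validate_section = config.get("validate") or {}
    # "k_fold" is an assumed key name, chosen only for illustration.
    return int(validate_section.get("k_fold", default))
```

With this sketch, a config that sets the assumed key returns that value, and a config without a validate section returns the default.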
- validate(target_name)
Conduct k-fold cross-validation for the given target name, storing model objects, test results, and model-scores tables in models, reports, and model_scores.
For each fold out of get_k_fold():
1. Reload or re-initialize the model using load_base_model().
2. Set base_model.k to the fold index.
3. Build the model using all training data except rows in the current fold.
4. Test the model on the held-out fold.
5. Accumulate test results and model-scores tables.
- Parameters:
target_name (str) – The identifier for which target dataset to validate, referring to the corresponding DataFrame within training_sets.
- Return type:
None
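The per-fold procedure above can be sketched in plain Python. This is a generic illustration of k-fold cross-validation under stated assumptions, not the KFoldValidation code: rows are (feature, label) tuples standing in for DataFrame rows, fold indices are assigned round-robin, and the "model" is a toy threshold classifier. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Tuple[float, int]  # (feature, label); a stand-in for a DataFrame row


def assign_folds(n_rows: int, k: int) -> List[int]:
    """Assign each row a fold index in 1..k, round-robin."""
    return [(i % k) + 1 for i in range(n_rows)]


def build_threshold(train: List[Row]) -> float:
    """Toy 'model': the midpoint between the two class means.

    Assumes both classes are present in the training rows.
    """
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def kfold_validate(rows: List[Row], k: int) -> List[Dict[str, float]]:
    """Build on k-1 folds, test on the held-out fold, accumulate reports."""
    folds = assign_folds(len(rows), k)
    reports: List[Dict[str, float]] = []
    for fold in range(1, k + 1):
        # Build on every row that is not in the current fold.
        train = [r for r, f in zip(rows, folds) if f != fold]
        # Test on the held-out fold only.
        test = [r for r, f in zip(rows, folds) if f == fold]
        threshold = build_threshold(train)
        correct = sum(int(x > threshold) == y for x, y in test)
        reports.append(
            {"k": fold, "n_test": len(test), "accuracy": correct / len(test)}
        )
    return reports
```

Each row is used for testing exactly once, so the accumulated reports cover the whole training set without any model being tested on rows it was built from.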
aiqclib.train.step2_validate_model.kfold_validation_suite module
This module provides the KFoldValidationSuite class, an implementation of k-fold cross-validation tailored for validating multiple ML algorithms simultaneously via the ModelSuite class.
- class aiqclib.train.step2_validate_model.kfold_validation_suite.KFoldValidationSuite(config, training_sets=None)
Bases: ValidationBase
A subclass of ValidationBase that performs k-fold cross-validation on training sets across multiple machine learning methods provided by a model suite (e.g., ModelSuite).
This class iterates over the specified number of folds and across all methods defined in the base model. Results are accumulated with composite keys (method + target) to ensure outputs are saved uniquely per method.
- Parameters:
config (ConfigBase)
training_sets (Dict[str, DataFrame] | None)
- default_k_fold: int
The default number of folds if none is specified in the config.
- expected_class_name: str = 'KFoldValidationSuite'
- get_k_fold()
Retrieve the number of folds to use for cross-validation from the validate section of the YAML config, or fall back to default_k_fold.
- Returns:
The number of folds for k-fold cross-validation.
- Return type:
int
- validate(target_name)
Conduct k-fold cross-validation for the given target name across all methods in the ModelSuite.
For each method in base_model.method_objs:
1. Iterate over the defined number of folds.
2. Build the model using all training data except the current fold.
3. Test the model on the held-out fold.
4. Accumulate test results and model-scores tables under a composite key ({method_name}_{target_name}).
5. Update output_file_names to replace the {method} placeholder.
- Parameters:
target_name (str) – The identifier for which target dataset to validate.
- Return type:
None
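The composite key and the {method} placeholder substitution can be illustrated with a short sketch. The function names, method names, and path template below are hypothetical; only the {method_name}_{target_name} key format and the {method} placeholder come from the description above.

```python
from typing import Dict, List


def composite_key(method_name: str, target_name: str) -> str:
    """Join method and target so results are stored uniquely per method."""
    return f"{method_name}_{target_name}"


def plan_outputs(
    methods: List[str], target_name: str, template: str
) -> Dict[str, str]:
    """Map each composite key to a path with {method} filled in.

    `template` is an assumed path pattern containing a literal "{method}".
    """
    return {
        composite_key(m, target_name): template.replace("{method}", m)
        for m in methods
    }
```

For example, two methods validated against one target yield two distinct keys and two distinct output paths, so one method's results cannot overwrite another's.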
aiqclib.train.step2_validate_model.validate_base module
This module defines the ValidationBase abstract base class, providing
a foundational framework for validating trained machine learning models.
It integrates with the aiqclib library's configuration and data handling
mechanisms, enabling robust and standardized validation routines across
different model types and datasets. Subclasses are expected to implement
the specific validation logic tailored to their model and data.
- class aiqclib.train.step2_validate_model.validate_base.ValidationBase(config, training_sets=None)
Bases: DataSetBase
An abstract base class that provides a framework for validating trained model(s) using a specified training set. Inherits from DataSetBase to leverage YAML-based configuration checks and the step name "validate".
Note
If this class is to be instantiated directly (rather than a subclass), you may need to define an expected_class_name attribute. Otherwise, DataSetBase may raise a NotImplementedError if the YAML's base_class does not match.
- Parameters:
config (ConfigBase)
training_sets (Dict[str, DataFrame] | None)
- base_model
Base model class instantiated through the model loader.
- create_metric_plots()
Generate and save ROC and Precision-Recall plots for each target.
This method iterates through the validation reports stored in reports, and for each target, it generates and saves an SVG file containing the ROC curve and Precision-Recall curve using the aiqclib.common.utils.metric_plots.create_metric_plots() utility function. The output file path for each plot is determined by output_file_names.
- Return type:
None
- default_file_names: Dict[str, str]
Default file naming pattern for validation reports and model-scores tables.
- load_base_model()
Load the primary model class specified in the training configuration.
The loaded model class is stored in base_model and can be used or extended in the subclass's validation routines.
- Return type:
None
- model_scores: Dict[str, DataFrame]
A dictionary mapping each target name to a Polars DataFrame of model-scores tables (e.g., fold index, label, prediction score).
- models: Dict[str, List]
Subclasses or the validation routine can store specialized model instances here.
- output_file_names: Dict[str, Dict[str, str]]
A dictionary mapping "result" to a dictionary of target-specific file paths.
- process_targets()
Iterate over the target names defined in config and validate each using validate().
- Return type:
None
- reports: Dict[str, DataFrame]
A dictionary mapping each target name to a Polars DataFrame of validation reports (e.g., predictions, metrics).
- summarised_reports: Dict[str, DataFrame]
A dictionary for storing any summarised metrics derived from reports.
- training_sets: Dict[str, DataFrame] | None
Optional Polars DataFrame with training sets (or dictionary if the structure is aggregated).
- abstractmethod validate(target_name)
An abstract method for validating one or more models on a specific target.
Subclasses must implement the logic to use training_sets (and possibly base_model or models) to evaluate performance, store metrics in reports and model_scores, etc.
- Parameters:
target_name (str) – The key identifying which target to validate.
- Return type:
None
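The contract between process_targets() and the abstract validate() can be sketched with Python's abc module. The classes below are hypothetical stand-ins, not ValidationBase itself: they omit the config, model loading, and DataFrame handling, and the stored "report" is a placeholder dictionary.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ValidationSketch(ABC):
    """Hypothetical stand-in for ValidationBase; not the aiqclib class."""

    def __init__(self, target_names: List[str]) -> None:
        self.target_names = target_names
        self.reports: Dict[str, dict] = {}

    def process_targets(self) -> None:
        # Delegate each target to the subclass-specific validate().
        for name in self.target_names:
            self.validate(name)

    @abstractmethod
    def validate(self, target_name: str) -> None:
        """Subclasses store their results in self.reports."""


class CountingValidation(ValidationSketch):
    """Toy subclass: records a placeholder report for each target."""

    def validate(self, target_name: str) -> None:
        self.reports[target_name] = {"n_chars": len(target_name)}
```

Instantiating ValidationSketch directly raises TypeError because validate() is abstract, while the subclass can be instantiated and driven through process_targets().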
- write_model_scores()
Write the model-scores tables stored in model_scores to Parquet files.
Each target's model_scores DataFrame is written to a file specified by output_file_names. Directories are created if they do not exist.
- Raises:
ValueError – If model_scores is empty.
- Return type:
None
- write_reports()
Write the validation results stored in reports to TSV files.
Each target's report DataFrame is written to a file specified by output_file_names. Directories are created if they do not exist.
- Raises:
ValueError – If reports is empty, indicating no validation results are available to write.
- Return type:
None
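The behaviour described for write_reports() (reject empty results, create missing directories, write one TSV per target) can be sketched with the standard library. This is a simplified illustration, not the library's code: each report is a list of dictionaries rather than a Polars DataFrame, and output_file_names is flattened to a target-to-path mapping instead of the nested structure documented above.

```python
import csv
from pathlib import Path
from typing import Dict, List


def write_reports(
    reports: Dict[str, List[dict]], output_file_names: Dict[str, str]
) -> None:
    """Write each target's report rows to a tab-separated file."""
    if not reports:
        raise ValueError("No validation reports are available to write.")
    for target, rows in reports.items():
        path = Path(output_file_names[target])
        # Create the parent directories if they do not exist yet.
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("w", newline="") as fh:
            writer = csv.DictWriter(
                fh, fieldnames=list(rows[0].keys()), delimiter="\t"
            )
            writer.writeheader()
            writer.writerows(rows)
```

Raising before any file is opened means an empty reports dictionary never leaves partially written output behind.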