aiqclib.train.step2_validate_model package

Submodules

aiqclib.train.step2_validate_model.kfold_validation module

This module provides the KFoldValidation class, an implementation of k-fold cross-validation for model training and evaluation. It extends ValidationBase to perform iterative model building and testing across defined data folds, accumulating performance reports.

class aiqclib.train.step2_validate_model.kfold_validation.KFoldValidation(config, training_sets=None)

Bases: ValidationBase

A subclass of ValidationBase that performs k-fold cross-validation on training sets.

This class iterates over the configured number of folds. In each fold it trains (builds) the model on all other folds and then tests it on the held-out fold. Results are accumulated in reports and model_scores.

Parameters:
  • config (ConfigBase)

  • training_sets (Dict[str, DataFrame] | None)

default_k_fold: int

The default number of folds if none is specified in the config.

expected_class_name: str = 'KFoldValidation'

The class name that the YAML configuration's base_class is expected to match.

get_k_fold()

Retrieve the number of folds to use for cross-validation from the validate section of the YAML config, or fall back to default_k_fold.

Returns:

The number of folds for k-fold cross-validation.

Return type:

int
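The sketch below shows the lookup-with-fallback behaviour in a minimal form. It assumes the parsed YAML is available as a plain nested dict and that the fold count sits under a hypothetical k_fold key inside the validate section. The real key name and the real value of default_k_fold are not documented here.

```python
from typing import Any, Mapping

DEFAULT_K_FOLD = 10  # hypothetical; the real default_k_fold value is not documented here


def get_k_fold(config: Mapping[str, Any], default_k_fold: int = DEFAULT_K_FOLD) -> int:
    """Return the fold count from the validate section, or the default if it is absent."""
    # "k_fold" is a hypothetical key name inside the (assumed) "validate" section.
    validate_section = config.get("validate") or {}
    return int(validate_section.get("k_fold", default_k_fold))
```

For example, get_k_fold({"validate": {"k_fold": 5}}) returns 5, while get_k_fold({}) falls back to the default.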

validate(target_name)

Conduct k-fold cross-validation for the given target name, storing model objects, test results, and model-scores tables in models, reports, and model_scores.

For each of the get_k_fold() folds:

  1. Reload or re-initialize the model using load_base_model().

  2. Set base_model.k to the fold index.

  3. Build the model using all training data except rows in the current fold.

  4. Test the model on the held-out fold.

  5. Accumulate test results and model-scores tables.
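Taken together, the five steps above form a standard k-fold loop. The sketch below is not aiqclib's implementation. It makes two assumptions: each row already carries a precomputed fold index (a hypothetical row schema), and a toy threshold "model" stands in for base_model.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical row schema: (fold index, feature value, binary label).
Row = Tuple[int, float, int]


def build(train_rows: List[Row]) -> float:
    """Toy stand-in for model building: a threshold halfway between the class means."""
    pos = [x for _, x, y in train_rows if y == 1]
    neg = [x for _, x, y in train_rows if y == 0]
    return (mean(pos) + mean(neg)) / 2


def kfold_validate(rows: List[Row], k: int) -> Tuple[List[float], List[Dict[str, float]]]:
    models: List[float] = []
    model_scores: List[Dict[str, float]] = []
    for fold in range(k):
        # Steps 1-3: start from a fresh model and build it on every row outside this fold.
        train_rows = [r for r in rows if r[0] != fold]
        threshold = build(train_rows)
        models.append(threshold)
        # Steps 4-5: test on the held-out fold and accumulate one scores row per test row.
        for _, x, y in (r for r in rows if r[0] == fold):
            model_scores.append({"k": fold, "label": y, "score": x, "pred": int(x > threshold)})
    return models, model_scores
```

Each fold builds a new model from scratch, so no information from the held-out rows leaks into the model that scores them.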

Parameters:

target_name (str) – The identifier for which target dataset to validate, referring to the corresponding DataFrame within training_sets.

Return type:

None

aiqclib.train.step2_validate_model.kfold_validation_suite module

This module provides the KFoldValidationSuite class, an implementation of k-fold cross-validation that validates multiple ML algorithms in a single run via the ModelSuite class.

class aiqclib.train.step2_validate_model.kfold_validation_suite.KFoldValidationSuite(config, training_sets=None)

Bases: ValidationBase

A subclass of ValidationBase that performs k-fold cross-validation on training sets across multiple machine learning methods provided by a model suite (e.g., ModelSuite).

This class iterates over the specified number of folds and across all methods defined in the base model. Results are accumulated with composite keys (method + target) to ensure outputs are saved uniquely per method.

Parameters:
  • config (ConfigBase)

  • training_sets (Dict[str, DataFrame] | None)

default_k_fold: int

The default number of folds if none is specified in the config.

expected_class_name: str = 'KFoldValidationSuite'

The class name that the YAML configuration's base_class is expected to match.

get_k_fold()

Retrieve the number of folds to use for cross-validation from the validate section of the YAML config, or fall back to default_k_fold.

Returns:

The number of folds for k-fold cross-validation.

Return type:

int

validate(target_name)

Conduct k-fold cross-validation for the given target name across all methods in the ModelSuite.

For each method in base_model.method_objs:
  1. Iterate over the defined number of folds.

  2. Build the model using all training data except the current fold.

  3. Test the model on the held-out fold.

  4. Accumulate test results and model-scores tables under a composite key ({method_name}_{target_name}).

  5. Update output_file_names, replacing the {method} placeholder with the current method name.
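The sketch below shows the composite-key bookkeeping in a minimal form. It is a hypothetical simplification. It assumes method_objs can be treated as a mapping from method name to a callable that builds and tests one fold and returns a score. It also assumes file names are plain strings containing a literal {method} placeholder.

```python
from typing import Callable, Dict, List, Tuple


def validate_suite(
    target_name: str,
    method_objs: Dict[str, Callable[[int], float]],
    k: int,
    file_name_template: str,
) -> Tuple[Dict[str, List[float]], Dict[str, str]]:
    reports: Dict[str, List[float]] = {}
    output_file_names: Dict[str, str] = {}
    for method_name, run_fold in method_objs.items():
        key = f"{method_name}_{target_name}"  # composite key keeps methods apart
        # Steps 1-4: one (toy) result per fold, accumulated under the composite key.
        reports[key] = [run_fold(fold) for fold in range(k)]
        # Step 5: substitute only the literal {method} placeholder.
        output_file_names[key] = file_name_template.replace("{method}", method_name)
    return reports, output_file_names
```

The sketch uses str.replace rather than str.format so that any other braces in the template are left untouched.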

Parameters:

target_name (str) – The identifier for which target dataset to validate.

Return type:

None

aiqclib.train.step2_validate_model.validate_base module

This module defines the ValidationBase abstract base class, providing a foundational framework for validating trained machine learning models.

It integrates with the aiqclib library’s configuration and data handling mechanisms, enabling robust and standardized validation routines across different model types and datasets. Subclasses are expected to implement the specific validation logic tailored to their model and data.

class aiqclib.train.step2_validate_model.validate_base.ValidationBase(config, training_sets=None)

Bases: DataSetBase

An abstract base class that provides a framework for validating trained model(s) using a specified training set. Inherits from DataSetBase to leverage YAML-based configuration checks and the step name "validate".

Note

If this class is instantiated directly (rather than through a subclass), you may need to define an expected_class_name attribute. Otherwise, DataSetBase may raise a NotImplementedError if the YAML's base_class does not match.
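The check described in the note can be pictured as follows. This is a hypothetical stand-in for DataSetBase, not its real code. It assumes the parsed YAML is a dict with a top-level base_class key; the actual class and key layout may differ.

```python
from typing import Any, Dict


class DataSetBaseSketch:
    """Hypothetical stand-in for DataSetBase's base_class check."""

    expected_class_name: str = ""  # subclasses are expected to override this

    def __init__(self, config: Dict[str, Any]) -> None:
        base_class = config.get("base_class")  # assumed top-level key
        if not self.expected_class_name or base_class != self.expected_class_name:
            raise NotImplementedError(
                f"{type(self).__name__} expects base_class={self.expected_class_name!r}, "
                f"got {base_class!r}"
            )
        self.config = config


class KFoldValidationSketch(DataSetBaseSketch):
    expected_class_name = "KFoldValidation"
```

In this sketch, instantiating the base class directly fails because expected_class_name is empty, which mirrors the situation the note warns about.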

Parameters:
  • config (ConfigBase)

  • training_sets (Dict[str, DataFrame] | None)

base_model

Base model class instantiated through the model loader.

create_metric_plots()

Generate and save ROC and Precision-Recall plots for each target.

This method iterates through the validation reports stored in reports, and for each target, it generates and saves an SVG file containing the ROC curve and Precision-Recall curve using the aiqclib.common.utils.metric_plots.create_metric_plots() utility function. The output file path for each plot is determined by output_file_names.

Return type:

None
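create_metric_plots() delegates to aiqclib.common.utils.metric_plots.create_metric_plots(), whose internals are not described here. As background, the sketch below shows how ROC and Precision-Recall points are derived from binary labels and prediction scores. It is independent of that utility and assumes both classes are present.

```python
from typing import List, Sequence, Tuple


def roc_pr_points(
    labels: Sequence[int], scores: Sequence[float]
) -> Tuple[List[Tuple[float, float]], List[Tuple[float, float]]]:
    """Return ROC (FPR, TPR) and PR (recall, precision) points, one per distinct threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos  # both must be > 0 for the rates to be defined
    roc: List[Tuple[float, float]] = [(0.0, 0.0)]
    pr: List[Tuple[float, float]] = []
    for t in sorted(set(scores), reverse=True):
        # Predict "positive" when the score is at or above threshold t.
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        roc.append((fp / n_neg, tp / n_pos))
        pr.append((tp / n_pos, tp / (tp + fp)))  # tp + fp >= 1 because t is an observed score
    return roc, pr
```

This quadratic loop is for clarity only; plotting libraries compute the same points from a single sort.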

default_file_names: Dict[str, str]

Default file naming pattern for validation reports and model-scores tables.

load_base_model()

Load the primary model class specified in the training configuration.

The loaded model class is stored in base_model and can be used or extended in the subclass’s validation routines.

Return type:

None
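The loading mechanism is not specified here. One common approach resolves a dotted module.ClassName string from the configuration with importlib, shown below as a hypothetical sketch. The model config key and the dotted-path convention are assumptions, not aiqclib's schema.

```python
import importlib
from typing import Any, Dict


def load_model_class(config: Dict[str, Any]) -> type:
    """Resolve a dotted 'module.ClassName' path stored under a hypothetical 'model' key."""
    dotted_path = config["model"]  # hypothetical key, e.g. "package.module.ClassName"
    module_name, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```

For example, load_model_class({"model": "collections.OrderedDict"}) returns the OrderedDict class.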

model_scores: Dict[str, DataFrame]

A dictionary mapping each target name to a Polars DataFrame of model scores (e.g., columns for fold index, label, and prediction score).

models: Dict[str, List]

A dictionary mapping each target name to a list of model instances produced during validation (for example, one per fold). Subclasses may store specialized model instances here.

output_file_names: Dict[str, Dict[str, str]]

A dictionary mapping "result" to a dictionary of target-specific file paths.
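The sketch below shows the resulting nested structure in a minimal form. It assumes a hypothetical file-name pattern with a {target_name} placeholder and an output directory taken from configuration. The actual patterns in default_file_names are not shown here.

```python
from pathlib import Path
from typing import Dict, List


def build_output_file_names(
    output_dir: str, pattern: str, target_names: List[str]
) -> Dict[str, Dict[str, str]]:
    """Map "result" to {target_name: file path}, filling a hypothetical {target_name} placeholder."""
    return {
        "result": {
            name: str(Path(output_dir) / pattern.format(target_name=name))
            for name in target_names
        }
    }
```

Each target gets its own path under the single "result" key.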

process_targets()

Iterate over the target names defined in config and validate each using validate().

Return type:

None

reports: Dict[str, DataFrame]

A dictionary mapping each target name to a Polars DataFrame of validation reports (e.g., predictions, metrics).

summarised_reports: Dict[str, DataFrame]

A dictionary for storing any summarised metrics derived from reports.

training_sets: Dict[str, DataFrame] | None

Optional training sets, given as a dictionary that maps each target name to a Polars DataFrame.

abstractmethod validate(target_name)

An abstract method for validating one or more models on a specific target.

Subclasses must implement the logic to use training_sets (and possibly base_model or models) to evaluate performance, store metrics in reports and model_scores, etc.

Parameters:

target_name (str) – The key identifying which target to validate.

Return type:

None
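The contract between process_targets() and the abstract validate() can be sketched with Python's abc module. The class below is a hypothetical miniature of ValidationBase. It assumes the target names arrive as a plain list rather than being read from config.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ValidationSketch(ABC):
    """Hypothetical miniature of the ValidationBase contract."""

    def __init__(self, target_names: List[str]) -> None:
        self.target_names = target_names
        self.reports: Dict[str, list] = {}

    def process_targets(self) -> None:
        # Mirrors process_targets(): call validate() once per target name.
        for name in self.target_names:
            self.validate(name)

    @abstractmethod
    def validate(self, target_name: str) -> None:
        """Subclasses must populate self.reports[target_name]."""


class DummyValidation(ValidationSketch):
    def validate(self, target_name: str) -> None:
        self.reports[target_name] = [f"validated {target_name}"]
```

Because validate is abstract, Python refuses to instantiate ValidationSketch itself. Only concrete subclasses such as DummyValidation can be created.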

write_model_scores()

Write the model-scores tables stored in model_scores to Parquet files.

Each target’s model_scores DataFrame is written to a file specified by output_file_names. Directories are created if they do not exist.

Raises:

ValueError – If model_scores is empty.

Return type:

None
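The sketch below covers the empty check and directory creation using only the standard library. The actual serialisation is passed in as a callable, so the logic can be shown without Polars. The function name and signature are hypothetical.

```python
from pathlib import Path
from typing import Any, Callable, Dict


def write_tables(
    tables: Dict[str, Any],
    paths: Dict[str, str],
    write_fn: Callable[[Any, Path], None],
    label: str = "model_scores",
) -> None:
    """Write each target's table to its path, creating parent directories first."""
    if not tables:
        raise ValueError(f"{label} is empty; nothing to write.")
    for target_name, table in tables.items():
        path = Path(paths[target_name])
        path.parent.mkdir(parents=True, exist_ok=True)  # create missing directories
        write_fn(table, path)
```

With Polars DataFrames, write_fn could be lambda df, path: df.write_parquet(path).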

write_reports()

Write the validation results stored in reports to TSV files.

Each target’s report DataFrame is written to a file specified by output_file_names. Directories are created if they do not exist.

Raises:

ValueError – If reports is empty, indicating no validation results are available to write.

Return type:

None
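The sketch below applies the same pattern to TSV output using only the standard library. It assumes each report is a non-empty list of row dicts rather than a Polars DataFrame; with Polars, df.write_csv(path, separator="\t") produces tab-separated output. The function name is hypothetical.

```python
import csv
from pathlib import Path
from typing import Dict, List


def write_reports_tsv(reports: Dict[str, List[dict]], paths: Dict[str, str]) -> None:
    """Write each target's report rows to a tab-separated file with a header line."""
    if not reports:
        raise ValueError("reports is empty; no validation results are available to write.")
    for target_name, rows in reports.items():
        path = Path(paths[target_name])
        path.parent.mkdir(parents=True, exist_ok=True)  # create missing directories
        with path.open("w", newline="") as fh:
            # Column order follows the keys of the first row (rows assumed non-empty).
            writer = csv.DictWriter(fh, fieldnames=list(rows[0]), delimiter="\t", lineterminator="\n")
            writer.writeheader()
            writer.writerows(rows)
```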