aiqclib.common.base package
Submodules
aiqclib.common.base.config_base module
Module for handling YAML-based configuration management.
This module provides the ConfigBase abstract base class, which facilitates loading, validating, and retrieving structured data from YAML configuration files. It uses JSON schemas for validation and supports template-based configuration loading.
- class aiqclib.common.base.config_base.ConfigBase(section_name, config_file, auto_select=False)
Bases: ABC
Abstract base class for loading and accessing YAML configurations.
This class provides a common interface for handling configuration files. It supports loading from a file path or from a built-in template, validating the configuration against a predefined JSON schema, and providing convenient methods to access specific parts of the config.
Subclasses must override the expected_class_name attribute to match the base_class value specified in the YAML configuration.
Note
This is an abstract base class and should not be instantiated directly.
- Variables:
expected_class_name (str, optional) – Must be overridden by subclasses to match the YAML's base_class entry.
section_name (str) – The top-level section of the config this instance manages.
yaml_schema (dict) – The JSON schema used for validating the configuration.
full_config (dict) – The entire configuration loaded from the YAML file.
valid_yaml (bool) – Flag indicating whether the loaded configuration is valid.
data (dict, optional) – The specific configuration dictionary for the selected entry.
dataset_name (str, optional) – The name of the selected dataset or task.
- Parameters:
section_name (str)
config_file (str)
auto_select (bool)
- auto_select()
Automatically validate and select a single configuration entry.
- Raises:
ValueError – If the YAML is invalid or multiple entries exist.
- Returns:
None
- Return type:
None
- expected_class_name = None
- get_base_class(step_name)
Retrieve the associated class name for a specified step.
- Parameters:
step_name (str) – The name of the step.
- Returns:
The class name defined for the step.
- Return type:
str
- get_base_path(step_name)
Retrieve the base path for a given processing step.
- Parameters:
step_name (str) – The name of the step (e.g., "preprocess").
- Returns:
The configured base path.
- Return type:
str
- Raises:
ValueError – If no base path is found.
- get_dataset_folder_name(step_name)
Get the dataset-specific folder name for a given step.
- Parameters:
step_name (str) – The name of the step.
- Returns:
The folder name for the dataset, or an empty string.
- Return type:
str
- get_file_name(step_name, default_name=None)
Retrieve the file name for a given step.
- Parameters:
step_name (str) – The name of the step.
default_name (Optional[str]) – Fallback file name if not defined in config.
- Returns:
The file name for the step.
- Return type:
str
- Raises:
ValueError – If no file name is found and no default is provided.
- get_full_file_name(step_name, default_file_name=None, use_dataset_folder=True, folder_name_auto=True)
Construct a full, normalized file path for a step.
- Parameters:
step_name (str) – The name of the step.
default_file_name (Optional[str]) – Default file name if not in config.
use_dataset_folder (bool) – If True, include dataset folder. Defaults to True.
folder_name_auto (bool) – If True, auto-generate step folder name. Defaults to True.
- Returns:
The complete, normalized file path.
- Return type:
str
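As a rough illustration of how a base path, dataset folder, step folder, and file name could combine into one normalized path, consider the sketch below. The helper name build_full_file_name, the ordering of the path components, and the use of posixpath are assumptions made for illustration only; the actual logic inside ConfigBase may differ.

```python
import posixpath
from typing import Optional


def build_full_file_name(
    base_path: str,
    dataset_folder: str,
    step_folder: str,
    file_name: Optional[str],
    default_file_name: Optional[str] = None,
    use_dataset_folder: bool = True,
) -> str:
    """Hypothetical stand-in: join path components and normalize the result."""
    name = file_name or default_file_name
    if name is None:
        # Mirrors the documented ValueError when no file name can be resolved.
        raise ValueError("No file name found and no default provided.")
    parts = [base_path]
    if use_dataset_folder and dataset_folder:
        parts.append(dataset_folder)
    if step_folder:
        parts.append(step_folder)
    parts.append(name)
    # normpath collapses duplicate separators and "." / ".." segments.
    return posixpath.normpath(posixpath.join(*parts))


# The config supplies no file name here, so the default is used.
full_path = build_full_file_name(
    "/data//runs/", "dataset_a", "preprocess", None, "out.parquet"
)
# With use_dataset_folder=False the dataset folder is left out.
no_dataset = build_full_file_name(
    "/data/runs", "dataset_a", "preprocess", "x.parquet", use_dataset_folder=False
)
```

In this sketch an empty dataset folder is skipped, which matches the documented possibility that get_dataset_folder_name() returns an empty string.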
- get_model_params(model_long_name, model_short_name)
Retrieve the parameters dictionary for a model.
- Parameters:
model_long_name (str) – The long-form name of the model.
model_short_name (str) – The short-form name of the model.
- Returns:
Parameters for the specified model or the whole model param dict.
- Return type:
Dict
- get_normalization_file_name(default_file_name='normalization_stats.yaml')
Resolve the full path of the normalization statistics file.
This file holds the data-derived normalization values (for auto_min_max and standard features). It is written during dataset preparation and read back during classification so that the identical fitted normalization is applied without re-entering values.
The path is resolved through the standard step-path machinery using the logical step name "normalize". The folder defaults to normalize and the file name can be overridden via step_param_sets.steps.normalize.file_name in the configuration.
- Parameters:
default_file_name (str) – File name used when none is set in the config.
- Returns:
The complete, normalized path to the normalization file.
- Return type:
str
- get_step_folder_name(step_name, folder_name_auto=True)
Get the folder name for a specific processing step.
- Parameters:
step_name (str) – The name of the step.
folder_name_auto (bool) – If True, uses step_name as fallback. Defaults to True.
- Returns:
The folder name for the step.
- Return type:
str
- get_step_params(step_name)
Retrieve the parameters dictionary for a specific step.
- Parameters:
step_name (str) – The name of the step.
- Returns:
Parameters for the specified step.
- Return type:
Dict
- Raises:
KeyError – If the step or param set is missing.
- get_summary_stats(stats_name, stats_type='min_max')
Retrieve specific summary statistics parameters from the configuration.
- Parameters:
stats_name (str) – Name of the summary statistics set to retrieve.
stats_type (str) – Type of statistics (e.g., "min_max"). Defaults to "min_max".
- Raises:
ValueError – If the specified stats name is not found.
- Returns:
A dictionary containing the requested statistics.
- Return type:
Dict
- get_target_dict()
Get target variable definitions as a name-keyed dictionary.
- Returns:
Mapping of target names to their definitions.
- Return type:
Dict[str, Dict]
- get_target_file_names(step_name, default_file_name=None, use_dataset_folder=True, folder_name_auto=True)
Construct a dictionary of full file paths for each target variable.
- Parameters:
step_name (str) – The name of the step.
default_file_name (Optional[str]) – Default file name template.
use_dataset_folder (bool) – If True, include dataset folder. Defaults to True.
folder_name_auto (bool) – If True, auto-generate step folder name. Defaults to True.
- Returns:
Dictionary mapping target names to formatted file paths.
- Return type:
Dict[str, str]
- get_target_names()
Get the names of all target variables.
- Returns:
List of target variable names.
- Return type:
List[str]
- get_target_variables()
Get the list of target variable definitions from the configuration.
- Returns:
List of target variable definition dictionaries.
- Return type:
List[Dict]
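The three target accessors return the same information in different shapes: a list of definitions, a list of names, and a name-keyed mapping. The sketch below shows one plausible relationship between them. The "name" and "flag" keys and the helper functions are hypothetical, since the layout of a target definition is not specified here.

```python
from typing import Dict, List

# Hypothetical target definitions; the "name" and "flag" keys are assumed.
target_variables: List[Dict] = [
    {"name": "temp", "flag": "temp_qc"},
    {"name": "psal", "flag": "psal_qc"},
]


def to_target_names(targets: List[Dict]) -> List[str]:
    """List of names, in the order the definitions appear."""
    return [target["name"] for target in targets]


def to_target_dict(targets: List[Dict]) -> Dict[str, Dict]:
    """Name-keyed view of the same definitions."""
    return {target["name"]: target for target in targets}


target_names = to_target_names(target_variables)
target_dict = to_target_dict(target_variables)
```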
- select(dataset_name)
Select and load a specific configuration entry from the YAML.
- Parameters:
dataset_name (str) – The name of the configuration to select.
- Raises:
ValueError – If validation fails or the dataset name is not found.
- Returns:
None
- Return type:
None
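The selection behaviour documented for select() and auto_select() can be pictured with a small stand-in class. This is only a sketch: the ConfigSelector class, the list-of-entries layout, and the "name" key are assumptions, not the library's actual data model.

```python
from typing import Dict, List, Optional


class ConfigSelector:
    """Hypothetical stand-in for the selection logic; not the real ConfigBase."""

    def __init__(self, entries: List[Dict], valid_yaml: bool = True) -> None:
        self.entries = entries
        self.valid_yaml = valid_yaml
        self.data: Optional[Dict] = None
        self.dataset_name: Optional[str] = None

    def select(self, dataset_name: str) -> None:
        """Pick the entry whose name matches, or raise ValueError."""
        if not self.valid_yaml:
            raise ValueError("YAML validation failed.")
        for entry in self.entries:
            if entry.get("name") == dataset_name:
                self.data = entry
                self.dataset_name = dataset_name
                return
        raise ValueError(f"Dataset name '{dataset_name}' not found.")

    def auto_select(self) -> None:
        """Select the only entry; ambiguous (or empty) configs are rejected."""
        if not self.valid_yaml:
            raise ValueError("YAML validation failed.")
        if len(self.entries) != 1:
            # Rejecting an empty list is an assumption of this sketch.
            raise ValueError("auto_select requires exactly one entry.")
        self.select(self.entries[0]["name"])


single = ConfigSelector([{"name": "dataset_a", "path": "/data/a"}])
single.auto_select()
```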
- set_base_class(step_name, value)
Set the associated class name for a specified step.
- Parameters:
step_name (str) – The name of the step.
value (str) – The class name value to set.
- Returns:
None
- Return type:
None
- update_feature_param_with_stats(types=None)
Update feature parameters with corresponding summary statistics in place.
For each feature whose stats_set.type is a scaling type (i.e. not raw), the resolved statistics are looked up in data's feature_stats_set (by name and type) and stored under the feature's stats key, ready for use by the feature classes.
- Parameters:
types (Optional[List[str]]) – If provided, only resolve features whose stats_set.type is in this list. This allows the manually supplied min_max statistics to be resolved at configuration-load time while deferring the data-derived auto_min_max and standard statistics until after the summary statistics have been computed. If None, every non-raw feature is resolved (the historical behaviour).
- Returns:
None
- Return type:
None
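The effect of the types filter can be sketched with plain dictionaries. The data layout below (a list of features, each with a stats_set holding name and type, and a feature_stats_set list with name, type, and stats keys) is a guess made for illustration, as is the function name resolve_feature_stats; only the filtering rule itself follows the description above.

```python
from typing import Dict, List, Optional


def resolve_feature_stats(
    features: List[Dict],
    feature_stats_set: List[Dict],
    types: Optional[List[str]] = None,
) -> None:
    """Attach matching statistics to each non-raw feature, in place."""
    for feature in features:
        stats_set = feature.get("stats_set", {})
        stats_type = stats_set.get("type", "raw")
        if stats_type == "raw":
            continue  # raw features are never scaled
        if types is not None and stats_type not in types:
            continue  # deferred until its statistics are available
        for candidate in feature_stats_set:
            same_name = candidate["name"] == stats_set.get("name")
            if same_name and candidate["type"] == stats_type:
                feature["stats"] = candidate["stats"]
                break


features = [
    {"feature": "f_manual", "stats_set": {"type": "min_max", "name": "manual"}},
    {"feature": "f_fitted", "stats_set": {"type": "standard", "name": "fitted"}},
    {"feature": "f_raw", "stats_set": {"type": "raw"}},
]
feature_stats_set = [
    {"name": "manual", "type": "min_max", "stats": {"min": 0.0, "max": 10.0}},
]

# Resolve only the manually supplied min_max statistics at load time.
resolve_feature_stats(features, feature_stats_set, types=["min_max"])
```

After this call only the first feature carries a stats entry; the standard feature would be resolved by a later call once its fitted statistics exist.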
aiqclib.common.base.dataset_base module
This module defines the abstract base class DataSetBase, which serves as a foundation for implementing various dataset classes.
It provides a common structure for dataset initialization, including validation of the expected_class_name attribute against the provided configuration. Subclasses are expected to override the expected_class_name attribute to match their specific class identifier in the system's configuration.
- class aiqclib.common.base.dataset_base.DataSetBase(step_name, config)
Bases: ABC
Base class for dataset classes.
Subclasses must define an expected_class_name attribute, which is used to validate the YAML entry's step_class_sets.
- Variables:
expected_class_name (str or None) – The expected class name for validation against configuration. This must be overridden by child classes.
step_name (str) – The name of the step identified in the configuration.
config (ConfigBase) – A configuration object that provides the necessary information.
- Parameters:
step_name (str)
config (ConfigBase)
Note
This class extends abc.ABC to indicate that it is an abstract base class.
- config: ConfigBase
- expected_class_name: str | None = None
- step_name: str
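The expected_class_name check can be illustrated with a self-contained sketch. StubConfig, SketchDataSetBase, and InputDataSet are hypothetical names, and the exception types raised here are assumptions; the sketch only relies on the documented idea that the class name configured for a step (as returned by get_base_class()) must match the subclass attribute.

```python
from abc import ABC
from typing import Dict, Optional


class StubConfig:
    """Hypothetical config exposing only get_base_class()."""

    def __init__(self, class_names: Dict[str, str]) -> None:
        self._class_names = class_names

    def get_base_class(self, step_name: str) -> str:
        return self._class_names[step_name]


class SketchDataSetBase(ABC):
    """Stand-in base class that validates the configured class name."""

    expected_class_name: Optional[str] = None

    def __init__(self, step_name: str, config: StubConfig) -> None:
        if self.expected_class_name is None:
            raise NotImplementedError("Subclasses must define expected_class_name.")
        configured = config.get_base_class(step_name)
        if configured != self.expected_class_name:
            raise ValueError(
                f"Configured class '{configured}' does not match "
                f"'{self.expected_class_name}'."
            )
        self.step_name = step_name
        self.config = config


class InputDataSet(SketchDataSetBase):
    expected_class_name = "InputDataSet"


dataset = InputDataSet("input", StubConfig({"input": "InputDataSet"}))
```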
aiqclib.common.base.feature_base module
Standardized Feature Extraction and Scaling Module.
This module defines the FeatureBase abstract base class (ABC), which provides a standardized framework for feature engineering tasks using the Polars library. It ensures that subclasses implement a consistent pipeline for feature extraction and multi-stage scaling.
- class aiqclib.common.base.feature_base.FeatureBase(target_name=None, feature_info=None, selected_profiles=None, filtered_input=None, selected_rows=None, summary_stats=None)
Bases: ABC
Abstract base class for extracting and scaling features.
Child classes must implement all abstract methods to define specific logic for feature generation and normalization. This class serves as a container for the data and metadata required during the transformation lifecycle.
- Variables:
target_name (Optional[str]) – Name of the target variable.
feature_info (Optional[Dict]) – Metadata or configuration for features.
selected_profiles (Optional[DataFrame]) – Polars DataFrame of pre-selected profiles.
filtered_input (Optional[DataFrame]) – Polars DataFrame of pre-filtered input data.
selected_rows (Optional[Dict[str, DataFrame]]) – Mapping of identifiers to specific Polars DataFrames.
summary_stats (Optional[DataFrame]) – Polars DataFrame containing summary statistics.
features (Optional[DataFrame]) – Polars DataFrame containing the processed features.
- Parameters:
target_name (str | None)
feature_info (Dict | None)
selected_profiles (DataFrame | None)
filtered_input (DataFrame | None)
selected_rows (Dict[str, DataFrame] | None)
summary_stats (DataFrame | None)
- abstractmethod extract_features()
Extract features from the provided data sources.
This method must be implemented by subclasses to generate raw features from inputs like filtered_input or selected_rows. The resulting DataFrame should be assigned to self.features.
- Returns:
None
- Return type:
None
aiqclib.common.base.model_base module
This module provides the ModelBase abstract base class, which serves as the foundational interface for all machine learning model implementations within the library. It enforces a consistent structure for building, testing, and persisting models while managing configuration and result storage.
- class aiqclib.common.base.model_base.ModelBase(config)
Bases: ABC
Abstract base class for modeling tasks.
Subclasses must define:
expected_class_name to match the configuration.
The build() method for model building.
The test() method for model testing.
Note
Since this class inherits from abc.ABC, it cannot be directly instantiated and must be subclassed.
- Parameters:
config (ConfigBase)
- abstractmethod build()
Build the model architecture or pipeline.
Subclasses must implement logic to create, configure, and compile the model.
- Return type:
None
- expected_class_name: str | None = None
- load_model(file_name)
Load or deserialize a model from the given file path.
- Parameters:
file_name (str) – The path to the file from which the model will be loaded.
- Raises:
FileNotFoundError – If the specified file does not exist.
ValueError – If the loaded model type does not match the expected class defined by the configuration.
- Return type:
None
- multi = False
- save_model(file_name)
Save or serialize the current model to the provided file path.
- Parameters:
file_name (str) – The path indicating where the model will be saved.
- Return type:
None
- short_name: str | None = None
- abstractmethod test()
Evaluate the model performance on a provided test set or validation data.
Subclasses must implement how the model is used to make predictions and how accuracy or performance measures are computed.
- Return type:
None
- update_model_score()
Update the internal model-scores table with the current test set predictions.
Each row records the model that produced the prediction (method), the fold index (k), the ground truth (label), and the predicted probability (score). The data is stored in the model_score attribute as a Polars DataFrame.
The method column is the lowercased short_name of the model (e.g. "xgb", "dt") and is always present, for both single-model and suite pipelines. This makes the model-scores file self-describing about which model produced each row.
Note that predicted_label is intentionally NOT stored: it is derivable from score and a threshold (score >= threshold), so keeping it would bake in a single threshold and make the file less useful for external threshold-sweeping (ROC/PR analysis). Consumers apply their own threshold to score as needed.
If model_score is already populated (e.g., during cross-validation), the new results are appended (vstacked) to the existing DataFrame.
- Raises:
ValueError – If test_set or predictions is None.
- Return type:
None
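To make the threshold argument concrete, the sketch below stores only method, k, label, and score, then derives predicted labels at two different thresholds. It uses plain Python dictionaries rather than a Polars DataFrame so that it stays dependency-free; the helper names append_scores and predicted_labels are hypothetical.

```python
from typing import Dict, List

# Stand-in for the model-scores table: one dict per prediction row.
model_score: List[Dict] = []


def append_scores(
    rows: List[Dict], short_name: str, k: int, labels: List[int], scores: List[float]
) -> None:
    """Append one fold's results; the method column is the lowercased short name."""
    for label, score in zip(labels, scores):
        rows.append(
            {"method": short_name.lower(), "k": k, "label": label, "score": score}
        )


def predicted_labels(rows: List[Dict], threshold: float) -> List[int]:
    """Derive labels on demand, using the documented rule score >= threshold."""
    return [int(row["score"] >= threshold) for row in rows]


# Two cross-validation folds accumulate in the same table.
append_scores(model_score, "XGB", 1, labels=[1, 0], scores=[0.9, 0.4])
append_scores(model_score, "XGB", 2, labels=[1, 0], scores=[0.6, 0.2])

at_half = predicted_labels(model_score, 0.5)
at_strict = predicted_labels(model_score, 0.7)
```

Because no predicted label is stored, the same four rows yield different classifications at each threshold, which is what an external ROC or precision-recall sweep needs.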
- abstractmethod update_nthreads(model)
Update the number of threads set in the model.
Subclasses must implement logic to update the number of threads.
- Parameters:
model (Self) – The model instance that needs to be updated.
- Returns:
The model instance with updated thread settings.
- Return type:
Self
aiqclib.common.base.scikit_learn_model_base module
This module defines SklearnModelBase, an abstract base class for models that adhere to the Scikit-Learn API (including XGBoost and native sklearn models).
It implements common workflows for data conversion, model building, prediction, reporting, and SHAP value calculation for Explainable AI (XAI).
- class aiqclib.common.base.scikit_learn_model_base.SklearnModelBase(config)
Bases: ModelBase
Abstract base class for Scikit-Learn compatible models.
This class implements the standard lifecycle methods (build(), test(), predict(), create_report()) assuming the underlying model object supports the standard fit, predict, and predict_proba methods.
It also integrates SHAP (SHapley Additive exPlanations) to provide feature importance values. SHAP calculation is controlled by the calculate_shap configuration flag, and can be overridden via self.enable_shap to disable it during computationally heavy steps like k-fold validation.
- Subclasses must implement:
_get_model_class(): To return the specific class type.
- Parameters:
config (ConfigBase)
- build()
Train the classifier using the assigned training set.
- Steps:
Convert the Polars DataFrame (training_set) to Pandas.
Separate features (X) and labels (y).
Initialize the model class provided by _get_model_class() with model_params.
Fit the model.
- Raises:
ValueError – If training_set is None or empty.
- Return type:
None
- calculate_shap()
Calculate SHAP values for the test set based on the specific model type.
It automatically selects an Explainer suited to the model type (TreeExplainer, LinearExplainer, or KernelExplainer). SHAP results are formatted into a Polars DataFrame and stored in shap_values.
- Raises:
ValueError – If test_set or predictions is None.
- Return type:
None
- create_report()
Compute a classification report based on test results.
Calculates precision, recall, f1-score, and support using sklearn.metrics.classification_report(). Stores the result in report.
- Raises:
ValueError – If test_set or predictions is None.
- Return type:
None
- predict()
Generate predictions for the test set using the trained model.
Converts the Polars test set to a Pandas DataFrame, makes predictions, and stores the results in predictions.
- Raises:
ValueError – If test_set is None.
- Return type:
None
- test()
Evaluate the trained classifier on the assigned test set.
- Steps:
Call predict() to generate predictions on the test set.
Call create_report() to compute metrics.
Call update_model_score() to store scores.
Call calculate_shap() to compute feature importances (if enabled).
- Raises:
ValueError – If test_set is None.
- Return type:
None
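The step order listed for test() can be sketched as a small template method. SketchModel is a hypothetical stand-in whose step methods only record that they were called. It also collapses the calculate_shap configuration flag and the enable_shap override into a single boolean, which is a simplification of this sketch.

```python
from typing import List


class SketchModel:
    """Stand-in that records the order in which the evaluation steps run."""

    def __init__(self, enable_shap: bool = True) -> None:
        self.enable_shap = enable_shap
        self.calls: List[str] = []

    def predict(self) -> None:
        self.calls.append("predict")

    def create_report(self) -> None:
        self.calls.append("create_report")

    def update_model_score(self) -> None:
        self.calls.append("update_model_score")

    def calculate_shap(self) -> None:
        self.calls.append("calculate_shap")

    def test(self) -> None:
        """Run the evaluation steps in the documented order."""
        self.predict()
        self.create_report()
        self.update_model_score()
        if self.enable_shap:
            # Skipped when SHAP is disabled, e.g. during k-fold validation.
            self.calculate_shap()


with_shap = SketchModel(enable_shap=True)
with_shap.test()

without_shap = SketchModel(enable_shap=False)
without_shap.test()
```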