aiqclib.common.config package

Submodules

aiqclib.common.config.classify_config module

This module defines the ClassificationConfig class, a configuration handler for the dataset settings used in machine learning classification tasks. It extends ConfigBase to select a dataset from a YAML configuration file and to resolve the sub-configurations that dataset references (e.g., target sets, feature sets, step class definitions).

class aiqclib.common.config.classify_config.ClassificationConfig(config_file, auto_select=False)[source]

Bases: ConfigBase

A configuration class for retrieving and organizing dataset-related configurations specific to classification tasks.

Extends aiqclib.common.base.config_base.ConfigBase by adding logic to select datasets from YAML-based configuration files. The selected dataset references various sub-configurations (e.g., target sets, feature sets, and step class definitions). These references are resolved and stored within data.

Parameters:
  • config_file (str)

  • auto_select (bool)

expected_class_name: str = 'ClassificationConfig'

The class name this configuration expects to find in the YAML definition. Used by aiqclib.common.base.config_base.ConfigBase to validate that the class matches the YAML.

select(dataset_name)[source]

Choose a dataset by name and load its sub-configuration items (e.g., target sets, feature sets) into data.

This method retrieves multiple related configurations by calling aiqclib.common.utils.config.get_config_item() on relevant sections of the YAML file. It expects that the initial self.data population from super().select contains references to these sub-configurations, which are then resolved.

Parameters:

dataset_name (str) – The name (key) of the desired dataset in the YAML’s “classification_sets” dictionary.

Raises:

KeyError – Raised in any of the following cases:

  • dataset_name is not present in the “classification_sets” section of the YAML.

  • A required sub-configuration key (e.g., “target_set”, “feature_set”) is missing from the selected dataset configuration itself.

  • A referenced sub-configuration name (e.g., the value of “target_set” within the selected dataset) is not found in its corresponding top-level section (e.g., “target_sets”).

Returns:

None

Return type:

None
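The reference resolution described above can be sketched with plain dictionaries. This is a minimal illustration, not the library's implementation: resolve_dataset, REFERENCE_SECTIONS, and every name inside CONFIG are hypothetical, and the real method delegates to aiqclib.common.utils.config.get_config_item(), whose signature is not shown here.

```python
from copy import deepcopy

# Hypothetical parsed YAML content; section contents and entry names are illustrative only.
CONFIG = {
    "target_sets": {"targets_a": {"variables": ["temp", "psal"]}},
    "feature_sets": {"features_a": {"features": ["location", "profile_summary"]}},
    "classification_sets": {
        "demo": {"target_set": "targets_a", "feature_set": "features_a"},
    },
}

# Assumed mapping from a reference key in the dataset entry
# to the top-level section that the reference points into.
REFERENCE_SECTIONS = {"target_set": "target_sets", "feature_set": "feature_sets"}


def resolve_dataset(config, dataset_name):
    """Return a copy of the dataset entry with its named references resolved."""
    # KeyError here means the dataset name is unknown.
    entry = deepcopy(config["classification_sets"][dataset_name])
    for key, section in REFERENCE_SECTIONS.items():
        ref_name = entry[key]  # KeyError: required key missing from the entry
        entry[key] = deepcopy(config[section][ref_name])  # KeyError: dangling reference
    return entry


resolved = resolve_dataset(CONFIG, "demo")
```

After the call, resolved["target_set"] holds the referenced dictionary rather than the name "targets_a". The three lookups correspond to the three KeyError cases listed under Raises.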

aiqclib.common.config.dataset_config module

This module defines the DataSetConfig class, a specialized configuration handler for managing dataset-specific settings within a larger YAML configuration structure.

It extends aiqclib.common.base.config_base.ConfigBase to provide interfaces for selecting and resolving dataset-related configurations such as target sets, feature sets, and step class definitions from a hierarchical configuration file.

class aiqclib.common.config.dataset_config.DataSetConfig(config_file, auto_select=False)[source]

Bases: ConfigBase

A configuration class that provides dataset-related configuration interfaces.

This class extends ConfigBase with handling for one or more dataset-specific YAML sections, mapping them to container dictionaries within data. The selected dataset name is used to look up configurations for target sets, feature sets, step classes, etc.

Note

expected_class_name must match the YAML’s base_class if instantiated directly.

Parameters:
  • config_file (str)

  • auto_select (bool)

expected_class_name: str = 'DataSetConfig'

The class name expected by the configuration. Used by ConfigBase to validate consistency with the YAML data.
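One way such a consistency check could work is sketched below. The stand-in classes, the top-level position of base_class in the parsed YAML, and the choice of ValueError are all assumptions made for illustration; how ConfigBase actually performs the check is not documented in this section.

```python
class ConfigBaseSketch:
    """Stand-in for ConfigBase; not the library's implementation."""

    expected_class_name = None  # subclasses override this

    def __init__(self, full_config):
        # Assumption: the parsed YAML carries a top-level "base_class" entry.
        declared = full_config.get("base_class")
        if declared != self.expected_class_name:
            raise ValueError(
                f"config declares base_class={declared!r}, "
                f"but {type(self).__name__} expects {self.expected_class_name!r}"
            )
        self.full_config = full_config


class DataSetConfigSketch(ConfigBaseSketch):
    expected_class_name = "DataSetConfig"


# A matching base_class is accepted.
ok = DataSetConfigSketch({"base_class": "DataSetConfig", "data_sets": {}})

# A mismatching base_class is rejected.
try:
    DataSetConfigSketch({"base_class": "TrainingConfig"})
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)
```

The point of the sketch is only that a YAML file written for one configuration class would be rejected early when loaded with another.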

select(dataset_name)[source]

Select a dataset entry by name from data_sets in the YAML config, then retrieve related configuration items (e.g., target_set, feature_set, etc.).

This method populates data with relevant sub-configurations by calling aiqclib.common.utils.config.get_config_item() on specified fields.

Parameters:

dataset_name (str) – The key name of the dataset to select from the YAML.

Raises:

KeyError – If the dataset name does not exist in the YAML’s data_sets dictionary.

Return type:

None

aiqclib.common.config.training_config module

This module defines the TrainingConfig class, which is responsible for managing and accessing training-related configurations from a YAML file.

It extends aiqclib.common.base.config_base.ConfigBase to provide structured access to dataset settings, including targets, step classes, and step parameters, by resolving references within the configuration.

class aiqclib.common.config.training_config.TrainingConfig(config_file, auto_select=False)[source]

Bases: ConfigBase

A configuration class providing interfaces for training dataset settings.

Inherits from ConfigBase and works with the “training_sets” section of the YAML configuration. Calling select() initializes and fetches the sub-configurations (e.g., target sets, step parameters) for the chosen dataset.

Note

expected_class_name must match the YAML’s base_class property if you intend to instantiate this class directly from config.

Parameters:
  • config_file (str)

  • auto_select (bool)

expected_class_name: str = 'TrainingConfig'

The class name expected by ConfigBase for consistency checks when instantiating TrainingConfig from YAML.

select(dataset_name)[source]

Select a named dataset from the training_sets configuration, retrieving nested configurations for targets, step classes, and step parameters.

After calling select(), sub-keys (target_set, step_class_set, etc.) are populated from their respective config dictionaries by resolving their references within the full configuration.

Parameters:

dataset_name (str) – The key name of the dataset to select within data (which references the training_sets section).

Raises:

KeyError – If dataset_name is not found within the training_sets dictionary.

Return type:

None
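Because references are resolved only when select() is called, it can be useful to check a parsed configuration for references that would not resolve before running a pipeline. The helper below is a hypothetical sketch, not part of aiqclib; the TRAINING_REFERENCES mapping is an assumption based on the keys named above and may not cover every reference the library resolves.

```python
# Assumed mapping from reference keys in a training_sets entry to top-level sections.
TRAINING_REFERENCES = {
    "target_set": "target_sets",
    "step_class_set": "step_class_sets",
    "step_param_set": "step_param_sets",
}


def find_unresolved(config):
    """List (dataset, key, problem) tuples for references that would not resolve."""
    problems = []
    for dataset_name, entry in config.get("training_sets", {}).items():
        for key, section in TRAINING_REFERENCES.items():
            if key not in entry:
                problems.append((dataset_name, key, "missing key"))
            elif entry[key] not in config.get(section, {}):
                problems.append((dataset_name, key, f"no {entry[key]!r} in {section}"))
    return problems


# Hypothetical parsed YAML; all names are illustrative.
config = {
    "target_sets": {"targets_a": {}},
    "step_class_sets": {"classes_a": {}},
    "step_param_sets": {},
    "training_sets": {
        "demo": {
            "target_set": "targets_a",
            "step_class_set": "classes_a",
            "step_param_set": "params_a",  # not defined in step_param_sets
        }
    },
}

problems = find_unresolved(config)
```

Here problems contains a single entry for the step_param_set reference of "demo", since "params_a" is not defined in step_param_sets.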

aiqclib.common.config.yaml_schema module

Module providing JSON schemas, written in YAML, for validating dataset, training, and classification configuration files. Each function returns a YAML string describing the structure and constraints of one configuration type.

aiqclib.common.config.yaml_schema.get_classification_config_schema()[source]

Retrieve the YAML-based JSON schema for classification configurations.

The returned schema requires certain objects and properties (e.g., path_info_sets, target_sets, feature_sets, etc.), each with nested type constraints, and disallows additional properties where appropriate.

Returns:

A YAML string representing the JSON schema for classification configurations.

Return type:

str

aiqclib.common.config.yaml_schema.get_data_set_config_schema()[source]

Retrieve the YAML-based JSON schema for dataset configurations.

The returned schema requires certain objects and properties (e.g., path_info_sets, target_sets, feature_sets, etc.), each with nested type constraints, and disallows additional properties where appropriate.

Returns:

A YAML string representing the JSON schema for dataset configurations.

Return type:

str

aiqclib.common.config.yaml_schema.get_training_config_schema()[source]

Retrieve the YAML-based JSON schema for training configurations.

The returned schema specifies required objects and properties under categories such as path_info_sets, target_sets, step_class_sets, step_param_sets, and training_sets. Additional properties are disallowed to ensure constraints remain strict.

Returns:

A YAML string representing the JSON schema for training configurations.

Return type:

str
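To illustrate what "required properties" and "additional properties disallowed" mean in practice, the sketch below checks a parsed configuration against a heavily reduced, hypothetical schema using only the standard library. The function check_object and the schema contents are assumptions for illustration; the real schemas are returned as YAML strings and would first need to be parsed, and a full JSON Schema validator (for example the third-party jsonschema package) covers far more than these two keywords.

```python
def check_object(instance, schema, path="config"):
    """Collect violations of 'required' and 'additionalProperties: false' recursively."""
    errors = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in instance:
            errors.append(f"{path}: missing required property {name!r}")
    if schema.get("additionalProperties") is False:
        for name in instance:
            if name not in properties:
                errors.append(f"{path}: unexpected property {name!r}")
    # Descend into nested objects that the schema describes.
    for name, subschema in properties.items():
        if name in instance and subschema.get("type") == "object":
            if isinstance(instance[name], dict):
                errors.extend(check_object(instance[name], subschema, f"{path}.{name}"))
            else:
                errors.append(f"{path}.{name}: expected an object")
    return errors


# Hypothetical, heavily reduced schema, shown already parsed into a dict.
schema = {
    "type": "object",
    "required": ["target_sets", "training_sets"],
    "additionalProperties": False,
    "properties": {
        "target_sets": {"type": "object"},
        "training_sets": {"type": "object"},
    },
}

errors = check_object({"target_sets": {}, "extra": 1}, schema)
```

The example configuration produces two errors: training_sets is missing, and extra is not an allowed property.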

aiqclib.common.config.yaml_templates module

Module providing YAML templates for dataset preparation, training, and classification configurations. These templates can be customized to fit various data pipeline requirements.

aiqclib.common.config.yaml_templates.get_config_classify_set_full_template()[source]

Retrieve a YAML template string for classification configurations with normalization.

This template includes:

  • path_info_sets: specifying common, input, model, and concatenation paths.

  • target_sets: defining which variables to process and their flags.

  • summary_stats_sets: defining summary statistics.

  • feature_sets: listing named sets of feature extraction modules.

  • feature_param_sets: detailing parameters for each feature.

  • feature_stats_sets: detailing methods and stats for normalization.

  • step_class_sets: referencing classes for each classification step (e.g., input, summary, select, locate, extract, model, classify, concat).

  • step_param_sets: referencing parameters for the classification steps.

  • classification_sets: referencing specific dataset folders, files, and associated configuration sets (e.g., step_class_set, step_param_set).

Returns:

A string containing the YAML template.

Return type:

str

aiqclib.common.config.yaml_templates.get_config_classify_set_template()[source]

Retrieve a YAML template string for classification configurations.

This template includes:

  • path_info_sets: specifying common, input, model, and concatenation paths.

  • target_sets: defining which variables to process and their flags.

  • summary_stats_sets: defining summary statistics.

  • feature_sets: listing named sets of feature extraction modules.

  • feature_param_sets: detailing parameters for each feature.

  • feature_stats_sets: detailing methods and stats for normalization.

  • step_class_sets: referencing classes for each classification step (e.g., input, summary, select, locate, extract, model, classify, concat).

  • step_param_sets: referencing parameters for the classification steps.

  • classification_sets: referencing specific dataset folders, files, and associated configuration sets (e.g., step_class_set, step_param_set).

Returns:

A string containing the YAML template.

Return type:

str

aiqclib.common.config.yaml_templates.get_config_data_set_all_template()[source]

Retrieve a YAML template string for dataset preparation configurations with ‘All’ step variants.

This template includes:

  • path_info_sets: specifying common, input, and split paths.

  • target_sets: defining which variables to process and their flags.

  • summary_stats_sets: defining summary statistics.

  • feature_sets: listing named sets of feature extraction modules.

  • feature_param_sets: detailing parameters for each feature.

  • feature_stats_sets: detailing methods and stats for normalization.

  • step_class_sets: referencing classes for each preparation step (e.g., input, summary, select, locate, extract, split) with ‘All’ variants.

  • step_param_sets: referencing parameters for the preparation steps with ‘All’ variants.

  • data_sets: referencing specific dataset folders, files, and associated configuration sets (e.g., step_class_set, step_param_set).

Returns:

A string containing the YAML template.

Return type:

str

aiqclib.common.config.yaml_templates.get_config_data_set_full_template()[source]

Retrieve a YAML template string for dataset preparation configurations with normalization.

This template includes:

  • path_info_sets: specifying common, input, and split paths.

  • target_sets: defining which variables to process and their flags.

  • summary_stats_sets: defining summary statistics.

  • feature_sets: listing named sets of feature extraction modules.

  • feature_param_sets: detailing parameters for each feature.

  • feature_stats_sets: detailing methods and stats for normalization.

  • step_class_sets: referencing classes for each preparation step (e.g., input, summary, select, locate, extract, split).

  • step_param_sets: referencing parameters for the preparation steps.

  • data_sets: referencing specific dataset folders, files, and associated configuration sets (e.g., step_class_set, step_param_set).

Returns:

A string containing the YAML template.

Return type:

str

aiqclib.common.config.yaml_templates.get_config_data_set_template()[source]

Retrieve a YAML template string for dataset preparation configurations.

This template includes:

  • path_info_sets: specifying common, input, and split paths.

  • target_sets: defining which variables to process and their flags.

  • summary_stats_sets: defining summary statistics.

  • feature_sets: listing named sets of feature extraction modules.

  • feature_param_sets: detailing parameters for each feature.

  • feature_stats_sets: detailing methods and stats for normalization.

  • step_class_sets: referencing classes for each preparation step (e.g., input, summary, select, locate, extract, split).

  • step_param_sets: referencing parameters for the preparation steps.

  • data_sets: referencing specific dataset folders, files, and associated configuration sets (e.g., step_class_set, step_param_set).

Returns:

A string containing the YAML template.

Return type:

str

aiqclib.common.config.yaml_templates.get_config_train_set_template()[source]

Retrieve a YAML template string for training configurations.

This template includes:

  • path_info_sets: specifying common paths and subfolders for input, validate, and build.

  • target_sets: defining variables and associated flags for training.

  • step_class_sets: mapping each step (input, validate, model, build) to corresponding Python class names.

  • step_param_sets: detailing optional parameters for each training step.

  • training_sets: referencing specific dataset folders, the path_info used, the target set, and which step_class_set and step_param_set apply.

Returns:

A string containing the YAML template.

Return type:

str
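Since each function in this module returns the template as a string, a typical first step is to save it to a file and edit it. The helper below is a hypothetical sketch, not part of aiqclib; write_template and the placeholder text are assumptions, and in real use the text would come from one of the template functions listed above.

```python
from pathlib import Path
import tempfile


def write_template(template_text, destination):
    """Write template text to destination, refusing to overwrite an existing file."""
    destination = Path(destination)
    if destination.exists():
        raise FileExistsError(f"{destination} already exists; not overwriting")
    destination.parent.mkdir(parents=True, exist_ok=True)
    destination.write_text(template_text, encoding="utf-8")
    return destination


# Placeholder text standing in for the string a template function returns.
placeholder_template = "training_sets: {}\n"

with tempfile.TemporaryDirectory() as tmp:
    target = write_template(placeholder_template, Path(tmp) / "configs" / "train.yaml")
    written = target.read_text(encoding="utf-8")
    # A second write to the same path is refused, protecting edited configs.
    try:
        write_template(placeholder_template, target)
        refused = False
    except FileExistsError:
        refused = True
```

Refusing to overwrite is a deliberate choice in this sketch: a customized configuration file is usually more valuable than a fresh template.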