aiqclib.classify.step7_concat_datasets package

Submodules

aiqclib.classify.step7_concat_datasets.concat_base module

Module for concatenating input datasets with machine learning model predictions.

This module provides the ConcatDatasetsBase class, which merges the original input features with the predictions generated for one or more targets and saves the result to a Parquet file.

class aiqclib.classify.step7_concat_datasets.concat_base.ConcatDatasetsBase(config, input_data=None, predictions=None)[source]

Bases: DataSetBase

Abstract base class for concatenating predictions and the original dataset.

Inherits from DataSetBase to ensure configuration consistency. The concatenated dataset, once generated, can be written to Parquet files.

Parameters:

config (ConfigBase)
input_data (DataFrame | None)
predictions (Dict[str, DataFrame] | None)

default_file_name: str: The default pattern to use when writing feature files for each target.

merge_predictions()[source]

Merges the input data with the predictions for each target into a single Polars DataFrame.

The method concatenates the per-target prediction DataFrames and joins them with the original input data on the common identifier columns (‘platform_code’, ‘profile_no’, ‘observation_no’). To avoid name collisions, the ‘label’, ‘predicted_label’, and ‘score’ columns from each target’s predictions are renamed to include the target key (e.g., ‘targetA_label’, ‘targetA_predicted’).

The result is stored in the merged_predictions attribute.

Raises: ValueError – If predictions or input_data is None when this method is called.
Returns: None
Return type: None

output_file_name: str: Output file name to store the concatenated dataset

predictions: Dict[str, DataFrame] | None: A dict of Polars DataFrames, one per target, containing classification results.

write_merged_predictions()[source]

Writes the merged predictions DataFrame to a Parquet file.

The output directory is created if it does not exist. The file path is determined by output_file_name.

Raises: ValueError – If merged_predictions is None when this method is called.
Returns: None
Return type: None

aiqclib.classify.step7_concat_datasets.dataset_all module

This module defines the ConcatDataSetAll class, which extends ConcatDatasetsBase to concatenate model predictions with the original input dataset. It is designed for use in the classification and merging steps of a larger data quality control (DQC) workflow.

class aiqclib.classify.step7_concat_datasets.dataset_all.ConcatDataSetAll(config, input_data=None, predictions=None)[source]

Bases: ConcatDatasetsBase

A subclass of ConcatDatasetsBase to concatenate predictions and the input dataset.

This class sets its expected_class_name to "ConcatDataSetAll", ensuring it is recognized in the YAML configuration as a valid processing class. It inherits the concatenation pipeline from ConcatDatasetsBase.

Variables:

expected_class_name (str) – The identifier used for configuration mapping.

Parameters:

config (ConfigBase)
input_data (DataFrame | None)
predictions (Dict[str, DataFrame] | None)

expected_class_name: str = 'ConcatDataSetAll'

aiqclib.classify.step7_concat_datasets.dataset_suite module

This module provides the ConcatDataSetSuite class, which is responsible for merging multi-method model predictions into a wide-format dataset aligned with the original input data.

class aiqclib.classify.step7_concat_datasets.dataset_suite.ConcatDataSetSuite(config, input_data=None, predictions=None)[source]

Bases: ConcatDatasetsBase

A subclass of ConcatDatasetsBase to concatenate multi-method predictions and the input dataset.

This class handles predictions containing a ‘method’ column, expanding them into a wide format where each method’s predictions and scores become separate columns formatted as {method}_{target}_predicted and {method}_{target}_score.

Variables:

expected_class_name (str) – The name of the class used for validation or logging.

Parameters:

config (ConfigBase)
input_data (DataFrame | None)
predictions (Dict[str, DataFrame] | None)

expected_class_name: str = 'ConcatDataSetSuite'

merge_predictions()[source]

Merges the input data with the multi-method predictions for each target into a single wide Polars DataFrame.

The method pivots the ‘method’ column into distinct prediction and score columns for each algorithm. It uses the following column naming convention:

{key}_label

{method}_{key}_predicted

{method}_{key}_score

The result is stored in the merged_predictions attribute.

Raises: ValueError – If predictions or input_data is None.
Returns: None
Return type: None