aiqclib.classify.step7_concat_datasets package
Submodules
aiqclib.classify.step7_concat_datasets.concat_base module
Module for concatenating input datasets with machine learning model predictions.
This module provides the ConcatDatasetsBase class, which handles the logic
of merging original input features with generated predictions from one or more
targets and saving the result to a Parquet file.
- class aiqclib.classify.step7_concat_datasets.concat_base.ConcatDatasetsBase(config, input_data=None, predictions=None)
Bases: DataSetBase
Abstract base class for concatenating predictions and the original dataset.
Inherits from DataSetBase to ensure configuration consistency. Once generated, the concatenated dataset can be written to a Parquet file.
- Parameters:
config (ConfigBase)
input_data (DataFrame | None)
predictions (Dict[str, DataFrame] | None)
- default_file_name: str
The default pattern to use when writing feature files for each target.
- merge_predictions()
Merges the input data with the predictions for each target into a single Polars DataFrame.
The method concatenates individual prediction DataFrames (one per target) and then joins them with the original input data based on common identifier columns (‘platform_code’, ‘profile_no’, ‘observation_no’). The ‘label’, ‘predicted_label’, and ‘score’ columns from each target’s predictions are renamed to include the target key (e.g., ‘targetA_label’, ‘targetA_predicted’) to avoid name collisions.
The result is stored in the merged_predictions attribute.
- Raises:
ValueError – If predictions or input_data is None when this method is called.
- Returns:
None
- Return type:
None
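As a rough illustration of the merge described above, the following plain-Python sketch uses lists of dicts in place of Polars DataFrames. It assumes a left join onto the input rows, unique identifiers per (platform_code, profile_no, observation_no), and a renamed score column of the form {target}_score by analogy with the documented label and predicted names; none of this is taken from the aiqclib source.

```python
from typing import Dict, List, Optional, Tuple

# Identifier columns documented for the join.
ID_COLS = ("platform_code", "profile_no", "observation_no")
# Original prediction column -> suffix appended after "{target}_".
# "score" -> "{target}_score" is an assumption; the docs only show label/predicted.
RENAME = {"label": "label", "predicted_label": "predicted", "score": "score"}

Row = Dict[str, object]


def merge_predictions(
    input_data: Optional[List[Row]],
    predictions: Optional[Dict[str, List[Row]]],
) -> List[Row]:
    """Left-join renamed per-target prediction columns onto the input rows."""
    if input_data is None or predictions is None:
        raise ValueError("input_data and predictions must be set before merging")

    # Gather renamed prediction columns per identifier tuple, across all targets.
    extra: Dict[Tuple[object, ...], Row] = {}
    for target, rows in predictions.items():
        for row in rows:
            ident = tuple(row[c] for c in ID_COLS)
            renamed = {f"{target}_{new}": row[old] for old, new in RENAME.items()}
            extra.setdefault(ident, {}).update(renamed)

    # Every output row gets every new column; unmatched rows get None (like a null).
    new_cols = [f"{t}_{suffix}" for t in predictions for suffix in RENAME.values()]
    merged: List[Row] = []
    for row in input_data:
        ident = tuple(row[c] for c in ID_COLS)
        filled = {c: None for c in new_cols}
        filled.update(extra.get(ident, {}))
        merged.append({**row, **filled})
    return merged
```

In Polars, the same shape would typically come from renaming each target's columns and chaining DataFrame.join(..., on=[...], how="left") onto the input data.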
- output_file_name: str
Output file name under which the concatenated dataset is stored.
- predictions: Dict[str, DataFrame] | None
A dict of Polars DataFrames, one per target, containing classification results.
- write_merged_predictions()
Writes the merged predictions DataFrame to a Parquet file.
The output directory is created if it does not exist. The file path is determined by output_file_name.
- Raises:
ValueError – If merged_predictions is None when this method is called.
- Returns:
None
- Return type:
None
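The write step amounts to a None guard, directory creation, and a Parquet write. The sketch below mirrors that sequence with pathlib. The write callable is a hypothetical parameter standing in for a real Parquet writer (for a Polars DataFrame, something like lambda df, p: df.write_parquet(p)), so the guard and directory logic can be shown without a third-party dependency; it is not aiqclib's implementation.

```python
from pathlib import Path
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def write_merged_predictions(
    merged: Optional[T],
    output_file_name: str,
    write: Callable[[T, Path], object],
) -> None:
    """Refuse to write before merging, create the parent directory, then write."""
    if merged is None:
        raise ValueError("merged_predictions is None; call merge_predictions() first")
    path = Path(output_file_name)
    # Equivalent of "create the output directory if it does not exist".
    path.parent.mkdir(parents=True, exist_ok=True)
    # Delegate the actual serialisation (e.g. a Parquet writer) to the caller.
    write(merged, path)
```

Keeping the writer injectable makes the directory and guard behaviour testable on its own; a real pipeline would simply call the DataFrame's own Parquet writer at that point.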
aiqclib.classify.step7_concat_datasets.dataset_all module
This module defines the ConcatDataSetAll class, which extends ConcatDatasetsBase to facilitate the concatenation of model predictions with the original input dataset. It is designed to integrate into a larger data quality control (DQC) workflow, specifically within the classification and merging steps.
- class aiqclib.classify.step7_concat_datasets.dataset_all.ConcatDataSetAll(config, input_data=None, predictions=None)
Bases: ConcatDatasetsBase
A subclass of ConcatDatasetsBase that concatenates predictions with the input dataset.
This class sets its expected_class_name to "ConcatDataSetAll", ensuring it is recognized in the YAML configuration as a valid processing class. It inherits the concatenation pipeline from ConcatDatasetsBase.
- Variables:
expected_class_name (str) – The identifier used for configuration mapping.
- Parameters:
config (ConfigBase)
input_data (DataFrame | None)
predictions (Dict[str, DataFrame] | None)
- expected_class_name: str = 'ConcatDataSetAll'
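One common way to map a YAML class name onto a Python class is a registry keyed by expected_class_name. The sketch below is hypothetical: the stub classes, the REGISTRY dict, the resolve function, and the class_name config key are illustrative only and do not describe how aiqclib actually performs the lookup.

```python
from typing import Dict, Type


class StubConcatBase:
    """Hypothetical stand-in for ConcatDatasetsBase."""
    expected_class_name: str = ""


class StubConcatAll(StubConcatBase):
    # Same identifier value as the documented ConcatDataSetAll.
    expected_class_name = "ConcatDataSetAll"


class StubConcatSuite(StubConcatBase):
    # Same identifier value as the documented ConcatDataSetSuite.
    expected_class_name = "ConcatDataSetSuite"


# Map each identifier to its class so a config string can select the class.
REGISTRY: Dict[str, Type[StubConcatBase]] = {
    cls.expected_class_name: cls for cls in (StubConcatAll, StubConcatSuite)
}


def resolve(step_config: Dict[str, str]) -> Type[StubConcatBase]:
    """Look up the class named in a (hypothetical) parsed YAML step section."""
    name = step_config["class_name"]
    try:
        return REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown processing class: {name!r}") from None
```

Tying the lookup to a class attribute keeps the configuration string and the class definition in one place, so renaming a class does not silently break the mapping.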
aiqclib.classify.step7_concat_datasets.dataset_suite module
This module provides the ConcatDataSetSuite class, which is responsible for merging multi-method model predictions into a wide-format dataset aligned with the original input data.
- class aiqclib.classify.step7_concat_datasets.dataset_suite.ConcatDataSetSuite(config, input_data=None, predictions=None)
Bases: ConcatDatasetsBase
A subclass of ConcatDatasetsBase that concatenates multi-method predictions with the input dataset.
This class handles predictions containing a ‘method’ column, expanding them into a wide format where each method’s predictions and scores become separate columns named {method}_{target}_predicted and {method}_{target}_score.
- Variables:
expected_class_name (str) – The name of the class used for validation or logging.
- Parameters:
config (ConfigBase)
input_data (DataFrame | None)
predictions (Dict[str, DataFrame] | None)
- expected_class_name: str = 'ConcatDataSetSuite'
- merge_predictions()
Merges the input data with the multi-method predictions for each target into a single wide Polars DataFrame.
The method pivots the ‘method’ column into distinct prediction and score columns for each algorithm, using the following column naming convention: {key}_label, {method}_{key}_predicted, and {method}_{key}_score.
The result is stored in the merged_predictions attribute.
- Raises:
ValueError – If predictions or input_data is None.
- Returns:
None
- Return type:
None
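The long-to-wide reshaping can be pictured with the plain-Python sketch below, which uses lists of dicts in place of Polars DataFrames. It assumes each long-format row carries the identifier columns plus method, label, predicted_label, and score, that label is the same for every method, and that the method names used in practice are arbitrary strings; the function names are hypothetical and not part of aiqclib.

```python
from typing import Dict, List, Optional, Tuple

ID_COLS = ("platform_code", "profile_no", "observation_no")
Row = Dict[str, object]
Ident = Tuple[object, ...]


def pivot_methods(key: str, rows: List[Row]) -> Dict[Ident, Row]:
    """Turn one-row-per-(id, method) predictions into one wide row per id."""
    wide: Dict[Ident, Row] = {}
    for row in rows:
        ident = tuple(row[c] for c in ID_COLS)
        out = wide.setdefault(ident, {})
        # Assumed identical across methods, so it is stored once per target.
        out[f"{key}_label"] = row["label"]
        method = row["method"]
        out[f"{method}_{key}_predicted"] = row["predicted_label"]
        out[f"{method}_{key}_score"] = row["score"]
    return wide


def merge_suite(
    input_data: Optional[List[Row]],
    predictions: Optional[Dict[str, List[Row]]],
) -> List[Row]:
    """Attach every target's pivoted columns to the matching input rows."""
    if input_data is None or predictions is None:
        raise ValueError("input_data and predictions must be set before merging")
    per_id: Dict[Ident, Row] = {}
    for key, rows in predictions.items():
        for ident, cols in pivot_methods(key, rows).items():
            per_id.setdefault(ident, {}).update(cols)
    return [
        {**row, **per_id.get(tuple(row[c] for c in ID_COLS), {})}
        for row in input_data
    ]
```

In Polars this reshaping would typically be done with DataFrame.pivot over the method column. Here, input rows without any prediction simply gain no extra keys, whereas a Polars left join would fill those columns with nulls.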