aiqclib.train.step1_read_input package

Submodules

aiqclib.train.step1_read_input.dataset_a module

This module provides a specialized input class, InputTrainingSetA, designed for reading training and test datasets specific to Copernicus CTD data. It extends aiqclib.train.step1_read_input.input_base.InputTrainingSetBase to handle particular data configuration and validation requirements.

class aiqclib.train.step1_read_input.dataset_a.InputTrainingSetA(config)[source]

Bases: InputTrainingSetBase

A specialized input class for reading training and test sets for Copernicus CTD data.

This class extends aiqclib.train.step1_read_input.input_base.InputTrainingSetBase for Copernicus CTD datasets. It sets expected_class_name to "InputTrainingSetA" so that the configuration validation in the parent class can match the base_class value specified in the YAML configuration.

Parameters: config (ConfigBase)

expected_class_name: str = 'InputTrainingSetA'
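The name check described above can be pictured with a small stand-in. This is only a sketch, not the aiqclib implementation: the classes DataSetBaseSketch and InputTrainingSetASketch, the plain dict used as a config, and the ValueError raised on a mismatch are all assumptions made for illustration. Only the attribute name expected_class_name, the base_class key, and the NotImplementedError come from this page.

```python
class DataSetBaseSketch:
    """Hypothetical stand-in for the parent class; not the real aiqclib code."""

    # Subclasses are expected to override this with their own name.
    expected_class_name = None

    def __init__(self, config: dict):
        if self.expected_class_name is None:
            raise NotImplementedError(
                "expected_class_name must be defined before instantiation"
            )
        # Assumption: the YAML value is available under a "base_class" key.
        base_class = config.get("base_class")
        if base_class != self.expected_class_name:
            # Assumption: the real library may report a mismatch differently.
            raise ValueError(
                f"base_class {base_class!r} does not match "
                f"{self.expected_class_name!r}"
            )
        self.config = config


class InputTrainingSetASketch(DataSetBaseSketch):
    """Hypothetical subclass mirroring how InputTrainingSetA sets its name."""

    expected_class_name = "InputTrainingSetA"


# A config whose base_class matches the class attribute passes the check.
dataset = InputTrainingSetASketch({"base_class": "InputTrainingSetA"})
```

Keeping the expected name as a class attribute means a subclass opts in to validation with a single line, which matches how InputTrainingSetA is documented here.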

aiqclib.train.step1_read_input.input_base module

This module defines the InputTrainingSetBase class, which serves as a base for importing pre-split training and test datasets. It leverages a training-specific configuration to identify and load Parquet files into Polars DataFrames, managing both training and test sets for multiple targets.

class aiqclib.train.step1_read_input.input_base.InputTrainingSetBase(config)[source]

Bases: DataSetBase

A base class for importing pre-split training and test data sets, leveraging the training-specific configuration (ConfigBase).

This class extends DataSetBase to ensure that the given YAML configuration is valid for the step named "input". It provides logic for iterating over targets, identifying the Parquet files for each, and reading them into memory.

Note

Because this class inherits from DataSetBase, instantiating it directly requires an expected_class_name that matches the YAML's base_class, defined either on this class or on a subclass. Otherwise, DataSetBase may raise a NotImplementedError.

Parameters: config (ConfigBase)

default_file_names: Dict[str, str]: Default file naming patterns for train/test sets. The substring {target_name} will be replaced dynamically.

input_file_names: Dict[str, Dict[str, str]]

A mapping of “train” and “test” to dictionaries of target-specific file names.

Example format:

{
    "train": {"targetA": "path/to/targetA_train.parquet", ...},
    "test":  {"targetA": "path/to/targetA_test.parquet", ...}
}
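As a rough illustration of how default_file_names patterns could expand into the input_file_names structure shown above, the sketch below substitutes {target_name} with str.format. The helper build_input_file_names, the pattern strings, the folder "data/input", and the target names are all hypothetical; the actual defaults and the way aiqclib resolves folders are not shown on this page.

```python
from pathlib import Path
from typing import Dict, List


def build_input_file_names(
    folder: str, target_names: List[str], patterns: Dict[str, str]
) -> Dict[str, Dict[str, str]]:
    """Expand one pattern per split into a per-target mapping of file paths."""
    return {
        split: {
            # Replace the {target_name} placeholder for each target.
            name: str(Path(folder) / pattern.format(target_name=name))
            for name in target_names
        }
        for split, pattern in patterns.items()
    }


# Illustrative values only; these are not the library's real defaults.
patterns = {
    "train": "{target_name}_train.parquet",
    "test": "{target_name}_test.parquet",
}
file_names = build_input_file_names("data/input", ["targetA", "targetB"], patterns)
```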

process_targets()[source]

Iterate over all targets defined in the config and read both training and test sets from Parquet files.

Calls read_training_set() and read_test_sets() for each target name returned by get_target_names().

Return type: None
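The control flow described for process_targets() might look like the following sketch. ProcessTargetsSketch is a hypothetical stand-in whose read methods only record that they were called; the order in which the training and test sets are read for each target is an assumption.

```python
from typing import List, Tuple


class ProcessTargetsSketch:
    """Hypothetical stand-in that records calls instead of reading files."""

    def __init__(self, target_names: List[str]):
        self._target_names = list(target_names)
        self.calls: List[Tuple[str, str]] = []

    def get_target_names(self) -> List[str]:
        return list(self._target_names)

    def read_training_set(self, target_name: str) -> None:
        self.calls.append(("train", target_name))

    def read_test_sets(self, target_name: str) -> None:
        self.calls.append(("test", target_name))

    def process_targets(self) -> None:
        # One pass over the targets; each pass loads both splits.
        for target_name in self.get_target_names():
            self.read_training_set(target_name)
            self.read_test_sets(target_name)


reader = ProcessTargetsSketch(["targetA", "targetB"])
reader.process_targets()
```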

read_test_sets(target_name)[source]

Read a single target-specific test set from a Parquet file into test_sets.

Parameters: target_name (str) – The identifier of the target dataset to be loaded.
Raises: FileNotFoundError – If the corresponding Parquet file does not exist.
Return type: None

read_training_set(target_name)[source]

Read a single target-specific training set from a Parquet file into training_sets.

Parameters: target_name (str) – The identifier of the target dataset to be loaded.
Raises: FileNotFoundError – If the corresponding Parquet file does not exist.
Return type: None
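Both read methods follow the same documented contract: check that the file exists, raise FileNotFoundError if it does not, and otherwise store the loaded data under the target name. A minimal sketch of that contract is shown below. The function read_set and its reader parameter are hypothetical; the reader is injected so the example runs without Polars, and with Polars installed polars.read_parquet could be passed in its place.

```python
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict


def read_set(
    file_name: Path,
    target_name: str,
    store: Dict[str, Any],
    reader: Callable[[Path], Any],
) -> None:
    """Load one file into store[target_name], failing early if it is missing."""
    path = Path(file_name)
    if not path.exists():
        raise FileNotFoundError(f"File '{path}' does not exist.")
    store[target_name] = reader(path)


training_sets: Dict[str, Any] = {}
missing_error = None

with tempfile.TemporaryDirectory() as tmp:
    # A placeholder file stands in for a real Parquet file.
    existing = Path(tmp) / "targetA_train.parquet"
    existing.write_bytes(b"placeholder")
    read_set(existing, "targetA", training_sets, reader=lambda p: p.read_bytes())

    try:
        read_set(Path(tmp) / "missing.parquet", "targetB", training_sets,
                 reader=lambda p: p.read_bytes())
    except FileNotFoundError as error:
        missing_error = error
```

Checking for the file before calling the reader keeps the error message tied to the target's configured path rather than to whatever the underlying reader reports.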

test_sets: Dict[str, DataFrame]: A dictionary mapping target names to Polars DataFrames containing their test set.

training_sets: Dict[str, DataFrame]: A dictionary mapping target names to Polars DataFrames containing their training set.