aiqclib.train.step1_read_input package
Submodules
aiqclib.train.step1_read_input.dataset_a module
This module provides a specialized input class, InputTrainingSetA,
designed for reading training and test datasets specific to Copernicus CTD data.
It extends aiqclib.train.step1_read_input.input_base.InputTrainingSetBase
to handle particular data configuration and validation requirements.
- class aiqclib.train.step1_read_input.dataset_a.InputTrainingSetA(config)[source]
Bases:
InputTrainingSetBase
A specialized input class for reading training and test sets for Copernicus CTD data.
This class extends aiqclib.train.step1_read_input.input_base.InputTrainingSetBase and provides specific implementations or configurations for handling CTD datasets. It sets its expected_class_name to “InputTrainingSetA” so that configuration validation in the parent class can correctly match the base_class value specified in YAML.
- Parameters:
config (ConfigBase)
- expected_class_name: str = 'InputTrainingSetA'
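The class-name check described above can be pictured with a small stand-in. This is a minimal sketch, not aiqclib's actual code: `ToyDataSetBase`, `ToyInputTrainingSetA`, and the plain-dict config are hypothetical, and raising `ValueError` on a mismatch is an assumption.

```python
from typing import Optional


class ToyDataSetBase:
    """Hypothetical stand-in for a base class that validates `base_class`."""

    # Subclasses are expected to override this with their own name.
    expected_class_name: Optional[str] = None

    def __init__(self, config: dict) -> None:
        if self.expected_class_name is None:
            raise NotImplementedError("Subclasses must define expected_class_name.")
        # Assumption: a mismatch with the YAML-derived value is reported as ValueError.
        if config.get("base_class") != self.expected_class_name:
            raise ValueError(
                f"base_class {config.get('base_class')!r} does not match "
                f"{self.expected_class_name!r}"
            )
        self.config = config


class ToyInputTrainingSetA(ToyDataSetBase):
    # Mirrors the documented attribute value.
    expected_class_name = "InputTrainingSetA"


# The dict stands in for the parsed YAML section of the input step.
dataset = ToyInputTrainingSetA({"base_class": "InputTrainingSetA"})
print(type(dataset).__name__)
```

In this sketch the subclass only has to set one class attribute; all validation stays in the parent.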
aiqclib.train.step1_read_input.input_base module
This module defines the InputTrainingSetBase class, which serves as a base for importing pre-split training and test datasets. It leverages a training-specific configuration to identify and load Parquet files into Polars DataFrames, managing both training and test sets for multiple targets.
- class aiqclib.train.step1_read_input.input_base.InputTrainingSetBase(config)[source]
Bases:
DataSetBase
A base class for importing pre-split training and test data sets, leveraging the training-specific configuration (ConfigBase).
This class extends DataSetBase to ensure that the given YAML configuration is valid for the step named "input". It provides logic for iterating over targets, identifying the Parquet files for each, and reading them into memory.
Note
Since this class inherits from DataSetBase, a subclass or this class itself may need to define an expected_class_name that matches the YAML’s base_class if you plan to instantiate it directly. Otherwise, DataSetBase may raise a NotImplementedError.
- Parameters:
config (ConfigBase)
- default_file_names: Dict[str, str]
Default file naming patterns for train/test sets. The substring {target_name} will be replaced dynamically.
- input_file_names: Dict[str, Dict[str, str]]
A mapping of “train” and “test” to dictionaries of target-specific file names.
Example format:
{
    "train": {"targetA": "path/to/targetA_train.parquet", ...},
    "test": {"targetA": "path/to/targetA_test.parquet", ...}
}
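As an illustration of how naming patterns containing {target_name} could expand into the nested mapping shown above, here is a minimal sketch. The pattern strings, the target names, the folder, and the helper `build_input_file_names` are all hypothetical; the real defaults and path resolution are defined by aiqclib and its configuration.

```python
from typing import Dict, List

# Hypothetical patterns, chosen only for illustration.
default_file_names: Dict[str, str] = {
    "train": "train_set_{target_name}.parquet",
    "test": "test_set_{target_name}.parquet",
}


def build_input_file_names(
    patterns: Dict[str, str], target_names: List[str], folder: str
) -> Dict[str, Dict[str, str]]:
    """Expand each pattern once per target, keyed by split and then by target."""
    return {
        split: {
            name: f"{folder}/{pattern.format(target_name=name)}"
            for name in target_names
        }
        for split, pattern in patterns.items()
    }


input_file_names = build_input_file_names(
    default_file_names, ["temp", "psal"], "path/to"
)
print(input_file_names["train"]["temp"])  # path/to/train_set_temp.parquet
```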
- process_targets()[source]
Iterate over all targets defined in the config and read both training and test sets from Parquet files.
Utilizes read_training_set() and read_test_sets() for each target name returned by get_target_names().
- Return type:
None
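The iteration that process_targets() performs can be sketched with a toy class that records calls instead of reading files. `ToyInput` is hypothetical, and calling the training reader before the test reader for each target is an assumption about ordering, not something the documentation states.

```python
from typing import List


class ToyInput:
    """Hypothetical stand-in that logs which reader ran for which target."""

    def __init__(self, target_names: List[str]) -> None:
        self._target_names = target_names
        self.calls: List[str] = []

    def get_target_names(self) -> List[str]:
        return list(self._target_names)

    def read_training_set(self, target_name: str) -> None:
        self.calls.append(f"train:{target_name}")

    def read_test_sets(self, target_name: str) -> None:
        self.calls.append(f"test:{target_name}")

    def process_targets(self) -> None:
        # One pass over the configured targets, loading both splits for each.
        for target_name in self.get_target_names():
            self.read_training_set(target_name)
            self.read_test_sets(target_name)


toy = ToyInput(["temp", "psal"])
toy.process_targets()
print(toy.calls)  # ['train:temp', 'test:temp', 'train:psal', 'test:psal']
```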
- read_test_sets(target_name)[source]
Read a single target-specific test set from a Parquet file into test_sets.
- Parameters:
target_name (str) – The identifier of the target dataset to be loaded.
- Raises:
FileNotFoundError – If the corresponding Parquet file does not exist.
- Return type:
None
- read_training_set(target_name)[source]
Read a single target-specific training set from a Parquet file into training_sets.
- Parameters:
target_name (str) – The identifier of the target dataset to be loaded.
- Raises:
FileNotFoundError – If the corresponding Parquet file does not exist.
- Return type:
None
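Both readers follow the same pattern: resolve the file for a target, fail with FileNotFoundError if it is missing, and otherwise store the resulting Polars DataFrame under the target name. A minimal sketch of that pattern follows. The function `read_parquet_into` is hypothetical, the explicit existence check is an assumption about how the error is raised, and `polars.read_parquet` is the only library call used.

```python
from pathlib import Path
from typing import Any, Dict


def read_parquet_into(store: Dict[str, Any], target_name: str, file_name: str) -> None:
    """Load one Parquet file into `store[target_name]` (hypothetical helper)."""
    path = Path(file_name)
    # Assumption: the missing-file case is detected before any read is attempted.
    if not path.exists():
        raise FileNotFoundError(f"File '{file_name}' does not exist.")
    # Imported here so the existence check can be tried without Polars installed.
    import polars as pl

    store[target_name] = pl.read_parquet(path)


training_sets: Dict[str, Any] = {}
try:
    read_parquet_into(training_sets, "targetA", "path/to/targetA_train.parquet")
except FileNotFoundError as err:
    # Expected whenever the illustrative path is absent on the local machine.
    print(err)
```

Passing the destination dictionary as an argument lets one helper serve both training_sets and test_sets in this sketch.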
- test_sets: Dict[str, DataFrame]
A dictionary mapping target names to Polars DataFrames containing their test set.
- training_sets: Dict[str, DataFrame]
A dictionary mapping target names to Polars DataFrames containing their training set.