aiqclib.classify.step4_select_rows package

Submodules

aiqclib.classify.step4_select_rows.dataset_all module

Module for selecting all data rows from combined Copernicus CTD data.

This module provides the LocateDataSetAll class, which extends LocatePositionBase to identify and label data points for machine learning tasks based on Quality Control (QC) flags.

class aiqclib.classify.step4_select_rows.dataset_all.LocateDataSetAll(config, input_data=None, selected_profiles=None)[source]

Bases: LocatePositionBase

A subclass of LocatePositionBase that locates all rows from Copernicus CTD data for training or evaluation purposes.

This class assigns a default file naming scheme for target rows and uses configuration details (e.g., QC flags) to identify relevant data rows for each target.

Variables:

expected_class_name (str) – The expected name of the class for validation.

Parameters:
  • config (ConfigBase)

  • input_data (DataFrame | None)

  • selected_profiles (DataFrame | None)

default_file_name: str

Default file name template for writing target rows (one file per target).

expected_class_name: str = 'LocateDataSetAll'

locate_target_rows(target_name, target_value)[source]

Locate target rows for training or evaluation by calling select_all_rows().

This method is a thin wrapper around select_all_rows(): it applies no row filtering, so every row is considered for the target and labeled using the provided QC flag.

Parameters:
  • target_name (str) – Name of the target variable.

  • target_value (Dict) – A dictionary of target metadata, including the QC flag variable name used for labeling (e.g., {"flag": "TEMP_QC_FLAG"}).

Return type:

None
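The delegation described above can be sketched with a stand-in class. This is a minimal illustration, not the aiqclib implementation: AllRowsLocatorSketch, its calls attribute, and the "PSAL_QC_FLAG" name are hypothetical, and only the two method names and the {"flag": ...} shape come from this page.

```python
class AllRowsLocatorSketch:
    """Stand-in for illustration only; not the aiqclib class."""

    def __init__(self):
        # Records (target_name, flag_variable) pairs so the delegation is visible.
        self.calls = []

    def select_all_rows(self, target_name, target_value):
        # A real implementation would label rows here; the sketch only records the call.
        self.calls.append((target_name, target_value["flag"]))

    def locate_target_rows(self, target_name, target_value):
        # Thin wrapper: pass both arguments through without filtering anything.
        self.select_all_rows(target_name, target_value)


# Hypothetical target dictionary, keyed by target name.
targets = {
    "temp": {"flag": "TEMP_QC_FLAG"},
    "psal": {"flag": "PSAL_QC_FLAG"},
}

locator = AllRowsLocatorSketch()
for name, value in targets.items():
    locator.locate_target_rows(name, value)

print(locator.calls)
```

Looping over the target dictionary this way yields one call per target, which matches the "one file per target" convention described for default_file_name.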

output_file_names: Dict[str, str]

Dictionary mapping each target name to the corresponding output Parquet file path.
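As a rough sketch of how such a mapping could be built from a per-target file name template: the template string, the base directory, and the build_output_file_names helper below are all assumptions made for illustration, since the actual value of default_file_name is not shown on this page.

```python
from pathlib import PurePosixPath

# Hypothetical template; the real default_file_name value may differ.
DEFAULT_FILE_NAME = "selected_rows_{target_name}.parquet"


def build_output_file_names(target_names, base_dir, template=DEFAULT_FILE_NAME):
    """Return a dict mapping each target name to one Parquet path under base_dir."""
    return {
        name: str(PurePosixPath(base_dir) / template.format(target_name=name))
        for name in target_names
    }


output_file_names = build_output_file_names(["temp", "psal"], "/data/select")
print(output_file_names["temp"])
```

PurePosixPath keeps the example platform-independent; it only manipulates path strings and does not touch the file system.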

select_all_rows(target_name, target_value)[source]

Collect all rows for a specified target by applying flag-based labeling to each record.

input_data must be set before this method is called; otherwise a ValueError is raised.

Parameters:
  • target_name (str) – The name (key) of the target in the configuration’s target dictionary.

  • target_value (Dict) – A dictionary of target metadata, including the relevant QC flag variable name, positive flag values, and negative flag values.

Raises:
  • ValueError – If input_data is None when this method is called.

  • KeyError – If ‘flag’ is not present in target_value.

Return type:

None
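The flag-based labeling and the two error conditions documented for select_all_rows() can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: the label_all_rows function, the "pos_flag_values" and "neg_flag_values" keys, the "label" field, and the choice to keep unmatched rows with a label of None are all hypothetical. Rows are modeled as a list of dicts rather than a DataFrame to keep the example dependency-free.

```python
def label_all_rows(input_data, target_value):
    """Label every row by its QC flag; no row is dropped."""
    if input_data is None:
        raise ValueError("input_data must be set before selecting rows")

    # Raises KeyError when the target metadata has no 'flag' entry.
    flag_column = target_value["flag"]

    # Hypothetical key names for the positive and negative flag values.
    positive = set(target_value.get("pos_flag_values", []))
    negative = set(target_value.get("neg_flag_values", []))

    labeled = []
    for row in input_data:
        flag = row[flag_column]
        if flag in positive:
            label = 1
        elif flag in negative:
            label = 0
        else:
            label = None  # Assumption: unmatched rows are kept but unlabeled.
        labeled.append({**row, "label": label})
    return labeled


rows = [
    {"profile": 1, "TEMP_QC_FLAG": 1},
    {"profile": 1, "TEMP_QC_FLAG": 4},
    {"profile": 2, "TEMP_QC_FLAG": 9},
]
target = {"flag": "TEMP_QC_FLAG", "pos_flag_values": [4], "neg_flag_values": [1]}

labeled = label_all_rows(rows, target)
print([row["label"] for row in labeled])
```

Checking input_data before reading the 'flag' key mirrors the order in which the two exceptions are listed above; the actual order in the library is not stated here.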