aiqclib.classify.step3_select_profiles package

Submodules

aiqclib.classify.step3_select_profiles.dataset_all module

This module defines the SelectDataSetAll class, a profile selection class in the aiqclib library. It selects every profile in the input dataset (typically Copernicus CTD data) and assigns initial labels and identifiers for subsequent classification tasks.

class aiqclib.classify.step3_select_profiles.dataset_all.SelectDataSetAll(config, input_data=None)[source]

Bases: ProfileSelectionBase

A subclass of ProfileSelectionBase that selects all profiles from Copernicus CTD data.

Every input profile is selected, and each one receives an initial label (e.g., ‘negative’) and a unique identifier before further processing or classification.

Parameters:
  • config (ConfigBase)

  • input_data (DataFrame | None)

default_file_name: str

Default file name to which selected profiles are written.

expected_class_name: str = 'SelectDataSetAll'

key_col_names: List[str]

Columns used as unique identifiers for grouping/merging (e.g., by platform or profile).

label_profiles()[source]

Select and label profiles, storing the result as a single DataFrame in selected_profiles.

In this implementation there is no positive/negative split: all profiles are selected and labeled as ‘negative’ (label 0) by calling select_all_profiles(). This method is the entry point for profile selection and initial labeling.
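
The delegation described here can be illustrated with a minimal sketch. The classes ProfileSelectionSketch and SelectAllSketch are hypothetical stand-ins written for this example; they are not the library's ProfileSelectionBase or SelectDataSetAll, and the list-of-dicts input is only a placeholder for a DataFrame.

```python
class ProfileSelectionSketch:
    """Hypothetical stand-in for a selection base class (not the aiqclib class)."""

    def __init__(self, input_data=None):
        self.input_data = input_data
        self.selected_profiles = None

    def label_profiles(self):
        raise NotImplementedError


class SelectAllSketch(ProfileSelectionSketch):
    """Hypothetical subclass that selects every profile."""

    def select_all_profiles(self):
        # Placeholder logic: copy each row and attach label 0 ('negative').
        self.selected_profiles = [dict(row, label=0) for row in self.input_data]

    def label_profiles(self):
        # Entry point: delegate to select_all_profiles() and return nothing.
        self.select_all_profiles()


selector = SelectAllSketch(input_data=[{"profile_no": 1}, {"profile_no": 2}])
result = selector.label_profiles()
```

As in the documented method, the sketch returns None and leaves its result in selected_profiles.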

Return type:

None

output_file_name: str

Full path for the output file, resolved via the config.

select_all_profiles()[source]

Select all profiles from the input data and prepare them with initial labeling and unique identifiers.

This method processes the input_data to create a DataFrame of unique profiles. It adds the following columns:

  • neg_profile_id (uint32): Initialized to 0 for every row. It is a placeholder for negative profile identifiers that may be assigned later, and it is not a unique ID at this step.

  • label (uint32): Initialized to 0, indicating an unclassified or ‘negative’ profile in the context of subsequent classification.

  • profile_id (int): A unique 1-based row index assigned to each selected profile, serving as its primary identifier.

Profiles are first made unique on their key columns (platform, profile number, timestamp, longitude, latitude), and profile_id is assigned afterwards. The resulting DataFrame is assigned to selected_profiles.
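
The steps above can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: it uses pandas (this page does not say which DataFrame library aiqclib uses), the column names in KEY_COLS are hypothetical stand-ins for key_col_names, and select_all_profiles_sketch is a name invented for this example.

```python
import pandas as pd

# Hypothetical key columns; the real names come from key_col_names.
KEY_COLS = ["platform_code", "profile_no", "profile_timestamp",
            "longitude", "latitude"]


def select_all_profiles_sketch(input_data: pd.DataFrame) -> pd.DataFrame:
    """Return one row per unique profile with the three added columns."""
    # Deduplicate on the key columns so each profile appears once.
    profiles = input_data[KEY_COLS].drop_duplicates().reset_index(drop=True)
    # Placeholder identifier and initial label, both 0 and stored as uint32.
    profiles["neg_profile_id"] = pd.Series(0, index=profiles.index, dtype="uint32")
    profiles["label"] = pd.Series(0, index=profiles.index, dtype="uint32")
    # 1-based row index assigned after deduplication.
    profiles["profile_id"] = range(1, len(profiles) + 1)
    return profiles


# Four observation rows that belong to three distinct profiles.
observations = pd.DataFrame({
    "platform_code": ["A", "A", "A", "B"],
    "profile_no": [1, 1, 2, 1],
    "profile_timestamp": pd.to_datetime(
        ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-03"]),
    "longitude": [10.0, 10.0, 10.5, 11.0],
    "latitude": [55.0, 55.0, 55.5, 56.0],
    "pres": [1.0, 2.0, 1.0, 1.0],
})
selected = select_all_profiles_sketch(observations)
```

Because deduplication happens before numbering, profile_id stays contiguous even when the input holds several observation rows per profile.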

Return type:

None