aiqclib.prepare.features package

Submodules

aiqclib.prepare.features.basic_values module

This module provides the BasicValues class for extracting target value observations from Polars DataFrames.

It extends FeatureBase and is tailored to specific data-processing needs, such as those of Copernicus CTD data.

class aiqclib.prepare.features.basic_values.BasicValues(target_name=None, feature_info=None, selected_profiles=None, filtered_input=None, selected_rows=None, summary_stats=None)

Bases: FeatureBase

A feature-extraction class for retrieving target values from Copernicus CTD data, extending FeatureBase.

Parameters:
  • target_name (str | None)

  • feature_info (Dict | None)

  • selected_profiles (DataFrame | None)

  • filtered_input (DataFrame | None)

  • selected_rows (Dict[str, DataFrame] | None)

  • summary_stats (DataFrame | None)

extract_features()

Run the multi-step process that builds the feature set and stores it in the features attribute.

Steps:

  1. _init_features() - Prepare a base DataFrame with essential columns (row_id, platform_code, profile_no, observation_no).

  2. For each column specified in feature_info["col_names"], call _add_features() to join the pivoted data onto the feature table.

  3. _clean_features() - Drop columns no longer needed.

Return type:

None
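The steps above can be sketched in plain Python as follows. This is a minimal, hypothetical illustration rather than the library's code: lists of dicts stand in for the Polars DataFrames, the helper names only mirror the documented private methods, and exactly which columns _clean_features() drops is an assumption. In Polars, the lookup in step 2 would typically be a DataFrame.join on the key columns.

```python
# Hypothetical sketch: lists of dicts stand in for Polars DataFrames.
KEYS = ("platform_code", "profile_no", "observation_no")


def init_features(target_rows):
    """Step 1: keep row_id and the observation keys of each target row."""
    return [{k: r[k] for k in ("row_id", *KEYS)} for r in target_rows]


def add_feature(features, filtered_input, col_name):
    """Step 2: attach col_name by matching on the observation keys."""
    lookup = {tuple(o[k] for k in KEYS): o[col_name] for o in filtered_input}
    for f in features:
        f[col_name] = lookup.get(tuple(f[k] for k in KEYS))
    return features


def clean_features(features):
    """Step 3 (assumed): drop the join keys, keep row_id and feature values."""
    return [{k: v for k, v in f.items() if k not in KEYS} for f in features]


def extract_features(target_rows, filtered_input, col_names):
    features = init_features(target_rows)
    for col in col_names:
        features = add_feature(features, filtered_input, col)
    return clean_features(features)


targets = [{"row_id": 1, "platform_code": "A", "profile_no": 1, "observation_no": 2}]
obs = [
    {"platform_code": "A", "profile_no": 1, "observation_no": 1, "temp": 10.0, "psal": 35.0},
    {"platform_code": "A", "profile_no": 1, "observation_no": 2, "temp": 9.5, "psal": 35.1},
]
print(extract_features(targets, obs, ["temp", "psal"]))
# [{'row_id': 1, 'temp': 9.5, 'psal': 35.1}]
```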

scale_first()

Apply a pre-feature-extraction scaling step on filtered_input.

This normalizes each relevant raw input column in place according to the normalization type declared in feature_info["stats_set"]["type"]: min_max/auto_min_max apply min-max scaling and standard applies standard scaling, both using the values in feature_info["stats"]. raw leaves the data untouched.

Return type:

None
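A minimal sketch of the normalization dispatch described above, assuming each entry in feature_info["stats"] maps a column name to a dict of scaling parameters. The parameter keys ("min", "max", "mean", "sd") are hypothetical; the actual layout of feature_info["stats"] is not documented here.

```python
# Hypothetical parameter keys: "min"/"max" for min-max, "mean"/"sd" for standard.
def min_max(x, s):
    return (x - s["min"]) / (s["max"] - s["min"])


def standard(x, s):
    return (x - s["mean"]) / s["sd"]


# min_max and auto_min_max share the same min-max formula.
SCALERS = {"min_max": min_max, "auto_min_max": min_max, "standard": standard}


def scale_columns(rows, feature_info):
    norm_type = feature_info["stats_set"]["type"]
    if norm_type == "raw":
        return rows  # leave the data untouched
    scaler = SCALERS[norm_type]  # unknown types raise KeyError
    for col, stats in feature_info["stats"].items():
        for row in rows:
            row[col] = scaler(row[col], stats)  # modified in place
    return rows


info = {"stats_set": {"type": "min_max"}, "stats": {"temp": {"min": 0.0, "max": 20.0}}}
print(scale_columns([{"temp": 5.0}, {"temp": 20.0}], info))
# [{'temp': 0.25}, {'temp': 1.0}]
```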

scale_second()

Apply a post-feature-extraction scaling step if needed.

This method is currently unimplemented; it is kept as a placeholder for scaling or normalization that may be added after features have been pivoted and expanded.

Return type:

None

aiqclib.prepare.features.day_of_year module

This module defines a feature extraction class, DayOfYearFeat, that calculates the day of the year from timestamps.

It is designed to be part of a larger feature-engineering pipeline, extending the FeatureBase class to derive temporal features, specifically the day of the year, and optionally applying a sine or cosine transformation for cyclical encoding.

class aiqclib.prepare.features.day_of_year.DayOfYearFeat(target_name=None, feature_info=None, selected_profiles=None, filtered_input=None, selected_rows=None, summary_stats=None)

Bases: FeatureBase

A feature-extraction class that derives day-of-year features from Copernicus CTD data.

This class uses the profile_timestamp column to generate a day-of-year value and can optionally apply a sine or cosine transformation for cyclical encoding.

Parameters:
  • target_name (str | None)

  • feature_info (Dict | None)

  • selected_profiles (DataFrame | None)

  • filtered_input (DataFrame | None)

  • selected_rows (Dict[str, DataFrame] | None)

  • summary_stats (DataFrame | None)

convert_cosine()

Optionally apply a cosinusoidal transformation to the day-of-year values.

The transformation formula used is:

\[\text{day\_of\_year}_{\text{transformed}} = \frac{\cos\left(2\pi\,(\text{day\_of\_year} - 1) / 364\right) + 1}{2}\]

Returns:

None

Return type:

None

convert_sine()

Optionally apply a sinusoidal transformation to the day-of-year values.

The transformation formula used is:

\[\text{day\_of\_year}_{\text{transformed}} = \frac{\sin\left(2\pi\,(\text{day\_of\_year} - 1) / 364\right) + 1}{2}\]

Returns:

None

Return type:

None
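The two formulas can be checked numerically with a direct transcription into Python (the function names are illustrative, not part of the package). Days 1, 92 and 183 sit at 0, 1/4 and 1/2 of the 364-day period, so both encodings land on their extremes or midpoint there.

```python
import math


# Direct transcriptions of the convert_sine() and convert_cosine() formulas.
def doy_sine(day_of_year):
    return (math.sin((day_of_year - 1) * 2 * math.pi / 364) + 1) / 2


def doy_cosine(day_of_year):
    return (math.cos((day_of_year - 1) * 2 * math.pi / 364) + 1) / 2


for d in (1, 92, 183):
    print(d, round(doy_sine(d), 3), round(doy_cosine(d), 3))
# 1 0.5 1.0
# 92 1.0 0.5
# 183 0.5 0.0
```

Because the period in the formulas is 364 days, day 365 maps to the same value as day 1.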

extract_features()

Derive the day-of-year feature from the profile_timestamp column in selected_profiles and merge it with the target rows.

Steps:

  1. Select columns row_id, platform_code, and profile_no from selected_rows[target_name].

  2. Join the subset with profile_timestamp from selected_profiles based on platform_code and profile_no.

  3. Compute the day of year from profile_timestamp via Polars’ polars.Expr.dt.ordinal_day().

  4. Remove columns no longer needed (i.e., the join keys and timestamp).

Returns:

None

Return type:

None
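A minimal sketch of these steps, with lists of dicts standing in for selected_rows[target_name] and selected_profiles. Python's timetuple().tm_yday is used as a stdlib stand-in for polars.Expr.dt.ordinal_day(); both give a 1-based day of the year. The output column name day_of_year and the left-join behaviour (an unmatched profile yields None) are assumptions.

```python
from datetime import datetime


def day_of_year_features(target_rows, profiles):
    # Step 2 lookup table: (platform_code, profile_no) -> profile_timestamp
    stamps = {(p["platform_code"], p["profile_no"]): p["profile_timestamp"] for p in profiles}
    features = []
    for r in target_rows:  # Step 1: row_id plus the two join keys
        stamp = stamps.get((r["platform_code"], r["profile_no"]))
        # Step 3: 1-based day of year, like polars.Expr.dt.ordinal_day()
        doy = stamp.timetuple().tm_yday if stamp is not None else None
        # Step 4: keep only row_id and the new feature (keys and timestamp dropped)
        features.append({"row_id": r["row_id"], "day_of_year": doy})
    return features


rows = [{"row_id": 1, "platform_code": "A", "profile_no": 7}]
profiles = [{"platform_code": "A", "profile_no": 7, "profile_timestamp": datetime(2023, 3, 1, 12, 0)}]
print(day_of_year_features(rows, profiles))  # [{'row_id': 1, 'day_of_year': 60}]
```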

scale_first()

(Optional) Perform the initial scaling step.

Currently, no transformations are applied to day-of-year values in this step, but it can be extended for outlier removal or other domain-specific logic.

Returns:

None

Return type:

None

scale_second()

Optionally apply a sinusoidal or cosinusoidal transformation to the day-of-year values.

If "convert" in feature_info is set to either "sine" or "cosine", this method transforms each day-of-year value into a cyclical feature in the range [0, 1].

Returns:

None

Return type:

None
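A hypothetical sketch of this dispatch, reusing the formulas given for convert_sine() and convert_cosine(). It assumes the feature column is named day_of_year and that any other "convert" value, or none at all, leaves the column unchanged.

```python
import math


def cyclical(day_of_year, fn):
    # Shared form of the sine/cosine formulas, mapped into [0, 1].
    return (fn((day_of_year - 1) * 2 * math.pi / 364) + 1) / 2


def scale_second(features, feature_info):
    convert = feature_info.get("convert")
    if convert not in ("sine", "cosine"):
        return features  # no conversion requested: raw day numbers remain
    fn = math.sin if convert == "sine" else math.cos
    for row in features:
        row["day_of_year"] = cyclical(row["day_of_year"], fn)
    return features


print(scale_second([{"day_of_year": 183}], {"convert": "cosine"}))  # [{'day_of_year': 0.0}]
```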

aiqclib.prepare.features.flank_down module

This module provides the FlankDown class for extracting “flanking” (neighboring) observations around target rows within Copernicus CTD datasets. It specializes in downstream observation expansion and feature pivoting.

class aiqclib.prepare.features.flank_down.FlankDown(target_name=None, feature_info=None, selected_profiles=None, filtered_input=None, selected_rows=None, summary_stats=None)

Bases: FeatureBase

A feature-extraction class for retrieving target values and their “flanking” values from Copernicus CTD data, extending FeatureBase.

“Flanking values” are the neighboring observations around a specified index (e.g., observation_no), captured by shifting backward by a specified amount.

Parameters:
  • target_name (str | None)

  • feature_info (Dict | None)

  • selected_profiles (DataFrame | None)

  • filtered_input (DataFrame | None)

  • selected_rows (Dict[str, DataFrame] | None)

  • summary_stats (DataFrame | None)

extract_features()

Run the multi-step process that builds the feature set and stores it in the features attribute.

Steps:

  1. _init_features() - Prepare a base DataFrame with essential columns.

  2. _expand_observations() - Expand observations based on “flank_down”.

  3. For each column in feature_info["col_names"]:

       • _pivot_features() to pivot the data;

       • _add_features() to join the pivoted data onto the feature table.

  4. _clean_features() - Drop metadata columns.

Returns:

None

Return type:

None
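The pivot in step 3 turns one long-format row per neighbor into one wide row per target. The sketch below assumes the expansion step has already tagged each row with a hypothetical flank_idx offset; the output column names (temp0, temp1 and so on) are illustrative, and the sketch does not depend on which direction the shift goes.

```python
# Hypothetical long-to-wide pivot with dicts in place of DataFrames.
def pivot_wide(long_rows, col_name):
    wide = {}
    for r in long_rows:
        # One output row per row_id; one column per flank offset.
        row = wide.setdefault(r["row_id"], {"row_id": r["row_id"]})
        row[f"{col_name}{r['flank_idx']}"] = r[col_name]
    return list(wide.values())


long_rows = [
    {"row_id": 1, "flank_idx": 0, "temp": 9.5},
    {"row_id": 1, "flank_idx": 1, "temp": 9.1},
    {"row_id": 2, "flank_idx": 0, "temp": 8.7},
    {"row_id": 2, "flank_idx": 1, "temp": None},  # neighbor outside the profile
]
print(pivot_wide(long_rows, "temp"))
# [{'row_id': 1, 'temp0': 9.5, 'temp1': 9.1}, {'row_id': 2, 'temp0': 8.7, 'temp1': None}]
```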

scale_first()

Apply a pre-feature-extraction scaling step on filtered_input using min-max scaling derived from feature_info["stats"].

Returns:

None

Return type:

None

scale_second()

Apply a post-feature-extraction scaling step if needed. Currently unimplemented.

Returns:

None

Return type:

None

aiqclib.prepare.features.flank_up module

This module defines the FlankUp class, which is responsible for extracting neighboring (flanking) observations for specific target rows in a dataset. It is primarily used for feature engineering with Copernicus CTD data.

class aiqclib.prepare.features.flank_up.FlankUp(target_name=None, feature_info=None, selected_profiles=None, filtered_input=None, selected_rows=None, summary_stats=None)

Bases: FeatureBase

A feature-extraction class for retrieving target values and their “flanking” values from Copernicus CTD data, extending aiqclib.common.base.feature_base.FeatureBase.

“Flanking values” are the neighboring observations around a specified index (e.g., observation_no), captured by shifting backward by a specified amount.

Parameters:
  • target_name (str | None)

  • feature_info (Dict | None)

  • selected_profiles (DataFrame | None)

  • filtered_input (DataFrame | None)

  • selected_rows (Dict[str, DataFrame] | None)

  • summary_stats (DataFrame | None)

extract_features()

Run the multi-step process that builds the feature set and stores it in the features attribute.

Steps:

  1. _init_features() - Prepare a base DataFrame with essential columns (row_id, platform_code, profile_no).

  2. _expand_observations() - Expand observations by adding rows for the specified number of “flank” steps (based on feature_info["flank_up"]).

  3. For each column in feature_info["col_names"], call:

       • _pivot_features() to pivot the data for that column;

       • _add_features() to join the pivoted data onto the feature table.

  4. _clean_features() - Drop columns no longer needed.

Return type:

None
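A minimal sketch of the expansion step, following the description above of shifting observation_no backward. It assumes feature_info["flank_up"] = N yields the target row plus N neighbors and that a neighbor missing from filtered_input yields None; the flank_idx field and the helper names are hypothetical.

```python
# Hypothetical sketch of the FlankUp expansion with dicts in place of DataFrames.
def expand_observations(target_rows, flank_up):
    expanded = []
    for r in target_rows:
        for k in range(flank_up + 1):  # k = 0 is the target observation itself
            expanded.append({**r, "flank_idx": k, "observation_no": r["observation_no"] - k})
    return expanded


def attach_values(expanded, filtered_input, col_name):
    keys = ("platform_code", "profile_no", "observation_no")
    lookup = {tuple(o[c] for c in keys): o[col_name] for o in filtered_input}
    for e in expanded:
        e[col_name] = lookup.get(tuple(e[c] for c in keys))  # None: no such neighbor
    return expanded


targets = [{"row_id": 1, "platform_code": "A", "profile_no": 1, "observation_no": 2}]
obs = [
    {"platform_code": "A", "profile_no": 1, "observation_no": n, "temp": t}
    for n, t in [(1, 10.0), (2, 9.5)]
]
for e in attach_values(expand_observations(targets, 2), obs, "temp"):
    print(e["flank_idx"], e["observation_no"], e["temp"])
# 0 2 9.5
# 1 1 10.0
# 2 0 None
```

The resulting long-format rows are what the subsequent _pivot_features() step would spread into one column per flank offset.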

scale_first()

Apply a pre-feature-extraction scaling step on filtered_input using min-max scaling derived from feature_info["stats"].

This modifies filtered_input in place for each relevant column.

Return type:

None

scale_second()

Apply a post-feature-extraction scaling step if needed.

Currently unimplemented; kept as a placeholder for additional scaling or normalization after feature pivoting and expansion.

Return type:

None

aiqclib.prepare.features.location module

This module defines the LocationFeat class, a specialized feature extractor for geographical coordinates (longitude, latitude) within a specified dataset.

It extends the generic FeatureBase to handle the specific requirements of location data, including extraction from raw profiles and optional scaling.

class aiqclib.prepare.features.location.LocationFeat(target_name=None, feature_info=None, selected_profiles=None, filtered_input=None, selected_rows=None, summary_stats=None)

Bases: FeatureBase

A feature extraction class designed specifically for location-based fields (e.g., longitude, latitude) within the Copernicus CTD dataset.

This class gathers location-related fields from the provided DataFrames and optionally applies scaling. It inherits from FeatureBase, which defines a generic feature-extraction workflow.

Parameters:
  • target_name (str | None)

  • feature_info (Dict | None)

  • selected_profiles (DataFrame | None)

  • filtered_input (DataFrame | None)

  • selected_rows (Dict[str, DataFrame] | None)

  • summary_stats (DataFrame | None)

extract_features()

Gather and merge location columns (e.g., longitude and latitude) from selected_profiles into selected_rows to form the final feature set in features.

Returns:

None. The result is stored in the features attribute.

Return type:

None
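A minimal sketch of the merge, assuming selected_profiles holds one row per (platform_code, profile_no) with longitude and latitude columns. Dicts stand in for the Polars DataFrames, the default column names are an assumption, and dropping the join keys from the result is assumed; in Polars this would typically be a left DataFrame.join on the two key columns.

```python
# Hypothetical sketch: dicts stand in for selected_rows and selected_profiles.
def location_features(target_rows, profiles, cols=("longitude", "latitude")):
    by_profile = {(p["platform_code"], p["profile_no"]): p for p in profiles}
    features = []
    for r in target_rows:
        # Left-join semantics: an unmatched profile gives None for each column.
        prof = by_profile.get((r["platform_code"], r["profile_no"]), {})
        # Join keys are not carried into the result (assumed).
        features.append({"row_id": r["row_id"], **{c: prof.get(c) for c in cols}})
    return features


rows = [{"row_id": 1, "platform_code": "A", "profile_no": 7}]
profiles = [{"platform_code": "A", "profile_no": 7, "longitude": -4.2, "latitude": 48.4}]
print(location_features(rows, profiles))
# [{'row_id': 1, 'longitude': -4.2, 'latitude': 48.4}]
```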

scale_first()

Initial scaling or normalization procedure (currently unimplemented).

Returns:

None.

Return type:

None

scale_second()

Apply scaling to each location feature column according to the normalization type in feature_info["stats_set"]["type"].

min_max/auto_min_max apply min-max scaling and standard applies standard scaling, both using feature_info["stats"]. raw leaves the columns unchanged.

Returns:

None. Scaling is applied to the features DataFrame.

Return type:

None

aiqclib.prepare.features.profile_summary module

This module provides the ProfileSummaryStats class, which is responsible for extracting and scaling statistical features from Polars DataFrames by merging row-level data with summary statistics.

class aiqclib.prepare.features.profile_summary.ProfileSummaryStats(target_name=None, feature_info=None, selected_profiles=None, filtered_input=None, selected_rows=None, summary_stats=None)

Bases: FeatureBase

A feature-extraction class that combines row references from selected_rows with summary statistics from summary_stats. It constructs columns of summarized metrics (e.g., min, max) for specified variables and optionally applies scaling.

This class inherits from FeatureBase, which provides a generic framework for feature extraction, including placeholders for multi-stage scaling.

Parameters:
  • target_name (str | None)

  • feature_info (Dict | None)

  • selected_profiles (DataFrame | None)

  • filtered_input (DataFrame | None)

  • selected_rows (Dict[str, DataFrame] | None)

  • summary_stats (DataFrame | None)

extract_features()

Traverse the feature_info structure to assemble columns from summary_stats, merging them into features.

Steps:

  1. Initialize features via _filter_selected_rows_cols().

  2. Join metrics from summary_stats for each variable/metric pair.

  3. Remove join keys (platform_code, profile_no) from the final result.

Return type:

None
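A hypothetical sketch of this traversal. It assumes feature_info["stats"] maps each variable to its metrics, and that summary_stats holds one row per (platform_code, profile_no, variable) with one column per metric; neither layout is confirmed by this documentation, and the helper names are illustrative.

```python
# Hypothetical layouts; see the assumptions above.
def profile_key(row):
    return (row["platform_code"], row["profile_no"])


def summary_features(target_rows, summary_stats, feature_info):
    lookup = {(*profile_key(s), s["variable"]): s for s in summary_stats}
    features = [dict(r) for r in target_rows]  # step 1: copy the row references
    for variable, metrics in feature_info["stats"].items():
        for metric in metrics:  # step 2: one {variable}_{metric} column per pair
            for f in features:
                stat = lookup.get((*profile_key(f), variable), {})
                f[f"{variable}_{metric}"] = stat.get(metric)
    for f in features:  # step 3: drop the join keys
        f.pop("platform_code")
        f.pop("profile_no")
    return features


rows = [{"row_id": 1, "platform_code": "A", "profile_no": 7}]
stats = [{"platform_code": "A", "profile_no": 7, "variable": "temp", "min": 2.0, "max": 18.0}]
info = {"stats": {"temp": {"min": {}, "max": {}}}}
print(summary_features(rows, stats, info))
# [{'row_id': 1, 'temp_min': 2.0, 'temp_max': 18.0}]
```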

scale_first()

An initial scaling hook (unimplemented).

Return type:

None

scale_second()

Scale the newly joined summary statistics based on feature_info.

Transforms columns named {variable}_{metric} according to the normalization type in feature_info["stats_set"]["type"]: min_max/auto_min_max apply min-max scaling and standard applies standard scaling, using the nested values in feature_info["stats"]. raw leaves the columns unchanged.

Return type:

None
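A minimal sketch of the nested lookup, assuming feature_info["stats"][variable][metric] holds the scaling parameters for the {variable}_{metric} column. The parameter keys ("min", "max", "mean", "sd") are hypothetical, and raising on an unknown normalization type is a design choice of this sketch.

```python
# Hypothetical parameter keys: "min"/"max" for min-max, "mean"/"sd" for standard.
def scale_summary(features, feature_info):
    norm_type = feature_info["stats_set"]["type"]
    if norm_type == "raw":
        return features  # leave the columns unchanged
    if norm_type not in ("min_max", "auto_min_max", "standard"):
        raise ValueError(f"unknown normalization type: {norm_type}")
    for variable, metrics in feature_info["stats"].items():
        for metric, s in metrics.items():
            col = f"{variable}_{metric}"
            for f in features:
                if norm_type == "standard":
                    f[col] = (f[col] - s["mean"]) / s["sd"]
                else:  # min_max and auto_min_max both use min-max scaling
                    f[col] = (f[col] - s["min"]) / (s["max"] - s["min"])
    return features


info = {
    "stats_set": {"type": "min_max"},
    "stats": {"temp": {"min": {"min": 0.0, "max": 4.0}, "max": {"min": 10.0, "max": 20.0}}},
}
print(scale_summary([{"row_id": 1, "temp_min": 2.0, "temp_max": 18.0}], info))
# [{'row_id': 1, 'temp_min': 0.5, 'temp_max': 0.8}]
```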