aiqclib.prepare.features package
Submodules
aiqclib.prepare.features.basic_values module
This module provides the BasicValues class for extracting target value observations from Polars DataFrames.
It extends FeatureBase and is designed for specific data processing needs, such as those encountered with Copernicus CTD data.
- class aiqclib.prepare.features.basic_values.BasicValues(target_name=None, feature_info=None, selected_profiles=None, filtered_input=None, selected_rows=None, summary_stats=None)[source]
Bases:
FeatureBase
A feature-extraction class for retrieving target values from Copernicus CTD data, extending FeatureBase.
- Parameters:
target_name (str | None)
feature_info (Dict | None)
selected_profiles (DataFrame | None)
filtered_input (DataFrame | None)
selected_rows (Dict[str, DataFrame] | None)
summary_stats (DataFrame | None)
- extract_features()[source]
Initiate the multi-step process of creating the feature set in features.
Steps:
_init_features() - Prepare a base DataFrame with essential columns (row_id, platform_code, profile_no, observation_no).
For each column specified in feature_info["col_names"], call _add_features() to join the pivoted data onto our feature table.
_clean_features() - Drop columns no longer needed.
- Return type:
None
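The steps above can be pictured with a small plain-Python sketch. It is not the library's implementation: the sample records, the `KEYS` tuple, and the choice of join keys are assumptions made for illustration, and lists of dictionaries stand in for Polars DataFrames.

```python
# Hypothetical stand-ins for selected_rows[target_name] and filtered_input.
selected_rows = [
    {"row_id": 1, "platform_code": "A", "profile_no": 1, "observation_no": 3},
]
filtered_input = [
    {"platform_code": "A", "profile_no": 1, "observation_no": 3, "temp": 12.1, "psal": 35.0},
    {"platform_code": "A", "profile_no": 1, "observation_no": 4, "temp": 11.8, "psal": 35.1},
]
col_names = ["temp", "psal"]

# Assumed join keys identifying a single observation.
KEYS = ("platform_code", "profile_no", "observation_no")

# Index the raw observations by their key so each lookup is a dictionary access.
observations = {tuple(obs[k] for k in KEYS): obs for obs in filtered_input}

# Step 1: start from the base columns of the target rows.
features = [dict(row) for row in selected_rows]

# Step 2: add one feature column per requested variable.
for col in col_names:
    for feat in features:
        feat[col] = observations[tuple(feat[k] for k in KEYS)][col]

# Step 3: drop the key columns that are no longer needed.
features = [{k: v for k, v in feat.items() if k not in KEYS} for feat in features]
```

After the last step each record holds only `row_id` and the requested value columns.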
- scale_first()[source]
Apply a pre-feature-extraction scaling step on filtered_input.
This normalizes each relevant raw input column in place according to the normalization type declared in feature_info["stats_set"]["type"]: min_max / auto_min_max apply min-max scaling and standard applies standard scaling, both using the values in feature_info["stats"]. raw leaves the data untouched.
- Return type:
None
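The three normalization types can be illustrated with a minimal plain-Python sketch. The function name `scale_values` and the layout of the `stats` dictionary (keys `min`, `max`, `mean`, `std`) are assumptions for illustration; the actual structure of feature_info["stats"] is not documented in this section.

```python
from typing import Dict, List


def scale_values(values: List[float], norm_type: str, stats: Dict[str, float]) -> List[float]:
    """Scale a column of values according to a normalization type (illustrative only)."""
    if norm_type in ("min_max", "auto_min_max"):
        # Min-max scaling maps the interval [min, max] onto [0, 1].
        span = stats["max"] - stats["min"]
        return [(v - stats["min"]) / span for v in values]
    if norm_type == "standard":
        # Standard scaling centres on the mean and divides by the standard deviation.
        return [(v - stats["mean"]) / stats["std"] for v in values]
    if norm_type == "raw":
        # "raw" leaves the data untouched.
        return list(values)
    raise ValueError(f"Unknown normalization type: {norm_type}")


temp = [10.0, 15.0, 20.0]
scaled_min_max = scale_values(temp, "min_max", {"min": 10.0, "max": 20.0})
scaled_standard = scale_values(temp, "standard", {"mean": 15.0, "std": 5.0})
scaled_raw = scale_values(temp, "raw", {})
```

Unlike scale_first(), which is described as working in place, this sketch returns a new list so the input stays available for comparison.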
aiqclib.prepare.features.day_of_year module
This module defines a feature extraction class, DayOfYearFeat, that calculates the day of the year from timestamps.
It is designed to be part of a larger feature engineering pipeline, extending the FeatureBase class to derive temporal features, specifically the day-of-year, and optionally apply a sinusoidal transformation for cyclical encoding.
- class aiqclib.prepare.features.day_of_year.DayOfYearFeat(target_name=None, feature_info=None, selected_profiles=None, filtered_input=None, selected_rows=None, summary_stats=None)[source]
Bases:
FeatureBase
A feature-extraction class that derives day-of-year features from Copernicus CTD data.
This class specifically leverages the profile_timestamp column to generate a day-of-year value, optionally applying a sinusoidal transformation for cyclical encoding.
- Parameters:
target_name (str | None)
feature_info (Dict | None)
selected_profiles (DataFrame | None)
filtered_input (DataFrame | None)
selected_rows (Dict[str, DataFrame] | None)
summary_stats (DataFrame | None)
- convert_cosine()[source]
Optionally apply a cosine transformation to the day-of-year values.
The transformation formula used is:
\[day\_of\_year_{transformed} = \frac{\cos((day\_of\_year - 1) \cdot 2 \cdot \pi / 364) + 1}{2}\]
- Returns:
None
- Return type:
None
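As a worked example, the formula can be evaluated directly with the standard library. The function name `cosine_encode` is hypothetical and exists only for this sketch; it transcribes the formula rather than reproducing the method's code.

```python
import math


def cosine_encode(day_of_year: int) -> float:
    """Map a day of year onto [0, 1] using the cosine formula quoted above."""
    return (math.cos((day_of_year - 1) * 2 * math.pi / 364) + 1) / 2


jan_1 = cosine_encode(1)       # phase 0, the maximum of the cosine
mid_year = cosine_encode(183)  # half a period (182 days) later, the minimum
```

Day 1 lands at the top of the range and day 183 at the bottom, so the encoded value measures how far a date is from the start of the year in either direction.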
- convert_sine()[source]
Optionally apply a sinusoidal transformation to the day-of-year values.
The transformation formula used is:
\[day\_of\_year_{transformed} = \frac{\sin((day\_of\_year - 1) \cdot 2 \cdot \pi / 364) + 1}{2}\]
- Returns:
None
- Return type:
None
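The sine formula can be checked numerically in the same spirit. The function name `sine_encode` is hypothetical; the sketch only transcribes the formula above.

```python
import math


def sine_encode(day_of_year: int) -> float:
    """Map a day of year onto [0, 1] using the sine formula quoted above."""
    return (math.sin((day_of_year - 1) * 2 * math.pi / 364) + 1) / 2


start = sine_encode(1)      # phase 0
quarter = sine_encode(92)   # a quarter period (91 days) later, the maximum
wrapped = sine_encode(365)  # one full 364-day period after day 1
```

Because the period in the formula is 364 days, day 365 receives the same encoded value as day 1, which is what makes the encoding continuous across the year boundary.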
- extract_features()[source]
Derive the day-of-year feature from the profile_timestamp column in selected_profiles and merge it with the target rows.
Steps:
Select columns row_id, platform_code, and profile_no from selected_rows[target_name].
Join the subset with profile_timestamp from selected_profiles based on platform_code and profile_no.
Compute the day of year from profile_timestamp via Polars’ polars.Expr.dt.ordinal_day().
Remove columns no longer needed (i.e., the join keys and timestamp).
- Returns:
None
- Return type:
None
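The join-then-derive flow can be sketched with the standard library alone. This is an illustration under assumptions, not the class's code: lists of dictionaries stand in for the Polars DataFrames, `tm_yday` plays the role of the ordinal-day expression, and the output column name `day_of_year` is assumed.

```python
from datetime import datetime

# Hypothetical stand-ins for selected_rows[target_name] and selected_profiles.
target_rows = [
    {"row_id": 1, "platform_code": "A", "profile_no": 1},
    {"row_id": 2, "platform_code": "A", "profile_no": 2},
]
profiles = [
    {"platform_code": "A", "profile_no": 1, "profile_timestamp": datetime(2021, 1, 1)},
    {"platform_code": "A", "profile_no": 2, "profile_timestamp": datetime(2021, 3, 1)},
]

# Index the timestamps by the join keys (platform_code, profile_no).
timestamps = {
    (p["platform_code"], p["profile_no"]): p["profile_timestamp"] for p in profiles
}

features = []
for row in target_rows:
    ts = timestamps[(row["platform_code"], row["profile_no"])]
    # tm_yday is the 1-based ordinal day of the year; the join keys and the
    # timestamp itself are not carried into the result.
    features.append({"row_id": row["row_id"], "day_of_year": ts.timetuple().tm_yday})
```

In a non-leap year such as 2021, 1 March is day 60, so the second row receives the value 60.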
aiqclib.prepare.features.flank_down module
This module provides the FlankDown class for extracting “flanking” (neighboring) observations around target rows within Copernicus CTD datasets. It specializes in downstream observation expansion and feature pivoting.
- class aiqclib.prepare.features.flank_down.FlankDown(target_name=None, feature_info=None, selected_profiles=None, filtered_input=None, selected_rows=None, summary_stats=None)[source]
Bases:
FeatureBase
A feature-extraction class for retrieving target values and their “flanking” values from Copernicus CTD data, extending FeatureBase.
The term “flanking values” refers to capturing neighboring observations around a specified index (e.g., observation_no) by shifting backward a specified amount.
- Parameters:
target_name (str | None)
feature_info (Dict | None)
selected_profiles (DataFrame | None)
filtered_input (DataFrame | None)
selected_rows (Dict[str, DataFrame] | None)
summary_stats (DataFrame | None)
- extract_features()[source]
Initiate the multi-step process of creating the feature set in features.
- Steps:
_init_features() - Prepare a base DataFrame with essential columns.
_expand_observations() - Expand observations based on “flank_down”.
For each column in feature_info["col_names"]:
- _pivot_features() to pivot the data.
- _add_features() to join the pivoted data onto the feature table.
_clean_features() - Drop metadata columns.
- Returns:
None
- Return type:
None
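The pivot step is the least obvious part of this process, so a small plain-Python sketch may help. Everything here is assumed for illustration: the long-format records, the `flank_seq` column, the helper `pivot_column`, and the `{col}_down{n}` naming are hypothetical and may differ from what _pivot_features() actually produces.

```python
from collections import defaultdict

# Hypothetical long-format output of the expansion step: one record per
# (target row, flank step), holding the value observed at that neighbor.
expanded = [
    {"row_id": 1, "flank_seq": 0, "temp": 12.1},
    {"row_id": 1, "flank_seq": 1, "temp": 11.8},
    {"row_id": 1, "flank_seq": 2, "temp": 11.5},
    {"row_id": 2, "flank_seq": 0, "temp": 9.4},
    {"row_id": 2, "flank_seq": 1, "temp": 9.1},
    {"row_id": 2, "flank_seq": 2, "temp": 8.9},
]


def pivot_column(records, col_name):
    """Turn one record per flank step into one wide record per target row."""
    wide = defaultdict(dict)
    for rec in records:
        # Assumed naming scheme: <column>_down<step>.
        wide[rec["row_id"]][f"{col_name}_down{rec['flank_seq']}"] = rec[col_name]
    return [{"row_id": row_id, **cols} for row_id, cols in sorted(wide.items())]


features = pivot_column(expanded, "temp")
```

Each target row ends up as a single record with one column per flank step, which is the shape needed to join the result onto a feature table keyed by `row_id`.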
aiqclib.prepare.features.flank_up module
This module defines the FlankUp class, which is responsible for extracting neighboring (flanking) observations for specific target rows in a dataset. It is primarily used for feature engineering with Copernicus CTD data.
- class aiqclib.prepare.features.flank_up.FlankUp(target_name=None, feature_info=None, selected_profiles=None, filtered_input=None, selected_rows=None, summary_stats=None)[source]
Bases:
FeatureBase
A feature-extraction class for retrieving target values and their “flanking” values from Copernicus CTD data, extending aiqclib.common.base.feature_base.FeatureBase.
The term “flanking values” refers to capturing neighboring observations around a specified index (e.g., observation_no) by shifting backward a specified amount.
- Parameters:
target_name (str | None)
feature_info (Dict | None)
selected_profiles (DataFrame | None)
filtered_input (DataFrame | None)
selected_rows (Dict[str, DataFrame] | None)
summary_stats (DataFrame | None)
- extract_features()[source]
Initiate the multi-step process of creating the feature set in features.
Steps:
_init_features() - Prepare a base DataFrame with essential columns (row_id, platform_code, profile_no).
_expand_observations() - Expand observations by adding rows for the specified number of “flank” steps (based on feature_info["flank_up"]).
For each column in feature_info["col_names"], call:
- _pivot_features() to pivot the data for that column,
- _add_features() to join the pivoted data onto our feature table.
_clean_features() - Drop columns no longer needed.
- Return type:
None
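The expansion step can be sketched as follows. This is a guess at the general idea rather than the behaviour of _expand_observations(): the helper name `expand_observations`, the `flank_seq` column, the inclusion of step 0 for the target itself, and the clamping at the first observation are all assumptions.

```python
def expand_observations(rows, flank_up):
    """Add one record per flank step, shifting observation_no backward."""
    expanded = []
    for row in rows:
        for step in range(flank_up + 1):
            expanded.append({
                "row_id": row["row_id"],
                "flank_seq": step,
                # Assumption: neighbors before the first observation are
                # clamped to observation 1 instead of being dropped.
                "observation_no": max(row["observation_no"] - step, 1),
            })
    return expanded


rows = [
    {"row_id": 1, "observation_no": 5},
    {"row_id": 2, "observation_no": 1},
]
expanded = expand_observations(rows, flank_up=2)
```

With `flank_up=2` every target row yields three records: the target itself and two preceding neighbors. A target at the start of a profile repeats its own observation under the clamping assumption.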
aiqclib.prepare.features.location module
This module defines the LocationFeat class, a specialized feature extractor for geographical coordinates (longitude, latitude) within a specified dataset.
It extends the generic FeatureBase to handle the specific requirements of location data, including extraction from raw profiles and optional scaling.
- class aiqclib.prepare.features.location.LocationFeat(target_name=None, feature_info=None, selected_profiles=None, filtered_input=None, selected_rows=None, summary_stats=None)[source]
Bases:
FeatureBase
A feature extraction class designed specifically for location-based fields (e.g., longitude, latitude) within the Copernicus CTD dataset.
This class uses the provided data frames to gather location-related fields and optionally apply scaling methods. It inherits from FeatureBase, which defines a generic feature extraction workflow.
- Parameters:
target_name (str | None)
feature_info (Dict | None)
selected_profiles (DataFrame | None)
filtered_input (DataFrame | None)
selected_rows (Dict[str, DataFrame] | None)
summary_stats (DataFrame | None)
- extract_features()[source]
Gather and merge location columns (e.g., longitude and latitude) from selected_profiles into selected_rows to form the final feature set in features.
- Returns:
None. The result is stored in the features attribute.
- Return type:
None
- scale_first()[source]
Initial scaling or normalization procedure (currently unimplemented).
- Returns:
None.
- Return type:
None
- scale_second()[source]
Apply scaling to each location feature column according to the normalization type in feature_info["stats_set"]["type"]. min_max / auto_min_max apply min-max scaling and standard applies standard scaling, both using feature_info["stats"]. raw leaves the columns unchanged.
- Returns:
None. Scaling is applied to the features DataFrame.
- Return type:
None
aiqclib.prepare.features.profile_summary module
This module provides the ProfileSummaryStats class, which is responsible for extracting and scaling statistical features from Polars DataFrames by merging row-level data with summary statistics.
- class aiqclib.prepare.features.profile_summary.ProfileSummaryStats(target_name=None, feature_info=None, selected_profiles=None, filtered_input=None, selected_rows=None, summary_stats=None)[source]
Bases:
FeatureBase
A feature-extraction class that combines row references from selected_rows with summary statistics from summary_stats. It constructs columns of summarized metrics (e.g., min, max) for specified variables and optionally applies scaling.
This class inherits from FeatureBase, which provides a generic framework for feature extraction, including placeholders for multi-stage scaling.
- Parameters:
target_name (str | None)
feature_info (Dict | None)
selected_profiles (DataFrame | None)
filtered_input (DataFrame | None)
selected_rows (Dict[str, DataFrame] | None)
summary_stats (DataFrame | None)
- extract_features()[source]
Traverse the feature_info structure to assemble columns from summary_stats, merging them into features.
- Return type:
None
- Steps:
Initialize features via _filter_selected_rows_cols().
Join metrics from summary_stats for each variable/metric pair.
Remove join keys (platform_code, profile_no) from the final result.
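The variable/metric join can be sketched in plain Python. The layout of `summary_stats` (one record per profile and variable), the `wanted` mapping, and the sample values are assumptions made for illustration; only the join keys and the `{variable}_{metric}` column naming come from the documentation in this section.

```python
# Hypothetical summary statistics: one record per (profile, variable).
summary_stats = [
    {"platform_code": "A", "profile_no": 1, "variable": "temp", "min": 4.0, "max": 18.0},
    {"platform_code": "A", "profile_no": 1, "variable": "psal", "min": 34.1, "max": 35.2},
]
selected_rows = [{"row_id": 1, "platform_code": "A", "profile_no": 1}]

# Assumed request: which metrics to attach for which variable.
wanted = {"temp": ["min", "max"], "psal": ["max"]}

# Index the statistics by profile keys plus variable name.
lookup = {
    (s["platform_code"], s["profile_no"], s["variable"]): s for s in summary_stats
}

features = []
for row in selected_rows:
    # Only row_id is kept; the join keys are left out of the result.
    feat = {"row_id": row["row_id"]}
    for variable, metrics in wanted.items():
        stats = lookup[(row["platform_code"], row["profile_no"], variable)]
        for metric in metrics:
            feat[f"{variable}_{metric}"] = stats[metric]
    features.append(feat)
```

Every row belonging to the same profile receives identical summary columns, since the statistics describe the profile rather than the individual observation.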
- scale_second()[source]
Scale the newly joined summary statistics based on feature_info.
Transforms columns named {variable}_{metric} according to the normalization type in feature_info["stats_set"]["type"]: min_max / auto_min_max apply min-max scaling and standard applies standard scaling, using the nested values in feature_info["stats"]. raw leaves the columns unchanged.
- Return type:
None
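A sketch of how nested statistics might drive the scaling of `{variable}_{metric}` columns is shown below. The nesting `stats[variable][metric]` with `min` and `max` bounds is an assumption for illustration, as are the sample values; only the min-max branch is shown.

```python
# Hypothetical feature_info with one scaling range per variable/metric pair.
feature_info = {
    "stats_set": {"type": "min_max"},
    "stats": {
        "temp": {
            "min": {"min": 0.0, "max": 10.0},
            "max": {"min": 10.0, "max": 30.0},
        },
    },
}
features = [{"row_id": 1, "temp_min": 4.0, "temp_max": 18.0}]

norm_type = feature_info["stats_set"]["type"]
for variable, metrics in feature_info["stats"].items():
    for metric, bounds in metrics.items():
        column = f"{variable}_{metric}"
        for feat in features:
            if norm_type in ("min_max", "auto_min_max"):
                # Each summary column is scaled with its own bounds.
                span = bounds["max"] - bounds["min"]
                feat[column] = (feat[column] - bounds["min"]) / span
```

Keeping separate bounds for each metric matters because, for example, profile minima and profile maxima of the same variable usually occupy different ranges.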