aiqclib.prepare.features package

Submodules

aiqclib.prepare.features.basic_values module

This module provides the BasicValues class for extracting target value observations from Polars DataFrames.

It extends FeatureBase and is designed for specific data processing needs, such as those encountered with Copernicus CTD data.

class aiqclib.prepare.features.basic_values.BasicValues(target_name=None, feature_info=None, selected_profiles=None, filtered_input=None, selected_rows=None, summary_stats=None)[source]

Bases: FeatureBase

A feature-extraction class for retrieving target values from Copernicus CTD data, extending FeatureBase.

Parameters:

target_name (str | None)
feature_info (Dict | None)
selected_profiles (DataFrame | None)
filtered_input (DataFrame | None)
selected_rows (Dict[str, DataFrame] | None)
summary_stats (DataFrame | None)

extract_features()[source]

Initiate the multi-step process of creating the feature set in features.

Steps:

_init_features() - Prepare a base DataFrame with essential columns (row_id, platform_code, profile_no, observation_no).

For each column specified in feature_info["col_names"], call _add_features() to join the pivoted data onto our feature table.

_clean_features() - Drop columns no longer needed.

Return type:: None

scale_first()[source]

Apply a pre-feature-extraction scaling step on filtered_input.

This normalizes each relevant raw input column in place according to the normalization type declared in feature_info["stats_set"]["type"]: min_max/auto_min_max apply min-max scaling and standard applies standard scaling, both using the values in feature_info["stats"]. raw leaves the data untouched.

Return type:: None
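
The normalization types named above can be illustrated with a minimal sketch on plain Python lists. The function names, the stats keys (min, max, mean, sd), and the error raised for an unknown type are assumptions made for this example; the actual layout of feature_info["stats"] is not shown in this reference, and the library itself works on Polars columns.

```python
from typing import Dict, List, Optional

Values = List[Optional[float]]


def min_max_scale(values: Values, vmin: float, vmax: float) -> Values:
    """Map values linearly so that vmin -> 0 and vmax -> 1; nulls stay null."""
    span = vmax - vmin
    return [None if v is None else (v - vmin) / span for v in values]


def standard_scale(values: Values, mean: float, sd: float) -> Values:
    """Centre on the mean and divide by the standard deviation; nulls stay null."""
    return [None if v is None else (v - mean) / sd for v in values]


def scale(values: Values, norm_type: str, stats: Dict[str, float]) -> Values:
    """Dispatch on the normalization type (hypothetical stats keys)."""
    if norm_type in ("min_max", "auto_min_max"):
        return min_max_scale(values, stats["min"], stats["max"])
    if norm_type == "standard":
        return standard_scale(values, stats["mean"], stats["sd"])
    if norm_type == "raw":
        return list(values)  # data left untouched
    # Assumption: unknown types are rejected; the source does not say.
    raise ValueError(f"Unknown normalization type: {norm_type}")
```

For instance, scale([0.0, 10.0, 20.0], "min_max", {"min": 0.0, "max": 20.0}) gives [0.0, 0.5, 1.0].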

scale_second()[source]

Apply a post-feature-extraction scaling step if needed.

This method is currently unimplemented. It is kept as a placeholder for scaling or normalization that may later be applied after features have been pivoted and expanded.

Return type:: None

aiqclib.prepare.features.day_of_year module

This module defines a feature extraction class, DayOfYearFeat, that calculates the day of the year from timestamps.

It is designed to be part of a larger feature engineering pipeline, extending the FeatureBase class to derive temporal features, specifically the day-of-year, and optionally apply a sinusoidal transformation for cyclical encoding.

class aiqclib.prepare.features.day_of_year.DayOfYearFeat(target_name=None, feature_info=None, selected_profiles=None, filtered_input=None, selected_rows=None, summary_stats=None)[source]

Bases: FeatureBase

A feature-extraction class that derives day-of-year features from Copernicus CTD data.

This class specifically leverages the profile_timestamp column to generate a day-of-year value, optionally applying a sinusoidal transformation for cyclical encoding.

Parameters:

target_name (str | None)
feature_info (Dict | None)
selected_profiles (DataFrame | None)
filtered_input (DataFrame | None)
selected_rows (Dict[str, DataFrame] | None)
summary_stats (DataFrame | None)

convert_cosine()[source]

Optionally apply a cosinusoidal transformation to the day-of-year values.

The transformation formula used is:

\[day\_of\_year_{transformed} = \frac{{\cos((day\_of\_year - 1) \cdot 2 \cdot \pi / 364) + 1}}{2}\]

Returns:: None
Return type:: None

convert_sine()[source]

Optionally apply a sinusoidal transformation to the day-of-year values.

The transformation formula used is:

\[day\_of\_year_{transformed} = \frac{{\sin((day\_of\_year - 1) \cdot 2 \cdot \pi / 364) + 1}}{2}\]

Returns:: None
Return type:: None
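
The two formulas above can be written out directly with the standard library. This is a sketch on scalar values with hypothetical function names; the class itself applies the transformation to a column of the features DataFrame.

```python
import math


def day_of_year_sine(day_of_year: int) -> float:
    """(sin((doy - 1) * 2 * pi / 364) + 1) / 2, which lies in [0, 1]."""
    return (math.sin((day_of_year - 1) * 2 * math.pi / 364) + 1) / 2


def day_of_year_cosine(day_of_year: int) -> float:
    """(cos((doy - 1) * 2 * pi / 364) + 1) / 2, which lies in [0, 1]."""
    return (math.cos((day_of_year - 1) * 2 * math.pi / 364) + 1) / 2
```

With these definitions, day 1 maps to 0.5 under the sine transform and to 1.0 under the cosine transform.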

extract_features()[source]

Derive the day-of-year feature from the profile_timestamp column in selected_profiles and merge it with the target rows.

Steps:

Select columns row_id, platform_code, and profile_no from selected_rows[target_name].

Join the subset with profile_timestamp from selected_profiles based on platform_code and profile_no.

Compute the day of year from profile_timestamp via Polars’ polars.Expr.dt.ordinal_day().

Remove columns no longer needed (i.e., the join keys and timestamp).

Returns:: None
Return type:: None
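
The steps above can be sketched with the standard library, using tm_yday as a stand-in for Polars' ordinal_day() (both count from 1). The data layout (tuples for target rows, a dictionary keyed by platform_code and profile_no for profile timestamps) and the function names are assumptions made for this illustration only.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def day_of_year(ts: datetime) -> int:
    """Day of the year, counting from 1 (1 January -> 1)."""
    return ts.timetuple().tm_yday


def extract_day_of_year(
    rows: List[Tuple[str, str, int]],
    profile_timestamps: Dict[Tuple[str, int], datetime],
) -> Dict[str, int]:
    """Join target rows to their profile timestamp and keep row_id -> day of year."""
    features = {}
    for row_id, platform_code, profile_no in rows:
        # Join on (platform_code, profile_no); the keys are dropped afterwards.
        ts = profile_timestamps[(platform_code, profile_no)]
        features[row_id] = day_of_year(ts)
    return features
```

Every row of the same profile receives the same value, since the timestamp is a profile-level attribute.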

scale_first()[source]

(Optional) Perform the initial scaling step.

Currently, no transformations are applied to day-of-year values in this step, but it can be extended for outlier removal or other domain-specific logic.

Returns:: None
Return type:: None

scale_second()[source]

Optionally apply a sinusoidal or cosinusoidal transformation to the day-of-year values.

If "convert" is set to either "sine" or "cosine" in feature_info, this method transforms each day-of-year value into a cyclical feature in the range [0, 1].

Returns:: None
Return type:: None

aiqclib.prepare.features.flank_down module

This module provides the FlankDown class for extracting “flanking” (neighboring) observations around target rows within Copernicus CTD datasets. It specializes in downstream observation expansion and feature pivoting.

class aiqclib.prepare.features.flank_down.FlankDown(target_name=None, feature_info=None, selected_profiles=None, filtered_input=None, selected_rows=None, summary_stats=None)[source]

Bases: FeatureBase

A feature-extraction class for retrieving target values and their “flanking” values from Copernicus CTD data, extending FeatureBase.

The term “flanking values” refers to the concept of capturing neighboring observations around a specified index (e.g., observation_no) by shifting backward a specified amount.

Parameters:

target_name (str | None)
feature_info (Dict | None)
selected_profiles (DataFrame | None)
filtered_input (DataFrame | None)
selected_rows (Dict[str, DataFrame] | None)
summary_stats (DataFrame | None)

extract_features()[source]

Initiate the multi-step process of creating the feature set in features.

Steps:

_init_features() - Prepare a base DataFrame with essential columns.
_expand_observations() - Expand observations based on “flank_down”.
For each column in feature_info["col_names"]: - _pivot_features() to pivot the data. - _add_features() to join the pivoted data onto the feature table.
_clean_features() - Drop metadata columns.

Returns:: None
Return type:: None

scale_first()[source]

Apply a pre-feature-extraction scaling step on filtered_input using min-max scaling derived from feature_info["stats"].

Returns:: None
Return type:: None

scale_second()[source]

Apply a post-feature-extraction scaling step if needed. Currently unimplemented.

Returns:: None
Return type:: None

aiqclib.prepare.features.flank_up module

This module defines the FlankUp class, which is responsible for extracting neighboring (flanking) observations for specific target rows in a dataset. It is primarily used for feature engineering with Copernicus CTD data.

class aiqclib.prepare.features.flank_up.FlankUp(target_name=None, feature_info=None, selected_profiles=None, filtered_input=None, selected_rows=None, summary_stats=None)[source]

Bases: FeatureBase

A feature-extraction class for retrieving target values and their “flanking” values from Copernicus CTD data, extending aiqclib.common.base.feature_base.FeatureBase.

The term “flanking values” refers to the concept of capturing neighboring observations around a specified index (e.g., observation_no) by shifting backward a specified amount.

Parameters:

target_name (str | None)
feature_info (Dict | None)
selected_profiles (DataFrame | None)
filtered_input (DataFrame | None)
selected_rows (Dict[str, DataFrame] | None)
summary_stats (DataFrame | None)

extract_features()[source]

Initiate the multi-step process of creating the feature set in features.

Steps:

_init_features() - Prepare a base DataFrame with essential columns (row_id, platform_code, profile_no).
_expand_observations() - Expand observations by adding rows for the specified number of “flank” steps (based on feature_info["flank_up"]).
For each column in feature_info["col_names"], call: - _pivot_features() to pivot the data for that column, - _add_features() to join the pivoted data onto our feature table.
_clean_features() - Drop columns no longer needed.

Return type:: None
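
The expand-and-pivot idea can be pictured with a small sketch for a single target row. It assumes that "shifting backward" means moving to smaller observation_no values within the same profile, and the output column names (temp, temp_up1, ...) are hypothetical; the real naming scheme is not given in this reference.

```python
from typing import Dict, Optional


def flank_up_values(
    profile_values: Dict[int, float],
    observation_no: int,
    flank_up: int,
    name: str = "temp",
) -> Dict[str, Optional[float]]:
    """Collect the target value and its flank_up preceding neighbours as one wide row."""
    row: Dict[str, Optional[float]] = {}
    for step in range(flank_up + 1):
        # Hypothetical column naming: the target keeps the plain variable name.
        column = name if step == 0 else f"{name}_up{step}"
        # Neighbours that fall outside the profile are left as None.
        row[column] = profile_values.get(observation_no - step)
    return row
```

For a profile {1: 10.0, 2: 9.5, 3: 9.0}, the target observation 3 with flank_up=2 yields one row holding 9.0, 9.5, and 10.0.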

scale_first()[source]

Apply a pre-feature-extraction scaling step on filtered_input using min-max scaling derived from feature_info["stats"].

This modifies filtered_input in place for each relevant column.

Return type:: None

scale_second()[source]

Apply a post-feature-extraction scaling step if needed.

Currently unimplemented; retained as a placeholder for additional scaling or normalization after feature pivoting and expansion.

Return type:: None

aiqclib.prepare.features.location module

This module defines the LocationFeat class, a specialized feature extractor for geographical coordinates (longitude, latitude) within a specified dataset.

It extends the generic FeatureBase to handle the specific requirements of location data, including extraction from raw profiles and optional scaling.

class aiqclib.prepare.features.location.LocationFeat(target_name=None, feature_info=None, selected_profiles=None, filtered_input=None, selected_rows=None, summary_stats=None)[source]

Bases: FeatureBase

A feature extraction class designed specifically for location-based fields (e.g., longitude, latitude) within the Copernicus CTD dataset.

This class uses the provided data frames to gather location-related fields and optionally apply scaling methods. It inherits from FeatureBase which defines a generic feature extraction workflow.

Parameters:

target_name (str | None)
feature_info (Dict | None)
selected_profiles (DataFrame | None)
filtered_input (DataFrame | None)
selected_rows (Dict[str, DataFrame] | None)
summary_stats (DataFrame | None)

extract_features()[source]

Gather and merge location columns (e.g., longitude and latitude) from selected_profiles into selected_rows to form the final feature set in features.

Returns:: None. The result is stored in the features attribute.
Return type:: None

scale_first()[source]

Initial scaling or normalization procedure (currently unimplemented).

Returns:: None.
Return type:: None

scale_second()[source]

Apply scaling to each location feature column according to the normalization type in feature_info["stats_set"]["type"].

min_max/auto_min_max apply min-max scaling and standard applies standard scaling, both using feature_info["stats"]. raw leaves the columns unchanged.

Returns:: None. Scaling is applied to the features DataFrame.
Return type:: None

aiqclib.prepare.features.profile_summary module

This module provides the ProfileSummaryStats class, which is responsible for extracting and scaling statistical features from Polars DataFrames by merging row-level data with summary statistics.

class aiqclib.prepare.features.profile_summary.ProfileSummaryStats(target_name=None, feature_info=None, selected_profiles=None, filtered_input=None, selected_rows=None, summary_stats=None)[source]

Bases: FeatureBase

A feature-extraction class that combines row references from selected_rows with summary statistics from summary_stats. It constructs columns of summarized metrics (e.g., min, max) for specified variables and optionally applies scaling.

This class inherits from FeatureBase, which provides a generic framework for feature extraction, including placeholders for multi-stage scaling.

Parameters:

target_name (str | None)
feature_info (Dict | None)
selected_profiles (DataFrame | None)
filtered_input (DataFrame | None)
selected_rows (Dict[str, DataFrame] | None)
summary_stats (DataFrame | None)

extract_features()[source]

Traverse the feature_info structure to assemble columns from summary_stats, merging them into features.

Steps:

Initialize features via _filter_selected_rows_cols().
Join metrics from summary_stats for each variable/metric pair.
Remove join keys (platform_code, profile_no) from the final result.

Return type:: None

scale_first()[source]

An initial scaling hook (unimplemented).

Return type:: None

scale_second()[source]

Scale the newly joined summary statistics based on feature_info.

Transforms columns named {variable}_{metric} according to the normalization type in feature_info["stats_set"]["type"]: min_max/auto_min_max apply min-max scaling and standard applies standard scaling, using the nested values in feature_info["stats"]. raw leaves the columns unchanged.

Return type:: None

aiqclib.prepare.features.qc_density_inversion module

This module defines QCDensityInversion, the RTQC14 density inversion test.

The potential density anomaly sigma-0 (UNESCO 1983 algorithm, see aiqclib.common.utils.seawater) is computed for every observation and compared at consecutive profile levels in both directions. From top to bottom, an observation fails when its density is smaller than the previous (lesser-pressure) density beyond the threshold; from bottom to top, when its density is larger than the next (greater-pressure) density beyond the threshold — so both levels of an inverted pair are flagged. Because the density combines temperature and salinity, both variables are flagged jointly. Small inversions below the configurable threshold are allowed.

class aiqclib.prepare.features.qc_density_inversion.QCDensityInversion(target_name=None, feature_info=None, selected_profiles=None, filtered_input=None, selected_rows=None, summary_stats=None)[source]

Bases: QCItemFeatureBase

RTQC14 density inversion test (observation-level, temp and psal jointly).

Fails both levels of a consecutive pair whose sigma-0 difference inverts by more than threshold (0.03 kg/m³ by default). Produces temp_qc_density_inversion and psal_qc_density_inversion columns carrying identical flags. Observations with a missing temperature, salinity, or pressure cannot be density-checked and always pass.

Parameters:

target_name (str | None)
feature_info (Dict | None)
selected_profiles (DataFrame | None)
filtered_input (DataFrame | None)
selected_rows (Dict[str, DataFrame] | None)
summary_stats (DataFrame | None)

compute_flags(df)[source]

Flag consecutive-level density inversions beyond the threshold.

Parameters:: df (DataFrame) – The observations to check (needs temp, psal, and pres).
Returns:: Observation keys plus the per-variable joint flag columns.
Return type:: DataFrame
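
The pairwise rule can be sketched for one profile whose sigma-0 values are already computed and ordered from shallow to deep. The function name is hypothetical, the density computation itself is omitted, and treating a null level as breaking the pair is an assumption of this sketch.

```python
from typing import List, Optional


def density_inversion_flags(
    sigma0: List[Optional[float]],
    threshold: float = 0.03,
    fail_flag: int = 4,
) -> List[int]:
    """Flag both levels of each consecutive pair that inverts beyond the threshold."""
    flags = [1] * len(sigma0)
    for i in range(1, len(sigma0)):
        upper, lower = sigma0[i - 1], sigma0[i]
        if upper is None or lower is None:
            continue  # a level without a density cannot be checked and passes
        # Density should increase with pressure; a drop larger than the
        # threshold fails the top-to-bottom check at the lower level and the
        # bottom-to-top check at the upper level.
        if upper - lower > threshold:
            flags[i - 1] = fail_flag
            flags[i] = fail_flag
    return flags
```

The same flags would then be written to both temp_qc_density_inversion and psal_qc_density_inversion.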

default_params: Dict = {'threshold': 0.03}: Built-in default parameters (the values from the NRT QC spec). Parameters supplied via feature_info["params"] override these per key.

get_variables()[source]

Return the variables receiving the joint density flag.

Density combines temperature and salinity, so both are flagged by default; an explicit col_names list restricts the output columns.

Returns:: Variable names to produce flag columns for.
Return type:: List[str]

item_name: str = 'density_inversion': Short item name used in output column names (e.g. “global_range”).

order_column: str = 'observation_no': Column defining the vertical order of observations within a profile.

scalar_param_names: Tuple[str, ...] = ('threshold',): Parameter keys that are settings rather than variable entries; used when deriving the checked variables from params.

aiqclib.prepare.features.qc_digit_rollover module

This module defines QCDigitRollover, the RTQC12 digit rollover test.

Sensors store temperature and salinity in a limited bit range; when the range is exceeded, stored values roll over to the lower end. The test detects rollovers that were not compensated for when the profile was constructed, by flagging differences between vertically adjacent measurements greater than 10 degC for temperature and 5 for salinity (by default).

class aiqclib.prepare.features.qc_digit_rollover.QCDigitRollover(target_name=None, feature_info=None, selected_profiles=None, filtered_input=None, selected_rows=None, summary_stats=None)[source]

Bases: QCItemFeatureBase

RTQC12 digit rollover test (observation-level, per variable).

Fails a value when its absolute difference from the previous observation exceeds the variable’s threshold. Produces one {variable}_qc_digit_rollover column per configured variable; the first observation of a profile always passes.

Parameters:

target_name (str | None)
feature_info (Dict | None)
selected_profiles (DataFrame | None)
filtered_input (DataFrame | None)
selected_rows (Dict[str, DataFrame] | None)
summary_stats (DataFrame | None)

compute_flags(df)[source]

Flag adjacent-difference jumps larger than the variable threshold.

Parameters:: df (DataFrame) – The observations to check.
Returns:: Observation keys plus one flag column per variable.
Return type:: DataFrame
Raises:: ValueError – If a variable has no numeric threshold.
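
A minimal sketch of the adjacent-difference rule for one variable of one profile follows. The function name is hypothetical, and letting pairs with a null value pass is an assumption; the reference only states that the first observation always passes.

```python
from typing import List, Optional


def digit_rollover_flags(
    values: List[Optional[float]],
    threshold: float,
    fail_flag: int = 4,
) -> List[int]:
    """Flag a value whose jump from the previous observation exceeds the threshold."""
    if isinstance(threshold, bool) or not isinstance(threshold, (int, float)):
        raise ValueError("A numeric threshold is required for each variable.")
    flags = [1] * len(values)  # the first observation always passes
    for i in range(1, len(values)):
        previous, current = values[i - 1], values[i]
        if previous is None or current is None:
            continue  # assumption: missing values cannot be compared and pass
        if abs(current - previous) > threshold:
            flags[i] = fail_flag
    return flags
```

With the default temperature threshold of 10.0, the series [20.0, 19.5, 5.0, 4.8] fails only at the third value, where the jump is 14.5 degC.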

default_params: Dict = {'psal': 5.0, 'temp': 10.0}: Built-in default parameters (the values from the NRT QC spec). Parameters supplied via feature_info["params"] override these per key.

item_name: str = 'digit_rollover': Short item name used in output column names (e.g. “global_range”).

order_column: str = 'observation_no': Column defining the vertical order of observations within a profile.

aiqclib.prepare.features.qc_global_range module

This module defines QCGlobalRange, the RTQC6 global range test.

A gross filter on observed values that must accommodate all expected ocean extremes: temperature within [-2.5, 40.0] degC and salinity within [2.0, 41.0] by default. Each configured variable is tested independently and produces its own flag column. Null values pass (a missing measurement cannot be range-checked).

class aiqclib.prepare.features.qc_global_range.QCGlobalRange(target_name=None, feature_info=None, selected_profiles=None, filtered_input=None, selected_rows=None, summary_stats=None)[source]

Bases: QCItemFeatureBase

RTQC6 global range test (observation-level, per variable).

Fails when a value lies outside the variable’s [min, max] range (bounds inclusive). Produces one {variable}_qc_global_range column per configured variable.

Parameters:

target_name (str | None)
feature_info (Dict | None)
selected_profiles (DataFrame | None)
filtered_input (DataFrame | None)
selected_rows (Dict[str, DataFrame] | None)
summary_stats (DataFrame | None)

compute_flags(df)[source]

Flag values outside their variable’s configured range.

Parameters:: df (DataFrame) – The observations to check.
Returns:: Observation keys plus one flag column per variable.
Return type:: DataFrame
Raises:: ValueError – If no variables are configured for the item.
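
The range rule reduces to an inclusive bounds check in which nulls pass. The sketch below works on a single value and uses a hypothetical function name; the defaults mirror default_params as listed for this class.

```python
from typing import Optional

# Defaults as documented for QCGlobalRange.
DEFAULT_PARAMS = {
    "temp": {"min": -2.5, "max": 40.0},
    "psal": {"min": 2.0, "max": 41.0},
}


def global_range_flag(
    value: Optional[float], vmin: float, vmax: float, fail_flag: int = 4
) -> int:
    """Return 1 (good) inside [vmin, vmax] or for nulls, else the fail flag."""
    if value is None:
        return 1  # a missing measurement cannot be range-checked
    return 1 if vmin <= value <= vmax else fail_flag
```

A temperature of exactly -2.5 degC therefore passes, while 40.1 degC fails.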

default_params: Dict = {'psal': {'max': 41.0, 'min': 2.0}, 'temp': {'max': 40.0, 'min': -2.5}}: Built-in default parameters (the values from the NRT QC spec). Parameters supplied via feature_info["params"] override these per key.

item_name: str = 'global_range': Short item name used in output column names (e.g. “global_range”).

aiqclib.prepare.features.qc_gradient module

This module defines QCGradient, the RTQC11 gradient test.

The test fails when the difference between vertically adjacent measurements is too steep. The test value is |V2 - (V3 + V1)/2| where V2 is the tested measurement and V1/V3 its neighbours, compared against depth-dependent thresholds (temp 9.0/3.0 degC and psal 1.5/0.5 around 500 db by default).

class aiqclib.prepare.features.qc_gradient.QCGradient(target_name=None, feature_info=None, selected_profiles=None, filtered_input=None, selected_rows=None, summary_stats=None)[source]

Bases: QCNeighborStencilBase

RTQC11 gradient test (observation-level, per variable).

Fails V2 when the gradient test value exceeds the depth-dependent threshold. Produces one {variable}_qc_gradient column per configured variable; profile-boundary observations always pass.

Parameters:

target_name (str | None)
feature_info (Dict | None)
selected_profiles (DataFrame | None)
filtered_input (DataFrame | None)
selected_rows (Dict[str, DataFrame] | None)
summary_stats (DataFrame | None)

default_params: Dict = {'depth_threshold': 500, 'psal': {'deep': 0.5, 'shallow': 1.5}, 'temp': {'deep': 3.0, 'shallow': 9.0}}: Built-in default parameters (the values from the NRT QC spec). Parameters supplied via feature_info["params"] override these per key.

item_name: str = 'gradient': Short item name used in output column names (e.g. “global_range”).

test_value_expr(v1, v2, v3)[source]

The gradient test value: |V2 - (V3 + V1)/2|.

Parameters:

v1 (Expr) – The neighbouring value above V2.
v2 (Expr) – The value being tested.
v3 (Expr) – The neighbouring value below V2.

Returns:

The gradient test value expression.

Return type:

Expr
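
On plain floats, the gradient test value can be sketched as below. The function name is hypothetical; the method documented here builds the same formula from Polars expressions.

```python
def gradient_test_value(v1: float, v2: float, v3: float) -> float:
    """|V2 - (V3 + V1) / 2| for a tested value V2 and its neighbours V1 and V3."""
    return abs(v2 - (v3 + v1) / 2)
```

For V1 = 10.0, V2 = 20.0, and V3 = 12.0 the test value is 9.0, which does not exceed the default shallow temperature threshold of 9.0.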

aiqclib.prepare.features.qc_impossible_date module

This module defines QCImpossibleDate, the RTQC2 impossible date test.

The observation date must be sensible: after the minimum year and not in the future. Because the input timestamp column is a parsed datetime, structurally invalid dates cannot be represented and surface as nulls, which also fail the test. The test is profile-level: the flag applies to every observation of the profile through the shared timestamp column.

class aiqclib.prepare.features.qc_impossible_date.QCImpossibleDate(target_name=None, feature_info=None, selected_profiles=None, filtered_input=None, selected_rows=None, summary_stats=None)[source]

Bases: QCItemFeatureBase

RTQC2 impossible date test (profile-level).

Fails when the profile timestamp is null, its year is not greater than min_year (default 1950), or it lies in the future at processing time. Produces the single column qc_impossible_date.

Parameters:

target_name (str | None)
feature_info (Dict | None)
selected_profiles (DataFrame | None)
filtered_input (DataFrame | None)
selected_rows (Dict[str, DataFrame] | None)
summary_stats (DataFrame | None)

compute_flags(df)[source]

Flag observations whose profile timestamp is not sensible.

Parameters:: df (DataFrame) – The observations to check.
Returns:: Observation keys plus the qc_impossible_date column.
Return type:: DataFrame
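
The three failure conditions can be sketched for a single profile timestamp. The function name is hypothetical, and the processing time is passed in explicitly (rather than read from the clock) only so that the example is reproducible.

```python
from datetime import datetime
from typing import Optional


def impossible_date_flag(
    timestamp: Optional[datetime],
    now: datetime,
    min_year: int = 1950,
    fail_flag: int = 4,
) -> int:
    """Return 1 (good) for a sensible timestamp, else the fail flag."""
    if timestamp is None:
        return fail_flag  # unparseable dates surface as nulls and fail
    if timestamp.year <= min_year:
        return fail_flag  # the year must be greater than min_year
    if timestamp > now:
        return fail_flag  # dates in the future are not sensible
    return 1
```

Because the test is profile-level, the resulting flag would be repeated for every observation of the profile.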

default_params: dict = {'min_year': 1950, 'timestamp_column': 'profile_timestamp'}: Built-in default parameters (the values from the NRT QC spec). Parameters supplied via feature_info["params"] override these per key.

item_name: str = 'impossible_date': Short item name used in output column names (e.g. “global_range”).

aiqclib.prepare.features.qc_impossible_location module

This module defines QCImpossibleLocation, the RTQC3 impossible location test.

The observation latitude and longitude must be sensible: latitude within [-90, 90] and longitude within [-180, 180]. Null coordinates fail. The test is profile-level: the flag applies to every observation of the profile through the shared position columns.

class aiqclib.prepare.features.qc_impossible_location.QCImpossibleLocation(target_name=None, feature_info=None, selected_profiles=None, filtered_input=None, selected_rows=None, summary_stats=None)[source]

Bases: QCItemFeatureBase

RTQC3 impossible location test (profile-level).

Fails when latitude or longitude is null or outside the configured bounds. Produces the single column qc_impossible_location.

Parameters:

target_name (str | None)
feature_info (Dict | None)
selected_profiles (DataFrame | None)
filtered_input (DataFrame | None)
selected_rows (Dict[str, DataFrame] | None)
summary_stats (DataFrame | None)

compute_flags(df)[source]

Flag observations whose position is not sensible.

Parameters:: df (DataFrame) – The observations to check.
Returns:: Observation keys plus the qc_impossible_location column.
Return type:: DataFrame
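
A minimal sketch of the position check for one profile follows. The function name is hypothetical, and the bounds are treated as inclusive, in line with the closed intervals quoted in the module description.

```python
from typing import Optional


def impossible_location_flag(
    latitude: Optional[float],
    longitude: Optional[float],
    lat_min: float = -90.0,
    lat_max: float = 90.0,
    lon_min: float = -180.0,
    lon_max: float = 180.0,
    fail_flag: int = 4,
) -> int:
    """Return 1 (good) for a sensible position, else the fail flag."""
    if latitude is None or longitude is None:
        return fail_flag  # null coordinates fail
    inside = lat_min <= latitude <= lat_max and lon_min <= longitude <= lon_max
    return 1 if inside else fail_flag
```

Unlike the range tests on measured values, a missing coordinate fails here rather than passes.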

default_params: dict = {'lat_max': 90.0, 'lat_min': -90.0, 'lon_max': 180.0, 'lon_min': -180.0}: Built-in default parameters (the values from the NRT QC spec). Parameters supplied via feature_info["params"] override these per key.

item_name: str = 'impossible_location': Short item name used in output column names (e.g. “global_range”).

aiqclib.prepare.features.qc_item_base module

This module defines QCItemFeatureBase, the common base class for the NRT QC item feature classes (RTQC tests from the NRT QC recommendation document).

Each QC item is an ordinary feature class registered in the feature registry, so its flag columns can be used both by the NRT QC module and as training features in the dataset preparation pipeline. Subclasses implement compute_flags(), which produces one flag column per checked variable (or a single profile-level column) for every observation of the input frame, using vectorised polars expressions.

aiqclib.prepare.features.qc_item_base.OBSERVATION_KEYS: List[str] = ['platform_code', 'profile_no', 'observation_no']: Columns identifying a single observation across the pipeline.

aiqclib.prepare.features.qc_item_base.PROFILE_KEYS: List[str] = ['platform_code', 'profile_no']: Columns identifying a profile.

class aiqclib.prepare.features.qc_item_base.QCItemFeatureBase(target_name=None, feature_info=None, selected_profiles=None, filtered_input=None, selected_rows=None, summary_stats=None)[source]

Bases: FeatureBase

Abstract base class for NRT QC item feature classes.

Subclasses define item_name, default_params, and compute_flags(). The base class resolves parameters (configuration params override default_params per key), the fail_flag emitted on failure (default 4, bad data), and the dual output mode:

Inside the dataset preparation pipeline (selected_rows given), features holds row_id plus the flag columns for the selected rows, like any other feature class.
Inside the NRT QC module (no selected_rows), features holds the observation keys plus the flag columns for every row of filtered_input.

Flag columns follow the naming scheme {variable}_qc_{item_name} for variable-specific items and qc_{item_name} for profile-level items, and are never null: 1 (good) or the item’s fail flag.

Parameters:

target_name (str | None)
feature_info (Dict | None)
selected_profiles (DataFrame | None)
filtered_input (DataFrame | None)
selected_rows (Dict[str, DataFrame] | None)
summary_stats (DataFrame | None)

abstractmethod compute_flags(df)[source]

Compute this item’s flag column(s) for every row of df.

Must return the observation key columns plus one flag column per checked variable (or a single profile-level flag column), with the same number of rows as df.

Parameters:: df (DataFrame) – The observations to check.
Returns:: Observation keys plus flag column(s).
Return type:: DataFrame

default_params: Dict = {}: Built-in default parameters (the values from the NRT QC spec). Parameters supplied via feature_info["params"] override these per key.

extract_features()[source]

Compute the flag columns and store them in features.

With selected_rows present (preparation pipeline), the flags are joined onto the target’s selected rows and keyed by row_id; otherwise (NRT QC module) the full flag frame is kept.

Return type:: None

fail_flag: int: The flag emitted when an observation fails this item.

flag_column_name(variable=None)[source]

Return the output column name for this item.

Parameters:: variable (Optional[str]) – The checked variable, or None for profile-level items.
Returns:: {variable}_qc_{item_name} or qc_{item_name}.
Return type:: str
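
The naming scheme can be sketched as a free function. This standalone version takes item_name as an argument purely for illustration; in the class it is an attribute.

```python
from typing import Optional


def flag_column_name(item_name: str, variable: Optional[str] = None) -> str:
    """Build {variable}_qc_{item_name}, or qc_{item_name} for profile-level items."""
    if variable is None:
        return f"qc_{item_name}"
    return f"{variable}_qc_{item_name}"
```

For example, the global range item yields temp_qc_global_range for temperature, while the impossible date item yields qc_impossible_date.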

get_variables()[source]

Return the variables this item produces flag columns for.

An explicit col_names list in feature_info wins; otherwise the variables are the parameter keys that are not settings (see scalar_param_names).

Returns:: Variable names, in configuration order.
Return type:: List[str]
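
Parameter resolution and variable derivation can be sketched together. The function names are hypothetical, and the per-key override is implemented as a shallow dictionary update, which is an assumption about how nested entries are handled.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple


def resolve_params(
    default_params: Dict[str, Any], config_params: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
    """Configuration values override the defaults per key (shallow merge assumed)."""
    merged = dict(default_params)
    merged.update(config_params or {})
    return merged


def get_variables(
    params: Dict[str, Any],
    scalar_param_names: Tuple[str, ...] = (),
    col_names: Optional[Sequence[str]] = None,
) -> List[str]:
    """An explicit col_names list wins; otherwise use the non-setting parameter keys."""
    if col_names:
        return list(col_names)
    return [key for key in params if key not in scalar_param_names]
```

With the gradient defaults, depth_threshold is a setting, so only psal and temp are treated as checked variables.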

item_name: str = '': Short item name used in output column names (e.g. “global_range”).

params: Dict: Resolved parameters; configuration values override the defaults.

scalar_param_names: Tuple[str, ...] = (): Parameter keys that are settings rather than variable entries; used when deriving the checked variables from params.

scale_first()[source]

(No-op) QC flags are not scaled before extraction.

Return type:: None

scale_second()[source]

(No-op) QC flags are kept as raw flag values. A mapping to [0, 1] can be added here if the flags are later used as model features.

Return type:: None

class aiqclib.prepare.features.qc_item_base.QCNeighborStencilBase(target_name=None, feature_info=None, selected_profiles=None, filtered_input=None, selected_rows=None, summary_stats=None)[source]

Bases: QCItemFeatureBase

Base class for QC items using the V1/V2/V3 neighbour stencil.

The spike (RTQC9) and gradient (RTQC11) tests both evaluate a test value from a measurement V2 and its vertical neighbours V1 (above) and V3 (below), compared against a depth-dependent threshold: shallow for pressures less than depth_threshold (500 db by default) and deep otherwise. Neighbours are taken in observation order within each profile; the first and last observations of a profile have no complete stencil and always pass, as do stencils involving null values.

Subclasses implement test_value_expr() with their formula.

Parameters:

target_name (str | None)
feature_info (Dict | None)
selected_profiles (DataFrame | None)
filtered_input (DataFrame | None)
selected_rows (Dict[str, DataFrame] | None)
summary_stats (DataFrame | None)

compute_flags(df)[source]

Flag values whose stencil test value exceeds the depth-dependent threshold.

Parameters:: df (DataFrame) – The observations to check.
Returns:: Observation keys plus one flag column per variable.
Return type:: DataFrame
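
The stencil logic for one variable of one profile can be sketched as follows. The function name is hypothetical; choosing the threshold from the pressure of V2 and letting a null pressure pass are assumptions of this sketch, since the reference does not spell them out.

```python
from typing import Callable, List, Optional


def stencil_flags(
    values: List[Optional[float]],
    pressures: List[Optional[float]],
    test_value: Callable[[float, float, float], float],
    shallow: float,
    deep: float,
    depth_threshold: float = 500.0,
    fail_flag: int = 4,
) -> List[int]:
    """Flag V2 where test_value(V1, V2, V3) exceeds the depth-dependent threshold."""
    flags = [1] * len(values)  # first and last observations always pass
    for i in range(1, len(values) - 1):
        v1, v2, v3 = values[i - 1], values[i], values[i + 1]
        pressure = pressures[i]  # assumption: the pressure of V2 selects the threshold
        if v1 is None or v2 is None or v3 is None or pressure is None:
            continue  # incomplete stencils pass
        threshold = shallow if pressure < depth_threshold else deep
        if test_value(v1, v2, v3) > threshold:
            flags[i] = fail_flag
    return flags
```

A subclass would supply only the test_value callable, which is the role test_value_expr() plays in the classes documented here.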

order_column: str = 'observation_no': Column defining the vertical order of observations within a profile.

scalar_param_names: Tuple[str, ...] = ('depth_threshold',): Parameter keys that are settings rather than variable entries; used when deriving the checked variables from params.

abstractmethod test_value_expr(v1, v2, v3)[source]

Return the item’s test value for the V1/V2/V3 stencil.

Parameters:

v1 (Expr) – The neighbouring value above V2.
v2 (Expr) – The value being tested.
v3 (Expr) – The neighbouring value below V2.

Returns:

The test value expression compared against the threshold.

Return type:

Expr

aiqclib.prepare.features.qc_pressure_increasing module

This module defines QCPressureIncreasing, the RTQC8 pressure increasing test.

The profile must have monotonically increasing pressures in observation order. In a region of constant pressure, all but the first of the consecutive constant pressures are flagged. In a region where pressure reverses, all pressures in the reversed part (below the running maximum of the preceding pressures) are flagged. Pressure is shared by all variables, so the test produces a single, variable-independent flag column.

class aiqclib.prepare.features.qc_pressure_increasing.QCPressureIncreasing(target_name=None, feature_info=None, selected_profiles=None, filtered_input=None, selected_rows=None, summary_stats=None)[source]

Bases: QCItemFeatureBase

RTQC8 pressure increasing test (observation-level, variable-independent).

Fails an observation when its pressure equals the previous pressure (constant run) or lies below the running maximum of the preceding pressures (reversed segment). Produces the single column qc_pressure_increasing.

Parameters:

target_name (str | None)
feature_info (Dict | None)
selected_profiles (DataFrame | None)
filtered_input (DataFrame | None)
selected_rows (Dict[str, DataFrame] | None)
summary_stats (DataFrame | None)

compute_flags(df)[source]

Flag constant and reversed pressure regions within each profile.

Parameters:: df (DataFrame) – The observations to check.
Returns:: Observation keys plus the qc_pressure_increasing column.
Return type:: DataFrame
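
The two failure conditions can be sketched for the pressures of one profile in observation order. The function name is hypothetical, and skipping null pressures (so that they pass and do not affect later comparisons) is an assumption of this sketch.

```python
from typing import List, Optional


def pressure_increasing_flags(
    pressures: List[Optional[float]], fail_flag: int = 4
) -> List[int]:
    """Flag repeated pressures and pressures below the running maximum."""
    flags = [1] * len(pressures)
    previous: Optional[float] = None
    running_max: Optional[float] = None
    for i, pressure in enumerate(pressures):
        if pressure is None:
            continue  # assumption: missing pressures pass and are skipped
        if previous is not None and running_max is not None:
            # Constant run: equal to the previous pressure.
            # Reversed segment: below the maximum seen so far.
            if pressure == previous or pressure < running_max:
                flags[i] = fail_flag
        running_max = pressure if running_max is None else max(running_max, pressure)
        previous = pressure
    return flags
```

For the pressures [10, 20, 15, 18, 25], the values 15 and 18 are flagged because both lie below the running maximum of 20.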

item_name: str = 'pressure_increasing': Short item name used in output column names (e.g. “global_range”).

order_column: str = 'observation_no': Column defining the vertical order of observations within a profile.

aiqclib.prepare.features.qc_regional_range module

This module defines QCRegionalRange, the RTQC7 regional range test.

The same range check as the global range test, but with the ranges of the configuration file’s region (one configuration file is prepared per region, so no polygon membership test is needed). There are no built-in default ranges: the region’s bounds must be supplied in the configuration, and a regional_range item without them raises an error rather than silently passing everything.

class aiqclib.prepare.features.qc_regional_range.QCRegionalRange(target_name=None, feature_info=None, selected_profiles=None, filtered_input=None, selected_rows=None, summary_stats=None)[source]

Bases: QCGlobalRange

RTQC7 regional range test (observation-level, per variable).

Identical to QCGlobalRange except that the ranges come from the region of the active configuration file and have no defaults. Produces one {variable}_qc_regional_range column per configured variable.

Parameters:

target_name (str | None)
feature_info (Dict | None)
selected_profiles (DataFrame | None)
filtered_input (DataFrame | None)
selected_rows (Dict[str, DataFrame] | None)
summary_stats (DataFrame | None)

default_params: Dict = {}: Built-in default parameters. Empty for this item: the regional bounds have no defaults and must be supplied via feature_info["params"].

item_name: str = 'regional_range': Short item name used in output column names (e.g. “global_range”).
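The "no defaults, fail loudly" behaviour described for this item can be sketched as follows. This is a hedged illustration only: the function name regional_range_fails and the "min"/"max" parameter keys are assumptions, since the parameter schema is not shown here, and the sketch also assumes that the bounds are inclusive and that missing values pass.

```python
from typing import Dict, List, Optional


def regional_range_fails(
    values: List[Optional[float]],
    params: Dict[str, Dict[str, float]],
    variable: str,
) -> List[bool]:
    """Return True for each value outside the configured regional bounds.

    Hypothetical helper; the "min" and "max" keys are assumed names.
    """
    bounds = params.get(variable)
    if not bounds or "min" not in bounds or "max" not in bounds:
        # No built-in defaults: refuse to run instead of passing everything.
        raise ValueError(f"regional_range needs explicit bounds for {variable!r}")
    low, high = bounds["min"], bounds["max"]
    # Assumption: missing values (None) are not failed by this check.
    return [v is not None and (v < low or v > high) for v in values]


region_params = {"temp": {"min": -2.0, "max": 32.0}}  # illustrative bounds only
flags = regional_range_fails([-3.0, 0.0, 32.0, 33.0, None], region_params, "temp")
```

Calling the helper with an empty params dictionary raises ValueError, which mirrors the documented choice to reject a regional_range item that has no bounds.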

aiqclib.prepare.features.qc_spike module

This module defines QCSpike, the RTQC9 spike test.

A spike is a measurement that differs markedly from its vertical neighbours in both size and gradient. The test value is |V2 - (V3 + V1)/2| - |(V3 - V1)/2|, where V2 is the tested measurement and V1 and V3 are its neighbours above and below. It is compared against depth-dependent thresholds; by default these are 6.0 degC (temp) and 0.9 (psal) on the shallow side of 500 db, and 2.0 degC and 0.3 on the deep side.

class aiqclib.prepare.features.qc_spike.QCSpike(target_name=None, feature_info=None, selected_profiles=None, filtered_input=None, selected_rows=None, summary_stats=None)[source]

Bases: QCNeighborStencilBase

RTQC9 spike test (observation-level, per variable).

Fails V2 when the spike test value exceeds the depth-dependent threshold. Produces one {variable}_qc_spike column per configured variable; profile-boundary observations always pass.

Parameters:

target_name (str | None)
feature_info (Dict | None)
selected_profiles (DataFrame | None)
filtered_input (DataFrame | None)
selected_rows (Dict[str, DataFrame] | None)
summary_stats (DataFrame | None)

default_params: Dict = {'depth_threshold': 500, 'psal': {'deep': 0.3, 'shallow': 0.9}, 'temp': {'deep': 2.0, 'shallow': 6.0}}: Built-in default parameters (the values from the NRT QC spec). Parameters supplied via feature_info["params"] override these per key.

item_name: str = 'spike': Short item name used in output column names (e.g. “global_range”).

test_value_expr(v1, v2, v3)[source]

The spike test value: |V2 - (V3 + V1)/2| - |(V3 - V1)/2|.

Parameters:

v1 (Expr) – The neighbouring value above V2.
v2 (Expr) – The value being tested.
v3 (Expr) – The neighbouring value below V2.

Returns:: The spike test value expression.
Return type:: Expr
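The same arithmetic can be tried on plain floats. The sketch below is illustrative rather than the library's code: spike_test_value and spike_fails are hypothetical names, the defaults are copied from default_params above, and treating a pressure exactly at depth_threshold as "deep" is an assumption.

```python
from typing import Any, Dict

# Defaults copied from QCSpike.default_params as documented above.
DEFAULT_PARAMS: Dict[str, Any] = {
    "depth_threshold": 500,
    "psal": {"deep": 0.3, "shallow": 0.9},
    "temp": {"deep": 2.0, "shallow": 6.0},
}


def spike_test_value(v1: float, v2: float, v3: float) -> float:
    """|V2 - (V3 + V1)/2| - |(V3 - V1)/2| on plain floats."""
    return abs(v2 - (v3 + v1) / 2) - abs((v3 - v1) / 2)


def spike_fails(
    v1: float,
    v2: float,
    v3: float,
    pressure: float,
    variable: str,
    params: Dict[str, Any] = DEFAULT_PARAMS,
) -> bool:
    """Hypothetical helper: True when the test value exceeds the threshold."""
    limits = params[variable]
    # Assumption: pressures below depth_threshold use the shallow limit,
    # everything else (including exactly 500) uses the deep limit.
    shallow = pressure < params["depth_threshold"]
    threshold = limits["shallow"] if shallow else limits["deep"]
    return spike_test_value(v1, v2, v3) > threshold


value = spike_test_value(10.0, 20.0, 11.0)
shallow_fail = spike_fails(10.0, 13.0, 10.0, 100.0, "temp")
deep_fail = spike_fails(10.0, 13.0, 10.0, 800.0, "temp")
```

The second term subtracts half the neighbour-to-neighbour difference, so a value sitting on a steady gradient between V1 and V3 yields a test value at or below zero and passes.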

aiqclib.prepare.features.qc_stuck_value module

This module defines QCStuckValue, the RTQC13 stuck value test.

The test looks for all measurements of a variable in a profile being identical, which indicates a stuck sensor. When it occurs, every value of the affected variable in the profile is flagged. Profiles with fewer than min_observations non-null measurements are exempt (a single observation cannot be “stuck”), as are profiles where the variable was not measured.

class aiqclib.prepare.features.qc_stuck_value.QCStuckValue(target_name=None, feature_info=None, selected_profiles=None, filtered_input=None, selected_rows=None, summary_stats=None)[source]

Bases: QCItemFeatureBase

RTQC13 stuck value test (profile-level, per variable).

Fails a whole profile’s variable when all its non-null measurements are identical and there are at least min_observations of them. Produces one {variable}_qc_stuck_value column per configured variable. The variables default to temp and psal; the empty per-variable dictionaries in the defaults act as enable markers.

Parameters:

target_name (str | None)
feature_info (Dict | None)
selected_profiles (DataFrame | None)
filtered_input (DataFrame | None)
selected_rows (Dict[str, DataFrame] | None)
summary_stats (DataFrame | None)

compute_flags(df)[source]

Flag profiles whose variable is stuck at a single value.

Parameters:: df (DataFrame) – The observations to check.
Returns:: Observation keys plus one flag column per variable.
Return type:: DataFrame
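The per-profile decision can be sketched in plain Python. This is a minimal illustration assuming one variable of one profile is passed as a list with None for missing measurements; the function name is_stuck is hypothetical, and the sketch returns a single profile-level verdict rather than the per-observation flag column.

```python
from typing import List, Optional


def is_stuck(values: List[Optional[float]], min_observations: int = 2) -> bool:
    """Hypothetical helper: True when the profile's variable counts as stuck."""
    measured = [v for v in values if v is not None]
    # Exempt profiles with too few non-null measurements (including none at all).
    if len(measured) < min_observations:
        return False
    # Stuck when every non-null measurement is identical.
    return len(set(measured)) == 1


stuck = is_stuck([5.0, None, 5.0, 5.0])
single = is_stuck([5.0])
unmeasured = is_stuck([None, None])
varying = is_stuck([5.0, 5.1])
```

A profile with one observation, or one where the variable was never measured, returns False because it falls below min_observations.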

default_params: Dict = {'min_observations': 2, 'psal': {}, 'temp': {}}: Built-in default parameters (the values from the NRT QC spec). Parameters supplied via feature_info["params"] override these per key.

item_name: str = 'stuck_value': Short item name used in output column names (e.g. “global_range”).

scalar_param_names: Tuple[str, ...] = ('min_observations',): Parameter keys that are settings rather than variable entries; used when deriving the checked variables from params.

aiqclib.prepare.features.qc_temp_to_psal module

This module defines QCTempToPsal, the temperature-to-salinity flag propagation rule from the NRT QC recommendation document.

When salinity is computed from temperature and conductivity, a temperature flagged 4 (or 3) corrupts the salinity at the same observation, so the salinity is flagged 4 (or 3) as well. The item consumes an aggregated temperature flag column (temp_nrt_flag by default, produced by the NRT QC flag aggregation step) and emits the propagated salinity flag in its own column so the propagation stays traceable. Datasets with independently measured salinity simply omit this item from their configuration.

class aiqclib.prepare.features.qc_temp_to_psal.QCTempToPsal(target_name=None, feature_info=None, selected_profiles=None, filtered_input=None, selected_rows=None, summary_stats=None)[source]

Bases: QCItemFeatureBase

Temperature-to-salinity flag propagation (observation-level).

Copies the aggregated temperature flag onto salinity when it is 3 or 4. The propagated flag keeps its severity, so fail_flag does not apply to this item. Produces the single column psal_qc_temp_to_psal, which is 1 wherever the temperature flag is good or missing.

Parameters:

target_name (str | None)
feature_info (Dict | None)
selected_profiles (DataFrame | None)
filtered_input (DataFrame | None)
selected_rows (Dict[str, DataFrame] | None)
summary_stats (DataFrame | None)

compute_flags(df)[source]

Propagate probably-bad/bad temperature flags onto the target variable.

Parameters:: df (DataFrame) – The observations, including the aggregated temperature flag column (source_column).
Returns:: Observation keys plus the propagated flag column.
Return type:: DataFrame
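The propagation rule can be sketched on a plain list of flags. This is an illustration only: propagate_temp_flag is a hypothetical name, the input stands in for the temp_nrt_flag column with None for missing flags, and the sketch assumes the aggregated flag only takes the values 1, 3 or 4.

```python
from typing import List, Optional


def propagate_temp_flag(temp_flags: List[Optional[int]]) -> List[int]:
    """Hypothetical helper: build the propagated salinity flag per observation."""
    propagated: List[int] = []
    for flag in temp_flags:
        if flag in (3, 4):
            # Probably-bad and bad keep their severity on salinity.
            propagated.append(flag)
        else:
            # Good or missing temperature flags leave salinity at 1.
            propagated.append(1)
    return propagated


psal_flags = propagate_temp_flag([1, 3, 4, None])
```

Keeping the result in its own list, like the separate psal_qc_temp_to_psal column, leaves the original temperature flags untouched so the propagation stays traceable.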

default_params: Dict = {'source_column': 'temp_nrt_flag', 'target_variable': 'psal'}: Built-in default parameters (the values from the NRT QC spec). Parameters supplied via feature_info["params"] override these per key.

get_variables()[source]

Return the single variable receiving the propagated flag.

Returns:: The configured target variable (psal by default).
Return type:: List[str]

item_name: str = 'temp_to_psal': Short item name used in output column names (e.g. “global_range”).

scalar_param_names: Tuple[str, ...] = ('source_column', 'target_variable'): Parameter keys that are settings rather than variable entries; used when deriving the checked variables from params.