aiqclib.prepare.step2_calc_stats package
Submodules
aiqclib.prepare.step2_calc_stats.dataset_a module
This module defines the SummaryDataSetA class, a specialized implementation of SummaryStatsBase for calculating summary statistics on specific datasets, such as Copernicus CTD data, using the Polars DataFrame library. It integrates with a configuration management system to ensure proper data processing.
- class aiqclib.prepare.step2_calc_stats.dataset_a.SummaryDataSetA(config, input_data=None)
Bases: SummaryStatsBase
Specialized class for calculating summary statistics for Copernicus CTD data.
This class extends aiqclib.prepare.step2_calc_stats.summary_base.SummaryStatsBase and leverages the Polars DataFrame library for efficient data processing. It identifies itself via the expected_class_name attribute to match corresponding YAML configuration entries.
- Parameters:
config (ConfigBase)
input_data (DataFrame | None)
- expected_class_name: str = 'SummaryDataSetA'
aiqclib.prepare.step2_calc_stats.summary_base module
Summary Statistics Module.
This module provides the SummaryStatsBase class, which serves as a base
for calculating, aggregating, and exporting summary statistics from tabular
datasets using the Polars library. It handles global and per-profile calculations
and supports exporting results to TSV format.
- class aiqclib.prepare.step2_calc_stats.summary_base.SummaryStatsBase(config, input_data=None)
Bases: DataSetBase
Abstract base class for calculating summary statistics.
This class provides a framework for generating and writing summary statistics for a dataset. It handles both global (dataset-wide) and per-profile statistics for a specified set of numeric columns. Subclasses must define an expected_class_name to be instantiated.
- Variables:
default_file_name (str) – The default filename for the output stats file.
output_file_name (str) – The full path for the output summary stats file, derived from the configuration.
input_data (polars.DataFrame or None) – The DataFrame containing the data to be analyzed.
summary_stats (polars.DataFrame or None) – DataFrame holding the combined global and per-profile statistics after calculation.
summary_stats_observation (polars.DataFrame or None) – DataFrame holding aggregated global statistics for key variables.
summary_stats_profile (polars.DataFrame or None) – DataFrame holding aggregated per-profile statistics for key variables.
val_col_names (list[str]) – List of numeric columns for which to compute statistics.
stats_col_names (list[str]) – The schema (column names) for the output statistics DataFrame.
profile_col_names (list[str]) – List of columns used to identify unique profiles for grouping.
- Parameters:
config (ConfigBase)
input_data (DataFrame | None)
- calculate_global_stats(val_col_name)
Compute global summary statistics for a specified column.
These statistics are calculated across the entire dataset.
- Parameters:
val_col_name (str) – Name of the column for which to calculate global statistics.
- Returns:
A DataFrame with one row containing the summary statistics, structured to be compatible with per-profile stats.
- Return type:
DataFrame
- calculate_profile_stats(grouped_df, val_col_name)
Compute per-profile summary statistics for a column.
- Parameters:
grouped_df (DataFrame) – A Polars DataFrame already grouped by profile identifier columns (e.g., platform_code, profile_no).
val_col_name (str) – The name of the column for which to calculate per-profile stats.
- Returns:
A DataFrame containing statistics for each profile.
- Return type:
DataFrame
- calculate_stats()
Calculate and combine global and per-profile statistics.
This method computes statistics for each column in val_col_names at both the global and per-profile level, then concatenates them into a single DataFrame stored in summary_stats.
- Returns:
None
- Return type:
None
- create_summary_stats_observation()
Create a summarized view of global observation statistics.
This method filters the main statistics table for global ("all") data, selects a subset of key metrics, and stores the result in summary_stats_observation.
- Raises:
ValueError – If summary_stats has not been calculated yet.
- Returns:
None
- Return type:
None
- create_summary_stats_profile()
Create a summarized view of per-profile statistics.
This method filters the main statistics table for per-profile data, reshapes it to aggregate statistics (min, mean, max, etc.) across all profiles, and stores the result in summary_stats_profile.
- Raises:
ValueError – If summary_stats has not been calculated yet.
- Returns:
None
- Return type:
None
- default_file_name: str
- static get_stats_expression(val_col_name)
Build a list of Polars expressions to compute summary statistics.
- Parameters:
val_col_name (str) – The name of the column to analyze.
- Returns:
A list of Polars expressions for calculating min, max, mean, median, quantiles, and standard deviation.
- Return type:
List[Expr]
- input_data: DataFrame | None
- output_file_name: str
- summary_stats: DataFrame | None
- summary_stats_observation: DataFrame | None
- summary_stats_profile: DataFrame | None
- write_summary_stats()
Write the computed summary statistics to a TSV file.
The output path is determined by output_file_name.
- Raises:
ValueError – If summary_stats has not been calculated yet.
- Returns:
None
- Return type:
None