Profile Summary Statistics
The profile_summary_stats feature is a profile-level feature that represents the summary statistics of specified variables. All observations belonging to the same profile generally have the same profile_summary_stats feature values. The profile_summary_stats feature can contain the following nine statistics:
min: minimum
max: maximum
mean: mean
median: median
pct25: 25th percentile
pct75: 75th percentile
pct2.5: 2.5th percentile
pct97.5: 97.5th percentile
sd: standard deviation
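As a rough illustration of what these nine values mean for a single variable of a single profile, the following sketch computes them with the Python standard library. It is not aiqclib's implementation: the helper names percentile and summarize are hypothetical, the percentiles use linear interpolation between neighbouring sorted values, and sd is taken to be the sample standard deviation. Both choices are assumptions that may differ from what aiqclib does.

```python
import statistics


def percentile(sorted_values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    if not sorted_values:
        raise ValueError("percentile() requires at least one value")
    # Fractional position of the percentile within the sorted data.
    pos = (len(sorted_values) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * frac


def summarize(values):
    """Compute the nine statistics for one variable of one profile."""
    data = sorted(values)
    return {
        "min": data[0],
        "max": data[-1],
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "pct25": percentile(data, 25),
        "pct75": percentile(data, 75),
        "pct2.5": percentile(data, 2.5),
        "pct97.5": percentile(data, 97.5),
        # Sample standard deviation (n - 1 denominator); an assumption.
        "sd": statistics.stdev(data) if len(data) > 1 else 0.0,
    }


# Hypothetical temperature readings from a single profile.
temp_profile = [2.0, 4.0, 6.0, 8.0, 10.0]
stats = summarize(temp_profile)
```

Because the statistics are computed once per profile, every observation in that profile would carry the same resulting values.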
Configuration: Summary Statistics
The profile_summary_stats feature requires summary statistics to be calculated before feature extraction. This calculation is configured in the summary_stats_sets section of a configuration file. The variables used for the feature should be specified in col_names.
summary_stats_sets:
  - name: summary_stats_set_1
    stats:
      - name: profile_summary_stats
        col_names: [ temp, psal, pres ]
Configuration: Setup
To include the profile_summary_stats feature in your training and classification datasets, the value profile_summary_stats needs to be specified in the feature_sets section.
feature_sets:
  - name: feature_set_1
    features:
      - profile_summary_stats
Configuration: Parameters
The profile_summary_stats feature requires three mandatory parameters: col_names, summary_stats_names, and stats_set.
The col_names parameter specifies the column names in the input dataset that will be used for the profile_summary_stats feature.
The summary_stats_names parameter specifies the names of the summary statistics to be used as features.
The stats_set parameter specifies how the feature values are normalized. aiqclib currently supports raw and min_max as normalization methods. The name value in stats_set must correspond to a name in the feature_stats_sets section.
feature_param_sets:
  - name: feature_set_1_param_set_1
    params:
      - feature: profile_summary_stats
        col_names: [ temp, psal, pres ]
        summary_stats_names: [ mean, median, sd, pct25, pct75 ]
        stats_set: { type: min_max, name: profile_summary_stats }
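The constraints described in this section can be checked before running a pipeline. The sketch below is a hypothetical pre-flight check, not part of aiqclib: check_params and the two constant sets are illustrative names, and the rules it applies are only those stated on this page (three mandatory parameters, nine statistic names, raw or min_max normalization).

```python
SUPPORTED_STATS = {"min", "max", "mean", "median", "pct25", "pct75",
                   "pct2.5", "pct97.5", "sd"}
SUPPORTED_NORMALIZATIONS = {"raw", "min_max"}


def check_params(params):
    """Return a list of problems found in one profile_summary_stats entry."""
    problems = []
    # All three parameters are mandatory.
    for key in ("col_names", "summary_stats_names", "stats_set"):
        if key not in params:
            problems.append(f"missing mandatory parameter: {key}")
    # Every requested statistic must be one of the nine supported names.
    unknown = set(params.get("summary_stats_names", [])) - SUPPORTED_STATS
    if unknown:
        problems.append(f"unsupported statistics: {sorted(unknown)}")
    # The normalization type must be raw or min_max.
    norm_type = params.get("stats_set", {}).get("type")
    if "stats_set" in params and norm_type not in SUPPORTED_NORMALIZATIONS:
        problems.append(f"unsupported normalization type: {norm_type}")
    return problems


# The parameter entry from the example above, written as a Python dict.
example = {
    "feature": "profile_summary_stats",
    "col_names": ["temp", "psal", "pres"],
    "summary_stats_names": ["mean", "median", "sd", "pct25", "pct75"],
    "stats_set": {"type": "min_max", "name": "profile_summary_stats"},
}
problems = check_params(example)
```

An empty list means the entry satisfies the rules listed here; it does not guarantee that aiqclib itself will accept the configuration.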
Configuration: Normalization
If the normalization method is not set to raw, the values specified in the feature_stats_sets section are used for normalization. For min_max, each summary statistic of each variable needs a min and a max value.
feature_stats_sets:
  - name: feature_set_1_stats_set_1
    min_max:
      - name: profile_summary_stats
        stats: { temp: { mean: { min: 0, max: 12.5 },
                         median: { min: 0, max: 15 },
                         sd: { min: 0, max: 6.5 },
                         pct25: { min: 0, max: 12 },
                         pct75: { min: 1, max: 19 } },
                 psal: { mean: { min: 2.9, max: 12 },
                         median: { min: 2.9, max: 12 },
                         sd: { min: 0, max: 4 },
                         pct25: { min: 2.5, max: 8.5 },
                         pct75: { min: 3, max: 16 } },
                 pres: { mean: { min: 24, max: 105 },
                         median: { min: 24, max: 105 },
                         sd: { min: 13, max: 60 },
                         pct25: { min: 12, max: 53 },
                         pct75: { min: 35, max: 156 } } }
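To show how min and max bounds like these are typically applied, the sketch below uses the conventional min-max formula, (value - min) / (max - min). This is an assumption about what min_max means here, not a description of aiqclib internals; min_max_scale is a hypothetical helper, and whether aiqclib clips values that fall outside the bounds is not stated on this page.

```python
def min_max_scale(value, lower, upper):
    """Scale value so that lower maps to 0.0 and upper maps to 1.0."""
    if upper == lower:
        raise ValueError("min and max must differ")
    # Values outside [lower, upper] are not clipped in this sketch.
    return (value - lower) / (upper - lower)


# Bounds copied from two of the temp entries in the configuration above.
temp_bounds = {
    "mean": {"min": 0, "max": 12.5},
    "sd": {"min": 0, "max": 6.5},
}

# Hypothetical raw statistics for one profile.
raw = {"mean": 5.0, "sd": 1.3}

scaled = {
    name: min_max_scale(raw[name], bounds["min"], bounds["max"])
    for name, bounds in temp_bounds.items()
}
```

With these bounds, a raw mean of 5.0 scales to roughly 0.4 and a raw sd of 1.3 to roughly 0.2.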
Note
aiqclib offers helper functions to calculate summary statistics (like min/max values). Please refer to the Feature Normalization guide for details.