aiqclib package
aiqclib Interface Module
This module provides a high-level interface to the aiqclib library, exposing core functionalities for configuration management, dataset preparation, model training and evaluation, and dataset classification.
- aiqclib.__version__
The version of the aiqclib library.
- Type:
str
- aiqclib.classify_dataset(config)
Execute a series of steps to classify all observations in the given data set, as defined by the provided configuration object.
- This function performs the following steps in sequence:
Load and read the initial input data.
Calculate and write summary statistics.
Label and write selected profiles.
Locate and write target rows.
Extract and write target features.
Use the model to predict labels in the input data.
Merge the results with the original input data.
- Parameters:
config (ConfigBase) – A configuration object specifying the classes and parameters for each step in the dataset preparation and classification process.
- Returns:
None. The function performs I/O operations and modifies datasets based on the configuration but does not return a value.
- Return type:
None
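The final steps of this pipeline (locate target rows, predict, merge the results back into the input) can be illustrated with a minimal plain-Python sketch. It is not aiqclib's implementation, which works on Polars DataFrames through configured step classes. Here rows are assumed to be dicts, and `classify_and_merge`, `is_target`, `predict`, and the `predicted_label` key are hypothetical names.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical row representation: one observation per dict.
Row = Dict[str, object]


def classify_and_merge(
    rows: List[Row],
    is_target: Callable[[Row], bool],
    predict: Callable[[Row], int],
    default_label: Optional[int] = None,
) -> List[Row]:
    """Predict labels for target rows only, then merge them back onto every input row."""
    # Locate target rows by position and predict a label for each one.
    predictions = {i: predict(row) for i, row in enumerate(rows) if is_target(row)}
    # Merge: copy each original row and attach its prediction (or the default for non-target rows).
    return [
        dict(row, predicted_label=predictions.get(i, default_label))
        for i, row in enumerate(rows)
    ]


if __name__ == "__main__":
    data = [
        {"profile": 1, "temp": 10.0},
        {"profile": 1, "temp": 99.0},
        {"profile": 2, "temp": 11.0},
    ]
    merged = classify_and_merge(
        data,
        is_target=lambda r: r["profile"] == 1,
        predict=lambda r: int(r["temp"] > 50.0),  # toy threshold standing in for a model
    )
    print([r["predicted_label"] for r in merged])  # [0, 1, None]
```

Copying each row rather than mutating it keeps the original input intact, mirroring the idea of merging predictions onto the original data instead of overwriting it.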
- aiqclib.create_training_dataset(config)
Execute a series of steps to produce a training dataset.
This function orchestrates the sequential loading and processing of data through multiple preparation steps, as defined by the provided configuration object. It relies on a series of helper functions (e.g., load_stepX_dataset) and class methods to perform distinct operations, ultimately generating and writing the final training and validation datasets.
The processing involves the following stages:
1. Input Data Loading: Reads and prepares the initial raw data.
2. Summary Statistics Calculation: Computes and stores aggregate statistics.
3. Profile Selection: Identifies and labels specific profiles or data subsets.
4. Target Row Location: Pinpoints specific rows of interest within profiles.
5. Feature Extraction: Derives modeling features from the located rows.
6. Dataset Splitting: Divides features into training and validation sets.
- Parameters:
config (ConfigBase) – A configuration object specifying the classes and parameters for each step in the dataset preparation process.
- Returns:
None. This function performs I/O operations and does not return a value.
- Return type:
None
- Example:
    from aiqclib import create_training_dataset
    from aiqclib.common.base.config_base import ConfigBase
    cfg = ConfigBase(...)
    create_training_dataset(cfg)
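Stage 6 (dataset splitting) is commonly done with a seeded random split. The sketch below assumes that approach; `split_train_validation`, its 0.2 default fraction, and the default seed are illustrative choices rather than aiqclib parameters, and aiqclib's own splitting logic may differ.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_validation(
    items: Sequence[T], validation_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Return (training, validation) lists from a reproducible shuffle of items."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be strictly between 0 and 1")
    shuffled = list(items)  # copy so the caller's sequence is left untouched
    random.Random(seed).shuffle(shuffled)  # a local RNG keeps the split reproducible
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


if __name__ == "__main__":
    train, validation = split_train_validation(list(range(10)), validation_fraction=0.3)
    print(len(train), len(validation))  # 7 3
```

Using a dedicated `random.Random(seed)` instance rather than the module-level RNG means the split does not depend on, or disturb, any other random state in the program.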
- aiqclib.format_summary_stats(df, variables=[], summary_stats=['mean', 'median', 'sd', 'pct25', 'pct75'])
Format a summary statistics DataFrame into a pretty-printed string.
This function takes a DataFrame of statistics (as produced by get_summary_stats()) and converts it into a nested dictionary, which is then formatted into a string for display. The output can be filtered by variable and statistic type.
- Parameters:
df (DataFrame) – The input DataFrame containing summary statistics. It is expected to have a “stats” column for profile-level summaries, or only variable-level statistics for global summaries.
variables (List[str]) – An optional list of variable names to include. If empty, all variables are included.
summary_stats (List[str]) – An optional list of statistic names (e.g., “mean”, “sd”) to include for profile-level summaries. This parameter is ignored for global (non-“profiles”) summaries.
- Returns:
A string containing the pretty-printed, formatted statistics.
- Return type:
str
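The filtering and nesting for profile-level summaries can be sketched without Polars. This sketch assumes long-format rows with hypothetical `variable` and `value` keys alongside the documented `stats` column, and uses `json.dumps` for pretty-printing; aiqclib's actual column layout and output format may differ, and the global-summary case is not covered. `format_stats` is a hypothetical name.

```python
import json
from typing import Dict, Iterable, Optional, Sequence

DEFAULT_STATS = ("mean", "median", "sd", "pct25", "pct75")


def format_stats(
    rows: Iterable[Dict[str, object]],
    variables: Optional[Sequence[str]] = None,
    summary_stats: Sequence[str] = DEFAULT_STATS,
) -> str:
    """Nest long-format rows as {variable: {stat: value}} and pretty-print them as JSON."""
    nested: Dict[str, Dict[str, object]] = {}
    for row in rows:
        variable, stat = str(row["variable"]), str(row["stats"])
        if variables and variable not in variables:
            continue  # an empty or missing filter keeps every variable
        if stat not in summary_stats:
            continue  # drop statistics that were not requested
        nested.setdefault(variable, {})[stat] = row["value"]
    return json.dumps(nested, indent=2)


if __name__ == "__main__":
    rows = [
        {"variable": "temp", "stats": "mean", "value": 12.5},
        {"variable": "temp", "stats": "max", "value": 30.1},
        {"variable": "psal", "stats": "mean", "value": 35.2},
    ]
    print(format_stats(rows, variables=["temp"]))
```

The sketch uses `None` rather than a list literal as the default for `variables`, the usual Python idiom for avoiding a shared mutable default.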
- aiqclib.get_summary_stats(input_file, summary_type)
Calculate and retrieve summary statistics from a dataset file.
This function loads a dataset, computes global and per-profile summary statistics, and returns the requested type of summary as a Polars DataFrame. It uses a built-in configuration template and dynamically sets the input path based on the provided file.
- Parameters:
input_file (str) – The path to the input dataset file (e.g., a TSV or Parquet file).
summary_type (str) – The type of summary to return. Supported values are “profiles” (for per-profile stats) and “all” (for global stats).
- Raises:
FileNotFoundError – If input_file does not exist.
ValueError – If summary_type is not a supported value.
- Returns:
A Polars DataFrame containing the requested summary statistics.
- Return type:
DataFrame
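The difference between the two summary types can be sketched with Python's standard `statistics` module, assuming the data for a single variable is already grouped into a hypothetical dict mapping profile names to values. The statistic names mirror the defaults of format_summary_stats(); the quartile method (`statistics.quantiles` with its default "exclusive" interpolation) is an assumption, and aiqclib may compute percentiles differently. `summarise` and `summary_stats` are hypothetical names.

```python
import statistics
from typing import Dict, List, Sequence

SUPPORTED_SUMMARY_TYPES = ("profiles", "all")


def summarise(values: Sequence[float]) -> Dict[str, float]:
    """Compute mean, median, sample SD, and quartiles (needs at least two values)."""
    pct25, _, pct75 = statistics.quantiles(values, n=4)  # default "exclusive" method
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
        "pct25": pct25,
        "pct75": pct75,
    }


def summary_stats(profiles: Dict[str, List[float]], summary_type: str) -> dict:
    """Return per-profile stats for "profiles" or pooled stats for "all"."""
    if summary_type not in SUPPORTED_SUMMARY_TYPES:
        raise ValueError(
            f"summary_type must be one of {SUPPORTED_SUMMARY_TYPES}, got {summary_type!r}"
        )
    if summary_type == "profiles":
        return {name: summarise(values) for name, values in profiles.items()}
    # "all": pool every profile's values into one global summary.
    pooled = [v for values in profiles.values() for v in values]
    return summarise(pooled)
```

Validating `summary_type` before doing any work matches the documented behaviour of raising ValueError for unsupported values.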
- aiqclib.read_config(file_name, set_name=None, auto_select=True)
Read a YAML configuration file as a ConfigBase object, automatically selecting the appropriate subclass based on the content.
- This function:
Resolves the file path by calling aiqclib.common.utils.config.get_config_file().
Reads the specified YAML file and identifies the main key (e.g., “data_sets”, “training_sets”, or “classification_sets”) to map to the corresponding configuration class.
Instantiates and returns the matched configuration class with the resolved path.
If set_name is provided, calls the select method on the instantiated configuration object.
- Parameters:
file_name (str) – The path (including filename) to the YAML file.
set_name (Optional[str]) – The name (key) of the desired configuration set within the YAML’s dictionary. Defaults to None.
auto_select (bool) – If True, the first available data set name is selected automatically when no set_name is provided. Defaults to True.
- Returns:
An instantiated configuration object (either DataSetConfig, TrainingConfig, or ClassificationConfig).
- Return type:
ConfigBase
- Raises:
ValueError – If no valid top-level configuration key is found in the YAML file.
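The key-based dispatch can be sketched on an already-parsed YAML dict, so no YAML library is needed. The `*Sketch` classes are hypothetical stand-ins for DataSetConfig, TrainingConfig, and ClassificationConfig. The sketch also assumes that each top-level key maps set names to their settings; this is suggested by the set_name description but not confirmed here. Path resolution via get_config_file() is omitted.

```python
from typing import Any, Dict, Optional, Type


class ConfigSketch:
    """Hypothetical stand-in for ConfigBase: keeps the parsed YAML and the selected set."""

    def __init__(self, content: Dict[str, Any]) -> None:
        self.content = content
        self.set_name: Optional[str] = None

    def select(self, set_name: str) -> None:
        self.set_name = set_name


class DataSetConfigSketch(ConfigSketch):
    pass


class TrainingConfigSketch(ConfigSketch):
    pass


class ClassificationConfigSketch(ConfigSketch):
    pass


# Top-level YAML key -> configuration class (keys taken from the description above).
CONFIG_CLASSES: Dict[str, Type[ConfigSketch]] = {
    "data_sets": DataSetConfigSketch,
    "training_sets": TrainingConfigSketch,
    "classification_sets": ClassificationConfigSketch,
}


def dispatch_config(
    content: Dict[str, Any], set_name: Optional[str] = None, auto_select: bool = True
) -> ConfigSketch:
    """Pick the config class for a recognised top-level key, then optionally select a set."""
    key = next((k for k in CONFIG_CLASSES if k in content), None)
    if key is None:
        raise ValueError("No valid top-level configuration key found")
    config = CONFIG_CLASSES[key](content)
    if set_name is not None:
        config.select(set_name)
    elif auto_select and content[key]:
        # Assumes the sets form a mapping keyed by set name; pick the first one.
        config.select(next(iter(content[key])))
    return config
```

A dict registry keeps the key-to-class mapping in one place, so supporting a new configuration type means adding one entry rather than another branch.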
- aiqclib.read_shap_scores(file_name, file_type=None, options=None, strip_suffix=True)
Import a SHAP score file produced by aiqclib.
aiqclib writes per-instance SHAP values with three metadata columns (label, predicted_label, score) followed by one <feature>_shap column per feature. This function reads such a file into a Polars DataFrame and, by default, strips the _shap suffix so that each feature column is named after its feature, which is convenient for downstream SHAP plots.
- Parameters:
file_name (str) – Path to the SHAP score file.
file_type (Optional[str]) – Explicit file format ("parquet", "tsv", "tsv.gz", "csv", or "csv.gz"). Inferred from the file extension when None.
options (Optional[Dict[str, Any]]) – Extra keyword arguments forwarded to the underlying Polars reader.
strip_suffix (bool) – Whether to strip the _shap suffix from the SHAP columns. Defaults to True.
- Raises:
FileNotFoundError – If file_name does not exist.
ValueError – If the file type is unsupported, or if stripping the suffix would produce duplicate column names.
- Returns:
A Polars DataFrame of SHAP scores.
- Return type:
DataFrame
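Two of the behaviours described above, extension-based file type inference and suffix stripping with a duplicate check, can be sketched on plain file names and column-name lists. `infer_file_type` and `strip_shap_suffix` are hypothetical helpers, not aiqclib functions. With Polars, the stripped names could be applied via `DataFrame.rename`, passing a mapping built by zipping the old and new names.

```python
from typing import List, Optional

SUPPORTED_FILE_TYPES = ("parquet", "tsv", "tsv.gz", "csv", "csv.gz")


def infer_file_type(file_name: str, file_type: Optional[str] = None) -> str:
    """Return the explicit file_type, or infer it from the file name's extension."""
    if file_type is None:
        file_type = next(
            (t for t in SUPPORTED_FILE_TYPES if file_name.endswith("." + t)), None
        )
    if file_type not in SUPPORTED_FILE_TYPES:
        raise ValueError(f"Unsupported or undetectable file type for {file_name!r}")
    return file_type


def strip_shap_suffix(columns: List[str], suffix: str = "_shap") -> List[str]:
    """Drop the suffix from SHAP columns, refusing to create duplicate names."""
    stripped = [c[: -len(suffix)] if c.endswith(suffix) else c for c in columns]
    if len(set(stripped)) != len(stripped):
        raise ValueError("Stripping the suffix would produce duplicate column names")
    return stripped


if __name__ == "__main__":
    cols = ["label", "predicted_label", "score", "temp_shap", "psal_shap"]
    print(infer_file_type("scores.tsv.gz"))  # tsv.gz
    print(strip_shap_suffix(cols))  # ['label', 'predicted_label', 'score', 'temp', 'psal']
```

The duplicate check matters when a feature shares its name with a metadata column (for example a feature called `score`), since renaming `score_shap` to `score` would otherwise silently collide.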
- aiqclib.train_and_evaluate(config)
Perform a training and evaluation process based on the specified configuration.
This function orchestrates the end-to-end workflow, including data loading, model validation, and final model building and testing.
- Steps:
Load and process input training data.
Validate the model using the specified validation technique (e.g., k-fold).
Build and test the final model, saving results and trained model artifacts.
- Parameters:
config (ConfigBase) – A training configuration object specifying classes and parameters.
- Returns:
None. The function performs I/O operations and does not return a value.
- Return type:
None
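The k-fold validation mentioned in the steps above can be illustrated with a minimal index generator. This is a generic sketch of unshuffled k-fold cross-validation, not aiqclib's implementation; `k_fold_indices` is a hypothetical name, and practical setups often shuffle or stratify rows first. A model would be fitted on each fold's training indices and scored on its validation indices, with the scores averaged over the k folds.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for k contiguous, non-overlapping folds."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    # Spread the remainder over the first n % k folds so sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, validation
        start += size


if __name__ == "__main__":
    for train, validation in k_fold_indices(5, 2):
        print(train, validation)
    # [3, 4] [0, 1, 2]
    # [0, 1, 2] [3, 4]
```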
- aiqclib.write_config_template(file_name, stage, extension='')
Write a YAML configuration template for the specified stage (“prepare”, “train”, or “classify”) to a file.
- This function:
Chooses a template generator based on the combination of stage and extension.
Validates that the directory for file_name exists.
Writes the generated YAML template text to the specified file.
- Parameters:
file_name (str) – The path (including filename) where the YAML file will be written.
stage (str) – Determines which template to write; must be one of “prepare”, “train”, or “classify”.
extension (str) – Determines the template extension; must be one of “”, “full”, or “reduced”. Defaults to “”.
- Raises:
ValueError – If the combined stage and extension is not found in the registry.
IOError – If the directory of the specified file path does not exist.
- Return type:
None
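The three steps above map naturally onto a registry lookup. In this sketch, `write_template` and `TEMPLATE_REGISTRY` are hypothetical names, only the default extension is registered, and the template text is a placeholder rather than aiqclib's actual YAML. It shows the documented control flow: ValueError for an unknown stage/extension combination and IOError (an alias of OSError in Python 3) for a missing directory.

```python
import os
from typing import Callable, Dict, Tuple

STAGES = ("prepare", "train", "classify")

# Hypothetical registry: (stage, extension) -> generator of YAML text.
# Only the default extension "" is registered here, with placeholder content.
TEMPLATE_REGISTRY: Dict[Tuple[str, str], Callable[[], str]] = {
    (stage, ""): (lambda stage=stage: f"# placeholder {stage} template\n")
    for stage in STAGES
}


def write_template(file_name: str, stage: str, extension: str = "") -> None:
    """Look up a template generator, check the target directory, and write the YAML text."""
    generator = TEMPLATE_REGISTRY.get((stage, extension))
    if generator is None:
        raise ValueError(
            f"No template registered for stage={stage!r}, extension={extension!r}"
        )
    directory = os.path.dirname(file_name) or "."  # a bare file name means the current directory
    if not os.path.isdir(directory):
        raise IOError(f"Directory does not exist: {directory}")
    with open(file_name, "w", encoding="utf-8") as handle:
        handle.write(generator())
```

Checking the registry before touching the file system means an invalid combination fails fast without creating or truncating any file.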
Subpackages
- aiqclib.classify package
  - Subpackages
    - aiqclib.classify.step1_read_input package
    - aiqclib.classify.step2_calc_stats package
    - aiqclib.classify.step3_select_profiles package
    - aiqclib.classify.step4_select_rows package
    - aiqclib.classify.step5_extract_features package
    - aiqclib.classify.step6_classify_dataset package
    - aiqclib.classify.step7_concat_datasets package
- aiqclib.common package
  - Subpackages
    - aiqclib.common.base package
    - aiqclib.common.config package
    - aiqclib.common.loader package
      - Submodules
        - aiqclib.common.loader.classify_loader module
        - aiqclib.common.loader.classify_registry module
        - aiqclib.common.loader.dataset_loader module
        - aiqclib.common.loader.dataset_registry module
        - aiqclib.common.loader.feature_loader module
        - aiqclib.common.loader.feature_registry module
        - aiqclib.common.loader.model_loader module
        - aiqclib.common.loader.model_registry module
        - aiqclib.common.loader.single_model_loader module
        - aiqclib.common.loader.single_model_registry module
        - aiqclib.common.loader.training_loader module
        - aiqclib.common.loader.training_registry module
    - aiqclib.common.utils package
- aiqclib.interface package
- aiqclib.prepare package
- aiqclib.train package
  - Subpackages
    - aiqclib.train.models package
      - Submodules
        - aiqclib.train.models.decision_tree module
        - aiqclib.train.models.gaussian_naive_bayes module
        - aiqclib.train.models.k_nearest_neighbors module
        - aiqclib.train.models.linear_discriminant_analysis module
        - aiqclib.train.models.logistic_regression module
        - aiqclib.train.models.model_suite module
        - aiqclib.train.models.multilayer_perceptron module
        - aiqclib.train.models.random_forest module
        - aiqclib.train.models.support_vector_machine module
        - aiqclib.train.models.xgboost module
    - aiqclib.train.step1_read_input package
    - aiqclib.train.step2_validate_model package
    - aiqclib.train.step3_optimise_model package
    - aiqclib.train.step4_build_model package