Step 3: Training & Evaluation
With a properly prepared dataset (from Step 2: Dataset Preparation), you are now ready to train and evaluate a machine learning model. This workflow leverages the training, validation, and test sets created in the previous step to build a model, rigorously assess its performance using cross-validation, and generate final evaluation metrics on a held-out test set.
Like all workflows in aiqclib, this process is controlled by a dedicated YAML configuration file. As with the preparation config, it uses the “building blocks” concept for modularity and reusability.
Prerequisites
This tutorial assumes you have successfully completed Step 2: Dataset Preparation. The training process directly uses the output files (the split datasets) generated in that step. Ensure your ~/aiqc_project/data/dataset_0001/training/ directory exists and contains the prepared data.
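A quick check along the following lines can confirm this prerequisite before you continue. It is a minimal sketch that uses only the Python standard library; the helper name `has_prepared_data` is hypothetical and not part of aiqclib, and it only verifies that the directory exists and is non-empty, not that its contents are valid.

```python
from pathlib import Path


def has_prepared_data(training_dir: str) -> bool:
    """Return True if training_dir exists and contains at least one entry.

    Hypothetical helper for illustration; not part of aiqclib.
    """
    path = Path(training_dir).expanduser()
    return path.is_dir() and any(path.iterdir())


if __name__ == "__main__":
    training_dir = "~/aiqc_project/data/dataset_0001/training"
    if has_prepared_data(training_dir):
        print(f"Found prepared data in {training_dir}")
    else:
        print(f"No prepared data in {training_dir}; rerun Step 2 first")
```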
The Training Workflow
The training workflow follows a similar pattern to the preparation step: you will generate a new configuration template, customize it to define your model and validation strategy, point to your input data, and specify where the trained models should be saved.
Step 3.1: Generate the Configuration Template
First, use aiqclib to generate a boilerplate configuration template specifically for the training workflow.
import aiqclib as aq
import os

config_path = os.path.expanduser("~/aiqc_project/config/training_config.yaml")
aq.write_config_template(
    file_name=config_path,
    stage="train"
)
Step 3.2: Customize the Configuration File
Now, open the newly created ~/aiqc_project/config/training_config.yaml file in your text editor. Your primary goals are to define:
Input & Output Paths: Where to find the prepared dataset and where to save the trained model.
Model & Validation Strategy: Which machine learning model to train and what cross-validation method to use.
You will need to edit the path_info_sets, step_class_sets, step_param_sets, and training_sets sections.
Before you modify the config, let’s create a directory where your trained models will be saved:
mkdir -p ~/aiqc_project/models
Update your training_config.yaml file to match the following structure, replacing the placeholder paths with those of your actual project setup.
Note
aiqclib integrates multiple ML algorithms, and it is easy to switch between them. For more details, see the dedicated Algorithm Selection page.
path_info_sets:
  - name: data_set_1
    common:
      base_path: ~/aiqc_project/data # Root directory of the prepared dataset (from preparation step)
    input:
      step_folder_name: training # Subdirectory containing the split training/validation/test data
    model:
      base_path: ~/aiqc_project/models # Directory where the final trained models will be saved

# Define your model and validation strategy here.
# For this tutorial, we'll use KFoldValidation with an XGBoost model.
step_class_sets:
  - name: training_step_set_1
    steps:
      input: InputTrainingSetA
      validate: KFoldValidation # Specify your cross-validation class
      model: XGBoost # Specify your ML model class (e.g., XGBoost, RandomForest)
      build: BuildModel

# Define parameters for your chosen model and validation.
# For example, number of folds for CV, or model hyperparameters.
step_param_sets:
  - name: training_param_set_1
    steps:
      input: { }
      validate: { k_fold: 5 } # 5-fold cross-validation
      model: { calculate_shap: False, # Control SHAP value calculation
               model_params: { scale_pos_weight: 200, # Weight on the positive class (typically the neg:pos count ratio)
                               n_jobs: -1 } } # Number of threads used by XGBoost
      build: { }

training_sets:
  - name: training_0001 # A unique name for this training job
    dataset_folder_name: dataset_0001 # This MUST match the dataset_folder_name from your preparation config
    path_info: data_set_1
    target_set: target_set_1 # This needs to match a 'target_set' defined in your prepare_config.yaml
    step_class_set: training_step_set_1
    step_param_set: training_param_set_1
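The value of `scale_pos_weight` is best derived from your own data rather than copied from the example. The XGBoost documentation suggests the number of negative instances divided by the number of positive instances as a typical starting value. The sketch below computes that ratio from a list of binary labels; the function name and the 0/1 label encoding are assumptions made for illustration and are not part of aiqclib.

```python
from collections import Counter
from typing import Iterable


def suggested_scale_pos_weight(labels: Iterable[int]) -> float:
    """Return (number of negatives) / (number of positives).

    Assumes labels are encoded as 0 (negative) and 1 (positive).
    Hypothetical helper for illustration; not part of aiqclib.
    """
    counts = Counter(labels)
    positives = counts.get(1, 0)
    negatives = counts.get(0, 0)
    if positives == 0:
        raise ValueError("no positive samples; the ratio is undefined")
    return negatives / positives


if __name__ == "__main__":
    # 1000 negatives and 5 positives give a ratio of 200.0.
    labels = [0] * 1000 + [1] * 5
    print(suggested_scale_pos_weight(labels))
```

Treat the result as a starting point; the best value for a given dataset may differ and can be tuned like any other hyperparameter.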
Note
The training configuration file includes many other options for advanced model selection, hyperparameter tuning, and cross-validation strategies. For a complete reference of all available parameters, please consult the dedicated Configuration of Training & Evaluation page.
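Because each entry in training_sets refers to the other sections by name, a mistyped name is an easy mistake to make. The following sketch checks those cross-references on a configuration that has already been loaded into a plain Python dictionary (for example with PyYAML's `yaml.safe_load`). The function `find_dangling_references` is hypothetical and not part of aiqclib, and the key-to-section mapping is inferred from the example configuration above rather than taken from the library's own validation rules. The `target_set` key is not checked because it refers to the preparation config, which is a separate file.

```python
from typing import Any, Dict, List

# Maps each key of a training_sets entry to the top-level section it refers to.
# Inferred from the tutorial's example config; an assumption, not aiqclib's schema.
REFERENCES = {
    "path_info": "path_info_sets",
    "step_class_set": "step_class_sets",
    "step_param_set": "step_param_sets",
}


def find_dangling_references(config: Dict[str, Any]) -> List[str]:
    """Return one message per training_sets reference that matches no defined name."""
    problems = []
    for entry in config.get("training_sets", []):
        for key, section in REFERENCES.items():
            defined = {item.get("name") for item in config.get(section, [])}
            if entry.get(key) not in defined:
                problems.append(
                    f"{entry.get('name')}: {key}={entry.get(key)!r} "
                    f"not found in {section}"
                )
    return problems


if __name__ == "__main__":
    # A trimmed-down dictionary mirroring the tutorial's configuration.
    config = {
        "path_info_sets": [{"name": "data_set_1"}],
        "step_class_sets": [{"name": "training_step_set_1"}],
        "step_param_sets": [{"name": "training_param_set_1"}],
        "training_sets": [
            {
                "name": "training_0001",
                "path_info": "data_set_1",
                "step_class_set": "training_step_set_1",
                "step_param_set": "training_param_set_1",
            }
        ],
    }
    for problem in find_dangling_references(config):
        print(problem)
```

An empty result means every name referenced by a training job is defined somewhere in the same file.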
Step 3.3: Run the Training Process
Once you have customized your training_config.yaml with the correct paths and model/validation configurations, you can execute the training and evaluation workflow.
Load the configuration file and then call the train_and_evaluate function:
import aiqclib as aq
import os
config_path = os.path.expanduser("~/aiqc_project/config/training_config.yaml")
config = aq.read_config(config_path)
aq.train_and_evaluate(config)
Understanding the Output
After train_and_evaluate finishes, aiqclib will have created new folders within your dataset’s output directory (e.g., ~/aiqc_project/data/dataset_0001/) and within your model’s base path (~/aiqc_project/models/). The primary outputs include:
validate: Contains detailed results from the cross-validation process, allowing you to inspect model performance across different data folds. This includes metrics, predictions, and potentially visualizations.
build: Holds a comprehensive report of the final model’s evaluation performance on the held-out test dataset, along with aggregated metrics.
models: Holds the final, trained model object(s) ready for classification. These are the artifacts you will use in the next step.
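To see what was actually written, you can list the contents of both locations. The sketch below is a generic directory listing built on the standard library; `list_outputs` is a hypothetical helper, not part of aiqclib, and it makes no assumption about the names or formats of the files aiqclib produces.

```python
from pathlib import Path
from typing import List


def list_outputs(base_dir: str, max_depth: int = 2) -> List[str]:
    """Return an indented listing of base_dir down to max_depth levels.

    Directories are marked with a trailing slash. A missing directory
    yields an empty list. Hypothetical helper; not part of aiqclib.
    """
    root = Path(base_dir).expanduser()
    if not root.is_dir():
        return []
    lines = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        depth = len(rel.parts)
        if depth > max_depth:
            continue
        indent = "  " * (depth - 1)
        suffix = "/" if path.is_dir() else ""
        lines.append(f"{indent}{rel.name}{suffix}")
    return lines


if __name__ == "__main__":
    # Paths taken from this tutorial; adjust them to your project setup.
    for base in ("~/aiqc_project/data/dataset_0001", "~/aiqc_project/models"):
        print(base)
        for line in list_outputs(base):
            print("  " + line)
```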
Next Steps
You have now successfully trained and evaluated a machine learning model using aiqclib! The final step in the workflow is to use this trained model to classify new, unseen data.
Proceed to the next tutorial: Step 4: Classification.