Manual Model Training provides a hands-on approach to building machine learning models within ML Clever. Unlike AutoML Training, which automates model selection and tuning, manual training gives you direct control over choosing specific algorithms and configuring their parameters.
This method is ideal when you have specific requirements, want to experiment with particular model architectures, or possess prior knowledge guiding your model selection and tuning process.
Manual Model Training operates exclusively on preprocessed datasets. You must first upload and preprocess your raw dataset using ML Clever's Data Preprocessing tools. This ensures data quality and prepares the features for modeling.
This differs from building ML Pipelines (both manual and AutoML), where preprocessing steps are typically included within the pipeline definition itself for end-to-end automation and data integrity. For standalone model training, preprocessing is a separate, mandatory preceding step.
Fig 1: Data Flow for Manual Model Training
Before you can start manual model training, ensure you have:
You must have already created a preprocessed dataset within ML Clever, and its preprocessing status must be complete. Preprocessing also defines the task type (e.g., 'regression' or 'classification'), which determines the available models.
Know whether your preprocessed dataset is intended for a Regression (predicting continuous values) or Classification (predicting categories) task. This is determined during preprocessing and dictates the models you can select.
Follow these steps to configure and initiate manual model training on your preprocessed data:
Fig 2: High-level Steps in the UI
Go to the Data or Dashboard section of ML Clever and locate the specific preprocessed dataset you wish to use for training. Click on it to open its details page (e.g., /preprocessed_datasets/123).
On the preprocessed dataset page, find and click the Train Models tab to access the model training configuration options.
Note: The Train Models tab requires the dataset's preprocessing status to be complete.
Under the manual training section, you'll see a list of available machine learning models, filtered by the task type (Regression/Classification) set during preprocessing. Select one or more models using the checkboxes.
See Regression Models or Classification Models for algorithm details.
For each selected model, you can often adjust its hyperparameters (e.g., n_estimators, max_depth, learning_rate, C, alpha). Default values are usually provided. Expand the model's settings to fine-tune them.
Check the model's documentation or UI tooltips for details on each parameter (type, default, range).
Provide a custom name or tag (e.g., via a model_custom_name field) for this training run to easily identify the resulting models later. A default name might be generated if left blank.
Click the "Train Manual Models" button (or similar) to submit the configuration and begin the training process on the backend.
You'll likely be redirected to a training progress page (e.g., /training_progress or a specific task page) to monitor the job status.
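Monitoring can also be scripted. The following is a minimal polling sketch, assuming the job moves through the states queued, running, completed, and failed. The fetch_status callable is hypothetical: it stands in for whatever reads the job state in your environment, since ML Clever's status API is not described here.

```python
import time
from typing import Callable

TERMINAL_STATES = {"completed", "failed"}
KNOWN_STATES = {"queued", "running"} | TERMINAL_STATES


def wait_for_training(
    fetch_status: Callable[[], str],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job reaches a terminal state and return that state.

    fetch_status is a hypothetical callable (an assumption, not an
    ML Clever API); replace it with your own status lookup.
    """
    waited = 0.0
    while True:
        status = fetch_status()
        if status not in KNOWN_STATES:
            raise ValueError(f"unexpected job status: {status!r}")
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"job still {status!r} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

Passing sleep as a parameter keeps the loop easy to exercise with a canned sequence of statuses instead of a live job.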
The models available (e.g., Linear Regression, Random Forest Classifier) depend on the model_type ('regression' or 'classification') set during Data Preprocessing.
Hyperparameters control a model's learning process (e.g., number of trees). Adjusting them can significantly impact performance. Defaults are provided, but experimentation is key for manual tuning.
The platform provides parameter info (type, default, range) based on backend configurations (like get_regression_models functions).
Clicking "Train" sends a request (e.g., POST to /train_model) with the selected models, their parameters (JSON), the dataset_id, and an optional model_custom_name to the backend for execution.
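To make the parameter rules and the training request concrete, here is a minimal sketch that checks overrides against (type, default, range) metadata and then assembles a request body. Everything specific in it is an assumption for illustration: the PARAM_SPECS values are placeholders rather than ML Clever's real defaults or ranges, and the exact key names and layout of the /train_model payload may differ from what the backend expects.

```python
import json
from typing import Any, Optional

# Hypothetical metadata in the (type, default, range) shape described above.
# The numbers are placeholders, not ML Clever's actual defaults or limits.
PARAM_SPECS: dict[str, dict[str, dict[str, Any]]] = {
    "Random Forest Classifier": {
        "n_estimators": {"type": int, "default": 100, "range": (1, 1000)},
        "max_depth": {"type": int, "default": 10, "range": (1, 50)},
    },
}


def resolve_params(model: str, overrides: dict[str, Any]) -> dict[str, Any]:
    """Start from defaults, then apply overrides that pass type and range checks."""
    if model not in PARAM_SPECS:
        raise ValueError(f"unknown model: {model!r}")
    specs = PARAM_SPECS[model]
    resolved = {name: spec["default"] for name, spec in specs.items()}
    for name, value in overrides.items():
        if name not in specs:
            raise ValueError(f"{model}: unknown parameter {name!r}")
        spec = specs[name]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, spec["type"]):
            raise ValueError(f"{model}: {name} must be {spec['type'].__name__}")
        low, high = spec["range"]
        if not low <= value <= high:
            raise ValueError(f"{model}: {name}={value} is outside [{low}, {high}]")
        resolved[name] = value
    return resolved


def build_training_request(
    dataset_id: int,
    selections: dict[str, dict[str, Any]],
    model_custom_name: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble an assumed request body for a POST to /train_model."""
    params = {model: resolve_params(model, ov) for model, ov in selections.items()}
    payload: dict[str, Any] = {
        "dataset_id": dataset_id,
        "models": list(params),
        "parameters": json.dumps(params),  # parameters travel as JSON
    }
    if model_custom_name:
        payload["model_custom_name"] = model_custom_name
    return payload
```

Validating before submission surfaces out-of-range or mistyped values locally, which is the same class of error that otherwise makes a training job fail immediately.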
Know the assumptions, strengths, and weaknesses of the models you select. Choose appropriately for your data and problem.
Begin with default parameters or simpler models. Evaluate, then iteratively adjust parameters or try more complex models based on results.
Use the custom naming option for descriptive labels (e.g., RandomForest_tuned_depth5, LinReg_baseline). This helps compare results.
Refer to specific algorithm documentation (in-platform or external) to understand hyperparameter effects.
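One way to keep run names consistent is to derive them from the model and the parameters you changed. The run_name helper below is hypothetical and only builds a string; the naming scheme is a suggestion, not a format ML Clever requires.

```python
import re


def run_name(model: str, tag: str, **changed: object) -> str:
    """Build a label from a model name, a short tag, and any changed parameters."""
    # Drop spaces and punctuation so the model name reads as one token.
    parts = [re.sub(r"[^A-Za-z0-9]", "", model), tag]
    # Sort parameters so the same settings always yield the same name.
    parts += [f"{key}{value}" for key, value in sorted(changed.items())]
    return "_".join(parts)
```

For example, run_name("Random Forest", "tuned", max_depth=5) gives RandomForest_tuned_max_depth5, and run_name("LinReg", "baseline") gives LinReg_baseline.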
Common issues and potential solutions:
Problem | Possible Cause & Solution |
---|---|
'Train Models' Tab Loading/Disabled | Cause: Preprocessed dataset status is not yet complete. Solution: Wait for preprocessing to finish. Check dataset status. Review logs if failed. |
No Models in Selection List | Cause: Solution: Verify preprocessing setup (target variable, task type). |
Training Fails Immediately | Cause: Invalid parameters (out of range, wrong type), dataset ID issue, backend error. Solution: Double-check parameters against rules. Verify dataset. Check system status or contact support. |
Training Job Fails During Execution | Cause: Data issues (e.g., outliers), insufficient resources, algorithm errors. Solution: Review training logs for errors. Revisit preprocessing, try different parameters/algorithms. |
After initiating training:
Navigate to the training progress page to track job status (queued, running, completed, failed).
View Training Progress
Once complete, access model results to view performance metrics (e.g., Accuracy, RMSE, R²) and compare models.
Learn about Evaluation
Use your successfully trained models to make predictions on new data via the Predictions features.
Explore Predictions
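When comparing models, it helps to know how the example metrics named above are defined. The sketch below computes Accuracy, RMSE, and R² in plain Python from their standard definitions. It illustrates the formulas only and is not ML Clever's evaluation code.

```python
import math
from typing import Sequence


def _check_lengths(y_true: Sequence, y_pred: Sequence) -> None:
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Classification: fraction of predictions that match the true labels."""
    _check_lengths(y_true, y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Regression: square root of the mean squared error."""
    _check_lengths(y_true, y_pred)
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Regression: 1 minus residual sum of squares over total sum of squares."""
    _check_lengths(y_true, y_pred)
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all true values are equal")
    return 1 - ss_res / ss_tot
```

Lower RMSE is better, while higher Accuracy and R² are better, so check which direction applies before ranking models on a metric.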