
Manual Training

Manual Model Training provides a hands-on approach to building machine learning models within ML Clever. Unlike AutoML Training, which automates model selection and tuning, manual training gives you direct control over which algorithms are used and how their parameters are configured.

This method is ideal when you have specific requirements, want to experiment with particular model architectures, or possess prior knowledge guiding your model selection and tuning process.

Important Prerequisite: Preprocessing

Manual Model Training operates exclusively on preprocessed datasets. You must first upload and preprocess your raw dataset using ML Clever's Data Preprocessing tools. This ensures data quality and prepares the features for modeling.

This differs from building ML Pipelines (both manual and AutoML), where preprocessing steps are typically included within the pipeline definition itself for end-to-end automation and data integrity. For standalone model training, preprocessing is a separate, mandatory preceding step.

Diagram showing data flow: Raw Data goes to Preprocessing, resulting in a Preprocessed Dataset which is then used for Manual Model Training

Fig 1: Data Flow for Manual Model Training

Prerequisites

Before you can start manual model training, ensure you have:

A Successfully Preprocessed Dataset

You must have already created a preprocessed dataset within ML Clever. The preprocessing status for this dataset needs to be complete. This process also defines the task type (e.g., 'regression' or 'classification'), which determines the available models.

Learn about Data Preprocessing

Understanding of Model Type

Know whether your preprocessed dataset is intended for a Regression (predicting continuous values) or Classification (predicting categories) task. This is determined during preprocessing and dictates the models you can select.

Step-by-Step: Training a Model Manually

Follow these steps to configure and initiate manual model training on your preprocessed data:

UI illustrating the main steps: Navigate to Dataset -> Select 'Train Models' Tab -> Choose Models -> Configure Parameters -> Initiate Training

Fig 2: High-level Steps in the UI

1. Navigate to Dataset

Go to the Data or Dashboard section of ML Clever and locate the specific preprocessed dataset you wish to use for training. Click on it to open its details page (e.g., /preprocessed_datasets/123).

2. Select 'Train Models' Tab

On the preprocessed dataset page, find and click the Train Models tab to access the model training configuration options.

Note: The Train Models tab requires the dataset's preprocessing status to be complete.

Screenshot highlighting the 'Train Models' tab on the dataset details page.

3. Choose Models

Under the manual training section, you'll see a list of available machine learning models, filtered by the task type (Regression/Classification) set during preprocessing. Select one or more models using the checkboxes.

See Regression Models or Classification Models for algorithm details.

UI showing checkboxes for selecting models like Linear Regression, Random Forest, etc.

4. Configure Parameters (Optional)

For each selected model, you can often adjust its hyperparameters (e.g., n_estimators, max_depth, learning_rate, C, alpha). Default values are usually provided. Expand the model's settings to fine-tune.

Check the model's documentation or UI tooltips for details on each parameter (type, default, range).

UI showing adjustable parameters (e.g., sliders, input fields) for a selected model.

5. Set Custom Model Name (Optional)

Provide a custom name or tag (e.g., via a model_custom_name field) for this training run to easily identify the resulting models later. A default name might be generated if left blank.

6. Start Training

Click the "Train Manual Models" button (or similar) to submit the configuration and begin the training process on the backend.

You'll likely be redirected to a training progress page (e.g., /training_progress or a specific task page) to monitor the job status.

Key Concepts

Task-Based Model Availability

The models available (e.g., Linear Regression, Random Forest Classifier) depend on the model_type ('regression' or 'classification') set during Data Preprocessing.
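As a rough illustration of this filtering, the sketch below keeps a small catalog of models tagged by task and returns only those matching a dataset's model_type. The catalog contents and the function name available_models are assumptions made for this example; they are not taken from ML Clever's backend.

```python
# Hypothetical catalog: the model names and task tags are illustrative only.
MODEL_CATALOG = {
    "Linear Regression": "regression",
    "Random Forest Regressor": "regression",
    "Logistic Regression": "classification",
    "Random Forest Classifier": "classification",
}


def available_models(model_type: str) -> list[str]:
    """Return the catalog entries whose task matches the dataset's model_type."""
    if model_type not in ("regression", "classification"):
        raise ValueError(f"unknown model_type: {model_type!r}")
    return [name for name, task in MODEL_CATALOG.items() if task == model_type]


print(available_models("classification"))
```

A missing or misspelled model_type raises an error here; in the UI the equivalent symptom would be an empty model list.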

Hyperparameter Configuration

Hyperparameters control a model's learning process (e.g., number of trees). Adjusting them can significantly impact performance. Defaults are provided, but experimentation is key for manual tuning.

The parameter information shown in the UI (type, default, range) comes from backend configuration, for example functions such as get_regression_models.
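To make the idea of a parameter's type, default, and range concrete, here is a minimal sketch of how such a specification could be used to fill in defaults and reject invalid values before a job is submitted. The PARAM_SPECS table and resolve_params function are hypothetical; the parameter names echo the examples in this guide, but the types, defaults, and ranges are invented for illustration.

```python
# Hypothetical specs: the values below are made up and do not reflect
# ML Clever's real defaults or limits.
PARAM_SPECS = {
    "n_estimators": {"type": int, "default": 100, "min": 1, "max": 1000},
    "max_depth": {"type": int, "default": 5, "min": 1, "max": 50},
    "learning_rate": {"type": float, "default": 0.1, "min": 0.0001, "max": 1.0},
}


def resolve_params(user_params: dict) -> dict:
    """Start from the defaults, then apply user values after validating them."""
    resolved = {name: spec["default"] for name, spec in PARAM_SPECS.items()}
    for name, value in user_params.items():
        spec = PARAM_SPECS.get(name)
        if spec is None:
            raise ValueError(f"unknown parameter: {name}")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, spec["type"]):
            raise TypeError(f"{name} must be {spec['type'].__name__}")
        if not spec["min"] <= value <= spec["max"]:
            raise ValueError(
                f"{name}={value} is outside [{spec['min']}, {spec['max']}]"
            )
        resolved[name] = value
    return resolved


print(resolve_params({"max_depth": 8}))
```

Checking values this way before submission mirrors the "invalid parameters" failure described under Troubleshooting: a wrong type or an out-of-range value is caught early rather than after the job has been queued.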

Training Job Submission

Clicking "Train Manual Models" sends a request (e.g., a POST to /train_model) to the backend for execution. The request carries the selected models, their parameters (as JSON), the dataset_id, and the optional model_custom_name.
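The shape of such a request might look like the sketch below, which builds (but does not send) a POST request using Python's standard library. Only /train_model, dataset_id, and model_custom_name come from this page; the host, the "models" key, and the nesting of parameters under each model name are assumptions, and the real endpoint may expect a different layout or require authentication.

```python
import json
import urllib.request

BASE_URL = "https://example.invalid"  # placeholder host, not a real ML Clever URL


def build_train_request(dataset_id, models, model_custom_name=None):
    """Assemble a JSON POST request for the training endpoint without sending it."""
    # "models" maps each selected model to its parameters (assumed layout).
    payload = {"dataset_id": dataset_id, "models": models}
    if model_custom_name:
        payload["model_custom_name"] = model_custom_name
    return urllib.request.Request(
        url=f"{BASE_URL}/train_model",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_train_request(
    dataset_id=123,
    models={"Random Forest": {"n_estimators": 200, "max_depth": 5}},
    model_custom_name="RandomForest_tuned_depth5",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Printing the body before sending is a cheap way to confirm that the parameters are serialized as intended.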

Best Practices

Understand Your Models

Know the assumptions, strengths, and weaknesses of the models you select. Choose appropriately for your data and problem.

Start Simple, Iterate

Begin with default parameters or simpler models. Evaluate, then iteratively adjust parameters or try more complex models based on results.
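One simple way to iterate is to change a single parameter at a time relative to a baseline, so that any change in the results can be attributed to that parameter. The helper below is a sketch of that idea only; the function name and the parameter values are illustrative and not part of ML Clever.

```python
def one_at_a_time(baseline: dict, candidates: dict) -> list[dict]:
    """Return the baseline plus configs that differ from it in one parameter."""
    configs = [dict(baseline)]  # the baseline run comes first
    for name, values in candidates.items():
        for value in values:
            if value != baseline.get(name):  # skip duplicates of the baseline
                configs.append({**baseline, name: value})
    return configs


baseline = {"n_estimators": 100, "max_depth": 5}
runs = one_at_a_time(baseline, {"max_depth": [3, 5, 10], "n_estimators": [200]})
for config in runs:
    print(config)
```

Each resulting configuration could then be entered as a separate manual training run and compared against the baseline.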

Use Meaningful Names

Use the custom naming option for descriptive labels (e.g., RandomForest_tuned_depth5, LinReg_baseline). This helps compare results.
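If you train many variants, generating names from the configuration keeps them consistent. The helper below is a hypothetical sketch that reproduces the naming pattern of the examples above; ML Clever does not prescribe this format, and the set of characters it allows in names is not documented here.

```python
import re


def make_model_name(algorithm: str, note: str = "", **params) -> str:
    """Join the algorithm, a short note, and key parameters into one label."""
    parts = [algorithm, note] + [f"{key}{value}" for key, value in sorted(params.items())]
    label = "_".join(part for part in parts if part)
    # Conservatively replace anything other than letters, digits, '.', '-', '_'.
    return re.sub(r"[^A-Za-z0-9._-]+", "-", label)


print(make_model_name("RandomForest", "tuned", depth=5))
print(make_model_name("LinReg", "baseline"))
```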

Consult Model Documentation

Refer to specific algorithm documentation (in-platform or external) to understand hyperparameter effects.

Troubleshooting

Common issues and potential solutions:

Problem: 'Train Models' Tab Loading/Disabled

Cause: The preprocessed dataset's status is not yet complete.

Solution: Wait for preprocessing to finish and check the dataset status. If preprocessing failed, review its logs.

Problem: No Models in Selection List

Cause: The model_type (regression/classification) may be incorrect or missing from preprocessing.

Solution: Verify the preprocessing setup (target variable, task type).

Problem: Training Fails Immediately

Cause: Invalid parameters (out of range or wrong type), a dataset ID issue, or a backend error.

Solution: Double-check each parameter against its documented type and range. Verify the dataset. Check system status or contact support.

Problem: Training Job Fails During Execution

Cause: Data issues (e.g., outliers), insufficient resources, or algorithm errors.

Solution: Review the training logs for errors. Revisit preprocessing, or try different parameters or algorithms.

Next Steps

After initiating training:

Monitor Training

Navigate to the training progress page to track job status (queued, running, completed, failed).

View Training Progress
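The job states listed above lend themselves to a simple polling loop: keep checking until the job reaches a terminal state (completed or failed). The sketch below assumes a caller-supplied fetch_status function, since this page does not document a status API; here it is replaced by a stand-in that replays a fixed sequence of states.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(
    fetch_status: Callable[[], str],
    interval: float = 5.0,
    max_checks: int = 120,
) -> str:
    """Poll until the job is completed or failed, or give up after max_checks."""
    for _ in range(max_checks):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(interval)  # still queued or running: wait before re-checking
    raise TimeoutError("job did not reach a terminal status in time")


# Stand-in for a real status lookup: replays a fixed sequence of states.
fake_states = iter(["queued", "running", "running", "completed"])
final_status = wait_for_job(lambda: next(fake_states), interval=0)
print(final_status)
```

Bounding the number of checks avoids waiting indefinitely on a job that is stuck in the queued or running state.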

Evaluate Models

Once complete, access model results to view performance metrics (e.g., Accuracy, RMSE, R²) and compare models.

Learn about Evaluation
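For reference, the metrics named above can be computed from true and predicted values as in the following sketch. These are the standard textbook definitions written in plain Python; how ML Clever computes and reports them (for example, on which data split) is not specified on this page.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Undefined (division by zero) when every true value is identical.
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(r2([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))
```

Accuracy applies to classification models, while RMSE and R² apply to regression models. Lower RMSE is better; an R² of 1 means a perfect fit, and 0 means the model does no better than always predicting the mean.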

Make Predictions

Use your successfully trained models to make predictions on new data via the Predictions features.

Explore Predictions

Last updated: 5/2/2025
