# LLM Training
The `aitraining llm` command trains large language models with support for multiple trainers and techniques.
## Quick Start
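The original invocation was not preserved here, so the following is a minimal assumed run assembled from the defaults documented in the parameter tables on this page (the `./data` layout and output path are placeholders):

```shell
# Minimal supervised fine-tuning run; every flag shown is documented
# in the parameter tables on this page. Paths are placeholders.
aitraining llm \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name ./models/my-first-run \
  --trainer sft \
  --epochs 1 \
  --batch-size 2
```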
## Available Trainers
| Trainer | Description |
|---|---|
| `default` / `sft` / `generic` | Supervised fine-tuning |
| `dpo` | Direct Preference Optimization |
| `orpo` | Odds Ratio Preference Optimization |
| `ppo` | Proximal Policy Optimization |
| `grpo` | Group Relative Policy Optimization (custom environments) |
| `reward` | Reward model training |
| `distillation` | Knowledge distillation |
`generic` is an alias for `default`. All three (`default`, `sft`, `generic`) produce the same behavior.

## Parameter Groups
Parameters are organized into logical groups:

### Basic Parameters
| Parameter | Description | Default |
|---|---|---|
| `--model` | Base model to fine-tune | `google/gemma-3-270m` |
| `--data-path` | Path to training data | `data` |
| `--project-name` | Output directory name | `project-name` |
| `--train-split` | Training data split | `train` |
| `--valid-split` | Validation data split | `None` |
**Always specify these parameters:** While `--model`, `--data-path`, and `--project-name` have defaults, you should always set them explicitly for your use case. The `--project-name` parameter sets the output folder; use a path like `--project-name ./models/my-experiment` to control where the trained model is saved.

### Training Configuration
| Parameter | Description | Default |
|---|---|---|
| `--trainer` | Training method | `default` |
| `--epochs` | Number of training epochs | `1` |
| `--batch-size` | Training batch size | `2` |
| `--lr` | Learning rate | `3e-5` |
| `--mixed-precision` | `fp16`, `bf16`, or `None` | `None` |
| `--gradient-accumulation` | Accumulation steps | `4` |
| `--warmup-ratio` | Warmup ratio | `0.1` |
| `--optimizer` | Optimizer | `adamw_torch` |
| `--scheduler` | LR scheduler | `linear` |
| `--weight-decay` | Weight decay | `0.0` |
| `--max-grad-norm` | Max gradient norm | `1.0` |
| `--seed` | Random seed | `42` |
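As an illustration, a hedged sketch that overrides several of these defaults in one run (the specific values, and the `cosine` scheduler name, are assumptions for illustration, not recommendations):

```shell
# Overriding training-configuration defaults; all values are illustrative.
aitraining llm \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name ./models/tuned-run \
  --trainer sft \
  --epochs 3 \
  --batch-size 4 \
  --lr 2e-5 \
  --gradient-accumulation 8 \
  --mixed-precision bf16 \
  --seed 1234
```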
### Checkpointing & Evaluation
| Parameter | Description | Default |
|---|---|---|
| `--eval-strategy` | When to evaluate (`epoch`, `steps`, `no`) | `epoch` |
| `--save-strategy` | When to save (`epoch`, `steps`, `no`) | `epoch` |
| `--save-steps` | Save every N steps (if `save-strategy=steps`) | `500` |
| `--save-total-limit` | Max checkpoints to keep | `1` |
| `--logging-steps` | Log every N steps (`-1` for auto) | `-1` |
| `--resume-from-checkpoint` | Resume from checkpoint path, or `auto` to detect latest | `None` |
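For long runs, step-based checkpointing combines naturally with `--resume-from-checkpoint auto`; a sketch (paths and step counts are illustrative):

```shell
# Save every 500 steps, keep the last two checkpoints, and pick up the
# latest checkpoint automatically if the run was interrupted.
aitraining llm \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name ./models/long-run \
  --save-strategy steps \
  --save-steps 500 \
  --save-total-limit 2 \
  --resume-from-checkpoint auto
```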
### Performance & Memory
| Parameter | Description | Default |
|---|---|---|
| `--auto-find-batch-size` | Automatically find optimal batch size | `False` |
| `--disable-gradient-checkpointing` | Disable memory optimization | `False` |
| `--unsloth` | Use Unsloth for faster training (SFT only; llama/mistral/gemma/qwen2) | `False` |
| `--use-sharegpt-mapping` | Use Unsloth's ShareGPT mapping | `False` |
| `--use-flash-attention-2` | Use Flash Attention 2 for faster training | `False` |
| `--attn-implementation` | Attention implementation (`eager`, `sdpa`, `flash_attention_2`) | `None` |
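A hedged sketch combining two of the flags above (whether this combination helps depends on your GPU and model; treat it as illustrative):

```shell
# Memory/speed-oriented flags from the table above; the combination
# is illustrative, not a recommendation.
aitraining llm \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name ./models/fast-run \
  --auto-find-batch-size \
  --use-flash-attention-2
```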
**Unsloth Requirements:** Unsloth only works with the `sft`/`default` trainers and specific model architectures (llama, mistral, gemma, qwen2). See Unsloth Integration for details.

### Backend & Distribution
| Parameter | Description | Default |
|---|---|---|
| `--backend` | Where to run (`local`, `spaces`) | `local` |
| `--distributed-backend` | Distribution backend (`ddp`, `deepspeed`) | `None` |
| `--ddp-timeout` | DDP/NCCL timeout in seconds | `7200` |
**Multi-GPU Behavior:** With multiple GPUs and `--distributed-backend` not set, DDP is used automatically. Set `--distributed-backend deepspeed` for DeepSpeed ZeRO-3 optimization. Training is launched via Accelerate.

### PEFT/LoRA Parameters
| Parameter | Description | Default |
|---|---|---|
| `--peft` | Enable LoRA training | `False` |
| `--lora-r` | LoRA rank | `16` |
| `--lora-alpha` | LoRA alpha | `32` |
| `--lora-dropout` | LoRA dropout | `0.05` |
| `--target-modules` | Modules to target | `all-linear` |
| `--quantization` | `int4`/`int8` quantization | `None` |
| `--merge-adapter` | Merge LoRA after training | `True` |
### Data Processing
| Parameter | Description | Default |
|---|---|---|
| `--text-column` | Text column name | `text` |
| `--block-size` | Max sequence length | `-1` (model default) |
| `--model-max-length` | Maximum model input length | Auto-detect from model |
| `--padding` | Padding side (`left` or `right`) | `right` |
| `--add-eos-token` | Append EOS token | `True` |
| `--chat-template` | Chat template to use | Auto by trainer |
| `--packing` | Enable sequence packing (requires flash attention) | `None` |
| `--auto-convert-dataset` | Auto-detect and convert dataset format | `False` |
| `--max-samples` | Limit dataset size for testing | `None` |
| `--save-processed-data` | Save processed data: `auto`, `local`, `hub`, `both`, `none` | `auto` |
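`--max-samples` is handy for a quick smoke test before a full run; a sketch using only flags from the table above (the cap of 100 is arbitrary):

```shell
# Smoke test: cap the dataset at 100 samples and train on plain text
# rather than a chat template.
aitraining llm \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name ./models/smoke-test \
  --text-column text \
  --max-samples 100 \
  --chat-template none
```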
**Chat Template Auto-Selection:** SFT/DPO/ORPO/Reward trainers default to `tokenizer` (the model's built-in template). Use `--chat-template none` for plain-text training.

**Processed Data Saving:** By default (`auto`), processed data is saved locally to `{project}/data_processed/`. If the source dataset was from the Hub, it is also pushed as a private dataset. Original columns are renamed to `_original_*` to prevent conflicts.

## Training Examples
### SFT with LoRA
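The original example was not preserved; the sketch below uses only flags documented in the PEFT/LoRA table above, with this page's default LoRA values spelled out (the `int4` quantization is optional):

```shell
# SFT with LoRA adapters; rank/alpha/dropout match this page's defaults.
aitraining llm \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name ./models/sft-lora \
  --trainer sft \
  --peft \
  --lora-r 16 \
  --lora-alpha 32 \
  --lora-dropout 0.05 \
  --quantization int4
```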
### DPO Training
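The original example was not preserved; the sketch below shows only flags documented in this page's tables. The flags naming the preference columns are not listed on this page, so they are omitted here (see the DPO Training page for them); the learning rate is an illustrative choice:

```shell
# DPO sketch. The column-name flags for prompt/chosen/rejected are
# documented on the DPO Training page and omitted here.
aitraining llm \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name ./models/dpo-run \
  --trainer dpo \
  --peft \
  --lr 1e-5
```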
For DPO, you must specify the column names for prompt, chosen, and rejected responses.

### ORPO Training

ORPO combines SFT and preference optimization.

### GRPO Training
Train with Group Relative Policy Optimization using your own reward environment. GRPO generates multiple completions per prompt, scores them via your environment (0-1), and optimizes the policy. See GRPO Training for environment interface details.
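A minimal sketch using only flags documented on this page; the flag that points the trainer at your reward environment is not listed in the tables above and is therefore omitted (see the GRPO Training page):

```shell
# GRPO sketch. The reward-environment flag is documented on the
# GRPO Training page and omitted here.
aitraining llm \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name ./models/grpo-run \
  --trainer grpo
```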
### Knowledge Distillation
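A hedged sketch of a distillation run using the defaults listed below; the flag that selects the teacher model is not documented in this page's tables and is omitted here (see the Distillation page):

```shell
# Distillation sketch: trains the (smaller) --model against a teacher.
# The teacher-model flag is documented on the Distillation page.
aitraining llm \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name ./models/distilled \
  --trainer distillation \
  --distill-temperature 3.0 \
  --distill-alpha 0.7 \
  --distill-max-teacher-length 512
```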
Train a smaller model to mimic a larger one.

Distillation defaults: `--distill-temperature 3.0`, `--distill-alpha 0.7`, `--distill-max-teacher-length 512`.

## Logging & Monitoring
### Weights & Biases (Default)
W&B logging with the LEET visualizer is enabled by default. The LEET visualizer shows real-time training metrics directly in your terminal.

### TensorBoard
## Push to Hugging Face Hub
Upload your trained model. The repository is created as private by default and, by default, named `{username}/{project-name}`.

### Custom Repository Name or Organization

Use `--repo-id` to push to a specific repository, useful for:
- Pushing to an organization instead of your personal account
- Using a different repo name than your local `project-name`
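A sketch using the Hub flags documented below; the org/repo name is a placeholder, and passing the token via an environment variable is an assumption about your setup:

```shell
# Push the trained model to a Hub repo under an organization.
# The repo is created private by default.
aitraining llm \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name ./models/hub-run \
  --push-to-hub \
  --repo-id my-org/my-model \
  --token "$HF_TOKEN"
```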
| Parameter | Description | Default |
|---|---|---|
| `--push-to-hub` | Enable pushing to Hub | `False` |
| `--hub-private` / `--no-hub-private` | Create repo as private or public | `True` (private) |
| `--username` | HF username (for default repo naming) | `None` |
| `--token` | HF API token | `None` |
| `--repo-id` | Full repo ID (e.g., `org/model-name`) | `{username}/{project-name}` |
## Advanced Options
### Hyperparameter Sweeps
### Enhanced Evaluation
## View All Parameters
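Assuming the CLI follows the common `--help` convention (not confirmed on this page), something like:

```shell
# Assumption: standard --help behavior; whether the listing can be
# filtered per trainer is covered by the docs, not confirmed here.
aitraining llm --help
```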
See all parameters for a specific trainer.

## Next Steps
- **YAML Configs**: Use configuration files
- **DPO Training**: Deep dive into DPO
- **LoRA/PEFT**: Efficient fine-tuning
- **Distillation**: Knowledge distillation
- **GRPO Training**: RL with custom environments