Training Your First LLM with SFT
This guide walks you through every step of the wizard to train a language model using Supervised Fine-Tuning (SFT). SFT is the most common way to teach a model to follow instructions.

Before You Start

Make sure you have:
- AITraining installed (pip install aitraining)
- At least 8GB of RAM (16GB recommended)
- A GPU is helpful but not required (Apple Silicon works great! See the quick check below.)
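If you want to confirm what hardware your Python environment can actually see before training, a quick PyTorch check works on both NVIDIA GPUs and Apple Silicon. This snippet is a small standalone sketch, not part of the wizard itself:

```python
import torch

# Report which accelerator PyTorch can use for training.
if torch.cuda.is_available():
    print("CUDA GPU:", torch.cuda.get_device_name(0))
elif torch.backends.mps.is_available():
    print("Apple Silicon (MPS) backend is available.")
else:
    print("No GPU detected; training will run on CPU (slower, but fine for small models).")
```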
Step 0: Launch the Wizard
Step 1: Choose Trainer Type
Type 1 and press Enter to select LLM training.
Step 2: Choose Training Method
Type 1 and press Enter to select SFT.
default and sft are identical - they use the same training code. default is just the fallback if no trainer is specified.

What Do These Mean?
| Trainer | When to Use |
|---|---|
| SFT / default | Teaching the model to follow instructions. You have examples of good responses. Start here! |
| DPO | You have pairs of good vs bad responses for the same prompt |
| ORPO | Like DPO but works with less data |
| PPO | Advanced: using a reward model to score responses |
| Reward | Train a reward model for scoring outputs (used with PPO) |
| Distillation | Transfer knowledge from a larger teacher model to a smaller student |
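If it helps to see the difference in data terms, here is roughly what a single training example looks like for SFT versus DPO. The field names below (messages, prompt, chosen, rejected) follow common conventions such as those used by TRL; your own dataset may name its columns differently:

```python
# SFT: the model learns to reproduce a good response to a prompt.
sft_example = {
    "messages": [
        {"role": "user", "content": "Summarize photosynthesis in one sentence."},
        {"role": "assistant", "content": "Plants convert sunlight, water, and CO2 into sugar and oxygen."},
    ]
}

# DPO: the model learns to prefer the "chosen" response over the "rejected" one.
dpo_example = {
    "prompt": "Summarize photosynthesis in one sentence.",
    "chosen": "Plants convert sunlight, water, and CO2 into sugar and oxygen.",
    "rejected": "Photosynthesis is when plants eat dirt.",
}
```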
Step 3: Project Name
Type a name like my-first-chatbot or press Enter to accept the default.
Step 4: Model Selection
This is the most important step. The wizard shows trending models from HuggingFace.

Choosing the Right Model Size
I have a MacBook (8-16GB RAM)

Use /filter then S for small models.

Recommended: google/gemma-3-270m or meta-llama/Llama-3.2-1B

These will train in 15-30 minutes on Apple Silicon.

I have a gaming PC (RTX 3060/3070, 8-12GB VRAM)

Use /filter then S or M.

Recommended: google/gemma-2-2b or meta-llama/Llama-3.2-3B

Enable quantization later for larger models.

I have a workstation (RTX 3090/4090, 24GB+ VRAM)

Any model up to 10B works well.

Recommended: meta-llama/Llama-3.2-8B or mistralai/Mistral-7B-v0.3

I have a cloud GPU (A100, H100)

Go big!

Recommended: meta-llama/Llama-3.1-70B with quantization
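If you are unsure whether a model will fit on your hardware, a rough back-of-the-envelope estimate helps. The multipliers below are common rules of thumb, not figures measured with AITraining:

```python
def estimate_training_memory_gb(params_billions: float,
                                bytes_per_param: float = 2.0,    # bf16 weights
                                training_multiplier: float = 4.0) -> float:
    """Very rough full fine-tuning estimate: weights plus gradients and optimizer state.

    LoRA and quantization shrink this dramatically, since only small adapter
    weights need gradients and optimizer state.
    """
    weights_gb = params_billions * bytes_per_param    # e.g. 1B params in bf16 is about 2 GB
    return weights_gb * training_multiplier           # gradients + optimizer roughly triple it

# Roughly 8 GB for full fine-tuning of a 1B model, ~56 GB for a 7B model,
# which is why LoRA and quantization matter on consumer GPUs.
print(estimate_training_memory_gb(1), estimate_training_memory_gb(7))
```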
Base Model vs Instruction-Tuned

When selecting a model, you’ll see two types:

| Model Name | Type | When to Use |
|---|---|---|
| google/gemma-2-2b | Base (pretrained) | General purpose, learns your specific style |
| google/gemma-2-2b-it | Instruction-tuned (IT) | Already follows instructions, fine-tune further |
| meta-llama/Llama-3.2-1B | Base | Clean slate for your use case |
| meta-llama/Llama-3.2-1B-Instruct | Instruction-tuned | Already helpful, refine it |
Rule of thumb: Use base models if you want full control. Use instruction-tuned (-it, -Instruct) if you want a head start.
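One practical way to tell the two apart once a model is downloaded: instruction-tuned checkpoints usually ship with a chat template in their tokenizer, while base models often do not. This is a small sketch using the Hugging Face transformers library, not an AITraining feature:

```python
from transformers import AutoTokenizer

# Substitute the model you chose; gated models (e.g. Gemma, Llama) may require
# accepting the license on the Hub and logging in first.
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")

if tokenizer.chat_template:
    print("Chat template found; this checkpoint expects role/content messages.")
else:
    print("No chat template; this is likely a base model.")
```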
Selecting Your Model

Option A: Type a number to select from the list.

Step 5: Dataset Configuration
Understanding Dataset Size
Dataset Selection Options
Use a pre-built dataset (easiest).

Dataset Format Analysis

The wizard automatically analyzes your dataset. When prompted, enter y to enable automatic conversion. This ensures your data works correctly with the model’s chat template.
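For reference, the conversational format that chat templates expect is a list of role/content messages per example. A minimal hand-written illustration (your dataset's column names may differ, which is exactly what the automatic conversion handles):

```python
dataset_row = {
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "What is supervised fine-tuning?"},
        {"role": "assistant", "content": "Training a model on example prompts paired with good responses."},
    ]
}
```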
Train/Validation Splits

By default, training uses the train split. If your dataset has a separate evaluation split (e.g. validation, test), enter it here. Otherwise, press Enter to skip.
Max Samples (Testing)
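Capping the number of samples is handy for a quick smoke test before committing to a full run. If you want to preview what a split, or a small subset of it, looks like before pointing the wizard at it, the Hugging Face datasets library supports both directly. A small sketch using a public dataset:

```python
from datasets import load_dataset

# Load only the train split, then keep the first 500 rows for a fast test run.
train = load_dataset("tatsu-lab/alpaca", split="train")
small = train.select(range(500))

print(train.column_names)   # the columns the wizard will see
print(small[0])             # inspect one example
```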
Step 6: Advanced Configuration (Optional)
When to Configure Advanced Options
| Situation | What to Change |
|---|---|
| Training is too slow | Enable LoRA (peft=True) to reduce memory |
| Out of memory | Reduce batch_size or enable quantization |
| Model isn’t learning | Adjust lr (learning rate) |
| Want to track training | Enable W&B logging |
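If you enable LoRA in the advanced options, the knobs map onto a standard PEFT adapter configuration. The values below are typical starting points, not AITraining defaults:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                    # adapter rank: higher means more capacity and more memory
    lora_alpha=32,           # scaling factor, commonly 2x the rank
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections; names vary by model
    task_type="CAUSAL_LM",
)
```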
Step 7: Review and Start
What Happens Next
- The model downloads (first time only)
- The dataset loads and converts
- Training begins with progress updates
- W&B LEET panel shows real-time metrics (if enabled)
- Your trained model saves to the project folder
Testing Your Model
After training completes, open http://localhost:7860/inference and load your model from ./my-first-chatbot to test it!
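If you would rather test from a script than the web UI, and assuming the project folder contains a standard Hugging Face checkpoint, something like this works. A minimal sketch; adjust paths and generation settings to taste:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("./my-first-chatbot")
tokenizer = AutoTokenizer.from_pretrained("./my-first-chatbot")

# Build a chat-formatted prompt and generate a short reply.
messages = [{"role": "user", "content": "Introduce yourself in one sentence."}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)

outputs = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```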
Common Issues
Out of memory error

- Use a smaller model (filter by size)
- Enable LoRA in advanced options
- Reduce batch size
- Enable quantization (int4)
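Under the hood, 4-bit loading typically corresponds to a bitsandbytes quantization config like the one below, shown here with the transformers API; the wizard's int4 option may set slightly different values:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Requires a CUDA GPU and the bitsandbytes package.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    quantization_config=bnb_config,
)
```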
Model not learning (loss stays high)
- Check your dataset format
- Try a higher learning rate
- Ensure your data has the right columns
Training is very slow
- Enable mixed precision (bf16) in advanced options (see the check below)
- Use a smaller dataset first
- Enable LoRA
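bf16 only helps if your hardware supports it; Ampere-class (RTX 30xx) and newer NVIDIA GPUs do. A quick check before flipping the option, using plain PyTorch rather than anything wizard-specific:

```python
import torch

# bf16 mixed precision needs both a CUDA GPU and hardware bf16 support.
if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
    print("bf16 mixed precision is supported on this GPU.")
else:
    print("Prefer fp16 or full precision on this machine.")
```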
Next Steps
- Understanding Models: Deep dive into model selection
- Dataset Guide: Prepare your own training data
- DPO Training: Train with preference data
- LoRA Efficiency: Train large models on limited hardware