Changelog
Track all notable changes, bug fixes, and improvements to AITraining.
2026-03-14 (v0.0.53)
Feature: VLM Support for ORPO Trainer
ORPO training now supports vision-language models (e.g. Qwen3.5-VL-9B) with image+text preference data. A new VLMORPOTrainer subclass handles image processing via DataCollatorForVisionPreference, and an image_column parameter specifies which dataset column contains the images.
New parameter:
| Parameter | CLI Flag | Default | Description |
|---|---|---|---|
| image_column | --image-column | None | Image column for VLM preference training (ORPO/DPO) |
Usage:
```python
params = LLMTrainingParams(
    model="Qwen/Qwen3.5-VL-9B",
    trainer="orpo",
    image_column="images",
    text_column="chosen",
    rejected_text_column="rejected",
    prompt_text_column="prompt",
)
```
When image_column is set, the trainer automatically loads AutoProcessor, skips chat template pre-processing (handled by the data collator), and renames the image column to images for TRL compatibility.
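Rows in such a dataset look roughly like the sketch below. This is illustrative only: the column names match the params example above, but the paths and texts are invented, and the exact schema depends on your data.

```python
# Hypothetical preference row for VLM ORPO training. Column names follow
# the image_column/prompt_text_column/text_column/rejected_text_column
# settings shown above; all values are invented for illustration.
row = {
    "images": ["photos/lobby.jpg"],          # image_column="images"
    "prompt": "Describe this hotel lobby.",  # prompt_text_column="prompt"
    "chosen": "A bright lobby with marble floors.",  # preferred answer
    "rejected": "I cannot see images.",              # dispreferred answer
}
```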
2026-03-13 (v0.0.52)
Feature: Hub Repo Visibility Control + Fix: fp16 PEFT on T4 GPUs
New parameter: hub_private — Controls whether HF Hub repos are created as private or public. Previously all repos were hardcoded to private.
| Parameter | CLI Flag | Default | Description |
|---|---|---|---|
| hub_private | --hub-private / --no-hub-private | True | Whether HF Hub repos should be private |
fp16 PEFT fix: TRL’s SFTTrainer casts trainable adapter parameters to bf16 for quantized PEFT models. On T4-class GPUs (which lack bf16 support), this crashes GradScaler when fp16 mixed precision is requested. Trainable params are now re-cast to float16 before training starts when using PEFT + quantization + fp16.
2026-03-08 (v0.0.51)
Issue: When using ORPO or DPO with pre-formatted (already-templated) data, the input_ids pre-tokenization step produced empty arrays because ORPO/DPO datasets have chosen/rejected columns but no text column. This caused a crash in transformers.floating_point_ops() with AttributeError: 'list' object has no attribute 'numel'.
Fix: Pre-tokenization of input_ids is now skipped for ORPO and DPO trainers. These trainers use chosen/rejected columns directly and don’t need the input_ids signal that SFT trainers use.
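The guard amounts to a one-line predicate on the trainer name (the helper name here is hypothetical):

```python
def needs_input_ids_pretokenization(trainer):
    # ORPO/DPO datasets carry chosen/rejected columns and no text column,
    # so the input_ids pre-tokenization pass used by SFT is skipped.
    return trainer not in ("orpo", "dpo")
```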
2026-03-07 (v0.0.50)
Qwen3.5 tool_calls fix: Qwen3.5’s chat template uses arguments|items Jinja filter which requires arguments to be a dict. OpenAI-format training data stores arguments as a JSON string. safe_apply_chat_template now auto-parses JSON string arguments to dicts when the tokenizer supports tool_calls natively.
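The auto-parsing step can be sketched like this. It is a minimal illustration with an invented helper name; the real logic lives inside safe_apply_chat_template:

```python
import json

def normalize_tool_call_arguments(tool_call):
    # OpenAI-format data stores function arguments as a JSON string, but
    # templates using `arguments|items` (e.g. Qwen3.5) need a dict.
    fn = tool_call.get("function", {})
    args = fn.get("arguments")
    if isinstance(args, str):
        try:
            fn["arguments"] = json.loads(args)
        except json.JSONDecodeError:
            pass  # leave malformed strings untouched
    return tool_call
```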
Token leak defense:
- AutoTrainParams.__repr__() now masks token and wandb_token fields to prevent leaks in logs/tracebacks
- HF tokens are scrubbed from log files before upload_folder to prevent HF Hub secret scanning rejection
Dependency upgrades:
| Package | Old | New |
|---|---|---|
| trl | >=0.28.0 | >=0.29.0 |
| transformers | ==4.57.3 | >=5.3.0 |
| accelerate | ==1.11.0 | >=1.13.0 |
| peft | ==0.14.0 | >=0.18.1 |
| huggingface_hub | >=0.34.0 | >=1.6.0 |
| sentence-transformers | ==3.3.1 | >=5.2.3 |
Install:
```shell
pip install "aitraining>=0.0.50"
```
2026-03-02 (v0.0.49)
Fix: Qwen3.5 Support and 409 Repo Conflict Handling
Qwen3.5 tool_calls detection: _check_tool_calls_support now tries both string and dict formats for the arguments probe. Qwen3.5’s template uses arguments|items which only works with dicts, so the string-only probe returned a false negative.
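The two-format probe can be sketched as below. Here `render` stands in for an apply_chat_template call with a sample tool call, and the function name is invented:

```python
def tool_arguments_supported(render):
    # Probe with a JSON-string payload first (most templates), then a dict
    # payload (Qwen3.5's `arguments|items`); either succeeding counts as
    # native tool_calls support.
    for args in ('{"q": "test"}', {"q": "test"}):
        try:
            render(args)
            return True
        except Exception:
            continue
    return False
```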
409 repo conflict handling: UploadLogs callback now detects 409 Conflict from HF Hub when the repo already exists and creates a datetime-versioned repo (e.g., model-20260302-1820) instead of failing. The versioned repo_id is propagated to the config so the final push_to_hub uses the same repo.
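The versioned-repo fallback can be sketched as (helper name invented; the real code runs this only after catching a 409 from the Hub):

```python
from datetime import datetime

def versioned_repo_id(repo_id, now=None):
    # On a 409 Conflict, retry with a datetime-suffixed repo id so the
    # upload lands in a fresh repo instead of failing.
    now = now or datetime.now()
    return f"{repo_id}-{now.strftime('%Y%m%d-%H%M')}"
```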
Chat templates: Synced 42 templates from unsloth 2026.2.1 + added Qwen3.5 template from tokenizer (43 total). Added sync_chat_templates.py script for future updates.
Other: Added exist_ok=True to all HF Hub create_repo calls.
2026-03-01 (v0.0.47)
Fix: ORPOConfig Import for TRL 0.29
Issue: TRL 0.29 removed ORPOConfig from the top-level trl package, moving it to trl.experimental.orpo.
Fix: ORPO trainer now uses a try/except fallback:
```python
try:
    from trl import ORPOConfig, ORPOTrainer
except ImportError:
    from trl.experimental.orpo import ORPOConfig, ORPOTrainer
```
This supports both TRL 0.28 and 0.29+.
2026-02-24 (v0.0.46)
Breaking Change: Remove max_prompt_length from ORPO/DPO Configs
TRL 0.28.0 moved ORPOConfig to trl.experimental and removed max_prompt_length. DPO deprecated it too. Prompt length is now inferred from max_length - max_completion_length.
Action required: If you were passing --max-prompt-length to ORPO or DPO training, remove it. The parameter is no longer accepted. Set --block-size (max_length) and --max-completion-length instead.
2026-02-24 (v0.0.45)
Issue: ORPO and DPO prompt extraction always derived the prompt from chosen[:-1] (all messages except the last). This breaks multi-turn preference data where the completion spans multiple turns.
Fix: When an explicit prompt column is present and contains a messages list, it is now used directly instead of deriving from chosen[:-1]. Single-turn data without a prompt column continues to work as before.
Example multi-turn data:
```json
{
  "prompt": [
    {"role": "user", "content": "Book me a hotel"},
    {"role": "assistant", "content": "Sure, let me search."}
  ],
  "chosen": [
    {"role": "user", "content": "Book me a hotel"},
    {"role": "assistant", "content": "Sure, let me search."},
    {"role": "user", "content": "In Paris please"},
    {"role": "assistant", "content": "Done, booked Hotel Lumiere."}
  ],
  "rejected": [
    {"role": "user", "content": "Book me a hotel"},
    {"role": "assistant", "content": "Sure, let me search."},
    {"role": "user", "content": "In Paris please"},
    {"role": "assistant", "content": "I cannot do that."}
  ]
}
```
2026-02-19 (v0.0.44)
Feature: GRPO Loss Type and Truncation Masking
GRPO training now supports multiple loss types beyond the default grpo. This enables recent RL loss variants from the literature.
New parameters:
| Parameter | CLI Flag | Default | Description |
|---|---|---|---|
| rl_loss_type | --rl-loss-type | grpo | Loss type: grpo, dr_grpo, dapo, bnpo, cispo, sapo |
| rl_mask_truncated_completions | --rl-mask-truncated-completions | False | Mask truncated completions from loss (recommended for stability) |
Usage:
```shell
aitraining llm --train --trainer grpo \
  --model deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
  --rl-env-module my_envs.hotel_env \
  --rl-env-class HotelEnv \
  --rl-loss-type dr_grpo \
  --rl-mask-truncated-completions
```
2026-02-19 (v0.0.43)
Fix: DDP Timeout via NCCL Environment Variables
Issue: v0.0.42 attempted to pass --timeout to accelerate launch, but this flag does not exist in Accelerate.
Fix: Removed the non-existent --timeout flag. Instead, the ddp_timeout value is now applied via:
- NCCL_TIMEOUT environment variable — set before subprocess launch, read by PyTorch at process group initialization
- Direct ProcessGroupNCCL.options._timeout patch after trainer init (GRPO only) — overrides the per-operation timeout for long-running reward scoring
2026-02-19 (v0.0.42)
Fix: DDP Timeout Not Reaching dist.init_process_group
Issue: The ddp_timeout parameter set TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC but this only controls the heartbeat watchdog. The actual dist.init_process_group timeout (used during collective operations) was not being set.
Fix: Pass --timeout to accelerate launch for multi-GPU DDP and DeepSpeed, so Accelerate sets the correct timedelta for process group initialization.
This fix was superseded by v0.0.43 — the --timeout flag does not actually exist in Accelerate. See v0.0.43 for the correct approach.
2026-02-18 (v0.0.41)
Fix: vllm_server_url Key Name for TRL 0.28.0
Issue: The vllm_server_url parameter was being passed to GRPOConfig with the key name vllm_server_url, but TRL 0.28.0 renamed it to vllm_server_base_url.
Fix: Map config.vllm_server_url to training_args["vllm_server_base_url"] when constructing GRPOConfig.
Note: The CLI flag remains --vllm-server-url — only the internal mapping to TRL was fixed.
2026-02-18 (v0.0.40)
Feature: Resume Training from Checkpoint
All trainers now support resuming training from a checkpoint. This is useful when training is interrupted or when you want to continue training from a specific point.
New parameter:
| Parameter | CLI Flag | Default | Description |
|---|---|---|---|
| resume_from_checkpoint | --resume-from-checkpoint | None | Path to checkpoint directory, or auto to detect the latest |
Usage:
```shell
# Resume from a specific checkpoint
aitraining llm --train \
  --model deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
  --trainer sft \
  --resume-from-checkpoint ./my-model/checkpoint-500

# Auto-detect latest checkpoint
aitraining llm --train \
  --model deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
  --trainer sft \
  --resume-from-checkpoint auto
```
When set to auto, true, or latest, the system scans the output directory for checkpoint-* folders, sorts them numerically, and resumes from the most recent one. If no checkpoints are found, training starts fresh with a warning.
Available for all trainers: SFT, DPO, ORPO, PPO, GRPO, Reward, Distillation, and Default.
2026-02-16 (v0.0.39)
Feature: DDP Timeout Configuration
Long-running operations (e.g., GRPO reward scoring with multi-turn episodes) could cause NCCL timeouts in multi-GPU setups. A new ddp_timeout parameter controls both the DDP timeout in training args and the TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC environment variable.
New parameter:
| Parameter | CLI Flag | Default | Description |
|---|---|---|---|
| ddp_timeout | --ddp-timeout | 7200 | DDP/NCCL timeout in seconds |
Available for all trainers.
Feature: vLLM Server Mode for GRPO
In addition to colocate mode (vLLM shares GPU with training), GRPO now supports server mode — vLLM runs as a separate server on dedicated GPUs, and training processes are automatically reduced to account for the reserved GPUs.
New parameters:
| Parameter | CLI Flag | Default | Description |
|---|---|---|---|
| vllm_server_url | --vllm-server-url | None | URL of external vLLM server (e.g., http://localhost:8000/v1) |
| vllm_tensor_parallel_size | --vllm-tensor-parallel-size | 1 | Number of GPUs for vLLM tensor parallelism |
| vllm_server_gpus | --vllm-server-gpus | 1 | GPUs dedicated to vLLM server (subtracted from training processes) |
Usage:
```shell
# 8 GPUs: 6 for training, 2 for vLLM server
aitraining llm --train --trainer grpo \
  --model deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
  --rl-env-module my_envs.hotel_env \
  --rl-env-class HotelEnv \
  --use-vllm \
  --vllm-mode server \
  --vllm-server-gpus 2 \
  --vllm-tensor-parallel-size 2
```
2026-02-15 (v0.0.38)
Fix: device_map for Multi-GPU DDP Training
Issue: When using DDP (Distributed Data Parallel) with multiple GPUs, device_map="auto" caused conflicts — the model was spread across GPUs by auto, but DDP expects each process to own a single GPU.
Fix: Now detects multi-GPU DDP via WORLD_SIZE environment variable. When WORLD_SIZE > 1, sets device_map={"": local_rank} to place the full model on the correct GPU for each process. Single-GPU still uses device_map="auto".
Affected: Both PEFT and non-PEFT code paths in get_model().
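The selection logic can be sketched as a small helper (name invented; the real code builds model_kwargs inside get_model()):

```python
import os

def select_device_map(local_rank):
    # Under multi-GPU DDP (WORLD_SIZE > 1), pin the whole model to this
    # process's GPU; otherwise let "auto" placement spread the model.
    if int(os.environ.get("WORLD_SIZE", "1")) > 1:
        return {"": local_rank}
    return "auto"
```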
2026-02-15 (v0.0.37)
Feature: vLLM Support for GRPO Training
GRPO training can now use vLLM for faster generation of completions. vLLM provides optimized inference with PagedAttention, significantly speeding up the generation phase of GRPO training.
New parameters:
| Parameter | CLI Flag | Default | Description |
|---|---|---|---|
| use_vllm | --use-vllm | False | Enable vLLM for generation |
| vllm_mode | --vllm-mode | colocate | Mode: colocate (same GPU) or server (separate) |
| vllm_gpu_memory_utilization | --vllm-gpu-memory-utilization | 0.3 | GPU memory fraction for vLLM (colocate mode) |
Usage:
```shell
aitraining llm --train --trainer grpo \
  --model deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
  --rl-env-module my_envs.hotel_env \
  --rl-env-class HotelEnv \
  --use-vllm \
  --vllm-gpu-memory-utilization 0.3
```
Install:
```shell
pip install "aitraining[vllm]"  # requires vllm>=0.14.0
```
Dependency Updates
- pydantic: ==2.10.4 → >=2.12.5
- fastapi: ==0.115.6 → >=0.129.0
2026-02-14 (v0.0.36)
Fix: torch_dtype Not Set for PEFT Models on CUDA
Issue: When using --peft with --mixed-precision bf16/fp16 on CUDA, the PEFT code path in get_model() didn’t set torch_dtype, causing the model to load in float32 (2x VRAM).
Fix: Added torch_dtype to model_kwargs in the PEFT branch, matching the existing non-PEFT behavior.
Fix: data_path No Longer Required for GRPO
Issue: The CLI required --data-path for all trainers, but GRPO builds its dataset from the environment’s build_dataset() method — no external data file is needed.
Fix: Skip data_path validation when --trainer grpo is used.
2026-02-14 (v0.0.35)
Feature: GRPO Trainer — Group Relative Policy Optimization with Custom Environments
Train language models using GRPO with your own reward environments. Instead of a pre-trained reward model (like PPO), you provide a Python module with an environment class that runs multi-turn episodes and returns scores 0-1. GRPO generates multiple completions per prompt, scores them via your environment, and optimizes the policy relative to the group.
New trainer: --trainer grpo
New parameters:
- --rl-env-module — Python module path for the environment (e.g., my_envs.hotel_env)
- --rl-env-class — Class name in the environment module (e.g., HotelEnv)
- --rl-num-generations — Number of completions per prompt (default: 4)
Shared RL parameters (--rl-kl-coef, --rl-clip-range, --rl-env-config, --rl-max-new-tokens, --rl-top-k, --rl-top-p, --rl-temperature) now work with both PPO and GRPO trainers.
Environment interface (user implements):
```python
class MyEnv:
    def build_dataset(self, tokenizer) -> Dataset:
        """Return HF Dataset with 'prompt' column."""

    def score_episode(self, model, tokenizer, completion, case_idx) -> float:
        """Run multi-turn episode, return 0.0-1.0 score."""

    def get_tools(self) -> list[dict]:
        """Return tool schemas for generation (optional)."""
```
Usage:
```shell
aitraining llm --train --trainer grpo \
  --model deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
  --rl-env-module my_envs.hotel_env \
  --rl-env-class HotelEnv \
  --rl-num-generations 4 \
  --rl-max-new-tokens 256
```
Other changes:
- TRL dependency updated from >=0.26.0 to >=0.28.0 (required for GRPOTrainer)
- Validation: GRPO trainer requires both --rl-env-module and --rl-env-class
See GRPO Training for full documentation.
2026-01-25 (v0.0.34)
Feature: Reasoning Content Support (DeepSeek/Jan Thinking)
Training data with reasoning_content fields (used by DeepSeek, Jan, and other reasoning models) is now fully supported.
What it does:
- Adds reasoning_content field to the Message dataclass
- Passes reasoning content through to apply_chat_template, allowing templates to render <think> tags
- Detects templates that intentionally filter out reasoning content (DeepSeek, Jan) and bypasses the filter using placeholders, so thinking traces are preserved in training data
Why it matters: Models like DeepSeek-R1 produce chain-of-thought inside <think> tags. Without this, those thinking traces would be silently dropped during template application, losing valuable reasoning data.
Supported patterns: last_query_index, loop.index0 >, split('</think>')
Commit: 58b69bc
2026-01-25 (v0.0.33)
Bug Fix
- Fix reasoning_content serialization in Conversation from_dict/to_dict
2026-01-25 (v0.0.30)
Users with externally pre-formatted data can now benefit from response-only training without needing to set chat_template.
New behavior:
- apply_chat_template=false now properly skips template application
- Pre-formatted data (auto-detected via template tokens like <start_of_turn>) gets completion_mask automatically
- Enables response-only loss even when using externally processed datasets
Use case: You have data already formatted with chat templates from another pipeline, but want AITraining’s label masking for SFT.
Commit: 8bb4b06
2026-01-20 (v0.0.29)
Bug Fixes
- Fix response template newline pattern detection
- Fix double completion_mask processing
- Fix text column selection after preprocessing
2026-01-12 (v0.0.26)
Bug Fixes
- Fix tool_calls content duplication in training data
- Fix tokenizer settings and turn marker validation
- Fix pre-tokenization for TRL 0.26 compatibility
- Fix completion_mask generation during preprocessing
2026-01-11 (v0.0.25)
Feature: Response-Only Training (SFT Label Masking)
Major change for proper SFT behavior. Models now see the full conversation context in attention but only compute loss on assistant responses. This is the expected behavior for supervised fine-tuning and post-training.
Why this matters:
- SFT/Post-training: Train the model to generate good responses given context. The model should attend to user messages and system prompts but only be trained to predict assistant outputs.
- Pre-training: Different goal - maximize generalization and memorization across all tokens.
How it works with TRL 0.26:
- Full attention mask: Model sees entire conversation (system + user + assistant)
- Label masking: Loss computed only on assistant/completion tokens
- Result: Model learns response patterns without memorizing prompts
New parameter: --response-only-loss (default: true)
Supported models: Gemma, Qwen, Llama, Phi, Mistral (auto-detects response templates)
Commit: 87a87c1
2026-01-10 (v0.0.24)
Change: Tool calls are now serialized in full OpenAI format instead of the simplified format. This matches the format used in system prompt instructions for better model learning.
Before (v0.0.23):
```json
{"tool": "get_weather", "arguments": {"location": "Paris"}}
```
After (v0.0.24):
```json
{"content": "Let me check the weather.", "tool_calls": [{"id": "call_001", "type": "function", "function": {"name": "get_weather", "arguments": "{\"location\": \"Paris\"}"}}]}
```
Commit: 3f6bc15
2026-01-10 (v0.0.23)
Change: Removed the [Tool Call] prefix from serialized tool calls. Tool calls are now output as plain JSON for cleaner training data.
Before:
```
[Tool Call] {"tool": "get_weather", "arguments": {"location": "Paris"}}
```
After:
```
{"tool": "get_weather", "arguments": {"location": "Paris"}}
```
Also removed: The format instruction footer from tool definitions injection (models learn the format from examples).
Commit: cde1948
2026-01-10 (v0.0.22)
New: Models that don’t natively support the tools parameter (like Gemma) can now train on function calling data with tool definitions.
How it works:
- Detects if tokenizer supports tools parameter natively
- If not supported, injects tool definitions as formatted text into the system prompt (or first user message)
- Models learn to understand and respond to tool definitions
Functions added:
- check_tools_support() - Detects native tools parameter support
- format_tools_as_text() - Formats tool definitions as readable text
- inject_tools_into_messages() - Injects tools into system/user message
Example injection:
```
You have access to the following tools:

1. get_weather
   Description: Get current weather for a location
   Parameters:
   - location (string, required): City name
   - units (string, optional): celsius or fahrenheit
```
2026-01-08 (v0.0.21)
Fix: SFTTrainer Using Wrong Column After Chat Template Processing
Issue: When chat template processing converted messages to text, SFTTrainer was still trying to use the original messages column. This caused tokenization errors because it tried to tokenize a list instead of the processed string.
Fix: Now correctly sets dataset_text_field='text' when chat template is applied.
Commit: c2bdf05
2026-01-08 (v0.0.20)
Fix: Double BOS Token Issue
Issue: When training with pre-processed datasets or using chat templates, models would get duplicate BOS tokens (e.g., <bos><bos> or <|begin_of_text|><|begin_of_text|>). This happened because the chat template added BOS, and then the tokenizer added another one during training.
Fix: BOS tokens are now stripped from rendered text before saving to processed datasets. This allows the tokenizer to add BOS correctly during training, preventing duplicates. Works universally for all tokenizers:
- Gemma: <bos>
- Llama 3: <|begin_of_text|>
- Llama 2/Mistral: <s>
Commit: b124223
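The stripping step itself is simple; a minimal sketch (helper name invented), which works for any tokenizer with a defined bos_token:

```python
def strip_leading_bos(text, bos_token):
    # Drop one leading BOS from rendered text so the tokenizer can add
    # exactly one at training time, avoiding <bos><bos> duplication.
    if bos_token and text.startswith(bos_token):
        return text[len(bos_token):]
    return text
```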
Issue: When loading datasets that were previously processed with chat templates, Llama 3 (which lacks the add_bos_token attribute) would always get double BOS tokens.
Fix: BOS tokens are now stripped directly from text data when loading already-formatted datasets. This works for any tokenizer with a bos_token defined.
Commit: 24a3af9
Feature: Preserve Original Messages Column
Issue: Processing overwrote the original messages column, making it impossible to inspect the source data. Other tools could also auto-detect and incorrectly use the unprocessed column.
Fix: Processing now:
- Creates a text column with formatted output
- Renames original columns to _original_* prefix (e.g., _original_messages)
- Prevents auto-detection conflicts with other frameworks
Commit: f73a7e3, bb146bb
Feature: Processed Dataset Saving and Model Card Improvements
New: Processed training data is now automatically saved:
- Locally to {project}/data_processed/
- Optionally to Hub as a private dataset
- New CLI param: --save-processed-data (auto|local|hub|both|none)
Model Card Improvements:
- Training details table (base model, trainer, dataset, epochs, LR, etc.)
- Extra params section (LoRA rank/alpha, quantization, chat template)
- Updated links to AITraining GitHub repo
Commit: 299b873
Issue: Tool calls were serialized using the raw OpenAI format with nested "function" key, making training data verbose and format-specific. Additionally, the older OpenAI "function" role (used for tool responses before the "tool" role existed) was not handled.
Fix:
- Tool calls are now serialized to a clean format:
  - Before: [Tool Calls] [{"id": "call_123", "type": "function", "function": {"name": "search", "arguments": "..."}}]
  - After: [Tool Call] {"tool": "search", "arguments": {"query": "weather"}}
- The "function" role (older OpenAI format) is now handled the same as the "tool" role — converted to "user" with a [Tool Result] prefix for models that don't support it natively.
Example:
```python
# Input with OpenAI format
{
    "role": "assistant",
    "tool_calls": [{"id": "call_123", "type": "function", "function": {"name": "search", "arguments": "{\"q\": \"test\"}"}}]
}

# Output (clean format)
{
    "role": "assistant",
    "content": "[Tool Call] {\"tool\": \"search\", \"arguments\": {\"q\": \"test\"}}"
}
```
Commit: 5bbbdd8
Issue: The v0.0.18 fix for tool_calls was incomplete - several code paths still dropped tool_calls:
- render_conversation() in the message renderer blindly serialized without checking tokenizer support
- Fallback functions in project.py and preprocessor/llm.py dropped tool_calls
- format_chat_prompt() and build_supervised_example() in rendering utils dropped tool_calls
Fix:
- Added _check_tool_calls_support() to TokenizerNativeRenderer to detect native support
- render_conversation() now:
  - Passes tool_calls through natively for models that support it (Qwen, Llama 3.1+)
  - Only serializes to JSON for models that don't (Gemma)
- All code paths now preserve tool_calls when creating Message objects
- Fallback functions preserve tool_calls in content
Pattern: All main code paths now check tokenizer support before converting. This matches the existing pattern for tool role detection.
2026-01-07
Issue: When training data contains tool_calls field (from function calling conversations), the field was silently dropped. Models never learned to make tool calls.
Root Cause: The Message class only extracted role and content from messages:
```python
Message(role=m["role"], content=m["content"])  # tool_calls ignored!
```
Fix: Added smart tool_calls handling that:
- Detects if the tokenizer supports tool_calls natively (Qwen, Llama 3.1+)
- Preserves native format for models that support it
- Serializes to JSON in content for models that don’t (Gemma, older models)
Example for models without native support:
```python
# Input with tool_calls
{
    "role": "assistant",
    "content": "Let me check.",
    "tool_calls": [{"function": {"name": "weather", "arguments": "{\"city\": \"Paris\"}"}}]
}

# Output (auto-serialized for Gemma)
{
    "role": "assistant",
    "content": "Let me check.\n[Tool Call] {\"tool\": \"weather\", \"arguments\": {\"city\": \"Paris\"}}"
}
```
Note: At inference, parse the [Tool Call] JSON, execute the tool, and don’t show the JSON to the user.
Fix: Message Alternation Errors with Strict Models
Issue: Training data with consecutive same-role messages or system → assistant patterns (without a user message in between) failed on strict-alternation models like Gemma:
```
Conversation roles must alternate user/assistant/user/assistant/...
```
Root Cause: Some datasets have:
- Consecutive assistant messages (e.g., multi-part responses)
- System message followed directly by assistant (no user prompt)
- Multiple user messages in a row
Fix: Added automatic message alternation fix that:
- Merges consecutive same-role messages (preserving content)
- Inserts placeholder [Continued] user messages when an assistant message follows system/assistant
- Only applies when the tokenizer rejects the format (dynamic detection)
Example transformation:
```python
# Input with consecutive assistants
[
    {"role": "system", "content": "You are helpful"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "assistant", "content": "How can I help?"}
]

# Output (auto-fixed)
[
    {"role": "system", "content": "You are helpful"},
    {"role": "user", "content": "[Continued]"},
    {"role": "assistant", "content": "Hello!\nHow can I help?"}
]
```
Note: This fix combines with the tool role fix below - both are applied automatically as needed.
Issue: When training data contains tool role messages (from function calling), models that require strict user/assistant alternation (like Gemma) would fail with:
```
Conversation roles must alternate user/assistant/user/assistant/...
```
Root Cause: The TokenizerNativeRenderer passed messages directly to tokenizer.apply_chat_template() without preprocessing. Tokenizers like Gemma don’t support the tool role.
Fix: Added smart tool role handling that:
- Detects if the tokenizer supports the tool role by testing with a sample message (result is cached)
- Only converts tool → user with a [Tool Result] prefix when the tokenizer doesn't support it
- Preserves native tool handling for models that support it (Llama 3.1+, Mistral, etc.)
- Merges consecutive same-role messages to maintain strict alternation when needed
Example transformation (only for non-supporting models like Gemma):
```python
# Input with tool role
[
    {"role": "user", "content": "What's 2+2?"},
    {"role": "assistant", "content": "Let me calculate"},
    {"role": "tool", "content": "4"},
    {"role": "assistant", "content": "The answer is 4"}
]

# Output for Gemma (auto-converted)
[
    {"role": "user", "content": "What's 2+2?"},
    {"role": "assistant", "content": "Let me calculate"},
    {"role": "user", "content": "[Tool Result] 4"},
    {"role": "assistant", "content": "The answer is 4"}
]

# Output for Llama 3.1+ (preserved as-is)
# Same as input - native tool support used
```
Affected models: Gemma 2, Gemma 3, Gemma 3n, and any model with strict alternation requirements. Models with native tool support are unaffected.
Issue: When using --chat-template tokenizer (the default for SFT training), the system incorrectly used ChatML format instead of the model’s native chat template. This caused ChatML tokens (<|im_start|>, <|im_end|>) to be added as literal text in training data.
Impact: Models trained with this bug learned to output ChatML tokens as regular text. For example, a Gemma model would output:
```
Response text<|im_end|><end_of_turn>
```
Instead of just:
```
Response text<end_of_turn>
```
Root Cause: In clm/utils.py, the chat format mapping had:
```python
"tokenizer": "chatml",  # BUG - should be "native"
```
This caused ChatMLRenderer to be used (which adds ChatML tokens via string concatenation) instead of TokenizerNativeRenderer (which correctly uses tokenizer.apply_chat_template()).
Fix: Changed the mapping to:
```python
"tokenizer": "native",  # Use tokenizer's native apply_chat_template
```
Affected models: Any non-ChatML model trained with --chat-template tokenizer or the SFT trainer default.
Retraining required: Models trained before this fix that exhibit ChatML token output need to be retrained.
Fix: HuggingFace Push Using Full Path as Repo Name (All Trainers)
Issue: When project_name was a full path like /workspace/trainings/my-model, pushing to HuggingFace Hub created an invalid repo ID like username//workspace/trainings/my-model.
Fix: Now uses basename(project_name) to extract just the folder name, creating valid repo IDs like username/my-model.
Affected trainers (all fixed):
- CLM (LLM fine-tuning)
- VLM (Vision-Language Models)
- Text Classification
- Text Regression
- Token Classification
- Sentence Transformers
- Image Classification
- Image Regression
- Object Detection
- Seq2Seq
- Extractive QA
- Tabular
Feature: --repo-id Parameter for Custom HuggingFace Destination
Added --repo-id CLI parameter to specify a custom HuggingFace repository destination. Useful for:
- Pushing to an organization instead of your personal account
- Using a different repo name than your local
project_name
Usage:
```shell
# Push to organization
aitraining llm --train \
  --push-to-hub \
  --repo-id my-organization/my-model \
  --token $HF_TOKEN

# Push with custom name
aitraining llm --train \
  --push-to-hub \
  --repo-id username/production-model \
  --token $HF_TOKEN
```
When --repo-id is set, --username is not required since the repo ID already specifies the destination.
Feature: Post-Trial Actions for Hyperparameter Sweeps
Added ability to execute custom actions after each sweep trial completes.
CLI Usage:
```shell
aitraining llm --train \
  --use-sweep \
  --post-trial-script 'if [ "$TRIAL_IS_BEST" = "true" ]; then git add . && git commit -m "Best model"; fi'
```
Environment Variables Available:
- TRIAL_NUMBER - Trial index (0-based)
- TRIAL_METRIC_VALUE - Metric value for this trial
- TRIAL_IS_BEST - Whether this is the best trial so far (true/false)
- TRIAL_OUTPUT_DIR - Output directory for the trial
- TRIAL_PARAMS - Trial parameters as string
Python API:
```python
from autotrain.utils import HyperparameterSweep, SweepConfig, TrialInfo

def on_trial_complete(trial_info: TrialInfo):
    if trial_info.is_best:
        save_checkpoint(trial_info.output_dir)

config = SweepConfig(
    parameters={"lr": (1e-5, 1e-3, "log_uniform")},
    post_trial_callback=on_trial_complete,
)
```
2026-01-06
Feature: --wandb-run-id Parameter for Run Resumption
Added --wandb-run-id CLI parameter to resume an existing W&B run instead of creating a new one. Useful when running AITraining from external W&B sweep agents.
Usage:
```shell
autotrain llm --wandb-run-id abc123xyz ...
```
When set, AITraining automatically sets WANDB_RESUME=allow so the trainer resumes the specified run instead of creating a duplicate.
Fix: Duplicate W&B Runs in Sweeps
Issue: Each sweep trial was creating 2 W&B runs - one from the sweep code and one from the trainer.
Root Cause: Sweep code called wandb.init(), then trainer also called wandb.init() internally, creating a duplicate run.
Fix: After sweep’s wandb.init(), set WANDB_RUN_ID and WANDB_RESUME=allow env vars so the trainer resumes the same run instead of creating a new one.
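The handoff described above amounts to exporting two environment variables between the two wandb.init() calls; a minimal sketch with an invented helper name:

```python
import os

def attach_trainer_to_sweep_run(run_id):
    # After the sweep's own wandb.init(), export these so the trainer's
    # internal wandb.init() resumes the same run instead of duplicating it.
    os.environ["WANDB_RUN_ID"] = run_id
    os.environ["WANDB_RESUME"] = "allow"
```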
Improvement: Better Error Message for Missing Text Column
When dataset has a messages column but training expects text, the error now suggests the fix:
```
Hint: Your dataset has a 'messages' column. Use --text-column messages for chat format data.
```
Fix: WANDB_PROJECT Using Path Instead of Name
Issue: Running sweeps with W&B logging failed with:
```
wandb.errors.UsageError: Invalid project name '/workspace/trainings/hotel-sft-optuna-v2': cannot contain characters '/,\\,#,?,%,:', found '/'
```
Root Cause: The fix in 0.0.10 for W&B sweep logging was using config.project_name (the output path) instead of just the project name when falling back.
Fix: Use os.path.basename(config.project_name) to extract just the project name from the path.
Fix: Model Loaded in float32 Instead of bf16/fp16 on CUDA
Issue: When using mixed_precision=bf16 or fp16 on CUDA, the model was loaded in float32, causing 2x VRAM usage.
Root Cause: The torch_dtype parameter wasn’t being passed to from_pretrained() in the CUDA code path. Only MPS had dtype conversion.
Impact:
- Model weights used 2x more VRAM than necessary
- Training still worked (trainer used bf16 for compute), but was suboptimal
Fix: Added torch_dtype to model_kwargs when CUDA is available:
if torch.cuda.is_available():
    model_kwargs["device_map"] = "auto"
    if config.mixed_precision == "bf16":
        model_kwargs["torch_dtype"] = torch.bfloat16
    elif config.mixed_precision == "fp16":
        model_kwargs["torch_dtype"] = torch.float16
Fix: W&B Sweep Logs to Wrong Project
Issue: During sweeps with W&B logging, trainer runs were logged to the default “huggingface” project instead of the configured sweep project.
Root Cause: The sweep created wandb.init() with the correct project, but the trainer’s internal wandb.init() didn’t know about it.
Fix: Set WANDB_PROJECT and WANDB_ENTITY environment variables before calling the trainer, so any subsequent wandb.init() uses the correct project.
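A sketch of the fix, using only the environment variables named above (values are illustrative):

```python
import os

# Export project/entity before launching the trainer so its internal
# wandb.init() logs to the configured sweep project instead of the
# default "huggingface" project.
os.environ["WANDB_PROJECT"] = "my-sweep-project"
os.environ["WANDB_ENTITY"] = "my-team"
```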
Fix: bitsandbytes CUDA 12.x Compatibility
Issue: Training with LoRA failed on CUDA 12.8 environments with:
CUDA SETUP: Required library version not found: libbitsandbytes_cuda128.so
RuntimeError: CUDA Setup failed despite GPU being available.
Root Cause: bitsandbytes 0.42.0 doesn’t have pre-compiled binaries for CUDA 12.8.
Fix: Upgraded bitsandbytes from ==0.42.0 to >=0.45.0. Version 0.45.0+ uses a new multi-backend system that doesn’t require version-specific CUDA binaries.
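For existing environments, the same upgrade can be applied directly (assuming a pip-managed install):

```shell
# 0.45.0+ uses a multi-backend build, so no CUDA-version-specific
# binary (e.g. libbitsandbytes_cuda128.so) is looked up at import time
pip install --upgrade "bitsandbytes>=0.45.0"
```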
Commit: f13a068
2026-01-05
Feature: W&B Native Sweep Integration
Added native Weights & Biases sweep support for hyperparameter optimization. When enabled, sweep runs are grouped in W&B’s native sweep dashboard, providing aggregated views and parallel coordinates plots.
New Parameters:
wandb_sweep: Enable W&B native sweep dashboard (default: false)
wandb_sweep_project: W&B project name for sweep (defaults to project_name)
wandb_sweep_entity: W&B entity (team/username) for sweep
wandb_sweep_id: Existing sweep ID to continue (skips creating new sweep)
Usage:
autotrain llm \
--use-sweep \
--sweep-backend optuna \
--wandb-sweep \
--wandb-sweep-project my-sweep-project \
--wandb-sweep-entity my-team
When wandb_sweep is enabled, each trial run is linked to the sweep via wandb.init(group=sweep_id), creating an aggregated view in W&B.
Commit: e49abc9
Fix: CLI Missing FIELD_SCOPES for W&B Sweep Parameters
Issue: Running autotrain llm --wandb-sweep via CLI failed with:
ValueError: Scope metadata is required for all fields but missing for: wandb_sweep, wandb_sweep_project, wandb_sweep_entity, wandb_sweep_id
Root Cause: The new W&B sweep parameters were added to LLMTrainingParams but not to FIELD_SCOPES in the CLI argument parser.
Fix: Added the missing fields to FIELD_SCOPES and added a test to prevent this regression.
Note: This only affected the CLI (autotrain llm ...). The Python API and TUI were not affected.
Commit: 7994989
Fix: sweep_params Accepts Both List and Dict Formats
Previously only the list format worked; both are now supported:
# List format (always worked)
sweep_params = json.dumps({
    "batch_size": [2, 4, 8],
})

# Dict format (now works)
sweep_params = json.dumps({
    "lr": {"type": "loguniform", "low": 1e-5, "high": 1e-3},
    "batch_size": {"type": "categorical", "values": [2, 4, 8]},
    "warmup_ratio": {"type": "uniform", "low": 0.0, "high": 0.2},
})
Supported dict types: categorical, loguniform, uniform, int.
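A hypothetical normalization helper (not AITraining's actual code) showing how the two accepted formats can be dispatched:

```python
import json

# A bare list is treated as categorical choices; a dict with a
# recognized "type" key is used as-is.
def normalize_sweep_param(spec):
    if isinstance(spec, list):
        return {"type": "categorical", "values": spec}
    if isinstance(spec, dict) and spec.get("type") in {"categorical", "loguniform", "uniform", "int"}:
        return spec
    raise ValueError(f"Unsupported sweep spec: {spec!r}")

raw = json.loads(json.dumps({
    "batch_size": [2, 4, 8],
    "lr": {"type": "loguniform", "low": 1e-5, "high": 1e-3},
}))
normalized = {name: normalize_sweep_param(spec) for name, spec in raw.items()}
```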
Commit: 15aa38a
Fix: Auto-detect model_max_length from Model Config
Previously model_max_length defaulted to 2048 regardless of model capability, causing block_size to be silently capped even when the model supports longer sequences.
The Problem:
- Gemma 3 supports 32K-128K context (depending on variant), but block_size was capped to 2048
- Users had to manually set --model-max-length to use longer sequences
The Fix:
- Auto-detect max_position_embeddings from the model config
- Handles VLMs (reads from text_config) and regular LLMs
- Falls back to 2048 with a warning if auto-detection fails
- Users can still override with --model-max-length
# Before: block_size silently capped to 2048
aitraining llm --model google/gemma-3-4b-it --block-size 4096
# block_size was capped to 2048!
# After: auto-detects model context length, allows 4096
aitraining llm --model google/gemma-3-4b-it --block-size 4096
# block_size is 4096 as expected
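A sketch of the detection logic described above, using a plain dict in place of the real transformers config object (field names follow the text; this is not AITraining's exact implementation):

```python
# VLM configs nest the language-model settings under "text_config";
# missing or empty values take the warn-and-fallback path.
def detect_model_max_length(config, fallback=2048):
    cfg = config.get("text_config") or config
    return cfg.get("max_position_embeddings") or fallback

llm_cfg = {"max_position_embeddings": 131072}
vlm_cfg = {"text_config": {"max_position_embeddings": 32768}}
```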
Commit: 85bd37c
Dependency Update: Gemma 3n Support
Updated dependencies to support Gemma 3n and other new models:
transformers: 4.57.1 → 4.57.3
timm: 1.0.12 → 1.0.22 (adds mobilenetv5_300m_enc for Gemma 3n vision tower)
huggingface_hub: ==0.34.4 → >=0.34.0 (flexible constraint)
This enables support for Gemma 3n and other new models released in late 2024/2025.
2025-12-02
Bug Fix: ORPO Training Beta Parameter Not Applied
Issue: The dpo_beta parameter was not being passed to TRL’s ORPOConfig during ORPO training, causing user-specified beta values to be silently ignored.
Impact: Users setting dpo_beta for ORPO training (e.g., dpo_beta=0.5) would have their setting ignored. ORPO would always use TRL’s default value of 0.1 regardless of user configuration.
Root Cause: In train_clm_orpo.py, the code was missing the line to pass the beta parameter to ORPOConfig:
# Before (bug):
training_args["max_length"] = config.block_size
training_args["max_prompt_length"] = config.max_prompt_length
training_args["max_completion_length"] = config.max_completion_length
args = ORPOConfig(**training_args) # beta not passed!
# After (fix):
training_args["max_length"] = config.block_size
training_args["max_prompt_length"] = config.max_prompt_length
training_args["max_completion_length"] = config.max_completion_length
training_args["beta"] = config.dpo_beta # Now correctly passed
args = ORPOConfig(**training_args)
Fix: Added training_args["beta"] = config.dpo_beta to ensure the user’s beta value is passed to ORPO training.
Test Added: New test test_orpo_beta_parameter verifies that different beta values (0.01, 0.1, 0.5) are correctly applied during ORPO training.
Commit: a37e288
For questions or issues, please open an issue on GitHub.