Changelog
Track all notable changes, bug fixes, and improvements to AITraining.
2026-03-14 (v0.0.53)
Feature: VLM Support for ORPO Trainer
ORPO training now supports vision-language models (e.g. Qwen3.5-VL-9B) with image+text preference data. A new VLMORPOTrainer subclass handles image processing via DataCollatorForVisionPreference, and an image_column parameter specifies which dataset column contains the images.
New parameter:
| Parameter | CLI Flag | Default | Description |
|---|---|---|---|
| image_column | --image-column | None | Image column for VLM preference training (ORPO/DPO) |
Usage:
```python
params = LLMTrainingParams(
    model="Qwen/Qwen3.5-VL-9B",
    trainer="orpo",
    image_column="images",
    text_column="chosen",
    rejected_text_column="rejected",
    prompt_text_column="prompt",
)
```
When image_column is set, the trainer automatically loads AutoProcessor, skips chat template pre-processing (handled by the data collator), and renames the image column to images for TRL compatibility.
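Rows in such a dataset look roughly like the sketch below. This is illustrative only: the column names match the params example above, but the paths and texts are invented, and the exact schema depends on your data.

```python
# Hypothetical preference row for VLM ORPO training. Column names follow
# the image_column/prompt_text_column/text_column/rejected_text_column
# settings shown above; all values are invented for illustration.
row = {
    "images": ["photos/lobby.jpg"],          # image_column="images"
    "prompt": "Describe this hotel lobby.",  # prompt_text_column="prompt"
    "chosen": "A bright lobby with marble floors.",  # preferred answer
    "rejected": "I cannot see images.",              # dispreferred answer
}
```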
2026-03-13 (v0.0.52)
Feature: Hub Repo Visibility Control + Fix: fp16 PEFT on T4 GPUs
New parameter: hub_private — Controls whether HF Hub repos are created as private or public. Previously all repos were hardcoded to private.
| Parameter | CLI Flag | Default | Description |
|---|---|---|---|
| hub_private | --hub-private / --no-hub-private | True | Whether HF Hub repos should be private |
fp16 PEFT fix: TRL’s SFTTrainer casts trainable adapter parameters to bf16 for quantized PEFT models. On T4-class GPUs (which lack bf16 support), this crashes GradScaler when fp16 mixed precision is requested. Trainable params are now re-cast to float16 before training starts when using PEFT + quantization + fp16.
2026-03-08 (v0.0.51)
Issue: When using ORPO or DPO with pre-formatted (already-templated) data, the input_ids pre-tokenization step produced empty arrays because ORPO/DPO datasets have chosen/rejected columns but no text column. This caused a crash in transformers.floating_point_ops() with AttributeError: 'list' object has no attribute 'numel'.
Fix: Pre-tokenization of input_ids is now skipped for ORPO and DPO trainers. These trainers use chosen/rejected columns directly and don’t need the input_ids signal that SFT trainers use.
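The guard amounts to a one-line predicate on the trainer name (the helper name here is hypothetical):

```python
def needs_input_ids_pretokenization(trainer):
    # ORPO/DPO datasets carry chosen/rejected columns and no text column,
    # so the input_ids pre-tokenization pass used by SFT is skipped.
    return trainer not in ("orpo", "dpo")
```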
2026-03-07 (v0.0.50)
Qwen3.5 tool_calls fix: Qwen3.5’s chat template uses arguments|items Jinja filter which requires arguments to be a dict. OpenAI-format training data stores arguments as a JSON string. safe_apply_chat_template now auto-parses JSON string arguments to dicts when the tokenizer supports tool_calls natively.
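The auto-parsing step can be sketched like this. It is a minimal illustration with an invented helper name; the real logic lives inside safe_apply_chat_template:

```python
import json

def normalize_tool_call_arguments(tool_call):
    # OpenAI-format data stores function arguments as a JSON string, but
    # templates using `arguments|items` (e.g. Qwen3.5) need a dict.
    fn = tool_call.get("function", {})
    args = fn.get("arguments")
    if isinstance(args, str):
        try:
            fn["arguments"] = json.loads(args)
        except json.JSONDecodeError:
            pass  # leave malformed strings untouched
    return tool_call
```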
Token leak defense:
- AutoTrainParams.__repr__() now masks token and wandb_token fields to prevent leaks in logs/tracebacks
- HF tokens are scrubbed from log files before upload_folder to prevent HF Hub secret scanning rejection
Dependency upgrades:
| Package | Old | New |
|---|---|---|
| trl | >=0.28.0 | >=0.29.0 |
| transformers | ==4.57.3 | >=5.3.0 |
| accelerate | ==1.11.0 | >=1.13.0 |
| peft | ==0.14.0 | >=0.18.1 |
| huggingface_hub | >=0.34.0 | >=1.6.0 |
| sentence-transformers | ==3.3.1 | >=5.2.3 |
Install:
```shell
pip install "aitraining>=0.0.50"
```
2026-03-02 (v0.0.49)
Fix: Qwen3.5 Support and 409 Repo Conflict Handling
Qwen3.5 tool_calls detection: _check_tool_calls_support now tries both string and dict formats for the arguments probe. Qwen3.5’s template uses arguments|items which only works with dicts, so the string-only probe returned a false negative.
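The two-format probe can be sketched as below. Here `render` stands in for an apply_chat_template call with a sample tool call, and the function name is invented:

```python
def tool_arguments_supported(render):
    # Probe with a JSON-string payload first (most templates), then a dict
    # payload (Qwen3.5's `arguments|items`); either succeeding counts as
    # native tool_calls support.
    for args in ('{"q": "test"}', {"q": "test"}):
        try:
            render(args)
            return True
        except Exception:
            continue
    return False
```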
409 repo conflict handling: UploadLogs callback now detects 409 Conflict from HF Hub when the repo already exists and creates a datetime-versioned repo (e.g., model-20260302-1820) instead of failing. The versioned repo_id is propagated to the config so the final push_to_hub uses the same repo.
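The versioned-repo fallback can be sketched as (helper name invented; the real code runs this only after catching a 409 from the Hub):

```python
from datetime import datetime

def versioned_repo_id(repo_id, now=None):
    # On a 409 Conflict, retry with a datetime-suffixed repo id so the
    # upload lands in a fresh repo instead of failing.
    now = now or datetime.now()
    return f"{repo_id}-{now.strftime('%Y%m%d-%H%M')}"
```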
Chat templates: Synced 42 templates from unsloth 2026.2.1 + added Qwen3.5 template from tokenizer (43 total). Added sync_chat_templates.py script for future updates.
Other: Added exist_ok=True to all HF Hub create_repo calls.
2026-03-01 (v0.0.47)
Fix: ORPOConfig Import for TRL 0.29
Issue: TRL 0.29 removed ORPOConfig from the top-level trl package, moving it to trl.experimental.orpo.
Fix: ORPO trainer now uses a try/except fallback:
```python
try:
    from trl import ORPOConfig, ORPOTrainer
except ImportError:
    from trl.experimental.orpo import ORPOConfig, ORPOTrainer
```
This supports both TRL 0.28 and 0.29+.
2026-02-24 (v0.0.46)
Breaking Change: Remove max_prompt_length from ORPO/DPO Configs
TRL 0.28.0 moved ORPOConfig to trl.experimental and removed max_prompt_length. DPO deprecated it too. Prompt length is now inferred from max_length - max_completion_length.
Action required: If you were passing --max-prompt-length to ORPO or DPO training, remove it. The parameter is no longer accepted. Set --block-size (max_length) and --max-completion-length instead.
2026-02-24 (v0.0.45)
Issue: ORPO and DPO prompt extraction always derived the prompt from chosen[:-1] (all messages except the last). This breaks multi-turn preference data where the completion spans multiple turns.
Fix: When an explicit prompt column is present and contains a messages list, it is now used directly instead of deriving from chosen[:-1]. Single-turn data without a prompt column continues to work as before.
Example multi-turn data:
```json
{
  "prompt": [
    {"role": "user", "content": "Book me a hotel"},
    {"role": "assistant", "content": "Sure, let me search."}
  ],
  "chosen": [
    {"role": "user", "content": "Book me a hotel"},
    {"role": "assistant", "content": "Sure, let me search."},
    {"role": "user", "content": "In Paris please"},
    {"role": "assistant", "content": "Done, booked Hotel Lumiere."}
  ],
  "rejected": [
    {"role": "user", "content": "Book me a hotel"},
    {"role": "assistant", "content": "Sure, let me search."},
    {"role": "user", "content": "In Paris please"},
    {"role": "assistant", "content": "I cannot do that."}
  ]
}
```
2026-02-19 (v0.0.44)
Feature: GRPO Loss Type and Truncation Masking
GRPO training now supports multiple loss types beyond the default grpo. This enables recent RL loss variants from the literature.
New parameters:
| Parameter | CLI Flag | Default | Description |
|---|---|---|---|
| rl_loss_type | --rl-loss-type | grpo | Loss type: grpo, dr_grpo, dapo, bnpo, cispo, sapo |
| rl_mask_truncated_completions | --rl-mask-truncated-completions | False | Mask truncated completions from loss (recommended for stability) |
Usage:
```shell
aitraining llm --train --trainer grpo \
  --model deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
  --rl-env-module my_envs.hotel_env \
  --rl-env-class HotelEnv \
  --rl-loss-type dr_grpo \
  --rl-mask-truncated-completions
```
2026-02-19 (v0.0.43)
Fix: DDP Timeout via NCCL Environment Variables
Issue: v0.0.42 attempted to pass --timeout to accelerate launch, but this flag does not exist in Accelerate.
Fix: Removed the non-existent --timeout flag. Instead, the ddp_timeout value is now applied via:
- NCCL_TIMEOUT environment variable — set before subprocess launch, read by PyTorch at process group initialization
- Direct ProcessGroupNCCL.options._timeout patch after trainer init (GRPO only) — overrides the per-operation timeout for long-running reward scoring
2026-02-19 (v0.0.42)
Fix: DDP Timeout Not Reaching dist.init_process_group
Issue: The ddp_timeout parameter set TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC but this only controls the heartbeat watchdog. The actual dist.init_process_group timeout (used during collective operations) was not being set.
Fix: Pass --timeout to accelerate launch for multi-GPU DDP and DeepSpeed, so Accelerate sets the correct timedelta for process group initialization.
This fix was superseded by v0.0.43 — the --timeout flag does not actually exist in Accelerate. See v0.0.43 for the correct approach.
2026-02-18 (v0.0.41)
Fix: vllm_server_url Key Name for TRL 0.28.0
Issue: The vllm_server_url parameter was being passed to GRPOConfig with the key name vllm_server_url, but TRL 0.28.0 renamed it to vllm_server_base_url.
Fix: Map config.vllm_server_url to training_args["vllm_server_base_url"] when constructing GRPOConfig.
Note: The CLI flag remains --vllm-server-url — only the internal mapping to TRL was fixed.
2026-02-18 (v0.0.40)
Feature: Resume Training from Checkpoint
All trainers now support resuming training from a checkpoint. This is useful when training is interrupted or when you want to continue training from a specific point.
New parameter:
| Parameter | CLI Flag | Default | Description |
|---|---|---|---|
| resume_from_checkpoint | --resume-from-checkpoint | None | Path to checkpoint directory, or auto to detect the latest |
Usage:
```shell
# Resume from a specific checkpoint
aitraining llm --train \
  --model deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
  --trainer sft \
  --resume-from-checkpoint ./my-model/checkpoint-500

# Auto-detect latest checkpoint
aitraining llm --train \
  --model deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
  --trainer sft \
  --resume-from-checkpoint auto
```
When set to auto, true, or latest, the system scans the output directory for checkpoint-* folders, sorts them numerically, and resumes from the most recent one. If no checkpoints are found, training starts fresh with a warning.
Available for all trainers: SFT, DPO, ORPO, PPO, GRPO, Reward, Distillation, and Default.
2026-02-16 (v0.0.39)
Feature: DDP Timeout Configuration
Long-running operations (e.g., GRPO reward scoring with multi-turn episodes) could cause NCCL timeouts in multi-GPU setups. A new ddp_timeout parameter controls both the DDP timeout in training args and the TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC environment variable.
New parameter:
| Parameter | CLI Flag | Default | Description |
|---|---|---|---|
| ddp_timeout | --ddp-timeout | 7200 | DDP/NCCL timeout in seconds |
Available for all trainers.
Feature: vLLM Server Mode for GRPO
In addition to colocate mode (vLLM shares GPU with training), GRPO now supports server mode — vLLM runs as a separate server on dedicated GPUs, and training processes are automatically reduced to account for the reserved GPUs.
New parameters:
| Parameter | CLI Flag | Default | Description |
|---|---|---|---|
| vllm_server_url | --vllm-server-url | None | URL of external vLLM server (e.g., http://localhost:8000/v1) |
| vllm_tensor_parallel_size | --vllm-tensor-parallel-size | 1 | Number of GPUs for vLLM tensor parallelism |
| vllm_server_gpus | --vllm-server-gpus | 1 | GPUs dedicated to vLLM server (subtracted from training processes) |
Usage:
```shell
# 8 GPUs: 6 for training, 2 for vLLM server
aitraining llm --train --trainer grpo \
  --model deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
  --rl-env-module my_envs.hotel_env \
  --rl-env-class HotelEnv \
  --use-vllm \
  --vllm-mode server \
  --vllm-server-gpus 2 \
  --vllm-tensor-parallel-size 2
```
2026-02-15 (v0.0.38)
Fix: device_map for Multi-GPU DDP Training
Issue: When using DDP (Distributed Data Parallel) with multiple GPUs, device_map="auto" caused conflicts — the model was spread across GPUs by auto, but DDP expects each process to own a single GPU.
Fix: Now detects multi-GPU DDP via WORLD_SIZE environment variable. When WORLD_SIZE > 1, sets device_map={"": local_rank} to place the full model on the correct GPU for each process. Single-GPU still uses device_map="auto".
Affected: Both PEFT and non-PEFT code paths in get_model().
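The selection logic can be sketched as a small helper (name invented; the real code builds model_kwargs inside get_model()):

```python
import os

def select_device_map(local_rank):
    # Under multi-GPU DDP (WORLD_SIZE > 1), pin the whole model to this
    # process's GPU; otherwise let "auto" placement spread the model.
    if int(os.environ.get("WORLD_SIZE", "1")) > 1:
        return {"": local_rank}
    return "auto"
```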
2026-02-15 (v0.0.37)
Feature: vLLM Support for GRPO Training
GRPO training can now use vLLM for faster generation of completions. vLLM provides optimized inference with PagedAttention, significantly speeding up the generation phase of GRPO training.
New parameters:
| Parameter | CLI Flag | Default | Description |
|---|---|---|---|
| use_vllm | --use-vllm | False | Enable vLLM for generation |
| vllm_mode | --vllm-mode | colocate | Mode: colocate (same GPU) or server (separate) |
| vllm_gpu_memory_utilization | --vllm-gpu-memory-utilization | 0.3 | GPU memory fraction for vLLM (colocate mode) |
Usage:
```shell
aitraining llm --train --trainer grpo \
  --model deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
  --rl-env-module my_envs.hotel_env \
  --rl-env-class HotelEnv \
  --use-vllm \
  --vllm-gpu-memory-utilization 0.3
```
Install:
```shell
pip install "aitraining[vllm]"  # requires vllm>=0.14.0
```
Dependency Updates
- pydantic: ==2.10.4 → >=2.12.5
- fastapi: ==0.115.6 → >=0.129.0
2026-02-14 (v0.0.36)
Fix: torch_dtype Not Set for PEFT Models on CUDA
Issue: When using --peft with --mixed-precision bf16/fp16 on CUDA, the PEFT code path in get_model() didn’t set torch_dtype, causing the model to load in float32 (2x VRAM).
Fix: Added torch_dtype to model_kwargs in the PEFT branch, matching the existing non-PEFT behavior.
Fix: data_path No Longer Required for GRPO
Issue: The CLI required --data-path for all trainers, but GRPO builds its dataset from the environment’s build_dataset() method — no external data file is needed.
Fix: Skip data_path validation when --trainer grpo is used.
2026-02-14 (v0.0.35)
Feature: GRPO Trainer — Group Relative Policy Optimization with Custom Environments
Train language models using GRPO with your own reward environments. Instead of a pre-trained reward model (like PPO), you provide a Python module with an environment class that runs multi-turn episodes and returns scores 0-1. GRPO generates multiple completions per prompt, scores them via your environment, and optimizes the policy relative to the group.
New trainer: --trainer grpo
New parameters:
- --rl-env-module — Python module path for the environment (e.g., my_envs.hotel_env)
- --rl-env-class — Class name in the environment module (e.g., HotelEnv)
- --rl-num-generations — Number of completions per prompt (default: 4)
Shared RL parameters (--rl-kl-coef, --rl-clip-range, --rl-env-config, --rl-max-new-tokens, --rl-top-k, --rl-top-p, --rl-temperature) now work with both PPO and GRPO trainers.
Environment interface (user implements):
```python
class MyEnv:
    def build_dataset(self, tokenizer) -> Dataset:
        """Return HF Dataset with 'prompt' column."""

    def score_episode(self, model, tokenizer, completion, case_idx) -> float:
        """Run multi-turn episode, return 0.0-1.0 score."""

    def get_tools(self) -> list[dict]:
        """Return tool schemas for generation (optional)."""
```
Usage:
```shell
aitraining llm --train --trainer grpo \
  --model deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
  --rl-env-module my_envs.hotel_env \
  --rl-env-class HotelEnv \
  --rl-num-generations 4 \
  --rl-max-new-tokens 256
```
Other changes:
- TRL dependency updated from >=0.26.0 to >=0.28.0 (required for GRPOTrainer)
- Validation: GRPO trainer requires both --rl-env-module and --rl-env-class
See GRPO Training for full documentation.
2026-01-25 (v0.0.34)
Feature: Reasoning Content Support (DeepSeek/Jan Thinking)
Training data with reasoning_content fields (used by DeepSeek, Jan, and other reasoning models) is now fully supported.
What it does:
- Adds reasoning_content field to the Message dataclass
- Passes reasoning content through to apply_chat_template, allowing templates to render <think> tags
- Detects templates that intentionally filter out reasoning content (DeepSeek, Jan) and bypasses the filter using placeholders, so thinking traces are preserved in training data
Why it matters: Models like DeepSeek-R1 produce chain-of-thought inside <think> tags. Without this, those thinking traces would be silently dropped during template application, losing valuable reasoning data.
Supported patterns: last_query_index, loop.index0 >, split('</think>')
Commit: 58b69bc
2026-01-25 (v0.0.33)
Bug Fix
- Fix reasoning_content serialization in Conversation from_dict/to_dict
2026-01-25 (v0.0.30)
Users with externally pre-formatted data can now benefit from response-only training without needing to set chat_template.
New behavior:
- apply_chat_template=false now properly skips template application
- Pre-formatted data (auto-detected via template tokens like <start_of_turn>) gets completion_mask automatically
- Enables response-only loss even when using externally processed datasets
Use case: You have data already formatted with chat templates from another pipeline, but want AITraining’s label masking for SFT.
Commit: 8bb4b06
2026-01-20 (v0.0.29)
Bug Fixes
- Fix response template newline pattern detection
- Fix double completion_mask processing
- Fix text column selection after preprocessing
2026-01-12 (v0.0.26)
Bug Fixes
- Fix tool_calls content duplication in training data
- Fix tokenizer settings and turn marker validation
- Fix pre-tokenization for TRL 0.26 compatibility
- Fix completion_mask generation during preprocessing
2026-01-11 (v0.0.25)
Feature: Response-Only Training (SFT Label Masking)
Major change for proper SFT behavior. Models now see the full conversation context in attention but only compute loss on assistant responses. This is the expected behavior for supervised fine-tuning and post-training.
Why this matters:
- SFT/Post-training: Train the model to generate good responses given context. The model should attend to user messages and system prompts but only be trained to predict assistant outputs.
- Pre-training: Different goal - maximize generalization and memorization across all tokens.
How it works with TRL 0.26:
- Full attention mask: Model sees entire conversation (system + user + assistant)
- Label masking: Loss computed only on assistant/completion tokens
- Result: Model learns response patterns without memorizing prompts
New parameter: --response-only-loss (default: true)
Supported models: Gemma, Qwen, Llama, Phi, Mistral (auto-detects response templates)
Commit: 87a87c1
2026-01-10 (v0.0.24)
Change: Tool calls are now serialized in full OpenAI format instead of the simplified format. This matches the format used in system prompt instructions for better model learning.
Before (v0.0.23):
```json
{"tool": "get_weather", "arguments": {"location": "Paris"}}
```
After (v0.0.24):
```json
{"content": "Let me check the weather.", "tool_calls": [{"id": "call_001", "type": "function", "function": {"name": "get_weather", "arguments": "{\"location\": \"Paris\"}"}}]}
```
Commit: 3f6bc15
2026-01-10 (v0.0.23)
Change: Removed the [Tool Call] prefix from serialized tool calls. Tool calls are now output as plain JSON for cleaner training data.
Before:
```
[Tool Call] {"tool": "get_weather", "arguments": {"location": "Paris"}}
```
After:
```
{"tool": "get_weather", "arguments": {"location": "Paris"}}
```
Also removed: The format instruction footer from tool definitions injection (models learn the format from examples).
Commit: cde1948
2026-01-10 (v0.0.22)
New: Models that don’t natively support the tools parameter (like Gemma) can now train on function calling data with tool definitions.
How it works:
- Detects if tokenizer supports tools parameter natively
- If not supported, injects tool definitions as formatted text into the system prompt (or first user message)
- Models learn to understand and respond to tool definitions
Functions added:
- check_tools_support() - Detects native tools parameter support
- format_tools_as_text() - Formats tool definitions as readable text
- inject_tools_into_messages() - Injects tools into system/user message
Example injection:
```
You have access to the following tools:

1. get_weather
   Description: Get current weather for a location
   Parameters:
   - location (string, required): City name
   - units (string, optional): celsius or fahrenheit
```
2026-01-08 (v0.0.21)
Fix: SFTTrainer Using Wrong Column After Chat Template Processing
Issue: When chat template processing converted messages to text, SFTTrainer was still trying to use the original messages column. This caused tokenization errors because it tried to tokenize a list instead of the processed string.
Fix: Now correctly sets dataset_text_field='text' when chat template is applied.
Commit: c2bdf05
2026-01-08 (v0.0.20)
Fix: Double BOS Token Issue
Issue: When training with pre-processed datasets or using chat templates, models would get duplicate BOS tokens (e.g., <bos><bos> or <|begin_of_text|><|begin_of_text|>). This happened because the chat template added BOS, and then the tokenizer added another one during training.
Fix: BOS tokens are now stripped from rendered text before saving to processed datasets. This allows the tokenizer to add BOS correctly during training, preventing duplicates. Works universally for all tokenizers:
- Gemma: <bos>
- Llama 3: <|begin_of_text|>
- Llama 2/Mistral: <s>
Commit: b124223
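The stripping step itself is simple; a minimal sketch (helper name invented), which works for any tokenizer with a defined bos_token:

```python
def strip_leading_bos(text, bos_token):
    # Drop one leading BOS from rendered text so the tokenizer can add
    # exactly one at training time, avoiding <bos><bos> duplication.
    if bos_token and text.startswith(bos_token):
        return text[len(bos_token):]
    return text
```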
Issue: When loading datasets that were previously processed with chat templates, Llama 3 (which lacks the add_bos_token attribute) would always get double BOS tokens.
Fix: BOS tokens are now stripped directly from text data when loading already-formatted datasets. This works for any tokenizer with a bos_token defined.
Commit: 24a3af9
Feature: Preserve Original Messages Column
Issue: Processing overwrote the original messages column, making it impossible to inspect the source data. Other tools could also auto-detect and incorrectly use the unprocessed column.
Fix: Processing now:
- Creates a text column with formatted output
- Renames original columns to _original_* prefix (e.g., _original_messages)
- Prevents auto-detection conflicts with other frameworks
Commit: f73a7e3, bb146bb
Feature: Processed Dataset Saving and Model Card Improvements
New: Processed training data is now automatically saved:
- Locally to {project}/data_processed/
- Optionally to Hub as a private dataset
- New CLI param: --save-processed-data (auto|local|hub|both|none)
Model Card Improvements:
- Training details table (base model, trainer, dataset, epochs, LR, etc.)
- Extra params section (LoRA rank/alpha, quantization, chat template)
- Updated links to AITraining GitHub repo
Commit: 299b873
Issue: Tool calls were serialized using the raw OpenAI format with nested "function" key, making training data verbose and format-specific. Additionally, the older OpenAI "function" role (used for tool responses before the "tool" role existed) was not handled.
Fix:
- Tool calls are now serialized to a clean format:
  - Before: [Tool Calls] [{"id": "call_123", "type": "function", "function": {"name": "search", "arguments": "..."}}]
  - After: [Tool Call] {"tool": "search", "arguments": {"query": "weather"}}
- The "function" role (older OpenAI format) is now handled the same as the "tool" role — converted to "user" with a [Tool Result] prefix for models that don't support it natively.
Example:
```python
# Input with OpenAI format
{
    "role": "assistant",
    "tool_calls": [{"id": "call_123", "type": "function", "function": {"name": "search", "arguments": "{\"q\": \"test\"}"}}]
}

# Output (clean format)
{
    "role": "assistant",
    "content": "[Tool Call] {\"tool\": \"search\", \"arguments\": {\"q\": \"test\"}}"
}
```
Commit: 5bbbdd8
Issue: The v0.0.18 fix for tool_calls was incomplete - several code paths still dropped tool_calls:
- render_conversation() in the message renderer blindly serialized without checking tokenizer support
- Fallback functions in project.py and preprocessor/llm.py dropped tool_calls
- format_chat_prompt() and build_supervised_example() in rendering utils dropped tool_calls
Fix:
- Added _check_tool_calls_support() to TokenizerNativeRenderer to detect native support
- render_conversation() now:
  - Passes tool_calls through natively for models that support it (Qwen, Llama 3.1+)
  - Only serializes to JSON for models that don't (Gemma)
- All code paths now preserve tool_calls when creating Message objects
- Fallback functions preserve tool_calls in content
Pattern: All main code paths now check tokenizer support before converting. This matches the existing pattern for tool role detection.
2026-01-07
Issue: When training data contains tool_calls field (from function calling conversations), the field was silently dropped. Models never learned to make tool calls.
Root Cause: The Message class only extracted role and content from messages:
```python
Message(role=m["role"], content=m["content"])  # tool_calls ignored!
```
Fix: Added smart tool_calls handling that:
- Detects if the tokenizer supports tool_calls natively (Qwen, Llama 3.1+)
- Preserves native format for models that support it
- Serializes to JSON in content for models that don’t (Gemma, older models)
Example for models without native support:
```python
# Input with tool_calls
{
    "role": "assistant",
    "content": "Let me check.",
    "tool_calls": [{"function": {"name": "weather", "arguments": "{\"city\": \"Paris\"}"}}]
}

# Output (auto-serialized for Gemma)
{
    "role": "assistant",
    "content": "Let me check.\n[Tool Call] {\"tool\": \"weather\", \"arguments\": {\"city\": \"Paris\"}}"
}
```
Note: At inference, parse the [Tool Call] JSON, execute the tool, and don’t show the JSON to the user.
Fix: Message Alternation Errors with Strict Models
Issue: Training data with consecutive same-role messages or system → assistant patterns (without a user message in between) failed on strict-alternation models like Gemma:
```
Conversation roles must alternate user/assistant/user/assistant/...
```
Root Cause: Some datasets have:
- Consecutive assistant messages (e.g., multi-part responses)
- System message followed directly by assistant (no user prompt)
- Multiple user messages in a row
Fix: Added automatic message alternation fix that:
- Merges consecutive same-role messages (preserving content)
- Inserts placeholder [Continued] user messages when an assistant message follows system/assistant
- Only applies when the tokenizer rejects the format (dynamic detection)
Example transformation:
```python
# Input with consecutive assistants
[
    {"role": "system", "content": "You are helpful"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "assistant", "content": "How can I help?"}
]

# Output (auto-fixed)
[
    {"role": "system", "content": "You are helpful"},
    {"role": "user", "content": "[Continued]"},
    {"role": "assistant", "content": "Hello!\nHow can I help?"}
]
```
Note: This fix combines with the tool role fix below - both are applied automatically as needed.
Issue: When training data contains tool role messages (from function calling), models that require strict user/assistant alternation (like Gemma) would fail with:
```
Conversation roles must alternate user/assistant/user/assistant/...
```
Root Cause: The TokenizerNativeRenderer passed messages directly to tokenizer.apply_chat_template() without preprocessing. Tokenizers like Gemma don’t support the tool role.
Fix: Added smart tool role handling that:
- Detects if the tokenizer supports the tool role by testing with a sample message (result is cached)
- Only converts tool → user with a [Tool Result] prefix when the tokenizer doesn't support it
- Preserves native tool handling for models that support it (Llama 3.1+, Mistral, etc.)
- Merges consecutive same-role messages to maintain strict alternation when needed
Example transformation (only for non-supporting models like Gemma):
```python
# Input with tool role
[
    {"role": "user", "content": "What's 2+2?"},
    {"role": "assistant", "content": "Let me calculate"},
    {"role": "tool", "content": "4"},
    {"role": "assistant", "content": "The answer is 4"}
]

# Output for Gemma (auto-converted)
[
    {"role": "user", "content": "What's 2+2?"},
    {"role": "assistant", "content": "Let me calculate"},
    {"role": "user", "content": "[Tool Result] 4"},
    {"role": "assistant", "content": "The answer is 4"}
]

# Output for Llama 3.1+ (preserved as-is)
# Same as input - native tool support used
```
Affected models: Gemma 2, Gemma 3, Gemma 3n, and any model with strict alternation requirements. Models with native tool support are unaffected.
Issue: When using --chat-template tokenizer (the default for SFT training), the system incorrectly used ChatML format instead of the model’s native chat template. This caused ChatML tokens (<|im_start|>, <|im_end|>) to be added as literal text in training data.
Impact: Models trained with this bug learned to output ChatML tokens as regular text. For example, a Gemma model would output:
```
Response text<|im_end|><end_of_turn>
```
Instead of just:
```
Response text<end_of_turn>
```
Root Cause: In clm/utils.py, the chat format mapping had:
```python
"tokenizer": "chatml",  # BUG - should be "native"
```
This caused ChatMLRenderer to be used (which adds ChatML tokens via string concatenation) instead of TokenizerNativeRenderer (which correctly uses tokenizer.apply_chat_template()).
Fix: Changed the mapping to:
```python
"tokenizer": "native",  # Use tokenizer's native apply_chat_template
```
Affected models: Any non-ChatML model trained with --chat-template tokenizer or the SFT trainer default.
Retraining required: Models trained before this fix that exhibit ChatML token output need to be retrained.
Fix: HuggingFace Push Using Full Path as Repo Name (All Trainers)
Issue: When project_name was a full path like /workspace/trainings/my-model, pushing to HuggingFace Hub created an invalid repo ID like username//workspace/trainings/my-model.
Fix: Now uses basename(project_name) to extract just the folder name, creating valid repo IDs like username/my-model.
Affected trainers (all fixed):
- CLM (LLM fine-tuning)
- VLM (Vision-Language Models)
- Text Classification
- Text Regression
- Token Classification
- Sentence Transformers
- Image Classification
- Image Regression
- Object Detection
- Seq2Seq
- Extractive QA
- Tabular
Feature: --repo-id Parameter for Custom HuggingFace Destination
Added --repo-id CLI parameter to specify a custom HuggingFace repository destination. Useful for:
- Pushing to an organization instead of your personal account
- Using a different repo name than your local
project_name
Usage:
```shell
# Push to organization
aitraining llm --train \
  --push-to-hub \
  --repo-id my-organization/my-model \
  --token $HF_TOKEN

# Push with custom name
aitraining llm --train \
  --push-to-hub \
  --repo-id username/production-model \
  --token $HF_TOKEN
```
When --repo-id is set, --username is not required since the repo ID already specifies the destination.
Feature: Post-Trial Actions for Hyperparameter Sweeps
Added ability to execute custom actions after each sweep trial completes.
CLI Usage:
```shell
aitraining llm --train \
  --use-sweep \
  --post-trial-script 'if [ "$TRIAL_IS_BEST" = "true" ]; then git add . && git commit -m "Best model"; fi'
```
Environment Variables Available:
- TRIAL_NUMBER - Trial index (0-based)
- TRIAL_METRIC_VALUE - Metric value for this trial
- TRIAL_IS_BEST - Whether this is the best trial so far (true/false)
- TRIAL_OUTPUT_DIR - Output directory for the trial
- TRIAL_PARAMS - Trial parameters as string
Python API:
```python
from autotrain.utils import HyperparameterSweep, SweepConfig, TrialInfo

def on_trial_complete(trial_info: TrialInfo):
    if trial_info.is_best:
        save_checkpoint(trial_info.output_dir)

config = SweepConfig(
    parameters={"lr": (1e-5, 1e-3, "log_uniform")},
    post_trial_callback=on_trial_complete,
)
```
2026-01-06
Feature: --wandb-run-id Parameter for Run Resumption
Added --wandb-run-id CLI parameter to resume an existing W&B run instead of creating a new one. Useful when running AITraining from external W&B sweep agents.
Usage:
```shell
autotrain llm --wandb-run-id abc123xyz ...
```
When set, AITraining automatically sets WANDB_RESUME=allow so the trainer resumes the specified run instead of creating a duplicate.
Fix: Duplicate W&B Runs in Sweeps
Issue: Each sweep trial was creating 2 W&B runs - one from the sweep code and one from the trainer.
Root Cause: Sweep code called wandb.init(), then trainer also called wandb.init() internally, creating a duplicate run.
Fix: After sweep’s wandb.init(), set WANDB_RUN_ID and WANDB_RESUME=allow env vars so the trainer resumes the same run instead of creating a new one.
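The handoff described above amounts to exporting two environment variables between the two wandb.init() calls; a minimal sketch with an invented helper name:

```python
import os

def attach_trainer_to_sweep_run(run_id):
    # After the sweep's own wandb.init(), export these so the trainer's
    # internal wandb.init() resumes the same run instead of duplicating it.
    os.environ["WANDB_RUN_ID"] = run_id
    os.environ["WANDB_RESUME"] = "allow"
```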
Improvement: Better Error Message for Missing Text Column
When dataset has a messages column but training expects text, the error now suggests the fix:
```
Hint: Your dataset has a 'messages' column. Use --text-column messages for chat format data.
```
Fix: WANDB_PROJECT Using Path Instead of Name
Issue: Running sweeps with W&B logging failed with:
```
wandb.errors.UsageError: Invalid project name '/workspace/trainings/hotel-sft-optuna-v2': cannot contain characters '/,\\,#,?,%,:', found '/'
```
Root Cause: The fix in 0.0.10 for W&B sweep logging was using config.project_name (the output path) instead of just the project name when falling back.
Fix: Use os.path.basename(config.project_name) to extract just the project name from the path.
Fix: Model Loaded in float32 Instead of bf16/fp16 on CUDA
Issue: When using mixed_precision=bf16 or fp16 on CUDA, the model was loaded in float32, causing 2x VRAM usage.
Root Cause: The torch_dtype parameter wasn’t being passed to from_pretrained() in the CUDA code path. Only MPS had dtype conversion.
Impact:
- Model weights used 2x more VRAM than necessary
- Training still worked (trainer used bf16 for compute), but was suboptimal
Fix: Added torch_dtype to model_kwargs when CUDA is available:
if torch.cuda.is_available():
    model_kwargs["device_map"] = "auto"
    if config.mixed_precision == "bf16":
        model_kwargs["torch_dtype"] = torch.bfloat16
    elif config.mixed_precision == "fp16":
        model_kwargs["torch_dtype"] = torch.float16
Fix: W&B Sweep Logs to Wrong Project
Issue: During sweeps with W&B logging, trainer runs were logged to the default “huggingface” project instead of the configured sweep project.
Root Cause: The sweep created wandb.init() with the correct project, but the trainer’s internal wandb.init() didn’t know about it.
Fix: Set WANDB_PROJECT and WANDB_ENTITY environment variables before calling the trainer, so any subsequent wandb.init() uses the correct project.
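A sketch of the fix, using only the environment variables named above (values are illustrative):

```python
import os

# Export project/entity before launching the trainer so its internal
# wandb.init() logs to the configured sweep project instead of the
# default "huggingface" project.
os.environ["WANDB_PROJECT"] = "my-sweep-project"
os.environ["WANDB_ENTITY"] = "my-team"
```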
Fix: bitsandbytes CUDA 12.x Compatibility
Issue: Training with LoRA failed on CUDA 12.8 environments with:
CUDA SETUP: Required library version not found: libbitsandbytes_cuda128.so
RuntimeError: CUDA Setup failed despite GPU being available.
Root Cause: bitsandbytes 0.42.0 doesn’t have pre-compiled binaries for CUDA 12.8.
Fix: Upgraded bitsandbytes from ==0.42.0 to >=0.45.0. Version 0.45.0+ uses a new multi-backend system that doesn’t require version-specific CUDA binaries.
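For existing environments, the same upgrade can be applied directly (assuming a pip-managed install):

```shell
# 0.45.0+ uses a multi-backend build, so no CUDA-version-specific
# binary (e.g. libbitsandbytes_cuda128.so) is looked up at import time
pip install --upgrade "bitsandbytes>=0.45.0"
```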
Commit: f13a068
2026-01-05
Feature: W&B Native Sweep Integration
Added native Weights & Biases sweep support for hyperparameter optimization. When enabled, sweep runs are grouped in W&B’s native sweep dashboard, providing aggregated views and parallel coordinates plots.
New Parameters:
wandb_sweep: Enable W&B native sweep dashboard (default: false)
wandb_sweep_project: W&B project name for sweep (defaults to project_name)
wandb_sweep_entity: W&B entity (team/username) for sweep
wandb_sweep_id: Existing sweep ID to continue (skips creating new sweep)
Usage:
autotrain llm \
--use-sweep \
--sweep-backend optuna \
--wandb-sweep \
--wandb-sweep-project my-sweep-project \
--wandb-sweep-entity my-team
When wandb_sweep is enabled, each trial run is linked to the sweep via wandb.init(group=sweep_id), creating an aggregated view in W&B.
Commit: e49abc9
Fix: CLI Missing FIELD_SCOPES for W&B Sweep Parameters
Issue: Running autotrain llm --wandb-sweep via CLI failed with:
ValueError: Scope metadata is required for all fields but missing for: wandb_sweep, wandb_sweep_project, wandb_sweep_entity, wandb_sweep_id
Root Cause: The new W&B sweep parameters were added to LLMTrainingParams but not to FIELD_SCOPES in the CLI argument parser.
Fix: Added the missing fields to FIELD_SCOPES and added a test to prevent this regression.
Note: This only affected the CLI (autotrain llm ...). The Python API and TUI were not affected.
Commit: 7994989
Fix: sweep_params Accepts Both List and Dict Formats
Previously only the list format worked; both are now supported:
# List format (always worked)
sweep_params = json.dumps({
    "batch_size": [2, 4, 8],
})

# Dict format (now works)
sweep_params = json.dumps({
    "lr": {"type": "loguniform", "low": 1e-5, "high": 1e-3},
    "batch_size": {"type": "categorical", "values": [2, 4, 8]},
    "warmup_ratio": {"type": "uniform", "low": 0.0, "high": 0.2},
})
Supported dict types: categorical, loguniform, uniform, int.
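A hypothetical normalization helper (not AITraining's actual code) showing how the two accepted formats can be dispatched:

```python
import json

# A bare list is treated as categorical choices; a dict with a
# recognized "type" key is used as-is.
def normalize_sweep_param(spec):
    if isinstance(spec, list):
        return {"type": "categorical", "values": spec}
    if isinstance(spec, dict) and spec.get("type") in {"categorical", "loguniform", "uniform", "int"}:
        return spec
    raise ValueError(f"Unsupported sweep spec: {spec!r}")

raw = json.loads(json.dumps({
    "batch_size": [2, 4, 8],
    "lr": {"type": "loguniform", "low": 1e-5, "high": 1e-3},
}))
normalized = {name: normalize_sweep_param(spec) for name, spec in raw.items()}
```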
Commit: 15aa38a
Fix: Auto-detect model_max_length from Model Config
Previously model_max_length defaulted to 2048 regardless of model capability, causing block_size to be silently capped even when the model supports longer sequences.
The Problem:
- Gemma 3 supports 32K-128K context (depending on variant), but block_size was capped to 2048
- Users had to manually set --model-max-length to use longer sequences
The Fix:
- Auto-detect max_position_embeddings from the model config
- Handles VLMs (reads from text_config) and regular LLMs
- Falls back to 2048 with a warning if auto-detection fails
- Users can still override with --model-max-length
# Before: block_size silently capped to 2048
aitraining llm --model google/gemma-3-4b-it --block-size 4096
# block_size was capped to 2048!
# After: auto-detects model context length, allows 4096
aitraining llm --model google/gemma-3-4b-it --block-size 4096
# block_size is 4096 as expected
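A sketch of the detection logic described above, using a plain dict in place of the real transformers config object (field names follow the text; this is not AITraining's exact implementation):

```python
# VLM configs nest the language-model settings under "text_config";
# missing or empty values take the warn-and-fallback path.
def detect_model_max_length(config, fallback=2048):
    cfg = config.get("text_config") or config
    return cfg.get("max_position_embeddings") or fallback

llm_cfg = {"max_position_embeddings": 131072}
vlm_cfg = {"text_config": {"max_position_embeddings": 32768}}
```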
Commit: 85bd37c
Dependency Update: Gemma 3n Support
Updated dependencies to support Gemma 3n and other new models:
transformers: 4.57.1 → 4.57.3
timm: 1.0.12 → 1.0.22 (adds mobilenetv5_300m_enc for Gemma 3n vision tower)
huggingface_hub: ==0.34.4 → >=0.34.0 (flexible constraint)
This enables support for Gemma 3n and other new models released in late 2024/2025.
2025-12-02
Bug Fix: ORPO Training Beta Parameter Not Applied
Issue: The dpo_beta parameter was not being passed to TRL’s ORPOConfig during ORPO training, causing user-specified beta values to be silently ignored.
Impact: Users setting dpo_beta for ORPO training (e.g., dpo_beta=0.5) would have their setting ignored. ORPO would always use TRL’s default value of 0.1 regardless of user configuration.
Root Cause: In train_clm_orpo.py, the code was missing the line to pass the beta parameter to ORPOConfig:
# Before (bug):
training_args["max_length"] = config.block_size
training_args["max_prompt_length"] = config.max_prompt_length
training_args["max_completion_length"] = config.max_completion_length
args = ORPOConfig(**training_args) # beta not passed!
# After (fix):
training_args["max_length"] = config.block_size
training_args["max_prompt_length"] = config.max_prompt_length
training_args["max_completion_length"] = config.max_completion_length
training_args["beta"] = config.dpo_beta # Now correctly passed
args = ORPOConfig(**training_args)
Fix: Added training_args["beta"] = config.dpo_beta to ensure the user’s beta value is passed to ORPO training.
Test Added: New test test_orpo_beta_parameter verifies that different beta values (0.01, 0.1, 0.5) are correctly applied during ORPO training.
Commit: a37e288
For questions or issues, please open an issue on GitHub.