[Technical Trigger]
Supervised fine-tuning (SFT) and reinforcement learning (RL) on single-host TPUs arrive with the maxtext[tpu-post-train]==0.2.1 package, which bundles the Tunix library for optimized execution. The train_sft and train_rl modules launch SFT and RL runs, respectively.
[Developer / Implementation Hook]
Developers launch SFT by pointing the train_sft module (python3 -m maxtext.trainers.post_train.sft.train_sft) at a model, a dataset, and an output directory. For RL, the train_rl module handles loading the policy and reference models, running training, and automatically evaluating the result on reasoning benchmarks.
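As a sketch, a single-host SFT launch might look like the following. The flag names below (--model_name, --dataset_path, --output_dir) are illustrative assumptions, not the module's documented interface; run the module with --help to see its actual arguments.

```shell
# Hypothetical single-host SFT launch. Every flag here is an assumed
# placeholder -- consult the module's --help output for real arguments.
python3 -m maxtext.trainers.post_train.sft.train_sft \
    --model_name=<base-model> \
    --dataset_path=<sft-dataset> \
    --output_dir=<checkpoint-dir>
```

An RL run would be launched the same way via the train_rl module, with the policy and reference models resolved by the module itself as described above.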
[The Structural Shift]
This update marks a shift from pre-training to post-training as the primary method for transforming base models into specialized assistants or high-performing reasoning engines.
[Early Warning — Act Before Mainstream]
To act on this change, developers can:
1. Install the package with uv pip install 'maxtext[tpu-post-train]==0.2.1' --resolution=lowest.
2. Use the train_sft module to launch SFT runs, supplying a model, dataset, and output directory.
3. Run RL training with the train_rl module, which supports state-of-the-art algorithms such as GRPO (Group Relative Policy Optimization) and GSPO (Group Sequence Policy Optimization).
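Step 1 above as a shell command; note the quotes around the requirement, which keep the shell from interpreting the brackets in the extras specifier (zsh in particular treats unquoted brackets as a glob pattern):

```shell
# Quote the requirement so [tpu-post-train] is passed to uv verbatim
# rather than being expanded as a shell glob.
uv pip install 'maxtext[tpu-post-train]==0.2.1' --resolution=lowest
```

The --resolution=lowest flag pins each dependency to the lowest version satisfying the constraints, which makes the install reproducible against the package's declared minimums.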