[Technical Trigger]

Supervised fine-tuning (SFT) and reinforcement learning (RL) on single-host TPUs arrive via the maxtext[tpu-post-train]==0.2.1 package, which bundles the Tunix library for optimized execution. The train_sft and train_rl modules launch SFT and RL runs, respectively.

[Developer / Implementation Hook]

Developers launch SFT by pointing the train_sft module at a model, dataset, and output directory via the python3 -m maxtext.trainers.post_train.sft.train_sft command. For RL, the train_rl module handles policy and reference model loading, training execution, and automated evaluation on reasoning benchmarks.
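Under the hood, an SFT run minimizes next-token cross-entropy on the target completions while masking out prompt tokens. A minimal illustrative sketch of that loss in plain Python (not MaxText's or Tunix's actual implementation; the function name and inputs here are hypothetical):

```python
def sft_loss(token_logprobs, loss_mask):
    """Masked next-token cross-entropy: the mean negative log-probability
    over completion tokens only; prompt tokens (mask == 0) are ignored."""
    total = sum(-lp for lp, m in zip(token_logprobs, loss_mask) if m)
    count = sum(loss_mask)
    return total / max(count, 1)

# Toy example: 2 prompt tokens (masked) followed by 3 completion tokens.
logprobs = [-0.1, -0.2, -0.5, -1.0, -0.25]
mask = [0, 0, 1, 1, 1]
print(round(sft_loss(logprobs, mask), 4))
```

The mask is what distinguishes SFT on chat data from plain language-model training: the model is graded only on the assistant's response, not on reproducing the prompt.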

[The Structural Shift]

This update marks a structural shift: post-training, rather than pre-training, becomes the primary method for transforming base models into specialized assistants or high-performing reasoning engines.

[Early Warning — Act Before Mainstream]

To act on this change, developers can:

1. Install the package: uv pip install "maxtext[tpu-post-train]==0.2.1" --resolution=lowest (quoting the extras so the shell does not expand the brackets).
2. Launch SFT runs with the train_sft module, supplying a model, dataset, and output directory.
3. Execute RL training with the train_rl module, which supports state-of-the-art algorithms like GRPO and GSPO.
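The distinguishing idea in GRPO (and its sequence-level variant GSPO) is the group-relative advantage: each sampled completion's reward is normalized against the mean and standard deviation of its sampling group, eliminating the separate value/critic model used by PPO. A self-contained sketch of that normalization step, as an illustration of the algorithm rather than Tunix's implementation (the function name is hypothetical):

```python
def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: standardize each completion's reward
    against its own group's statistics (no learned critic needed)."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    # eps guards against division by zero when all rewards are equal.
    return [(r - mean) / (std + eps) for r in rewards]

# One prompt, four sampled completions scored by a reward function.
rewards = [1.0, 0.0, 0.5, 0.5]
print([round(a, 3) for a in grpo_advantages(rewards)])
```

These advantages then weight a clipped policy-gradient update against the reference model, which is why train_rl loads both a policy and a reference checkpoint.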