Slime-RL-Training: A reinforcement learning framework for LLM post-training, developed by THUDM at Tsinghua University, supporting GLM/Qwen3/DeepSeek/Llama | SkillsMD