Search projects

Search for "grpo" found 2 results

SWIFT (Scalable Lightweight Infrastructure for Fine-Tuning)

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-V4, GLM-5.1, InternLM3, Llama4, ...) and 300

⭐⭐⭐☆☆ (3/5) 5764
deepseek-r1 embedding grpo

Agent Reinforcement Trainer

Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job traini

⭐⭐⭐☆☆ (3/5) 3504
agent agentic-ai grpo