Name: agents-meet-rl
Author: thinkwee

When LLM Agents Meet Reinforcement Learning

AgentsMeetRL is an awesome list that summarizes open-source repositories for training LLM Agents using reinforcement learning:

🤖 A project counts as an agent project if it involves at least one of the following: multi-turn interaction or tool use. Tool-Integrated Reasoning (TIR) projects are therefore included in this repo.
⚠️ The entries are based on analysis of open-source repositories' code by LLM coding agents, so some descriptions may be inaccurate. Although the entries were manually reviewed, errors and omissions may remain. If you find any, please let us know through issues or PRs - we warmly welcome them!
🚀 We focus in particular on the reinforcement learning frameworks, RL algorithms, rewards, and environments that each project depends on, so readers can see the technical choices these open-source projects make. See [Click to view technical details] under each table.
📅 Last updated: 2026-06-20
🤗 Feel free to submit your own projects anytime - we welcome contributions!
📚 If you find this repository helpful for your research, please cite it via the "Cite this repository" button on the right sidebar.

Taxonomy:

Base Framework: General-purpose RL training frameworks for LLM agents (e.g., veRL, OpenRLHF, trl)
General/MultiTask: Agent systems trained/evaluated across multiple tasks or environments
Search & RAG: Search-augmented reasoning agents that use retrieval tools to enhance LLM reasoning
Web & GUI: Agents that interact with web browsers, mobile/desktop GUIs, or operating systems
Tool-Use: Agents trained to invoke external tools (APIs, code executors, MCP, etc.)
Code & SWE: Software engineering and code generation agents
Reasoning: Reasoning agents with tool-integrated or multi-turn reasoning (math, QA, visual)
Multi-Agent RL: Multi-agent collaboration, negotiation, or credit assignment via RL
Memory: Agents that learn to manage, retrieve, or evolve memory
Embodied: Agents operating in embodied/physical simulation environments
Domain-Specific: RL agents for specialized domains (medical, OS tuning, etc.)
Reward & Training: Process/outcome reward models and training methodologies for agents
Safety: RL for agent safety alignment, adversarial red-teaming, and jailbreak defense/attack
VLM Agent: Vision-language model agents trained with RL for multimodal interaction
Self-Evolution: Agents that self-evolve via RL feedback loops (⚠️ definition still evolving in the community)
Environment: Benchmarks, gyms, and sandbox environments for agent training/evaluation

Enumerations:

Reward Type:
- External Verifier: e.g., a compiler or math solver
- Rule-Based: e.g., a LaTeX parser with exact match scoring
- Model-Based: e.g., a trained verifier LLM or reward LLM
- Custom
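
The reward types above can be made concrete with a small sketch. The code below is illustrative only and is not taken from any project in the list: the names `RewardType`, `extract_boxed`, and `rule_based_reward` are hypothetical, and it assumes the model writes its final answer inside `\boxed{...}`, which is a common convention but not something this list prescribes. It shows one possible Rule-Based reward that scores by exact match after whitespace normalization.

```python
import re
from enum import Enum
from typing import Optional


class RewardType(Enum):
    """The four reward categories used in this list (hypothetical encoding)."""
    EXTERNAL_VERIFIER = "External Verifier"
    RULE_BASED = "Rule-Based"
    MODEL_BASED = "Model-Based"
    CUSTOM = "Custom"


def extract_boxed(text: str) -> Optional[str]:
    r"""Return the content of the last \boxed{...} in text, or None.

    Assumption: the final answer is wrapped in \boxed{...}.
    Nested braces such as \frac{1}{2} are handled by counting depth.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
    return None  # braces never closed


def rule_based_reward(response: str, reference: str) -> float:
    """Return 1.0 if the boxed answer exactly matches the reference, else 0.0."""
    answer = extract_boxed(response)
    if answer is None:
        return 0.0

    def normalize(s: str) -> str:
        # Ignore whitespace differences only; no symbolic equivalence check.
        return re.sub(r"\s+", "", s)

    return 1.0 if normalize(answer) == normalize(reference) else 0.0


if __name__ == "__main__":
    response = r"Adding the halves gives \boxed{\frac{1}{2}}."
    print(RewardType.RULE_BASED.value, rule_based_reward(response, r"\frac{1}{2}"))
```

Exact match is deliberately strict: it would score `0.5` against `\frac{1}{2}` as wrong. Handling such equivalences is where an External Verifier (for example a math solver) or a Model-Based reward would typically take over.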
