Title: Counterfactual Exploration for Improving Multiagent Learning
Publication Type: Conference Paper
Year of Publication: 2015
Authors: Colby M., Kharaghani S., HolmesParker C., Tumer K.
Conference Name: Proceedings of the Fourteenth International Joint Conference on Autonomous Agents and Multiagent Systems
Date Published: 5/2015
Keywords: Multiagent Systems
Abstract

In any single agent system, exploration is a critical component of learning. It ensures that all possible actions receive some degree of attention, allowing an agent to converge to good policies. The same concept has been adopted by multiagent learning systems. However, there is a fundamentally different dynamic at play in multiagent learning: each agent operates in a non-stationary environment, as a direct result of the evolving policies of other agents in the system. As such, exploratory actions taken by agents bias the policies of other agents, forcing them to optimize their policies in the presence of other agents' exploration. CLEAN rewards address this issue by privatizing exploration (agents take their best action, but internally compute rewards for counterfactual actions). However, CLEAN rewards require each agent to know the mathematical form of the system evaluation function, which is typically unavailable to agents. In this paper, we present an algorithm to approximate CLEAN rewards, eliminating exploratory action noise without the need for expert system knowledge. Results in both coordination and congestion domains demonstrate that the approximated CLEAN rewards obtain up to 95% of the performance of directly computed CLEAN rewards, without the need for expert domain knowledge, while using 99% less information about the system.
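The abstract does not include the algorithm itself, but the mechanism it describes (agents act greedily in the real system while privately evaluating counterfactual exploratory actions against the system evaluation) can be illustrated with a small sketch. Everything below is an assumption made for illustration: the congestion-style toy domain, the epsilon-style private exploration, the stateless Q-update, and the closed-form evaluation G are not taken from the paper, and in the approximated-CLEAN setting G would be replaced by a learned estimate G_hat whose construction is not specified in the abstract.

# Illustrative sketch only: CLEAN-style counterfactual exploration in a toy
# congestion game. The domain, reward form, and learning rule are assumptions.
import random
from collections import Counter

N_AGENTS, N_ACTIONS, EPISODES = 30, 5, 3000
CAPACITY = N_AGENTS / N_ACTIONS           # ideal attendance per action
ALPHA, EPSILON = 0.1, 0.1                 # learning rate, private exploration rate

def G(joint_action):
    """System evaluation: value attendance up to capacity, penalize congestion.
    In the approximated-CLEAN setting this closed form is unknown to agents and
    would be replaced by a learned model G_hat."""
    counts = Counter(joint_action)
    return sum(min(c, CAPACITY) - max(0.0, c - CAPACITY) for c in counts.values())

# Stateless Q-values: Q[i][a] estimates agent i's value for action a.
Q = [[0.0] * N_ACTIONS for _ in range(N_AGENTS)]

for episode in range(EPISODES):
    # 1. Every agent takes its greedy action in the real system, so no
    #    exploratory action noise is injected into other agents' environments.
    greedy = [max(range(N_ACTIONS), key=lambda a: Q[i][a]) for i in range(N_AGENTS)]
    g_actual = G(greedy)

    for i in range(N_AGENTS):
        # 2. Privately sample a counterfactual (possibly exploratory) action.
        c = random.randrange(N_ACTIONS) if random.random() < EPSILON else greedy[i]
        # 3. CLEAN-style reward: effect of swapping in agent i's counterfactual
        #    action with everyone else held fixed. With only an approximation
        #    available, G would be replaced by G_hat here.
        counterfactual = greedy[:i] + [c] + greedy[i + 1:]
        clean_reward = G(counterfactual) - g_actual
        # 4. Update the value of the counterfactual action, not the taken one.
        Q[i][c] += ALPHA * (clean_reward - Q[i][c])

final = [max(range(N_ACTIONS), key=lambda a: Q[i][a]) for i in range(N_AGENTS)]
print("final system evaluation G:", G(final))

Because only greedy actions ever reach the environment, the non-stationarity other agents observe comes solely from genuine policy changes rather than from exploration, which is the property the CLEAN construction described in the abstract is meant to preserve.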