Title: Counterfactual Exploration for Improving Multiagent Learning
Publication Type: Conference Paper
Year of Publication: 2015
Authors: Colby M., Kharaghani S., HolmesParker C., Tumer K.
Conference Name: Proceedings of the Fourteenth International Joint Conference on Autonomous Agents and Multiagent Systems
Date Published: 5/2015
Keywords: Multiagent Systems
Abstract

In any single agent system, exploration is a critical component of learning. It ensures that all possible actions receive some degree of attention, allowing an agent to converge to good policies. The same concept has been adopted by multiagent learning systems. However, there is a fundamentally different dynamic at play in multiagent learning: each agent operates in a non-stationary environment, as a direct result of the evolving policies of other agents in the system. As such, exploratory actions taken by agents bias the policies of other agents, forcing them to optimize their policies in the presence of other agents' exploration. CLEAN rewards address this issue by privatizing exploration (agents take their best action, but internally compute rewards for counterfactual actions). However, CLEAN rewards require each agent to know the mathematical form of the system evaluation function, which is typically unavailable to agents. In this paper, we present an algorithm to approximate CLEAN rewards, eliminating exploratory action noise without the need for expert system knowledge. Results in both coordination and congestion domains demonstrate that the approximated CLEAN rewards obtain up to 95% of the performance of directly computed CLEAN rewards, without the need for expert domain knowledge, while using 99% less information about the system.
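The abstract does not include the algorithm itself, but the mechanism it describes (agents act greedily in the real system while privately evaluating counterfactual exploratory actions against the system evaluation) can be illustrated with a small sketch. Everything below is an assumption made for illustration: the congestion-style toy domain, the epsilon-style private exploration, the stateless Q-update, and the closed-form evaluation G are not taken from the paper, and in the approximated-CLEAN setting G would be replaced by a learned estimate G_hat whose construction is not specified in the abstract.

# Illustrative sketch only: CLEAN-style counterfactual exploration in a toy
# congestion game. The domain, reward form, and learning rule are assumptions.
import random
from collections import Counter

N_AGENTS, N_ACTIONS, EPISODES = 30, 5, 3000
CAPACITY = N_AGENTS / N_ACTIONS           # ideal attendance per action
ALPHA, EPSILON = 0.1, 0.1                 # learning rate, private exploration rate

def G(joint_action):
    """System evaluation: value attendance up to capacity, penalize congestion.
    In the approximated-CLEAN setting this closed form is unknown to agents and
    would be replaced by a learned model G_hat."""
    counts = Counter(joint_action)
    return sum(min(c, CAPACITY) - max(0.0, c - CAPACITY) for c in counts.values())

# Stateless Q-values: Q[i][a] estimates agent i's value for action a.
Q = [[0.0] * N_ACTIONS for _ in range(N_AGENTS)]

for episode in range(EPISODES):
    # 1. Every agent takes its greedy action in the real system, so no
    #    exploratory action noise is injected into other agents' environments.
    greedy = [max(range(N_ACTIONS), key=lambda a: Q[i][a]) for i in range(N_AGENTS)]
    g_actual = G(greedy)

    for i in range(N_AGENTS):
        # 2. Privately sample a counterfactual (possibly exploratory) action.
        c = random.randrange(N_ACTIONS) if random.random() < EPSILON else greedy[i]
        # 3. CLEAN-style reward: effect of swapping in agent i's counterfactual
        #    action with everyone else held fixed. With only an approximation
        #    available, G would be replaced by G_hat here.
        counterfactual = greedy[:i] + [c] + greedy[i + 1:]
        clean_reward = G(counterfactual) - g_actual
        # 4. Update the value of the counterfactual action, not the taken one.
        Q[i][c] += ALPHA * (clean_reward - Q[i][c])

final = [max(range(N_ACTIONS), key=lambda a: Q[i][a]) for i in range(N_AGENTS)]
print("final system evaluation G:", G(final))

Because only greedy actions ever reach the environment, the non-stationarity other agents observe comes solely from genuine policy changes rather than from exploration, which is the property the CLEAN construction described in the abstract is meant to preserve.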