Difference evaluation functions have produced excellent multiagent behaviour in many domains, including air traffic and mobile robot control. However, computing a difference evaluation requires the value of a counterfactual system objective, which is often difficult to obtain when the system objective function is unknown or when global state and action information is unavailable. In this work, we show that a local estimate of the system evaluation function can be used to approximate difference evaluations from readily available information, allowing difference evaluations to be computed in multiagent systems where the mathematical form of the objective function is not known. We test this approximation technique in two domains and demonstrate that approximating difference evaluation functions yields better performance and faster learning than using global evaluation functions. Finally, we demonstrate the effectiveness of the learned policies on a set of Pioneer P3-DX robots.
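To give a flavour of the idea: a difference evaluation for agent i is commonly written D_i(z) = G(z) - G(z_{-i}), where G is the system objective and z_{-i} is the joint state with agent i's contribution removed. The sketch below is purely illustrative and is not the paper's algorithm: it uses a toy observation-coverage objective of our own invention, computes the exact difference evaluation when G is known, and then shows a crude local stand-in (a nearest-neighbour estimate fit from observed samples of G) for the case where G's mathematical form is unavailable. All names, features, and the toy domain are assumptions.

```python
def global_eval(positions, targets):
    # Toy system objective G (an assumption, not the paper's domain):
    # the number of targets observed by at least one agent within range 1.0.
    return sum(1 for t in targets if any(abs(p - t) <= 1.0 for p in positions))

def difference_eval(i, positions, targets):
    # Exact difference evaluation D_i = G(z) - G(z_{-i}):
    # remove agent i and recompute G, which requires knowing G's form.
    counterfactual = positions[:i] + positions[i + 1:]
    return global_eval(positions, targets) - global_eval(counterfactual, targets)

class LocalApprox:
    """When G is unknown, fit a cheap local model G_hat from observed
    (joint state, G value) samples, then difference the model instead of G.
    A 1-nearest-neighbour lookup stands in for any local function approximator."""

    def __init__(self):
        self.samples = []  # list of (feature_vector, observed G value)

    def _features(self, positions, targets):
        # Crude hand-picked features (an assumption): per-target minimum
        # distance to any agent.
        return [min(abs(p - t) for p in positions) for t in targets]

    def record(self, positions, targets, g_value):
        # Store an observed sample of the (unknown) system objective.
        self.samples.append((self._features(positions, targets), g_value))

    def estimate(self, positions, targets):
        # Estimate G_hat at a query state from the closest stored sample.
        feats = self._features(positions, targets)
        _, g = min(self.samples,
                   key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], feats)))
        return g

    def difference(self, i, positions, targets):
        # Approximate D_i using G_hat in place of the true objective.
        counterfactual = positions[:i] + positions[i + 1:]
        return (self.estimate(positions, targets)
                - self.estimate(counterfactual, targets))
```

For example, with agents at 0.0 and 5.0 and targets at 0.5, 5.2, and 9.0, each agent observes exactly one target, so each agent's difference evaluation is 1; once the approximator has recorded samples covering the relevant states, its differenced estimate agrees with the exact value.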
To continue reading, please see our AAMAS 2016 paper, "Local Approximations of Difference Evaluation Functions", and the associated video.