Distributed Reinforcement Learning Algorithm for Multi-UAV Applications
The problem of learning a global map from local observations made by multiple agents lies at the core of many control and robotics applications. Once this global map is available, autonomous agents can make optimal decisions accordingly. The decision-making rule is called a policy.
Developing policies is challenging in many applications because generating accurate models for these domains is difficult in the presence of complex interactions between the agents and the environment. For this reason, machine learning techniques have often been applied to policy development. Reinforcement Learning (RL) is a class of machine learning algorithms that addresses the problem of how an agent can learn an optimal behavioral strategy (a policy) while interacting with an unknown environment. However, directly applying current RL algorithms to real-world tasks that involve multiple agents with heterogeneous observations may not work. Without information sharing between agents, the surrounding environment for each independent agent becomes non-stationary due to the concurrently evolving and exploring companion agents.
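To make the single-agent RL setting concrete, here is a minimal tabular Q-learning sketch on a toy one-dimensional corridor environment. The environment, reward structure, and hyperparameters are illustrative assumptions, not from this article; the point is only that the agent learns a policy purely by interacting with the (unknown to it) transition and reward dynamics.

```python
import random

# Toy corridor MDP: states 0..4, actions left/right, reward 1 at the right end.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)  # move left / move right

def step(state, action):
    """Deterministic transition; the episode ends at the goal state."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def q_learning(episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action index]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy exploration of the unknown environment
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda i: q[s][i])
            s2, r, done = step(s, ACTIONS[a])
            # TD update toward the bootstrapped target
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

q = q_learning()
# Greedy policy extracted from the learned Q-values (1 = move right)
policy = [max((0, 1), key=lambda i: q[s][i]) for s in range(N_STATES)]
```

After training, the greedy policy moves right in every non-terminal state, which is optimal for this corridor.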
We study a distributed RL algorithm for multi-agent UAV applications. In distributed RL, each agent makes state observations through local processing. The agents can communicate over a sparse, randomly changing communication network and collaborate to learn the optimal global value function corresponding to the aggregated local rewards, without centralized coordination. The overall diagram is shown in the figure below, where N agents observe and take actions locally while communicating over the network.
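A minimal sketch of this idea combines a local TD(0) value update at each agent with a gossip-style consensus step over randomly chosen communication links. The environment, the splitting of the global reward into local rewards, and the pairwise mixing rule below are all illustrative assumptions, not the article's exact algorithm; they only demonstrate how agents with partial reward information can agree on a value function for the averaged (team) reward without a central coordinator.

```python
import random

N_AGENTS, N_STATES, GAMMA = 3, 4, 0.8
rng = random.Random(1)

# Each agent observes only its own local reward component; the global
# (team) reward is taken to be the average of the local rewards.
local_reward = [[rng.random() for _ in range(N_STATES)]
                for _ in range(N_AGENTS)]

def transition(s):
    """Simple uniformly random Markov chain shared by all agents."""
    return rng.randrange(N_STATES)

V = [[0.0] * N_STATES for _ in range(N_AGENTS)]  # per-agent value estimates
s, alpha = 0, 0.05
for t in range(20000):
    s2 = transition(s)
    # 1) Local TD(0) step using only the agent's own reward observation
    for i in range(N_AGENTS):
        td = local_reward[i][s] + GAMMA * V[i][s2] - V[i][s]
        V[i][s] += alpha * td
    # 2) Consensus step over a sparse, randomly changing network:
    #    one randomly chosen pair of agents averages its estimates.
    i, j = rng.sample(range(N_AGENTS), 2)
    for k in range(N_STATES):
        avg = 0.5 * (V[i][k] + V[j][k])
        V[i][k] = V[j][k] = avg
    s = s2
```

With the consensus step, the agents' value estimates stay close to one another and track the value function of the averaged reward, even though no single agent ever sees the full reward signal.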
As an example, consider multiple heterogeneous UAVs equipped with different sensors that navigate a shared space to detect harmful events, for instance, frequent turbulence on commercial flight routes, regions of corn fields infested by harmful insects, or enemies on battlefields. Using distributed RL, these UAVs can maneuver through the space together to collect useful data for building a value function and designing a safe motion-planning policy, despite their incomplete sensing abilities and limited communications.
A simple example of distributed RL is demonstrated in this animation:
Another example is illustrated in the following animation: