Object-Oriented Value Function Approximators
30 October 2021, by E. Alimirzayev

Photo: Omkar Ranadive
Standard universal value functions (UVFs) are used in most actor-critic algorithms. Although these algorithms have been highly successful, they lack a property that would let the agent distinguish which object to manipulate, a property that could further improve training performance. This extension of universal value functions introduces a new type of UVF, the object-oriented UVF (OO-UVF). OO-UVFs use object-oriented properties to represent the environment's goals and, in the hierarchical reinforcement learning setting, its subgoals. Object-oriented representations should not only tell the agent which goal to achieve but also determine which object in the environment the goal refers to. This is achieved through the simplification of the goal space that comes with identifying objects explicitly.

As an example, consider a block-stacking task with two blocks in the workspace, where the agent must stack one block on top of the other. In object-oriented terms, the task is a sequence of two goals (1. place an object at some location; 2. stack an object on the object from step 1) applied to two objects, so that the first goal refers to one object and the second goal to the other.

This study addresses the potential benefits of object-oriented universal value functions and proposes experiments comparing the training performance of the object-oriented approach against standard algorithms.
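To make the idea concrete, below is a minimal sketch of how an OO-UVF critic could be parameterized, conditioning on a (subgoal, object) pair instead of a monolithic goal. This is an illustrative assumption, not the study's actual implementation: the class names, dimensions, and the two-subgoal encoding of the block-stacking task are all hypothetical.

```python
import torch
import torch.nn as nn

class ObjectOrientedUVF(nn.Module):
    """Sketch of Q(s, g, o): the value of pursuing subgoal g applied
    to object o in state s.

    A standard UVF conditions only on (s, g). The object-oriented
    variant factors the goal into a subgoal type and an object, so the
    same small set of subgoal types is reused across all objects.
    """

    def __init__(self, state_dim, num_subgoals, num_objects,
                 embed_dim=16, hidden=64):
        super().__init__()
        self.subgoal_embed = nn.Embedding(num_subgoals, embed_dim)
        self.object_embed = nn.Embedding(num_objects, embed_dim)
        self.net = nn.Sequential(
            nn.Linear(state_dim + 2 * embed_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, subgoal_id, object_id):
        g = self.subgoal_embed(subgoal_id)
        o = self.object_embed(object_id)
        return self.net(torch.cat([state, g, o], dim=-1)).squeeze(-1)

# Block-stacking example from the text: two subgoal types, two objects.
# Subgoal 0 = "place at some location", subgoal 1 = "stack on top of".
q = ObjectOrientedUVF(state_dim=10, num_subgoals=2, num_objects=2)
state = torch.randn(1, 10)
# Step 1: place block 0 somewhere; step 2: stack block 1 on block 0.
q_place = q(state, torch.tensor([0]), torch.tensor([0]))
q_stack = q(state, torch.tensor([1]), torch.tensor([1]))
```

In this sketch, the goal-space simplification comes from sharing the subgoal embeddings across objects: adding an object adds one embedding row rather than a full new set of goals.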
Participants: E. Alimirzayev