Intrinsically-Motivated Humans and Agents in Open-World Exploration
Humans are naturally great at exploration. No reward signal is needed.
Human beings explore new environments remarkably effectively, even in the total absence of external rewards [Matusch et al., 2020]. Although adult humans have extensive experience and prior knowledge, even children explore intelligently.
This paper studies humans and RL agents to see which exploration objectives correlate with how well humans explore. They find consistent positive correlations with Entropy and Empowerment: Entropy seems to matter most early on, Empowerment later. Interestingly, Information Gain shows no significant correlation.
Furthermore, in both humans and agents we observe that Entropy initially increases rapidly before plateauing, while Empowerment increases steadily, suggesting that state novelty may provide more signal in early exploration, whereas control may be a more effective objective later in exploration.
We find significant positive correlations between human exploration scores and both Entropy and Empowerment, but not Information Gain.
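To make the "rapid rise, then plateau" behaviour of Entropy concrete, here is a minimal sketch. It assumes Entropy means the Shannon entropy of the empirical state-visitation distribution, which may not match the paper's exact definition. The trajectory and the function names are made up for illustration.

```python
import math
from collections import Counter

def visitation_entropy(states):
    """Shannon entropy (in nats) of the empirical state-visitation distribution."""
    counts = Counter(states)
    total = len(states)
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def entropy_curve(states):
    """Entropy of the states visited up to and including each time step."""
    return [visitation_entropy(states[: t + 1]) for t in range(len(states))]

# Hypothetical trajectory: four new states first, then only revisits.
trajectory = ["a", "b", "c", "d", "a", "b", "c", "d", "a", "b", "c", "d"]
curve = entropy_curve(trajectory)
```

In this toy trajectory the curve climbs while new states are being found and can never exceed log(4), the entropy of a uniform distribution over the four distinct states. Once novelty runs out, the measure has little left to say, which fits the plateau described above.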
They also observe that verbalizing goes along with better problem-solving, at least in children.
We also transcribe human utterances during play, and find significant positive correlations between children’s performance and their frequency of private speech utterances, particularly those verbalizing goals.
While entropy and empowerment track good exploration, RL agents have to estimate them, and estimating them poorly is probably the reason why intrinsic rewards don't work well in practice! Meaning we need better ways to estimate them.
agents trained on intrinsic rewards explore less effectively than adults, and fail to attain higher Entropy or Empowerment than even agents trained on extrinsic rewards focused on the task structure in the environment.
intrinsically-motivated agent exploration could be improved through better ways of approximating and incorporating entropy and empowerment-based objectives
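For intuition about what an empowerment objective measures, here is a small sketch for a deterministic grid world. It relies on the standard result that, with deterministic dynamics, n-step empowerment equals the log of the number of distinct states reachable in n steps. The grid, the action set and the function names are hypothetical, and the paper's estimator in Crafter is not necessarily this one.

```python
import math
from itertools import product

# Hypothetical action set: four moves plus "stay".
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]

def step(state, action, size):
    """Deterministic move on a size x size grid; moving into a wall leaves the agent in place."""
    x, y = state
    dx, dy = action
    nx, ny = x + dx, y + dy
    if 0 <= nx < size and 0 <= ny < size:
        return (nx, ny)
    return state

def n_step_empowerment(state, n, size):
    """log2 of the number of distinct states reachable by some n-step action sequence."""
    reachable = set()
    for seq in product(ACTIONS, repeat=n):
        s = state
        for a in seq:
            s = step(s, a, size)
        reachable.add(s)
    return math.log2(len(reachable))

center = n_step_empowerment((2, 2), 2, 5)
corner = n_step_empowerment((0, 0), 2, 5)
```

The center of the grid has higher empowerment than the corner because more outcomes are under the agent's control there. The exhaustive enumeration is exponential in n, and stochastic or high-dimensional environments need an approximation instead, which is where the estimation problem noted above comes in.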
Measuring exploration performance (in Crafter)
They define a set of metrics to measure exploration, including Breadth and Depth scores.
Interestingly, a positive breadth-depth correlation is not a good sign: it shows up mainly among the low-scoring children.
Breadth and Depth scores were significantly positively correlated for the children (ρ = 0.51, p = 0.01) but not the adults (ρ = −0.27, p = 0.21). Correlation between breadth and depth in the children is most apparent for the lower-scoring children, suggesting that it arises from a common cause of poor exploration ability.
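A correlation like the ones quoted above can be computed as follows, assuming ρ denotes Spearman's rank correlation (the excerpt does not say so explicitly). The scores below are invented toy numbers, not data from the paper, and the significance test behind the p-values is left out.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical per-participant scores, for illustration only.
breadth = [1, 2, 3, 4, 5]
depth = [2, 1, 4, 3, 5]
rho = spearman_rho(breadth, depth)
```

Because it works on ranks, the measure only asks whether participants with higher breadth also tend to have higher depth, not whether the relationship is linear.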