
Going beyond average for reinforcement learning

Consider the commuter who toils backwards and forwards each day on a train. Most mornings, her train runs on time and she reaches her first meeting relaxed and ready. But she knows that once in a while the unexpected happens: a mechanical problem, a signal failure, or even just a particularly rainy day. Invariably, these hiccups disrupt her routine, leaving her late and flustered.

Randomness is something we encounter every day, and it has a profound effect on how we experience the world. The same is true in reinforcement learning (RL) applications: systems that learn by trial and error and are motivated by rewards. Typically, an RL algorithm predicts the average reward it will receive from multiple attempts at a task, and uses this prediction to decide how to act. But random perturbations in the environment can alter its behaviour by changing the exact amount of reward the system receives.

In a new paper, we show it is possible to model not only the average but also the full variation of this reward, what we call the value distribution. This results in RL systems that are more accurate and faster to train than previous models, and more importantly opens up the possibility of rethinking the whole of reinforcement learning.

Returning to the example of our commuter, let’s consider a journey composed of three segments of 5 minutes each, except that once a week (one workday in five) the train breaks down, adding another 15 minutes to the trip. A simple calculation shows that the average commute time is (3 × 5) + (15 / 5) = 18 minutes: the regular 15-minute journey plus, on average, 3 extra minutes for the occasional breakdown.
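To make the arithmetic concrete, here is a minimal sketch in Python. The numbers follow the example above; the variable names and the use of NumPy are our own illustration, not part of the original article or the paper's code.

```python
import numpy as np

# Commute example: three 5-minute segments, with a 15-minute breakdown
# on one workday out of five.
outcomes = np.array([15.0, 30.0])         # minutes: normal trip, trip with breakdown
probabilities = np.array([4 / 5, 1 / 5])  # the breakdown happens once a week

# The usual RL quantity: the expected (average) commute time.
mean_commute = np.dot(probabilities, outcomes)
print(f"average commute: {mean_commute} minutes")  # 18.0

# The value distribution keeps the whole picture, not just the mean.
for minutes, p in zip(outcomes, probabilities):
    print(f"{minutes:.0f} minutes with probability {p:.0%}")
```

The average alone (18 minutes) describes a trip the commuter never actually takes; the distribution shows the two outcomes she really experiences, which is the distinction the paper builds on.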

Source: https://deepmind.com/blog/article/going-beyond-average-reinforcement-learning
