Reinforcement Learning.
Research and writing by Yashwanth — ISA, Manipal.
What is Reinforcement Learning
Reinforcement learning is a machine learning training method built from three parts: an agent (something that can perceive its environment and make decisions), the environment itself, and an interpreter that judges the outcome of each action.
Reinforcement learning is used to train a model to make a sequence of decisions. This is useful when we want a model to surpass human-level competency, or when the task is too complex for a human to describe step by step. For example, how would you explain to another person how to walk? Putting every muscle movement into words is incredibly challenging. This is where reinforcement learning comes into the picture.
Learning starts with random decisions: the agent proceeds by trial and error. When the goal is achieved, the interpreter rewards the agent, and when the goal is not achieved (or achieved incorrectly), the agent is punished. Over many attempts, the agent comes to prefer the decisions that earn the most reward and hence “learns” how to do the task efficiently. The “state” is the current situation of the agent, and it provides the context the agent needs to make the next decision in the sequence.
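The loop described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the Corridor environment, its reward values (+1 at the goal, -0.1 otherwise), and the function name run_random_episode are all hypothetical and chosen for this example. The agent here acts purely at random, so it shows the interaction between agent, state, and interpreter, not any learning.

```python
import random


class Corridor:
    """Hypothetical toy environment: positions 0..4, with the goal at position 4."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clamped to the corridor.
        self.state = max(0, min(self.length - 1, self.state + action))
        done = self.state == self.length - 1
        # The "interpreter": reward reaching the goal, penalize every other step.
        reward = 1.0 if done else -0.1
        return self.state, reward, done


def run_random_episode(env, rng, max_steps=100):
    """Let an agent act at random (pure trial and error) for one episode."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = rng.choice([-1, 1])          # the agent's decision
        state, reward, done = env.step(action)  # new state plus feedback
        total_reward += reward
        if done:
            break
    return state, total_reward


if __name__ == "__main__":
    final_state, total = run_random_episode(Corridor(), random.Random(0))
    print("final state:", final_state, "total reward:", round(total, 2))
```

A learning agent would replace the random choice with a rule that uses the state and the rewards seen so far.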
Applications
Reinforcement learning has become significantly more popular as modern computing power has become widely available. Some of its popular applications are:
Robotics: Many self-driving cars and autonomous machines use some aspects of reinforcement learning. For example, in self-driving cars, the model is given a video feed and other sensor data. It is then trained to drive on the road safely while going at an optimal speed. It needs to consider speed, proximity to other cars, traffic, road type, the safety of the driver, and many other factors. The challenges of training such a model are clear.
Games: Reinforcement learning is also used in video games, commonly in the forms of Q-learning and policy search. It is used in pathfinding, NPC (Non-Player Character) actions, and building AI-controlled opponents.
Q-learning is a model-free reinforcement learning algorithm: it needs no model of the environment, and it learns the value of taking a particular action in a particular state.
Policy search covers methods that look directly for good policy parameters instead of first learning action values. It also offers a systematic way to embed expert knowledge at the initial stages of training, for example by choosing good parameters to start with, which can lead to faster training and higher accuracy.
Biomechanics: Biological functions are difficult to simulate and teach to machines. Reinforcement learning allows the learning of complex biological actions, such as running. The “Learning to Run” challenge organized by Stanford’s Neuromuscular Biomechanics Lab is a great example. Reinforcement learning is also used in the simulation of new prosthetics.
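To make the Q-learning mentioned under Games more concrete, here is a minimal tabular sketch. The update line is the standard Q-learning rule; everything else is an assumption made for illustration: the five-position corridor task, the function names step, train_q_table and greedy_policy, and the values of alpha (learning rate), gamma (discount factor) and epsilon (exploration rate).

```python
import random

N_STATES = 5        # positions 0..4; position 4 is the goal (hypothetical toy task)
ACTIONS = [-1, 1]   # move left or move right


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    next_state = max(0, min(N_STATES - 1, state + action))
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train_q_table(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # q[state][i] estimates the value of taking ACTIONS[i] in that state.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                best = max(q[state])
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: move the estimate toward
            # reward + gamma * (best value available in the next state).
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Best known action for every non-goal state."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: row[i])] for row in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train_q_table()))
```

On this small task the learned policy moves right from every position, because the values of moving right grow as the reward at the goal is propagated backward through the table.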
Challenges
Environment: An accurate simulation environment is very challenging to build. Even if the model performs extremely well in simulation, there is no guarantee that it will be just as effective in the real world. This is a major issue for self-driving cars, where reliability requirements and the cost of failure are both high.
Credit assignment problem: Traditionally, reinforcement learning models only reward the agent if it completes a task successfully, disregarding all the steps leading up to it. Even if most of the steps are correct, the interpreter gives the agent the same punishment as if every step had been wrong. The agent therefore has to work out how much each action contributed to the final outcome, so that actions leading to a higher cumulative reward are given more value, or “credit”. A related technique, reward shaping, adds intermediate rewards so that useful steps receive feedback before the task is finished.
Reward over goal: Models trained with reinforcement learning often find loopholes that earn more reward without achieving the desired goal, a behavior commonly called reward hacking. For example, a car in a racing game may keep picking up coins without ever finishing the race. This is part of the reason reinforcement learning is challenging to implement.
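Both the credit assignment problem and the reward-over-goal problem can be illustrated with discounted returns, one common way of spreading a reward back over the actions that preceded it. The sketch below is an assumption-laden toy: the function name discounted_returns, the discount factors, and the reward numbers for the coin and race examples are all hypothetical.

```python
def discounted_returns(rewards, gamma=0.9):
    """Return G_t = r_t + gamma * r_(t+1) + gamma^2 * r_(t+2) + ... for every step t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backward so each step reuses the return already computed for the next step.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    # Credit assignment: the only reward arrives at the last step, yet every
    # earlier step receives a share of the credit, smaller the further back it is.
    print(discounted_returns([0.0, 0.0, 0.0, 1.0], gamma=0.5))

    # Reward over goal: with these made-up numbers, collecting a coin (+1) on each
    # of 20 steps is worth more from the start than finishing the race (+10) on
    # the fourth step, so an agent maximizing reward would prefer the coins.
    coin_loop = discounted_returns([1.0] * 20)[0]
    finish_race = discounted_returns([0.0, 0.0, 0.0, 10.0])[0]
    print(coin_loop > finish_race)
```

The second comparison shows why the reward has to be designed with care: the agent optimizes the numbers it is given, not the goal the designer had in mind.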
The way forward
Reinforcement learning is extremely important and can be applied to solve many complex issues. We observe that the media often portrays reinforcement learning applications as something out of a science fiction movie. In reality, these complex problems have been solved due to the dedication and hard work of the brightest minds in engineering. It is important to know the limitations of reinforcement learning today and work to further explore the field.
One day, we might reach a point where we can deploy completely autonomous robots. Of course, technology is moving much faster than laws, and it is important to create and enforce the necessary legislation to protect people.
References
2. An Introduction to Reinforcement learning by Arxiv Insights.
3. https://deepsense.ai/learning-to-run-an-example-of-reinforcement-learning/