Simple Beginner’s guide to Reinforcement Learning & its implementation

ai
reinforcement
applied-mathematics
mathematical-models

(Adavidoaiei Dumitru-Cornel) #1

Reinforcement learning is a technique in which an AI agent learns by trying to maximize a reward. It proceeds by trial and error and, depending on the outcome, receives feedback that it evaluates as a positive or negative reward. The tutorial includes a few examples of how a neural network combined with reinforcement learning can teach an AI agent to carry out certain tasks.
Some consider reinforcement learning to be the path to true AI; research on it is being done at OpenAI, the company co-founded by Elon Musk.

Simple Beginner’s guide to Reinforcement Learning & its implementation

A recent application of the technique:

Researchers at Facebook realized their bots were chattering in a new language. Then they stopped it.

Two AI agents had to negotiate in order to complete a task. Although they were programmed to communicate in English, they developed their own language to communicate, because the reward function did not require communication in English.

Isaac Asimov’s “Three Laws of Robotics”



(Adavidoaiei Dumitru-Cornel) #2

The problem is classically formulated as a Markov Decision Process: https://en.wikipedia.org/wiki/Markov_decision_process

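Three algorithms are relevant to this kind of problem. Q-learning solves Markov Decision Processes directly, while alpha-beta pruning and A* are search algorithms for the related game-tree and shortest-path problems:
1.) Q-learning https://en.wikipedia.org/wiki/Q-learning
2.) Alpha–beta pruning https://en.wikipedia.org/wiki/Alpha–beta_pruning
3.) A* Algorithm https://en.wikipedia.org/wiki/A*_search_algorithm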
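
For reference, the core of Q-learning is a single update rule, applied each time the agent takes action a_t in state s_t, receives reward r_{t+1} and lands in state s_{t+1}. The symbols below follow the usual textbook notation and are not taken from the tutorial: \alpha is the learning rate and \gamma is the discount factor.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
```

The bracketed term is the gap between the current estimate and a one-step lookahead target, so the estimate is nudged toward the target by a fraction \alpha on every step.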

A classic problem taught in school is the travelling salesman problem: https://en.wikipedia.org/wiki/Travelling_salesman_problem

Given a graph, the salesman must get from node X to node Y (a variation of the problem) at minimum cost. A heuristic solution is to choose the locally best edge at each step. This leads to a local optimum that may differ from the global optimum; the technique is called Greedy.
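
A minimal sketch of the difference between the local and the global optimum, in Python. The graph, its costs and the function names are made up for illustration; Dijkstra's algorithm is used here only as a reference that finds the true minimum-cost path when edge costs are non-negative.

```python
import heapq

# Hypothetical weighted graph: node -> {neighbor: edge cost}
GRAPH = {
    "X": {"A": 1, "B": 4},
    "A": {"Y": 10},
    "B": {"Y": 1},
    "Y": {},
}

def greedy_path(graph, start, goal):
    """Always follow the cheapest outgoing edge to an unvisited node."""
    path, cost, node = [start], 0, start
    visited = {start}
    while node != goal:
        options = {n: c for n, c in graph[node].items() if n not in visited}
        if not options:
            return None, float("inf")  # dead end: greedy never backtracks
        nxt = min(options, key=options.get)
        cost += options[nxt]
        visited.add(nxt)
        path.append(nxt)
        node = nxt
    return path, cost

def dijkstra_path(graph, start, goal):
    """Minimum-cost path for non-negative edge costs."""
    queue = [(0, start, [start])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return path, cost
        if node in settled:
            continue
        settled.add(node)
        for nxt, c in graph[node].items():
            heapq.heappush(queue, (cost + c, nxt, path + [nxt]))
    return None, float("inf")

print(greedy_path(GRAPH, "X", "Y"))    # greedy takes the cheap first edge X -> A
print(dijkstra_path(GRAPH, "X", "Y"))  # the cheaper route goes through B
```

On this graph the greedy choice pays 1 for the first edge and is then forced onto an edge of cost 10, while the route through B costs 5 in total.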


(Adavidoaiei Dumitru-Cornel) #3

The agent occupies a position in a 5x5 grid, and the delivery destination occupies another position. The agent can move in any of four directions (up, down, left, right). If we want our drone to learn to deliver packages, we simply provide a positive reward of +1 for successfully flying to a marked location and making a delivery.
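
A rough sketch of how this scenario could be learned with tabular Q-learning, in Python. Everything beyond the 5x5 grid, the four moves and the +1 reward is an assumption made for illustration: the goal cell, the hyperparameters, the episode cap and the function names are hypothetical, and the quoted article may use a different method.

```python
import random

SIZE = 5
GOAL = (4, 4)  # assumed delivery location
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action):
    """Move one cell, staying inside the grid; +1 only on reaching the goal."""
    r, c = state
    dr, dc = ACTIONS[action]
    nxt = (min(max(r + dr, 0), SIZE - 1), min(max(c + dc, 0), SIZE - 1))
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done

def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (assumed settings)."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * 4 for r in range(SIZE) for c in range(SIZE)}
    for _ in range(episodes):
        state = (rng.randrange(SIZE), rng.randrange(SIZE))
        if state == GOAL:
            continue
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.randrange(4)  # explore
            else:
                best = max(q[state])       # exploit, breaking ties at random
                action = rng.choice([a for a in range(4) if q[state][a] == best])
            nxt, reward, done = step(state, action)
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q

def greedy_rollout(q, start, max_steps=50):
    """Follow the highest-valued action from each state."""
    state, path = start, [start]
    for _ in range(max_steps):
        if state == GOAL:
            break
        action = max(range(4), key=lambda a: q[state][a])
        state, _, _ = step(state, action)
        path.append(state)
    return path

q_table = train()
print(greedy_rollout(q_table, (0, 0)))
```

After training, following the learned values from the top-left corner should lead to the goal cell. Because the only reward is the +1 at delivery, the discount factor is what makes shorter routes score higher than longer ones.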

Delivery drone scenario for goal-based RL


(Adavidoaiei Dumitru-Cornel) #4

Deep Learning for Robotics