Simple Beginner’s guide to Reinforcement Learning & its implementation

ai
reinforcement
applied-mathematics
mathematical-models

(Adavidoaiei Dumitru-Cornel) #1

Reinforcement learning is a technique in which an AI agent learns by trying to maximize a reward: it proceeds by trial and error and, depending on the outcome, receives feedback that it evaluates as a positive or negative reward. The tutorial has a few examples of how a neural network combined with reinforcement learning can get an AI agent to learn to carry out certain tasks.
Some consider reinforcement learning to be the path to true AI; research on it is being done at OpenAI, the company co-founded by Elon Musk.
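
To make the trial-and-error idea concrete, here is a minimal sketch, assuming a made-up setting that is not from the tutorial: an agent repeatedly picks one of three actions, receives +1 or -1 as feedback, and keeps a running average of the reward per action. The function name `run_bandit`, the success probabilities and the exploration rate `epsilon` are all illustrative assumptions.

```python
import random

def run_bandit(true_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy trial and error on a toy multi-armed bandit."""
    rng = random.Random(seed)
    n = len(true_probs)
    counts = [0] * n        # how often each action was tried
    estimates = [0.0] * n   # running average reward per action
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n)  # explore: try a random action
        else:
            # exploit: take the action that looks best so far
            action = max(range(n), key=lambda a: estimates[a])
        # Feedback from the (hypothetical) environment: +1 or -1
        reward = 1.0 if rng.random() < true_probs[action] else -1.0
        counts[action] += 1
        # Incremental update of the average reward for this action
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward
    return estimates, counts, total_reward

# Assumed success probabilities; the agent does not know them.
estimates, counts, total_reward = run_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print("estimated value per action:", [round(e, 2) for e in estimates])
print("action the agent prefers:", best_action)
```

The agent is never told which action is best; it only sees rewards, and the occasional random choice keeps it from settling too early on a poor action.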


A recent application of the technique:

Researchers at Facebook realized their bots were chattering in a new language. Then they stopped it.

Two AI agents had to negotiate to complete a task. Although they were programmed to communicate in English, they developed their own language to communicate, because the reward function did not require communicating in English.

Isaac Asimov’s “Three Laws of Robotics”



(Adavidoaiei Dumitru-Cornel) #2

The problem is classically formulated as a Markov Decision Process https://en.wikipedia.org/wiki/Markov_decision_process
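
As a rough illustration of what a Markov Decision Process looks like in code, here is a sketch on a hypothetical three-state problem. The states, actions, probabilities and rewards are invented for the example, and value iteration is just one standard way to solve such a small model when the transition probabilities are known.

```python
# Hypothetical toy MDP.
# P[state][action] = list of (probability, next_state, reward)
P = {
    "start":  {"stay": [(1.0, "start", 0.0)],
               "go":   [(0.8, "middle", 0.0), (0.2, "start", 0.0)]},
    "middle": {"stay": [(1.0, "middle", 0.0)],
               "go":   [(0.8, "goal", 1.0), (0.2, "middle", 0.0)]},
    "goal":   {},  # terminal state: no actions available
}
GAMMA = 0.9  # discount factor (assumed)

def q_value(V, state, action):
    """Expected discounted return of taking `action` in `state`."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[state][action])

def value_iteration(tol=1e-10):
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            if not P[s]:  # skip the terminal state
                continue
            best = max(q_value(V, s, a) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values
    policy = {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P if P[s]}
    return V, policy

V, policy = value_iteration()
print(policy)
print({s: round(v, 3) for s, v in V.items()})
```

Reinforcement learning methods such as Q-learning address the same kind of model, but without being handed the table `P` in advance.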

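Three algorithms come up for this kind of problem (Q-learning targets the Markov Decision Process directly; the other two are search algorithms, for two-player game trees and for path finding respectively):
1.) Q-learning https://en.wikipedia.org/wiki/Q-learning
2.) Alpha–beta pruning https://en.wikipedia.org/wiki/Alpha–beta_pruning
3.) A* algorithm https://en.wikipedia.org/wiki/A*_search_algorithm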

A classic problem taught in school is the travelling salesman problem https://en.wikipedia.org/wiki/Travelling_salesman_problem

Given a graph, the salesman has to get from node X to node Y (a variation of the problem) at minimum cost. A heuristic solution is to pick the locally cheapest edge at each step; this leads to a local optimum that can differ from the global optimum. The technique is called Greedy.
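
The gap between the local and the global optimum can be shown on a tiny graph. This is only a sketch: the graph and its edge costs are made up, and Dijkstra's algorithm is used here as the exact method to compare against (it is not mentioned in the post above).

```python
import heapq

# Hypothetical directed graph: graph[node][neighbor] = edge cost
graph = {
    "X": {"A": 1, "B": 2},
    "A": {"Y": 10},
    "B": {"Y": 1},
    "Y": {},
}

def greedy_path(graph, start, goal):
    """At every node, follow the cheapest outgoing edge to an unvisited node."""
    path, cost, node, visited = [start], 0, start, {start}
    while node != goal:
        candidates = [(w, n) for n, w in graph[node].items() if n not in visited]
        if not candidates:
            return None, float("inf")  # dead end
        w, node = min(candidates)
        path.append(node)
        visited.add(node)
        cost += w
    return path, cost

def shortest_path(graph, start, goal):
    """Dijkstra's algorithm: finds the globally cheapest path."""
    heap = [(0, start, [start])]
    settled = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == goal:
            return path, cost
        if node in settled:
            continue
        settled.add(node)
        for n, w in graph[node].items():
            heapq.heappush(heap, (cost + w, n, path + [n]))
    return None, float("inf")

g_path, g_cost = greedy_path(graph, "X", "Y")
s_path, s_cost = shortest_path(graph, "X", "Y")
print("greedy: ", g_path, g_cost)   # X -> A -> Y, cost 11
print("optimal:", s_path, s_cost)   # X -> B -> Y, cost 3
```

Greedy takes the cheap first edge X -> A and then has no choice but the expensive edge A -> Y, while the exact search accepts a slightly worse first step to get a much better total.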


(Adavidoaiei Dumitru-Cornel) #3

The agent occupies a position in a 5x5 grid, and the delivery destination occupies another position. The agent can move in any of four directions (up, down, left, right). If we want our drone to learn to deliver packages, we simply provide a positive reward of +1 for successfully flying to a marked location and making a delivery.

Delivery drone scenario for goal-based RL
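
The scenario quoted above can be sketched with tabular Q-learning. The 5x5 grid, the four moves and the +1 reward come from the quote; everything else is an assumption made for this example: the start and goal cells, the choice that a move off the grid leaves the drone in place, and the values of `alpha`, `gamma` and `epsilon`.

```python
import random

SIZE = 5
START, GOAL = (0, 0), (4, 4)  # assumed positions
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(state, action):
    """Move on the grid; moving off the edge leaves the agent in place."""
    dr, dc = ACTIONS[action]
    row = min(max(state[0] + dr, 0), SIZE - 1)
    col = min(max(state[1] + dc, 0), SIZE - 1)
    nxt = (row, col)
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done  # +1 only for a delivery

def greedy_action(Q, state, rng):
    """Best known action, ties broken at random."""
    best = max(Q[state].values())
    return rng.choice([a for a, q in Q[state].items() if q == best])

def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2,
          max_steps=200, seed=0):
    rng = random.Random(seed)
    Q = {(r, c): {a: 0.0 for a in ACTIONS}
         for r in range(SIZE) for c in range(SIZE)}
    for _ in range(episodes):
        state = START
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.choice(list(ACTIONS))      # explore
            else:
                action = greedy_action(Q, state, rng)   # exploit
            nxt, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best next value
            target = reward if done else reward + gamma * max(Q[nxt].values())
            Q[state][action] += alpha * (target - Q[state][action])
            state = nxt
            if done:
                break
    return Q

def greedy_rollout(Q, max_steps=50):
    """Follow the learned policy without exploration."""
    rng = random.Random(1)
    state, path = START, [START]
    for _ in range(max_steps):
        state, reward, done = step(state, greedy_action(Q, state, rng))
        path.append(state)
        if done:
            break
    return path

Q = train()
path = greedy_rollout(Q)
print("steps to deliver:", len(path) - 1)
print("route:", path)
```

The reward says nothing about how to reach the destination; the discount factor `gamma` is what makes shorter routes worth more than longer ones.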


(Adavidoaiei Dumitru-Cornel) #4

Deep Learning for Robotics


(Adavidoaiei Dumitru-Cornel) #5

AlphaStar is the successor to AlphaGo, the AI that beat the world champion at Go, one of the most complex games. AlphaStar is an AI focused on competing against professional players at computer games, in this case StarCraft; it was announced recently, on 24 January 2019.

Mastering this problem requires breakthroughs in several AI research challenges including:

  • Game theory: StarCraft is a game where, just like rock-paper-scissors, there is no single best strategy. As such, an AI training process needs to continually explore and expand the frontiers of strategic knowledge.

  • Imperfect information: Unlike games like chess or Go where players see everything, crucial information is hidden from a StarCraft player and must be actively discovered by “scouting”.

  • Long term planning: Like many real-world problems cause-and-effect is not instantaneous. Games can also take anywhere up to one hour to complete, meaning actions taken early in the game may not pay off for a long time.

  • Real time: Unlike traditional board games where players alternate turns between subsequent moves, StarCraft players must perform actions continually as the game clock progresses.

  • Large action space: Hundreds of different units and buildings must be controlled at once, in real-time, resulting in a combinatorial space of possibilities. On top of this, actions are hierarchical and can be modified and augmented. Our parameterization of the game has an average of approximately 10 to the 26 legal actions at every time-step.
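
The "no single best strategy" point from the game theory item can be checked on rock-paper-scissors itself. This is just a small sketch with a hand-written payoff function; it says nothing about how AlphaStar is actually trained.

```python
MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(mine, theirs):
    """+1 if `mine` wins, -1 if it loses, 0 for a draw."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1

# Every fixed move has a counter, so no single move is best.
counter = {m: next(c for c in MOVES if payoff(c, m) == 1) for m in MOVES}

# Playing each move with probability 1/3 gives expected payoff 0
# against any fixed opponent move, so it cannot be exploited.
uniform_value = {
    m: sum(payoff(mine, m) for mine in MOVES) / len(MOVES) for m in MOVES
}

print(counter)
print(uniform_value)
```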

Demis Hassabis heads DeepMind, the company behind this project.