About this Course
Reinforcement Learning is a subfield of Machine Learning, but is also a general-purpose formalism for automated decision-making and AI. This course introduces you to statistical learning techniques where an agent explicitly takes actions and interacts with the world. Understanding the importance and challenges of learning agents that make decisions is of vital importance today, with more and more companies interested in interactive agents and intelligent decision-making.
WHAT YOU WILL LEARN
Formalize problems as Markov Decision Processes
Understand basic exploration methods and the exploration/exploitation tradeoff
Understand value functions as a general-purpose tool for optimal decision-making
Know how to implement dynamic programming as an efficient solution technique for an industrial control problem
SKILLS YOU WILL GAIN
- Artificial Intelligence (AI)
- Machine Learning
- Reinforcement Learning
- Function Approximation
- Intelligent Systems
Syllabus – What you’ll learn from this course
1 hour to complete
Welcome to the Course!
Welcome to Fundamentals of Reinforcement Learning, the first course in a four-part specialization on Reinforcement Learning brought to you by the University of Alberta, Onlea, and Coursera. In this pre-course module, you’ll be introduced to your instructors, get a flavour of what the course has in store for you, and be given an in-depth roadmap to help make your journey through this specialization as smooth as possible.
4 hours to complete
An Introduction to Sequential Decision-Making
For the first week of this course, you’ll learn how to understand the exploration-exploitation trade-off in sequential decision-making, implement incremental algorithms for estimating action-values, and compare the strengths and weaknesses of different algorithms for exploration. For this week’s graded assessment, you’ll implement and test an epsilon-greedy agent.
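To make the week’s core ideas concrete, here is a minimal sketch (illustrative only, not course material) of an epsilon-greedy agent with incremental action-value estimates on a simple Gaussian bandit. The bandit setup, function names, and parameters are all assumptions for the example:

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """Run an epsilon-greedy agent on a simple Gaussian bandit.

    Action values are estimated incrementally:
        Q[a] <- Q[a] + (1 / N[a]) * (reward - Q[a])
    """
    rng = random.Random(seed)
    k = len(true_means)
    Q = [0.0] * k  # incremental action-value estimates
    N = [0] * k    # number of times each arm was pulled
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                   # explore: random arm
        else:
            a = max(range(k), key=lambda i: Q[i])  # exploit: greedy arm
        reward = rng.gauss(true_means[a], 1.0)
        N[a] += 1
        Q[a] += (reward - Q[a]) / N[a]             # incremental mean update
    return Q, N

Q, N = epsilon_greedy_bandit([0.2, 0.8, 0.5], epsilon=0.1, steps=5000)
```

With enough steps, the estimate for the best arm approaches its true mean, and the greedy choice concentrates on it, which is exactly the exploration-exploitation trade-off the week discusses.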
3 hours to complete
Markov Decision Processes
When you’re presented with a problem in industry, the first and most important step is to translate that problem into a Markov Decision Process (MDP). The quality of your solution depends heavily on how well you do this translation. This week, you’ll learn the definition of MDPs, you’ll understand goal-directed behavior and how this can be obtained from maximizing scalar rewards, and you’ll also understand the difference between episodic and continuing tasks. For this week’s graded assessment, you’ll create three example tasks of your own that fit into the MDP framework.
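As a flavour of what such a translation looks like, here is a toy episodic task framed as an MDP: states, actions, and deterministic dynamics returning next state, reward, and a termination flag. The corridor environment and its reward values are invented for illustration:

```python
# A tiny episodic MDP: a 1-D corridor where the agent walks left or right.
# States 0..4; reaching state 4 ends the episode with reward +1,
# and every other step costs -0.04 (a small penalty that rewards speed).

N_STATES = 5
ACTIONS = ("left", "right")
TERMINAL = N_STATES - 1

def step(state, action):
    """Deterministic dynamics: return (next_state, reward, done)."""
    if action == "right":
        next_state = min(state + 1, N_STATES - 1)
    else:
        next_state = max(state - 1, 0)
    if next_state == TERMINAL:
        return next_state, 1.0, True   # goal reached: episode terminates
    return next_state, -0.04, False    # ordinary step in a continuing episode

s, r, done = step(3, "right")  # one step from the goal: (4, 1.0, True)
```

Because the episode terminates at the goal state, this is an episodic task; removing the terminal state and letting the interaction run forever would turn it into a continuing task, the distinction the week covers.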
3 hours to complete
Value Functions & Bellman Equations
Once the problem is formulated as an MDP, finding the optimal policy is more efficient when using value functions. This week, you’ll learn the definition of policies and value functions, as well as Bellman equations, which are the key technology that all of our algorithms will use.
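For reference, the Bellman equation for the state-value function of a policy \(\pi\) (stated here in the field's standard notation, not taken from this page) expresses each state's value in terms of its successor states:

```latex
v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s',\, r} p(s', r \mid s, a)
           \bigl[ r + \gamma \, v_\pi(s') \bigr]
```

This recursive structure is what makes the algorithms in the rest of the course efficient: a state's value can be computed from one-step lookahead rather than whole trajectories.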
4 hours to complete
Dynamic Programming
This week, you’ll learn how to compute value functions and optimal policies, assuming you have the MDP model. You’ll implement dynamic programming to compute value functions and optimal policies, and understand the utility of dynamic programming for industrial applications and problems. Further, you’ll learn about Generalized Policy Iteration as a common template for constructing algorithms that maximize reward. For this week’s graded assessment, you’ll implement an efficient dynamic programming agent in a simulated industrial control problem.
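To illustrate the kind of computation involved, here is a hedged sketch of value iteration, one dynamic programming method, on a small invented chain MDP (the environment, constants, and function names are all assumptions for the example, not the course's assessment problem):

```python
# Value iteration on a tiny deterministic chain MDP.
# States 0..3; moving right from state 2 reaches terminal state 3 with
# reward +1; every other transition gives reward 0.

GAMMA = 0.9
N_STATES = 4
TERMINAL = 3

def transitions(s, a):
    """Deterministic model: (next_state, reward) for action a in {-1, +1}."""
    s_next = min(max(s + a, 0), TERMINAL)
    reward = 1.0 if (s_next == TERMINAL and s != TERMINAL) else 0.0
    return s_next, reward

def value_iteration(theta=1e-8):
    """Sweep states, backing up the best one-step return, until convergence."""
    V = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(TERMINAL):  # the terminal state's value stays 0
            best = max(
                r + GAMMA * V[s2]
                for s2, r in (transitions(s, a) for a in (-1, +1))
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            return V

V = value_iteration()
# Optimal values decay geometrically with distance to the goal:
# V is approximately [0.81, 0.9, 1.0, 0.0]
```

Each sweep is one application of the Bellman optimality backup; alternating such evaluation-style backups with greedy policy improvement is the Generalized Policy Iteration template mentioned above.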