Mastering Classic Reinforcement Learning Algorithms

This course is part of the Foundations of Reinforcement Learning Specialization

Instructor: Ashutosh Trivedi

5 modules

Gain insight into a topic and learn the fundamentals.

Intermediate level

Recommended experience

1 week to complete

Under 10 hours a week

Flexible schedule

Learn at your own pace

What you'll learn

Formulate sequential decision-making problems as deterministic decision processes, Markov chains, and finite Markov decision processes.
Explain and apply core reinforcement-learning concepts, including discounting, value functions, policies, Bellman equations, and optimality.
Implement planning algorithms for finite Markov decision processes, including value iteration, policy iteration, and linear programming formulations.
Compare tabular reinforcement-learning algorithms, including bandits, Monte Carlo methods, temporal-difference learning, SARSA, and Q-learning.

Skills you'll gain

Category: Probability Distribution
Category: Model Optimization
Category: Probability & Statistics
Category: Statistical Machine Learning
Category: Reinforcement Learning
Category: Machine Learning
Category: Markov Model
Category: Decision Intelligence
Category: Algorithms
Category: Sampling (Statistics)
Category: Artificial Intelligence and Machine Learning (AI/ML)
Category: Machine Learning Algorithms
Category: Applied Mathematics

Details to know

Shareable certificate

Add to your LinkedIn profile

Recently updated!

June 2026

Assessments

6 assignments

Taught in English

Build your subject-matter expertise

This course is part of the Foundations of Reinforcement Learning Specialization

When you enroll in this course, you'll also be enrolled in this Specialization.

Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate

There are 5 modules in this course

How can an agent learn to make good decisions through repeated interaction with an uncertain environment? This course introduces the mathematical and algorithmic foundations of classical reinforcement learning, with an emphasis on finite Markov decision processes and tabular methods.

The course begins with the simplest settings, where the central ideas are clearest: deterministic decision processes, discounted rewards, and Bellman optimality equations. It then introduces stochasticity through Markov chains and Markov decision processes, where learners study policies, value functions, expected discounted reward, and dynamic programming. With this foundation in place, the course turns to planning methods for known models, including value iteration, policy iteration, and linear programming formulations.

The second half of the course studies reinforcement learning when the model is unknown and the agent must learn from sampled experience. Topics include multi-armed bandits, exploration and exploitation, Monte Carlo methods, temporal-difference learning, SARSA, Q-learning, and convergence principles. The course ends with a final assessment in which learners solve the same finite MDP from both a model-based planning perspective and a model-free learning perspective.

By the end of the course, learners will be able to formulate finite decision-making problems as Markov decision processes, solve them using classical planning algorithms, and implement tabular reinforcement-learning algorithms from sampled data. This course provides the foundation for later study of deep reinforcement learning, reward programming, and trustworthy AI systems.

This course can be taken for academic credit as part of CU Boulder’s Master of Science in Computer Science (MS-CS) and Master of Science in Artificial Intelligence (MS-AI) degrees offered on the Coursera platform. These fully accredited graduate degrees offer targeted courses, short 8-week sessions, and pay-as-you-go tuition. Admission is based on performance in three preliminary courses, not academic history. CU degrees on Coursera are ideal for recent graduates or working professionals.

Learn more:
MS in Artificial Intelligence: https://coursera.netlol.uk/degrees/ms-artificial-intelligence-boulder
MS in Computer Science: https://coursera.netlol.uk/degrees/ms-computer-science-boulder

This module introduces the modeling and optimization foundations for sequential decision-making in their simplest form: deterministic decision processes with discounted rewards. We begin with states, actions, transitions, and rewards as a language for representing decision problems over time. We then develop value functions and Bellman equations as tools for optimizing long-term return. The goal is to build intuition for why dynamic programming is correct in the simpler setting of deterministic decision processes before introducing stochastic transitions, learning from sampled experience, and bootstrapping in later modules.
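
As an illustration of the Bellman optimality equation in the deterministic setting, here is a minimal Python sketch of value iteration on a two-state deterministic decision process. The states (`unprepared`, `prepared`), the actions (`study`, `relax`), the reward values, and the discount factor `GAMMA = 0.9` are hypothetical illustration choices and do not come from the course. Each sweep applies V(s) ← max_a [r(s, a) + γ V(δ(s, a))] until the values stop changing, and a greedy policy is then read off the result.

```python
# Minimal sketch: value iteration on a hypothetical two-state deterministic
# decision process (illustrative states, actions, and rewards; not from the course).
GAMMA = 0.9
STATES = ("unprepared", "prepared")
ACTIONS = ("study", "relax")

# Deterministic transition function delta(s, a) -> next state.
NEXT = {
    ("unprepared", "study"): "prepared",
    ("unprepared", "relax"): "unprepared",
    ("prepared", "study"): "prepared",
    ("prepared", "relax"): "unprepared",
}
# Reward r(s, a) received when taking action a in state s.
REWARD = {
    ("unprepared", "study"): -1.0,
    ("unprepared", "relax"): 1.0,
    ("prepared", "study"): 2.0,
    ("prepared", "relax"): 3.0,
}


def backup(values, s, a, gamma=GAMMA):
    """One-step lookahead: immediate reward plus discounted value of the successor."""
    return REWARD[(s, a)] + gamma * values[NEXT[(s, a)]]


def value_iteration(gamma=GAMMA, tol=1e-10):
    """Apply V(s) <- max_a [r(s, a) + gamma * V(delta(s, a))] until sweeps agree."""
    values = {s: 0.0 for s in STATES}
    while True:
        new_values = {s: max(backup(values, s, a, gamma) for a in ACTIONS) for s in STATES}
        change = max(abs(new_values[s] - values[s]) for s in STATES)
        values = new_values
        if change < tol:
            return values


def greedy_policy(values, gamma=GAMMA):
    """Extract a policy that picks a maximizing action in each state."""
    return {s: max(ACTIONS, key=lambda a: backup(values, s, a, gamma)) for s in STATES}


V = value_iteration()
print({s: round(v, 3) for s, v in V.items()})  # {'unprepared': 17.0, 'prepared': 20.0}
print(greedy_policy(V))  # {'unprepared': 'study', 'prepared': 'study'}
```

You can check the fixed point by hand. Studying forever from `prepared` yields 2 / (1 - 0.9) = 20, and studying once from `unprepared` yields -1 + 0.9 * 20 = 17. Relaxing gives smaller right-hand sides in both states (18.3 and 16.3), so both values satisfy the Bellman optimality equation.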

What's included

11 videos · 12 readings · 2 assignments

11 videos · 69 minutes total

Course Introduction · 7 minutes
Decision-Making over Time · 3 minutes
States, Actions, Transitions, and Rewards · 2 minutes
From Unfolded Decisions to State-Based Models · 2 minutes
Formal Definition of a Deterministic Decision Process · 4 minutes
Discounting Infinite Reward Streams · 9 minutes
Runs, Histories, Policies, and Values · 9 minutes
Discounted Optimality Equations · 5 minutes
Checking Values and Extracting Policies · 5 minutes
Why Bellman Equations Characterize Optimal Behavior · 10 minutes
Existence, Uniqueness, and Value Iteration · 13 minutes

12 readings · 110 minutes total

Earn Academic Credit for your Work! · 10 minutes
Course Support · 10 minutes
Assessment Expectations · 5 minutes
Sequential Decision-Making as Optimization · 10 minutes
States, Actions, Transitions, and Rewards · 10 minutes
Deterministic Decision Processes · 10 minutes
Discounting Infinite Reward Streams · 10 minutes
Policies, Runs, and Values · 10 minutes
Bellman Equations and Dynamic Programming · 10 minutes
Why Bellman Equations Characterize Optimal Behavior · 10 minutes
Existence, Uniqueness, and Value Iteration · 10 minutes
Module Summary · 5 minutes

2 assignments · 50 minutes total

AI Policy Quiz · 5 minutes
Deterministic Decision Processes · 45 minutes

This module adds stochasticity to the deterministic picture developed in the previous module. Learners continue with the surprise-quiz example, now with uncertain outcomes: studying usually helps but may not always help, and relaxing may reduce preparation but may not always do so. The module first introduces stochastic transitions as probability distributions over next states. It then studies Markov chains as stochastic systems without choices, and finally adds actions to obtain Markov decision processes. The goal is to make expected discounted reward, policies, and Bellman equations feel like natural extensions of the deterministic setting.
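
Expected discounted reward in a Markov chain can be illustrated with the following minimal Python sketch. It assumes a hypothetical two-state chain whose transition probabilities `P`, state rewards `R`, and discount `GAMMA = 0.9` are made up for illustration. Because a Markov chain has no choices, the Bellman equation has no max. It is a fixed weighted average, V(s) = R(s) + γ Σ P(s, s') V(s'), which the code solves by repeated substitution.

```python
# Minimal sketch: expected discounted reward in a hypothetical two-state Markov chain
# (transition probabilities and rewards are illustrative, not from the course).
GAMMA = 0.9
STATES = ("prepared", "unprepared")

# P[s][s2] = probability of moving from s to s2; each row sums to 1.
P = {
    "prepared": {"prepared": 0.8, "unprepared": 0.2},
    "unprepared": {"prepared": 0.3, "unprepared": 0.7},
}
# Reward collected in each state.
R = {"prepared": 1.0, "unprepared": 0.0}


def chain_values(gamma=GAMMA, tol=1e-12):
    """Iterate V(s) <- R(s) + gamma * sum_{s2} P(s, s2) V(s2) until it stops changing."""
    values = {s: 0.0 for s in STATES}
    while True:
        new_values = {
            s: R[s] + gamma * sum(p * values[s2] for s2, p in P[s].items())
            for s in STATES
        }
        change = max(abs(new_values[s] - values[s]) for s in STATES)
        values = new_values
        if change < tol:
            return values


V = chain_values()
print({s: round(v, 4) for s, v in V.items()})  # {'prepared': 6.7273, 'unprepared': 4.9091}
```

Solving the linear system V = R + γPV by hand gives V(prepared) = 74/11 and V(unprepared) = 54/11, which matches the iteration. Adding actions replaces the fixed sum over `P` with a max over actions, one expected backup per action. That is the step from Markov chains to the Bellman optimality equations for MDPs.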

What's included

8 videos · 8 readings · 1 assignment

8 videos · 70 minutes total

Module Introduction · 2 minutes
From Deterministic to Stochastic Transitions · 10 minutes
Markov Chains · 23 minutes
Markov Decision Processes · 7 minutes
Policies and Values · 8 minutes
Checking Values and Extracting Policies · 3 minutes
Bellman Optimality Equations · 7 minutes
Why Bellman Optimality Equations Are Correct · 9 minutes

8 readings · 70 minutes total

From Deterministic to Stochastic Transitions · 10 minutes
Markov Chains · 10 minutes
Expected Discounted Reward in Markov Chains · 10 minutes
Markov Decision Processes · 10 minutes
Policies, Value Functions, and Expected Return · 10 minutes
Bellman Equations for MDPs · 10 minutes
Comparing DDPs, Markov Chains, and MDPs · 5 minutes
Module Summary · 5 minutes

1 assignment · 45 minutes total

Markov Chains and Markov Decision Processes · 45 minutes

This module focuses on known-model optimization. Learners use Bellman equations as computational tools for policy evaluation, policy improvement, value iteration, policy iteration, and linear programming formulations of discounted MDPs.
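
The interplay of policy evaluation and policy improvement can be illustrated with a minimal Python sketch of policy iteration on a known model. The two-state MDP in `MODEL` is hypothetical: its rewards, transition probabilities, and the discount `GAMMA = 0.9` are illustration values only. Evaluation here is iterative (repeated Bellman expectation backups) rather than an exact linear solve, and the improvement step switches an action only on a strict gain so that near-ties cannot make the loop cycle.

```python
# Minimal sketch: policy iteration on a hypothetical two-state stochastic MDP
# (rewards and transition probabilities are illustrative, not from the course).
GAMMA = 0.9
STATES = ("unprepared", "prepared")
ACTIONS = ("study", "relax")

# MODEL[(s, a)] = (reward, {next_state: probability}).
MODEL = {
    ("unprepared", "study"): (-1.0, {"prepared": 0.8, "unprepared": 0.2}),
    ("unprepared", "relax"): (1.0, {"unprepared": 1.0}),
    ("prepared", "study"): (2.0, {"prepared": 0.9, "unprepared": 0.1}),
    ("prepared", "relax"): (3.0, {"unprepared": 0.6, "prepared": 0.4}),
}


def q_value(values, s, a, gamma=GAMMA):
    """Expected one-step backup: r(s, a) + gamma * sum_{s2} P(s2 | s, a) V(s2)."""
    reward, successors = MODEL[(s, a)]
    return reward + gamma * sum(p * values[s2] for s2, p in successors.items())


def evaluate(policy, gamma=GAMMA, tol=1e-12):
    """Iterative policy evaluation: repeat the Bellman expectation backup for a fixed policy."""
    values = {s: 0.0 for s in STATES}
    while True:
        new_values = {s: q_value(values, s, policy[s], gamma) for s in STATES}
        change = max(abs(new_values[s] - values[s]) for s in STATES)
        values = new_values
        if change < tol:
            return values


def policy_iteration(gamma=GAMMA):
    """Alternate evaluation and greedy improvement until the policy stops changing."""
    policy = {s: "relax" for s in STATES}  # arbitrary initial policy
    while True:
        values = evaluate(policy, gamma)
        stable = True
        for s in STATES:
            best = max(ACTIONS, key=lambda a: q_value(values, s, a, gamma))
            # Switch only on a strict improvement so near-ties cannot cause cycling.
            if q_value(values, s, best, gamma) > q_value(values, s, policy[s], gamma) + 1e-9:
                policy[s] = best
                stable = False
        if stable:
            return policy, values


policy, V = policy_iteration()
print(policy)  # {'unprepared': 'study', 'prepared': 'study'}
```

Starting from "always relax", a single improvement step switches both states to `study`. In `unprepared`, for example, studying scores 10.25 against 10 for relaxing. The next evaluation then confirms that the policy is stable. Value iteration would reach the same policy by repeatedly applying the Bellman optimality operator, without fully evaluating each intermediate policy.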

What's included

9 videos · 8 readings · 1 assignment

9 videos · 41 minutes total

Module Introduction · 2 minutes
Planning Setup · 4 minutes
Policy Evaluation · 6 minutes
From Values to Better Policies · 8 minutes
The Bellman Optimality Operator · 5 minutes
Value Iteration as Fixed-Point Computation · 6 minutes
Alternating Evaluation and Improvement · 5 minutes
The Linear Programming View of Optimality · 3 minutes
Module Summary · 2 minutes

8 readings · 75 minutes total

Planning with a Known Model · 10 minutes
Policy Evaluation · 10 minutes
Policy Improvement · 10 minutes
The Bellman Optimality Operator · 10 minutes
Value Iteration · 10 minutes
Policy Iteration · 10 minutes
Linear Programming for Discounted MDPs · 10 minutes
Module Summary · 5 minutes

1 assignment · 45 minutes total

Dynamic Programming in MDPs · 45 minutes

This module begins the transition from planning to reinforcement learning. In planning, the MDP model is known and Bellman backups compute expectations exactly. In reinforcement learning, the model is replaced by sampled experience. Learners first view reinforcement learning as sample-based dynamic programming. They then study rewards, uncertainty, agent–environment interaction, bandit estimation, exploration versus exploitation, Monte Carlo policy evaluation, and Monte Carlo control.
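
Replacing exact expectations with sampled experience can be illustrated with a minimal Python sketch of bandit estimation: ε-greedy action selection with incremental sample averages. The three Bernoulli arm means in `TRUE_MEANS`, the exploration rate `epsilon=0.1`, and the step count are hypothetical illustration values. The agent never reads those means directly. It only observes sampled rewards.

```python
import random

# Minimal sketch: epsilon-greedy selection with incremental sample-average estimates
# on a hypothetical 3-armed Bernoulli bandit (arm means are illustrative only).
TRUE_MEANS = (0.2, 0.5, 0.8)


def run_bandit(steps=5000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    estimates = [0.0] * len(TRUE_MEANS)  # sample-average estimate of each arm's mean reward
    counts = [0] * len(TRUE_MEANS)       # number of pulls per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(TRUE_MEANS))  # explore: pick an arm uniformly at random
        else:
            arm = max(range(len(TRUE_MEANS)), key=lambda i: estimates[i])  # exploit
        # The environment returns a sampled reward; the agent never sees TRUE_MEANS.
        reward = 1.0 if rng.random() < TRUE_MEANS[arm] else 0.0
        counts[arm] += 1
        # Incremental mean: Q_n = Q_{n-1} + (R_n - Q_{n-1}) / n
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


estimates, counts = run_bandit()
print([round(q, 2) for q in estimates], counts)
```

With `epsilon=0` the agent can lock onto whichever arm first returns a reward. The small exploration rate is what keeps every arm's sample average improving. The same incremental-mean update, applied to full-episode returns instead of single rewards, is the core of Monte Carlo policy evaluation.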

What's included

9 videos · 11 readings · 1 assignment

9 videos · 37 minutes total

Module Introduction · 2 minutes
From Planning to Reinforcement Learning · 3 minutes
Rewards, Uncertainty, and Exploration · 3 minutes
The Agent–Environment Interface · 3 minutes
One-Armed Bandits · 5 minutes
Multi-Armed Bandits · 5 minutes
Monte Carlo Policy Evaluation · 8 minutes
Monte Carlo Control · 6 minutes
Module Summary · 3 minutes

11 readings · 74 minutes total

From Planning to Learning · 10 minutes
From Planning to Reinforcement Learning · 10 minutes
Rewards, Uncertainty, and Behavior · 5 minutes
The Agent–Environment Interaction Loop · 5 minutes
One-Armed Bandits · 10 minutes
Multi-Armed Bandits · 10 minutes
Monte Carlo Estimation · 2 minutes
Returns as Random Variables · 5 minutes
Monte Carlo Policy Evaluation · 5 minutes
Monte Carlo Control · 10 minutes
Module Summary · 2 minutes

1 assignment · 45 minutes total

Learning from Sampled Experience · 45 minutes

This module completes the tabular reinforcement-learning part of Course 1. Module 4 introduced sample-based learning through bandits and Monte Carlo methods. Module 5 introduces temporal-difference learning: updating after one sampled transition by combining an observed reward with a bootstrapped value estimate. The module ends by summarizing tabular reinforcement learning and motivating the transition to function approximation and deep RL.
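
A one-step bootstrapped update can be illustrated with a minimal Python sketch of tabular Q-learning (off-policy TD control). The environment is a hypothetical five-state corridor in which moving `right` into state 4 ends the episode with reward 1. The constants `GAMMA = 0.9` and `ALPHA = 0.5` and the uniformly random behavior policy are illustration choices, not course settings. Each update combines the observed reward with the current estimate max_b Q(s', b). SARSA would instead bootstrap from the action actually taken next.

```python
import random

# Minimal sketch: tabular Q-learning on a hypothetical 5-state corridor.
# States 0..4; state 4 is terminal and entering it yields reward 1.
N_STATES = 5
ACTIONS = ("left", "right")
GAMMA, ALPHA = 0.9, 0.5


def step(state, action):
    """Deterministic corridor dynamics; 'left' in state 0 stays in state 0."""
    next_state = min(state + 1, N_STATES - 1) if action == "right" else max(state - 1, 0)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Behavior policy: uniformly random actions. Q-learning is off-policy,
            # so it still estimates the values of the greedy (optimal) policy.
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            # One-step TD target: observed reward plus bootstrapped estimate (none at terminal).
            target = reward if done else reward + GAMMA * max(q[(next_state, b)] for b in ACTIONS)
            q[(state, action)] += ALPHA * (target - q[(state, action)])
            state = next_state
    return q


def greedy(q):
    """Greedy action in each non-terminal state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}


q = q_learning()
print(greedy(q))  # {0: 'right', 1: 'right', 2: 'right', 3: 'right'}
```

In this deterministic corridor the optimal values are Q(s, right) = 0.9^(3 - s), so Q(0, right) approaches 0.729. The updates happen after every transition rather than at the end of each episode, which is the key difference from the Monte Carlo methods in Module 4.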

What's included

8 videos · 9 readings · 1 assignment

8 videos · 33 minutes total

Learning before the Episode Ends · 4 minutes
One-Step Bootstrapped Prediction · 5 minutes
On-Policy Temporal-Difference Control · 4 minutes
Off-Policy Temporal-Difference Control · 5 minutes
What Policy Is Being Learned? · 4 minutes
Smoother Targets and Overestimation · 3 minutes
Reducing Maximization Bias · 3 minutes
Between Monte Carlo and One-Step TD · 4 minutes

9 readings · 39 minutes total

Why Temporal-Difference Learning? · 5 minutes
TD(0) Policy Evaluation · 5 minutes
On-Policy TD Control · 5 minutes
Q-Learning: Off-Policy TD Control · 5 minutes
On-Policy and Off-Policy Learning · 2 minutes
Expected SARSA and Maximization Bias · 5 minutes
Double Q-Learning · 5 minutes
n-Step TD · 2 minutes
Why Move Beyond Tabular Methods? · 5 minutes

1 assignment · 45 minutes total

Control, Exploration, and Tabular RL Algorithms · 45 minutes

Earn a career certificate.

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructor

Ashutosh Trivedi

University of Colorado Boulder

3 courses · 60 learners

Offered by

University of Colorado Boulder

Explore more from Algorithms

University of Colorado Boulder
Deep Reinforcement Learning: From Theory to Practice
Course
University of Colorado Boulder
Reward Programming: Optimizing RL Efficiency and Safety
Course

Frequently asked questions

To access the course materials and assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. It also means that you will not be able to purchase a Certificate experience.

When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page. From there, you can print your Certificate or add it to your LinkedIn profile.

Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for the learning program you selected, you’ll find a link to apply on the description page.