We switch gears slightly in this third blog-post of the multi-agent series. Letting go of game-theoretic concepts, we instead discuss some training paradigms for multi-agent reinforcement learning. We will cover the main limitations of centralised and independent learning, before arriving at centralised training with decentralised execution (CTDE), arguably the most established framework for training autonomous agents in decentralised multiplayer games.
This second blog-post of the multi-agent series is dedicated to solution concepts: how we can characterize and compare different joint policies. We will cover Pareto optimality, the definition and existence of Nash equilibria, as well as the minimax theorem in two-agent zero-sum games.
This blog-post is the first of a short series on multi-agent control and RL. The objective here is to cover the different models of multi-agent interactions, from repeated matrix games to partially observable Markov games. Going up the game hierarchy (from less to more general), we will detail what constitutes valid policies and how they differ from their usual fully and partially observed MDP cousins.
This post covers the Expectation Maximization (EM) algorithm, a popular heuristic to (approximately) compute maximum likelihood estimates when dealing with unobserved / latent data. We will motivate and derive EM's inner mechanisms before making them explicit on two classical examples (Gaussian Mixture parameter estimation and Hidden Markov Model identification).
This post is concerned with the Linear Quadratic Regulator (LQR) in discrete time. The LQR stands as somewhat of a singularity in optimal control theory: the (only?) non-trivial control problem in continuous state and action spaces for which a closed-form solution is known.