Lecture 29: LQR Demo Code
Linear Quadratic Regulator (LQR) Demo
This code demonstrates a Linear Quadratic Regulator (LQR) applied to the CartPole environment. LQR is a classic model-based optimal control method that computes the optimal policy analytically for linear dynamical systems with quadratic costs.
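For reference while reading the code, the finite-horizon LQR problem can be written with the same symbols as the variables used below (F for the linearized dynamics, C and c for the quadratic and linear cost terms); this is a sketch of the setup the demo assumes, with the 1/2 factor following one common convention:

$$x_{t+1} = F \begin{bmatrix} x_t \\ u_t \end{bmatrix}, \qquad \mathrm{cost}(x_t, u_t) = \tfrac{1}{2}\begin{bmatrix} x_t \\ u_t \end{bmatrix}^{\top} C \begin{bmatrix} x_t \\ u_t \end{bmatrix} + \begin{bmatrix} x_t \\ u_t \end{bmatrix}^{\top} c$$

The goal is to choose the controls over a horizon of T steps so that the summed cost is minimized.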
Key concepts illustrated:
- Linearization of the CartPole dynamics around the upright equilibrium
- Backward-pass computation of the optimal feedback gains (the recursion is sketched after this list)
- Receding horizon control (replanning at each step)
- Analytical solution of the optimal control problem
- Model-based planning vs. model-free learning
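The backward pass in the code implements the standard LQR recursion in this notation; a sketch, where the subscripts pick out the state/control blocks of $Q_t$ and the symbols match the variables Qt, qt, Kt, kt, V, v in the implementation:

$$Q_t = C + F^{\top} V_{t+1} F, \qquad q_t = c + F^{\top} v_{t+1}$$
$$K_t = -Q_{uu,t}^{-1} Q_{ux,t}, \qquad k_t = -Q_{uu,t}^{-1} q_{u,t}, \qquad u_t = K_t x_t + k_t$$
$$V_t = Q_{xx,t} + Q_{xu,t} K_t + K_t^{\top} Q_{ux,t} + K_t^{\top} Q_{uu,t} K_t$$
$$v_t = q_{x,t} + Q_{xu,t} k_t + K_t^{\top} q_{u,t} + K_t^{\top} Q_{uu,t} k_t$$

At every environment step the code re-runs this recursion over a fresh 100-step horizon starting from a zero terminal value function, then applies only the gain pair for the first step of the horizon; this is the receding horizon behavior noted in the list above.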
LQR Implementation
# Code adapted from
# https://gist.github.com/jeiting/c381e195d6153eaf657c21f691c2e456
# Code is fixed based on suggestions at
# https://www.reddit.com/r/berkeleydeeprlcourse/comments/632tuq/replanning_lqr_controller_for_cartpole_v2/dfquyqw/
import gymnasium as gym
import numpy as np

env = gym.make('CartPole-v1', render_mode='human')  # render_mode is required for rendering in gymnasium
cartpole = env.unwrapped  # physical constants live on the unwrapped CartPole environment
x = env.reset()[0]  # state: x, xdot, theta, thetadot

# Coefficients of the CartPole dynamics linearized about the upright equilibrium
gamma = 4.0 / 3.0 - cartpole.masspole / cartpole.total_mass
a = -cartpole.gravity * cartpole.masspole / (cartpole.total_mass * gamma)
b = 1.0 / cartpole.total_mass * (1 + cartpole.masspole / (cartpole.total_mass * gamma))
c = cartpole.gravity / (cartpole.length * gamma)
d = -1.0 / (cartpole.total_mass * cartpole.length * gamma)
tau = cartpole.tau  # simulation timestep
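# Discrete-time linearized dynamics: x_{t+1} ~ F @ [x, xdot, theta, thetadot, u]
# (Euler step of the linearized CartPole equations of motion)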
F = np.array([
[1, tau, 0, 0, 0],
[0, 1, tau * a, 0, tau * b],
[0, 0, 1, tau, 0],
[0, 0, tau * c, 1, tau * d],
])
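# Quadratic cost weights on [x, xdot, theta, thetadot, u]: penalize cart velocity,
# pole angle, and (most heavily) the applied force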
C = np.array([
[0, 0, 0, 0, 0],
[0, 10, 0, 0, 0],
[0, 0, 100, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 1000],
])
# Linear cost term (zero here); note this reuses the name c from the dynamics
# constants above, which are no longer needed once F has been built
c = np.zeros(5)
frame = 0
done = False
while not done:
    # Receding horizon: re-solve a T-step LQR backward pass from scratch at every step
    Ks = []
    T = 100
    V = np.zeros((4, 4))
    v = np.zeros(4)
    for _ in range(T):  # dynamics and cost are time-invariant, so only the step count matters
        Qt = C + np.matmul(F.T, np.matmul(V, F))
        qt = c + np.matmul(F.T, v)
        Quu = Qt[-1:, -1:]
        Qux = Qt[-1:, :-1]
        Qxu = Qt[:-1, -1:]
        qu = qt[-1:]
        Quu_inv = np.linalg.inv(Quu)
        Kt = -np.matmul(Quu_inv, Qux)
        kt = -np.matmul(Quu_inv, qu)
        Ks.append((Kt, kt))
        # Value function update
        V = Qt[:4, :4] + np.matmul(Qxu, Kt) + np.matmul(Kt.T, Qux) + np.matmul(Kt.T, np.matmul(Quu, Kt))
        v = qt[:4] + np.matmul(Qxu, kt) + np.matmul(Kt.T, qu).reshape(-1) + np.matmul(Kt.T, np.matmul(Quu, kt))
    # The gains appended last correspond to the first step of the horizon
    Kt, kt = Ks[-1]
    ut = np.matmul(Kt, x.reshape((1, -1)).T) + kt
    # CartPole only accepts discrete actions, so push in the direction LQR asks for
    if ut > 0.0:
        ut = cartpole.force_mag
        action = 1
    else:
        ut = -cartpole.force_mag
        action = 0
    xu = np.hstack([x, ut])
    my_guess = np.matmul(F, xu.T)  # one-step prediction from the linear model (for inspection)
    x, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
    frame += 1
    env.render()

print(frame)  # number of steps the pole stayed balanced
env.close()