LQR控制算法及其仿真实现

文章目录

1 离散有限时间系统

1.1 LQR问题描述

离散系统方程:
x t + 1 = A n × n x t + B n × m u t , x 0 = x i n i t x_{t+1}=A_{n\times n}x_{t}+B_{n\times m}u_{t}, x_0=x^{init}x t +1 ​=A n ×n ​x t ​+B n ×m ​u t ​,x 0 ​=x i n i t
问题: 选取u 0 , u 1 , … u_0,u_1,\ldots u 0 ​,u 1 ​,…使得:

  • x 0 , x 1 , … x_0,x_1,\ldots x 0 ​,x 1 ​,… 较小,获得较好的 状态控制
  • u 0 , u 1 , … u_0,u_1,\ldots u 0 ​,u 1 ​,… 较小,使用较少的 输入控制

较大的u u u可以使x x x快速趋于0。

定义二次代价函数:
J ( U ) = ∑ τ = 0 N − 1 ( x τ T Q x τ + u τ T R u τ ) + x N T Q f x N J(U)=\sum_{\tau=0}^{N-1}(x^T_{\tau}Qx_{\tau}+u^T_{\tau}Ru_{\tau})+x^T_NQ_fx_N J (U )=τ=0 ∑N −1 ​(x τT ​Q x τ​+u τT ​R u τ​)+x N T ​Q f ​x N ​
其中,U = ( u 0 , u 1 , … , u N − 1 ) U=(u_0,u_1,\ldots,u_{N-1})U =(u 0 ​,u 1 ​,…,u N −1 ​)
Q = Q T ≥ 0 , Q f = Q f T ≥ 0 , R = R T > 0 Q=Q^T\ge0,Q_f=Q^T_f\ge0,R=R^T>0 Q =Q T ≥0 ,Q f ​=Q f T ​≥0 ,R =R T >0
分别为状态代价权重矩阵,终态代价权重矩阵和输入权重矩阵。

N N N为时间范围,可有限也可无限,后续分开讨论。

  • R > 0 R>0 R >0 表示任何非零输入都会影响代价J J J

LQR问题:找到u 0 l q r , u 1 l q r , … , u N − 1 l q r u_0^{lqr},u_1^{lqr},\ldots,u_{N-1}^{lqr}u 0 l q r ​,u 1 l q r ​,…,u N −1 l q r ​ 使J J J 最小。

1.2 最小二乘法求解

令X = [ x 0 T , x 1 T , … , x N T ] T , U = [ u 0 T , u 1 T , … , u N − 1 T ] T X=[x_0^T,x_1^T,\ldots,x_N^T]^T,U=[u_0^T,u_1^T,\ldots,u_{N-1}^T]^T X =[x 0 T ​,x 1 T ​,…,x N T ​]T ,U =[u 0 T ​,u 1 T ​,…,u N −1 T ​]T,则有:
X N n × 1 = [ B 0 ⋯ 0 A B B ⋯ 0 ⋮ ⋮ ⋮ ⋮ A N − 1 B A N − 2 B ⋯ B ] N n × N m U N m × 1 + [ A ⋮ A N ] N n × n x 0 X_{Nn\times 1}= \begin{bmatrix} B&&0&&\cdots&&0\ AB && B && \cdots &&0\ \vdots && \vdots && \vdots && \vdots \ A^{N-1}B && A^{N-2}B && \cdots &&B \end{bmatrix}{Nn\times Nm}U{Nm\times 1}+ \begin{bmatrix} A\ \vdots\ A^{N} \end{bmatrix}{Nn\times n}x_0 X N n ×1 ​=⎣⎢⎢⎢⎡​B A B ⋮A N −1 B ​​0 B ⋮A N −2 B ​​⋯⋯⋮⋯​​0 0 ⋮B ​⎦⎥⎥⎥⎤​N n ×N m ​U N m ×1 ​+⎣⎢⎡​A ⋮A N ​⎦⎥⎤​N n ×n ​x 0 ​
写做:
X = G U + H x 0 X=GU+Hx_0 X =G U +H x 0 ​
则有:
J = U T R ~ U + X T Q ~ X = U T R ~ U + ( G U + H x 0 ) T Q ~ ( G U + H x 0 ) J=U^T\widetilde RU+X^T\widetilde QX=U^T\widetilde RU+(GU+Hx_0)^T\widetilde Q(GU+Hx_0)J =U T R U +X T Q ​X =U T R U +(G U +H x 0 ​)T Q ​(G U +H x 0 ​)
其中R ~ = d i a g ( R , R , ⋯ , R ⏟ N 个 ) \widetilde R=diag(\underbrace{R,R,\cdots,R}
{N\text{个}})R =d i a g (N 个R ,R ,⋯,R ​​),Q ~ = d i a g ( Q , Q , ⋯ , Q ⏟ N 个 ) \widetilde Q=diag(\underbrace{Q,Q,\cdots,Q}_{N\text{个}})Q ​=d i a g (N 个Q ,Q ,⋯,Q ​​)
J J J可以表示为关于U U U的二次型形式:
J ( U ) = U T ( R ~ + G T Q ~ G ) U + 2 x 0 T H T Q ~ G U + x 0 T H T Q ~ H x 0 J(U)=U^T(\widetilde R +G^T\widetilde QG)U+2x_0^TH^T\widetilde QGU+x_0^TH^T\widetilde QHx_0 J (U )=U T (R +G T Q ​G )U +2 x 0 T ​H T Q ​G U +x 0 T ​H T Q ​H x 0 ​
可以证明求J J J的最小值是一个凸优化问题,可直接求导得到J J J取最小值时的U U U。
d J d U = 2 ( R ~ + G T Q ~ G ) U + 2 G T Q ~ H x 0 \frac{dJ}{dU}=2(\widetilde R +G^T\widetilde QG)U+2G^T\widetilde QHx_0 d U d J ​=2 (R +G T Q ​G )U +2 G T Q ​H x 0 ​
令d J d U = 0 \frac{dJ}{dU}=0 d U d J ​=0,则有
U ∗ = − ( R ~ + G T Q ~ G ) − 1 G T Q ~ H x 0 U^*=-(\widetilde R +G^T\widetilde QG)^{-1}G^T\widetilde QHx_0 U ∗=−(R +G T Q ​G )−1 G T Q ​H x 0 ​

1.3 最小二乘法编程实现

以下为一个简单的例子:

%% Problem description:
% Suppose there is a car moving along the trajectory, and the current speed
%   is 0.1m/s. The current state of the car is that the lateral error is 0.5m,
%   and the lateral error angle is 5 degrees. Now that the car wants to eliminate
%   the lateral error by entering the angular velocity, we need to design the LQR
%   target and solve it.

% state function:
% [dx1, dx2]' = [0, v; 0, 0] * [x1, x2]' + [0, 1]' * u
% where x1 is the the lateral error, x2 is the the lateral error
%   angle, and u is the angular velocity
% Initial state: x1 = 0.5, x2 = 0.0872;
% End state: x1 = 0

%%
clear;clc;
close all;

A = [0, 0.1; 0, 0];
B = [0, 1]';
x0 = [0.5, 0.0872]';
[state_num, input_num] = size(B);

dt = 0.05;
N = 80/dt;

Ak = eye(2) + dt*A;
Bk = dt*B;

Q = eye(2);
R = 1;

% X = G * U + H * x0
G = zeros(N*state_num, N*input_num);
H = zeros(N*state_num, state_num);

tic;
for i = 1:N
    for j = 1:i
        G((state_num*(i-1)+1):(state_num*(i)), (input_num*(j-1)+1):(input_num*(j))) = Ak^(i-j)*Bk;
        H((state_num*(i-1)+1):(state_num*(i)), 1:state_num) = Ak^(i);
    end
end

H_q = diag(repmat(diag(R), N, 1)) + G'*diag(repmat(diag(Q), N, 1))*G;
f_q = x0'*H' * diag(repmat(diag(Q), N, 1)) *G;
U = - H_q \ f_q';
X = G*U+H*x0;
X1 = [x0(1); X(1:2:end-2)];
X2 = [x0(2); X(2:2:end-2)];
toc;

figure;
subplot(2, 1, 1);
hold on;
plot(X1, 'b');

subplot(2, 1, 2);
plot(X2, 'b');

LQR控制算法及其仿真实现
运行时间为 71s,讲义里说明这种解法时间复杂度为O ( N 3 n m 2 ) O(N^3nm^2)O (N 3 n m 2 ),确实效率不高。

1.4 动态规划算法

定义函数V t : R n → R V_t:\mathbf{R}^n\rightarrow\mathbf{R}V t ​:R n →R
V t ( z ) = m i n u t , ⋯ , u N − 1 ∑ τ = t N − 1 ( x τ T Q x τ + u τ T R u τ ) + x N T Q f x N V_t(z)=\mathop{min}\limits_{u_t, \cdots, u_{N-1}}\sum\limits_{\tau=t}^{N-1}(x_{\tau}^TQx_{\tau}+u_{\tau}^TRu_{\tau})+x_N^TQ_fx_N V t ​(z )=u t ​,⋯,u N −1 ​min ​τ=t ∑N −1 ​(x τT ​Q x τ​+u τT ​R u τ​)+x N T ​Q f ​x N ​
满足x t = z , x τ + 1 = A x τ + B u τ x_t=z, x_{\tau+1}=Ax_{\tau}+Bu_{\tau}x t ​=z ,x τ+1 ​=A x τ​+B u τ​。则有一下几个性质:

  • V t ( z ) V_t(z)V t ​(z )即为从t t t时刻,初始状态为z z z开始的LQR代价函数;
  • V 0 ( x 0 ) V_0(x_0)V 0 ​(x 0 ​)为系统LQR代价函数;
  • 可以证明V t V_t V t ​可以写成二次型形式,即V t ( z ) = z T P t z V_t(z)=z^TP_tz V t ​(z )=z T P t ​z,并且有P t = P t ≥ 0 P_t=P_t\geq0 P t ​=P t ​≥0;
  • P t P_t P t ​可以从t = N t=N t =N开始反向递归求解;
  • 最优控制u u u可以用P t P_t P t ​表示。

假设我们知道V t + 1 ( z ) V_{t+1}(z)V t +1 ​(z ) ,需要选取u t u_t u t ​ 使得系统代价函数最小,u t u_t u t ​ 的选取会影响u t T R u t u_t^TRu_t u t T ​R u t ​ ,以及从下一个时刻开始的代价函数V t + 1 ( A z + B u t ) V_{t+1}(Az+Bu_t)V t +1 ​(A z +B u t ​) 。
动态规划基本公式:
V t ( z ) = m i n w ( z T Q z + w T R w + V t + 1 ( A z + B w ) ) V_t(z)=\mathop{min}\limits_{w}(z^TQz+w^TRw+V_{t+1}(Az+Bw))V t ​(z )=w min ​(z T Q z +w T R w +V t +1 ​(A z +B w ))
w w w即为使得V t ( z ) V_t(z)V t ​(z )取最小值的u t u_t u t ​。
根据上面的第三条性质,有:
V t + 1 ( A z + B w ) = ( A z + B w ) T P t + 1 ( A z + B w ) V_{t+1}(Az+Bw)=(Az+Bw)^TP_{t+1}(Az+Bw)V t +1 ​(A z +B w )=(A z +B w )T P t +1 ​(A z +B w )
代入上式可得:
V t ( z ) = m i n w ( z T Q z + w T R w + ( A z + B w ) T P t + 1 ( A z + B w ) ) V_t(z)=\mathop{min}\limits_{w}(z^TQz+w^TRw+(Az+Bw)^TP_{t+1}(Az+Bw))V t ​(z )=w min ​(z T Q z +w T R w +(A z +B w )T P t +1 ​(A z +B w ))
同时也可以证明该问题为凸优化,最小值取在导数为0处。
d V t d w = 2 w T R + 2 ( A z + B w ) T P t + 1 B = 0 \frac{dV_t}{dw}=2w^TR+2(Az+Bw)^TP_{t+1}B=0 d w d V t ​​=2 w T R +2 (A z +B w )T P t +1 ​B =0
可得:
w ∗ = − ( R + B T P t + 1 B ) − 1 B T P t + 1 A z w^=-(R+B^TP_{t+1}B)^{-1}B^TP_{t+1}Az w ∗=−(R +B T P t +1 ​B )−1 B T P t +1 ​A z
则有:
V t ( z ) = z T Q z + w ∗ T R w ∗ + ( A z + B w ∗ ) T P t + 1 ( A z + B w ∗ ) = ⋯ = z T ( Q + A T P t + 1 A − A T P t + 1 B ( R + B T P t + 1 B ) − 1 B T P t + 1 A ) z = z T P t z \begin{aligned} V_t(z) &= z^TQz+w^{
T}Rw^+(Az+Bw^)^TP_{t+1}(Az+Bw^) \ &= \cdots \ &= z^T(Q+A^TP_{t+1}A-A^TP_{t+1}B(R+B^TP_{t+1}B)^{-1}B^TP_{t+1}A)z \ &= z^TP_tz \end{aligned}V t ​(z )​=z T Q z +w ∗T R w ∗+(A z +B w ∗)T P t +1 ​(A z +B w ∗)=⋯=z T (Q +A T P t +1 ​A −A T P t +1 ​B (R +B T P t +1 ​B )−1 B T P t +1 ​A )z =z T P t ​z ​
所以:
P t = Q + A T P t + 1 A − A T P t + 1 B ( R + B T P t + 1 B ) − 1 B T P t + 1 A P_t = Q+A^TP_{t+1}A-A^TP_{t+1}B(R+B^TP_{t+1}B)^{-1}B^TP_{t+1}A P t ​=Q +A T P t +1 ​A −A T P t +1 ​B (R +B T P t +1 ​B )−1 B T P t +1 ​A
同时又有P N = Q f P_N=Q_f P N ​=Q f ​,所以可以根据时间序列反向求解P N − 1 , P N − 2 , ⋯ , P 0 P_{N-1},P_{N-2},\cdots,P_0 P N −1 ​,P N −2 ​,⋯,P 0 ​,根据w ∗ w^
w ∗表达式可以顺序求解u t l q r u_t^{lqr}u t l q r ​。动态规划算法总结如下:

  1. 令P N = Q f P_N=Q_f P N ​=Q f ​;
  2. 对于t = N , ⋯ , 1 t=N,\cdots,1 t =N ,⋯,1,P t − 1 = Q + A T P t A − A T P t B ( R + B T P t B ) − 1 B T P t A P_{t-1}=Q+A^TP_{t}A-A^TP_{t}B(R+B^TP_{t}B)^{-1}B^TP_{t}A P t −1 ​=Q +A T P t ​A −A T P t ​B (R +B T P t ​B )−1 B T P t ​A
  3. 对于t = 0 , ⋯ , N − 1 t=0,\cdots,N-1 t =0 ,⋯,N −1,定义K t = ( R + B T P t + 1 B ) − 1 B T P t + 1 A K_t=(R+B^TP_{t+1}B)^{-1}B^TP_{t+1}A K t ​=(R +B T P t +1 ​B )−1 B T P t +1 ​A
  4. 对于t = 0 , ⋯ , N − 1 t=0,\cdots,N-1 t =0 ,⋯,N −1,最优控制为:u t l q r = K t x t u_t^{lqr}=K_tx_t u t l q r ​=K t ​x t ​

1.5 动态规划算法实现

问题描述:
两自由度,单输入单输出系统:
x t + 1 = [ 1 1 0 1 ] x t + [ 0 1 ] u t , y t = [ 1 0 ] x t x_{t+1}=\begin{bmatrix} 1 & 1\ 0 &1 \end{bmatrix}x_t+\begin{bmatrix}0 \1 \end{bmatrix}u_t,\ y_t=\begin{bmatrix} 1 & 0 \end{bmatrix}x_t x t +1 ​=[1 0 ​1 1 ​]x t ​+[0 1 ​]u t ​,y t ​=[1 ​0 ​]x t ​
初始状态x 0 = ( 1 , 0 ) , N = 20 x_0=(1, 0), N=20 x 0 ​=(1 ,0 ),N =2 0,权重矩阵:Q = Q f = C T C , R = ρ I Q=Q_f=C^TC, R=\rho I Q =Q f ​=C T C ,R =ρI,可取ρ 1 = 0.3 , ρ 2 = 10 \rho_1=0.3, \rho_2=10 ρ1 ​=0 .3 ,ρ2 ​=1 0。

clear;clc;
close all;

A = [1,1;0,1];
B = [0;1];
C = [1,0];
x0 = [1;0];

N = 20;
Q = C'*C;
Q_f = Q;
rho = 0.3;
R = rho*eye(size(B, 2));

P = zeros(2, 2, N);
P(:,:,N) = Q_f;

for i = N-1:-1:1
    P(:,:,i) = Q+A'*P(:,:,i+1)*A-A'*P(:,:,i+1)*B/(R+B'*P(:,:,i+1)*B)*B'*P(:,:,i+1)*A;
end

K = zeros(1, 2, N);
u = zeros(1,N);
x = zeros(2, N);
x(:, 1) = x0;
y = zeros(1, N);

for i = 1:1:N-1
   K(:, :, i) = -(R+B'*P(:,:,i+1)*B)\B'*P(:,:,i+1)*A;
   u(i) = K(:,:,i)*x(:,i);
   x(:, i+1) = A*x(:, i)+B*u(i);
   y(i) = C*x(:, i);
end

figure(1);
subplot(2,2,1);
plot(u, '-ob');
hold on;grid on;
subplot(2,2,3);
plot(y, '-ob');
hold on;grid on;
K1 = reshape(K(1,1,:), 1, N);
K2 = reshape(K(1,2,:), 1, N);
subplot(2,2,2);
hold on;grid on;
plot(K1, '-b');
ylabel('K1');
subplot(2,2,4);
hold on;grid on;
plot(K2, '-b');
ylabel('K2');

%%
rho = 10;
R = rho*eye(size(B, 2));

P = zeros(2, 2, N);
P(:,:,N) = Q_f;

for i = N-1:-1:1
    P(:,:,i) = Q+A'*P(:,:,i+1)*A-A'*P(:,:,i+1)*B/(R+B'*P(:,:,i+1)*B)*B'*P(:,:,i+1)*A;
end

K = zeros(1, 2, N);
u = zeros(1,N);
x = zeros(2, N);
x(:, 1) = x0;
y = zeros(1, N);

for i = 1:1:N-1
   K(:, :, i) = -(R+B'*P(:,:,i+1)*B)\B'*P(:,:,i+1)*A;
   u(i) = K(:,:,i)*x(:,i);
   x(:, i+1) = A*x(:, i)+B*u(i);
   y(i) = C*x(:, i);
end

figure(1);
subplot(2,2,1);
plot(u, '-*r');
ylabel('u');
hold on;grid on;
legend('\rho = 0.3', '\rho = 10');
subplot(2,2,3);
plot(y, '-*r');
ylabel('y');
hold on;grid on;
legend('\rho = 0.3', '\rho = 10');

K1 = reshape(K(1,1,:), 1, N);
K2 = reshape(K(1,2,:), 1, N);
subplot(2,2,2);
hold on;grid on;
plot(K1, '-r');
ylabel('K1');
legend('\rho = 0.3', '\rho = 10');
subplot(2,2,4);
hold on;grid on;
plot(K2, '-r');
ylabel('K2');
legend('\rho = 0.3', '\rho = 10');

运行结果如下:

LQR控制算法及其仿真实现
从上图结果可以发现,K t K_t K t ​从t = 0 t=0 t =0开始一段时间内为恒定值, 或者说P t P_t P t ​ 从N N N 反向开始后很快就能收敛到恒定值。
即有:
P s s = Q + A T P s s A − A T P s s B ( R + B T P s s B ) − 1 B T P s s A P_{ss} = Q+A^TP_{ss}A-A^TP_{ss}B(R+B^TP_{ss}B)^{-1}B^TP_{ss}A P s s ​=Q +A T P s s ​A −A T P s s ​B (R +B T P s s ​B )−1 B T P s s ​A
同时说明,对于不是很接近最终时刻的t t t时刻,LQR控制可以看作是一个线性定常反馈系统:
u t = K s s x t , K s s = − ( R + B T P s s B ) − 1 B T P s s u_t = K_{ss}x_t, K_{ss} = -(R+B^TP_{ss}B)^{-1}B^TP_{ss}u t ​=K s s ​x t ​,K s s ​=−(R +B T P s s ​B )−1 B T P s s ​
这在实际中经常用到。
另外讲义中也提到,最终态的权重矩阵对反馈增益没有影响,即P t P_t P t ​的初始值对其收敛值没有影响:
LQR控制算法及其仿真实现
另外用DP方法求解第一个问题耗时不超过0.02s。
LQR控制算法及其仿真实现

2 拉格朗日乘子法求解LQR

2.1 一些实用的矩阵特征

(1)
Z ( I + Z ) − 1 = I − ( I + Z ) − 1 Z(I+Z)^{-1}=I-(I+Z)^{-1}Z (I +Z )−1 =I −(I +Z )−1
其中( I + Z ) (I+Z)(I +Z )可逆。证明右边同乘( I + Z ) (I+Z)(I +Z )即可。
(2)
( I + X Y ) − 1 = I − X ( I + Y X ) − 1 Y (I+XY)^{-1}=I-X(I+YX)^{-1}Y (I +X Y )−1 =I −X (I +Y X )−1 Y
证明:
( I − X ( I + Y X ) − 1 Y ) ( I + X Y ) = I + X Y − X ( I + Y X ) − 1 Y ( I + X Y ) = I + X Y − X ( I + Y X ) − 1 ( I + Y X ) Y = I + X Y − X Y = I \begin{aligned}(I-X(I+YX)^{-1}Y)(I+XY) &= I+XY-X(I+YX)^{-1}Y(I+XY)\ &= I+XY-X(I+YX)^{-1}(I+YX)Y\ &= I+XY-XY=I \end{aligned}(I −X (I +Y X )−1 Y )(I +X Y )​=I +X Y −X (I +Y X )−1 Y (I +X Y )=I +X Y −X (I +Y X )−1 (I +Y X )Y =I +X Y −X Y =I ​
(3)
Y ( I + X Y ) − 1 = ( I + Y X ) − 1 Y Y(I+XY)^{-1}=(I+YX)^{-1}Y Y (I +X Y )−1 =(I +Y X )−1 Y
证明左乘( I + Y X ) (I+YX)(I +Y X )右乘( I + X Y ) (I+XY)(I +X Y )即可。速记:左边Y Y Y移进去,右边Y Y Y移出来。
(4)
( I + X Z − 1 Y ) − 1 = I − X ( Z + Y X ) − 1 Y (I+XZ^{-1}Y)^{-1}=I-X(Z+YX)^{-1}Y (I +X Z −1 Y )−1 =I −X (Z +Y X )−1 Y
证明直接使用公式(2)即可。
(5)
( A + B C ) − 1 = A − 1 − A − 1 B ( I + C A − 1 B ) − 1 C A − 1 (A+BC)^{-1}=A^{-1}-A^{-1}B(I+CA^{-1}B)^{-1}CA^{-1}(A +B C )−1 =A −1 −A −1 B (I +C A −1 B )−1 C A −1
证明:
( A + B C ) − 1 = ( A ( I + A − 1 B C ) ) − 1 = ( I + A − 1 B C ) − 1 A − 1 = ( I − A − 1 B ( I + C A − 1 B ) − 1 C ) A − 1 ( 使 用 公 式 ( 2 ) ) = A − 1 − A − 1 B ( I + C A − 1 B ) − 1 C A − 1 \begin{aligned}(A+BC)^{-1}&=(A(I+A^{-1}BC))^{-1}\ &=(I+A^{-1}BC)^{-1}A^{-1}\ &=(I-A^{-1}B(I+CA^{-1}B)^{-1}C)A^{-1} (使用公式(2))\ &=A^{-1}-A^{-1}B(I+CA^{-1}B)^{-1}CA^{-1} \end{aligned}(A +B C )−1 ​=(A (I +A −1 B C ))−1 =(I +A −1 B C )−1 A −1 =(I −A −1 B (I +C A −1 B )−1 C )A −1 (使用公式(2 ))=A −1 −A −1 B (I +C A −1 B )−1 C A −1 ​
(6)根据之前关于P t P_t P t ​的表达式可以进行化简:
P t = Q + A T P t + 1 A − A T P t + 1 B ( R + B T P t + 1 B ) − 1 B T P t + 1 A = Q + A T P t + 1 ( I − B ( R + B T P t + 1 B ) − 1 B T P t + 1 ) A = Q + A T P t + 1 ( I − B ( ( I + B T P t + 1 B R − 1 ) R ) − 1 B T P t + 1 ) A = Q + A T P t + 1 ( I − B R − 1 ( I + B T P t + 1 B R − 1 ) − 1 B T P t + 1 ) A = Q + A T P t + 1 ( I + B R − 1 B T P t + 1 ) − 1 A ( 使 用 公 式 ( 2 ) ) = Q + A T ( I + P t + 1 B R − 1 B T ) − 1 P t + 1 A \begin{aligned}P_t &= Q+A^TP_{t+1}A-A^TP_{t+1}B(R+B^TP_{t+1}B)^{-1}B^TP_{t+1}A\ &=Q+A^TP_{t+1}(I-B(R+B^TP_{t+1}B)^{-1}B^TP_{t+1})A\ &=Q+A^TP_{t+1}(I-B((I+B^TP_{t+1}BR^{-1})R)^{-1}B^TP_{t+1})A\ &=Q+A^TP_{t+1}(I-BR^{-1}(I+B^TP_{t+1}BR^{-1})^{-1}B^TP_{t+1})A\ &=Q+A^TP_{t+1}(I+BR^{-1}B^TP_{t+1})^{-1}A(使用公式(2))\ &=Q+A^T(I+P_{t+1}BR^{-1}B^T)^{-1}P_{t+1}A \end{aligned}P t ​​=Q +A T P t +1 ​A −A T P t +1 ​B (R +B T P t +1 ​B )−1 B T P t +1 ​A =Q +A T P t +1 ​(I −B (R +B T P t +1 ​B )−1 B T P t +1 ​)A =Q +A T P t +1 ​(I −B ((I +B T P t +1 ​B R −1 )R )−1 B T P t +1 ​)A =Q +A T P t +1 ​(I −B R −1 (I +B T P t +1 ​B R −1 )−1 B T P t +1 ​)A =Q +A T P t +1 ​(I +B R −1 B T P t +1 ​)−1 A (使用公式(2 ))=Q +A T (I +P t +1 ​B R −1 B T )−1 P t +1 ​A ​

2.2 线性约束最优化问题

m i n f ( x ) s . t . : F x = g \begin{aligned}min\enspace &f(x)\ s.t.:\enspace &Fx=g \end{aligned}m i n s .t .:​f (x )F x =g ​

  • f : R n → R f:\mathbf R^n \rightarrow \mathbf {R}f :R n →R
  • F ∈ R m × n F\in \mathbf R^{m\times n}F ∈R m ×n

拉格朗日表达式:
L ( x , λ ) = f ( x ) + λ T ( g − F x ) L(x,\lambda)=f(x)+\lambda ^T(g-Fx)L (x ,λ)=f (x )+λT (g −F x )
其中,λ \lambda λ为拉格朗日乘子。若x x x是最优解,则有:
∇ x L = ∇ f ( x ) − F T λ = 0 , ∇ λ L = g − F x = 0 \nabla xL=\nabla f(x)-F^T\lambda = 0, \enspace \nabla \lambda L=g-Fx=0 ∇x ​L =∇f (x )−F T λ=0 ,∇λ​L =g −F x =0
即∇ f ( x ) = F T λ \nabla f(x)=F^T\lambda ∇f (x )=F T λ

LQR控制算法及其仿真实现
  • 假设当前位置为x x x,为可行点,即F x = g Fx=g F x =g;
  • 考虑沿v v v方向移动很小距离h h h,到达x + h v x+hv x +h v位置;
  • 为了移动后仍为可行点,则需F ( x + h v ) = g + h F v = g F(x+hv)=g+hFv=g F (x +h v )=g +h F v =g,即F v = 0 Fv=0 F v =0,所以v ∈ N ( F ) v\in \Nu (F)v ∈N (F ),称为可行方向;
  • 需要移动后得到更小目标函数:f ( x + h v ) ≈ f ( x ) + h ∇ f ( x ) T v < f ( x ) f(x+hv)\approx f(x)+h\nabla f(x)^Tv

当∇ f ( x ) T v < 0 \nabla f(x)^Tv时,为目标函数下降方向。当存在下降方向时,x x x不为最优解,所以当x x x为最优解时,应满足∇ f ( x ) T v = 0 \nabla f(x)^Tv=0 ∇f (x )T v =0

; 2.3 LQR约束最优化求解

把LQR问题写成最优化问题:
m i n J = 1 2 ∑ t = 0 N − 1 ( x t T Q x t + u t T R u t ) + 1 2 x N T Q f x N s . t . x t + 1 = A x t + B u t , t = 0 , 1 , ⋯ , N − 1 min \enspace J=\frac{1}{2}\sum_{t=0}^{N-1}(x_t^TQx_t+u_t^TRu_t)+\frac{1}{2}x_N^TQ_fx_N\s.t. \enspace x_{t+1}=Ax_t+Bu_t, \enspace t=0,1,\cdots,N-1 m i n J =2 1 ​t =0 ∑N −1 ​(x t T ​Q x t ​+u t T ​R u t ​)+2 1 ​x N T ​Q f ​x N ​s .t .x t +1 ​=A x t ​+B u t ​,t =0 ,1 ,⋯,N −1
则有拉格朗日表达式:
L = J + ∑ t = 0 N − 1 λ t + 1 ( A x t + B u t − x t + 1 ) L=J+\sum_{t=0}^{N-1}\lambda {t+1}(Ax_t+Bu_t-x{t+1})L =J +t =0 ∑N −1 ​λt +1 ​(A x t ​+B u t ​−x t +1 ​)
则有:
∇ u t L = R u t + B T λ t + 1 = 0 , u t = − R − 1 B T λ t + 1 ∇ x t L = Q x t + A T λ t + 1 − λ t = 0 , λ t = A T λ t + 1 + Q x t ∇ x N L = Q f x N − λ N = 0 , λ N = Q f x N \nabla {u_t}L=Ru_t+B^T\lambda{t+1}=0,\enspace u_t=-R^{-1}B^T\lambda {t+1}\ \nabla {x_t}L=Qx_t+A^T\lambda_{t+1}-\lambda_t=0, \enspace \lambda t=A^T\lambda{t+1}+Qx_t\ \nabla {x_N}L=Q_fx_N-\lambda_N=0, \enspace \lambda_N=Q_fx_N ∇u t ​​L =R u t ​+B T λt +1 ​=0 ,u t ​=−R −1 B T λt +1 ​∇x t ​​L =Q x t ​+A T λt +1 ​−λt ​=0 ,λt ​=A T λt +1 ​+Q x t ​∇x N ​​L =Q f ​x N ​−λN ​=0 ,λN ​=Q f ​x N ​
对于原系统有:
x t + 1 = A x t + B u t , x 0 = x i n i t x
{t+1}=Ax_t+Bu_t, \enspace x_0=x^{init}x t +1 ​=A x t ​+B u t ​,x 0 ​=x i n i t
迭代计算是从0时刻向后进行,初始条件为起始状态。
现在有:
λ t = A T λ t + 1 + Q x t , λ N = Q f x N \lambda t=A^T\lambda{t+1}+Qx_t, \enspace \lambda _N=Q_fx_N λt ​=A T λt +1 ​+Q x t ​,λN ​=Q f ​x N ​
迭代计算从N N N时刻开始向前进行,初始条件为最终状态。
所以称λ \lambda λ为伴随状态,上式也称为伴随系统的状态方程。

可以用归纳法证明 λ t = P t x t \lambda_t=P_tx_t λt ​=P t ​x t ​:
对于t = N t=N t =N,有λ N = Q f x N \lambda N=Q_fx_N λN ​=Q f ​x N ​,现在假设λ t + 1 = P t + 1 x t + 1 \lambda{t+1}=P_{t+1}x_{t+1}λt +1 ​=P t +1 ​x t +1 ​成立,证明λ t = P t x t \lambda_t=P_tx_t λt ​=P t ​x t ​:
有:λ t + 1 = P t + 1 ( A x t + B u t ) = P t + 1 ( A x t − B R − 1 B T λ t + 1 ) \lambda_{t+1}=P_{t+1}(Ax_t+Bu_t)=P_{t+1}(Ax_t-BR^{-1}B^T\lambda {t+1})λt +1 ​=P t +1 ​(A x t ​+B u t ​)=P t +1 ​(A x t ​−B R −1 B T λt +1 ​)
所以:
λ t + 1 = ( I + P t + 1 B R − 1 B T ) − 1 P t + 1 A x t \lambda
{t+1}=(I+P_{t+1}BR^{-1}B^T)^{-1}P_{t+1}Ax_t λt +1 ​=(I +P t +1 ​B R −1 B T )−1 P t +1 ​A x t ​
所以:
λ t = A T λ t + 1 + Q x t = A T ( I + P t + 1 B R − 1 B T ) − 1 P t + 1 A x t + Q x t = P t x t \lambda t=A^T\lambda{t+1}+Qx_t=A^T(I+P_{t+1}BR^{-1}B^T)^{-1}P_{t+1}Ax_t+Qx_t=P_tx_t λt ​=A T λt +1 ​+Q x t ​=A T (I +P t +1 ​B R −1 B T )−1 P t +1 ​A x t ​+Q x t ​=P t ​x t ​
其中P t = Q + A T ( I + P t + 1 B R − 1 B T ) − 1 P t + 1 A P_t=Q+A^T(I+P_{t+1}BR^{-1}B^T)^{-1}P_{t+1}A P t ​=Q +A T (I +P t +1 ​B R −1 B T )−1 P t +1 ​A与之前化简后结果一致。

持续更新中…

reference

Original: https://blog.csdn.net/qq_42239519/article/details/122588697
Author: LaplaceVan
Title: LQR控制算法及其仿真实现

原创文章受到原创版权保护。转载请注明出处:https://www.johngo689.com/559142/

转载文章受原作者版权保护。转载请注明原作者出处!

(0)

大家都在看

亲爱的 Coder【最近整理,可免费获取】👉 最新必读书单  | 👏 面试题下载  | 🌎 免费的AI知识星球