iLQR算法公式推导
背景
在之前的文章《离散系统的LQR推导》中,我们讨论了LQR在线性系统中的应用,我们讨论了该算法在线性时变系统中的一般情况,也讨论了在线性时不变系统中的特殊情况;以及**有限时域(Finite Horizon)和无限时域(Infinite Horizon)**这两种情况。不过在实际的控制系统中,大部分系统的模型都是非线性的,传统的LQR只能在系统的当前状态下对系统进行线性化近似,这种近似并不能保证基于线性假设得到的最优控制率实际上真的是最优的。
由于传统的LQR存在上述局限性,有人对它进行了改进,提出了iLQR算法,它的全称是迭代线性二次调节器(iterative Linear Quadratic Regulator),下面我们对该算法进行详细介绍。
算法概览
iLQR算法的关键思想是,在每次迭代中,所有非线性约束和目标都使用一阶或二阶泰勒级数展开来近似,因此,现在对标称轨迹的偏差进行操作的近似函数可以使用离散LQR来求解。最优反馈控制策略在“反向传播”阶段计算,因为和LQR算法一样,动态规划(Dynamic Programming)的步骤是从轨迹的尾部开始。然后在“正向传播”期间将“反向传播”期间得到的最优控制策略产生的控制输入偏差应用于标称轨迹的控制输入量,并使用更新之后的输入量来正向模拟更新轨迹(rollout)。这个过程在每次迭代中重复,直到收敛为止。
非线性系统优化问题
考虑下述非线性系统的优化问题:
代价(Cost):
J ( x 0 , U ) = ℓ f ( x N ) + ∑ k = 0 N − 1 ℓ ( x k , u k ) (1) J(x_0,U)=\ell_f(x_N)+\sum_{k=0}^{N-1}\ell(x_k,u_k)\tag{1} J(x0,U)=ℓf(xN)+k=0∑N−1ℓ(xk,uk)(1)
其中 U = [ u 0 T , ⋯ , u N − 1 T ] T U=[u_0^T,\cdots,u_{N-1}^T]^T U=[u0T,⋯,uN−1T]T ,
状态转移方程(State Transition Equation):
x k + 1 = f ( x k , u k ) (2) x_{k+1}=f(x_k,u_k)\tag{2} xk+1=f(xk,uk)(2)
根据《离散系统的LQR推导》的中的公式(8)中的相关推导,我们可以得到cost to go:
J ^ i = min u i { ℓ ( x i , u i ) + J ^ i + 1 } J ^ N = J N = ℓ f ( x N ) (3) \begin{aligned} \hat J_i &= \underset{u_i}\min{\{\ell(x_i,u_i)+\hat J_{i+1}\}}\\ \hat J_N &= J_N = \ell_f(x_N) \end{aligned} \tag{3} J^iJ^N=uimin{ℓ(xi,ui)+J^i+1}=JN=ℓf(xN)(3)
根据上述公式,我们可以发现 J ^ N \hat J_N J^N 是关于 x N x_N xN 的函数,即 J ^ N ( x N ) = ℓ N ( x N ) \hat J_N(x_N)=\ell_N(x_N) J^N(xN)=ℓN(xN) ,下面再看 J N − 1 J_{N-1} JN−1 :
J ^ N − 1 = min u N − 1 { ℓ ( x N − 1 , u N − 1 ) + J ^ N ( x N ) } = min u N − 1 { ℓ ( x N − 1 , u N − 1 ) + J ^ N ( f ( x N − 1 , u N − 1 ) ) } = min u N − 1 J ~ N − 1 ( x N − 1 , u N − 1 ) = J ^ N − 1 ( x N − 1 ) (4) \begin{aligned} \hat J_{N-1} &= \underset{u_{N-1}}\min{\{\ell(x_{N-1},u_{N-1})+\hat J_N(x_N)\}}\\ &= \underset{u_{N-1}}\min{\{\ell(x_{N-1},u_{N-1})+\hat J_N(f(x_{N-1},u_{N-1}))\}}\\ &=\underset{u_{N-1}}\min{\tilde J_{N-1}(x_{N-1},u_{N-1})}\\ &=\hat J_{N-1}(x_{N-1}) \end{aligned} \tag{4} J^N−1=uN−1min{ℓ(xN−1,uN−1)+J^N(xN)}=uN−1min{ℓ(xN−1,uN−1)+J^N(f(xN−1,uN−1))}=uN−1minJ~N−1(xN−1,uN−1)=J^N−1(xN−1)(4)
根据上式,可知 J ^ ( x N − 1 ) \hat J(x_{N-1}) J^(xN−1) 是关于 x N − 1 x_{N-1} xN−1 的函数,因为在对 u N − 1 u_{N-1} uN−1 做优化之后,会得到一个最优控制率 u ^ N − 1 ( x N − 1 ) \hat u_{N-1}(x_{N-1}) u^N−1(xN−1) 是关于, x N − 1 x_{N-1} xN−1 的函数。如果我们将最优控制率 u ^ N − 1 ( x N − 1 ) \hat u_{N-1}(x_{N-1}) u^N−1(xN−1) 代入到公式(4),得到的最优代价 J ^ N − 1 ( x N − 1 ) \hat J_{N-1}(x_{N-1}) J^N−1(xN−1) 也是 x N − 1 x_{N-1} xN−1 的函数,因此根据公式(3)的递归,可知:
J ^ i = min u i { ℓ ( x i , u i ) + J ^ i + 1 } = min u i J ~ ( x i , u i ) = J ^ i ( x i ) (5) \begin{aligned} \hat J_i &= \underset{u_i}\min{\{\ell(x_i,u_i)+\hat J_{i+1}\}}\\ &=\underset{u_i}\min{\tilde{J}(x_i,u_i)}\\ &=\hat J_i(x_i) \end{aligned} \tag{5} J^i=uimin{ℓ(xi,ui)+J^i+1}=uiminJ~(xi,ui)=J^i(xi)(5)
即第 i i i 个最优代价(cost to go) J ^ i ( x i ) \hat J_i(x_i) J^i(xi) 是 x i x_i xi 的函数,还未对 u i u_i ui 进行优化的代价 J ~ ( x i , u i ) \tilde{J}(x_i,u_i) J~(xi,ui) 是 x i x_i xi 和 u i u_i ui 的函数。
使用泰勒展开将系统局部线性化
为了线性化非线性系统以及非线性代价函数,我们需要对 f ( x k , u k ) f(x_k,u_k) f(xk,uk) 、 J ^ i ( x i ) \hat J_i(x_i) J^i(xi) 和 J ~ ( x i , u i ) \tilde{J}(x_i,u_i) J~(xi,ui) 进行泰勒展开,我们首先对 f ( x k , u k ) f(x_k,u_k) f(xk,uk) 进行泰勒展开,根据泰勒展开的公式:
f ( x i , u i ) + δ f ( x i , u i ) = f ( x i + δ x i , u i + δ u i ) ≈ f ( x i , u i ) + ∂ f ( x i , u i ) ∂ x i δ x i + ∂ f ( x i , u i ) ∂ u i δ u i ≜ f ( x i , u i ) + A i δ x i + B i δ u i (6) \begin{aligned} f(x_i,u_i)+\delta f(x_i,u_i)&=f(x_i+\delta x_i,u_i+\delta u_i)\\ &\approx f(x_i,u_i)+\frac{\partial f(x_i,u_i)}{\partial x_i}\delta x_i+\frac{\partial f(x_i,u_i)}{\partial u_i}\delta u_i\\ &\triangleq f(x_i,u_i)+A_i\delta x_i+B_i\delta u_i \end{aligned} \tag{6} f(xi,ui)+δf(xi,ui)=f(xi+δxi,ui+δui)≈f(xi,ui)+∂xi∂f(xi,ui)δxi+∂ui∂f(xi,ui)δui≜f(xi,ui)+Aiδxi+Biδui(6)
下面对 J ^ i ( x i ) \hat J_i(x_i) J^i(xi) 进行泰勒展开,根据泰勒展开的公式:
J ^ i ( x i ) + δ J ^ i ( x i ) = J ^ i ( x i + δ x i ) ≈ J ^ i ( x i ) + ∂ J ^ i ( x i ) ∂ x i δ x i + 1 2 δ x i T ∂ 2 J ^ i ( x i ) ∂ x i 2 δ x i ≜ J ^ i ( x i ) + p i T δ x i + 1 2 δ x i T P i δ x i (7) \begin{aligned} \hat J_i(x_i)+\delta \hat J_i(x_i)=\hat J_i(x_i+\delta x_i)&\approx\hat J_i(x_i)+\frac{\partial \hat J_i(x_i)}{\partial x_i}\delta x_i+\frac{1}{2}\delta x_i^T\frac{\partial^2 \hat J_i(x_i)}{\partial x_i^2}\delta x_i\\ &\triangleq \hat J_i(x_i)+p_i^T\delta x_i+\frac{1}{2}\delta x_i^TP_i\delta x_i \end{aligned} \tag{7} J^i(xi)+δJ^i(xi)=J^i(xi+δxi)≈J^i(xi)+∂xi∂J^i(xi)δxi+21δxiT∂xi2∂2J^i(xi)δxi≜J^i(xi)+piTδxi+21δxiTPiδxi(7)
上式定义 p i T ≜ ∂ J ^ i ( x i ) ∂ x i p_i^T\triangleq\frac{\partial \hat J_i(x_i)}{\partial x_i} piT≜∂xi∂J^i(xi) , P i ≜ ∂ 2 J ^ i ( x i ) ∂ x i 2 P_i\triangleq\frac{\partial^2 \hat J_i(x_i)}{\partial x_i^2} Pi≜∂xi2∂2J^i(xi) ,下面对 J ~ i ( x i , u i ) \tilde{J}_i(x_i,u_i) J~i(xi,ui) 进行泰勒展开,根据泰勒展开的公式:
J ~ i ( x i , u i ) + δ J ~ i ( x i , u i ) = J ~ i ( x i + δ x i , u i + δ u i ) ≈ J ~ i ( x i , u i ) + ∂ J ~ i ( x i , u i ) ∂ x i δ x i + ∂ J ~ i ( x i , u i ) ∂ u i δ u i + 1 2 δ x i T ∂ 2 J ~ i ( x i , u i ) ∂ x i 2 δ x i + 1 2 δ u i T ∂ 2 J ~ i ( x i , u i ) ∂ u i 2 δ u i + 1 2 δ x i T ∂ 2 J ~ i ( x i , u i ) ∂ x i ∂ u i δ u i + 1 2 δ u i T ∂ 2 J ~ i ( x i , u i ) ∂ u i ∂ x i δ x i ≜ J ~ i ( x i , u i ) + Q x i T δ x i + Q u i T δ u i + 1 2 δ x i T Q x i 2 δ x i + 1 2 δ u i T Q u i 2 δ u i + 1 2 δ x i T Q x i u i δ u i + 1 2 δ u i T Q u i x i T δ x i (8) \begin{aligned} \tilde{J}_i(x_i,u_i)+\delta \tilde{J}_i(x_i,u_i) &=\tilde{J}_i(x_i+\delta x_i,u_i+\delta u_i)\\ &\approx\tilde{J}_i(x_i,u_i)+\frac{\partial \tilde{J}_i(x_i,u_i)}{\partial x_i}\delta x_i+\frac{\partial \tilde{J}_i(x_i,u_i)}{\partial u_i}\delta u_i\\ &+\frac{1}{2}\delta x_i^T\frac{\partial^2 \tilde{J}_i(x_i,u_i)}{\partial x_i^2}\delta x_i+\frac{1}{2}\delta u_i^T\frac{\partial^2 \tilde{J}_i(x_i,u_i)}{\partial u_i^2}\delta u_i\\ &+\frac{1}{2}\delta x_i^T\frac{\partial^2 \tilde{J}_i(x_i,u_i)}{\partial x_i\partial u_i}\delta u_i+\frac{1}{2}\delta u_i^T\frac{\partial^2 \tilde{J}_i(x_i,u_i)}{\partial u_i\partial x_i}\delta x_i\\ &\triangleq \tilde{J}_i(x_i,u_i)+Q_{x_i}^T\delta x_i+Q_{u_i}^T\delta u_i+\frac{1}{2}\delta x_i^TQ_{x_i^2}\delta x_i+\frac{1}{2}\delta u_i^TQ_{u_i^2}\delta u_i\\ &+\frac{1}{2}\delta x_i^TQ_{x_iu_i}\delta u_i+\frac{1}{2}\delta u_i^TQ_{u_ix_i}^T\delta x_i \end{aligned} \tag{8} J~i(xi,ui)+δJ~i(xi,ui)=J~i(xi+δxi,ui+δui)≈J~i(xi,ui)+∂xi∂J~i(xi,ui)δxi+∂ui∂J~i(xi,ui)δui+21δxiT∂xi2∂2J~i(xi,ui)δxi+21δuiT∂ui2∂2J~i(xi,ui)δui+21δxiT∂xi∂ui∂2J~i(xi,ui)δui+21δuiT∂ui∂xi∂2J~i(xi,ui)δxi≜J~i(xi,ui)+QxiTδxi+QuiTδui+21δxiTQxi2δxi+21δuiTQui2δui+21δxiTQxiuiδui+21δuiTQuixiTδxi(8)
于是有:
δ J ~ i ( x i , u i ) = 1 2 [ δ x i δ u i ] T [ Q x i 2 Q x i u i Q u i x i Q u i 2 ] [ δ x i δ u i ] + [ Q x i Q u i ] T [ δ x i δ u i ] (9) \begin{aligned} \delta \tilde{J}_i(x_i,u_i)&=\frac{1}{2} \begin{bmatrix} \delta x_i \\ \delta u_i \end{bmatrix}^T \begin{bmatrix} Q_{x_i^2} & Q_{x_iu_i}\\ Q_{u_ix_i} & Q_{u_i^2} \end{bmatrix} \begin{bmatrix} \delta x_i \\ \delta u_i \end{bmatrix}+ \begin{bmatrix} Q_{x_i} \\ Q_{u_i} \end{bmatrix}^T \begin{bmatrix} \delta x_i \\ \delta u_i \end{bmatrix}\\ \end{aligned} \tag{9} δJ~i(xi,ui)=21[δxiδui]T[Qxi2QuixiQxiuiQui2][δxiδui]+[QxiQui]T[δxiδui](9)
根据公式(5)的定义,有:
J ~ i ( x i , u i ) + δ J ~ i ( x i , u i ) = J ~ i ( x i + δ x i , u i + δ u i ) = ℓ ( x i + δ x i , u i + δ u i ) ⏟ 10.1 + J ^ i + 1 ( f ( x i + δ x i , u i + δ u i ) ) ⏟ 10.2 (10) \begin{aligned} \tilde{J}_i(x_i,u_i)+\delta \tilde{J}_i(x_i,u_i) &=\tilde{J}_i(x_i+\delta x_i,u_i+\delta u_i)\\ &=\underbrace{\ell(x_i+\delta x_i,u_i+\delta u_i)}_{10.1}+\underbrace{\hat J_{i+1}(f(x_i+\delta x_i,u_i+\delta u_i))}_{10.2}\\ \end{aligned} \tag{10} J~i(xi,ui)+δJ~i(xi,ui)=J~i(xi+δxi,ui+δui)=10.1 ℓ(xi+δxi,ui+δui)+10.2 J^i+1(f(xi+δxi,ui+δui))(10)
首先对上述的公式(10.1)进行泰勒展开,根据泰勒展开的公式:
ℓ ( x i + δ x i , u i + δ u i ) = ℓ ( x i , u i ) + δ ℓ ( x i , u i ) ≈ ℓ ( x i , u i ) + ∂ ℓ ( x i , u i ) ∂ x i δ x i + ∂ ℓ ( x i , u i ) ∂ u i δ u i + 1 2 δ x i T ∂ 2 ℓ ( x i , u i ) ∂ x i 2 δ x i + 1 2 δ u i T ∂ 2 ℓ ( x i , u i ) ∂ u i 2 δ u i + 1 2 δ x i T ∂ 2 ℓ ( x i , u i ) ∂ x i ∂ u i δ u i + 1 2 δ u i T ∂ 2 ℓ ( x i , u i ) ∂ u i ∂ x i δ x i ≜ ℓ ( x i , u i ) + ℓ x i T δ x i + ℓ u i T δ u i + 1 2 δ x i T ℓ x i 2 δ x i + 1 2 δ u i T ℓ u i 2 δ u i + 1 2 δ x i T ℓ x i u i δ u i + 1 2 δ u i T ℓ u i x i δ x i (10.1) \begin{aligned} \ell(x_i+\delta x_i,u_i+\delta u_i)&=\ell(x_i,u_i)+\delta \ell(x_i,u_i)\\ &\approx\ell(x_i,u_i)+\frac{\partial \ell(x_i,u_i)}{\partial x_i}\delta x_i+\frac{\partial \ell(x_i,u_i)}{\partial u_i}\delta u_i\\ &+\frac{1}{2}\delta x_i^T\frac{\partial ^ 2\ell(x_i,u_i)}{\partial x_i^2}\delta x_i+\frac{1}{2}\delta u_i^T\frac{\partial ^ 2\ell(x_i,u_i)}{\partial u_i^2}\delta u_i\\ &+\frac{1}{2}\delta x_i^T\frac{\partial ^ 2\ell(x_i,u_i)}{\partial x_i\partial u_i}\delta u_i+\frac{1}{2}\delta u_i^T\frac{\partial ^ 2\ell(x_i,u_i)}{\partial u_i\partial x_i}\delta x_i\\ &\triangleq\ell(x_i,u_i)+\ell_{x_i}^T\delta x_i+\ell_{u_i}^T\delta u_i+\frac{1}{2}\delta x_i^T\ell_{x_i^2}\delta x_i+\frac{1}{2}\delta u_i^T\ell_{u_i^2}\delta u_i\\ &+\frac{1}{2}\delta x_i^T\ell_{x_iu_i}\delta u_i+\frac{1}{2}\delta u_i^T\ell_{u_ix_i}\delta x_i\\ \end{aligned} \tag{10.1} ℓ(xi+δxi,ui+δui)=ℓ(xi,ui)+δℓ(xi,ui)≈ℓ(xi,ui)+∂xi∂ℓ(xi,ui)δxi+∂ui∂ℓ(xi,ui)δui+21δxiT∂xi2∂2ℓ(xi,ui)δxi+21δuiT∂ui2∂2ℓ(xi,ui)δui+21δxiT∂xi∂ui∂2ℓ(xi,ui)δui+21δuiT∂ui∂xi∂2ℓ(xi,ui)δxi≜ℓ(xi,ui)+ℓxiTδxi+ℓuiTδui+21δxiTℓxi2δxi+21δuiTℓui2δui+21δxiTℓxiuiδui+21δuiTℓuixiδxi(10.1)
然后对公式(9.2)进行泰勒展开,根据泰勒展开的公式:
J ^ i + 1 ( f ( x i + δ x i , u i + δ u i ) ) = J ^ i + 1 ( f ( x i , u i ) + δ f ( x i , u i ) ) ≈ J ^ i + 1 ( f ( x i , u i ) ) + ∂ J ^ i + 1 ( x i + 1 ) ∂ x i + 1 δ f ( x i , u i ) + 1 2 δ f ( x i , u i ) T ∂ 2 J ^ i + 1 ( x i + 1 ) ∂ x i + 1 2 δ f ( x i , u i ) = J ^ i + 1 ( f ( x i , u i ) ) + p i + 1 T ( A i δ x i + B i δ u i ) + 1 2 ( A i δ x i + B i δ u i ) T P i + 1 ( A i δ x i + B i δ u i ) = J ^ i + 1 ( f ( x i , u i ) ) + p i + 1 T A i δ x i + p i + 1 T B i δ u i + 1 2 δ x i T A i T P i + 1 A i δ x i + 1 2 δ x i T A i T P i + 1 B i δ u i + 1 2 δ u i T B i T P i + 1 A i δ x i + 1 2 δ u i T B i T P i + 1 B i δ u i (10.2) \begin{aligned} \hat J_{i+1}(f(x_i+\delta x_i,u_i+\delta u_i))&=\hat J_{i+1}(f(x_i,u_i)+\delta f(x_i,u_i))\\ &\approx \hat J_{i+1}(f(x_i,u_i))+\frac{\partial \hat J_{i+1}(x_{i+1})}{\partial x_{i+1}}\delta f(x_i,u_i)\\ &+\frac{1}{2}\delta f(x_i,u_i)^T\frac{\partial^2 \hat J_{i+1}(x_{i+1})}{\partial x_{i+1}^2}\delta f(x_i,u_i)\\ &=\hat J_{i+1}(f(x_i,u_i))+ p_{i+1}^T(A_i\delta x_i+B_i\delta u_i)\\ &+\frac{1}{2}(A_i\delta x_i+B_i\delta u_i)^TP_{i+1}(A_i\delta x_i+B_i\delta u_i)\\ &=\hat J_{i+1}(f(x_i,u_i))+p_{i+1}^TA_i\delta x_i+p_{i+1}^TB_i\delta u_i\\ &+\frac{1}{2}\delta x_i^TA_i^TP_{i+1}A_i\delta x_i+\frac{1}{2}\delta x_i^TA_i^TP_{i+1}B_i\delta u_i\\ &+\frac{1}{2}\delta u_i^TB_i^TP_{i+1}A_i\delta x_i+\frac{1}{2}\delta u_i^TB_i^TP_{i+1}B_i\delta u_i\\ \end{aligned} \tag{10.2} J^i+1(f(xi+δxi,ui+δui))=J^i+1(f(xi,ui)+δf(xi,ui))≈J^i+1(f(xi,ui))+∂xi+1∂J^i+1(xi+1)δf(xi,ui)+21δf(xi,ui)T∂xi+12∂2J^i+1(xi+1)δf(xi,ui)=J^i+1(f(xi,ui))+pi+1T(Aiδxi+Biδui)+21(Aiδxi+Biδui)TPi+1(Aiδxi+Biδui)=J^i+1(f(xi,ui))+pi+1TAiδxi+pi+1TBiδui+21δxiTAiTPi+1Aiδxi+21δxiTAiTPi+1Biδui+21δuiTBiTPi+1Aiδxi+21δuiTBiTPi+1Biδui(10.2)
综上我们分别得到 Q x i Q_{x_i} Qxi , Q u i Q_{u_i} Qui , Q x i 2 Q_{x_i^2} Qxi2 , Q x i u i Q_{x_iu_i} Qxiui , Q u i x i Q_{u_ix_i} Quixi , Q u i 2 Q_{u_i^2} Qui2 :
Q x i = ℓ x i + A i T p i + 1 Q u i = ℓ u i + B i T p i + 1 Q x i 2 = ℓ x i 2 + A i T P i + 1 A i Q x i u i = ℓ x i u i + A i T P i + 1 B i Q u i x i = ℓ u i x i + B i T P i + 1 A i Q u i 2 = ℓ u i 2 + B i T P i + 1 B i (11) \begin{aligned} Q_{x_i}&=\ell_{x_i}+A_i^Tp_{i+1}\\ Q_{u_i}&=\ell_{u_i}+B_i^Tp_{i+1}\\ Q_{x_i^2}&=\ell_{x_i^2}+A_i^TP_{i+1}A_i\\ Q_{x_iu_i}&=\ell_{x_iu_i}+A_i^TP_{i+1}B_i\\ Q_{u_ix_i}&=\ell_{u_ix_i}+B_i^TP_{i+1}A_i\\ Q_{u_i^2}&=\ell_{u_i^2}+B_i^TP_{i+1}B_i\\ \end{aligned} \tag{11} QxiQuiQxi2QxiuiQuixiQui2=ℓxi+AiTpi+1=ℓui+BiTpi+1=ℓxi2+AiTPi+1Ai=ℓxiui+AiTPi+1Bi=ℓuixi+BiTPi+1Ai=ℓui2+BiTPi+1Bi(11)
我们有必要注意一下公式(6)中对于状态转移函数 f f f的泰勒展开,会发现它只保留了一阶项,这一点是iLQR算法和DDP(differential dynamic programming)算法的一个主要区别,DDP算法会计算全二阶展开,应该在计算上会更加复杂。
(未完待续)