数值优化:经典二阶确定性算法与对偶方法

1 牛顿法

牛顿法[1]的基本思想是将目标函数在当前迭代点处进行二阶泰勒展开,然后最小化这个近似目标函数,即

[\underset{w\in \mathcal{W}}{\text{min}} f(w) \approx \underset{w \in W}{\text{min}} f(w^t) + \nabla f(w^t)^T(w – w^t) + \frac{1}{2}(w – w^t)^T\nabla^2f(w^t)(w-w^t) ]

此处(\nabla^2f(w^t))是目标函数在当前迭代点(w^t)处的海森矩阵。如果该海森矩阵是正定的,则上述问题的最优值 (w^t -[\nabla^2f(w^t)]^{-1}\nabla f(w^t))处取到,牛顿法将其做为下一时刻的状态,即

[w^{t+1} = w^t -[\nabla^2f(w^t)]^{-1}\nabla f(w^t) ]

下图给了一个简单实例:

[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:c5ab9fb1-d6e9-4199-8e37-c766442f4b63

[En]

[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:3b9259eb-8d98-475c-9b36-f4ece2971724

[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:c8b8ed4e-f51b-4261-98ae-2520adcc72c8

[En]

[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:be7af184-b54e-4e58-8d25-6110d67216f8

牛顿法收敛性 假设目标函数(f: \mathbb{R}^d \rightarrow \mathbb{R})的导数(\nabla f(w))是光滑的,存在二阶导数,并且在其最优点(x^)处满足(\nabla f\left(x^{}\right)=0, \nabla^{2} f\left(x^{}\right) \succ 0)且初始点(x^0)与最优点(x^)足够近,那么牛顿法具有(\mathcal{O}(e^{-2^t}))的 超线性(二阶)收敛速率

[\lVert w^t – w^*\rVert \leqslant \mathcal{O}(e^{-2^t}) ]

[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:1d96a1f3-a440-41cd-90df-151e3e26b274

[En]

[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:79b409ae-45a5-4917-9010-7e25e07e86c5

[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:3dddd34b-2ebe-4b39-9188-48e801ab1f31

[En]

[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:22343861-0d17-4e5a-8bd6-aef0ad19aea6

[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:88223c70-35fe-4bfa-a3ab-a6fc4266122d

[En]

[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:59622fdf-cd91-4143-a32e-cc16caf65e27

[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:886d1289-2f21-45d5-8b01-ae16e7813b2e

[En]

[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:2d77091d-dfce-427c-8825-d642cb0b936e

2 拟牛顿法

既然海森矩阵不一定正定,那就构造一个与海森矩阵相差不太远的正定矩阵作为其替代品,这正是拟牛顿法[5]的主要思想。此外,逆牛顿法可以迭代更新海森逆矩阵,而不是每次迭代都需要重新进行逆矩阵的计算。

记(H_t = \nabla^2 f(w^t)),(M^t = [\nabla^2 f(w^t)]^{-1})。在第(t+1)轮迭代,对第(t)轮迭代的海森矩阵(H_t)加上一个或者两个秩为1的矩阵作为对(t+1)时刻的海森矩阵(H^{t+1})的估计。例如:

[H^{t+1} = H^t + aa^T + bb^T ]

[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:4ffa3eac-a39c-4618-b209-e77a1ad74507

[En]

[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:878a48e7-a6e6-4461-9fc2-e9af0bcba28d

对(f(w))在(w^t)处的二阶泰勒展开的左右两边计算梯度,可得

[\nabla f(w) \approx \nabla f(w^t) + H^t(w-w^t) ]

令(\delta^t_1 = w^{t+1}-w^t)为模型的更新量,(\delta^t_2=\nabla f(w^{t+1}) – \nabla f(w^t))为目标函数导数的更新量,则我们有:

[(H^t)^{-1}\delta^t_2 \approx \delta^t_1 ]

[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:a1ff0be8-2b6d-4b77-b761-23456fcd0016

[En]

[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:a8d18394-0c54-4498-b6ae-90fc7e4c5e67

[\begin{aligned} &H^0 = I\ &H^{t+1}= H^t + \frac{\delta^t_2 (\delta^t_2)^{T}}{(\delta^t_2)^T\delta^t_1} – \frac{H^t\delta^t_1(H^t\delta^t_1)^T }{(\delta^t_1)^TH^t\delta^t_1} \ &M^{t+1} = \left(I – \frac{\delta^t_1(\delta^t_2)^T}{(\delta^t_2)^T\delta^t_1}\right)M^t(I – \frac{\delta^t_2 (\delta^t_1)^T}{(\delta^t_2)^T\delta^t_1}) + \frac{\delta_1^t (\delta_1)^T}{(\delta^t_2)^T\delta^t_1} \end{aligned} ]

此时,所对应的算法被称作BFGS算法,如下面伪代码所示:

在实际应用中基于(M^t)的拟牛顿法更实用,应为根据(M^t)计算下降方向的方法不需要对(H^t)矩阵求逆(解线性方程,其复杂度为(\mathcal{O}(d^3)),非常耗时)。但基于(H^t)的拟牛顿法有比较好的理论性质,产生的迭代序列比较稳定,但如果有办法快速求解线性方程组,我们也可以采用基于(H^t)的拟牛顿法。此外在某些场景下,比如有些带约束的优化问题的算法设计,由于需要用到海森矩阵的近似,(H^t)的使用也很常见。

当使用其它的海森矩阵近似方式和约束条件时,我们还可以设计出其他拟牛顿法,比如DFP[3][4]、Broyden[5]、SR1[6]等。

[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:61a28642-51e3-4ca1-8cf5-36518fc94068

[En]

[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:7119ca93-5467-4891-88c8-32c79c16c9cb

3 对偶方法

我们到目前为止提到的各种优化算法都是直接求解原始优化问题。某些时候,如果把原始优化问题转化成对偶优化问题,会更容易求解。比如,当原始问题的变量维度很高,但是约束条件个数不太多时,对偶问题的复杂度(对偶变量的维度对应于约束条件的个数)会远小于原始问题的复杂度,因而更容易求解(这也是支持向量机高效的主要原因)。本节我们将介绍与此相关的对偶理论以及常见的对偶优化算法[7]。

考虑下述原始优化问题(P_0):

[\underset{w}{\text{min}} f(w) \ \begin{aligned} \quad\quad\quad\quad\quad\text{s.t.} \quad\quad\quad &g_i(w) \leqslant 0, \quad i=1,\cdots,m_1 \ &h_j(w) = 0, \quad j=1,\cdots,m_2 \end{aligned} ]

包含(m_1)个不等式约束和(m_2)个等式约束。假设问题(P_0)的最优解为(p^*)。

首先,我们将有约束问题转化为无约束优化问题,即构造(P_0)的拉格朗日函数,将约束条件转化到目标函数中。考虑原始问题(P_0)中带有的约束条件,例如,当(g_1(w^0)\geqslant0)时,(w^0)就不是一个可行解,这时我们需要把对应的目标函数之补充定义为正无穷。于是可将原始优化问题转换为如下的无约束优化形式:

[\underset{w}{\text{min}} f(w) + \infty \sum_{i=1}^{m_1} I_{g_i(w)>0} + \infty \sum_{j=1}^{m_2}I_{h_j(w)\neq 0} ]

进一步地,我们用线性函数(\lambda x (\lambda > 0))和(\nu x)来近似指示函数(I_{[x>0]})和(I_{[x\neq 0]}),得到原始问题(P_0)所对应的拉格朗日函数:

[L(w, \lambda, \nu) \triangleq f(w) + \sum_{i=1}^{m_1}\lambda_ig_i(w) + \sum_{j=1}^{m_2}\nu_jh_j(w) ]

其中,(\lambda_i\geqslant 0),(\nu_j\in \mathbb{R})称为拉格朗日乘子。

接下来,我们讨论拉格朗日函数和原目标函数取值的大小关系,引出拉格朗日对偶函数的定义。如果假设(w^0)满足所有约束条件,那额拉格朗日函数中的第二项小于等于零,第三项等于零,即

[L(w, \lambda, \nu) \leqslant f(w),\quad \text{对任意可行解}w ]

在可行域(\mathcal{W})上对上述不等式取最小值,得到

[\underset{w\in \mathcal{W}}{\text{inf}}L(w, \lambda, v) \leqslant \underset{w\in\mathcal{W}}{\text{min}} f(w) = p^* ]

[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:725eabfc-f54c-4754-bf30-2e865b62648a

[En]

[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:efb4fbc3-b131-41c0-a0ad-b610df0950c3

[h(\lambda, \nu) \triangleq \underset{w\in \mathcal{W}}{\text{inf}}L(w, \lambda, v) \leqslant p^*, \quad 其中\lambda_i\geqslant0, \nu_j \in \mathbb{R} ]

由于拉格朗日对偶函数(h(\lambda, \nu))是一族关于((\lambda, \nu))仿射函数的逐点下确界,所以即使原始问题(P_0)不是凸的,拉格朗日对偶函数(h(\lambda, \nu))也是凹的。

接下来我们定义对偶优化问题。既然拉格朗日对偶函数的取值是原始物体最优值的下界,在拉格朗日乘子(\lambda),(\nu)的取值空间对函数(h(\lambda, \nu))取最大值,将会得到原始问题最优解的最大下界。这个问题通常被定义为原始问题(P_0)的对偶问题,记为(D_0),具体如下:

[\underset{\lambda, \nu}{\text{max}} \space h(\lambda, \nu) \ \begin{aligned} \quad\quad\quad\text{s.t.} \quad\quad\quad &\lambda_i \geqslant 0, \quad i=1,\cdots,m \ \end{aligned} ]

对偶问题是一个对凹函数在可行域(凸集)内求解最大值的问题,记最优值为(d^)。通过上面的讨论,(d^

最后,可通过求解对偶问题来得到原始问题的解。在求解对偶问题的过程中,我们可以使用前面的各种一阶或二阶优化方法。假设对偶问题求得的解为((\lambda^, \nu^)),将其代入拉格朗日函数中,对原始变量求拉格朗日函数的最小值,即:

[\underset{w}{\text{min}}\space L(w, \lambda^, \nu^) ]

[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:0a6d96b1-440a-4234-8159-6b8f8e8ed8d8

[En]

[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:905118ec-5f53-40b5-8fae-e33b3742959f

[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:27db1583-22be-4973-9898-9e99c52d3723

[En]

[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:967aa7fb-2417-4698-a568-3128d1e51131

对于带有线性约束的凸优化问题,对偶坐标上升法被证明至少具有线性收敛速率[6]。

4 对确定性算法方法的小结

[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:b97b3fa3-81b0-4c60-a424-2a2b012d30b7

[En]

[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:3e466456-563e-4d98-9085-6f7bc6ad5a88

[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:24fcd858-e71e-4230-9c3d-56163dad02ba

[En]

[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:073fe00c-4b73-419d-ac5d-d71f62e9ce30

[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:11cc7b28-0d27-4733-a656-f493378437dc

[En]

[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:ed57b9a0-05ab-4cfb-8229-71d7f73a7065

* Nesterov加速法将梯度下降法速率中关于条件数的阶数进一步改进(但仍然是线性收敛速率)。
* 投影次梯度法和Frank-Wolfe方法都可以用于解决带有约束的优化问题,两种方法的收敛速率都为次线性,具体可以依据投影操作的难易程度来选择。
* 一些拟牛顿法(比如BFGS算法)也可以和牛顿法一样达到二阶收敛速率。

[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:35ccbf47-5c96-418f-b9a9-ee494629ced3

[En]

[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:7eed50d3-0581-4d39-8cdc-6b47e7c1952a

  • [1] Polyak B T. Newton’s method and its use in optimization[J]. European Journal of Operational Research, 2007, 181(3): 1086-1096.

  • [2] Dennis, Jr J E, Moré J J. Quasi-Newton methods, motivation and theory[J]. SIAM review, 1977, 19(1): 46-89.

  • [3] Davidon W C. Variable metric method for minimization[J]. SIAM Journal on Optimization, 1991, 1(1): 1-17.

  • [4] Fletcher R, Powell M J D. A rapidly convergent descent method for minimization[J]. The computer journal, 1963, 6(2): 163-168.

  • [5] Broyden C G. A class of methods for solving nonlinear simultaneous equations[J]. Mathematics of computation, 1965, 19(92): 577-593.

  • [6] Luo Z Q, Tseng P. Error bounds and convergence analysis of feasible descent methods: a general approach[J]. Annals of Operations Research, 1993, 46(1): 157-178.

  • [7] Numerical optimization[M]. New York, NY: Springer New York, 1999.

  • [8] 刘浩洋,户将等. 最优化:建模、算法与理论[M]. 高教出版社, 2020.

  • [9] 刘铁岩,陈薇等. 分布式机器学习:算法、理论与实践[M]. 机械工业出版社, 2018.

  • [10] Stanford CME 323: Distributed Algorithms and Optimization (Lecture 7)

Original: https://www.cnblogs.com/orion-orion/p/16376453.html
Author: orion-orion
Title: 数值优化:经典二阶确定性算法与对偶方法

原创文章受到原创版权保护。转载请注明出处:https://www.johngo689.com/562901/

转载文章受原作者版权保护。转载请注明原作者出处!

(0)

大家都在看

亲爱的 Coder【最近整理,可免费获取】👉 最新必读书单  | 👏 面试题下载  | 🌎 免费的AI知识星球