1 牛顿法
牛顿法[1]的基本思想是将目标函数在当前迭代点处进行二阶泰勒展开,然后最小化这个近似目标函数,即
[\underset{w\in \mathcal{W}}{\text{min}} f(w) \approx \underset{w \in W}{\text{min}} f(w^t) + \nabla f(w^t)^T(w – w^t) + \frac{1}{2}(w – w^t)^T\nabla^2f(w^t)(w-w^t) ]
此处(\nabla^2f(w^t))是目标函数在当前迭代点(w^t)处的海森矩阵。如果该海森矩阵是正定的,则上述问题的最优值 (w^t -[\nabla^2f(w^t)]^{-1}\nabla f(w^t))处取到,牛顿法将其做为下一时刻的状态,即
[w^{t+1} = w^t -[\nabla^2f(w^t)]^{-1}\nabla f(w^t) ]
下图给了一个简单实例:
[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:c5ab9fb1-d6e9-4199-8e37-c766442f4b63
[En]
[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:3b9259eb-8d98-475c-9b36-f4ece2971724
[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:c8b8ed4e-f51b-4261-98ae-2520adcc72c8
[En]
[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:be7af184-b54e-4e58-8d25-6110d67216f8
牛顿法收敛性 假设目标函数(f: \mathbb{R}^d \rightarrow \mathbb{R})的导数(\nabla f(w))是光滑的,存在二阶导数,并且在其最优点(x^)处满足(\nabla f\left(x^{}\right)=0, \nabla^{2} f\left(x^{}\right) \succ 0)且初始点(x^0)与最优点(x^)足够近,那么牛顿法具有(\mathcal{O}(e^{-2^t}))的 超线性(二阶)收敛速率:
[\lVert w^t – w^*\rVert \leqslant \mathcal{O}(e^{-2^t}) ]
[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:1d96a1f3-a440-41cd-90df-151e3e26b274
[En]
[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:79b409ae-45a5-4917-9010-7e25e07e86c5
[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:3dddd34b-2ebe-4b39-9188-48e801ab1f31
[En]
[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:22343861-0d17-4e5a-8bd6-aef0ad19aea6
[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:88223c70-35fe-4bfa-a3ab-a6fc4266122d
[En]
[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:59622fdf-cd91-4143-a32e-cc16caf65e27
[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:886d1289-2f21-45d5-8b01-ae16e7813b2e
[En]
[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:2d77091d-dfce-427c-8825-d642cb0b936e
2 拟牛顿法
既然海森矩阵不一定正定,那就构造一个与海森矩阵相差不太远的正定矩阵作为其替代品,这正是拟牛顿法[5]的主要思想。此外,逆牛顿法可以迭代更新海森逆矩阵,而不是每次迭代都需要重新进行逆矩阵的计算。
记(H_t = \nabla^2 f(w^t)),(M^t = [\nabla^2 f(w^t)]^{-1})。在第(t+1)轮迭代,对第(t)轮迭代的海森矩阵(H_t)加上一个或者两个秩为1的矩阵作为对(t+1)时刻的海森矩阵(H^{t+1})的估计。例如:
[H^{t+1} = H^t + aa^T + bb^T ]
[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:4ffa3eac-a39c-4618-b209-e77a1ad74507
[En]
[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:878a48e7-a6e6-4461-9fc2-e9af0bcba28d
对(f(w))在(w^t)处的二阶泰勒展开的左右两边计算梯度,可得
[\nabla f(w) \approx \nabla f(w^t) + H^t(w-w^t) ]
令(\delta^t_1 = w^{t+1}-w^t)为模型的更新量,(\delta^t_2=\nabla f(w^{t+1}) – \nabla f(w^t))为目标函数导数的更新量,则我们有:
[(H^t)^{-1}\delta^t_2 \approx \delta^t_1 ]
[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:a1ff0be8-2b6d-4b77-b761-23456fcd0016
[En]
[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:a8d18394-0c54-4498-b6ae-90fc7e4c5e67
[\begin{aligned} &H^0 = I\ &H^{t+1}= H^t + \frac{\delta^t_2 (\delta^t_2)^{T}}{(\delta^t_2)^T\delta^t_1} – \frac{H^t\delta^t_1(H^t\delta^t_1)^T }{(\delta^t_1)^TH^t\delta^t_1} \ &M^{t+1} = \left(I – \frac{\delta^t_1(\delta^t_2)^T}{(\delta^t_2)^T\delta^t_1}\right)M^t(I – \frac{\delta^t_2 (\delta^t_1)^T}{(\delta^t_2)^T\delta^t_1}) + \frac{\delta_1^t (\delta_1)^T}{(\delta^t_2)^T\delta^t_1} \end{aligned} ]
此时,所对应的算法被称作BFGS算法,如下面伪代码所示:
在实际应用中基于(M^t)的拟牛顿法更实用,应为根据(M^t)计算下降方向的方法不需要对(H^t)矩阵求逆(解线性方程,其复杂度为(\mathcal{O}(d^3)),非常耗时)。但基于(H^t)的拟牛顿法有比较好的理论性质,产生的迭代序列比较稳定,但如果有办法快速求解线性方程组,我们也可以采用基于(H^t)的拟牛顿法。此外在某些场景下,比如有些带约束的优化问题的算法设计,由于需要用到海森矩阵的近似,(H^t)的使用也很常见。
当使用其它的海森矩阵近似方式和约束条件时,我们还可以设计出其他拟牛顿法,比如DFP[3][4]、Broyden[5]、SR1[6]等。
[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:61a28642-51e3-4ca1-8cf5-36518fc94068
[En]
[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:7119ca93-5467-4891-88c8-32c79c16c9cb
3 对偶方法
我们到目前为止提到的各种优化算法都是直接求解原始优化问题。某些时候,如果把原始优化问题转化成对偶优化问题,会更容易求解。比如,当原始问题的变量维度很高,但是约束条件个数不太多时,对偶问题的复杂度(对偶变量的维度对应于约束条件的个数)会远小于原始问题的复杂度,因而更容易求解(这也是支持向量机高效的主要原因)。本节我们将介绍与此相关的对偶理论以及常见的对偶优化算法[7]。
考虑下述原始优化问题(P_0):
[\underset{w}{\text{min}} f(w) \ \begin{aligned} \quad\quad\quad\quad\quad\text{s.t.} \quad\quad\quad &g_i(w) \leqslant 0, \quad i=1,\cdots,m_1 \ &h_j(w) = 0, \quad j=1,\cdots,m_2 \end{aligned} ]
包含(m_1)个不等式约束和(m_2)个等式约束。假设问题(P_0)的最优解为(p^*)。
首先,我们将有约束问题转化为无约束优化问题,即构造(P_0)的拉格朗日函数,将约束条件转化到目标函数中。考虑原始问题(P_0)中带有的约束条件,例如,当(g_1(w^0)\geqslant0)时,(w^0)就不是一个可行解,这时我们需要把对应的目标函数之补充定义为正无穷。于是可将原始优化问题转换为如下的无约束优化形式:
[\underset{w}{\text{min}} f(w) + \infty \sum_{i=1}^{m_1} I_{g_i(w)>0} + \infty \sum_{j=1}^{m_2}I_{h_j(w)\neq 0} ]
进一步地,我们用线性函数(\lambda x (\lambda > 0))和(\nu x)来近似指示函数(I_{[x>0]})和(I_{[x\neq 0]}),得到原始问题(P_0)所对应的拉格朗日函数:
[L(w, \lambda, \nu) \triangleq f(w) + \sum_{i=1}^{m_1}\lambda_ig_i(w) + \sum_{j=1}^{m_2}\nu_jh_j(w) ]
其中,(\lambda_i\geqslant 0),(\nu_j\in \mathbb{R})称为拉格朗日乘子。
接下来,我们讨论拉格朗日函数和原目标函数取值的大小关系,引出拉格朗日对偶函数的定义。如果假设(w^0)满足所有约束条件,那额拉格朗日函数中的第二项小于等于零,第三项等于零,即
[L(w, \lambda, \nu) \leqslant f(w),\quad \text{对任意可行解}w ]
在可行域(\mathcal{W})上对上述不等式取最小值,得到
[\underset{w\in \mathcal{W}}{\text{inf}}L(w, \lambda, v) \leqslant \underset{w\in\mathcal{W}}{\text{min}} f(w) = p^* ]
[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:725eabfc-f54c-4754-bf30-2e865b62648a
[En]
[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:efb4fbc3-b131-41c0-a0ad-b610df0950c3
[h(\lambda, \nu) \triangleq \underset{w\in \mathcal{W}}{\text{inf}}L(w, \lambda, v) \leqslant p^*, \quad 其中\lambda_i\geqslant0, \nu_j \in \mathbb{R} ]
由于拉格朗日对偶函数(h(\lambda, \nu))是一族关于((\lambda, \nu))仿射函数的逐点下确界,所以即使原始问题(P_0)不是凸的,拉格朗日对偶函数(h(\lambda, \nu))也是凹的。
接下来我们定义对偶优化问题。既然拉格朗日对偶函数的取值是原始物体最优值的下界,在拉格朗日乘子(\lambda),(\nu)的取值空间对函数(h(\lambda, \nu))取最大值,将会得到原始问题最优解的最大下界。这个问题通常被定义为原始问题(P_0)的对偶问题,记为(D_0),具体如下:
[\underset{\lambda, \nu}{\text{max}} \space h(\lambda, \nu) \ \begin{aligned} \quad\quad\quad\text{s.t.} \quad\quad\quad &\lambda_i \geqslant 0, \quad i=1,\cdots,m \ \end{aligned} ]
对偶问题是一个对凹函数在可行域(凸集)内求解最大值的问题,记最优值为(d^)。通过上面的讨论,(d^
最后,可通过求解对偶问题来得到原始问题的解。在求解对偶问题的过程中,我们可以使用前面的各种一阶或二阶优化方法。假设对偶问题求得的解为((\lambda^, \nu^)),将其代入拉格朗日函数中,对原始变量求拉格朗日函数的最小值,即:
[\underset{w}{\text{min}}\space L(w, \lambda^, \nu^) ]
[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:0a6d96b1-440a-4234-8159-6b8f8e8ed8d8
[En]
[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:905118ec-5f53-40b5-8fae-e33b3742959f
[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:27db1583-22be-4973-9898-9e99c52d3723
[En]
[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:967aa7fb-2417-4698-a568-3128d1e51131
对于带有线性约束的凸优化问题,对偶坐标上升法被证明至少具有线性收敛速率[6]。
4 对确定性算法方法的小结
[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:b97b3fa3-81b0-4c60-a424-2a2b012d30b7
[En]
[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:3e466456-563e-4d98-9085-6f7bc6ad5a88
[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:24fcd858-e71e-4230-9c3d-56163dad02ba
[En]
[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:073fe00c-4b73-419d-ac5d-d71f62e9ce30
[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:11cc7b28-0d27-4733-a656-f493378437dc
[En]
[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:ed57b9a0-05ab-4cfb-8229-71d7f73a7065
* Nesterov加速法将梯度下降法速率中关于条件数的阶数进一步改进(但仍然是线性收敛速率)。
* 投影次梯度法和Frank-Wolfe方法都可以用于解决带有约束的优化问题,两种方法的收敛速率都为次线性,具体可以依据投影操作的难易程度来选择。
* 一些拟牛顿法(比如BFGS算法)也可以和牛顿法一样达到二阶收敛速率。
[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:35ccbf47-5c96-418f-b9a9-ee494629ced3
[En]
[TencentCloudSDKException] code:FailedOperation.ServiceIsolate message:service is stopped due to arrears, please recharge your account in Tencent Cloud requestId:7eed50d3-0581-4d39-8cdc-6b47e7c1952a
-
[1] Polyak B T. Newton’s method and its use in optimization[J]. European Journal of Operational Research, 2007, 181(3): 1086-1096.
-
[2] Dennis, Jr J E, Moré J J. Quasi-Newton methods, motivation and theory[J]. SIAM review, 1977, 19(1): 46-89.
-
[3] Davidon W C. Variable metric method for minimization[J]. SIAM Journal on Optimization, 1991, 1(1): 1-17.
-
[4] Fletcher R, Powell M J D. A rapidly convergent descent method for minimization[J]. The computer journal, 1963, 6(2): 163-168.
-
[5] Broyden C G. A class of methods for solving nonlinear simultaneous equations[J]. Mathematics of computation, 1965, 19(92): 577-593.
-
[6] Luo Z Q, Tseng P. Error bounds and convergence analysis of feasible descent methods: a general approach[J]. Annals of Operations Research, 1993, 46(1): 157-178.
-
[7] Numerical optimization[M]. New York, NY: Springer New York, 1999.
-
[8] 刘浩洋,户将等. 最优化:建模、算法与理论[M]. 高教出版社, 2020.
-
[9] 刘铁岩,陈薇等. 分布式机器学习:算法、理论与实践[M]. 机械工业出版社, 2018.
-
[10] Stanford CME 323: Distributed Algorithms and Optimization (Lecture 7)
Original: https://www.cnblogs.com/orion-orion/p/16376453.html
Author: orion-orion
Title: 数值优化:经典二阶确定性算法与对偶方法
原创文章受到原创版权保护。转载请注明出处:https://www.johngo689.com/562901/
转载文章受原作者版权保护。转载请注明原作者出处!