# xgboost Algorithm Summary

There is a blog post that explains xgboost very clearly, but its URL is no longer valid. I reposted it earlier; you can find it by searching for "XGBoost 与 Boosted Tree".

This post follows that article and summarizes it.

xgboost is the successor of GBDT: it is likewise a boosted ensemble of CART trees.

A CART tree can be used for both classification and regression, but the two cases use different criteria for choosing the splitting variable.

For regression, CART selects the optimal splitting variable and split point by the following criterion.
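The criterion was an image in the original post and did not survive; reconstructed here from the standard CART regression-tree derivation, it reads:

$$
\min_{j,s}\left[\min_{c_1}\sum_{x_i\in R_1(j,s)}(y_i-c_1)^2+\min_{c_2}\sum_{x_i\in R_2(j,s)}(y_i-c_2)^2\right]
$$

where $R_1(j,s)=\{x\mid x^{(j)}\le s\}$ and $R_2(j,s)=\{x\mid x^{(j)}>s\}$ are the two regions produced by splitting on variable $j$ at point $s$, and each inner minimum is attained at the mean of the $y_i$ in that region.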

CART determines the optimal splitting variable and split point by brute-force enumeration; the concrete algorithm is as follows.
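The algorithm description was also an image in the original. As a minimal sketch of the brute-force search for a single feature (the function name `best_split` and the toy interface are mine, not CART's original notation):

```python
def best_split(xs, ys):
    """Enumerate every observed value of the feature as a candidate split
    point and return the (split_value, total_sse) pair that minimizes the
    sum of squared errors of the two resulting regions."""
    def sse(vals):
        # The optimal constant prediction for a region is its mean.
        if not vals:
            return 0.0
        c = sum(vals) / len(vals)
        return sum((v - c) ** 2 for v in vals)

    best = None
    for s in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= s]
        right = [y for x, y in zip(xs, ys) if x > s]
        score = sse(left) + sse(right)
        if best is None or score < best[1]:
            best = (s, score)
    return best
```

The full algorithm runs this search over every feature, picks the feature/split pair with the smallest error, and recurses on the two child regions.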

The CART classification tree is built similarly. Since class labels have no mean, the classification tree instead computes the Gini index: it enumerates all features and their possible split points, selects the feature and split point with the smallest Gini index as the optimal pair, and applies this recursively until the classification tree is grown.
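A small illustration of the Gini computation used to score a candidate binary split (helper names are mine):

```python
from collections import Counter

def gini(labels):
    """Gini index of a label collection: 1 minus the sum of squared
    class proportions. Zero means the node is pure."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_gini(left, right):
    """Size-weighted Gini index of a binary split; CART picks the
    feature/split point minimizing this value."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)
```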

The objective function is as follows.
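The formula itself was an image in the original; in the standard XGBoost notation it is:

$$
Obj = \sum_{i=1}^{n} l\left(y_i,\hat y_i\right) + \sum_{k=1}^{K}\Omega(f_k)
$$

where $l$ is the loss on each of the $n$ data points and $\Omega$ penalizes the complexity of each of the $K$ trees $f_k$.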

The first part is the error function and the second part is the regularization term.
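For a tree $f$ with $T$ leaves and leaf weights $w_1,\dots,w_T$, the regularization term xgboost uses (reconstructed, standard form) is:

$$
\Omega(f)=\gamma T+\frac{1}{2}\lambda\sum_{j=1}^{T}w_j^2
$$

so complexity is charged both per leaf ($\gamma$) and on the magnitude of the leaf weights ($\lambda$).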

More generally, when the error is not squared error, we use the following second-order Taylor expansion to define an approximate objective function, which makes this step easy to compute.
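Reconstructed from the standard derivation, the expansion of the step-$t$ objective around the current prediction $\hat y_i^{(t-1)}$ is:

$$
Obj^{(t)}\simeq\sum_{i=1}^{n}\left[l\!\left(y_i,\hat y_i^{(t-1)}\right)+g_i f_t(x_i)+\frac{1}{2}h_i f_t^2(x_i)\right]+\Omega(f_t)+\mathrm{const}
$$

where $g_i=\partial_{\hat y^{(t-1)}}\,l\!\left(y_i,\hat y^{(t-1)}\right)$ and $h_i=\partial^2_{\hat y^{(t-1)}}\,l\!\left(y_i,\hat y^{(t-1)}\right)$ are the first and second derivatives of the loss at each data point.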

After removing the constant terms, we obtain the more unified objective function below. It has a very notable property: it depends on the error function only through each data point's first and second derivatives.
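With the constants dropped, the objective (reconstructed) becomes:

$$
Obj^{(t)}\simeq\sum_{i=1}^{n}\left[g_i f_t(x_i)+\frac{1}{2}h_i f_t^2(x_i)\right]+\Omega(f_t)
$$

This is why xgboost supports arbitrary twice-differentiable losses: the tree-learning machinery only ever sees $g_i$ and $h_i$.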

In this way, using the representation above for the error function and the complexity, the objective function can be rewritten as follows.
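Writing the tree as $f_t(x)=w_{q(x)}$, where $q$ maps a point to its leaf, and defining per-leaf sums $I_j=\{i\mid q(x_i)=j\}$, $G_j=\sum_{i\in I_j}g_i$, $H_j=\sum_{i\in I_j}h_i$, the reconstructed objective is:

$$
Obj^{(t)}=\sum_{j=1}^{T}\left[G_j w_j+\frac{1}{2}\left(H_j+\lambda\right)w_j^2\right]+\gamma T
$$

Each leaf weight can be minimized independently, giving $w_j^{*}=-\dfrac{G_j}{H_j+\lambda}$ and the structure score

$$
Obj^{*}=-\frac{1}{2}\sum_{j=1}^{T}\frac{G_j^2}{H_j+\lambda}+\gamma T
$$

which measures how good a given tree structure is, independent of its leaf values.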

The last part is the simplification of the algorithm.

The xgboost algorithm repeatedly enumerates candidate tree structures, uses this scoring function to find the tree with the best structure, adds it to the model, and then repeats the process. Enumerating all tree structures is infeasible, however, so the usual method is greedy: at each step, try adding one split to an existing leaf. For a concrete split, the gain it yields can be computed by the following formula.
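The gain formula was an image in the original; in standard notation it is:

$$
\mathrm{Gain}=\frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda}+\frac{G_R^2}{H_R+\lambda}-\frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right]-\gamma
$$

where $G_L,H_L$ and $G_R,H_R$ are the gradient sums over the left and right children. The trailing $-\gamma$ charges the cost of the extra leaf, so a split with too little gain is rejected, which acts as built-in pruning.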

xgboost's GitHub repository: https://github.com/dmlc/xgboost . xgboost is a tool for large-scale parallel boosted trees; it is currently the fastest and best open-source boosted-tree toolkit, more than 10 times faster than common alternatives.

Original: https://www.cnblogs.com/bnuvincent/p/11223200.html
Author: Alexander
Title: xgboost 算法总结
