Digging into the Paper: DeepFM, a Must-Read Model for Recommendation Algorithms

Hello, everyone. Today we will continue analyzing papers from the field of recommendation and advertising.

Today's pick is DeepFM: A Factorization-Machine based Neural Network for CTR Prediction, that is, a neural network for click-through-rate prediction built on a factorization machine. The authors come from Harbin Institute of Technology and Huawei. It has to be said that many papers in the AI field now come from China, and as a practitioner I am very pleased to see this.

As the name suggests, today's paper is essentially an upgraded and optimized version of the FM model. If you are not very familiar with FM, you can review it through the portal below:

That article is also quite long, so it is recommended to read it first.

For a CTR prediction model, a crucial point is learning the latent interactions behind the features that describe user behavior. Although some progress had been made in this field (as of 2017), existing approaches either were biased toward low-order or high-order feature interactions, or required a great deal of expert feature engineering.

In this paper, the authors design a new model, DeepFM, which can model low-order and high-order feature interactions at the same time. It combines the advantages of FM and neural networks, improves on Google's then-latest Wide & Deep model, and does away with the manual feature-engineering step.

There is not much else in the abstract; it is mostly the customary dig at competing work.

CTR is the key metric in recommendation scenarios. While an advertising system ranks by CTR × bid, a recommendation system generally ranks strictly by estimated CTR. So the key problem is estimating CTR accurately.

To make this easier to follow, let us briefly introduce the current standard practice. Generally speaking, the features of a conventional recommendation system fall into four parts. The first part is user features, information about the user: for example, gender, whether they belong to a high-income or high-spending group, how long they have been a user of the platform, which product categories they prefer, and so on. The second part is item features, information about the item, such as price, category, discount, and ratings. The third part is context features, such as the current time (morning or evening) and the position where the item is displayed. The last part is the user's real-time behavior, such as which other products the user viewed before this one and how long they have been logged in.

Obviously, whether a user clicks an item depends on a combination of the four kinds of information above. Pushing a Lamborghini or a Patek Philippe to a rich, handsome man is appealing, but pushing the same content to someone who has never even heard of those brands is clearly useless. In other words, there is a logical relationship between item features and user features, which we usually call feature interaction (feature crossing).

This cross information is often implicit, meaning we cannot describe or characterize it directly. To take a simple example, not all rich people like luxury goods; some may prefer consumer electronics, others clothing or travel. People's preferences are so complicated that it is hard to capture them with fixed rules. So the model must be able to learn the latent relationships among features, and the better it captures this latent cross information, the better it performs.

For example, after analyzing a mainstream app-store market, the authors found that users often download food-delivery apps at meal times, which shows an interaction between app category and time. They also found that young male users tend to like shooting games, which shows an interaction between app category and user gender. There is plenty of cross information like this, and the experience of the Wide & Deep model tells us that a model performs better once both low-order and high-order cross features are taken into account.

A key challenge is how to model the cross information between features efficiently. Some interactions are easy to understand and hand-craft, but most are implicit and hard to grasp intuitively; the classic beer-and-diapers pair could only be discovered through large-scale data mining. And even when an interaction is intuitive, the sheer number of feature combinations makes it impossible to handle them all by hand.

Next, the paper takes the customary dig at related work. It first explains that plain CNNs and RNNs do not work well here, which is easy to understand: RNNs are mainly designed for sequence scenarios such as text and audio and are not suited to CTR prediction, and the same goes for CNNs, which are mainly used on high-dimensional data such as images and are not suited to recommendation scenarios.

It then compares three other papers published around the same time: FNN, PNN, and Wide & Deep. This is the usual boilerplate without much analysis: insufficient modeling of low-order and high-order feature interactions, too much required feature engineering, and so on. We have analyzed Wide & Deep before; for FNN and PNN you can read the papers if interested, but they are not used much in industry, so the results are presumably not very satisfactory. After these comparisons, the paper presents its own claim: a model can be designed that performs better and automatically learns the cross information between features.

Suppose the training set contains n samples, each of which can be written as (χ, y). Here χ is a vector made up of m fields, containing features of both the user and the item, and y ∈ {0, 1}: y = 0 means the user did not click, while y = 1 means the user clicked.

Let us look at the sample features. The m-field feature vector can be viewed as two parts. The first part is categorical features, such as gender, location, and income. The second is continuous features, such as average spend and average dwell time. A categorical feature is usually represented as a one-hot vector, while a continuous feature is usually represented by its value directly, though it can also be discretized into a one-hot vector.
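As a small sketch of this preprocessing (the field names, vocabulary, and equal-width bucket scheme are illustrative assumptions of mine, not from the paper):

```python
import numpy as np

# Hypothetical fields for illustration: 'gender' is categorical,
# 'avg_spend' is continuous (names are mine, not from the paper).
GENDER_VOCAB = ["male", "female", "unknown"]

def one_hot(value, vocab):
    """Encode a categorical value as a 0/1 vector over its vocabulary."""
    vec = np.zeros(len(vocab))
    vec[vocab.index(value)] = 1.0
    return vec

def encode_sample(gender, avg_spend, n_buckets=4, max_spend=400.0):
    """Concatenate a one-hot categorical field with a bucketized continuous one."""
    g = one_hot(gender, GENDER_VOCAB)
    # Discretize the continuous feature into equal-width buckets, then one-hot it.
    bucket = min(int(avg_spend / max_spend * n_buckets), n_buckets - 1)
    s = np.zeros(n_buckets)
    s[bucket] = 1.0
    return np.concatenate([g, s])

x = encode_sample("male", 120.0)  # length 3 + 4 = 7, only two positions are 1
```

With more fields the concatenated vector grows very long while almost all entries stay zero, which is exactly the sparsity discussed next.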

Once all these features are processed, the whole sample is turned into a single vector x, with each field corresponding to a slice of it. Because of the discretization, the vector x becomes very sparse. So what we need to do is train a CTR prediction model on such sparse samples.

We want the model to learn both low-order and high-order feature interactions well. To this end, DeepFM combines FM with a deep model. Its overall structure is shown below:

This figure may look a little messy, but we can ignore the local details and grasp it as a whole. The model is divided into two parts: the FM part and the Deep part. The two parts share exactly the same input, without the split into separate inputs found in the Wide & Deep model.

The model is actually fairly easy to understand: the neural network, i.e. the Deep part, learns the high-order associations among the features, while the FM part uses the latent vectors V to compute the pairwise (second-order) cross information between features. Finally the outputs of the two parts are added together and fed into a sigmoid layer to produce the final result.

Expressed as a formula, it looks roughly like this:

ŷ = sigmoid(y_FM + y_DNN)
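As a toy sketch of this two-part forward pass (the parameter shapes, the single hidden layer, and the summing of field embeddings into one k-vector are simplifying assumptions of mine, not the paper's exact architecture):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def deepfm_forward(x, w0, w, V, W1, b1, w_out, b_out):
    """Toy DeepFM forward pass: sigmoid(y_FM + y_DNN).

    x: (d,) sparse input; V: (d, k) latent/embedding matrix shared
    by the FM and Deep parts; one hidden ReLU layer for the Deep part.
    """
    # FM part: bias + linear term + pairwise term (O(d*k) identity).
    s = V.T @ x
    s_sq = (V ** 2).T @ (x ** 2)
    y_fm = w0 + float(w @ x) + 0.5 * float(np.sum(s ** 2 - s_sq))

    # Deep part: field embeddings (here simply summed via x @ V),
    # then a small feed-forward network.
    h = np.maximum(0.0, (x @ V) @ W1 + b1)
    y_dnn = float(h @ w_out) + b_out

    return sigmoid(y_fm + y_dnn)
```

The key point the sketch preserves is that both parts read the same input x and the same matrix V, and their scores are simply summed before the sigmoid.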

The FM part is just a factorization machine, which we dissected in an earlier article. FM considers every pairwise crossing of features, which amounts to crossing each pair by hand. But since the number of pairwise combinations of n features is on the order of O(n²), FM uses a different scheme: for each feature i it trains a latent vector v_i, and when features i and j cross, the interaction weight is computed as the inner product ⟨v_i, v_j⟩. This greatly reduces the computational complexity.
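The pairwise term Σ_{i&lt;j} ⟨v_i, v_j⟩ x_i x_j can itself be computed in O(d·k) time rather than O(d²·k), using the standard FM identity ½ Σ_f [(Σ_i v_{if} x_i)² − Σ_i v_{if}² x_i²]. A minimal NumPy sketch of that computation:

```python
import numpy as np

def fm_second_order(x, V):
    """Pairwise interaction term of FM: sum over i < j of <V_i, V_j> * x_i * x_j.

    x: (d,) input vector; V: (d, k) latent matrix.
    Uses the O(d*k) identity 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2]
    instead of the naive O(d^2 * k) double loop.
    """
    s = V.T @ x                    # (k,) per-factor weighted sums
    s_sq = (V ** 2).T @ (x ** 2)   # (k,) per-factor sums of squares
    return 0.5 * float(np.sum(s ** 2 - s_sq))
```

Expanding (Σ_i v_{if} x_i)² gives every ordered pair i, j; subtracting the diagonal and halving leaves exactly the i &lt; j pairs, which is why the two forms agree.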

This involves the derivation of a few formulas, which we worked through in detail in the earlier article, so I will not repeat them here.

Finally, we arrive at this part of the formula:

y_FM = ⟨w, x⟩ + Σ_{i=1}^{d} Σ_{j=i+1}^{d} ⟨v_i, v_j⟩ x_i · x_j

The Deep part is a classic feed-forward network, used to learn the high-order interactions between features.

Figure 3 shows the Deep part of the model. As the figure shows, all features are converted into embedding vectors that serve as the input of the Deep part. A CTR model differs greatly from image or audio models: its input is much higher-dimensional and extremely sparse, with a mix of categorical and continuous features grouped into fields. In this situation, using embedding vectors to compress the information in the raw features into low-dimensional dense vectors is the better approach; it gives the model much stronger generalization than a multi-hot input made up entirely of 0s and 1s.

This figure shows the local structure of this part. We can see that all features are mapped to embedding vectors of the same dimension k, which is also the latent dimension used in the FM part, and that these embeddings are realized through the same two-dimensional matrix V used in FM. As we know, V is a d × k matrix, while the raw input of the model is a d-dimensional 0/1 vector, so multiplying the input by V naturally selects the corresponding k-dimensional embedding for each field.
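The "multiply by V" step is just a row lookup, which a tiny NumPy example makes concrete (the dimensions here are toy values, not the paper's):

```python
import numpy as np

d, k = 6, 3                       # d one-hot positions, k latent dimensions
V = np.arange(d * k, dtype=float).reshape(d, k)  # FM latent matrix, shared with Deep

x = np.zeros(d)
x[4] = 1.0                        # one-hot: this field takes value index 4

# Multiplying the one-hot input by V selects the matching row of V:
emb = x @ V                       # shape (k,), equal to V[4]
```

This is why sharing V between the FM part and the Deep part costs nothing extra: the embedding table of the Deep part and the latent matrix of FM are literally the same parameters.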

One thing to note here is that in some other papers where a DNN does CTR prediction, a pre-trained FM model is used to initialize the embedding vectors of the Deep part. The approach here is slightly different: instead of initializing from a trained FM, the Deep part shares the same V with the FM part. This has two very important benefits: the model learns both low-order and high-order feature interactions end to end from the raw features, and there is no need for pre-training or expert feature engineering.

Experimental results

Two datasets were selected to evaluate DeepFM against the other models. One is the Criteo dataset, which contains 45 million user click records made up of 13 continuous features and 26 categorical features; 90% is used as training data and 10% as test data. The second is internal (Huawei) data: about 1 billion records of users' clicks in the game center of the Huawei app store over 7 consecutive days as training data, with the following day's data used for testing.

The evaluation uses two main metrics: AUC and Logloss (cross entropy). These metrics are quite to the point, unlike some papers that define a brand-new evaluation metric.
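For readers unfamiliar with the two metrics, a from-scratch sketch makes their definitions explicit (in practice one would just call sklearn.metrics.roc_auc_score and log_loss):

```python
import math

def auc(y_true, y_score):
    """AUC as the probability that a random positive outranks a random negative
    (ties count half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def log_loss(y_true, y_score, eps=1e-15):
    """Cross entropy averaged over samples; scores are clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_score):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]   # predicted CTRs
```

AUC only cares about the ranking of the scores, which suits ranking by estimated CTR, while Logloss also penalizes miscalibrated probabilities; that is why papers usually report both.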

A total of seven models are compared: LR, FM, FNN, PNN, two variants of Wide & Deep, and DeepFM. For Wide & Deep, in order to eliminate the influence of feature preprocessing, the LR part of the model is also replaced with an FM part. To avoid ambiguity, the model after replacement is called FM & DNN, and the one before replacement LR & DNN.

Performance evaluation

The runtime performance of a deep learning model matters a great deal, because such models are complex and consume a lot of computing resources. The following ratio is used to compare the computational efficiency of each model: the training time of the model divided by the training time of LR. That is, everything is compared against the training time of the LR model.

The final results are shown in the figure below, with the left half on CPU and the right half on GPU.

Basically, the DeepFM model performs best on both CPU and GPU.

AUC is the usual way to evaluate the accuracy of a CTR prediction model in this scenario; the results are organized in the figure below:

From the figure above we can also see that DeepFM is the best of these models on both AUC and LogLoss. As far as I know, although DeepFM was proposed four years ago, many companies still use it. So if you are interested in research or work in the recommendation field, understanding this model is essential.

That's all for today's article. I sincerely wish you all gains every day. If you liked today's content, please give it the triple support: like, follow, and share.

Original: https://www.cnblogs.com/techflow/p/14260630.html
Author: Coder梁
Title: Digging into the Paper: DeepFM, a Must-Read Model for Recommendation Algorithms
