Recommendation Algorithms: An Overview (Complete Summary)

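Contents

1. What is a recommendation algorithm

2. Goals of recommendation algorithms

3. Basic conditions for recommendation

4. Classification of recommendation algorithms

4.1 Popularity-based recommendation

4.2 Content-based recommendation

4.3 Association-rule-based recommendation

4.4 Collaborative-filtering-based recommendation

4.4.1 User-based recommendation

4.4.2 Item-based recommendation

4.4.3 Summary of collaborative filtering

4.5 Model-based recommendation

4.6 Hybrid recommendation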

1. What is a recommendation algorithm

A recommendation algorithm is a class of algorithm in computer science that uses mathematical methods to infer, from a user's behavior, what that user is likely to want. At present it is applied most successfully on the Internet, for example by Taobao and Toutiao.

Recommendation algorithms were first proposed in 1992, but they only became widespread in recent years: with the explosive growth of the Internet, far more data is available, which gives these algorithms much more room to show what they can do.

2. Goals of recommendation algorithms

"Guess you like" lists, personalized playlists and trending Weibo posts are all outputs of recommendation systems. The main goals of a recommendation system include the following:

Goal 1: Help users find the items they want (news, music, ...) and surface the long tail

Helping users find what they want is not easy. There are so many products that even we often open Taobao and, faced with a dazzling array of promotions, do not know what to buy. In economics there is a well-known theory called the Long Tail.


Applied to the Internet, it means that the most popular part of the resources gets the vast majority of attention, while the large remainder is rarely visited. This wastes resources and also keeps many users with niche tastes from finding content they care about.

Goal 2: Reduce information overload

The amount of information in the Internet era keeps exploding. If everything were placed on a site's home page, users could not possibly read it all, and information utilization would be very low. We therefore need a recommendation system to help users filter out low-value information.

Goal 3: Increase the site's click-through rate / conversion rate

A good recommendation system brings users back to a site more often, because they can always find something they want to buy or read.

Goal 4: Understand users better and provide customized services

Every time the system successfully recommends something a user is interested in, our picture of that user's interests and other dimensions becomes clearer. Once we can profile each user accurately, we can tailor a range of services to them, so that the platform can satisfy users with very different needs.

……

3. Basic conditions for recommendation

There are many recommendation algorithms today, but none of them can get around a few basic ways of recommending:

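1. Recommend based on people who share your preferences.

2. Recommend items similar to the ones you like.

3. Recommend based on keywords you provide, which in effect degenerates into a search algorithm.

4. Recommend based on a combination of the above.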

4. Classification of recommendation algorithms

4.1 Popularity-based recommendation

Popularity-based recommendation is relatively simple and crude. It targets hot products or content: items are ranked by a popularity measure computed from data such as PV, UV, daily PV or share rate, and recommended to users in that order. The approach has both advantages and disadvantages.

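Advantages: simple, and suitable for newly registered users, so it addresses the cold-start problem of recommending to new users.

Disadvantages: it cannot provide personalized recommendations. Some optimizations are possible on top of it, such as popularity rankings per user segment: for example, prioritizing sports content from the trending list for sports fans, and pushing hot articles about political figures to users who like discussing politics.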
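
The ranking idea above, including the per-segment variant, can be sketched in a few lines of Python. This is only an illustration under assumptions: the event log, the item names and the `user_segment` mapping are made up, and popularity is measured as UV (distinct viewers) rather than any particular production metric.

```python
from collections import defaultdict

def popularity_ranking(events, top_n=3):
    """Rank items by UV, i.e. the number of distinct users who viewed them."""
    viewers = defaultdict(set)
    for user, item in events:
        viewers[item].add(user)
    # Sort by UV descending, then by item name so ties are deterministic.
    ranked = sorted(viewers, key=lambda item: (-len(viewers[item]), item))
    return ranked[:top_n]

def segmented_popularity(events, user_segment, top_n=3):
    """Compute a separate popularity list for each user segment."""
    by_segment = defaultdict(list)
    for user, item in events:
        by_segment[user_segment[user]].append((user, item))
    return {seg: popularity_ranking(evts, top_n) for seg, evts in by_segment.items()}

# Hypothetical (user, item) view events.
events = [
    ("u1", "sports_news"), ("u2", "sports_news"), ("u3", "politics_news"),
    ("u1", "match_recap"), ("u3", "election_recap"), ("u4", "politics_news"),
    ("u4", "sports_news"),
]
# Hypothetical segment assignment for each user.
user_segment = {"u1": "sports", "u2": "sports", "u3": "politics", "u4": "politics"}

global_top = popularity_ranking(events)
segment_top = segmented_popularity(events, user_segment)
print(global_top)
print(segment_top)
```

A brand-new user with no segment can simply be served `global_top`, which is why this approach works as a cold-start fallback.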

4.2 Content-based recommendation

Content-based recommendation was the most widely used mechanism when recommendation engines first appeared. Its core idea is to compute the relatedness of items from their metadata, and then recommend items similar to those in the user's past preference records. For example, if you have read Harry Potter I, a content-based algorithm finds that Harry Potter II-VI are closely related in content to what you have read (they share many keywords), so it recommends them to you.

This kind of system is mainly used in information (news feed) applications. For each article (or movie, or piece of music), some tags are extracted as keywords, and the tags are then used to measure the similarity between two items.
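
As a minimal sketch of tag-based similarity, the snippet below scores unseen items by the Jaccard similarity of their tag sets to the items a user already liked. The tag sets are invented for illustration, and Jaccard is just one reasonable choice (TF-IDF with cosine similarity is another); the source does not prescribe a specific measure.

```python
def jaccard(a, b):
    """Jaccard similarity between two tag sets: |A & B| / |A | B|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def recommend_by_content(item_tags, liked, top_n=2):
    """Score each unseen item by its best tag similarity to any liked item.

    `liked` is assumed to be a non-empty set of item names.
    """
    scores = {}
    for item, tags in item_tags.items():
        if item in liked:
            continue
        scores[item] = max(jaccard(tags, item_tags[seen]) for seen in liked)
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return [(item, round(scores[item], 3)) for item in ranked[:top_n]]

# Hypothetical tags per item.
item_tags = {
    "Harry Potter I": {"fantasy", "magic", "hogwarts", "novel"},
    "Harry Potter II": {"fantasy", "magic", "hogwarts", "novel"},
    "The Hobbit": {"fantasy", "novel", "dragon"},
    "Cooking Basics": {"cooking", "howto"},
}

recs = recommend_by_content(item_tags, liked={"Harry Potter I"})
print(recs)
```

Note that the quality of the result depends entirely on the tags, which is the feature-extraction concern raised in the disadvantages of this approach.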

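Advantages:

1. Easy to implement. It does not depend on other users' behavior data, so it does not suffer from sparsity or from cold start for new items.

2. Recommendations are based on the features of the items themselves, so popular items are not over-recommended.

Disadvantages:

1. The extracted features must be both accurate and meaningful in practice; otherwise the relevance of the results is hard to guarantee. Douban, for example, maintains tags manually and relies on users to keep content tags accurate.

2. Recommended items may be repetitive. News is the typical case: if you have read one story about MH370, the recommended stories are likely to have the same content as the one you already read.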

4.3 Association-rule-based recommendation

Association-rule-based recommendation is common in e-commerce systems and has proven effective. Its practical meaning is that users who have bought certain items are more likely to buy certain other items. The primary goal of such a system is to mine association rules, that is, sets of items that many users buy together; items within such a set can be recommended for one another. Current mining algorithms are mainly derived from Apriori and FP-Growth. Association-rule-based systems generally achieve a high conversion rate, because once a user has bought several items of a frequent itemset, they are more likely to buy the remaining items in it.

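Disadvantages:

1. The computation is heavy, but it can be done offline, so the impact is small.

2. Because it relies on user data, cold-start and sparsity problems are unavoidable.

3. Popular items tend to be over-recommended.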
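
To make "frequent sets" and "rules" concrete, here is a small sketch that mines rules of the form A -> B from item pairs only, using support and confidence. It is deliberately not a full Apriori or FP-Growth implementation (those also handle larger itemsets and prune the search); the baskets and thresholds are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.5, min_confidence=0.8):
    """Mine rules A -> B from frequent item pairs.

    support(A, B)     = share of baskets containing both A and B
    confidence(A -> B) = count(A, B) / count(A)
    """
    n = len(baskets)
    item_count = Counter()
    pair_count = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # de-duplicate and fix pair order
        item_count.update(items)
        pair_count.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, round(support, 2), round(confidence, 2)))
    return sorted(rules)

# Hypothetical purchase baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers"],
]

rules = pair_rules(baskets)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support}, confidence={confidence}")
```

A rule such as `beer -> diapers` would be used at serving time by recommending the right-hand item to users whose basket or history contains the left-hand item.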

4.4 Collaborative-filtering-based recommendation

Collaborative filtering is a widely used recommendation method. It rests on the assumption that "birds of a feather flock together": users who like the same items are more likely to share interests. It is generally used in systems with user ratings, where the rating expresses a user's preference for an item. Collaborative filtering is regarded as a classic way of using collective intelligence: it requires no special processing of the items and instead relates items to one another through users. Collaborative filtering systems currently fall into two types: user-based and item-based.

4.4.1 User-based recommendation

The basic principle of user-based collaborative filtering is to find neighbor users from users' preferences for items, and then recommend what the neighbors like to the current user. Computationally, each user's preferences over all items form a vector, and the similarity between users is computed on these vectors. After the K nearest neighbors are found, their similarity weights and their item preferences are used to predict scores for items the current user has not yet expressed a preference for, and the resulting sorted item list is returned as the recommendation.

The figure below shows an example. For user A, the user's historical preferences yield only one neighbor, user C, so item D, which user C likes, is recommended to user A.

[Figure: user-based collaborative filtering example]

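Advantages: the recommended items may be completely unrelated in content, so the method can uncover a user's latent interests, and it generates personalized results for each user.

Disadvantages: in a typical web system the number of users grows much faster than the number of items, so the computation grows enormously and system performance easily becomes the bottleneck. For this reason, purely user-based collaborative filtering is rarely used on its own in industry.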
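
The user A / user C example can be reproduced with a short sketch of user-based collaborative filtering. The preference data below is an assumption chosen to match the described outcome (the original figure is not available), and cosine similarity is one common choice of user similarity rather than the only one.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two sparse preference vectors (dicts)."""
    common = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in common)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def user_based_recommend(prefs, target, k=1, top_n=3):
    """Recommend items liked by the k users most similar to `target`."""
    sims = {other: cosine(prefs[target], prefs[other])
            for other in prefs if other != target}
    neighbours = sorted(sims, key=lambda u: (-sims[u], u))[:k]

    # Score unseen items by similarity-weighted neighbour preferences.
    scores = {}
    for other in neighbours:
        for item, rating in prefs[other].items():
            if item not in prefs[target]:
                scores[item] = scores.get(item, 0.0) + sims[other] * rating
    ranked = sorted(scores, key=lambda i: (-scores[i], i))
    return neighbours, ranked[:top_n]

# Hypothetical preferences: 1 means "likes".
prefs = {
    "user_A": {"item_A": 1, "item_C": 1},
    "user_B": {"item_B": 1},
    "user_C": {"item_A": 1, "item_C": 1, "item_D": 1},
}

neighbours, recs = user_based_recommend(prefs, "user_A")
print(neighbours, recs)
```

The similarity loop touches every other user, which is the scaling cost described in the disadvantages: it grows with the number of users.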

4.4.2 Item-based recommendation

Item-based collaborative filtering is similar to the user-based variant, except that neighbors are computed over items rather than users: similar items are found from users' preferences for items, and items similar to those in a user's history are recommended to that user. Computationally, all users' preferences for an item form that item's vector, and item similarity is computed on these vectors. Given the similar items, the user's historical preferences are used to predict scores for items the user has not yet expressed a preference for, and the resulting sorted list is returned as the recommendation. Item-based collaborative filtering can be seen as a degenerate form of association-rule recommendation, but because it takes users' actual ratings into account and only computes similarity rather than mining frequent itemsets, it can be considered to have higher accuracy and higher coverage.

The figure below shows an example. According to the historical preferences of all users, users who like item A also like item C, so item A and item C are considered similar. User C likes item A, so we can infer that user C may also like item C.

[Figure: item-based collaborative filtering example]

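Advantages: compared with user-based recommendation, item-based recommendation is more widely used and has better scalability and performance. Because the number of items usually grows slowly, performance changes little over time.

Disadvantages: results are driven by item-to-item similarity, so they tend to be less personalized and less diverse than user-based results.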
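
The item A / item C example can be sketched the same way, this time with similarity computed between item vectors. As before, the preference data is an assumption chosen to match the described outcome, and cosine similarity is an illustrative choice.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    common = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in common)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def item_vectors(prefs):
    """Transpose user -> {item: rating} into item -> {user: rating}."""
    vectors = {}
    for user, items in prefs.items():
        for item, rating in items.items():
            vectors.setdefault(item, {})[user] = rating
    return vectors

def item_based_recommend(prefs, target, top_n=3):
    """Score unseen items by their similarity to items the target user liked."""
    vectors = item_vectors(prefs)
    seen = prefs[target]
    scores = {}
    for candidate in vectors:
        if candidate in seen:
            continue
        scores[candidate] = sum(cosine(vectors[candidate], vectors[liked]) * rating
                                for liked, rating in seen.items())
    ranked = sorted(scores, key=lambda i: (-scores[i], i))
    return ranked[:top_n]

# Hypothetical preferences: 1 means "likes".
prefs = {
    "user_A": {"item_A": 1, "item_C": 1},
    "user_B": {"item_A": 1, "item_B": 1, "item_C": 1},
    "user_C": {"item_A": 1},
}

recs = item_based_recommend(prefs, "user_C")
print(recs)
```

In practice the item-to-item similarities would typically be precomputed offline and reused across users, which is where the scalability advantage comes from.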

4.4.3 Summary of collaborative filtering

How should one choose between the user-based and item-based strategies? Item-based collaborative filtering was in fact Amazon's improvement on the user-based mechanism: on most websites the number of items is far smaller than the number of users, and both the number of items and their similarities are relatively stable; the item-based mechanism also performs better in real time than the user-based one. This does not hold in every scenario, however. In some news recommendation systems the number of items (news stories) may exceed the number of users, and stories are updated so quickly that item similarities are not stable. The choice of strategy therefore depends heavily on the specific application scenario.

Collaborative filtering is currently the most widely used recommendation mechanism. It has the following notable advantages:

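  1. It requires no strict modeling of items or users, and item descriptions need not be machine-readable, so the method is domain-independent.
  2. The recommendations it computes are open-ended: they draw on other people's experience and help users discover latent interests.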

It also has the following disadvantages:

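  1. The method is built on historical data, so both new items and new users face the "cold start" problem.
  2. Recommendation quality depends on the amount and accuracy of users' historical preference data.
  3. In most implementations, historical preferences are stored in a sparse matrix, and computation on sparse matrices has some obvious problems; for example, the mistaken preferences of a small number of users can strongly affect recommendation accuracy.
  4. Users with unusual tastes cannot be served well.
  5. Because it is based on historical data, once a user's preferences have been captured and modeled it is hard to adjust them as the user's tastes evolve, which makes the method inflexible.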

4.5 Model-based recommendation

There are many model-based methods. They use common machine learning algorithms to build a model for the target users, then predict users' preferences, generate recommendations and rank the results. Commonly used models include the Aspect Model, pLSA, LDA, clustering, SVD, Matrix Factorization, LR and GBDT. Training takes a relatively long time, but once training is complete, recommendation is fast and accurate, so the approach suits real-time services such as news and advertising. To get better results, however, it needs manual intervention to repeatedly combine and filter attributes, which is what we usually call feature engineering. And because news is time-sensitive, the system also has to update the online model repeatedly to keep up with changes.
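
To illustrate one of the models named above, here is a rough sketch of matrix factorization trained with stochastic gradient descent: each user and item gets a small latent vector, and a rating is predicted as their dot product. The ratings, the hyperparameters (`n_factors`, `lr`, `reg`, `epochs`) and the function names are all assumptions for illustration, not a tuned or canonical implementation.

```python
import random
from math import sqrt

def factorize(ratings, n_factors=2, epochs=500, lr=0.02, reg=0.02, seed=0):
    """Learn latent vectors P[user], Q[item] so that P[u] . Q[i] approximates r."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    P = {u: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for i in items}
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, user, item):
    """Predicted rating: dot product of the user and item latent vectors."""
    return sum(pu * qi for pu, qi in zip(P[user], Q[item]))

def rmse(P, Q, ratings):
    """Root mean squared error of the model on the given ratings."""
    return sqrt(sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings))

# Hypothetical (user, item, rating) triples on a 1-5 scale.
ratings = [
    ("u1", "i1", 5), ("u1", "i2", 3), ("u1", "i3", 1),
    ("u2", "i1", 4), ("u2", "i3", 1),
    ("u3", "i1", 1), ("u3", "i2", 1), ("u3", "i3", 5),
    ("u4", "i2", 1), ("u4", "i3", 4),
]

P, Q = factorize(ratings)
print("training RMSE:", rmse(P, Q, ratings))
print("predicted rating of i2 for u2:", predict(P, Q, "u2", "i2"))
```

The slow part is `factorize`; once the vectors exist, `predict` is a single dot product, which matches the "long training, fast serving" trade-off described above.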

Let us take LR as a simple example of how such a recommendation system works. By analyzing users' behavior and purchase records in the system, we obtain the following table:

[Table: one row per item, with feature values x1~xn and preference label y]

Each row of the table is an item. x1~xn are the feature attributes that influence user behavior, such as the user's age, gender and region, and the item's price and category; y is the user's preference for the item, which can be derived from purchase records, views, favorites and so on. From a large amount of such data we can fit a regression function and compute the coefficients corresponding to x1~xn, which are the weights of the feature attributes. The larger the weight, the more important that attribute is to the user's choice of item.

When fitting the function, we may find that no single attribute is strongly correlated with the outcome. For example, neither age alone nor gender alone is strongly correlated with buying skincare products, but when age and gender are considered together they are strongly associated with the purchase: women in their 20s and 30s are more likely to buy skincare products. Such a combination is called a cross attribute (cross feature). Through repeated testing and experience we can adjust the combinations of feature attributes to fit the most accurate regression function. The final attribute weights are as follows:

[Table: final weight of each attribute]

In short, model-based algorithms are fast and accurate at serving time, which suits real-time services such as news and advertising, but they depend on feature engineering and on frequent updates of the online model.
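
The LR example, including the age-and-gender cross attribute, can be sketched with a tiny logistic regression trained by plain gradient descent. Everything here is an assumption for illustration: the two binary features, the toy purchase labels, and the hyperparameters. A real system would use many more features and a library implementation.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def add_cross(row):
    """Append the cross feature age_20s_30s * is_female to a raw row."""
    age_20s_30s, is_female = row
    return [age_20s_30s, is_female, age_20s_30s * is_female]

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Full-batch gradient descent on the logistic log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b

def predict(w, b, x):
    """Predicted probability that the user buys the product."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Hypothetical data: ((age_20s_30s, is_female), bought_skincare).
# Women in their 20s-30s mostly buy; every other group mostly does not.
raw = (
    [((1, 1), 1)] * 3 + [((1, 1), 0)]
    + [((1, 0), 0)] * 3 + [((1, 0), 1)]
    + [((0, 1), 0)] * 3 + [((0, 1), 1)]
    + [((0, 0), 0)] * 3 + [((0, 0), 1)]
)
X = [add_cross(row) for row, _ in raw]
y = [label for _, label in raw]

w, b = fit_logistic(X, y)
for name, weight in zip(["age_20s_30s", "is_female", "age_x_female"], w):
    print(f"{name}: {weight:.2f}")
print("P(buy | woman in 20s-30s):", round(predict(w, b, add_cross((1, 1))), 2))
```

On this toy data the cross feature should end up carrying most of the weight while the two single features stay small, which is the effect the skincare example describes.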

4.6 Hybrid recommendation

In real applications a single recommendation algorithm is rarely used on its own. The recommendation systems of large, mature websites are hybrids, assembled after weighing the strengths, weaknesses and suitable scenarios of the individual algorithms. Hybrid strategies are themselves very varied: weighting the outputs of different strategies, using different algorithms for different scenarios or stages, and so on. How exactly to combine them has to be decided from the actual application scenario. As this shows, there are many kinds of recommendation algorithms, and when the application scenario changes, the algorithm often has to change substantially as well.

As for how to combine them, researchers have proposed seven hybridization approaches:

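  1. Weighted: combine the results of several recommendation techniques with weights.
  2. Switching: switch between recommendation techniques depending on the problem context and the actual situation or requirements.
  3. Mixed: run several recommendation techniques at once and present their results side by side for the user.
  4. Feature combination: combine features from different recommendation data sources and feed them to a single recommendation algorithm.
  5. Cascade: use one technique to produce a coarse result, then let a second technique refine it into a more precise recommendation.
  6. Feature augmentation: use the output of one technique as additional feature input for another.
  7. Meta-level: use the model produced by one recommendation method as the input of another.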
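
Two of these approaches, Weighted and Cascade, are simple enough to sketch directly. The score dictionaries below stand in for the outputs of two hypothetical recommenders (a collaborative-filtering one and a content-based one), and the weights and shortlist size are arbitrary illustrative values.

```python
def weighted_hybrid(score_lists, weights):
    """Weighted: sum each item's scores across recommenders, scaled by weight."""
    combined = {}
    for scores, weight in zip(score_lists, weights):
        for item, score in scores.items():
            combined[item] = combined.get(item, 0.0) + weight * score
    return sorted(combined, key=lambda item: (-combined[item], item))

def cascade_hybrid(coarse_scores, fine_scores, shortlist_size):
    """Cascade: the first recommender shortlists, the second re-ranks."""
    shortlist = sorted(coarse_scores,
                       key=lambda item: (-coarse_scores[item], item))[:shortlist_size]
    return sorted(shortlist, key=lambda item: (-fine_scores.get(item, 0.0), item))

# Hypothetical per-item scores from two recommenders.
cf_scores = {"a": 0.9, "b": 0.4, "c": 0.1}
content_scores = {"a": 0.2, "b": 0.8, "d": 0.6}

weighted = weighted_hybrid([cf_scores, content_scores], [0.7, 0.3])
cascaded = cascade_hybrid(cf_scores, content_scores, shortlist_size=2)
print(weighted)
print(cascaded)
```

The weighted variant lets item `d` appear even though only one recommender knows it, while the cascade can only reorder what the first stage let through; which behavior is preferable depends on the scenario.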


Original: https://blog.csdn.net/yawei_liu1688/article/details/103986284
Author: bigdata老司机
Title: 推荐算法总览(完整总结)
