A Summary of Recommendation Algorithms

I. What Is a Recommendation Algorithm

A recommendation algorithm uses records of a user's behavior, together with some mathematical method, to infer what that user is likely to enjoy.

The concept of personalized recommendation was first presented in March 1995 at a meeting of the American Association for Artificial Intelligence, where Robert Armstrong of Carnegie Mellon University introduced the personalized navigation system WebWatcher. At the same time, Marko Balabanović of Stanford University introduced a personalized recommendation system called LIRA. Research on personalized recommendation has flourished since then.

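II. Basic Conditions of Recommendation

There are many recommendation algorithms today, but none of them can get around a few basic conditions on which every recommendation rests:

1. Recommend based on people who share your preferences.
2. Recommend items similar to the ones you already like.
3. Recommend based on keywords you supply, which in effect degenerates into a search algorithm.
4. Recommend based on a combination of the conditions above.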

III. Categories of Recommendation Algorithms

Recommendation algorithms are commonly divided into three categories: content-based, collaborative filtering, and knowledge-based. The subsections below cover content-based and collaborative filtering recommendation, followed by association-rule-based, model-based, and hybrid approaches.

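1. Content-based recommendation
The underlying assumption is that users like items whose content resembles items they have already paid attention to. For example, if you have watched Harry Potter I, a content-based recommender finds that Harry Potter II-VI are strongly related in content to what you watched (they share many keywords) and recommends them to you. This approach avoids the item cold-start problem: an item that nobody has interacted with yet is rarely recommended by other algorithms, whereas a content-based algorithm can analyze the relationships between items and still recommend it.
It has two drawbacks. First, the recommended items may be repetitive. News is the typical case: after you read one story about MH370, the recommended stories are likely to repeat what you have already read. Second, for multimedia items (music, films, images and so on) content features are hard to extract, so recommendation is difficult; one remedy is to label such items manually.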
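The keyword-overlap idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the keyword weights in `catalog`, the function names, and the choice of cosine similarity are all invented for the example and are not part of the original article.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse keyword-weight dicts."""
    shared = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend_by_content(liked, catalog, top_n=2):
    """Rank unseen items by their best similarity to any liked item."""
    scores = {}
    for item, keywords in catalog.items():
        if item in liked:
            continue  # never recommend what the user has already seen
        scores[item] = max(cosine(keywords, catalog[seen]) for seen in liked)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical keyword weights, invented purely for illustration.
catalog = {
    "Harry Potter I": {"magic": 1.0, "school": 1.0, "wizard": 1.0},
    "Harry Potter II": {"magic": 1.0, "school": 1.0, "wizard": 1.0, "snake": 1.0},
    "Cooking Basics": {"kitchen": 1.0, "recipe": 1.0},
}
content_recs = recommend_by_content({"Harry Potter I"}, catalog, top_n=1)
```

Because the score depends only on item keywords, a brand-new item with no user history can still be ranked, which is the cold-start advantage mentioned above.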
2. Collaborative filtering recommendation
Collaborative filtering is a widely used recommendation method. It rests on the "birds of a feather flock together" assumption: users who like the same items are more likely to share the same interests. It is generally used in systems that have user ratings, with the rating describing a user's preference for an item. Collaborative filtering is regarded as a way of harnessing collective wisdom: it needs no special processing of the items themselves and instead relates items to each other through the users. Collaborative filtering systems currently fall into two types: user-based and item-based.

a. User-based recommendation
User-based collaborative filtering takes all users' preferences (ratings) for items or information and finds a group of "neighbor" users whose tastes resemble the current user's. In typical applications this is done with a K-nearest-neighbor computation, and recommendations for the current user are then drawn from the historical preferences of those K neighbors. The advantage is that recommended items may be completely unrelated in content to what the user has seen, so the method can uncover latent interests, and it produces personalized results for each user. The drawback is that on a typical Web system the number of users grows far faster than the number of items, so the amount of computation grows enormously and performance easily becomes a bottleneck. For this reason, user-based collaborative filtering systems are still rare in industry.
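The K-nearest-neighbor procedure described above might look something like the sketch below. The rating data, the function names, and the use of cosine similarity with a similarity-weighted average are assumptions made for illustration; real systems differ in how they measure similarity and normalize ratings.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two users' {item: rating} dicts."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend_user_based(target, ratings, k=2, top_n=3):
    # Step 1: find the k users most similar to the target user.
    sims = {u: cosine(ratings[target], r)
            for u, r in ratings.items() if u != target}
    neighbours = sorted(sims, key=sims.get, reverse=True)[:k]
    # Step 2: score unseen items by the similarity-weighted
    # average of the neighbours' ratings.
    num, den = {}, {}
    for u in neighbours:
        for item, r in ratings[u].items():
            if item in ratings[target]:
                continue
            num[item] = num.get(item, 0.0) + sims[u] * r
            den[item] = den.get(item, 0.0) + sims[u]
    scores = {i: num[i] / den[i] for i in num if den[i] > 0}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical ratings, invented purely for illustration.
ratings = {
    "alice": {"A": 5, "B": 4},
    "bob": {"A": 5, "B": 4, "C": 5},
    "carol": {"A": 1, "B": 5, "D": 4},
    "dave": {"E": 3},
}
user_recs = recommend_user_based("alice", ratings, k=2)
```

Step 1 compares the target with every other user, which is why the cost grows quickly as the user base grows.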

b. Item-based recommendation
Item-based collaborative filtering is similar to the user-based variant. It uses all users' preferences (ratings) for items or information to compute the similarity between items, and then recommends items similar to those in the user's preference history. Item-based collaborative filtering can be seen as a degenerate form of association-rule recommendation, but because it takes users' actual ratings into account and only computes similarities rather than mining frequent itemsets, it can be considered to have higher accuracy and higher coverage. Compared with user-based recommendation, it is more widely applicable, scales better and performs better, and because the item catalog usually grows slowly, performance changes little over time. The drawback is that the results are less personalized than user-based ones.
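As a rough sketch of the item-based variant, the code below builds one rating vector per item, measures item-to-item cosine similarity, and scores unseen items against the items the user has already rated. The data, names, and scoring rule are assumptions for illustration only.

```python
from math import sqrt


def item_vectors(ratings):
    """Turn user -> {item: rating} into item -> {user: rating}."""
    items = {}
    for user, scored in ratings.items():
        for item, score in scored.items():
            items.setdefault(item, {})[user] = score
    return items


def cosine(a, b):
    """Cosine similarity between two items' {user: rating} dicts."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = sqrt(sum(x * x for x in a.values()))
    norm_b = sqrt(sum(x * x for x in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend_item_based(target, ratings, top_n=3):
    items = item_vectors(ratings)
    seen = ratings[target]
    scores = {}
    for candidate, vec in items.items():
        if candidate in seen:
            continue
        # Weight each item-item similarity by the rating the user
        # gave to the item they already know.
        scores[candidate] = sum(cosine(vec, items[s]) * r
                                for s, r in seen.items())
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical ratings, invented purely for illustration.
ratings = {
    "u1": {"A": 5, "B": 5},
    "u2": {"A": 4, "B": 5, "C": 1},
    "u3": {"C": 5, "D": 4},
    "u4": {"A": 5},
}
item_recs = recommend_item_based("u4", ratings)
```

In practice the item-item similarities would usually be computed offline and cached, since they change slowly when the catalog is stable.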

Choosing between the two: how should one choose between the user-based and item-based strategies? Item-based collaborative filtering was in fact Amazon's improvement on the user-based mechanism: on most websites the number of items is far smaller than the number of users, and both the set of items and the similarities between them are relatively stable. The item-based mechanism also responds better in real time than the user-based one. This is not true of every scenario, however. In some news recommendation systems the number of items (news stories) may exceed the number of users, and stories are replaced so quickly that item similarities never stabilize. The choice of strategy therefore depends heavily on the specific application.

Collaborative filtering is currently the most widely used recommendation mechanism, and it has the following notable advantages:

a. It requires no strict modeling of items or users and does not need item descriptions to be machine-understandable, so the approach is domain-independent.
b. Its recommendations are open-ended: they draw on other people's experience and help users discover latent interests and preferences.

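It also has the following shortcomings:

a. The method is built on historical data, so it suffers from the "cold start" problem for both new items and new users.
b. The quality of recommendations depends on how much historical preference data there is and how accurate it is.
c. In most implementations, user preference history is stored as a sparse matrix, and computing on sparse matrices has some obvious problems; for example, the mistaken preferences of a small number of people can strongly affect recommendation accuracy.
d. Users with unusual tastes cannot be served well.
e. Because the method is based on historical data, once user preferences have been captured and modeled it is hard to follow how those preferences evolve, which makes the method inflexible.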
3. Association-rule-based recommendation
Association-rule-based recommendation is common in e-commerce systems and has proved effective. Its practical meaning is that a user who has bought certain items is more likely to buy certain other items. The primary goal of such a system is to mine association rules, that is, sets of items that many users buy together; items within such a set can be recommended for one another. Current mining algorithms are mainly derived from Apriori and FP-Growth. These systems generally achieve a high conversion rate, because a user who has bought several items from a frequent itemset is more likely to buy the other items in it.

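This mechanism has the following drawbacks:

1. The computation is heavy, but it can be done offline, so the impact is small.
2. Because it relies on user data, cold-start and sparsity problems are unavoidable.
3. Popular items tend to be over-recommended.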
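To make the idea of "items bought together" concrete, here is a deliberately simplified sketch that counts only item pairs and derives rules from support and confidence thresholds. It is not an implementation of Apriori or FP-Growth, which handle itemsets of arbitrary size; the baskets, thresholds, and function name are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5, min_confidence=0.6):
    """Return rules (x, y, support, confidence): 'bought x, suggest y'."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        unique = sorted(set(basket))  # ignore duplicates within a basket
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            # Of the baskets containing x, the share that also contain y.
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules


# Hypothetical purchase baskets, invented purely for illustration.
baskets = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["bread", "eggs"],
    ["milk"],
]
rules = pair_rules(baskets)
```

Counting pairs over all baskets is the expensive part, but as noted above it can run offline; popular items appear in many pairs, which is one way the over-recommendation problem arises.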
4. Model-based recommendation
There are many model-based methods. Most use common machine learning algorithms to build a model for the target users, then predict user preferences, generate recommendations and rank the results. Commonly used models include the aspect model, pLSA, LDA, clustering, SVD, matrix factorization, LR (logistic regression) and GBDT (gradient-boosted decision trees). Training takes a relatively long time, but once it is complete, producing recommendations is fast and accurate, which makes the approach well suited to real-time services such as news and advertising. Getting good results, however, requires manual work to repeatedly combine and select attributes, which is what is usually called feature engineering. And because news is time-sensitive, the system must also keep updating the online model to track changes.
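As one example from the list of models above, matrix factorization can be sketched with plain stochastic gradient descent. This is a toy version under assumed hyperparameters (`n_factors`, `lr`, `reg`, `epochs`) and invented ratings; it is meant to show the train-then-predict split, not to represent a production model.

```python
import random
from math import sqrt


def train_mf(ratings, n_factors=2, lr=0.02, reg=0.02, epochs=300, seed=0):
    """Learn user and item factor vectors from (user, item, rating) triples."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    P = {u: [rng.uniform(0.1, 0.5) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.uniform(0.1, 0.5) for _ in range(n_factors)] for i in items}
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, user, item):
    """Predicted rating is the dot product of the two factor vectors."""
    return sum(p * q for p, q in zip(P[user], Q[item]))


def rmse(P, Q, ratings):
    """Root-mean-square error of the model on the given triples."""
    total = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return sqrt(total / len(ratings))


# Hypothetical ratings, invented purely for illustration.
ratings = [
    ("u1", "A", 5), ("u1", "B", 4), ("u1", "C", 1),
    ("u2", "A", 4), ("u2", "B", 5), ("u2", "D", 1),
    ("u3", "C", 5), ("u3", "D", 4), ("u3", "A", 1),
    ("u4", "C", 4), ("u4", "D", 5),
]
P0, Q0 = train_mf(ratings, epochs=0)  # untrained baseline, same seed
P, Q = train_mf(ratings)
error_before = rmse(P0, Q0, ratings)
error_after = rmse(P, Q, ratings)
```

Training loops over the data many times, but `predict` is a single dot product, which matches the "slow to train, fast to serve" trade-off described above.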

5. Hybrid recommendation
In practice a single recommendation algorithm is rarely used on its own. The recommendation systems of large, mature websites are hybrids, assembled by weighing the strengths, weaknesses and suitable scenarios of the individual algorithms. Hybrid strategies are varied: for example, weighting the outputs of different strategies, or using different algorithms for different scenarios and stages. How exactly to mix them has to be worked out against the actual application scenario.
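The weighted-combination strategy mentioned above could be sketched as follows. The strategy names, scores, and weights are all hypothetical, and scaling each strategy's scores by its own maximum is just one simple way to make them comparable.

```python
def hybrid_rank(score_lists, weights):
    """Blend per-strategy scores into a single ranking.

    score_lists: strategy name -> {item: raw score}
    weights:     strategy name -> weight of that strategy
    """
    combined = {}
    for name, scores in score_lists.items():
        if not scores:
            continue
        top = max(scores.values())
        if top <= 0:
            continue  # nothing useful to scale against
        for item, s in scores.items():
            # Scale to [0, 1] so strategies with different score
            # ranges can be added together.
            blended = weights.get(name, 0.0) * s / top
            combined[item] = combined.get(item, 0.0) + blended
    return sorted(combined, key=combined.get, reverse=True)


# Hypothetical outputs of two recommenders, invented for illustration.
scores = {
    "content": {"A": 0.9, "B": 0.3},
    "collaborative": {"B": 4.5, "C": 5.0},
}
weights = {"content": 0.4, "collaborative": 0.6}
ranking = hybrid_rank(scores, weights)
```

Item B ranks first here because both strategies give it some support, even though neither ranks it highest on its own.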

Original: https://blog.csdn.net/qq_40394960/article/details/105868978
Author: Q&Cui
Title: 推荐算法总结
