# Overview of Recommendation Algorithms (Complete Summary)

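1. What Is a Recommendation Algorithm

2. Goals of Recommendation Algorithms

3. Basic Conditions for Recommendation

4. Categories of Recommendation Algorithms

4.1 Popularity-Based Recommendation

4.2 Content-Based Recommendation

4.3 Association-Rule-Based Recommendation

4.4 Collaborative Filtering

4.4.1 User-Based Recommendation

4.4.2 Item-Based Recommendation

4.4.3 Summary of Collaborative Filtering

4.5 Model-Based Recommendation

4.6 Hybrid Recommendation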

# 1. What Is a Recommendation Algorithm

A recommendation algorithm is a computer-science technique that uses mathematical methods to infer what a user is likely to enjoy. In short, it takes a user's behavior as input and applies mathematical algorithms to predict what that user may like. Its most successful applications today are on the Internet, for example Taobao and Toutiao (Jinri Toutiao).

Recommendation algorithms were first proposed in 1992 but have only become popular in recent years. The explosive growth of the Internet has produced far more data to work with, which has given recommendation algorithms a real chance to show what they can do.

# 2. Goals of Recommendation Algorithms

"Guess what you like" feeds, personalized playlists, and trending Weibo posts are all outputs of recommendation systems. The main goals of a recommendation system include the following:

Helping users find what they want is not easy. There are so many products that even we ourselves often open Taobao, face a dazzling array of discount promotions, and have no idea what to buy. Economics has a well-known theory that describes this situation: the long tail theory (The Long Tail).

Applied to the Internet, it means that the most popular resources attract the vast majority of attention, while the large remainder is rarely visited. This not only wastes resources but also prevents many users with niche interests from finding the content they care about.

The volume of information in the Internet era has exploded. If a website put all of its content on the home page, users could not possibly read it all, and very little of that information would actually be used. A recommendation system is therefore needed to help users filter out low-value information.

A good recommendation system makes users visit a site more often and lets them consistently find what they want to buy or read.

Each time the system successfully recommends content a user is interested in, our profile of that user's interests and other dimensions becomes clearer. Once we can accurately profile every user, we can tailor a range of services to each of them, so that users with diverse needs can all be served on our platform.

……

# 3. Basic Conditions for Recommendation

There are many recommendation algorithms today, but none of them can get around a few conditions. These are the basic conditions of recommendation:

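1. Recommend to you based on people who share your preferences.

2. Find items similar to the ones you like and recommend them to you.

3. Recommend based on keywords you provide; this effectively degenerates into a search algorithm.

4. Recommend based on a combination of the conditions above.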

# 4. Categories of Recommendation Algorithms

## 4.1 Popularity-Based Recommendation

Popularity-based recommendation is relatively simple and crude; it mainly recommends trending products or information. Items are ranked by a popularity ("heat") score derived from data such as PV (page views), UV (unique visitors), daily PV, or share rate, and the top-ranked items are recommended to users. The approach has both strengths and weaknesses: it is simple and works for newly registered users who have no history yet, but it cannot provide personalized recommendations.
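The ranking step above can be sketched as follows. This is a minimal illustration, assuming a hypothetical linear "heat" score that blends PV, UV and share rate with hand-picked weights; the field names, weights and numbers are illustrative, not taken from the original.

```python
from dataclasses import dataclass


@dataclass
class ItemStats:
    item_id: str
    pv: int            # page views
    uv: int            # unique visitors
    share_rate: float  # shares / views, in [0, 1]


def popularity_score(s: ItemStats, w_pv: float = 0.2, w_uv: float = 0.5,
                     w_share: float = 1000.0) -> float:
    """Hypothetical heat score: a weighted sum of raw popularity signals."""
    return w_pv * s.pv + w_uv * s.uv + w_share * s.share_rate


def top_k_popular(items: list[ItemStats], k: int) -> list[str]:
    """Rank items by heat; every user receives the same top-k list."""
    ranked = sorted(items, key=popularity_score, reverse=True)
    return [s.item_id for s in ranked[:k]]


items = [
    ItemStats("a", pv=1000, uv=400, share_rate=0.01),
    ItemStats("b", pv=5000, uv=900, share_rate=0.02),
    ItemStats("c", pv=300, uv=250, share_rate=0.10),
]
print(top_k_popular(items, 2))
```

Because the output does not depend on who is asking, the sketch also makes the trade-off visible: it needs no user history, and for the same reason it cannot personalize.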

## 4.2 Content-Based Recommendation

Content-based recommendation was the most widely used mechanism in the early days of recommendation engines. Its core idea is to use the metadata of items or content to determine how related they are, and then to recommend similar items based on the user's past preference records. For example, if you have read Harry Potter I, a content-based algorithm finds that Harry Potter II through VII are closely related to what you have read (they share many keywords), so it recommends those books to you.

This kind of recommendation system is mostly used in news and information apps: tags are extracted from each article (or movie, or song) to serve as its keywords, and these tags are then used to evaluate how similar two articles are.
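A tag-based version of this idea can be sketched as below. The source does not say which similarity measure is used; this sketch assumes Jaccard similarity between tag sets, and the catalog and tags are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Tag-set similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_by_content(item_tags: dict[str, set[str]], history: list[str],
                         k: int) -> list[str]:
    """Score each unseen item by its best tag overlap with anything already read."""
    seen = set(history)
    scores = {}
    for item, tags in item_tags.items():
        if item in seen:
            continue
        scores[item] = max((jaccard(tags, item_tags[h]) for h in history),
                           default=0.0)
    ranked = sorted(scores, key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Hypothetical catalog with hand-assigned tags.
item_tags = {
    "hp1": {"fantasy", "magic", "wizard", "school"},
    "hp2": {"fantasy", "magic", "wizard", "school", "chamber"},
    "lotr": {"fantasy", "magic", "quest"},
    "cookbook": {"cooking", "recipes"},
}
print(recommend_by_content(item_tags, ["hp1"], k=2))
```

Only the item metadata and the user's own history are used; no other user's behavior enters the score.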

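Advantages:

1. It is easy to implement and does not rely on other users' behavior data, so it does not suffer from data sparsity or from the cold-start problem for new items.

2. Recommendations are based on the features of the items themselves, so popular items are not over-recommended.

Disadvantages:

1. The extracted features must be both accurate and meaningful; otherwise the relevance of the recommendations is hard to guarantee. Douban maintains tags manually, relying on its users to keep content tags accurate.

2. Recommended items can be repetitive. News recommendation is the typical case: if you read a story about MH370, the recommended stories are likely to have the same content as the one you already read.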

## 4.3 Association-Rule-Based Recommendation

Association-rule-based recommendation is common in e-commerce systems and has proven effective. The practical idea is that users who have bought certain items are more likely to buy certain other items. The primary goal of such a system is to mine association rules, that is, to find sets of items that many users purchase together; the items in these sets can then be recommended for one another. Current association-rule mining algorithms are mainly derived from Apriori and FP-Growth. Association-rule-based systems generally achieve higher conversion rates, because a user who has already bought several items in a frequent itemset is more likely to buy the other items in that set.
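The mining-then-recommending flow can be sketched as follows. This is a deliberately simplified stand-in: it counts only item pairs by brute force instead of running Apriori or FP-Growth, and the baskets and the support/confidence thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(baskets: list[set[str]], min_support: float,
                    min_confidence: float) -> dict[tuple[str, str], float]:
    """Return rules (a -> b) with their confidence, looking at item pairs only.

    support(X)       = fraction of baskets containing every item in X
    confidence(a->b) = support({a, b}) / support({a})
    """
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        if count / n < min_support:
            continue  # the pair is not a frequent itemset
        for lhs, rhs in ((a, b), (b, a)):
            conf = count / item_counts[lhs]
            if conf >= min_confidence:
                rules[(lhs, rhs)] = conf
    return rules


def recommend_from_rules(rules: dict[tuple[str, str], float],
                         cart: set[str]) -> list[str]:
    """Recommend right-hand sides of rules whose left-hand side is in the cart."""
    scores: dict[str, float] = {}
    for (lhs, rhs), conf in rules.items():
        if lhs in cart and rhs not in cart:
            scores[rhs] = max(scores.get(rhs, 0.0), conf)
    return sorted(scores, key=lambda item: scores[item], reverse=True)


baskets = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "diapers", "beer"},
    {"diapers", "beer"},
]
rules = mine_pair_rules(baskets, min_support=0.4, min_confidence=0.6)
print(recommend_from_rules(rules, {"bread"}))
```

Mining is the expensive part and can run offline; the online lookup against the cart is cheap.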

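Disadvantages:

1. The computational cost is high, but the mining can be done offline, so the impact is limited.

2. Because it relies on user data, it inevitably suffers from cold-start and sparsity problems.

3. Popular items tend to be over-recommended.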

## 4.4 Collaborative Filtering

Collaborative filtering (CF) is widely used in recommendation systems. It rests on the assumption that "birds of a feather flock together": users who like the same items are more likely to share interests. CF-based systems are typically used where users rate items, and those ratings describe users' preferences for items. Collaborative filtering can be seen as a way of harnessing collective wisdom: it requires no special processing of the items themselves and instead establishes relationships between items through users. Current CF systems fall into two types: user-based recommendation and item-based recommendation.

### 4.4.1 User-Based Recommendation

The basic principle of user-based CF is to find neighboring users from the users' preferences for items, and then recommend what those neighbors like to the current user. Computationally, each user's preferences for all items are treated as a vector, and the similarity between users is computed from these vectors. After the K nearest neighbors are found, the algorithm uses the neighbors' similarity weights and their preferences for items to predict the current user's preference for items they have not yet expressed a preference for, and produces a ranked list of items as the recommendations.

The following figure shows an example. For user A, the users' historical preferences yield only one neighbor, user C, so item D, which user C likes, is recommended to user A.
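The steps above can be sketched as follows, assuming binary "likes" stored as sparse dicts, cosine similarity between user vectors, and a similarity-weighted sum of neighbors' ratings as the score (the source fixes neither the similarity measure nor the aggregation). The toy data is hypothetical, chosen so that user C is user A's only neighbor and item D gets recommended.

```python
import math

# Hypothetical preference data: user -> {item: rating}; 1 means "likes".
ratings = {
    "user_a": {"item_a": 1, "item_c": 1},
    "user_b": {"item_b": 1},
    "user_c": {"item_a": 1, "item_c": 1, "item_d": 1},
}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse preference vectors."""
    dot = sum(r * v[i] for i, r in u.items() if i in v)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def user_based_recommend(ratings: dict[str, dict[str, float]], user: str,
                         k: int, n: int) -> list[str]:
    """Score unseen items using the k most similar users (with similarity > 0)."""
    sims = sorted(
        ((cosine(ratings[user], ratings[other]), other)
         for other in ratings if other != user),
        reverse=True,
    )
    neighbors = [(s, o) for s, o in sims[:k] if s > 0]
    scores: dict[str, float] = {}
    for sim, other in neighbors:
        for item, r in ratings[other].items():
            if item in ratings[user]:
                continue  # only score items the user has no preference for yet
            scores[item] = scores.get(item, 0.0) + sim * r
    return sorted(scores, key=lambda i: scores[i], reverse=True)[:n]


print(user_based_recommend(ratings, "user_a", k=1, n=3))  # → ['item_d']
```

With explicit star ratings, a common variant divides by the sum of similarities to predict a rating rather than a ranking score.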

### 4.4.2 Item-Based Recommendation

Item-based CF is similar to user-based CF, except that neighbors are computed among items rather than among users: it finds similar items from users' preferences for items, and then recommends items similar to those in the user's history. Computationally, all users' preferences for a given item are treated as a vector, and the similarity between items is computed from these vectors. Once similar items are found, the algorithm predicts the current user's preference for items they have not yet expressed a preference for, based on the user's history, and produces a ranked list of items as the recommendations. Item-based CF can be viewed as a degenerate form of association-rule recommendation. However, because CF takes users' actual ratings into account and only computes similarities rather than mining frequent itemsets, item-based CF can be considered to offer higher accuracy and higher coverage.

The following figure shows an example. For item A, the historical preferences of all users show that the users who like item A also like item C, so items A and C are judged to be similar. Since user C likes item A, we can infer that user C may also like item C.
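The item-based variant can be sketched by transposing the same kind of preference data so that each item becomes a vector over users. As before, cosine similarity and a similarity-weighted sum are assumptions, and the toy data is hypothetical: users A and B both like items A and C, and user C likes only item A.

```python
import math
from collections import defaultdict

# Hypothetical preference data: user -> {item: rating}; 1 means "likes".
ratings = {
    "user_a": {"item_a": 1, "item_c": 1},
    "user_b": {"item_a": 1, "item_b": 1, "item_c": 1},
    "user_c": {"item_a": 1},
}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors."""
    dot = sum(r * v[k] for k, r in u.items() if k in v)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def item_based_recommend(ratings: dict[str, dict[str, float]], user: str,
                         n: int) -> list[str]:
    """Score unseen items by their similarity to items in the user's history."""
    # Transpose: item -> {user: rating}, i.e. all users' preferences for the item.
    item_vectors: dict[str, dict[str, float]] = defaultdict(dict)
    for u, prefs in ratings.items():
        for item, r in prefs.items():
            item_vectors[item][u] = r
    history = ratings[user]
    scores = {}
    for candidate, vec in item_vectors.items():
        if candidate in history:
            continue
        scores[candidate] = sum(cosine(vec, item_vectors[h]) * r
                                for h, r in history.items())
    ranked = sorted(scores, key=lambda i: scores[i], reverse=True)
    return [i for i in ranked if scores[i] > 0][:n]


print(item_based_recommend(ratings, "user_c", n=1))  # → ['item_c']
```

In a production system the item-item similarities would typically be precomputed offline, which is part of why item-based CF is attractive when the item set is stable.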

### 4.4.3 Summary of Collaborative Filtering

How should one choose between the user-based and item-based strategies? In fact, item-based CF was Amazon's improvement on the user-based mechanism: on most websites the number of items is far smaller than the number of users, and both the item catalog and the similarities between items are relatively stable. Item-based CF also has better real-time performance than user-based CF. However, this does not hold in every scenario. In some news recommendation systems, the number of items (news articles) may exceed the number of users, and news is updated so quickly that item similarities never stabilize. The choice of strategy therefore depends heavily on the specific application scenario.

Collaborative filtering is the most widely used recommendation mechanism today, and it has the following notable advantages:

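1. It does not require strict modeling of items or users, nor does it require item descriptions to be machine-understandable, so the approach is domain-independent.
2. Its recommendations are open-ended: they draw on other people's experience and help users discover latent interests.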

It also has the following shortcomings:

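1. The method is built on historical data, so it suffers from the "cold start" problem for both new items and new users.
2. Recommendation quality depends on the quantity and accuracy of users' historical preference data.
3. In most implementations, users' preference history is stored as a sparse matrix, and computation on sparse matrices has some obvious problems; for example, erroneous preferences from a small number of users can significantly hurt recommendation accuracy.
4. It cannot serve users with unusual tastes well.
5. Because it is built on historical data, once users' preferences have been captured and modeled, it is hard to track how those preferences evolve, which makes the method inflexible.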

## 4.5 Model-Based Recommendation

There are many model-based methods. They mainly use common machine learning algorithms to build a recommendation model for the target user, which is then used to predict the user's preferences, rank the recommended results, and so on. Common models include the Aspect Model, pLSA, LDA, clustering, SVD, matrix factorization, LR (logistic regression), GBDT, and others. Training takes relatively long, but once it is complete, recommendation is fast and accurate, which makes this approach well suited to real-time services such as news and advertising. Of course, getting good results from these models requires manual intervention to combine and filter attributes repeatedly, which is what we usually call feature engineering. And because news is time-sensitive, the system also needs to update the online model repeatedly to keep up with changes.
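Matrix factorization, one of the models listed above, can be sketched in a few lines. This is a minimal illustration of plain SGD matrix factorization on squared error with L2 regularization; the ratings, latent dimension k and hyperparameters are hypothetical and untuned. The (slow) training loop runs once, after which each prediction is just a dot product, which matches the "long training, fast serving" trade-off described above.

```python
import math
import random

# Hypothetical observed ratings as (user_index, item_index, rating) triples.
observed = [
    (0, 0, 5), (0, 1, 3), (0, 3, 1),
    (1, 0, 4), (1, 3, 1),
    (2, 0, 1), (2, 1, 1), (2, 3, 5),
    (3, 2, 4), (3, 3, 4),
]


def predict(P, Q, u, i):
    """Predicted rating: dot product of user and item factor vectors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


def factorize(observed, n_users, n_items, k=2, lr=0.01, reg=0.02,
              epochs=3000, seed=0):
    """Learn user factors P and item factors Q so that P[u] . Q[i] ~ rating."""
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in observed:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # SGD step on (r - p.q)^2 + reg * (|p|^2 + |q|^2)
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(P, Q, observed):
    """Root-mean-square error over the observed cells."""
    return math.sqrt(
        sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in observed) / len(observed)
    )


P, Q = factorize(observed, n_users=4, n_items=4)
print(round(rmse(P, Q, observed), 3))  # training error on observed cells
print(round(predict(P, Q, 1, 1), 2))   # estimate for an unobserved cell
```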

Let us take LR as a simple example of how such a recommendation system works. By analyzing users' behavior and purchase records in the system, we get the following table:

Each row of the table is an item. x1 to xn are the feature attributes that affect user behavior, such as the user's age, gender and region, and the item's price and category; y is the user's preference for the item, which can come from purchase history, browsing, bookmarking, and so on. With a large amount of such data, we can fit a regression function and compute the coefficients of x1 to xn, which are the weights of the feature attributes. The larger a weight, the more important that attribute is in users' choice of goods.
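Because the original table is not reproduced, the fitting step can only be sketched under assumptions. The block below fits LR by batch gradient descent on the mean log-loss, using two hypothetical binary features (`in_favorite_category`, `high_price`) and hand-made purchase labels.

```python
import math

# Hypothetical rows: (in_favorite_category, high_price) and whether the user bought.
X = [(0, 0)] * 4 + [(1, 0)] * 4 + [(0, 1)] * 4 + [(1, 1)] * 4
y = [1, 1, 0, 0] + [1, 1, 1, 0] + [1, 0, 0, 0] + [1, 1, 0, 0]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss; w[0] is the bias term."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xs, label in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xs))
            err = sigmoid(z) - label  # derivative of log-loss w.r.t. z
            grad[0] += err
            for j, xj in enumerate(xs, start=1):
                grad[j] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


w = fit_logistic_regression(X, y)
print([round(v, 2) for v in w])  # [bias, w_in_favorite_category, w_high_price]
```

On this toy data the fitted weights come out close to +ln 3 for the favorite-category feature and -ln 3 for high price. Reading importance off the weights only makes sense when features are on comparable scales; it is the magnitude that indicates importance, while a negative sign means the attribute lowers the predicted purchase probability.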

When fitting the function, we should keep in mind that a single attribute may not be strongly correlated with the behavior on its own. For example, age alone is not strongly correlated with buying skincare products, and neither is gender alone; but when age and gender are considered together, they are strongly associated with purchasing behavior. Women in their 20s and 30s, for instance, are more likely to buy skincare products. Such combined attributes are called cross-attributes. Through repeated testing and experience, we can adjust the combinations of feature attributes to fit the most accurate regression function. The final attribute weights are as follows:
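Why cross-attributes help can be shown with simple counting. A linear model such as LR can only add up per-feature weights, so when the signal lives only in a combination of attributes, giving that combination its own feature lets the model learn a single weight for it. The log rows below are hypothetical and deliberately contrived so that age alone and gender alone carry no signal at all; the bucket and feature names are illustrative.

```python
from collections import defaultdict


def with_cross_features(age_bucket: str, gender: str) -> list[str]:
    """Expand raw attributes into one-hot feature names, including a cross term."""
    return [
        f"age={age_bucket}",
        f"gender={gender}",
        f"age={age_bucket}&gender={gender}",  # the cross-attribute
    ]


def purchase_rate_by_feature(rows):
    """Purchase rate among the rows where each (possibly crossed) feature is on."""
    counts = defaultdict(lambda: [0, 0])  # feature -> [purchases, total]
    for age_bucket, gender, bought in rows:
        for feat in with_cross_features(age_bucket, gender):
            counts[feat][0] += bought
            counts[feat][1] += 1
    return {feat: b / t for feat, (b, t) in counts.items()}


# Hypothetical, contrived log rows: (age_bucket, gender, bought_skincare)
rows = [
    ("20-39", "F", 1), ("20-39", "F", 1),
    ("20-39", "M", 0), ("20-39", "M", 0),
    ("40-59", "F", 0), ("40-59", "F", 0),
    ("40-59", "M", 1), ("40-59", "M", 1),
]
for feat, rate in sorted(purchase_rate_by_feature(rows).items()):
    print(f"{feat:22s} {rate:.2f}")
```

Every single-attribute feature shows a 0.50 purchase rate, while the crossed features separate buyers (1.00) from non-buyers (0.00).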


## 4.6 Hybrid Recommendation

In real applications, a single recommendation algorithm is rarely used on its own. The recommendation systems of large, mature websites are therefore "hybrid algorithms" that combine several algorithms according to their strengths and weaknesses and to what suits each scenario. Hybrid strategies are varied: for example, weighting the results of different strategies, or using different algorithms for different scenarios and stages. Exactly how to mix them has to be analyzed against the actual application scenario. As this shows, there are many types of recommendation algorithms, and when the application scenario changes, the recommendation algorithm often needs to change substantially as well.

As for how to combine them, some researchers have proposed seven hybridization approaches:

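1. Weighted: combine the results of several recommendation techniques with weights.
2. Switching: switch between recommendation techniques depending on the problem context and the actual situation or requirements.
3. Mixed: apply several recommendation techniques at once and present the user with multiple sets of recommendations.
4. Feature combination: features from different recommendation data sources are combined and used by another recommendation algorithm.
5. Cascade: one technique produces a coarse set of recommendations, and a second technique refines it into more precise recommendations.
6. Feature augmentation: one technique produces additional feature information that is embedded in the feature input of another technique.
7. Meta-level: the model produced by one recommendation method serves as the input to another.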
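The first approach, weighted hybridization, can be sketched as follows. Raw scores from different recommenders live on different scales, so this sketch assumes min-max normalization before blending; the recommender outputs and the 0.7 / 0.3 weights are hypothetical.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one recommender's scores to [0, 1] so they can be blended."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {item: 1.0 for item in scores}
    return {item: (s - lo) / (hi - lo) for item, s in scores.items()}


def weighted_hybrid(score_lists: list[dict[str, float]], weights: list[float],
                    n: int) -> list[str]:
    """Blend normalized scores; an item missing from a recommender contributes 0."""
    blended: dict[str, float] = {}
    for scores, w in zip(score_lists, weights):
        for item, s in min_max(scores).items():
            blended[item] = blended.get(item, 0.0) + w * s
    return sorted(blended, key=lambda i: blended[i], reverse=True)[:n]


cf_scores = {"x": 9.0, "y": 5.0, "z": 1.0}        # hypothetical CF output
content_scores = {"y": 12.0, "z": 3.0, "w": 9.0}  # hypothetical content-based output
print(weighted_hybrid([cf_scores, content_scores], [0.7, 0.3], n=3))
```

Setting one weight to 1 and the rest to 0 for a given context turns the same function into a crude form of the switching approach.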


Original: https://blog.csdn.net/yawei_liu1688/article/details/103986284
Author: bigdata老司机
Title: 推荐算法总览（完整总结）
