# Overview of Recommendation Algorithms

## 1 Summary of Collaborative Filtering Recommendation Algorithms

Recommendation algorithms have many application scenarios and considerable commercial value, so they are well worth studying.

There are many kinds of recommendation algorithms, but collaborative filtering is currently the most widely used family. This part first surveys the collaborative filtering family and then summarizes the principles of some typical collaborative filtering algorithms.

1.1 Overview of recommendation algorithms

Recommendation algorithms have a long history: the needs and the applications existed well before machine learning took off.

Broadly speaking, they can be divided into the following five categories:

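1） Content-based recommendation: this category generally relies on natural language processing (NLP). It mines TF-IDF feature vectors from text to learn a user's preferences and then recommends accordingly. It can discover a user's distinctive niche tastes and is fairly interpretable. It does require an NLP background. (Here, an NLP background refers to how machines …)

2） Collaborative filtering: currently the most mainstream category, with many variants and wide use in industry. Its advantage is that it needs little domain-specific knowledge and can achieve good results with statistics-based machine learning algorithms. Its biggest advantage is that it is easy to engineer and to integrate into products. The vast majority of recommendation algorithms in practical use today are collaborative filtering algorithms.

3） Hybrid recommendation: similar to ensemble learning in machine learning. It draws on the strengths of several recommendation algorithms and combines them into a better one, in the spirit of "two heads are better than one". For example, one can build several recommendation models and decide the final recommendation by voting. In theory a hybrid is no worse than any of its component algorithms, but it raises algorithmic complexity. It is used in practice, but not as widely as single collaborative filtering algorithms such as binary classifiers like logistic regression.

4） Rule-based recommendation: typical examples are recommending the most-clicked or most-viewed items. These are mass-appeal methods and are not mainstream in the current big-data era.

5） Demographic-based recommendation: the simplest category. It merely measures how related users are from their basic profile information and recommends on that basis. It is now rarely used in large systems.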

1.2 Overview of collaborative filtering

Collaborative filtering, the most classic type of recommendation algorithm, consists of online collaboration and offline filtering.

Online collaboration means using online data to find items a user may like. Offline filtering means filtering out data that is not worth recommending, such as items with a low predicted score, or items that score highly but that the user has already purchased.

The usual setting is data about m users and n items in which only some users have rated some items; every other cell is blank. We use the existing sparse ratings to predict the blank user-item ratings and recommend the highest-scoring items to each user.

Generally speaking, there are three types of collaborative filtering. The first is user-based collaborative filtering, the second is item-based collaborative filtering, and the third is model-based collaborative filtering.

User-based collaborative filtering mainly considers the similarity between users. Once we find the items that similar users like and predict the target user's ratings for those items, we can recommend the few items with the highest predicted ratings.
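
The user-based idea can be sketched in a few lines of plain Python. This is a minimal illustration, not a production method: cosine similarity over co-rated items and a similarity-weighted average are assumed choices, and the user names, item names, and ratings are made up.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict_rating(ratings, user, item):
    """Similarity-weighted average of the other users' ratings for `item`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine_similarity(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None

# Hypothetical toy data: alice has not rated book_c yet.
ratings = {
    "alice": {"book_a": 5, "book_b": 3},
    "bob": {"book_a": 5, "book_b": 3, "book_c": 4},
    "carol": {"book_a": 1, "book_b": 5, "book_c": 2},
}
print(predict_rating(ratings, "alice", "book_c"))
```

Because alice's ratings match bob's more closely than carol's, the prediction lands nearer to bob's rating of book_c.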

Item-based collaborative filtering is similar, except that it looks at the similarity between items. Given the target user's ratings for some items, we predict ratings for the items most similar to them and recommend the highest-scoring ones. For example, if you buy a machine learning book online, the site immediately recommends a batch of books on machine learning and big data; this is clearly item-based collaborative filtering at work.

A quick comparison: user-based collaborative filtering has to compute user-to-user similarity online, so its computational cost is generally higher than that of item-based collaborative filtering, but it can help users discover surprising items in new categories. In item-based collaborative filtering, item similarity stays stable over a period of time, so it can be computed offline, and the accuracy is generally acceptable; in terms of diversity, however, it rarely surprises users.

For small recommendation systems, item-based collaborative filtering is generally the mainstream choice. For a large system, user-based collaborative filtering is worth considering, as is the third type, model-based collaborative filtering.

Model-based collaborative filtering is currently the most mainstream type, and a large number of machine learning algorithms can be put to use here. The rest of this part focuses on it.

1.3 Model-based collaborative filtering

Model-based collaborative filtering is the most mainstream type of collaborative filtering; its main ideas are classified and summarized here.

The problem is this: given data about m users and n items in which only some users have rated some items, use the existing sparse ratings to predict the blank user-item ratings and recommend the highest-scoring items to each user.

Mainstream approaches to this problem use association algorithms, clustering, classification, regression, matrix factorization, neural networks, graph models, and latent semantic models. Each is introduced below.

1.3.1 Collaborative filtering with association algorithms

Typically, we mine frequent itemsets or frequent sequences from the records of all items purchased by users, finding the frequent N-itemsets or sequences of related items that satisfy a support threshold. If a user has bought some of the items in a frequent N-itemset or sequence, we can recommend the remaining items in it according to some scoring criterion, such as support, confidence, or lift.

Commonly used association algorithms are Apriori, FP Tree, and PrefixSpan.
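
To make the three scoring criteria concrete, here is a small sketch that computes support, confidence, and lift for a single rule "antecedent → consequent". It does not implement Apriori, FP Tree, or PrefixSpan; the basket data and the function name `rule_metrics` are illustrative assumptions.

```python
def rule_metrics(baskets, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(baskets)
    n_a = sum(1 for b in baskets if antecedent in b)
    n_c = sum(1 for b in baskets if consequent in b)
    n_both = sum(1 for b in baskets if antecedent in b and consequent in b)
    support = n_both / n                          # P(A and C)
    confidence = n_both / n_a if n_a else 0.0     # P(C | A)
    lift = confidence / (n_c / n) if n_c else 0.0 # P(C | A) / P(C)
    return support, confidence, lift

# Hypothetical purchase histories, one set of items per user.
baskets = [
    {"ml_book", "stats_book"},
    {"ml_book", "stats_book", "novel"},
    {"ml_book"},
    {"novel"},
]
support, confidence, lift = rule_metrics(baskets, "ml_book", "stats_book")
print(support, confidence, lift)
```

A lift above 1 suggests that buyers of the antecedent are more likely than average to buy the consequent, which is what makes the rule useful for recommendation.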

1.3.2 Collaborative filtering with clustering algorithms

Clustering-based collaborative filtering resembles the user-based and item-based approaches above. We cluster either users or items under some distance measure. With user clustering, users are divided into groups, and the items rated highly within the target user's group are recommended to that user. With item clustering, items similar to those the user rated highly are recommended.

Commonly used clustering algorithms are K-Means, BIRCH, DBSCAN, and spectral clustering.

1.3.3 Collaborative filtering with classification algorithms

If we bin the ratings into several ranges, the problem becomes a classification problem. Most directly, we set a rating threshold: items scoring above it are recommended and items below it are not, which turns the problem into binary classification. There are many classification algorithms, but logistic regression is currently the most widely used.

Logistic regression is highly interpretable: it gives an explicit probability of whether each item should be recommended, and we can tune the model through feature engineering on the data.

Collaborative filtering with logistic regression is already very mature at large companies such as BAT (Baidu, Alibaba, Tencent).

Common classification algorithms for recommendation are logistic regression and naive Bayes, both of which are highly interpretable.

1.3.4 Collaborative filtering with regression algorithms

Regression seems a more natural fit than classification: a rating can be a continuous value rather than a discrete one, and a regression model directly yields the target user's predicted rating for an item.

Commonly used regression algorithms are ridge regression, regression trees, and support vector regression.

1.3.5 Collaborative filtering with matrix factorization

Matrix factorization is currently a widely used approach. Traditional singular value decomposition (SVD) requires a dense matrix with no missing entries, whereas the user-item rating matrix is typically sparse, so applying traditional SVD directly to collaborative filtering is complicated.

1.3.6 Collaborative filtering with neural networks

Using neural networks, and deep learning in particular, for collaborative filtering looks like a future trend. At present, the mainstream recommendation algorithm built on a two-layer neural network is the restricted Boltzmann machine (RBM), which performed very well in the Netflix Prize competition. Deeper networks may do better still, and commercial deep-learning-based collaborative filtering is likely to become a trend.

1.3.7 Collaborative filtering with graph models

Graph-based collaborative filtering models the similarity between users within a graph. Commonly used algorithms are the SimRank family and Markov model algorithms.

The basic idea of the SimRank family is that two objects referenced by similar objects are themselves similar, which is somewhat reminiscent of the well-known PageRank. Markov model algorithms are based on Markov chains; their basic idea is to use transitivity to uncover similarities that ordinary distance measures have difficulty finding.

1.3.8 Collaborative filtering with latent semantic models

Latent semantic models come mainly from NLP: they perform semantic analysis of user behavior to produce ratings and recommendations. The main methods are latent semantic analysis (LSA) and latent Dirichlet allocation (LDA).

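1.4 Some new directions in collaborative filtering

Recommendation algorithms keep evolving, and even the currently popular logistic-regression-based approach faces being replaced.

Which algorithms might replace traditional collaborative filtering such as logistic regression?

a) Ensemble learning and hybrid recommendation: this overlaps with hybrid recommendation. Ensemble learning has matured and performs well on recommendation tasks. One algorithm that may replace logistic regression is GBDT, which does well in many algorithm competitions and has industrial-grade parallel implementations.

b) Matrix factorization methods: matrix factorization has long been favored for its simplicity. Methods that are gradually becoming popular include factorization machines and tensor factorization.

c) Deep learning methods: the two-layer RBM already gives very good recommendation results, and with the rise of deep learning and multi-layer networks, recommendation may well become deep learning's territory. The most popular approaches at the moment are based on CNNs and RNNs.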

1.5 Summary of collaborative filtering

As a classic recommendation approach, collaborative filtering is widely used in industry. Its advantages are many: the model is highly general, it needs little expertise in the data's domain, it is simple to engineer, and it works well. These are the reasons for its popularity.

Collaborative filtering also has some unavoidable problems, such as the troublesome "cold start": with no data about a new user, we cannot recommend items to that user. It also ignores contextual differences, such as the user's current situation and mood. Nor can it capture niche, idiosyncratic preferences, which is where content-based recommendation excels.

That concludes the summary of collaborative filtering recommendation algorithms.

## 2 Matrix Factorization in Collaborative Filtering

Matrix factorization is a widely used way to do collaborative filtering. This part summarizes how it is applied in collaborative filtering recommendation algorithms.

2.1 The problem matrix factorization solves

A recommendation system typically has many users and many items, but only a few users have rated a few items. We want to predict the target user's ratings for the unrated items and recommend the highest-scoring ones. The data can be pictured as a user-item rating table in which most cells are empty.

For each user, we want to predict the ratings of the unrated items as accurately as possible. There are many ways to do this; this part focuses on matrix factorization. If the ratings of m users for n items are viewed as a matrix M, we want to solve the problem by factorizing M.

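2.2 Traditional singular value decomposition (SVD) for recommendation

When matrix factorization is mentioned, the first thing that comes to mind is singular value decomposition (SVD).

The m × n user-item matrix M can be decomposed with SVD, keeping only the k largest singular values to reduce the dimension at the same time. That is, M is decomposed as:

$$M_{m \times n} = U_{m \times k} \Sigma_{k \times k} V_{k \times n}^T$$

Here k is the number of singular values kept, generally much smaller than the numbers of users and items. To predict the rating $m_{ij}$ of user i for item j, we only need to compute $u_i^T \Sigma v_j$. In this way every unrated position in the rating table gets a predicted rating, and the items with the highest predicted ratings are recommended to the user.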

The method looks simple, direct, and attractive. But it overlooks a big problem: SVD requires the matrix to be dense, with no blank positions. When there are blanks, M cannot be decomposed directly. One might object that if the matrix were dense, we would already know every user's rating for every item, so why would we need SVD at all? That is indeed a problem. The traditional workaround is to fill in the missing values in the rating matrix with something simple, such as the global mean or the user's or item's mean, and then decompose the completed matrix with SVD and reduce its dimension.
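
The completion step can be sketched as follows. This is an illustrative assumption about one simple strategy (user mean, falling back to the global mean for users with no ratings); the function name and the toy matrix are made up, and `None` marks a missing rating.

```python
def fill_with_user_mean(matrix):
    """Replace None with the row (user) mean, or the global mean for empty rows."""
    known = [v for row in matrix for v in row if v is not None]
    global_mean = sum(known) / len(known)
    filled = []
    for row in matrix:
        rated = [v for v in row if v is not None]
        row_mean = sum(rated) / len(rated) if rated else global_mean
        filled.append([row_mean if v is None else v for v in row])
    return filled

# Hypothetical 3 users x 3 items rating matrix.
ratings_matrix = [
    [5, None, 3],
    [None, None, None],
    [4, 2, None],
]
print(fill_with_user_mean(ratings_matrix))
```

The completed matrix is dense and could then be passed to any standard SVD routine.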

Even with this completion strategy, traditional SVD is still hard to use for recommendation, because the numbers of users and items are usually huge, easily in the tens of thousands or more, and SVD on a matrix that large is very time-consuming.

So is there a simplified form of matrix factorization? The following sections look at matrix factorization methods that are actually usable in recommendation systems.

2.3 FunkSVD for recommendation

FunkSVD was proposed in response to the computational cost of traditional SVD. Decomposing a matrix into three matrices with SVD is slow and also runs into the sparsity problem, so can we sidestep sparsity and decompose into only two matrices? That is, we now want to factorize M as:

$$M_{m \times n} = P_{m \times k}^T Q_{k \times n}$$

SVD is a mature technique, but how does FunkSVD factorize M into P and Q?

It borrows the idea of linear regression. The goal is to make the residual between each observed rating and the rating given by the matrix product as small as possible, so the mean squared error serves as the loss function for finding P and Q.
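
The loss described above is commonly written as follows. This is the standard regularized formulation rather than a formula recovered from this text: $p_i$ and $q_j$ denote the i-th column of P and the j-th column of Q, $K$ is the set of observed (i, j) pairs, and the regularization weight $\lambda$ and learning rate $\alpha$ are assumed symbols.

$$\min_{P, Q} \sum_{(i,j) \in K} \left( m_{ij} - q_j^T p_i \right)^2 + \lambda \left( \|p_i\|_2^2 + \|q_j\|_2^2 \right)$$

With the error $e_{ij} = m_{ij} - q_j^T p_i$ and constant factors absorbed into $\alpha$, one gradient descent step on a single observed rating is:

$$p_i \leftarrow p_i + \alpha \left( e_{ij} q_j - \lambda p_i \right), \qquad q_j \leftarrow q_j + \alpha \left( e_{ij} p_i - \lambda q_j \right)$$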

Through iteration we finally obtain P and Q, which can then be used for recommendation. The idea behind FunkSVD is very simple, yet it works very well in practice, a good illustration of the saying that the greatest truths are the simplest.
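
The iteration can be sketched with plain stochastic gradient descent. This is a minimal illustration under assumed hyperparameters (`k`, `lr`, `reg`, `epochs`) and made-up ratings, not a tuned implementation; for convenience P is stored with one row per user, i.e. as $P^T$ in the notation above, and Q with one row per item.

```python
import random

def funk_svd(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=1000, seed=0):
    """Fit user factors P and item factors Q by SGD on the observed ratings only."""
    rng = random.Random(seed)
    P = [[rng.random() for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.random() for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(pf * qf for pf, qf in zip(P[u], Q[i]))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)  # gradient step on the user factor
                Q[i][f] += lr * (err * pu - reg * qi)  # gradient step on the item factor
    return P, Q

def predict(P, Q, u, i):
    """Predicted rating: dot product of the user and item factor vectors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

def rmse(P, Q, ratings):
    """Root mean squared error over the given (user, item, rating) triples."""
    return (sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5

# Hypothetical observed ratings as (user, item, rating); the other cells are blank.
observed = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
P, Q = funk_svd(observed, n_users=3, n_items=3)
print("training RMSE:", rmse(P, Q, observed))
print("predicted rating of user 0 for item 2:", predict(P, Q, 0, 2))
```

Only observed cells enter the loss, which is how FunkSVD sidesteps the sparsity problem without filling in missing values.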

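2.4 BiasSVD for recommendation

BiasSVD assumes that a rating includes three kinds of bias: factors unrelated to both the user and the item (a global term); factors specific to the user and unrelated to the item, called the user bias; and factors specific to the item and unrelated to the user, called the item bias. This is easy to understand. A shoddy knock-off product, for example, cannot earn a high rating: its poor quality directly leads to low ratings regardless of who the user is.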
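
In the usual formulation, which is given here as an assumption since the formula is not part of this text, the three bias terms are added to the FunkSVD prediction. Here $\mu$ is the global mean rating, $b_i$ the bias of user i, $b_j$ the bias of item j, and $K$ the set of observed (i, j) pairs:

$$\hat{m}_{ij} = \mu + b_i + b_j + q_j^T p_i$$

$$\min_{P, Q, b} \sum_{(i,j) \in K} \left( m_{ij} - \mu - b_i - b_j - q_j^T p_i \right)^2 + \lambda \left( \|p_i\|_2^2 + \|q_j\|_2^2 + b_i^2 + b_j^2 \right)$$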

Through iteration we finally obtain P and Q, along with the bias terms, which can then be used for recommendation. Because BiasSVD takes these extra factors into account, it performs better than FunkSVD in some scenarios.

2.5 SVD++ for recommendation

SVD++ further enhances BiasSVD by additionally taking the user's implicit feedback into account.
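
A commonly cited form of the SVD++ prediction is shown below as a hedged sketch; it is not taken from this text. $N(i)$ is assumed to denote the set of items for which user i has given implicit feedback, and $y_s$ is an additional factor vector learned for item s:

$$\hat{m}_{ij} = \mu + b_i + b_j + q_j^T \left( p_i + |N(i)|^{-1/2} \sum_{s \in N(i)} y_s \right)$$

The user factor $p_i$ is thus shifted by the average implicit-feedback signal, so a user who has only browsed or clicked items still gets a meaningful representation.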

2.6 Summary of matrix factorization methods

FunkSVD took matrix factorization for recommendation to a new level, and it is very widely used in practice. Matrix factorization methods keep advancing: tensor factorization and factorization machines are a likely future direction.

As a recommendation method, matrix factorization is easy to program, has low complexity, predicts well, and remains extensible. These are valuable advantages. Its results are sometimes less interpretable than those of probability-based algorithms such as logistic regression, but that has not hurt its popularity.

Matrix factorization is a good choice for a small recommendation system. For a large one, it has no advantage over some current deep learning methods.

## 3 SimRank Collaborative Filtering

Graph-based collaborative filtering includes the SimRank family and the Markov chain family of algorithms. This part summarizes how SimRank is applied in recommendation systems.

3.1 Graph-theoretic basis of SimRank

SimRank is based on graph theory. When used for recommendation, it assumes that users and items form a graph, and that this graph is bipartite. A bipartite graph is one whose nodes can be split into two subsets such that the two endpoints of every edge come from different subsets.

As the example bipartite graph illustrates, there are no edges inside either subset. For SimRank in a recommendation setting, the two subsets are the set of users and the set of items, and the ratings that users have given items form the edges.

3.2 The idea behind SimRank

How do we make recommendations from the bipartite graph of users and items?

The idea of SimRank is that if two users are similar, the items associated with them are similar too; and if two items are similar, the users associated with them are similar too. Returning to the example bipartite graph, suppose the upper nodes are the user subset and the lower nodes are the item subset. If users 1 and 3 are similar, then we can say that items 2 and 4, which are connected to them respectively, are similar as well.

Let the bipartite graph be G(V, E), where V is the node set and E is the edge set. The similarity s(a, b) of two nodes in one subset can then be expressed through the similarity of the nodes in the other subset that are connected to them. That is:

$$s(a, b) = \frac{C}{|I(a)| \, |I(b)|} \sum_{i=1}^{|I(a)|} \sum_{j=1}^{|I(b)|} s(I_i(a), I_j(b))$$

where C is a damping constant, I(a) is the set of nodes connected to a, and $I_i(a)$ is its i-th element.

The above can be rewritten in matrix form, with S the similarity matrix and W the column-normalized transition matrix of the graph:

$$S = C W^T S W$$

However, the similarity between a node and itself is 1, so the diagonal of S should be 1. We therefore remove the diagonal of the product and add the identity matrix to get a similarity matrix with ones on the diagonal. That is:

$$S = C W^T S W + I - \mathrm{Diag}\left( \mathrm{diag}\left( C W^T S W \right) \right)$$

We iterate S by this formula for several rounds. Once the values of S are essentially stable, we have the similarity matrix of the bipartite graph, and recommendations can then be made using the user-user and item-item similarities.

3.3 The SimRank algorithm flow

Input: the transition matrix W of the bipartite graph, the damping constant C, and the maximum number of iterations k.

Output: the similarity matrix S.
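
The flow from input to output can be sketched in plain Python as below. This is a minimal illustration assuming the usual two steps (initialize S to the identity, then repeatedly apply the matrix update while resetting the diagonal to 1); the helper names and the tiny graph are made up, and a real system would use a linear algebra library.

```python
def transition_matrix(adj):
    """Column-normalize a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
            for i in range(n)]

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def simrank(W, C=0.8, k=20):
    """Iterate S = C * W^T S W, forcing the diagonal back to 1 after each round."""
    n = len(W)
    S = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # S starts as I
    Wt = [list(col) for col in zip(*W)]
    for _ in range(k):
        T = matmul(matmul(Wt, S), W)
        S = [[1.0 if i == j else C * T[i][j] for j in range(n)] for i in range(n)]
    return S

# Hypothetical bipartite graph: nodes 0-1 are users, nodes 2-3 are items.
# Edges: user0-item0, user1-item0, user1-item1.
adj = [
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
]
S = simrank(transition_matrix(adj))
print("similarity of user0 and user1:", S[0][1])
print("similarity of item0 and item1:", S[2][3])
```

In this sketch users and items share one matrix, so user-item entries stay at zero and only the user-user and item-item blocks carry similarity values.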

The above follows the common SimRank procedure. SimRank has many variants, so descriptions or iteration schemes found elsewhere may differ slightly, but the underlying idea is essentially the same.

One of the best-known improved variants is SimRank++.

3.4 How SimRank++ works

SimRank++ makes two main improvements to SimRank.

The first is to take edge weights into account; the second is to take evidence of node similarity into account.

On edge weights: in the SimRank algorithm above, the normalized weight of an edge is derived simply from the number of associated edges, ignoring the fact that different edges may carry different weights. SimRank++ takes these different weights into account when constructing the transition matrix W.

On evidence: in SimRank, two nodes count as similar as long as they are connected through common neighbors, with no regard for how many. But the more common connections two nodes share, the more similar they should be. SimRank++ uses the number of shared connections as evidence and, in each iteration, corrects the similarity computed by SimRank by multiplying it by the corresponding evidence value.
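
One common choice of evidence function is a geometric sum over the number of shared neighbors; it is shown here as an assumption, since this text does not give the formula. The function names are illustrative.

```python
def evidence(n_common):
    """Geometric evidence score: sum of 1 / 2**i for i = 1..n_common."""
    return sum(0.5 ** i for i in range(1, n_common + 1))

def adjust(similarity, n_common):
    """Scale a SimRank similarity by the evidence for the node pair."""
    return evidence(n_common) * similarity

for n in range(4):
    print(n, evidence(n))
```

The score grows with the number of shared neighbors but stays below 1, so pairs supported by a single common neighbor are discounted most.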

3.5 Solving the SimRank family of algorithms

Because SimRank involves matrix operations, the computation becomes very expensive when the numbers of users and items are large, and the direct iteration described in Section 3.2 takes a long time. Apart from some traditional SimRank optimizations, two methods are commonly used to speed up the solution.

The first is parallelization on a big-data platform: using Hadoop MapReduce or Spark to parallelize the matrix operations.

The second is Monte Carlo (MC) simulation: the SimRank similarity of two nodes a and b is expressed as the expectation of a function of the time it takes two random walkers, starting from a and b respectively, to first meet. This greatly reduces the time complexity, but because MC is random, the results may not be very accurate.
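
The Monte Carlo idea can be sketched as follows, assuming the usual random-surfer-pair reading in which the similarity is the expected value of $C^{\tau}$ and $\tau$ is the first step at which the two walkers meet. The graph, the function name, and the sample sizes are illustrative assumptions.

```python
import random

def mc_simrank(neighbors, a, b, C=0.8, n_walks=20000, max_steps=20, seed=0):
    """Estimate s(a, b) as the average of C**t, where t is the first meeting step."""
    if a == b:
        return 1.0
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_walks):
        x, y = a, b
        for step in range(1, max_steps + 1):
            x = rng.choice(neighbors[x])  # both walkers move one step at a time
            y = rng.choice(neighbors[y])
            if x == y:
                total += C ** step
                break
    return total / n_walks

# Hypothetical bipartite graph with two users and two items.
neighbors = {
    "u0": ["i0"],
    "u1": ["i0", "i1"],
    "i0": ["u0", "u1"],
    "i1": ["u1"],
}
estimate = mc_simrank(neighbors, "u0", "u1")
print(estimate)
```

Walks that never meet within `max_steps` contribute zero, which is the source of the small truncation error on top of the sampling noise.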

3.6 SimRank summary

As a graph-based recommendation algorithm, SimRank is widely used in advertising recommendation. Graph theory is an excellent modeling tool used in many areas of algorithms, such as the spectral clustering mentioned earlier. And once you understand SimRank, Google's PageRank becomes easier to understand.

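## 4 Learning Matrix Factorization Recommendation with Spark

Here we study the matrix factorization recommendation algorithm from a practical angle, using Spark.

4.1 Overview of Spark's recommendation algorithm

Spark MLlib's recommendation algorithm is based on matrix factorization in the FunkSVD form, $M_{m \times n} = P_{m \times k}^T Q_{k \times n}$, where k is the reduced dimension, which is generally much smaller than m and n.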

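4.2 Spark's recommendation library

The Python interfaces of Spark MLlib's recommendation algorithm are all in the pyspark.mllib.recommendation package, which has three classes: Rating, MatrixFactorizationModel, and ALS. Although there are three classes, the only algorithm is FunkSVD. Their purposes are as follows.

The Rating class is simple: it merely wraps the three values user, item, and rating. In other words, it holds a (user, item, rating) triple and offers no functional interface.

ALS is responsible for training the FunkSVD model. It is named after alternating least squares because Spark uses ALS to optimize the objective function of the FunkSVD factorization. The ALS class has two functions. One is train, which trains directly on the rating matrix. The other, trainImplicit, is slightly more complex: it trains on implicit feedback data and, compared with train, takes one extra parameter specifying the implicit feedback confidence threshold. For example, we can convert the rating matrix into a feedback matrix, mapping each rating to a confidence weight according to some feedback rule. Because that rule generally depends on the specific problem and data, the rest of this part covers only ordinary rating matrix factorization.

MatrixFactorizationModel is the model produced by training with ALS, and it is what we use to make predictions. Common predictions are: the rating of a given user for a given item, the N items a given user likes most, the N users most likely to like a given item, the top N items for every user, and the top N users for every item.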

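4.3 Important parameters

Here is a summary of the important parameters for training a model with ALS.

1) ratings: the RDD corresponding to the rating matrix. This is required input. For implicit feedback, it is the implicit feedback matrix derived from the rating matrix.

2) rank: the reduced dimension used in the factorization, that is, $k$.

3) iterations: the maximum number of iterations when solving the factorization with alternating least squares. The right value depends on the dimensions of the rating matrix and on how sparse it is. It generally does not need to be large; 5 to 20 is enough. The default is 5.

4) lambda: named lambda_ in the Python interface, because lambda is a reserved word in Python. This is the regularization coefficient of the FunkSVD factorization. It controls how closely the model fits the data and improves generalization: the larger the value, the stronger the regularization penalty. Large recommendation systems generally need tuning to find a suitable value.

5) alpha: only used with implicit feedback, i.e. with trainImplicit. It specifies the implicit feedback confidence threshold: the larger the value, the more strongly the model assumes that a user has no association with the items he or she has not rated. It generally needs tuning.

As the description shows, ALS is quite simple to use. The main parameters to tune are the factorization dimension rank and the regularization hyperparameter lambda. For implicit feedback, the confidence threshold alpha also needs tuning.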

4.4 A Spark recommendation example

Let's use a concrete example to illustrate Spark's matrix factorization recommendation algorithm.

We use the MovieLens 100K dataset.

After unzipping the data, we use only the ratings in the u.data file. Each row of this dataset has four columns: user ID, item ID, rating, and timestamp. Because my machine is rather underpowered, the example below uses only the first 100 rows, so if you use all the data your predictions will differ from mine.

First make sure that Hadoop and Spark (version 1.6 or later) are installed and that the environment variables are set. We usually work in an IPython (Jupyter) notebook, so it is best to set up a notebook-based Spark environment. It is fine not to, but then the environment variables must be set before every run.

If you do not have a notebook Spark environment, you need to run some setup code to set those variables first. If you already have one, you can skip it.

Before running the algorithm, it is a good idea to print the SparkContext. If its memory address prints normally, the Spark environment is working.
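
The environment setup and the context check might look like the sketch below. The helper names and the install paths are hypothetical, and the code assumes a standard Spark distribution layout in which the Python bindings live under `python/` and py4j ships as a zip under `python/lib/`.

```python
import glob
import os
import sys

def configure_spark_env(spark_home, java_home):
    """Set SPARK_HOME and JAVA_HOME and expose Spark's bundled Python packages."""
    os.environ["SPARK_HOME"] = spark_home
    os.environ["JAVA_HOME"] = java_home
    paths = [os.path.join(spark_home, "python")]
    paths += glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip"))
    for p in paths:
        if p not in sys.path:
            sys.path.append(p)
    return paths

def make_context(master="local", app_name="testing"):
    """Create a local SparkContext; pyspark is imported only when this is called."""
    from pyspark import SparkContext
    return SparkContext(master, app_name)

# Hypothetical install locations; replace them with your own.
# configure_spark_env("/opt/spark", "/usr/lib/jvm/default-java")
# sc = make_context()
# print(sc)
```

Importing pyspark inside `make_context` keeps the setup helper usable even before Spark is on the Python path.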

For example, my output is:

<pyspark.context.SparkContext object at 0x07352950>


First read the u.data file and print its first line to check that it was read successfully. When copying the code, replace the data directory with the directory that holds your own u.data. The code is as follows:
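
A minimal sketch of this step, assuming `sc` is an existing SparkContext; the helper name `load_lines` and the path are hypothetical.

```python
def load_lines(sc, path):
    """Read a text file into an RDD of lines and return it with its first line."""
    user_data = sc.textFile(path)
    return user_data, user_data.first()

# Hypothetical path; use the directory that holds your own u.data.
# user_data, first_line = load_lines(sc, "/path/to/ml-100k/u.data")
# print(repr(first_line))
```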

The output is as follows:

u'196\t242\t3\t881250949'


The fields are separated by \t. We need to split each line into an array and keep only the first three columns, dropping the timestamp. The code is as follows:
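
A sketch of the split, with the per-line logic in a plain function so it can be checked without Spark. The names `split_fields`, `user_data`, and `rates` are assumptions.

```python
def split_fields(line):
    """Split one tab-separated u.data line and keep user ID, item ID, and rating."""
    return line.split("\t")[0:3]

# user_data is assumed to be the RDD of lines read from u.data.
# rates = user_data.map(split_fields)
# print(rates.first())
```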

The output is as follows:

[u'196', u'242', u'3']


We now have the RDD for the rating matrix, but its values are still strings, whereas Spark needs an RDD of Rating objects. So we convert the RDD's data type as follows:
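
A sketch of the conversion. `Rating` comes from `pyspark.mllib.recommendation`; the helper names and the `rates` RDD of string fields are assumptions.

```python
def to_rating_tuple(fields):
    """Convert ['196', '242', '3'] into (196, 242, 3.0)."""
    return int(fields[0]), int(fields[1]), float(fields[2])

def to_ratings(rates):
    """Map an RDD of string fields to an RDD of Rating objects."""
    from pyspark.mllib.recommendation import Rating  # imported only when called
    return rates.map(lambda fields: Rating(*to_rating_tuple(fields)))

# rates is assumed to be the RDD of [user, item, rating] string lists.
# rates_data = to_ratings(rates)
# print(rates_data.first())
```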

The output is as follows:

Rating(user=196, product=242, rating=3.0)


The data is now an RDD of Rating objects, so we can finally train on it. We set the factorization dimension to 20, the maximum number of iterations to 5, and the regularization coefficient to 0.02. In practice these values should be chosen by cross-validation; this example skips that for simplicity. The code is as follows:
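
A sketch of the training call with the values stated above. `ALS.train` is the pyspark.mllib entry point; the names `ALS_PARAMS`, `train_model`, and `rates_data` are assumptions.

```python
# Hyperparameters from the text: rank 20, 5 iterations, regularization 0.02.
ALS_PARAMS = {"rank": 20, "iterations": 5, "lambda_": 0.02}

def train_model(rates_data, params=ALS_PARAMS):
    """Train a MatrixFactorizationModel on an RDD of Rating objects."""
    from pyspark.mllib.recommendation import ALS  # imported only when called
    return ALS.train(ratings=rates_data, **params)

# rates_data is assumed to be the RDD of Rating objects.
# model = train_model(rates_data)
```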

With the model trained, we can finally make predictions.

Start with the simplest one: predicting user 38's rating of item 20. The code is as follows:
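
A sketch of the single prediction, assuming `model` is the trained MatrixFactorizationModel; the wrapper name is hypothetical.

```python
def predict_one(model, user=38, item=20):
    """Predicted rating of `item` by `user`."""
    return model.predict(user, item)

# print(predict_one(model))
```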

The output is as follows:

0.311633491603


The predicted rating is not high.

Now let's predict the 10 items user 38 likes most. The code is as follows:
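
A sketch using `recommendProducts`, assuming `model` is the trained MatrixFactorizationModel; the wrapper name is hypothetical.

```python
def top_items_for_user(model, user=38, n=10):
    """The n items with the highest predicted rating for one user."""
    return model.recommendProducts(user, n)

# print(top_items_for_user(model))
```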

The output is as follows:

[Rating(user=38, product=95, rating=4.995227969811873), Rating(user=38, product=304, rating=2.5159673379104484), Rating(user=38, product=1014, rating=2.165428673820349),


These are the items user 38 may like, with their predicted ratings in descending order.

Next, let's predict the 10 users to whom item 20 is most worth recommending. The code is as follows:
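
A sketch using `recommendUsers`, assuming `model` is the trained MatrixFactorizationModel; the wrapper name is hypothetical.

```python
def top_users_for_item(model, item=20, n=10):
    """The n users with the highest predicted rating for one item."""
    return model.recommendUsers(item, n)

# print(top_users_for_item(model))
```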

The output is as follows:

[Rating(user=115, product=20, rating=2.9892138653406635), Rating(user=25, product=20, rating=1.7558472892444517), Rating(user=7, product=20, rating=1.523935609195585),


Now let's look at the three most recommended items for each user. The code is as follows:
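
A sketch using `recommendProductsForUsers`, which returns an RDD and therefore needs `collect()` to bring the results to the driver; the wrapper name is hypothetical.

```python
def top_items_for_all_users(model, n=3):
    """For every user, the n items with the highest predicted rating."""
    return model.recommendProductsForUsers(n).collect()

# print(top_items_for_all_users(model))
```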

The output is very long, so it is not reproduced here.

The three users to whom each item is most worth recommending can be obtained as follows:
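
A sketch using `recommendUsersForProducts`, the item-side counterpart that also returns an RDD; the wrapper name is hypothetical.

```python
def top_users_for_all_items(model, n=3):
    """For every item, the n users with the highest predicted rating."""
    return model.recommendUsersForProducts(n).collect()

# print(top_users_for_all_items(model))
```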

Again the output is very long, so it is not reproduced here.

That concludes the Spark matrix factorization recommendation example.

Original: https://blog.csdn.net/heihei2017/article/details/93752459
Author: 无香菜不欢
Title: 推荐算法概述
