Collaborativ

January 5, 2024, 5:32 AM • Artificial Intelligence • 43 views

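Problem Introduction

Collaborative filtering is a common recommender-system technique for predicting a user's rating of, or preference for, an item. It relies on similarity between users or between items: the behavior patterns of other users or items are used to estimate what the target user is interested in.

In this problem, we use collaborative filtering to predict users' movie ratings. We work with a synthetic movie rating dataset that has three fields: movie ID, user ID, and rating. Given a user ID and a movie ID, the goal is to predict the rating that user would give that movie.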
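To make the data layout concrete, the sketch below builds a small ratings table with the three fields described above and pivots it into a user-by-movie matrix. The column names `user_id`, `movie_id`, and `rating` match the example used later in this article; the pivot step is an illustrative addition and is not part of the original implementation.

```python
import pandas as pd

# Long format: one row per (user, movie, rating) triple.
ratings = pd.DataFrame({
    'user_id':  [1, 1, 2, 2, 3, 3],
    'movie_id': [1, 2, 1, 2, 1, 2],
    'rating':   [5, 4, 3, 5, 4, 2],
})

# Wide format: rows are users, columns are movies, cells are ratings.
# A movie that a user has not rated would show up as NaN.
user_item = ratings.pivot(index='user_id', columns='movie_id', values='rating')
print(user_item)
```

Each row of `user_item` is one user's rating vector, which is the object the similarity measure compares.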

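Algorithm Principle

Collaborative filtering algorithms fall into two main types: user-based and item-based. This article uses the user-based method.

User-based collaborative filtering assumes that users with similar interests have rated items similarly in the past. The core idea is to compute the similarity between the target user and every other user, find the users whose rating behavior is closest to the target user's, and then use those similar users' ratings to predict the target user's rating of the target movie.

Specifically, we use the Pearson correlation coefficient to measure the similarity between users. It is computed as follows: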

$$
sim(u, v) = \frac{\sum_{i \in I}(r_{ui} - \overline{r_u})(r_{vi} - \overline{r_v})}{\sqrt{\sum_{i \in I}(r_{ui} - \overline{r_u})^2}\sqrt{\sum_{i \in I}(r_{vi} - \overline{r_v})^2}}
$$

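Here $sim(u, v)$ is the similarity between user $u$ and user $v$, $I$ is the set of items rated by both users, $r_{ui}$ is user $u$'s rating of item $i$, and $\overline{r_u}$ is user $u$'s mean rating ($r_{vi}$ and $\overline{r_v}$ are defined the same way for user $v$). The value lies in $[-1, 1]$ and is undefined when either sum of squared deviations is zero.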
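As a worked illustration of the formula, the following sketch computes the similarity for two users from their ratings on the same items. The function name `pearson_similarity` and the rating values are hypothetical and chosen only for this example; returning `None` for the undefined case is one possible convention.

```python
import numpy as np

def pearson_similarity(ratings_u, ratings_v, mean_u, mean_v):
    """Pearson similarity over co-rated items, given each user's mean rating."""
    dev_u = np.asarray(ratings_u, dtype=float) - mean_u
    dev_v = np.asarray(ratings_v, dtype=float) - mean_v
    denominator = np.sqrt((dev_u ** 2).sum()) * np.sqrt((dev_v ** 2).sum())
    if denominator == 0:
        return None  # undefined when a user's deviations are all zero
    return float((dev_u * dev_v).sum() / denominator)

# Hypothetical ratings of two users on the same three items.
u = [5, 3, 4]
v = [4, 2, 3]
print(pearson_similarity(u, v, np.mean(u), np.mean(v)))
```

User `v` rates every item exactly one point lower than user `u`, so the deviations from the mean are identical and the similarity is 1 up to floating-point rounding. Pearson similarity ignores such constant offsets, which is why it suits users who rate on different personal scales.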

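Calculation Steps

The collaborative filtering procedure runs as follows:

1. Read the movie rating dataset.
2. Locate the target user and the target movie from the user ID and movie ID.
3. Select the other users who have rated the target movie, and compute the Pearson correlation coefficient between each of them and the target user.
4. Predict the target user's rating of the target movie from the Pearson correlation coefficients and those users' ratings.
5. Output the prediction.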
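The text does not give a formula for the prediction step. One common choice, stated here as an assumption rather than as the author's exact method, is a mean-centered weighted average over the selected neighbors $N$:

$$
\hat{r}_{ui} = \overline{r_u} + \frac{\sum_{v \in N} sim(u, v)\,(r_{vi} - \overline{r_v})}{\sum_{v \in N} |sim(u, v)|}
$$

Here $\hat{r}_{ui}$ is the predicted rating of user $u$ for item $i$, and $N$ is the set of most similar users who have rated item $i$. Dividing by the total absolute similarity keeps the adjustment on the same scale as the ratings, however many neighbors are used.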

Below, we implement these steps in Python and explain the code in detail.

Python Implementation

First, import the required libraries:

import numpy as np
import pandas as pd

Next, define a function collaborative_filtering that runs the collaborative filtering algorithm:

def collaborative_filtering(user_id, movie_id, ratings):
    # Ratings given by the target user, and all ratings of the target movie
    target_user = ratings[ratings['user_id'] == user_id]
    target_movie = ratings[ratings['movie_id'] == movie_id]

    target_user_mean = target_user['rating'].mean()

    # Other users who have rated the target movie
    other_users = target_movie[target_movie['user_id'] != user_id]

    similarities = []
    for _, row in other_users.iterrows():
        other_user = ratings[ratings['user_id'] == row['user_id']]
        other_user_mean = other_user['rating'].mean()

        # Align the two users on the movies both have rated (the set I)
        common = target_user.merge(other_user, on='movie_id', suffixes=('_u', '_v'))
        dev_u = common['rating_u'] - target_user_mean
        dev_v = common['rating_v'] - other_user_mean

        numerator = (dev_u * dev_v).sum()
        denominator = np.sqrt((dev_u ** 2).sum()) * np.sqrt((dev_v ** 2).sum())
        if denominator == 0:
            continue  # similarity is undefined for this user; skip

        similarity = numerator / denominator
        # Store this user's deviation on the target movie for the prediction step
        similarities.append((row['user_id'], similarity, row['rating'] - other_user_mean))

    similarities = sorted(similarities, key=lambda x: x[1], reverse=True)

    top_users = similarities[:5]  # keep the 5 most similar users

    # Mean-centered weighted average, normalized by the total absolute similarity
    weight_sum = sum(abs(similarity) for _, similarity, _ in top_users)
    if weight_sum == 0:
        return target_user_mean  # no usable neighbors: fall back to the user's mean

    weighted_deviation = sum(similarity * deviation for _, similarity, deviation in top_users)
    return target_user_mean + weighted_deviation / weight_sum

Next, build the synthetic movie rating dataset and call collaborative_filtering to make a prediction:

ratings_data = {
 'user_id': [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
 'movie_id': [1, 2, 1, 2, 1, 2, 1, 2, 1, 2],
 'rating': [5, 4, 3, 5, 4, 2, 3, 1, 5, 4]
}

ratings = pd.DataFrame(ratings_data)

user_id = 1
movie_id = 2

predicted_rating = collaborative_filtering(user_id, movie_id, ratings)
print("Predicted rating:", predicted_rating)

Running the code above prints the predicted rating.

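Code Walkthrough

The code first selects the target user's ratings with ratings[ratings['user_id'] == user_id] and the target movie's ratings with ratings[ratings['movie_id'] == movie_id]. It then computes the target user's mean rating, target_user_mean.

Next, it selects the other users who have rated the target movie. For each of them, it computes the Pearson correlation coefficient with the target user and stores the result in the similarities list.

It then sorts the other users by similarity and keeps the five most similar ones as the neighbors (or all of them if there are fewer than five).

Finally, starting from the target user's mean rating, it predicts the target user's rating of the target movie from the neighbors' ratings and similarities, and returns the prediction.

The code uses Pandas to filter and manipulate the data, and NumPy functions for the mathematical operations in the formula.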
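As a cross-check on the per-user loop, pandas can compute all pairwise Pearson correlations at once with `DataFrame.corr`. This is a sketch of an alternative and not the article's implementation. Note that `corr` centers each user on the mean of the co-rated items, not on the user's overall mean; in the sample data every user has rated both movies, so the two definitions coincide.

```python
import pandas as pd

ratings = pd.DataFrame({
    'user_id':  [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    'movie_id': [1, 2, 1, 2, 1, 2, 1, 2, 1, 2],
    'rating':   [5, 4, 3, 5, 4, 2, 3, 1, 5, 4],
})

# Rows are movies and columns are users, so corr() compares users pairwise.
movie_user = ratings.pivot(index='movie_id', columns='user_id', values='rating')
user_similarity = movie_user.corr(method='pearson')

# Similarity of user 1 to every other user, most similar first.
neighbors = user_similarity[1].drop(index=1).sort_values(ascending=False)
print(neighbors)
```

With only two co-rated movies, every defined Pearson value is either +1 or -1, so this toy dataset cannot rank neighbors more finely than "agrees" or "disagrees". A realistic dataset needs more overlap between users for the similarities to be informative.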

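Summary

User-based collaborative filtering predicts a user's rating of, or preference for, an item from the similarity between users. This article covered the principle of the algorithm and the Pearson similarity formula, listed the calculation steps, and gave a Python implementation that predicts a user's rating of a movie.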

Original articles are protected by copyright. When reposting, please credit the source: https://www.johngo689.com/823963/
