Graph Neural Networks 07: Building a Movie Recommender System from Scratch

June 2, 2023, 12:30 PM • Artificial Intelligence • 77 views

Feel free to fork this project: click "Fork" in the upper-right corner to run the code and see the results directly.

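1 Introduction

The goal of this project is to build a content-based recommendation engine for movies and TV shows on Netflix. We will compare two different approaches: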

* Using the cast, director, country, rating, and genres as features.
* Using the words in the movie/TV show descriptions as features.


2 Importing the packages

!pip install nltk pytest -i https://pypi.tuna.tsinghua.edu.cn/simple

Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Requirement already satisfied: nltk in /opt/conda/lib/python3.8/site-packages (3.6.1)
Requirement already satisfied: pytest in /opt/conda/lib/python3.8/site-packages (6.2.3)
Requirement already satisfied: regex in /opt/conda/lib/python3.8/site-packages (from nltk) (2021.4.4)
Requirement already satisfied: tqdm in /opt/conda/lib/python3.8/site-packages (from nltk) (4.48.2)
Requirement already satisfied: joblib in /opt/conda/lib/python3.8/site-packages (from nltk) (0.16.0)
Requirement already satisfied: click in /opt/conda/lib/python3.8/site-packages (from nltk) (7.1.2)
Requirement already satisfied: iniconfig in /opt/conda/lib/python3.8/site-packages (from pytest) (1.1.1)
Requirement already satisfied: attrs>=19.2.0 in /opt/conda/lib/python3.8/site-packages (from pytest) (20.1.0)
Requirement already satisfied: pluggy<1.0.0a1,>=0.12 in /opt/conda/lib/python3.8/site-packages (from pytest) (0.13.1)
Requirement already satisfied: packaging in /opt/conda/lib/python3.8/site-packages (from pytest) (20.4)
Requirement already satisfied: toml in /opt/conda/lib/python3.8/site-packages (from pytest) (0.10.2)
Requirement already satisfied: py>=1.8.2 in /opt/conda/lib/python3.8/site-packages (from pytest) (1.10.0)
Requirement already satisfied: six in /opt/conda/lib/python3.8/site-packages (from packaging->pytest) (1.15.0)
Requirement already satisfied: pyparsing>=2.0.2 in /opt/conda/lib/python3.8/site-packages (from packaging->pytest) (2.4.7)

import numpy as np
import pandas as pd
import re
from tqdm import tqdm
import nltk
from nltk.tokenize import word_tokenize

3 Loading the data

# List the currently mounted dataset directories
!ls /home/kesci/input/

netflix8714

data = pd.read_csv('/home/kesci/input/netflix8714/netflix_titles.csv')
data.head()

| | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | s1 | TV Show | 3% | NaN | João Miguel, Bianca Comparato, Michel Gomes, R... | Brazil | August 14, 2020 | 2020 | TV-MA | 4 Seasons | International TV Shows, TV Dramas, TV Sci-Fi &... | In a future where the elite inhabit an island ... |
| 1 | s2 | Movie | 7:19 | Jorge Michel Grau | Demián Bichir, Héctor Bonilla, Oscar Serrano, ... | Mexico | December 23, 2016 | 2016 | TV-MA | 93 min | Dramas, International Movies | After a devastating earthquake hits Mexico Cit... |
| 2 | s3 | Movie | 23:59 | Gilbert Chan | Tedd Chan, Stella Chung, Henley Hii, Lawrence ... | Singapore | December 20, 2018 | 2011 | R | 78 min | Horror Movies, International Movies | When an army recruit is found dead, his fellow... |
| 3 | s4 | Movie | 9 | Shane Acker | Elijah Wood, John C. Reilly, Jennifer Connelly... | United States | November 16, 2017 | 2009 | PG-13 | 80 min | Action & Adventure, Independent Movies, Sci-Fi... | In a postapocalyptic world, rag-doll robots hi... |
| 4 | s5 | Movie | 21 | Robert Luketic | Jim Sturgess, Kevin Spacey, Kate Bosworth, Aar... | United States | January 1, 2020 | 2008 | PG-13 | 123 min | Dramas | A brilliant group of students become card-coun... |

data.groupby('type').count()

| type | show_id | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Movie | 5377 | 5377 | 5214 | 4951 | 5147 | 5377 | 5377 | 5372 | 5377 | 5377 | 5377 |
| TV Show | 2410 | 2410 | 184 | 2118 | 2133 | 2400 | 2410 | 2408 | 2410 | 2410 | 2410 |
data.isnull().sum()

show_id            0
type               0
title              0
director        2389
cast             718
country          507
date_added        10
release_year       0
rating             7
duration           0
listed_in          0
description        0
dtype: int64

data.shape

(7787, 12)

# Drop rows with missing values in cast, country, or rating
data = data.dropna(subset=['cast', 'country', 'rating'])
data.shape

(6652, 12)

4 Building a recommender from cast, director, country, rating, and genres
We build the recommender from the cast, director, country/region, rating, and genres of each title.
movies = data[data['type'] == 'Movie'].reset_index()
movies = movies.drop(['index', 'show_id', 'type', 'date_added', 'release_year', 'duration', 'description'], axis=1)
movies.head()

| | title | director | cast | country | rating | listed_in |
|---|---|---|---|---|---|---|
| 0 | 7:19 | Jorge Michel Grau | Demián Bichir, Héctor Bonilla, Oscar Serrano, ... | Mexico | TV-MA | Dramas, International Movies |
| 1 | 23:59 | Gilbert Chan | Tedd Chan, Stella Chung, Henley Hii, Lawrence ... | Singapore | R | Horror Movies, International Movies |
| 2 | 9 | Shane Acker | Elijah Wood, John C. Reilly, Jennifer Connelly... | United States | PG-13 | Action & Adventure, Independent Movies, Sci-Fi... |
| 3 | 21 | Robert Luketic | Jim Sturgess, Kevin Spacey, Kate Bosworth, Aar... | United States | PG-13 | Dramas |
| 4 | 122 | Yasir Al Yasiri | Amina Khalil, Ahmed Dawood, Tarek Lotfy, Ahmed... | Egypt | TV-MA | Horror Movies, International Movies |

tv = data[data['type'] == 'TV Show'].reset_index()
tv = tv.drop(['index', 'show_id', 'type', 'date_added', 'release_year', 'duration', 'description'], axis=1)
tv.head()

| | title | director | cast | country | rating | listed_in |
|---|---|---|---|---|---|---|
| 0 | 3% | NaN | João Miguel, Bianca Comparato, Michel Gomes, R... | Brazil | TV-MA | International TV Shows, TV Dramas, TV Sci-Fi &... |
| 1 | 46 | Serdar Akar | Erdal Beşikçioğlu, Yasemin Allen, Melis Birkan... | Turkey | TV-MA | International TV Shows, TV Dramas, TV Mysteries |
| 2 | 1983 | NaN | Robert Więckiewicz, Maciej Musiał, Michalina O... | Poland, United States | TV-MA | Crime TV Shows, International TV Shows, TV Dramas |
| 3 | SAINT SEIYA: Knights of the Zodiac | NaN | Bryson Baugus, Emily Neves, Blake Shepard, Pat... | Japan | TV-14 | Anime Series, International TV Shows |
| 4 | #blackAF | NaN | Kenya Barris, Rashida Jones, Iman Benson, Genn... | United States | TV-MA | TV Comedies |

Get the list of actors
One-hot encoding

# First collect the cast list of every movie
actors = []
for i in movies['cast']:
    actor = re.split(r', \s*', i)
    actors.append(actor)

# Flatten the nested lists into a single list of names
flat_list = []
for sublist in actors:
    for item in sublist:
        flat_list.append(item)

# Deduplicate and sort
actors_list = sorted(set(flat_list))
len(actors_list)
22622

There are 22,622 distinct actors in total.

# Print the first 10 actors
actors_list[:10]

['"Riley" Lakdhar Dridi',
 "'Najite Dede",
 '4Minute',
 '50 Cent',
 'A. Murat Özgen',
 'A.C. Peterson',
 'A.J. Cook',
 'A.J. LoCascio',
 'A.K. Hangal',
 'A.R. Rahman']

# One (initially empty) list per actor; each list receives one value per movie
binary_actors = [[0] * 0 for i in range(len(set(flat_list)))]

for i in tqdm(movies['cast']):
    k = 0
    # Iterate over all actors
    for j in actors_list:
        # For example, João Miguel appears in "João Miguel, Bianca Comparato, Michel Gomes",
        # so the position of João Miguel in actors_list is set to 1.
        # Note: `j in i` is a substring test on the raw cast string, so a name
        # that is contained in a longer name also matches.
        if j in i:
            binary_actors[k].append(1.0)
        else:
            binary_actors[k].append(0.0)
        k += 1

# Transpose so that rows are movies and columns are actors
binary_actors = pd.DataFrame(binary_actors).transpose()
binary_actors
100%|&#x2588;&#x2588;&#x2588;&#x2588;&#x2588;&#x2588;&#x2588;&#x2588;&#x2588;&#x2588;| 4761/4761 [00:56<00:00, 84.33it s]< code></00:00,>

0123456789...2261222613226142261522616226172261822619226202262100.00.00.00.00.00.00.00.00.00.0...0.00.00.00.00.00.00.00.00.00.010.00.00.00.00.00.00.00.00.00.0...0.00.00.00.00.00.00.00.00.00.020.00.00.00.00.00.00.00.00.00.0...0.00.00.00.00.00.00.00.00.00.030.00.00.00.00.00.00.00.00.00.0...0.00.00.00.00.00.00.00.00.00.040.00.00.00.00.00.00.00.00.00.0...0.00.00.00.00.00.00.00.00.00.0..................................................................47560.00.00.00.00.00.00.00.00.00.0...0.00.00.00.00.00.00.00.00.00.047570.00.00.00.00.00.00.00.00.00.0...0.00.00.00.00.00.00.00.00.00.047580.00.00.00.00.00.00.00.00.00.0...0.00.00.00.00.00.00.00.00.00.047590.00.00.00.00.00.00.00.00.00.0...0.00.00.00.00.00.00.00.00.00.047600.00.00.00.00.00.00.00.00.00.0...0.00.00.00.00.00.00.00.00.00.0
4761 rows × 22622 columns
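
The nested loops above take close to a minute for the cast column. As a possible alternative, here is a minimal sketch using pandas' `Series.str.get_dummies`, run on a small hypothetical DataFrame (`toy` and its names are made up for illustration). It assumes every entry is separated by exactly ", ". Because it matches whole names rather than substrings, its output may differ from the loop above.

```python
import pandas as pd

# Hypothetical toy data standing in for movies['cast']
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "cast": ["Ann Lee, Bob Ray", "Bob Ray", "Ann Lee, Cy Dow"],
})

# Split each string on ", " and build one 0/1 column per distinct name;
# columns come back sorted alphabetically
binary_cast = toy["cast"].str.get_dummies(sep=", ").astype(float)

print(binary_cast.columns.tolist())
print(binary_cast.values.tolist())
```

Rows are titles and columns are names, the same orientation as `binary_actors` after the transpose.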

Get the list of directors
One-hot encoding

directors = []

# Collect the directors of every movie, skipping missing values
for i in movies['director']:
    if pd.notna(i):
        director = re.split(r', \s*', i)
        directors.append(director)

flat_list2 = []
for sublist in directors:
    for item in sublist:
        flat_list2.append(item)

directors_list = sorted(set(flat_list2))

binary_directors = [[0] * 0 for i in range(len(set(flat_list2)))]

for i in tqdm(movies['director']):
    k = 0
    for j in directors_list:
        # A movie without a director gets 0 in every director column
        if pd.isna(i):
            binary_directors[k].append(0.0)
        elif j in i:
            binary_directors[k].append(1.0)
        else:
            binary_directors[k].append(0.0)
        k += 1

binary_directors = pd.DataFrame(binary_directors).transpose()
binary_directors.head()
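100%|██████████| 4761/4761 [00:14<00:00, 337.39it/s]

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 3823 | 3824 | 3825 | 3826 | 3827 | 3828 | 3829 | 3830 | 3831 | 3832 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |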
5 rows × 3833 columns

Get the list of countries
One-hot encoding

countries = []

for i in movies['country']:
    country = re.split(r', \s*', i)
    countries.append(country)

flat_list3 = []
for sublist in countries:
    for item in sublist:
        flat_list3.append(item)

countries_list = sorted(set(flat_list3))

binary_countries = [[0] * 0 for i in range(len(set(flat_list3)))]

for i in tqdm(movies['country']):
    k = 0
    for j in countries_list:
        if j in i:
            binary_countries[k].append(1.0)
        else:
            binary_countries[k].append(0.0)
        k += 1

binary_countries = pd.DataFrame(binary_countries).transpose()
binary_countries.head()
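100%|██████████| 4761/4761 [00:00<00:00, 35151.57it/s]

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 95 | 96 | 97 | 98 | 99 | 100 | 101 | 102 | 103 | 104 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |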
5 rows × 105 columns

Get the list of genres
One-hot encoding

genres = []

for i in movies['listed_in']:
    genre = re.split(r', \s*', i)
    genres.append(genre)

# Flatten the nested lists (this step was missing, leaving flat_list4 undefined)
flat_list4 = []
for sublist in genres:
    for item in sublist:
        flat_list4.append(item)

genres_list = sorted(set(flat_list4))

binary_genres = [[0] * 0 for i in range(len(set(flat_list4)))]

for i in tqdm(movies['listed_in']):
    k = 0
    for j in genres_list:
        if j in i:
            binary_genres[k].append(1.0)
        else:
            binary_genres[k].append(0.0)
        k += 1

binary_genres = pd.DataFrame(binary_genres).transpose()
binary_genres.head()
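100%|██████████| 4761/4761 [00:00<00:00, 198223.96it/s]

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |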

Get the list of ratings
One-hot encoding

ratings = []

for i in movies['rating']:
    ratings.append(i)

ratings_list = sorted(set(ratings))

binary_ratings = [[0] * 0 for i in range(len(set(ratings_list)))]

for i in tqdm(movies['rating']):
    k = 0
    for j in ratings_list:
        # Substring test: a short rating label also matches inside a longer one
        if j in i:
            binary_ratings[k].append(1.0)
        else:
            binary_ratings[k].append(0.0)
        k += 1

binary_ratings = pd.DataFrame(binary_ratings).transpose()
binary_ratings
100%|&#x2588;&#x2588;&#x2588;&#x2588;&#x2588;&#x2588;&#x2588;&#x2588;&#x2588;&#x2588;| 4761/4761 [00:00<00:00, 294134.44it s]< code></00:00,>

01234567891011121300.00.00.00.00.00.00.00.01.00.00.00.00.00.010.00.00.00.00.01.00.00.00.00.00.00.00.00.021.00.00.01.01.00.00.00.00.00.00.00.00.00.031.00.00.01.01.00.00.00.00.00.00.00.00.00.040.00.00.00.00.00.00.00.01.00.00.00.00.00.0.............................................47560.00.00.00.00.01.00.00.00.00.00.00.00.00.047570.00.00.00.00.00.00.00.01.00.00.00.00.00.047581.00.00.01.00.00.00.00.00.00.00.00.00.00.047590.00.00.00.00.00.00.00.01.00.00.00.00.00.047600.00.00.00.00.00.01.00.00.00.00.00.00.00.0
4761 rows × 14 columns
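
Some rows in the ratings matrix contain several 1.0 entries even though each movie carries a single rating. This is consistent with the `if j in i` test being a substring check on strings. The sketch below illustrates the effect on a hypothetical label list (`ratings_list` here is made up, not the dataset's actual list) and contrasts it with an exact comparison.

```python
# Hypothetical sorted list of rating labels, for illustration only
ratings_list = ["G", "PG", "PG-13", "R"]

def encode_substring(value, labels):
    # Mirrors `if j in i`: str containment, so "G" also matches inside "PG-13"
    return [1.0 if label in value else 0.0 for label in labels]

def encode_exact(value, labels):
    # Whole-label comparison: exactly one position is set
    return [1.0 if label == value else 0.0 for label in labels]

print(encode_substring("PG-13", ratings_list))  # [1.0, 1.0, 1.0, 0.0]
print(encode_exact("PG-13", ratings_list))      # [0.0, 0.0, 1.0, 0.0]
```

Whether the extra matches are desirable is a design choice. They make "PG-13" and "PG" titles look partly similar, but the same effect can also link unrelated names in the cast or genre columns.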
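Finally, we concatenate the feature matrices. Note that the code joins four of them (actors, directors, countries, genres); binary_ratings is not included, which matches the resulting column count of 22622 + 3833 + 105 + 20 = 26580.

binary = pd.concat([binary_actors, binary_directors, binary_countries, binary_genres], axis=1, ignore_index=True)
binary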

0123456789...2657026571265722657326574265752657626577265782657900.00.00.00.00.00.00.00.00.00.0...0.01.00.01.00.00.00.00.00.00.010.00.00.00.00.00.00.00.00.00.0...0.01.00.01.00.00.00.00.00.00.020.00.00.00.00.00.00.00.00.00.0...1.00.00.01.00.00.01.00.00.00.030.00.00.00.00.00.00.00.00.00.0...0.00.00.00.00.00.00.00.00.00.040.00.00.00.00.00.00.00.00.00.0...0.01.00.01.00.00.00.00.00.00.0..................................................................47560.00.00.00.00.00.00.00.00.00.0...0.00.00.01.00.00.00.00.00.00.047570.00.00.00.00.00.00.00.00.00.0...1.01.00.01.00.00.00.00.00.00.047580.00.00.00.00.00.00.00.00.00.0...0.00.00.01.00.00.00.00.00.00.047590.00.00.00.00.00.00.00.00.00.0...0.01.00.01.00.00.00.00.00.00.047600.00.00.00.00.00.00.00.00.00.0...0.01.00.01.01.00.00.00.00.00.0
4761 rows × 26580 columns
That covers how the one-hot encodings of the movie features are obtained. Next, we apply the same steps to the TV shows (tv).
# Actors
actors2 = []

for i in tv['cast']:
    actor2 = re.split(r', \s*', i)
    actors2.append(actor2)

flat_list5 = []
for sublist in actors2:
    for item in sublist:
        flat_list5.append(item)

actors_list2 = sorted(set(flat_list5))

binary_actors2 = [[0] * 0 for i in range(len(set(flat_list5)))]

for i in tv['cast']:
    k = 0
    for j in actors_list2:
        if j in i:
            binary_actors2[k].append(1.0)
        else:
            binary_actors2[k].append(0.0)
        k += 1

binary_actors2 = pd.DataFrame(binary_actors2).transpose()

# Countries
countries2 = []

for i in tv['country']:
    country2 = re.split(r', \s*', i)
    countries2.append(country2)

flat_list6 = []
for sublist in countries2:
    for item in sublist:
        flat_list6.append(item)

countries_list2 = sorted(set(flat_list6))

binary_countries2 = [[0] * 0 for i in range(len(set(flat_list6)))]

for i in tv['country']:
    k = 0
    for j in countries_list2:
        if j in i:
            binary_countries2[k].append(1.0)
        else:
            binary_countries2[k].append(0.0)
        k += 1

binary_countries2 = pd.DataFrame(binary_countries2).transpose()

# Genres
genres2 = []

for i in tv['listed_in']:
    genre2 = re.split(r', \s*', i)
    genres2.append(genre2)

flat_list7 = []
for sublist in genres2:
    for item in sublist:
        flat_list7.append(item)

genres_list2 = sorted(set(flat_list7))

binary_genres2 = [[0] * 0 for i in range(len(set(flat_list7)))]

for i in tv['listed_in']:
    k = 0
    for j in genres_list2:
        if j in i:
            binary_genres2[k].append(1.0)
        else:
            binary_genres2[k].append(0.0)
        k += 1

binary_genres2 = pd.DataFrame(binary_genres2).transpose()

# Ratings
ratings2 = []

for i in tv['rating']:
    ratings2.append(i)

ratings_list2 = sorted(set(ratings2))

binary_ratings2 = [[0] * 0 for i in range(len(set(ratings_list2)))]

for i in tv['rating']:
    k = 0
    for j in ratings_list2:
        if j in i:
            binary_ratings2[k].append(1.0)
        else:
            binary_ratings2[k].append(0.0)
        k += 1

binary_ratings2 = pd.DataFrame(binary_ratings2).transpose()

# Note: binary_ratings2 is computed above but is not part of the concatenation
binary2 = pd.concat([binary_actors2, binary_countries2, binary_genres2], axis=1, ignore_index=True)
binary2
0123456789...1274112742127431274412745127461274712748127491275000.00.00.00.00.00.00.00.00.00.0...0.00.00.01.00.00.01.01.00.00.010.00.00.00.00.00.00.00.00.00.0...0.00.00.01.00.01.00.01.00.00.020.00.00.00.00.00.00.00.00.00.0...0.00.00.01.00.00.00.01.00.00.030.00.00.00.00.00.00.00.00.00.0...0.00.00.00.00.00.00.01.00.00.040.00.00.00.00.00.00.00.00.00.0...0.00.01.00.00.00.00.00.00.00.0..................................................................18860.00.00.00.00.00.00.00.00.00.0...0.00.00.00.00.00.00.00.00.00.018870.00.00.00.00.00.00.00.00.00.0...0.00.00.01.00.00.00.01.00.00.018880.00.00.00.00.00.00.00.00.00.0...0.00.00.00.00.00.00.00.00.00.018890.00.00.00.00.00.00.00.00.00.0...1.00.00.00.00.00.00.01.00.00.018900.00.00.00.00.00.00.00.00.00.0...0.00.00.00.00.00.00.01.00.00.0
1891 rows × 12751 columns
def recommender(search):
    cs_list = []  # stores the cosine similarity results
    binary_list = []

    # Check whether the searched title is a movie or a TV show
    if search in movies['title'].values:
        # Get the feature vector of the queried title
        idx = movies[movies['title'] == search].index.item()
        for i in binary.iloc[idx]:
            binary_list.append(i)
        point1 = np.array(binary_list).reshape(1, -1)
        point1 = [val for sublist in point1 for val in sublist]
        # Get the feature vector of every candidate title
        for j in tqdm(range(len(movies)), desc="searching"):
            binary_list2 = []
            for k in binary.iloc[j]:
                binary_list2.append(k)
            point2 = np.array(binary_list2).reshape(1, -1)
            point2 = [val for sublist in point2 for val in sublist]
            # Cosine similarity between the query vector and the current candidate vector
            dot_product = np.dot(point1, point2)
            norm_1 = np.linalg.norm(point1)
            norm_2 = np.linalg.norm(point2)
            cos_sim = dot_product / (norm_1 * norm_2)
            cs_list.append(cos_sim)
        movies_copy = movies.copy()
        movies_copy['cos_sim'] = cs_list
        # Sort by cos_sim in descending order
        results = movies_copy.sort_values('cos_sim', ascending=False)
        # Exclude the queried title itself
        results = results[results['title'] != search]
        # Return the 5 most similar titles
        top_results = results.head(5)
        return(top_results)
    elif search in tv['title'].values:
        # Same procedure on the TV show feature matrix
        idx = tv[tv['title'] == search].index.item()
        for i in binary2.iloc[idx]:
            binary_list.append(i)
        point1 = np.array(binary_list).reshape(1, -1)
        point1 = [val for sublist in point1 for val in sublist]
        for j in range(len(tv)):
            binary_list2 = []
            for k in binary2.iloc[j]:
                binary_list2.append(k)
            point2 = np.array(binary_list2).reshape(1, -1)
            point2 = [val for sublist in point2 for val in sublist]
            dot_product = np.dot(point1, point2)
            norm_1 = np.linalg.norm(point1)
            norm_2 = np.linalg.norm(point2)
            cos_sim = dot_product / (norm_1 * norm_2)
            cs_list.append(cos_sim)
        tv_copy = tv.copy()
        tv_copy['cos_sim'] = cs_list
        results = tv_copy.sort_values('cos_sim', ascending=False)
        results = results[results['title'] != search]
        top_results = results.head(5)
        return(top_results)
    else:
        return("Title not in dataset. Please check spelling.")

recommender('The Conjuring')

searching: 100%|██████████| 4761/4761 [10:52<00:00, 7.30it/s]

| | title | director | cast | country | rating | listed_in | cos_sim |
|---|---|---|---|---|---|---|---|
| 1868 | Insidious | James Wan | Patrick Wilson, Rose Byrne, Lin Shaye, Ty Simp... | United States, Canada, United Kingdom | PG-13 | Horror Movies, Thrillers | 0.388922 |
| 968 | Creep | Patrick Brice | Mark Duplass, Patrick Brice | United States | R | Horror Movies, Independent Movies, Thrillers | 0.377964 |
| 1844 | In the Tall Grass | Vincenzo Natali | Patrick Wilson, Laysla De Oliveira, Avery Whit... | Canada, United States | TV-MA | Horror Movies, Thrillers | 0.370625 |
| 969 | Creep 2 | Patrick Brice | Mark Duplass, Desiree Akhavan, Karan Soni | United States | TV-MA | Horror Movies, Independent Movies, Thrillers | 0.356348 |
| 1077 | Desolation | Sam Patton | Jaimi Paige, Alyshia Ochse, Toby Nichols, Clau... | United States | TV-MA | Horror Movies, Thrillers | 0.356348 |
recommender("Dr. Seuss' The Cat in the Hat")

searching: 100%|██████████| 4761/4761 [10:51<00:00, 7.31it/s]

| | title | director | cast | country | rating | listed_in | cos_sim |
|---|---|---|---|---|---|---|---|
| 2798 | NOVA: Bird Brain | NaN | Craig Sechler | United States | TV-G | Children & Family Movies, Documentaries | 0.372104 |
| 3624 | Sugar High | Ariel Boles | Hunter March | United States | TV-G | Children & Family Movies | 0.372104 |
| 4758 | Zoom | Peter Hewitt | Tim Allen, Courteney Cox, Chevy Chase, Kate Ma... | United States | PG | Children & Family Movies, Comedies | 0.370625 |
| 4624 | What a Girl Wants | Dennie Gordon | Amanda Bynes, Colin Firth, Kelly Preston, Eile... | United States, United Kingdom | PG | Children & Family Movies, Comedies | 0.370625 |
| 3066 | Prince of Peoria: A Christmas Moose Miracle | Jon Rosenbaum | Gavin Lewis, Theodore Barnes, Shelby Simmons, ... | United States | TV-G | Children & Family Movies, Comedies | 0.369800 |
recommender('After Life')
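
Each call above takes roughly eleven minutes, largely because every row is copied into a Python list before the dot product. As a hedged alternative, the sketch below computes the cosine similarity of one row against all rows with a single matrix-vector product. The function name `cosine_to_all` and the toy matrix are assumptions made for illustration. With the article's data, one would presumably pass `binary.to_numpy()` and the row index found from the title.

```python
import numpy as np

def cosine_to_all(features, idx):
    """Cosine similarity between row `idx` and every row of `features`."""
    features = np.asarray(features, dtype=float)
    query = features[idx]
    norms = np.linalg.norm(features, axis=1)   # one norm per row
    denom = norms * norms[idx]
    dots = features @ query                    # all dot products at once
    # Rows with zero norm get similarity 0 instead of a division by zero
    return np.divide(dots, denom, out=np.zeros_like(dots), where=denom > 0)

# Hypothetical 0/1 feature matrix: 4 titles, 4 features
toy = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 0.0],
])

sims = cosine_to_all(toy, 0)
# Rank candidates by similarity, excluding the query row itself
order = [int(j) for j in np.argsort(-sims, kind="stable") if j != 0]
print(sims, order)
```

The ranking logic is the same as in `recommender`. Only the per-row Python loops are replaced by array operations.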

5 Building a recommendation engine from movie/TV show descriptions

movies_des = data[data['type'] == 'Movie'].reset_index()
movies_des = movies_des[['title', 'description']]
movies_des.head()

| | title | description |
|---|---|---|
| 0 | 7:19 | After a devastating earthquake hits Mexico Cit... |
| 1 | 23:59 | When an army recruit is found dead, his fellow... |
| 2 | 9 | In a postapocalyptic world, rag-doll robots hi... |
| 3 | 21 | A brilliant group of students become card-coun... |
| 4 | 122 | After an awful accident, a couple admitted to ... |

tv_des = data[data['type'] == 'TV Show'].reset_index()
tv_des = tv_des[['title', 'description']]
tv_des.head()

| | title | description |
|---|---|---|
| 0 | 3% | In a future where the elite inhabit an island ... |
| 1 | 46 | A genetics professor experiments with a treatm... |
| 2 | 1983 | In this dark alt-history thriller, a naïve law... |
| 3 | SAINT SEIYA: Knights of the Zodiac | Seiya and the Knights of the Zodiac rise again... |
| 4 | #blackAF | Kenya Barris and his family navigate relations... |
stopwords=['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', 'her', 'hers', 'herself', 'it', 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', 'should', 'now', 'd', 'll', 'm', 'o', 're', 've', 'y', 'ain', 'aren', 'couldn', 'didn', 'doesn', 'hadn', 'hasn', 'haven', 'isn', 'ma', 'mightn', 'mustn', 'needn', 'shan', 'shouldn', 'wasn', 'weren', 'won', 'wouldn']

def word_tokenize(text): return [w.lower() for w in text.split()]

filtered_movies = []
movies_words = []

for text in movies_des['description']:
    text_tokens = word_tokenize(text)
    # Drop stopwords
    tokens_without_sw = [word.lower() for word in text_tokens if word not in stopwords]
    movies_words.append(tokens_without_sw)
    filtered = (" ").join(tokens_without_sw)
    filtered_movies.append(filtered)

# Flatten, deduplicate, and sort the vocabulary
movies_words = [val for sublist in movies_words for val in sublist]
movies_words = sorted(set(movies_words))
movies_des['description_filtered'] = filtered_movies
movies_des.head()

| | title | description | description_filtered |
|---|---|---|---|
| 0 | 7:19 | After a devastating earthquake hits Mexico Cit... | devastating earthquake hits mexico city, trapp... |
| 1 | 23:59 | When an army recruit is found dead, his fellow... | army recruit found dead, fellow soldiers force... |
| 2 | 9 | In a postapocalyptic world, rag-doll robots hi... | postapocalyptic world, rag-doll robots hide fe... |
| 3 | 21 | A brilliant group of students become card-coun... | brilliant group students become card-counting ... |
| 4 | 122 | After an awful accident, a couple admitted to ... | awful accident, couple admitted grisly hospita... |
filtered_tv = []
tv_words = []

for text in tv_des['description']:
    text_tokens = word_tokenize(text)
    # Drop stopwords
    tokens_without_sw = [word.lower() for word in text_tokens if word not in stopwords]
    tv_words.append(tokens_without_sw)
    filtered = (" ").join(tokens_without_sw)
    filtered_tv.append(filtered)

# Flatten, deduplicate, and sort the vocabulary
tv_words = [val for sublist in tv_words for val in sublist]
tv_words = sorted(set(tv_words))
tv_des['description_filtered'] = filtered_tv
tv_des.head()

| | title | description | description_filtered |
|---|---|---|---|
| 0 | 3% | In a future where the elite inhabit an island ... | future elite inhabit island paradise far crowd... |
| 1 | 46 | A genetics professor experiments with a treatm... | genetics professor experiments treatment comat... |
| 2 | 1983 | In this dark alt-history thriller, a naïve law... | dark alt-history thriller, naïve law student w... |
| 3 | SAINT SEIYA: Knights of the Zodiac | Seiya and the Knights of the Zodiac rise again... | seiya knights zodiac rise protect reincarnatio... |
| 4 | #blackAF | Kenya Barris and his family navigate relations... | kenya barris family navigate relationships, ra... |
# One (initially empty) list per vocabulary word; each list receives one value per movie
movie_word_binary = [[0] * 0 for i in range(len(set(movies_words)))]

for des in movies_des['description_filtered']:
    k = 0
    for word in movies_words:
        # Substring test on the filtered description string
        if word in des:
            movie_word_binary[k].append(1.0)
        else:
            movie_word_binary[k].append(0.0)
        k += 1

movie_word_binary = pd.DataFrame(movie_word_binary).transpose()

# Same procedure for the TV show descriptions
tv_word_binary = [[0] * 0 for i in range(len(set(tv_words)))]

for des in tv_des['description_filtered']:
    k = 0
    for word in tv_words:
        if word in des:
            tv_word_binary[k].append(1.0)
        else:
            tv_word_binary[k].append(0.0)
        k += 1

tv_word_binary = pd.DataFrame(tv_word_binary).transpose()
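
For 0/1 vectors, cosine similarity reduces to |A ∩ B| / sqrt(|A| · |B|), where A and B are the sets of words present in each description. The sketch below uses that identity on token sets, which avoids building the full word matrix and compares whole tokens rather than substrings. The tiny stopword set, the punctuation stripping, and the sample sentences are assumptions added for illustration and differ from the tokenization used above.

```python
import math

STOPWORDS = {"a", "the", "in", "of", "and"}  # tiny hypothetical subset

def token_set(text):
    # Lowercase, split on whitespace, strip surrounding punctuation, drop stopwords
    tokens = (w.strip(".,!?\"'").lower() for w in text.split())
    return {t for t in tokens if t and t not in STOPWORDS}

def binary_cosine(a, b):
    # For 0/1 vectors: dot product = |A ∩ B|, norms = sqrt(|A|) and sqrt(|B|)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

d1 = token_set("A haunted house in the woods.")
d2 = token_set("The house of art and ghosts")
d3 = token_set("Heart surgery drama")

print(binary_cosine(d1, d2))
# Substring test vs. whole-token test for the word "art"
print("art" in "heart surgery drama", "art" in d3)
```

The last line shows the difference in matching: the substring test finds "art" inside "heart", while the token set does not.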
def recommender2(search):
    cs_list = []
    binary_list = []
    if search in movies_des['title'].values:
        idx = movies_des[movies_des['title'] == search].index.item()
        for i in movie_word_binary.iloc[idx]:
            binary_list.append(i)
        point1 = np.array(binary_list).reshape(1, -1)
        point1 = [val for sublist in point1 for val in sublist]
        for j in tqdm(range(len(movies_des))):
            binary_list2 = []
            for k in movie_word_binary.iloc[j]:
                binary_list2.append(k)
            point2 = np.array(binary_list2).reshape(1, -1)
            point2 = [val for sublist in point2 for val in sublist]
            dot_product = np.dot(point1, point2)
            norm_1 = np.linalg.norm(point1)
            norm_2 = np.linalg.norm(point2)
            cos_sim = dot_product / (norm_1 * norm_2)
            cs_list.append(cos_sim)
        movies_copy = movies_des.copy()
        movies_copy['cos_sim'] = cs_list
        results = movies_copy.sort_values('cos_sim', ascending=False)
        results = results[results['title'] != search]
        top_results = results.head(5)
        return(top_results)
    elif search in tv_des['title'].values:
        idx = tv_des[tv_des['title'] == search].index.item()
        for i in tv_word_binary.iloc[idx]:
            binary_list.append(i)
        point1 = np.array(binary_list).reshape(1, -1)
        point1 = [val for sublist in point1 for val in sublist]
        for j in tqdm(range(len(tv))):
            binary_list2 = []
            for k in tv_word_binary.iloc[j]:
                binary_list2.append(k)
            point2 = np.array(binary_list2).reshape(1, -1)
            point2 = [val for sublist in point2 for val in sublist]
            dot_product = np.dot(point1, point2)
            norm_1 = np.linalg.norm(point1)
            norm_2 = np.linalg.norm(point2)
            cos_sim = dot_product / (norm_1 * norm_2)
            cs_list.append(cos_sim)
        tv_copy = tv_des.copy()
        tv_copy['cos_sim'] = cs_list
        results = tv_copy.sort_values('cos_sim', ascending=False)
        results = results[results['title'] != search]
        top_results = results.head(5)
        return(top_results)
    else:
        return("Title not in dataset. Please check spelling.")

pd.options.display.max_colwidth = 300
recommender2('The Conjuring')

100%|██████████| 4761/4761 [06:03<00:00, 13.11it/s]

| | title | description | description_filtered | cos_sim |
|---|---|---|---|---|
| 2549 | Mirai | Unhappy after his new baby sister displaces him, four-year-old Kun begins meeting people and pets from his family's history in their unique house. | unhappy new baby sister displaces him, four-year-old kun begins meeting people pets family's history unique house. | 0.426401 |
| 1632 | Hard Lessons | This drama based on real-life events tells the story of George McKenna, the tough, determined new principal of a notorious Los Angeles high school. | drama based real-life events tells story george mckenna, tough, determined new principal notorious los angeles high school. | 0.376256 |
| 2372 | Macchli Jal Ki Rani Hai | After relocating to a different town with her husband, a housewife begins to sense the existence of a mysterious presence in their new house. | relocating different town husband, housewife begins sense existence mysterious presence new house. | 0.375467 |
| 3910 | The Eyes of My Mother | At the remote farmhouse where she once witnessed a traumatic childhood event, a young woman develops a grisly fascination with violence. | remote farmhouse witnessed traumatic childhood event, young woman develops grisly fascination violence. | 0.371312 |
| 227 | Adrishya | A family's harmonious existence is interrupted when the young son begins showing symptoms of anxiety that seem linked to disturbing events at home. | family's harmonious existence interrupted young son begins showing symptoms anxiety seem linked disturbing events home. | 0.367423 |
recommender2('After Life')

100%|██████████| 1891/1891 [01:32<00:00, 20.46it/s]

| | title | description | description_filtered | cos_sim |
|---|---|---|---|---|
| 1628 | The Paper | A construction magnate takes over a struggling newspaper and attempts to wield editorial influence for power and personal gain. | construction magnate takes struggling newspaper attempts wield editorial influence power personal gain. | 0.351351 |
| 1848 | Winter Sun | Years after ruthless businessmen kill his father and order the death of his twin brother, a modest fisherman adopts a new persona to exact revenge. | years ruthless businessmen kill father order death twin brother, modest fisherman adopts new persona exact revenge. | 0.311741 |
| 1768 | Under the Black Moonlight | A college art club welcomes a new member who has the secret ability to smell death and who warns one of them to leave her boyfriend ... or else. | college art club welcomes new member secret ability smell death warns one leave boyfriend ... else. | 0.277885 |
| 1180 | Private Practice | At Oceanside Wellness Center, Dr. Addison Montgomery deals with competing personalities in the new world of holistic medicine. | oceanside wellness center, dr. addison montgomery deals competing personalities new world holistic medicine. | 0.275777 |
| 1271 | Santa Clarita Diet | They're ordinary husband and wife realtors until she undergoes a dramatic change that sends them down a road of death and destruction. In a good way. | they're ordinary husband wife realtors undergoes dramatic change sends road death destruction. good way. | 0.256748 |
我们这个教程的主要目的是基于Graph 节点的Adamic Adar指标来推荐相似电影。如果Adamic Adar指标越高，就代表两个节点越相近。
Adamic/Adar (Frequency-Weighted Common Neighbors)
Adamic-Adar 简称AA，该指标根据共同邻居的节点的度给每个节点赋予一个权重值，即为每个节点的度的对数分之一。然后把节点对的所有共同邻居的权重值相加，其和作为该节点对的相似度值。
这个方法同样是对Common Neighbors的改进，当我们计算两个相同邻居的数量的时候，其实每个邻居的"重要程度"都是不一样的，我们认为这个邻居的邻居数量越少，就越凸显它作为"中间人"的重要性，毕竟一共只认识那么少人，却恰好是x，y的好朋友。
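For example:

AA(x, y) = Σ 1/log(N(u)), summed over every common neighbor u of x and y

x and y are two nodes (in this tutorial, two movies).
N(one_node) returns the number of nodes adjacent to a given node; if x has the adjacent nodes a, b and c, N(x) returns 3.

In words: for nodes x and y, iterate over every common neighbor u and add up the values 1/log(N(u)).
The size of 1/log(N(u)) determines how important node u is:

If x and y share a node u that has many neighbors, u is less important or less relevant: the larger N(u), the smaller 1/log(N(u)).
If x and y share a node u that has only a few neighbors, u is more important or more relevant: the smaller N(u), the larger 1/log(N(u)).

The same idea applies to everyday life: if classmates A and B met through classmate C, and C has a simple social circle with few people around, then C is someone who links A and B strongly.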
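The weighting can be checked by hand on a tiny graph. The sketch below is a plain-Python illustration, not the tutorial's code: the adjacency dictionary `graph`, its node names and the helper `adamic_adar` are all hypothetical, and the natural logarithm is assumed.

```python
import math

# Hypothetical undirected toy graph: each node maps to the set of its neighbors.
graph = {
    "x": {"a", "b"},
    "y": {"a", "b"},
    "a": {"x", "y"},                      # a knows only x and y
    "b": {"x", "y", "p", "q", "r", "s"},  # b is a hub with six neighbors
    "p": {"b"}, "q": {"b"}, "r": {"b"}, "s": {"b"},
}

def adamic_adar(graph, x, y):
    """Sum 1 / log(N(u)) over every common neighbor u of x and y."""
    common = graph[x] & graph[y]
    return sum(1.0 / math.log(len(graph[u])) for u in common)

weight_a = 1.0 / math.log(len(graph["a"]))  # 1 / log(2), the rarely connected neighbor
weight_b = 1.0 / math.log(len(graph["b"]))  # 1 / log(6), the hub
score = adamic_adar(graph, "x", "y")
print(weight_a, weight_b, score)
```

Both a and b are shared by x and y, but a contributes more to the score because it has fewer neighbors, which is the "go-between" effect described above.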
Method 1: Unsupervised K-means clustering on the TF-IDF weights of the descriptions
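Method 2: Build a TF-IDF vector matrix for the titles
Compute the TF-IDF vector of every title, use cosine similarity to find the top 5 most similar other titles, create a node that groups these similar titles, and then evaluate that group with the Adamic-Adar measure.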
# Import packages
import networkx as nx  # builds the graph
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import math
import time

# Matplotlib 3.6 and later name this style 'seaborn-v0_8'
plt.style.use('seaborn')
plt.rcParams['figure.figsize'] = [14, 14]

# Load the data
df = pd.read_csv('/home/kesci/input/netflix8714/netflix_titles.csv')
# Split date_added into year, month and day
df["date_added"] = pd.to_datetime(df['date_added'])
df['year'] = df['date_added'].dt.year  # year
df['month'] = df['date_added'].dt.month  # month
df['day'] = df['date_added'].dt.day  # day
df.head()

| | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description | year | month | day |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | s1 | TV Show | 3% | NaN | João Miguel, Bianca Comparato, Michel Gomes, R... | Brazil | 2020-08-14 | 2020 | TV-MA | 4 Seasons | International TV Shows, TV Dramas, TV Sci-Fi &... | In a future where the elite inhabit an island ... | 2020.0 | 8.0 | 14.0 |
| 1 | s2 | Movie | 7:19 | Jorge Michel Grau | Demián Bichir, Héctor Bonilla, Oscar Serrano, ... | Mexico | 2016-12-23 | 2016 | TV-MA | 93 min | Dramas, International Movies | After a devastating earthquake hits Mexico Cit... | 2016.0 | 12.0 | 23.0 |
| 2 | s3 | Movie | 23:59 | Gilbert Chan | Tedd Chan, Stella Chung, Henley Hii, Lawrence ... | Singapore | 2018-12-20 | 2011 | R | 78 min | Horror Movies, International Movies | When an army recruit is found dead, his fellow... | 2018.0 | 12.0 | 20.0 |
| 3 | s4 | Movie | 9 | Shane Acker | Elijah Wood, John C. Reilly, Jennifer Connelly... | United States | 2017-11-16 | 2009 | PG-13 | 80 min | Action & Adventure, Independent Movies, Sci-Fi... | In a postapocalyptic world, rag-doll robots hi... | 2017.0 | 11.0 | 16.0 |
| 4 | s5 | Movie | 21 | Robert Luketic | Jim Sturgess, Kevin Spacey, Kate Bosworth, Aar... | United States | 2020-01-01 | 2008 | PG-13 | 123 min | Dramas | A brilliant group of students become card-coun... | 2020.0 | 1.0 | 1.0 |

The output above shows that every title now has year, month and day columns.

# The director, listed_in, cast and country columns each hold several comma-separated values;
# split them on commas to get lists
df['directors'] = df['director'].apply(lambda l: [] if pd.isna(l) else [i.strip() for i in l.split(",")])
df['categories'] = df['listed_in'].apply(lambda l: [] if pd.isna(l) else [i.strip() for i in l.split(",")])
df['actors'] = df['cast'].apply(lambda l: [] if pd.isna(l) else [i.strip() for i in l.split(",")])
df['countries'] = df['country'].apply(lambda l: [] if pd.isna(l) else [i.strip() for i in l.split(",")])

df.head(3)
| | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description | year | month | day | directors | categories | actors | countries |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | s1 | TV Show | 3% | NaN | João Miguel, Bianca Comparato, Michel Gomes, R... | Brazil | 2020-08-14 | 2020 | TV-MA | 4 Seasons | International TV Shows, TV Dramas, TV Sci-Fi &... | In a future where the elite inhabit an island ... | 2020.0 | 8.0 | 14.0 | [] | [International TV Shows, TV Dramas, TV Sci-Fi ... | [João Miguel, Bianca Comparato, Michel Gomes, ... | [Brazil] |
| 1 | s2 | Movie | 7:19 | Jorge Michel Grau | Demián Bichir, Héctor Bonilla, Oscar Serrano, ... | Mexico | 2016-12-23 | 2016 | TV-MA | 93 min | Dramas, International Movies | After a devastating earthquake hits Mexico Cit... | 2016.0 | 12.0 | 23.0 | [Jorge Michel Grau] | [Dramas, International Movies] | [Demián Bichir, Héctor Bonilla, Oscar Serrano,... | [Mexico] |
| 2 | s3 | Movie | 23:59 | Gilbert Chan | Tedd Chan, Stella Chung, Henley Hii, Lawrence ... | Singapore | 2018-12-20 | 2011 | R | 78 min | Horror Movies, International Movies | When an army recruit is found dead, his fellow... | 2018.0 | 12.0 | 20.0 | [Gilbert Chan] | [Horror Movies, International Movies] | [Tedd Chan, Stella Chung, Henley Hii, Lawrence... | [Singapore] |

For example, the listed_in value International TV Shows, TV Dramas, TV Sci-Fi becomes the list [International TV Shows, TV Dramas, TV Sci-Fi] in categories; the other three columns are converted the same way.
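The split-and-strip rule applied to the four columns can be tried outside pandas. This is a minimal sketch with a hypothetical helper `split_values`; it assumes a missing cell arrives as `None` or as a float NaN, which is how pandas usually marks a missing string.

```python
import math

def split_values(cell):
    """Split a comma-separated cell into a list of stripped values; missing cells become []."""
    # Treat None and float NaN as a missing value.
    if cell is None or (isinstance(cell, float) and math.isnan(cell)):
        return []
    return [part.strip() for part in cell.split(",")]

print(split_values("Dramas, International Movies"))  # ['Dramas', 'International Movies']
print(split_values(float("nan")))                    # []
```

Returning an empty list for missing cells matters later: a loop over an empty list simply creates no nodes or edges for that title.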
print(df.shape)

(7787, 19)

from sklearn.feature_extraction.text import TfidfVectorizer  # builds the TF-IDF vectors
from sklearn.metrics.pairwise import linear_kernel
from sklearn.cluster import MiniBatchKMeans  # mini-batch K-means

start_time = time.time()
text_content = df['description']
vector = TfidfVectorizer(max_df=0.4,  # ignore terms that appear in more than 40% of the documents
                         min_df=1,  # minimum number of documents a term must appear in
                         stop_words='english',  # remove English stop words
                         lowercase=True,  # convert to lowercase
                         use_idf=True,  # apply idf weighting
                         norm='l2',  # L2-normalize each document vector
                         smooth_idf=True  # smooth the idf weights to avoid division by zero
                         )
tfidf = vector.fit_transform(text_content)

k = 200  # number of cluster centers
kmeans = MiniBatchKMeans(n_clusters=k)
kmeans.fit(tfidf)
centers = kmeans.cluster_centers_.argsort()[:, ::-1]
terms = vector.get_feature_names_out()  # scikit-learn 1.0+; older releases use get_feature_names()

request_transform = vector.transform(df['description'])

df['cluster'] = kmeans.predict(request_transform)

df['cluster'].value_counts().head()
19     7179
39      333
182       6
1         5
144       5
Name: cluster, dtype: int64

The cluster labels are highly unbalanced: cluster 19 holds 7179 titles and cluster 39 holds 333, so we cannot use the cluster label to create nodes.
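One way to put a number on this imbalance is the share of titles that fall into the largest cluster. The sketch below is only an illustration: `largest_cluster_share` is a hypothetical helper, the `labels` list is made-up toy data, and `reported_share` reuses the counts printed above (7179 titles in cluster 19 out of 7787 rows).

```python
from collections import Counter

def largest_cluster_share(labels):
    """Return the most frequent label and the fraction of items that carry it."""
    counts = Counter(labels)
    label, size = counts.most_common(1)[0]
    return label, size / len(labels)

# Toy labels: nine items in cluster 19, two in cluster 39, one in cluster 182.
labels = [19] * 9 + [39] * 2 + [182]
label, share = largest_cluster_share(labels)
print(label, round(share, 2))  # 19 0.75

# Share of the largest cluster in the run reported above.
reported_share = 7179 / 7787
print(round(reported_share, 2))  # 0.92
```

With roughly 92% of the titles in a single cluster, a cluster node would connect almost every title to every other one and carry very little information.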
# Given the index of a target title, find the top_n titles with the most similar descriptions
def find_similar(tfidf_matrix, index, top_n=5):
    # The TF-IDF rows are L2-normalized, so the linear kernel (dot product) equals cosine similarity
    cosine_similarities = linear_kernel(tfidf_matrix[index:index+1], tfidf_matrix).flatten()
    related_docs_indices = [i for i in cosine_similarities.argsort()[::-1] if i != index]
    return related_docs_indices[:top_n]
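The ranking step in find_similar relies on the fact that, for L2-normalized vectors, a plain dot product is the cosine similarity. The following plain-Python sketch shows the same idea on four made-up vectors; `l2_normalize`, `top_similar` and the `raw` matrix are hypothetical names and data, not part of the tutorial.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else list(vec)

def top_similar(rows, index, top_n=2):
    """Rank the other rows by dot product with rows[index]; rows are assumed L2-normalized."""
    target = rows[index]
    scores = [sum(a * b for a, b in zip(target, row)) for row in rows]
    order = sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)
    return [i for i in order if i != index][:top_n]

# Hypothetical term-weight vectors for four descriptions (one column per term).
raw = [
    [1.0, 1.0, 0.0],
    [2.0, 2.0, 0.0],  # same direction as row 0, different length
    [1.0, 0.0, 1.0],  # shares one term with row 0
    [0.0, 0.0, 3.0],  # shares no term with row 0
]
rows = [l2_normalize(r) for r in raw]
print(top_similar(rows, 0))  # [1, 2]
```

Row 1 ranks first even though its raw weights are twice as large, because normalization removes the effect of vector length and only the direction is compared.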

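Node definitions
The graph contains the following node types:

Movies: the titles (movies and TV shows)
Person (actor or director): people
Category: categories
Countries: countries
Cluster (description): the cluster label of the description
Sim(title): the top 5 titles with the most similar descriptions

Edge definitions
The graph contains the following relationships:

ACTED_IN: between an actor and a title
CAT_IN: between a category and a title
DIRECTED: between a director and a title
COU_IN: between a country and a title
DESCRIPTION: between a cluster label and a title
SIMILARITY: between titles whose descriptions are similar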
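To make the definitions concrete, the sketch below turns a single made-up record into labelled nodes and edges using plain dictionaries and tuples. The `record`, the `SCHEMA` table and `record_to_graph` are hypothetical; only the label strings (MOVIE, PERSON, CAT, COU and the edge names) come from the definitions above.

```python
# Hypothetical record shaped like one DataFrame row after the list columns were added.
record = {
    "title": "Example Title",
    "actors": ["Actor One", "Actor Two"],
    "directors": ["Director One"],
    "categories": ["Dramas"],
    "countries": ["Mexico"],
}

# (column, node label, edge label) triples mirroring the node and edge definitions.
SCHEMA = [
    ("actors", "PERSON", "ACTED_IN"),
    ("directors", "PERSON", "DIRECTED"),
    ("categories", "CAT", "CAT_IN"),
    ("countries", "COU", "COU_IN"),
]

def record_to_graph(record):
    """Return ({node: label}, [(title, node, edge label), ...]) for one record."""
    nodes = {record["title"]: "MOVIE"}
    edges = []
    for column, node_label, edge_label in SCHEMA:
        for name in record[column]:
            nodes[name] = node_label
            edges.append((record["title"], name, edge_label))
    return nodes, edges

nodes, edges = record_to_graph(record)
print(len(nodes), len(edges))  # 6 5
```

Every edge starts at the title, so two titles become connected only indirectly, through a shared person, category, country or similarity node. That is exactly the structure the Adamic-Adar measure needs.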

G = nx.Graph(label="MOVIE")
start_time = time.time()
for i, rowi in df.iterrows():
    if i % 1000 == 0:
        print(" iter {} -- {} seconds --".format(i, time.time() - start_time))
    # Create the title node, labelled MOVIE
    G.add_node(rowi['title'], key=rowi['show_id'], label="MOVIE", mtype=rowi['type'], rating=rowi['rating'])
    for element in rowi['actors']:
        # Create the actor node, labelled PERSON
        G.add_node(element, label="PERSON")
        # Link the title to the actor: ACTED_IN
        G.add_edge(rowi['title'], element, label="ACTED_IN")
    for element in rowi['categories']:
        # Create the category node, labelled CAT
        G.add_node(element, label="CAT")
        # Link the title to the category: CAT_IN
        G.add_edge(rowi['title'], element, label="CAT_IN")
    for element in rowi['directors']:
        # Create the director node, labelled PERSON
        G.add_node(element, label="PERSON")
        # Link the title to the director: DIRECTED
        G.add_edge(rowi['title'], element, label="DIRECTED")
    for element in rowi['countries']:
        # Create the country node, labelled COU
        G.add_node(element, label="COU")
        # Link the title to the country: COU_IN
        G.add_edge(rowi['title'], element, label="COU_IN")
    # Create the similar-title node
    indices = find_similar(tfidf, i, top_n=5)  # the 5 most similar titles
    snode = "Sim(" + rowi['title'][:15].strip() + ")"
    G.add_node(snode, label="SIMILAR")
    G.add_edge(rowi['title'], snode, label="SIMILARITY")
    for element in indices:
        # find_similar returns row positions, so look the title up by position
        G.add_edge(snode, df['title'].iloc[element], label="SIMILARITY")

print(" finish -- {} seconds --".format(time.time() - start_time))
iter 0 -- 0.02708911895751953 seconds --
 iter 1000 -- 4.080239295959473 seconds --
 iter 2000 -- 8.126200675964355 seconds --
 iter 3000 -- 12.209706783294678 seconds --
 iter 4000 -- 16.362282037734985 seconds --
 iter 5000 -- 20.392311811447144 seconds --
 iter 6000 -- 24.43456506729126 seconds --
 iter 7000 -- 28.474121809005737 seconds --
 finish -- 31.648479461669922 seconds --

Set the color of each node type
def get_all_adj_nodes(list_in):
    # Collect the given nodes together with all of their direct neighbors
    sub_graph = set()
    for m in list_in:
        sub_graph.add(m)
        for e in G.neighbors(m):
            sub_graph.add(e)
    return list(sub_graph)

def draw_sub_graph(sub_graph):
    subgraph = G.subgraph(sub_graph)
    colors = []
    for e in subgraph.nodes():
        if G.nodes[e]['label'] == "MOVIE":
            colors.append('blue')
        elif G.nodes[e]['label'] == "PERSON":
            colors.append('red')
        elif G.nodes[e]['label'] == "CAT":
            colors.append('green')
        elif G.nodes[e]['label'] == "COU":
            colors.append('yellow')
        elif G.nodes[e]['label'] == "SIMILAR":
            colors.append('orange')
        elif G.nodes[e]['label'] == "CLUSTER":
            colors.append('orange')

    nx.draw(subgraph, with_labels=True, font_weight='bold', node_color=colors)
    plt.show()

list_in = ["Ocean's Twelve", "Ocean's Thirteen"]
sub_graph = get_all_adj_nodes(list_in)
draw_sub_graph(sub_graph)

* Compute the Adamic-Adar measure → final result
def get_recommendation(root):
    commons_dict = {}
    for e in G.neighbors(root):
        for e2 in G.neighbors(e):
            if e2 == root:
                continue
            if G.nodes[e2]['label'] == "MOVIE":
                # Record e as a neighbor shared by root and e2
                commons_dict.setdefault(e2, []).append(e)
    movies = []
    weight = []
    for key, values in commons_dict.items():
        w = 0.0
        for e in values:
            # Adamic-Adar weight of the shared neighbor e
            w = w + 1 / math.log(G.degree(e))
        movies.append(key)
        weight.append(w)

    result = pd.Series(data=np.array(weight), index=movies)
    result.sort_values(inplace=True, ascending=False)
    return result
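The two-hop traversal in get_recommendation can be followed on a graph small enough to check by hand. The sketch below is a plain-Python stand-in rather than the networkx version: the `neighbors` dictionary, the `labels` mapping and `recommend` are hypothetical, and the natural logarithm is assumed as in the function above.

```python
import math

# Hypothetical toy graph: node -> set of neighbors.
neighbors = {
    "Movie A": {"Actor 1", "Drama"},
    "Movie B": {"Actor 1", "Drama"},
    "Movie C": {"Drama"},
    "Movie D": {"Drama"},
    "Actor 1": {"Movie A", "Movie B"},
    "Drama": {"Movie A", "Movie B", "Movie C", "Movie D"},
}
labels = {name: ("MOVIE" if name.startswith("Movie") else "OTHER") for name in neighbors}

def recommend(root):
    """Score every movie two hops away from root by summing 1 / log(degree) of the shared nodes."""
    scores = {}
    for shared in neighbors[root]:
        for other in neighbors[shared]:
            if other != root and labels[other] == "MOVIE":
                scores[other] = scores.get(other, 0.0) + 1.0 / math.log(len(neighbors[shared]))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

ranking = recommend("Movie A")
print(ranking)
```

Movie B comes first because it shares both the actor (degree 2) and the category (degree 4) with Movie A, while Movie C and Movie D share only the widely used category and tie with the lower score 1/log(4).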

result = get_recommendation("Ocean's Twelve")
result2 = get_recommendation("Ocean's Thirteen")
result3 = get_recommendation("The Devil Inside")
result4 = get_recommendation("Stranger Things")
print("*"*40+"\n Recommendation for 'Ocean's Twelve'\n"+"*"*40)
print(result.head())
print("*"*40+"\n Recommendation for 'Ocean's Thirteen'\n"+"*"*40)
print(result2.head())
print("*"*40+"\n Recommendation for 'The Devil Inside'\n"+"*"*40)
print(result3.head())
print("*"*40+"\n Recommendation for 'Stranger Things'\n"+"*"*40)
print(result4.head())

****************************************
 Recommendation for 'Ocean's Twelve'
****************************************
Ocean's Thirteen    7.033613
Ocean's Eleven      1.528732
The Informant!      1.252955
Babel               1.162454
Cannabis            1.116221
dtype: float64
****************************************
 Recommendation for 'Ocean's Thirteen'
****************************************
Ocean's Twelve       7.033613
The Departed         2.232071
Ocean's Eleven       2.086843
Brooklyn's Finest    1.467979
Boyka: Undisputed    1.391627
dtype: float64
****************************************
 Recommendation for 'The Devil Inside'
****************************************
The Boy                                  1.901648
The Devil and Father Amorth              1.413791
Making a Murderer                        1.239666
Belief: The Possession of Janet Moses    1.116221
I Am Vengeance                           1.116221
dtype: float64
****************************************
 Recommendation for 'Stranger Things'
****************************************
Beyond Stranger Things    12.047956
Rowdy Rathore              2.585399
Big Stone Gap              2.355888
Kicking and Screaming      1.566140
Prank Encounters           1.269862
dtype: float64

reco = list(result.index[:4].values)
reco.extend(["Ocean's Twelve"])
sub_graph = get_all_adj_nodes(reco)
draw_sub_graph(sub_graph)

reco = list(result4.index[:4].values)
reco.extend(["Stranger Things"])
sub_graph = get_all_adj_nodes(reco)
draw_sub_graph(sub_graph)


Original: https://blog.csdn.net/yanqianglifei/article/details/115861023
Author: 致Great
Title: 图神经网络07-从零构建一个电影推荐系统

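Original articles are protected by copyright. When reposting, please cite the source: https://www.johngo689.com/560916/

Reposted articles are protected by the original author's copyright. When reposting, please credit the original author.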
