News Classification with a TF-IDF Bag-of-Words Model

July 18, 2023, 12:30 AM • Artificial Intelligence

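A bag-of-words (BoW) model ignores word order: a document is represented only by which words it contains and how often. BoW is the unigram case of the n-gram family.
A bigram model counts pairs of adjacent words.
An n-gram model generalizes this to runs of n adjacent words.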
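
The difference between unigram and bigram counts can be sketched in a few lines of plain Python. The helper name `ngrams` and the two toy sentences are assumptions made for illustration; they are not part of the original tutorial.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) found in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens_a = ["the", "dog", "bit", "the", "man"]
tokens_b = ["the", "man", "bit", "the", "dog"]

# Unigram counts (bag of words): identical, because word order is ignored.
bow_a = Counter(ngrams(tokens_a, 1))
bow_b = Counter(ngrams(tokens_b, 1))
print(bow_a == bow_b)

# Bigram counts keep local word order, so the two sentences now differ.
bigrams_a = Counter(ngrams(tokens_a, 2))
bigrams_b = Counter(ngrams(tokens_b, 2))
print(bigrams_a == bigrams_b)
```

The two sentences mean opposite things, yet a unigram model cannot tell them apart. That is the price paid for ignoring order.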

# Create the output directory where the trained model will be saved
import os  # file and directory operations
output_dir = u'output'
if not os.path.exists(output_dir):
    os.mkdir(output_dir)

Load the data

import numpy as np  # a widely used numerical library; its arrays are more efficient than Python's built-in containers
import pandas as pd

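1. Pandas is an open-source Python library built on NumPy. It is widely used for fast data analysis as well as data cleaning and preparation. Its name is derived from "panel data", a term from econometrics. Put simply, you can think of Pandas as Excel for Python.
2. Pandas handles data from many different sources, such as Excel spreadsheets, CSV files, SQL databases, and even tables on web pages.
3. Pandas is built on NumPy and is often used together with NumPy and matplotlib.
4. Pandas has two main data structures:
Series: one-dimensional
DataFrame: two-dimensional (a table of rows and columns)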

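A Python list stores pointers to objects. For example, [0, 1, 2] needs three pointers plus three separate integer objects, which wastes memory.

A NumPy array stores its elements in one contiguous block of memory, which saves memory and speeds up computation.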
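
A rough way to see the difference is to compare the bytes used by a small list with those used by the equivalent array. This is only a sketch: the sizes reported by `sys.getsizeof` depend on the Python build, and the `int64` dtype is chosen here for illustration.

```python
import sys
import numpy as np

values = [0, 1, 2]
arr = np.array(values, dtype=np.int64)

# The list holds references; each int is a separate Python object.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# The array stores raw 8-byte integers back to back in one buffer.
print(arr.nbytes)                  # 3 elements * 8 bytes each
print(arr.flags["C_CONTIGUOUS"])   # whether the buffer is one contiguous block
print(list_bytes > arr.nbytes)
```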

# Inspect the training data
# Columns: 频道 (channel), 文章 (article). on_bad_lines='skip' needs pandas >= 1.3; older versions use error_bad_lines=False.
train_data = pd.read_csv('sohu_train.txt', sep='\t', header=None, dtype=np.str_, encoding='utf8', on_bad_lines='skip', names=[u'频道', u'文章'])
train_data.head()

# Load the stop words
stopwords = set()
with open('stopwords.txt', 'r',encoding='utf8') as infile:
    for line in infile:
        line = line.rstrip('\n')
        if line:
            stopwords.add(line.lower())

Compute the TF-IDF features of each article
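
As background, the weighting that scikit-learn's TfidfVectorizer applies with its default settings (smooth_idf=True, norm='l2') can be written as follows. Here n is the number of documents, df(t) is the number of documents containing term t, and tf(t, d) is the raw count of t in document d. This describes the library defaults and may differ from other TF-IDF variants.

```latex
\mathrm{idf}(t) = \ln\frac{1 + n}{1 + \mathrm{df}(t)} + 1,
\qquad
w(t, d) = \mathrm{tf}(t, d)\cdot \mathrm{idf}(t),
\qquad
\hat{w}(t, d) = \frac{w(t, d)}{\sqrt{\sum_{t'} w(t', d)^2}}
```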

import jieba
from sklearn.feature_extraction.text import TfidfVectorizer

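min_df removes terms with a low document frequency (df). Such terms are usually highly specialized or rare words and act as noise. Here min_df=50 drops any term that appears in fewer than 50 articles.
max_df removes terms with a very high document frequency. Such terms are common words that carry little information. Here max_df=0.3 drops any term that appears in more than 30% of the articles.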
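
The effect of the two thresholds is easier to see on a tiny corpus. The four English sentences and the thresholds min_df=2 and max_df=0.75 below are assumptions chosen for illustration; the tutorial itself uses Chinese news text with min_df=50 and max_df=0.3.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "stock market rises",
    "stock market falls",
    "stock football match",
    "stock tennis match",
]

# No pruning: every term is kept.
full = TfidfVectorizer().fit(docs)

# Keep only terms found in at least 2 documents and in at most 75% of them.
pruned = TfidfVectorizer(min_df=2, max_df=0.75).fit(docs)

print(sorted(full.vocabulary_))
print(sorted(pruned.vocabulary_))
```

"stock" occurs in every document, so max_df removes it. The terms that occur only once are removed by min_df, which leaves "market" and "match".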

# Instantiate TfidfVectorizer; stop_words is passed as a list because newer scikit-learn versions expect a list here, not a set
tfidf = TfidfVectorizer(tokenizer=jieba.lcut, stop_words=list(stopwords), min_df=50, max_df=0.3)
x = tfidf.fit_transform(train_data[u'文章'])

·Output

Building prefix dict from the default dictionary ...

Loading model from cache C:\Users\10248\AppData\Local\Temp\jieba.cache
Loading model cost 0.550 seconds.

Prefix dict has been built successfully.

E:\ANACODAN\lib\site-packages\sklearn\feature_extraction\text.py:388: UserWarning: Your stop_words may be inconsistent with your preprocessing. Tokenizing the stop words generated tokens ['&', ',', '.', ';', 'e', 'g', 'nbsp', '\u2014', '\u3000', '傥', '兼', '前', '唷', '啪', '啷', '喔', '始', '漫', '然', '特', '竟', '若果', '莫', '见', '设', '说', '达', '非'] not in stop_words.

  warnings.warn('Your stop_words may be inconsistent with '

print(u'Vocabulary size: {}'.format(len(tfidf.vocabulary_)))

Vocabulary size: 14516

Train the classifier

Encode the target variable. Our labels are strings, so we map them to integer codes with LabelEncoder. scikit-learn classifiers can take string labels directly, but integer codes keep the later steps simple.

from sklearn.preprocessing import LabelEncoder  # LabelEncoder converts categorical labels to integers
y_encoder = LabelEncoder()
y = y_encoder.fit_transform(train_data[u'频道'])  # map each channel to 0, 1, 2, ...

y[:10]

·Output

array([3, 3, 3, 3, 3, 3, 3, 3, 3, 3])
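
A minimal sketch of how LabelEncoder assigns codes and maps them back, using made-up English labels rather than the Sohu channels:

```python
from sklearn.preprocessing import LabelEncoder

labels = ["sports", "finance", "sports", "travel"]  # toy labels, not the Sohu channels
encoder = LabelEncoder()
codes = encoder.fit_transform(labels)

print(list(encoder.classes_))   # classes are stored in sorted order
print(codes.tolist())           # each label replaced by its index in classes_
print(list(encoder.inverse_transform([0, 2])))
```

Codes follow the sorted order of the class names. This is why the first ten training rows above all map to the same integer: they share one channel.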

Encode the X variable
x = tfidf.transform(train_data[u'文章'])  # repeats the transformation already done by fit_transform above

# Split into training and test data
from sklearn.model_selection import train_test_split  # splits a data set
# Stratified sampling on y; the test set takes 20%
# The data set is large, so we split the row indices rather than the data itself
train_idx, test_idx = train_test_split(range(len(y)), test_size=0.2, stratify=y)
train_x = x[train_idx, :]  # training set
train_y = y[train_idx]
test_x = x[test_idx, :]  # test set
test_y = y[test_idx]
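
The following sketch shows what stratify does, on invented labels with an 80/20 class imbalance. The label array and `random_state=0` are assumptions added for reproducibility; the tutorial code does not fix a seed.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy labels: 80 samples of class 0 and 20 of class 1 (illustrative only).
y = np.array([0] * 80 + [1] * 20)
indices = np.arange(len(y))

train_idx, test_idx = train_test_split(
    indices, test_size=0.2, stratify=y, random_state=0
)

# Both subsets keep the original 80/20 class ratio.
print(np.bincount(y[train_idx]))
print(np.bincount(y[test_idx]))
```

Splitting indices instead of the feature matrix, as the tutorial does, avoids copying the data until the rows are actually selected.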

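Train a logistic regression model. With 12 classes, this is a multiclass problem.

Common parameters
penalty: type of regularization term, l1 or l2
C: inverse of the regularization strength; the larger it is, the weaker the penalty
fit_intercept: whether to fit an intercept term
max_iter: maximum number of iterations
multi_class: how to train the multiclass model
ovr = train one binary classifier per label (one-vs-rest)
multinomial = train a single multiclass model directly; supported only when solver is one of {newton-cg, sag, lbfgs}
solver: optimization algorithm; options include {liblinear, newton-cg, sag, lbfgs}
liblinear works well on small data sets; sag is faster on large ones
For multiclass problems, liblinear supports only ovr; the others support both ovr and multinomial
liblinear supports l1 regularization; the others support only l2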
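
To make the difference between ovr and multinomial concrete, here is a small NumPy sketch of how class probabilities are formed from linear scores in each mode. The score matrix is hypothetical. In a real one-vs-rest model each column would come from a separately trained binary classifier.

```python
import numpy as np

# Hypothetical linear scores: one row per document, one column per class.
scores = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])

def softmax(z):
    """Multinomial: one joint probability distribution over all classes."""
    z = z - z.max(axis=1, keepdims=True)   # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ovr_probabilities(z):
    """One-vs-rest: an independent sigmoid per class, then renormalized per row."""
    p = 1.0 / (1.0 + np.exp(-z))
    return p / p.sum(axis=1, keepdims=True)

print(softmax(scores).round(3))
print(ovr_probabilities(scores).round(3))
```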

from sklearn.linear_model import LogisticRegression  # import logistic regression
model = LogisticRegression(multi_class='multinomial', solver='lbfgs')  # solver='lbfgs' selects the optimization algorithm
model.fit(train_x, train_y)

·Output

E:\ANACODAN\lib\site-packages\sklearn\linear_model\_logistic.py:763: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

LogisticRegression(multi_class='multinomial')
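
The ConvergenceWarning above says lbfgs hit the default limit of 100 iterations. One common response, suggested by the warning text itself, is to raise max_iter. The sketch below does this on synthetic data from `make_classification`, which stands in for the TF-IDF matrix. The value 1000 is an arbitrary assumption, not a recommendation from the original author.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic 3-class data standing in for the TF-IDF features (illustrative only).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=10,
    n_classes=3, random_state=0,
)

# max_iter defaults to 100; a larger budget gives lbfgs more room to converge.
clf = LogisticRegression(solver="lbfgs", max_iter=1000)
clf.fit(X, y)

print(clf.n_iter_)        # iterations actually used
print(clf.coef_.shape)    # (n_classes, n_features)
```

If `n_iter_` stays below `max_iter`, the solver stopped because it converged, not because it ran out of iterations.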

Evaluate the model

from sklearn.metrics import confusion_matrix, precision_recall_fscore_support
# Measure the model's performance on the test set
test_y_pred = model.predict(test_x)
# Compute the confusion matrix
pd.DataFrame(confusion_matrix(test_y, test_y_pred), columns=y_encoder.classes_, index=y_encoder.classes_)

·Output

          体育  健康  女人  娱乐  房地产  教育  文化  新闻  旅游  汽车  科技  财经
体育       193     1     0     1       0     0     3     2     0     0     0     0
健康         0   165     9     0       0     4     0     7     3     0     4     8
女人         1     5   167     4       0     0    13     5     3     0     1     1
娱乐         0     1     9   164       0     5    17     2     0     0     1     1
房地产       0     1     4     0     180     0     0     3     0     0     1    11
教育         0     0     3     2       0   185     2     6     1     0     1     0
文化         0     3    13    17       0     1   153     8     2     1     2     0
新闻         1     4     6     5       1    12     4   124     5     2    11    25
旅游         0     2     8     0       6     1     8     8   163     0     1     3
汽车         1     1     3     0       0     0     0     4     2   182     1     6
科技         0     1     0     0       0     2     2    12     5     1   164    13
财经         1     4     3     0      12     0     4    19     2     4    11   140

Rows are the actual channels and columns are the predicted channels. Channel names: 体育 sports, 健康 health, 女人 women, 娱乐 entertainment, 房地产 real estate, 教育 education, 文化 culture, 新闻 news, 旅游 travel, 汽车 autos, 科技 technology, 财经 finance.
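
Per-class precision and recall can be read straight off a confusion matrix: recall divides each diagonal entry by its row sum, precision by its column sum. In the matrix above, for instance, the 新闻 (news) row has 124 on the diagonal out of 200 test articles, a recall of 0.62. The sketch below checks the rule on invented labels.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

# Toy labels for three classes (illustrative only).
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 2, 2, 2, 0]

cm = confusion_matrix(y_true, y_pred)       # rows: true class, columns: predicted class
recall = np.diag(cm) / cm.sum(axis=1)       # correct / all samples truly in the class
precision = np.diag(cm) / cm.sum(axis=0)    # correct / all samples predicted as the class

# Cross-check against scikit-learn's own per-class metrics.
p, r, _, _ = precision_recall_fscore_support(y_true, y_pred)
print(cm)
print(np.allclose(precision, p), np.allclose(recall, r))
```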

# Compute the evaluation metrics
def eval_model(y_true, y_pred, labels):
    # Per-class precision, recall, F1 and support
    p, r, f1, s = precision_recall_fscore_support(y_true, y_pred)
    # Overall precision, recall and F1 as support-weighted averages, plus total support
    tot_p = np.average(p, weights=s)
    tot_r = np.average(r, weights=s)
    tot_f1 = np.average(f1, weights=s)
    tot_s = np.sum(s)
    res1 = pd.DataFrame({
        u'Label': labels,
        u'Precision': p,
        u'Recall': r,
        u'F1': f1,
        u'Support': s
    })
    res2 = pd.DataFrame({
        u'Label': [u'总体'],  # 总体 = overall
        u'Precision': [tot_p],
        u'Recall': [tot_r],
        u'F1': [tot_f1],
        u'Support': [tot_s]
    })
    res2.index = [999]
    res = pd.concat([res1, res2])
    return res[[u'Label', u'Precision', u'Recall', u'F1', u'Support']]

eval_model(test_y, test_y_pred, y_encoder.classes_)

·Output


     Label   Precision Recall F1        Support
0    体育    0.979695  0.965  0.972292  200
1    健康    0.877660  0.825  0.850515  200
2    女人    0.742222  0.835  0.785882  200
3    娱乐    0.849741  0.820  0.834606  200
4    房地产  0.904523  0.900  0.902256  200
5    教育    0.880952  0.925  0.902439  200
6    文化    0.742718  0.765  0.753695  200
7    新闻    0.620000  0.620  0.620000  200
8    旅游    0.876344  0.815  0.844560  200
9    汽车    0.957895  0.910  0.933333  200
10   科技    0.828283  0.820  0.824121  200
11   财经    0.673077  0.700  0.686275  200
999  总体    0.827759  0.825  0.825831  2400
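
The overall row is a support-weighted average of the per-class values. scikit-learn can compute the same figures directly with average='weighted', as this sketch on invented labels suggests. Because every class above has the same support (200), the weighted average there coincides with the plain mean of the per-class values.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Toy labels with unequal class sizes (illustrative only).
y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 0, 1, 2, 2, 2, 0, 2]

# Manual route, as in eval_model: per-class metrics, then weight by support.
p, r, f1, s = precision_recall_fscore_support(y_true, y_pred)
manual = (np.average(p, weights=s), np.average(r, weights=s), np.average(f1, weights=s))

# Built-in route: let scikit-learn do the weighting.
wp, wr, wf1, _ = precision_recall_fscore_support(y_true, y_pred, average="weighted")
print(np.allclose(manual, (wp, wr, wf1)))
```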

Save the model

# Save the model to a file (requires: pip install dill)
# Note: we save three objects: the TF-IDF feature extractor, the label encoder, and the classifier
!pip install dill
import dill
import pickle
model_file = os.path.join(output_dir, u'model.pkl')
with open(model_file, 'wb') as outfile:
    dill.dump({
        'y_encoder': y_encoder,
        'tfidf': tfidf,
        'lr': model
    }, outfile)

·Output

Requirement already satisfied: dill in e:\anacodan\lib\site-packages (0.3.4)
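
The idea of bundling the vectorizer, label encoder and classifier into one file can be tried end to end on a toy corpus. This sketch uses the standard-library pickle module and scikit-learn's default tokenizer on made-up English sentences. The original uses dill and a jieba tokenizer, so treat this as an illustration of the round trip, not as the author's exact setup.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

# Toy corpus and labels (assumptions for illustration).
texts = ["goal scored in the match", "team wins the match",
         "stocks fall on the market", "market rallies as stocks rise"]
labels = ["sports", "sports", "finance", "finance"]

encoder = LabelEncoder()
vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform(texts)
clf = LogisticRegression().fit(features, encoder.fit_transform(labels))

# Save all three objects together so they cannot drift apart.
bundle = {"y_encoder": encoder, "tfidf": vectorizer, "lr": clf}
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as outfile:
    pickle.dump(bundle, outfile)
with open(path, "rb") as infile:
    restored = pickle.load(infile)

# The restored bundle reproduces the predictions of the in-memory objects.
sample = ["the team scored a late goal"]
before = clf.predict(vectorizer.transform(sample))
after = restored["lr"].predict(restored["tfidf"].transform(sample))
print(encoder.inverse_transform(before), restored["y_encoder"].inverse_transform(after))
```

Keeping the three objects in one file matters because a classifier is only meaningful with the exact vocabulary and label mapping it was trained on.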

Test the model: predict on new documents

# Load the new documents
# Columns: 频道 (channel), 文章 (article). on_bad_lines='skip' needs pandas >= 1.3; older versions use error_bad_lines=False.
new_data = pd.read_csv('sohu_test.txt', sep='\t', header=None, dtype=np.str_, encoding='utf8', on_bad_lines='skip', names=[u'频道', u'文章'])
new_data.head()

# Load the model
import dill  # the file was written with dill.dump, so it is read back with dill.load
model_file = os.path.join(output_dir, u'model.pkl')
with open(model_file, 'rb') as infile:
    model = dill.load(infile)

# Predict on the new documents (only the first 50 articles here)
# 1. Convert to the bag-of-words (TF-IDF) representation
new_x = model['tfidf'].transform(new_data[u'文章'][:50])

·Output

E:\ANACODAN\lib\site-packages\sklearn\feature_extraction\text.py:388: UserWarning: Your stop_words may be inconsistent with your preprocessing. Tokenizing the stop words generated tokens ['&', ',', '.', ';', 'e', 'g', 'nbsp', '\u2014', '\u3000', '傥', '兼', '前', '唷', '啪', '啷', '喔', '始', '漫', '然', '特', '竟', '若果', '莫', '见', '设', '说', '达', '非'] not in stop_words.

  warnings.warn('Your stop_words may be inconsistent with '

# 2. Predict the class
new_y_pred = model['lr'].predict(new_x)
new_y_pred

·Output

array([3, 0, 3, 3, 5, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3])

# 3. Map the class indices back to channel names
# Columns: 预测频道 (predicted channel), 实际频道 (actual channel)
pd.DataFrame({u'预测频道': model['y_encoder'].inverse_transform(new_y_pred), u'实际频道': new_data[u'频道'][:50]})

·Output

    预测频道  实际频道
0   娱乐      娱乐
1   体育      娱乐
2   娱乐      娱乐
3   娱乐      娱乐
4   教育      娱乐
5   娱乐      娱乐
6   娱乐      娱乐
7   娱乐      娱乐
8   娱乐      娱乐
9   娱乐      娱乐
10  娱乐      娱乐
11  娱乐      娱乐
12  娱乐      娱乐
13  娱乐      娱乐
14  娱乐      娱乐
15  娱乐      娱乐
16  娱乐      娱乐
17  娱乐      娱乐
18  娱乐      娱乐
19  娱乐      娱乐
20  娱乐      娱乐
21  娱乐      娱乐
22  娱乐      娱乐
23  娱乐      娱乐
24  娱乐      娱乐
25  娱乐      娱乐
26  娱乐      娱乐
27  娱乐      娱乐
28  娱乐      娱乐
29  娱乐      娱乐
30  娱乐      娱乐
31  娱乐      娱乐
32  娱乐      娱乐
33  娱乐      娱乐
34  娱乐      娱乐
35  娱乐      娱乐
36  娱乐      娱乐
37  娱乐      娱乐
38  娱乐      娱乐
39  娱乐      娱乐
40  娱乐      娱乐
41  娱乐      娱乐
42  娱乐      娱乐
43  娱乐      娱乐
44  娱乐      娱乐
45  娱乐      娱乐
46  娱乐      娱乐
47  娱乐      娱乐
48  娱乐      娱乐
49  娱乐      娱乐
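
In the table above, all 50 articles belong to 娱乐 (entertainment) and two of them (rows 1 and 4) are mispredicted. A short sketch of how accuracy could be computed for such a comparison, with the two columns rebuilt by hand from the table rather than taken from the model:

```python
import numpy as np

# Rebuilt from the table above: every actual channel is 娱乐; rows 1 and 4 were mispredicted.
actual = np.array(["娱乐"] * 50)
predicted = actual.copy()
predicted[1] = "体育"
predicted[4] = "教育"

# Accuracy is the fraction of positions where prediction and truth agree.
accuracy = np.mean(predicted == actual)
print(accuracy)   # 48 of 50 correct
```

Note that this sample covers a single channel, so it says little about performance on the other eleven.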

Main program: call the model to classify news

# Load the model
import os
import dill  # the model file was written with dill.dump
import pandas as pd

output_dir = u'output'
model_file = os.path.join(output_dir, u'model.pkl')
with open(model_file, 'rb') as infile:
    model = dill.load(infile)  # loaded once, not on every iteration

# Read one article per line from standard input; an empty line ends the loop
while True:
    text = input().strip()
    if not text:
        break
    # 1. Convert the article to its TF-IDF representation
    new1_x = model['tfidf'].transform([text])
    # 2. Predict the class and map it back to the channel name (预测频道 = predicted channel)
    new1_y_pred = model['lr'].predict(new1_x)
    print(pd.DataFrame({u'预测频道': model['y_encoder'].inverse_transform(new1_y_pred)}))

Original: https://blog.csdn.net/weixin_57231611/article/details/120915342
Author: 赵有才er
Title: News Classification with a TF-IDF Bag-of-Words Model (用TFIDF词袋模型进行新闻分类)

