Scrape thousands of meme images with Python in one go and win every meme battle in your friend circle

May 24, 2023, 9:42 PM • Python • 70 views

These days, a young person without a meme collection is almost embarrassed to admit it. Memes have become an indispensable part of chatting.

Drop a few memes on someone you have just met and you are on friendly terms within minutes. When your girlfriend is upset, a couple of memes can cheer her up and defuse the awkwardness. Without any memes to send, all you can do is stay polite and awkward.

1. Preparation

Preparation matters. First work out what we are going to do, which tools we need, and how to do it. Then proceed step by step, steadily.

Development environment

Python 3.6
PyCharm

Open your browser and search for the name of the software you want to install.

Python

Pick the result marked as the official website. Any result labelled as an advertisement is an ad, so do not click it.

Click the Python 3.10.2 link underneath to download the latest version. There is no need to click the big Download button.

PyCharm

Click any of the Download buttons.


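Installing the modules

requests
parsel
re (part of the standard library, no installation needed)

Press Win+R, type cmd, and press Enter. Then type pip install followed by the name of the module you want, and press Enter to install it.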
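To confirm that the installation worked, a small check like the one below may help. It is only a sketch: the helper name `missing_modules` is made up for illustration, and it relies on nothing but the standard library's `importlib.util.find_spec`.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # requests and parsel are installed with pip; re and time ship with Python
    missing = missing_modules(["requests", "parsel", "re", "time"])
    if missing:
        print("Run: pip install " + " ".join(missing))
    else:
        print("All modules are available.")
```

If anything is listed as missing, run the printed pip command in the cmd window and try again.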

2. Code

Target: fabiaoqing
The address is left incomplete on purpose. Fill in the leading and trailing parts yourself, both here and in the code further down; that should not be hard.

Import the modules

import requests
import parsel
import re
import time

Request URL

url = f'fabiaoqing/biaoqing/lists/page/{page}.html'
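The `{page}` placeholder means one URL per list page. A minimal sketch of generating them could look like this. `BASE_URL` is a stand-in value and `page_urls` is a hypothetical helper name; neither comes from the article, and the real scheme and domain still have to be filled in by you.

```python
# Placeholder only: the article leaves the real scheme and domain for the reader to fill in.
BASE_URL = 'https://example.com'


def page_urls(first_page, last_page):
    """Build the list-page URLs for pages first_page..last_page, inclusive."""
    return [
        f'{BASE_URL}/biaoqing/lists/page/{page}.html'
        for page in range(first_page, last_page + 1)
    ]


if __name__ == '__main__':
    for url in page_urls(1, 3):
        print(url)
```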

Request headers

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36'
}

Fetch the page source

response = requests.get(url=url, headers=headers)

Parse the data

selector = parsel.Selector(response.text)  # convert response.text into a Selector object

First pass: extract all of the matching div tags

divs = selector.css('#container div.tagbqppdiv')  # select elements with a CSS selector

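Extract the image URL from each div

# the following lines run inside `for div in divs:`
img_url = div.css('img::attr(data-original)').get()

Extract the title

title = div.css('img::attr(title)').get()

Get the image's file extension

name = img_url.split('.')[-1]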
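Splitting on the last dot works as long as the URL ends with the file name. If an image URL ever carried a query string, the split would return more than the extension. A slightly more defensive sketch, using only the standard library, might look like this. The helper name `file_extension` and the `jpg` default are assumptions for illustration, not something the article specifies.

```python
import os
from urllib.parse import urlparse


def file_extension(img_url, default='jpg'):
    """Return the extension of the URL's path without the dot, or `default` if there is none."""
    path = urlparse(img_url).path      # drops the query string and fragment
    ext = os.path.splitext(path)[1]    # e.g. '.gif', or '' when there is no extension
    return ext.lstrip('.') or default


if __name__ == '__main__':
    print(file_extension('https://example.com/a/b/cat.gif?size=large'))
```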

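Clean the title so it can be used as a file name

new_title = change_title(title)

Request the image to get its binary data

img_content = requests.get(url=img_url, headers=headers).content

Save the data

def save(title, img_url, name):
    # Download the image's binary data (get_response is defined in the full script below)
    img_content = get_response(img_url).content
    try:
        # The img folder must already exist next to the script
        with open('img\\' + title + '.' + name, mode='wb') as f:
            # Write the binary image data
            f.write(img_content)
            print('Saving:', title)
    except OSError:
        # Skip images whose file cannot be written, e.g. because of an invalid name
        pass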

Replace special characters in the title

The titles are arbitrary and may contain characters that are not allowed in file names, so we replace those characters using a regular expression.

def change_title(title):
    # Characters Windows forbids in file names: \ / : * ? " < > |
    mode = re.compile(r'[\\/:*?"<>|]')
    new_title = re.sub(mode, "_", title)
    return new_title
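Here is a quick way to see the replacement in action. This is a standalone sketch: the name `safe_file_name` and the `untitled` fallback are made up for illustration, and the fallback covers the case where a tag has no title at all.

```python
import re

# Characters that Windows does not allow in file names: \ / : * ? " < > |
FORBIDDEN = re.compile(r'[\\/:*?"<>|]')


def safe_file_name(title, fallback='untitled'):
    """Replace forbidden characters with '_'; use `fallback` when the title is empty or None."""
    if not title:
        return fallback
    return FORBIDDEN.sub('_', title).strip() or fallback


if __name__ == '__main__':
    print(safe_file_name('a/b:c?'))
```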

Record the elapsed time

# time_1 = time.time() is recorded before the scraping starts
time_2 = time.time()

use_time = int(time_2) - int(time_1)
print(f'Total time: {use_time} s')
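If you want finer resolution than whole seconds, one option is `time.perf_counter`, which is a clock meant for measuring durations. The sketch below is an alternative to the snippet above, not the article's method; `timed` is a hypothetical helper, and `sum(range(...))` merely stands in for the scraping work.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == '__main__':
    # sum(range(...)) stands in for the scraping work
    total, seconds = timed(sum, range(1_000_000))
    print(f'Total time: {seconds:.2f} s')
```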

Folks, that was the single-threaded version. The multi-threaded version follows; here is the code.

import requests
import parsel
import re
import time
import concurrent.futures

def change_title(title):
    """Replace characters that Windows forbids in file names with '_'."""
    mode = re.compile(r'[\\/:*?"<>|]')
    new_title = re.sub(mode, "_", title)
    return new_title

def get_response(html_url):
    """Send a GET request with a browser-like User-Agent and return the response."""
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36'
    }
    response = requests.get(url=html_url, headers=headers)
    return response

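def save(title, img_url, name):
    """Download one image and write it to the img folder (which must already exist)."""
    img_content = get_response(img_url).content
    try:
        with open('img\\' + title + '.' + name, mode='wb') as f:
            # Write the binary image data
            f.write(img_content)
            print('Saving:', title)
    except OSError:
        # Skip images whose file cannot be written, e.g. because of an invalid name
        pass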

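def main(html_url):
    """Scrape one list page and save every image found on it."""
    html_data = get_response(html_url).text
    selector = parsel.Selector(html_data)
    divs = selector.css('#container div.tagbqppdiv')
    for div in divs:
        # Image URL from the data-original attribute
        img_url = div.css('img::attr(data-original)').get()
        # Title text, used as the file name
        title = div.css('img::attr(title)').get()
        if not img_url or not title:
            # Skip entries that lack a URL or a title
            continue
        # File extension, e.g. jpg or gif
        name = img_url.split('.')[-1]
        # Replace characters that are not allowed in file names
        new_title = change_title(title)
        save(new_title, img_url, name)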

if __name__ == '__main__':
    time_1 = time.time()
    # Up to 10 pages are scraped at once; leaving the with block waits for all tasks
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as exe:
        for page in range(1, 201):
            url = f'fabiaoqing/biaoqing/lists/page/{page}.html'
            exe.submit(main, url)
    time_2 = time.time()
    use_time = int(time_2) - int(time_1)
    print(f'Total time: {use_time} s')
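One thing to keep in mind with `exe.submit`: an exception raised inside a task is stored on the returned `Future` and stays silent unless you ask for the result. The sketch below shows one way to surface such failures without touching the network. `fake_page_job` and `run_all` are hypothetical names, and the job simply pretends that page 3 fails.

```python
import concurrent.futures


def fake_page_job(page):
    """Stand-in for main(url): fails on one page to show how errors surface."""
    if page == 3:
        raise ValueError(f'page {page} failed')
    return page * 10  # pretend this is the number of images saved


def run_all(pages, max_workers=10):
    """Run fake_page_job for every page; collect results and errors separately."""
    results, errors = {}, {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as exe:
        futures = {exe.submit(fake_page_job, page): page for page in pages}
        for future in concurrent.futures.as_completed(futures):
            page = futures[future]
            try:
                results[page] = future.result()  # re-raises the task's exception, if any
            except Exception as exc:
                errors[page] = exc
    return results, errors


if __name__ == '__main__':
    results, errors = run_all(range(1, 6))
    print('ok pages:', sorted(results))
    print('failed pages:', sorted(errors))
```

Swapping `fake_page_job` for the real `main` would let you see which pages failed instead of losing them quietly.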

Folks, that is over a thousand images in 18 seconds. It finished a bit too quickly!

If you found this useful, please give it a like and a bookmark. Your support means a lot.

You see, the code runs so fast that it only takes 18 seconds. I just hope none of you are this quick in everyday life. That would not be so good.

Original: https://www.cnblogs.com/hahaa/p/15990045.html
Author: 轻松学Python
Title: 用python一键爬取几千张表情包斗图，分分钟征服朋友圈所有好友


