Python crawler: scraping HD beauty wallpapers (source code attached)


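I like collecting wallpapers, and I found that I really like the ones in the anime category on 53pic.com. So I wrote a crawler: you enter how many pages you want to crawl, and it downloads the wallpapers from that many pages.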

Environment setup

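  1. Python 3.8
  2. Third-party packages required:
     * requests: fetches pages over HTTP (see its official documentation)
     * Beautiful Soup 4: extracts data from HTML or XML files (see its official documentation)
  3. Run the following pip commands in a terminal to install them:
python -m pip install beautifulsoup4
python -m pip install requests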
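
To confirm that both packages are importable before running the crawler, a quick check along these lines may help. This is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of the original script. Note that the `beautifulsoup4` package is imported under the module name `bs4`.

```python
import importlib.util


def missing_packages(names):
    """Return the module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# beautifulsoup4 installs the module "bs4"; requests installs "requests".
missing = missing_packages(["requests", "bs4"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required packages are available.")
```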

Finally, here is the code.

import os
import time
import requests
from bs4 import BeautifulSoup

# Number of pages to crawl
gain_page = int(input("Enter the number of pages to crawl: "))
# Build the URL for each page number
for i in range(1, gain_page + 1):
    if i == 1:
        url = "https://www.53pic.com/bizhi/dongman/"
    else:
        url = "https://www.53pic.com/bizhi/dongman/index_%s.html" % str(i)

    # print(url)    # debug output

    # ---------------Fetch the main page source--------------- #
    # Request the page from the server
    main_page_info = requests.get(url)
    # Avoid garbled characters
    main_page_info.encoding = "utf-8"
    main_page_text = main_page_info.text
    # print(main_page_text)

    # -------2. Follow each href to its sub-page and find the image download address there: <img src="">------

    # Hand the main page source to BeautifulSoup
    handle_main = BeautifulSoup(main_page_text, "html.parser")
    # print(handle_main)
    # Narrow the matching scope to the title links
    son_link_list_a = handle_main.find_all(name="a", attrs={"class": "title-content"})
    # print(son_link_list_a)

    # Loop over the a tags and extract each href and title
    for a_href_a in son_link_list_a:
        # print(a_href_a)
        href = "https://www.53pic.com" + a_href_a.get("href")
        title = a_href_a.get("title")
        # print(href, title)

        # Fetch the sub-page source
        son_page_info = requests.get(href)
        # Avoid garbled Chinese characters
        son_page_info.encoding = "utf-8"
        son_page_info_text = son_page_info.text
        # print(son_page_info_text)
        # Hand the sub-page to BeautifulSoup
        handle_son = BeautifulSoup(son_page_info_text, "html.parser")
        # Narrow the matching scope within the sub-page
        download_link_p = handle_son.find_all(name="div", attrs={"id": "showimgXFL"})
        # print(download_link_p)
        for div_src_div in download_link_p:
            # print(div_src_div)
            # Find the img tag; skip containers that have none
            download_src_img = div_src_div.find("img")
            if download_src_img is None:
                continue
            # Read its src attribute
            download_src = download_src_img.get("src")
            # Request the image
            download = requests.get(download_src)
            # print(download_src)
            # Switch to the output directory
            os.chdir(r"C:\Users\崔泽\Desktop\mig")
            with open("%s.jpg" % title, mode='wb+') as file:
                # Write the image bytes in binary mode
                file.write(download.content)
            # Pause for a second between downloads
            time.sleep(1)
            print("%s... downloaded successfully!" % title)
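
The script above builds URLs by string concatenation, uses the page title directly as a file name, and changes the working directory for every download. The following is a minimal sketch of how those three steps might be made more robust with the standard library alone. The helper names (`page_url`, `absolute_url`, `safe_filename`, `target_path`) are hypothetical and not part of the original script; the URL pattern is copied from the code above, and the set of forbidden characters assumes Windows file-name rules.

```python
import re
from pathlib import Path
from urllib.parse import urljoin

SITE_ROOT = "https://www.53pic.com"
LIST_URL = "https://www.53pic.com/bizhi/dongman/"  # category URL from the script


def page_url(page):
    """Return the list-page URL for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    # Page 1 is the category root; later pages use index_<n>.html.
    return LIST_URL if page == 1 else urljoin(LIST_URL, "index_%d.html" % page)


def absolute_url(href):
    """Resolve an href against the site root; absolute URLs pass through."""
    return urljoin(SITE_ROOT, href)


def safe_filename(title, default="untitled"):
    """Replace characters that Windows forbids in file names."""
    cleaned = re.sub(r'[\\/:*?"<>|]', "_", title or "").strip()
    return (cleaned or default) + ".jpg"


def target_path(directory, title):
    """Build the output path without changing the working directory."""
    return Path(directory) / safe_filename(title)
```

With helpers like these, the download step could write to `target_path(output_dir, title)` instead of calling `os.chdir`, and a title that is missing or contains a slash would no longer make `open` fail.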

Original: https://www.cnblogs.com/xiaozebuxiao/p/15890030.html
Author: 小泽不小
Title: python 爬虫 抓取高清美女壁纸 源码附上

