scrapy

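1. Install directly with pip in your Python environment
1.1. First install some of its dependencies: lxml, parsel, w3lib, Twisted, pyOpenSSL
1.2. If an install fails, run pip install --upgrade pip to update pip, then try again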
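
The dependency list above can be checked from Python before running any Scrapy command. This is a minimal sketch rather than part of the original post: the helper name missing_packages and the REQUIRED mapping are made up for illustration. The import names are the modules those pip packages provide (pyOpenSSL is imported as OpenSSL, Twisted as twisted).

```python
import importlib.util

# pip package name -> module name used in "import ..."
REQUIRED = {
    "lxml": "lxml",
    "parsel": "parsel",
    "w3lib": "w3lib",
    "Twisted": "twisted",
    "pyOpenSSL": "OpenSSL",
    "Scrapy": "scrapy",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose module cannot be found in this environment."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", ", ".join(missing))
        print("Try: pip install " + " ".join(missing))
    else:
        print("All listed packages are importable.")
```

If the script reports missing packages, installing them with pip (after upgrading pip itself if needed) should make the list empty on the next run.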

Scrapy command-line format:
startproject: create a new project
genspider: create a spider
crawl: run a spider

1. scrapy startproject <project name>
cd SCRAPY框架
scrapy startproject python123demo

2. Switch to the project directory, then run: scrapy genspider <spider name> <domain>
cd python123demo
scrapy genspider demo itcast.cn
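
After steps 1 and 2, the project directory should contain the files that Scrapy's default templates generate. The sketch below lists that layout and reports anything missing. The function names expected_layout and missing_files are hypothetical helpers, and the file list assumes the default project template of a recent Scrapy release.

```python
from pathlib import Path


def expected_layout(project, spider):
    """Relative paths that startproject and genspider normally create."""
    pkg = Path(project)
    return [
        Path("scrapy.cfg"),              # deploy/config file at the project root
        pkg / "__init__.py",
        pkg / "items.py",                # item definitions
        pkg / "middlewares.py",
        pkg / "pipelines.py",
        pkg / "settings.py",             # project settings
        pkg / "spiders" / "__init__.py",
        pkg / "spiders" / f"{spider}.py",  # created by genspider
    ]


def missing_files(root, project, spider):
    """Return the expected paths that do not exist under root."""
    root = Path(root)
    return [p for p in expected_layout(project, spider) if not (root / p).exists()]


if __name__ == "__main__":
    # Run from the outer python123demo directory (the one holding scrapy.cfg).
    for path in missing_files(".", "python123demo", "demo"):
        print("missing:", path)
```

An empty result suggests both commands ran in the right directory. A missing spiders/demo.py usually means genspider was run outside the project.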

3. scrapy crawl <spider name>
scrapy crawl demo
To keep Chinese text readable in exported files, add this line to the project's settings.py:
FEED_EXPORT_ENCODING = 'utf-8'
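
Why the encoding setting matters: when FEED_EXPORT_ENCODING is unset, Scrapy's JSON output escapes non-ASCII characters as \uXXXX sequences. The sketch below reproduces the same difference with the standard json module. It is an illustration only, not Scrapy's exporter, and the sample record is made up.

```python
import json

# Made-up sample record, not scraped data.
item = {"name": "示例书名", "author": "示例作者"}

# ASCII-only output: every Chinese character becomes a \uXXXX escape.
escaped = json.dumps(item)

# Characters are written as-is, which is what a UTF-8 export looks like.
readable = json.dumps(item, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings decode to the same data. The setting only changes how the exported file looks when opened in an editor.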

# Example spider
import scrapy
from lxml import etree, html  # imported in the original; not used below
from quanshudemo.items import QuanshudemoItem  # must declare name and author fields


class RementopSpider(scrapy.Spider):
    name = 'rementop'
    allowed_domains = ['qb5.tw']
    start_urls = ['https://www.qb5.tw/top/monthvisit/']

    def parse(self, response):
        # One <li> per book in the second <ul> under #articlelist
        books = response.xpath("//*[@id='articlelist']/ul[2]/li")
        items = []
        for shuji in books:  # books is a list of selectors
            # Locate the fields with XPath; extract() returns a list of strings
            name = shuji.xpath('./span[2]/a/text()').extract()
            author = shuji.xpath('./span[3]/text()').extract()
            print(name, author)

            # Wrap the result in an item object
            item = QuanshudemoItem()
            item['name'] = name
            item['author'] = author
            items.append(item)
        return items
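
To see what the XPath expressions in parse() select, here is a self-contained sketch that runs the same paths against a small hand-written page. It uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath, as a stand-in for Scrapy selectors. The markup is hypothetical and only shaped to match the paths, so the real page at qb5.tw may differ, and parse_books is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup matching the structure the spider's XPath assumes.
SAMPLE = """
<html><body>
  <div id="articlelist">
    <ul><li><span>Category</span><span>Title</span><span>Author</span></li></ul>
    <ul>
      <li><span>A</span><span><a href="/book/1/">Book One</a></span><span>Author A</span></li>
      <li><span>B</span><span><a href="/book/2/">Book Two</a></span><span>Author B</span></li>
    </ul>
  </div>
</body></html>
"""


def parse_books(markup):
    root = ET.fromstring(markup.strip())
    # Same path as the spider: every <li> of the second <ul> under #articlelist
    rows = root.findall(".//*[@id='articlelist']/ul[2]/li")
    books = []
    for row in rows:
        # ElementTree has no text() step, so read .text from the element
        name = row.find("./span[2]/a").text
        author = row.find("./span[3]").text
        books.append({"name": name, "author": author})
    return books


if __name__ == "__main__":
    for book in parse_books(SAMPLE):
        print(book)
```

One difference worth noting: in Scrapy, extract() returns a list such as ['Book One'], so each item field in the spider holds a list. If a single string per field is wanted, the selector's get() (or extract_first()) method returns the first match instead.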

Original: https://blog.csdn.net/m0_52074139/article/details/125398826
Author: 4029小秃头
Title: scrapy
