Batch-downloading novels from a novel site (a Python crawler)

A few words to start

Since I've been learning Python, I naturally fell into the web-crawler rabbit hole. Well, honestly, I got into it mainly because I thought crawlers were cool.

Imagine being able to automatically collect huge amounts of data from the Internet in batches. How exciting is that?

So I was drawn in by that big cake :)

After I had picked up the basics of crawling, mainly the urllib, urllib2, and re libraries plus the basic usage of Request() and urlopen(), I started looking for a target to crawl.
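For readers following along, a minimal fetch-and-parse sketch of that Request()/urlopen()/re combination might look like the following. (The post used Python 2's urllib2; this sketch uses the Python 3 equivalent, urllib.request. The URL and the regex are illustrative assumptions, not the original code.)

```python
import re
from urllib.request import Request, urlopen

def fetch(url, timeout=10):
    """Fetch a page; a User-Agent header helps avoid simple bot blocks."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    return urlopen(req, timeout=timeout).read().decode("utf-8", errors="ignore")

def extract_links(html):
    """Pull (url, title) pairs out of anchor tags with a regex."""
    return re.findall(r'<a href="([^"]+)"[^>]*>([^<]+)</a>', html)

if __name__ == "__main__":
    sample = '<a href="/book/1.html">Some Novel</a>'
    print(extract_links(sample))   # [('/book/1.html', 'Some Novel')]
```

A regex is fragile against real-world HTML, but for a single site with a fixed template, as in this post, it is usually good enough.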

As it happened, my girlfriend's brother had asked her to help him download some novels, and asked me to recommend a few. I had no idea what genres he liked, so I figured I might as well find a novel site and batch-download a pile of them.

So I spent a lunchtime writing a rough crawler script, started it up, saw that it ran, left it running, and went back to bed for a nap.

When I got up, I found the script had hit an error and stopped, so I debugged it. After killing that bug, it ran into a few more errors as it went, so I simply wrapped the error-prone spots, such as the urlopen() calls to the server and the local write() calls (yes, those can hit timeout errors too!), in try-except handlers, and added a network timeout limit via socket.timeout. After some patching, it finally ran smoothly.
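That try-except-plus-timeout strategy can be sketched roughly as follows. (The helper name with_retries and the retry counts are my own assumptions, not the original code; the idea is just that a single flaky request or write should not kill a crawl that runs for days.)

```python
import socket
import time

# Global network timeout, as described in the post (socket.timeout limit).
socket.setdefaulttimeout(10)

def with_retries(func, *args, retries=3, delay=2):
    """Call func(*args); on socket.timeout or IOError, retry a few times.

    Wrap the error-prone calls (urlopen() to the server, local write())
    in this so transient failures are retried instead of crashing the run.
    """
    for attempt in range(1, retries + 1):
        try:
            return func(*args)
        except (socket.timeout, IOError) as exc:
            print("attempt %d failed: %r" % (attempt, exc))
            if attempt == retries:
                raise          # give up after the last attempt
            time.sleep(delay)

if __name__ == "__main__":
    attempts = {"n": 0}
    def flaky():
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise socket.timeout("simulated slow server")
        return "page contents"
    print(with_retries(flaky, retries=5, delay=0))   # page contents
```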

It ran like that for two days, and the script downloaded nearly every novel on the site to my local disk: 27,000+ novels, about 40 GB in total.

After that, it was just a matter of packing everything up and sending it over. By the way, Baidu Yun is really awful: there are limits on how much you can upload at a time, on the number of files you can share with friends, and on the size of a shared folder, so I had to bundle everything into compressed archives before I could share it.

The download interface (screenshot)

The code is attached below.

The script only targets the 选书网 ("book selection") novel site, and only the novels under its fantasy category. It's provided for reference; feel free to modify it yourself.
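As a rough sketch of the overall shape such a script takes (fetch a category listing, follow each book link, write each novel to disk), with hypothetical URLs, regexes, and page structure standing in for the real site's markup:

```python
import os
import re
import socket
from urllib.request import Request, urlopen

socket.setdefaulttimeout(10)
BASE = "https://www.example-xsw.com"   # placeholder, not the real site

def fetch(url):
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    # Many Chinese novel sites serve GBK rather than UTF-8.
    return urlopen(req).read().decode("gbk", errors="ignore")

def parse_book_list(html):
    """Extract (book_url, title) pairs from a category listing page."""
    return re.findall(r'<a href="(/book/\d+\.html)">([^<]+)</a>', html)

def sanitize_filename(title):
    """Strip characters that are illegal in file names."""
    return re.sub(r'[\\/:*?"<>|]', "_", title).strip()

def download_category(list_url, out_dir="novels"):
    os.makedirs(out_dir, exist_ok=True)
    for book_url, title in parse_book_list(fetch(list_url)):
        text = fetch(BASE + book_url)   # the real script walks chapter pages here
        path = os.path.join(out_dir, sanitize_filename(title) + ".txt")
        try:
            with open(path, "w", encoding="utf-8") as f:
                f.write(text)           # write() can also fail; keep it guarded
        except IOError as exc:
            print("skipping %s: %r" % (title, exc))

if __name__ == "__main__":
    # download_category(BASE + "/fantasy/")  # hypothetical category URL; needs network
    print(sanitize_filename('Novel: The "Return"?'))   # Novel_ The _Return__
```

In practice the fetch calls would also be wrapped in the retry helper described earlier, and the book-page step would loop over chapter links before writing the assembled text.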

The code is rough, so please go easy on me.

Original: https://www.cnblogs.com/yym2013/p/5762793.html
Author: Freecode
Title: Batch-downloading novels from a novel site (a Python crawler)

Original articles are protected by copyright. When reposting, please cite the source: https://www.johngo689.com/7905/
Reposted articles remain under the original author's copyright; please credit the original author when reposting.
