Python web crawling: simulating a web login, and regular expressions

November 9, 2023, 2:12 PM • Python • 36 views

Preface: not every website can be logged into this way. This is for learning purposes only.

Simulating a web login

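–Installing the modules–
No installation is needed: urllib and http.cookiejar are part of the Python 3 standard library, so there is nothing to install with pip.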


from urllib import request, parse
from http import cookiejar

# Define the file name
filename = 'cookie.txt'

# Create a cookie jar, passing in the file name
cookie = cookiejar.MozillaCookieJar(filename)

# Define the cookie handler
handler = request.HTTPCookieProcessor(cookie)

# Build the opener, passing in the handler
opener = request.build_opener(handler)

# Login credentials (placeholders; the field names depend on the site's login form)
postdata = parse.urlencode({
    'username': 'your_account',
    'password': 'your_password'
}).encode(encoding='UTF8')

# Login URL (placeholder; replace with the site's login address)
loginUrl = 'website'

# Simulate logging in: passing data makes this a POST request
resp = opener.open(loginUrl, postdata)

# Save the cookies to the file
cookie.save(ignore_discard=True, ignore_expires=True)

# Visit the site again (placeholder URL)
url2 = "website"

# Open the page
try:
    result = opener.open(url2)
except request.HTTPError as e:
    if hasattr(e, "code"):
        print(e.code)
except request.URLError as e:
    if hasattr(e, "reason"):
        print(e.reason)
else:
    print(result.read())
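
Once cookie.txt has been written, a later run can reuse the saved session instead of logging in again. The following is a minimal sketch of that round trip using MozillaCookieJar.load. To keep it runnable without a network connection, it builds one cookie by hand; the make_cookie helper, the cookie name sessionid, its value, the domain example.com, and the temporary file path are all illustrative assumptions, not part of the original script.

```python
import os
import tempfile
from http import cookiejar
from urllib import request


def make_cookie(name, value, domain):
    # Hypothetical helper: builds a session cookie by hand for the demo.
    return cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


# Assumed location: a temporary directory instead of the working directory
path = os.path.join(tempfile.mkdtemp(), "cookie.txt")

# First run: store a cookie and save it, as the login script does
saved = cookiejar.MozillaCookieJar(path)
saved.set_cookie(make_cookie("sessionid", "abc123", "example.com"))
saved.save(ignore_discard=True, ignore_expires=True)

# Later run: load the file into a fresh jar and build an opener around it
loaded = cookiejar.MozillaCookieJar()
loaded.load(path, ignore_discard=True, ignore_expires=True)
opener = request.build_opener(request.HTTPCookieProcessor(loaded))

names = [c.name for c in loaded]
print(names)
```

Any request made through this opener sends the loaded cookies to matching domains. Whether the site still accepts them depends on how long its sessions stay valid.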

Regular expressions

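Understanding regular expressions
A regular expression is a logical formula for operating on strings: predefined special characters, and combinations of them, form a "rule string" that expresses the filtering logic to apply to a string.
Installing the module
No installation is needed: re is part of the Python standard library, so import re is enough.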

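import re
# Compile the regular expression into a Pattern object; the r before 'hello' marks a "raw string"
pattern = re.compile(r'hello')
# Use re.match to match the text and get the result; it returns None when there is no match
result1 = re.match(pattern, 'hello')
result2 = re.match(pattern, 'helloo YC!')
result3 = re.match(pattern, 'helo YC!')
result4 = re.match(pattern, 'hello YC!')
# match only matches from the start of the string and returns None if that fails
# search can match at any position in the string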
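
To make the difference between match and search concrete, here is a small sketch that checks the four strings above and one extra string. The sample text 'say hello' is an added assumption for illustration.

```python
import re

pattern = re.compile(r'hello')

samples = ['hello', 'helloo YC!', 'helo YC!', 'hello YC!']
# match only succeeds when the pattern matches at position 0
matched = [re.match(pattern, s) is not None for s in samples]
print(matched)  # [True, True, False, True]

text = 'say hello'  # hypothetical sample: 'hello' is not at the start
print(pattern.match(text))          # None
print(pattern.search(text).span())  # (4, 9)
```

Only 'helo YC!' fails, because the pattern needs two l characters. 'helloo YC!' still matches, since match does not require the pattern to cover the whole string.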

Getting to know the matching-rule characters:


import re

# Define the regex rule (the match pattern); r marks a raw string
rexg = re.compile(r'\d*\w*')
res = re.search(rexg, '555555555ddddddddddddd5')
print(res)

rexg1 = re.compile(r'\d+\w*')
res1 = re.search(rexg1, 'ppppp555555555ddddddddddddd5')
print(res1)

rexg2 = re.compile(r'\d?dddddd')
res2 = re.search(rexg2, 'ppppp555555555ddddddddddddd5')
print(res2)

# Phone number
rexg3 = re.compile(r'1\d{10}')
res3 = re.search(rexg3, '13865256323')
print(res3)

# Email address
rexg4 = re.compile(r'\d{5,12}@\w{2}\.\w{3}')
res4 = re.search(rexg4, '1sdfsdfdsfdsdfs647709191@qq.com')
print(res4)

rexg5 = re.compile(r'\d+')
res5 = re.search(rexg5, '11sdfsdfdsfdsdfs647709191@qq.com')
print(res5)

rexg6 = re.compile(r'\d{5,12}? ?')
res6 = re.search(rexg6, '11sdfsdfdsfdsdfs6477d09191@qq.com')
print(res6)

# Boundary matching: match the start of the string
rexg7 = re.compile(r'\Aabc')
res7 = re.search(rexg7, 'abcqqqqqqqqqabccttttttttttt'
                        'abcctttttttttabc')
print(res7)

# Alternation: match either pattern

rexg8 = re.compile(r'1\d{10}|\d{5,12}@\w{2}\.\w{3}')

res8 = re.search(rexg8, "sahsyhs1d8285555556gashgshasdag12345@qq.com")
print(res8)

# Grouping

rexg9 = re.compile(r'(abc){3}')
res9 = re.search(rexg9, "zUJXHUJHXuabcabcabcosaojaodiabcosajosabc")
print(res9)

# Grouping + named group (alias)
rexg10 = re.compile(r'(?P<tt>abc)888(?P=tt)')
res10 = re.search(rexg10, "hasghsabc888abc")
print(res10)

# Grouping + group number

rexg11 = re.compile(r'(\d{5})uu\1')

res11 = re.search(rexg11, "12345uu12345")
print(res11)
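
Printing a match object shows its span and matched text, but in practice you usually want the matched text itself or one specific group. The sketch below reuses a few of the patterns above and reads the results with group() and span(). The string '12345uu54321' is an added example that shows a failing backreference.

```python
import re

# Named group: (?P<tt>...) captures 'abc', and (?P=tt) requires the same text again
res10 = re.search(r'(?P<tt>abc)888(?P=tt)', "hasghsabc888abc")
print(res10.group())      # abc888abc
print(res10.group('tt'))  # abc
print(res10.span())       # (6, 15)

# Numbered group: \1 must repeat exactly what group 1 captured
res11 = re.search(r'(\d{5})uu\1', "12345uu12345")
print(res11.group(1))     # 12345
no_match = re.search(r'(\d{5})uu\1', "12345uu54321")
print(no_match)           # None

# Alternation: no '1' in this string is followed by ten digits,
# so the first branch never matches and the email branch does
res8 = re.search(r'1\d{10}|\d{5,12}@\w{2}\.\w{3}',
                 "sahsyhs1d8285555556gashgshasdag12345@qq.com")
print(res8.group())       # 12345@qq.com

# Lazy quantifier: {5,12}? stops at the minimum of five digits
res6 = re.search(r'\d{5,12}? ?', '11sdfsdfdsfdsdfs6477d09191@qq.com')
print(res6.group())       # 09191
```

Because search returns None when nothing matches, check the result before calling group() on it in real code.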


Original: https://www.cnblogs.com/wei3306/p/16000998.html
Author: N暖阳_李维宁
Title: pyhon爬虫模拟网页登陆、正则表达式 (Python web crawling: simulating a web login, and regular expressions)

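Original articles are protected by copyright. When reposting, please credit the source: https://www.johngo689.com/814742/

Reposted articles are protected by the original author's copyright. When reposting, please credit the original author!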
