Scraping YouTube, the World's Largest Video Site, with Python

Preface

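YouTube is currently the world's largest video site, and Python has been used heavily in its codebase.
It is a leading online video service provider: its systems process tens of millions of video clips every day and provide upload, distribution, display, and browsing services to billions of users worldwide. In February 2015, CCTV streamed the Spring Festival Gala on the site for the first time.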

Today we will use Python to batch-download videos from the site quickly.

Development environment

Version: Python 3.8
Editor: PyCharm 2021.2
Third-party modules: requests + tqdm

Required modules

import requests
import re
import json
from tqdm import tqdm
import os

Writing the code

Requesting the data

headers = {
    'cookie': 'VISITOR_INFO1_LIVE=9qZVrzB27uI; PREF=f4=4000000&tz=Asia.Shanghai; _ga=GA1.2.621834420.1648121145; _gcl_au=1.1.1853038046.1648121145; NID=511=Zc1APdmEbCD-iqVNVgI_vD_0S3LVI3XSfl-wUZEvvMU2MLePFKsQCaKUlUtchHSg-kWEVMGOhWUbxpQMwHeIuLjhxaslwniMh1OsjVfmOeTfhpwcRYpMgqpZtNQ7qQApY21xEObCvIez6DCMbjRhRQ5P7siOD3X87QX0CFyUxmY; OTZ=6430350_24_24__24_; GPS=1; YSC=0E115KqM_-I; GOOGLE_ABUSE_EXEMPTION=ID=d02004902c3d0f4d:TM=1648620854:C=r:IP=47.57.243.77-:S=YmZXPW7dxbu83bDuauEpXpE; CONSISTENCY=AGDxDeNysJ2boEmzRP4v6cwgg4NsdN4-FYQKHCGhA0AeW1QjFIU1Ejq1j8l6lwAc6c-pYTJiSaQItZ1M6QeI1pQ3wictnWXTOZ6_y8EKlt0Y_JdakwW6srR39-NLuPgSgXrXwtS0XTUGXpdnt4k3JjQ',
    'referer': 'https://www.youtube.com/results?search_query=jk%E7%BE%8E%E5%A5%B3',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36'
}
# Watch-page URL of the video to download
url = 'https://www.youtube.com/watch?v=ImoXcSpR_io'
response = requests.get(url=url, headers=headers)
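The request above does not check whether it succeeded, so a blocked or expired session only surfaces later as a parsing error. A minimal sketch of a more defensive fetch follows. The name `fetch_html` and its injectable `get` parameter are hypothetical, introduced here for illustration rather than taken from the original post, and the 10-second timeout is an arbitrary choice.

```python
def fetch_html(url, headers, get=None, timeout=10):
    """Return the body of `url` as text, raising on HTTP errors.

    `get` defaults to requests.get. It is injectable (an assumption of
    this sketch) so the function can be exercised without network access.
    """
    if get is None:
        import requests  # third-party; the same module the article uses
        get = requests.get
    response = get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # raises for 4xx/5xx responses
    return response.text

# Usage with the names defined above:
# html = fetch_html(url, headers)
```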

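Parsing the data

# Pull the embedded player JSON out of the page source
json_str = re.findall('var ytInitialPlayerResponse = (.*?);var', response.text)[0]
json_data = json.loads(json_str)
# This assumes the first adaptive format is a video track and the second-to-last an audio track
video_url = json_data['streamingData']['adaptiveFormats'][0]['url']
audio_url = json_data['streamingData']['adaptiveFormats'][-2]['url']
title = json_data['videoDetails']['title']
# Strip spaces and characters that are not allowed in Windows file names
title = title.replace(' ', '')
title = re.sub(r'[\\/:|?*"<>]', '', title)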
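The parsing step relies on a `;var` terminator and on fixed list positions, both of which can break when the page layout shifts. The sketch below shows one possible alternative, under stated assumptions: the helper names (`extract_player_response`, `pick_stream`, `safe_title`) are hypothetical, and the `mimeType`, `bitrate`, and `url` keys reflect the layout commonly seen in this JSON, not a documented API. Entries without a direct `url` are skipped.

```python
import json
import re

MARKER = 'var ytInitialPlayerResponse = '

def extract_player_response(html):
    """Parse the JSON object that follows MARKER in the page source."""
    start = html.index(MARKER) + len(MARKER)
    # raw_decode stops at the end of the first complete JSON value,
    # so no closing delimiter such as ';var' has to be guessed.
    data, _ = json.JSONDecoder().raw_decode(html, start)
    return data

def pick_stream(formats, kind):
    """Return the highest-bitrate format of `kind` ('video' or 'audio')
    that carries a direct 'url', or None if there is none."""
    candidates = [
        f for f in formats
        if f.get('mimeType', '').startswith(kind + '/') and 'url' in f
    ]
    return max(candidates, key=lambda f: f.get('bitrate', 0), default=None)

def safe_title(title):
    """Drop whitespace and characters that Windows forbids in file names."""
    return re.sub(r'[\\/:|?*"<>\s]', '', title)

# Usage with the response fetched earlier:
# data = extract_player_response(response.text)
# formats = data['streamingData']['adaptiveFormats']
# video_url = pick_stream(formats, 'video')['url']
# audio_url = pick_stream(formats, 'audio')['url']
# title = safe_title(data['videoDetails']['title'])
```

Selecting by `mimeType` instead of by index avoids silently treating a video-only entry as the audio track when the ordering differs between videos.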

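Video data

# Stream the video track; file_size drives the progress bar
video = requests.get(video_url, stream=True)
file_size = int(video.headers.get('Content-Length'))
video_pbar = tqdm(total=file_size)
with open(f'{title}.mp4', mode='wb') as f:
    for video_chunk in video.iter_content(1024*1024*2):
        f.write(video_chunk)
        video_pbar.set_description(f'Downloading video {title}...')
        # Advance by the bytes actually received (the last chunk is usually shorter)
        video_pbar.update(len(video_chunk))
    video_pbar.set_description('Download complete!')
    video_pbar.close()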

Audio data

# Stream the audio track the same way
audio = requests.get(audio_url, stream=True)
file_size = int(audio.headers.get('Content-Length'))
audio_pbar = tqdm(total=file_size)
with open(f'{title}.mp3', mode='wb') as f:
    for audio_chunk in audio.iter_content(1024*1024*2):
        f.write(audio_chunk)
        audio_pbar.set_description(f'Downloading audio {title}...')
        # Advance by the bytes actually received (the last chunk is usually shorter)
        audio_pbar.update(len(audio_chunk))
    audio_pbar.set_description('Download complete!')
    audio_pbar.close()
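The video and audio sections run the same write loop twice. As a sketch of how that loop could be factored out, the hypothetical helper `save_stream` below (not part of the original post) writes any iterable of byte chunks to disk and reports each chunk's size through an optional callback, which is where a tqdm bar's `update` method would plug in.

```python
def save_stream(chunks, path, on_progress=None):
    """Write an iterable of byte chunks to `path`; return the bytes written."""
    written = 0
    with open(path, mode='wb') as f:
        for chunk in chunks:
            f.write(chunk)
            written += len(chunk)
            if on_progress is not None:
                on_progress(len(chunk))  # report the real chunk size
    return written

# Usage with requests and tqdm, assuming video_url and title from earlier:
# video = requests.get(video_url, stream=True)
# total = int(video.headers['Content-Length'])
# with tqdm(total=total, unit='B', unit_scale=True) as pbar:
#     save_stream(video.iter_content(1024 * 1024 * 2), f'{title}.mp4', pbar.update)
```

Keeping the helper free of requests and tqdm makes it easy to try with plain byte strings before pointing it at a real download.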

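Merging audio and video

import subprocess

def merge(title):
    # Mux the two streams without re-encoding; adjust the ffmpeg path for your machine
    ffmpeg = r'D:\Download\ffmpeg\bin\ffmpeg.exe'
    cmd = [ffmpeg, '-i', f'{title}.mp4', '-i', f'{title}.mp3', '-acodec', 'copy', '-vcodec', 'copy', f'{title}-out.mp4']
    subprocess.run(cmd, check=True)  # waits for ffmpeg and raises if it fails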

Original: https://www.cnblogs.com/qshhl/p/16106887.html
Author: 松鼠爱吃饼干
Title: Python爬取全球最大视频网站YouTube视频
