python: automatically converting a video's audio into a timestamped srt subtitle file with speech recognition

May 25, 2023, 6:44 PM • Artificial Intelligence • 73 views

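Contents

Problem
Solution
Screenshots
How the srt format works
iFLYTEK API helper for speech recognition
Processing the result into text
Combining lists into a dictionary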

Problem

iFLYTEK's long-form speech transcription only includes 5 hours for free; if you want to try 50,000 free minutes, see my other article.
Recently I was watching some tutorials and found that they had no subtitles, and no matching subtitle files were available online, which made them awkward to watch.

So I processed the video in Au (Adobe Audition) and extracted its audio track as a wav file a little over 20 minutes long.
I then ran it through iFLYTEK's speech recognition API, which returns the text of each sentence together with its time position in the video.

Finally I wrote the result out in srt format.
This way, subtitles can be added automatically to a video that has none.

That roughly covers what I needed, so I am writing it down here.

Solution

Screenshots

Subtitle effect in the video

The subtitles are added automatically by speech recognition.

Output format in the console

An srt subtitle file is generated at the end.

How the srt format works

As shown in the screenshot, each entry starts with a sequence number, followed by the time range during which the subtitle is displayed, accurate to the millisecond, and then the subtitle text below it, which can be Chinese, English, or both.

The sequence numbers normally increase in order. They matter little for playback and mainly help translators work through and review the file, but they are a required part of the format and cannot be left out.

For more detail, see this link, which is the reference I consulted: https://www.cnblogs.com/tocy/p/subtitle-format-srt.html
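
The layout described above can be sketched in a few lines. This is a minimal illustration, not code from the original post: the helper names `ms_to_srt_time` and `srt_entry` are made up for this example, and the sample text is arbitrary.

```python
def ms_to_srt_time(ms: int) -> str:
    """Convert a millisecond offset to an srt timestamp, HH:MM:SS,mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def srt_entry(index: int, start_ms: int, end_ms: int, text: str) -> str:
    """Build one srt entry: sequence number, time range, text, blank line."""
    time_range = f"{ms_to_srt_time(start_ms)} --> {ms_to_srt_time(end_ms)}"
    return f"{index}\n{time_range}\n{text}\n\n"


# Hypothetical sample values, for illustration only
print(srt_entry(1, 0, 4950, "Hello, world"), end="")
```

Note that the separator before the milliseconds is a comma, and that every field is zero-padded to a fixed width.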

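iFLYTEK API helper for speech recognition

You can copy and paste this as it is. It is only a general-purpose helper for calling the API; the second script further down calls it, so the two .py files must be in the same folder.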
voice_get_text.py


import base64
import hashlib
import hmac
import json
import os
import time

import requests

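# Base URL of the iFLYTEK long-form speech transcription API
lfasr_host = 'http://raasr.xfyun.cn/api'

# API endpoints, called in this order by all_api_request
api_prepare = '/prepare'
api_upload = '/upload'
api_merge = '/merge'
api_get_progress = '/getProgress'
api_get_result = '/getResult'

# Size of each uploaded slice: 10 MiB
file_piece_sice = 10485760

# The settings below are defined but not used by the code in this file
lfasr_type = 0

has_participle = 'false'
has_seperate = 'true'

max_alternatives = 0

suid = ''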

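class SliceIdGenerator:
    """Slice id generator: yields aaaaaaaaaa, aaaaaaaaab, ... in sequence"""

    def __init__(self):
        # '`' is the character just before 'a', so the first id returned is 'aaaaaaaaaa'
        self.__ch = 'aaaaaaaaa`'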

    def getNextSliceId(self):
        ch = self.__ch
        j = len(ch) - 1
        while j >= 0:
            cj = ch[j]
            if cj != 'z':
                ch = ch[:j] + chr(ord(cj) + 1) + ch[j + 1:]
                break
            else:
                ch = ch[:j] + 'a' + ch[j + 1:]
                j = j - 1
        self.__ch = ch
        return self.__ch

class RequestApi(object):
    def __init__(self, appid, secret_key, upload_file_path):
        self.appid = appid
        self.secret_key = secret_key
        self.upload_file_path = upload_file_path

    def gene_params(self, apiname, taskid=None, slice_id=None):
        appid = self.appid
        secret_key = self.secret_key
        upload_file_path = self.upload_file_path
        ts = str(int(time.time()))
        m2 = hashlib.md5()
        m2.update((appid + ts).encode('utf-8'))
        md5 = m2.hexdigest()
        md5 = bytes(md5, encoding='utf-8')

        signa = hmac.new(secret_key.encode('utf-8'), md5, hashlib.sha1).digest()
        signa = base64.b64encode(signa)
        signa = str(signa, 'utf-8')
        file_len = os.path.getsize(upload_file_path)
        file_name = os.path.basename(upload_file_path)
        param_dict = {}

        if apiname == api_prepare:

            slice_num = int(file_len / file_piece_sice) + (0 if (file_len % file_piece_sice == 0) else 1)
            param_dict['app_id'] = appid
            param_dict['signa'] = signa
            param_dict['ts'] = ts
            param_dict['file_len'] = str(file_len)
            param_dict['file_name'] = file_name
            param_dict['slice_num'] = str(slice_num)
        elif apiname == api_upload:
            param_dict['app_id'] = appid
            param_dict['signa'] = signa
            param_dict['ts'] = ts
            param_dict['task_id'] = taskid
            param_dict['slice_id'] = slice_id
        elif apiname == api_merge:
            param_dict['app_id'] = appid
            param_dict['signa'] = signa
            param_dict['ts'] = ts
            param_dict['task_id'] = taskid
            param_dict['file_name'] = file_name
        elif apiname == api_get_progress or apiname == api_get_result:
            param_dict['app_id'] = appid
            param_dict['signa'] = signa
            param_dict['ts'] = ts
            param_dict['task_id'] = taskid
        return param_dict

    def gene_request(self, apiname, data, files=None, headers=None):
        response = requests.post(lfasr_host + apiname, data=data, files=files, headers=headers)
        result = json.loads(response.text)
        if result["ok"] == 0:
            print("{} success:".format(apiname) + str(result))
            return result
        else:
            print("{} error:".format(apiname) + str(result))
            # Stop with a non-zero exit code so the failure is not reported as success
            raise SystemExit(1)

    def prepare_request(self):
        return self.gene_request(apiname=api_prepare,
                                 data=self.gene_params(api_prepare))

    def upload_request(self, taskid, upload_file_path):
        file_object = open(upload_file_path, 'rb')
        try:
            index = 1
            sig = SliceIdGenerator()
            while True:
                content = file_object.read(file_piece_sice)
                if not content or len(content) == 0:
                    break
                files = {
                    "filename": self.gene_params(api_upload).get("slice_id"),
                    "content": content
                }
                response = self.gene_request(api_upload,
                                             data=self.gene_params(api_upload, taskid=taskid,
                                                                   slice_id=sig.getNextSliceId()),
                                             files=files)
                if response.get('ok') != 0:

                    print('upload slice fail, response: ' + str(response))
                    return False
                print('upload slice ' + str(index) + ' success')
                index += 1
        finally:
            print('file index:' + str(file_object.tell()))
            file_object.close()
        return True

    def merge_request(self, taskid):
        return self.gene_request(api_merge, data=self.gene_params(api_merge, taskid=taskid))

    def get_progress_request(self, taskid):
        return self.gene_request(api_get_progress, data=self.gene_params(api_get_progress, taskid=taskid))

    def get_result_request(self, taskid):
        return self.gene_request(api_get_result, data=self.gene_params(api_get_result, taskid=taskid))

    def all_api_request(self):

        pre_result = self.prepare_request()
        taskid = pre_result["data"]

        self.upload_request(taskid=taskid, upload_file_path=self.upload_file_path)

        self.merge_request(taskid=taskid)

        while True:

            progress = self.get_progress_request(taskid)
            progress_dic = progress
            if progress_dic['err_no'] != 0 and progress_dic['err_no'] != 26605:
                print('task error: ' + str(progress_dic['failed']))
                return
            else:
                data = progress_dic['data']
                task_status = json.loads(data)
                if task_status['status'] == 9:
                    print('task ' + taskid + ' finished')
                    break
                print('The task ' + taskid + ' is in processing, task status: ' + str(data))

            time.sleep(20)

        # Fetch the final transcription and hand it back to the caller
        result = self.get_result_request(taskid=taskid)
        return result
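
The signing steps inside `gene_params` are easier to follow when pulled out on their own. The sketch below reproduces them as a standalone function; the name `make_signa` and the credentials are placeholders invented for this example, not part of the iFLYTEK API.

```python
import base64
import hashlib
import hmac


def make_signa(appid: str, secret_key: str, ts: str) -> str:
    """Reproduce the signature steps of gene_params as one function."""
    # Step 1: MD5 hex digest of the app id concatenated with the timestamp
    md5_hex = hashlib.md5((appid + ts).encode("utf-8")).hexdigest()
    # Step 2: HMAC-SHA1 over that hex string, keyed with the secret key
    digest = hmac.new(secret_key.encode("utf-8"),
                      md5_hex.encode("utf-8"),
                      hashlib.sha1).digest()
    # Step 3: Base64-encode the raw 20-byte digest
    return base64.b64encode(digest).decode("utf-8")


# Placeholder credentials and timestamp, for illustration only
print(make_signa("demo_appid", "demo_secret", "1600000000"))
```

Because the timestamp is part of the signed input, `gene_params` recomputes the signature on every call rather than caching it.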

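Processing the result into text

Fill in the id and key you obtained from iFLYTEK for the speech-to-text service. Running the script returns a very long dict string holding the recognition result, which then has to be converted into srt format. The script below already writes its output as srt.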
video_to_txt.py


import datetime
import json

import voice_get_text


def get_format_time(time_long):
    """Convert a millisecond offset into an srt timestamp (HH:MM:SS,mmm)."""
    hours, rest = divmod(time_long, 3600000)
    minutes, rest = divmod(rest, 60000)
    seconds, millis = divmod(rest, 1000)
    return "%02d:%02d:%02d,%03d" % (hours, minutes, seconds, millis)


video_path = input("Audio file path: ").replace("\\", '/')
print("Processing started... please wait")
api = voice_get_text.RequestApi(appid="your app id", secret_key="your secret key",
                                upload_file_path=video_path)
myresult = api.all_api_request()

# myresult["data"] is a JSON string holding a list of sentences, each with
# "bg" (start, ms), "ed" (end, ms) and "onebest" (recognized text).
# Parsing it as JSON keeps the first and last sentences and copes with
# commas or colons inside the text, which splitting the string by hand does not.
sentences = json.loads(myresult["data"])
myword = ""
for flag_num, get_dic in enumerate(sentences, start=1):
    print(get_dic)
    bg = get_format_time(int(get_dic["bg"]))
    ed = get_format_time(int(get_dic["ed"]))
    real_word = get_dic["onebest"]
    # One srt entry: sequence number, time range, text, then a single blank line
    myword += str(flag_num) + "\n" + bg + " --> " + ed + "\n" + real_word + "\n\n"
print(myword)

# Name the output file after the current time
nowTime_str = datetime.datetime.now().strftime('%Y-%m-%d %H-%M-%S')
path_file = r"C:\Users\Administrator.DESKTOP-KMH7HN6\Desktop\video_text\%s.srt" % (nowTime_str)
# Write as UTF-8 so non-ASCII subtitle text is saved correctly
with open(path_file, 'w', encoding='utf-8') as f:
    f.write(myword)
print('Recognition finished; see the srt file in the output directory')
input()
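
To check the conversion step without calling the API or spending any quota, it can be run against a hand-written sample. The sketch below assumes the result's `data` field is a JSON string with the `bg`, `ed` and `onebest` keys used by the script; the sample sentences and the names `to_timestamp` and `segments_to_srt` are made up for this example.

```python
import json

# Made-up sample that mimics the JSON string found in result["data"]
sample_data = json.dumps([
    {"bg": "0", "ed": "4950", "onebest": "Hello everyone."},
    {"bg": "5100", "ed": "61230", "onebest": "Today we look at subtitles."},
])


def to_timestamp(ms: int) -> str:
    """Format a millisecond offset as HH:MM:SS,mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def segments_to_srt(data: str) -> str:
    """Turn the JSON string of recognized sentences into srt text."""
    entries = []
    for number, seg in enumerate(json.loads(data), start=1):
        start = to_timestamp(int(seg["bg"]))
        end = to_timestamp(int(seg["ed"]))
        entries.append(f"{number}\n{start} --> {end}\n{seg['onebest']}\n")
    # Joining with a newline leaves exactly one blank line between entries
    return "\n".join(entries)


print(segments_to_srt(sample_data))
```

Saving the printed text with a `.srt` extension and loading it in a video player is a quick way to confirm the format before processing a real recording.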

Combining lists into a dictionary

This is an incidental snippet and can be skipped.


keys = ['a', 'b', 'c']
values = [1, 2, 3]
mydic = dict(zip(keys, values))
print(mydic)

Original: https://blog.csdn.net/lidashent/article/details/113987349
Author: lidashent
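Title: python: automatically converting a video's audio into a timestamped srt subtitle file with speech recognition

Original articles are protected by copyright. When reposting, please credit the source: https://www.johngo689.com/515510/

Reposted articles are protected by the original author's copyright. When reposting, please credit the original author!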
