Eight Common Speech Annotation Methods | Speech Annotation

The "Tech Winter Olympics" initiative set out an ambitious vision for the recently concluded Beijing 2022 Winter Olympic Games. As the "translator" of the Games, iFLYTEK provided automatic translation together with related multilingual speech conversion, speech recognition, and speech synthesis technologies, giving the event a distinctly technological character.

The importance of data annotation

As artificial intelligence has developed, speech recognition has spread into every aspect of our lives. Voice assistants, smart speakers, and intelligent customer service systems all rely on it.

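The commercialization of AI has reached a stage of relative maturity in compute, algorithms, and data. To be deployed well, AI needs large volumes of annotated, task-relevant data to support training. Data is a key link in AI commercialization; arguably, data determines how far AI can actually be deployed.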

Technical progress depends on large amounts of annotated data for training models. For AI companies, high-quality data is indispensable. Analyzing, developing, and using data creates its value, and that is where the value of data annotation lies.

What is speech annotation?

Speech annotation is a common annotation type in the data annotation industry. The annotator first extracts the text and the various sounds contained in a recording, and then converts or synthesizes them. The annotated data is used mainly for machine learning. It is like giving a computer system "ears" so that it can recognize speech accurately.

Speech annotation methods

ASR speech transcription

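ASR stands for automatic speech recognition, a technology that converts human speech into text. Speech transcription is the process of turning speech data into text data, and it is one of the more common forms of annotation. In its original linguistic sense, the term refers to converting the characters of one alphabet into the characters of another, that is, a character-for-character mapping. Each character is mapped to a corresponding character in the other alphabet so that conversion between the two alphabets is complete, unambiguous, and reversible; in that sense, the term applies to conversion between alphabetic writing systems. ASR speech transcription, by contrast, is the technology of converting a speech signal into the corresponding text or commands through a process of recognition and understanding.

ASR speech transcription is commonly used in customer service, education and training, healthcare, finance, and other fields.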
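
The article does not say how a transcript is checked. As a hedged illustration only, one widely used measure is word error rate (WER): the word-level edit distance between a reference transcript and a candidate, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this sketch; for Chinese text, a character-level variant would be the usual choice.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between two transcripts, divided by reference length."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing the reference "turn on the living room light" with the candidate "turn on living room lights" counts one deletion and one substitution over six reference words.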

Speech segmentation

Speech segmentation is the process of identifying the boundaries between words, syllables, or phonemes in spoken natural language. It is an important sub-problem in speech recognition. As with most natural language processing problems, speech segmentation has to take context, grammar, and semantics into account.
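
Finding word or phoneme boundaries needs linguistic context, as noted above. A much simpler first pass, sketched here under stated assumptions, cuts a recording wherever the signal stays quiet for a while. The function name `split_on_silence`, the frame size, and the thresholds are all illustrative choices, not something the article specifies.

```python
from typing import List, Optional, Tuple


def split_on_silence(
    samples: List[float],
    sample_rate: int,
    frame_ms: int = 20,
    energy_threshold: float = 0.01,
    min_silence_frames: int = 3,
) -> List[Tuple[float, float]]:
    """Return (start_sec, end_sec) spans whose frame energy exceeds the threshold."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    n_frames = (len(samples) + frame_len - 1) // frame_len
    segments: List[Tuple[float, float]] = []
    start: Optional[int] = None  # first frame of the span currently open
    silent_run = 0               # consecutive quiet frames seen inside that span

    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        if energy >= energy_threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                end = i - silent_run + 1  # the first quiet frame closes the span
                segments.append((start * frame_len / sample_rate,
                                 end * frame_len / sample_rate))
                start, silent_run = None, 0

    if start is not None:
        # The recording ended while a span was open; trim trailing quiet frames.
        end_sample = min((n_frames - silent_run) * frame_len, len(samples))
        segments.append((start * frame_len / sample_rate, end_sample / sample_rate))
    return segments
```

The spans it returns are only candidate cut points; an annotator would still adjust them to real word or syllable boundaries.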

Speech cleaning

Speech cleaning is the process of re-examining and verifying speech data in order to delete duplicate information, correct existing errors, and make the data consistent. It is the first step of speech data preprocessing and an important part of ensuring that downstream results are correct.
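
As a minimal sketch of what one cleaning pass might look like, the function below drops byte-identical duplicate clips, rejects clips whose transcript is empty, and normalizes whitespace so that transcripts are formatted consistently. The record layout (`id`, `audio`, `text`) and the name `clean_clips` are assumptions made for illustration; real pipelines typically check far more than this.

```python
import hashlib
from typing import Any, Dict, List, Tuple


def clean_clips(clips: List[Dict[str, Any]]) -> Tuple[List[Dict[str, Any]], List[str]]:
    """Return (kept_clips, rejected_ids) after de-duplication and basic checks."""
    seen_digests = set()
    kept: List[Dict[str, Any]] = []
    rejected_ids: List[str] = []

    for clip in clips:
        # Identical audio bytes produce identical digests, which marks duplicates.
        digest = hashlib.sha256(clip["audio"]).hexdigest()
        # Collapse runs of whitespace and trim the ends of the transcript.
        text = " ".join(clip["text"].split())

        if digest in seen_digests or not text:
            rejected_ids.append(clip["id"])
            continue

        seen_digests.add(digest)
        kept.append({**clip, "text": text})
    return kept, rejected_ids
```

Returning the rejected IDs rather than silently discarding them lets a reviewer confirm that each rejection was justified.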

Emotion judgment

Human speech carries a great deal of information, and the emotional information in it is an important behavioral signal of how a person feels. Recognizing that emotional information is also a key part of natural human-computer interaction. The same words can mean something completely different when spoken with a different emotion. Only when a computer recognizes both the content of the speech and its emotion can it understand the meaning accurately. Understanding the emotion in speech therefore makes human-computer interaction more meaningful.

Emotion judgment can be used for human-computer interaction in autonomous-driving systems.

Voiceprint recognition

Voiceprint recognition is a biometric technology that identifies an unknown voice by analyzing the characteristics of one or more speech signals. Put simply, it determines whether a given utterance was spoken by a particular person.

The vocal organs that people use when speaking differ in size and shape, so every person's voiceprint is different. The differences show up mainly in four aspects: resonance pattern, voice purity, average pitch, and vocal range. Voiceprint recognition converts the sound signal into an electrical signal, which a computer then analyzes to identify the speaker.

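Commonly used methods for voiceprint recognition currently include template matching, nearest-neighbor methods, neural network methods, and VQ (vector quantization) clustering.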
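
To make the nearest-neighbor idea concrete, here is a hedged sketch that compares a probe feature vector against one enrolled template per speaker and returns the closest match, or no match if nothing is similar enough. The feature vectors, the cosine-similarity metric, the 0.8 threshold, and the names `cosine_similarity` and `identify_speaker` are all assumptions for illustration. Real systems derive their features from the acoustic properties of the recording.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_speaker(
    probe: List[float],
    enrolled: Dict[str, List[float]],
    threshold: float = 0.8,
) -> Tuple[Optional[str], float]:
    """Return (speaker, score) for the nearest enrolled template, or (None, score)."""
    best_name: Optional[str] = None
    best_score = -1.0
    for name, template in enrolled.items():
        score = cosine_similarity(probe, template)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score  # nobody enrolled is close enough
    return best_name, best_score
```

The threshold is what turns a plain nearest-neighbor lookup into a verification decision: without it, every probe would be assigned to some enrolled speaker, including voices the system has never heard.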

Voiceprint recognition is used mainly in public security, the judiciary, and other fields that require voiceprint identification. In everyday life it is also used for voiceprint-password identity authentication, login, authorization, attendance clock-in, voice wake-up, and similar tasks.

Phoneme annotation

A phoneme is the smallest unit of speech, divided according to the natural properties of speech sounds. Analyzed in terms of the articulatory actions within a syllable, one action constitutes one phoneme. Phonemes are the smallest units or segments that make up a syllable; from the standpoint of sound quality, they are the smallest linear units of speech.

Marking speech with the International Phonetic Alphabet is called phonetic transcription, and it comes in two kinds: broad and narrow. Broad transcription marks only the phonemes that distinguish meaning, while narrow transcription records phonetic differences strictly and shows as much of the difference between sounds as possible. Broad transcription uses a limited set of symbols, whereas narrow transcription uses many. Each has its own uses.

Put simply, phoneme annotation means labeling speech according to its phonetic symbols, constituent phonemes, and pronunciation.
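
One way to picture the output of phoneme annotation is as a sequence of labeled time intervals. The sketch below assumes such a time-aligned layout (the article does not specify one) and shows a basic consistency check an annotation tool might run. `PhonemeLabel` and `validate_tier` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PhonemeLabel:
    symbol: str   # phonetic symbol, for example an IPA character
    start: float  # interval start, in seconds
    end: float    # interval end, in seconds


def validate_tier(labels: List[PhonemeLabel], tolerance: float = 1e-6) -> List[str]:
    """Return a list of human-readable problems; an empty list means the tier is consistent."""
    problems: List[str] = []
    for i, label in enumerate(labels):
        if not label.symbol.strip():
            problems.append(f"label {i}: empty symbol")
        if label.end <= label.start:
            problems.append(f"label {i}: non-positive duration")
        # Each interval should begin where the previous one ended.
        if i > 0 and abs(label.start - labels[i - 1].end) > tolerance:
            problems.append(f"label {i}: gap or overlap with previous label")
    return problems
```

A check like this catches mechanical slips such as gaps, overlaps, and blank labels. Whether each symbol is the right one remains a judgment for the annotator.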

Prosody annotation

In speech synthesis systems, prosody is generally predicted from text. Taking Chinese as an example, the prediction draws on textual information such as initials, finals, words, phrases, and paragraphs. Professional annotators then complete the prosody annotation on the basis of the predicted results.

Pronunciation proofreading

Pronunciation proofreading is the process of collecting data throughout spoken-language training and the correction of non-standard pronunciation.

Pronunciation proofreading can be used in intelligent search.

Jinglianwen Technology supports speech annotation

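Jinglianwen Technology (景联文科技) is a professional data collection and annotation company and one of the largest data service providers in the Yangtze River Delta region. To improve the accuracy of annotated data, it relies on its own data annotation bases, an advanced annotation platform, and a full range of annotation tools. It supports speech projects covering speech segmentation, ASR speech transcription, speech emotion judgment, voiceprint recognition annotation, and other annotation types, so that it can meet the varied data annotation needs of its partners.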

As a member of the National Information Technology Standardization Technical Committee, we always put customer data security first. We maintain a strong awareness of data security compliance and have built a comprehensive mechanism for safeguarding data.

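Jinglianwen Technology is committed to meeting the diverse and extensive data needs of real-world AI deployments. By optimizing the whole process of data transmission, task creation, data annotation, quality inspection, and delivery, it raises data processing efficiency and lowers processing costs.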

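Building on practical experience accumulated in the market, Jinglianwen Technology helps solve the practical problems of bringing AI into production. Many AI companies and leading enterprises across industries have chosen to work with it and have maintained good long-term relationships. Going forward, Jinglianwen Technology is ready to provide AI companies with end-to-end, integrated, high-precision, high-quality data service solutions.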

Contact us for data collection and annotation.

market@jinglianwen.com

Original: https://blog.csdn.net/weixin_55551028/article/details/123485585
Author: 景联文科技
Title: 八种常见的语音标注方法 | 语音标注
