MATLAB Speaker Recognition System

May 27, 2023, 7:10 AM • Artificial Intelligence


1. Introduction to Speaker Recognition

Speaker recognition determines who is speaking from the speaker's speech signal. Speech is one of the natural attributes of human beings. Because speakers' vocal organs differ physiologically and their speaking habits are learned differently, everyone's speech carries strong individual characteristics, which makes it possible to identify a speaker by analyzing the speech signal. Using the voice to identify a speaker has several distinctive advantages: the voice is an inherent attribute of a person and cannot be lost or forgotten; speech is easy to capture and the required equipment is inexpensive; and over the telephone network the identification can even be carried out remotely, for example in customer service. For these reasons, speaker recognition has attracted growing attention in recent years. Compared with other biometric technologies such as fingerprint or hand recognition, speaker recognition is convenient, contactless, and readily accepted by users, and among existing biometric technologies it is the only one that can be used for remote verification. Its range of applications is therefore very broad. Speaker recognition is an interdisciplinary technology that draws on acoustics, linguistics, computer science, information processing, and artificial intelligence, and advances in each of these fields have driven its development.

The fundamental problem in designing a speaker recognition system is how to extract, from the speech signal, the characteristics that distinguish one person from another. In other words, feature-vector extraction is the foundation of the whole system, and it strongly affects the system's false rejection rate and false acceptance rate. Unlike speech recognition, speaker recognition uses the speaker information in the signal and ignores the meaning of the words, emphasizing the speaker's individuality. A single type of feature vector is therefore unlikely to raise the recognition rate much, which makes extracting the key components of the speech signal especially important. The quality of the feature parameters directly determines recognition accuracy.
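The two error rates mentioned above can be made concrete with a small calculation. Assuming a verification setting in which a claimed identity is accepted when a distortion score falls below a threshold, the false rejection rate (FRR) is the fraction of genuine trials that are rejected and the false acceptance rate (FAR) is the fraction of impostor trials that are accepted. The scores and the threshold below are made-up numbers for illustration, not results from this system.

```matlab
% Hypothetical distortion scores (lower means a closer match); illustration only
genuine  = [0.8 1.1 0.9 1.6 1.0];   % trials where the claimed identity is true
impostor = [2.1 1.4 2.5 1.9 3.0];   % trials where the claimed identity is false
threshold = 1.5;                     % accept a claim when score < threshold

frr = mean(genuine >= threshold);    % fraction of genuine trials rejected
far = mean(impostor < threshold);    % fraction of impostor trials accepted
fprintf('FRR = %.2f, FAR = %.2f\n', frr, far);
```

With these numbers one genuine trial (1.6) is rejected and one impostor trial (1.4) is accepted, so both rates are 0.20. Raising the threshold lowers the FRR at the cost of a higher FAR, and vice versa.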

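The system uses a template-matching approach to speaker recognition based on Mel-frequency cepstral coefficients (MFCCs). The project is implemented in MATLAB, which handles speech acquisition, analysis, feature extraction, matching, and recognition.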
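For reference, the mel scale behind these coefficients is commonly defined as in the first line below (f in Hz). The `melfb` function in the source code uses the same 700 Hz constant through `f0 = 700 / fs` together with a natural logarithm; the two forms differ only by a constant factor, since 2595 / ln 10 ≈ 1127. The second line shows the cepstral step performed by `dct(log(z))` in `mfcc`, written with MATLAB's orthonormal DCT-II scaling, where S_i is the energy in mel band i and K = 20 is the number of filters used there.

```latex
\mathrm{mel}(f) = 2595\,\log_{10}\!\left(1 + \frac{f}{700}\right) \approx 1127\,\ln\!\left(1 + \frac{f}{700}\right)

c_j = w_j \sum_{i=1}^{K} \ln S_i \,\cos\!\left(\frac{\pi\,(2i-1)\,(j-1)}{2K}\right),
\qquad j = 1,\dots,K,\qquad w_1 = \sqrt{1/K},\quad w_j = \sqrt{2/K}\ \ (j \ge 2)
```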

2. Principles of the Speaker Recognition Algorithm

2.1 Overall Framework of the Speaker Recognition System

The overall structure of the speaker recognition system is shown in Figure 1. First, a voice recording serves as the input signal, and the analog speech signal is preprocessed: pre-filtering, sampling and quantization, windowing, endpoint detection, pre-emphasis, and so on. Preprocessing is followed by a key stage: extraction of the feature parameters.

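Figure 1 Overall block diagram of the speaker recognition system (blocks: speech input, preprocessing, feature extraction, similarity measure estimation, reference module, template library, recognition decision, expert knowledge)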
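Two of the preprocessing steps listed above, pre-emphasis and endpoint detection, can be illustrated with a short sketch. The source code below does not perform either step explicitly; the pre-emphasis coefficient 0.97, the frame sizes, the 10% energy threshold, and the synthetic test signal (a 440 Hz tone between 0.3 s and 0.7 s) are all assumptions chosen for illustration. The detector is a simple short-time-energy threshold, which is only one of several possible endpoint-detection methods.

```matlab
% Pre-emphasis and a crude energy-based endpoint detector (illustrative sketch)
fs = 8000;                                  % assumed sampling rate (Hz)
t = (0:fs-1)' / fs;                         % time axis for 1 second of audio
s = zeros(fs, 1);
voiced = t >= 0.3 & t < 0.7;                % assumed "speech" interval
s(voiced) = sin(2*pi*440*t(voiced));        % 440 Hz tone standing in for speech

alpha = 0.97;                               % assumed pre-emphasis coefficient
y = filter([1 -alpha], 1, s);               % y(k) = s(k) - alpha*s(k-1)

n = 256;                                    % frame length (samples)
m = 100;                                    % frame shift (samples)
nbFrame = floor((length(y) - n) / m) + 1;
energy = zeros(1, nbFrame);
for j = 1:nbFrame
    seg = y((j-1)*m + (1:n));               % samples of frame j
    energy(j) = sum(seg .^ 2);              % short-time energy
end
active = energy > 0.1 * max(energy);        % frames above 10% of the peak energy
first = find(active, 1, 'first');
last  = find(active, 1, 'last');
startSec = (first - 1) * m / fs;            % start of the first active frame
endSec   = ((last - 1) * m + n) / fs;       % end of the last active frame
fprintf('speech detected from %.3f s to %.3f s\n', startSec, endSec);
```

Because decisions are made per frame, the detected boundaries are only accurate to within roughly one frame (here 256 samples, or 32 ms at 8 kHz).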

3. Source Code

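function M3 = blockFrames(s, fs, m, n)
% blockFrames: split the signal into overlapping frames, apply a Hamming
% window, and return the FFT of each frame
% Inputs:
%   s   the speech signal to analyze
%   fs  sampling rate of the signal (Hz); not used in the computation
%   m   distance (in samples) between the beginnings of two frames
%   n   number of samples per frame
% Output:
%   M3  matrix whose i-th column is the FFT of the i-th windowed frame
l = length(s);                          % length of the speech signal
nbFrame = floor((l - n) / m) + 1;       % number of frames
M = zeros(n, nbFrame);                  % preallocate the frame matrix
for i = 1:n
    for j = 1:nbFrame
        M(i, j) = s(((j - 1) * m) + i); % sample i of frame j
    end
end
h = hamming(n);                         % Hamming window (Signal Processing Toolbox)
M2 = diag(h) * M;                       % apply the window to every frame
M3 = zeros(n, nbFrame);
for i = 1:nbFrame
    M3(:, i) = fft(M2(:, i));           % short-time Fourier transform of each frame
end
end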
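The double loop in `blockFrames` (and the identical one in `mfcc`) can be replaced by a single indexing operation. The sketch below assumes MATLAB R2016b or later for implicit expansion, uses a toy signal whose sample values equal their indices so the frames are easy to check, and writes out the standard symmetric Hamming window explicitly so that no toolbox is needed.

```matlab
% Vectorized framing sketch (assumes R2016b+ implicit expansion)
s = (1:1000)';                      % toy signal: each sample equals its index
m = 100;                            % frame shift
n = 256;                            % frame length
nbFrame = floor((length(s) - n) / m) + 1;
idx = (1:n)' + (0:nbFrame-1) * m;   % idx(i,j) = (j-1)*m + i
M = s(idx);                         % column j holds frame j

% Symmetric Hamming window written out, so no toolbox is required
h = 0.54 - 0.46 * cos(2*pi*(0:n-1)' / (n-1));
M2 = M .* h;                        % same result as diag(h) * M
M3 = fft(M2);                       % fft works column-wise: one spectrum per frame
disp(size(M3));
```

Multiplying by `h` through implicit expansion avoids building the n-by-n matrix `diag(h)`, and `fft` on a matrix already transforms each column, so the per-frame loop is unnecessary.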

function code = train(traindir, n)
% Speaker Recognition: Training Stage
% Computes a VQ codebook from the .wav file of each speaker
% Input:
%   traindir : name of the directory containing all training sound files
%   n        : number of training files in traindir
% Output:
%   code     : trained VQ codebooks, code{i} for the i-th speaker
% Note:
%   Sound files in traindir are expected to be named s1.wav, s2.wav, …, sn.wav
% Example:
%   >> code = train('C:\data\train\', 8);
k = 16;                             % number of centroids required
code = cell(1, n);
for i = 1:n                         % train a VQ codebook for each speaker
    file = sprintf('%ss%d.wav', traindir, i);
    disp(file);
    [s, fs] = audioread(file);      % wavread is no longer available in current MATLAB
    v = mfcc(s(:, 1), fs);          % compute MFCCs (first channel)
    code{i} = vqlbg(v, k);          % train VQ codebook
end
end

function d = disteu(x, y)
% DISTEU Pairwise Euclidean distances between the columns of two matrices
% (used to measure the distortion between test vectors and a codebook)
% Input:
%   x, y: two matrices whose columns are data vectors
% Output:
%   d: d(i,j) is the Euclidean distance between column vectors x(:,i) and y(:,j)
% Note:
%   The Euclidean distance D between two vectors x and y is
%   D = sum((x-y).^2).^0.5
[M, N] = size(x);
[M2, P] = size(y);
if (M ~= M2)
    error('Matrix dimensions do not match!')
end
d = zeros(N, P);
if (N < P)
    copies = zeros(1, P);
    for n = 1:N
        % replicate column n of x P times and compare it with every column of y
        d(n, :) = sum((x(:, n + copies) - y) .^ 2, 1);
    end
else
    copies = zeros(1, N);
    for p = 1:P
        % replicate column p of y N times and compare it with every column of x
        d(:, p) = sum((x - y(:, p + copies)) .^ 2, 1)';
    end
end
d = d .^ 0.5;
end
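`disteu` loops over the smaller of the two matrices. A fully vectorized alternative uses the identity ‖x − y‖² = ‖x‖² + ‖y‖² − 2xᵀy. This is a sketch assuming R2016b+ implicit expansion; `pairdist` is a hypothetical name, not part of the original code.

```matlab
% Vectorized pairwise Euclidean distance between the columns of x and y
pairdist = @(x, y) sqrt(max(sum(x.^2, 1)' + sum(y.^2, 1) - 2 * (x' * y), 0));

x = [0 3; 0 4];         % columns: (0,0) and (3,4)
y = [0 6; 0 8];         % columns: (0,0) and (6,8)
D = pairdist(x, y);     % D(i,j) = distance between x(:,i) and y(:,j)
disp(D);
```

The `max(..., 0)` clamp guards against tiny negative values caused by round-off. The identity can lose precision when the vectors are large and close together, which is the trade-off for removing the loop.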

function m = melfb(p, n, fs)
% MELFB Determine matrix for a mel-spaced filterbank
% Inputs:
%   p    number of filters in the filterbank
%   n    length of the FFT
%   fs   sample rate in Hz
% Output:
%   m    a (sparse) matrix containing the filterbank amplitudes,
%        size(m) = [p, 1+floor(n/2)]
% Usage: for example, to compute the mel-scale spectrum of a
% column-vector signal s with length n and sample rate fs:
%   f = fft(s);
%   m = melfb(p, n, fs);
%   n2 = 1 + floor(n/2);
%   z = m * abs(f(1:n2)).^2;
% z then contains p samples of the desired mel-scale spectrum.
% To plot the filterbank, e.g.:
%   plot(linspace(0, (12500/2), 129), melfb(20, 256, 12500)'),
%   title('Mel-spaced filterbank'), xlabel('Frequency (Hz)');
f0 = 700 / fs;
fn2 = floor(n/2);
lr = log(1 + 0.5/f0) / (p+1);            % width of one mel band (natural-log units)
% convert to fft bin numbers with 0 for DC term
bl = n * (f0 * (exp([0 1 p p+1] * lr) - 1));
b1 = floor(bl(1)) + 1;
b2 = ceil(bl(2));
b3 = floor(bl(3));
b4 = min(fn2, ceil(bl(4))) - 1;
pf = log(1 + (b1:b4)/n/f0) / lr;         % fractional filter position of each bin
fp = floor(pf);
pm = pf - fp;
r = [fp(b2:b4) 1+fp(1:b3)];              % filter (row) indices
c = [b2:b4 1:b3] + 1;                    % FFT bin (column) indices, 1-based
v = 2 * [1-pm(b2:b4) pm(1:b3)];          % filter weights
m = sparse(r, c, v, p, 1+fn2);
end
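To see what "mel-spaced" means in numbers, the sketch below computes the centre frequencies of p triangular filters equally spaced on the mel scale between 0 Hz and fs/2, using mel(f) = 2595·log10(1 + f/700). `melfb` above appears to place its band edges the same way (p + 2 points equally spaced on a ln(1 + f/700) axis up to fs/2); the values fs = 12500 and p = 20 are borrowed from its plotting example and from `mfcc`, and `hz2mel`/`mel2hz` are hypothetical helper names.

```matlab
% Centre frequencies of a mel-spaced filterbank (illustrative sketch)
hz2mel = @(f) 2595 * log10(1 + f / 700);
mel2hz = @(mel) 700 * (10 .^ (mel / 2595) - 1);

fs = 12500;                 % sampling rate from the melfb plotting example
p  = 20;                    % number of filters, as used in mfcc
edges = mel2hz(linspace(hz2mel(0), hz2mel(fs/2), p + 2));  % p+2 band edges
centres = edges(2:end-1);   % filter k peaks at centres(k)
fprintf('%.0f ', centres);
fprintf('\n');
```

The gaps between successive centres grow with frequency, so low frequencies, where speaker-dependent detail is dense, get narrower filters.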

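function r = mfcc(s, fs)
% MFCC  Mel-frequency cepstral coefficients of a speech signal
% Inputs:
%   s   vector containing the speech signal to analyze
%   fs  sampling rate of the signal (Hz)
% Output:
%   r   MFCC matrix, one column of 20 coefficients per frame
m = 100;                                % frame shift (samples)
n = 256;                                % frame length (samples), also the FFT size
l = length(s);
nbFrame = floor((l - n) / m) + 1;       % number of frames
M = zeros(n, nbFrame);
for i = 1:n
    for j = 1:nbFrame
        M(i, j) = s(((j - 1) * m) + i); % sample i of frame j
    end
end
h = hamming(n);                         % Hamming window (Signal Processing Toolbox)
M2 = diag(h) * M;                       % window every frame
frame = zeros(n, nbFrame);
for i = 1:nbFrame
    frame(:, i) = fft(M2(:, i));        % spectrum of each frame
end
mfb = melfb(20, n, fs);                 % 20-filter mel filterbank
n2 = 1 + floor(n / 2);                  % keep the non-negative frequency bins
z = mfb * abs(frame(1:n2, :)).^2;       % mel power spectrum of each frame
r = dct(log(z));                        % cepstrum: DCT of the log mel spectrum
end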
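`mfcc` ends with `dct(log(z))`, which relies on the Signal Processing Toolbox. As a sketch, the orthonormal DCT-II that `dct` applies to each column can be written as an explicit K-by-K matrix; the log spectra below are random placeholders, not real MFCC input.

```matlab
% Toolbox-free orthonormal DCT-II as a matrix (illustrative sketch)
K = 20;                                     % number of mel filters in mfcc
[jj, ii] = ndgrid(0:K-1, 0:K-1);            % jj: output index, ii: input index
D = sqrt(2 / K) * cos(pi * (2*ii + 1) .* jj / (2*K));
D(1, :) = sqrt(1 / K);                      % scaling of the first (DC) row
logS = log(1 + rand(K, 3));                 % placeholder log mel spectra, 3 frames
c = D * logS;                               % cepstral coefficients, one column per frame
disp(size(c));
```

Because the first row is constant, `c(1, :)` is proportional to the sum of the log mel energies, essentially an overall loudness term; some systems discard it for that reason.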

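function r = vqlbg(d, k)
% VQLBG Vector quantization using the Linde-Buzo-Gray (LBG) algorithm
% Inputs:
%   d   training data vectors (one per column)
%   k   number of centroids required (a power of 2)
% Output:
%   r   resulting VQ codebook (k columns, one for each centroid)
e = .01;                                % splitting offset and convergence threshold
r = mean(d, 2);                         % start with a single centroid
dpr = 10000;                            % previous total distortion
for i = 1:log2(k)
    r = [r * (1 + e), r * (1 - e)];     % split every centroid into two
    while true
        z = disteu(d, r);               % distances from vectors to centroids
        [~, ind] = min(z, [], 2);       % nearest centroid for each vector
        t = 0;                          % total distortion
        for j = 1:2^i
            members = d(:, ind == j);
            if ~isempty(members)        % keep the old centroid if its cell is empty
                r(:, j) = mean(members, 2);
                t = t + sum(disteu(members, r(:, j)));
            end
        end
        if ((dpr - t) / t) < e          % stop when distortion stops decreasing
            break;
        else
            dpr = t;
        end
    end
end
end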
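The splitting procedure in `vqlbg` can be traced on a small example. The sketch below re-implements the same steps on 1-D toy data (four tight clusters around 1, 3, 11 and 13) so that it runs on its own; the data, k = 4, and the anonymous squared-distance function used in place of `disteu` are assumptions for illustration (implicit expansion requires R2016b+).

```matlab
% Toy trace of LBG codebook splitting (same steps as vqlbg, self-contained)
d = [0.9 1.0 1.1, 2.9 3.0 3.1, 10.9 11.0 11.1, 12.9 13.0 13.1];  % 1-D training data
k = 4;                                   % codebook size (power of 2)
e = 0.01;                                % split offset and stopping threshold
dist2 = @(x, y) sum(x.^2, 1)' + sum(y.^2, 1) - 2 * (x' * y);   % squared distances

r = mean(d, 2);                          % start from the global mean
dpr = Inf;                               % previous total distortion
for i = 1:log2(k)
    r = [r * (1 + e), r * (1 - e)];      % split every centroid into two
    while true
        [~, ind] = min(dist2(d, r), [], 2);    % nearest centroid for each vector
        t = 0;
        for j = 1:size(r, 2)
            members = d(:, ind == j);
            if ~isempty(members)
                r(:, j) = mean(members, 2);    % move centroid to its cell mean
                t = t + sum(sqrt(max(dist2(members, r(:, j)), 0)));
            end
        end
        if (dpr - t) / t < e             % relative improvement too small: stop
            break;
        end
        dpr = t;
    end
end
disp(sort(r));
```

Each pass doubles the codebook by nudging every centroid by ±e and then refines the centroids by nearest-neighbour reassignment until the total distortion improves by less than the fraction e.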

function test(testdir, n, code)
% Speaker Recognition: Testing Stage
% Input:
%   testdir : name of the directory containing all test sound files
%   n       : number of test files in testdir
%   code    : codebooks of all trained speakers
% Note:
%   Sound files in testdir are expected to be named s1.wav, s2.wav, …, sn.wav
% Example:
%   >> test('C:\data\test\', 8, code);
for k = 1:n                                 % read each test file
    file = sprintf('%ss%d.wav', testdir, k);
    [s, fs] = audioread(file);              % wavread is no longer available in current MATLAB
    v = mfcc(s(:, 1), fs);                  % compute MFCCs (first channel)
    distmin = inf;
    k1 = 0;
    for l = 1:length(code)                  % distortion against each trained codebook
        d = disteu(v, code{l});
        dist = sum(min(d, [], 2)) / size(d, 1);   % average nearest-codeword distance
        if dist < distmin
            distmin = dist;
            k1 = l;
        end
    end
    msg = sprintf('Speaker %d matches with speaker %d', k, k1);
    disp(msg);
end
end
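The decision rule in `test` (average the distance from each test vector to its nearest codeword, then pick the codebook with the smallest average) can be checked on made-up numbers. In this sketch the codebooks and test vectors are hypothetical 2-D points rather than MFCC codebooks, and `pairdist` is a hypothetical vectorized stand-in for `disteu` (assumes R2016b+).

```matlab
% Toy version of the nearest-codebook decision used in test
pairdist = @(x, y) sqrt(max(sum(x.^2, 1)' + sum(y.^2, 1) - 2 * (x' * y), 0));
code = {[0 1; 0 1], [5 6; 5 6], [10 11; 0 1]};   % three hypothetical speakers
v = [5.2 5.9 6.1; 5.1 6.0 5.8];                   % test vectors (one per column)

avgDist = zeros(1, numel(code));
for sp = 1:numel(code)
    d = pairdist(v, code{sp});                    % test vectors x codewords
    avgDist(sp) = sum(min(d, [], 2)) / size(d, 1);  % average distortion
end
[~, k1] = min(avgDist);
fprintf('test utterance matches speaker %d\n', k1);
```

Note that `test` always assigns an utterance to some enrolled speaker; rejecting unknown speakers would require an additional threshold on `distmin`.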

Original: https://blog.csdn.net/m0_60677550/article/details/120264305
Author: 技术狼灭
Title: MATLAB说话人识别系统 (MATLAB Speaker Recognition System)

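Original articles are protected by copyright. When reposting, please cite the source: https://www.johngo689.com/524900/

Reposted articles are protected by the original author's copyright. When reposting, please credit the original author and source!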
