# Speech Quality Evaluation Metrics and Their MATLAB Implementation (Part 2)

## Log-Likelihood Ratio (LLR) Measure

The LLR measure is based on an all-pole (LPC) model of the clean speech:

$$x\left(n\right)=\sum_{i=1}^{p}a_{x}\left(i\right)x\left(n-i\right)+G_{x}u\left(n\right)$$

$$d_{LLR}\left(\mathbf{a}_{x},\bar{\mathbf{a}}_{\hat{x}}\right)=\log\frac{\bar{\mathbf{a}}_{\hat{x}}^{T}\mathbf{R}_{x}\bar{\mathbf{a}}_{\hat{x}}}{\mathbf{a}_{x}^{T}\mathbf{R}_{x}\mathbf{a}_{x}}$$

where $\mathbf{a}_{x}=\left[1,-\alpha_{x}\left(1\right),-\alpha_{x}\left(2\right),\cdots,-\alpha_{x}\left(p\right)\right]^{T}$ denotes the LPC coefficients of the clean speech, $\bar{\mathbf{a}}_{\hat{x}}=\left[1,-\alpha_{\hat{x}}\left(1\right),-\alpha_{\hat{x}}\left(2\right),\cdots,-\alpha_{\hat{x}}\left(p\right)\right]^{T}$ denotes the LPC coefficients of the enhanced speech, and $\mathbf{R}_{x}$ is the autocorrelation matrix of the clean speech.
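Before the MATLAB listing, the same computation can be sketched in NumPy. This is an illustrative sketch, not part of the toolbox: the function names `levinson_durbin` and `llr` and the frame-level interface are my own, while the recursion and the cap at 2 mirror the definition above.

```python
import numpy as np

def levinson_durbin(r, order):
    """Levinson-Durbin recursion: autocorrelation lags r[0..order] -> [1, -alpha_1, ..., -alpha_p]."""
    a = np.zeros(order)
    E = r[0]
    for i in range(order):
        k = (r[i + 1] - np.dot(a[:i], r[i:0:-1])) / E
        if i > 0:
            a[:i] = a[:i] - k * a[i - 1::-1]   # update previous reflection stages
        a[i] = k
        E *= (1.0 - k * k)                      # prediction-error energy
    return np.concatenate(([1.0], -a))

def llr(clean_frame, enhanced_frame, order=10):
    """Per-frame LLR, capped at 2 (order 10 is the toolbox choice for fs < 10 kHz)."""
    def autocorr(x, p):
        return np.array([np.dot(x[:len(x) - m], x[m:]) for m in range(p + 1)])
    r = autocorr(clean_frame, order)
    a_clean = levinson_durbin(r, order)
    a_enh = levinson_durbin(autocorr(enhanced_frame, order), order)
    # Toeplitz autocorrelation matrix R_x of the clean frame
    R = np.array([[r[abs(i - j)] for j in range(order + 1)] for i in range(order + 1)])
    return min(2.0, np.log((a_enh @ R @ a_enh) / (a_clean @ R @ a_clean)))
```

Because `a_clean` minimizes the quadratic form in the denominator, the ratio is at least 1 and the per-frame LLR is non-negative.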

## Code

function llr_mean= comp_llr(cleanFile, enhancedFile);

% ----------------------------------------------------------------------
%
%      Log Likelihood Ratio (LLR) Objective Speech Quality Measure
%
%
%     This function implements the Log Likelihood Ratio Measure
%     defined on page 48 of [1] (see Equation 2.18).

%
%   Usage:  llr=comp_llr(cleanFile.wav, enhancedFile.wav)
%
%         cleanFile.wav - clean input file in .wav format
%         enhancedFile  - enhanced output file in .wav format
%         llr           - computed likelihood ratio
%
%         Note that the LLR measure is limited in the range [0, 2].

%
%  Example call:  llr =comp_llr('sp04.wav','enhanced.wav')
%
%
%  References:
%
%     [1] S. R. Quackenbush, T. P. Barnwell, and M. A. Clements,
%       Objective Measures of Speech Quality.  Prentice Hall
%       Advanced Reference Series, Englewood Cliffs, NJ, 1988,
%       ISBN: 0-13-629056-6.

%
%  Authors: Bryan L. Pellom and John H. L. Hansen (July 1998)
%  Modified by: Philipos C. Loizou  (Oct 2006) - limited LLR to be in [0,2]
%
% Copyright (c) 2006 by Philipos C. Loizou
% $Revision: 0.0$  $Date: 10/09/2006$
% ----------------------------------------------------------------------

if nargin~=2
fprintf('USAGE: LLR=comp_llr(cleanFile.wav, enhancedFile.wav)\n');
fprintf('For more help, type: help comp_llr\n\n');
return;
end

alpha=0.95;

[data1, fs1]= audioread( cleanFile);
[data2, fs2]= audioread( enhancedFile);
if ( fs1~= fs2)
error( 'The sampling rates of the two files do not match!\n');
end

len= min( length( data1), length( data2));
data1= data1( 1: len)+eps;
data2= data2( 1: len)+eps;

IS_dist= llr( data1, data2,fs1);

IS_len= round( length( IS_dist)* alpha);
IS= sort( IS_dist);

llr_mean= mean( IS( 1: IS_len));

end

function distortion = llr(clean_speech, processed_speech,sample_rate)

% ----------------------------------------------------------------------
% Check the length of the clean and processed speech.  Must be the same.

% ----------------------------------------------------------------------

clean_length      = length(clean_speech);
processed_length  = length(processed_speech);

if (clean_length ~= processed_length)
disp('Error: Both Speech Files must be same length.');
return
end

% ----------------------------------------------------------------------
% Global Variables
% ----------------------------------------------------------------------

winlength   = round(30*sample_rate/1000); %240;        % window length in samples
skiprate    = floor(winlength/4);          % window skip in samples
if sample_rate<10000
P           = 10;          % LPC Analysis Order
else
P=16;     % this could vary depending on sampling frequency.

end
% ----------------------------------------------------------------------
% For each frame of input speech, calculate the Log Likelihood Ratio
% ----------------------------------------------------------------------

num_frames = clean_length/skiprate-(winlength/skiprate); % number of frames
start      = 1;                 % starting sample
window     = 0.5*(1 - cos(2*pi*(1:winlength)'/(winlength+1)));

for frame_count = 1:num_frames

% ----------------------------------------------------------
% (1) Get the Frames for the test and reference speech.

%     Multiply by Hanning Window.

% ----------------------------------------------------------

clean_frame = clean_speech(start:start+winlength-1);
processed_frame = processed_speech(start:start+winlength-1);
clean_frame = clean_frame.*window;
processed_frame = processed_frame.*window;

% ----------------------------------------------------------
% (2) Get the autocorrelation lags and LPC parameters used
%     to compute the LLR measure.

% ----------------------------------------------------------

[R_clean, Ref_clean, A_clean] = lpcoeff(clean_frame, P);
[R_processed, Ref_processed, A_processed] = lpcoeff(processed_frame, P);

% ----------------------------------------------------------
% (3) Compute the LLR measure
% ----------------------------------------------------------

numerator   = A_processed*toeplitz(R_clean)*A_processed';
denominator = A_clean*toeplitz(R_clean)*A_clean';
distortion(frame_count) = min(2,log(numerator/denominator));
start = start + skiprate;

end
end

function [acorr, refcoeff, lpparams] = lpcoeff(speech_frame, model_order)

% ----------------------------------------------------------
% (1) Compute Autocorrelation Lags
% ----------------------------------------------------------

winlength = max(size(speech_frame));
for k=1:model_order+1
R(k) = sum(speech_frame(1:winlength-k+1) ...
.*speech_frame(k:winlength));
end

% ----------------------------------------------------------
% (2) Levinson-Durbin
% ----------------------------------------------------------

a = ones(1,model_order);
E(1)=R(1);
for i=1:model_order
a_past(1:i-1) = a(1:i-1);
sum_term = sum(a_past(1:i-1).*R(i:-1:2));
rcoeff(i)=(R(i+1) - sum_term) / E(i);
a(i)=rcoeff(i);
a(1:i-1) = a_past(1:i-1) - rcoeff(i).*a_past(i-1:-1:1);
E(i+1)=(1-rcoeff(i)*rcoeff(i))*E(i);
end

acorr    = R;
refcoeff = rcoeff;
lpparams = [1 -a];
end


## Itakura-Saito (IS) Measure

The Itakura-Saito (IS) measure can be expressed as

$$d_{IS}\left(\mathbf{a}_{x},\bar{\mathbf{a}}_{\hat{x}}\right)=\frac{G_{x}\bar{\mathbf{a}}_{\hat{x}}^{T}\mathbf{R}_{x}\bar{\mathbf{a}}_{\hat{x}}}{\bar{G}_{\hat{x}}\mathbf{a}_{x}^{T}\mathbf{R}_{x}\mathbf{a}_{x}}+\log\left(\frac{\bar{G}_{\hat{x}}}{G_{x}}\right)-1$$

where $G_{x}$ denotes the all-pole gain of the clean speech and $\bar{G}_{\hat{x}}$ the all-pole gain of the enhanced speech. $G_{x}$ can be expressed as

$$G_{x}=\left(\mathbf{r}_{x}^{T}\mathbf{a}_{x}\right)^{1/2}$$

where $\mathbf{r}_{x}=\left[r_{x}\left(0\right),r_{x}\left(1\right),\cdots,r_{x}\left(p\right)\right]^{T}$ is the autocorrelation vector of the clean speech, i.e., the first row of $\mathbf{R}_{x}$. This shows that the IS measure reflects the difference between the all-pole gains, which is also its weakness, since amplitude differences have the least effect on perceived quality.
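Given autocorrelation lags and LPC vectors for a frame, the IS distance itself is a short computation. The sketch below is illustrative (the name `is_distance` is mine); like the MATLAB implementation that follows, it uses the squared gains $\sigma^2=\mathbf{r}^{T}\mathbf{a}$ and caps the result at 100.

```python
import numpy as np

def is_distance(r_clean, a_clean, r_enh, a_enh, cap=100.0):
    """Itakura-Saito distance from autocorrelation lags r_* (lags 0..p)
    and LPC vectors a_* = [1, -alpha_1, ..., -alpha_p]."""
    eps = np.finfo(float).eps
    n = len(a_clean)
    # Toeplitz autocorrelation matrix of the clean frame
    R = np.array([[r_clean[abs(i - j)] for j in range(n)] for i in range(n)])
    g_clean = max(np.dot(r_clean, a_clean), eps)   # sigma^2 of clean all-pole model
    g_enh = max(np.dot(r_enh, a_enh), eps)         # sigma^2 of enhanced all-pole model
    num = a_enh @ R @ a_enh
    den = max(a_clean @ R @ a_clean, eps)
    return min(cap, (g_clean / g_enh) * (num / den) + np.log(g_enh / g_clean) - 1.0)
```

With identical inputs the gain ratio and the quadratic-form ratio are both 1, so the distance is exactly 0.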

## Code

function is_mean= comp_is(cleanFile, enhancedFile);
% ----------------------------------------------------------------------
%          Itakura-Saito (IS) Objective Speech Quality Measure
%
%   This function implements the Itakura-Saito distance measure
%   defined on page 50 of [1] (see Equation 2.26).  See also
%   Equation 12 (page 1480) of [2].

%
%   Usage:  IS=comp_is(cleanFile.wav, enhancedFile.wav)
%
%         cleanFile.wav - clean input file in .wav format
%         enhancedFile  - enhanced output file in .wav format
%         IS            - computed Itakura Saito measure
%
%         Note that the IS measure is limited in the range [0, 100].

%
%  Example call:  IS =comp_is('sp04.wav','enhanced.wav')
%
%
%  References:
%
%     [1] S. R. Quackenbush, T. P. Barnwell, and M. A. Clements,
%       Objective Measures of Speech Quality.  Prentice Hall
%       Advanced Reference Series, Englewood Cliffs, NJ, 1988,
%       ISBN: 0-13-629056-6.

%
%     [2] B.-H. Juang, "On Using the Itakura-Saito Measures for
%           Speech Coder Performance Evaluation", AT&T Bell
%       Laboratories Technical Journal, Vol. 63, No. 8,
%       October 1984, pp. 1477-1498.

%
%  Authors: Bryan L. Pellom and John H. L. Hansen (July 1998)
%  Modified by: Philipos C. Loizou  (Oct 2006) - limited IS to be in [0,100]
%
% Copyright (c) 2006 by Philipos C. Loizou
% $Revision: 0.0$  $Date: 10/09/2006$

% ----------------------------------------------------------------------

if nargin~=2
fprintf('USAGE: IS=comp_is(cleanFile.wav, enhancedFile.wav)\n');
fprintf('For more help, type: help comp_is\n\n');
return;
end

alpha=0.95;

[data1, fs1]= audioread( cleanFile);
[data2, fs2]= audioread( enhancedFile);
if ( fs1~= fs2)
error( 'The sampling rates of the two files do not match!\n');
end

len= min( length( data1), length( data2));
data1= data1( 1: len)+eps;
data2= data2( 1: len)+eps;

IS_dist= is( data1, data2,fs1);

IS_len= round( length( IS_dist)* alpha);
IS= sort( IS_dist);

is_mean= mean( IS( 1: IS_len));
end

function distortion = is(clean_speech, processed_speech,sample_rate)

% ----------------------------------------------------------------------
% Check the length of the clean and processed speech.  Must be the same.

% ----------------------------------------------------------------------

clean_length      = length(clean_speech);
processed_length  = length(processed_speech);

if (clean_length ~= processed_length)
disp('Error: Both Speech Files must be same length.');
return
end

% ----------------------------------------------------------------------
% Scale both clean speech and processed speech to have same dynamic
% range.  Also remove DC component from each signal
% ----------------------------------------------------------------------

%clean_speech     = clean_speech     - mean(clean_speech);
%processed_speech = processed_speech - mean(processed_speech);

%processed_speech = processed_speech.*(max(abs(clean_speech))/ max(abs(processed_speech)));

% ----------------------------------------------------------------------
% Global Variables
% ----------------------------------------------------------------------

%sample_rate = 8000;           % default sample rate
winlength   = round(30*sample_rate/1000); %240;        % window length in samples
skiprate    = floor(winlength/4);          % window skip in samples
if sample_rate<10000
P           = 10;           % LPC Analysis Order
else
P=16;     % this could vary depending on sampling frequency.

end
% ----------------------------------------------------------------------
% For each frame of input speech, calculate the Itakura-Saito Measure
% ----------------------------------------------------------------------

num_frames = clean_length/skiprate-(winlength/skiprate); % number of frames
start      = 1;                 % starting sample
window     = 0.5*(1 - cos(2*pi*(1:winlength)'/(winlength+1)));

for frame_count = 1:num_frames

% ----------------------------------------------------------
% (1) Get the Frames for the test and reference speech.

%     Multiply by Hanning Window.

% ----------------------------------------------------------

clean_frame = clean_speech(start:start+winlength-1);
processed_frame = processed_speech(start:start+winlength-1);
clean_frame = clean_frame.*window;
processed_frame = processed_frame.*window;

% ----------------------------------------------------------
% (2) Get the autocorrelation lags and LPC parameters used
%     to compute the IS measure.

% ----------------------------------------------------------

[R_clean, Ref_clean, A_clean] = lpcoeff(clean_frame, P);
[R_processed, Ref_processed, A_processed] = lpcoeff(processed_frame, P);

% ----------------------------------------------------------
% (3) Compute the IS measure
% ----------------------------------------------------------

numerator      = A_processed*toeplitz(R_clean)*A_processed';
denominator    = max(A_clean*toeplitz(R_clean)*A_clean',eps);
gain_clean     = max(R_clean*A_clean',eps);        % this is gain
gain_processed = max(R_processed*A_processed',eps); % squared (sigma^2)

ISvalue=(gain_clean/gain_processed)*(numerator/denominator) + ...
log(gain_processed/gain_clean)-1;

distortion(frame_count) = min(ISvalue,100);
start = start + skiprate;

end
end

function [acorr, refcoeff, lpparams] = lpcoeff(speech_frame, model_order)

% ----------------------------------------------------------
% (1) Compute Autocorrelation Lags
% ----------------------------------------------------------

winlength = max(size(speech_frame));
for k=1:model_order+1
R(k) = sum(speech_frame(1:winlength-k+1) ...
.*speech_frame(k:winlength));
end

% ----------------------------------------------------------
% (2) Levinson-Durbin
% ----------------------------------------------------------

a = ones(1,model_order);
E(1)=R(1);
for i=1:model_order
a_past(1:i-1) = a(1:i-1);
sum_term = sum(a_past(1:i-1).*R(i:-1:2));
rcoeff(i)=(R(i+1) - sum_term) / E(i);
a(i)=rcoeff(i);
a(1:i-1) = a_past(1:i-1) - rcoeff(i).*a_past(i-1:-1:1);
E(i+1)=(1-rcoeff(i)*rcoeff(i))*E(i);
end

acorr    = R;
refcoeff = rcoeff;
lpparams = [1 -a];
end


## Cepstrum Distance Measure

The cepstrum coefficients can be computed recursively from the LPC coefficients:

$$c\left(m\right)=a_{m}+\sum_{k=1}^{m-1}\frac{k}{m}c\left(k\right)a_{m-k},\qquad 1\le m\le p$$

where $a_{j}$ denotes the LPC coefficients.


Then a measure of cepstrum distance can be expressed as

$$d_{cep}\left(\mathbf{c}_{x},\bar{\mathbf{c}}_{\hat{x}}\right)=\frac{10}{\ln 10}\sqrt{2\sum_{k=1}^{p}\left[c_{x}\left(k\right)-c_{\hat{x}}\left(k\right)\right]^{2}}$$
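The recursion and the distance above translate directly into a few lines of NumPy. This is an illustrative sketch (function names are mine, not from the toolbox); the cap at 10 matches the MATLAB code that follows.

```python
import numpy as np

def lpc_to_cepstrum(alpha):
    """Cepstrum recursion c(m) = a_m + sum_{k=1}^{m-1} (k/m) c(k) a_{m-k}, 1 <= m <= p.
    `alpha` holds the LPC coefficients a_1..a_p."""
    p = len(alpha)
    c = np.zeros(p)
    for m in range(1, p + 1):
        s = sum((k / m) * c[k - 1] * alpha[m - k - 1] for k in range(1, m))
        c[m - 1] = alpha[m - 1] + s
    return c

def cepstrum_distance(c_clean, c_enh, cap=10.0):
    """d_cep = (10 / ln 10) * sqrt(2 * sum_k (c_x(k) - c_xhat(k))^2), capped at 10."""
    d = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum((c_clean - c_enh) ** 2))
    return min(cap, d)
```

For example, with $p=2$ and $(a_1,a_2)=(0.5,0.3)$ the recursion gives $c(1)=0.5$ and $c(2)=0.3+\tfrac{1}{2}\cdot 0.5\cdot 0.5=0.425$.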

## Code

function cep_mean= comp_cep(cleanFile, enhancedFile);

% ----------------------------------------------------------------------
%          Cepstrum Distance Objective Speech Quality Measure
%
%   This function implements the cepstrum distance measure used
%   in [1]
%
%   Usage:  CEP=comp_cep(cleanFile.wav, enhancedFile.wav)
%
%         cleanFile.wav - clean input file in .wav format
%         enhancedFile  - enhanced output file in .wav format
%         CEP           - computed cepstrum distance measure
%
%         Note that the cepstrum measure is limited in the range [0, 10].

%
%  Example call:  CEP =comp_cep('sp04.wav','enhanced.wav')
%
%
%  References:
%
%     [1]   Kitawaki, N., Nagabuchi, H., and Itoh, K. (1988). Objective quality
%           evaluation for low bit-rate speech coding systems. IEEE J. Select.

%           Areas in Comm., 6(2), 262-273.

%
%  Author: Philipos C. Loizou
%  (LPC routines were written by Bryan Pellom & John Hansen)
%
% Copyright (c) 2006 by Philipos C. Loizou
% $Revision: 0.0$  $Date: 10/09/2006$

% ----------------------------------------------------------------------
if nargin~=2
fprintf('USAGE: CEP=comp_cep(cleanFile.wav, enhancedFile.wav)\n');
fprintf('For more help, type: help comp_cep\n\n');
return;
end

alpha=0.95;

[data1, fs1]= audioread( cleanFile);
[data2, fs2]= audioread( enhancedFile);
if ( fs1~= fs2)
error( 'The sampling rates of the two files do not match!\n');
end

len= min( length( data1), length( data2));
data1= data1( 1: len)+eps;
data2= data2( 1: len)+eps;

IS_dist= cepstrum( data1, data2,fs1);

IS_len= round( length( IS_dist)* alpha);
IS= sort( IS_dist);

cep_mean= mean( IS( 1: IS_len));
end

function distortion = cepstrum(clean_speech, processed_speech,sample_rate)

% ----------------------------------------------------------------------
% Check the length of the clean and processed speech.  Must be the same.

% ----------------------------------------------------------------------

clean_length      = length(clean_speech);
processed_length  = length(processed_speech);

if (clean_length ~= processed_length)
disp('Error: Both Speech Files must be same length.');
return
end

% ----------------------------------------------------------------------
% Scale both clean speech and processed speech to have same dynamic
% range.  Also remove DC component from each signal
% ----------------------------------------------------------------------

%clean_speech     = clean_speech     - mean(clean_speech);
%processed_speech = processed_speech - mean(processed_speech);

%processed_speech = processed_speech.*(max(abs(clean_speech))/ max(abs(processed_speech)));

% ----------------------------------------------------------------------
% Global Variables
% ----------------------------------------------------------------------

winlength   = round(30*sample_rate/1000); %240;        % window length in samples
skiprate    = floor(winlength/4);          % window skip in samples
if sample_rate<10000
P           = 10;           % LPC Analysis Order
else
P=16;     % this could vary depending on sampling frequency.

end
C=10*sqrt(2)/log(10);
% ----------------------------------------------------------------------
% For each frame of input speech, calculate the Cepstrum Distance Measure
% ----------------------------------------------------------------------

num_frames = clean_length/skiprate-(winlength/skiprate); % number of frames
start      = 1;                 % starting sample
window     = 0.5*(1 - cos(2*pi*(1:winlength)'/(winlength+1)));

for frame_count = 1:num_frames

% ----------------------------------------------------------
% (1) Get the Frames for the test and reference speech.

%     Multiply by Hanning Window.

% ----------------------------------------------------------

clean_frame = clean_speech(start:start+winlength-1);
processed_frame = processed_speech(start:start+winlength-1);
clean_frame = clean_frame.*window;
processed_frame = processed_frame.*window;

% ----------------------------------------------------------
% (2) Get the autocorrelation lags and LPC parameters used
%     to compute the cepstrum distance measure.

% ----------------------------------------------------------

[R_clean, Ref_clean, A_clean] = lpcoeff(clean_frame, P);
[R_processed, Ref_processed, A_processed] = lpcoeff(processed_frame, P);

C_clean=lpc2cep(A_clean);
C_processed=lpc2cep(A_processed);

% ----------------------------------------------------------
% (3) Compute the cepstrum-distance measure
% ----------------------------------------------------------

distortion(frame_count) = min(10,C*norm(C_clean-C_processed,2));

start = start + skiprate;

end
end

function [acorr, refcoeff, lpparams] = lpcoeff(speech_frame, model_order)

% ----------------------------------------------------------
% (1) Compute Autocorrelation Lags
% ----------------------------------------------------------

winlength = max(size(speech_frame));
for k=1:model_order+1
R(k) = sum(speech_frame(1:winlength-k+1) ...
.*speech_frame(k:winlength));
end

% ----------------------------------------------------------
% (2) Levinson-Durbin
% ----------------------------------------------------------

a = ones(1,model_order);
E(1)=R(1);
for i=1:model_order
a_past(1:i-1) = a(1:i-1);
sum_term = sum(a_past(1:i-1).*R(i:-1:2));
rcoeff(i)=(R(i+1) - sum_term) / E(i);
a(i)=rcoeff(i);
a(1:i-1) = a_past(1:i-1) - rcoeff(i).*a_past(i-1:-1:1);
E(i+1)=(1-rcoeff(i)*rcoeff(i))*E(i);
end

acorr    = R;
refcoeff = rcoeff;
lpparams = [1 -a];
end
%----------------------------------------------
function [cep]=lpc2cep(a)
%
% converts prediction to cepstrum coefficients
%
% Author: Philipos C. Loizou

M=length(a);
cep=zeros(1,M-1);

cep(1)=-a(2);

for k=2:M-1
ix=1:k-1;
vec1=cep(ix).*a(k-1+1:-1:2).*ix;
cep(k)=-(a(k+1)+sum(vec1)/k);

end
end


## Weighted Spectral Slope (WSS) Measure


The measure is calculated based on the weighted difference between the spectral slopes of each band. First, the spectral slope of each frequency band is found, where the spectral slope can be expressed as

$$\left\{\begin{aligned} S_{x}\left(k\right)&=C_{x}\left(k+1\right)-C_{x}\left(k\right)\\ \bar{S}_{\hat{x}}\left(k\right)&=\bar{C}_{\hat{x}}\left(k+1\right)-\bar{C}_{\hat{x}}\left(k\right)\end{aligned}\right.$$

Each band is then weighted according to

$$W\left(k\right)=\frac{K_{\max}}{K_{\max}+C_{\max}-C_{x}\left(k\right)}\cdot\frac{K_{loc\max}}{K_{loc\max}+C_{loc\max}-C_{x}\left(k\right)}$$

where $C_{\max}$ is the largest log-spectral magnitude over all bands and $C_{loc\max}$ is the value of the nearest local peak.

Finally, the WSS distance is computed as

$$d_{WSS}\left(\mathbf{C}_{x},\bar{\mathbf{C}}_{\hat{x}}\right)=\sum_{k=1}^{36}W\left(k\right)\left(S_{x}\left(k\right)-\bar{S}_{\hat{x}}\left(k\right)\right)^{2}$$
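The per-frame computation can be sketched in NumPy. This is a deliberately simplified illustration (the name `wss_frame` is mine): it substitutes the global band maximum for the nearest-local-peak term $C_{loc\max}$, whereas the MATLAB code below searches left or right along the slope for the actual local peak.

```python
import numpy as np

def wss_frame(C_clean, C_enh, Kmax=20.0, Klocmax=1.0):
    """Simplified per-frame WSS: band log-spectra -> slopes, Klatt-style weights,
    weighted squared slope difference, normalized by the weight sum."""
    S_c = np.diff(C_clean)   # spectral slopes of the clean bands
    S_e = np.diff(C_enh)     # spectral slopes of the enhanced bands
    def weights(C):
        Cmax = C.max()
        Wmax = Kmax / (Kmax + Cmax - C[:-1])
        Wloc = Klocmax / (Klocmax + Cmax - C[:-1])  # global peak stands in for local peak
        return Wmax * Wloc
    # average the clean-frame and enhanced-frame weighting factors, as in the code
    W = 0.5 * (weights(C_clean) + weights(C_enh))
    return float(np.sum(W * (S_c - S_e) ** 2) / np.sum(W))
```

Identical band spectra give a distance of exactly 0; any slope mismatch yields a positive value.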

## Code

function wss_dist= comp_wss(cleanFile, enhancedFile);
% ----------------------------------------------------------------------
%
%     Weighted Spectral Slope (WSS) Objective Speech Quality Measure
%
%     This function implements the Weighted Spectral Slope (WSS)
%     distance measure originally proposed in [1].  The algorithm
%     works by first decomposing the speech signal into a set of
%     frequency bands (this is done for both the test and reference
%     frame).  The intensities within each critical band are
%     measured.  Then, a weighted distances between the measured
%     slopes of the log-critical band spectra are computed.

%     This measure is also described in Section 2.2.9 (pages 56-58)
%     of [2].

%
%     Whereas Klatt's original measure used 36 critical-band
%     filters to estimate the smoothed short-time spectrum, this
%     implementation considers a bank of 25 filters spanning
%     the 4 kHz bandwidth.

%
%   Usage:  wss_dist=comp_wss(cleanFile.wav, enhancedFile.wav)
%
%         cleanFile.wav - clean input file in .wav format
%         enhancedFile  - enhanced output file in .wav format
%         wss_dist      - computed spectral slope distance
%
%  Example call:  ws =comp_wss('sp04.wav','enhanced.wav')
%
%  References:
%
%     [1] D. H. Klatt, "Prediction of Perceived Phonetic Distance
%       from Critical-Band Spectra: A First Step", Proc. IEEE
%       ICASSP'82, Volume 2, pp. 1278-1281, May, 1982.

%
%     [2] S. R. Quackenbush, T. P. Barnwell, and M. A. Clements,
%       Objective Measures of Speech Quality.  Prentice Hall
%       Advanced Reference Series, Englewood Cliffs, NJ, 1988,
%       ISBN: 0-13-629056-6.

%
%  Authors: Bryan L. Pellom and John H. L. Hansen (July 1998)
%  Modified by: Philipos C. Loizou  (Oct 2006)
%
% Copyright (c) 2006 by Philipos C. Loizou
% $Revision: 0.0$  $Date: 10/09/2006$
%
% ----------------------------------------------------------------------
if nargin~=2
fprintf('USAGE: WSS=comp_wss(cleanFile.wav, enhancedFile.wav)\n');
fprintf('For more help, type: help comp_wss\n\n');
return;
end

alpha= 0.95;

[data1, fs1]= audioread( cleanFile);
[data2, fs2]= audioread( enhancedFile);
if ( fs1~= fs2)
error( 'The sampling rates of the two files do not match!\n');
end

len= min( length( data1), length( data2));
data1= data1( 1: len)+eps;
data2= data2( 1: len)+eps;

wss_dist_vec= wss( data1, data2,fs1);
wss_dist_vec= sort( wss_dist_vec);
wss_dist= mean( wss_dist_vec( 1: round( length( wss_dist_vec)*alpha)));

function distortion = wss(clean_speech, processed_speech,sample_rate)

% ----------------------------------------------------------------------
% Check the length of the clean and processed speech.  Must be the same.

% ----------------------------------------------------------------------

clean_length      = length(clean_speech);
processed_length  = length(processed_speech);

if (clean_length ~= processed_length)
disp('Error: Files must have same length.');
return
end

% ----------------------------------------------------------------------
% Global Variables
% ----------------------------------------------------------------------

winlength   = round(30*sample_rate/1000);      % window length in samples
skiprate    = floor(winlength/4);          % window skip in samples
max_freq    = sample_rate/2;       % maximum bandwidth
num_crit    = 25;          % number of critical bands

USE_FFT_SPECTRUM = 1;          % 1 = FFT spectrum, 0 = 10th order LP spectrum
n_fft       = 2^nextpow2(2*winlength);
n_fftby2    = n_fft/2;         % FFT size/2
Kmax        = 20;          % value suggested by Klatt, pg 1280
Klocmax     = 1;           % value suggested by Klatt, pg 1280

% ----------------------------------------------------------------------
% Critical Band Filter Definitions (Center Frequency and Bandwidths in Hz)
% ----------------------------------------------------------------------

cent_freq(1)  = 50.0000;   bandwidth(1)  = 70.0000;
cent_freq(2)  = 120.000;   bandwidth(2)  = 70.0000;
cent_freq(3)  = 190.000;   bandwidth(3)  = 70.0000;
cent_freq(4)  = 260.000;   bandwidth(4)  = 70.0000;
cent_freq(5)  = 330.000;   bandwidth(5)  = 70.0000;
cent_freq(6)  = 400.000;   bandwidth(6)  = 70.0000;
cent_freq(7)  = 470.000;   bandwidth(7)  = 70.0000;
cent_freq(8)  = 540.000;   bandwidth(8)  = 77.3724;
cent_freq(9)  = 617.372;   bandwidth(9)  = 86.0056;
cent_freq(10) = 703.378;   bandwidth(10) = 95.3398;
cent_freq(11) = 798.717;   bandwidth(11) = 105.411;
cent_freq(12) = 904.128;   bandwidth(12) = 116.256;
cent_freq(13) = 1020.38;   bandwidth(13) = 127.914;
cent_freq(14) = 1148.30;   bandwidth(14) = 140.423;
cent_freq(15) = 1288.72;   bandwidth(15) = 153.823;
cent_freq(16) = 1442.54;   bandwidth(16) = 168.154;
cent_freq(17) = 1610.70;   bandwidth(17) = 183.457;
cent_freq(18) = 1794.16;   bandwidth(18) = 199.776;
cent_freq(19) = 1993.93;   bandwidth(19) = 217.153;
cent_freq(20) = 2211.08;   bandwidth(20) = 235.631;
cent_freq(21) = 2446.71;   bandwidth(21) = 255.255;
cent_freq(22) = 2701.97;   bandwidth(22) = 276.072;
cent_freq(23) = 2978.04;   bandwidth(23) = 298.126;
cent_freq(24) = 3276.17;   bandwidth(24) = 321.465;
cent_freq(25) = 3597.63;   bandwidth(25) = 346.136;

bw_min      = bandwidth (1);       % minimum critical bandwidth

% ----------------------------------------------------------------------
% Set up the critical band filters.  Note here that Gaussianly shaped
% filters are used.  Also, the sum of the filter weights are equivalent
% for each critical band filter.  Filter less than -30 dB and set to
% zero.

% ----------------------------------------------------------------------

min_factor = exp (-30.0 / (2.0 * 2.303));       % -30 dB point of filter

for i = 1:num_crit
f0 = (cent_freq (i) / max_freq) * (n_fftby2);
all_f0(i) = floor(f0);
bw = (bandwidth (i) / max_freq) * (n_fftby2);
norm_factor = log(bw_min) - log(bandwidth(i));
j = 0:1:n_fftby2-1;
crit_filter(i,:) = exp (-11 *(((j - floor(f0)) ./bw).^2) + norm_factor);
crit_filter(i,:) = crit_filter(i,:).*(crit_filter(i,:) > min_factor);
end

% ----------------------------------------------------------------------
% For each frame of input speech, calculate the Weighted Spectral
% Slope Measure
% ----------------------------------------------------------------------

num_frames = clean_length/skiprate-(winlength/skiprate); % number of frames
start      = 1;                 % starting sample
window     = 0.5*(1 - cos(2*pi*(1:winlength)'/(winlength+1)));

for frame_count = 1:num_frames

% ----------------------------------------------------------
% (1) Get the Frames for the test and reference speech.

%     Multiply by Hanning Window.

% ----------------------------------------------------------

clean_frame = clean_speech(start:start+winlength-1);
processed_frame = processed_speech(start:start+winlength-1);
clean_frame = clean_frame.*window;
processed_frame = processed_frame.*window;

% ----------------------------------------------------------
% (2) Compute the Power Spectrum of Clean and Processed
% ----------------------------------------------------------

if (USE_FFT_SPECTRUM)
clean_spec     = (abs(fft(clean_frame,n_fft)).^2);
processed_spec = (abs(fft(processed_frame,n_fft)).^2);
else
a_vec = zeros(1,n_fft);
a_vec(1:11) = lpc(clean_frame,10);
clean_spec     = 1.0/(abs(fft(a_vec,n_fft)).^2)';

a_vec = zeros(1,n_fft);
a_vec(1:11) = lpc(processed_frame,10);
processed_spec = 1.0/(abs(fft(a_vec,n_fft)).^2)';
end

% ----------------------------------------------------------
% (3) Compute Filterbank Output Energies (in dB scale)
% ----------------------------------------------------------

for i = 1:num_crit
clean_energy(i) = sum(clean_spec(1:n_fftby2) ...
.*crit_filter(i,:)');
processed_energy(i) = sum(processed_spec(1:n_fftby2) ...
.*crit_filter(i,:)');
end
clean_energy = 10*log10(max(clean_energy,1E-10));
processed_energy = 10*log10(max(processed_energy,1E-10));

% ----------------------------------------------------------
% (4) Compute Spectral Slope (dB[i+1]-dB[i])
% ----------------------------------------------------------

clean_slope     = clean_energy(2:num_crit) - ...
clean_energy(1:num_crit-1);
processed_slope = processed_energy(2:num_crit) - ...
processed_energy(1:num_crit-1);

% ----------------------------------------------------------
% (5) Find the nearest peak locations in the spectra to
%     each critical band.  If the slope is negative, we
%     search to the left.  If positive, we search to the
%     right.

% ----------------------------------------------------------

for i = 1:num_crit-1

% find the peaks in the clean speech signal

if (clean_slope(i)>0)        % search to the right
n = i;
while ((n<num_crit) & (clean_slope(n) > 0))
n = n+1;
end
clean_loc_peak(i) = clean_energy(n-1);
else             % search to the left
n = i;
while ((n>0) & (clean_slope(n) <= 0))
n = n-1;
end
clean_loc_peak(i) = clean_energy(n+1);
end

% find the peaks in the processed speech signal

if (processed_slope(i)>0)    % search to the right
n = i;
while ((n<num_crit) & (processed_slope(n) > 0))
n = n+1;
end
processed_loc_peak(i) = processed_energy(n-1);
else             % search to the left
n = i;
while ((n>0) & (processed_slope(n) <= 0))
n = n-1;
end
processed_loc_peak(i) = processed_energy(n+1);
end

end

% ----------------------------------------------------------
%  (6) Compute the WSS Measure for this frame.  This
%      includes determination of the weighting function.

% ----------------------------------------------------------

dBMax_clean       = max(clean_energy);
dBMax_processed   = max(processed_energy);

% The weights are calculated by averaging individual
% weighting factors from the clean and processed frame.

% These weights W_clean and W_processed should range
% from 0 to 1 and place more emphasis on spectral
% peaks and less emphasis on slope differences in spectral
% valleys.  This procedure is described on page 1280 of
% Klatt's 1982 ICASSP paper.

Wmax_clean        = Kmax ./ (Kmax + dBMax_clean - ...
clean_energy(1:num_crit-1));
Wlocmax_clean     = Klocmax ./ ( Klocmax + clean_loc_peak - ...
clean_energy(1:num_crit-1));
W_clean           = Wmax_clean .* Wlocmax_clean;

Wmax_processed    = Kmax ./ (Kmax + dBMax_processed - ...
processed_energy(1:num_crit-1));
Wlocmax_processed = Klocmax ./ ( Klocmax + processed_loc_peak - ...
processed_energy(1:num_crit-1));
W_processed       = Wmax_processed .* Wlocmax_processed;

W = (W_clean + W_processed)./2.0;

distortion(frame_count) = sum(W.*(clean_slope(1:num_crit-1) - ...
processed_slope(1:num_crit-1)).^2);

% this normalization is not part of Klatt's paper, but helps
% to normalize the measure.  Here we scale the measure by the
% sum of the weights.

distortion(frame_count) = distortion(frame_count)/sum(W);

start = start + skiprate;

end


## Composite Measures

$$C_{sig}=3.093-1.029\cdot \text{LLR}+0.603\cdot \text{PESQ}-0.009\cdot \text{WSS}$$

$$C_{bak}=1.634+0.478\cdot \text{PESQ}-0.007\cdot \text{WSS}+0.063\cdot \text{segSNR}$$

$$C_{ovl}=1.594+0.805\cdot \text{PESQ}-0.512\cdot \text{LLR}-0.007\cdot \text{WSS}$$
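The three mappings are plain linear combinations of the basic measures followed by clipping to the MOS range [1, 5]. A minimal Python sketch (the function name `composite_scores` is my own; the coefficients are copied verbatim from the formulas above):

```python
import numpy as np

def composite_scores(llr, pesq, wss, seg_snr):
    """Map LLR, PESQ, WSS and segmental SNR to the composite
    ratings Csig, Cbak, Covl, clipped to the [1, 5] scale."""
    c_sig = 3.093 - 1.029 * llr + 0.603 * pesq - 0.009 * wss
    c_bak = 1.634 + 0.478 * pesq - 0.007 * wss + 0.063 * seg_snr
    c_ovl = 1.594 + 0.805 * pesq - 0.512 * llr - 0.007 * wss
    clip = lambda v: float(np.clip(v, 1.0, 5.0))       # limit to [1, 5]
    return clip(c_sig), clip(c_bak), clip(c_ovl)
```

As in the MATLAB code further below, the raw (not MOS-mapped) PESQ value should be used, since the regression was fitted on raw PESQ scores.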

## Code

function [Csig,Cbak,Covl]= composite(cleanFile, enhancedFile);
% ----------------------------------------------------------------------
%          Composite Objective Speech Quality Measure
%
%   This function implements the composite objective measure proposed in
%   [1].

%
%   Usage:  [sig,bak,ovl]=composite(cleanFile.wav, enhancedFile.wav)
%
%         cleanFile.wav - clean input file in .wav format
%         enhancedFile  - enhanced output file in .wav format
%         sig           - predicted rating [1-5] of speech distortion
%         bak           - predicted rating [1-5] of noise distortion
%         ovl           - predicted rating [1-5] of overall quality
%
%       In addition to the above ratings (sig, bak, & ovl) it returns
%       the individual values of the LLR, SNRseg, WSS and PESQ measures.

%
%  Example call:  [sig,bak,ovl] =composite('sp04.wav','enhanced.wav')
%
%
%  References:
%
%     [1]   Hu, Y. and Loizou, P. (2006). Evaluation of objective measures
%           for speech enhancement. Proc. Interspeech, Pittsburg, PA.

%
%   Authors: Yi Hu and Philipos C. Loizou
%   (the LLR, SNRseg and WSS measures were based on Bryan Pellom and John
%     Hansen's implementations)
%
% Copyright (c) 2006 by Philipos C. Loizou
% $Revision: 0.0$  $Date: 10/09/2006$

% ----------------------------------------------------------------------

if nargin~=2
fprintf('USAGE: [sig,bak,ovl]=composite(cleanFile.wav, enhancedFile.wav)\n');
fprintf('For more help, type: help composite\n\n');
return;
end

alpha= 0.95;

% read the two signals (the original code used wavread; audioread is
% its current replacement)
[data1, fs1]= audioread( cleanFile);
[data2, fs2]= audioread( enhancedFile);

if ( fs1~= fs2)
error( 'The two files do not match!\n');
end

len= min( length( data1), length( data2));
data1= data1( 1: len)+eps;
data2= data2( 1: len)+eps;

% -- compute the WSS measure ---
%
wss_dist_vec= wss( data1, data2,fs1);
wss_dist_vec= sort( wss_dist_vec);
wss_dist= mean( wss_dist_vec( 1: round( length( wss_dist_vec)*alpha)));

% --- compute the LLR measure ---------
%
LLR_dist= llr( data1, data2,fs1);
LLRs= sort(LLR_dist);
LLR_len= round( length(LLR_dist)* alpha);
llr_mean= mean( LLRs( 1: LLR_len));

% --- compute the SNRseg ----------------
%
[snr_dist, segsnr_dist]= snr( data1, data2,fs1);
snr_mean= snr_dist;
segSNR= mean( segsnr_dist);

% -- compute the pesq ----
%
% if     Srate1==8000,    mode='nb';
% elseif Srate1 == 16000, mode='wb';
% else,
%      error ('Sampling freq in PESQ needs to be 8 kHz or 16 kHz');
% end

[pesq_mos_scores]= pesq(cleanFile, enhancedFile);

if length(pesq_mos_scores)==2
pesq_mos=pesq_mos_scores(1); % take the raw PESQ value instead of the
% MOS-mapped value (this composite
% measure was only validated with the raw
% PESQ value)
else
pesq_mos=pesq_mos_scores;
end

% --- now compute the composite measures ------------------
%
Csig = 3.093 - 1.029*llr_mean + 0.603*pesq_mos-0.009*wss_dist;
Csig = max(1,Csig);  Csig=min(5, Csig); % limit values to [1, 5]
Cbak = 1.634 + 0.478 *pesq_mos - 0.007*wss_dist + 0.063*segSNR;
Cbak = max(1, Cbak); Cbak=min(5,Cbak); % limit values to [1, 5]
Covl = 1.594 + 0.805*pesq_mos - 0.512*llr_mean - 0.007*wss_dist;
Covl = max(1, Covl); Covl=min(5, Covl); % limit values to [1, 5]

fprintf('\n LLR=%f   SNRseg=%f   WSS=%f   PESQ=%f\n',llr_mean,segSNR,wss_dist,pesq_mos);

return; %=================================================================

function distortion = wss(clean_speech, processed_speech,sample_rate)

% ----------------------------------------------------------------------
% Check the length of the clean and processed speech.  Must be the same.

% ----------------------------------------------------------------------

clean_length      = length(clean_speech);
processed_length  = length(processed_speech);

if (clean_length ~= processed_length)
disp('Error: Files must have the same length.');
return
end

% ----------------------------------------------------------------------
% Global Variables
% ----------------------------------------------------------------------

winlength   = round(30*sample_rate/1000); %240;        % window length in samples
skiprate    = floor(winlength/4);          % window skip in samples
max_freq    = sample_rate/2;       % maximum bandwidth
num_crit    = 25;          % number of critical bands

USE_FFT_SPECTRUM = 1;          % defaults to 10th order LP spectrum
n_fft       = 2^nextpow2(2*winlength);
n_fftby2    = n_fft/2;         % FFT size/2
Kmax        = 20;          % value suggested by Klatt, pg 1280
Klocmax     = 1;           % value suggested by Klatt, pg 1280

% ----------------------------------------------------------------------
% Critical Band Filter Definitions (Center Frequency and Bandwidths in Hz)
% ----------------------------------------------------------------------

cent_freq(1)  = 50.0000;   bandwidth(1)  = 70.0000;
cent_freq(2)  = 120.000;   bandwidth(2)  = 70.0000;
cent_freq(3)  = 190.000;   bandwidth(3)  = 70.0000;
cent_freq(4)  = 260.000;   bandwidth(4)  = 70.0000;
cent_freq(5)  = 330.000;   bandwidth(5)  = 70.0000;
cent_freq(6)  = 400.000;   bandwidth(6)  = 70.0000;
cent_freq(7)  = 470.000;   bandwidth(7)  = 70.0000;
cent_freq(8)  = 540.000;   bandwidth(8)  = 77.3724;
cent_freq(9)  = 617.372;   bandwidth(9)  = 86.0056;
cent_freq(10) = 703.378;   bandwidth(10) = 95.3398;
cent_freq(11) = 798.717;   bandwidth(11) = 105.411;
cent_freq(12) = 904.128;   bandwidth(12) = 116.256;
cent_freq(13) = 1020.38;   bandwidth(13) = 127.914;
cent_freq(14) = 1148.30;   bandwidth(14) = 140.423;
cent_freq(15) = 1288.72;   bandwidth(15) = 153.823;
cent_freq(16) = 1442.54;   bandwidth(16) = 168.154;
cent_freq(17) = 1610.70;   bandwidth(17) = 183.457;
cent_freq(18) = 1794.16;   bandwidth(18) = 199.776;
cent_freq(19) = 1993.93;   bandwidth(19) = 217.153;
cent_freq(20) = 2211.08;   bandwidth(20) = 235.631;
cent_freq(21) = 2446.71;   bandwidth(21) = 255.255;
cent_freq(22) = 2701.97;   bandwidth(22) = 276.072;
cent_freq(23) = 2978.04;   bandwidth(23) = 298.126;
cent_freq(24) = 3276.17;   bandwidth(24) = 321.465;
cent_freq(25) = 3597.63;   bandwidth(25) = 346.136;

bw_min      = bandwidth (1);       % minimum critical bandwidth

% ----------------------------------------------------------------------
% Set up the critical band filters.  Note that Gaussian-shaped
% filters are used.  Also, the sum of the filter weights is the same
% for each critical band filter.  Filter weights below the -30 dB
% point are set to zero.

% ----------------------------------------------------------------------

min_factor = exp (-30.0 / (2.0 * 2.303));       % -30 dB point of filter

for i = 1:num_crit
f0 = (cent_freq (i) / max_freq) * (n_fftby2);
all_f0(i) = floor(f0);
bw = (bandwidth (i) / max_freq) * (n_fftby2);
norm_factor = log(bw_min) - log(bandwidth(i));
j = 0:1:n_fftby2-1;
crit_filter(i,:) = exp (-11 *(((j - floor(f0)) ./bw).^2) + norm_factor);
crit_filter(i,:) = crit_filter(i,:).*(crit_filter(i,:) > min_factor);
end
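The filter-bank construction loop can be mirrored in NumPy. This is a sketch rather than part of the original code (`critical_band_filters` is my own name); the constants follow the MATLAB listing above, including the -11 exponent and the -30 dB cutoff:

```python
import numpy as np

def critical_band_filters(cent_freq, bandwidth, max_freq, n_fftby2):
    """Gaussian-shaped critical-band filters on the FFT bin grid,
    zeroed below the -30 dB point, as in the MATLAB loop above."""
    bw_min = bandwidth[0]
    min_factor = np.exp(-30.0 / (2.0 * 2.303))       # -30 dB point
    j = np.arange(n_fftby2)
    filters = np.zeros((len(cent_freq), n_fftby2))
    for i, (cf, bw_hz) in enumerate(zip(cent_freq, bandwidth)):
        f0 = np.floor(cf / max_freq * n_fftby2)      # center bin
        bw = bw_hz / max_freq * n_fftby2             # bandwidth in bins
        norm = np.log(bw_min) - np.log(bw_hz)        # equalize filter areas
        h = np.exp(-11.0 * ((j - f0) / bw) ** 2 + norm)
        filters[i] = h * (h > min_factor)            # truncate at -30 dB
    return filters
```

With the 25 center frequencies and bandwidths tabulated above, each row of the returned matrix is one critical-band weighting applied to the power spectrum.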

% ----------------------------------------------------------------------
% For each frame of input speech, calculate the Weighted Spectral
% Slope Measure
% ----------------------------------------------------------------------

num_frames = clean_length/skiprate-(winlength/skiprate); % number of frames
start      = 1;                 % starting sample
window     = 0.5*(1 - cos(2*pi*(1:winlength)'/(winlength+1)));

for frame_count = 1:num_frames

% ----------------------------------------------------------
% (1) Get the Frames for the test and reference speech.

%     Multiply by Hanning Window.

% ----------------------------------------------------------

clean_frame = clean_speech(start:start+winlength-1);
processed_frame = processed_speech(start:start+winlength-1);
clean_frame = clean_frame.*window;
processed_frame = processed_frame.*window;

% ----------------------------------------------------------
% (2) Compute the Power Spectrum of Clean and Processed
% ----------------------------------------------------------

if (USE_FFT_SPECTRUM)
clean_spec     = (abs(fft(clean_frame,n_fft)).^2);
processed_spec = (abs(fft(processed_frame,n_fft)).^2);
else
a_vec = zeros(1,n_fft);
a_vec(1:11) = lpc(clean_frame,10);
clean_spec     = 1.0/(abs(fft(a_vec,n_fft)).^2)';

a_vec = zeros(1,n_fft);
a_vec(1:11) = lpc(processed_frame,10);
processed_spec = 1.0/(abs(fft(a_vec,n_fft)).^2)';
end

% ----------------------------------------------------------
% (3) Compute Filterbank Output Energies (in dB scale)
% ----------------------------------------------------------

for i = 1:num_crit
clean_energy(i) = sum(clean_spec(1:n_fftby2) ...
.*crit_filter(i,:)');
processed_energy(i) = sum(processed_spec(1:n_fftby2) ...
.*crit_filter(i,:)');
end
clean_energy = 10*log10(max(clean_energy,1E-10));
processed_energy = 10*log10(max(processed_energy,1E-10));

% ----------------------------------------------------------
% (4) Compute Spectral Slope (dB[i+1]-dB[i])
% ----------------------------------------------------------

clean_slope     = clean_energy(2:num_crit) - ...
clean_energy(1:num_crit-1);
processed_slope = processed_energy(2:num_crit) - ...
processed_energy(1:num_crit-1);

% ----------------------------------------------------------
% (5) Find the nearest peak locations in the spectra to
%     each critical band.  If the slope is negative, we
%     search to the left.  If positive, we search to the
%     right.

% ----------------------------------------------------------

for i = 1:num_crit-1

% find the peaks in the clean speech signal

if (clean_slope(i)>0)        % search to the right
n = i;
while ((n<num_crit) & (clean_slope(n) > 0))
n = n+1;
end
clean_loc_peak(i) = clean_energy(n-1);
else             % search to the left
n = i;
while ((n>0) & (clean_slope(n) <= 0))
n = n-1;
end
clean_loc_peak(i) = clean_energy(n+1);
end

% find the peaks in the processed speech signal

if (processed_slope(i)>0)    % search to the right
n = i;
while ((n<num_crit) & (processed_slope(n) > 0))
n = n+1;
end
processed_loc_peak(i) = processed_energy(n-1);
else             % search to the left
n = i;
while ((n>0) & (processed_slope(n) <= 0))
n = n-1;
end
processed_loc_peak(i) = processed_energy(n+1);
end

end

% ----------------------------------------------------------
%  (6) Compute the WSS Measure for this frame.  This
%      includes determination of the weighting function.

% ----------------------------------------------------------

dBMax_clean       = max(clean_energy);
dBMax_processed   = max(processed_energy);

% The weights are calculated by averaging individual
% weighting factors from the clean and processed frame.

% These weights W_clean and W_processed should range
% from 0 to 1 and place more emphasis on spectral
% peaks and less emphasis on slope differences in spectral
% valleys.  This procedure is described on page 1280 of
% Klatt's 1982 ICASSP paper.

Wmax_clean        = Kmax ./ (Kmax + dBMax_clean - ...
clean_energy(1:num_crit-1));
Wlocmax_clean     = Klocmax ./ ( Klocmax + clean_loc_peak - ...
clean_energy(1:num_crit-1));
W_clean           = Wmax_clean .* Wlocmax_clean;

Wmax_processed    = Kmax ./ (Kmax + dBMax_processed - ...
processed_energy(1:num_crit-1));
Wlocmax_processed = Klocmax ./ ( Klocmax + processed_loc_peak - ...
processed_energy(1:num_crit-1));
W_processed       = Wmax_processed .* Wlocmax_processed;

W = (W_clean + W_processed)./2.0;

distortion(frame_count) = sum(W.*(clean_slope(1:num_crit-1) - ...
processed_slope(1:num_crit-1)).^2);

% this normalization is not part of Klatt's paper, but helps
% to normalize the measure.  Here we scale the measure by the
% sum of the weights.

distortion(frame_count) = distortion(frame_count)/sum(W);

start = start + skiprate;

end
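Steps (4)-(6) of the `wss` routine — spectral slopes, nearest-peak search, Klatt weighting, and the weight-normalized frame distortion — can be condensed into one Python sketch. `wss_frame` is my own helper; the peak-search indexing deliberately mirrors the MATLAB loop, including its exit conditions:

```python
import numpy as np

def wss_frame(clean_db, proc_db, Kmax=20.0, Klocmax=1.0):
    """Weighted spectral slope distortion for one frame, given the
    critical-band energies (dB) of the clean and processed signals."""
    e_c = np.asarray(clean_db, dtype=float)
    e_p = np.asarray(proc_db, dtype=float)

    def slopes_and_peaks(e):
        s = np.diff(e)                        # dB[i+1] - dB[i]
        peaks = np.empty(len(s))
        for i in range(len(s)):
            n = i
            if s[i] > 0:                      # rising: search right
                while n < len(s) and s[n] > 0:
                    n += 1
                peaks[i] = e[n - 1]
            else:                             # falling: search left
                while n >= 0 and s[n] <= 0:
                    n -= 1
                peaks[i] = e[n + 1]
        return s, peaks

    s_c, p_c = slopes_and_peaks(e_c)
    s_p, p_p = slopes_and_peaks(e_p)
    w_c = (Kmax / (Kmax + e_c.max() - e_c[:-1])) * \
          (Klocmax / (Klocmax + p_c - e_c[:-1]))
    w_p = (Kmax / (Kmax + e_p.max() - e_p[:-1])) * \
          (Klocmax / (Klocmax + p_p - e_p[:-1]))
    w = (w_c + w_p) / 2.0                     # average clean/processed weights
    return float(np.sum(w * (s_c - s_p) ** 2) / np.sum(w))
```

Identical band energies give zero distortion; the division by `sum(w)` is the same normalization the MATLAB code notes is not part of Klatt's paper.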

%-----------------------------------------------
function distortion = llr(clean_speech, processed_speech,sample_rate)

% ----------------------------------------------------------------------
% Check the length of the clean and processed speech.  Must be the same.

% ----------------------------------------------------------------------

clean_length      = length(clean_speech);
processed_length  = length(processed_speech);

if (clean_length ~= processed_length)
disp('Error: Both Speech Files must be same length.');
return
end

% ----------------------------------------------------------------------
% Global Variables
% ----------------------------------------------------------------------

winlength   = round(30*sample_rate/1000); %  window length in samples
skiprate    = floor(winlength/4);          % window skip in samples
if sample_rate<10000
P           = 10;           % LPC Analysis Order
else
P=16;     % this could vary depending on sampling frequency.

end

% ----------------------------------------------------------------------
% For each frame of input speech, calculate the Log Likelihood Ratio
% ----------------------------------------------------------------------

num_frames = clean_length/skiprate-(winlength/skiprate); % number of frames
start      = 1;                 % starting sample
window     = 0.5*(1 - cos(2*pi*(1:winlength)'/(winlength+1)));

for frame_count = 1:num_frames

% ----------------------------------------------------------
% (1) Get the Frames for the test and reference speech.

%     Multiply by Hanning Window.

% ----------------------------------------------------------

clean_frame = clean_speech(start:start+winlength-1);
processed_frame = processed_speech(start:start+winlength-1);
clean_frame = clean_frame.*window;
processed_frame = processed_frame.*window;

% ----------------------------------------------------------
% (2) Get the autocorrelation lags and LPC parameters used
%     to compute the LLR measure.

% ----------------------------------------------------------

[R_clean, Ref_clean, A_clean] = lpcoeff(clean_frame, P);
[R_processed, Ref_processed, A_processed] = lpcoeff(processed_frame, P);

% ----------------------------------------------------------
% (3) Compute the LLR measure
% ----------------------------------------------------------

numerator   = A_processed*toeplitz(R_clean)*A_processed';
denominator = A_clean*toeplitz(R_clean)*A_clean';
distortion(frame_count) = log(numerator/denominator);
start = start + skiprate;

end

%---------------------------------------------
function [acorr, refcoeff, lpparams] = lpcoeff(speech_frame, model_order)

% ----------------------------------------------------------
% (1) Compute Autocorrelation Lags
% ----------------------------------------------------------

winlength = max(size(speech_frame));
for k=1:model_order+1
R(k) = sum(speech_frame(1:winlength-k+1) ...
.*speech_frame(k:winlength));
end

% ----------------------------------------------------------
% (2) Levinson-Durbin
% ----------------------------------------------------------

a = ones(1,model_order);
E(1)=R(1);
for i=1:model_order
a_past(1:i-1) = a(1:i-1);
sum_term = sum(a_past(1:i-1).*R(i:-1:2));
rcoeff(i)=(R(i+1) - sum_term) / E(i);
a(i)=rcoeff(i);
a(1:i-1) = a_past(1:i-1) - rcoeff(i).*a_past(i-1:-1:1);
E(i+1)=(1-rcoeff(i)*rcoeff(i))*E(i);
end

acorr    = R;
refcoeff = rcoeff;
lpparams = [1 -a];
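The `lpcoeff` helper implements the autocorrelation method via a Levinson-Durbin recursion, and step (3) of `llr` evaluates the two Toeplitz quadratic forms from the $d_{LLR}$ definition. A NumPy sketch under my own names `lp_coeff`/`llr_frame` (not the original API):

```python
import numpy as np

def lp_coeff(frame, order):
    """Autocorrelation-method LPC via Levinson-Durbin, mirroring
    lpcoeff(): returns the autocorrelation lags R and [1, -a]."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    R = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order)
    E = R[0]
    for i in range(order):
        k = (R[i + 1] - np.dot(a[:i], R[i:0:-1])) / E   # reflection coeff
        a[:i] = a[:i] - k * a[:i][::-1]                 # update predictor
        a[i] = k
        E *= 1.0 - k * k                                # residual energy
    return R, np.concatenate(([1.0], -a))

def llr_frame(clean_frame, proc_frame, order=10):
    """Per-frame d_LLR = log(a_p' R_x a_p / a_c' R_x a_c), with R_x the
    Toeplitz autocorrelation matrix of the clean frame."""
    R_c, a_c = lp_coeff(clean_frame, order)
    _, a_p = lp_coeff(proc_frame, order)
    idx = np.abs(np.subtract.outer(np.arange(order + 1), np.arange(order + 1)))
    Rx = R_c[idx]                                       # Toeplitz matrix
    return float(np.log((a_p @ Rx @ a_p) / (a_c @ Rx @ a_c)))
```

Because the clean-signal LPC vector minimizes the quadratic form over monic polynomials of that order, the ratio is at least 1 and the per-frame LLR is non-negative, zero for identical frames.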

% ----------------------------------------------------------------------

function [overall_snr, segmental_snr] = snr(clean_speech, processed_speech,sample_rate)

% ----------------------------------------------------------------------
% Check the length of the clean and processed speech.  Must be the same.

% ----------------------------------------------------------------------

clean_length      = length(clean_speech);
processed_length  = length(processed_speech);

if (clean_length ~= processed_length)
disp('Error: Both Speech Files must be same length.');
return
end

% ----------------------------------------------------------------------
% Scale both clean speech and processed speech to have same dynamic
% range.  Also remove DC component from each signal
% ----------------------------------------------------------------------

%clean_speech     = clean_speech     - mean(clean_speech);
%processed_speech = processed_speech - mean(processed_speech);

%processed_speech = processed_speech.*(max(abs(clean_speech))/ max(abs(processed_speech)));

overall_snr = 10* log10( sum(clean_speech.^2)/sum((clean_speech-processed_speech).^2));

% ----------------------------------------------------------------------
% Global Variables
% ----------------------------------------------------------------------

winlength   = round(30*sample_rate/1000); %240;        % window length in samples
skiprate    = floor(winlength/4);          % window skip in samples
MIN_SNR     = -10;         % minimum SNR in dB
MAX_SNR     =  35;         % maximum SNR in dB

% ----------------------------------------------------------------------
% For each frame of input speech, calculate the Segmental SNR
% ----------------------------------------------------------------------

num_frames = clean_length/skiprate-(winlength/skiprate); % number of frames
start      = 1;                 % starting sample
window     = 0.5*(1 - cos(2*pi*(1:winlength)'/(winlength+1)));

for frame_count = 1: num_frames

% ----------------------------------------------------------
% (1) Get the Frames for the test and reference speech.

%     Multiply by Hanning Window.

% ----------------------------------------------------------

clean_frame = clean_speech(start:start+winlength-1);
processed_frame = processed_speech(start:start+winlength-1);
clean_frame = clean_frame.*window;
processed_frame = processed_frame.*window;

% ----------------------------------------------------------
% (2) Compute the Segmental SNR
% ----------------------------------------------------------

signal_energy = sum(clean_frame.^2);
noise_energy  = sum((clean_frame-processed_frame).^2);
segmental_snr(frame_count) = 10*log10(signal_energy/(noise_energy+eps)+eps);
segmental_snr(frame_count) = max(segmental_snr(frame_count),MIN_SNR);
segmental_snr(frame_count) = min(segmental_snr(frame_count),MAX_SNR);

start = start + skiprate;

end
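The segmental part of the `snr` routine — 30 ms frames with 75% overlap, a Hann-like window, and per-frame SNR clamped to [-10, 35] dB — can be sketched as follows (`segmental_snr` is my own name):

```python
import numpy as np

def segmental_snr(clean, processed, fs, min_snr=-10.0, max_snr=35.0):
    """Frame-based segmental SNR with the same windowing and frame
    advance as the MATLAB snr() function above."""
    clean = np.asarray(clean, dtype=float)
    processed = np.asarray(processed, dtype=float)
    winlength = round(30 * fs / 1000)          # 30 ms window
    skiprate = winlength // 4                  # 75% overlap
    window = 0.5 * (1 - np.cos(2 * np.pi * np.arange(1, winlength + 1)
                               / (winlength + 1)))
    num_frames = int(len(clean) / skiprate - winlength / skiprate)
    eps = np.finfo(float).eps
    snrs = np.empty(num_frames)
    start = 0
    for f in range(num_frames):
        c = clean[start:start + winlength] * window
        p = processed[start:start + winlength] * window
        sig = np.sum(c ** 2)
        noise = np.sum((c - p) ** 2)
        snrs[f] = 10 * np.log10(sig / (noise + eps) + eps)
        snrs[f] = min(max(snrs[f], min_snr), max_snr)  # clamp to [-10, 35] dB
        start += skiprate
    return snrs
```

The composite code then averages these per-frame values into `segSNR`; the clamping keeps silent frames from dominating the mean.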


## References

Original: https://blog.csdn.net/yhcwjh/article/details/122054710
Author: YHCANDOU
Title: 语音质量的评价指标介绍及MATLAB实现（二）
