User Pickup Detection for AXE-Mode Privacy Numbers Based on Audio Stream Analysis

Background

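When placing outbound calls to users through AXE-mode privacy numbers, we found that not every privacy number provider offers a configurable call-connected callback.
A provider-independent way to detect that the user has answered is therefore needed, for scenarios such as starting a recording or playing a welcome message.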

Goal

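Before bringing in a trained speech model, accurately identify the ringback beep and ringback music (color ring back tone) from the waveform alone, covering more than 90% of cases.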

Research

VAD:
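Voice Activity Detection (VAD), also called speech endpoint detection or speech boundary detection, identifies and removes long silent periods from an audio stream, saving channel resources without degrading service quality. It is an important part of IP telephony applications. Silence suppression saves valuable bandwidth and helps reduce the end-to-end delay perceived by users.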
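
To make the idea concrete, here is a minimal energy-based silence check in plain Java. It is only a sketch of the general technique, not TarsosDSP's implementation; the class name `EnergyVad`, its methods, and the -50 dB threshold are assumptions chosen for illustration.

```java
/** Minimal energy-based silence check (illustrative sketch, not TarsosDSP's code). */
class EnergyVad {

    /** Returns the RMS level of a frame in dB relative to full scale (samples in [-1, 1]). */
    static double levelDb(float[] frame) {
        double sumSquares = 0.0;
        for (float s : frame) {
            sumSquares += (double) s * s;
        }
        double rms = Math.sqrt(sumSquares / frame.length);
        // Guard against log10(0) for an all-zero frame.
        return 20.0 * Math.log10(Math.max(rms, 1e-10));
    }

    /** A frame counts as silence when its level is below the (assumed) threshold. */
    static boolean isSilence(float[] frame, double thresholdDb) {
        return levelDb(frame) < thresholdDb;
    }

    public static void main(String[] args) {
        int sampleRate = 8000;
        float[] quiet = new float[sampleRate / 10];  // 100 ms of zeros
        float[] tone = new float[sampleRate / 10];   // 100 ms of a 450 Hz sine
        for (int i = 0; i < tone.length; i++) {
            tone[i] = (float) (0.5 * Math.sin(2 * Math.PI * 450 * i / sampleRate));
        }
        System.out.println("quiet is silence: " + isSilence(quiet, -50.0));
        System.out.println("tone is silence:  " + isSilence(tone, -50.0));
    }
}
```

A suitable threshold depends on the line noise of the recordings, so it would need tuning against real samples.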

TarsosDSP:
Git repository: https://github.com/JorenSix/TarsosDSP
TarsosDSP is a Java library for audio processing. Its aim is to provide an easy-to-use interface to practical music processing algorithms implemented, as simply as possible, in pure Java and without any other external dependencies. The library tries to hit the sweet spot between being capable enough to get real tasks done but compact and simple enough to serve as a demonstration on how DSP algorithms work. TarsosDSP features an implementation of a percussion onset detector and a number of pitch detection algorithms: YIN, the McLeod Pitch method and a “Dynamic Wavelet Algorithm Pitch Tracking” algorithm. Also included are a Goertzel DTMF decoding algorithm, a time stretch algorithm (WSOLA), resampling, filters, simple synthesis, some audio effects, and a pitch shifting algorithm.

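Ringback tone:
Indicates that the called party is being rung. It uses a 450±25 Hz AC source at a send level of -10±3 dBm. It is an intermittent signal tone with a 5 s period, 1 s on and 4 s off, matching the ringing cadence.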
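
As a rough illustration of how a 450 Hz tone can be recognized in a single frame, the sketch below probes one frequency with the Goertzel algorithm. Note that this is a different technique from the YIN pitch detector used in the implementation later in this article; the class `RingbackToneProbe`, its method names, and the 0.8 power-share threshold are hypothetical. With 100 ms frames at 8 kHz the probe is only about 10 Hz wide, so covering the full ±25 Hz tolerance would need several probe frequencies or shorter frames.

```java
/** Goertzel-based single-frequency probe (illustrative sketch). */
class RingbackToneProbe {

    /** Estimates the amplitude of the sinusoidal component at freqHz in the frame. */
    static double amplitudeAt(float[] frame, double freqHz, double sampleRate) {
        double coeff = 2.0 * Math.cos(2.0 * Math.PI * freqHz / sampleRate);
        double s1 = 0.0, s2 = 0.0;
        for (float x : frame) {
            double s0 = x + coeff * s1 - s2;
            s2 = s1;
            s1 = s0;
        }
        double power = s1 * s1 + s2 * s2 - coeff * s1 * s2;
        return 2.0 * Math.sqrt(Math.max(power, 0.0)) / frame.length;
    }

    /** True when most of the frame's power sits at 450 Hz (minShare is an assumed tuning value). */
    static boolean looksLike450Hz(float[] frame, double sampleRate, double minShare) {
        double total = 0.0;
        for (float x : frame) {
            total += (double) x * x;
        }
        total /= frame.length;
        if (total == 0.0) {
            return false; // an all-zero frame carries no tone
        }
        double amp = amplitudeAt(frame, 450.0, sampleRate);
        // A sine of amplitude A has mean power A^2 / 2.
        return (amp * amp / 2.0) / total >= minShare;
    }

    /** Generates a test sine wave. */
    static float[] sine(double freqHz, double amplitude, int samples, double sampleRate) {
        float[] out = new float[samples];
        for (int i = 0; i < samples; i++) {
            out[i] = (float) (amplitude * Math.sin(2.0 * Math.PI * freqHz * i / sampleRate));
        }
        return out;
    }

    public static void main(String[] args) {
        double sampleRate = 8000.0;
        float[] beep = sine(450.0, 0.5, 800, sampleRate);    // 100 ms of ringback tone
        float[] other = sine(1000.0, 0.5, 800, sampleRate);  // 100 ms of a different tone
        System.out.println("450 Hz frame:  " + looksLike450Hz(beep, sampleRate, 0.8));
        System.out.println("1000 Hz frame: " + looksLike450Hz(other, sampleRate, 0.8));
    }
}
```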

Ringback music (color ring back tone):
A continuous, uninterrupted music waveform.

Approach

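Based on an analysis of the waveform, a call divides into three segments from left to right:
  1. The prompt "Please enter the four-digit extension, ending with the # key"
  2. The ringback beeps
  3. The user speaking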

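The task therefore breaks down into three steps:
  1. Skip a fixed duration to get past the extension-number prompt.
  2. Run detection on the first active segment after the silence, matching it against the ringback-music signature or the beep signature.
  3. Find the moment the audio departs from that signature; that is the moment the user answers.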
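
The three steps can be sketched as a small pass over per-frame labels. This is a deliberately simplified illustration, not the implementation presented in this article: the `Frame` labels, the class `PickupStateSketch`, and the two frame-count constants are hypothetical, and each frame is assumed to cover 100 ms.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified walk through the three steps over 100 ms frame labels (illustrative only). */
class PickupStateSketch {

    /** Hypothetical per-frame label: silence, 450 Hz tone, or any other activity. */
    enum Frame { SILENCE, TONE, OTHER }

    static final int MAX_BEEP_FRAMES = 12;   // assumed: a 1 s beep spans roughly 10 frames
    static final int MIN_PAUSE_FRAMES = 35;  // assumed: the pause between beeps lasts about 4 s

    /** Returns the index of the frame where the user appears to answer, or -1 if none. */
    static int findAnswerFrame(List<Frame> frames, int skipFrames) {
        int n = frames.size();
        int i = skipFrames;                                   // step 1: skip the prompt
        while (i < n && frames.get(i) == Frame.SILENCE) i++;  // silence before any activity
        if (i >= n) return -1;

        // Step 2: classify the first active run as beep or music.
        int start = i;
        boolean allTone = true;
        while (i < n && frames.get(i) != Frame.SILENCE) {
            allTone &= frames.get(i) == Frame.TONE;
            i++;
        }
        boolean beep = allTone && (i - start) <= MAX_BEEP_FRAMES;
        if (!beep) {
            // Treated as music: the first silent frame marks its end.
            return i < n ? i : -1;
        }

        // Step 3 (beep): look for activity that breaks the tone or the cadence.
        int pause = 0;
        for (; i < n; i++) {
            Frame f = frames.get(i);
            if (f == Frame.SILENCE) { pause++; continue; }
            if (f == Frame.OTHER) return i;                        // not a 450 Hz tone
            if (pause != 0 && pause < MIN_PAUSE_FRAMES) return i;  // pause too short
            pause = 0;
        }
        return -1;
    }

    /** Appends count copies of a label. */
    static void add(List<Frame> out, Frame f, int count) {
        for (int k = 0; k < count; k++) out.add(f);
    }

    public static void main(String[] args) {
        List<Frame> call = new ArrayList<>();
        add(call, Frame.OTHER, 10);    // extension prompt (skipped)
        add(call, Frame.SILENCE, 5);
        add(call, Frame.TONE, 10);     // first beep
        add(call, Frame.SILENCE, 40);
        add(call, Frame.TONE, 10);     // second beep
        add(call, Frame.SILENCE, 15);
        add(call, Frame.OTHER, 8);     // user speaks
        System.out.println("answer frame: " + findAnswerFrame(call, 10));
    }
}
```

In this example the user's speech begins at frame 90, which is the index the sketch reports.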

Implementation

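The implementation uses the silence detection and pitch detection provided by TarsosDSP.
Note that you need to add the dependency yourself; the TarsosDSP package is available from the Git repository listed in the Research section above.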

Usage:

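    public static void main(String[] args) {
        // Arguments: WAV path, sample rate (Hz), bits per sample,
        // noinputTimeout (ms), silenceTimeout (ms).
        PickUp pickUp = new PickUp("xxx.wav", 8000, 16, 1000, 4500);
        pickUp.start();
        // Exit with status 0: reaching this point means the run completed normally.
        System.exit(0);
    }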

PickUp:

package xxx;

import be.tarsos.dsp.AudioDispatcher;
import be.tarsos.dsp.AudioEvent;
import be.tarsos.dsp.AudioProcessor;
import be.tarsos.dsp.SilenceDetector;
import be.tarsos.dsp.io.TarsosDSPAudioFloatConverter;
import be.tarsos.dsp.io.TarsosDSPAudioFormat;
import be.tarsos.dsp.io.UniversalAudioInputStream;
import be.tarsos.dsp.pitch.PitchDetectionHandler;
import be.tarsos.dsp.pitch.PitchDetectionResult;
import be.tarsos.dsp.pitch.PitchProcessor;
import java.io.*;
import java.util.concurrent.ConcurrentLinkedQueue;

public class PickUp {

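    /** Classification of the first active segment after the prompt. */
    public enum RingbackType {
        // UNCHECK: not yet classified; DU_NORMALITY: standard beep (1 s on, 4 s off);
        // DU_OTHER: any other beep pattern; SONG: ringback music.
        UNCHECK,DU_NORMALITY,DU_OTHER,SONG;
    }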

    private ConcurrentLinkedQueue<byte[]> audioQueue = new ConcurrentLinkedQueue<byte[]>();
    // volatile: written and read by both the reader thread and the detection thread.
    private volatile boolean isFinishReadFile = false;
    private String filePath;
    private String fileName;
    private int readLength = 1600;
    private int noinputTimeout = 1000;
    private int silenceMaxTimes = 10;
    private float sampleRate = 8000;
    private int sampleSizeInBits = 16;

    public PickUp(String filePath, float sampleRate, int sampleSizeInBits, int noinputTimeout, int silenceTimeout) {
        this.filePath = filePath;
        this.sampleRate = sampleRate;
        this.sampleSizeInBits = sampleSizeInBits;

        this.readLength = (int)sampleRate*(sampleSizeInBits/8)/10;
        this.noinputTimeout = noinputTimeout;

        this.silenceMaxTimes = (int)silenceTimeout/100;
    }

    public void start() {
        File audioFile = new File(this.filePath);
        FileInputStream fis;
        try {
            audioQueue.clear();
            fileName = audioFile.getName();
            isFinishReadFile = false;
            Thread sttThread = new Thread(vadRunbale);
            sttThread.start();
            fis = new FileInputStream(audioFile);
            byte[] byteArr = new byte[this.readLength];

            int size;
            // Skip the WAV header, assumed here to be the canonical 44 bytes.
            fis.skip(44);
            while ((size = fis.read(byteArr)) != -1) {
                audioQueue.add(byteArr.clone());
            }

            while (!audioQueue.isEmpty() && !isFinishReadFile) {
                Thread.sleep(2000);
            }
            isFinishReadFile = true;
            fis.close();
            while (sttThread.isAlive()) {
                Thread.sleep(2000);
            }

            System.out.println("Finished normally");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }

    }

    private Runnable vadRunbale = new Runnable() {
        volatile int countHZ = 0;
        volatile int count450HZ = 0;
        @Override
        public void run() {
            RingbackType ringbackType = RingbackType.UNCHECK;
            int currentPartTime = 0, silenceTimes = 0, firstActiveTimes = 0, differentCount = 0;
            try {

                TarsosDSPAudioFormat tdspFormat = new TarsosDSPAudioFormat(sampleRate, sampleSizeInBits, 1, true, false);
                float[] voiceFloatArr = new float[readLength / tdspFormat.getFrameSize()];

                while (!isFinishReadFile) {
                    byte[] data = audioQueue.poll();
                    if (data == null) {
                        Thread.sleep(50);
                        continue;
                    }
                    TarsosDSPAudioFloatConverter.getConverter(tdspFormat).toFloatArray(data.clone(),voiceFloatArr);
                    SilenceDetector silenceDetector = new SilenceDetector();
                    boolean isSlience = silenceDetector.isSilence(voiceFloatArr);

                    if ((currentPartTime+=100) >= noinputTimeout) {
                        boolean checkHZ = false;
                        if (isSlience) {
                            if(firstActiveTimes == 0){
                                System.out.println("Silence before first activity, ignored");
                                continue;
                            }
                            System.out.println("Silence detected "+ringbackType);

                            if(++silenceTimes >=silenceMaxTimes){
                                isFinishReadFile = true;
                            }
                            switch(ringbackType){
                                case UNCHECK:
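                                    // Every pitch detected so far was about 450 Hz, so the segment is a beep.
                                    if(countHZ==count450HZ){
                                        // NOTE: the comparison operator was lost in the source text; "<" is assumed here,
                                        // since a 1 s beep yields about ten 100 ms frames.
                                        if(countHZ<11){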
                                            ringbackType = RingbackType.DU_NORMALITY;

                                            silenceMaxTimes = 41;
                                        }else {
                                            ringbackType = RingbackType.DU_OTHER;
                                            checkHZ = true;
                                        }
                                    }
                                    break;
                                case DU_OTHER:
                                    checkHZ = true;

                                    if(countHZ!=count450HZ){
                                        differentCount++;
                                        count450HZ = countHZ;
                                    }else {
                                        differentCount = 0;
                                    }
                                    if(differentCount>=3){
                                        isFinishReadFile = true;
                                    }

                                    checkHZ = true;
                                    break;
                                case SONG:

                                    isFinishReadFile = true;
                                    break;
                                default:
                                    break;
                            }
                        } else {
                            System.out.println("Active "+ringbackType);
                            switch(ringbackType){
                                case UNCHECK:
                                    firstActiveTimes++;

                                    if(firstActiveTimes>=20){
                                        ringbackType = RingbackType.SONG;
                                    }

                                    checkHZ = true;
                                    break;
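                                case DU_NORMALITY:
                                    // Activity resumed after a pause shorter than 3.5 s: this breaks the 1 s on / 4 s off cadence.
                                    if(silenceTimes!=0 &&silenceTimes<35){
                                        isFinishReadFile = true;
                                    }
                                    // No break: falls through to the frequency check shared with DU_OTHER.
                                case DU_OTHER: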

                                    if(countHZ!=count450HZ){
                                        differentCount++;
                                        count450HZ = countHZ;
                                    }else {
                                        differentCount = 0;
                                    }
                                    if(differentCount>=3){
                                        isFinishReadFile = true;
                                    }

                                    checkHZ = true;
                                    break;
                                default:
                                    break;
                            }

                            silenceTimes = 0;
                        }

                        if(checkHZ && !isFinishReadFile){

                            AudioDispatcher dispatcher = new AudioDispatcher(new UniversalAudioInputStream(new ByteArrayInputStream(data), tdspFormat), data.length, 0);
                            // Use the configured sample rate rather than a hard-coded 8000.
                            AudioProcessor audioProcessor = new PitchProcessor(PitchProcessor.PitchEstimationAlgorithm.FFT_YIN, sampleRate, data.length, new PitchDetectionHandler(){
                                @Override
                                public void handlePitch(PitchDetectionResult pitchDetectionResult, AudioEvent audioEvent) {
                                    countHZ++;
                                    float pitch = pitchDetectionResult.getPitch();
                                    System.out.println(pitch+"HZ");
                                    if(pitch>445&&pitch<455){
                                        count450HZ++;
                                    }
                                }
                            });
                            dispatcher.addAudioProcessor(audioProcessor);
                            dispatcher.run();
                        }

                    }
                }
                System.out.println(fileName+" exited at position "+currentPartTime/10+"     "+ringbackType);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    };

}
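
For readers unfamiliar with raw PCM, the following sketch shows the arithmetic behind `readLength` (the number of bytes in one 100 ms chunk) and a hand-written conversion of 16-bit signed little-endian samples to floats. It is an illustration only: `PcmFrames` and its methods are hypothetical names, and the scaling may differ slightly from what `TarsosDSPAudioFloatConverter` does internally.

```java
/** Frame-size arithmetic and 16-bit PCM conversion (illustrative sketch). */
class PcmFrames {

    /** Bytes in one frame of the given duration for mono PCM. */
    static int bytesPerFrame(int sampleRate, int bitsPerSample, int frameMillis) {
        return sampleRate * (bitsPerSample / 8) * frameMillis / 1000;
    }

    /** Converts 16-bit signed little-endian PCM bytes to floats in [-1, 1). */
    static float[] toFloats(byte[] pcm) {
        float[] out = new float[pcm.length / 2];
        for (int i = 0; i < out.length; i++) {
            int lo = pcm[2 * i] & 0xFF;   // low byte, treated as unsigned
            int hi = pcm[2 * i + 1];      // high byte, keeps the sign
            out[i] = ((hi << 8) | lo) / 32768f;
        }
        return out;
    }

    public static void main(String[] args) {
        // 8000 Hz * 2 bytes per sample * 0.1 s = 1600 bytes per chunk.
        System.out.println(bytesPerFrame(8000, 16, 100));
        float[] samples = toFloats(new byte[] {0x00, 0x40, 0x00, (byte) 0x80});
        System.out.println(samples[0] + " " + samples[1]);
    }
}
```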

Test Results

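Ringback tone

A log line is printed every 0.1 s. The frequency signature matches expectations, and so does the cadence of 1 s on, 4 s off.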

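Ringback music

A log line is printed every 0.1 s. The audio is classified as music, and the end of the music signature coincides with the actual pickup time (corresponding to the ringback-music waveform above).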

Original: https://blog.csdn.net/weixin_37697242/article/details/122856754
Author: 搬砖工人5426
Title: AXE模式隐私号基于语音流分析的用户接听识别方案
