Detecting When the User Answers on AXE-Mode Privacy Numbers by Analyzing the Audio Stream

May 23, 2023, 9:31 PM • Artificial Intelligence • 84 reads

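Background

When placing outbound calls to users through AXE-mode privacy numbers, we found that not every privacy-number service provider offers a call-connected callback that can be configured.
A provider-independent way to detect that the user has answered is therefore needed, for scenarios such as starting a recording or playing a welcome message.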

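Goal

Before bringing in a trained speech model, identify the ringback beeps and the color ring back tone accurately from the waveform alone, covering more than 90% of cases.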

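Research

VAD:
Voice Activity Detection (VAD) is also called speech endpoint detection or speech boundary detection. Its purpose is to identify and remove long silent periods from an audio stream, which saves channel resources without lowering service quality. It is an important part of IP telephony. Silence suppression saves valuable bandwidth and helps reduce the end-to-end latency that users perceive.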
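
As a rough illustration of the idea, the sketch below marks a frame as silent when its RMS level falls under a threshold. It is a minimal, library-free sketch and not the TarsosDSP implementation; the class name `EnergyVad`, the helper `sine`, and the -50 dB threshold are assumptions chosen for the example.

```java
final class EnergyVad {
    /** RMS level of one frame in dB relative to full scale (samples in [-1, 1]). */
    static double levelDb(float[] frame) {
        double sumSquares = 0.0;
        for (float s : frame) {
            sumSquares += (double) s * s;
        }
        double rms = Math.sqrt(sumSquares / frame.length);
        // Clamp so that an all-zero frame gives -200 dB instead of -Infinity.
        return 20.0 * Math.log10(Math.max(rms, 1e-10));
    }

    /** A frame counts as silence when its level is below the threshold. */
    static boolean isSilence(float[] frame, double thresholdDb) {
        return levelDb(frame) < thresholdDb;
    }

    /** Test helper: a sine wave of the given frequency and amplitude. */
    static float[] sine(double freqHz, double sampleRate, int samples, double amplitude) {
        float[] out = new float[samples];
        for (int i = 0; i < samples; i++) {
            out[i] = (float) (amplitude * Math.sin(2.0 * Math.PI * freqHz * i / sampleRate));
        }
        return out;
    }

    public static void main(String[] args) {
        double thresholdDb = -50.0;                    // assumed threshold
        float[] quiet = new float[800];                // 100 ms of zeros at 8 kHz
        float[] tone = sine(450.0, 8000.0, 800, 0.3);  // 100 ms of a 450 Hz tone
        System.out.println("quiet frame silent? " + isSilence(quiet, thresholdDb));
        System.out.println("tone frame silent?  " + isSilence(tone, thresholdDb));
    }
}
```

A fixed threshold is the simplest choice; on noisy lines it would probably need tuning against real recordings.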

TarsosDSP:
Git repository: https://github.com/JorenSix/TarsosDSP
TarsosDSP is a Java library for audio processing. Its aim is to provide an easy-to-use interface to practical music processing algorithms implemented, as simply as possible, in pure Java and without any other external dependencies. The library tries to hit the sweet spot between being capable enough to get real tasks done but compact and simple enough to serve as a demonstration on how DSP algorithms work. TarsosDSP features an implementation of a percussion onset detector and a number of pitch detection algorithms: YIN, the Mcleod Pitch method and a “Dynamic Wavelet Algorithm Pitch Tracking” algorithm. Also included is a Goertzel DTMF decoding algorithm, a time stretch algorithm (WSOLA), resampling, filters, simple synthesis, some audio effects, and a pitch shifting algorithm.

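Ringback tone:
Indicates that the called party is being rung. It uses a 450±25 Hz AC source at a send level of -10±3 dBm. It is an intermittent tone with a 5 s period, 1 s on and 4 s off, matching the ringing cadence.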
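
One way to test a frame for a tone in the 450±25 Hz band is the Goertzel algorithm, which measures the energy at a single frequency. The sketch below is an illustration only and is not what the code later in this article does (that code uses the TarsosDSP pitch detector); the class name `RingbackToneCheck`, the 5 Hz scan step, and the 0.6 ratio threshold are assumptions.

```java
final class RingbackToneCheck {
    /** Squared magnitude of the frame's spectrum at freqHz (Goertzel recurrence). */
    static double goertzelPower(float[] frame, double freqHz, double sampleRate) {
        double coeff = 2.0 * Math.cos(2.0 * Math.PI * freqHz / sampleRate);
        double s1 = 0.0;
        double s2 = 0.0;
        for (float sample : frame) {
            double s0 = sample + coeff * s1 - s2;
            s2 = s1;
            s1 = s0;
        }
        return s1 * s1 + s2 * s2 - coeff * s1 * s2;
    }

    /** Share of the frame's energy at freqHz: close to 1 for a pure tone, close to 0 otherwise. */
    static double toneRatio(float[] frame, double freqHz, double sampleRate) {
        double energy = 0.0;
        for (float sample : frame) {
            energy += (double) sample * sample;
        }
        if (energy == 0.0) {
            return 0.0;
        }
        return 2.0 * goertzelPower(frame, freqHz, sampleRate) / (frame.length * energy);
    }

    /** Scans 425..475 Hz in 5 Hz steps and compares the best ratio with an assumed threshold. */
    static boolean isRingbackTone(float[] frame, double sampleRate) {
        double best = 0.0;
        for (double f = 425.0; f <= 475.0; f += 5.0) {
            best = Math.max(best, toneRatio(frame, f, sampleRate));
        }
        return best > 0.6; // assumed threshold
    }

    /** Test helper: a sine wave of the given frequency and amplitude. */
    static float[] sine(double freqHz, double sampleRate, int samples, double amplitude) {
        float[] out = new float[samples];
        for (int i = 0; i < samples; i++) {
            out[i] = (float) (amplitude * Math.sin(2.0 * Math.PI * freqHz * i / sampleRate));
        }
        return out;
    }

    public static void main(String[] args) {
        float[] ringback = sine(450.0, 8000.0, 800, 0.3);
        float[] other = sine(1000.0, 8000.0, 800, 0.3);
        System.out.println("450 Hz frame is ringback?  " + isRingbackTone(ringback, 8000.0));
        System.out.println("1000 Hz frame is ringback? " + isRingbackTone(other, 8000.0));
    }
}
```

With 100 ms frames at 8 kHz the frequency resolution is about 10 Hz, which is why the scan uses a finer 5 Hz step across the band.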

Color ring back tone:
A continuous, uninterrupted music waveform.
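
The difference between the two tones can be reduced to the length of the first burst of sound: a beep stops after about 1 s, while music keeps going. The sketch below classifies a sequence of per-frame activity flags on that basis. The class name `FirstRunClassifier`, its `Kind` enum, and the 20-frame limit passed in `main` are hypothetical names and values chosen for illustration.

```java
final class FirstRunClassifier {
    enum Kind { BEEP, MUSIC, UNKNOWN }

    /**
     * active[i] is true when 100 ms frame i contains sound.
     * Returns MUSIC once the first active run reaches musicFrames frames,
     * BEEP if the run ends in silence before that, UNKNOWN otherwise.
     */
    static Kind classify(boolean[] active, int musicFrames) {
        int run = 0;
        for (boolean frameActive : active) {
            if (frameActive) {
                run++;
                if (run >= musicFrames) {
                    return Kind.MUSIC;
                }
            } else if (run > 0) {
                return Kind.BEEP;
            }
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        boolean[] beep = new boolean[18];
        java.util.Arrays.fill(beep, 3, 13, true);   // 1 s of sound after 0.3 s of silence
        boolean[] music = new boolean[30];
        java.util.Arrays.fill(music, true);         // 3 s of continuous sound
        System.out.println(classify(beep, 20));
        System.out.println(classify(music, 20));
    }
}
```

Run length alone cannot tell a beep from a short burst of speech, so a frequency check would still be needed to confirm the beep.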

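Approach

Reading the waveform from left to right, a call splits into three segments:

"Please enter the four-digit extension number, followed by the # key"
"Ringing: beep, beep, beep"
"The user speaks"

Detection therefore takes three steps:
1. Skip a fixed duration to get past the extension-number announcement.
2. Examine the first active segment after the silence and match it against the color ring back tone features or the beep features.
3. The moment the audio stops matching those features is the moment the user answered.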
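
For the beep case, the three steps above can be sketched as a scan over per-frame activity flags. This is a simplified illustration, not the full implementation given later: it ignores frequency and music, and the class name `CadenceBreak` and its parameters are hypothetical. The values 35 and 41 passed in `main` mirror the frame counts used in the code in this article.

```java
final class CadenceBreak {
    /**
     * active[i] is true when 100 ms frame i contains sound.
     * skipFrames: frames to skip at the start (the announcement).
     * minGap / maxGap: shortest and longest silent gap, in frames, that still
     * looks like the pause between two beeps.
     * Returns the index of the first frame that breaks the cadence, or -1.
     */
    static int findAnswerFrame(boolean[] active, int skipFrames, int minGap, int maxGap) {
        boolean seenTone = false;
        int silenceRun = 0;
        for (int i = skipFrames; i < active.length; i++) {
            if (active[i]) {
                // Sound came back too early to be the next beep.
                if (seenTone && silenceRun > 0 && silenceRun < minGap) {
                    return i;
                }
                seenTone = true;
                silenceRun = 0;
            } else if (seenTone) {
                silenceRun++;
                // The pause has lasted longer than a normal gap.
                if (silenceRun > maxGap) {
                    return i;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        boolean[] active = new boolean[150];
        java.util.Arrays.fill(active, 0, 50, true);     // 5 s announcement
        java.util.Arrays.fill(active, 55, 65, true);    // first beep, 1 s
        java.util.Arrays.fill(active, 105, 115, true);  // second beep after a 4 s gap
        java.util.Arrays.fill(active, 135, 150, true);  // speech after only 2 s
        System.out.println(findAnswerFrame(active, 50, 35, 41)); // prints 135
    }
}
```

Frame 135 corresponds to 13.5 s into the recording, which is where the simulated speech starts.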

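Code implementation

The implementation uses the silence detection and pitch detection provided by TarsosDSP.
Note that you have to add the dependency yourself; the TarsosDSP package is available from the Git repository listed in the Research section above.

Invocation: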

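    public static void main(String[] args) {
        // Arguments: WAV path, sample rate (Hz), bits per sample,
        // leading audio to skip (ms), silence timeout (ms)
        PickUp pickUp = new PickUp("xxx.wav", 8000, 16, 1000, 4500);
        pickUp.start();
        System.exit(0);
    }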

PickUp:

package xxx;

import be.tarsos.dsp.AudioDispatcher;
import be.tarsos.dsp.AudioEvent;
import be.tarsos.dsp.AudioProcessor;
import be.tarsos.dsp.SilenceDetector;
import be.tarsos.dsp.io.TarsosDSPAudioFloatConverter;
import be.tarsos.dsp.io.TarsosDSPAudioFormat;
import be.tarsos.dsp.io.UniversalAudioInputStream;
import be.tarsos.dsp.pitch.PitchDetectionHandler;
import be.tarsos.dsp.pitch.PitchDetectionResult;
import be.tarsos.dsp.pitch.PitchProcessor;
import java.io.*;
import java.util.concurrent.ConcurrentLinkedQueue;

public class PickUp {

    public enum RingbackType {
        UNCHECK,DU_NORMALITY,DU_OTHER,SONG;
    }

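    // 100 ms audio frames handed from the file reader to the detection thread
    private ConcurrentLinkedQueue<byte[]> audioQueue = new ConcurrentLinkedQueue<byte[]>();
    // Written by both threads, so it is volatile to stay visible across them
    private volatile boolean isFinishReadFile = false;
    private String filePath;
    private String fileName;
    private int readLength = 1600;      // bytes per 100 ms frame
    private int noinputTimeout = 1000;  // leading audio to skip, in ms
    private int silenceMaxTimes = 10;   // consecutive silent frames before giving up
    private float sampleRate = 8000;
    private int sampleSizeInBits = 16;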

    public PickUp(String filePath, float sampleRate, int sampleSizeInBits, int noinputTimeout, int silenceTimeout) {
        this.filePath = filePath;
        this.sampleRate = sampleRate;
        this.sampleSizeInBits = sampleSizeInBits;

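        // Bytes in one 100 ms frame, e.g. 8000 Hz * 2 bytes / 10 = 1600
        this.readLength = (int)sampleRate*(sampleSizeInBits/8)/10;
        this.noinputTimeout = noinputTimeout;

        // Silence timeout expressed in 100 ms frames
        this.silenceMaxTimes = (int)silenceTimeout/100;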
    }

    public void start() {
        File audioFile = new File(this.filePath);
        FileInputStream fis;
        try {
            audioQueue.clear();
            fileName = audioFile.getName();
            isFinishReadFile = false;
            Thread sttThread = new Thread(vadRunbale);
            sttThread.start();
            fis = new FileInputStream(audioFile);
            byte[] byteArr = new byte[this.readLength];

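            int size;
            fis.skip(44); // skip the canonical 44-byte WAV header
            while ((size = fis.read(byteArr)) != -1) {
                // Zero the unread tail so a short final read carries no stale samples
                java.util.Arrays.fill(byteArr, size, byteArr.length, (byte) 0);
                audioQueue.add(byteArr.clone());
            }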

            while (!audioQueue.isEmpty() && !isFinishReadFile) {
                Thread.sleep(2000);
            }
            isFinishReadFile = true;
            fis.close();
            while (sttThread.isAlive()) {
                Thread.sleep(2000);
            }

            System.out.println("Finished normally");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }

    }

    private Runnable vadRunbale = new Runnable() {
        volatile int countHZ = 0;
        volatile int count450HZ = 0;
        @Override
        public void run() {
            RingbackType ringbackType = RingbackType.UNCHECK;
            int currentPartTime = 0, silenceTimes = 0, firstActiveTimes = 0, differentCount = 0;
            try {

                TarsosDSPAudioFormat tdspFormat = new TarsosDSPAudioFormat(sampleRate, sampleSizeInBits, 1, true, false);
                float[] voiceFloatArr = new float[readLength / tdspFormat.getFrameSize()];

                while (!isFinishReadFile) {
                    byte[] data = audioQueue.poll();
                    if (data == null) {
                        Thread.sleep(50);
                        continue;
                    }
                    TarsosDSPAudioFloatConverter.getConverter(tdspFormat).toFloatArray(data.clone(),voiceFloatArr);
                    SilenceDetector silenceDetector = new SilenceDetector();
                    boolean isSlience = silenceDetector.isSilence(voiceFloatArr);

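                    // Each frame is 100 ms; start analysing once the announcement has been skipped
                    if ((currentPartTime+=100) >= noinputTimeout) {
                        boolean checkHZ = false;
                        if (isSlience) {
                            if(firstActiveTimes == 0){
                                System.out.println("Silence before any activity, ignoring");
                                continue;
                            }
                            System.out.println("Silence detected "+ringbackType);

                            // Too many silent frames in a row: stop
                            if(++silenceTimes >=silenceMaxTimes){
                                isFinishReadFile = true;
                            }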
                            switch(ringbackType){
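                                case UNCHECK:
                                    // First silence after the first active run: classify that run.
                                    // Every analysed frame at 450 Hz means it was a beep.
                                    if(countHZ==count450HZ){
                                        // The comparison operator is missing in the source; "<" is assumed here
                                        if(countHZ<11){
                                            ringbackType = RingbackType.DU_NORMALITY;
                                            // Standard cadence: allow for the 4 s gap (41 frames)
                                            silenceMaxTimes = 41;
                                        }else {
                                            ringbackType = RingbackType.DU_OTHER;
                                            checkHZ = true;
                                        }
                                    }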
                                    break;
                                case DU_OTHER:
                                    checkHZ = true;

                                    if(countHZ!=count450HZ){
                                        differentCount++;
                                        count450HZ = countHZ;
                                    }else {
                                        differentCount = 0;
                                    }
                                    if(differentCount>=3){
                                        isFinishReadFile = true;
                                    }

                                    checkHZ = true;
                                    break;
                                case SONG:

                                    isFinishReadFile = true;
                                    break;
                                default:
                                    break;
                            }
                        } else {
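                            System.out.println("Active "+ringbackType);
                            switch(ringbackType){
                                case UNCHECK:
                                    firstActiveTimes++;

                                    // 20 active frames (2 s) without a pause: treat as music
                                    if(firstActiveTimes>=20){
                                        ringbackType = RingbackType.SONG;
                                    }

                                    checkHZ = true;
                                    break;
                                case DU_NORMALITY:

                                    // Sound returned before the 4 s gap elapsed: the user answered
                                    if(silenceTimes!=0 &&silenceTimes<35){
                                        isFinishReadFile = true;
                                    }
                                    // falls through to the frequency check
                                case DU_OTHER: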

                                    if(countHZ!=count450HZ){
                                        differentCount++;
                                        count450HZ = countHZ;
                                    }else {
                                        differentCount = 0;
                                    }
                                    if(differentCount>=3){
                                        isFinishReadFile = true;
                                    }

                                    checkHZ = true;
                                    break;
                                default:
                                    break;
                            }

                            silenceTimes = 0;
                        }

                        if(checkHZ && !isFinishReadFile){

                            AudioDispatcher dispatcher = new AudioDispatcher(new UniversalAudioInputStream(new ByteArrayInputStream(data), tdspFormat), data.length, 0);
                            AudioProcessor audioProcessor = new PitchProcessor(PitchProcessor.PitchEstimationAlgorithm.FFT_YIN, sampleRate, data.length, new PitchDetectionHandler(){
                                @Override
                                public void handlePitch(PitchDetectionResult pitchDetectionResult, AudioEvent audioEvent) {
                                    countHZ++;
                                    float pitch = pitchDetectionResult.getPitch();
                                    System.out.println(pitch+"HZ");
                                    if(pitch>445&&pitch<455){
                                        count450HZ++;
                                    }
                                }
                            });
                            dispatcher.addAudioProcessor(audioProcessor);
                            dispatcher.run();
                        }

                    }
                }
                System.out.println(fileName+" exited, position "+currentPartTime/10+"     "+ringbackType);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    };

}
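
The frame arithmetic and the byte-to-float conversion that the class relies on can be checked in isolation. The sketch below is a library-free illustration, assuming 16-bit signed little-endian mono PCM as configured in the `TarsosDSPAudioFormat` above; the class `FrameMath` and its methods are hypothetical helpers, not part of TarsosDSP.

```java
final class FrameMath {
    /** Bytes needed for frameMs milliseconds of mono PCM audio. */
    static int bytesPerFrame(float sampleRate, int sampleSizeInBits, int frameMs) {
        return (int) (sampleRate * (sampleSizeInBits / 8) * frameMs / 1000);
    }

    /** Converts 16-bit signed little-endian PCM bytes to floats in [-1, 1). */
    static float[] pcm16leToFloat(byte[] data) {
        float[] out = new float[data.length / 2];
        for (int i = 0; i < out.length; i++) {
            int low = data[2 * i] & 0xFF;   // low byte, unsigned
            int high = data[2 * i + 1];     // high byte, keeps the sign
            out[i] = ((high << 8) | low) / 32768f;
        }
        return out;
    }

    public static void main(String[] args) {
        // 8000 Hz * 2 bytes * 0.1 s = 1600 bytes, i.e. 800 samples per frame
        System.out.println(bytesPerFrame(8000f, 16, 100));
        // A 4500 ms silence timeout is 45 frames of 100 ms
        System.out.println(4500 / 100);
        // 0x4000 is half of full scale
        System.out.println(pcm16leToFloat(new byte[] {0x00, 0x40})[0]);
    }
}
```

With the arguments from the invocation example (8000 Hz, 16 bits, 4500 ms), each queue entry therefore holds 800 samples and the silence timeout spans 45 frames.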

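Results

Ringback tone

A log line is printed every 0.1 s. The detected frequency matches expectations, and the cadence of 1 s on and 4 s off matches as well.

Color ring back tone

A log line is printed every 0.1 s. The audio is classified as music, and the point where detection ends matches the actual answer time (compare the color ring back tone waveform figure above).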

Original: https://blog.csdn.net/weixin_37697242/article/details/122856754
Author: 搬砖工人5426
Title: AXE模式隐私号基于语音流分析的用户接听识别方案 (Detecting When the User Answers on AXE-Mode Privacy Numbers by Analyzing the Audio Stream)
