The Principles Behind “Sound” (2): Sampling, Quantization, and Encoding

Sampling, Quantization, and Encoding

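The overall audio-processing pipeline:

Sound
-> (capture device)
-> analog signal (continuous)
-> (analog-to-digital converter, ADC)
-> digital signal (discrete)
-> encoding
-> stored on the computer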

  1. Digitizing the analog signal

1.1 Sampling

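The idea behind sampling is simple: read the amplitude of the analog signal at a fixed frequency. That frequency is the sampling rate, measured in hertz (Hz), and it gives the number of samples taken per second.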
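
As a minimal sketch of this idea, the snippet below reads the amplitude of a sine wave at evenly spaced instants. The function name `sample_sine` and the 440 Hz / 16000 Hz figures are illustrative assumptions, not something taken from the article.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, duration_s, amplitude=1.0):
    """Read the amplitude of a sine wave at evenly spaced instants."""
    num_samples = int(round(sample_rate_hz * duration_s))
    samples = []
    for n in range(num_samples):
        t = n / sample_rate_hz  # time of the n-th sample, in seconds
        samples.append(amplitude * math.sin(2 * math.pi * freq_hz * t))
    return samples

# Hypothetical example: a 440 Hz tone sampled at 16000 Hz for 10 ms.
samples = sample_sine(440, 16000, 0.01)
print(len(samples))  # number of samples = sampling rate * duration
```

Each list entry is one sample: a single amplitude value taken at one instant.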

For a periodic signal, at least two samples are needed per period: one at the peak and one at the trough.
Therefore, for a given sampling rate, the highest frequency of a periodic signal that can be reconstructed is half that rate (this frequency is called the Nyquist frequency).
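
To see what happens above the Nyquist frequency, here is a small sketch under assumed numbers (a 10 Hz sampling rate, so the Nyquist frequency is 5 Hz). A 7 Hz sine sampled at this rate produces exactly the same values as an inverted 3 Hz sine, so the two cannot be told apart after sampling. The helper name `sampled_sine` is hypothetical.

```python
import math

def sampled_sine(freq_hz, sample_rate_hz, num_samples):
    """Return num_samples samples of a unit sine wave."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(num_samples)]

sample_rate = 10  # Hz, so the Nyquist frequency is 5 Hz

above_nyquist = sampled_sine(7, sample_rate, 20)            # 7 Hz > 5 Hz
alias = [-s for s in sampled_sine(3, sample_rate, 20)]      # inverted 3 Hz sine

# Compare the two sample sequences value by value.
indistinguishable = all(
    math.isclose(a, b, abs_tol=1e-9) for a, b in zip(above_nyquist, alias)
)
print(indistinguishable)
```

This effect is usually called aliasing: a frequency above half the sampling rate shows up as a lower one.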

That sounds clear enough, but the definition is still vague, and it is hard to form a concrete picture from it. What does each sample actually record? Is the object being sampled one particle of the medium or all of them? How exactly is a sample taken? Is the waveform static or changing while it is sampled? What range does a single sample cover?

In the end, I think the confusion comes from not fully understanding waveform plots:

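In a waveform plot, the horizontal axis is really the "coordinate" (position) of each particle, and the vertical axis is that particle's amplitude.
A waveform plot is a curve showing the amplitude (displacement) of every particle at one instant.
For more, see: 音频特征(3):各种波形图像的小结 (Audio Features (3): A Summary of Waveform Plots)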

Audio sampling, in fact, reads the amplitude of the analog signal at a fixed frequency along the time axis.

The sampling rate is simply the frequency at which samples are taken uniformly along some axis.
Audio samples amplitude along the time axis.
An image samples color along the X and Y axes.
Video samples images along the time axis.
Motion capture samples X, Y, Z positions along the time axis.

About the sampling rate

From a Zhihu answer (1):

Audio is a wave, an analog signal; sampling turns it into a sequence of discrete values.

If the wave is a curve, sampling describes that curve with N points.
The more points there are (evenly spaced), the smoother the line through them and the closer it is to the original curve.

Audio: the curve
Sampling rate: the number of points per unit interval
Sound quality: how closely the points match the curve

From a Zhihu answer (2):

Sound is continuous: you can cut it into 1-second segments, then 0.1-second segments, and keep cutting forever. A computer, however, records sound as numbers, and one set of numbers can only capture the sound at a single instant (an infinitely small point in time). So the computer records the sound once at 0.0001 seconds (the figures here are only illustrative), again at 0.0002 seconds, again at 0.0003 seconds, and so on. That is how the sound is stored, and the sampling rate in this example is 10,000 samples per second.

Finally, one figure to tie this together:

The audio waveform can be treated as a fixed reference that we want to draw; all we can do is mark a point on it at regular intervals (say, every 0.0002 seconds). The figure above shows that:

  • Sample points captured at a higher resolution are described more accurately.
  • A higher sampling rate shortens the sampling interval, which makes the resulting curve smoother (closer to continuous).
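
The second point can be checked numerically. The sketch below joins the samples of a sine wave with straight lines and measures how far that polyline strays from the true curve. Straight-line joining is only an illustrative stand-in for "drawing the curve through the points", and the 5 Hz tone and the two sampling rates are assumed values.

```python
import math

def max_interpolation_error(freq_hz, sample_rate_hz, duration_s=1.0, checks_per_gap=10):
    """Join samples with straight lines; return the largest gap to the true sine."""
    def wave(t):
        return math.sin(2 * math.pi * freq_hz * t)

    num_gaps = int(round(sample_rate_hz * duration_s))
    worst = 0.0
    for n in range(num_gaps):
        t0 = n / sample_rate_hz
        t1 = (n + 1) / sample_rate_hz
        y0, y1 = wave(t0), wave(t1)
        # Probe several instants between two neighbouring samples.
        for k in range(1, checks_per_gap):
            frac = k / checks_per_gap
            t = t0 + frac * (t1 - t0)
            approx = y0 + frac * (y1 - y0)  # point on the straight line
            worst = max(worst, abs(wave(t) - approx))
    return worst

coarse = max_interpolation_error(5, 50)    # 10 samples per period
fine = max_interpolation_error(5, 500)     # 100 samples per period
print(coarse > fine)
```

With ten times as many points per period, the polyline hugs the curve far more closely.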

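In speech signals, most of the information lies below 10,000 Hz, so a sampling rate of 20,000 Hz is usually enough to preserve it. A higher sampling rate, however, also tends to mean more computation, more storage, and more data to send over the network. Today a 16,000 Hz sampling rate is very widely used, and audio stored at 16,000 Hz already preserves the vast majority of speech information. Compact discs (CDs) use a 44,100 Hz sampling rate, which reproduces high-frequency content with good fidelity.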
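
To make the storage cost concrete, here is a rough calculation for uncompressed audio. It assumes 16-bit samples, mono for the speech case and stereo for the CD case; the bit depth and channel count are assumptions added for illustration, and the function name is hypothetical.

```python
def pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels):
    """Bytes needed to store one second of uncompressed audio."""
    return sample_rate_hz * bits_per_sample * channels // 8

# Assumed formats: 16-bit mono speech and 16-bit stereo CD audio.
speech_rate = pcm_bytes_per_second(16000, 16, 1)
cd_rate = pcm_bytes_per_second(44100, 16, 2)

print(speech_rate)  # 32000 bytes per second
print(cd_rate)      # 176400 bytes per second
```

Under these assumptions, CD audio needs more than five times the storage of 16,000 Hz mono speech for the same duration.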

(Put simply, sampling is in essence a kind of "dimensionality reduction" applied to the analog audio signal.)

1.2 Quantization

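To store and transmit the value of each sample more efficiently, the amplitudes are rounded to a finite set of integer levels; this is called quantization.

Quantization loses some precision. By precision, it is classified as 8-bit, 16-bit, 32-bit quantization, and so on.

Note: the precision of quantization equals the difference between the real values represented by two adjacent integers (the step size). If two real values differ by less than this step, they can be quantized to the same integer. (See the figure at the beginning of the article, where the continuous signal ends up as a staircase-shaped curve.)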
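
The note above can be sketched in a few lines. This example assumes amplitudes are real numbers in the range [-1.0, 1.0] and rounds them to signed integers; that range, the rounding rule, and the names `quantize` and `step_size` are assumptions made for illustration, not a description of any particular converter.

```python
def quantize(value, bits=16):
    """Map a real amplitude in [-1.0, 1.0] to a signed integer level."""
    max_int = 2 ** (bits - 1) - 1            # 32767 for 16 bits
    clipped = max(-1.0, min(1.0, value))     # keep the value inside the range
    return int(round(clipped * max_int))

def step_size(bits=16):
    """Difference between the real values of two adjacent integer levels."""
    return 1.0 / (2 ** (bits - 1) - 1)

# Two amplitudes that differ by far less than one 16-bit step.
a = quantize(0.500001)
b = quantize(0.500002)
print(a, b)                      # both land on the same integer
print(step_size(16) < step_size(8))   # more bits give a finer step
```

The information lost here is exactly the difference between the original real value and the level it was rounded to.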

  2. Encoding

The process of converting the quantized sequence of discrete integers into the binary byte sequence that the computer actually stores is called audio encoding. Conversely, restoring the audio signal from the binary bytes is called decoding.
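
As a minimal sketch of the simplest possible case, the snippet below packs 16-bit integers into raw little-endian bytes and unpacks them again using Python's standard `struct` module. This is uncompressed PCM chosen for illustration; the article does not say which encoding it has in mind, and real codecs do considerably more than this. The function names are hypothetical.

```python
import struct

def encode_pcm16(samples):
    """Pack signed 16-bit integers into little-endian bytes."""
    return struct.pack("<%dh" % len(samples), *samples)

def decode_pcm16(data):
    """Unpack little-endian bytes back into signed 16-bit integers."""
    count = len(data) // 2   # two bytes per sample
    return list(struct.unpack("<%dh" % count, data))

quantized = [0, 16384, -16384, 32767]   # example quantized samples
encoded = encode_pcm16(quantized)
decoded = decode_pcm16(encoded)

print(len(encoded))          # 8 bytes: two per sample
print(decoded == quantized)  # the round trip is lossless
```

Because nothing is compressed, decoding recovers the quantized integers exactly; any loss happened earlier, during sampling and quantization.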

(Details omitted for now.)

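References:

  1. 《声纹技术》 (Voiceprint Technology)
  2. What is the sampling rate of audio? Is the sampling rate related to sound quality?
  3. A detailed look at the principles, evolution, and selection of audio codecs (an excellent article series)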

Original: https://blog.csdn.net/Robin_Pi/article/details/109235946
Author: Robin_Pi
Title: “声音”背后的原理(2):采样、量化和编码
