Three Approaches to Bayesian Inference: MCMC, HMC, and SBI

June 29, 2023, 10:23 PM • Artificial Intelligence

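For many people, Bayesian statistics still feels unfamiliar. Because it involves subjective priors, the theory can be hard to grasp without concrete data to test it on. This article is based on the slides of a talk the author recently gave at a workshop in Princeton, and it aims to show that Bayesian methods are not only logically sound but also easy to use. We implement the same inference problem in three different ways.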

Data

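Our example is the problem of finding a peak in noisy data on top of a sloped background, a situation that can arise in particle physics and in other multi-component event processes.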
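Written out, the model implemented by the data-generation code in this section can be sketched as a Gaussian peak plus a linear background with independent Gaussian noise. The symbols $\sigma_i$ (the per-point uncertainty stored in `y_err`) and $\epsilon_i$ (the noise draw) are notation introduced here, not names from the code:

```latex
y_i = l \,\exp\!\left(-\frac{(m - x_i)^2}{2 s^2}\right) + a + b\,x_i + \epsilon_i,
\qquad \epsilon_i \sim \mathcal{N}\!\left(0, \sigma_i^2\right),
\qquad \theta = (l, m, s, a, b)
```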

First, we generate the data:

 %matplotlib inline
 %config InlineBackend.figure_format = 'svg'
 import matplotlib.pyplot as plt
 import numpy as np

 def signal(theta, x):
     l, m, s, a, b = theta

     peak = l * np.exp(-(m - x)**2 / (2 * s**2))
     background = a + b * x

     return peak + background

 def plot_results(x, y, y_err, samples=None, predictions=None):
     fig = plt.figure()
     ax = fig.gca()
     ax.errorbar(x, y, yerr=y_err, fmt=".k", capsize=0, label="Data")
     x0 = np.linspace(-0.2, 1.2, 100)
     ax.plot(x0, signal(theta, x0), "r", label="Truth", zorder=0)

     if samples is not None:
         inds = np.random.randint(len(samples), size=50)
         for i, ind in enumerate(inds):
             theta_ = samples[ind]
             if i == 0:
                 label = 'Posterior'
             else:
                 label = None
             ax.plot(x0, signal(theta_, x0), "C0", alpha=0.1, zorder=-1, label=label)
     elif predictions is not None:
         for i, pred in enumerate(predictions):
             if i == 0:
                 label = 'Posterior'
             else:
                 label = None
             ax.plot(x0, pred, "C0", alpha=0.1, zorder=-1, label=label)

     ax.legend(frameon=False)
     ax.set_xlabel("x")
     ax.set_ylabel("y")
     fig.tight_layout()
     plt.close()
     return fig

 # random x locations
 N = 40
 np.random.seed(0)
 x = np.random.rand(N)

 # evaluate the true model at the given x values
 theta = [1, 0.5, 0.1, -0.1, 0.4]
 y = signal(theta, x)

 # add heteroscedastic Gaussian uncertainties only in y direction
 y_err = np.random.uniform(0.05, 0.25, size=N)
 y = y + np.random.normal(0, y_err)

 # plot
 plot_results(x, y, y_err)

With the data in hand, we can now introduce the three methods.

Markov Chain Monte Carlo

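emcee is implemented in pure Python and only requires a function that evaluates the logarithm of the posterior as a function of the parameters θ. Working with the logarithm is useful because it makes analytic evaluation of exponential-family distributions easier, and because it copes better with the very small numbers that commonly arise.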
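To make the numerical point concrete, the sketch below multiplies many small probabilities directly and compares that with summing their logarithms. The values (1000 factors of 1e-5) are arbitrary illustrative assumptions, not taken from the example data.

```python
import math

# 1000 independent factors, each a small probability (illustrative values)
probs = [1e-5] * 1000

# the direct product underflows double precision and collapses to zero
product = 1.0
for p in probs:
    product *= p

# the sum of logarithms stays finite and is easy to work with
log_product = sum(math.log(p) for p in probs)

print(product)       # 0.0
print(log_product)   # about -11512.9
```

Once the product has collapsed to zero, no information about the parameters is left, whereas the log value can still be compared between parameter settings.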

 import emcee

 def log_likelihood(theta, x, y, yerr):
     y_model = signal(theta, x)
     chi2 = (y - y_model)**2 / (yerr**2)
     return np.sum(-chi2 / 2)

 def log_prior(theta):
     if all(theta > -2) and (theta[2] > 0) and all(theta < 2):
         return 0
     return -np.inf

 def log_posterior(theta, x, y, yerr):
     lp = log_prior(theta)
     if np.isfinite(lp):
         lp += log_likelihood(theta, x, y, yerr)
     return lp

 # create a small ball around the MLE to initialize each walker
 nwalkers, ndim = 30, 5
 theta_guess = [0.5, 0.6, 0.2, -0.2, 0.1]
 pos = theta_guess + 1e-4 * np.random.randn(nwalkers, ndim)

 # run emcee
 sampler = emcee.EnsembleSampler(nwalkers, ndim, log_posterior, args=(x, y, y_err))
 sampler.run_mcmc(pos, 10000, progress=True);

The output is:

 100%|██████████| 10000/10000 [00:05<00:00, 1856.57it/s]

We should always inspect the resulting chains to determine the burn-in period, and check by eye that they have become stationary:

 fig, axes = plt.subplots(ndim, sharex=True)
 mcmc_samples = sampler.get_chain()
 labels = ["l", "m", "s", "a", "b"]
 for i in range(ndim):
     ax = axes[i]
     ax.plot(mcmc_samples[:, :, i], "k", alpha=0.3, rasterized=True)
     ax.set_xlim(0, 1000)
     ax.set_ylabel(labels[i])

 axes[-1].set_xlabel("step number");

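Next we need to thin the chain, because our samples are correlated. The sampler provides a method to compute the autocorrelation time of each parameter; after discarding the burn-in and thinning, we can combine the samples from all walkers: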

 tau = sampler.get_autocorr_time()
 print("Autocorrelation time:", tau)
 mcmc_samples = sampler.get_chain(discard=300, thin=int(np.max(tau) / 2), flat=True)
 print("Remaining samples:", mcmc_samples.shape)

 # Output
 Autocorrelation time: [122.51626866  75.87228105 137.195509    54.63572513  79.0331587 ]
 Remaining samples: (4260, 5)
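As a quick consistency check, the number of remaining samples can be reproduced from the settings used in the run. This is a back-of-the-envelope sketch that assumes thinning keeps about one step in every `thin` after the discarded burn-in; emcee's exact slicing may differ by one step per walker in other configurations.

```python
nwalkers, nsteps = 30, 10000      # settings used in the emcee run
discard = 300                     # burn-in steps dropped
max_tau = 137.195509              # largest autocorrelation time reported

thin = int(max_tau / 2)           # thinning interval passed to get_chain
kept_per_walker = (nsteps - discard) // thin
remaining = nwalkers * kept_per_walker

print(thin, kept_per_walker, remaining)   # 68 142 4260
```

The result agrees with the 4260 flattened samples reported by the sampler.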

Dan Foreman-Mackey, the creator of emcee, also provides the handy corner package for visualizing the samples:

 import corner

 corner.corner(mcmc_samples, labels=labels, truths=theta);

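Although posterior samples are the primary product of inference, the parameter contours themselves can be hard to interpret. Using the samples to generate new data is much easier to read, because we have far better intuition for data space. Here is the model evaluated at 50 random samples: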

 plot_results(x, y, y_err, samples=mcmc_samples)

Hamiltonian Monte Carlo

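Gradients provide more guidance in high-dimensional settings. To make inference general, we need a framework that can compute gradients of arbitrary probabilistic models. The key ingredient is automatic differentiation: a computational framework that tracks the chain of operations applied to the parameters. For simplicity we use jax, because functions implemented in numpy can usually be replaced by their jax analogues, and jax can then compute the gradients of those functions automatically.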
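To illustrate how gradients guide the sampler, here is a minimal Hamiltonian Monte Carlo sketch in plain numpy. It is not what NumPyro runs: the target is a standard normal in 10 dimensions with a hand-written gradient, and the names `hmc_step`, `step_size`, and `n_leapfrog`, along with their values, are assumptions chosen for illustration. In practice, automatic differentiation supplies the gradient and NUTS chooses the trajectory length.

```python
import numpy as np

def log_prob(q):
    # toy target: standard normal (an assumption for illustration)
    return -0.5 * np.sum(q**2)

def grad_log_prob(q):
    # analytic gradient of the toy target; autodiff would supply this in general
    return -q

def hmc_step(q, rng, step_size=0.2, n_leapfrog=20):
    # draw a fresh momentum and integrate Hamilton's equations with leapfrog
    p = rng.standard_normal(q.shape)
    q_new = q.copy()
    p_new = p + 0.5 * step_size * grad_log_prob(q_new)      # initial half step
    for i in range(n_leapfrog):
        q_new = q_new + step_size * p_new                    # full position step
        if i < n_leapfrog - 1:
            p_new = p_new + step_size * grad_log_prob(q_new)
    p_new = p_new + 0.5 * step_size * grad_log_prob(q_new)  # final half step

    # Metropolis correction based on the change in total energy
    h_old = -log_prob(q) + 0.5 * np.sum(p**2)
    h_new = -log_prob(q_new) + 0.5 * np.sum(p_new**2)
    if np.log(rng.uniform()) < h_old - h_new:
        return q_new, True
    return q, False

rng = np.random.default_rng(0)
q = np.zeros(10)
samples, n_accepted = [], 0
for _ in range(2000):
    q, accepted = hmc_step(q, rng)
    samples.append(q)
    n_accepted += accepted
samples = np.array(samples)

print("acceptance rate:", n_accepted / len(samples))
print("sample mean and std:", samples.mean(), samples.std())
```

Because each proposal follows the gradient along a long trajectory, it can land far from the starting point and still be accepted with high probability, which is what makes the approach attractive in many dimensions.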

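We also need the ability to compute gradients of probability distributions. Several probabilistic programming languages provide this; here we choose NumPyro. Let's see how it automates inference: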

 import jax.numpy as jnp
 import jax.random as random
 import numpyro
 import numpyro.distributions as dist
 from numpyro.infer import MCMC, NUTS

 def model(x, y=None, y_err=0.1):

     # define parameters (incl. prior ranges)
     l = numpyro.sample('l', dist.Uniform(-2, 2))
     m = numpyro.sample('m', dist.Uniform(-2, 2))
     s = numpyro.sample('s', dist.Uniform(0, 2))
     a = numpyro.sample('a', dist.Uniform(-2, 2))
     b = numpyro.sample('b', dist.Uniform(-2, 2))

     # implement the model
     # needs jax numpy for differentiability here
     peak = l * jnp.exp(-(m - x)**2 / (2 * s**2))
     background = a + b * x
     y_model = peak + background

     # notice that we clamp the outcome of this sampling to the observation y
     numpyro.sample('obs', dist.Normal(y_model, y_err), obs=y)

 # need to split the key for jax's random implementation
 rng_key = random.PRNGKey(0)
 rng_key, rng_key_ = random.split(rng_key)

 # run HMC with NUTS
 kernel = NUTS(model, target_accept_prob=0.9)
 mcmc = MCMC(kernel, num_warmup=1000, num_samples=3000)
 mcmc.run(rng_key_, x=x, y=y, y_err=y_err)
 mcmc.print_summary()

 # Output:
 sample: 100%|██████████| 4000/4000 [00:03<00:00, 1022.99it/s, 17 steps of size 2.08e-01. acc. prob=0.94]

          mean   std  median   5.0%  95.0%    n_eff  r_hat
      a  -0.13  0.05       …  -0.22  -0.05  1151.15   1.00
      b   0.46  0.07       …   0.36   0.57  1237.44      …
      l   0.98     …       …   0.89   1.06  1874.34      …
      m   0.50  0.01       …   0.49   0.51  1546.56   1.01
      s   0.11     …       …   0.09   0.12  1446.08      …

 Number of divergences: 0

As before, corner can be used to visualize the samples held in NumPyro's mcmc object.

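Because we have implemented the entire probabilistic model (in contrast to emcee, where we only implemented the posterior), we can create posterior predictions directly from the samples. Below, we set the noise to zero to obtain a noise-free rendition of the pure model: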

 from numpyro.infer import Predictive

 # x0 is local to plot_results, so define the same plotting grid here
 x0 = np.linspace(-0.2, 1.2, 100)

 # make predictions from posterior
 hmc_samples = mcmc.get_samples()
 predictive = Predictive(model, hmc_samples)
 # need to set noise to zero
 # since the full model contains noise contribution
 predictions = predictive(rng_key_, x=x0, y_err=0)['obs']

 # select 50 predictions to show
 inds = random.randint(rng_key_, (50,), 0, mcmc.num_samples)
 predictions = predictions[inds]

 plot_results(x, y, y_err, predictions=predictions)

Simulation-based Inference

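In some cases we cannot, or do not want to, compute the likelihood. Instead we train an emulator on simulations, that is, we learn the mapping between the inputs θ and the simulator outputs D, and this emulator serves as an approximate surrogate for the likelihood or the posterior. An important difference from traditional simulations, which produce noise-free models, is that noise must be added to the simulations, and the noise model should match the observational noise as closely as possible. Otherwise we cannot distinguish variations in the data caused by noise from variations caused by changes in the parameters.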
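The simplest member of this family is rejection sampling in the style of approximate Bayesian computation (ABC), sketched here in plain numpy. This is not the neural method used in the rest of this section: the toy problem (inferring a single mean `mu` from noisy draws), the summary statistic, and the tolerance `eps` are all assumptions for illustration. It does show the two ingredients described above: a simulator that includes the noise, and inference that never evaluates a likelihood.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulator(mu, n=20, sigma=0.5):
    # noisy forward model: n draws around mu with known noise level sigma
    return mu + sigma * rng.standard_normal(n)

# "observed" data generated with a true value that we then pretend not to know
mu_true = 0.7
observed = simulator(mu_true)

# draw parameters from a uniform prior and simulate noisy data for each
n_sims = 20000
mu_prior = rng.uniform(-2, 2, size=n_sims)
summaries = np.array([simulator(mu).mean() for mu in mu_prior])

# keep parameters whose simulated summary lands close to the observed one
eps = 0.05
accepted = mu_prior[np.abs(summaries - observed.mean()) < eps]

print("accepted:", accepted.size)
print("posterior mean and std:", accepted.mean(), accepted.std())
```

Only a small fraction of the simulations survives the cut, and that fraction shrinks quickly as the data dimension grows, which is one motivation for learning a neural surrogate instead.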

 import torch
 from sbi import utils as utils

 low = torch.zeros(ndim)
 low[3] = -1
 high = 1 * torch.ones(ndim)
 high[0] = 2
 prior = utils.BoxUniform(low=low, high=high)

 def simulator(theta, x, y_err):

     # signal model
     l, m, s, a, b = theta
     peak = l * torch.exp(-(m - x)**2 / (2 * s**2))
     background = a + b * x
     y_model = peak + background

     # add noise consistent with observations
     y = y_model + y_err * torch.randn(len(x))

     return y

Let's look at the output of the noisy simulator:

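 # bind x and y_err so that the simulator is a function of theta only
 this_simulator = lambda theta: simulator(theta, torch.tensor(x), torch.tensor(y_err))
 x0 = np.linspace(-0.2, 1.2, 100)  # same plotting grid as in plot_results
 plt.errorbar(x, this_simulator(torch.tensor(theta)), yerr=y_err, fmt=".r", capsize=0)
 plt.errorbar(x, y, yerr=y_err, fmt=".k", capsize=0)
 plt.plot(x0, signal(theta, x0), "k", label="truth")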

Now we use sbi to train a neural posterior estimator (NPE) on these simulations.

 from sbi.inference.base import infer

 this_simulator = lambda theta: simulator(theta, torch.tensor(x), torch.tensor(y_err))

 posterior = infer(this_simulator, prior, method='SNPE', num_simulations=10000)

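NPE uses a conditional normalizing flow to learn how to generate the posterior distribution given some data. The training run reports: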

 Running 10000 simulations.:   0%|          | 0/10000 [00:00<?, ?it/s]
 Neural network successfully converged after 172 epochs.


At inference time, we simply evaluate this neural posterior conditioned on the actual data y:

 sbi_samples = posterior.sample((10000,), x=torch.tensor(y))
 sbi_samples = sbi_samples.detach().numpy()

Sampling is very fast and takes almost no time.

 Drawing 10000 posterior samples:   0%|          | 0/10000 [00:00<?, ?it/s]

We then visualize the posterior samples once more:

 corner.corner(sbi_samples, labels=labels, truths=theta);

 plot_results(x, y, y_err, samples=sbi_samples)

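The SBI results are visibly worse than those from MCMC and HMC. They can be improved by training on more simulations and by tuning the network architecture, although an improvement is not guaranteed.

Still, this shows that SBI can perform approximate Bayesian inference even when no likelihood is available.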

https://avoid.overfit.cn/post/7d210cd0e4424371a7d931b6ee247fc7

Author: Peter Melchior

Original: https://blog.csdn.net/m0_46510245/article/details/127584449
Author: deephub
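Title: Three Approaches to Bayesian Inference: MCMC, HMC, and SBI

Original articles are protected by copyright. When reposting, please cite the source: https://www.johngo689.com/660075/

Reposted articles are protected by the original author's copyright. When reposting, please credit the original author and source!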
