FPGA-Based Convolutional Neural Network (CNN) Accelerator

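Contents

1. Background

2. High-Level Design

2.1 Math overview

3. Hardware Design

3.1 Input image

3.2 VGA/camera

3.3 First convolutional layer

3.4 Pooling layer

3.4 Second convolutional layer

3.5 Partial sums

3.6 First fully connected layer

3.7 Second fully connected layer

4. Software Design

5. System Design

6. Testing

7. Hardware Bugs and Issues

8. Results

9. Resource Usage

10. Usability

11. Conclusion

12. Intellectual Property Considerations

13. Improvements and Future Work

14. Verilog Code and C Code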

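1. Background

A neural network is a machine learning model loosely based on the neural networks of the brain. A series of nodes is arranged in "layers" that are connected to each other by operations and weights. This model has proven successful in image classification tasks, which today have many applications, from self-driving cars to facial recognition. A standard CNN can have floating-point weights and feature maps, which require significant memory resources and computational power to implement the necessary multipliers and similar hardware.

A binarized neural network (BNN) uses binarized feature maps and weights, which greatly reduces the storage and compute resources required and makes it possible to synthesize the network in hardware on resource-constrained systems such as FPGAs. The network we implemented is based on a software model written in Python with the TensorFlow machine learning library. The Python code was provided by Ritchie Zhao, a PhD student at Cornell University. The Verilog code implements in hardware the individual layers and functions used to build the software model. The system is designed to classify digits; the model was trained on a subset of the MNIST dataset and produced a test accuracy of about 40%. This could be improved by using non-binarized feature maps and by implementing additional features such as batch normalization.

The Verilog model performs the inference task but does not train the weights used in the computation. Instead, the weights are generated by the Python implementation and hard-coded in the Verilog model. When a neural network is used for classification, training the weights is time-consuming and is not done in real time. We therefore chose to focus the model on the classification task and to use pre-trained weights for the computation. We originally planned to use the HPS to pass the weights used by the FPGA; however, this used too many logic elements and the design did not fit on the device.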

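2. High-Level Design

2.1 Math overview:

The math involved in computing the different output feature maps is mostly limited to multiplication and addition. Since the weights in our design are binary values, the multiplications can be replaced by ternary operators that decide whether a value is added or subtracted, which is the effect of "multiplying" it by 1 or -1 (a weight of 0 is treated as -1). This greatly reduces the number of DSP blocks needed to implement the design. The convolution operation is performed by "sliding" a filter across the input feature map. Overlapping entries are multiplied together and summed to form the value at the corresponding output index. Binarization is performed by checking the sign of the value being binarized and assigning the output value -1 or 1 accordingly. Although true binarization would convert the output to 1 or 0 rather than 1 or -1, the computations this network requires make converting to 1 or -1 more efficient. For the rest of this report, references to binarization mean converting a number to 1 or -1, not 1 or 0. The pooling operation examines a given set of values for the maximum and assigns that maximum to the output. The figures below illustrate all of these processes.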

Figure 1: Convolution example

Figure 2: Pooling example

Figure 3: Binarization example
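The multiply-free arithmetic described in the math overview can be sketched in a few lines of Python. This is only an illustration, not the project's code: the names `binarize` and `binary_dot` are hypothetical, and the conditional expression stands in for the Verilog ternary operator.

```python
def binarize(value):
    """Map a number to +1 or -1; zero is mapped to -1, as in the report."""
    return 1 if value > 0 else -1


def binary_dot(values, weight_bits):
    """Multiply-free dot product: weight bit 1 acts as +1, weight bit 0 as -1."""
    total = 0
    for value, bit in zip(values, weight_bits):
        # The conditional expression replaces the multiplication by +1 or -1.
        total += value if bit == 1 else -value
    return total


values = [1, -1, 1, 1]
weight_bits = [1, 0, 0, 1]
total = binary_dot(values, weight_bits)
print(total, binarize(total))
```

Because each weight only selects between adding and subtracting, no multiplier is needed, which is the reason the report gives for the low DSP block count.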

General overview:

The binarized neural network consists of two convolutional layers, two pooling layers, and two fully connected layers. The input image is a 7 x 7 black-and-white image with two-bit entries. The image is padded with -1s on the bottom and right to create an 8 x 8 image, which is fed into the network. The first convolutional layer convolves the input image with 16 3 x 3 filters to produce 16 8 x 8 output maps, which are binarized to contain only 1 and -1. These 16 maps are then pooled to form 16 4 x 4 output maps, which are fed into the second convolutional layer. The second convolutional layer contains 512 3 x 3 filters. Each of the 16 input maps is convolved with 32 unique filters (16 x 32 = 512), and the results are combined to produce 32 4 x 4 output feature maps. These are then binarized and pooled into 2 x 2 output maps, which are passed to the fully connected layers. The first fully connected layer flattens the incoming 32 2 x 2 feature maps into a 128-entry array. This array is then matrix-multiplied by a 128 x 32 weight array to produce an output array of size 32. That output array is binarized and multiplied by the 32 x 10 weight matrix of the final fully connected layer to produce a 10-entry array. Each entry in this array corresponds to the likelihood that the input image shows the digit matching that array index. For example, entry 0 represents the likelihood that the input image is a 0. If entry 0 holds the maximum value in the array, the BNN infers that the input is the digit 0.
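As a quick check on the sizes quoted in the overview, the following sketch walks through the layer shapes. The function name `network_shapes` and the stage labels are hypothetical; only the dimensions come from the text.

```python
def network_shapes():
    """Return (stage name, shape) pairs for the layer sizes given in the overview."""
    side = 7 + 1  # 7 x 7 image padded on the bottom and right
    stages = [("padded input", (1, side, side))]
    # A 3 x 3 convolution over the edge-padded map keeps the 8 x 8 size.
    stages.append(("conv1 + binarize", (16, side, side)))
    side //= 2  # 2 x 2 max pooling halves each dimension
    stages.append(("pool1", (16, side, side)))
    # 16 input maps x 32 filters each = 512 filters, reduced to 32 maps.
    stages.append(("conv2 + partial sums", (32, side, side)))
    side //= 2
    stages.append(("pool2", (32, side, side)))
    stages.append(("flatten", (32 * side * side,)))
    stages.append(("fc1 + binarize", (32,)))
    stages.append(("fc2", (10,)))
    return stages


for name, shape in network_shapes():
    print(name, shape)
```

The flattened length of 32 x 2 x 2 = 128 matches the 128 x 32 weight array of the first fully connected layer.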

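All feature maps and weight arrays are stored in registers, and the convolutions and matrix multiplications are implemented with ternary operators. Using DSP blocks would have left the design short of multipliers. The two-bit feature map entries and 1-bit weight arrays result in minimal storage requirements, eliminating the need for memory units such as M10K blocks. All weights for every layer are hard-coded in Verilog. We originally planned to feed the weights in from the HPS over PIO ports; however, this required more ALMs than are available on the FPGA.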


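3. Hardware Design

3.1 Input image

Ten input images from the MNIST test set, one for each of the ten digits, are hard-coded in Verilog on the FPGA. The FPGA receives an input select signal from the HPS, which picks one of these images as the input fed into the binarized convolutional network to generate a digit prediction. The input images from the MNIST test set are average-pooled into 7 x 7 1-bit grayscale matrices. We use 2 bits for each entry because the input is binarized to 1 or -1, with 2'b01 representing a black pixel and 2'b11 representing a white pixel. Then, before feeding the image into the first convolutional layer, we pad the bottom row and right column with -1s to form an 8 x 8 matrix. This gives the matrix an even size, which is easier to work with in the later layers.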
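The padding step and the 2-bit encoding can be illustrated as follows. This is a minimal sketch: `pad_to_8x8` and `encode_2bit` are hypothetical helper names, and the encoding simply takes the low two bits of the two's-complement value, which reproduces the 2'b01 and 2'b11 codes mentioned above.

```python
def pad_to_8x8(image7):
    """Append a column and a row of -1 to a 7 x 7 image of +1/-1 entries."""
    padded = [row + [-1] for row in image7]
    padded.append([-1] * 8)
    return padded


def encode_2bit(value):
    """2-bit two's-complement code: +1 -> 0b01, -1 -> 0b11."""
    return value & 0b11


image7 = [[1] * 7 for _ in range(7)]
image8 = pad_to_8x8(image7)
print(len(image8), len(image8[0]))
print(format(encode_2bit(1), "02b"), format(encode_2bit(-1), "02b"))
```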

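3.2 VGA/camera

Our original plan was to use an NTSC camera to capture live images of handwritten digits as input and to classify the digits in real time. We started from Bruce's video code on the Avalon Bus Master to HPS page, which stores the video input in on-chip SRAM through the Video_In_Subsystem module in Qsys and uses a bus master to copy pixels from the SRAM to a dual-port SDRAM, from which the VGA controller module displays the data on the VGA screen. We used that code and the Qsys video subsystem modules. We were able to convert the 8-bit RGB color to 2-bit grayscale, as shown in the figure below, use the Video_In_Clipper and Video_In_Scaler Qsys modules to clip the input size from 320×240 to 224×224, and then use pooling on the HPS to create a 7×7 image. This scheme later proved infeasible because we ran out of ALMs on the FPGA, most of which were used to build the actual binarized neural network. We therefore chose to hard-code some existing input images from the MNIST dataset on the FPGA and to send a select signal to choose among them.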

Figure 4: 224×224 2-bit grayscale to 7×7 1-bit grayscale

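3.3 First convolutional layer

The first convolutional layer uses 16 3 x 3 filters, each with 1-bit entries. The input image is an 8 x 8 matrix of binarized entries and is convolved with each filter to generate 16 output feature maps of size 8 x 8. The input image is zero-padded along its edges, making it a 10 x 10 matrix. When this is convolved with a 3 x 3 filter, the result is an 8 x 8 matrix.

The convolution is implemented with a ternary operator that checks whether a bit in the filter is 1 or 0 and accordingly adds the value from the input fmap to, or subtracts it from, a temporary sum. To save space, we use 1-bit weights (1 or 0) with the ternary operator instead of 2-bit weights representing 1 and -1. The temporary sum is stored in a temporary output feature map. This is repeated for every entry in the output feature map and happens in parallel for each of the 16 3 x 3 filters. Once all the temporary sums are computed, their sign bits are used to assign +1 or -1 to the corresponding entries of the output feature map. Essentially, if a temporary sum is positive (greater than 0), we assign +1; otherwise we assign -1. Note that with this implementation a temporary sum of 0 is assigned -1. This layer is implemented with two combinational always blocks, one that performs the padding and one that computes the convolution. Each block contains nested for loops, which allows all temporary sums to be computed in parallel. In the main body of the code, a generate loop instantiates 16 such convolution units so that each of the 16 output feature maps is computed in parallel.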
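A software sketch of one such convolution unit may help make the data flow concrete. It assumes the filter is slid over the map without flipping, which the text does not state, and the names `conv3x3_binary` and `binarize` are illustrative rather than taken from the Verilog.

```python
def binarize(value):
    """Sign-based binarization; a sum of 0 becomes -1."""
    return 1 if value > 0 else -1


def conv3x3_binary(image, weight_bits):
    """Convolve an n x n map of +1/-1 values with one 3 x 3 filter of 1-bit weights."""
    n = len(image)
    # Zero-pad every edge, turning an 8 x 8 input into a 10 x 10 matrix.
    padded = [[0] * (n + 2)]
    padded += [[0] + list(row) + [0] for row in image]
    padded.append([0] * (n + 2))
    output = []
    for r in range(n):
        out_row = []
        for c in range(n):
            temp_sum = 0
            for i in range(3):
                for j in range(3):
                    pixel = padded[r + i][c + j]
                    # Weight bit 1 adds the pixel, weight bit 0 subtracts it.
                    temp_sum += pixel if weight_bits[i][j] == 1 else -pixel
            out_row.append(binarize(temp_sum))
        output.append(out_row)
    return output


image = [[1] * 8 for _ in range(8)]
fmap = conv3x3_binary(image, [[1, 1, 0], [1, 0, 0], [0, 1, 0]])
print(len(fmap), len(fmap[0]))
```

In hardware the loops are unrolled into parallel logic; here they run sequentially, so the sketch is useful for checking values, not for timing.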

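3.4 Pooling layer

There are two max-pooling layers in the network, one after each convolutional layer. Each pooling layer shrinks the output feature maps by a factor of two in each dimension. The first pooling layer converts 8 x 8 feature maps into 4 x 4 maps, and the second converts 4 x 4 feature maps into 2 x 2 maps. This is done by taking the maximum of each 2 x 2 square of four values and using that value as a single entry in the output feature map in place of all four, thereby reducing the size. Both layers use for loops to generate hardware that processes all elements of the input feature maps simultaneously.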
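The 2 x 2 max pooling described above can be written compactly in Python. The function name `max_pool_2x2` is hypothetical, and the sketch assumes a square map with an even side length.

```python
def max_pool_2x2(fmap):
    """Replace each non-overlapping 2 x 2 square with its maximum value."""
    n = len(fmap)
    return [
        [
            max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
            for c in range(0, n, 2)
        ]
        for r in range(0, n, 2)
    ]


fmap = [
    [-1, -1, 1, -1],
    [-1, -1, -1, -1],
    [-1, 1, -1, -1],
    [1, -1, -1, -1],
]
print(max_pool_2x2(fmap))
```

For binarized maps this means a pooled entry is +1 whenever any of the four inputs is +1.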

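3.4 Second convolutional layer

The second convolutional layer is implemented in much the same way as the first. Two combinational always blocks pad the image and compute the temporary sums of the convolution, which are then stored in the output feature maps. Unlike in the first convolution block, the output here is not binarized immediately, because the partial sums must be computed first. Convolving each of the 16 feature maps with 32 unique filters creates 32 output feature maps per input feature map. These outputs are then summed across the 16 input feature maps and binarized to create the 32 final output maps. In the main body of the code, nested for loops inside a generate block implement all of the convolutions in parallel.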

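3.5 Partial sums

The partial sum layer takes in the 16 x 32 feature maps of size 4 x 4 computed by the second convolutional layer and, for each of the 32 filters, sums the maps produced from the 16 input feature maps. The partial sums are computed in 32 accumulating temporary sum arrays of size 4 x 4. A state machine first initializes all values in these arrays to 0 in the first state and then, in the next state, iterates over the 16 rows of the 16 x 32 x 4 x 4 array passed to the layer. Nested for loops compute the 32 x 4 x 4 partial sums in parallel. After 16 clock cycles in this state the partial sums are complete and the state machine moves to the next state. There, the partial sums are binarized and assigned to the 32 x 4 x 4 output feature map, which is passed to the second pooling layer.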

Figure 5: Partial sums
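The three states of the partial sum layer can be mimicked in software as shown below. This is a sketch under the assumption that `conv_out[i][k]` holds the 4 x 4 map produced from input map `i` and filter `k`; that indexing and the name `partial_sums` are illustrative.

```python
def partial_sums(conv_out):
    """Accumulate a 16 x 32 x 4 x 4 array over its first axis, then binarize."""
    num_inputs = len(conv_out)
    num_filters = len(conv_out[0])
    size = len(conv_out[0][0])
    # State 0: clear the accumulators.
    acc = [[[0] * size for _ in range(size)] for _ in range(num_filters)]
    # State 1: one row of the input array per clock cycle (16 cycles).
    for i in range(num_inputs):
        # These inner loops correspond to parallel hardware in the design.
        for k in range(num_filters):
            for r in range(size):
                for c in range(size):
                    acc[k][r][c] += conv_out[i][k][r][c]
    # State 2: binarize the accumulated sums (a sum of 0 becomes -1).
    return [[[1 if v > 0 else -1 for v in row] for row in fmap] for fmap in acc]


ones = [[[[1] * 4 for _ in range(4)] for _ in range(32)] for _ in range(16)]
out = partial_sums(ones)
print(len(out), len(out[0]), len(out[0][0]))
```

The outer loop is the part that cannot be flattened into a single cycle, because each addition depends on the running total from the previous one.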

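3.6 First fully connected layer

The fully connected layer takes in the 32 x 2 x 2 matrix output by the second pooling layer and flattens it into a one-dimensional array of length 128. This is multiplied by a 128 x 32 matrix to form an array of length 32. This layer is also implemented with a state machine and a temporary sum array of length 32. In the first state, the temporary sums are all initialized to 0. In the next state, a ternary operator checks whether the value in the weight matrix is a 1 or a 0, and the value stored at the corresponding index of the flattened feature map is accordingly added to or subtracted from the temporary sum. This is repeated for 128 iterations, the number of rows in the two-dimensional weight array. A for loop performs 32 of these operations in parallel. In the state after this, the temporary sums are binarized and assigned to the length-32 output array.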

Figure 6: First fully connected layer
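The fully connected computation can be sketched with the same add-or-subtract selection used in the convolution layers. The function name `fully_connected` and its `binarize_output` flag are hypothetical; the flag is there only so the same sketch covers a layer whose outputs are left unbinarized.

```python
def fully_connected(inputs, weight_bits, binarize_output=True):
    """Multiply-free matrix product of a +1/-1 vector with a 1-bit weight matrix."""
    num_outputs = len(weight_bits[0])
    sums = [0] * num_outputs  # first state: clear the temporary sums
    for i, value in enumerate(inputs):  # one weight row per iteration
        for j in range(num_outputs):  # done in parallel in hardware
            sums[j] += value if weight_bits[i][j] == 1 else -value
    if binarize_output:
        return [1 if s > 0 else -1 for s in sums]
    return sums


inputs = [1, -1, 1]
weight_bits = [[1, 0], [0, 0], [1, 1]]
print(fully_connected(inputs, weight_bits, binarize_output=False))
print(fully_connected(inputs, weight_bits))
```

With a 128-entry input and a 128 x 32 weight array, the outer loop runs 128 times and each pass updates all 32 sums.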

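3.7 Second fully connected layer

The second fully connected layer has the same structure as the first. It takes the length-32 array from the previous layer and multiplies it by a weight matrix of size 32 x 10, using the same state machine structure described above. The output is an array of size 10 with 8-bit entries. The values are not binarized, so that they carry more information about the digit classification.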

Figure 7: Second fully connected layer

Figure 8: Model summary


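4. Software Design

The final output of the binarized neural network is an array of length 10. The value at a given index of this array corresponds to the likelihood that the processed image shows the digit equal to that index. For example, if the value at index 0 is the smallest in the array, the processed image is least likely to be a 0. Likewise, if the value at index 5 is the highest in the array, the BNN infers that the image is most likely the digit 5. We pass these 10 final output values from the FPGA to the HPS over an 8-bit-wide PIO port. The HPS then processes the 10 final outputs and converts the numbers to a probability scale to determine the three most likely classifications of the image. The HPS output on the serial console is shown in Figure 9. To compute the probabilities, we first add up all positive final output values to get the sum of the positive inference values. The probability of digit n is then computed by dividing the final output value at index n by this sum.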

Figure 9: HPS serial console output
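The probability scaling performed on the HPS can be sketched as follows. The function name `top_three` and the sample output values are made up for illustration, and the sketch assumes that only digits with positive outputs are ranked, which the text does not spell out.

```python
def top_three(final_out):
    """Return up to three (digit, probability) pairs from the 10 final outputs."""
    positive_sum = sum(v for v in final_out if v > 0)
    if positive_sum == 0:
        return []  # no positive evidence for any digit
    scored = [
        (digit, value / positive_sum)
        for digit, value in enumerate(final_out)
        if value > 0
    ]
    # Stable sort: digits with equal probability keep their index order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:3]


sample_out = [-4, 10, 2, -6, 0, 6, -2, 2, -8, -10]  # illustrative values only
for digit, prob in top_three(sample_out):
    print(digit, round(prob, 2))
```

Dividing by the sum of the positive outputs makes the reported probabilities of the positive digits add up to 1.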


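5. System Design

The figure below shows the Qsys implementation of our design. The PIO ports are connected from the HPS to the lightweight AXI master bus and exported to the FPGA fabric at different memory addresses. pio_switch is the output signal we use to select one of the hard-coded input images as the new input to the BNN. Once pio_switch has been set and output to the FPGA, the HPS toggles pio_start from low to high to restart the BNN digit recognition computation. pio_end is set low when the BNN restarts and is set high by the FPGA only when the BNN has finished computing the final output array. By recording the time at reset and the time at which pio_end goes high, we can compute the BNN computation time as the difference between the start and end times, which we found to be about 4-5 µs.

After the FPGA finishes the computation, three PIO ports (the clock signal pio_hps_image_clk, the data signal pio_out_data, and the chip select signal pio_out_cs) are used to transfer the 10 final outputs sequentially from the FPGA to the HPS. The chip select line is normally held low, which resets the index. When chip select is high, the entry of the final output array at the current index is loaded onto the data signal at each rising edge of the clock signal, after which the index is incremented. To start receiving the final outputs, the HPS pulls chip select high, toggles the clock signal, and then reads and stores the value at the data port, which gives the value of the final output array at index 0. This process is then repeated 9 more times to receive all the values of the final output array.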

Figure 10: Qsys PIO ports
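The readout handshake can be modeled in software to check the sequencing. This is a toy model, not the project's C or Verilog: the class `OutputPort` and the function `hps_read_all` are hypothetical, and real PIO accesses through memory-mapped registers are replaced by method calls.

```python
class OutputPort:
    """Toy model of the FPGA-side logic behind the chip select, clock, and data ports."""

    def __init__(self, final_out):
        self.final_out = list(final_out)
        self.index = 0
        self.data = 0
        self.chip_select = 0

    def set_chip_select(self, level):
        self.chip_select = level
        if not level:
            self.index = 0  # chip select low resets the index

    def clock_rising_edge(self):
        # With chip select high, load the current entry and advance the index.
        if self.chip_select and self.index < len(self.final_out):
            self.data = self.final_out[self.index]
            self.index += 1


def hps_read_all(port, count=10):
    """HPS side: raise chip select, then clock and read one value per entry."""
    port.set_chip_select(1)
    values = []
    for _ in range(count):
        port.clock_rising_edge()
        values.append(port.data)
    port.set_chip_select(0)
    return values


print(hps_read_all(OutputPort([5, -3, 0, 7, 1, -2, 4, 6, -1, 2])))
```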


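6. Testing

We tested the initial iterations of our design in ModelSim and used unit tests to make sure each of our modules worked as expected. We instantiated each module, passed in known input values, and simulated the result to verify that the output was as expected. Once we had done this for all the layers involved, we instantiated all the layers and connected them to one another. We then set all weight values and the input image to known values and monitored the flow of data through the entire network.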

Figure 11: ModelSim output

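Once our design simulated correctly, we moved it onto the FPGA and used the LEDs and PIO ports to view the output of each layer, to make sure the design behaved in hardware as it did in simulation. Because ModelSim only simulates parallel execution, we had to repeat all the tests on the FPGA to verify that our layers actually worked as expected. Some of the bugs we found were parallel implementations of sequential operations, such as accumulating sums, that led to inaccurate computations on the FPGA. These simulated correctly in ModelSim because execution in software is actually sequential, which is not the case once a real circuit is generated.

When debugging on the FPGA, we tested the implementation of each layer either by mapping its outputs to LEDs or by sending them through PIO ports to the HPS and printing them on the serial console. The values computed in hardware were compared against the Python software model to verify that each layer behaved as expected. Although the most efficient way to debug the model was to pass output values through PIO ports and print them on the serial console, we eventually ran out of adaptive logic modules (ALMs) on the FPGA. At that point we had to switch to mapping outputs to the on-board LEDs to verify that the computed values were accurate.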


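7. Hardware Bugs and Issues

While we initially hoped to implement the design fully in parallel, some elements of the system made this infeasible. Certain components of the network, such as the partial sum module, need multiple cycles to operate correctly. For this module, 16 additions must be performed one after another to compute the accumulated sum. These 16 operations cannot be performed in parallel and therefore take several clock cycles. Another problem we ran into was repeatedly running out of ALMs on the board when connecting PIO ports to pass data between the FPGA and the HPS and when mapping FPGA outputs to the on-board LEDs. Adding a port or an LED mapping would sometimes cause a large increase in the ALM resources needed to implement the design, so that the design no longer fit on the board. We solved these problems by finding workarounds that use fewer ALMs. For example, instead of passing the weights from the HPS, we hard-coded them in the Verilog file. Since the weights never change during classification, this has no effect on functionality.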


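8. Results

The figures below show the LCD display from our final demonstration. The top three computed probabilities are displayed, together with the 8 x 8 input image passed to the network. The complete binarized neural network accurately performs the computations needed to classify an image. The output of each layer was compared with the output of the corresponding software implementation to verify that the expected computations were being performed. The expected accuracy of the software model is 33%. Since the hardware model mimics the computations of the software model, the expected accuracy of the hardware classifier can also be assumed to be 33%.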

Figure 12: Display output: digit 1

Figure 13: Display output: digit 5

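The computation speed of the FPGA model was measured by passing a done signal, indicating that the computation had finished, back to the HPS and measuring the time between the start signal sent from the HPS to the FPGA and the done signal sent from the FPGA back to the HPS. The FPGA BNN computation time was found to be about 0.004 ms, or 4 µs. By comparison, the Python implementation of the same BNN running on a PC takes about 44 µs. This measurement is based on the time needed to run the TensorFlow eval function on y_conv, y_conv.eval(feed_dict=test_dict), where y_conv is the last tensor layer of the BNN. We measured the time to process a batch of 1 input, which was about 64.4 ms, and the time to process 180 inputs, which was about 72.4 ms. Because the CNN processing time is the sum of the time to load the weights and the time to compute, we roughly estimate the per-input compute time from the difference: (72.4 ms - 64.4 ms) / 180 inputs ≈ 44 µs per input. Note that we ran the Python code on a quad-core PC. The time measurements on the PC are not stable, and various factors cause them to vary.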
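The timing estimate above is simple arithmetic, reproduced here as a check. The variable names are illustrative, the inputs are the measurements quoted in the text, and the resulting ratio is a derived figure rather than a number stated in the report.

```python
# Measurements quoted in the text (PC running the Python model).
time_one_input_ms = 64.4
time_180_inputs_ms = 72.4
fpga_time_us = 4.0  # lower end of the 4-5 microsecond FPGA measurement

# Per-input compute time on the PC, using the report's division by 180.
pc_per_input_us = (time_180_inputs_ms - time_one_input_ms) / 180 * 1000

# Ratio of the PC per-input time to the FPGA computation time.
speedup = pc_per_input_us / fpga_time_us

print(round(pc_per_input_us, 1), round(speedup, 1))
```

Given the instability of the PC measurements noted above, this ratio is best read as an order-of-magnitude estimate.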

9. Resource Usage

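The summary below lists some of the resources used by the final implementation of our design. As can be seen, the BNN uses only a small fraction of the total memory available on the FPGA, and the use of ternary operators minimizes the need for multipliers/DSP blocks. The most heavily used resource is ALMs, but when the PIO ports used to transfer output data to the HPS, communicate the start and end signals, and so on are excluded, more than half of the ALMs on the board remain available. These results confirm the low resource requirements of a BNN.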

Figure 14: Resource usage summary

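10. Usability

The current design is not very flexible, because input images must be hard-coded into the Verilog code before they can be processed. Since the weights are also hard-coded, any change to them likewise requires modifying and recompiling the code. The design could be made more configurable by transferring the weights from the HPS to the FPGA through PIO ports or SRAM memory; however, in our current implementation, introducing either of these elements makes the design too large for the FPGA. Although digit classification is not in itself a very widely applicable task, image classification has many uses today. The speedup offered by a hardware classifier makes it better suited to real-time classification tasks in which time is the main constraint.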


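11. Conclusion

For the most part, our implementation met our expectations. We had initially hoped for higher accuracy; it was not until late in development that we noticed a bug in the Python implementation. Fixing this bug was essential to making the Python design truly binary, but it also caused the accuracy to drop by about 0.4 (from about 0.8 to about 0.4). The network hardware could be changed to accommodate what is needed to improve the accuracy, but implementing those changes would have taken us past our deadline. We therefore chose to go ahead with the lower-accuracy model.

One feature we wanted to include in our model was a camera interface that would allow images to be captured in real time, pooled, and fed to the BNN. Although we had the Verilog and HPS code needed to implement such a system, incorporating it into the design pushed the total number of ALMs required beyond what is available on the board: before adding these changes our design used about 28,000 ALMs, and after adding them the count jumped to about 38,000.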


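12. Intellectual Property Considerations

The network we implemented is based on a framework implemented by PhD student Ritchie Zhao. The code provided is also based in part on an assignment from an advanced course at Cornell University. While there are no patent or trademark issues, there are also no patent opportunities, since the software design on which our hardware is based is not our own. Our FPGA code was built using some of the resources provided on the ECE 5760 course web page. For example, the code we use to interface with the VGA display comes from an example program on the class website. Apart from consulting online resources for syntax and operations, we did not use any other code from the public domain. To our knowledge, our design raises no legal considerations.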


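13. Improvements and Future Work

If we were to redo this project, the things we would change might include modifying the network design to support binary weights with non-binary output feature maps from each layer, since this could improve accuracy. However, although our current implementation uses few registers, it uses a large share of the available ALMs, so such an implementation may not be feasible. Another potential change would be to alter the size of the network. Currently, the first convolutional layer has 16 output feature maps, and the second convolutional layer and the first fully connected layer each have 32 outputs. These numbers could be reduced to 8, 16, and 16 respectively. Although this may reduce accuracy, the smaller size would let the design fit on the board without taking up a large share of the available resources.

Further improvements to the model could include extending the classifier to handle images from other datasets, such as CIFAR10, rather than just digits. Neural networks for such images are more complex than the one we implemented and generally require more memory and compute resources. Since we are already pushing the limits of the FPGA's compute resources with this network, we would probably need a larger board to implement anything more complex.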

14. Verilog Code and C Code


//verilog
// synthesis VERILOG_INPUT_VERSION SYSTEMVERILOG_2005
module DE1_SoC_Computer (

    // FPGA Pins

    // Clock pins
    CLOCK_50,
    CLOCK2_50,
    CLOCK3_50,
    CLOCK4_50,

    // ADC
    ADC_CS_N,
    ADC_DIN,
    ADC_DOUT,
    ADC_SCLK,

    // Audio
    AUD_ADCDAT,
    AUD_ADCLRCK,
    AUD_BCLK,
    AUD_DACDAT,
    AUD_DACLRCK,
    AUD_XCK,

    // SDRAM
    DRAM_ADDR,
    DRAM_BA,
    DRAM_CAS_N,
    DRAM_CKE,
    DRAM_CLK,
    DRAM_CS_N,
    DRAM_DQ,
    DRAM_LDQM,
    DRAM_RAS_N,
    DRAM_UDQM,
    DRAM_WE_N,

    // I2C Bus for Configuration of the Audio and Video-In Chips
    FPGA_I2C_SCLK,
    FPGA_I2C_SDAT,

    // 40-Pin Headers
    GPIO_0,
    GPIO_1,

    // Seven Segment Displays
    HEX0,
    HEX1,
    HEX2,
    HEX3,
    HEX4,
    HEX5,

    // IR
    IRDA_RXD,
    IRDA_TXD,

    // Pushbuttons
    KEY,

    // LEDs
    LEDR,

    // PS2 Ports
    PS2_CLK,
    PS2_DAT,

    PS2_CLK2,
    PS2_DAT2,

    // Slider Switches
    SW,

    // Video-In
    TD_CLK27,
    TD_DATA,
    TD_HS,
    TD_RESET_N,
    TD_VS,

    // VGA
    VGA_B,
    VGA_BLANK_N,
    VGA_CLK,
    VGA_G,
    VGA_HS,
    VGA_R,
    VGA_SYNC_N,
    VGA_VS,

    // HPS Pins

    // DDR3 SDRAM
    HPS_DDR3_ADDR,
    HPS_DDR3_BA,
    HPS_DDR3_CAS_N,
    HPS_DDR3_CKE,
    HPS_DDR3_CK_N,
    HPS_DDR3_CK_P,
    HPS_DDR3_CS_N,
    HPS_DDR3_DM,
    HPS_DDR3_DQ,
    HPS_DDR3_DQS_N,
    HPS_DDR3_DQS_P,
    HPS_DDR3_ODT,
    HPS_DDR3_RAS_N,
    HPS_DDR3_RESET_N,
    HPS_DDR3_RZQ,
    HPS_DDR3_WE_N,

    // Ethernet
    HPS_ENET_GTX_CLK,
    HPS_ENET_INT_N,
    HPS_ENET_MDC,
    HPS_ENET_MDIO,
    HPS_ENET_RX_CLK,
    HPS_ENET_RX_DATA,
    HPS_ENET_RX_DV,
    HPS_ENET_TX_DATA,
    HPS_ENET_TX_EN,

    // Flash
    HPS_FLASH_DATA,
    HPS_FLASH_DCLK,
    HPS_FLASH_NCSO,

    // Accelerometer
    HPS_GSENSOR_INT,

    // General Purpose I/O
    HPS_GPIO,

    // I2C
    HPS_I2C_CONTROL,
    HPS_I2C1_SCLK,
    HPS_I2C1_SDAT,
    HPS_I2C2_SCLK,
    HPS_I2C2_SDAT,

    // Pushbutton
    HPS_KEY,

    // LED
    HPS_LED,

    // SD Card
    HPS_SD_CLK,
    HPS_SD_CMD,
    HPS_SD_DATA,

    // SPI
    HPS_SPIM_CLK,
    HPS_SPIM_MISO,
    HPS_SPIM_MOSI,
    HPS_SPIM_SS,

    // UART
    HPS_UART_RX,
    HPS_UART_TX,

    // USB
    HPS_CONV_USB_N,
    HPS_USB_CLKOUT,
    HPS_USB_DATA,
    HPS_USB_DIR,
    HPS_USB_NXT,
    HPS_USB_STP
);

//=======================================================
//  PARAMETER declarations
//=======================================================

//=======================================================
//  PORT declarations
//=======================================================

// FPGA Pins

// Clock pins
input                       CLOCK_50;
input                       CLOCK2_50;
input                       CLOCK3_50;
input                       CLOCK4_50;

// ADC
inout                       ADC_CS_N;
output                  ADC_DIN;
input                       ADC_DOUT;
output                  ADC_SCLK;

// Audio
input                       AUD_ADCDAT;
inout                       AUD_ADCLRCK;
inout                       AUD_BCLK;
output                  AUD_DACDAT;
inout                       AUD_DACLRCK;
output                  AUD_XCK;

// SDRAM
output      [12: 0] DRAM_ADDR;
output      [ 1: 0] DRAM_BA;
output                  DRAM_CAS_N;
output                  DRAM_CKE;
output                  DRAM_CLK;
output                  DRAM_CS_N;
inout           [15: 0] DRAM_DQ;
output                  DRAM_LDQM;
output                  DRAM_RAS_N;
output                  DRAM_UDQM;
output                  DRAM_WE_N;

// I2C Bus for Configuration of the Audio and Video-In Chips
output                  FPGA_I2C_SCLK;
inout                       FPGA_I2C_SDAT;

// 40-pin headers
inout           [35: 0] GPIO_0;
inout           [35: 0] GPIO_1;

// Seven Segment Displays
output      [ 6: 0] HEX0;
output      [ 6: 0] HEX1;
output      [ 6: 0] HEX2;
output      [ 6: 0] HEX3;
output      [ 6: 0] HEX4;
output      [ 6: 0] HEX5;

// IR
input                       IRDA_RXD;
output                  IRDA_TXD;

// Pushbuttons
input           [ 3: 0] KEY;

// LEDs
output      [ 9: 0] LEDR;

// PS2 Ports
inout                       PS2_CLK;
inout                       PS2_DAT;

inout                       PS2_CLK2;
inout                       PS2_DAT2;

// Slider Switches
input           [ 9: 0] SW;

// Video-In
input                       TD_CLK27;
input           [ 7: 0] TD_DATA;
input                       TD_HS;
output                  TD_RESET_N;
input                       TD_VS;

// VGA
output      [ 7: 0] VGA_B;
output                  VGA_BLANK_N;
output                  VGA_CLK;
output      [ 7: 0] VGA_G;
output                  VGA_HS;
output      [ 7: 0] VGA_R;
output                  VGA_SYNC_N;
output                  VGA_VS;

// HPS Pins

// DDR3 SDRAM
output      [14: 0] HPS_DDR3_ADDR;
output      [ 2: 0]  HPS_DDR3_BA;
output                  HPS_DDR3_CAS_N;
output                  HPS_DDR3_CKE;
output                  HPS_DDR3_CK_N;
output                  HPS_DDR3_CK_P;
output                  HPS_DDR3_CS_N;
output      [ 3: 0] HPS_DDR3_DM;
inout           [31: 0] HPS_DDR3_DQ;
inout           [ 3: 0] HPS_DDR3_DQS_N;
inout           [ 3: 0] HPS_DDR3_DQS_P;
output                  HPS_DDR3_ODT;
output                  HPS_DDR3_RAS_N;
output                  HPS_DDR3_RESET_N;
input                       HPS_DDR3_RZQ;
output                  HPS_DDR3_WE_N;

// Ethernet
output                  HPS_ENET_GTX_CLK;
inout                       HPS_ENET_INT_N;
output                  HPS_ENET_MDC;
inout                       HPS_ENET_MDIO;
input                       HPS_ENET_RX_CLK;
input           [ 3: 0] HPS_ENET_RX_DATA;
input                       HPS_ENET_RX_DV;
output      [ 3: 0] HPS_ENET_TX_DATA;
output                  HPS_ENET_TX_EN;

// Flash
inout           [ 3: 0] HPS_FLASH_DATA;
output                  HPS_FLASH_DCLK;
output                  HPS_FLASH_NCSO;

// Accelerometer
inout                       HPS_GSENSOR_INT;

// General Purpose I/O
inout           [ 1: 0] HPS_GPIO;

// I2C
inout                       HPS_I2C_CONTROL;
inout                       HPS_I2C1_SCLK;
inout                       HPS_I2C1_SDAT;
inout                       HPS_I2C2_SCLK;
inout                       HPS_I2C2_SDAT;

// Pushbutton
inout                       HPS_KEY;

// LED
inout                       HPS_LED;

// SD Card
output                  HPS_SD_CLK;
inout                       HPS_SD_CMD;
inout           [ 3: 0] HPS_SD_DATA;

// SPI
output                  HPS_SPIM_CLK;
input                       HPS_SPIM_MISO;
output                  HPS_SPIM_MOSI;
inout                       HPS_SPIM_SS;

// UART
input                       HPS_UART_RX;
output                  HPS_UART_TX;

// USB
inout                       HPS_CONV_USB_N;
input                       HPS_USB_CLKOUT;
inout           [ 7: 0] HPS_USB_DATA;
input                       HPS_USB_DIR;
input                       HPS_USB_NXT;
output                  HPS_USB_STP;

//=======================================================
//  REG/WIRE declarations
//=======================================================

//wire          [15: 0] hex3_hex0;
//wire          [15: 0] hex5_hex4;

//assign HEX0 = ~hex3_hex0[ 6: 0]; // hex3_hex0[ 6: 0];
//assign HEX1 = ~hex3_hex0[14: 8];
//assign HEX2 = ~hex3_hex0[22:16];
//assign HEX3 = ~hex3_hex0[30:24];
//assign HEX4 = 7'b1111111;
//assign HEX5 = 7'b1111111;
//assign HEX0 = test[6:0]; // hex3_hex0[ 6: 0];

//HexDigit Digit0(HEX0, final_out[1][7:4]);//hex3_hex0[3:0]);
//HexDigit Digit1(HEX1, final_out[1][3:0]);
//HexDigit Digit2(HEX2, hex3_hex0[11:8]);
//HexDigit Digit3(HEX3, hex3_hex0[15:12]);

// MAY need to cycle this switch on power-up to get video
assign TD_RESET_N = SW[1];

// get some signals exposed
// connect bus master signals to i/o for probes
//assign GPIO_0[0] = TD_HS ;
//assign GPIO_0[1] = TD_VS ;
//assign GPIO_0[2] = TD_DATA[6] ;
//assign GPIO_0[3] = TD_CLK27 ;
//assign GPIO_0[4] = TD_RESET_N ;

//=======================================================
// Bus controller for AVALON bus-master
//=======================================================
wire [31:0] vga_bus_addr, video_in_bus_addr ; // Avalon addresses
reg  [31:0] bus_addr ;
wire [31:0] vga_out_base_address = 32'h0000_0000 ;  // Avalon address
wire [31:0] video_in_base_address = 32'h0800_0000 ;  // Avalon address
reg [3:0] bus_byte_enable ; // four bit byte read/write mask
reg bus_read  ;       // high when requesting data
reg bus_write ;      //  high when writing data
reg [31:0] bus_write_data ; //  data to send to Avalon bus
wire bus_ack  ;       //  Avalon bus raises this when done
wire [31:0] bus_read_data ; // data from Avalon bus
reg [31:0] timer ;
reg [3:0] state ;
reg last_vs, wait_one;
reg [19:0] vs_count ;
reg last_hs, wait_one_hs ;
reg [19:0] hs_count ;

// pixel address is
logic [9:0] vga_x_cood, vga_y_cood, video_in_x_cood, video_in_y_cood ;
reg [7:0] current_pixel_color1, current_pixel_color2 ;
// compute address
// 640 x 480, ceil(log2 640) = 10
assign vga_bus_addr = vga_out_base_address + {22'b0,video_in_x_cood + vga_x_cood} +
 ({22'b0,video_in_y_cood + vga_y_cood}<<10) ;

// video in: 320 by 240, x:0-319, y:0-239, ceil(log2 320) = 9
// clipped video in: 224 by 224, x:0-223, y:0-223, ceil(log2 224) = 7.8, so y is shifted by 8
assign video_in_bus_addr = video_in_base_address + {22'b0,video_in_x_cood} +
 ({22'b0,video_in_y_cood}<<8) ;

// 2-bit greyscale derived from the 8-bit pixel (bit groups 765 432 10)
logic [7:0] greyscale8;
logic [1:0] greyscale;
assign greyscale = (bus_read_data[6:5]>>1) + (bus_read_data[3:2]>>1);
assign greyscale8 = {{2{1'b0, greyscale}}, greyscale};

logic [9:0] vga_x_cood_2, vga_y_cood_2;
logic [31:0] vga_bus_addr_2;
assign vga_bus_addr_2 = vga_out_base_address + {22'b0,video_in_x_cood + vga_x_cood_2} +
 ({22'b0,video_in_y_cood + vga_y_cood_2}<<10) 31 ; logic [1:0] image_array [320][240]; always @(posedge clock2_50) begin clock_50 reset state machine and read write controls if (~key[0]) <="0" bus_read set to one a opeation from bus bus_write on operation base address of upper-left corner the screen vga_x_cood vga_y_cood vga_x_cood_2 vga_y_cood_2 video_in_x_cood video_in_y_cood bus_byte_enable timer end else + 1; bus-master put in small delay aviod hogging can be 2**n-1, so 3, 7, 15, bigger numbers mean slower frame update vga (state="=0" && sw[0] (timer & 3)="=0" ) all pixels video input 10'd1 (video_in_x_cood> 10'd223) begin
            video_in_x_cood <= 0 ; video_in_y_cood <="video_in_y_cood" + 10'd1 if (video_in_y_cood> 10'd223) begin
                video_in_y_cood <= 0 1 2 3 4 5 7 8 10 16 32 39 85 119 128 137 171 10'd0 ; end one byte data bus_byte_enable <="4'b0001;" read first pixel bus_addr signal the bus that a is requested bus_read finish you must do this check if (state="=1" && bus_ack="=1)" begin state (!sw[2]) current_pixel_color1 else write to vga memory bus_write bus_write_data image_array[video_in_x_cood][video_in_y_cood] and =="=============================" logic pio_start; pio_end; [2:0] pio_switch; different input images corresponding numbers always @ (*) (pio_switch="=3'd1)" ledr[7:0]="final_out[3];" 1. idx input_image="{
                " {-1,-1,-1,-1,-1,-1,-1,-1}, '{-1,-1,-1,-1, 1, 1,-1,-1}, '{-1,-1, '{-1, 1,-1,-1, 1,-1,-1,-1}, 1,-1,-1,-1,-1}, '{-1,-1,-1,-1,-1,-1,-1,-1}, '{-1,-1,-1,-1,-1,-1,-1,-1} }; 2. '{-1,-1,-1, 3. 4. 1,-1,-1,-1,-1,-1}, 1,-1, 5. '{-1,-1,-1,-1,-1, 6. {-1,-1,-1,-1,-1,-1,1,-1}, '{-1,-1,-1,-1,1,1,-1,-1}, '{-1,-1,-1,-1,1,-1,-1,-1}, '{-1,-1,-1,1,-1,-1,-1,-1}, '{-1,-1,1,1,-1,-1,-1,-1}, '{-1,-1,1,-1,-1,-1,-1,-1}, * (sw[9]) (sw[8]) (sw[7]) (sw[6]) (sw[5]) (sw[4]) (sw[3]) (sw[2]) {-1,-1,-1,-1,-1,-1,-1,1}, (sw[1]) '{-1,-1,-1,-1,1,1,-1,1}, final_out[0]; outputs hps signed [7:0] pio_out_data; pio_out_cs; integer out_count; (posedge pio_hps_image_clk) (pio_out_cs) (out_count<10) pio_out_data out_count + 1; weight initialization conv filters - 3x3s : filter [16][3][3]; { '{1,1,0},'{1,0,0},'{0,1,0} }, '{ '{1,0,1},'{0,0,0},'{1,0,0} '{0,1,1},'{1,1,1},'{1,0,1} '{1,0,1},'{1,1,1},'{1,1,0} '{0,0,0},'{1,0,1},'{1,1,1} '{1,0,0},'{1,1,1},'{1,0,1} '{0,1,1},'{1,1,0},'{0,1,1} '{0,0,1},'{1,0,1},'{1,1,0} '{0,1,1},'{1,1,1},'{0,0,0} '{0,1,0},'{0,1,1},'{0,0,0} '{1,1,1},'{0,1,1},'{1,1,1} '{1,1,0},'{0,1,1},'{1,0,1} '{0,0,1},'{0,1,0},'{0,1,0} '{0,1,1},'{1,0,0},'{0,0,0} '{0,0,0},'{1,0,0},'{0,0,0} '{1,1,1},'{1,1,1},'{1,1,1} } second 16*32 in row column format: filters_conv2 [16][32][3][3]; '{1,0,0},'{0,1,1},'{0,0,1} '{1,0,1},'{1,1,1},'{1,1,1} '{1,1,1},'{1,1,0},'{0,0,0} '{1,1,0},'{1,0,1},'{1,0,0} '{0,0,1},'{1,1,0},'{1,1,0} '{1,1,0},'{0,0,1},'{0,0,1} '{1,1,0},'{1,1,1},'{1,1,1} '{0,1,1},'{1,0,1},'{0,0,0} '{0,0,1},'{1,0,1},'{1,0,0} '{0,0,1},'{1,0,1},'{0,1,0} '{0,0,1},'{0,0,0},'{0,0,0} '{1,0,0},'{1,0,1},'{1,0,0} '{1,1,0},'{0,1,0},'{1,1,0} '{1,1,0},'{0,1,0},'{0,1,0} '{0,1,1},'{0,0,0},'{1,0,0} '{1,1,0},'{1,1,1},'{0,1,0} '{0,0,1},'{1,0,0},'{1,1,1} '{1,1,0},'{0,1,0},'{1,1,1} '{0,1,0},'{1,1,1},'{0,0,0} '{1,0,1},'{0,1,0},'{1,0,1} '{1,0,1},'{0,1,1},'{0,0,1} '{0,0,1},'{1,0,0},'{1,0,0} '{0,1,0},'{0,1,1},'{0,0,1} '{0,1,1},'{0,1,1},'{0,1,1} '{0,0,1},'{1,0,1},'{1,1,1} '{0,0,0},'{1,1,0},'{1,0,0} 
'{0,0,0},'{0,0,1},'{0,1,1} '{0,1,1},'{0,0,0},'{1,1,1} '{1,0,1},'{1,1,0},'{1,0,1} '{0,0,0},'{0,1,1},'{0,0,0} '{1,0,1},'{1,0,1},'{0,1,1} '{1,0,1},'{0,1,1},'{0,1,1} '{0,1,1},'{1,0,1},'{1,0,1} '{0,1,1},'{1,1,0},'{0,0,0} '{1,1,0},'{1,0,1},'{1,1,1} '{0,0,1},'{1,0,1},'{0,1,1} '{1,0,0},'{1,1,0},'{1,1,0} '{1,0,0},'{0,0,1},'{1,1,1} '{0,0,0},'{1,1,0},'{0,0,1} '{1,0,1},'{1,0,0},'{1,1,1} '{0,1,1},'{0,0,1},'{1,1,0} '{1,1,0},'{1,1,1},'{1,0,0} '{1,0,1},'{0,0,0},'{0,0,1} '{1,1,0},'{0,1,1},'{1,1,1} '{1,0,0},'{0,0,1},'{1,0,0} '{1,0,1},'{0,1,1},'{0,1,0} '{1,1,0},'{1,1,0},'{1,0,1} '{1,1,1},'{1,0,1},'{1,1,0} '{0,0,0},'{0,1,1},'{1,1,1} '{1,0,1},'{1,0,1},'{1,0,1} '{1,0,1},'{1,1,0},'{1,1,0} '{1,0,1},'{1,1,1},'{0,1,1} '{1,1,1},'{1,1,0},'{0,1,0} '{1,1,0},'{1,1,1},'{1,1,0} '{0,1,0},'{1,0,1},'{1,1,1} '{1,1,1},'{0,1,0},'{0,0,1} '{0,1,0},'{1,1,1},'{1,1,1} '{1,1,0},'{0,1,1},'{0,1,0} '{0,1,1},'{0,0,0},'{1,1,0} '{1,1,0},'{1,0,0},'{1,0,0} '{0,0,0},'{1,1,1},'{1,0,0} '{0,1,1},'{1,0,1},'{0,0,1} '{0,0,1},'{1,0,0},'{0,0,0} '{1,0,1},'{1,0,1},'{1,1,0} '{1,1,0},'{0,0,1},'{0,1,1} '{1,1,1},'{1,1,0},'{1,0,0} '{1,1,1},'{0,1,0},'{1,1,1} '{0,0,1},'{1,1,1},'{0,1,1} '{1,0,0},'{1,1,0},'{1,1,1} '{0,0,1},'{1,1,1},'{1,0,0} '{0,0,0},'{1,1,1},'{0,1,0} '{1,1,1},'{0,0,0},'{1,0,1} '{0,1,1},'{0,0,1},'{0,0,0} '{0,0,0},'{1,1,1},'{0,1,1} '{1,1,1},'{1,0,0},'{0,0,1} '{0,1,0},'{0,0,0},'{1,1,1} '{0,0,1},'{0,0,0},'{1,1,1} '{0,1,0},'{1,0,1},'{1,0,0} '{0,0,1},'{0,1,0},'{1,0,1} '{0,1,1},'{1,1,0},'{1,0,1} '{1,0,1},'{0,0,0},'{1,1,0} '{1,1,0},'{1,0,1},'{0,0,0} '{0,0,0},'{0,1,0},'{0,1,0} '{0,1,0},'{0,0,0},'{1,0,1} '{0,1,1},'{0,0,0},'{0,1,1} '{0,1,1},'{0,0,1},'{1,0,1} '{0,1,1},'{0,1,1},'{0,0,1} '{1,0,0},'{1,0,1},'{1,1,1} '{1,0,1},'{1,1,1},'{0,1,0} '{1,1,1},'{0,0,0},'{0,0,0} '{0,1,0},'{1,1,1},'{1,0,1} '{0,1,1},'{0,0,0},'{1,0,1} '{1,0,1},'{1,0,0},'{0,0,0} '{0,1,1},'{1,1,1},'{1,1,0} '{1,1,1},'{1,0,1},'{0,1,1} '{1,1,0},'{0,0,1},'{1,1,0} '{1,0,0},'{0,1,1},'{1,1,0} '{0,1,0},'{1,0,1},'{0,1,1} '{1,1,1},'{1,0,1},'{1,1,1} '{0,0,0},'{1,1,0},'{1,1,0} 
'{1,1,1},'{0,0,1},'{1,0,0} '{1,1,1},'{1,0,1},'{0,0,0} '{0,0,0},'{1,1,1},'{1,0,1} '{0,1,1},'{1,0,1},'{1,1,1} '{0,0,0},'{0,0,1},'{1,0,0} '{1,0,1},'{0,1,1},'{0,0,0} '{1,1,1},'{0,1,1},'{0,0,1} '{0,1,0},'{1,1,0},'{1,1,1} '{1,0,0},'{0,1,1},'{0,0,0} '{0,1,0},'{0,0,0},'{1,0,0} '{0,0,1},'{1,1,0},'{1,1,1} '{0,1,0},'{1,1,0},'{0,1,0} '{0,1,1},'{1,1,0},'{0,0,1} '{0,1,1},'{0,1,1},'{1,0,1} '{1,0,0},'{0,1,0},'{1,1,1} '{0,1,1},'{1,0,0},'{1,1,0} '{1,1,1},'{0,1,1},'{0,0,0} '{1,0,0},'{1,0,1},'{1,1,0} '{1,1,1},'{0,0,1},'{0,0,1} '{0,0,0},'{1,0,0},'{0,1,0} '{1,0,1},'{0,1,1},'{1,1,1} '{0,1,1},'{1,1,1},'{0,1,0} '{0,1,0},'{1,1,1},'{0,1,1} '{0,1,1},'{0,0,1},'{1,1,1} '{0,1,1},'{0,1,0},'{0,1,0} '{1,0,0},'{1,0,0},'{0,1,0} '{1,1,1},'{1,0,0},'{1,1,0} '{0,1,0},'{1,0,1},'{0,1,0} '{0,0,0},'{0,0,0},'{0,1,1} '{0,0,0},'{0,0,1},'{0,1,0} '{1,0,0},'{0,0,0},'{0,1,1} '{0,1,0},'{0,1,0},'{1,1,1} '{1,1,1},'{0,1,0},'{0,1,0} '{0,1,0},'{1,0,1},'{0,0,0} '{1,0,0},'{1,0,1},'{0,1,1} '{0,0,0},'{1,0,0},'{1,0,1} '{1,0,1},'{1,1,0},'{0,0,1} '{0,0,1},'{1,1,0},'{0,1,1} '{0,1,0},'{0,0,1},'{0,1,0} '{1,1,0},'{0,0,0},'{1,0,1} '{1,1,1},'{1,1,1},'{1,1,0} '{0,1,1},'{1,0,1},'{1,0,0} '{0,1,1},'{0,1,1},'{1,1,1} '{1,0,0},'{0,0,0},'{1,0,0} '{1,1,1},'{0,0,1},'{1,1,0} '{1,0,0},'{1,1,1},'{0,0,0} '{1,0,0},'{0,0,1},'{0,1,0} '{1,1,1},'{0,1,1},'{1,0,0} '{0,0,1},'{1,0,1},'{0,0,1} '{0,0,1},'{0,0,0},'{1,0,0} '{1,0,1},'{0,1,0},'{0,0,0} '{1,0,1},'{1,1,1},'{1,0,1} '{1,0,1},'{0,0,1},'{1,1,1} '{0,0,1},'{0,0,0},'{0,1,0} '{0,1,0},'{1,0,0},'{0,1,0} '{0,1,1},'{0,0,0},'{0,0,1} '{0,1,1},'{1,1,1},'{0,1,1} '{1,0,0},'{1,1,1},'{0,0,1} '{1,0,1},'{0,1,0},'{0,1,0} '{0,0,1},'{1,1,1},'{0,0,1} '{1,0,0},'{0,0,1},'{1,1,0} '{0,0,0},'{0,1,0},'{1,0,1} '{1,1,1},'{1,0,1},'{1,0,0} '{1,1,1},'{1,0,1},'{0,0,1} '{0,1,0},'{1,1,1},'{1,1,0} '{1,0,1},'{1,0,1},'{1,0,0} '{1,1,1},'{0,1,0},'{0,1,1} '{1,1,1},'{0,0,1},'{0,0,0} '{1,0,1},'{1,0,1},'{0,0,0} '{0,1,0},'{0,1,1},'{1,1,1} '{0,1,0},'{0,0,0},'{0,1,0} '{0,0,1},'{0,0,0},'{0,0,1} '{1,0,1},'{1,0,0},'{1,0,1} '{1,0,1},'{0,0,1},'{1,0,1} 
'{0,0,1},'{0,0,0},'{1,1,0} '{0,1,0},'{0,1,1},'{0,1,0} '{1,1,0},'{1,1,0},'{0,0,0} '{1,1,1},'{1,1,0},'{0,0,1} '{0,1,0},'{1,1,0},'{1,0,0} '{0,1,0},'{1,0,0},'{0,0,0} '{0,1,0},'{1,0,0},'{1,0,0} '{0,1,1},'{1,0,0},'{1,1,1} '{1,1,0},'{1,1,1},'{1,0,1} '{0,1,0},'{0,1,0},'{1,0,1} '{1,1,1},'{1,0,0},'{1,0,0} '{0,0,1},'{0,1,0},'{1,1,0} '{1,0,1},'{0,1,0},'{1,1,0} '{1,0,1},'{0,1,0},'{1,0,0} '{1,1,0},'{0,1,1},'{0,0,0} '{0,1,0},'{1,0,0},'{1,1,0} '{0,0,0},'{0,1,1},'{1,0,0} '{0,0,0},'{0,0,0},'{0,0,1} '{0,1,1},'{0,1,0},'{1,1,0} '{1,1,0},'{1,0,1},'{0,0,1} '{1,1,1},'{1,1,1},'{0,0,1} '{0,1,0},'{0,0,0},'{0,0,0} '{1,1,0},'{1,1,1},'{0,0,0} '{1,0,1},'{1,1,0},'{0,0,0} '{1,0,0},'{0,0,1},'{0,0,1} '{0,1,1},'{0,0,1},'{0,1,1} '{0,1,1},'{1,0,0},'{0,1,1} '{1,0,1},'{0,0,0},'{0,0,0} '{0,0,1},'{1,1,1},'{0,0,0} '{0,0,1},'{1,1,0},'{1,0,0} '{0,1,0},'{0,0,1},'{1,0,1} '{1,0,0},'{1,0,0},'{0,1,1} '{0,0,1},'{0,1,1},'{0,1,1} '{1,1,1},'{0,0,0},'{1,1,0} '{1,0,0},'{1,0,0},'{0,0,1} '{0,0,1},'{1,0,0},'{0,1,0} '{1,1,0},'{0,0,0},'{1,1,0} '{1,0,0},'{0,0,1},'{0,1,1} '{0,1,0},'{0,0,0},'{0,0,1} '{0,0,0},'{0,1,1},'{1,0,1} '{1,0,0},'{1,0,0},'{0,0,0} '{1,0,1},'{0,0,0},'{0,1,0} '{0,0,1},'{1,0,0},'{0,0,1} '{1,0,1},'{1,1,1},'{1,0,0} '{0,1,1},'{0,0,1},'{1,0,0} '{0,1,1},'{1,1,1},'{1,1,1} '{0,0,0},'{1,0,1},'{0,0,1} '{1,0,0},'{1,1,0},'{0,0,1} '{0,1,1},'{0,1,0},'{0,0,1} '{1,1,0},'{1,0,1},'{0,1,1} '{0,1,0},'{0,0,1},'{1,1,0} '{1,0,0},'{1,1,0},'{0,0,0} '{1,1,0},'{1,1,0},'{0,1,1} '{1,0,1},'{0,1,1},'{1,1,0} '{0,0,0},'{1,1,1},'{0,0,1} '{1,0,1},'{0,0,1},'{0,1,1} '{0,1,1},'{0,1,0},'{1,1,1} '{1,0,1},'{1,0,0},'{1,0,0} '{0,1,0},'{0,0,1},'{0,0,0} '{0,1,0},'{0,0,0},'{0,1,1} '{0,0,0},'{1,1,0},'{1,0,1} '{1,1,0},'{0,0,0},'{0,0,0} '{1,0,0},'{1,0,1},'{0,0,0} '{0,1,0},'{1,1,1},'{0,0,1} '{0,0,1},'{1,1,0},'{0,1,0} '{0,0,0},'{1,0,1},'{1,0,1} '{1,1,0},'{0,0,1},'{0,0,0} '{0,1,0},'{0,0,1},'{0,1,1} '{1,1,0},'{1,1,0},'{1,1,0} '{0,1,0},'{1,0,1},'{1,0,1} '{0,1,0},'{1,1,0},'{0,0,0} '{1,1,1},'{0,0,0},'{0,1,0} '{0,0,1},'{0,0,1},'{1,0,0} '{1,0,1},'{0,1,0},'{0,0,1} 
'{1,0,1},'{0,1,0},'{0,1,1} '{1,1,1},'{1,1,1},'{0,1,0} '{0,1,1},'{1,0,0},'{0,0,1} '{0,1,1},'{0,1,1},'{0,1,0} '{0,1,0},'{0,0,1},'{0,0,1} '{0,1,0},'{0,0,0},'{1,1,0} '{1,0,0},'{1,0,0},'{1,0,1} '{1,1,0},'{0,1,0},'{0,0,0} '{1,1,0},'{0,0,1},'{1,1,1} '{1,0,0},'{0,0,0},'{1,1,1} '{0,0,0},'{0,0,1},'{1,1,0} '{1,1,1},'{1,1,1},'{0,0,0} '{1,0,0},'{0,0,1},'{0,0,0} '{1,1,0},'{0,1,0},'{1,0,0} '{1,0,0},'{1,0,0},'{1,1,0} '{1,1,0},'{0,1,1},'{1,1,0} '{1,0,0},'{1,1,1},'{1,0,0} '{0,1,0},'{0,1,0},'{0,1,1} '{1,1,1},'{1,0,1},'{0,1,0} '{1,1,0},'{1,0,0},'{0,0,1} '{0,0,1},'{0,0,1},'{1,1,1} '{0,1,0},'{0,1,1},'{1,0,0} '{1,1,1},'{1,1,0},'{1,1,0} '{0,0,0},'{1,0,0},'{0,1,1} '{0,1,1},'{1,0,1},'{0,1,1} '{0,0,1},'{0,1,0},'{0,0,1} '{0,0,0},'{1,1,0},'{0,0,0} '{0,1,0},'{1,1,1},'{1,0,0} '{0,1,0},'{0,1,0},'{0,0,1} '{0,0,0},'{1,1,1},'{1,1,1} '{1,0,0},'{0,1,1},'{0,1,0} '{1,0,1},'{1,1,0},'{0,1,1} '{0,1,1},'{1,1,1},'{1,0,0} '{0,1,1},'{1,1,0},'{0,1,0} '{0,1,0},'{1,1,0},'{0,1,1} '{1,0,1},'{0,1,1},'{1,0,0} '{0,0,1},'{1,1,1},'{1,1,1} '{1,1,1},'{1,1,1},'{1,0,0} '{1,0,1},'{1,0,0},'{0,1,1} '{1,1,0},'{1,0,1},'{0,1,0} '{1,1,1},'{0,0,0},'{1,1,1} '{1,0,0},'{1,1,1},'{1,1,0} '{1,0,0},'{0,0,0},'{1,1,0} '{0,1,1},'{1,1,0},'{1,1,0} '{1,0,0},'{1,1,1},'{1,1,1} '{0,1,0},'{1,0,0},'{0,0,1} '{1,1,1},'{0,0,0},'{0,1,1} '{0,0,0},'{0,1,1},'{0,0,1} '{0,0,0},'{1,1,0},'{1,1,1} '{1,1,1},'{1,1,1},'{0,1,1} '{0,0,0},'{0,0,0},'{1,1,1} '{0,1,1},'{1,0,0},'{0,1,0} '{0,1,0},'{1,0,0},'{0,1,1} '{0,1,1},'{1,1,0},'{1,0,0} '{1,0,0},'{0,1,1},'{0,1,1} '{0,1,0},'{0,0,1},'{1,1,1} '{1,1,1},'{1,1,0},'{0,1,1} '{0,1,1},'{0,0,1},'{0,0,1} '{0,0,1},'{0,1,1},'{0,0,1} '{1,0,0},'{0,1,0},'{1,0,1} '{0,1,1},'{0,1,0},'{0,0,0} '{0,1,1},'{1,0,0},'{1,0,1} '{0,0,0},'{1,0,0},'{0,0,1} '{1,0,1},'{1,0,0},'{0,1,0} '{1,1,1},'{1,1,0},'{1,1,1} '{1,1,0},'{0,1,1},'{0,0,1} '{1,1,1},'{1,0,0},'{0,1,1} '{1,1,1},'{1,0,0},'{1,0,1} '{0,0,1},'{0,0,1},'{0,0,1} '{0,1,0},'{1,1,1},'{0,1,0} weights for fc layer columns: wa [128][32]; h, q; @(*) 
{0,1,0,0,0,1,1,1,1,1,1,0,0,0,1,0,1,1,0,1,0,1,1,0,1,0,0,1,1,0,1,1}, '{0,0,0,1,1,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,1,1,1,0,1,1}, '{0,0,1,1,1,0,0,1,0,1,1,0,0,1,0,1,0,1,0,1,1,0,1,1,0,1,0,1,1,0,1,1}, '{1,0,1,1,0,1,1,1,0,0,0,0,1,0,1,1,1,0,0,1,0,1,0,1,1,1,0,1,1,0,0,0}, '{0,0,0,1,0,1,1,1,0,0,0,1,0,0,0,0,1,1,0,0,1,0,1,1,1,0,1,0,1,0,0,1}, '{0,1,0,1,0,0,1,1,1,0,1,1,0,0,1,1,0,1,0,1,0,1,1,0,0,1,1,1,0,1,1,1}, '{1,1,1,0,1,0,0,1,1,0,1,1,1,1,1,1,1,0,1,0,1,0,1,0,1,0,0,0,0,0,0,0}, '{1,1,0,0,0,0,1,0,0,1,1,0,1,1,1,0,0,0,1,0,0,0,1,1,1,0,1,1,1,0,1,1}, '{0,1,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,1,1,0,0,0,0,1,0,1,0,0,1,0}, '{1,0,0,1,1,1,0,0,0,0,1,1,0,1,1,0,0,1,0,1,0,1,0,1,0,1,0,1,1,1,1,1}, '{0,1,1,0,0,1,0,1,1,0,0,0,0,1,0,1,0,1,0,1,1,1,1,0,0,1,1,1,0,1,0,0}, '{0,0,0,1,1,1,0,1,0,0,1,0,0,1,1,0,1,0,1,0,0,1,1,1,1,0,0,0,1,0,0,0}, '{0,1,0,0,0,0,0,1,1,1,1,0,1,1,0,1,0,0,0,0,1,0,1,0,0,0,1,1,0,0,0,0}, '{1,0,1,0,0,0,0,0,1,1,1,0,0,0,0,0,1,1,1,1,1,0,1,1,0,1,0,0,1,1,1,0}, '{1,0,1,0,1,1,1,0,0,0,1,0,1,1,1,1,0,1,1,0,0,1,1,1,0,0,0,0,1,1,0,1}, '{1,0,1,1,1,1,0,1,1,0,0,1,0,1,0,0,1,0,0,0,0,0,1,1,0,1,0,1,1,1,0,0}, '{0,1,0,0,0,1,0,0,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,0,1,1,0,1,0,0,0,0}, '{0,0,0,0,1,1,0,0,1,0,1,1,1,0,1,0,1,0,0,1,1,1,1,0,1,1,1,0,1,1,1,1}, '{0,1,0,0,0,1,1,0,1,0,1,1,1,0,0,1,0,1,1,1,1,1,0,0,0,1,1,1,0,1,1,0}, '{0,0,0,1,1,1,1,1,0,1,1,1,1,1,0,1,0,1,0,1,1,1,1,1,1,1,0,0,1,1,0,0}, '{0,1,1,1,1,1,0,0,0,0,0,1,0,0,1,0,1,0,1,1,1,1,0,0,0,0,0,0,1,1,1,1}, '{0,1,0,0,0,0,1,1,0,1,0,1,0,1,0,0,1,0,0,1,1,0,0,1,1,0,0,0,0,1,0,1}, '{1,0,1,0,0,1,0,1,0,0,0,1,1,0,1,0,1,1,1,1,0,0,0,0,1,0,1,1,0,0,1,1}, '{1,1,0,0,1,1,1,0,1,0,0,1,0,1,1,0,1,1,0,0,0,1,1,0,1,1,0,0,1,1,1,1}, '{1,0,0,0,0,0,0,0,1,1,0,0,0,1,0,0,1,0,0,1,1,1,0,0,1,1,1,1,0,0,1,1}, '{0,1,1,0,1,1,0,0,1,1,1,1,1,1,1,1,0,0,0,1,0,1,0,0,0,1,0,1,0,1,0,1}, '{1,1,0,1,0,0,1,0,1,0,1,0,1,1,1,0,1,1,0,1,1,1,0,1,1,1,0,0,1,0,1,1}, '{1,1,1,0,1,0,1,1,0,1,1,1,0,0,0,1,0,0,0,0,0,1,0,1,0,1,0,0,1,1,1,1}, '{1,1,1,0,1,1,0,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,1,0,1,0,0}, 
'{1,0,0,1,0,0,0,1,0,0,0,1,0,1,1,0,1,1,1,1,0,0,1,0,0,1,1,1,1,0,0,1}, '{0,1,1,1,1,0,0,0,1,0,0,1,1,1,0,0,1,0,0,1,0,0,0,1,0,1,0,0,1,1,0,1}, '{0,0,1,0,0,1,1,0,1,1,0,0,1,1,1,1,0,1,0,0,0,0,0,1,0,1,0,1,1,1,0,1}, '{1,0,0,0,0,0,1,1,1,1,0,0,1,0,0,0,0,1,1,1,0,1,1,0,1,0,1,1,1,1,0,1}, '{0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,1,0,0,1,1,1,0,1,1,0,0,1,0,0,0,1,1}, '{1,0,1,1,0,0,1,0,0,1,1,1,1,0,0,1,1,1,0,1,1,1,1,0,1,1,1,0,1,0,0,1}, '{1,0,0,0,0,1,0,0,0,1,0,1,1,1,1,0,0,0,1,1,0,0,0,0,1,0,1,0,1,0,0,1}, '{0,0,0,1,1,0,1,0,1,1,0,0,0,1,0,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,0,0}, '{0,1,1,0,1,0,0,0,0,1,1,0,1,0,0,1,1,1,1,0,0,1,1,1,0,0,1,0,0,1,0,1}, '{1,0,0,1,1,1,0,0,0,1,0,0,0,0,0,0,0,0,1,0,1,1,1,1,1,0,1,0,1,1,1,1}, '{0,0,0,1,1,0,1,0,1,0,0,1,0,0,1,0,0,0,1,1,1,0,0,1,1,0,1,1,0,1,1,0}, '{1,0,1,1,0,1,0,0,0,0,0,1,0,1,1,1,0,0,1,0,1,1,1,1,1,1,1,0,1,1,0,1}, '{1,0,0,1,0,1,0,0,0,1,1,1,0,1,1,1,1,0,0,0,0,1,1,0,1,1,1,0,1,1,0,1}, '{1,1,0,0,0,0,1,0,0,0,0,0,0,1,1,1,0,1,0,1,1,1,0,0,1,1,1,0,0,0,1,1}, '{0,0,0,0,0,0,0,1,0,1,0,1,1,0,0,0,1,0,0,1,0,0,1,0,0,0,1,1,0,0,1,0}, '{0,0,0,1,0,0,0,1,1,0,1,1,1,0,0,0,1,0,0,0,1,1,0,1,0,1,1,0,1,1,0,0}, '{0,1,1,0,0,0,1,1,0,1,1,0,1,1,0,1,0,1,0,0,1,1,1,0,1,1,0,1,1,0,1,1}, '{1,0,0,1,0,1,0,0,1,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,1,0,1,1,0,1,1,0}, '{0,0,0,1,0,0,0,0,0,0,1,1,1,0,0,0,1,1,0,1,1,1,0,1,1,1,1,0,1,0,0,1}, '{0,0,0,0,0,0,0,1,0,0,1,1,1,1,0,1,0,0,0,0,1,1,0,0,1,0,1,0,0,0,1,0}, '{0,1,1,0,0,1,1,1,0,1,0,0,0,1,0,1,1,0,1,0,1,1,1,1,1,1,1,0,1,0,1,0}, '{0,1,1,0,1,1,1,0,1,1,0,0,1,0,0,0,1,0,0,0,1,0,1,1,1,1,1,0,1,0,1,1}, '{1,1,1,1,0,1,1,0,0,1,0,0,1,1,0,0,0,0,1,1,1,1,1,0,1,0,1,0,1,0,0,0}, '{0,0,0,1,1,1,1,1,1,0,0,1,0,1,0,1,0,0,1,0,0,0,1,0,0,1,1,1,1,1,1,0}, '{1,1,0,1,0,1,1,1,0,0,0,0,0,1,0,1,1,0,0,1,0,1,0,1,1,0,1,1,0,0,1,1}, '{0,0,0,1,0,1,1,1,1,0,1,1,0,1,1,0,1,1,1,1,0,1,1,0,1,0,1,0,0,0,1,1}, '{0,1,0,0,1,0,0,1,1,0,1,0,1,1,1,0,1,1,0,1,1,0,0,1,1,0,0,1,0,0,1,1}, '{1,0,1,0,1,0,1,0,0,0,1,0,0,0,1,0,0,1,0,1,1,0,1,0,0,0,0,1,1,1,1,1}, '{1,1,0,0,0,0,0,0,1,1,0,1,1,1,0,0,0,1,1,1,1,0,1,0,1,0,1,1,0,0,0,0}, 
'{1,0,1,0,1,0,1,1,1,0,1,1,1,0,0,0,1,0,0,1,1,0,0,0,0,0,1,0,1,1,0,1}, '{0,1,1,1,0,0,1,0,1,1,0,0,0,1,1,1,1,1,0,0,1,0,1,1,0,1,0,1,1,1,1,0}, '{1,1,1,1,1,1,1,1,0,1,1,0,1,0,0,1,0,1,1,1,1,1,0,0,1,0,1,0,0,1,1,0}, '{1,1,1,0,1,1,1,0,1,1,1,1,1,0,0,0,1,0,1,0,1,1,1,1,0,0,0,1,0,0,1,1}, '{0,1,1,1,1,0,1,0,1,1,1,0,1,0,1,1,1,1,0,0,1,1,1,0,0,0,1,0,1,0,1,1}, '{0,0,0,1,1,1,1,0,0,0,0,0,1,0,1,1,0,0,0,0,0,0,1,0,0,1,1,0,0,1,1,1}, '{0,1,0,0,0,0,1,1,0,1,0,0,1,1,0,1,0,0,1,1,0,1,0,1,1,1,1,0,1,1,1,1}, '{0,1,1,1,0,0,0,1,1,0,1,0,1,0,1,1,1,1,1,1,0,0,1,0,0,0,1,1,0,0,1,0}, '{1,1,1,0,0,0,1,0,1,0,1,0,1,1,1,0,0,1,1,1,0,1,1,1,1,1,1,1,1,0,1,1}, '{1,0,0,1,1,1,0,1,0,0,0,1,1,1,1,1,0,1,0,0,1,0,0,1,0,0,0,0,1,0,0,0}, '{1,0,1,0,0,1,1,0,0,1,0,1,0,1,0,0,1,1,0,1,0,0,0,1,0,1,0,1,1,0,0,1}, '{0,1,0,1,0,1,1,0,1,0,0,1,1,0,0,0,1,0,0,1,1,0,1,1,0,1,0,0,1,1,1,0}, '{0,1,1,1,0,1,1,0,0,1,0,1,0,1,1,0,0,0,0,1,0,1,1,1,1,0,0,1,1,0,1,1}, '{1,1,1,1,1,1,0,1,0,1,1,0,0,1,0,1,1,0,1,0,0,0,1,0,0,0,0,1,0,1,1,0}, '{0,0,1,1,0,0,0,0,1,0,1,0,1,0,1,1,0,1,1,0,1,0,0,1,0,1,0,1,0,1,0,1}, '{1,0,1,1,1,0,0,1,0,1,1,0,0,0,1,1,0,1,1,1,1,0,1,0,1,0,0,1,0,1,1,0}, '{1,0,1,0,1,0,0,1,0,0,0,1,1,1,1,1,0,0,1,1,1,1,0,1,0,1,0,1,1,0,1,1}, '{0,0,1,0,1,0,0,1,1,0,0,1,0,0,0,1,0,1,0,0,1,1,1,0,0,1,1,1,1,0,0,0}, '{1,0,1,1,1,0,0,1,0,1,1,1,1,0,1,0,0,1,0,1,1,1,1,0,1,1,1,1,1,0,0,1}, '{0,0,1,0,0,1,1,0,0,0,0,1,0,1,1,1,0,1,0,0,1,0,1,1,1,1,1,0,1,0,0,1}, '{0,1,1,0,0,1,0,0,1,0,0,1,0,0,1,1,1,0,0,0,1,0,1,1,0,0,0,0,1,1,1,0}, '{0,0,0,0,1,1,1,0,1,1,0,1,1,1,1,1,1,1,1,0,1,0,1,1,0,1,1,1,0,1,1,0}, '{1,1,0,1,1,0,1,1,0,1,1,1,0,0,0,1,0,0,0,1,0,0,1,0,0,0,1,0,1,0,1,1}, '{1,1,0,0,0,1,1,0,0,1,1,0,1,0,0,0,0,0,0,1,1,1,0,0,0,1,0,1,0,1,0,0}, '{1,1,0,0,1,1,0,0,0,1,1,0,0,1,0,0,1,1,0,0,1,0,1,1,1,0,1,1,0,1,1,0}, '{0,0,1,1,0,0,0,1,1,0,0,1,0,1,0,0,1,1,1,1,1,0,0,0,0,0,0,0,1,1,0,1}, '{0,1,1,1,1,1,0,1,1,1,0,0,0,0,0,1,1,1,0,0,0,1,0,1,1,0,1,0,0,0,1,0}, '{0,0,1,0,0,1,0,1,0,0,0,0,1,0,0,1,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0}, '{1,1,1,0,1,0,0,1,0,1,1,0,1,1,1,0,0,0,0,1,0,0,1,0,0,1,1,1,1,1,1,1}, 
'{1,1,1,1,1,1,0,0,1,0,0,1,1,0,0,0,1,0,1,1,0,0,0,1,1,1,0,1,1,0,0,1}, '{1,0,0,1,1,0,0,0,0,1,1,0,0,0,1,1,0,0,0,0,1,0,1,1,1,0,1,1,0,0,0,1}, '{0,1,0,1,0,1,0,0,1,0,0,1,0,1,1,1,1,1,1,1,1,1,1,0,0,1,0,0,1,0,0,1}, '{0,1,1,0,1,0,1,1,1,1,1,1,1,1,1,0,0,1,1,1,1,1,0,1,0,1,1,1,0,0,1,0}, '{0,0,1,0,1,0,1,0,0,0,0,1,1,0,0,1,1,0,0,1,0,0,0,0,0,1,0,1,0,0,0,1}, '{1,1,1,0,1,1,1,1,0,1,1,1,1,1,1,0,1,1,0,0,1,0,1,0,1,0,0,1,0,0,1,0}, '{0,1,0,0,1,0,1,0,0,1,0,0,1,0,0,0,1,1,0,0,1,1,1,1,0,1,0,1,0,0,0,1}, '{1,1,0,0,0,0,0,1,0,1,0,0,0,1,0,1,0,1,0,0,0,0,1,1,1,0,0,1,1,0,1,0}, '{1,1,1,1,0,0,1,1,0,1,1,0,0,1,0,1,0,1,0,1,0,1,1,0,1,1,1,1,0,0,0,1}, '{1,1,1,1,0,1,0,0,1,1,0,1,1,1,1,1,0,0,1,1,0,0,0,1,0,1,1,1,0,1,1,1}, '{0,1,0,1,1,1,1,0,0,1,0,0,0,0,0,0,1,1,0,1,0,0,1,1,1,0,0,1,0,0,0,0}, '{1,1,0,0,1,1,1,1,1,0,1,0,1,1,1,1,1,0,0,1,0,0,1,1,1,0,1,1,1,0,0,0}, '{1,0,0,0,1,1,1,0,1,0,0,1,1,1,1,1,1,0,1,1,0,1,1,1,0,1,0,0,0,0,1,1}, '{1,1,1,1,0,1,0,0,1,1,0,1,0,1,0,1,1,0,0,1,1,0,1,1,1,0,1,1,0,0,0,1}, '{0,0,1,1,1,1,0,0,1,0,1,1,0,1,1,1,0,1,1,1,1,1,0,0,1,1,1,0,1,0,1,0}, '{0,0,0,0,1,1,1,0,1,0,1,0,0,0,1,0,1,1,0,1,1,0,0,1,1,0,1,0,1,0,0,0}, '{0,0,1,1,1,0,1,1,1,1,0,1,1,0,1,1,0,1,0,0,1,1,1,1,1,0,0,1,1,0,0,0}, '{1,0,1,1,0,1,1,1,1,0,1,1,0,0,0,0,0,1,0,0,0,1,0,0,0,1,1,1,1,0,1,0}, '{0,0,1,0,1,1,0,1,1,1,1,1,1,0,0,1,0,0,0,1,0,1,0,1,1,1,1,1,0,0,1,1}, '{0,1,0,1,1,1,1,1,1,1,1,0,1,0,0,0,0,0,0,1,0,1,1,1,1,0,1,1,1,0,1,1}, '{1,1,0,1,0,0,0,1,0,1,0,1,0,0,1,1,1,1,0,1,0,1,0,0,0,0,0,1,0,1,0,1}, '{0,1,0,0,1,0,1,1,0,1,0,1,1,0,0,0,1,1,0,0,0,1,1,1,0,1,0,1,1,1,1,0}, '{1,0,1,1,0,0,1,1,0,1,1,1,0,1,1,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,1,0}, '{0,0,0,0,0,1,1,1,1,0,1,0,0,0,1,1,1,0,1,0,1,1,1,0,1,1,1,0,0,0,0,0}, '{0,1,1,1,1,1,1,1,0,0,1,1,1,0,0,1,0,1,0,1,1,0,0,0,1,1,0,0,1,1,1,1}, '{1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,1,1,0,1,0,0,1,1,0,1,0,0,0,0,1,0,1}, '{0,0,1,0,0,1,1,1,1,1,0,1,1,0,0,1,1,0,0,1,0,0,1,1,0,1,1,0,1,0,0,1}, '{1,1,1,1,1,0,1,0,1,0,1,0,1,1,1,1,0,1,1,1,1,1,1,0,0,1,1,1,0,1,1,0}, '{1,1,1,0,0,1,0,1,0,0,1,1,1,0,0,0,0,0,0,1,0,0,0,1,1,1,1,0,1,0,0,0}, 
'{0,1,1,1,0,0,0,0,0,1,1,1,0,0,1,1,0,0,0,1,1,1,0,1,1,0,0,0,0,0,1,1}, '{1,1,0,1,1,0,1,1,0,1,0,1,1,0,0,1,0,0,1,1,1,0,0,1,0,1,0,1,0,1,0,1}, '{1,1,0,0,1,1,0,1,1,0,1,1,0,0,1,0,0,0,1,0,0,1,1,0,1,1,1,0,1,0,1,1}, '{0,0,1,1,0,0,0,0,0,1,1,1,1,1,1,1,1,1,0,1,1,0,1,0,1,1,0,1,0,0,0,1}, '{1,0,0,1,1,0,0,0,1,1,1,1,0,0,1,1,0,1,1,1,0,1,0,1,0,0,1,1,0,0,1,1}, '{1,0,0,0,0,0,0,1,1,1,1,1,0,0,1,1,0,1,0,1,0,1,0,1,0,1,0,0,0,1,1,1}, '{1,0,1,0,0,1,1,0,0,0,1,0,1,1,0,1,0,1,0,1,1,0,0,0,1,1,1,1,0,1,0,1}, '{1,0,0,1,1,0,0,0,1,0,1,0,1,1,1,0,1,1,0,1,1,0,0,0,0,1,0,0,1,1,0,0}, '{0,0,1,0,0,1,1,1,0,1,1,1,1,0,0,1,1,0,0,1,0,1,1,0,1,0,1,1,0,0,1,1}, '{1,0,0,0,1,1,0,0,1,0,0,1,1,1,1,1,1,1,0,0,1,0,1,1,1,1,0,0,1,0,0,1}, '{1,1,1,0,1,1,1,1,1,0,1,1,0,0,1,0,1,1,0,0,1,0,1,1,1,1,0,0,0,1,1,0}, '{1,0,1,1,1,0,1,0,1,1,1,0,0,1,1,1,0,1,1,1,1,1,1,1,0,1,0,1,0,1,1,1} final mapping rows wa1 [32][10]; r, k; {1,0,1,1,0,1,1,0,1,1}, '{1,0,0,0,1,1,1,0,0,0}, '{1,0,1,1,1,1,0,0,1,0}, '{0,0,0,1,1,0,1,0,0,1}, '{1,0,0,0,0,1,1,1,0,1}, '{0,1,0,1,0,1,0,1,1,0}, '{1,1,0,0,1,0,1,1,0,1}, '{0,1,0,0,1,0,0,1,1,1}, '{1,1,1,0,1,0,1,0,1,1}, '{0,0,0,0,1,0,1,0,1,1}, '{1,0,1,1,0,1,1,1,1,0}, '{0,1,0,0,1,1,1,1,1,1}, '{1,1,1,1,0,0,0,0,1,0}, '{0,1,0,1,0,0,1,1,1,0}, '{1,0,0,1,0,1,0,0,1,1}, '{0,0,1,1,1,1,0,1,0,1}, '{0,1,0,0,0,1,0,1,1,1}, '{1,0,1,1,1,0,1,0,1,1}, '{1,0,0,0,0,1,1,1,0,0}, '{0,1,1,0,1,1,0,1,1,1}, '{1,1,0,1,0,0,0,0,0,1}, '{0,1,0,1,0,0,1,1,1,1}, '{1,1,0,0,0,1,1,0,1,1}, '{0,1,1,1,0,0,1,0,0,0}, '{0,1,0,0,1,0,1,1,0,1}, '{1,1,0,1,1,1,0,1,1,0}, '{0,1,1,0,1,0,1,1,0,0}, '{1,0,1,0,1,1,1,0,1,1}, '{0,1,1,1,0,1,0,1,0,1}, '{1,0,0,1,1,0,0,0,0,0}, '{1,0,1,0,0,1,0,1,1,1}, '{1,0,1,0,1,1,0,0,1,0} convolutional [1:0] [8][8]; image out_map [16][8][8]; output genvar m; generate (m="0;" m<16; m++ ) begin: conv1 conv_1 (.fmap(input_image), .filter(filter[m]), .partial_sums(out_map[m]), .clk_50(clock_50)); , .start(start_conv1), .finish(finish_conv1[m]) endgenerate pooling produce 4x4 from 8x8 pool_conv1 [16][4][4]; pool1 (.pool_conv1(pool_conv1), .out_map(out_map), 
.start(finish_conv1[1]), .finish(finish_pool1) 4d array containing all 3x3 used convolution which contains partial sums generated by convolving fmaps with each [4:0] partials_conv2 [16][32][4][4]; conv2 finish_ps; finish_pool2; n, o; convole of unique set sets (n="0;" n<16; n++) (o="0;" o<32; o++) conv2_inner conv_2 (.fmap(pool_conv1[n]), .filter(filters_conv2[n][o]), .partial_sums(partials_conv2[n][o]), .clk_50(clock_50) ); calculate partial_sums conv_layer2 (.outmap_conv2d(outmap_conv2d), .partials_conv2(partials_conv2), .clk_50 (clock_50), .start(pio_start), .finish(finish_ps)); key[3] 3d at outmap_conv2d [32][4][4]; 2x2 pool_conv2 [32][2][2]; pool2 (.pool_conv2(pool_conv2), .outmap_conv2d(outmap_conv2d), .clk_50(clock_50), .start(finish_ps), .finish(finish_pool2)); finish_ps fully connected finish_fc1; start_fc2; 128x32 binary 1x32 fc_out [32]; [8:0] temp feed last fc1 full_1 (.fmap(pool_conv2), .wa(wa), .start(finish_pool2), .finish(finish_fc1), .fc_out(fc_out)); map 32x10 finish_fc2; 1x10 bit final_out [10]; ten_map (.fmap(fc_out), .wa1(wa1), .final_out(final_out), .start(finish_fc1), .finish(finish_fc2)); assign pio_end="finish_fc2;" structural coding computer_system the_system ( fpga side pio ports .pio_fpga_data_external_connection_export (pio_fpga_data), .pio_hps_image_data_external_connection_export (pio_hps_image_data), .pio_hps_image_clk_external_connection_export (pio_hps_image_clk), .pio_hps_image_cs_external_connection_export (pio_hps_image_cs), .pio_out_data_external_connection_export (pio_out_data), .pio_out_cs_external_connection_export (pio_out_cs), .pio_start_external_connection_export (pio_start), .pio_end_external_connection_export (pio_end), .pio_switch_external_connection_export (pio_switch), .pio_x_external_connection_export(pio_x), .pio_y_external_connection_export(pio_y), global signals .system_pll_ref_clk_clk .system_pll_ref_reset_reset (1'b0), sram shared block .onchip_sram_0_s1_address (sram_address), .onchip_sram_0_s1_clken 
(sram_clken), .onchip_sram_0_s1_chipselect (sram_chipselect), .onchip_sram_0_s1_write (sram_write), .onchip_sram_0_s1_readdata (sram_readdata), .onchip_sram_0_s1_writedata (sram_writedata), av config .av_config_sclk (fpga_i2c_sclk), .av_config_sdat (fpga_i2c_sdat), audio subsystem .audio_pll_ref_clk_clk (clock3_50), .audio_pll_ref_reset_reset .audio_clk_clk (aud_xck), .audio_adcdat (aud_adcdat), .audio_adclrck (aud_adclrck), .audio_bclk (aud_bclk), .audio_dacdat (aud_dacdat), .audio_daclrck (aud_daclrck), slider switches .slider_switches_export (sw), pushbuttons (~key[3:0]), .pushbuttons_export expansion jp1 .expansion_jp1_export ({gpio_0[35:19], gpio_0[17], gpio_0[15:3], gpio_0[1]}), jp2 .expansion_jp2_export ({gpio_1[35:19], gpio_1[17], gpio_1[15:3], gpio_1[1]}), leds .leds_export (ledr), seven segs .hex3_hex0_export (hex3_hex0), .hex5_hex4_export (hex5_hex4), ps2 .ps2_port_clk (ps2_clk), .ps2_port_dat (ps2_dat), .ps2_port_dual_clk (ps2_clk2), .ps2_port_dual_dat (ps2_dat2), irda .irda_rxd (irda_rxd), .irda_txd (irda_txd), .vga_pll_ref_clk_clk (clock2_50), .vga_pll_ref_reset_reset .vga_clk (vga_clk), .vga_blank (vga_blank_n), .vga_sync (vga_sync_n), .vga_hs (vga_hs), .vga_vs (vga_vs), .vga_r (vga_r), .vga_g (vga_g), .vga_b (vga_b), video .video_in_td_clk27 (td_clk27), .video_in_td_data (td_data), .video_in_td_hs (td_hs), .video_in_td_vs (td_vs), .video_in_clk27_reset (), .video_in_td_reset .video_in_overflow_flag .ebab_video_in_external_interface_address (bus_addr), .ebab_video_in_external_interface_byte_enable (bus_byte_enable), .byte_enable .ebab_video_in_external_interface_read (bus_read), .read .ebab_video_in_external_interface_write (bus_write), .write .ebab_video_in_external_interface_write_data (bus_write_data), .write_data .ebab_video_in_external_interface_acknowledge (bus_ack), .acknowledge .ebab_video_in_external_interface_read_data (bus_read_data), clock bridge ebab_video_in_external_interface_acknowledge .clock_bridge_0_in_clk_clk sdram .sdram_clk_clk 
(dram_clk), .sdram_addr (dram_addr), .sdram_ba (dram_ba), .sdram_cas_n (dram_cas_n), .sdram_cke (dram_cke), .sdram_cs_n (dram_cs_n), .sdram_dq (dram_dq), .sdram_dqm ({dram_udqm,dram_ldqm}), .sdram_ras_n (dram_ras_n), .sdram_we_n (dram_we_n), ddr3 .memory_mem_a (hps_ddr3_addr), .memory_mem_ba (hps_ddr3_ba), .memory_mem_ck (hps_ddr3_ck_p), .memory_mem_ck_n (hps_ddr3_ck_n), .memory_mem_cke (hps_ddr3_cke), .memory_mem_cs_n (hps_ddr3_cs_n), .memory_mem_ras_n (hps_ddr3_ras_n), .memory_mem_cas_n (hps_ddr3_cas_n), .memory_mem_we_n (hps_ddr3_we_n), .memory_mem_reset_n (hps_ddr3_reset_n), .memory_mem_dq (hps_ddr3_dq), .memory_mem_dqs (hps_ddr3_dqs_p), .memory_mem_dqs_n (hps_ddr3_dqs_n), .memory_mem_odt (hps_ddr3_odt), .memory_mem_dm (hps_ddr3_dm), .memory_oct_rzqin (hps_ddr3_rzq), ethernet .hps_io_hps_io_gpio_inst_gpio35 (hps_enet_int_n), .hps_io_hps_io_emac1_inst_tx_clk (hps_enet_gtx_clk), .hps_io_hps_io_emac1_inst_txd0 (hps_enet_tx_data[0]), .hps_io_hps_io_emac1_inst_txd1 (hps_enet_tx_data[1]), .hps_io_hps_io_emac1_inst_txd2 (hps_enet_tx_data[2]), .hps_io_hps_io_emac1_inst_txd3 (hps_enet_tx_data[3]), .hps_io_hps_io_emac1_inst_rxd0 (hps_enet_rx_data[0]), .hps_io_hps_io_emac1_inst_mdio (hps_enet_mdio), .hps_io_hps_io_emac1_inst_mdc (hps_enet_mdc), .hps_io_hps_io_emac1_inst_rx_ctl (hps_enet_rx_dv), .hps_io_hps_io_emac1_inst_tx_ctl (hps_enet_tx_en), .hps_io_hps_io_emac1_inst_rx_clk (hps_enet_rx_clk), .hps_io_hps_io_emac1_inst_rxd1 (hps_enet_rx_data[1]), .hps_io_hps_io_emac1_inst_rxd2 (hps_enet_rx_data[2]), .hps_io_hps_io_emac1_inst_rxd3 (hps_enet_rx_data[3]), flash .hps_io_hps_io_qspi_inst_io0 (hps_flash_data[0]), .hps_io_hps_io_qspi_inst_io1 (hps_flash_data[1]), .hps_io_hps_io_qspi_inst_io2 (hps_flash_data[2]), .hps_io_hps_io_qspi_inst_io3 (hps_flash_data[3]), .hps_io_hps_io_qspi_inst_ss0 (hps_flash_ncso), .hps_io_hps_io_qspi_inst_clk (hps_flash_dclk), accelerometer .hps_io_hps_io_gpio_inst_gpio61 (hps_gsensor_int), .adc_sclk (adc_sclk), .adc_cs_n (adc_cs_n), .adc_dout 
(adc_dout), .adc_din (adc_din), general purpose i o .hps_io_hps_io_gpio_inst_gpio40 (hps_gpio[0]), .hps_io_hps_io_gpio_inst_gpio41 (hps_gpio[1]), i2c .hps_io_hps_io_gpio_inst_gpio48 (hps_i2c_control), .hps_io_hps_io_i2c0_inst_sda (hps_i2c1_sdat), .hps_io_hps_io_i2c0_inst_scl (hps_i2c1_sclk), .hps_io_hps_io_i2c1_inst_sda (hps_i2c2_sdat), .hps_io_hps_io_i2c1_inst_scl (hps_i2c2_sclk), pushbutton .hps_io_hps_io_gpio_inst_gpio54 (hps_key), led .hps_io_hps_io_gpio_inst_gpio53 (hps_led), sd card .hps_io_hps_io_sdio_inst_cmd (hps_sd_cmd), .hps_io_hps_io_sdio_inst_d0 (hps_sd_data[0]), .hps_io_hps_io_sdio_inst_d1 (hps_sd_data[1]), .hps_io_hps_io_sdio_inst_clk (hps_sd_clk), .hps_io_hps_io_sdio_inst_d2 (hps_sd_data[2]), .hps_io_hps_io_sdio_inst_d3 (hps_sd_data[3]), spi .hps_io_hps_io_spim1_inst_clk (hps_spim_clk), .hps_io_hps_io_spim1_inst_mosi (hps_spim_mosi), .hps_io_hps_io_spim1_inst_miso (hps_spim_miso), .hps_io_hps_io_spim1_inst_ss0 (hps_spim_ss), uart .hps_io_hps_io_uart0_inst_rx (hps_uart_rx), .hps_io_hps_io_uart0_inst_tx (hps_uart_tx), usb .hps_io_hps_io_gpio_inst_gpio09 (hps_conv_usb_n), .hps_io_hps_io_usb1_inst_d0 (hps_usb_data[0]), .hps_io_hps_io_usb1_inst_d1 (hps_usb_data[1]), .hps_io_hps_io_usb1_inst_d2 (hps_usb_data[2]), .hps_io_hps_io_usb1_inst_d3 (hps_usb_data[3]), .hps_io_hps_io_usb1_inst_d4 (hps_usb_data[4]), .hps_io_hps_io_usb1_inst_d5 (hps_usb_data[5]), .hps_io_hps_io_usb1_inst_d6 (hps_usb_data[6]), .hps_io_hps_io_usb1_inst_d7 (hps_usb_data[7]), .hps_io_hps_io_usb1_inst_clk (hps_usb_clkout), .hps_io_hps_io_usb1_inst_stp (hps_usb_stp), .hps_io_hps_io_usb1_inst_dir (hps_usb_dir), .hps_io_hps_io_usb1_inst_nxt (hps_usb_nxt) endmodule helper modules module (fmap, filter, partial_sums, clk_50); start, clk_50; fmap[8][8]; [3][3]; fmap_padded[10][10]; padded be 10x10 temp_sum[8][8]; partial_sums[8][8]; pad maintain size after (int i<10; i++) j="0;" j<10; j++) ((i="=0)" || (i="=9)" (j="=0)" fmap_padded[i][j] -1 fmap_padded[i][j]<="fmap[i-1][j-1];" k="1;" k<9; k++) 
l="1;" l<9; l++) get surrounded fmap_padded[k][l], multiply add temp_sum[k-1][l-1]="((filter[0][0]" ? fmap_padded[k-1][l-1] -fmap_padded[k-1][l-1]) top-left (filter[0][1] fmap_padded[k-1][l] -fmap_padded[k-1][l])) top-middle ((filter[0][2] fmap_padded[k-1][l+1] -fmap_padded[k-1][l+1]) top-right (filter[1][0] fmap_padded[k][l-1] -fmap_padded[k][l-1])) middle-left ((filter[1][1] fmap_padded[k][l] -fmap_padded[k][l]) middle-middle (filter[1][2] fmap_padded[k][l+1] -fmap_padded[k][l+1])) middle-right ((filter[2][0] fmap_padded[k+1][l-1] -fmap_padded[k+1][l-1]) bottom-left (filter[2][1] fmap_padded[k+1][l] -fmap_padded[k+1][l])) bottom-middle (filter[2][2] fmap_padded[k+1][l+1] -fmap_padded[k+1][l+1]); bottom-right store sum matrix (temp_sum[k-1][l-1]="=" 0) partial_sums[k-1][l-1]="2'b11;">>>4) ?  2'b11 : 2'b01; //load in 1 or -1
            end
        end
    end
endmodule
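
As a software cross-check for the `conv_1` module above, the following C sketch mirrors what the listing appears to do: pad the 8x8 map with a border of -1, treat a filter bit of 0 as a weight of -1, sum the nine signed terms, and keep only the sign. The function name `conv3x3_binarize` and the use of plain `int` arrays are illustrative assumptions and not part of the original project. Mapping a zero sum to -1 follows the `2'b11` branch that is still visible in the listing.

```c
#define N 8 /* feature-map side length used by conv_1 */

/* Binary-weight 3x3 convolution followed by sign binarization.
 * fmap holds +1/-1 values; filter holds 0/1 bits, where 0 means weight -1.
 * out receives +1 for a positive window sum and -1 otherwise. */
void conv3x3_binarize(int fmap[N][N], int filter[3][3], int out[N][N])
{
    int padded[N + 2][N + 2];

    /* Surround the map with a one-pixel border of -1 so the output stays NxN. */
    for (int i = 0; i < N + 2; i++) {
        for (int j = 0; j < N + 2; j++) {
            int on_border = (i == 0 || i == N + 1 || j == 0 || j == N + 1);
            padded[i][j] = on_border ? -1 : fmap[i - 1][j - 1];
        }
    }

    for (int k = 1; k <= N; k++) {
        for (int l = 1; l <= N; l++) {
            int sum = 0;
            for (int a = -1; a <= 1; a++) {
                for (int b = -1; b <= 1; b++) {
                    int x = padded[k + a][l + b];
                    /* "Multiply" by +1 or -1 without a multiplier. */
                    sum += filter[a + 1][b + 1] ? x : -x;
                }
            }
            out[k - 1][l - 1] = (sum > 0) ? 1 : -1;
        }
    }
}
```

With an all -1 input and an all-ones filter, every window sums to -9, so the whole output is -1. Setting every filter bit to 0 negates each term and gives +1 everywhere.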

module pool1 (pool_conv1, out_map, clk_50); //, start, finish
    input clk_50;
    input signed [1:0] out_map [16][8][8];
    output logic signed [1:0] pool_conv1 [16][4][4];
    integer h;

    //max pooling - check if any ones in 4x4 square-- if yes, max = 1, if no max = -1 since outmap_conv2 binarized to 1/-1
    always @(*) begin //posedge clk_50
        //if (start) begin
            for (h=0; h<16; 0 1 2 5 16 32 h++) begin pool_conv1 [h][0][0] <="((out_map[h][0][0]&out_map[h][0][1]&out_map[h][1][0]&out_map[h][1][1])==2'b01)" ? 2'b01 : 2'b11; [h][0][1] [h][1][0] [h][1][1] [h][0][2] [h][0][3] [h][1][2] [h][1][3] [h][2][0] [h][2][1] [h][3][0] [h][3][1] [h][2][2] [h][2][3] [h][3][2] [h][3][3] end finish endmodule module conv_2 (fmap, filter, partial_sums, clk_50); , start, input clk_50; signed [1:0] fmap [4][4]; image - bit 4x4 filter [3][3]; 3x3 logic fmap_padded [6][6]; convert to 6x6 [4:0] temp_sum output partial_sums pad size by maintain after convolving always @(*) for (int i="0;" i<6; i++) row j="0;" j<6; j++) column if ((i="=0)" || (i="=5)" (j="=0)" fmap_padded[i][j] else fmap_padded[i][j]<="fmap[i-1][j-1];" k="1;" k<5; k++) l="1;" l<5; l++) partial_sums[k-1][l-1]="((filter[0][0]" fmap_padded[k-1][l-1] -fmap_padded[k-1][l-1]) top-left + (filter[0][1] fmap_padded[k-1][l] -fmap_padded[k-1][l])) top-middle ((filter[0][2] fmap_padded[k-1][l+1] -fmap_padded[k-1][l+1]) top-right (filter[1][0] fmap_padded[k][l-1] -fmap_padded[k][l-1])) middle-left ((filter[1][1] fmap_padded[k][l] -fmap_padded[k][l]) middle-middle (filter[1][2] fmap_padded[k][l+1] -fmap_padded[k][l+1])) middle-right ((filter[2][0] fmap_padded[k+1][l-1] -fmap_padded[k+1][l-1]) bottom-left (filter[2][1] fmap_padded[k+1][l] -fmap_padded[k+1][l])) bottom-middle (filter[2][2] fmap_padded[k+1][l+1] -fmap_padded[k+1][l+1]); bottom-right (outmap_conv2d, partials_conv2, clk_50, finish); sum up sets partial sums and binarize the generate final fmaps partials_conv2 [16][32][4][4]; range from -9 9, so outmap_conv2d [32][4][4]; [9:0] temp_sum[32][4][4]; integer a, b, c, d, e, f, g, h, i; start; finish; [2:0] state; initial state="3'b0;" b="0;" @ (posedge clk_50) (start) (state="=" 3'b0) (a="0;" a<32; a++) columns reset (g="0;" g<4; g++) iterate through all 4x4s (h="0;" h<4; temp_sum[a][g][h] 3'd1) rows (c="0;" c<4; c++) (d="0;" d<4; d++) temp_sum[a][c][d] partials_conv2[b][a][c][d]; 
b<="b+1;" times (b="=15)" 3'd2) (e="0;" e<4; e++) transfer sign temporary (f="0;" f<4; f++) (temp_sum[a][e][f]="=" 0) outmap_conv2d[a][e][f]>>>8) ?  2'b11 : 2'b01; //store 1 or -1 based on sign bit
                        end
                    end
                end
                state <= 1 128 3'd3; end if (state="=" 3'd3) begin state <="3'd3;" finish else b<="0;" i<="0;" endmodule module pool2 (pool_conv2, outmap_conv2d, clk_50, start, finish); input clk_50; logic signed [1:0] outmap_conv2d [32][4][4]; output pool_conv2 [32][2][2]; integer g; start; finish; initial max pooling - check any ones in 2x2 square-- yes, no since outmap_conv2 binarized to -1 always @(posedge clk_50) (start) (g<32) [g][0][0] ? 2'b01 : 2'b11; [g][0][1] [g][1][0] [g][1][1] g 1; g<="0;" fc1 (fmap, wa, clk_50,start, finish, fc_out); fmap from last layer wa[128][32]; weights array fc_out [32]; i, j, k, l; fmap_flat [128]; [7:0] count; [2:0] state; [8:0] temp flatten 2d @(*)begin for (i="0;" i<32; i++) fmap_flat[i]="fmap[i][0][0];" fmap_flat[i+32]="fmap[i][0][1];" fmap_flat[i+64]="fmap[i][1][0];" fmap_flat[i+96]="fmap[i][1][1];" @ (posedge 3'b0) temp[i] j calculate cumulative sum 3'd1) (k="0;k<32;" k++) temp[k] + (wa[j][k] fmap_flat[j] -fmap_flat[j] ); (j="=127)" iterate times 3'd2) (l="0;" l<32; l++) ( temp[l]="=" 0) fc_out[l]>>>8) ? 2'b11 : 2'b01; //binarize
                end
                state <= 3'd3; end if (state == 3'd3) begin state <= 3'd3; finish else j i <= 0; endmodule module ten_map (fmap, wa1, final_out, clk_50, start, finish); input logic clk_50; signed [1:0] fmap [32]; wa1 [32][10]; output [7:0] final_out [10]; start; finish; multiply matrices 1x128 x 128x32 = 1x32 integer j, k; [2:0] state; initial temp[10]; always @ (posedge clk_50) (start) 3'b0) for (k=0; k<10; k++) temp[k] calculate cumulative sum 3'd1) + (wa1[j][k] ? fmap[j] : - fmap[j]); (j == 32) final_out[k] 3'd2) j <= 0;
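
The `fc1` and `ten_map` listings above are badly damaged, so the following C sketch restates the arithmetic they appear to implement rather than reproducing them. It flattens the 32x2x2 map in the `i`, `i+32`, `i+64`, `i+96` order that survives in the `fc1` fragment, accumulates `w ? x : -x` per output, and then takes the sign. All function names are hypothetical. Mapping a zero sum to -1 is an assumption carried over from the convolution modules, because the corresponding branch in `fc1` is not recoverable.

```c
#include <stddef.h>

/* Flatten a 32x2x2 map: positions (0,0), (0,1), (1,0), (1,1) of channel i
 * land at flat[i], flat[i+32], flat[i+64], flat[i+96]. */
void flatten_32x2x2(int fmap[32][2][2], int flat[128])
{
    for (int i = 0; i < 32; i++) {
        flat[i]      = fmap[i][0][0];
        flat[i + 32] = fmap[i][0][1];
        flat[i + 64] = fmap[i][1][0];
        flat[i + 96] = fmap[i][1][1];
    }
}

/* acc[k] = sum over j of (w[j][k] ? x[j] : -x[j]).
 * w is a row-major n_in x n_out array of 0/1 bits (0 means weight -1). */
void fc_accumulate(const int *x, size_t n_in,
                   const unsigned char *w, size_t n_out, int *acc)
{
    for (size_t k = 0; k < n_out; k++) {
        int sum = 0;
        for (size_t j = 0; j < n_in; j++)
            sum += w[j * n_out + k] ? x[j] : -x[j];
        acc[k] = sum;
    }
}

/* Keep only the sign; zero is mapped to -1 (assumption, see lead-in). */
void binarize_sign(const int *acc, size_t n, int *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (acc[i] > 0) ? 1 : -1;
}
```

Under these assumptions the first fully connected layer is `flatten_32x2x2`, then `fc_accumulate` with a 128x32 weight array, then `binarize_sign`. The final layer is a single `fc_accumulate` call with a 32x10 array and no binarization.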
//HPS
///
/// 640x480 version!

/// test VGA with hardware video input copy to VGA
///

//gcc v1.c -o v1

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/mman.h>
#include <sys/time.h>
#include "address_map_arm_brl4.h"
#include <math.h>
#include <pthread.h>

/* function prototypes */
void VGA_text (int, int, char *);
void VGA_text_clear();
void VGA_box (int, int, int, int, short);
void VGA_line(int, int, int, int, short) ;
void VGA_disc (int, int, int, short);
int  VGA_read_pixel(int, int) ;
int  video_in_read_pixel(int, int);
void draw_delay(void) ;

// the lightweight bus base
void *h2p_lw_virtual_base;
volatile unsigned int *h2p_lw_video_in_control_addr=NULL;
volatile unsigned int *h2p_lw_video_in_resolution_addr=NULL;

volatile unsigned int *h2p_lw_video_edge_control_addr=NULL;

// pixel buffer
volatile unsigned int * vga_pixel_ptr = NULL ;
void *vga_pixel_virtual_base;

// video input buffer
volatile unsigned int * video_in_ptr = NULL ;
void *video_in_virtual_base;

// character buffer
volatile unsigned int * vga_char_ptr = NULL ;
void *vga_char_virtual_base;

// /dev/mem file id
int fd;

// shared memory
key_t mem_key=0xf0;
int shared_mem_id;
int *shared_ptr;
int shared_time;
int shared_note;
char shared_str[64];

// pixel macro
#define VGA_PIXEL(x,y,color) do{\
    char  *pixel_ptr ;\
    pixel_ptr = (char *)vga_pixel_ptr + ((y)<<10) + (x) ;\ *(char *)pixel_ptr="(color);\" } while(0) #define video_in_pixel(x,y,color) do{\ char *pixel_ptr pixel_ptr="(char" *)video_in_ptr ((y)<<9) measure time struct timeval t1, t2; double elapsedtime; timespec delay_time ; hps_image_data_base 0x00000070 hps_image_clk_base 0x00000090 hps_image_cs_base 0x00000080 out_data_base 0x00000120 out_cs_base 0x00000130 pio_start_base 0x00000140 pio_end_base 0x00000150 pio_switch_base 0x00000160 1. function to read input image from file------------------------------- and send it fpga hps volatile signed int * hps_image_data="NULL" unsigned hps_image_clk="NULL" hps_image_cs="NULL" pio_start="NULL" pio_end="NULL" pio_switch="NULL" image_matrix[8*8]; void toggle_image_clk (void){ *hps_image_clk="0;" sleep(1); load_input(void){ initialize things in *hps_image_cs="0;" toggle_image_clk(); set cs high open input_data.txt file output *myfile; c; counter="0;" nfile; myfile="fopen("input_data.txt"," "r"); if (myfile="=" null) { printf("fail \n"); exit(1); printf("input successfully while ((c="getc(myFile))" !="255){" '1')){ *hps_image_data="1;" image_matrix[counter]="1;" counter++; printf("number#%d: %d \n", counter, 1); (counter%8="=" 0) printf("\n"); else '0')){ -1); print printf("%d is loaded counter); i, offset; for (i="0;" i < 8; i++){ offset="i" image_matrix[offset] , image_matrix[offset+1], image_matrix[offset+2], image_matrix[offset+3] image_matrix[offset+4], image_matrix[offset+5], image_matrix[offset+6] image_matrix[offset+7]); fclose(myfile); out_data="NULL" out_cs="NULL" final_out [10]; read_output(void){ *out_cs="0;" i; 10; final_out[i]="(signed" int) (*out_data); global_maxidx, global_maxidx2, global_maxidx3; float global_probability, global_probability2, global_probability3; ------------------------------- print_output(void){ printf("negative %d\n", final_out[0], final_out[1], final_out[2], final_out[3], final_out[4], final_out[5], final_out[6], final_out[7], 
final_out[8], final_out[9]); printf("convert positive probablity computation sum_magnitude="0," maxvalue="-9999," maxidx="0;" (final_out[i]> 127) final_out[i] = final_out[i] - 256;
        if (final_out[i] > 0) sum_magnitude += final_out[i];
        //else sum_magnitude -= final_out[i];
        // Extract max values in the list
        if (final_out[i] > maxValue) {
          maxValue = final_out[i];
          maxIdx = i;
        }
      }
      printf("%d %d %d %d %d %d %d %d %d %d\n", final_out[0], final_out[1],
      final_out[2], final_out[3], final_out[4], final_out[5],
      final_out[6], final_out[7], final_out[8], final_out[9]);
      float probability, probability2, probability3;
      probability = (float) maxValue/(float)sum_magnitude;
      printf("Probability that it is #%d is %.3f\n", maxIdx, probability);
      int maxValue2 = -9999, maxIdx2 = 0;
      for (i = 0; i < 10; i++){
        // Extract second max values in the list
        if ((final_out[i] > maxValue2) && (i != maxIdx)) {
          maxValue2 = final_out[i];
          maxIdx2 = i;
        }
      }
      int maxValue3 = -9999, maxIdx3 = 0;
      for (i = 0; i < 10; i++){
        // Extract third max values in the list
        if ((final_out[i] > maxValue3) && (i != maxIdx) && (i != maxIdx2)) {
          maxValue3 = final_out[i];
          maxIdx3 = i;
        }
      }
      probability2 = (float) maxValue2/(float)sum_magnitude;
      probability3 = (float) maxValue3/(float)sum_magnitude;
      printf("Probability that it is #%d is %.3f\n", maxIdx2, probability2);
      printf("Probability that it is #%d is %.3f\n", maxIdx3, probability3);
      global_maxIdx = maxIdx;
      global_maxIdx2 = maxIdx2;
      global_maxIdx3 = maxIdx3;
      global_probability = probability;
      global_probability2 = probability2;
      global_probability3 = probability3;
}
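
The post-processing in `print_output` can also be written without globals or `printf` calls, which makes it easier to test off the board. The sketch below is a hedged restatement, not the author's code: `sign_extend8`, `top_k`, and `positive_share` are hypothetical names. Masking the reading to 8 bits and returning 0 when no score is positive are added assumptions; the original divides by `sum_magnitude` without checking it for zero.

```c
#include <assert.h>

/* Interpret the low 8 bits of a PIO reading as a two's-complement value. */
int sign_extend8(unsigned int raw)
{
    int v = (int)(raw & 0xFFu);
    return (v > 127) ? v - 256 : v;
}

/* Write the indices of the k largest scores into idx, best first.
 * Ties go to the lower index, as with the strict '>' in print_output. */
void top_k(const int *scores, int n, int k, int *idx)
{
    assert(k <= n);
    for (int r = 0; r < k; r++) {
        int best = -1;
        for (int i = 0; i < n; i++) {
            int taken = 0;
            for (int p = 0; p < r; p++)
                if (idx[p] == i)
                    taken = 1;
            if (!taken && (best < 0 || scores[i] > scores[best]))
                best = i;
        }
        idx[r] = best;
    }
}

/* Score i divided by the sum of all positive scores; 0 if that sum is 0. */
float positive_share(const int *scores, int n, int i)
{
    int sum = 0;
    for (int j = 0; j < n; j++)
        if (scores[j] > 0)
            sum += scores[j];
    return (sum > 0) ? (float)scores[i] / (float)sum : 0.0f;
}
```

Calling `top_k(final_out, 10, 3, idx)` after sign-extending the ten readings yields the same three indices that `print_output` stores in `global_maxIdx`, `global_maxIdx2`, and `global_maxIdx3`. Note that the ratio is a share of the positive scores, not a calibrated probability.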

int control = 1;
double elapsedTime;
void *scan_thread(void * t){
        int input;
        float input_value;
        int input_dt;
    struct timeval t1, t2;
    while(1){
        printf("Note: \n");
        printf("Enter 0 to read and display output \n");
        printf("Enter 1 to 6 output prebuilt module \n");
        //printf("Enter 3 to restart drum on HPS \n");
        printf(">");
        scanf("%d", &input);
        while (input < 0 || input > 6){
            printf("Enter a number from 0 to 6 \n");
            printf(">");
            scanf("%d", &input);
        }

    if (input == 0){    //Enter 0 to read and display output
            read_output();
      print_output();
        }
        else if ((input >= 1)&&(input <= 0 1 224 240 255 320 6)){ enter 1-6 to read and display output *pio_switch="input;" control="input;" *pio_start="0;" start compute gettimeofday(&t1, null); while (*pio_end !="1);" wait until finish gettimeofday(&t2, elapsedtime="(t2.tv_sec" - t1.tv_sec) * 1000.0; sec ms +="(t2.tv_usec" t1.tv_usec) us printf ("compute time is %.3f ms\n", elapsedtime); read_output(); print_output(); vga_text_clear(); } else if (input="=" 7){ printf("--------done---------\n\n"); int main(void) { printf("hello \n"); delay_time.tv_nsec="10" ; delay_time.tv_sec="0" declare volatile pointers i o registers (volatile means that io load store instructions will be used access these pointer locations, instead of regular memory loads stores) =="=" need mmap:="======================" fpga_char_base fpga_onchip_base hw_regs_base get fpga addresses="=================" open dev mem if( ( fd="open(" " mem", o_rdwr | o_sync ) -1 printf( "error: could not \" mem\"...\n" ); return( virtual addr maps physical h2p_lw_virtual_base="mmap(" null, hw_regs_span, prot_read prot_write ), map_shared, fd, map_failed mmap1() failed...\n" close( return(1); h2p_lw_video_in_control_addr="(volatile" unsigned *)(h2p_lw_virtual_base+video_in_base+0x0c); h2p_lw_video_in_resolution_addr="(volatile" *)(h2p_lw_virtual_base+video_in_base+0x08); *(h2p_lw_video_in_control_addr)="0x04" turn on video capture *(h2p_lw_video_in_resolution_addr)="0x00f00140" high low h2p_lw_video_edge_control_addr="(volatile" *)(h2p_lw_virtual_base+video_in_base+0x10); *h2p_lw_video_edge_control_addr="0x01" edges new pio hps_image_data="(signed" int*)(h2p_lw_virtual_base hps_image_data_base); hps_image_clk="(unsigned" hps_image_clk_base); hps_image_cs="(unsigned" hps_image_cs_base); out_data="(signed" out_data_base); out_cs="(unsigned" out_cs_base); pio_start="(unsigned" pio_start_base); pio_end="(unsigned" pio_end_base); pio_switch="(unsigned" pio_switch_base); vga char vga_char_virtual_base="mmap(" 
fpga_char_span, mmap2() the address character vga_char_ptr="(unsigned" *)(vga_char_virtual_base); pixel sdram vga_pixel_virtual_base="mmap(" fpga_onchip_span, sdram_base); sdram_base mmap3() buffer vga_pixel_ptr="(unsigned" *)(vga_pixel_virtual_base); input="======================" on-chip ram video_in_virtual_base="mmap(" fpga_onchip_base); format video_in_ptr="(unsigned" *)(video_in_virtual_base); create a message displayed lcd displays text_top_row[40]="DE1-SoC ARM/FPGA\0" text_top_row_2[40]="BNN Inference on FPGA" text_top_row_1[40]="By: Vidya and Xitang" text_bottom_row[40]="Cornell ece5760 - Bruce Land :D\0" num_string[20], time_string[50] from pixel_color; index i,j; clear screen vga_box (0, 0, 639, 479, 0x03); text vga_text (1, 56, text_top_row); 57, text_bottom_row); timer load_input(); pthread_t threads; thread attribute here allow join pthread_attr_t attr; initialize mutex condition variable objects pthread_mutex_t run_mutex="PTHREAD_MUTEX_INITIALIZER;" pthread_mutex_init(&run_mutex, for portability, explicitly threads in joinable state pthread_attr_init(&attr); pthread_attr_setdetachstate(&attr, pthread_create_joinable); pthread_create(&threads, &attr, scan_thread, pixel_read; struct timeval t3, t4; gettimeofday(&t3, pixel_color_matrix[224][224]; sum_color_matrix[8][8]; black, white (i="0;" i<224; i++) (j="0;" j<224; j++) pixel_color_matrix[i][j]="0;" rand()%2 or temp="0;" i<8; j<8; sum_color_matrix[i][j]="image_matrix[temp++];//j;" as 3, greyscale 0-3, black x_offset, y_offset, grey_color, x_offset_end, y_offset_end; start_idx_i, end_idx_i, start_idx_j, end_idx_j; images monitor matrix1[8][8]="{" {-1,-1,-1,-1,-1,-1,-1,-1}, {-1,-1,-1,-1, 1, 1,-1,-1}, {-1,-1, {-1, 1,-1,-1, 1,-1,-1,-1}, 1,-1,-1,-1,-1}, {-1,-1,-1,-1,-1,-1,-1,-1} }; matrix2[8][8]="{" {-1,-1,-1, matrix3[8][8]="{" matrix4[8][8]="{" 1,-1,-1,-1,-1,-1}, 1,-1, matrix5[8][8]="{" {-1,-1,-1,-1,-1, matrix6[8][8]="{" matrix7[8][8]="{" {-1,-1,-1,-1,-1,-1,1,-1}, {-1,-1,-1,-1,1,1,-1,-1}, 
{-1,-1,-1,-1,1,-1,-1,-1}, {-1,-1,-1,1,-1,-1,-1,-1}, {-1,-1,1,1,-1,-1,-1,-1}, {-1,-1,1,-1,-1,-1,-1,-1}, while(1) note this version vga_disk has throttled write software copy test. production, hardware does put few video_in_pixel(160,120,0xff); video_in_pixel(0,0,0xff); video_in_pixel(319,239,0xff); video_in_pixel(300,200,0xff); by -- over every 2s gettimeofday(&t4, ((t4.tv_sec t3.tv_sec)> 2){
       //VGA_disc((rand()&0x3ff), (rand()&0x1ff), rand()&0x3f, rand()

Original: https://blog.csdn.net/capa_shi/article/details/118920526
Author: Clavin_Shi
Title: 基于FPGA的CNN卷积神经网络加速器

