Paper Notes (1): An Efficient Parallel Secure Machine Learning Framework on GPUs

@article{DBLP:journals/tpds/ZhangCZZZD21,
author = {Feng Zhang and
Zheng Chen and
Chenyang Zhang and
Amelie Chi Zhou and
Jidong Zhai and
Xiaoyong Du},
title = {An Efficient Parallel Secure Machine Learning Framework on GPUs},
journal = {{IEEE} Trans. Parallel Distributed Syst.},
volume = {32},
number = {9},
pages = {2262--2276},
year = {2021},
url = {https://doi.org/10.1109/TPDS.2021.3059108},
doi = {10.1109/TPDS.2021.3059108},
timestamp = {Thu, 14 Oct 2021 09:20:51 +0200},
biburl = {https://dblp.org/rec/journals/tpds/ZhangCZZZD21.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}

I. Paper Overview

Background: privacy protection matters. MPC is used in many applications, and in machine learning in particular it offers special advantages (relative to differential privacy, DP).

we find that the low performance problem exists even with two-party computation, which is mainly due to the following reasons.

SecureML [10], proposed by Mohassel and others, is the state-of-the-art machine learning framework based on two-party computation.

GPUs have been widely used as a powerful accelerator for machine learning algorithms [15], [16]. However, none of the existing studies has focused on the acceleration of secure machine learning algorithms using GPUs.

Problem addressed: performance

Previous work on secure machine learning mostly focused on novel protocols or improving accuracy, while the performance metric has been ignored.

This paper's solution: a GPU-based framework, ParSecureML

Challenges:

  1. complex computation patterns
  2. frequent intra-node data transmission between CPU and GPU
  3. complicated inter-node data dependence

Proposed structural ideas:

  1. profiling-guided adaptive GPU utilization,
  2. fine-grained double pipeline for intra-node CPU-GPU cooperation,
  3. compressed transmission for inter-node communication,
  4. integrating architecture-specific optimizations, such as Tensor Cores, into ParSecureML

Results:

  1. the first GPU-based secure machine learning framework.

  2. Compared to the state-of-the-art framework, ParSecureML achieves an average of 33.8X speedup.

  3. ParSecureML can also be applied to inferences, which achieves 31.7X speedup on average.

ParSecureML Innovations

Addressing the three challenges: Building a GPU-based secure machine learning framework requires handling three challenges.

  1. the complex triplet multiplication based computation patterns
  2. how to handle the PCIe transmission overhead caused by frequent intra-node data transmission between CPU and GPU.

  3. the complicated inter-node data dependence

Three new techniques

  1. a profiling-guided adaptive GPU engine: the profiling pass identifies the most computation-intensive parts
  2. a double pipeline design, which can overlap not only the GPU computation and PCIe data transmission, but also potential steps among different NN layers
  3. a novel compression-based transmission method

Deep optimizations for both the CPU and the GPU:

  1. a thread-safe random number generation design for random numbers;
  2. the computation-intensive and complex parts are placed on the GPU (cache optimizations are used to parallelize these operations);
  3. architecture-specific optimizations are introduced, integrating Tensor Cores into the GPU computation

ML algorithms evaluated (six):

convolutional neural network (CNN) [19], multilayer perceptron (MLP) [20], linear regression [21], logistic regression [22], recurrent neural network (RNN) [23], and Support Vector Machine (SVM) [24],

Five datasets:

MNIST [25], VGGFace2 [26], NIST [27], CIFAR-10 [28], and a synthetic dataset.

II. The ParSecureML Protocol

  1. Overview:

Three components:

1) Profiling-guided adaptive GPU utilization (addresses challenge 1)

2) Double pipeline execution for overlapping intra-node data transmission and computation (addresses challenge 2): compute1 and communicate form the reconstruct phase executed on the CPU, while compute2 is the GPU part, together forming one pipeline; in addition, a single ML layer contains multiple steps, so operations across layers can also overlap in this pipeline

3) Compressed transmission for inter-node communication (addresses challenge 3)


Difficulty of integrating multiple techniques: the GPU tasks need to cooperate with the pipeline execution and the compressed transmission;

the double pipeline design becomes more complex (it must coordinate CPU-GPU transfers, computation, and compressed transmission);

the compressed data must be storable in GPU memory.

Workflow: each layer of an ML task performs forward propagation and backward propagation, both consisting of a reconstruct phase and a GPU operation phase.

  2. profiling-guided adaptive GPU utilization

  • offline: the matrix multiplications in triplet generation can be accelerated on the GPU

  • online: the reconstruct step (compute1 and communicate) stays on the CPU while compute2 runs on the GPU; see the sketch below
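For reference, the triplet-based multiplication behind this offline/online split is SecureML's standard Beaver-triplet construction; the notation below is the textbook form, assumed here rather than copied from the paper's numbered equations.

```latex
% Offline: generate shares of a multiplication triplet (U, V, Z) with Z = U x V;
% party i in {0, 1} holds U_i, V_i, Z_i (GPU-friendly matrix multiplications).
%
% Online, to multiply secret-shared A and B, party i locally computes (compute1)
%   E_i = A_i - U_i,  F_i = B_i - V_i,
% exchanges E_i, F_i with the other party and reconstructs (communicate)
%   E = E_0 + E_1,    F = F_0 + F_1,
% then computes its product share (compute2, the GEMM-heavy GPU part):
\[
  C_i = -\,i \cdot E \times F + A_i \times F + E \times B_i + Z_i,
  \qquad C_0 + C_1 = A \times B .
\]
```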

  • Activation function design: Equation (9) is used to simulate the original nonlinear activation functions on GPUs
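The notes do not reproduce Equation (9); for intuition, SecureML (on which ParSecureML builds) replaces the sigmoid with the MPC-friendly piecewise-linear function below, and Eq. (9) presumably plays the same role. That Eq. (9) has exactly this form is an assumption.

```latex
% SecureML's piecewise-linear stand-in for the sigmoid (assumed shape of Eq. (9)):
\[
  f(x) =
  \begin{cases}
    0, & x < -\tfrac{1}{2}, \\
    x + \tfrac{1}{2}, & -\tfrac{1}{2} \le x \le \tfrac{1}{2}, \\
    1, & x > \tfrac{1}{2}.
  \end{cases}
\]
```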

  3. Double pipeline execution for overlapping intra-node data transmission and computation

Idea:

In each layer, the forward pass processes data and the backward pass updates parameters; every layer therefore requires data transfers.

=> This requires a fine-grained pipeline design rather than the coarse-grained pipelines of [43], [44].

Many steps span multiple layers.

=> This requires a second pipeline to overlap the possible steps in different layers.

Pipeline Design:

  • First Pipeline: overlaps GPU computation and PCIe data transmission in Equation (8); see the CUDA sketch after this list

  • Second Pipeline:

The forward and backward passes of each layer both need a reconstruct step and GPU operations.

Each layer's processing depends on the previous layer's forward propagation, so the reconstruct steps in the forward passes of adjacent layers cannot overlap. In the backward pass, however, the reconstruct step does not need to wait for the next layer and can overlap with the next layer's propagation, saving one reconstruct time.
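A minimal sketch of the first pipeline's mechanism, assuming a fixed chunk count and a toy elementwise kernel in place of the real GEMMs of Equation (8); two CUDA streams let the copy of chunk k+1 overlap the kernel on chunk k.

```cuda
#include <cuda_runtime.h>

// Toy stand-in for the per-chunk GPU work; the real pipeline runs the
// GEMMs of Equation (8) here.
__global__ void mul_kernel(const float* a, const float* b, float* c, int m) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < m) c[i] = a[i] * b[i];
}

// Split the input into chunks and alternate two CUDA streams so that the
// host-to-device copy of chunk k+1 overlaps the kernel on chunk k.
// h_a, h_b, h_c must be pinned (cudaMallocHost) for the copies to be async.
void pipelined_compute(const float* h_a, const float* h_b, float* h_c,
                       float* d_a, float* d_b, float* d_c,
                       int total, int chunks) {
  cudaStream_t s[2];
  cudaStreamCreate(&s[0]);
  cudaStreamCreate(&s[1]);
  int m = total / chunks;            // elements per chunk (assume divisible)
  size_t bytes = (size_t)m * sizeof(float);
  for (int k = 0; k < chunks; ++k) {
    cudaStream_t st = s[k % 2];
    size_t off = (size_t)k * m;
    cudaMemcpyAsync(d_a + off, h_a + off, bytes, cudaMemcpyHostToDevice, st);
    cudaMemcpyAsync(d_b + off, h_b + off, bytes, cudaMemcpyHostToDevice, st);
    mul_kernel<<<(m + 255) / 256, 256, 0, st>>>(d_a + off, d_b + off,
                                                d_c + off, m);
    cudaMemcpyAsync(h_c + off, d_c + off, bytes, cudaMemcpyDeviceToHost, st);
  }
  cudaDeviceSynchronize();           // wait for both streams to drain
  cudaStreamDestroy(s[0]);
  cudaStreamDestroy(s[1]);
}
```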

  4. Compressed Transmission for Inter-Node Communication

Analysis: after iterations the matrices are usually sparse. Activation functions leave many zeros, and as the number of layers grows, the gradients of the loss function in the first few layers become very small.
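A minimal sketch of this idea, assuming a plain (index, value) encoding of the nonzero entries before sending; the encoding ParSecureML actually uses, and any density threshold for switching it on, are not specified in these notes.

```cuda
#include <algorithm>
#include <cstdint>
#include <vector>

// (index, value) encoding of the nonzero entries of a flattened matrix.
// Sending idx + val instead of the dense matrix pays off when fewer than
// roughly half of the entries are nonzero.
struct SparsePacket {
  std::vector<uint32_t> idx;
  std::vector<float> val;
};

SparsePacket compress(const float* m, uint32_t n) {
  SparsePacket p;
  for (uint32_t i = 0; i < n; ++i)
    if (m[i] != 0.0f) {
      p.idx.push_back(i);
      p.val.push_back(m[i]);
    }
  return p;
}

void decompress(const SparsePacket& p, float* m, uint32_t n) {
  std::fill(m, m + n, 0.0f);
  for (size_t k = 0; k < p.idx.size(); ++k) m[p.idx[k]] = p.val[k];
}
```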

  5. Optimizations

1) CPU acceleration

  • Accelerating random number generation: use a thread-safe random number generator, the Mersenne Twister 19937 generator (MT19937) [48], from the C++11 <random> library (about 1.06X the runtime of rand()). Another possible improvement is cuRAND on GPUs, which however only yields good speedups for large matrices. See the sketch after this list.

  • Optimizing matrix addition and subtraction: Equations (5) and (6) involve many additions and subtractions, which can be parallelized with a multi-threaded for-loop
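A minimal sketch of both CPU-side optimizations, assuming one thread_local MT19937 per thread and a plain std::thread split of the elementwise work; the function names and threading layout are illustrative, not ParSecureML's code.

```cuda
#include <algorithm>
#include <cstdint>
#include <random>
#include <thread>
#include <vector>

// One MT19937 per thread: thread-safe with no locking. Seeding each thread
// from std::random_device is an assumption, not the paper's seeding scheme.
uint32_t next_rand() {
  thread_local std::mt19937 gen{std::random_device{}()};
  return gen();
}

// Multi-threaded elementwise c = a - b, standing in for the many
// additions/subtractions of Equations (5) and (6).
void parallel_sub(const float* a, const float* b, float* c, size_t n,
                  unsigned nthreads) {
  std::vector<std::thread> pool;
  size_t chunk = (n + nthreads - 1) / nthreads;
  for (unsigned t = 0; t < nthreads; ++t) {
    size_t lo = t * chunk, hi = std::min(n, lo + chunk);
    pool.emplace_back([=] {
      for (size_t i = lo; i < hi; ++i) c[i] = a[i] - b[i];
    });
  }
  for (auto& th : pool) th.join();
}
```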

2) GPU acceleration

  • Profiling the GPU execution with nvprof reveals three main parts: host-to-device memory copies, general matrix multiplication (GEMM) operations (the main optimization target), and device-to-device memory copies

  • Tensor Core Utilization.

Popular GPU machine learning frameworks, including TensorFlow [35], PyTorch [36], MXNet [51], and Caffe2 [52], all utilize Tensor Cores.
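As a concrete taste of Tensor Core usage at the CUDA level, here is a minimal WMMA kernel multiplying one 16x16 half-precision tile (requires compute capability 7.0+); ParSecureML presumably reaches Tensor Cores through libraries such as cuBLAS, so this hand-rolled kernel is purely illustrative.

```cuda
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

// One warp multiplies a single 16x16x16 tile on Tensor Cores:
// C (float) = A (half, row-major) x B (half, col-major).
__global__ void wmma_16x16(const half* a, const half* b, float* c) {
  wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> fa;
  wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> fb;
  wmma::fragment<wmma::accumulator, 16, 16, 16, float> fc;
  wmma::fill_fragment(fc, 0.0f);
  wmma::load_matrix_sync(fa, a, 16);   // leading dimension 16
  wmma::load_matrix_sync(fb, b, 16);
  wmma::mma_sync(fc, fa, fb, fc);
  wmma::store_matrix_sync(c, fc, 16, wmma::mem_row_major);
}
// Launch with a single warp: wmma_16x16<<<1, 32>>>(dA, dB, dC);
```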


Original: https://blog.csdn.net/weixin_41839176/article/details/126611037
Author: BambooDoo
Title: Paper Notes (1): An Efficient Parallel Secure Machine Learning Framework on GPUs

