[Architecture Analysis] TensorFlow Internals Source Code Analysis 2: Graph Creation and Execution

May 26, 2023, 6:04 PM • Artificial Intelligence

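Overview

This article works from the TensorFlow r1.15 source code and uses a sample program to analyze how a computation graph is created and executed internally.

Sample program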

import tensorflow as tf

# Three 2x2 float32 placeholders serve as the graph inputs.
x1 = tf.placeholder(tf.float32, shape=(2, 2))
y1 = tf.placeholder(tf.float32, shape=(2, 2))
b1 = tf.placeholder(tf.float32, shape=(2, 2))

# y2 = matmul(x1, y1) + b1
x2 = tf.matmul(x1, y1)
y2 = tf.add(x2, b1)

# The with block closes the session on exit, so no explicit sess.close() is needed.
with tf.Session() as sess:
  vals = sess.run(y2, feed_dict={x1: [[0.7, 0.5], [0.7, 0.5]],
                                 y1: [[0.7, 0.5], [0.7, 0.5]],
                                 b1: [[0.7, 0.5], [0.7, 0.5]]})
  print(vals)


Figure: Computation graph for the sample code

The sample is deliberately simple: three Placeholder nodes serve as inputs and feed two computations, MatMul and Add.
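To make the arithmetic of this graph concrete, here is a minimal plain-Python sketch of the same computation, with no TensorFlow involved. The helper names `matmul` and `add` are illustrative stand-ins for the MatMul and Add nodes, not TensorFlow APIs, and the result is what one would expect `sess.run` to return up to float32 rounding.

```python
# Plain-Python reference for the sample graph: y2 = matmul(x1, y1) + b1.
# matmul() and add() are illustrative helpers, not TensorFlow APIs.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def add(a, b):
    """Add two matrices of the same shape element by element."""
    return [[x + y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]

feed = [[0.7, 0.5], [0.7, 0.5]]   # the sample feeds this value to x1, y1 and b1
x2 = matmul(feed, feed)           # stands in for the MatMul node
y2 = add(x2, feed)                # stands in for the Add node

print([[round(v, 2) for v in row] for row in y2])  # [[1.54, 1.1], [1.54, 1.1]]
```

Each row works out to 0.7*0.7 + 0.5*0.7 + 0.7 = 1.54 and 0.7*0.5 + 0.5*0.5 + 0.5 = 1.10.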

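GPU device creation

Figure: GPU device creation sequence

When the sample program runs, the Python layer creates a new Session, and the native layer creates a GPUDevice as shown in the figure above. In particular, it creates the stream object in the stream_executor module, which is the module that actually provides CUDA compute capability. As the excerpts below show, StreamExecutorInterface provides the abstract API for operating the GPU, GpuExecutor implements that API, and GpuDriver makes the actual CUDA Runtime/Driver API calls that operate the GPU device.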

//tensorflow/stream_executor/stream_executor_internal.h

class StreamExecutorInterface {
...

//tensorflow/stream_executor/gpu/gpu_executor.h

// CUDA-platform implementation of the platform-agnostic
// StreamExecutorInferface.

class GpuExecutor : public internal::StreamExecutorInterface {
...

//tensorflow/stream_executor/cuda/cuda_gpu_executor.cc

port::Status GpuExecutor::Init(int device_ordinal,
                               DeviceOptions device_options) {
  device_ordinal_ = device_ordinal;

  auto status = GpuDriver::Init();
  if (!status.ok()) {
    return status;
  }

  status = GpuDriver::GetDevice(device_ordinal_, &device_);
  if (!status.ok()) {
    return status;
  }

  status = GpuDriver::CreateContext(device_ordinal_, device_, device_options,
                                    &context_);
  if (!status.ok()) {
    return status;
  }
...

//tensorflow/stream_executor/cuda/cuda_driver.cc

// Actually performs the work of CUDA initialization. Wrapped up in one-time
// execution guard.

static port::Status InternalInit() {
...

    res = cuInit(0 /* = flags */);
...

/* static */ port::Status GpuDriver::GetDevice(int device_ordinal,
                                               CUdevice* device) {
  RETURN_IF_CUDA_RES_ERROR(cuDeviceGet(device, device_ordinal),
                           "Failed call to cuDeviceGet");
...

/* static */ port::Status GpuDriver::CreateContext(
    int device_ordinal, CUdevice device, const DeviceOptions& device_options,
    GpuContext** context) {
...

  CHECK_EQ(CUDA_SUCCESS,
           cuDevicePrimaryCtxGetState(device, &former_primary_context_flags,
                                      &former_primary_context_is_active));
...
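The layering in these excerpts (abstract interface, executor implementation, thin driver wrapper) can be sketched in Python as follows. This is only an illustration of the call order and the early-return error handling seen in GpuExecutor::Init. `FakeDriver`, `ExecutorInterface` and `GpuExecutorSketch` are hypothetical names, nothing here touches CUDA, and the driver records only the CUDA calls that appear in the excerpts above.

```python
from abc import ABC, abstractmethod


class FakeDriver:
    """Hypothetical stand-in for GpuDriver: records calls instead of calling CUDA."""

    def __init__(self, device_count=1):
        self.device_count = device_count
        self.calls = []

    def init(self):
        self.calls.append("cuInit")
        return True

    def get_device(self, ordinal):
        self.calls.append("cuDeviceGet")
        # Return None for an ordinal that does not exist.
        return ordinal if 0 <= ordinal < self.device_count else None

    def create_context(self, device):
        self.calls.append("cuDevicePrimaryCtxGetState")
        return {"device": device}


class ExecutorInterface(ABC):
    """Hypothetical analogue of the abstract StreamExecutorInterface."""

    @abstractmethod
    def init(self, device_ordinal):
        ...


class GpuExecutorSketch(ExecutorInterface):
    """Implements the interface by delegating every step to the driver."""

    def __init__(self, driver):
        self.driver = driver
        self.device = None
        self.context = None

    def init(self, device_ordinal):
        # Same shape as GpuExecutor::Init: stop at the first failing step.
        if not self.driver.init():
            return "driver init failed"
        self.device = self.driver.get_device(device_ordinal)
        if self.device is None:
            return "no such device"
        self.context = self.driver.create_context(self.device)
        return "ok"


driver = FakeDriver()
status = GpuExecutorSketch(driver).init(0)
bad_status = GpuExecutorSketch(FakeDriver()).init(3)
print(status, driver.calls)
print(bad_status)
```

Keeping the driver behind its own class is what lets the executor be exercised here without a GPU; the real GpuDriver plays the same role as the single place where CUDA calls are made.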

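Graph creation and execution

Figure: Sequence diagram of graph creation and execution

The sequence diagram for this part is fairly complex. It is easiest to read by following the hierarchy below.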

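DirectSession::Run, which runs the graph, does two main things:

1. Create the executors for all subgraphs in DirectSession::GetOrCreateExecutors. This involves:

   1.1 Build the complete graph for the sample program in DirectSession::BuildGraph and optimize it with the grappler module, which applies its various optimizers and optimization passes. During this step the VirtualPlacer module assigns each node of the graph to a suitable compute device, and PruneGraph prunes the graph. For the sample program this produces the graph shown below.

   Figure: Computation graph of the sample program

   1.2 Split the graph into per-device subgraphs in DirectSession::Partition. This step is driven by how the nodes are distributed across compute devices (CPU ...

   Figure: The sample program's graph after Partition, split into two subgraphs, one on the CPU and one on the GPU

   1.3 Create an executor for each subgraph with NewLocalExecutor. This step creates the Op Kernel instance for every node in the subgraph so that the real computation can run later.

2. Run the subgraphs in DirectSession::RunInternal. Each subgraph's executor runs its subgraph asynchronously, and every node in the subgraph is scheduled on a thread pool through the pipeline Process -> PrepareInputs -> Compute/ComputeAsync (runs the Op Kernel to perform the computation on the GPU) -> ProcessOutputs -> PropagateOutputs -> ActivateNodes -> NodeDone, until all nodes in the graph have finished. For a more detailed walk through this code, see my previous TensorFlow Internals source code analysis article.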
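The scheduling idea behind that pipeline can be sketched as a ready queue driven by pending-input counts. The sketch below is a single-threaded simplification under stated assumptions: the real executor dispatches ready nodes to a thread pool and runs Op Kernels, while here `GRAPH` and `run_graph` are hypothetical names and "computing" a node just records its name. The comments map each step loosely onto the pipeline stages named above.

```python
from collections import deque

# Toy version of the sample graph: node -> list of the nodes it reads from.
GRAPH = {
    "x1": [], "y1": [], "b1": [],
    "MatMul": ["x1", "y1"],
    "Add": ["MatMul", "b1"],
}


def run_graph(graph):
    """Run every node once all of its inputs have been produced."""
    # Number of inputs each node is still waiting for.
    pending = {node: len(inputs) for node, inputs in graph.items()}

    # Reverse edges: which nodes consume the output of each node.
    consumers = {node: [] for node in graph}
    for node, inputs in graph.items():
        for src in inputs:
            consumers[src].append(node)

    # Nodes with no inputs are ready immediately.
    ready = deque(node for node, count in pending.items() if count == 0)
    order = []
    while ready:
        node = ready.popleft()        # Process: pick up a ready node
        order.append(node)            # Compute: stand-in for running the kernel
        for dst in consumers[node]:   # PropagateOutputs: notify consumers
            pending[dst] -= 1
            if pending[dst] == 0:     # ActivateNodes: consumer became ready
                ready.append(dst)
    return order                      # every node has reached NodeDone


order = run_graph(GRAPH)
print(order)  # ['x1', 'y1', 'b1', 'MatMul', 'Add']
```

MatMul becomes ready only after both x1 and y1 have run, and Add only after MatMul and b1, which is the dependency order the executor has to respect however the nodes are spread across threads.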

Original: https://blog.csdn.net/HaoBBNuanMM/article/details/123458507
Author: HaoBBNuanMM
Title: [Architecture Analysis] TensorFlow Internals Source Code Analysis 2: Graph Creation and Execution
