CPU vs. GPU Performance Comparison Report

Runtime Analysis

Inference time differs widely between CPU and GPU, and from model to model. As a rule of thumb, a GPU is 5 to 20 times faster than a CPU. For this test we chose Inception v3, one of the most commonly used classification models, with an input image size of 3x299x299.
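The two quantities used throughout this report, RT (time per request) and QPS (requests per second), can be measured with a loop like the one below. This is only a minimal sketch: `benchmark` and `fake_infer` are hypothetical names, and `fake_infer` is a stand-in for one Inception v3 forward pass on a 3x299x299 input, not the code used for the original measurements.

```python
import time


def benchmark(infer, n_requests=20):
    """Call infer() n_requests times in sequence.

    Returns (mean RT in seconds, QPS over the whole run).
    """
    latencies = []
    start = time.perf_counter()
    for _ in range(n_requests):
        t0 = time.perf_counter()
        infer()
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    mean_rt = sum(latencies) / len(latencies)
    qps = n_requests / elapsed
    return mean_rt, qps


def fake_infer():
    # Hypothetical placeholder for a single model forward pass.
    time.sleep(0.001)


if __name__ == "__main__":
    rt, qps = benchmark(fake_infer)
    print(f"mean RT: {rt * 1000:.2f} ms, QPS: {qps:.1f}")
```

To measure a real model, pass a function that runs one inference in place of `fake_infer`; running several copies of the script at once approximates the multi-process setup.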

GPU

On a P100 GPU (16276 MiB of video memory), performance is as follows:

[Figure: CPU vs. GPU performance comparison]

As the figure above shows, time consumption grows linearly with the number of processes.

Therefore, opening more processes on the same card only adds concurrency to the connection and image-download stages of the service (in the neuron framework, connection setup, image download, and algorithm processing run concurrently and can be treated as approximately independent of one another); the throughput of the algorithm itself stays essentially unchanged. From an RT (response time) standpoint, it is better to give a single process exclusive use of the card; in that case the task can drive Volatile GPU-Util to about 90%.
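The reasoning above can be checked with a small idealized model. The sketch below assumes the GPU serves one request at a time and that every process keeps exactly one request in flight; `shared_gpu_model` is a hypothetical helper, and the 1/55 s service time is derived from the single-GPU QPS of 55 quoted in the summary, used here purely for illustration.

```python
def shared_gpu_model(t_single, n_procs):
    """Idealized model of n_procs processes sharing one GPU.

    Assumes the GPU handles one request at a time and each process keeps
    exactly one request in flight (a simplification, not measured data).
    Returns (rt_seconds, qps).
    """
    rt = n_procs * t_single   # each request also waits for the others
    qps = n_procs / rt        # n_procs requests complete every rt seconds
    return rt, qps


if __name__ == "__main__":
    t_single = 1.0 / 55  # assumed GPU time per request
    for n in range(1, 5):
        rt, qps = shared_gpu_model(t_single, n)
        print(f"{n} process(es): RT ~ {rt * 1000:.1f} ms, QPS ~ {qps:.1f}")
```

In this model RT scales with the number of processes while QPS stays constant, which matches the linear growth in time and flat throughput reported for the P100.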

Of course, if RT still meets requirements, other tasks can be deployed on the same card at the same time.

CPU

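The test machine is a 24-core server with an Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz. By default TF (TensorFlow) tries to use every available core, and the real neuron service does the same, so RT rises when request volume is high.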

[Figure: CPU vs. GPU performance comparison]

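On the 24-core server, QPS grows roughly logarithmically as concurrency increases. Taking RT into account, about 10 processes is a good setting for the algorithm; at that point CPU usage is already close to 2400%. If RT is constrained, use lower concurrency and more machines.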

PS: CPU usage is 1600% with one process, 2000% at a concurrency of 2, 2100% at 3, 2200% at 4, 2250% at 5, and 2280% at 6.
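The diminishing return in these CPU usage figures is easier to see as increments. The sketch below recomputes them from the numbers in the PS; the helper names are hypothetical, and the 2400% ceiling assumes 24 cores at 100% each.

```python
TOTAL_PERCENT = 2400  # 24 cores x 100%

# CPU usage (%) by number of concurrent processes, from the PS above.
USAGE = {1: 1600, 2: 2000, 3: 2100, 4: 2200, 5: 2250, 6: 2280}


def marginal_gains(usage):
    """Return the extra CPU % gained by each additional process."""
    counts = sorted(usage)
    return [usage[b] - usage[a] for a, b in zip(counts, counts[1:])]


def machine_share(usage, total=TOTAL_PERCENT):
    """Return the fraction of the whole machine in use at each process count."""
    return {n: pct / total for n, pct in usage.items()}


if __name__ == "__main__":
    print("gain per extra process:", marginal_gains(USAGE))
    for n, share in machine_share(USAGE).items():
        print(f"{n} process(es): {share:.1%} of the machine")
```

A single process already occupies two thirds of the machine, and going from five to six processes adds only 30 percentage points, so extra processes have little idle CPU left to use.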

More on CPU

Looked at another way, we can measure RT while varying the number of cores a process is allowed to use. This section is similar to the CPU section above.
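The report does not say how the core count was restricted. One possible way on Linux, shown here only as an assumption, is to pin the process to a subset of cores with `os.sched_setaffinity` before loading the model; `limit_cores` is a hypothetical helper name.

```python
import os


def limit_cores(n_cores):
    """Pin the current process to the first n_cores of its allowed CPUs.

    Returns the resulting CPU set, or None on platforms without
    os.sched_setaffinity (for example macOS and Windows).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    # Choose from the CPUs currently allowed, so repeated calls can
    # only narrow the set unless the affinity is reset elsewhere.
    allowed = sorted(os.sched_getaffinity(0))
    chosen = set(allowed[:max(1, n_cores)])
    os.sched_setaffinity(0, chosen)
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    print("running on CPUs:", limit_cores(10))
```

Repeating the same RT measurement at different values of `n_cores` would give a curve of the kind this section describes. Limiting the framework's own thread-pool sizes is another option and may behave differently.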

[Figure: CPU vs. GPU performance comparison]

For this classification task, performance stops improving beyond 10 cores (once concurrency reaches that level).

To guarantee a given RT, make sure each request gets enough cores.

Summary

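A single GPU reaches a QPS of 55; a 24-core CPU reaches a QPS of about 24. The GPU's RT is far lower than the CPU's, but as GPU concurrency rises, its RT also increases linearly.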

At current production prices, one GPU costs about as much as 96 CPU cores, so the CPU is still far more cost-effective than the GPU.
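The cost comparison can be worked through with the figures from this summary. The sketch below assumes that QPS scales linearly with the number of CPU servers and that cost is proportional to core count; both are simplifying assumptions rather than claims from the original measurements.

```python
GPU_QPS = 55              # single P100, from the summary
CPU_SERVER_QPS = 24       # one 24-core server, from the summary
CORES_PER_SERVER = 24
GPU_COST_IN_CORES = 96    # one GPU costs about as much as 96 CPU cores


def cpu_qps_for_gpu_budget():
    """QPS bought by spending one GPU's budget on CPU servers instead.

    Assumes QPS scales linearly with the number of servers.
    """
    servers = GPU_COST_IN_CORES / CORES_PER_SERVER
    return servers * CPU_SERVER_QPS


if __name__ == "__main__":
    cpu_qps = cpu_qps_for_gpu_budget()
    print(f"CPU QPS for the same cost: {cpu_qps:.0f}")
    print(f"CPU/GPU throughput per unit cost: {cpu_qps / GPU_QPS:.2f}")
```

Under these assumptions, the budget for one GPU buys four 24-core servers and roughly 1.7 times the GPU's throughput. The comparison covers QPS only and ignores the GPU's lower RT.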

PS: In this evaluation, the P100 is a high-performance and expensive card, while the E5-2620 is by now a fairly old and cheap CPU.

Original: https://www.cnblogs.com/houkai/p/9671543.html
Author: 侯凯
Title: CPU与GPU性能的比较报告
