Training with a GPU in TensorFlow

Inspecting GPU information with the nvidia-smi command:

cmd: nvidia-smi

https://www.jianshu.com/p/ceb3c020e06b

(Figure: a sample nvidia-smi output screenshot; the field descriptions below refer to it.)
  • GPU: index of the GPU on this machine (numbered from 0 when there are multiple cards); in the figure the index is 0
  • Fan: fan speed (0%-100%); N/A means there is no fan
  • Name: GPU model; in the figure it is a Tesla T4
  • Temp: GPU temperature (an overly hot GPU will lower its clock frequency)
  • Perf: performance state, from P0 (maximum performance) to P12 (minimum performance); in the figure it is P0
  • Persistence-M: persistence mode status; persistence mode consumes more power, but new GPU applications start faster; in the figure it is Off
  • Pwr: Usage/Cap: power draw, where Usage is the current consumption and Cap is the maximum
  • Bus-Id: GPU bus identifier, in the form domain:bus:device.function
  • Disp.A: Display Active, whether the GPU's display output is initialized
  • Memory-Usage: GPU memory usage
  • Volatile GPU-Util: GPU utilization
  • Uncorr. ECC: ECC-related field, whether error checking and correction is enabled (0/disabled, 1/enabled)
  • Compute M.: compute mode, 0/DEFAULT, 1/EXCLUSIVE_PROCESS, 2/PROHIBITED
  • Processes: shows each process's GPU memory usage, its process ID, and which GPU it occupies

Refresh the GPU memory status every few seconds: nvidia-smi -l <seconds>

Refresh the GPU status every two seconds: nvidia-smi -l 2
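
Besides nvidia-smi, you can also confirm from Python which GPUs TensorFlow itself can see. The following is a minimal sketch (it assumes TensorFlow 2.x); it only lists the visible devices and prints whether the installed build has CUDA support.

import tensorflow as tf

# List the physical GPUs that TensorFlow can see on this machine
gpus = tf.config.list_physical_devices('GPU')
print("Visible GPUs:", gpus)

# Check whether this TensorFlow build was compiled with CUDA support
print("Built with CUDA:", tf.test.is_built_with_cuda())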

Ways TensorFlow can use the GPU

1. Direct use

By default this approach grabs essentially all of the free GPU memory on every card in the machine, not just on the card it actually uses. Even if the program only needs a single GPU, it still reserves the memory of all the other cards and simply sits on it without using them.

import tensorflow as tf
from tensorflow import keras

with tf.compat.v1.Session() as sess:
    # Input images are 224x224x3, 20 classes
    shape, classes = (224, 224, 3), 20
    # Build Keras's ResNet50 model
    model = keras.applications.resnet50.ResNet50(input_shape=shape, weights=None, classes=classes)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

    # Train the model (train_x/train_y/test_x/test_y are prepared elsewhere)
    model.fit(train_x, train_y, validation_data=(test_x, test_y), epochs=20, batch_size=6, verbose=2)
    # Save the trained model to a file
    model.save('resnet_model_dog_n_face.h5')

2. Allocating a fixed fraction

The difference from the direct approach above is that this one does not occupy all of the GPU memory. For example, written as below, the process takes 60% of the memory of each GPU.

import tensorflow as tf
from tensorflow import keras
from tensorflow.compat.v1 import ConfigProto  # TF 2.x way to import the v1 ConfigProto

config = ConfigProto()
# Let this process take at most 60% of the memory of each GPU
config.gpu_options.per_process_gpu_memory_fraction = 0.6
with tf.compat.v1.Session(config=config) as sess:
    model = keras.applications.resnet50.ResNet50(input_shape=shape, weights=None, classes=classes)  # shape/classes as above
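
If you are writing native TF 2.x code without the v1 compat Session, a similar hard cap can be expressed as a logical device configuration. The following is only a sketch under stated assumptions: a single visible GPU and an arbitrary 4096 MB limit, set before the GPU is first used.

import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    # Cap this process at 4096 MB on the first GPU (must run before the GPU is initialized)
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=4096)])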

3. Dynamic allocation

This method requests GPU memory dynamically as it is needed; memory is only ever allocated and is not released back while the process runs. If other programs have already taken all of the remaining GPU memory, this program will fail with an allocation error.

Which of the three approaches above to use should be decided by the actual deployment scenario.

The first approach reserves all of the memory up front, so as long as the model does not exceed the GPU memory there is no memory fragmentation to hurt computing performance. It is a suitable configuration for deploying an application.

The second and third approaches suit servers shared by several people. The second wastes GPU memory, because the reserved fraction may go unused; the third avoids that waste, but makes the program much more likely to crash when a later memory request cannot be satisfied.

import tensorflow as tf

config = tf.compat.v1.ConfigProto()
# Allocate GPU memory on demand instead of reserving it all up front
config.gpu_options.allow_growth = True
session = tf.compat.v1.InteractiveSession(config=config)
with tf.compat.v1.Session(config=config) as sess:
    pass  # build, compile, and train the model here, as in the full example below
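
In native TF 2.x code the same on-demand behaviour is usually enabled per device with memory growth. A minimal sketch, assuming it runs before any tensors are placed on the GPU:

import tensorflow as tf

for gpu in tf.config.list_physical_devices('GPU'):
    # Grow memory usage as needed rather than reserving the whole card
    tf.config.experimental.set_memory_growth(gpu, True)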

4. Specifying a GPU

When running TensorFlow on a server with multiple GPUs, you can choose which GPU to use from Python, for example:

import os
# Expose only GPU 2 to TensorFlow; set this before TensorFlow initializes the GPUs
os.environ["CUDA_VISIBLE_DEVICES"] = "2"
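
An alternative that keeps the choice inside TensorFlow itself is to restrict the visible devices. This sketch assumes the machine has at least three GPUs and that it runs before the GPUs are initialized:

import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if len(gpus) > 2:
    # Make only the third physical GPU visible to this process
    tf.config.set_visible_devices(gpus[2], 'GPU')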

A complete example to round things off: ResNet50 image classification:

import tensorflow as tf
from tensorflow import keras

config = tf.compat.v1.ConfigProto()
# Allocate GPU memory on demand
config.gpu_options.allow_growth = True
session = tf.compat.v1.InteractiveSession(config=config)
with tf.compat.v1.Session(config=config) as sess:
    # Input images are 224x224x3, 20 classes
    shape, classes = (224, 224, 3), 20
    # Build Keras's ResNet50 model
    model = keras.applications.resnet50.ResNet50(input_shape=shape, weights=None, classes=classes)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

    # Train the model (train_x/train_y/test_x/test_y are prepared elsewhere)
    model.fit(train_x, train_y, validation_data=(test_x, test_y), epochs=20, batch_size=6, verbose=2)
    # Save the trained model to a file
    model.save('resnet_model_dog_n_face.h5')

Original: https://blog.csdn.net/qq_38735017/article/details/119991239
Author: 甜辣uu
Title: tensorflow使用gpu进行训练

