OpenCV series: a hardware decoding approach for opencv-python on NVIDIA GPUs

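I tried building ffmpeg with cuvid for hardware decoding, but the decoded frames come out in the YUV420 pixel format. OpenCV needs BGR, and the color-space conversion step uses too much CPU.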
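To make that cost concrete, here is a minimal sketch of the arithmetic behind a YUV-to-BGR conversion for a single pixel. It assumes limited-range BT.601 coefficients, which the text above does not specify, and `yuv_to_bgr` is a hypothetical helper, not an OpenCV or VPF function. A CPU-side conversion repeats this work for every pixel of every frame.

```python
def yuv_to_bgr(y, u, v):
    """Convert one limited-range BT.601 YUV sample to an 8-bit (B, G, R) tuple.

    Assumption: limited ("MPEG") range, where Y spans 16..235 and U/V are
    centered on 128. Other ranges or matrices need different coefficients.
    """
    def clamp(x):
        # Round to the nearest integer and keep the result inside 0..255.
        return max(0, min(255, int(round(x))))

    c = 1.164 * (y - 16)   # luma scaled to full range
    d = u - 128            # blue-difference chroma
    e = v - 128            # red-difference chroma

    r = c + 1.596 * e
    g = c - 0.392 * d - 0.813 * e
    b = c + 2.017 * d
    return clamp(b), clamp(g), clamp(r)


if __name__ == "__main__":
    print(yuv_to_bgr(16, 128, 128))    # darkest legal luma, neutral chroma
    print(yuv_to_bgr(235, 128, 128))   # brightest legal luma, neutral chroma
```

Several multiplications and clamps per pixel, repeated for every pixel of every frame, is the kind of uniform work a GPU handles well.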

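The color-space conversion therefore needs to run on the GPU as well. NVIDIA has open-sourced VideoProcessingFramework (VPF), a video processing framework for Python. It gives developers a simple but powerful Python tool for hardware-accelerated video encoding, decoding, and processing.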

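Because the Python bindings wrap C++ code, developers can reach high GPU utilization in a few dozen lines of code. Decoded video frames are exposed as NumPy arrays or CUDA device pointers, which simplifies interaction and extension.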

At present, VPF places no additional restrictions on top of the NVIDIA Video Codec SDK, so developers can take full advantage of NVIDIA professional GPUs.

VPF also supports exporting GPU memory objects, such as decoded video frames, to PyTorch tensors without Host to Device copies.

This makes it extremely friendly to PyTorch inference.

Prerequisites

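① Install the CUDA toolkit and the NVIDIA driver that match your GPU, and make sure their versions are compatible with each other.
② Download the NVIDIA Video Codec SDK and unpack it (downloading from the official site requires registration). Choose the SDK version that matches your NVIDIA driver version. My driver is Linux 470.86, so I downloaded Video Codec SDK 11.1.
③ After unpacking, copy the header files and the .so files to the locations below: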

unzip Video_Codec_SDK.zip
cd Video_Codec_SDK
sudo cp Interface/* /usr/local/cuda/include
sudo cp Lib/linux/stubs/x86_64/* /usr/local/cuda/lib64/stubs
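Before building VPF, it can help to confirm that the copy worked. The sketch below is one hypothetical way to do that. The file names are the ones I recall the Video Codec SDK 11.x shipping in `Interface/` and `Lib/linux/stubs/x86_64/`; treat them as an assumption and adjust them if your SDK version differs. `missing_sdk_files` is an illustrative helper, not part of the SDK or VPF.

```python
import os

# Assumed file names from the SDK's Interface/ and Lib/linux/stubs/x86_64/ folders.
SDK_HEADERS = ("cuviddec.h", "nvcuvid.h", "nvEncodeAPI.h")
SDK_STUBS = ("libnvcuvid.so", "libnvidia-encode.so")


def missing_sdk_files(include_dir, stubs_dir):
    """Return the full paths of expected SDK files that are not present."""
    expected = [os.path.join(include_dir, name) for name in SDK_HEADERS]
    expected += [os.path.join(stubs_dir, name) for name in SDK_STUBS]
    return [path for path in expected if not os.path.isfile(path)]


if __name__ == "__main__":
    # Paths taken from the cp commands above.
    missing = missing_sdk_files("/usr/local/cuda/include",
                                "/usr/local/cuda/lib64/stubs")
    if missing:
        print("Missing Video Codec SDK files:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected Video Codec SDK files are in place.")
```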

Installing VPF


cd ~/installs
git clone https://github.com/NVIDIA/VideoProcessingFramework.git

export CUDACXX=/usr/local/cuda-11.3/bin/nvcc

cd VideoProcessingFramework
mkdir -p install
mkdir -p build
cd build

cmake ..   -DFFMPEG_DIR:PATH="/usr/local/ffmpeg3.4.9"  \
-DVIDEO_CODEC_SDK_INCLUDE_DIR:PATH="/usr/local/cuda/include"   \
-DGENERATE_PYTHON_BINDINGS:BOOL="1"   \
-DGENERATE_PYTORCH_EXTENSION:BOOL="0"  \
-DPYTHON_LIBRARY=/home/hw/anaconda3/envs/cd_test/lib/libpython3.8.so   \
-DCMAKE_INSTALL_PREFIX:PATH="../install" \
-DPYTHON_EXECUTABLE=/home/hw/anaconda3/envs/cd_test/bin/python3 \
-DPYTHON_INCLUDE_DIR=/home/hw/anaconda3/envs/cd_test/include/python3.8

make -j6  && sudo make install

cd ../install/bin
conda activate cd_test
$ python3 SampleDecodeRTSP.py 0 rtsp://xxxx
This sample decodes multiple videos in parallel on given GPU.

It doesn't do anything beside decoding, output isn't saved.

Usage: SampleDecodeRTSP.py $gpu_id $url1 ... $urlN .
[h264 @ 0x55678af45560] co located POCs unavailable
Input
  Metadata:
    title           : Stream
  Duration: N/A, start: -0.856438, bitrate: N/A
    Stream
    Stream
Output
  Metadata:
    title           : Stream
    encoder         : Lavf57.83.100
    Stream
Stream mapping:
  Stream
Press [q] to stop, [?] for help
3e123055-63a0-45f4-b8ac-82cf60f321ea 508kB time=00:00:03.52 bitrate=1180.1kbits/s speed=1.11x
3e123055-63a0-45f4-b8ac-82cf60f321ea1985kB time=00:00:05.57 bitrate=2916.0kbits/s speed=1.07x
3e123055-63a0-45f4-b8ac-82cf60f321ea2749kB time=00:00:06.59 bitrate=3416.6kbits/s speed=1.06x
3e123055-63a0-45f4-b8ac-82cf60f321ea3448kB time=00:00:07.58 bitrate=3721.1kbits/s speed=1.05x

Looking at the sample's source code, it uses ffmpeg to demux the stream and then calls the VPF API for hardware decoding.
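The demuxing half of that split can be sketched as a small command builder. The ffmpeg arguments mirror the ones used in the full listing later in this post; `build_demux_cmd` itself is a hypothetical helper introduced only for illustration. The command copies the video stream without re-encoding, converts it to Annex B with a bitstream filter, and writes the raw elementary stream to stdout, where the decoder can read it from a pipe.

```python
def build_demux_cmd(url, codec_name):
    """Build an ffmpeg command that demuxes `url` to a raw Annex B stream on stdout.

    Only 'h264' and 'hevc' are accepted, matching what the listing supports.
    """
    if codec_name not in ("h264", "hevc"):
        raise ValueError("Unsupported codec: " + codec_name)

    # <codec>_mp4toannexb rewrites the stream to Annex B start codes;
    # dump_extra=all repeats the parameter sets so decoding can start mid-stream.
    bsf_name = codec_name + "_mp4toannexb,dump_extra=all"

    return [
        "ffmpeg", "-hide_banner",
        "-loglevel", "quiet",
        "-i", url,
        "-c:v", "copy",          # no re-encoding, demux only
        "-bsf:v", bsf_name,
        "-f", codec_name,        # raw elementary stream container
        "pipe:1",                # write to stdout
    ]


if __name__ == "__main__":
    # The URL is a placeholder, as in the sample invocation above.
    print(" ".join(build_demux_cmd("rtsp://xxxx", "h264")))
```

The resulting list can be passed to `subprocess.Popen(cmd, stdout=subprocess.PIPE)`, and the bytes read from the pipe are what get handed to the hardware decoder.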

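To use VPF in another project, copy the compiled PyNvCodec.cpython-38-x86_64-linux-gnu.so file into the project's root directory, or add its location in the project code with sys.path.append('/root/user/installs/VideoProcessingFramework/install/bin'). You can also copy the generated .so file into the site-packages directory of the Python environment you use (for example, cp PyNvCodec.cpython-38-x86_64-linux-gnu.so /root/conda/envs/env_name/lib/python3.8/site-packages/).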
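For the `sys.path` route, a small guard avoids a confusing import error when the directory is wrong. This is a minimal sketch: `add_vpf_to_path` is a hypothetical helper, and it only assumes that the compiled module's file name starts with `PyNvCodec` and ends with `.so`, as in the build above.

```python
import glob
import os
import sys


def add_vpf_to_path(install_bin):
    """Append `install_bin` to sys.path if it contains a compiled PyNvCodec module.

    Returns True when the module file was found, False otherwise.
    """
    pattern = os.path.join(install_bin, "PyNvCodec*.so")
    if not glob.glob(pattern):
        return False
    if install_bin not in sys.path:
        sys.path.append(install_bin)
    return True


if __name__ == "__main__":
    # Path taken from the example above; change it to your own install directory.
    vpf_dir = "/root/user/installs/VideoProcessingFramework/install/bin"
    if add_vpf_to_path(vpf_dir):
        print("PyNvCodec directory added to sys.path")
    else:
        print("No PyNvCodec*.so found in " + vpf_dir)
```

After a successful call, `import PyNvCodec as nvc` should resolve from that directory.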

Usage in code


import multiprocessing
import sys
import os
import threading
from typing import Dict
import cv2

if os.name == 'nt':

    cuda_path = os.environ.get("CUDA_PATH")  # None if unset, handled by the else branch
    if cuda_path:
        os.add_dll_directory(cuda_path)
    else:
        print("CUDA_PATH environment variable is not set.", file=sys.stderr)
        print("Can't set CUDA DLLs search path.", file=sys.stderr)
        exit(1)

    sys_path = os.environ.get("PATH")  # None if unset, handled by the else branch
    if sys_path:
        paths = sys_path.split(';')
        for path in paths:
            if os.path.isdir(path):
                os.add_dll_directory(path)
    else:
        print("PATH environment variable is not set.", file=sys.stderr)
        exit(1)

import PyNvCodec as nvc
import numpy as np

from io import BytesIO
from multiprocessing import Process
import subprocess
import uuid
import json
import pycuda.driver as cuda

def get_stream_params(url: str) -> Dict:
    cmd = [
        'ffprobe',
        '-v', 'quiet',
        '-print_format', 'json',
        '-show_format', '-show_streams', url]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    stdout = proc.communicate()[0]

    bio = BytesIO(stdout)
    json_out = json.load(bio)

    params = {}
    if not 'streams' in json_out:
        return {}

    for stream in json_out['streams']:
        if stream['codec_type'] == 'video':
            params['width'] = stream['width']
            params['height'] = stream['height']
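            # avg_frame_rate is a "num/den" string; parse it without eval()
            num, _, den = stream['avg_frame_rate'].partition('/')
            params['framerate'] = float(num) / float(den) if den and float(den) else float(num or 0)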

            codec_name = stream['codec_name']
            is_h264 = codec_name == 'h264'
            is_hevc = codec_name == 'hevc'
            if not is_h264 and not is_hevc:
                raise ValueError("Unsupported codec: " + codec_name +
                                 '. Only H.264 and HEVC are supported in this sample.')
            else:
                params['codec'] = nvc.CudaVideoCodec.H264 if is_h264 else nvc.CudaVideoCodec.HEVC

                pix_fmt = stream['pix_fmt']
                is_yuv420 = pix_fmt == 'yuv420p'
                is_yuv444 = pix_fmt == 'yuv444p'

                is_yuvj420 = pix_fmt == 'yuvj420p'
                is_yuvj444 = pix_fmt == 'yuvj444p'

                if is_yuvj420:
                    is_yuv420 = True
                    params['color_range'] = nvc.ColorRange.JPEG
                if is_yuvj444:
                    is_yuv444 = True
                    params['color_range'] = nvc.ColorRange.JPEG

                if not is_yuv420 and not is_yuv444:
                    raise ValueError("Unsupported pixel format: " +
                                     pix_fmt +
                                     '. Only YUV420 and YUV444 are supported in this sample.')
                else:
                    params['format'] = nvc.PixelFormat.NV12 if is_yuv420 else nvc.PixelFormat.YUV444

                if 'color_range' not in params:
                    params['color_range'] = nvc.ColorRange.MPEG

                if 'color_range' in stream:
                    color_range = stream['color_range']
                    if color_range == 'pc' or color_range == 'jpeg':
                        params['color_range'] = nvc.ColorRange.JPEG

                params['color_space'] = nvc.ColorSpace.BT_601

                if 'color_space' in stream:
                    color_space = stream['color_space']
                    if color_space == 'bt709':
                        params['color_space'] = nvc.ColorSpace.BT_709

                return params
    return {}

def rtsp_client(url: str, name: str, gpu_id: int) -> None:

    params = get_stream_params(url)

    if not len(params):
        raise ValueError("Can not get " + url + ' streams params')

    w = params['width']
    h = params['height']
    f = params['format']
    c = params['codec']
    g = gpu_id

    if nvc.CudaVideoCodec.H264 == c:
        codec_name = 'h264'
    elif nvc.CudaVideoCodec.HEVC == c:
        codec_name = 'hevc'
    bsf_name = codec_name + '_mp4toannexb,dump_extra=all'

    cmd = [
        'ffmpeg',       '-hide_banner',
        '-loglevel',   'quiet',
        '-i',           url,
        '-c:v',         'copy',
        '-bsf:v',       bsf_name,
        '-f',           codec_name,
        'pipe:1'
    ]

    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)

    cuda.init()
    cuda_ctx = cuda.Device(gpu_id).retain_primary_context()
    cuda_ctx.push()
    cuda_str = cuda.Stream()
    cuda_ctx.pop()

    nvdec = nvc.PyNvDecoder(w, h, f, c, g)
    nvCvt = nvc.PySurfaceConverter(w, h, nvc.PixelFormat.NV12, nvc.PixelFormat.BGR, cuda_ctx.handle, cuda_str.handle)
    nvDwn = nvc.PySurfaceDownloader(w, h, nvCvt.Format(), cuda_ctx.handle, cuda_str.handle)
    frameSize = int(w*h*3)
    rawFrame = np.ndarray(shape=(frameSize), dtype=np.uint8)
    cc_ctx = None

    read_size = 4096

    rt = 0
    fd = 0

    while True:

        if not read_size:
            read_size = int(rt / fd)

            rt = read_size
            fd = 1

        bits = proc.stdout.read(read_size)
        if not len(bits):
            print("Can't read data from pipe")
            break
        else:
            rt += len(bits)

        enc_packet = np.frombuffer(buffer=bits, dtype=np.uint8)
        pkt_data = nvc.PacketData()
        try:
            surface_nv12 = nvdec.DecodeSurfaceFromPacket(enc_packet, pkt_data)

            if not surface_nv12.Empty():
                fd += 1

                if pkt_data.bsl < read_size:
                    read_size = pkt_data.bsl

                fps = int(params['framerate'])

                if cc_ctx is None:
                    cspace = params['color_space']
                    crange = params['color_range']  # use the range detected by ffprobe
                    cc_ctx = nvc.ColorspaceConversionContext(cspace, crange)

                surface_bgr = nvCvt.Execute(surface_nv12, cc_ctx)
                if surface_bgr.Empty():
                    break
                if not nvDwn.DownloadSingleSurface(surface_bgr, rawFrame):
                    break

                img_bgr = rawFrame.reshape((h, w, 3))

        except nvc.HwResetException:
            nvdec = nvc.PyNvDecoder(w, h, f, c, g)
            continue

if __name__ == "__main__":
    print("This sample decodes multiple videos in parallel on given GPU.")
    print("It doesn't do anything beside decoding, output isn't saved.")

    print("Usage: SampleDecodeRTSP.py $gpu_id $url1 ... $urlN .")

    if(len(sys.argv) < 3):
        print("Provide gpu ID and input URL(s).")
        exit(1)

    gpuID = int(sys.argv[1])
    urls = []

    for i in range(2, len(sys.argv)):
        urls.append(sys.argv[i])

    pool = []
    for url in urls:
        client = Process(target=rtsp_client, args=(
            url, str(uuid.uuid4()), gpuID))
        client.start()
        pool.append(client)

    for client in pool:
        client.join()
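The listing allocates `frameSize = int(w*h*3)` bytes per frame because BGR stores three bytes per pixel, while the decoder's NV12 output needs only one and a half. The sketch below works through that arithmetic. The 1920x1080 resolution and 25 fps are hypothetical values chosen for illustration, not figures from the test above, and both helper functions are illustrative names.

```python
def nv12_frame_bytes(width, height):
    """Bytes in one NV12 frame: a full-resolution Y plane plus a half-size UV plane."""
    return width * height * 3 // 2


def bgr_frame_bytes(width, height):
    """Bytes in one packed BGR frame: three bytes per pixel."""
    return width * height * 3


if __name__ == "__main__":
    w, h, fps = 1920, 1080, 25  # hypothetical stream parameters
    print("NV12 frame:", nv12_frame_bytes(w, h), "bytes")
    print("BGR frame: ", bgr_frame_bytes(w, h), "bytes")
    # Each downloaded BGR frame crosses from GPU memory to host memory.
    print("BGR download rate:", bgr_frame_bytes(w, h) * fps, "bytes/s")
```

Under those assumed parameters, every second of video moves roughly 155 MB from the GPU to the host when each BGR frame is downloaded.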

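PS: In my tests, decoding plus color-space conversion on the GPU dropped CPU usage from 40% to 6%, but downloading frames from the GPU to the CPU with nvDwn.DownloadSingleSurface pushed it back up to 24%. So wherever possible, skip the download to the CPU and feed frames straight into inference; keeping the whole pipeline on the GPU is the way to go.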
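If you want to reproduce this kind of comparison on your own streams, one possible approach is to compare the CPU time a stage consumes with the wall-clock time it takes. The sketch below shows that idea with the standard library only. It is not how the figures above were obtained, and `cpu_percent` and `measure` are hypothetical helpers.

```python
import time


def cpu_percent(cpu_seconds, wall_seconds):
    """CPU utilization of one process as a percentage of one core."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return 100.0 * cpu_seconds / wall_seconds


def measure(func, *args):
    """Run func(*args) and return (result, CPU percent used while it ran)."""
    cpu_start = time.process_time()    # user + system CPU time of this process
    wall_start = time.perf_counter()   # wall-clock time
    result = func(*args)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return result, cpu_percent(cpu_used, wall_used)


if __name__ == "__main__":
    # Stand-in workload; replace it with the decode loop you want to profile.
    total, percent = measure(sum, range(5_000_000))
    print("result:", total, "cpu:", round(percent, 1), "%")
```

Because `time.process_time()` covers only the current process, each `rtsp_client` worker would need to be measured from inside its own process.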

Original: https://blog.csdn.net/kkae8643150/article/details/123307662
Author: 狂奔的CD
Title: OpenCV series: a hardware decoding approach for opencv-python on NVIDIA GPUs
