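Introduction to GCN (Graph Convolutional Networks) + Hands-On Hand-Seal Recognition: A Detailed Annotated Walkthrough of Enpei's Project 4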

July 20, 2023, 7:57 AM • Artificial Intelligence

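Thanks to Enpei for fully implementing this project and open-sourcing the code for everyone to study and discuss.

I. Project Overview

The project is implemented in Python using libraries such as OpenCV, MediaPipe, and PyTorch, and consists of the following steps:

1. Read the webcam video stream with OpenCV;

2. Detect the pixel coordinates of the hand keypoints;

3. Build a graph data structure from the detected hand keypoints;

4. Train a GCN (graph convolutional network) on the data with PyTorch (the code below uses DGL's GraphConv) and classify the gestures.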

II. Key Concepts

The following shows how to use MediaPipe's hand detection from Python.

Install the library:

pip install mediapipe

Usage example:

import cv2
import mediapipe as mp

mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles
mp_hands = mp.solutions.hands

# For static images:
IMAGE_FILES = []
with mp_hands.Hands(
        static_image_mode=True,
        max_num_hands=2,
        min_detection_confidence=0.5) as hands:
    for idx, file in enumerate(IMAGE_FILES):
        # Read an image and flip it around the y-axis for correct handedness output.
        image = cv2.flip(cv2.imread(file), 1)
        # Convert the BGR image to RGB before processing.
        results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

        # Print handedness and draw hand landmarks on the image.
        print('Handedness:', results.multi_handedness)
        if not results.multi_hand_landmarks:
            continue
        image_height, image_width, _ = image.shape
        annotated_image = image.copy()
        for hand_landmarks in results.multi_hand_landmarks:
            print('hand_landmarks:', hand_landmarks)
            print(
                f'Index finger tip coordinates: (',
                f'{hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP].x * image_width}, '
                f'{hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP].y * image_height})'
            )
            mp_drawing.draw_landmarks(
                annotated_image,
                hand_landmarks,
                mp_hands.HAND_CONNECTIONS,
                mp_drawing_styles.get_default_hand_landmarks_style(),
                mp_drawing_styles.get_default_hand_connections_style())
        cv2.imwrite(
            '/tmp/annotated_image' + str(idx) + '.png', cv2.flip(annotated_image, 1))
        # Draw hand world landmarks.
        if not results.multi_hand_world_landmarks:
            continue
        for hand_world_landmarks in results.multi_hand_world_landmarks:
            mp_drawing.plot_landmarks(
                hand_world_landmarks, mp_hands.HAND_CONNECTIONS, azimuth=5)

# For webcam input:
cap = cv2.VideoCapture(0)
with mp_hands.Hands(
        model_complexity=0,
        min_detection_confidence=0.5,
        min_tracking_confidence=0.5) as hands:
    while cap.isOpened():
        success, image = cap.read()
        if not success:
            print("Ignoring empty camera frame.")
            # If loading a video, use 'break' instead of 'continue'.
            continue

        # To improve performance, optionally mark the image as not writeable to
        # pass by reference.
        image.flags.writeable = False
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        results = hands.process(image)

        # Draw the hand annotations on the image.
        image.flags.writeable = True
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
        if results.multi_hand_landmarks:
            for hand_landmarks in results.multi_hand_landmarks:
                mp_drawing.draw_landmarks(
                    image,
                    hand_landmarks,
                    mp_hands.HAND_CONNECTIONS,
                    mp_drawing_styles.get_default_hand_landmarks_style(),
                    mp_drawing_styles.get_default_hand_connections_style())
        # Flip the image horizontally for a selfie-view display.
        cv2.imshow('MediaPipe Hands', cv2.flip(image, 1))
        if cv2.waitKey(5) & 0xFF == 27:
            break
cap.release()
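MediaPipe returns landmark coordinates normalized to the image size, which is why the example above multiplies by image_width and image_height. A minimal sketch of that conversion follows; the helper name landmarks_to_pixels and the clamping behaviour are illustrative assumptions, and SimpleNamespace objects stand in for real landmark entries so the snippet runs without a camera.

```python
from types import SimpleNamespace


def landmarks_to_pixels(landmarks, image_width, image_height):
    """Convert normalized landmarks (x, y roughly in [0, 1]) to pixel coordinates."""
    pixels = []
    for lm in landmarks:
        # Clamp so that points on or slightly outside the border stay in the image.
        px = min(max(int(lm.x * image_width), 0), image_width - 1)
        py = min(max(int(lm.y * image_height), 0), image_height - 1)
        pixels.append((px, py))
    return pixels


# Stand-ins with .x / .y attributes, mimicking entries of hand_landmarks.landmark.
fake_landmarks = [SimpleNamespace(x=0.5, y=0.25), SimpleNamespace(x=1.0, y=0.0)]
print(landmarks_to_pixels(fake_landmarks, 640, 480))  # [(320, 120), (639, 0)]
```

With a real result, the same function could be called as landmarks_to_pixels(hand_landmarks.landmark, image_width, image_height).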

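What is a graph

How a computer represents a graph

In a computer, the vertices (nodes), the edges, and the graph as a whole are each represented by embedding vectors.

As we can see, the idea behind graph neural networks is likewise to represent a graph's nodes, edges, and the graph itself as vectors. What we care about is how these vectors are used during data processing, and whether we can learn their values from data.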
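To make the representation concrete, here is a minimal sketch of a container that holds the three kinds of vectors. The class name Graph, its field names, and the toy values are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Graph:
    node_feats: list   # one embedding vector per node
    edge_index: list   # (source, destination) pairs describing connectivity
    edge_feats: list   # one embedding vector per edge
    global_feat: list  # a single embedding for the whole graph


toy = Graph(
    node_feats=[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    edge_index=[(0, 1), (1, 2)],
    edge_feats=[[1.0], [2.0]],
    global_feat=[0.0, 0.0],
)
print(len(toy.node_feats), "nodes,", len(toy.edge_index), "edges")
```

Learning then means adjusting the numbers inside node_feats, edge_feats and global_feat while edge_index stays fixed.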

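In addition, graphs fall mainly into two types: directed and undirected.

Put simply: if you and I are friends with each other, the relation is undirected. If you like me but I do not like you, the relation is directed.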
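The difference shows up directly in the adjacency matrix: an undirected relation gives a symmetric matrix, a directed one generally does not. A small sketch, with the helper names adjacency_matrix and is_symmetric chosen for illustration:

```python
def adjacency_matrix(num_nodes, edges, directed):
    """Build a 0/1 adjacency matrix as nested lists."""
    adj = [[0] * num_nodes for _ in range(num_nodes)]
    for src, dst in edges:
        adj[src][dst] = 1
        if not directed:
            adj[dst][src] = 1  # an undirected edge counts in both directions
    return adj


def is_symmetric(adj):
    n = len(adj)
    return all(adj[i][j] == adj[j][i] for i in range(n) for j in range(n))


friends = adjacency_matrix(2, [(0, 1)], directed=False)  # mutual friendship
likes = adjacency_matrix(2, [(0, 1)], directed=True)     # node 0 likes node 1 only
print(friends, is_symmetric(friends))  # [[0, 1], [1, 0]] True
print(likes, is_symmetric(likes))      # [[0, 1], [0, 0]] False
```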

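How other data can be represented as graphs

Having introduced the graph as a data structure and how it is represented, a natural question is how to represent images, text, and similar data as graphs.

The same goes for molecules, social relationships, article citations, and so on: all of them are easy to represent as graphs, which shows how widely graphs apply.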
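For images, a minimal sketch under one common assumption: every pixel becomes a node, and each pixel is linked to its horizontal and vertical neighbours. The function name image_to_graph and the row-major node numbering are illustrative choices, not something prescribed by the text.

```python
def image_to_graph(height, width):
    """Return (num_nodes, edges) for a pixel grid with 4-neighbour connectivity."""
    edges = []
    for r in range(height):
        for c in range(width):
            node = r * width + c  # row-major node id of pixel (r, c)
            if c + 1 < width:
                edges.append((node, node + 1))      # neighbour to the right
            if r + 1 < height:
                edges.append((node, node + width))  # neighbour below
    return height * width, edges


num_nodes, edges = image_to_graph(2, 3)
print(num_nodes, len(edges))  # 6 7
```

The pixel colour would then serve as the node embedding; text can be handled in a similar spirit by linking each token to the next one.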

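What problems graphs are used to solve

The second kind is node-level problems. For example, given a graph, divide all of its nodes into two classes according to some criterion.

What makes graphs hard to use in deep learning

A little analysis shows that the information carried by adjacency relations is essentially just which pairs of nodes are connected.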
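Since only the connected pairs matter, a graph can be stored as a plain list of edges instead of a mostly empty matrix. The sketch below (helper name to_adjacency_list is an assumption) also shows that the order in which the edges are listed does not change the result.

```python
from collections import defaultdict


def to_adjacency_list(edges):
    """Turn an undirected edge list into {node: sorted list of neighbours}."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {node: sorted(nbrs) for node, nbrs in neighbors.items()}


edges = [(0, 1), (0, 2), (2, 3)]
adj_list = to_adjacency_list(edges)
print(adj_list)  # {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}

# Listing the same edges in a different order describes the same graph.
assert to_adjacency_list(list(reversed(edges))) == adj_list
```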

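Now to the main topic: what is a graph neural network

A formal definition may not be easy to digest. It is enough to keep in mind that it is a neural network with the following two properties.

1. The input is a graph, and the output is also a graph.

2. It applies a series of transformations to the edge and vertex vectors, but the connectivity does not change.

Now consider a concrete problem. Suppose we want to classify a node in a graph, and the nodes' embedding vectors are already known. The problem is then simple: feed the embedding into a neural network classifier. This is no different from a conventional neural network.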
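A minimal sketch of that node-level classifier: the same linear layer is applied to every node embedding independently, and the edges play no role. The weights here are hand-picked for the example rather than learned, and the function name classify_nodes is an assumption.

```python
import numpy as np


def classify_nodes(node_embeddings, weight, bias):
    """Apply one shared linear classifier to every node embedding."""
    logits = node_embeddings @ weight + bias  # shape: (num_nodes, num_classes)
    return logits.argmax(axis=1)              # predicted class per node


node_embeddings = np.array([[2.0, 0.0], [0.0, 3.0], [1.0, 1.5]])
weight = np.eye(2)   # hand-picked weights, purely for illustration
bias = np.zeros(2)
print(classify_nodes(node_embeddings, weight, bias).tolist())  # [0, 1, 1]
```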

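With the pooling (aggregation) operation, a neural network can process node and edge information and fill in missing vectors by pooling from neighboring elements. But attentive readers will have noticed that this still does not make full use of the information in a graph. So how can we feed in all of a graph's vertex and edge information?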
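As a sketch of pooling, assume the nodes have no embeddings of their own and each node receives the sum of the embeddings of its incident edges. Sum is only one possible choice (mean or max would work similarly), and pool_edges_to_nodes is a hypothetical helper name.

```python
def pool_edges_to_nodes(num_nodes, edges, edge_feats):
    """Give each node the sum of the embeddings of the edges touching it."""
    dim = len(edge_feats[0])
    pooled = [[0.0] * dim for _ in range(num_nodes)]
    for (a, b), feat in zip(edges, edge_feats):
        for node in (a, b):  # an undirected edge contributes to both endpoints
            for k in range(dim):
                pooled[node][k] += feat[k]
    return pooled


edges = [(0, 1), (1, 2)]
edge_feats = [[1.0, 0.0], [0.0, 2.0]]
print(pool_edges_to_nodes(3, edges, edge_feats))
# [[1.0, 0.0], [1.0, 2.0], [0.0, 2.0]]
```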

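There is one more problem: when the graph is large, passing information over from distant nodes would take a long time. That is why a global vector U is introduced, representing the average properties of the whole graph.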
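One simple reading of "average properties" is the mean of all node embeddings, which can then be appended to every node so that distant information arrives in a single step. This is a sketch under that assumption; the helper names global_mean and broadcast_global are illustrative.

```python
def global_mean(node_feats):
    """Compute a global vector U as the element-wise mean of all node embeddings."""
    n = len(node_feats)
    dim = len(node_feats[0])
    return [sum(feat[k] for feat in node_feats) / n for k in range(dim)]


def broadcast_global(node_feats, u):
    """Append U to every node embedding so each node sees a whole-graph summary."""
    return [feat + u for feat in node_feats]


node_feats = [[1.0, 2.0], [3.0, 4.0]]
u = global_mean(node_feats)
print(u)                                # [2.0, 3.0]
print(broadcast_global(node_feats, u))  # [[1.0, 2.0, 2.0, 3.0], [3.0, 4.0, 2.0, 3.0]]
```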

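That covers the basic principles of graph neural networks.

As for GCN, the idea is to stack K aggregation layers, each of which looks N steps outward, so that each node's field of view is K*N. This is in fact equivalent to multiplying by the adjacency matrix.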
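The link between layers and adjacency-matrix multiplication can be checked on a tiny example. The sketch assumes N = 1 (each layer looks one step outward) and adds self-loops so that a node keeps its own information; a nonzero entry in the K-th matrix power then marks a node that lies within K steps.

```python
import numpy as np

# Path graph 0 - 1 - 2 - 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
A_hat = A + np.eye(4, dtype=int)  # self-loops: each node also keeps its own signal


def reachable_after(layers):
    """Boolean matrix: entry (i, j) is True if node j can influence node i."""
    return np.linalg.matrix_power(A_hat, layers) > 0


print(reachable_after(1)[0].tolist())  # [True, True, False, False]
print(reachable_after(2)[0].tolist())  # [True, True, True, False]
```

After one layer node 0 only sees node 1; after two layers it also sees node 2, matching a field of view of K*N = 2.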

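As we can see, graph neural networks are highly flexible, and almost any data can be represented as a graph. But this also brings problems: optimizing on such a sparse structure is very difficult, and on top of that a graph can be a dynamic structure.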

III. Hands-On Practice

Use MediaPipe to get the hand bounding box, then use InterHand for precise detection of the hand joints.
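A minimal sketch of deriving the hand box from keypoint pixel coordinates, returned as [xmin, ymin, width, height], the layout that get3Dpoint in this article expects. The helper name hand_bbox and the 20-pixel margin are assumptions for illustration, not values taken from the project.

```python
def hand_bbox(pixel_points, image_width, image_height, margin=20):
    """Bounding box around keypoints as [xmin, ymin, width, height], clipped to the image."""
    xs = [p[0] for p in pixel_points]
    ys = [p[1] for p in pixel_points]
    x_min = max(min(xs) - margin, 0)
    y_min = max(min(ys) - margin, 0)
    x_max = min(max(xs) + margin, image_width - 1)
    y_max = min(max(ys) + margin, image_height - 1)
    return [x_min, y_min, x_max - x_min, y_max - y_min]


points = [(100, 200), (150, 260), (130, 220)]
print(hand_bbox(points, 640, 480))  # [80, 180, 90, 100]
```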

InterHand weights: https://github.com/facebookresearch/InterHand2.6M/releases/tag/v1.0 (official PyTorch implementation of "InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image", ECCV 2020)

For further questions, feel free to message me: thujiang000

# Takes a hand image as input and returns 3D coordinates
import os.path as osp

import numpy as np
import torch
import torch.backends.cudnn as cudnn
import torchvision.transforms as transforms
from torch.nn.parallel.data_parallel import DataParallel

# cfg, get_model, process_bbox and generate_patch_image come from the
# InterHand2.6M repository code.


class HandPose:
    def __init__(self):
        cfg.set_args('0')
        cudnn.benchmark = True

        # joint set information is in annotations/skeleton.txt
        self.joint_num = 21  # single hand
        self.joint_type = {'right': np.arange(0, self.joint_num),
                           'left': np.arange(self.joint_num, self.joint_num * 2)}

        # snapshot load
        model_path = './snapshot_19.pth.tar'
        assert osp.exists(model_path), 'Cannot find self.hand_pose_model at ' + model_path
        print('Load checkpoint from {}'.format(model_path))
        self.hand_pose_model = get_model('test', self.joint_num)
        self.hand_pose_model = DataParallel(self.hand_pose_model).cuda()
        ckpt = torch.load(model_path)
        self.hand_pose_model.load_state_dict(ckpt['network'], strict=False)
        self.hand_pose_model.eval()

        # prepare input image
        self.transform = transforms.ToTensor()

    def get3Dpoint(self, x_t_l, y_t_l, cam_w, cam_h, original_img):
        bbox = [x_t_l, y_t_l, cam_w, cam_h]  # xmin, ymin, width, height
        original_img_height, original_img_width = original_img.shape[:2]
        bbox = process_bbox(bbox, (original_img_height, original_img_width, original_img_height))
        img, trans, inv_trans = generate_patch_image(original_img, bbox, False, 1.0, 0.0, cfg.input_img_shape)
        img = self.transform(img.astype(np.float32)) / 255
        img = img.cuda()[None, :, :, :]

        # forward
        inputs = {'img': img}
        targets = {}
        meta_info = {}
        with torch.no_grad():
            out = self.hand_pose_model(inputs, targets, meta_info, 'test')
        img = img[0].cpu().numpy().transpose(1, 2, 0)  # cfg.input_img_shape[1], cfg.input_img_shape[0], 3
        joint_coord = out['joint_coord'][0].cpu().numpy()  # x,y pixel, z root-relative discretized depth
        rel_root_depth = out['rel_root_depth'][0].cpu().numpy()  # discretized depth
        hand_type = out['hand_type'][0].cpu().numpy()  # handedness probability

        # restore joint coord to original image space and continuous depth space
        joint_coord[:, 0] = joint_coord[:, 0] / cfg.output_hm_shape[2] * cfg.input_img_shape[1]
        joint_coord[:, 1] = joint_coord[:, 1] / cfg.output_hm_shape[1] * cfg.input_img_shape[0]
        joint_coord[:, :2] = np.dot(inv_trans, np.concatenate((joint_coord[:, :2], np.ones_like(joint_coord[:, :1])), 1).transpose(1, 0)).transpose(1, 0)
        joint_coord[:, 2] = (joint_coord[:, 2] / cfg.output_hm_shape[0] * 2 - 1) * (cfg.bbox_3d_size / 2)

        # restore right hand-relative left hand depth to continuous depth space
        rel_root_depth = (rel_root_depth / cfg.output_root_hm_shape * 2 - 1) * (cfg.bbox_3d_size_root / 2)

        # right hand root depth == 0, left hand root depth == rel_root_depth
        joint_coord[self.joint_type['left'], 2] += rel_root_depth

        # 3D joint information
        return joint_coord

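Once the 3D joints have been extracted, we can build the graph of the hand. Think about the task: we want to classify different graphs, and the direction and length of the edges are what characterize each graph. Given these characteristics, we can try constructing the graph from its edges. Note that the connections between finger joints are undirected, so they need to be handled accordingly.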
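Handling undirected connections in a library that stores directed edges usually means adding the reverse of every edge, which is conceptually what the dgl.to_bidirected call used later in this article does. A pure-Python sketch of the idea, with make_bidirected as a hypothetical helper name:

```python
def make_bidirected(src, dst):
    """Add the reverse of every edge and drop duplicates."""
    pairs = set()
    for a, b in zip(src, dst):
        pairs.add((a, b))
        pairs.add((b, a))  # reverse direction
    ordered = sorted(pairs)
    return [a for a, _ in ordered], [b for _, b in ordered]


# Two edges from the wrist node 0 to nodes 4 and 8.
src, dst = make_bidirected([0, 0], [4, 8])
print(src, dst)  # [0, 0, 4, 8] [4, 8, 0, 0]
```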

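First, build a two-layer graph convolutional network.

import dgl
import torch.nn as nn
import torch.nn.functional as F
from dgl.nn import GraphConv


class GCN(nn.Module):
    def __init__(self, in_feats, h_feats, num_classes):
        super(GCN, self).__init__()
        self.conv1 = GraphConv(in_feats, h_feats)
        self.conv2 = GraphConv(h_feats, num_classes)

    def forward(self, g, in_feat):
        h = self.conv1(g, in_feat)
        h = F.relu(h)
        h = self.conv2(g, h)
        g.ndata['h'] = h
        # Average the node outputs into one vector per graph
        return dgl.mean_nodes(g, 'h')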
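To see what the two layers and the mean readout compute, here is a dense NumPy sketch of the same forward pass. It assumes symmetric normalization of the adjacency matrix, omits biases, and uses random weights in place of trained ones, so it illustrates the arithmetic only and is not a drop-in replacement for the DGL model.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 A D^-1/2."""
    deg = adj.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1))  # guard against isolated nodes
    return adj * inv_sqrt[:, None] * inv_sqrt[None, :]


def gcn_forward(adj, feats, w1, w2):
    a_norm = normalize_adjacency(adj)
    h = np.maximum(a_norm @ feats @ w1, 0.0)  # first graph convolution + ReLU
    h = a_norm @ h @ w2                       # second graph convolution
    return h.mean(axis=0)                     # mean over nodes: one vector per graph


rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
feats = rng.normal(size=(3, 3))   # 3 nodes with x, y, z features
w1 = rng.normal(size=(3, 16))     # in_feats=3 -> h_feats=16
w2 = rng.normal(size=(16, 5))     # h_feats=16 -> num_classes=5
logits = gcn_forward(adj, feats, w1, w2)
print(logits.shape)  # (5,)
```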

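u and v hold the source and destination nodes of the graph's edges.

Each node's coordinates relative to the center of the palm are then used as its features and attached to the graph.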
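The article calls a relativeMiddleCor method for this step without showing its body. The sketch below is a hypothetical stand-in that assumes the center is the centroid of all joints; the project itself may define the palm center differently.

```python
def relative_to_center(x_list, y_list, z_list):
    """Return [x, y, z] rows expressed relative to the centroid of all points.

    Hypothetical stand-in: the centroid is an assumed definition of the center.
    """
    n = len(x_list)
    cx = sum(x_list) / n
    cy = sum(y_list) / n
    cz = sum(z_list) / n
    return [[x - cx, y - cy, z - cz] for x, y, z in zip(x_list, y_list, z_list)]


print(relative_to_center([0.0, 2.0], [1.0, 3.0], [4.0, 4.0]))
# [[-1.0, -1.0, 0.0], [1.0, 1.0, 0.0]]
```

Using relative coordinates makes the features independent of where the hand appears in the frame.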

# Build the graph and its features
u, v = torch.tensor([[0,0,0,0,0,4,3,2,8,7,6,12,11,10,16,15,14,20,19,18,0,21,21,21,21,21,25,24,23,29,28,27,33,32,31,37,36,35,41,40,39],
                     [4,8,12,16,20,3,2,1,7,6,5,11,10,9,15,14,13,19,18,17,21,25,29,33,37,41,24,23,22,28,27,26,32,31,30,36,35,34,40,39,38]])
g = dgl.graph((u, v))

# Make the graph undirected
bg = dgl.to_bidirected(g)

# Compute relative coordinates
x_y_z_column = self.relativeMiddleCor(x_list, y_list, z_list)

# Add features
bg.ndata['feat'] = torch.tensor(x_y_z_column)  # x, y, z coordinates

# Test the model
# device = torch.device("cuda:0")
device = torch.device("cpu")
bg = bg.to(device)
self.modelGCN = self.modelGCN.to(device)
pred = self.modelGCN(bg, bg.ndata['feat'].float())
pred_type = pred.argmax(1).item()
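The argmax above gives only a class index. If a confidence value is also wanted, a softmax over the output scores is one common option. A pure-Python sketch, where the gesture names are placeholders and predict_label is a hypothetical helper:

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits, class_names):
    """Return the most likely class name and its softmax probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


names = ["gesture_0", "gesture_1", "gesture_2"]  # placeholder labels
label, confidence = predict_label([0.1, 2.0, -1.0], names)
print(label)  # gesture_1
```

A low confidence could be used to reject frames in which no known gesture is shown.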

Full project code:

https://github.com/enpeizhao/CVprojects/tree/main/codes (CVprojects/codes at main · enpeizhao/CVprojects · GitHub)

Notes on running:

1. To collect training data, run handRecognize.getTrainningData(task_type='6') in demo.py. The task_type parameter takes values from 0 to 6, so up to seven gestures can be recognized.

2. At run time, if you collected 5 gestures, change the output layer of the hand classifier to 5 classes.

self.modelGCN = GCN(3, 16, 5)

self.modelGCN.load_state_dict(torch.load('./saveModel/handsModel.pth'))

self.modelGCN.eval()

self.handPose = HandPose()

self.mp_hands = mp.solutions.hands

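3. For GPU acceleration, the GPU build of dgl must be installed. Run nvcc --version to check your CUDA version, then look up the matching install command at https://www.dgl.ai/pages/start.html (Deep Graph Library).

If you run into difficulties, feel free to leave a comment.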

Original: https://blog.csdn.net/thujiang000/article/details/122788564
Author: 清华江同学
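Title: Introduction to GCN (Graph Convolutional Networks) + Hands-On Hand-Seal Recognition: A Detailed Annotated Walkthrough of Enpei's Project 4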
