MediaPipe in Practice: Exporting Body Landmark Coordinates, Training Your Own Gesture Detection Model with a TensorFlow LSTM, and Deploying It on a Raspberry Pi 4B

1. Introduction
I came across a tutorial on YouTube by Nicholas Renotte and found it very useful. Using his approach, I trained a model that can detect four gestures, and I'd like to share the process here.
Here is the link to his video: Sign Language Detection using ACTION RECOGNITION with Python | LSTM Deep Learning Model

Code from the video: https://github.com/nicknochnack/ActionDetectionforSignLanguage
Part one of my series: Getting started with MediaPipe — building a pose detection model and outputting 3D body joint coordinates in real time
Part two of my series: MediaPipe pose estimation — computing finger joint bending angles from the coordinates and annotating them in real time

My environment:
PyCharm 2021
mediapipe 0.8.9
tensorflow 2.3.0
OpenCV 4.5.4
The exact versions shouldn't matter much, so yours don't have to match mine, but TensorFlow 2.0 or later is recommended.

2. Building a pose estimation model with MediaPipe and collecting a coordinate dataset from the camera
The original author wrapped the source code up very cleanly, which makes it a bit long, so below I only walk through the important parts; the complete code is at the end of this post (some of the comments in the code are my own additions, the rest follow the original author).
The first is the function that handles the video stream.

def mediapipe_detection(image, model):
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)   # OpenCV delivers BGR frames; MediaPipe expects RGB
    image.flags.writeable = False                    # mark read-only so MediaPipe can avoid copying the frame
    results = model.process(image)                   # run the Holistic model
    image.flags.writeable = True
    image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)   # back to BGR for OpenCV drawing and display
    return image, results

Next comes the function that draws the landmarks on the body.

def draw_styled_landmarks(image, results):
    mp_drawing.draw_landmarks(image, results.pose_landmarks, mp_holistic.POSE_CONNECTIONS,
                             mp_drawing.DrawingSpec(color=(80,22,10), thickness=2, circle_radius=4),
                             mp_drawing.DrawingSpec(color=(80,44,121), thickness=2, circle_radius=2)
                             )
    ......

Both functions are fairly simple; if you want to know how to build a pose detection model with MediaPipe, see part one of my series.
Next is the more important coordinate extraction function, which takes the landmarks returned by process() and flattens them into a NumPy array. To train the gesture model I use the 33 pose landmarks and 21 landmarks for each hand. The original author also trained on the face landmarks; I didn't need them, so I commented out the related code.

def extract_keypoints(results):
    # 33 pose landmarks, each with x, y, z and visibility -> 132 values
    pose = np.array([[res.x, res.y, res.z, res.visibility] for res in results.pose_landmarks.landmark]).flatten() if results.pose_landmarks else np.zeros(33*4)
    # 21 landmarks per hand, each with x, y, z -> 63 values per hand
    lh = np.array([[res.x, res.y, res.z] for res in results.left_hand_landmarks.landmark]).flatten() if results.left_hand_landmarks else np.zeros(21*3)
    rh = np.array([[res.x, res.y, res.z] for res in results.right_hand_landmarks.landmark]).flatten() if results.right_hand_landmarks else np.zeros(21*3)
    return np.concatenate([pose, lh, rh])   # 132 + 63 + 63 = 258 features per frame
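
As a quick sanity check (my own addition, not part of the original script), you can confirm that the vector has length 258 even when nothing is detected, which matches the input_shape=(30, 258) used for the LSTM later:

from collections import namedtuple

# Illustrative only: fake an empty result object to check the feature length.
EmptyResults = namedtuple('EmptyResults', ['pose_landmarks', 'left_hand_landmarks', 'right_hand_landmarks'])
print(extract_keypoints(EmptyResults(None, None, None)).shape)   # -> (258,), i.e. 33*4 + 21*3 + 21*3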

The 33 pose landmarks are shown below.

(Figure: the 33 pose landmarks defined by MediaPipe)
The 21 hand landmarks are shown below.
(Figure: the 21 hand landmarks defined by MediaPipe)

Now use the os module to create a folder in the same directory to hold the dataset we are about to collect.

DATA_PATH = os.path.join('MP_Data')

The next step matters more. The four gestures I trained are "666", "thumbs up", "finger heart", and "scissor hand". Each action is collected as 30 sequences of 30 frames each (both numbers can be changed).

actions = np.array(['666', 'thumbs_up', 'finger_heart', 'scissor_hand'])

no_sequences = 30      # number of sequences (videos) collected per action

sequence_length = 30   # number of frames per sequence

# create the MP_Data/<action>/<sequence>/ folders that will hold the .npy files
for action in actions:
    for sequence in range(no_sequences):
        try:
            os.makedirs(os.path.join(DATA_PATH, action, str(sequence)))
        except:
            pass
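
For reference, once the collection step below has run, the layout on disk looks roughly like this (illustrative; folder and file names follow str(sequence) and frame_num above):

MP_Data/
    666/
        0/
            0.npy ... 29.npy    (one keypoint vector per frame)
        1/ ... 29/
    thumbs_up/ ...
    finger_heart/ ...
    scissor_hand/ ...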

Then run this part of the program to start collecting the dataset (see the end of the post for the complete code). A prompt appears before each collection starts; the original author handled this nicely.

(Figure: the on-screen prompt shown while frames are being collected)
Collect the data at a comfortable pace; after a few minutes, the program ends automatically once collection is finished.
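
Before training, it is worth spot-checking a couple of the saved files (my own addition; the paths assume the defaults above):

import numpy as np
import os

sample = np.load(os.path.join('MP_Data', '666', '0', '0.npy'))
print(sample.shape)                                           # expected: (258,)
print(len(os.listdir(os.path.join('MP_Data', '666', '0'))))   # expected: 30 frames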

3. Building an LSTM network with TensorFlow, training it, and saving the model
With the dataset ready, we can build the network and train it. For an introduction to long short-term memory (LSTM) networks, see the official documentation.


from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()

model.add(LSTM(64, return_sequences=True, activation='relu', input_shape=(30,258)))   # 30 frames per sequence, 258 keypoint features per frame
model.add(LSTM(128, return_sequences=True, activation='relu'))
model.add(LSTM(64, return_sequences=False, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(actions.shape[0], activation='softmax'))
model.compile(optimizer='Adam', loss='categorical_crossentropy', metrics=['categorical_accuracy'])
model.fit(X_train, y_train, epochs=2000, callbacks=[tb_callback])
model.summary()
model.save('action.h5')
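
Note that this excerpt assumes X_train, y_train, and tb_callback from the full script at the end of the post. That script also splits off 5% of the sequences as a test set (X_test, y_test) but never evaluates on it; if you want a quick check, something like this works with the variables from the full script:

# optional: evaluate on the held-out 5% split created by train_test_split
loss, acc = model.evaluate(X_test, y_test, verbose=0)
print('test categorical accuracy: {:.3f}'.format(acc))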

Using TensorFlow here is fairly straightforward; if anything is unclear, see the official TensorFlow documentation. The training result is shown below.

(Figure: training log output)
Even with 2000 epochs, training is fast, even on a CPU. At the end we get the weights file action.h5 in the same directory, and we can then put the trained model to use.
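
As a side note, the detection script below rebuilds the same architecture and then calls load_weights('action.h5'); since model.save stored the full model, you could also restore it in a single call with the standard Keras API:

from tensorflow.keras.models import load_model

model = load_model('action.h5')   # restores the architecture and the trained weights together
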
4. Running real-time detection with the trained model
Here is the result. When a gesture is recognized, the colored bar next to the corresponding label grows longer, representing the predicted probability for that gesture; the detected gesture class is also printed at runtime. This part of the code is largely similar to the code above; see the end of the post. Overall, the gestures are recognized correctly and the response is fast.

(Figure: real-time detection, with a probability bar for each gesture)
Finally, I deployed the model to a Raspberry Pi 4B. It runs a bit slowly, but it works well. Deployment just means making sure the required libraries are installed, then copying the code and the weights file over and running them; there was no real difficulty.

5. Summary
With this author's code it is easy to train your own gesture recognition model, and the accuracy is high. One caveat: when you run detection with the trained model, you must perform the gestures the same way you did while collecting the dataset. The MediaPipe coordinates used in the code change with your distance from the camera, so for the same gesture, recognition accuracy drops sharply if your distance or angle changes; I confirmed this repeatedly in practice. When using your own model, always perform the actions the same way you did when collecting the dataset.
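
One possible way to reduce this sensitivity (my own suggestion, not part of the original code) is to normalize the keypoints before they are saved and before they are fed to the model at inference time, for example by re-centering each hand on its wrist and rescaling. A rough sketch, operating on the 63-value hand vectors produced by extract_keypoints:

def normalize_hand(hand_63):
    # Sketch only: recenter a flattened 21x3 hand on its wrist (landmark 0) and rescale.
    pts = hand_63.reshape(21, 3)
    if not pts.any():                 # hand not detected: keep the all-zero vector
        return hand_63
    pts = pts - pts[0]                # the wrist becomes the origin
    scale = np.abs(pts).max()
    return (pts / scale).flatten() if scale > 0 else pts.flatten()

If you try this, it has to be applied consistently in both the collection script and the detection script, and the model must be retrained on the normalized data.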

6. Full code
If you just want to reproduce my model, you don't need to change any code. To enlarge the dataset, adjust no_sequences and sequence_length; to train other actions or change the number of actions, edit the actions list and the colors list (both must be changed whenever you add or remove actions; see the snippet below); to also train on the face-mesh coordinates for expression recognition, uncomment the corresponding lines. If anything else is unclear, ask me in the comments.
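
For example, prob_viz below indexes colors[num] for each action (and in my code the last two colors happen to be identical), so when you change the number of actions, one way to keep the two lists in sync is to generate colors from a small palette (an optional tweak, not in the original):

palette = [(245, 117, 16), (117, 245, 16), (16, 117, 245)]
colors = [palette[i % len(palette)] for i in range(len(actions))]   # one color per action
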
First, the code for collecting the dataset.

import cv2
import numpy as np
import os
import mediapipe as mp

mp_holistic = mp.solutions.holistic
mp_drawing = mp.solutions.drawing_utils

def mediapipe_detection(image, model):
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image.flags.writeable = False
    results = model.process(image)
    image.flags.writeable = True
    image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
    return image, results

def draw_styled_landmarks(image, results):
    # Uncomment this block to also draw (and later train on) the face landmarks
    # mp_drawing.draw_landmarks(image, results.face_landmarks, mp_holistic.FACEMESH_CONTOURS,
    #                          mp_drawing.DrawingSpec(color=(80,110,10), thickness=1, circle_radius=1),
    #                          mp_drawing.DrawingSpec(color=(80,256,121), thickness=1, circle_radius=1)
    #                          )

    mp_drawing.draw_landmarks(image, results.pose_landmarks, mp_holistic.POSE_CONNECTIONS,
                             mp_drawing.DrawingSpec(color=(80,22,10), thickness=2, circle_radius=4),
                             mp_drawing.DrawingSpec(color=(80,44,121), thickness=2, circle_radius=2)
                             )

    mp_drawing.draw_landmarks(image, results.left_hand_landmarks, mp_holistic.HAND_CONNECTIONS,
                             mp_drawing.DrawingSpec(color=(121,22,76), thickness=2, circle_radius=4),
                             mp_drawing.DrawingSpec(color=(121,44,250), thickness=2, circle_radius=2)
                             )

    mp_drawing.draw_landmarks(image, results.right_hand_landmarks, mp_holistic.HAND_CONNECTIONS,
                             mp_drawing.DrawingSpec(color=(245,117,66), thickness=2, circle_radius=4),
                             mp_drawing.DrawingSpec(color=(245,66,230), thickness=2, circle_radius=2)
                             )

def extract_keypoints(results):
    pose = np.array([[res.x, res.y, res.z, res.visibility] for res in results.pose_landmarks.landmark]).flatten() if results.pose_landmarks else np.zeros(33*4)

    lh = np.array([[res.x, res.y, res.z] for res in results.left_hand_landmarks.landmark]).flatten() if results.left_hand_landmarks else np.zeros(21*3)
    rh = np.array([[res.x, res.y, res.z] for res in results.right_hand_landmarks.landmark]).flatten() if results.right_hand_landmarks else np.zeros(21*3)
    return np.concatenate([pose, lh, rh])

DATA_PATH = os.path.join('MP_Data')

actions = np.array(['666', 'thumbs_up', 'finger_heart','scissor_hand'])

no_sequences = 30

sequence_length = 30
for action in actions:
    for sequence in range(no_sequences):
        try:
            os.makedirs(os.path.join(DATA_PATH, action, str(sequence)))
        except:
            pass

cap = cv2.VideoCapture(0)

with mp_holistic.Holistic(min_detection_confidence=0.5, min_tracking_confidence=0.5) as holistic:

    for action in actions:

        for sequence in range(no_sequences):

            for frame_num in range(sequence_length):

                ret, frame = cap.read()

                image, results = mediapipe_detection(frame, holistic)

                draw_styled_landmarks(image, results)

                if frame_num == 0:
                    cv2.putText(image, 'STARTING COLLECTION', (120, 200),
                                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 4, cv2.LINE_AA)
                    cv2.putText(image, 'Collecting frames for {} Video Number {}'.format(action, sequence), (15, 12),
                                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1, cv2.LINE_AA)

                    cv2.imshow('OpenCV Feed', image)
                    cv2.waitKey(2000)   # pause for 2 seconds before starting each new sequence
                else:
                    cv2.putText(image, 'Collecting frames for {} Video Number {}'.format(action, sequence), (15, 12),
                                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1, cv2.LINE_AA)

                    cv2.imshow('OpenCV Feed', image)

                keypoints = extract_keypoints(results)
                npy_path = os.path.join(DATA_PATH, action, str(sequence), str(frame_num))
                np.save(npy_path, keypoints)

                if cv2.waitKey(10) & 0xFF == ord('q'):
                    break
cap.release()
cv2.destroyAllWindows()

Next, the code that builds and trains the LSTM network with TensorFlow.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.callbacks import TensorBoard
import numpy as np
import os
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical

log_dir = os.path.join('Logs')
tb_callback = TensorBoard(log_dir=log_dir)

no_sequences = 30

sequence_length = 30
DATA_PATH = os.path.join('MP_Data')
actions = np.array(['666', 'thumbs_up', 'finger_heart','scissor_hand'])

label_map = {label:num for num, label in enumerate(actions)}
sequences, labels = [], []
for action in actions:
    for sequence in range(no_sequences):
        window = []
        for frame_num in range(sequence_length):
            res = np.load(os.path.join(DATA_PATH, action, str(sequence), "{}.npy".format(frame_num)))
            window.append(res)
        sequences.append(window)
        labels.append(label_map[action])

X = np.array(sequences)
y = to_categorical(labels).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.05)

model = Sequential()
model.add(LSTM(64, return_sequences=True, activation='relu', input_shape=(30,258)))
model.add(LSTM(128, return_sequences=True, activation='relu'))
model.add(LSTM(64, return_sequences=False, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(actions.shape[0], activation='softmax'))
model.compile(optimizer='Adam', loss='categorical_crossentropy', metrics=['categorical_accuracy'])
model.fit(X_train, y_train, epochs=2000, callbacks=[tb_callback])
model.summary()
model.save('action.h5')

Finally, the code that runs real-time detection with the trained model.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
import cv2
import numpy as np
import mediapipe as mp

mp_holistic = mp.solutions.holistic
mp_drawing = mp.solutions.drawing_utils

sequence = []
sentence = []
threshold = 0.8
actions = np.array(['666', 'thumbs_up', 'finger_heart','scissor_hand'])

def mediapipe_detection(image, model):
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image.flags.writeable = False
    results = model.process(image)
    image.flags.writeable = True
    image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
    return image, results

def draw_styled_landmarks(image, results):
    # Uncomment this block to also draw the face landmarks
    # mp_drawing.draw_landmarks(image, results.face_landmarks, mp_holistic.FACEMESH_CONTOURS,
    #                          mp_drawing.DrawingSpec(color=(80,110,10), thickness=1, circle_radius=1),
    #                          mp_drawing.DrawingSpec(color=(80,256,121), thickness=1, circle_radius=1)
    #                          )

    mp_drawing.draw_landmarks(image, results.pose_landmarks, mp_holistic.POSE_CONNECTIONS,
                             mp_drawing.DrawingSpec(color=(80,22,10), thickness=2, circle_radius=4),
                             mp_drawing.DrawingSpec(color=(80,44,121), thickness=2, circle_radius=2)
                             )

    mp_drawing.draw_landmarks(image, results.left_hand_landmarks, mp_holistic.HAND_CONNECTIONS,
                             mp_drawing.DrawingSpec(color=(121,22,76), thickness=2, circle_radius=4),
                             mp_drawing.DrawingSpec(color=(121,44,250), thickness=2, circle_radius=2)
                             )

    mp_drawing.draw_landmarks(image, results.right_hand_landmarks, mp_holistic.HAND_CONNECTIONS,
                             mp_drawing.DrawingSpec(color=(245,117,66), thickness=2, circle_radius=4),
                             mp_drawing.DrawingSpec(color=(245,66,230), thickness=2, circle_radius=2)
                             )

def extract_keypoints(results):
    pose = np.array([[res.x, res.y, res.z, res.visibility] for res in results.pose_landmarks.landmark]).flatten() if results.pose_landmarks else np.zeros(33*4)

    lh = np.array([[res.x, res.y, res.z] for res in results.left_hand_landmarks.landmark]).flatten() if results.left_hand_landmarks else np.zeros(21*3)
    rh = np.array([[res.x, res.y, res.z] for res in results.right_hand_landmarks.landmark]).flatten() if results.right_hand_landmarks else np.zeros(21*3)
    return np.concatenate([pose, lh, rh])

model = Sequential()
model.add(LSTM(64, return_sequences=True, activation='relu', input_shape=(30,258)))
model.add(LSTM(128, return_sequences=True, activation='relu'))
model.add(LSTM(64, return_sequences=False, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(actions.shape[0], activation='softmax'))
model.load_weights('action.h5')

colors = [(245, 117, 16), (117, 245, 16), (16, 117, 245),(16, 117, 245)]

def prob_viz(res, actions, input_frame, colors):
    output_frame = input_frame.copy()
    for num, prob in enumerate(res):
        cv2.rectangle(output_frame, (0, 60 + num * 40), (int(prob * 100), 90 + num * 40), colors[num], -1)
        cv2.putText(output_frame, actions[num], (0, 85 + num * 40), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2,
                    cv2.LINE_AA)
    return output_frame

cap = cv2.VideoCapture(0)

with mp_holistic.Holistic(min_detection_confidence=0.5, min_tracking_confidence=0.5) as holistic:
    while cap.isOpened():

        ret, frame = cap.read()

        image, results = mediapipe_detection(frame, holistic)
        print(results)

        draw_styled_landmarks(image, results)

        keypoints = extract_keypoints(results)
        sequence.append(keypoints)
        sequence = sequence[-30:]
        if len(sequence) == 30:
            res = model.predict(np.expand_dims(sequence, axis=0))[0]
            print(actions[np.argmax(res)])

            if res[np.argmax(res)] > threshold:
                if len(sentence) > 0:
                    if actions[np.argmax(res)] != sentence[-1]:
                        sentence.append(actions[np.argmax(res)])
                else:
                    sentence.append(actions[np.argmax(res)])
            if len(sentence) > 5:
                sentence = sentence[-5:]

            image = prob_viz(res, actions, image, colors)
        cv2.rectangle(image, (0, 0), (640, 40), (245, 117, 16), -1)
        cv2.putText(image, ' '.join(sentence), (3, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2, cv2.LINE_AA)

        cv2.imshow('OpenCV Feed', image)

        if cv2.waitKey(10) & 0xFF == ord('q'):
            break
cap.release()
cv2.destroyAllWindows()

7. I'm just passing this along; discussion and advice are welcome in the comments.

Original: https://blog.csdn.net/kalakalabala/article/details/124081529
Author: 港来港去
