Bringing photos to life with first-order-model (tool code included) | Machine Learning

June 20, 2023, 7:33 AM • Artificial Intelligence • 101 views

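Preface

I came across an interesting project. I had already seen similar results on platforms such as Baidu PaddlePaddle.

It animates a photo so that it follows the facial expressions in a video. Here are the results shown by the project.

Project link: first-order-model project page

As usual, whatever results the author shows, let's test it ourselves.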

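Resource download and installation

First, let's look at the basic project information in the README. Besides driving a photo with facial expressions, the project can also do pose transfer.

The model files are available from the download links the project provides online.

The files are large and hard to download, so I downloaded them and put them on my cloud drive. You can get them from the link below.

Link: https://pan.baidu.com/s/1ANQjl4SBEjBZuX87KPXmnA
Extraction code: tuan

Create a folder named checkpoint in the project root and put the model files in it.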
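
Before running anything, it can help to confirm that the files are where the code expects them. The following is a minimal sketch, assuming the config and checkpoint paths used by the tool code later in this article (config/vox-256.yaml and checkpoint/vox-cpk.pth.tar). The helper name find_missing_files is hypothetical and not part of the project.

```python
from pathlib import Path

# Paths taken from the tool code in this article; adjust them if yours differ.
REQUIRED_FILES = ["config/vox-256.yaml", "checkpoint/vox-cpk.pth.tar"]


def find_missing_files(project_root, required=REQUIRED_FILES):
    """Return the required relative paths that are absent under project_root."""
    root = Path(project_root)
    return [rel for rel in required if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = find_missing_files(".")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All required files are in place.")
```

Run it from the project root. An empty result means both files were found.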

Install the dependencies listed in requirements.txt.
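
After installing, a quick way to see which versions ended up in the environment is to query the package metadata. This is only a sketch using the standard library. The package names are the ones mentioned in this article, and the helper installed_version is a hypothetical name.

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Package names below are the ones this article talks about.
    for name in ("Pillow", "imageio", "imageio-ffmpeg"):
        print("{}: {}".format(name, installed_version(name) or "not installed"))
```

Knowing the Pillow and imageio versions up front makes the error discussed in the next section easier to diagnose.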

Additional installation notes

If the following error appears when you test the commands from the README:

Traceback (most recent call last):
  File "demo.py", line 17, in <module>

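This problem is mainly caused by my Pillow version being too new. If you don't want to hunt for a matching older version, you can fix it the way I did.

1. Modify functional.py, changing PILLOW_VERSION to __version__.

2. Upgrade imageio.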

pip install --upgrade imageio -i https://pypi.douban.com/simple

3. Install the imageio_ffmpeg module.

pip install imageio-ffmpeg -i https://pypi.douban.com/simple
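
The first fix above is a plain rename inside functional.py. The sketch below shows the substitution on a string rather than on the real file. It assumes the file in question is the functional.py shipped with torchvision, and the sample import line is illustrative, not copied from that file. The helper name patch_pillow_version is hypothetical.

```python
import re


def patch_pillow_version(source):
    """Replace every whole-word PILLOW_VERSION with __version__ in the source text."""
    return re.sub(r"\bPILLOW_VERSION\b", "__version__", source)


# Illustrative import line (an assumption, not copied from the real file).
before = "from PIL import Image, ImageOps, ImageEnhance, PILLOW_VERSION"
after = patch_pillow_version(before)
print(after)  # from PIL import Image, ImageOps, ImageEnhance, __version__
```

The word-boundary pattern keeps the replacement from touching longer identifiers that merely contain PILLOW_VERSION. To apply it to a file, read the text, pass it through the function, and write it back after making a backup.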

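Verifying the tool code

I won't repeat the test of the official usage here. You can try it yourself with the command below.

Here I recommend gradio, a library for building visual interfaces. Below is my adaptation of the code in demo.py.

The new tool file is as follows: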

#!/usr/bin/env python
# coding=utf-8
"""
@project : first-order-model
@author  : 剑客阿良_ALiang
@file   : hy_gradio.py
@ide    : PyCharm
@time   : 2022-06-23 14:35:28
"""
import uuid
from typing import Optional

import gradio as gr
import matplotlib

matplotlib.use('Agg')
import os, sys
import yaml
from argparse import ArgumentParser
from tqdm import tqdm

import imageio
import numpy as np
from skimage.transform import resize
from skimage import img_as_ubyte
import torch
from sync_batchnorm import DataParallelWithCallback

from modules.generator import OcclusionAwareGenerator
from modules.keypoint_detector import KPDetector
from animate import normalize_kp
from scipy.spatial import ConvexHull

if sys.version_info[0] < 3:
    raise Exception("You must use Python 3 or higher. Recommended version is Python 3.7")

def load_checkpoints(config_path, checkpoint_path, cpu=False):
    with open(config_path) as f:
        # safe_load avoids the missing-Loader error that yaml.load(f) raises in recent PyYAML.
        config = yaml.safe_load(f)

    generator = OcclusionAwareGenerator(**config['model_params']['generator_params'],
                                        **config['model_params']['common_params'])
    if not cpu:
        generator.cuda()

    kp_detector = KPDetector(**config['model_params']['kp_detector_params'],
                             **config['model_params']['common_params'])
    if not cpu:
        kp_detector.cuda()

    if cpu:
        checkpoint = torch.load(checkpoint_path, map_location=torch.device('cpu'))
    else:
        checkpoint = torch.load(checkpoint_path)

    generator.load_state_dict(checkpoint['generator'])
    kp_detector.load_state_dict(checkpoint['kp_detector'])

    if not cpu:
        generator = DataParallelWithCallback(generator)
        kp_detector = DataParallelWithCallback(kp_detector)

    generator.eval()
    kp_detector.eval()

    return generator, kp_detector

def make_animation(source_image, driving_video, generator, kp_detector, relative=True, adapt_movement_scale=True,
                   cpu=False):
    with torch.no_grad():
        predictions = []
        source = torch.tensor(source_image[np.newaxis].astype(np.float32)).permute(0, 3, 1, 2)
        if not cpu:
            source = source.cuda()
        driving = torch.tensor(np.array(driving_video)[np.newaxis].astype(np.float32)).permute(0, 4, 1, 2, 3)
        kp_source = kp_detector(source)
        kp_driving_initial = kp_detector(driving[:, :, 0])

        for frame_idx in tqdm(range(driving.shape[2])):
            driving_frame = driving[:, :, frame_idx]
            if not cpu:
                driving_frame = driving_frame.cuda()
            kp_driving = kp_detector(driving_frame)
            kp_norm = normalize_kp(kp_source=kp_source, kp_driving=kp_driving,
                                   kp_driving_initial=kp_driving_initial, use_relative_movement=relative,
                                   use_relative_jacobian=relative, adapt_movement_scale=adapt_movement_scale)
            out = generator(source, kp_source=kp_source, kp_driving=kp_norm)

            predictions.append(np.transpose(out['prediction'].data.cpu().numpy(), [0, 2, 3, 1])[0])
    return predictions

def find_best_frame(source, driving, cpu=False):
    import face_alignment

    def normalize_kp(kp):
        kp = kp - kp.mean(axis=0, keepdims=True)
        area = ConvexHull(kp[:, :2]).volume
        area = np.sqrt(area)
        kp[:, :2] = kp[:, :2] / area
        return kp

    fa = face_alignment.FaceAlignment(face_alignment.LandmarksType._2D, flip_input=True,
                                      device='cpu' if cpu else 'cuda')
    kp_source = fa.get_landmarks(255 * source)[0]
    kp_source = normalize_kp(kp_source)
    norm = float('inf')
    frame_num = 0
    for i, image in tqdm(enumerate(driving)):
        kp_driving = fa.get_landmarks(255 * image)[0]
        kp_driving = normalize_kp(kp_driving)
        new_norm = (np.abs(kp_source - kp_driving) ** 2).sum()
        if new_norm < norm:
            norm = new_norm
            frame_num = i
    return frame_num

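def h_interface(input_image: np.ndarray):
    # gr.Image passes the upload as a numpy array by default, not a file path.
    parser = ArgumentParser()
    # Parse an empty list so stray command-line arguments cannot break the callback.
    opt = parser.parse_args([])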
    opt.config = "./config/vox-256.yaml"
    opt.checkpoint = "./checkpoint/vox-cpk.pth.tar"
    opt.source_image = input_image
    opt.driving_video = "./data/input/ts.mp4"
    opt.result_video = "./data/result/{}.mp4".format(uuid.uuid1().hex)
    opt.relative = True
    opt.adapt_scale = True
    opt.cpu = True
    opt.find_best_frame = False
    # None disables the best-frame branch below; False would still pass the `is not None` check.
    opt.best_frame = None
    # source_image = imageio.imread(opt.source_image)
    source_image = opt.source_image
    reader = imageio.get_reader(opt.driving_video)
    fps = reader.get_meta_data()['fps']
    driving_video = []
    try:
        for im in reader:
            driving_video.append(im)
    except RuntimeError:
        pass
    reader.close()

    source_image = resize(source_image, (256, 256))[..., :3]
    driving_video = [resize(frame, (256, 256))[..., :3] for frame in driving_video]
    generator, kp_detector = load_checkpoints(config_path=opt.config, checkpoint_path=opt.checkpoint, cpu=opt.cpu)

    if opt.find_best_frame or opt.best_frame is not None:
        i = opt.best_frame if opt.best_frame is not None else find_best_frame(source_image, driving_video, cpu=opt.cpu)
        print("Best frame: " + str(i))
        driving_forward = driving_video[i:]
        driving_backward = driving_video[:(i + 1)][::-1]
        predictions_forward = make_animation(source_image, driving_forward, generator, kp_detector,
                                             relative=opt.relative, adapt_movement_scale=opt.adapt_scale, cpu=opt.cpu)
        predictions_backward = make_animation(source_image, driving_backward, generator, kp_detector,
                                              relative=opt.relative, adapt_movement_scale=opt.adapt_scale, cpu=opt.cpu)
        predictions = predictions_backward[::-1] + predictions_forward[1:]
    else:
        predictions = make_animation(source_image, driving_video, generator, kp_detector, relative=opt.relative,
                                     adapt_movement_scale=opt.adapt_scale, cpu=opt.cpu)
    imageio.mimsave(opt.result_video, [img_as_ubyte(frame) for frame in predictions], fps=fps)
    return opt.result_video

if __name__ == "__main__":
    # Note: the `shape` argument is accepted by Gradio 3.x; newer Gradio releases may reject it.
    demo = gr.Interface(h_interface, inputs=[gr.Image(shape=(500, 500))], outputs=[gr.Video()])

    demo.launch()
    # h_interface("C:\\Users\\huyi\\Desktop\\xx3.jpg")

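Code notes

1. The body of the main function in the original demo.py is reworked into the h_interface method. Its input is the image you want to animate.

2. The driving_video parameter uses ts.mp4, a clip of facial expressions that I recorded myself. I suggest recording your own clip with a phone and using it instead.

3. gradio generates the web page for the method. It is shown below.

4. uuid is used to name the result video.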
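
The uuid-based naming in the last note can be sketched on its own. The function below mirrors the "./data/result/{}.mp4" pattern from the tool code. Creating the directory first is my addition (the tool code assumes it already exists), and the name make_result_path is hypothetical.

```python
import os
import uuid


def make_result_path(result_dir="./data/result"):
    """Create result_dir if needed and return a unique .mp4 path inside it."""
    os.makedirs(result_dir, exist_ok=True)
    return os.path.join(result_dir, "{}.mp4".format(uuid.uuid1().hex))
```

Because each call produces a different name, repeated submissions from the web page do not overwrite earlier results.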

The execution result is as follows

Running on local URL: http://127.0.0.1:7860/
To create a public link, set share=True in launch().

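Open the local address: http://localhost:7860/

The interactive interface we built looks like this:

Upload the sample image I prepared and submit it for processing.

The execution log is shown in the figure below.

Here is the result.

Since I can't upload a video, I converted it to a GIF.

It's quite fun. I won't go into parameter tuning here; you can adjust the parameters in the method I provided as needed.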
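
One parameter worth understanding before adjusting it is best_frame. When it is set, h_interface animates forward and backward from that frame and then stitches the two passes together, so the chosen frame becomes the first driving frame of both passes. The sketch below reproduces only that list logic with plain strings. fake_animate is a hypothetical stand-in for make_animation, and stitch_around_best_frame is a name I made up for illustration.

```python
def stitch_around_best_frame(frames, best_frame, animate):
    """Animate forward and backward from best_frame, then restore the original order."""
    forward = frames[best_frame:]              # best frame to the end
    backward = frames[:best_frame + 1][::-1]   # best frame back to the start
    predictions_forward = animate(forward)
    predictions_backward = animate(backward)
    # Reverse the backward pass and drop the duplicated best frame.
    return predictions_backward[::-1] + predictions_forward[1:]


def fake_animate(frames):
    """Hypothetical stand-in for make_animation: tags each frame instead of rendering it."""
    return ["pred_" + f for f in frames]


frames = ["f0", "f1", "f2", "f3", "f4"]
result = stitch_around_best_frame(frames, best_frame=2, animate=fake_animate)
print(result)  # ['pred_f0', 'pred_f1', 'pred_f2', 'pred_f3', 'pred_f4']
```

The output has one prediction per input frame, in the original order, which is why the result video keeps the length of the driving video.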

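Summary

I still highly recommend gradio. Give it a try if you're interested.

People think you can only be one of two things: either you are a shark, or you have to lie there and let the shark eat you alive. That is how the world is. As for me, I am the kind of person who goes out and fights the shark.

From Eleven Kinds of Loneliness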

Original: https://blog.csdn.net/zhiweihongyan1/article/details/125432506
Author: 剑客阿良_ALiang
Title: Bringing photos to life with first-order-model (tool code included) | Machine Learning

