Notes on porting Yolo-FastestV2 to MNN on a Raspberry Pi 4B

Acknowledgements

Yolo-FastestV2: https://github.com/dog-qiuqiu/Yolo-FastestV2/ . Many thanks to the author for sharing it!

Model preparation

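First, download the code and train a model as described there, or simply use the author's pretrained model. Then export the ONNX model following the author's documentation.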

Building MNN

Download the latest MNN source code.

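Building MNNConvert
Build MNNConvert first. This is the x86_64 (host) build; conveniently, CMake lets you build in separate directories, so the host build and the later cross build do not collide. Following the official documentation, the build does not succeed out of the box. My version is 1.2.1. The build steps are: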

cd MNN/
./schema/generate.sh
mkdir build
cd build

cmake .. -DMNN_BUILD_CONVERTER=true && make -j16

The build fails with:

/tmp/MNN-1.2.1/tools/converter/include/cxxopts.hpp: In function 'void cxxopts::values::detail::check_signed_range(bool, U, const string&)':
/tmp/MNN-1.2.1/tools/converter/include/cxxopts.hpp:343:25: error: 'numeric_limits' is not a member of 'std'
  343 |     SignedCheck<T, std::numeric_limits<T>::is_signed>()(negative, value, text);
      |                         ^~~~~~~~~~~~~~
/tmp/MNN-1.2.1/tools/converter/include/cxxopts.hpp:343:25: error: 'numeric_limits' is not a member of 'std'
/tmp/MNN-1.2.1/tools/converter/include/cxxopts.hpp:343:41: error: template argument 2 is invalid
  343 |     SignedCheck<T, std::numeric_limits<T>::is_signed>()(negative, value, text);
      |                                         ^
/tmp/MNN-1.2.1/tools/converter/include/cxxopts.hpp:343:53: error: qualified-id in declaration before '>' token
  343 |     SignedCheck<T, std::numeric_limits<T>::is_signed>()(negative, value, text);
      |                                                     ^

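The fix is to edit the header named in the error. std::numeric_limits is declared in <limits>, so adding #include <limits> next to the other includes in that file should resolve it:

vim /tmp/MNN-1.2.1/tools/converter/include/cxxopts.hpp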
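As a minimal illustration of why the include matters, the sketch below uses the same expression that fails in cxxopts.hpp. It is standalone code, not taken from cxxopts; isSignedType and fitsInInt are hypothetical helper names. With the #include <limits> line removed, a compiler whose other standard headers no longer pull in <limits> indirectly may reject it with the same "is not a member of 'std'" message.

```cpp
#include <limits>  // declares std::numeric_limits

// Hypothetical helper mirroring the expression from the error message:
// std::numeric_limits<T>::is_signed is a compile-time constant.
template <typename T>
constexpr bool isSignedType() {
    return std::numeric_limits<T>::is_signed;
}

// Hypothetical range check in the spirit of check_signed_range():
// true when the value can be stored in a plain int.
bool fitsInInt(long long value) {
    return value >= std::numeric_limits<int>::min() &&
           value <= std::numeric_limits<int>::max();
}
```

For example, isSignedType<int>() is true and isSignedType<unsigned>() is false.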

If CMake reports:

Could NOT find Protobuf (missing: Protobuf_LIBRARIES Protobuf_INCLUDE_DIR)

installing the Protobuf packages fixes it:

sudo apt-get install protobuf-compiler libprotobuf-dev

Alternatively, download a prebuilt converter; see the end of https://www.yuque.com/mnn/cn/model_convert.

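Model conversion
I did not use --bizCode biz. I don't know what it is for, and a web search turned up nothing, yet it appears in the documentation. I haven't had time to dig through the MNN source. If you know, please tell me, thanks.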


python3 pytorch2onnx.py --data data/coco.data --weights modelzoo/coco2017-0.241078ap-model.pth --output yolo-fastestv2.onnx

python3 -m onnxsim yolo-fastestv2.onnx yolo-fastestv2-opt.onnx

./MNNConvert -f ONNX --modelFile /home/yiifburj/code/Yolo-FastestV2/yolo-fastestv2-opt.onnx --MNNModel yolofastestv2-opt.mnn

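MNN officially states that conversion from TORCH format is supported as of version 1.2.0, but in my tests it still was not. The PyTorch export method given by MNN is: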

import torch

model.eval()

model_trace = torch.jit.trace(model, torch.rand(1, 3, 1200, 1200))
model_trace.save('model_trace.pt')

model_script = torch.jit.script(model)
model_script.save('model_script.pt')

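Both snippets can be placed in pytorch2onnx.py. They are two alternative methods. From what I found online: the second one (torch.jit.script) fails if the model contains unsupported operations. The first one (torch.jit.trace) only works reliably if every warning it emits has been dealt with, and the traced model is tied to the device in use at trace time (CPU or GPU). Unless scripting fails, or you want tracing for its speed, the advice online is to use the second method. Since MNN does not support this route yet, I have not tested it.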

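Download a cross compiler. Binaries produced by a recent cross compiler can misbehave on an older system, so I downloaded a fairly old one: gcc-linaro-7.5.0-2019.12-x86_64_aarch64-linux-gnu.tar.xz, found at https://releases.linaro.org/components/toolchain/binaries/latest-7/aarch64-linux-gnu/.
If the software on the Raspberry Pi is recent, the cross compiler from the Ubuntu repositories should work directly. My host Ubuntu is too new, while the Ubuntu on the Raspberry Pi is old (18.04). In my tests, output built with the repository compiler did not work there and complained about missing symbols, for example: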

/home/huike/nf/usr/local/lib/libMNN.so: undefined reference to pthread_create@GLIBC_2.34'
/home/huike/nf/usr/local/lib/libMNN.so: undefined reference to __libc_single_threaded@GLIBC_2.32'
/home/huike/nf/usr/local/lib/libMNN.so: undefined reference to std::_Sp_make_shared_tag::_S_eq(std::type_info const&)@GLIBCXX_3.4.26'
/home/huike/nf/usr/local/lib/libMNN.so: undefined reference to @GLIBC_2.29'
/home/huike/nf/usr/local/lib/libMNN.so: undefined reference to std::__throw_bad_array_new_length()@GLIBCXX_3.4.29'
/home/huike/nf/usr/local/lib/libMNN.so: undefined reference to @GLIBC_2.29'
collect2: error: ld returned 1 exit status
CMakeFiles/yolodepth.dir/build.make:161: recipe for target 'yolodepth' failed
make[2]: *** [yolodepth] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/yolodepth.dir/all' failed
make[1]: *** [CMakeFiles/yolodepth.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2

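The version tags on the missing symbols give the cause away: libc and related libraries on the target are too old. Anything built against the newer versions will not work there.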
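To see which glibc a machine actually provides, a small check like the following can be compiled and run on the target. This is only a sketch: gnu_get_libc_version() is a glibc-specific function, and parseVersion, versionAtLeast and runtimeGlibcVersion are helper names made up for this example. For reference, Ubuntu 18.04 ships glibc 2.27, which is older than the GLIBC_2.29, GLIBC_2.32 and GLIBC_2.34 tags in the messages above.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>
#ifdef __GLIBC__
#include <gnu/libc-version.h>  // glibc only: gnu_get_libc_version()
#endif

// Split a dotted version string such as "2.27" into {2, 27}.
std::vector<int> parseVersion(const std::string& text) {
    std::vector<int> parts;
    std::stringstream stream(text);
    std::string item;
    while (std::getline(stream, item, '.')) {
        parts.push_back(std::stoi(item));
    }
    return parts;
}

// True when version `have` is greater than or equal to version `need`.
bool versionAtLeast(const std::string& have, const std::string& need) {
    std::vector<int> a = parseVersion(have);
    std::vector<int> b = parseVersion(need);
    const std::size_t n = std::max(a.size(), b.size());
    a.resize(n, 0);  // pad so that "2" compares like "2.0"
    b.resize(n, 0);
    return a >= b;   // std::vector compares lexicographically
}

// Version of the glibc this program is running against,
// or an empty string when the C library is not glibc.
std::string runtimeGlibcVersion() {
#ifdef __GLIBC__
    return gnu_get_libc_version();
#else
    return "";
#endif
}
```

A symbol tagged GLIBC_2.34 can only be resolved when versionAtLeast(runtimeGlibcVersion(), "2.34") holds on the target.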

To use the cross compiler from the repositories:

sudo apt install  gcc-aarch64-linux-gnu g++-aarch64-linux-gnu

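In my tests, neither of the following two options is supported on the Raspberry Pi, so it makes no difference whether they are on or off. (With -DMNN_OPENCL=ON the build succeeded, and the Ubuntu on my Raspberry Pi has the OpenCL packages, but at run time it still reports the backend as unsupported. I don't know why.)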

MNN_OPENCL  MNN_VULKAN

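Enabling MNN_OPENMP has no effect either, because MNN_USE_THREAD_POOL takes precedence; the cmake .. step prints a message about this. The thread pool can be turned off after the cmake step with ccmake .., or with -D on the cmake command line. That said, the two threading approaches probably do not differ much.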

All of that is hindsight. First, try it the way the official documentation describes:


./schema/generate.sh

mkdir aarch64build
cd aarch64build

cmake .. -DCMAKE_SYSTEM_NAME=Linux -DCMAKE_SYSTEM_VERSION=1 -DCMAKE_SYSTEM_PROCESSOR=aarch64 -DCMAKE_C_COMPILER=交叉编译器路径/gcc-linaro-7.5.0-2019.12-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc -DCMAKE_CXX_COMPILER=交叉编译器路径/gcc-linaro-7.5.0-2019.12-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-g++ -DCMAKE_EXPORT_COMPILE_COMMANDS=ON

make -j 16

The build fails with:

/tmp/MNN-1.2.1/source/backend/cpu/compute/WinogradInt8Helper.cpp: In function 'void MNN::TRANS_4x4(MNN::VecType&, MNN::VecType&, MNN::VecType&, MNN::VecType&)':
/tmp/MNN-1.2.1/source/backend/cpu/compute/WinogradInt8Helper.cpp:39:48: note: use -flax-vector-conversions to permit conversions between vectors with differing element types or numbers of subparts
     auto m0 = vtrn1q_s32(vec0.value, vec1.value), m1 = vtrn2q_s32(vec0.value, vec1.value);
                                                ^
/tmp/MNN-1.2.1/source/backend/cpu/compute/WinogradInt8Helper.cpp:39:48: error: cannot convert 'int8x16_t {aka __vector(16) signed char}' to 'int32x4_t {aka __vector(4) int}' for argument '1' to 'int32x4_t vtrn1q_s32(int32x4_t, int32x4_t)'
/tmp/MNN-1.2.1/source/backend/cpu/compute/WinogradInt8Helper.cpp:40:48: error: cannot convert 'int8x16_t {aka __vector(16) signed char}' to 'int32x4_t {aka __vector(4) int}' for argument '1' to 'int32x4_t vtrn1q_s32(int32x4_t, int32x4_t)'
     auto m2 = vtrn1q_s32(vec2.value, vec3.value), m3 = vtrn2q_s32(vec2.value, vec3.value);
                                                ^
/tmp/MNN-1.2.1/source/backend/cpu/compute/WinogradInt8Helper.cpp:42:29: error: 'm1' was not declared in this scope
     vec1.value = vtrn1q_s64(m1, m3);
                             ^~
/tmp/MNN-1.2.1/source/backend/cpu/compute/WinogradInt8Helper.cpp:42:33: error: 'm3' was not declared in this scope
     vec1.value = vtrn1q_s64(m1, m3);
                                 ^~

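The error output already gives the fix: add the -flax-vector-conversions flag. There are two ways to do that, either interactively with ccmake or on the cmake command line: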

ccmake ..

cmake .. -DCMAKE_SYSTEM_NAME=Linux -DCMAKE_SYSTEM_VERSION=1 -DCMAKE_SYSTEM_PROCESSOR=aarch64 -DCMAKE_C_COMPILER=交叉编译器路径/gcc-linaro-7.5.0-2019.12-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc -DCMAKE_CXX_COMPILER=交叉编译器路径/gcc-linaro-7.5.0-2019.12-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-g++ -DCMAKE_CXX_FLAGS='-flax-vector-conversions'
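For background, the error is about passing a vector of sixteen 8-bit lanes where a vector of four 32-bit lanes is expected. The sketch below reproduces the idea with the GCC/Clang vector extensions instead of NEON types so that it also builds on x86; the typedef and function names are invented for this example and are not MNN's. An explicit same-size cast reinterprets the bits, and the flag above makes the compiler accept such conversions implicitly. In NEON code the explicit counterpart would be an intrinsic such as vreinterpretq_s32_s8; whether MNN's source should be changed that way is not something these notes tested.

```cpp
// Illustrative vector types (GCC/Clang extension), standing in for
// NEON's int8x16_t and int32x4_t.
typedef signed char Int8x16 __attribute__((vector_size(16)));
typedef int Int32x4 __attribute__((vector_size(16)));

// Explicit cast between vector types of the same size: the 16 bytes are
// reinterpreted as four 32-bit lanes, no values are converted.
// Passing an Int8x16 directly where an Int32x4 is expected is the
// implicit conversion that -flax-vector-conversions permits.
Int32x4 reinterpretAsInt32(Int8x16 bytes) {
    return (Int32x4)bytes;
}

// Read one 32-bit lane (0..3) of the reinterpreted vector.
int laneAsInt32(Int8x16 bytes, int lane) {
    return reinterpretAsInt32(bytes)[lane];
}
```

On a little-endian machine (both x86_64 and aarch64 Linux are), the bytes {1,0,0,0, 2,0,0,0, ...} read back as the 32-bit lanes 1, 2, and so on.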

After that, the build succeeds.
Collect the files to install (headers and libraries):


mkdir installdir
make DESTDIR=installdir install

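Another option is to use the build script shipped in the source tree. Because my compiler's name differs from the one in the script, some changes are needed. My compiler is named aarch64-linux-gnu-xxx, while the script uses aarch64-linux-gnueabihf-gcc-xxx. I'm not yet sure whether the two actually differ. Toolchains used to be named like aarch64-linux-gnueabihf-gcc-xxx, to distinguish them from the elf-style toolchains for bare-metal programs, and now names like aarch64-linux-gnu-xxx have appeared. If you know the difference, please leave a comment, thanks.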

Note: after modifying the CMake files, delete the temporary build directory and start over. make clean alone is often not enough.


vim project/cross-compile/build.sh

vim project/cross-compile/arm.toolchain.cmake


project/cross-compile/build.sh aarch64-linux-gnu

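You can also tune for the CPU: the Raspberry Pi 4B uses a Cortex-A72, though I saw no obvious improvement. Likewise, BF16 is officially supported starting with 1.2.0. It is enabled with -DMNN_SUPPORT_BF16=ON and must also be enabled in the code. It gave a tiny speedup, but on the test image two fewer boxes were detected. This is an extremely compact model that is not very accurate to begin with, so an optimization that lowers precision further is not necessarily a good fit. Also, enabling -DCMAKE_CXX_FLAGS="-mcpu=cortex-a72" -DMNN_SUPPORT_BF16=ON together was slightly slower than enabling only -DMNN_SUPPORT_BF16=ON. I don't know why.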

Change
-DCMAKE_CXX_FLAGS="-flax-vector-conversions"
to
-DCMAKE_CXX_FLAGS="-flax-vector-conversions -mcpu=cortex-a72"
that is, append -mcpu=cortex-a72.
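To get a feel for how much precision BF16 gives up, the sketch below converts a float to the bfloat16 bit layout and back. It is a standalone illustration with made-up function names, and it truncates for simplicity; how MNN actually converts and rounds values is not covered by these notes.

```cpp
#include <cstdint>
#include <cstring>

// Keep only the top 16 bits of an IEEE-754 float: sign, 8 exponent bits
// and 7 mantissa bits. That is the bfloat16 layout. The low 16 mantissa
// bits are dropped (truncation; a real implementation may round).
std::uint16_t floatToBf16(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));
    return static_cast<std::uint16_t>(bits >> 16);
}

// Expand a bfloat16 bit pattern back to float by zero-filling the
// dropped mantissa bits.
float bf16ToFloat(std::uint16_t half) {
    const std::uint32_t bits = static_cast<std::uint32_t>(half) << 16;
    float value;
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}

// Value that survives a float -> bfloat16 -> float round trip.
float roundTripBf16(float value) {
    return bf16ToFloat(floatToBf16(value));
}
```

With only 7 explicit mantissa bits, a value between 0.25 and 0.5 can be off by almost 0.002 after a single conversion. Errors of that size, accumulated over many layers, are a plausible reason for boxes near the score threshold to disappear.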

Porting the code

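This part took quite a while. One reason was that I was unfamiliar with the model's inputs and outputs. Reading the Yolo-FastestV2 training and test code and viewing the exported ONNX graph in netron clears this up. The main work is post-processing, where you need to know what each part of the output means. Yolo-FastestV2 behaves differently depending on whether it is exporting to ONNX. The difference is mainly whether the sigmoid and softmax computations are included in the model, which affects the post-processing and in practice simplifies it. The other reason was that I was new to MNN and had to explore it step by step. Conversions such as the input channel order got mixed up with other bugs, which led me to wrong conclusions and cost a lot of debugging time.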
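The box decoding used in the post-processing can be sketched on its own as follows. The arithmetic is the same as in predHandle in the listing of this post; the struct and function names are introduced here only for illustration, and the inputs are assumed to have already passed through sigmoid inside the exported model.

```cpp
// Center/size form of a decoded box, in network-input pixels.
struct DecodedBox {
    float cx, cy, w, h;
};

// reg: the four regression values of one anchor at one grid cell
//      (assumed already in [0, 1] because the exported model applies sigmoid).
// gridX, gridY: cell position in the output feature map.
// stride: input size divided by feature-map size.
// anchorW, anchorH: the anchor assigned to this output and anchor index.
DecodedBox decodeBox(const float reg[4], int gridX, int gridY, int stride,
                     float anchorW, float anchorH) {
    DecodedBox box;
    box.cx = ((reg[0] * 2.f - 0.5f) + gridX) * stride;
    box.cy = ((reg[1] * 2.f - 0.5f) + gridY) * stride;
    box.w = (reg[2] * 2.f) * (reg[2] * 2.f) * anchorW;
    box.h = (reg[3] * 2.f) * (reg[3] * 2.f) * anchorH;
    return box;
}

// Final confidence: objectness times the best class probability.
float boxScore(float objectness, float bestClassProb) {
    return objectness * bestClassProb;
}
```

For example, with all four values equal to 0.5, cell (3, 2), stride 16 and the first anchor (12.64, 19.39), the center lands at (56, 40) and the size equals the anchor itself.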

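The code is adapted from the Yolo-FastestV2 sample, mainly by replacing the ncnn-specific code and data structures with their MNN counterparts. For the MNN input and output channel layout, see the comments. Images are read with OpenCV rather than with MNN's own image utilities. The code follows: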

yolo-fastestv2.h:

#ifndef YOLO_FASTEST_V2_H_
#define YOLO_FASTEST_V2_H_

#include <memory>
#include <vector>
#include <opencv2/opencv.hpp>
#include <MNN/Interpreter.hpp>

class TargetBox
{
private:
    float getWidth() { return (x2 - x1); };
    float getHeight() { return (y2 - y1); };

public:
    int x1;
    int y1;
    int x2;
    int y2;

    int cate;
    float score;

    float area() { return getWidth() * getHeight(); };
};

class yoloFastestv2
{
private:
    std::shared_ptr<MNN::Interpreter> net=nullptr;

    MNN::Session* session=nullptr;

    MNN::ScheduleConfig config;

    MNN::BackendConfig backendConfig;
    std::vector<float> anchor;

    const char *inputName;
    const char *outputName1;
    const char *outputName2;
    const char *outputNames[2];

    int numAnchor;
    int numOutput;
    int numThreads;
    int numCategory;
    int inputWidth, inputHeight;

    float nmsThresh;

    int nmsHandle(std::vector<TargetBox> &tmpBoxes, std::vector<TargetBox> &dstBoxes);
    int getCategory(const float *values, int index, int &category, float &score);

    int predHandle(std::unique_ptr<MNN::Tensor>*outs, std::vector<TargetBox> &dstBoxes,
            const float scaleW, const float scaleH, const float thresh);

public:
    yoloFastestv2();
    ~yoloFastestv2();

    int loadModel(const char* binPath);
    int detection(const cv::Mat srcImg, std::vector<TargetBox> &dstBoxes,
                  const float thresh = 0.3);
};

#endif

yolo-fastestv2.cpp:

#include <algorithm>
#include <cassert>
#include "MNN/Tensor.hpp"
#include "yolo-fastestv2.h"
#include <cmath>
#include <cstdio>
#include <cstring>
#include <iostream>

using namespace std;

yoloFastestv2::yoloFastestv2()
{
    printf("Creat yoloFastestv2 Detector...\n");

    numOutput = 2;

    numThreads = 4;

    numAnchor = 3;

    numCategory = 80;

    nmsThresh = 0.25;

    inputWidth = 352;
    inputHeight = 352;

    inputName = "input.1";
    outputNames[0] = "794";
    outputNames[1] = "796";
    outputName1 = "794";
    outputName2 = "796";

    printf("numThreads:%d\n", numThreads);
    printf("inputWidth:%d inputHeight:%d\n", inputWidth, inputHeight);

    std::vector<float> bias {12.64,19.39, 37.88,51.48, 55.71,138.31,
                             126.91,78.23, 131.57,214.55, 279.92,258.87};

    anchor.assign(bias.begin(), bias.end());
}

yoloFastestv2::~yoloFastestv2()
{
    printf("Destroy yoloFastestv2 Detector...\n");
}

int yoloFastestv2::loadModel(const char* path)
{
    printf("Mnn mode init:%s\n", path);

    net = std::shared_ptr<MNN::Interpreter> (MNN::Interpreter::createFromFile(path));
    config.numThread = numThreads;
    config.type = MNN_FORWARD_CPU;

#if 0

    backendConfig.precision = MNN::BackendConfig::Precision_Low;
    config.backendConfig = &backendConfig;
#endif
    session = net->createSession(config);

    return 0;
}

float intersection_area(const TargetBox &a, const TargetBox &b)
{
    if (a.x1 > b.x2 || a.x2 < b.x1 || a.y1 > b.y2 || a.y2 < b.y1)
    {

        return 0.f;
    }

    float inter_width = std::min(a.x2, b.x2) - std::max(a.x1, b.x1);
    float inter_height = std::min(a.y2, b.y2) - std::max(a.y1, b.y1);

    return inter_width * inter_height;
}

bool scoreSort(TargetBox a, TargetBox b)
{
    return (a.score > b.score);
}

int yoloFastestv2::nmsHandle(std::vector<TargetBox> &tmpBoxes,
                             std::vector<TargetBox> &dstBoxes)
{
    std::vector<int> picked;

    sort(tmpBoxes.begin(), tmpBoxes.end(), scoreSort);

    for (int i = 0; i < tmpBoxes.size(); i++) {
        int keep = 1;
        for (int j = 0; j < picked.size(); j++) {

            float inter_area = intersection_area(tmpBoxes[i], tmpBoxes[picked[j]]);

            float union_area = tmpBoxes[i].area() + tmpBoxes[picked[j]].area() - inter_area;
            float IoU = inter_area / union_area;

            if(IoU > nmsThresh && tmpBoxes[i].cate == tmpBoxes[picked[j]].cate) {
                keep = 0;
                break;
            }
        }

        if (keep) {
            picked.push_back(i);
        }
    }

    for (int i = 0; i < picked.size(); i++) {
        dstBoxes.push_back(tmpBoxes[picked[i]]);
    }

    return 0;
}

int yoloFastestv2::getCategory(const float *values, int index, int &category, float &score)
{
    float tmp = 0;
    float objScore  = values[4 * numAnchor + index];

#if 1

    auto start = &values[5*numAnchor];
    auto end = &values[5*numAnchor] + numCategory;
    category = std::max_element(start, end) - start;
    score = start[category] * objScore;

#else
    for (int i = 0; i < numCategory; i++) {
        float clsScore = values[4 * numAnchor + numAnchor + i];
        clsScore *= objScore;

        if(clsScore > tmp) {
            score = clsScore;
            category = i;

            tmp = clsScore;
        }
    }
#endif

    return 0;
}

int yoloFastestv2::predHandle(std::unique_ptr<MNN::Tensor>*outs, std::vector<TargetBox> &dstBoxes,
                              const float scaleW, const float scaleH, const float thresh)
{
    for (int i = 0; i < numOutput; i++) {
        int stride;
        int outW, outH, outC;
        auto &out = outs[i];
        auto shape = out->shape();

        outH = shape[1];
        outW = shape[2];
        outC = shape[3];

        assert(inputHeight / outH == inputWidth / outW);
        stride = inputHeight / outH;

        auto values = out->host<float>();

        for (int h = 0; h < outH; h++) {
            const float* valueh = &values[h*outW * outC];

            for (int w = 0; w < outW; w++) {
                for (int b = 0; b < numAnchor; b++) {

                    TargetBox tmpBox;
                    int category = -1;
                    float score = -1;

                    getCategory(valueh, b, category, score);

                    if (score > thresh) {
                        float bcx, bcy, bw, bh;

                        bcx = ((valueh[b * 4 + 0] * 2. - 0.5) + w) * stride;
                        bcy = ((valueh[b * 4 + 1] * 2. - 0.5) + h) * stride;
                        bw = pow((valueh[b * 4 + 2] * 2.), 2) * anchor[(i * numAnchor * 2) + b * 2 + 0];
                        bh = pow((valueh[b * 4 + 3] * 2.), 2) * anchor[(i * numAnchor * 2) + b * 2 + 1];

                        tmpBox.x1 = (bcx - 0.5 * bw) * scaleW;
                        tmpBox.y1 = (bcy - 0.5 * bh) * scaleH;
                        tmpBox.x2 = (bcx + 0.5 * bw) * scaleW;
                        tmpBox.y2 = (bcy + 0.5 * bh) * scaleH;
                        tmpBox.score = score;
                        tmpBox.cate = category;

                        dstBoxes.push_back(tmpBox);
                    }
                }
                valueh += outC;
            }
        }
    }
    return 0;
}

int yoloFastestv2::detection(const cv::Mat srcImg, std::vector<TargetBox> &dstBoxes, const float thresh)
{
    dstBoxes.clear();

    float scaleW = (float)srcImg.cols / (float)inputWidth;
    float scaleH = (float)srcImg.rows / (float)inputHeight;

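    // Resize straight to the network input size so that the buffer copied
    // below always holds exactly inputHeight x inputWidth pixels.
    cv::Mat small;
    cv::resize(srcImg, small, cv::Size(inputWidth, inputHeight), 0, 0, cv::INTER_LINEAR);
    // Scale pixel values to [0, 1] as 32-bit floats. The channel order is
    // left as OpenCV loaded it.
    small.convertTo(small, CV_32FC3, 1./255);

    auto input = net->getSessionInput(session, NULL);

    // cv::Mat stores pixels interleaved (H x W x C), so the data is wrapped in
    // a host tensor declared as NHWC (MNN::Tensor::TENSORFLOW).
    std::vector<int> dim{1, inputHeight, inputWidth, 3};

    std::unique_ptr<MNN::Tensor> nhwc_Tensor(MNN::Tensor::create<float>(dim, NULL, MNN::Tensor::TENSORFLOW));
    auto nhwc_data = nhwc_Tensor->host<float>();
    auto nhwc_size = nhwc_Tensor->size();
    ::memcpy(nhwc_data, small.data, nhwc_size);
    // copyFromHostTensor converts from the host layout to the layout used by
    // the session's input tensor.
    input->copyFromHostTensor(nhwc_Tensor.get());

    net->runSession(session);

    auto outmap = net->getSessionOutputAll(session);
    std::unique_ptr<MNN::Tensor> out[2];

    // Copy each output into a host tensor declared as MNN::Tensor::CAFFE.
    // predHandle then indexes the data by the tensor's own dimensions,
    // which it reads as [1, H, W, C].
    out[0]  = std::make_unique<MNN::Tensor>(outmap[outputNames[0]], MNN::Tensor::CAFFE);
    out[1]  = std::make_unique<MNN::Tensor>(outmap[outputNames[1]], MNN::Tensor::CAFFE);

    outmap[outputNames[0]]->copyToHostTensor(out[0].get());
    outmap[outputNames[1]]->copyToHostTensor(out[1].get());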

#if 0

    auto shape = outmap[outputNames[0]]->shape();
    auto val0 = out[0]->host<float>();
    auto val1 = outmap[outputNames[0]]->host<float>();
    for(int i = 0; i < 10; i++){
        cout << val0[i] << " " << val1[i] << endl;
    }
#endif
    std::vector<TargetBox> tmpBoxes;

    predHandle(out, tmpBoxes, scaleW, scaleH, thresh);

    nmsHandle(tmpBoxes, dstBoxes);

    return 0;
}
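The layout question that caused the confusion can be shown without MNN at all. The sketch below converts an interleaved H x W x C buffer, which is how a cv::Mat stores its pixels, into a planar C x H x W buffer. The function name is made up for this example. In the port above this kind of reordering is left to copyFromHostTensor; MNN's internal layout may differ from plain NCHW, so treat this only as an illustration of the index arithmetic.

```cpp
#include <cstddef>
#include <vector>

// Reorder an interleaved image buffer (height x width x channels) into a
// planar one (channels x height x width).
std::vector<float> hwcToChw(const std::vector<float>& hwc,
                            int height, int width, int channels) {
    std::vector<float> chw(hwc.size());
    for (int h = 0; h < height; h++) {
        for (int w = 0; w < width; w++) {
            for (int c = 0; c < channels; c++) {
                const std::size_t src = (static_cast<std::size_t>(h) * width + w) * channels + c;
                const std::size_t dst = (static_cast<std::size_t>(c) * height + h) * width + w;
                chw[dst] = hwc[src];
            }
        }
    }
    return chw;
}
```

For a 1 x 2 image with pixels (0, 1, 2) and (10, 11, 12), the planar result is 0, 10, 1, 11, 2, 12: all values of the first channel, then the second, then the third.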

test.cpp:

#include "yolo-fastestv2.h"
#include <cassert>
#include <cstdio>
#include <iostream>

int main(int argc, char **argv)
{
    assert(argc == 3);
    static const char* class_names[] = {
        "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light",
        "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
        "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
        "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
        "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
        "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch",
        "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone",
        "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
        "hair drier", "toothbrush"
    };

    yoloFastestv2 api;

    api.loadModel(argv[1]);

    cv::Mat cvImg = cv::imread(argv[2]);

    std::vector<TargetBox> boxes;

    int64 t = cv::getTickCount();
    for (int i = 0; i < 100; i++) {
        api.detection(cvImg, boxes, 0.3);
    }
    t = cv::getTickCount() - t;

    printf("Time elapsed: %fms\n", t*1000/cv::getTickFrequency());

    for (int i = 0; i < boxes.size(); i++) {
        std::cout<<boxes[i].x1<<" "<<boxes[i].y1<<" "<<boxes[i].x2<<" "<<boxes[i].y2
                 <<" "<<boxes[i].score<<" "<<boxes[i].cate<<std::endl;

        char text[256];
        sprintf(text, "%s %.1f%%", class_names[boxes[i].cate], boxes[i].score * 100);

        int baseLine = 0;
        cv::Size label_size = cv::getTextSize(text, cv::FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);

        int x = boxes[i].x1;
        int y = boxes[i].y1 - label_size.height - baseLine;
        if (y < 0)
            y = 0;
        if (x + label_size.width > cvImg.cols)
            x = cvImg.cols - label_size.width;

        cv::rectangle(cvImg, cv::Rect(cv::Point(x, y), cv::Size(label_size.width, label_size.height + baseLine)),
                      cv::Scalar(255, 255, 255), -1);

        cv::putText(cvImg, text, cv::Point(x, y + label_size.height),
                    cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 0, 0));

        cv::rectangle (cvImg, cv::Point(boxes[i].x1, boxes[i].y1),
                       cv::Point(boxes[i].x2, boxes[i].y2), cv::Scalar(255, 255, 0), 2, 2, 0);
    }

    cv::imwrite("output.png", cvImg);

    return 0;
}

CMakeLists.txt:


cmake_minimum_required(VERSION 2.8.12)

PROJECT(test)
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=gnu++14")

find_package( OpenCV REQUIRED )

include_directories( ${OpenCV_INCLUDE_DIRS})

set(CMAKE_THREAD_PREFER_PTHREAD TRUE)
set(THREADS_PREFER_PTHREAD_FLAG TRUE)
find_package(Threads REQUIRED)

set(USER_PATH "/home/huike/installdir/usr/local")

include_directories( "${USER_PATH}/include")
link_directories("${USER_PATH}/lib")

ADD_EXECUTABLE(test test.cpp yolo-fastestv2.cpp)
TARGET_LINK_LIBRARIES(test ${OpenCV_LIBS} Threads::Threads MNN)

The code runs detection 100 times in a loop. The result:

Creat yoloFastestv2 Detector...

numThreads:4
inputWidth:352 inputHeight:352
Mnn mode init:/home/huike/yolofastestv2-opt.mnn
Time elapsed: 2348.645662ms
170 53 264 172 0.868516 0
116 132 250 247 0.464004 1
0 175 92 339 0.318743 0
Destroy yoloFastestv2 Detector...

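With no GPU acceleration, on an ARM CPU, 100 detections took only 2.348 seconds (about 23.5 ms each), which is good enough for real-time use.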

Performance comparison with NCNN

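Cross-compiling NCNN went smoothly: running the build script in the source tree is enough. By default it builds far too many targets, though, so I edited build.sh and removed what I didn't need.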

mybuild.sh:

mkdir -p build-aarch64-linux-gnu
cd build-aarch64-linux-gnu
cmake -DCMAKE_TOOLCHAIN_FILE=../toolchains/aarch64-linux-gnu.toolchain.cmake ..

make -j4
make install

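Likewise, edit toolchains/aarch64-linux-gnu.toolchain.cmake if necessary to set the cross compiler path. Then copy the contents of build-aarch64-linux-gnu/install together with the Yolo-FastestV2 sample program (in sample/ncnn) to the Raspberry Pi and compile there. The sample's build.sh needs its paths adjusted. I also added -O3, although this code is not where most of the time is spent, so the effect is small.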

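I modified the NCNN-based sample provided with Yolo-FastestV2 to run detection 100 times in a loop, compiled it on the Raspberry Pi, and compared the elapsed time with the MNN result.
demo.cpp: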

#include "yolo-fastestv2.h"

int main()
{
    static const char* class_names[] = {
        "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light",
        "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
        "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
        "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
        "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
        "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch",
        "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone",
        "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
        "hair drier", "toothbrush"
    };

    yoloFastestv2 api;

    api.loadModel("./model/yolo-fastestv2-opt.param",
                  "./model/yolo-fastestv2-opt.bin");

    cv::Mat cvImg = cv::imread("/home/huike/000139.jpg");

    std::vector<TargetBox> boxes;

    int64 t = cv::getTickCount();
    for (int i = 0; i < 100; i++) {
        api.detection(cvImg, boxes, 0.3);
    }
    t = cv::getTickCount() - t;

    printf("Time elapsed: %fms\n", t*1000/cv::getTickFrequency());

    for (int i = 0; i < boxes.size(); i++) {
        std::cout<<boxes[i].x1<<" "<<boxes[i].y1<<" "<<boxes[i].x2<<" "<<boxes[i].y2
                 <<" "<<boxes[i].score<<" "<<boxes[i].cate<<std::endl;

        char text[256];
        sprintf(text, "%s %.1f%%", class_names[boxes[i].cate], boxes[i].score * 100);

        int baseLine = 0;
        cv::Size label_size = cv::getTextSize(text, cv::FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);

        int x = boxes[i].x1;
        int y = boxes[i].y1 - label_size.height - baseLine;
        if (y < 0)
            y = 0;
        if (x + label_size.width > cvImg.cols)
            x = cvImg.cols - label_size.width;

        cv::rectangle(cvImg, cv::Rect(cv::Point(x, y), cv::Size(label_size.width, label_size.height + baseLine)),
                      cv::Scalar(255, 255, 255), -1);

        cv::putText(cvImg, text, cv::Point(x, y + label_size.height),
                    cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 0, 0));

        cv::rectangle (cvImg, cv::Point(boxes[i].x1, boxes[i].y1),
                       cv::Point(boxes[i].x2, boxes[i].y2), cv::Scalar(255, 255, 0), 2, 2, 0);
    }

    cv::imwrite("output.png", cvImg);

    return 0;
}

Testing with the same image:

./demo
Creat yoloFastestv2 Detector...

numThreads:4
inputWidth:352 inputHeight:352
Ncnn mode init:
./model/yolo-fastestv2-opt.param
./model/yolo-fastestv2-opt.bin
Ncnn model init sucess...

Time elapsed: 5238.277544ms
170 53 264 172 0.868419 0
116 132 250 247 0.45254 1
0 175 92 339 0.314176 0
Destroy yoloFastestv2 Detector...

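Surprisingly, it took 5.238 seconds, more than twice as long (about 2.2x). So although building MNN can involve a few setbacks, MNN is still very fast. Of course, with Yolo-FastestV2 either framework is fast. To adapt the old martial-arts saying: any model can be beaten on accuracy, but not on speed. Hats off to the author.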

Miscellaneous

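I used version 1.2.1. With 1.2.0 you will run into additional errors: assembler errors and function-pointer conversion errors that I could not resolve. I recommend 1.2.1 rather than 1.2.0. The errors look like this: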

/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S: Assembler messages:
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:100: Error: operand mismatch -- `mov v17.4s,v16.4s'
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:100: Info:    did you mean this?

/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:100: Info:      mov v17.8b, v16.8b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:100: Info:    other valid variant(s):
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:100: Info:      mov v17.16b, v16.16b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:101: Error: operand mismatch -- `mov v18.4s,v16.4s'
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:101: Info:    did you mean this?

/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:101: Info:      mov v18.8b, v16.8b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:101: Info:    other valid variant(s):
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:101: Info:      mov v18.16b, v16.16b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:102: Error: operand mismatch -- `mov v19.4s,v16.4s'
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:102: Info:    did you mean this?

/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:102: Info:      mov v19.8b, v16.8b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:102: Info:    other valid variant(s):
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:102: Info:      mov v19.16b, v16.16b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:103: Error: operand mismatch -- `mov v20.4s,v16.4s'
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:103: Info:    did you mean this?

/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:103: Info:      mov v20.8b, v16.8b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:103: Info:    other valid variant(s):
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:103: Info:      mov v20.16b, v16.16b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:104: Error: operand mismatch -- `mov v21.4s,v16.4s'
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:104: Info:    did you mean this?

/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:104: Info:      mov v21.8b, v16.8b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:104: Info:    other valid variant(s):
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:104: Info:      mov v21.16b, v16.16b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:105: Error: operand mismatch -- `mov v22.4s,v16.4s'
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:105: Info:    did you mean this?

/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:105: Info:      mov v22.8b, v16.8b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:105: Info:    other valid variant(s):
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:105: Info:      mov v22.16b, v16.16b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:106: Error: operand mismatch -- `mov v23.4s,v16.4s'
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:106: Info:    did you mean this?

/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:106: Info:      mov v23.8b, v16.8b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:106: Info:    other valid variant(s):
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:106: Info:      mov v23.16b, v16.16b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:107: Error: operand mismatch -- `mov v24.4s,v16.4s'
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:107: Info:    did you mean this?

/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:107: Info:      mov v24.8b, v16.8b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:107: Info:    other valid variant(s):
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:107: Info:      mov v24.16b, v16.16b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:108: Error: operand mismatch -- `mov v25.4s,v16.4s'
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:108: Info:    did you mean this?

/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:108: Info:      mov v25.8b, v16.8b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:108: Info:    other valid variant(s):
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:108: Info:      mov v25.16b, v16.16b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:109: Error: operand mismatch -- `mov v26.4s,v16.4s'
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:109: Info:    did you mean this?

/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:109: Info:      mov v26.8b, v16.8b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:109: Info:    other valid variant(s):
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:109: Info:      mov v26.16b, v16.16b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:110: Error: operand mismatch -- `mov v27.4s,v16.4s'
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:110: Info:    did you mean this?

/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:110: Info:      mov v27.8b, v16.8b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:110: Info:    other valid variant(s):
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:110: Info:      mov v27.16b, v16.16b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:111: Error: operand mismatch -- `mov v28.4s,v16.4s'
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:111: Info:    did you mean this?

/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:111: Info:      mov v28.8b, v16.8b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:111: Info:    other valid variant(s):
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:111: Info:      mov v28.16b, v16.16b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:112: Error: operand mismatch -- `mov v29.4s,v16.4s'
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:112: Info:    did you mean this?

/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:112: Info:      mov v29.8b, v16.8b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:112: Info:    other valid variant(s):
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:112: Info:      mov v29.16b, v16.16b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:113: Error: operand mismatch -- `mov v30.4s,v16.4s'
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:113: Info:    did you mean this?

/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:113: Info:      mov v30.8b, v16.8b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:113: Info:    other valid variant(s):
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:113: Info:      mov v30.16b, v16.16b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:114: Error: operand mismatch -- `mov v31.4s,v16.4s'
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:114: Info:    did you mean this?

/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:114: Info:      mov v31.8b, v16.8b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:114: Info:    other valid variant(s):
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:114: Info:      mov v31.16b, v16.16b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:385: Error: operand mismatch -- `mov v17.4s,v16.4s'
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:385: Info:    did you mean this?

/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:385: Info:      mov v17.8b, v16.8b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:385: Info:    other valid variant(s):
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:385: Info:      mov v17.16b, v16.16b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:386: Error: operand mismatch -- `mov v18.4s,v16.4s'
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:386: Info:    did you mean this?

/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:386: Info:      mov v18.8b, v16.8b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:386: Info:    other valid variant(s):
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:386: Info:      mov v18.16b, v16.16b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:387: Error: operand mismatch -- `mov v19.4s,v16.4s'
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:387: Info:    did you mean this?

/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:387: Info:      mov v19.8b, v16.8b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:387: Info:    other valid variant(s):
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:387: Info:      mov v19.16b, v16.16b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:388: Error: operand mismatch -- `mov v20.4s,v16.4s'
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:388: Info:    did you mean this?

/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:388: Info:      mov v20.8b, v16.8b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:388: Info:    other valid variant(s):
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:388: Info:      mov v20.16b, v16.16b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:389: Error: operand mismatch -- `mov v21.4s,v16.4s'
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:389: Info:    did you mean this?

/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:389: Info:      mov v21.8b, v16.8b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:389: Info:    other valid variant(s):
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:389: Info:      mov v21.16b, v16.16b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:390: Error: operand mismatch -- `mov v22.4s,v16.4s'
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:390: Info:    did you mean this?

/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:390: Info:      mov v22.8b, v16.8b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:390: Info:    other valid variant(s):
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:390: Info:      mov v22.16b, v16.16b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:391: Error: operand mismatch -- `mov v23.4s,v16.4s'
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:391: Info:    did you mean this?

/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:391: Info:      mov v23.8b, v16.8b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:391: Info:    other valid variant(s):
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:391: Info:      mov v23.16b, v16.16b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:533: Error: operand mismatch -- `mov v17.4s,v16.4s'
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:533: Info:    did you mean this?

/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:533: Info:      mov v17.8b, v16.8b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:533: Info:    other valid variant(s):
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:533: Info:      mov v17.16b, v16.16b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:534: Error: operand mismatch -- `mov v18.4s,v16.4s'
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:534: Info:    did you mean this?

/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:534: Info:      mov v18.8b, v16.8b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:534: Info:    other valid variant(s):
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:534: Info:      mov v18.16b, v16.16b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:535: Error: operand mismatch -- `mov v19.4s,v16.4s'
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:535: Info:    did you mean this?

/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:535: Info:      mov v19.8b, v16.8b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:535: Info:    other valid variant(s):
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:535: Info:      mov v19.16b, v16.16b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:647: Error: operand mismatch -- `mov v17.4s,v16.4s'
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:647: Info:    did you mean this?

/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:647: Info:      mov v17.8b, v16.8b
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:647: Info:    other valid variant(s):
/tmp/MNN-1.2.0/source/backend/cpu/arm/arm64/MNNPackedSparseMatMulEpx4.S:647: Info:      mov v17.16b, v16.16b

There are also the following errors:
/tmp/MNN-1.2.0/source/backend/cpu/compute/WinogradInt8Helper.cpp: In function 'void MNN::_sourceTransUnit4x4Pack4x4(const int8_t*, int8_t*, size_t, size_t, size_t, size_t)':
/tmp/MNN-1.2.0/source/backend/cpu/compute/WinogradInt8Helper.cpp:67:59: error: cannot convert 'int8_t* {aka signed char*}' to 'int32_t* {aka int*}' for argument '1' to 'void vst1q_lane_s32(int32_t*, int32x4_t, int)'
             vst1q_lane_s32(dstStart + 0 * dstZStep, tmp, 0);
                                                           ^
/tmp/MNN-1.2.0/source/backend/cpu/compute/WinogradInt8Helper.cpp:68:59: error: cannot convert 'int8_t* {aka signed char*}' to 'int32_t* {aka int*}' for argument '1' to 'void vst1q_lane_s32(int32_t*, int32x4_t, int)'
             vst1q_lane_s32(dstStart + 1 * dstZStep, tmp, 1);
                                                           ^
/tmp/MNN-1.2.0/source/backend/cpu/compute/WinogradInt8Helper.cpp:69:59: error: cannot convert 'int8_t* {aka signed char*}' to 'int32_t* {aka int*}' for argument '1' to 'void vst1q_lane_s32(int32_t*, int32x4_t, int)'
             vst1q_lane_s32(dstStart + 2 * dstZStep, tmp, 2);
                                                           ^
/tmp/MNN-1.2.0/source/backend/cpu/compute/WinogradInt8Helper.cpp:70:59: error: cannot convert 'int8_t* {aka signed char*}' to 'int32_t* {aka int*}' for argument '1' to 'void vst1q_lane_s32(int32_t*, int32x4_t, int)'
             vst1q_lane_s32(dstStart + 3 * dstZStep, tmp, 3);
                                                           ^
/tmp/MNN-1.2.0/source/backend/cpu/compute/WinogradInt8Helper.cpp: In function 'void MNN::_sourceTransUnit4x4Pack16x4(const int8_t*, int8_t*, size_t, size_t, size_t, size_t)':
/tmp/MNN-1.2.0/source/backend/cpu/compute/WinogradInt8Helper.cpp:154:59: error: cannot convert 'int8_t* {aka signed char*}' to 'int32_t* {aka int*}' for argument '1' to 'void vst1q_lane_s32(int32_t*, int32x4_t, int)'
             vst1q_lane_s32(dstStart + 0 * dstZStep, tmp, 0);
                                                           ^
/tmp/MNN-1.2.0/source/backend/cpu/compute/WinogradInt8Helper.cpp:155:59: error: cannot convert 'int8_t* {aka signed char*}' to 'int32_t* {aka int*}' for argument '1' to 'void vst1q_lane_s32(int32_t*, int32x4_t, int)'
             vst1q_lane_s32(dstStart + 1 * dstZStep, tmp, 1);
                                                           ^
/tmp/MNN-1.2.0/source/backend/cpu/compute/WinogradInt8Helper.cpp:156:59: error: cannot convert 'int8_t* {aka signed char*}' to 'int32_t* {aka int*}' for argument '1' to 'void vst1q_lane_s32(int32_t*, int32x4_t, int)'
             vst1q_lane_s32(dstStart + 2 * dstZStep, tmp, 2);
                                                           ^
/tmp/MNN-1.2.0/source/backend/cpu/compute/WinogradInt8Helper.cpp:157:59: error: cannot convert 'int8_t* {aka signed char*}' to 'int32_t* {aka int*}' for argument '1' to 'void vst1q_lane_s32(int32_t*, int32x4_t, int)'
             vst1q_lane_s32(dstStart + 3 * dstZStep, tmp, 3);
                                                           ^
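
The WinogradInt8Helper.cpp errors all have the same shape: vst1q_lane_s32 is declared to take an int32_t*, but the call passes an int8_t*, and this compiler refuses the implicit conversion. One way to get past this kind of error is an explicit pointer cast at each call. The sketch below only imitates the store pattern from the log. storeLanes and its parameters are hypothetical names made up for illustration, not MNN code, and I have not verified that this is how MNN itself resolves the problem. On non-ARM machines it falls back to memcpy so that it still compiles.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical helper (not part of MNN): writes the four 32-bit lanes of
// `lanes` to four int8_t destinations that are `dstZStep` bytes apart.
void storeLanes(int8_t* dstStart, const int32_t lanes[4], size_t dstZStep) {
#if defined(__ARM_NEON)
    int32x4_t tmp = vld1q_s32(lanes);
    // The intrinsic expects int32_t*, so the int8_t* is cast explicitly.
    vst1q_lane_s32(reinterpret_cast<int32_t*>(dstStart + 0 * dstZStep), tmp, 0);
    vst1q_lane_s32(reinterpret_cast<int32_t*>(dstStart + 1 * dstZStep), tmp, 1);
    vst1q_lane_s32(reinterpret_cast<int32_t*>(dstStart + 2 * dstZStep), tmp, 2);
    vst1q_lane_s32(reinterpret_cast<int32_t*>(dstStart + 3 * dstZStep), tmp, 3);
#else
    // Portable fallback so the sketch also builds on x86.
    for (int i = 0; i < 4; ++i) {
        std::memcpy(dstStart + i * dstZStep, &lanes[i], sizeof(int32_t));
    }
#endif
}
```

If you go this route, the same cast would have to be applied at every line listed in the log (67-70 and 154-157). Trying a different compiler version may be an alternative, but I have not tested that here.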

Original: https://blog.csdn.net/weixin_39266208/article/details/122131303
Author: weixin_39266208
Title: Yolo-FastestV2在树莓派4B上的MNN移植记录
