open-mmlab: Building your own dataset format and training with mmclassification

August 27, 2023, 8:40 PM • Python • 57 views

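0. Two commonly used data formats (see the official documentation for details)

1. Example: converting the data into a folder of images plus a TXT file of paths and labels \mmclassification\mmcls\data\mnist\filelist.py

2. Modify the data loading code to load your own format \mmclassification\mmcls\datasets\my_filelist.py

3. Register your own class name \mmclassification\mmcls\datasets\__init__.py

4. Use your own class name and data in the training config file D:\Code\mmclassification\configs\resnet\GWF_resnet18_8xb32_in1k.py

5. Recommended related reading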

0. Two commonly used data formats (see the official documentation for details)

The CustomDataset supports two kinds of format:

An annotation file is provided, and each line indicates a sample image. The sample images can be organized in any structure, like:

train/
├── folder_1
│   ├── xxx.png
│   ├── xxy.png
│   └── ...
├── 123.png
├── nsdf3.png
└── ...

And an annotation file records all paths of samples and corresponding category index. The first column is the image path relative to the folder (in this example, train) and the second column is the index of category:

folder_1/xxx.png 0
folder_1/xxy.png 1
123.png 1
nsdf3.png 2
...

NOTE The value of the category indices should fall in range [0, num_classes - 1].

The sample images are arranged in the special structure:

train/
├── cat
│   ├── xxx.png
│   ├── xxy.png
│   └── ...
│       └── xxz.png
├── bird
│   ├── bird1.png
│   ├── bird2.png
│   └── ...
└── dog
    ├── 123.png
    ├── nsdf3.png
    ├── ...
    └── asd932_.png

In this case, you don't need to provide an annotation file, and all images in the directory cat will be recognized as samples of cat.

Usually, the whole dataset is split into three sub-datasets: train, val and test, used for training, validation and testing. Every sub-dataset should be organized as one of the above structures.
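The folder-based layout can be read with a few lines of standard-library Python. The sketch below is not the CustomDataset implementation; it only illustrates the idea that sub-folder names act as class names. Assigning indices in sorted folder order is an assumption made here for determinism (in a config, the order of `classes` decides the indices), and the function name `scan_class_folders` is hypothetical.

```python
import os


def scan_class_folders(root):
    """Return (classes, samples) for a root laid out as root/<class>/<image>.

    classes: sorted list of sub-folder names.
    samples: list of (path relative to root, class index) tuples.
    """
    classes = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d)))
    samples = []
    for index, name in enumerate(classes):
        class_dir = os.path.join(root, name)
        # Walk the class folder so nested images are picked up as well.
        for dir_path, _, file_names in os.walk(class_dir):
            for file_name in sorted(file_names):
                full = os.path.join(dir_path, file_name)
                rel = os.path.relpath(full, root).replace(os.sep, '/')
                samples.append((rel, index))
    return classes, samples
```

Calling `scan_class_folders('data/my_dataset/train')` on a layout like the one above would return the class names together with one `(relative_path, index)` pair per image.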

And in your config file, you can modify the data field as below:

...

dataset_type = 'CustomDataset'
classes = ['cat', 'bird', 'dog']  # The category names of your dataset

data = dict(
    train=dict(
        type=dataset_type,
        data_prefix='data/my_dataset/train',
        ann_file='data/my_dataset/meta/train.txt',
        classes=classes,
        pipeline=train_pipeline
    ),
    val=dict(
        type=dataset_type,
        data_prefix='data/my_dataset/val',
        ann_file='data/my_dataset/meta/val.txt',
        classes=classes,
        pipeline=test_pipeline
    ),
    test=dict(
        type=dataset_type,
        data_prefix='data/my_dataset/test',
        ann_file='data/my_dataset/meta/test.txt',
        classes=classes,
        pipeline=test_pipeline
    )
)
...
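Before training with a config like this, it can help to check each annotation file against the rule that category indices must lie in [0, num_classes - 1]. The following is a minimal sketch using only the standard library; `read_annotations` is a hypothetical helper, not part of mmclassification.

```python
def read_annotations(ann_path, num_classes):
    """Parse 'relative/path label' lines and validate the label range."""
    samples = []
    with open(ann_path, encoding='utf-8') as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Split on the last whitespace run so paths may contain spaces.
            path, label_text = line.rsplit(maxsplit=1)
            label = int(label_text)
            if not 0 <= label < num_classes:
                raise ValueError(
                    f'{ann_path}:{line_no}: label {label} outside '
                    f'[0, {num_classes - 1}]')
            samples.append((path, label))
    return samples
```

For the three-class example above, `read_annotations('data/my_dataset/meta/train.txt', 3)` would raise a `ValueError` naming the first line whose label is not 0, 1 or 2.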

1. Example: converting the data into a folder of images plus a TXT file of paths and labels \mmclassification\mmcls\data\mnist\filelist.py

Example: convert the data from the following layout:

--\mmclassification\mmcls\data\mnist\train
        --0
            --00001.png
            --00021.png
            --...
        --1
        --...

to:

--\mmclassification\mmcls\data\mnist\train
        --00001.png
        --00021.png
        --...

Also generate the file \mmclassification\mmcls\data\mnist\train.txt, in which each line holds an image name and its label separated by a space.

The value of the category indices should fall in range [0, num_classes - 1].

#\mmclassification\mmcls\data\filelist.py

import os
import shutil
path = r'D:\Code\mmclassification\mmcls\data\1'
train_path = os.path.join(path, 'train')
train_out = os.path.join(path, 'train.txt')
val_path = os.path.join(path, 'test')
val_out = os.path.join(path, 'test.txt')

data_train_out = os.path.join(path, 'train_filelist')
data_val_out = os.path.join(path, 'test_filelist')
if not os.path.exists(data_train_out):
    os.mkdir(data_train_out)
if not os.path.exists(data_val_out):
    os.mkdir(data_val_out)

labelISfilename = False  # True: the sub-folder name itself is the integer label; False: labels are assigned 0, 1, 2, ... in traversal order
def get_filelist(input_path,output_path):
    print('get_filelist input_path,output_path:', input_path, output_path)
    with open(output_path, 'w') as f:
        i = 0
        index__label_name = {}
        # os.walk yields (root, dirs, files) tuples: root is the folder currently
        # being visited, dirs lists the names of its immediate sub-folders, and
        # files lists the names of the files directly inside it.
        for dir_path, dir_names, file_names in os.walk(input_path):
            print('-dir_path, dir_names, file_names-', dir_path, dir_names, file_names)
            if dir_path == input_path:
                label_name = dir_names
            if dir_path != input_path:
                if labelISfilename:
                    label = int(os.path.basename(dir_path))  # works with any path separator
                else:
                    label = i
                #print('label:', label)
                for filename in file_names:
                    f.write(filename +' '+str(label)+"\n")
                index__label_name[str(i)] = label_name[i]
                i+=1
        print('index__label_name:', index__label_name)
        return index__label_name

def move_imgs(input_path,output_path):
    for dir_path, dir_names, file_names in os.walk(input_path):
        print('-dir_path, dir_names, file_names-', dir_path, dir_names, file_names)
        for filename in file_names:
            if dir_path != input_path:
                source_path = os.path.join(dir_path, filename)
                print('source_path:', source_path)
                print('out_path:', os.path.join(output_path, filename))
                shutil.copyfile(source_path, os.path.join(output_path, filename))

print('start get_filelist:')
get_filelist(train_path, train_out)
get_filelist(val_path, val_out)

print('start move_imgs:')
move_imgs(train_path, data_train_out)
move_imgs(val_path, data_val_out)
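One thing to watch with this flattening step: every image is copied under its bare file name, so two images with the same name in different class folders would overwrite each other, and the TXT file would list that name with two different labels. The check below is a sketch for detecting this before copying; `find_name_collisions` is a hypothetical helper that is not part of the original script.

```python
import os
from collections import defaultdict


def find_name_collisions(input_path):
    """Map each file name found in more than one sub-folder to those folders."""
    seen = defaultdict(list)
    for dir_path, _, file_names in os.walk(input_path):
        if dir_path == input_path:
            continue  # same rule as the script: only files inside sub-folders
        for file_name in file_names:
            seen[file_name].append(dir_path)
    return {name: dirs for name, dirs in seen.items() if len(dirs) > 1}
```

If `find_name_collisions(train_path)` returns an empty dict, the copy is safe; otherwise the colliding files would need to be renamed first, for example by prefixing the class folder name.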

2. Modify the data loading code to load your own format \mmclassification\mmcls\datasets\my_filelist.py

#\mmclassification\mmcls\datasets\my_filelist.py

import numpy as np

from .builder import DATASETS
from .base_dataset import BaseDataset

@DATASETS.register_module()
class MyFilelist(BaseDataset):
    CLASSES = ['0','1','2','3','4','5','6','7','8','9']

    def load_annotations(self):
        # ann_file comes from the config file GWF_resnet18_8xb32_in1k.py: the TXT
        # file that lists file names and labels.
        assert isinstance(self.ann_file, str)

        data_infos = []
        with open(self.ann_file) as f:
            samples = [x.strip().split(' ') for x in f.readlines()]
            for filename, gt_label in samples:
                # data_prefix comes from the config file GWF_resnet18_8xb32_in1k.py:
                # the path prefix of the image files.
                info = {'img_prefix': self.data_prefix}
                info['img_info'] = {'filename': filename}
                info['gt_label'] = np.array(gt_label, dtype=np.int64)
                data_infos.append(info)
            return data_infos
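To see what these records look like without installing mmcls, the same layout can be reproduced in plain Python. This sketch stores labels as plain ints instead of NumPy arrays, and it assumes that the image loader later joins `img_prefix` with `filename` to obtain the full path; both helper names are hypothetical.

```python
import os


def build_infos(ann_file, data_prefix):
    """Mirror the record layout of load_annotations, with plain int labels."""
    infos = []
    with open(ann_file, encoding='utf-8') as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            filename, gt_label = line.strip().rsplit(' ', 1)
            infos.append({
                'img_prefix': data_prefix,
                'img_info': {'filename': filename},
                'gt_label': int(gt_label),
            })
    return infos


def missing_files(infos):
    """Return the joined image paths that do not exist on disk."""
    paths = (os.path.join(i['img_prefix'], i['img_info']['filename'])
             for i in infos)
    return [p for p in paths if not os.path.isfile(p)]
```

Running `missing_files(build_infos(ann_file, data_prefix))` with the values from the config is a quick way to confirm that `data_prefix` and the TXT file actually agree.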

3. Register your own class name \mmclassification\mmcls\datasets\__init__.py

#\mmclassification\mmcls\datasets\__init__.py

# Copyright (c) OpenMMLab. All rights reserved.

from .base_dataset import BaseDataset
from .builder import (DATASETS, PIPELINES, SAMPLERS, build_dataloader,
                      build_dataset, build_sampler)
from .cifar import CIFAR10, CIFAR100
from .cub import CUB
from .custom import CustomDataset
from .dataset_wrappers import (ClassBalancedDataset, ConcatDataset,
                               KFoldDataset, RepeatDataset)
from .imagenet import ImageNet
from .imagenet21k import ImageNet21k
from .mnist import MNIST, FashionMNIST
from .multi_label import MultiLabelDataset
from .samplers import DistributedSampler, RepeatAugSampler
from .voc import VOC
from .my_filelist import MyFilelist  # added by gwf

__all__ = [
    'BaseDataset', 'ImageNet', 'CIFAR10', 'CIFAR100', 'MNIST', 'FashionMNIST',
    'VOC', 'MultiLabelDataset', 'build_dataloader', 'build_dataset',
    'DistributedSampler', 'ConcatDataset', 'RepeatDataset',
    'ClassBalancedDataset', 'DATASETS', 'PIPELINES', 'ImageNet21k', 'SAMPLERS',
    'build_sampler', 'RepeatAugSampler', 'KFoldDataset', 'CUB', 'CustomDataset', 'MyFilelist'
]
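The import matters because importing my_filelist runs the `@DATASETS.register_module()` decorator from step 2, which is what lets a config refer to the class by the string 'MyFilelist'. The toy registry below illustrates that idea only; it is a simplified stand-in and not mmcv's implementation, and `ToyRegistry`, `TOY_DATASETS` and `build` are hypothetical names.

```python
class ToyRegistry:
    """A minimal name-to-class registry (illustration only)."""

    def __init__(self):
        self._modules = {}

    def register_module(self):
        def decorator(cls):
            # Store the class under its own name when the module is imported.
            self._modules[cls.__name__] = cls
            return cls
        return decorator

    def build(self, cfg):
        """Instantiate the class named by cfg['type'] with the remaining keys."""
        args = dict(cfg)
        cls = self._modules[args.pop('type')]
        return cls(**args)


TOY_DATASETS = ToyRegistry()


@TOY_DATASETS.register_module()
class MyFilelist:
    def __init__(self, data_prefix, ann_file):
        self.data_prefix = data_prefix
        self.ann_file = ann_file


dataset = TOY_DATASETS.build(
    dict(type='MyFilelist', data_prefix='train', ann_file='train.txt'))
```

If the decorated class were never imported, the lookup by name would fail, which is the situation the added import line avoids.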

4. Use your own class name and data in the training config file D:\Code\mmclassification\configs\resnet\GWF_resnet18_8xb32_in1k.py

        type='MyFilelist',  # the newly written class
        data_prefix='../mmcls/data/mnist/train',  # folder that directly contains all the images
        ann_file='../mmcls/data/mnist/train.txt',  # one "image_name label" pair per line
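For context, these three lines would sit inside the train entry of the data dict, in the same position as in the CustomDataset example from section 0. In this sketch only the three values above come from the original; the empty `train_pipeline` placeholder is an assumption added so that the fragment runs on its own.

```python
# Placeholder; a real config would define the actual training pipeline.
train_pipeline = []

data = dict(
    train=dict(
        type='MyFilelist',  # the newly written dataset class
        data_prefix='../mmcls/data/mnist/train',  # flat folder of images
        ann_file='../mmcls/data/mnist/train.txt',  # "image_name label" lines
        pipeline=train_pipeline,
    ),
)
```

The val and test entries would be changed in the same way, pointing at their own image folders and TXT files.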

5. Recommended related reading

A record of custom-data training and inference with mmclassification (记录一次 mmclassification 自定义数据训练和推理)

Original: https://blog.csdn.net/weixin_40712293/article/details/126197608
Author: 黛玛日孜
Title: open-mmlab: Building your own dataset format and training with mmclassification
