Converting the HRSC2016 dataset from XML to YOLO format (with download link)

October 26, 2023, 3:01 AM • Python • 63 views

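Dataset introduction

Background:

The HRSC2016 dataset

Contains 27 types of remote-sensing targets
Extracted from Google Earth
Released by Northwestern Polytechnical University in 2016
Annotated with oriented bounding boxes (OBB)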

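HRSC2016 (Liu et al., 2016) is a ship-detection dataset collected by Northwestern Polytechnical University. It contains 2,976 ship instances in 4 major categories and 19 subcategories. The paper emphasizes that it is a high-resolution dataset, with resolutions between 0.4 m and 2 m. All images come from six well-known harbors and include both ships at sea and ships close to shore. Image sizes range from 300 to 1500 pixels, and most images are larger than 1000×600.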

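Dataset classes

The targets in this dataset are ships in aerial imagery, both at sea and inshore. The authors classify ship models with a tree of height 3: level L1 is the Class, level L2 is the category, and level L3 is the Type, much like biological taxonomy.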

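Annotation information

HRSC2016 uses OBB (oriented bounding box) annotation and provides three kinds of annotation: bounding box, rotated bounding box, and pixel-based segmentation. It also records extra information such as the harbor, data source, and capture time. Part of one annotation file is shown below (tag names appear lowercased in this listing; the conversion script further down looks them up in their mixed-case form, for example Img_SizeWidth and HRSC_Object):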

<hrsc_image>
  <img_custype>sealand</img_custype>
  <img_location>69.040297,33.070036</img_location>
  <img_sizewidth>1138</img_sizewidth>
  <img_sizeheight>833</img_sizeheight>
  <img_sizedepth>3</img_sizedepth>
  <img_resolution>1.07</img_resolution>
  <img_resolution_layer>18</img_resolution_layer>
  <img_scale>100</img_scale>
  <segmented>0</segmented>
  <img_havemask>0</img_havemask>
  <img_rotation>274d</img_rotation>
  <hrsc_objects>
    <hrsc_object>
      <object_id>100000008</object_id>
      <class_id>100000013</class_id>
      <object_no>100000008</object_no>
      <truncated>0</truncated>
      <difficult>0</difficult>
      <box_xmin>628</box_xmin> <!-- horizontal bounding box coordinates -->
      <box_ymin>40</box_ymin>
      <box_xmax>815</box_xmax>
      <box_ymax>783</box_ymax>
      <mbox_cx>719.9324</mbox_cx> <!-- rotated box: center (cx, cy), then width and height -->
      <mbox_cy>413.0048</mbox_cy>
      <mbox_w>741.8246</mbox_w>
      <mbox_h>172.6959</mbox_h>
      <mbox_ang>1.499893</mbox_ang> <!-- rotation angle (radians) -->
      <segmented>0</segmented>
      <seg_color>
      </seg_color>
      <header_x>713</header_x> <!-- position of the ship's head (bow) -->
      <header_y>777</header_y>
    </hrsc_object>
  </hrsc_objects>
</hrsc_image>
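
The rotated-box fields in the sample above (mbox_cx, mbox_cy, mbox_w, mbox_h, mbox_ang) can be turned into four corner points, which is the form many OBB label formats expect. The following is a minimal sketch, not part of the original script. It assumes mbox_ang is in radians and that mbox_w is the extent along the rotated axis; the function name obb_to_corners is hypothetical.

```python
import math

def obb_to_corners(cx, cy, w, h, ang):
    """Return the four corners of a rotated box as (x, y) tuples.

    Assumption: `ang` is in radians and `w` lies along the rotated x axis.
    """
    cos_a, sin_a = math.cos(ang), math.sin(ang)
    corners = []
    # Corner offsets in the box's own frame, walking around the rectangle.
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        x = cx + dx * cos_a - dy * sin_a
        y = cy + dx * sin_a + dy * cos_a
        corners.append((x, y))
    return corners

# Values taken from the sample annotation.
corners = obb_to_corners(719.9324, 413.0048, 741.8246, 172.6959, 1.499893)
xs = [x for x, _ in corners]
ys = [y for _, y in corners]
print(min(xs), max(xs), min(ys), max(ys))
```

For the sample values the corners span roughly x 607 to 832 and y 37 to 789, a little wider than the box_xmin/box_xmax and box_ymin/box_ymax fields but centered on the same object.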

Sample images

Here is the code first:

import os
import xml.etree.ElementTree as ET
from os import getcwd

# (year, image set) pairs to convert; the data is laid out like VOC2007.
sets = [('2007', 'test')]

# All HRSC2016 ship types are merged into one YOLO class.
classes = ["ship"]

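def convert(size, box):
    # size = (image width, image height); box = (xmin, xmax, ymin, ymax) in pixels.
    dw = 1./size[0]
    dh = 1./size[1]
    # Box center and extent in pixels.
    x = (box[0] + box[1])/2.0
    y = (box[2] + box[3])/2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    # Normalize by the image size so every value lies in [0, 1].
    x = x*dw
    w = w*dw
    y = y*dh
    h = h*dh
    return (x,y,w,h)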

def convert_annotation(year, image_id):
    # Convert one image's annotation: read the HRSC2016 XML file, normalize
    # each horizontal box, and write the result to a YOLO-format txt file.
    in_path = './data/VOCdevkit/VOC%s/Annotations/%s.xml' % (year, image_id)
    out_path = './data/VOCdevkit/VOC%s/labels/%s.txt' % (year, image_id)
    root = ET.parse(in_path).getroot()
    w = int(root.find('Img_SizeWidth').text)
    h = int(root.find('Img_SizeHeight').text)

    # The label file is created even when the image contains no ships.
    with open(out_path, 'w') as out_file:
        # iter() yields nothing when there are no HRSC_Object entries, so no
        # truth test on the HRSC_Objects element is needed.
        for obj in root.iter('HRSC_Object'):
            difficult = obj.find('difficult').text
            cls = 'ship'
            if int(difficult) == 1:
                continue  # skip instances flagged as difficult
            cls_id = classes.index(cls)
            # convert() expects (xmin, xmax, ymin, ymax).
            b = (float(obj.find('box_xmin').text), float(obj.find('box_xmax').text),
                 float(obj.find('box_ymin').text), float(obj.find('box_ymax').text))
            bb = convert((w, h), b)
            out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')

wd = getcwd()

for year, image_set in sets:
    os.makedirs('./data/VOCdevkit/VOC%s/labels/' % year, exist_ok=True)
    with open('./data/VOCdevkit/VOC%s/ImageSets/Main/%s.txt' % (year, image_set)) as f:
        image_ids = f.read().strip().split()
    # One absolute image path per line; the images are stored as .bmp files.
    with open('%s_%s.txt' % (year, image_set), 'w') as list_file:
        for image_id in image_ids:
            list_file.write('%s/data/VOCdevkit/VOC%s/JPEGImages/%s.bmp\n' % (wd, year, image_id))
            convert_annotation(year, image_id)

The script above converts the HRSC2016 XML annotations into YOLO-format txt files.
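
As a quick check of the normalization, the horizontal box from the sample annotation can be run through the same arithmetic as convert() and then mapped back to pixels. This is only a sketch: convert is repeated in compact form so the snippet runs on its own, and yolo_to_pixels is a hypothetical helper that does not appear in the original script.

```python
def convert(size, box):
    """size = (width, height); box = (xmin, xmax, ymin, ymax) in pixels."""
    dw, dh = 1.0 / size[0], 1.0 / size[1]
    x = (box[0] + box[1]) / 2.0 * dw
    y = (box[2] + box[3]) / 2.0 * dh
    w = (box[1] - box[0]) * dw
    h = (box[3] - box[2]) * dh
    return (x, y, w, h)

def yolo_to_pixels(size, yolo_box):
    """Inverse of convert(): returns (xmin, xmax, ymin, ymax) in pixels."""
    x, y, w, h = yolo_box
    cx, cy = x * size[0], y * size[1]
    bw, bh = w * size[0], h * size[1]
    return (cx - bw / 2, cx + bw / 2, cy - bh / 2, cy + bh / 2)

size = (1138, 833)           # Img_SizeWidth, Img_SizeHeight from the sample
box = (628, 815, 40, 783)    # box_xmin, box_xmax, box_ymin, box_ymax
yolo_box = convert(size, box)

# One label line: class index followed by normalized cx, cy, w, h.
print("0 " + " ".join(f"{v:.6f}" for v in yolo_box))
print([round(v, 3) for v in yolo_to_pixels(size, yolo_box)])
```

The label line comes out at approximately 0.634 0.494 0.164 0.892 after the class index, and the round trip recovers the original pixel box.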

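Mind the paths: the script expects the annotations, images, and image-set list under ./data/VOCdevkit/VOC2007/.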
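
A small pre-flight check can report which of those locations are missing before the conversion starts. This is a sketch under the assumption that the folder names are exactly the ones hard-coded in the script; missing_paths is a hypothetical helper name.

```python
from pathlib import Path

def missing_paths(root, year="2007", image_set="test"):
    """Return the paths read by the conversion script that do not exist."""
    base = Path(root) / "VOCdevkit" / f"VOC{year}"
    required = [
        base / "Annotations",                              # HRSC2016 .xml files
        base / "JPEGImages",                               # .bmp images
        base / "ImageSets" / "Main" / f"{image_set}.txt",  # one image id per line
    ]
    return [p for p in required if not p.exists()]

for p in missing_paths("./data"):
    print("missing:", p)
```

The labels folder is left out of the check on purpose, because the script creates it when needed.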

The dataset is also described as having 3 major classes and 27 subclasses, with 2,976 objects in total.

Here the whole dataset is collapsed into a single class, ship.
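
If finer-grained labels are wanted instead, one option is to build the class list from the class IDs found in the annotations rather than hard-coding "ship". The sketch below is an assumption-laden illustration, not the method used in this post: the helper class_indices is hypothetical, the tag name Class_ID is assumed to be the mixed-case form of the class_id field in the sample, and the second ID in the inline XML is made up for the demonstration.

```python
import xml.etree.ElementTree as ET

# Tiny inline stand-in for an annotation file (second Class_ID is made up).
SAMPLE = """<HRSC_Image>
  <HRSC_Objects>
    <HRSC_Object><Class_ID>100000013</Class_ID><difficult>0</difficult></HRSC_Object>
    <HRSC_Object><Class_ID>100000005</Class_ID><difficult>0</difficult></HRSC_Object>
    <HRSC_Object><Class_ID>100000013</Class_ID><difficult>0</difficult></HRSC_Object>
  </HRSC_Objects>
</HRSC_Image>"""

def class_indices(root, class_ids):
    """Map each object's Class_ID to a YOLO class index.

    `class_ids` is extended in place when a new ID is encountered.
    """
    indices = []
    for obj in root.iter("HRSC_Object"):
        raw = obj.find("Class_ID").text.strip()
        if raw not in class_ids:
            class_ids.append(raw)
        indices.append(class_ids.index(raw))
    return indices

class_ids = []
print(class_indices(ET.fromstring(SAMPLE), class_ids), class_ids)
# → [0, 1, 0] ['100000013', '100000005']
```

Because the indices depend on the order in which IDs are first seen, the resulting class_ids list would need to be saved and reused for every split.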

Image size: 300 × 300 to 1500 × 900
Number of images: 1,061, split into 436 training, 181 validation, and 444 test images
Number of objects: 2,976

Dataset download link:

https://aistudio.baidu.com/aistudio/datasetdetail/54106

This article draws on
an original article by the CSDN blogger "Marlowee", released under the CC 4.0 BY-SA license.
Source link: https://blog.csdn.net/weixin_43427721/article/details/122057389

Original: https://www.cnblogs.com/liyuanzhouye/p/16209294.html
Author: 李沅洲也
Title: Converting the HRSC2016 dataset from XML to YOLO format (with download link)

