DataHub Notes

When sending data to DataHub, it is recommended to use a Producer. The benefit is that you do not have to specify a shardId, so the business code does not need to change when DataHub shards are added or removed.
In addition, DataHub shardIds only ever increase: a shard that is no longer needed cannot be reused, it can only be deactivated.

<dependency>
    <groupId>com.aliyun.datahub</groupId>
    <artifactId>aliyun-sdk-datahub</artifactId>
    <version>2.18.0-public</version>
</dependency>
<dependency>
    <groupId>com.aliyun.datahub</groupId>
    <artifactId>datahub-client-library</artifactId>
    <version>1.1.12-public</version>
</dependency>
import com.aliyun.datahub.client.exception.AuthorizationFailureException;
import com.aliyun.datahub.client.exception.DatahubClientException;
import com.aliyun.datahub.client.exception.InvalidParameterException;
import com.aliyun.datahub.client.exception.MalformedRecordException;
import com.aliyun.datahub.client.exception.NoPermissionException;
import com.aliyun.datahub.client.exception.ShardNotFoundException;
import com.aliyun.datahub.client.model.Field;
import com.aliyun.datahub.client.model.FieldType;
import com.aliyun.datahub.client.model.RecordEntry;
import com.aliyun.datahub.client.model.RecordSchema;
import com.aliyun.datahub.client.model.TupleRecordData;
import com.aliyun.datahub.clientlibrary.config.ProducerConfig;
import com.aliyun.datahub.clientlibrary.producer.Producer;
import com.aliyun.datahub.exception.ResourceNotFoundException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class DatahubWriter {
    private static final Logger LOG = LoggerFactory.getLogger(DatahubWriter.class);

    private static void sleep(long milliSeconds) {
        try {
            TimeUnit.MILLISECONDS.sleep(milliSeconds);
        } catch (InterruptedException e) {
            // TODO: handle the interruption as appropriate
        }
    }

    private static List<RecordEntry> genRecords(RecordSchema schema) {
        List<RecordEntry> recordEntries = new ArrayList<>();
        for (int cnt = 0; cnt < 10; ++cnt) {
            RecordEntry entry = new RecordEntry();
            entry.addAttribute("key1", "value1");
            entry.addAttribute("key2", "value2");

            TupleRecordData data = new TupleRecordData(schema);
            data.setField("field1", "testValue");
            data.setField("field2", 1);

            entry.setRecordData(data);
            recordEntries.add(entry);
        }
        return recordEntries;
    }

    private static void sendRecords(Producer producer, List<RecordEntry> recordEntries) {
        int maxRetry = 3;
        while (true) {
            try {
                // Write, letting the producer pick a shard automatically
                producer.send(recordEntries, maxRetry);

                // Or write to a specific shard, e.g. shard "0"
                // producer.send(recordEntries, "0", maxRetry);
                LOG.info("send records: {}", recordEntries.size());
                break;
            } catch (MalformedRecordException e) {
                // The record format is invalid; depending on the business scenario, ignore it or rethrow
                LOG.error("write fail", e);
                throw e;
            } catch (InvalidParameterException |
                    AuthorizationFailureException |
                    NoPermissionException e) {
                // Invalid request parameters
                // Incorrect signature
                // No permission
                LOG.error("write fail", e);
                throw e;
            } catch (ShardNotFoundException e) {
                // The shard does not exist; if you are not writing to a shard you specified yourself, this can be ignored
                LOG.error("write fail", e);
                sleep(1000);
            } catch (ResourceNotFoundException e) {
                // The project, topic, or shard does not exist
                LOG.error("write fail", e);
                throw e;
            } catch (DatahubClientException e) {
                // Base exception class, covering network problems and other errors; retrying is an option
                LOG.error("write fail", e);
                sleep(1000);
            }
        }
    }

    public static void main(String[] args) {
        // The endpoint below uses Region China East 1 (Hangzhou) as an example; fill in the endpoint for your actual region
        String endpoint = "http://dh-cn-hangzhou.aliyuncs.com";
        String accessId = "";
        String accessKey = "";
        String projectName = "";
        String topicName = "";

        RecordSchema schema = new RecordSchema();
        schema.addField(new Field("field1", FieldType.STRING));
        schema.addField(new Field("field2", FieldType.BIGINT));

        ProducerConfig config = new ProducerConfig(endpoint, accessId, accessKey);
        Producer producer = new Producer(projectName, topicName, config);

        // Control the loop according to your scenario
        boolean stop = false;
        try {
            while (!stop) {
                List<RecordEntry> recordEntries = genRecords(schema);
                sendRecords(producer, recordEntries);
            }
        } finally {
            // Make sure resources are released properly
            producer.close();
        }
    }
}

The RecordSchema in the example above can also be fetched dynamically through the DatahubClient:
RecordSchema recordSchema = datahubClient.getTopic(projectName, topicName).getRecordSchema();

How to initialize the DatahubClient:

// https://help.aliyun.com/document_detail/158841.html
// The endpoint below uses Region China East 1 (Hangzhou) as an example; fill in the endpoint for your actual region
String endpoint = "http://dh-cn-hangzhou.aliyuncs.com";
String accessId = "";
String accessKey = "";
// Create a DatahubClient instance
DatahubClient datahubClient = DatahubClientBuilder.newBuilder()
        .setDatahubConfig(
                new DatahubConfig(endpoint,
                        // Whether to enable binary transport; supported by server version 2.12 and later
                        new AliyunAccount(accessId, accessKey), true))
                        // If this fails on Apsara Stack (private cloud), try setting the parameter to false
        // HttpConfig is optional; defaults are used when it is not set
        .setHttpConfig(new HttpConfig()
                .setCompressType(HttpConfig.CompressType.LZ4) // LZ4 compression over the wire is recommended when reading and writing data
                .setConnTimeout(10000))
        .build();
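
Putting the two pieces together, here is a minimal sketch (reusing the endpoint/credential placeholders and the genRecords helper from the Producer example above) that fetches the schema from the topic instead of declaring it by hand:

// Minimal sketch: fetch the topic's RecordSchema dynamically and reuse it with the Producer.
// endpoint/accessId/accessKey/projectName/topicName are the same placeholders as above.
DatahubClient datahubClient = DatahubClientBuilder.newBuilder()
        .setDatahubConfig(new DatahubConfig(endpoint, new AliyunAccount(accessId, accessKey), true))
        .build();
RecordSchema schema = datahubClient.getTopic(projectName, topicName).getRecordSchema();

Producer producer = new Producer(projectName, topicName, new ProducerConfig(endpoint, accessId, accessKey));
try {
    producer.send(genRecords(schema), 3);   // genRecords is the helper defined earlier
} finally {
    producer.close();
}

Fetching the schema this way keeps the producer code in step with the topic when fields are added later.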

An error you may encounter:

[{"errorcode":"LimitExceeded","index":1,"message":"The limit of throughput rate is exceeded."},{"errorcode":"LimitExceeded","index":2,"message":"The limit of throughput rate is exceeded."}],"requestId":"202203101111111111111"}

The cause of this error is that one of the related limits has been exceeded.
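
One way to handle it is to back off and retry for this specific error code instead of failing the whole batch. A minimal sketch, assuming DatahubClientException exposes the server error code via getErrorCode() (check your SDK version; otherwise inspect the exception message), and reusing the producer, recordEntries, and sleep helper from the example above:

// Hedged sketch: back off and retry when the throughput limit is hit.
int attempt = 0;
while (true) {
    try {
        producer.send(recordEntries, 3);
        break;
    } catch (DatahubClientException e) {
        if ("LimitExceeded".equals(e.getErrorCode()) && attempt < 5) {
            attempt++;
            sleep(1000L * attempt);   // simple linear backoff before retrying
        } else {
            throw e;
        }
    }
}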

Metric descriptions and where to view them:

The web console provides a Metrics feature. On the Metrics page you can view near-real-time, topic-level traffic information. The metrics currently available are:

QPS: read/write requests per second
RPS: read/write records per second
Throughput: read/write throughput per second (in KB)
Latency: latency per read/write request (in microseconds)
https://help.aliyun.com/document_detail/158786.html

Related limits (exceeding them causes an error):

https://help.aliyun.com/document_detail/47441.html

The following three limits are all based on data size, just measured at different granularities:
Single String length: applies to a single field
HTTP body size: applies to a single write request (see the batching sketch after this list)
Throughput limit: the combined size of all requests at a given point in time. If it is exceeded, the error is:

The limit of throughput rate is exceeded.
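
Because the HTTP body size limit is per request, a common mitigation is to split a large record list into smaller batches before sending. A rough sketch, reusing the producer from the example above; the batch size of 500 is only illustrative, not an official limit:

// Hedged sketch: split a large list so that each producer.send() call
// stays well under the per-request body size limit.
int batchSize = 500;   // illustrative only; tune to your record size
for (int start = 0; start < recordEntries.size(); start += batchSize) {
    int end = Math.min(start + batchSize, recordEntries.size());
    List<RecordEntry> batch = recordEntries.subList(start, end);
    producer.send(batch, 3);
}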

How RPS is counted when records are submitted in batches:

List<RecordEntry> recordEntries
producer.send(recordEntries, maxRetry);

When calling the batch API, is each entry in the List counted as one record?
From the API's point of view there is only one call, but the RPS can still be large: for example, if the List contains 10,000 entries, the RPS is 10,000.

The batch API in the SDK sends the pending records to DataHub as a single request, and DataHub then processes them one by one after receiving them.
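
In other words, one send() call contributes 1 to QPS but one per record to RPS. A small illustration, reusing the genRecords helper and producer from the example above:

// One send() call: the QPS contribution is 1, the RPS contribution equals the list size.
List<RecordEntry> bigBatch = new ArrayList<>();
for (int i = 0; i < 1000; ++i) {
    bigBatch.addAll(genRecords(schema));   // genRecords above returns 10 records per call
}
producer.send(bigBatch, 3);                // 1 request, but 10,000 records counted towards RPS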

Original: https://www.cnblogs.com/softidea/p/15988645.html
Author: 沧海一滴
Title: Datahub小结
