From an industry perspective, what is the future of the data warehouse field?

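According to IDC's 2021 report, the global big data software market was expected to reach RMB 541.42 billion in 2021. During the 13th Five-Year Plan period, China's big data industry got off to a fast start and achieved notable results. The "14th Five-Year Plan for the Development of the Big Data Industry" goes further, stating that by 2025 the scale of China's big data industry is expected to exceed RMB 3 trillion.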

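Over the past 10 years, big data platforms built on the Hadoop technology stack have been deployed at scale. Enterprises and government departments of all sizes have built big data platforms and analytics applications, and applications with next-day or hour-level data latency have become commonplace. Real-time compute engines, represented by Flink, have addressed the timeliness problem in data statistics scenarios.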

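As business grows and technology advances, business teams are no longer satisfied with T+1 analysis and fixed real-time statistics. They expect to see statistical results within seconds or minutes of a business event. Functionally, they also expect to explore and analyze data interactively, with results returned in milliseconds to seconds so that the user experience stays smooth.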

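In the new enterprise data architecture, some enterprises that have already built a big data platform use a cloud-native data warehouse to build a real-time warehouse for latency-sensitive workloads, as a complement to the Hadoop platform. Others, with less than 1 PB of data and no Hadoop-style big data platform, build a lightweight data warehouse directly on a cloud-native data warehouse.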

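Big data applications are gradually spreading from internet companies and government departments into industrial enterprises. One industry after another has centralized its business data and broadly collected and stored user behavior data and IoT data, and the data volume of enterprises and government bodies is growing by more than 30% per year.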

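Moving business to the cloud moves data to the cloud, which in turn drives the cloud-native upgrade of data processing platforms.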

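In recent years, cloud-native data warehouses, represented by Snowflake, have won customer recognition and achieved great market success. Their core capability and key technical point is a cloud-native architecture: it relies on the high availability and resource pooling of IaaS and uses storage-compute separation, multi-tenant isolation, and containerization to deliver scalability, stability, maintainability, and ease of use for the data warehouse, while raising overall resource utilization.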

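Internationally, besides Snowflake, Google's BigQuery, AWS's Redshift, and Azure's Synapse have all completed a cloud-native architecture upgrade, with storage-compute separation and multi-tenant management. Newer vendors and products such as Databricks and Firebolt have sprung up in quick succession.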

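In China, Alibaba Cloud, Huawei Cloud, and Tencent Cloud have all launched their own cloud-native data warehouse products. Independent products such as PingCAP's TiDB and Dingshi Technology's StarRocks have also chosen the cloud-native path.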

OLAP products show the following technical trends:

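The data storage layer is given a unified abstraction, so that HDFS block storage or object storage such as S3 can be used flexibly as the storage medium, ultimately turning storage into a service. This makes it easier to address storage scalability, read/write throughput bottlenecks, and data consistency, and it can substantially reduce storage cost. Once storage is offered as a service, cross-cloud compatibility and multi-cloud deployment of the product also become easier.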
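To make the idea of a unified storage abstraction concrete, here is a minimal sketch in Python. The `StorageBackend` interface, the two backends, and `write_and_read_part` are hypothetical names invented for illustration; the in-memory and local-file classes merely stand in for an object store and a file-system style store and make no real HDFS or S3 calls.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical uniform interface that the query engine codes against."""

    @abstractmethod
    def put(self, key, data):
        """Store the bytes in `data` under `key`."""

    @abstractmethod
    def get(self, key):
        """Return the bytes stored under `key`."""


class InMemoryObjectStore(StorageBackend):
    """Stand-in for an object store (assumption: flat key -> bytes)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]


class LocalFileStore(StorageBackend):
    """Stand-in for a file-system style backend, rooted at a directory."""

    def __init__(self, root):
        self._root = root

    def _path(self, key):
        # Flatten the key so no nested directories are needed in this sketch.
        return os.path.join(self._root, key.replace("/", "_"))

    def put(self, key, data):
        with open(self._path(key), "wb") as f:
            f.write(data)

    def get(self, key):
        with open(self._path(key), "rb") as f:
            return f.read()


def write_and_read_part(store, key, payload):
    """Engine-side code: identical regardless of which backend is plugged in."""
    store.put(key, payload)
    return store.get(key)
```

Because the engine only sees `StorageBackend`, swapping the storage medium (or the cloud it runs on) would be a configuration choice rather than an engine change, which is the portability benefit described above.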

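OLAP workloads fluctuate, especially in multi-tenant scenarios. Pooling compute resources and scheduling them centrally according to real-time load provides resource isolation while still allowing resource sharing and real-time elastic scaling, which raises overall cluster utilization.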
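One way to picture pooled compute with isolation, sharing, and elastic scaling is the toy scheduler below. It is only a sketch under stated assumptions: `ComputePool`, its slot model, and `target_nodes` are hypothetical names, and the policy (each tenant has a guaranteed share and may borrow only capacity that no other tenant is entitled to) is one simple choice among many.

```python
import math


class ComputePool:
    """Toy shared pool: guaranteed slots per tenant plus borrowing of idle slots."""

    def __init__(self, total_slots, guaranteed):
        if sum(guaranteed.values()) > total_slots:
            raise ValueError("guarantees exceed pool size")
        self.total_slots = total_slots
        self.guaranteed = dict(guaranteed)
        self.used = {tenant: 0 for tenant in guaranteed}

    def free_slots(self):
        return self.total_slots - sum(self.used.values())

    def _reserved_for_others(self, tenant):
        # Slots that must stay free so every other tenant can still
        # reach its guaranteed share (this is the isolation part).
        return sum(
            max(self.guaranteed[t] - self.used[t], 0)
            for t in self.guaranteed
            if t != tenant
        )

    def acquire(self, tenant, wanted):
        """Grant up to `wanted` slots; return how many were actually granted."""
        available = self.free_slots() - self._reserved_for_others(tenant)
        granted = max(min(wanted, available), 0)
        self.used[tenant] += granted
        return granted

    def release(self, tenant, count):
        self.used[tenant] -= min(count, self.used[tenant])


def target_nodes(demand_slots, slots_per_node, min_nodes, max_nodes):
    """Elastic scaling rule: enough nodes for current demand, within bounds."""
    wanted = math.ceil(demand_slots / slots_per_node)
    return max(min_nodes, min(wanted, max_nodes))
```

With a 10-slot pool and guarantees of 4 slots each for two tenants, a tenant asking for 8 slots can borrow the 2 unreserved slots on top of its own 4, but never the other tenant's guaranteed share.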

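In enterprise applications, OLAP scenarios can be divided into interactive queries and batch computation. The former requires millisecond- to second-level responses and support for highly concurrent queries; the latter can tolerate minute- to hour-level latency but requires stable compute performance and a good failover mechanism. Adaptive support for mixed workloads across these scenarios is a core capability of an OLAP product.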
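The split between interactive and batch workloads can be sketched as a simple router plus a retry wrapper. Everything here is an illustrative assumption rather than any product's actual behavior: the `Query` type, the 1 GiB `INTERACTIVE_LIMIT` threshold, routing on an estimated scan size, and treating `RuntimeError` as a retryable worker failure.

```python
from dataclasses import dataclass


@dataclass
class Query:
    sql: str
    estimated_scan_bytes: int  # assumed to come from the planner's statistics


# Hypothetical cutoff: queries scanning more than this go to the batch path.
INTERACTIVE_LIMIT = 1 << 30  # 1 GiB


def route(query):
    """Pick an execution path based on the estimated amount of data scanned."""
    if query.estimated_scan_bytes <= INTERACTIVE_LIMIT:
        return "interactive"  # low latency, high concurrency, fail fast
    return "batch"  # tolerant of latency, needs retry/failover


def run_batch_with_retry(task, max_attempts=3):
    """Re-run a batch task after a failure, up to `max_attempts` times.

    `task` is called with the attempt number (starting at 1). A real system
    would reschedule on a different worker; this sketch simply retries.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return task(attempt)
        except RuntimeError as err:
            last_error = err
    raise last_error
```

Interactive queries are deliberately not retried in this sketch, on the assumption that a user waiting for a sub-second answer is better served by a fast error than by a slow retry.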

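Compute, memory, and network bandwidth are the most valuable resources in an OLAP platform, and optimization of system resource utilization usually centers on these three. Many products have begun to explore directions such as serverless compute and distributed caching.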
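As a rough illustration of how caching trades memory for network bandwidth, the following sketch keeps recently used data blocks in a bounded LRU cache in front of a remote fetch. It is a single-process stand-in, not a distributed cache; `BlockCache` and the `fetch_remote` callback are hypothetical names introduced for this example.

```python
from collections import OrderedDict


class BlockCache:
    """Bounded LRU cache placed in front of a (simulated) remote block read."""

    def __init__(self, capacity, fetch_remote):
        self.capacity = capacity
        self.fetch_remote = fetch_remote  # callable: block_id -> data
        self._blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, block_id):
        if block_id in self._blocks:
            # Cache hit: no network transfer, mark block as most recently used.
            self._blocks.move_to_end(block_id)
            self.hits += 1
            return self._blocks[block_id]
        # Cache miss: pay the network cost once, then keep the block locally.
        self.misses += 1
        data = self.fetch_remote(block_id)
        self._blocks[block_id] = data
        if len(self._blocks) > self.capacity:
            # Evict the least recently used block to stay within memory budget.
            self._blocks.popitem(last=False)
        return data
```

The `hits` and `misses` counters give a simple hit ratio, which is the kind of signal one would watch when tuning cache capacity against available memory.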

ByteDance runs many analytics engines internally, including ClickHouse, Druid, Elasticsearch, and Kylin. Why is ClickHouse the answer? The next article will explain.

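An analytical database based on open-source ClickHouse that lets users interactively analyze PB-scale data and, through several in-house table engines, flexibly supports a wide range of data analysis and applications.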

Follow the ByteDance Data Platform WeChat official account and reply [1] to join the official discussion group.

Original: https://www.cnblogs.com/bytedata/p/16378296.html
Author: ByteDance Data Platform
Title: From an industry perspective, what is the future of the data warehouse field?
