浪院长 | Notes on using Spark Streaming

Today I mainly want to share what I have learned from using Spark Streaming.

1, Basic usage

Basic usage mainly means transformation operators, action operators, and stateful operators. In practice you write code by combining them as described in the API manual or by the interfaces in the source code.

In fact, to use Spark Streaming well you need to master the principles of Spark core, Spark RPC, Spark task scheduling, and Spark parallelism.

2, Intermediate state caching

When it comes to intermediate state, stateful operators such as updateStateByKey come to mind first. There are many things to watch out for, such as ordering and maintaining a timeout mechanism for keys. This approach suits small data volumes, especially when the key has few dimensions and the values are small.
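
The key timeout mentioned above can be sketched as a plain update function. This is a minimal illustration, not the author's code: the name `update_count`, the `(count, last_seen)` state layout, and the 600 second timeout are all assumptions. It relies on the behavior that an update function returning `None` drops the key from the state.

```python
import time
from typing import List, Optional, Tuple

# Hypothetical timeout: drop a key that has received no data for this long.
KEY_TIMEOUT_SECONDS = 600.0

# Assumed state layout: (running count, time the key last received data).
State = Tuple[int, float]


def update_count(
    new_values: List[int],
    state: Optional[State],
    now: Optional[float] = None,
) -> Optional[State]:
    """Update function in the (new_values, previous_state) shape.

    Returning None removes the key, which is how the timeout is enforced.
    """
    now = time.time() if now is None else now
    if new_values:
        previous = state[0] if state is not None else 0
        return (previous + sum(new_values), now)
    if state is not None and now - state[1] > KEY_TIMEOUT_SECONDS:
        return None  # key expired: stop carrying it in the state
    return state  # no new data, not yet expired: keep the state unchanged
```

With PySpark this function could be passed as `pairs.updateStateByKey(update_count)` on a DStream of `(key, int)` pairs, with checkpointing enabled. Because every live key is revisited on every batch, the state stays manageable only while the number of keys is small, which matches the limitation described above.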

Of course, what if the data volume grows and you still want to maintain intermediate state? At that point you need third-party storage such as redis or alluxio. Redis is better suited to keys that need a timeout mechanism, and the data volume must not be too large.

Alluxio, by contrast, is a good fit for high-throughput cases such as deduplicated counting.

3, Result output

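Direct streaming can guarantee exactly-once processing, but only if the output store supports idempotent writes, or if you explicitly turn each write into an upsert (update the row if it exists, insert it if it does not).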
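
A minimal sketch of the upsert idea, using Python's built-in `sqlite3` as a stand-in for the real output store. The table layout and the names `open_store` and `write_batch` are assumptions made for illustration. The point is that each row is keyed deterministically and holds an absolute value, so replaying a batch after a failure overwrites the rows instead of double counting.

```python
import sqlite3
from typing import Dict


def open_store() -> sqlite3.Connection:
    """Create an in-memory store keyed by (bucket, name); hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE results ("
        " bucket TEXT NOT NULL,"
        " name TEXT NOT NULL,"
        " total INTEGER NOT NULL,"
        " PRIMARY KEY (bucket, name))"
    )
    return conn


def write_batch(conn: sqlite3.Connection, bucket: str, totals: Dict[str, int]) -> None:
    """Upsert one batch of totals; writing the same batch twice is harmless."""
    with conn:  # commit on success, roll back on error
        conn.executemany(
            # INSERT OR REPLACE: update the row if it exists, insert otherwise.
            "INSERT OR REPLACE INTO results (bucket, name, total) VALUES (?, ?, ?)",
            [(bucket, name, total) for name, total in totals.items()],
        )
```

This only works when the written value is the full result for that key. If each batch writes an increment instead, a replay would add it twice, and a transactional store is needed.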

Of course, it is even better if the external storage system supports transactions, because then exactly-once processing can be achieved directly.
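
One common way to use such a store is to write the results and the consumed offsets in the same transaction. The sketch below uses `sqlite3` as a stand-in for the transactional store; the schema and the names `init_store` and `commit_batch` are hypothetical, and a single topic partition is assumed to start at offset 0.

```python
import sqlite3
from typing import Dict


def init_store(conn: sqlite3.Connection) -> None:
    """Create hypothetical tables for running counts and per-partition offsets."""
    conn.execute("CREATE TABLE counts (word TEXT PRIMARY KEY, n INTEGER NOT NULL)")
    conn.execute(
        "CREATE TABLE offsets ("
        " topic TEXT NOT NULL, part INTEGER NOT NULL,"
        " until_offset INTEGER NOT NULL,"
        " PRIMARY KEY (topic, part))"
    )


def commit_batch(
    conn: sqlite3.Connection,
    topic: str,
    part: int,
    from_offset: int,
    until_offset: int,
    counts: Dict[str, int],
) -> bool:
    """Apply increments and advance the offset atomically.

    Returns False without writing if the batch does not start at the stored
    offset, for example when an already committed batch is replayed.
    """
    with conn:  # one transaction: commit on success, roll back on any error
        row = conn.execute(
            "SELECT until_offset FROM offsets WHERE topic = ? AND part = ?",
            (topic, part),
        ).fetchone()
        stored = row[0] if row else 0
        if stored != from_offset:
            return False
        for word, n in counts.items():
            conn.execute("INSERT OR IGNORE INTO counts (word, n) VALUES (?, 0)", (word,))
            conn.execute("UPDATE counts SET n = n + ? WHERE word = ?", (n, word))
        conn.execute(
            "INSERT OR REPLACE INTO offsets (topic, part, until_offset) VALUES (?, ?, ?)",
            (topic, part, until_offset),
        )
    return True
```

Because the offset moves only when the results are committed, a crash in the middle of a batch leaves both untouched, and on restart the job can resume from the stored offset.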

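In practice, at the level of offset maintenance, the implementation differs greatly depending on which Spark Streaming version is combined with which Kafka version.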

4, Monitoring, alerting, and automatic failure recovery

I think monitoring, alerting, and automatic failure recovery are just as important as the business logic. No matter how well the business logic is implemented, it is unacceptable for the system to go down without you knowing, and you cannot watch the system 24 hours a day. Many companies also have a KPI for failure recovery, such as 3 minutes, which cannot be met by checking and recovering manually, so a monitoring system has to be put in place.
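
As a rough sketch of what such a watchdog might decide, the function below maps two health signals to an action. Everything here is hypothetical (the `BatchStatus` fields, the `decide` function, and the thresholds); the 180 second default only mirrors the 3 minute KPI mentioned above. How the signals are collected and how a restart is triggered depend on the deployment and are not shown.

```python
from dataclasses import dataclass


@dataclass
class BatchStatus:
    """Hypothetical health snapshot reported by the streaming job."""

    seconds_since_last_batch: float  # time since the last batch completed
    scheduling_delay_seconds: float  # how long the latest batch waited to start


def decide(
    status: BatchStatus,
    batch_interval_seconds: float,
    max_silence_seconds: float = 180.0,  # assumed from the 3 minute KPI
) -> str:
    """Return "restart", "alert", or "ok" for the given snapshot."""
    if status.seconds_since_last_batch > max_silence_seconds:
        # No batch has finished for too long: treat the job as dead.
        return "restart"
    if status.scheduling_delay_seconds > batch_interval_seconds:
        # Batches are queuing up: the job is alive but falling behind.
        return "alert"
    return "ok"
```

A scheduler could call `decide` every few seconds and page someone on "alert" or resubmit the job on "restart".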

5, Tuning

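Tuning is very important for Spark Streaming, because a single delayed batch causes jobs to pile up, delays result output, and can even kill the job and lose data. Tuning depends above all on a firm grasp of how Spark works, a clear picture of the data volume, and an understanding of the relationship between resources and data.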
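
The pile-up effect can be made concrete with some back-of-the-envelope arithmetic. The helper names below are hypothetical and the model is deliberately simplified: it assumes one batch runs at a time and every batch takes the same time to process. The first function reflects how a per-partition rate limit such as `spark.streaming.kafka.maxRatePerPartition` (records per second per partition, for the Kafka direct API) bounds the size of a batch.

```python
def max_records_per_batch(
    max_rate_per_partition: int,
    partitions: int,
    batch_interval_seconds: int,
) -> int:
    """Upper bound on records in one batch under a per-partition rate limit."""
    return max_rate_per_partition * partitions * batch_interval_seconds


def scheduling_delay_after(
    batches: int,
    processing_seconds: float,
    batch_interval_seconds: float,
) -> float:
    """Accumulated waiting time after a number of batches (simplified model).

    If processing is slower than the batch interval, each batch adds the
    difference to the queue; otherwise no backlog builds up.
    """
    overrun = processing_seconds - batch_interval_seconds
    return max(0.0, batches * overrun)
```

For example, with a 5 second interval and 6 seconds of processing per batch, 10 batches accumulate 10 seconds of delay, and the delay keeps growing until the rate is capped or more resources are added.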

6, Source code

Reading the source code helps you understand the principles more thoroughly.

The walkthrough will be divided into three parts:

Spark Streaming with kafka-0.8.2 direct stream.

Spark Streaming with kafka-0.8.2 receiver based stream.

Spark Streaming with kafka-0.10.2 direct api.

There is quite a lot of material here. This Sunday from 8:00 p.m. to 10:00 p.m., Langjian will hold a QQ live broadcast. Friends who are interested in these topics can scan the code to join at a low price, which also counts as support for Langjian's writing.

Of course, two or three sessions are planned, two hours each. The actual number depends on how efficiently things go.

For the live broadcast, contact WeChat directly: 158570986

Of course, if you like Langjian, you are also welcome to join Langjian's Knowledge Planet (知识星球), where members can watch the live broadcasts for free.

Original: https://www.cnblogs.com/zhchoutai/p/9893593.html
Author: zhchoutai
Title: 浪院长 | spark streaming的使用心得
