Uncle's Troubleshooting Notes (51): An HBase Region Stuck in RIT State (Timeout)

One HBase region stayed in RIT (region-in-transition) state. Running move/assign/unassign against the region had no effect, and running assigns/unassigns through hbck2 had no effect either.

Checking the current lock state of HBase showed the following:

hbase(main):003:0> list_locks
NAMESPACE(default)
Lock type: SHARED, count: 1

TABLE(apache_atlas_janus)
Lock type: SHARED, count: 1

REGION(05021681c404140ffcee58ea06f6c7d1)
Lock type: EXCLUSIVE, procedure: {"className"=>"org.apache.hadoop.hbase.master.assignment.UnassignProcedure", "procId"=>"2", "submittedTime"=>"1655288642895", "owner"=>"root", "state"=>"WAITING_TIMEOUT", "stackId"=>[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111], "lastUpdate"=>"1655350334877", "timeout"=>600000, "stateMessage"=>[{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}, "attempt"=>112}], "locked"=>true}

Took 0.0737 seconds
=> [{"resourceType"=>"NAMESPACE", "resourceName"=>"default", "lockType"=>"SHARED", "sharedLockCount"=>1}, {"resourceType"=>"TABLE", "resourceName"=>"apache_atlas_janus", "lockType"=>"SHARED", "sharedLockCount"=>1}, {"resourceType"=>"REGION", "resourceName"=>"05021681c404140ffcee58ea06f6c7d1", "lockType"=>"EXCLUSIVE", "exclusiveLockOwnerProcedure"=>{"className"=>"org.apache.hadoop.hbase.master.assignment.UnassignProcedure", "procId"=>"2", "submittedTime"=>"1655288642895", "owner"=>"root", "state"=>"WAITING_TIMEOUT", "stackId"=>[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111], "lastUpdate"=>"1655350334877", "timeout"=>600000, "stateMessage"=>[{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}, "attempt"=>112}], "locked"=>true}, "sharedLockCount"=>0}]

The region has a lock on it, taken by the procedure with procId=2. Listing all procedures:

hbase(main):001:0> list_procedures
 PID Name State Submitted Last_Update Parameters
 2 org.apache.hadoop.hbase.master.assignment.UnassignProcedure WAITING_TIMEOUT 2022-06-15 18:24:02 +0800 2022-06-16 11:32:14 +0800 [{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}, "attempt"=>112}]
 3 org.apache.hadoop.hbase.master.assignment.UnassignProcedure RUNNABLE 2022-06-15 18:25:02 +0800 2022-06-15 18:25:02 +0800 [{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}}]
 4 org.apache.hadoop.hbase.master.assignment.UnassignProcedure RUNNABLE 2022-06-15 18:26:03 +0800 2022-06-15 18:26:03 +0800 [{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}}]
 37 org.apache.hadoop.hbase.master.assignment.UnassignProcedure RUNNABLE 2022-06-16 10:43:25 +0800 2022-06-16 10:43:25 +0800 [{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}}]
 38 org.apache.hadoop.hbase.master.assignment.UnassignProcedure RUNNABLE 2022-06-16 10:44:25 +0800 2022-06-16 10:44:25 +0800 [{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}}]
 39 org.apache.hadoop.hbase.master.assignment.UnassignProcedure RUNNABLE 2022-06-16 10:45:25 +0800 2022-06-16 10:45:25 +0800 [{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}}]
 40 org.apache.hadoop.hbase.master.assignment.AssignProcedure RUNNABLE 2022-06-16 10:45:48 +0800 2022-06-16 10:45:48 +0800 [{"transitionState"=>"REGION_TRANSITION_QUEUE", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}}]
 41 org.apache.hadoop.hbase.master.assignment.UnassignProcedure RUNNABLE 2022-06-16 10:46:25 +0800 2022-06-16 10:46:25 +0800 [{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}}]
 42 org.apache.hadoop.hbase.master.assignment.UnassignProcedure RUNNABLE 2022-06-16 10:54:54 +0800 2022-06-16 10:54:54 +0800 [{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}}]
 43 org.apache.hadoop.hbase.master.assignment.AssignProcedure RUNNABLE 2022-06-16 11:11:45 +0800 2022-06-16 11:11:45 +0800 [{"transitionState"=>"REGION_TRANSITION_QUEUE", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}}]
 44 org.apache.hadoop.hbase.master.assignment.AssignProcedure RUNNABLE 2022-06-16 11:13:14 +0800 2022-06-16 11:13:14 +0800 [{"transitionState"=>"REGION_TRANSITION_QUEUE", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "override"=>true}]
 45 org.apache.hadoop.hbase.master.procedure.DisableTableProcedure RUNNABLE 2022-06-16 11:17:20 +0800 2022-06-16 11:17:20 +0800 [{}, {"userInfo"=>{"effectiveUser"=>"root"}, "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "skipTableStateCheck"=>false}]
 1556 org.apache.hadoop.hbase.master.assignment.AssignProcedure RUNNABLE 2022-06-16 11:30:11 +0800 2022-06-16 11:30:11 +0800 [{"transitionState"=>"REGION_TRANSITION_QUEUE", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}}]
13 row(s)
Took 0.6656 seconds
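
The lock record stores the namespace and table name as Base64 and its timestamps as epoch milliseconds, which makes it awkward to compare with the list_procedures columns by eye. A minimal Python sketch that decodes them, using only values copied from the output above. The helper names and the fixed +0800 offset are assumptions for illustration, not part of HBase.

```python
import base64
from datetime import datetime, timedelta, timezone

# Values copied from the list_locks output for the REGION lock (procId 2).
LOCK = {
    "namespace": "ZGVmYXVsdA==",
    "qualifier": "YXBhY2hlX2F0bGFzX2phbnVz",
    "submittedTime": "1655288642895",
    "lastUpdate": "1655350334877",
}

# Assumption: +0800, the offset printed in the list_procedures columns.
CST = timezone(timedelta(hours=8))
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_name(b64_text: str) -> str:
    """Decode a Base64-encoded namespace or table qualifier."""
    return base64.b64decode(b64_text).decode("utf-8")


def to_local(epoch_ms: str) -> datetime:
    """Convert an epoch-milliseconds string to a +0800 datetime."""
    return (EPOCH + timedelta(milliseconds=int(epoch_ms))).astimezone(CST)


submitted = to_local(LOCK["submittedTime"])
last_update = to_local(LOCK["lastUpdate"])
held = last_update - submitted  # time between submission and the last retry

print(decode_name(LOCK["namespace"]), decode_name(LOCK["qualifier"]))
print(submitted.strftime("%Y-%m-%d %H:%M:%S"), last_update.strftime("%Y-%m-%d %H:%M:%S"))
print(held)
```

The decoded names match the NAMESPACE(default) and TABLE(apache_atlas_janus) lock entries, and the two timestamps match the Submitted and Last_Update columns of PID 2, so the procedure holding the lock had been retrying for roughly 17 hours.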

It turned out that the earlier commands had only spawned more procedures trying to operate on the region. All of them were stuck behind the first procedure, because the first procedure holds the lock.
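
One way to read this off the listing is to group the rows by state and by procedure class. The sketch below is only an illustration: ROWS holds trimmed copies of the list_procedures rows above (timestamps and parameters dropped), and parse_row is a hypothetical helper, not an HBase API.

```python
from collections import Counter

# PID, class name and state, copied from the list_procedures output above.
ROWS = """\
2 org.apache.hadoop.hbase.master.assignment.UnassignProcedure WAITING_TIMEOUT
3 org.apache.hadoop.hbase.master.assignment.UnassignProcedure RUNNABLE
4 org.apache.hadoop.hbase.master.assignment.UnassignProcedure RUNNABLE
37 org.apache.hadoop.hbase.master.assignment.UnassignProcedure RUNNABLE
38 org.apache.hadoop.hbase.master.assignment.UnassignProcedure RUNNABLE
39 org.apache.hadoop.hbase.master.assignment.UnassignProcedure RUNNABLE
40 org.apache.hadoop.hbase.master.assignment.AssignProcedure RUNNABLE
41 org.apache.hadoop.hbase.master.assignment.UnassignProcedure RUNNABLE
42 org.apache.hadoop.hbase.master.assignment.UnassignProcedure RUNNABLE
43 org.apache.hadoop.hbase.master.assignment.AssignProcedure RUNNABLE
44 org.apache.hadoop.hbase.master.assignment.AssignProcedure RUNNABLE
45 org.apache.hadoop.hbase.master.procedure.DisableTableProcedure RUNNABLE
1556 org.apache.hadoop.hbase.master.assignment.AssignProcedure RUNNABLE
"""


def parse_row(line: str) -> tuple[int, str, str]:
    """Split a row into (pid, short class name, state); extra columns are ignored."""
    pid, name, state = line.split(None, 3)[:3]
    return int(pid), name.rsplit(".", 1)[-1], state


procs = [parse_row(line) for line in ROWS.splitlines() if line.strip()]

# The only procedure that is not RUNNABLE is the one list_locks named as lock owner.
waiting = [pid for pid, _, state in procs if state == "WAITING_TIMEOUT"]
runnable = [pid for pid, _, state in procs if state == "RUNNABLE"]
by_class = Counter(name for _, name, _ in procs)

print("waiting on timeout:", waiting)
print("runnable but not progressing:", runnable)
print(dict(by_class))
```

Each manual move/assign/unassign attempt, plus the later table disable, shows up here as one more procedure queued on the same region.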

hbase hbck -j hbase-operator-tools-1.1.0/hbase-hbck2/hbase-hbck2-1.1.0.jar bypass -o -r $PROCEDURE_PID

Bypassing these procedures with hbck2 resolved the problem.
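
When several PIDs have to be bypassed, it can help to assemble the command first and review it before running anything. A minimal dry-run sketch, assuming the jar path from the command above and assuming that hbck2's bypass accepts more than one PID per invocation. build_bypass_command is a hypothetical helper, and the sketch only prints the command; it never executes it.

```python
import shlex

# Jar path copied from the command above; adjust to the local install.
HBCK2_JAR = "hbase-operator-tools-1.1.0/hbase-hbck2/hbase-hbck2-1.1.0.jar"


def build_bypass_command(pids, jar=HBCK2_JAR):
    """Return the argv list for an hbck2 bypass with -o (override) and -r (recursive)."""
    if not pids:
        raise ValueError("at least one procedure PID is required")
    return ["hbase", "hbck", "-j", jar, "bypass", "-o", "-r", *[str(p) for p in pids]]


# Dry run: print the command for the procedure that holds the region lock (PID 2).
# Which PIDs to bypass is the operator's decision; the value here is an example.
print(shlex.join(build_bypass_command([2])))
```

Bypass is a repair tool of last resort: it marks procedures as finished without completing their work, so the region state should be checked afterwards, for example with list_locks and list_procedures again.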

Original: https://www.cnblogs.com/barneywill/p/16381778.html
Author: 匠人先生
Title: Uncle's Troubleshooting Notes (51): An HBase Region Stuck in RIT State (Timeout)
