Uncle's Troubleshooting Notes (51): An HBase Region Stuck in RIT State (Timeout)

One HBase region stayed in RIT (region in transition) indefinitely. Running move/assign/unassign on the region had no effect, and neither did HBCK2 assigns/unassigns.

Checking HBase's current lock status with list_locks shows the following:

hbase(main):003:0> list_locks
NAMESPACE(default)
Lock type: SHARED, count: 1

TABLE(apache_atlas_janus)
Lock type: SHARED, count: 1

REGION(05021681c404140ffcee58ea06f6c7d1)
Lock type: EXCLUSIVE, procedure: {"className"=>"org.apache.hadoop.hbase.master.assignment.UnassignProcedure", "procId"=>"2", "submittedTime"=>"1655288642895", "owner"=>"root", "state"=>"WAITING_TIMEOUT", "stackId"=>[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111], "lastUpdate"=>"1655350334877", "timeout"=>600000, "stateMessage"=>[{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}, "attempt"=>112}], "locked"=>true}

Took 0.0737 seconds
=> [{"resourceType"=>"NAMESPACE", "resourceName"=>"default", "lockType"=>"SHARED", "sharedLockCount"=>1}, {"resourceType"=>"TABLE", "resourceName"=>"apache_atlas_janus", "lockType"=>"SHARED", "sharedLockCount"=>1}, {"resourceType"=>"REGION", "resourceName"=>"05021681c404140ffcee58ea06f6c7d1", "lockType"=>"EXCLUSIVE", "exclusiveLockOwnerProcedure"=>{"className"=>"org.apache.hadoop.hbase.master.assignment.UnassignProcedure", "procId"=>"2", "submittedTime"=>"1655288642895", "owner"=>"root", "state"=>"WAITING_TIMEOUT", "stackId"=>[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111], "lastUpdate"=>"1655350334877", "timeout"=>600000, "stateMessage"=>[{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}, "attempt"=>112}], "locked"=>true}, "sharedLockCount"=>0}]
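In the procedure state above, the tableName, startKey and endKey fields are base64-encoded bytes, and submittedTime/lastUpdate are epoch milliseconds. The Python sketch below decodes the values copied from this output. The helper names are illustrative, and the UTC+8 offset is an assumption taken from the +0800 timestamps that list_procedures prints for this cluster.

```python
import base64
from datetime import datetime, timedelta, timezone

# Values copied from the list_locks output for procId 2.
PROC = {
    "submittedTime": 1655288642895,
    "lastUpdate": 1655350334877,
    "timeout": 600000,
    "namespace": "ZGVmYXVsdA==",
    "qualifier": "YXBhY2hlX2F0bGFzX2phbnVz",
    "startKey": "szMzLw==",
    "endKey": "zMzMyA==",
}

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
CST = timezone(timedelta(hours=8))  # assumed cluster zone, matches "+0800"


def b64_text(value):
    """Decode a base64 field that holds UTF-8 text (namespace, qualifier)."""
    return base64.b64decode(value).decode("utf-8")


def b64_bytes(value):
    """Decode a base64 row key; keys are raw bytes, not necessarily text."""
    return base64.b64decode(value)


def ms_to_local(ms):
    """Convert epoch milliseconds to an aware datetime in UTC+8 (exact, no floats)."""
    return (EPOCH + timedelta(milliseconds=ms)).astimezone(CST)


table = b64_text(PROC["namespace"]) + ":" + b64_text(PROC["qualifier"])
submitted = ms_to_local(PROC["submittedTime"])
last_update = ms_to_local(PROC["lastUpdate"])

print(table)                                      # default:apache_atlas_janus
print(submitted.strftime("%Y-%m-%d %H:%M:%S"))    # 2022-06-15 18:24:02
print(last_update.strftime("%Y-%m-%d %H:%M:%S"))  # 2022-06-16 11:32:14
print(b64_bytes(PROC["startKey"]), b64_bytes(PROC["endKey"]))
print("alive for", last_update - submitted, "| timeout", PROC["timeout"] // 60000, "min")
```

So roughly 17 hours passed between the procedure's submission and its last update, against a 600000 ms (10-minute) timeout.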

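The region holds an EXCLUSIVE lock taken by the procedure with procId=2, an UnassignProcedure stuck in WAITING_TIMEOUT (attempt 112). Next, list all procedures: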

hbase(main):001:0> list_procedures
 PID Name State Submitted Last_Update Parameters
 2 org.apache.hadoop.hbase.master.assignment.UnassignProcedure WAITING_TIMEOUT 2022-06-15 18:24:02 +0800 2022-06-16 11:32:14 +0800 [{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}, "attempt"=>112}]
 3 org.apache.hadoop.hbase.master.assignment.UnassignProcedure RUNNABLE 2022-06-15 18:25:02 +0800 2022-06-15 18:25:02 +0800 [{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}}]
 4 org.apache.hadoop.hbase.master.assignment.UnassignProcedure RUNNABLE 2022-06-15 18:26:03 +0800 2022-06-15 18:26:03 +0800 [{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}}]
 37 org.apache.hadoop.hbase.master.assignment.UnassignProcedure RUNNABLE 2022-06-16 10:43:25 +0800 2022-06-16 10:43:25 +0800 [{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}}]
 38 org.apache.hadoop.hbase.master.assignment.UnassignProcedure RUNNABLE 2022-06-16 10:44:25 +0800 2022-06-16 10:44:25 +0800 [{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}}]
 39 org.apache.hadoop.hbase.master.assignment.UnassignProcedure RUNNABLE 2022-06-16 10:45:25 +0800 2022-06-16 10:45:25 +0800 [{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}}]
 40 org.apache.hadoop.hbase.master.assignment.AssignProcedure RUNNABLE 2022-06-16 10:45:48 +0800 2022-06-16 10:45:48 +0800 [{"transitionState"=>"REGION_TRANSITION_QUEUE", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}}]
 41 org.apache.hadoop.hbase.master.assignment.UnassignProcedure RUNNABLE 2022-06-16 10:46:25 +0800 2022-06-16 10:46:25 +0800 [{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}}]
 42 org.apache.hadoop.hbase.master.assignment.UnassignProcedure RUNNABLE 2022-06-16 10:54:54 +0800 2022-06-16 10:54:54 +0800 [{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}}]
 43 org.apache.hadoop.hbase.master.assignment.AssignProcedure RUNNABLE 2022-06-16 11:11:45 +0800 2022-06-16 11:11:45 +0800 [{"transitionState"=>"REGION_TRANSITION_QUEUE", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}}]
 44 org.apache.hadoop.hbase.master.assignment.AssignProcedure RUNNABLE 2022-06-16 11:13:14 +0800 2022-06-16 11:13:14 +0800 [{"transitionState"=>"REGION_TRANSITION_QUEUE", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "override"=>true}]
 45 org.apache.hadoop.hbase.master.procedure.DisableTableProcedure RUNNABLE 2022-06-16 11:17:20 +0800 2022-06-16 11:17:20 +0800 [{}, {"userInfo"=>{"effectiveUser"=>"root"}, "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "skipTableStateCheck"=>false}]
 1556 org.apache.hadoop.hbase.master.assignment.AssignProcedure RUNNABLE 2022-06-16 11:30:11 +0800 2022-06-16 11:30:11 +0800 [{"transitionState"=>"REGION_TRANSITION_QUEUE", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}}]
13 row(s)
Took 0.6656 seconds

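It turns out that the commands we ran only queued more procedures that try to operate on this region, and every one of them is stuck behind the first procedure (pid 2), because it holds the lock.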
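To see which procedures are piled up on the region, the list_procedures output can be filtered by regionId. Below is a minimal Python sketch, assuming the shell output has been saved as text. The sample rows are abridged from the output above (timestamps and most parameters dropped), and procedures_for_region is a hypothetical helper, not part of HBase.

```python
import re

# Rows abridged from the list_procedures output above: only pid, class name,
# state and the id fields are kept.
SAMPLE = """\
 PID Name State Submitted Last_Update Parameters
 2 org.apache.hadoop.hbase.master.assignment.UnassignProcedure WAITING_TIMEOUT [{"regionInfo"=>{"regionId"=>"1611567786769"}}]
 3 org.apache.hadoop.hbase.master.assignment.UnassignProcedure RUNNABLE [{"regionInfo"=>{"regionId"=>"1611567786769"}}]
 40 org.apache.hadoop.hbase.master.assignment.AssignProcedure RUNNABLE [{"regionInfo"=>{"regionId"=>"1611567786769"}}]
 45 org.apache.hadoop.hbase.master.procedure.DisableTableProcedure RUNNABLE [{}, {"tableName"=>{"qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}}]
"""

# pid, fully qualified class name, state, then the rest of the line.
ROW_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\S+)\s+(.*)$")
REGION_ID_RE = re.compile(r'"regionId"=>"(\d+)"')


def procedures_for_region(text, region_id):
    """Return sorted (pid, short class name, state) for rows touching region_id."""
    found = []
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if match is None:
            continue  # header, blank line or "N row(s)" footer
        pid, name, state, params = match.groups()
        if region_id in REGION_ID_RE.findall(params):
            found.append((int(pid), name.rsplit(".", 1)[-1], state))
    return sorted(found)


procs = procedures_for_region(SAMPLE, "1611567786769")
lock_holder = 2  # procId reported by list_locks
waiting = [pid for pid, _, _ in procs if pid != lock_holder]
print(procs)
print("queued behind pid", lock_holder, ":", waiting)
```

The table-level DisableTableProcedure (pid 45) carries no regionId, so this filter does not pick it up; it has to be looked at separately.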

hbase hbck -j hbase-operator-tools-1.1.0/hbase-hbck2/hbase-hbck2-1.1.0.jar bypass -o -r $PROCEDURE_PID
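Here $PROCEDURE_PID stands for the pid(s) to bypass. HBCK2's usage line for this command is bypass [OPTIONS] <PID>..., so several pids can be passed in one call; -o overrides a procedure that is running or stuck, and -r also bypasses the procedure's children. The Python sketch below assembles the same command for a list of pids. The jar path is the one used above, bypass_command and run_bypass are hypothetical helpers, and by default nothing is executed (dry run).

```python
import shlex
import subprocess

# Jar path as used in the command above; adjust to your installation.
HBCK2_JAR = "hbase-operator-tools-1.1.0/hbase-hbck2/hbase-hbck2-1.1.0.jar"


def bypass_command(pids, jar=HBCK2_JAR, override=True, recursive=True):
    """Build the argv for: hbase hbck -j <jar> bypass [-o] [-r] <pid>..."""
    cmd = ["hbase", "hbck", "-j", jar, "bypass"]
    if override:
        cmd.append("-o")  # override a running/stuck procedure
    if recursive:
        cmd.append("-r")  # also bypass child procedures
    # De-duplicate and pass the pids in ascending order.
    cmd.extend(str(pid) for pid in sorted(set(pids)))
    return cmd


def run_bypass(pids, dry_run=True):
    """Print the command; only execute it when dry_run is False."""
    cmd = bypass_command(pids)
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


# Dry run over the 13 pids listed by list_procedures above.
run_bypass([2, 3, 4, 37, 38, 39, 40, 41, 42, 43, 44, 45, 1556])
```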

Bypassing these procedures with HBCK2 resolved the problem.

Original: https://www.cnblogs.com/barneywill/p/16381778.html
Author: 匠人先生
Title: Uncle's Troubleshooting Notes (51): An HBase Region Stuck in RIT State (Timeout)
