Analyzing CPU Instruction Data of Cryptocurrency Mining with Intel VTune Profiler

Monero mining command:

Collection and Platform Info
Application Command Line: D:\share\xmrig-6.18.0-msvc-win64\xmrig-6.18.0\xmrig.exe -o fr.minexmr.com:443 -u 4971qQbWrJRUGDvEUUvqsw29MNz68Cus7d6DAsmTmGoZd4o9AL9FAJiFSvo5uZK1ezguR46n689Rk3zApMZTcB3gQfDMULX -p x --tls
Operating System: Microsoft Windows 10
Computer Name: DESKTOP-ALRVTLS
Result Size: 1.7 GB (size of the full collected data set)
Collection start time: 15:29:48 02/08/2022 UTC
Collection stop time: 15:32:55 02/08/2022 UTC
Collector Type: Event-based sampling driver
Finalization mode: Fast. If the number of collected samples exceeds the threshold, this mode limits the number of processed samples to speed up post-processing.

CPU
Name: Intel(R) microarchitecture code named Rocketlake
Frequency: 2.6 GHz
Logical CPU Count: 12
Cache Allocation Technology
Level 2 capability: not detected
Level 3 capability: not detected

Analysis type:

[Screenshot: VTune analysis type selection]

Screenshot of the run:

[Screenshot: collection in progress]

After running for nearly 2 minutes, let's look at the results:

[Screenshot: VTune result data]

The full collection came to 1.7 GB, which is a surprisingly large amount of data.

The overall result:

[Screenshot: VTune summary view]

From a performance standpoint, the bottleneck is in the back-end.
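
The back-end claim can be cross-checked against the raw counters. The sketch below is a rough estimate that divides the TOPDOWN.BACKEND_BOUND_SLOTS and UOPS_RETIRED.SLOTS counts from this post's event export by TOPDOWN.SLOTS; it is not VTune's exact metric formula, and the helper name `slot_fraction` is made up for illustration.

```python
# Counts copied from the hardware event export in this post.
TOPDOWN_SLOTS = 13_658_454_097_535
TOPDOWN_BACKEND_BOUND_SLOTS = 9_234_752_770_425
UOPS_RETIRED_SLOTS = 3_905_945_858_910


def slot_fraction(part: int, total: int) -> float:
    """Return part/total as a fraction of pipeline slots (hypothetical helper)."""
    if total <= 0:
        raise ValueError("total slot count must be positive")
    return part / total


backend_bound = slot_fraction(TOPDOWN_BACKEND_BOUND_SLOTS, TOPDOWN_SLOTS)
retiring = slot_fraction(UOPS_RETIRED_SLOTS, TOPDOWN_SLOTS)

print(f"Back-end bound: {backend_bound:.1%} of pipeline slots")
print(f"Retiring:       {retiring:.1%} of pipeline slots")
```

With these counts, roughly two thirds of the slots are back-end bound, which leaves little room for front-end or bad-speculation losses.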

Drilling into retiring to see what the CPU instructions are mostly doing:

[Screenshot: VTune retiring breakdown]

Floating-point (FP) arithmetic accounts for a fairly large share: 13%.

On the front-end side, cache misses, branch mispredictions and the like account for very little:

[Screenshot: VTune front-end bound breakdown]
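
As a sanity check on the small front-end share, two simple ratios can be derived from the event export in this post. This is a minimal sketch: the branch misprediction rate is a plain ratio of retired branches, and the undelivered-uop share is only a rough stand-in for front-end bound, not VTune's exact formula.

```python
# Counts copied from the hardware event export in this post.
BR_INST_RETIRED_ALL_BRANCHES = 179_656_042_170
BR_MISP_RETIRED_ALL_BRANCHES = 695_542_005
IDQ_UOPS_NOT_DELIVERED_CORE = 351_316_053_945
TOPDOWN_SLOTS = 13_658_454_097_535

# Fraction of retired branches that were mispredicted.
mispredict_rate = BR_MISP_RETIRED_ALL_BRANCHES / BR_INST_RETIRED_ALL_BRANCHES

# Rough share of pipeline slots in which the front-end delivered no uop.
undelivered_share = IDQ_UOPS_NOT_DELIVERED_CORE / TOPDOWN_SLOTS

print(f"Branch misprediction rate: {mispredict_rate:.2%}")
print(f"Undelivered uop slots:     {undelivered_share:.2%}")
```

Both ratios come out at a few percent or less, in line with the screenshot.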

For the back-end:

[Screenshots: VTune back-end bound breakdown]

Long-latency operations like divides and memory operations can cause this, as can too many operations being directed to a single execution port (for example, more multiply operations arriving in the back-end per cycle than the execution unit can support).

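Judging by the description, the L2 cache is what holds things back: L1 shows 100% while L2 is too low. At least, that seems to be the meaning.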
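
One way to test this reading is to split the stall cycles by how deep the pending load had to go. The sketch below subtracts the CYCLE_ACTIVITY.STALLS_* counts from this post's event export from one another; it is a rough breakdown in the spirit of the top-down memory-bound metrics, not VTune's exact formulas, and the bucket names are made up for illustration.

```python
# Stall-cycle counts copied from the CYCLE_ACTIVITY.* events in this post.
STALLS = {
    "total": 1_592_284_776_840,
    "mem_any": 1_551_274_653_810,
    "l1d_miss": 1_527_559_582_665,
    "l2_miss": 226_650_679_950,
    "l3_miss": 162_225_486_675,
}


def stall_shares(s: dict) -> dict:
    """Split total stall cycles by the deepest cache level the pending load missed."""
    buckets = {
        "non_memory": s["total"] - s["mem_any"],   # no load outstanding
        "l1": s["mem_any"] - s["l1d_miss"],        # load outstanding, no L1D miss
        "l2": s["l1d_miss"] - s["l2_miss"],        # missed L1D, did not miss L2
        "l3": s["l2_miss"] - s["l3_miss"],         # missed L2, did not miss L3
        "dram": s["l3_miss"],                      # missed L3
    }
    return {name: cycles / s["total"] for name, cycles in buckets.items()}


shares = stall_shares(STALLS)
for name, share in shares.items():
    print(f"{name:<12}{share:6.1%}")
```

With these counts the `l2` bucket dominates at roughly 82% of all stall cycles, which agrees with the impression that loads missing L1 and waiting on L2 are the main drag.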

Looking at the call stack, a single module accounts for most of the time.

[Screenshot: VTune call stack view]

Now the event counts:

[Screenshot: VTune event count view]

Exporting the hardware event types:

Hardware Events
Hardware Event Type    Hardware Event Count
ARITH.DIVIDER_ACTIVE 571,366,714,095 ==> Cycles when the divide unit is busy executing divide or square root operations. Accounts for integer and floating-point operations. Division and square-root operations fit the profile of mining!
BACLEARS.ANY 24,000,720 ==> The BACLEARS event counts the number of times the front end is resteered, mainly when the Branch Prediction Unit cannot provide a correct prediction and this is corrected by the Branch Address Calculator at the front end. The BACLEARS.ANY event counts the number of baclears for any type of branch. So this one is about branch prediction misses.
BR_INST_RETIRED.ALL_BRANCHES 179,656,042,170 ==> Counts retired branch instructions of any type. Branch prediction predicts the branch target and lets the processor start executing instructions long before the true execution path of the branch is known. All branches use the Branch Prediction Unit (BPU) for prediction. This unit predicts the target address based not only on the EIP of the branch but also on the execution path through which execution reached this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.
BR_MISP_RETIRED.ALL_BRANCHES 695,542,005
CPU_CLK_UNHALTED.DISTRIBUTED 2,762,526,000,000 ==> This event distributes cycle counts between active hyperthreads, i.e., those in C0. A hyperthread becomes inactive when it executes the HLT or MWAIT instructions. If all other hyperthreads are inactive (or disabled or do not exist), all counts are attributed to this hyperthread. To obtain the full count when the Core is active, sum the counts from each hyperthread.
CPU_CLK_UNHALTED.REF_TSC 2,522,358,800,000
CPU_CLK_UNHALTED.THREAD 3,122,854,800,000
CPU_CLK_UNHALTED.THREAD_P 3,103,054,654,575
CYCLE_ACTIVITY.CYCLES_L1D_MISS 2,207,076,621,210 ==> Cycles while L1 cache miss demand load is outstanding.
CYCLE_ACTIVITY.CYCLES_MEM_ANY 2,970,053,910,135
CYCLE_ACTIVITY.STALLS_L1D_MISS 1,527,559,582,665
CYCLE_ACTIVITY.STALLS_L2_MISS 226,650,679,950
CYCLE_ACTIVITY.STALLS_L3_MISS 162,225,486,675
CYCLE_ACTIVITY.STALLS_MEM_ANY 1,551,274,653,810
CYCLE_ACTIVITY.STALLS_TOTAL 1,592,284,776,840
DSB2MITE_SWITCHES.PENALTY_CYCLES 1,669,550,085
DTLB_LOAD_MISSES.STLB_HIT:cmask=1 5,694,170,820
DTLB_LOAD_MISSES.WALK_ACTIVE 84,254,527,560
DTLB_STORE_MISSES.STLB_HIT:cmask=1 292,508,775
DTLB_STORE_MISSES.WALK_ACTIVE 370,511,115
EXE_ACTIVITY.1_PORTS_UTIL 273,300,409,950
EXE_ACTIVITY.2_PORTS_UTIL 390,990,586,485
EXE_ACTIVITY.BOUND_ON_STORES 195,000,585
FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE 563,478,403,845
FRONTEND_RETIRED.ANY_DSB_MISS 24,163,691,340
FRONTEND_RETIRED.DSB_MISS 660,046,200
FRONTEND_RETIRED.L2_MISS 24,001,680
FRONTEND_RETIRED.LATENCY_GE_16 45,003,150
FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1 25,053,253,605
FRONTEND_RETIRED.LATENCY_GE_4 232,516,275
ICACHE_16B.IFDATA_STALL 2,205,039,690
ICACHE_64B.IFTAG_STALL 1,176,017,640
IDQ.DSB_CYCLES_ANY 710,761,066,140
IDQ.DSB_CYCLES_OK 619,500,929,250
IDQ.DSB_UOPS 3,580,955,371,425
IDQ.MITE_CYCLES_ANY 92,280,138,420
IDQ.MITE_CYCLES_OK 67,200,100,800
IDQ.MITE_UOPS 335,040,502,560
IDQ.MS_SWITCHES 657,019,710
IDQ.MS_UOPS 4,468,634,055
IDQ_UOPS_NOT_DELIVERED.CORE 351,316,053,945
IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE 38,835,116,505
ILD_STALL.LCP 7,500,135
INST_RETIRED.ANY 3,769,987,000,000
INST_RETIRED.NOP 90,000,135
INT_MISC.CLEAR_RESTEER_CYCLES 7,215,129,870
INT_MISC.RECOVERY_CYCLES:cmask=1:e=yes 975,017,550
INT_MISC.UOP_DROPPING 16,350,049,050
L1D_PEND_MISS.FB_FULL 3,135,009,405
L1D_PEND_MISS.FB_FULL_PERIODS 180,000,540
L1D_PEND_MISS.L2_STALL 2,910,008,730
L1D_PEND_MISS.PENDING 2,753,288,259,840
L2_RQSTS.ALL_RFO 37,389,560,835
L2_RQSTS.RFO_HIT 24,540,368,100
LD_BLOCKS.STORE_FORWARD 3,000,090
LD_BLOCKS_PARTIAL.ADDRESS_ALIAS 7,704,231,120
MACHINE_CLEARS.COUNT 85,502,565
MEM_INST_RETIRED.ALL_STORES 200,160,600,480
MEM_INST_RETIRED.ANY 732,047,196,135
MEM_INST_RETIRED.LOCK_LOADS 15,001,050
MEM_INST_RETIRED.SPLIT_LOADS 9,000,270
MEM_INST_RETIRED.SPLIT_STORES 12,000,360
MEM_INST_RETIRED.STLB_MISS_LOADS 1,413,042,390
MEM_LOAD_L3_HIT_RETIRED.XSNP_HIT 600,330
MEM_LOAD_L3_HIT_RETIRED.XSNP_HITM 2,401,320
MEM_LOAD_RETIRED.FB_HIT 136,277,038,725
MEM_LOAD_RETIRED.L1_HIT 336,031,008,090
MEM_LOAD_RETIRED.L1_MISS 60,759,911,385
MEM_LOAD_RETIRED.L2_HIT 54,858,822,870
MEM_LOAD_RETIRED.L3_HIT 4,997,549,265
MEM_LOAD_RETIRED.L3_MISS 456,191,520
OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD:cmask=4 9,735,029,205
OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD 2,673,818,021,430
OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO 1,002,168,006,495
RESOURCE_STALLS.SCOREBOARD 5,067,152,010
TOPDOWN.BACKEND_BOUND_SLOTS 9,234,752,770,425
TOPDOWN.SLOTS 13,658,454,097,535
UOPS_DECODED.DEC0 33,000,099,000
UOPS_DECODED.DEC0:cmask=1 17,385,052,155
UOPS_DISPATCHED.PORT_0 910,771,366,155
UOPS_DISPATCHED.PORT_1 994,651,491,975
UOPS_DISPATCHED.PORT_2_3 534,780,802,170
UOPS_DISPATCHED.PORT_4_9 223,530,335,295
UOPS_DISPATCHED.PORT_5 850,201,275,300
UOPS_DISPATCHED.PORT_6 899,491,349,235
UOPS_DISPATCHED.PORT_7_8 207,810,311,715
UOPS_EXECUTED.CYCLES_GE_3 855,031,282,545
UOPS_EXECUTED.THREAD 4,300,326,450,480
UOPS_ISSUED.ANY 4,063,476,095,205
UOPS_RETIRED.SLOTS 3,905,945,858,910

That is far too many to read through, so let's write a program to sort them before analyzing. Many of the event definitions can be found at https://perfmon-events.intel.com/icelake.html.

TOPDOWN.SLOTS 13658454097535 ==> skip; presumably used only for the top-down analysis
TOPDOWN.BACKEND_BOUND_SLOTS 9234752770425 ==> same as above
UOPS_EXECUTED.THREAD 4300326450480 ==> Number of uops to be executed per-thread each cycle. Probably of no use for mining detection.
UOPS_ISSUED.ANY 4063476095205 ==> Uops that Resource Allocation Table (RAT) issues to Reservation Station (RS). Probably of no use for mining detection.
UOPS_RETIRED.SLOTS 3905945858910 ==> Counts number of retirement slots used.

INST_RETIRED.ANY 3769987000000 ==> This event counts the number of instructions that retire execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. The counter continues counting during hardware interrupts, traps, and inside interrupt handlers.
IDQ.DSB_UOPS 3580955371425 ==> Uops coming from the Decoded ICache.

CPU_CLK_UNHALTED.THREAD 3122854800000 ==> Counts the number of thread cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. The core frequency may change from time to time due to power or thermal throttling.
CPU_CLK_UNHALTED.THREAD_P 3103054654575 ==> same as above
CYCLE_ACTIVITY.CYCLES_MEM_ANY 2970053910135 ==> Cycles while memory subsystem has an outstanding load.
CPU_CLK_UNHALTED.DISTRIBUTED 2762526000000 ==> This event distributes cycle counts between active hyperthreads, i.e., those in C0. A hyperthread becomes inactive when it executes the HLT or MWAIT instructions. If all other hyperthreads are inactive (or disabled or do not exist), all counts are attributed to this hyperthread. To obtain the full count when the Core is active, sum the counts from each hyperthread.
L1D_PEND_MISS.PENDING 2753288259840 ==> Counts duration of L1D miss outstanding, that is, each cycle the number of Fill Buffers (FB) outstanding required by Demand Reads. An FB is either held by demand loads, or held by non-demand loads and hit at least once by demand. The valid outstanding interval runs until the FB deallocation and starts in one of the following ways: from FB allocation, if the FB is allocated by demand; from the demand hit of the FB, if it is allocated by hardware or software prefetch. Note: in the L1D, a Demand Read contains cacheable or noncacheable demand loads, including ones causing cache-line splits and reads due to page walks resulting from any request type.
OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD 2673818021430
CPU_CLK_UNHALTED.REF_TSC 2522358800000
CYCLE_ACTIVITY.CYCLES_L1D_MISS 2207076621210 ==> Cycles while L1 cache miss demand load is outstanding.

CYCLE_ACTIVITY.STALLS_TOTAL 1592284776840
CYCLE_ACTIVITY.STALLS_MEM_ANY 1551274653810
CYCLE_ACTIVITY.STALLS_L1D_MISS 1527559582665 ==> Execution stalls while L1 cache miss demand load is outstanding.
OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO 1002168006495 ==> Counts the number of offcore outstanding demand RFO read transactions in the super queue every cycle. The 'Offcore outstanding' state of the transaction lasts from the L2 miss until the sending transaction completion to requestor (SQ deallocation). See the corresponding Umask under OFFCORE_REQUESTS.
UOPS_DISPATCHED.PORT_1 994651491975
UOPS_DISPATCHED.PORT_0 910771366155
UOPS_DISPATCHED.PORT_6 899491349235
UOPS_EXECUTED.CYCLES_GE_3 855031282545
UOPS_DISPATCHED.PORT_5 850201275300
MEM_INST_RETIRED.ANY 732047196135
IDQ.DSB_CYCLES_ANY 710761066140
IDQ.DSB_CYCLES_OK 619500929250
ARITH.DIVIDER_ACTIVE 571366714095
FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE 563478403845
UOPS_DISPATCHED.PORT_2_3 534780802170
EXE_ACTIVITY.2_PORTS_UTIL 390990586485
IDQ_UOPS_NOT_DELIVERED.CORE 351316053945
MEM_LOAD_RETIRED.L1_HIT 336031008090
IDQ.MITE_UOPS 335040502560
EXE_ACTIVITY.1_PORTS_UTIL 273300409950
CYCLE_ACTIVITY.STALLS_L2_MISS 226650679950
UOPS_DISPATCHED.PORT_4_9 223530335295
UOPS_DISPATCHED.PORT_7_8 207810311715
MEM_INST_RETIRED.ALL_STORES 200160600480
BR_INST_RETIRED.ALL_BRANCHES 179656042170
CYCLE_ACTIVITY.STALLS_L3_MISS 162225486675
MEM_LOAD_RETIRED.FB_HIT 136277038725
IDQ.MITE_CYCLES_ANY 92280138420
DTLB_LOAD_MISSES.WALK_ACTIVE 84254527560
IDQ.MITE_CYCLES_OK 67200100800
MEM_LOAD_RETIRED.L1_MISS 60759911385
MEM_LOAD_RETIRED.L2_HIT 54858822870
IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE 38835116505
L2_RQSTS.ALL_RFO 37389560835
UOPS_DECODED.DEC0 33000099000
FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1 25053253605
L2_RQSTS.RFO_HIT 24540368100
FRONTEND_RETIRED.ANY_DSB_MISS 24163691340
UOPS_DECODED.DEC0:cmask=1 17385052155
INT_MISC.UOP_DROPPING 16350049050
OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD:cmask=4 9735029205
LD_BLOCKS_PARTIAL.ADDRESS_ALIAS 7704231120
INT_MISC.CLEAR_RESTEER_CYCLES 7215129870
DTLB_LOAD_MISSES.STLB_HIT:cmask=1 5694170820
RESOURCE_STALLS.SCOREBOARD 5067152010
MEM_LOAD_RETIRED.L3_HIT 4997549265
IDQ.MS_UOPS 4468634055
L1D_PEND_MISS.FB_FULL 3135009405
L1D_PEND_MISS.L2_STALL 2910008730
ICACHE_16B.IFDATA_STALL 2205039690
DSB2MITE_SWITCHES.PENALTY_CYCLES 1669550085
MEM_INST_RETIRED.STLB_MISS_LOADS 1413042390
ICACHE_64B.IFTAG_STALL 1176017640
INT_MISC.RECOVERY_CYCLES:cmask=1:e=yes 975017550
BR_MISP_RETIRED.ALL_BRANCHES 695542005
FRONTEND_RETIRED.DSB_MISS 660046200
IDQ.MS_SWITCHES 657019710
MEM_LOAD_RETIRED.L3_MISS 456191520
DTLB_STORE_MISSES.WALK_ACTIVE 370511115
DTLB_STORE_MISSES.STLB_HIT:cmask=1 292508775
FRONTEND_RETIRED.LATENCY_GE_4 232516275
EXE_ACTIVITY.BOUND_ON_STORES 195000585
L1D_PEND_MISS.FB_FULL_PERIODS 180000540
INST_RETIRED.NOP 90000135
MACHINE_CLEARS.COUNT 85502565
FRONTEND_RETIRED.LATENCY_GE_16 45003150
FRONTEND_RETIRED.L2_MISS 24001680
BACLEARS.ANY 24000720
MEM_INST_RETIRED.LOCK_LOADS 15001050
MEM_INST_RETIRED.SPLIT_STORES 12000360
MEM_INST_RETIRED.SPLIT_LOADS 9000270
ILD_STALL.LCP 7500135
LD_BLOCKS.STORE_FORWARD 3000090
MEM_LOAD_L3_HIT_RETIRED.XSNP_HITM 2401320
MEM_LOAD_L3_HIT_RETIRED.XSNP_HIT 600330
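
The sorting program itself is not shown in the post. A minimal sketch of how the sorted listing above could be produced follows, assuming the export is pasted as whitespace-separated `EVENT count` pairs; the function names and the regular expression are illustrative, not the author's code.

```python
import re

# One "EVENT.NAME[:modifier=value...] 1,234,567" pair. Requiring a dot in the
# name keeps ordinary upper-case words in annotations from being matched.
PAIR_RE = re.compile(
    r"\b([A-Z][A-Z0-9_]*\.[A-Z0-9_.]+(?::[a-z]+=\w+)*)\s+(\d[\d,]*)"
)


def parse_events(text: str) -> dict:
    """Extract {event name: count} from a flattened VTune event export."""
    return {
        name: int(count.replace(",", ""))
        for name, count in PAIR_RE.findall(text)
    }


def sort_events(events: dict) -> list:
    """Return (name, count) pairs ordered from largest to smallest count."""
    return sorted(events.items(), key=lambda kv: kv[1], reverse=True)


sample = (
    "ARITH.DIVIDER_ACTIVE 571,366,714,095 BACLEARS.ANY 24,000,720 "
    "INT_MISC.RECOVERY_CYCLES:cmask=1:e=yes 975,017,550 "
    "TOPDOWN.SLOTS 13,658,454,097,535"
)
ranked = sort_events(parse_events(sample))
for name, count in ranked:
    print(name, count)
```

The sample string here is a four-event excerpt of the export; feeding the full export through the same two functions yields the complete ordering.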

Still too many, so let's compress further by summing the counts of all events that share the same prefix:

TOPDOWN 22893206867960
CPU_CLK_UNHALTED 11510794254575
CYCLE_ACTIVITY 10237125711285
IDQ 5410863762360 ==> Instruction Decode Queue (IDQ)
UOPS_EXECUTED 5155357733025 ==> Counts the number of uops from any logical processor.

UOPS_DISPATCHED 4621236931845
UOPS_ISSUED 4063476095205
UOPS_RETIRED 3905945858910 ==> Counts the number of micro-ops retired (macro-fused=1, micro-fused=2, others=1; maximum count of 8). The processor decodes complex macro instructions into a sequence of simpler micro-ops. Most instructions are composed of one or two micro-ops. Some instructions are decoded into longer sequences such as repeat instructions, floating point transcendental instructions, and assists. Possibly mining-related!
INST_RETIRED 3770077000135 ==> Counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. INST_RETIRED.ANY_P is counted by a programmable counter and it is an architectural performance event. Counting: faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.
OFFCORE_REQUESTS_OUTSTANDING 3685721057130 ==> Counts the number of offcore outstanding cacheable Core Data Read transactions in the super queue every cycle. A transaction is considered to be in the Offcore outstanding state between L2 miss and transaction completion sent to requestor (SQ de-allocation). See corresponding Umask under OFFCORE_REQUESTS.
L1D_PEND_MISS 2759513278515 ==> Number of times a request needed a FB (Fill Buffer) entry but there was no entry available for it. A request includes cacheable/uncacheable demands that are load, store or SW prefetch instructions. (This is the definition of the FB_FULL sub-event; the total here is dominated by L1D_PEND_MISS.PENDING.)
MEM_INST_RETIRED 933656840685
EXE_ACTIVITY 664485997020
MEM_LOAD_RETIRED 593380521855
ARITH 571366714095 ==> Cycles when divide unit is busy executing divide or square root operations. Accounts for integer and floating-point operations. Mining-related!
FP_ARITH_INST_RETIRED 563478403845 ==> Counts once for most SIMD 128-bit packed computational double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 2 computation operations, one for each element. Applies to packed double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR register need to be set when using these events. Mining-related!
IDQ_UOPS_NOT_DELIVERED 390151170450
BR_INST_RETIRED 179656042170
DTLB_LOAD_MISSES 89948698380
L2_RQSTS 61929928935 ==> Here the sum of L2_RQSTS.ALL_RFO and L2_RQSTS.RFO_HIT, i.e. read-for-ownership (RFO) requests to L2 and those among them that hit.
UOPS_DECODED 50385151155
FRONTEND_RETIRED 50178512250
INT_MISC 24540196470
LD_BLOCKS_PARTIAL 7704231120
RESOURCE_STALLS 5067152010 ==> Counts resource-related stall cycles.

ICACHE_16B 2205039690
DSB2MITE_SWITCHES 1669550085
ICACHE_64B 1176017640
BR_MISP_RETIRED 695542005
DTLB_STORE_MISSES 663019890
MACHINE_CLEARS 85502565
BACLEARS 24000720
ILD_STALL 7500135
MEM_LOAD_L3_HIT_RETIRED 3001650
LD_BLOCKS 3000090 ==> The number of times that split load operations are temporarily blocked because all resources for handling the split accesses are in use.
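
The per-prefix totals above are the sums of all events that share the text before the first dot. A minimal sketch of that grouping, with an illustrative function name and a small excerpt of the counts from this post:

```python
from collections import defaultdict


def group_by_prefix(events: dict) -> list:
    """Sum counts of events sharing the text before the first '.' in the name."""
    totals = defaultdict(int)
    for name, count in events.items():
        prefix = name.split(".", 1)[0]
        totals[prefix] += count
    # Largest total first, matching the ordering of the listing.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


events = {
    "TOPDOWN.SLOTS": 13658454097535,
    "TOPDOWN.BACKEND_BOUND_SLOTS": 9234752770425,
    "DTLB_STORE_MISSES.WALK_ACTIVE": 370511115,
    "DTLB_STORE_MISSES.STLB_HIT:cmask=1": 292508775,
    "BACLEARS.ANY": 24000720,
}
grouped = group_by_prefix(events)
for prefix, total in grouped:
    print(prefix, total)
```

Note that a prefix can mix events with different units (cycles, uops, slots, occurrences), so these totals are only a coarse indication of magnitude, not a metric in their own right.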

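If mining is to be detected through CPU instructions, would high CPU usage combined with these instruction characteristics be enough to show that the workload is really mining? That still needs more data analysis.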
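
As a starting point for that analysis, the raw counts can be normalized into ratios that are comparable across runs of different length. The sketch below is an assumption about which features might be worth collecting; the feature names are made up, and no thresholds are given because the post has no data from non-mining workloads to calibrate them against.

```python
def mining_features(events: dict) -> dict:
    """Normalize a few counters by cycles or instructions (hypothetical features)."""
    cycles = events["CPU_CLK_UNHALTED.THREAD"]
    instructions = events["INST_RETIRED.ANY"]
    return {
        # Share of cycles in which the divide/sqrt unit was busy.
        "divider_active_per_cycle": events["ARITH.DIVIDER_ACTIVE"] / cycles,
        # 128-bit packed double FP instructions per retired instruction.
        "packed_double_per_instruction":
            events["FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE"] / instructions,
        # Share of cycles stalled with an L1D-missing demand load outstanding.
        "l1d_miss_stall_per_cycle":
            events["CYCLE_ACTIVITY.STALLS_L1D_MISS"] / cycles,
    }


# Counts copied from the event export in this post.
xmrig_run = {
    "CPU_CLK_UNHALTED.THREAD": 3122854800000,
    "INST_RETIRED.ANY": 3769987000000,
    "ARITH.DIVIDER_ACTIVE": 571366714095,
    "FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE": 563478403845,
    "CYCLE_ACTIVITY.STALLS_L1D_MISS": 1527559582665,
}
features = mining_features(xmrig_run)
for name, value in features.items():
    print(f"{name}: {value:.3f}")
```

Whether such ratios separate miners from legitimate compute-heavy programs can only be judged by collecting the same features from other workloads.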

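For mining, hooking the SSL socket's send call and inspecting the outgoing data to detect the mining protocol also looks feasible. This is worth thinking through further.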
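
To make the protocol idea concrete, the sketch below checks whether a captured plaintext buffer looks like a JSON-RPC mining message. It assumes the pool traffic is newline-delimited JSON with a `method` field, and the set of method names is an assumption about Monero-style Stratum traffic rather than something taken from this post. How the plaintext is captured (for example by hooking the TLS library's send path) is outside the sketch.

```python
import json

# Assumption: method names typically seen in Monero-style Stratum JSON-RPC.
STRATUM_METHODS = {"login", "submit", "keepalived", "getjob", "job"}


def looks_like_stratum(payload: bytes) -> bool:
    """Return True if any line of the payload is JSON with a known mining method."""
    for line in payload.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except ValueError:  # covers invalid JSON and undecodable bytes
            continue
        if not isinstance(message, dict):
            continue
        method = message.get("method")
        if isinstance(method, str) and method in STRATUM_METHODS:
            return True
    return False


login = b'{"id":1,"jsonrpc":"2.0","method":"login","params":{"login":"wallet","pass":"x"}}\n'
print(looks_like_stratum(login))
print(looks_like_stratum(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"))
```

A check like this is cheap enough to run on every outgoing buffer, but a match on a method name alone is weak evidence and would need to be combined with other signals.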

Original: https://www.cnblogs.com/bonelee/p/16545623.html
Author: bonelee
Title: Analyzing CPU Instruction Data of Cryptocurrency Mining with Intel VTune Profiler
