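TOOLTIP_DEVICE_INFO_CN, TOOLTIP_MODEL_PERSPECTIVE_CN, TOOLTIP_MODEL_PERSPECTIVE_PERSTEP_CN, TOOLTIP_EVENT_TYPE_PERSPECTIVE_CN, TOOLTIP_EVENT_TYPE_MODEL_PERSPECTIVE_CN, TOOLTIP_DEVICE_INFO_EN, TOOLTIP_MODEL_PERSPECTIVE_EN, TOOLTIP_MODEL_PERSPECTIVE_PERSTEP_EN, TOOLTIP_EVENT_TYPE_PERSPECTIVE_EN, TOOLTIP_EVENT_TYPE_MODEL_PERSPECTIVE_EN

CPU process utilization:
CPU time used by the process / ProfileStep time (i.e., the time span of profiling)
CPU system utilization:
CPU time used by all processes in the system / total CPU time (ProfileStep time * number of CPU cores)
GPU utilization:
GPU compute time of the process / ProfileStep time, where the process's GPU compute time is the time spent computing GPU kernels; higher is better
Streaming multiprocessor (SM) efficiency:
For a given GPU kernel, the SM efficiency is SM_Eff_i = min(number of blocks used by the kernel / number of SMs on the GPU, 100%). The overall SM efficiency is the sum of SM_Eff_i weighted by each kernel's execution time, divided by ProfileStep time
SM occupancy:
For a given GPU kernel, the occupancy is Occu_i = number of active warps / maximum number of warps supported. The overall SM occupancy is the average of Occu_i weighted by each kernel's execution time
Tensor Cores time ratio:
Compute time of GPU kernels that use Tensor Cores / compute time of all kernels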
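The two time-weighted SM metrics above can be sketched as follows. This is a minimal illustration of the formulas only, not VisualDL's implementation. The `KernelRecord` type, its field names, and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class KernelRecord:
    """Hypothetical per-kernel record; field names are illustrative only."""
    duration: float   # kernel execution time (any consistent unit)
    blocks: int       # number of thread blocks launched by the kernel
    occupancy: float  # Occu_i = active warps / max warps per SM, in [0, 1]


def sm_efficiency(kernels, num_sms, step_time):
    """Sum of SM_Eff_i weighted by kernel duration, divided by ProfileStep time."""
    weighted = sum(min(k.blocks / num_sms, 1.0) * k.duration for k in kernels)
    return weighted / step_time


def achieved_occupancy(kernels):
    """Average of Occu_i weighted by kernel duration."""
    total = sum(k.duration for k in kernels)
    if total == 0:
        return 0.0
    return sum(k.occupancy * k.duration for k in kernels) / total


kernels = [KernelRecord(duration=30.0, blocks=40, occupancy=0.5),
           KernelRecord(duration=10.0, blocks=160, occupancy=0.25)]
print(sm_efficiency(kernels, num_sms=80, step_time=100.0))  # → 0.25
print(achieved_occupancy(kernels))                          # → 0.4375
```

The `min(..., 1.0)` cap mirrors the 100% bound in the definition. A kernel with more blocks than SMs counts as fully efficient for its whole duration.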
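
Shows the total CPU and GPU time of each model stage: DataLoader, Forward, Backward, Optimization, and Other.
CPU time is the time spent executing the code of each stage; GPU time is the computation time on the GPU of the kernels launched by each stage.
DataLoader: the stage that fetches data from the dataset with paddle.io.DataLoader
Forward: the model's forward computation stage
Backward: the model's backward (gradient computation) stage
Optimization: the stage in which the model's parameters are updated
Other: all remaining time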
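One way to compute the per-stage totals described above is to attribute each kernel to the stage whose CPU range contains its launch time, then sum durations. This is a hypothetical sketch. The tuple layouts, the launch-time attribution rule, and the sample values are assumptions, not how VisualDL actually parses profiler data. CPU "Other" time (step time minus all stages) is not computed here.

```python
from collections import defaultdict


def stage_totals(stage_ranges, kernels):
    """Return {stage: (cpu_time, gpu_time)}.

    stage_ranges: list of (stage_name, cpu_start, cpu_end) tuples (hypothetical layout).
    kernels: list of (launch_time, gpu_duration) tuples. Each kernel is attributed
    to the stage whose CPU range contains its launch time (an assumed rule);
    kernels launched outside every range are counted under "Other".
    """
    cpu = defaultdict(float)
    gpu = defaultdict(float)
    for name, start, end in stage_ranges:
        cpu[name] += end - start
    for launch, duration in kernels:
        owner = next((name for name, start, end in stage_ranges
                      if start <= launch < end), "Other")
        gpu[owner] += duration
    return {name: (cpu[name], gpu[name]) for name in set(cpu) | set(gpu)}


ranges = [("DataLoader", 0.0, 5.0), ("Forward", 5.0, 15.0), ("Backward", 15.0, 35.0)]
kernels = [(6.0, 4.0), (16.0, 9.0), (40.0, 1.0)]
print(stage_totals(ranges, kernels))
```

Because ranges with the same stage name accumulate, the same function also works when the input spans several iterations.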
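
Shows the CPU and GPU time of each model stage (DataLoader, Forward, Backward, Optimization, and Other) within every ProfileStep.
CPU time is the time spent executing the code of each stage; GPU time is the computation time on the GPU of the kernels launched by each stage.
DataLoader: the stage that fetches data from the dataset with paddle.io.DataLoader
Forward: the model's forward computation stage
Backward: the model's backward (gradient computation) stage
Optimization: the stage in which the model's parameters are updated
Other: all remaining time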
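The per-step view differs from the overall view only in keying the totals by (ProfileStep, stage). Here is a minimal sketch. It assumes each hypothetical record already carries its step index, stage name, CPU time, and GPU time.

```python
from collections import defaultdict


def per_step_stage_times(records):
    """Group (step, stage, cpu_time, gpu_time) records into {step: {stage: (cpu, gpu)}}."""
    table = defaultdict(lambda: defaultdict(lambda: [0.0, 0.0]))
    for step, stage, cpu_time, gpu_time in records:
        cell = table[step][stage]
        cell[0] += cpu_time
        cell[1] += gpu_time
    # Convert the nested defaultdicts to plain dicts of tuples for display.
    return {step: {stage: tuple(v) for stage, v in stages.items()}
            for step, stages in table.items()}


records = [(1, "Forward", 10.0, 4.0), (1, "Backward", 20.0, 9.0),
           (2, "Forward", 11.0, 4.5), (1, "Forward", 2.0, 0.5)]
print(per_step_stage_times(records))
# → {1: {'Forward': (12.0, 4.5), 'Backward': (20.0, 9.0)}, 2: {'Forward': (11.0, 4.5)}}
```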
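
Shows how each event type is distributed across the model stages DataLoader, Forward, Backward, Optimization, and Other.
Operator: operator execution within the framework
CudaRuntime: CUDA runtime function execution
Kernel: GPU kernel execution
Memcpy: data transfer between CPU and GPU
Memset: setting GPU memory values
UserDefined: events defined by the user in a Python script
OperatorInner: sub-steps of an operator's execution within the framework
Communication: events related to distributed communication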
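
Shows the time of each event type contained in the model stages DataLoader, Forward, Backward, Optimization, and Other.
Operator: operator execution within the framework
CudaRuntime: CUDA runtime function execution
Kernel: GPU kernel execution
Memcpy: data transfer between CPU and GPU
Memset: setting GPU memory values
UserDefined: events defined by the user in a Python script
OperatorInner: sub-steps of an operator's execution within the framework
Communication: data-communication and computation events related to distributed communication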
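Two views are described above: the distribution of one event type across stages, and the event types contained in one stage. Both can be read from a single (stage, event type) → time table. This is a sketch that assumes hypothetical (stage, event_type, duration) tuples as input; it is not VisualDL's data model.

```python
from collections import defaultdict


def build_table(events):
    """Sum durations into {(stage, event_type): total_time}."""
    table = defaultdict(float)
    for stage, event_type, duration in events:
        table[(stage, event_type)] += duration
    return table


def distribution_of_type(table, event_type):
    """Fraction of one event type's total time that falls in each stage."""
    per_stage = {s: t for (s, e), t in table.items() if e == event_type}
    total = sum(per_stage.values())
    return {s: t / total for s, t in per_stage.items()} if total else {}


def types_in_stage(table, stage):
    """Time of each event type contained in one stage."""
    return {e: t for (s, e), t in table.items() if s == stage}


events = [("Forward", "Kernel", 6.0), ("Backward", "Kernel", 12.0),
          ("Forward", "Memcpy", 1.0), ("Backward", "Kernel", 2.0)]
table = build_table(events)
print(distribution_of_type(table, "Kernel"))  # → {'Forward': 0.3, 'Backward': 0.7}
print(types_in_stage(table, "Forward"))       # → {'Kernel': 6.0, 'Memcpy': 1.0}
```

Keeping one table and slicing it two ways keeps the two charts consistent with each other.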
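
Shows the time the model spends on communication, on computation, and on their overlap in each iteration.
ProfileStep: total time of one iteration step
Communication: time related to communication, including Communication events recorded by the framework and the execution of communication-related operators and kernels (nccl)
Computation: GPU kernel computation time, excluding communication-related kernels (nccl)
Overlap: the portion of time during which communication and computation run in parallel and overlap
Others: time outside communication and computation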

CPU Process Utilization:
Process CPU time / ProfileStep time (total profiling time)
CPU System Utilization:
Sum of the CPU time of all processes in the system / total CPU time (ProfileStep time * number of CPU cores)
GPU Utilization:
GPU busy time / ProfileStep time, where GPU busy time is the time during which at least one GPU kernel is running on the GPU.
Est. SM Efficiency:
The SM efficiency of one kernel is SM_Eff_i = min(blocks of this kernel / number of SMs on this GPU, 100%). The Est. SM Efficiency of the GPU is the sum of SM_Eff_i weighted by each kernel's execution time, divided by ProfileStep time.
Est. Achieved Occupancy:
The SM occupancy of one kernel is Occu_i = active warps on an SM / maximum number of active warps supported by the SM. The Est. Achieved Occupancy of the GPU is the average of Occu_i weighted by each kernel's execution time.
Tensor Cores Ratio:
Sum of kernel time using Tensor Cores / sum of total kernel time
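The ratio definitions above are simple quotients. This is a minimal sketch in which every input is a hypothetical measurement supplied by the caller. GPU busy time is computed as the length of the union of kernel intervals, matching the "at least one kernel running" definition. The Tensor Cores denominator is the plain sum of kernel durations.

```python
def merged_length(intervals):
    """Total length covered by possibly overlapping (start, end) intervals."""
    total, cur_start, cur_end = 0.0, None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start  # close the previous merged run
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)       # extend the current merged run
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def utilization(process_cpu_time, system_cpu_time, cpu_cores,
                kernel_intervals, tensor_core_kernel_time, step_time):
    """Hypothetical helper computing the four ratios from raw timings."""
    kernel_time = sum(end - start for start, end in kernel_intervals)
    return {
        "cpu_process": process_cpu_time / step_time,
        "cpu_system": system_cpu_time / (step_time * cpu_cores),
        "gpu": merged_length(kernel_intervals) / step_time,
        "tensor_cores_ratio": tensor_core_kernel_time / kernel_time if kernel_time else 0.0,
    }


stats = utilization(process_cpu_time=80.0, system_cpu_time=200.0, cpu_cores=4,
                    kernel_intervals=[(0.0, 30.0), (20.0, 50.0), (70.0, 80.0)],
                    tensor_core_kernel_time=35.0, step_time=100.0)
print(stats)
# → {'cpu_process': 0.8, 'cpu_system': 0.5, 'gpu': 0.6, 'tensor_cores_ratio': 0.5}
```

In this example the two concurrent kernels overlap between 20 and 30. GPU utilization therefore uses 60 busy units rather than the 70 units of summed kernel time.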

Shows the total CPU and GPU time of each stage of a model: DataLoader, Forward, Backward, Optimization, and Other.
CPU time is the execution time of the stage's code; GPU time is the computation time of the kernels launched in the stage.
DataLoader: data fetching with paddle.io.DataLoader
Forward: model forward pass
Backward: gradient back-propagation
Optimization: parameter update
Other: time outside the stages above

Shows the CPU and GPU time of each stage of a model in each ProfileStep: DataLoader, Forward, Backward, Optimization, and Other.
CPU time is the execution time of the stage's code; GPU time is the computation time of the kernels launched in the stage.
DataLoader: data fetching with paddle.io.DataLoader
Forward: model forward pass
Backward: gradient back-propagation
Optimization: parameter update
Other: time outside the stages above

Shows the distribution of each event type across the DataLoader, Forward, Backward, Optimization, and Other stages.
Operator: operator execution
CudaRuntime: CUDA runtime function execution
Kernel: kernel execution on the GPU
Memcpy: data transfer between CPU and GPU
Memset: setting memory values on the GPU
UserDefined: events defined by the user in a Python script
OperatorInner: execution of an operator's sub-steps
Communication: events associated with distributed data transfer and computation

Shows the time of each event type contained in the DataLoader, Forward, Backward, Optimization, and Other stages.
Operator: operator execution
CudaRuntime: CUDA runtime function execution
Kernel: kernel execution on the GPU
Memcpy: data transfer between CPU and GPU
Memset: setting memory values on the GPU
UserDefined: events defined by the user in a Python script
OperatorInner: execution of an operator's sub-steps
Communication: events associated with distributed data transfer and computation

Shows the time of communication, computation, and their overlap in each training iteration.
ProfileStep: one iteration step of the training process
Communication: time related to communication, including Communication-type events in the Paddle framework and communication-related operators and GPU kernels (nccl)
Computation: computation time of GPU kernels, excluding communication-related kernels (nccl)
Overlap: time during which Communication and Computation run in parallel and overlap
Others: time outside Communication and Computation
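The Communication / Computation / Overlap / Others split can be computed with interval arithmetic. This is a minimal sketch that assumes hypothetical lists of (start, end) intervals for communication and for non-nccl computation within one ProfileStep. It illustrates the definitions only and is not VisualDL's parser. Here Communication and Computation each include the overlapped portion. The text does not say whether the tool reports them inclusive or exclusive of Overlap.

```python
def merge(intervals):
    """Merge overlapping (start, end) intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def intersection_length(a, b):
    """Total overlap between two sorted, disjoint interval lists (two-pointer sweep)."""
    i = j = 0
    total = 0.0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def step_breakdown(step_time, communication, computation):
    """Hypothetical helper splitting one ProfileStep into the five reported parts."""
    comm = merge(communication)
    comp = merge(computation)
    comm_time = sum(e - s for s, e in comm)
    comp_time = sum(e - s for s, e in comp)
    overlap = intersection_length(comm, comp)
    others = step_time - (comm_time + comp_time - overlap)  # covered by neither
    return {"ProfileStep": step_time, "Communication": comm_time,
            "Computation": comp_time, "Overlap": overlap, "Others": others}


print(step_breakdown(100.0,
                     communication=[(40.0, 70.0)],
                     computation=[(0.0, 50.0), (60.0, 65.0)]))
# → {'ProfileStep': 100.0, 'Communication': 30.0, 'Computation': 55.0, 'Overlap': 15.0, 'Others': 30.0}
```

Merging each side first keeps overlapping kernels from being double-counted. The sweep then finds the overlap in linear time.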

Other names: __ALL__, TOOLTIP_EVENT_DISTRIBUTED_HISTOGRAM_CN, TOOLTIP_EVENT_DISTRIBUTED_HISTOGRAM_EN
Source: /var/www/html/Deteccion_Ine/venv/lib/python3.10/site-packages/visualdl/component/profiler/parser/const_description.py