NSight
Welcome to acronym hell
NVIDIA NSight is a tool for measuring graphics and compute performance on NVIDIA GPUs. It is available as a Visual Studio plugin or as a standalone program.
Speed of Light
NSight focuses on hardware metrics: how well the hardware units and sub-units are utilized, and how close they run to their respective maximum throughput. This is shown as a percentage of the unit's theoretical throughput, its Speed of Light (SOL). The SOL can be shown for a whole frame, a range of graphics API calls, or a single API call. For details about NVIDIA's GPU architecture, see the NVIDIA GPU page.
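As a rough illustration of what an SOL value expresses, the sketch below divides an achieved throughput by a theoretical peak and reports the unit closest to its limit. The function name `sol_percent` and all numbers are made up for the example; they are not NSight output and not part of any NSight API.

```python
def sol_percent(achieved, peak):
    """Return achieved throughput as a percentage of the theoretical peak."""
    if peak <= 0:
        raise ValueError("peak throughput must be positive")
    return 100.0 * achieved / peak


# Hypothetical (achieved, theoretical peak) throughput per unit, arbitrary units.
samples = {
    "SM": (60.0, 100.0),
    "L1TEX": (45.0, 50.0),
    "L2": (20.0, 80.0),
}

# SOL% per unit, and the unit running closest to its theoretical maximum.
sol = {unit: sol_percent(achieved, peak) for unit, (achieved, peak) in samples.items()}
top_unit = max(sol, key=sol.get)

for unit, pct in sorted(sol.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{unit}: {pct:.1f}% SOL")
print("Top SOL unit:", top_unit)
```

With these made-up numbers, the unit with the highest SOL% is the one to look at first as a likely limiter.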
GPU Units
SM (Streaming Multiprocessor)
Stall reasons (smsp__warp_stall_*_pct)
long_scoreboard - Warps that were stalled waiting for a scoreboard dependency on L1TEX.
short_scoreboard - Warps that were stalled waiting for a scoreboard dependency on an MIO (memory input/output) operation, for example special math instructions or dynamic branching.
drain - Warps stalled after EXIT, waiting for all memory operations to complete so that the warp can be freed.
imc_miss - Warps stalled waiting on an immediate constant cache miss.
no_instructions - Warps waiting to be selected to fetch an instruction, or waiting on an instruction cache miss.
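When reading these metrics it is usually the largest stall reason that matters. The sketch below assumes the metric values have already been exported into a plain dictionary (how they are exported is not covered here) and ranks the stall reasons by percentage. The helper names `stall_reason` and `rank_stalls` and all values are hypothetical, chosen only for illustration.

```python
PREFIX = "smsp__warp_stall_"
SUFFIX = "_pct"


def stall_reason(metric_name):
    """Return the stall reason in a smsp__warp_stall_*_pct name, or None if it does not match."""
    if metric_name.startswith(PREFIX) and metric_name.endswith(SUFFIX):
        reason = metric_name[len(PREFIX):-len(SUFFIX)]
        return reason or None
    return None


def rank_stalls(metrics):
    """Return (reason, percent) pairs sorted from the largest stall to the smallest."""
    reasons = {}
    for name, value in metrics.items():
        reason = stall_reason(name)
        if reason is not None:
            reasons[reason] = value
    return sorted(reasons.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical values, not taken from a real capture.
metrics = {
    "smsp__warp_stall_long_scoreboard_pct": 42.0,
    "smsp__warp_stall_short_scoreboard_pct": 7.5,
    "smsp__warp_stall_drain_pct": 0.5,
    "smsp__warp_stall_imc_miss_pct": 1.0,
    "smsp__warp_stall_no_instructions_pct": 3.0,
    "sm__throughput_pct": 60.0,  # not a stall metric, so it is ignored
}

ranked = rank_stalls(metrics)
for reason, pct in ranked:
    print(f"{reason}: {pct}%")
```

In this made-up capture long_scoreboard dominates, which would point at waiting on L1TEX operations rather than at instruction fetch or constant cache misses.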
Reference
The Peak-Performance-Percentage Analysis Method for Optimizing Any GPU Workload - 2018
Using ‘Nsight Graphics: GPU Trace’ and the Peak-Performance-Percentage Method
GPU-Driven Rendering - 2016