NVIDIA Nsight is a tool for measuring graphics and compute performance on NVIDIA GPUs. It is available as a Visual Studio plugin or as a standalone program.

Speed of Light

Nsight focuses on hardware metrics: how well the hardware units and sub-units are utilized, and how close each one runs to its maximum throughput. This is reported as a percentage of the unit's theoretical maximum throughput, called Speed of Light (SOL). SOL can be shown for a whole frame, a range of graphics API calls, or a single API call. For details about NVIDIA's GPU architecture, see the NVIDIA GPU page.
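As a rough illustration of how an SOL percentage relates achieved throughput to theoretical peak, here is a minimal Python sketch. The throughput numbers are made-up placeholders rather than values read from Nsight, and the helper name sol_percent is hypothetical.

```python
def sol_percent(achieved: float, peak: float) -> float:
    """Return achieved throughput as a percentage of the theoretical peak."""
    if peak <= 0:
        raise ValueError("peak throughput must be positive")
    return 100.0 * achieved / peak


# Hypothetical (achieved, peak) throughput pairs per unit, in arbitrary units.
# These are illustrative placeholders, not measurements from a real GPU.
units = {
    "SM": (45.0, 100.0),
    "L1TEX": (30.0, 40.0),
    "L2": (12.0, 60.0),
}

# SOL% for every unit, plus the unit running closest to its own peak.
sol = {name: sol_percent(achieved, peak) for name, (achieved, peak) in units.items()}
top_unit = max(sol, key=sol.get)

for name, pct in sol.items():
    print(f"{name}: {pct:.1f}% SOL")
print("closest to its peak:", top_unit)
```

With these placeholder numbers, the unit with the highest SOL% is the one most likely to be limiting throughput, so it is a reasonable place to start looking.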

GPU Units

Streaming Multiprocessor (SM)

Stall reasons (smsp__warp_stall_*_pct)

    • long_scoreboard: Warps stalled waiting for a scoreboard dependency on an L1TEX operation.
    • short_scoreboard: Warps stalled waiting for a scoreboard dependency on an MIO (memory input/output) operation. Examples: special math instructions or dynamic branching.
    • drain: Warps stalled after EXIT, waiting for all memory operations to complete so that the warp can be freed.
    • imc_miss: Warps stalled waiting for an immediate constant cache miss.
    • no_instructions: Warps waiting to be selected to fetch an instruction, or waiting on an instruction cache miss.
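
To show how the wildcard in smsp__warp_stall_*_pct expands into one metric per stall reason, and how the reasons might be ranked, here is a small Python sketch. The percentages are invented sample values, not profiler output, and metric_name is a hypothetical helper.

```python
# Metric name pattern from the list above, with the wildcard as a named field.
STALL_METRIC_PATTERN = "smsp__warp_stall_{reason}_pct"

# Hypothetical stall percentages per reason (illustrative sample values only).
stall_pct = {
    "long_scoreboard": 42.0,
    "short_scoreboard": 8.5,
    "drain": 1.0,
    "imc_miss": 3.5,
    "no_instructions": 12.0,
}


def metric_name(reason: str) -> str:
    """Expand a stall reason into its full metric name."""
    return STALL_METRIC_PATTERN.format(reason=reason)


# Rank stall reasons from the largest share of stalled warps to the smallest.
ranked = sorted(stall_pct.items(), key=lambda item: item[1], reverse=True)

for reason, pct in ranked:
    print(f"{metric_name(reason)}: {pct:.1f}%")
```

Sorting this way puts the dominant stall reason first. In this made-up sample that is long_scoreboard, which would point toward L1TEX latency as the first thing to investigate.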


The Peak-Performance-Percentage Analysis Method for Optimizing Any GPU Workload - 2018

Using ‘Nsight Graphics: GPU Trace’ and the Peak-Performance-Percentage Method

GPU-Driven Rendering - 2016