

NVIDIA Nsight is a tool for measuring graphics and compute performance on NVIDIA GPUs. It is available as a Visual Studio plugin or as a standalone program.

Speed of Light
Nsight focuses on hardware metrics: how well the hardware units and sub-units are utilized, and how close they run to their respective maximum throughput. This is shown as a percentage of the unit's theoretical maximum throughput, called Speed of Light (SOL). The SOL can be shown for a whole frame, a range of graphics API calls, or a single API call. For details about NVIDIA's GPU architecture, see the NVIDIA GPU page.
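As a minimal sketch of what a SOL percentage expresses, the snippet below divides an achieved throughput by a theoretical peak. The function name `sol_percent` and the numbers are hypothetical and chosen for illustration only; Nsight computes the real values from hardware counters.

```python
def sol_percent(achieved_throughput: float, peak_throughput: float) -> float:
    """Return achieved throughput as a percentage of the theoretical peak."""
    if peak_throughput <= 0:
        raise ValueError("peak_throughput must be positive")
    return 100.0 * achieved_throughput / peak_throughput


# Hypothetical reading: a unit that sustained 48 operations per cycle
# against a theoretical peak of 64 operations per cycle.
example_sol = sol_percent(48.0, 64.0)
print(f"SOL: {example_sol:.1f}%")
```

A value near 100% means the unit is running close to its theoretical limit over the measured range; a low value means the unit has headroom.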

GPU Units

 Tag | Name | SOL | Description
 PD | Primitive Distributor | pd__sol_pct | Loads the index buffer and distributes primitives across the chip.
 VAF | Vertex Attribute Fetch | | Loads the vertex buffer in preparation for vertex shader launch.
 SM | Streaming Multiprocessor | sm__sol_pct | Runs shaders.
 - FMA | | pipe_fma_realtime | FP32 math, simple int32 math (add, min, etc.).
 - ALU | | |
 - SFU | | |
 VPC | | pes__vpc_pct | Viewport transform, frustum culling, and perspective correction of attributes.
 TEX | | tex__sol_pct | Performs SRV fetches and UAV accesses.
 L2 | | ltc__sol_pct | Level-2 cache attached to VRAM.
 CROP | | crop__sol_pct | Color writes and blending to render targets.
 ZROP | | zrop__sol_pct | Depth-stencil testing.
 VRAM | | vram__throughput.avg.pct_of_peak_sustained_elapsed | The GPU video memory.
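One common way to read these per-unit values is to sort them and look at the units with the highest SOL first, since those are running closest to their limit. The sketch below assumes the readings have been copied by hand into a dictionary; the helper `top_limiters` and all numbers are hypothetical, and only the metric names come from the table.

```python
# Hypothetical per-unit SOL readings (percent) for one range of API calls.
unit_sol = {
    "pd__sol_pct": 12.0,
    "sm__sol_pct": 47.5,
    "tex__sol_pct": 81.3,
    "ltc__sol_pct": 64.0,
    "crop__sol_pct": 22.1,
    "zrop__sol_pct": 9.4,
}


def top_limiters(sol_by_metric, count=3):
    """Return the `count` metrics with the highest SOL, highest first."""
    ranked = sorted(sol_by_metric.items(), key=lambda item: item[1], reverse=True)
    return ranked[:count]


for metric, pct in top_limiters(unit_sol, count=2):
    print(f"{metric}: {pct:.1f}%")
```

With these made-up readings, the texture unit and the L2 cache would be the first places to investigate.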

Streaming Multiprocessor
Stall reasons (smsp__warp_stall_*_pct)
  • long_scoreboard: Warps stalled waiting for a scoreboard dependency on L1TEX.
  • short_scoreboard: Warps stalled waiting for a scoreboard dependency on an MIO (memory input/output) operation. Examples include special math instructions and dynamic branching.
  • drain: Warps stalled after EXIT, waiting for all memory operations to complete so that the warp can be freed.
  • imc_miss: Warps stalled waiting for an immediate constant cache miss.
  • no_instructions: Warps waiting to be selected to fetch an instruction, or waiting on an instruction cache miss.
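The stall-reason metrics follow the smsp__warp_stall_*_pct naming pattern, with the reason name replacing the asterisk. The sketch below builds the full metric names for the reasons listed above and picks the dominant one from a set of readings. The helper names and the percentages are hypothetical; only the reason names and the naming pattern come from this page.

```python
STALL_REASONS = [
    "long_scoreboard",
    "short_scoreboard",
    "drain",
    "imc_miss",
    "no_instructions",
]


def stall_metric_name(reason: str) -> str:
    """Expand a stall reason into its smsp__warp_stall_*_pct metric name."""
    return f"smsp__warp_stall_{reason}_pct"


def dominant_stall(stall_pct_by_reason):
    """Return (reason, percent) for the stall reason with the highest value."""
    reason = max(stall_pct_by_reason, key=stall_pct_by_reason.get)
    return reason, stall_pct_by_reason[reason]


# Hypothetical readings (percent of stalled warps) for one draw call.
readings = {
    "long_scoreboard": 41.0,
    "short_scoreboard": 8.5,
    "drain": 1.2,
    "imc_miss": 3.0,
    "no_instructions": 5.7,
}

reason, pct = dominant_stall(readings)
print(f"{stall_metric_name(reason)} = {pct:.1f}%")
```

With these made-up readings the dominant reason is long_scoreboard, which would suggest the shader spends most of its stalled time waiting on L1TEX.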