NVidia

GeForce 256

This page is about GPUs from NVIDIA.

Maxwell - switch

2014

Architecture Overview

The Maxwell GPU is partitioned into multiple GPCs (Graphics Processing Clusters). Each GPC contains multiple SMs (Streaming Multiprocessors) and one raster engine. The SMs run the shader programs and the raster engine turns triangles into pixels.

The parts of the SM that run shaders are called cores, and each SM has many of them. Threads are run in groups of 32, so a vertex shader, for example, works on 32 vertices at the same time. Each group is called a warp, and more than one warp can be active on the same cores at once. Each cycle the Warp Scheduler checks all the active warps to find one that is not stalled. A warp is stalled when it is waiting for something, for example a pixel shader waiting for a texture read to complete. The selected warp gets to perform some instructions, and then the Warp Scheduler might let another warp run. Switching between many warps is how the GPU works around the latency of certain operations, i.e. it does some useful work even while part of its work is waiting. The number of warps active at the same time is called the SM occupancy. High occupancy gives the Warp Scheduler more warps to switch between, which lets the SM stay as busy as possible.
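The latency hiding described above can be illustrated with a toy model. This is a minimal sketch, not how the real hardware is implemented: it assumes each warp issues one instruction and then stalls for a fixed number of cycles, and that the scheduler simply picks the first warp that is not stalled. The function name `issue_utilization` and its parameters are made up for this example.

```python
def issue_utilization(num_warps, stall_latency, cycles):
    """Fraction of cycles in which the toy scheduler could issue an instruction.

    Assumption: every instruction is followed by a stall of `stall_latency`
    cycles (think of each one as a texture read). Real warps mix stalling
    and non-stalling instructions.
    """
    # Cycle at which each warp is ready to issue again.
    ready_at = [0] * num_warps
    issued = 0
    for cycle in range(cycles):
        # Look for the first warp that is not stalled this cycle.
        for warp in range(num_warps):
            if ready_at[warp] <= cycle:
                issued += 1
                # The warp now waits for its result before it can issue again.
                ready_at[warp] = cycle + 1 + stall_latency
                break
        # If no warp was ready, the cycle is wasted.
    return issued / cycles


if __name__ == "__main__":
    for warps in (1, 2, 4):
        print(warps, "warps:", issue_utilization(warps, stall_latency=3, cycles=8))
```

With a stall of 3 cycles, one warp keeps the cores busy a quarter of the time, while four warps are enough to issue every cycle in this model. More active warps give the scheduler more chances to find one that is ready.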

The register file of each SM is 64K x 32 bits in size and provides the registers for each thread. The more registers a thread needs, the fewer warps can run at the same time.
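The relation between register use and the number of active warps can be worked out with simple arithmetic. This is a rough sketch: the register file size and the warp size come from the text above, while the cap of 64 resident warps per SM is an assumption taken from NVIDIA's published Maxwell figures. The sketch ignores the hardware's register allocation granularity and other limits such as shared memory. The function names are made up for this example.

```python
REGISTER_FILE_SIZE = 64 * 1024  # 32-bit registers in the register file (from the text)
WARP_SIZE = 32                  # threads per warp (from the text)
MAX_WARPS_PER_SM = 64           # assumption: cap on resident warps per Maxwell SM


def max_resident_warps(registers_per_thread):
    """Warps that fit in the register file, ignoring allocation granularity."""
    if registers_per_thread <= 0:
        raise ValueError("registers_per_thread must be positive")
    registers_per_warp = registers_per_thread * WARP_SIZE
    return min(MAX_WARPS_PER_SM, REGISTER_FILE_SIZE // registers_per_warp)


def occupancy(registers_per_thread):
    """Register-limited occupancy as a fraction of the assumed warp cap."""
    return max_resident_warps(registers_per_thread) / MAX_WARPS_PER_SM


if __name__ == "__main__":
    for regs in (32, 64, 128):
        print(regs, "registers/thread:", max_resident_warps(regs), "warps")
```

Under these assumptions, a shader using 32 registers per thread can still fill the SM, while doubling that to 64 halves the number of warps available to the Warp Scheduler.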

SASS

The assembly code used by the SM is called SASS. It changes with each architecture, but it can still be useful to be able to read some of it.

SASS Instruction Set

Links

New GPU Features of NVIDIA's Maxwell Architecture - 2015

Life of a triangle - NVIDIA's logical pipeline - 2015

Don't be conservative with Conservative Rasterization - 2014

Maxwell Whitepaper

Performance Guidelines

GPU Architecture

Ampere (GeForce 3000) - 2020

NVIDIA Ampere Architecture In-Depth - 2020

Nvidia Ampere GA102 GPU Architecture whitepaper - 2020

NVIDIA A100 Tensor Core GPU Architecture whitepaper - 2020

Turing (GeForce 2000) - 2018

NVIDIA Turing Architecture In-Depth - 2018

Introduction to Turing Mesh Shaders - 2018

Tech Focus: Wolfenstein 2's Variable Rate Shading: Nvidia Turing Analysis! - 2018

Pascal (GeForce 1000) - 2016

Maxwell (GeForce 900) - 2014

Listed above

Kepler (GeForce 600-700) - 2012

Performance-Optimization-Guidelines-GPU-Architecture - 2013

Performance Optimization:Programming Guidelines and GPU Architecture Reasons Behind Them - 2013

GPU Performance Analysis and Optimization - 2012

Kepler Whitepaper - 2012

Fermi (GeForce 400-500) - 2010

GPU Computing: Past, Present & Future - 2011

Fast Tessellated Rendering on Fermi - 2010

Fermi Whitepaper

Tesla (GeForce 8-9, 100-300) - 2006 : unified shaders

nVIDIA GPU architecture

An Introduction to Modern GPU Architecture - 2008

GPU Programming Guide GeForce 8 and 9 Series - 2008

GeForce7 - 2005

GPU Programming Guide GeForce 7 - 2005

GeForce6 - 2004

Links

Thinking Parallel, Part I: Collision Detection on the GPU - 2018

Thinking Parallel, Part II: Tree Traversal on the GPU - 2018

Thinking Parallel, Part III: Tree Construction on the GPU - 2018

Nsight Systems Exposes New GPU Optimization Opportunities - 2018

The Peak-Performance-Percentage Analysis Method for Optimizing Any GPU Workload - 2018

Performance-Optimization-Guidelines-GPU-Architecture - 2013

Scalar-Vector GPU Architectures

Unifying Primary Cache, Scratch, and Register File Memories in a Throughput Processor - 2012


A history of NVidia Stream Multiprocessor - 2020

NVIDIA Ampere Architecture In-Depth - 2020

Nsight: The Most Important Ampere Tools In Your Utility Belt - 2020

Life of a triangle - NVIDIA's logical pipeline - 2015

NVIDIA Hopper Architecture In-Depth - 2022

Identifying Shader Limiters with the Shader Profiler in NVIDIA Nsight Graphics - 2022