NVIDIA
GeForce 256
This page is about GPUs from NVIDIA.
Maxwell - switch
2014
Architecture Overview
The Maxwell GPU is partitioned into multiple GPCs (Graphics Processing Clusters). Each GPC contains multiple SMs (Streaming Multiprocessors) and one raster engine. The SMs run the shader programs and the raster engine turns triangles into pixels.
The parts of the SM that run shaders are called cores, and each SM has many of them. Threads run in groups of 32, so a vertex shader, for example, works on 32 vertices at the same time. Each group is called a warp, and more than one warp can be active on the same cores at once. Each cycle the warp scheduler checks all the active warps to find one that is not stalled. A warp is stalled when it is waiting for something, for example a pixel shader waiting for a texture read to complete. The selected warp gets to execute some instructions, after which the warp scheduler might let another warp run. Switching between many warps is how the GPU works around the latency of certain operations, i.e. it does useful work even while part of its work is waiting. The number of warps active at the same time is called the SM occupancy. High occupancy gives the warp scheduler more warps to switch between, which helps it keep the cores as busy as possible.
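The latency hiding described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not how the real hardware behaves: every warp issues one instruction and then stalls for a fixed number of cycles, and a single scheduler picks the first warp that is ready. The function name and parameters are made up for this example.

```python
def simulate_utilization(num_warps, stall_cycles, total_cycles=1000):
    """Toy model: fraction of cycles in which some warp could issue.

    Assumption (not real hardware behaviour): each warp issues one
    instruction, then waits `stall_cycles` cycles before it is ready again.
    """
    ready_at = [0] * num_warps  # cycle at which each warp can issue again
    busy_cycles = 0
    for cycle in range(total_cycles):
        # The scheduler looks for the first warp that is not stalled.
        for warp in range(num_warps):
            if ready_at[warp] <= cycle:
                # Issue one instruction, then the warp stalls.
                ready_at[warp] = cycle + 1 + stall_cycles
                busy_cycles += 1
                break
        # If no warp was ready, the cycle is wasted.
    return busy_cycles / total_cycles


if __name__ == "__main__":
    for warps in (1, 5, 10, 20):
        print(warps, "warps:", simulate_utilization(warps, stall_cycles=9))
```

In this model, with a stall of 9 cycles, one warp keeps the cores busy only a tenth of the time, while ten or more warps are enough to fill every cycle. Real shaders mix instructions with very different latencies, so the number of warps needed in practice varies.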
The register file is 64K x 32 bits in size (256 KB) and provides the registers for each thread. The more registers a thread needs, the fewer warps can run at the same time.
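The effect of register use on the number of active warps can be estimated with a little arithmetic. The sketch below uses the register file size and warp size from the text. The limit of 64 warps per SM is an assumption added here, and the calculation ignores register allocation granularity and the other limits (shared memory, thread group sizes) that also affect occupancy.

```python
REGISTER_FILE_SIZE = 64 * 1024  # 32-bit registers per SM (from the text)
WARP_SIZE = 32                  # threads per warp (from the text)
MAX_WARPS_PER_SM = 64           # assumption: hardware warp limit per SM


def max_warps_by_registers(registers_per_thread):
    """Upper bound on active warps when only registers are the limit."""
    if registers_per_thread <= 0:
        raise ValueError("registers_per_thread must be positive")
    registers_per_warp = registers_per_thread * WARP_SIZE
    return min(MAX_WARPS_PER_SM, REGISTER_FILE_SIZE // registers_per_warp)


if __name__ == "__main__":
    for regs in (32, 64, 128, 255):
        print(regs, "registers per thread:", max_warps_by_registers(regs), "warps")
```

Under these assumptions a shader using 32 registers per thread can still fill all 64 warp slots, while one using 64 registers per thread is limited to 32 warps.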
SASS
The assembly code used by the SM is called SASS. It changes with each architecture, but it can still be useful to be able to read some of it.
Links
New GPU Features of NVIDIA's Maxwell Architecture - 2015
Life of a triangle - NVIDIA's logical pipeline - 2015
Don't be conservative with Conservative Rasterization - 2014
Performance Guidelines
GPU Architecture
Ampere (GeForce 3000) - 2020
NVIDIA Ampere Architecture In-Depth - 2020
Nvidia Ampere GA102 GPU Architecture whitepaper - 2020
NVIDIA A100 Tensor Core GPU Architecture whitepaper - 2020
Turing (GeForce 2000) - 2018
NVIDIA Turing Architecture In-Depth - 2018
Introduction to Turing Mesh Shaders - 2018
Tech Focus: Wolfenstein 2's Variable Rate Shading: Nvidia Turing Analysis! - 2018
Pascal (GeForce 1000) - 2016
Maxwell (GeForce 900) - 2014
Listed above
Kepler (GeForce 600-700) - 2012
Performance-Optimization-Guidelines-GPU-Architecture - 2013
Performance Optimization: Programming Guidelines and GPU Architecture Reasons Behind Them - 2013
GPU Performance Analysis and Optimization - 2012
Kepler Whitepaper - 2012
Fermi (GeForce 400-500) - 2010
GPU Computing: Past, Present & Future - 2011
Fast Tessellated Rendering on Fermi - 2010
Tesla (GeForce 8-9, 100-300) - 2006 : unified shaders
An Introduction to Modern GPU Architecture - 2008
GPU Programming Guide GeForce 8 and 9 Series - 2008
GeForce7 - 2005
GPU Programming Guide GeForce 7 - 2005
GeForce6 - 2004
Links
Thinking Parallel, Part I: Collision Detection on the GPU - 2018
Thinking Parallel, Part II: Tree Traversal on the GPU - 2018
Thinking Parallel, Part III: Tree Construction on the GPU - 2018
Nsight Systems Exposes New GPU Optimization Opportunities - 2018
The Peak-Performance-Percentage Analysis Method for Optimizing Any GPU Workload - 2018
Performance-Optimization-Guidelines-GPU-Architecture - 2013
Scalar-Vector GPU Architectures
Unifying Primary Cache, Scratch, and Register File Memories in a Throughput Processor - 2012
A history of NVidia Stream Multiprocessor - 2020
NVIDIA Ampere Architecture In-Depth - 2020
Nsight: The Most Important Ampere Tools In Your Utility Belt - 2020
Life of a triangle - NVIDIA's logical pipeline - 2015
NVIDIA Hopper Architecture In-Depth - 2022
Identifying Shader Limiters with the Shader Profiler in NVIDIA Nsight Graphics - 2022