GPU
Fun with triangles
The part of the hardware that creates graphics in some form or another. To render things with it, one often uses some form of Graphical API. This page focuses on how today's GPUs work; for past generations, look at the GPU History page. Here is a list of the major manufacturers of GPUs.
AMD - Desktop GPUs, current-gen consoles (XBO, PS4). AMD GPU list.
ARM - Mobile phones.
NVidia - Desktop GPUs, current-gen console (Switch). NVidia GPU list.
Intel - Built-in GPUs. Intel GPU list.
PowerVR - Mobile. PowerVR GPU list.
Overview
Note: This part is still being written, so most of it may be incorrect :)
To confuse everyone, including each other, the manufacturers all use different names for (almost) the same things in a GPU. So I use my own names here, and you can use the GPU Dictionary below to find out what each vendor calls each thing.
A GPU is made up of custom processors (GPUCores) that execute programs in the form of shaders. A GPUCore can execute all types of shaders, such as vertex shaders, pixel shaders or compute shaders.
Much of the work on a GPU is repetitive. The same vertex shader has to run on every vertex in a mesh, and every pixel in a triangle has to run the same pixel shader. The input for each one is different, but the code to run is the same. To take advantage of that, the GPUCores are grouped together and each group is controlled by a GPUCoreMaster. All the cores in a group run the same code and execute it in lock-step with each other. So if there are 32 cores in a group, it will run 32 pixel shader invocations or 32 vertex shader invocations at the same time. This is known as a SIMT (single instruction, multiple threads) architecture. If you only draw a cube with 8 vertices, only 8 cores in the group will do meaningful work.
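The cube example can be turned into a small calculation. The sketch below is only an illustration: the group width of 32 is taken from the example above (real hardware varies), and the function name wave_utilization is made up for this page.

```python
import math

GROUP_SIZE = 32  # assumed number of GPUCores per group, as in the example above


def wave_utilization(num_threads, group_size=GROUP_SIZE):
    """Return (groups_needed, fraction_of_cores_doing_useful_work)."""
    if num_threads <= 0:
        return 0, 0.0
    # A group always runs as a whole, so round up to full groups.
    groups = math.ceil(num_threads / group_size)
    useful_fraction = num_threads / (groups * group_size)
    return groups, useful_fraction


# The cube from the text: 8 vertices on a 32-wide group.
cube_groups, cube_fraction = wave_utilization(8)

# A larger mesh fills the groups much better.
mesh_groups, mesh_fraction = wave_utilization(1000)
```

With these assumptions the cube keeps only a quarter of one group busy, while a mesh with 1000 vertices fills 32 groups almost completely.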
While executing, the code may hit a branch, such as an if statement, that sends the cores down different paths. This is known as divergence. The cores keep running in lock-step, so every core in the group steps through the code inside the if statement; the cores that failed the condition are masked out and their results are thrown away. When the group exits the if statement, the cores converge and the masked-out cores are activated again. Divergence lowers performance because some of the cores do wasted work that is thrown away in the end.
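A rough way to see the cost of divergence is to simulate one group stepping through a branch with a mask. This is a toy model, not how any particular GPU is implemented: the function name run_group_if_else is invented here, an else side is added to make the cost easier to see, and a path that no core takes is assumed to be skipped.

```python
def run_group_if_else(inputs, condition, then_op, else_op):
    """Simulate one lock-step group executing `if condition: then_op else: else_op`.

    Returns (results, issued_slots, useful_slots), where a slot is one core
    spending one step on one side of the branch.
    """
    width = len(inputs)
    mask = [condition(x) for x in inputs]  # per-core outcome of the branch test
    results = list(inputs)
    issued = 0
    useful = 0

    # "then" side: the whole group steps through it if any core needs it.
    if any(mask):
        issued += width
        for core, x in enumerate(inputs):
            if mask[core]:  # masked-out cores discard their result
                results[core] = then_op(x)
                useful += 1

    # "else" side: same idea with the mask inverted.
    if not all(mask):
        issued += width
        for core, x in enumerate(inputs):
            if not mask[core]:
                results[core] = else_op(x)
                useful += 1

    return results, issued, useful


# Divergent case: even inputs take the "then" side, odd inputs the "else" side.
divergent = run_group_if_else(
    list(range(8)), lambda x: x % 2 == 0, lambda x: x * 10, lambda x: x + 1
)

# Uniform case: every core agrees, so only one side is executed.
uniform = run_group_if_else(
    list(range(8)), lambda x: True, lambda x: x * 10, lambda x: x + 1
)
```

In the divergent case the group spends 16 slots to produce 8 useful results, while the uniform case spends 8 slots for the same amount of useful work.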
A stall is when a core has to wait before it can run the next instruction. A common example is sampling a texture and waiting for the result to return from memory. As the cores in a group run in lock-step, they all have to wait until every core has its result back. This is solved with a form of threading: while waiting for the result of an operation, the group switches to another thread. As the cores run in lock-step, all the threads in the group need to be switched out at the same time. So each core runs a thread, and all the threads running on a group in lock-step with each other are called a Wave. When a wave stalls, the GPUCoreMaster can switch to one of the other waves that are resident on the group.
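The effect of switching waves can be sketched with a tiny cycle counter. Everything here is a simplifying assumption for illustration: the function name simulate is invented, an instruction is described only by its latency in cycles, the group issues at most one instruction per cycle, and the scheduler simply picks the first wave that is ready.

```python
def simulate(num_waves, program):
    """Run `num_waves` waves through the same program on one group.

    `program` is a list of instruction latencies in cycles (1 for simple
    math, larger for something like a texture fetch).
    Returns (total_cycles, idle_cycles).
    """
    pc = [0] * num_waves        # next instruction index for each wave
    ready_at = [0] * num_waves  # cycle at which each wave may issue again
    cycle = 0
    idle = 0

    while any(p < len(program) for p in pc):
        for w in range(num_waves):
            if pc[w] < len(program) and ready_at[w] <= cycle:
                # Issue this wave's next instruction; the wave is stalled
                # until the result comes back.
                ready_at[w] = cycle + program[pc[w]]
                pc[w] += 1
                break
        else:
            idle += 1  # every unfinished wave is stalled this cycle
        cycle += 1

    # The last issued instruction may still be in flight.
    return max(cycle, max(ready_at)), idle


# One math instruction, one 4-cycle "texture fetch", one math instruction.
program = [1, 4, 1]
one_wave = simulate(1, program)
two_waves = simulate(2, program)
```

In this model a single wave needs 6 cycles and leaves the cores idle for 3 of them. Two waves finish twice the work in 8 cycles with only 2 idle cycles, because the second wave runs while the first one waits for its fetch.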
GPU Dictionary
Thread - thread (NVIDIA) / work-item (AMD)
A thread is a single invocation of a program on the GPU. It can be a pixel shader or a vertex shader for example.
GPUCore - CUDA Core (NVIDIA) / Processing Element (AMD)
Wave - warp (NVIDIA) / wavefront (AMD)
Threads are executed in a group called a wave and all the threads in the wave execute the same instruction in lock-step.
GPUCoreMaster - Streaming Multiprocessor (NVIDIA) / Compute Unit (AMD)
Controls a group of GPUCores and runs waves on them.
Reference
GPU BCn decoding - 2022
Improving GPU Memory Oversubscription Performance - 2021
Understanding Graphs in GPUView and RGP - 2021
Gentle introduction to GPUs inner workings - 2021
Shaded vertex reuse on modern gpus - 2021
Branching on a GPU - 2021
GPU Optimization for GameDev - 2021
GPU architecture types explained - 2021
Understanding GPU caches - 2021
DirectX Raytracing (DXR) Functional Spec - 2021
Swapchains and frame pacing - 2021
Hash Functions for GPU Rendering - 2021
Gentle introduction to GPUs inner workings - 2020
Memory types of discrete GPUs - 2020
GPU resources - 2020
Does subgroup/wave size matter? - 2020
Loads, Stores, Passes, and Advanced GPU Pipelines - 2020
Five years of GPU DB - 2020
Cyberpunk 2077 PC: What Does Ray Tracing Deliver... And Is It Worth It? - 2020
The compositor is evil - 2020
GPU architecture resources - 2020
Capturing GPU Work - 2020
GPU Captures: How we support placed and reserved resources - 2020
GPU Architectures - 2019
Triangles are precious - 2019
How does a GPU shader core work? - 2019
THE STORY OF THE 3DFX VOODOO1 - 2019
Breaking Down Barriers - 2018
Compute Shaders: Optimize your engine using compute - 2018
Anteru: Compute-shaders - 2018: Intro, More, Even more
Intro to GPU Scalarization - 2018
Revisiting The Vertex Cache: Understanding and Optimizing Vertex Processing on the modern GPU - 2018
How does a GPU shader core work? - 2018
Intro to GPU Scalarization - 2018 - Part 1 & Part 2
Where do I start graphics programming? - 2017
Wave Programming in D3D12 and Vulkan - 2017
Tiled hardware (speculations) - 2017
Understanding Latency Hiding on GPUs - 2016
GPU Programming - 2016
Uniform buffers vs texture buffers: The 2015 edition - 2015
Visual Computing Systems - 2014
A trip through the Graphics Pipeline Index - 2011
How the rasterization process works, the RasterizerState object - 2011
A trip through the Graphics Pipeline - 2011
From Shader Code to a Teraflop: How GPU Shader Cores Work - 2010
How the GPU works - appendix A - 2009
The Latest Graphics Processing Units - 2009
Scalable Multi Agent Simulation on the GPU - 2009
Bullet: A Case Study in Optimizing Physics Middleware for the GPU - 2009
Next-Generation Graphics DRAM: Challenges and Opportunities - 2009
GPU Pipeline for Everyone - 2008
GPU versus CPU - 2008
A Closer look at GPUs - 2008
How the GPU works - 2008. Part I, Part II and Part III.
[Mobile] Graphics Hardware - 2007
3D Pipeline Of SM3/DX9 GPUs - 2006
Unified Radeon™ GPU Profiler and Radeon™ Memory Visualizer usage with Radeon™ Developer Panel 2.1
UE5 Lumen Implementation Analysis