This page is about GPUs from NVIDIA.
Maxwell - switch
The Maxwell GPU is partitioned into multiple GPCs (Graphics Processing Clusters). Each GPC contains multiple SMs (Streaming Multiprocessors) and one raster engine. The SMs run the shader programs and the raster engine turns triangles into pixels.
The units in the SM that run shaders are called cores, and each SM has many of them. They execute in groups of 32 threads, so a vertex shader, for example, works on 32 vertices at the same time. Each group is called a warp, and more than one warp can be active on the same cores at once. Each cycle the Warp Scheduler checks the active warps to find one that is not stalled. A warp is stalled when it is waiting for something, for example a pixel shader waiting for a texture read to complete. The selected warp gets to execute some instructions, and then the Warp Scheduler might let another warp run. Switching between many warps is how the GPU works around the latency of certain operations, i.e. it does useful work even while part of its work is waiting. The number of warps active at the same time is called the SM occupancy. High occupancy gives the Warp Scheduler more warps to switch between and lets it keep the cores as busy as possible.
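The latency-hiding effect described above can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions that are not from the hardware documentation: a single scheduler, one instruction issued per cycle, and every instruction followed by a fixed stall. The function name and parameters are made up for illustration.

```python
def simulate_utilization(num_warps, stall_cycles, total_cycles=10_000):
    """Toy model: each warp issues one instruction, then stalls for stall_cycles.

    Each cycle the scheduler picks the first warp that is not stalled.
    Returns the fraction of cycles in which some warp issued an instruction.
    """
    ready_at = [0] * num_warps  # cycle at which each warp can issue again
    busy_cycles = 0
    for cycle in range(total_cycles):
        for warp in range(num_warps):
            if ready_at[warp] <= cycle:
                # This warp issues now, then waits (e.g. on a texture read).
                ready_at[warp] = cycle + 1 + stall_cycles
                busy_cycles += 1
                break  # only one warp issues per cycle in this model
    return busy_cycles / total_cycles


if __name__ == "__main__":
    for warps in (1, 5, 10, 20):
        print(warps, simulate_utilization(warps, stall_cycles=9))
```

In this model a single warp keeps the cores busy only a tenth of the time, while ten or more warps are enough to cover a 9-cycle stall completely. Real stalls vary in length and real SMs have more than one scheduler, so the numbers are only indicative.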
The SM's register file holds 64k 32-bit registers and provides the registers for each thread. The more registers a thread needs, the fewer warps can run at the same time.
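The trade-off between registers per thread and active warps can be worked out with simple arithmetic. The sketch below assumes a limit of 64 resident warps per SM for Maxwell and ignores the hardware's register allocation granularity, so it gives an upper bound rather than an exact figure. The function and constant names are illustrative.

```python
WARP_SIZE = 32                  # threads per warp
REGISTERS_PER_SM = 64 * 1024    # 64k 32-bit registers, as stated above
MAX_WARPS_PER_SM = 64           # assumption: Maxwell's resident-warp limit per SM


def max_warps_by_registers(regs_per_thread):
    """Upper bound on active warps per SM when only registers are the limit."""
    regs_per_warp = regs_per_thread * WARP_SIZE
    return min(MAX_WARPS_PER_SM, REGISTERS_PER_SM // regs_per_warp)


if __name__ == "__main__":
    for regs in (16, 32, 64, 128, 255):
        print(regs, "registers/thread ->", max_warps_by_registers(regs), "warps")
```

Under these assumptions, a shader using up to 32 registers per thread can still reach the full 64 warps, while doubling the register count to 64 halves the bound to 32 warps.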
The assembly code used by the SM is called SASS. It changes with each architecture, but it can be useful to read some of it.
Ampere (GeForce 3000) - 2020
Turing (GeForce 2000) - 2018
Pascal (GeForce 1000) - 2016
Maxwell (GeForce 900) - 2014
Kepler (GeForce 600-700) - 2012
Kepler Whitepaper - 2012
Fermi (GeForce 400-500) - 2010
Tesla (GeForce 8-9, 100-300) - 2006 : unified shaders
GeForce7 - 2005
GeForce6 - 2004