Floating-point
Will it float?
A way for computers to represent real numbers. The standard format is IEEE 754, and it is the one this page cares about. The bits of a number are split into three parts.
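sign: Always a single bit.
exponent: The power of two that scales the number, stored with a bias.
significand: The significant digits of the number (also called the mantissa). For normal numbers the leading 1 is implied rather than stored.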
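The three-part split can be made concrete with a small sketch. It assumes the single-precision layout (1 sign bit, 8 exponent bits, 23 significand bits); the helper name `decompose_float32` is made up for this illustration and is not part of any library.

```python
import struct

def decompose_float32(value):
    """Round value to IEEE 754 single precision and return its three bit fields."""
    # Reinterpret the 4 bytes of the float as an unsigned 32-bit integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                # 1 bit: 0 = positive, 1 = negative
    exponent = (bits >> 23) & 0xFF   # 8 bits, stored with a bias of 127
    significand = bits & 0x7FFFFF    # 23 stored bits, the leading 1 is implied
    return sign, exponent, significand

# -6.5 is -1.625 * 2**2, so the stored exponent is 2 + 127 = 129
# and the stored fraction is 0.625 * 2**23 = 5242880.
print(decompose_float32(-6.5))  # (1, 129, 5242880)
```

The same approach should work for doubles by switching to the `"d"` and `"Q"` format codes and adjusting the shifts and masks to 11 exponent bits and 52 significand bits.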
There are three common sizes of numbers.
Half (half): 16 bits.
Single (float): 32 bits.
Double (double): 64 bits.
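The practical difference between the sizes is how closely each one can approximate a given value. The sketch below is only an illustration: it stores the same number in each width using Python's `struct` format codes (`"e"`, `"f"`, `"d"`) and reads it back. The 16-bit half format is included for comparison, and `round_trip` is a hypothetical helper name.

```python
import struct

def round_trip(value, code):
    """Store value in the given struct float format and read it back."""
    fmt = "<" + code
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

# struct codes: "e" = half (16 bits), "f" = single (32 bits), "d" = double (64 bits)
for name, code in (("half", "e"), ("single", "f"), ("double", "d")):
    bits = struct.calcsize("<" + code) * 8
    # 0.1 has no exact binary representation, so each width keeps a
    # different approximation of it.
    print(name, bits, repr(round_trip(0.1, code)))
```

Python's own `float` is a double, so only the `"d"` round trip returns the value unchanged. Numbers such as 0.5 that are exact in binary survive all three widths.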
Special values:
- Zero
- Infinity
- NaN
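The special values behave differently from ordinary numbers, which a short sketch can show. It assumes Python floats (IEEE 754 doubles on common platforms) and uses a hypothetical helper, `float32_bits`, to inspect the single-precision bit pattern.

```python
import math
import struct

def float32_bits(value):
    """Return the single-precision bit pattern of value as an integer."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    return bits

# Zero is signed: +0.0 and -0.0 compare equal but differ in the sign bit.
print(0.0 == -0.0)                    # True
print(math.copysign(1.0, -0.0))       # -1.0
print(hex(float32_bits(-0.0)))        # 0x80000000

# Infinity: exponent bits all ones, significand bits all zero.
print(hex(float32_bits(math.inf)))    # 0x7f800000
print(math.inf > 1e308)               # True

# NaN: exponent bits all ones, significand non-zero. It never equals itself.
print(math.nan == math.nan)           # False
print(math.isnan(math.inf - math.inf))  # True
```

Because NaN compares unequal to everything, including itself, checks should go through `math.isnan` rather than `==`.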
References
Myths About Floating-Point Numbers - 2021
Making floating point numbers smaller - 2018
Reverse Z Cheat Sheet - 2018
Fixing Camera Shake on Single Precision GPUs - 2018
Small float formats – R11G11B10F precision - 2017
Floating Point Visually Explained - 2017
Demystifying Floating Point Precision - 2017
Learning To Wrangle Half-Floats - 2016
Five Tips for Floating Point Programming - 2014
Floating Point - Theory and Practice - 2013
Floating-Point Determinism - 2013
Floating-Point Formats Cheatsheet - 2013
Exceptional Floating Point - 2012
You're Going To Have To Think! - 2010
Floating Point Determinism - 2010
Quantizing floats - 2010
Visualizing Floats - 2009
Floating Point Fun and Frolics - 2009
IEEE floating-point exceptions in C++ - 2009
Anatomy of a floating point number - 2009
Floating point numbers are a leaky abstraction - 2009
Consistency: how to defeat the purpose of IEEE floating point - 2008
Lossless Compression of Floating-Point Geometry - 2004
What Every Computer Scientist Should Know About Floating-Point Arithmetic - 1991