Floating-point
Will it float?
A way for computers to represent real numbers. The standard format is IEEE 754, and it's the one this page cares about. The data in a number is split into three parts:
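- sign: Always a single bit.
- exponent: The power of two that scales the number, stored with a bias.
- significand: The fractional digits of the number, also called the mantissa.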
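The three-part split can be seen by reinterpreting the bits of a number as an integer. The following is a minimal sketch for the 32-bit single format only; the helper name `decompose_float32` is made up for this page, and the field widths (1, 8 and 23 bits) and the exponent bias of 127 come from IEEE 754 single precision.

```python
import struct

def decompose_float32(value):
    """Split a value (rounded to single precision) into its three bit fields.

    Hypothetical helper for illustration, not part of any library.
    """
    # Pack as a 32-bit float, then read the same 4 bytes back as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, stored with a bias of 127
    significand = bits & 0x7FFFFF    # 23 bits, the fraction without the implicit leading 1
    return sign, exponent, significand

# -6.5 is -1.625 * 2**2, so the unbiased exponent is 2.
sign, exponent, significand = decompose_float32(-6.5)
print(sign, exponent - 127, hex(significand))  # 1 2 0x500000
```

The fraction `0x500000` is binary `101` followed by twenty zeros, which is the `.625` part of `1.625`.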
Common sizes of numbers include:
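- Single (float): 32 bits, with 1 sign bit, 8 exponent bits and 23 significand bits.
- Double (double): 64 bits, with 1 sign bit, 11 exponent bits and 52 significand bits.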
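The practical difference between the two sizes is how many values they can tell apart. A rough sketch, assuming Python's `float` is an IEEE 754 double (true on common platforms) and using a made-up helper `round_to_single` to emulate the 32-bit type:

```python
import struct

def round_to_single(value):
    """Round a Python float (a double) to the nearest single-precision value.

    Hypothetical helper for illustration: packs to 4 bytes and unpacks again.
    """
    return struct.unpack("<f", struct.pack("<f", value))[0]

# Storage sizes in bits of the two formats.
print(struct.calcsize("<f") * 8, struct.calcsize("<d") * 8)  # 32 64

# 2**24 + 1 fits exactly in a double but not in a single.
print(round_to_single(16777217.0))  # 16777216.0

# 0.1 is rounded differently at each size, so the two results differ.
print(round_to_single(0.1) == 0.1)  # False
```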
Special values:
- Zero
- Infinity
- NaN
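The special values behave differently from ordinary numbers in comparisons and arithmetic. A short sketch of a few of the rules, using only Python's built-in `float` and the standard `math` module (the variable names are arbitrary):

```python
import math

inf = float("inf")
nan = float("nan")

# Zero is signed: the two zeros compare equal but keep distinct sign bits.
print(0.0 == -0.0)               # True
print(math.copysign(1.0, -0.0))  # -1.0

# Infinity absorbs finite values; infinity minus infinity has no answer.
print(inf + 1.0 == inf)          # True
print(math.isnan(inf - inf))     # True

# NaN is not equal to anything, including itself, so test with isnan.
print(nan == nan)                # False
print(math.isnan(nan))           # True
```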
Reference
- Small float formats – R11G11B10F precision - 2017
- Floating Point Visually Explained - 2017
- Learning To Wrangle Half-Floats - 2016
- Five Tips for Floating Point Programming - 2014
- Floating Point - Theory and Practice - 2013
- Floating-Point Determinism - 2013
- Exceptional Floating Point - 2012
- You're Going To Have To Think! - 2010
- Floating Point Determinism - 2010
- Visualizing Floats - 2009
- Floating Point Fun and Frolics - 2009
- IEEE floating-point exceptions in C++ - 2009
- Anatomy of a floating point number - 2009
- Floating point numbers are a leaky abstraction - 2009
- Consistency: how to defeat the purpose of IEEE floating point - 2008
- Lossless Compression of Floating-Point Geometry - 2004
- What Every Computer Scientist Should Know About Floating-Point Arithmetic - 1991
- FP Exceptions
Floating-Point Formats Cheatsheet
http://asawicki.info/news_1541_floating-point_formats_cheatsheet.html
Making floating point numbers smaller
Fixing Camera Shake on Single Precision GPUs
Demystifying Floating Point Precision - 2017
https://blog.demofox.org/2017/11/21/floating-point-precision/
Quantizing floats
https://zeux.io/2010/12/14/quantizing-floats/