Floating-point

Will it float?

A way for computers to represent real numbers. The standard format is IEEE 754, and it is the one this page cares about. The data in a number is split into three parts:

sign: Always a single bit.

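exponent: An unsigned integer stored with a bias. It selects the power of two that scales the number.

significand: The fraction bits, also called the mantissa. For normal numbers a leading 1 bit is implied and not stored.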
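To make the three fields concrete, here is a minimal Python sketch that pulls them out of a 32-bit single-precision number with the standard `struct` module. Single precision uses 1 sign bit, 8 exponent bits (bias 127) and 23 significand bits. The function names `decompose_float32` and `reconstruct_normal` are illustrative only, not from any library. The reconstruction formula only holds for normal numbers, not for zero, infinity, NaN or subnormals.

```python
import struct

def decompose_float32(x: float) -> tuple[int, int, int]:
    """Split x (rounded to IEEE 754 single precision) into its raw bit fields."""
    # Pack as a big-endian 32-bit float, then reinterpret the 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, stored with a bias of 127
    significand = bits & 0x7FFFFF    # 23 bits; the leading 1 of a normal number is implied
    return sign, exponent, significand

def reconstruct_normal(sign: int, exponent: int, significand: int) -> float:
    """Rebuild the value of a normal single-precision number from its fields."""
    # value = (-1)^sign * (1 + significand / 2^23) * 2^(exponent - 127)
    return (-1) ** sign * (1 + significand / 2**23) * 2.0 ** (exponent - 127)

s, e, m = decompose_float32(-6.5)
print(s, e, m)                       # 1 129 5242880
print(reconstruct_normal(s, e, m))   # -6.5
```

Here -6.5 is -1.625 × 2^2, so the stored exponent is 2 + 127 = 129 and the significand holds 0.625 × 2^23.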

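There are three common sizes of numbers.

Half: 16 bits (1 sign, 5 exponent, 10 significand).

Single (float): 32 bits (1 sign, 8 exponent, 23 significand).

Double (double): 64 bits (1 sign, 11 exponent, 52 significand).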
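The practical difference between the sizes is precision. The sketch below is a minimal illustration, assuming Python's `struct` format codes `'e'` (IEEE 754 half, 16-bit), `'f'` (single, 32-bit) and `'d'` (double, 64-bit). It rounds 0.1 to each format and measures the gap between 1.0 and the next representable value (machine epsilon) by stepping the bit pattern of 1.0 up by one. The helper names `round_trip` and `epsilon` are hypothetical.

```python
import struct

# struct format codes for IEEE 754 binary formats.
FORMATS = {"half": "e", "single": "f", "double": "d"}

def round_trip(x: float, code: str) -> float:
    """Round x to the given format, then widen it back to a Python float (a double)."""
    return struct.unpack("<" + code, struct.pack("<" + code, x))[0]

def epsilon(code: str) -> float:
    """Gap between 1.0 and the next representable value in the given format."""
    size = struct.calcsize("<" + code)
    # Reinterpret the bytes of 1.0 as an integer, step to the next bit pattern, convert back.
    bits = int.from_bytes(struct.pack("<" + code, 1.0), "little")
    next_up = struct.unpack("<" + code, (bits + 1).to_bytes(size, "little"))[0]
    return next_up - 1.0

for name, code in FORMATS.items():
    print(f"{name:>6}: {struct.calcsize('<' + code) * 8} bits, "
          f"0.1 -> {round_trip(0.1, code)!r}, epsilon = {epsilon(code)!r}")
```

Epsilon comes out as 2^-10, 2^-23 and 2^-52 respectively, one unit in the last place of the 10-, 23- and 52-bit significands.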

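Special values:

- Zero: Signed, so both +0 and -0 exist. They compare equal.

- Infinity: Exponent bits all ones, significand zero. Also signed.

- NaN ("not a number"): Exponent bits all ones, significand non-zero. A NaN compares unequal to every value, including itself.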
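The special values are identified purely by bit patterns, which the following minimal Python sketch checks for single precision. It assumes the 1/8/23 single-precision layout. An all-ones exponent means infinity (zero significand) or NaN (non-zero significand), and an all-zero exponent and significand means zero. The function `special_kind` is a hypothetical helper, not a standard API. The sketch also shows the comparison behaviour of signed zero and NaN using the standard `math` module.

```python
import math
import struct
from typing import Optional

def special_kind(x: float) -> Optional[str]:
    """Classify x (rounded to IEEE 754 single precision) as a special value, if it is one."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    exponent = (bits >> 23) & 0xFF
    significand = bits & 0x7FFFFF
    if exponent == 0xFF:
        # All-ones exponent: infinity if the significand is zero, NaN otherwise.
        return "nan" if significand else "infinity"
    if exponent == 0 and significand == 0:
        # All-zero exponent and significand: +0 or -0 depending on the sign bit.
        return "zero"
    return None  # an ordinary finite, non-zero value

print(special_kind(-0.0), special_kind(-math.inf),
      special_kind(math.inf - math.inf), special_kind(1.5))
# zero infinity nan None

print(0.0 == -0.0)                  # True: the two zeros compare equal...
print(math.copysign(1.0, -0.0))     # -1.0: ...but the sign bit is still there
print(1.0 / math.inf)               # 0.0
nan = math.nan
print(nan == nan, math.isnan(nan))  # False True: NaN is unequal even to itself
```

Because `nan == nan` is false, `math.isnan` (or the `x != x` trick) is the reliable way to test for NaN.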

Reference

Myths About Floating-Point Numbers - 2021

Making floating point numbers smaller - 2018

Reverse Z Cheat Sheet - 2018

Fixing Camera Shake on Single Precision GPUs - 2018

Small float formats – R11G11B10F precision - 2017

Floating Point Visually Explained - 2017

Demystifying Floating Point Precision - 2017

Learning To Wrangle Half-Floats - 2016

Five Tips for Floating Point Programming - 2014

Floating Point - Theory and Practice - 2013

Floating-Point Determinism - 2013

Floating-Point Formats Cheatsheet - 2013

Exceptional Floating Point - 2012

You're Going To Have To Think! - 2010

Floating Point Determinism - 2010

Quantizing floats - 2010

Visualizing Floats - 2009

Floating Point Fun and Frolics - 2009

IEEE floating-point exceptions in C++ - 2009

Anatomy of a floating point number - 2009

Floating point numbers are a leaky abstraction - 2009

Consistency: how to defeat the purpose of IEEE floating point - 2008

Lossless Compression of Floating-Point Geometry - 2004

What Every Computer Scientist Should Know About Floating-Point Arithmetic - 1991

FP Exceptions
