Half-precision floating-point number

Half-precision floating-point number
Term name	Half-precision floating-point number (半精度浮点数)
English name	half-precision floating-point type
Aliases	half-precision floating-point format, 半精度浮点型, 半精浮点, half-precision float, float16, FP16, binary16, 16位浮点数, 16-bit floating-point format

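A half-precision floating-point number (or half-precision floating-point type) is a floating-point data type that is 16 bits long and generally follows the binary16 format defined in IEEE 754. Because of its length it is called float16 or binary16, and sometimes half.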

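Because its significand is so short, half precision is not widely used in general-purpose computation. It is used mainly for pixel colors in computer graphics, and also in machine learning (training deep models), sensor data storage, and embedded systems. These settings either tolerate low precision while needing a wide range of values, or place a premium on storage space, so the format suits them well.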

Definition

The half-precision floating-point type, also called float16, FP16, or binary16, is a floating-point type that is 16 bits long and conforms to the binary16 floating-point format of IEEE 754.

The 16 bits consist of 1 sign bit, 5 exponent bits, and 10 fraction (trailing significand) bits.
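The field layout above can be illustrated with a minimal Python sketch. It relies on the standard-library `struct` format code `e`, which encodes a value as IEEE 754 binary16; the helper name `float16_fields` is hypothetical and introduced only for this example.

```python
import struct

def float16_fields(x: float) -> tuple:
    """Return (sign, exponent_field, fraction_field) of x encoded as binary16.

    Hypothetical helper for illustration; not part of any library.
    """
    # Pack x as a little-endian binary16, then reread the same two bytes
    # as an unsigned 16-bit integer to get at the raw bit pattern.
    (bits,) = struct.unpack("<H", struct.pack("<e", x))
    sign = bits >> 15               # top 1 bit
    exponent = (bits >> 10) & 0x1F  # next 5 bits (biased exponent)
    fraction = bits & 0x3FF         # low 10 bits
    return sign, exponent, fraction

print(float16_fields(1.0))   # (0, 15, 0)
print(float16_fields(-2.5))  # (1, 16, 256)
```

For 1.0 the stored exponent field equals the bias 15 and the fraction is zero; for -2.5 = -1.25 × 2^1 the sign bit is set, the exponent field is 16, and the fraction is 0.25 × 1024 = 256.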

Range

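The exponent field is 5 bits wide, so the exponent bias is [math]\displaystyle{ b=2^{5-1}-1=15 }[/math] . The exponent range is [math]\displaystyle{ e_\min = 1-b=-14, e_\max=b=15 }[/math] . The fraction field is 10 bits wide, which with the hidden bit gives a precision of [math]\displaystyle{ p=11 }[/math] , so the fraction has a resolution of [math]\displaystyle{ 2^{-10} }[/math] and takes values between 0 and [math]\displaystyle{ 1-2^{-10} }[/math] .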

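The significand therefore has 11 binary digits of precision (including the hidden bit), which corresponds to only 3 to 4 significant decimal digits ([math]\displaystyle{ \lg 2^{11} \approx 3.3 }[/math]).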
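What this precision means in practice can be seen in a short Python sketch. It assumes the standard-library `struct` format code `e` (binary16) for rounding; the helper name `to_float16` and the variable names are hypothetical.

```python
import math
import struct

def to_float16(x: float) -> float:
    """Round x to the nearest binary16 value and return it as a Python float.

    Hypothetical helper for illustration.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]

P = 11                              # significand bits, including the hidden bit
decimal_digits = P * math.log10(2)  # roughly 3.3 decimal digits
spacing_at_one = 2.0 ** -(P - 1)    # gap between 1.0 and the next binary16 value

print(decimal_digits)
# A step of one full spacing above 1.0 is exactly representable ...
print(to_float16(1.0 + spacing_at_one) == 1.0 + spacing_at_one)
# ... but a quarter of that spacing is too fine and rounds back to 1.0.
print(to_float16(1.0 + spacing_at_one / 4) == 1.0)
# Above 2**11 = 2048 not every integer fits: 2049 cannot be stored exactly.
print(to_float16(2049.0) == 2049.0)
```

The last line prints `False`: between 2048 and 4096 adjacent binary16 values are 2 apart, so odd integers in that interval have to be rounded.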

As shown in the figure, for half-precision floating-point numbers other than 0, infinity, and NaN:

Normalized numbers [math]\displaystyle{ (-1)^s 2^{E-15} (1+M) }[/math] , range of positive values (absolute values): [math]\displaystyle{ 2^{-14} = 0.00006103515625 \approx 6.104\times10^{-5} }[/math] to [math]\displaystyle{ 2^{15} (2 - 2^{-10}) = 65504 \approx 6.550 \times 10^4 }[/math] .
Subnormal (denormalized) numbers [math]\displaystyle{ (-1)^s 2^{e_\min} M }[/math] , range of positive values (absolute values): [math]\displaystyle{ 2^{-14} 2^{-10}=0.000000059604644775390625 \approx 5.960\times 10^{-8} }[/math] to [math]\displaystyle{ 2^{-14}(1-2^{-10})=0.000060975551605224609375 \approx 6.098 \times 10^{-5} }[/math] .
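The two formulas above can be checked with a minimal decoder sketch in Python. Here `E` is taken to be the stored 5-bit exponent field and `M` the 10-bit fraction field divided by 2^10, which is an assumption about the notation; the function name `decode_float16` is hypothetical, and the result is cross-checked against the standard-library `struct` format code `e`.

```python
import struct

BIAS = 2 ** (5 - 1) - 1  # exponent bias b = 15
E_MIN = 1 - BIAS         # minimum exponent e_min = -14

def decode_float16(bits: int) -> float:
    """Decode a finite binary16 bit pattern with the formulas above.

    Hypothetical helper for illustration.
    """
    sign = -1.0 if bits >> 15 else 1.0
    E = (bits >> 10) & 0x1F    # stored exponent field
    M = (bits & 0x3FF) / 1024  # fraction, between 0 and 1 - 2**-10
    if E == 0x1F:
        raise ValueError("bit pattern encodes infinity or NaN")
    if E == 0:
        return sign * 2.0 ** E_MIN * M         # subnormal number (or zero)
    return sign * 2.0 ** (E - BIAS) * (1 + M)  # normalized number

min_subnormal = decode_float16(0x0001)  # 2**-24
max_subnormal = decode_float16(0x03FF)  # 2**-14 * (1 - 2**-10)
min_normal = decode_float16(0x0400)     # 2**-14
max_normal = decode_float16(0x7BFF)     # 65504.0

for bits in (0x0001, 0x03FF, 0x0400, 0x7BFF):
    # Cross-check against Python's own binary16 decoding.
    reference = struct.unpack("<e", struct.pack("<H", bits))[0]
    print(hex(bits), decode_float16(bits), decode_float16(bits) == reference)
```

The four bit patterns are the smallest and largest positive subnormal and normalized encodings, so the decoded values reproduce the endpoints of the two ranges listed above.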