FP32 to INT4 Quantization
Visualize how floating point ranges are compressed into 4-bit integers.
Configuration
Unsigned Int4 [0, 15]
Dynamic Range
Model Parameters
Scale (s): 1.3333e+0
Zero Point (z): 8
q = clamp(round(x/s) + z, 0, 15)
Derivation (asymmetric)
s = (r_max - r_min) / (15 - 0)
z = round(0 - r_min / s)
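The derivation above can be sketched in a few lines of Python. The page does not state the real-valued range behind its displayed parameters, so r_min = -10 and r_max = 10 are assumed here only because they give a scale and zero point matching the ones shown. The function name `derive_params` and the round-half-up rule are likewise illustrative choices, not taken from the tool.

```python
import math

def derive_params(r_min, r_max, q_min=0, q_max=15):
    """Asymmetric scale and zero point for the integer range [q_min, q_max]."""
    s = (r_max - r_min) / (q_max - q_min)
    # Round half up; the rounding mode used by the page is not stated (assumption).
    z = math.floor(q_min - r_min / s + 0.5)
    # Keep the zero point inside the representable integer range.
    z = max(q_min, min(q_max, z))
    return s, z

# Assumed range, not given on the page.
s, z = derive_params(-10.0, 10.0)
print(f"s = {s:.4e}, z = {z}")
```

With this assumed range, s comes out at 20/15 and z at 8, in line with the Model Parameters panel.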
Unsigned Int4: x = 2.50, q = 10, x̂ ≈ 2.67
1. Scale & Shift
2.50 / 1.33 + 8
= 9.88
2. Round & Clamp
round(9.88)
10 (int4)
3. Dequantize
(10 - 8) * 1.33
≈ 2.667
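The three steps above can be reproduced with a minimal Python sketch. It takes s = 4/3 (assuming the displayed 1.3333 is that value rounded) and z = 8 from the parameters panel. The helper names `quantize` and `dequantize` are hypothetical. Since z is an integer, rounding x/s before or after adding z gives the same result.

```python
def quantize(x, s, z, q_min=0, q_max=15):
    """q = clamp(round(x / s) + z, q_min, q_max)."""
    # Python's round() rounds ties to even; no tie occurs in this example.
    q = round(x / s) + z
    return max(q_min, min(q_max, q))

def dequantize(q, s, z):
    """Map an integer code back to its real value."""
    return (q - z) * s

s, z = 4.0 / 3.0, 8   # assumed exact value behind the displayed 1.3333
x = 2.5

q = quantize(x, s, z)          # step 1 and step 2
x_hat = dequantize(q, s, z)    # step 3
error = x - x_hat              # sign convention inferred from the displayed error

print(f"q = {q}, x_hat = {x_hat:.3f}, error = {error:.2e}")
```

Values outside the representable range are clamped, so for example `quantize(100.0, s, z)` returns 15.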
Quantization Grid
[Plot: integer quantization levels along the Unsigned Int4 axis, ticks 1 to 15]
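As a rough illustration of what the grid represents, the sketch below lists the real value that each 4-bit code maps back to, using x̂ = (q - z) * s with z = 8 and s taken as 4/3 (an assumption about the exact value behind the displayed 1.3333).

```python
s, z = 4.0 / 3.0, 8   # assumed exact scale; zero point from the page

# One representable real value per integer code 0..15.
grid = [(q - z) * s for q in range(16)]

for q, value in enumerate(grid):
    print(f"q = {q:2d} -> x_hat = {value:+.3f}")
```

Adjacent grid points are exactly s apart, so any in-range input lands within s/2 of a representable value.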
Error (x - x̂): -1.67e-1