Fixed point quantization ===================================================== We start with a single or double precision (32 / 64 bit floating point) input X, The fixed point quantization demonstrates its superiority in U-Net image segmentation [1]. Following that, a basic bitwise shift quantization function is given by: .. math:: q(x) = \lfloor \texttt{clip}(x, 0, 2^b - 1) \ll b \rceil \gg b, where << and >> are left and right shift for bitwise operator, respectively. Then the given number $x$ to its fixed point value proceed by splitting its value into its fractional and integer parts: .. math:: x_f = \text{abs}(x) - \lfloor\text{abs}(x)\rfloor \quad \text{and} \quad x_i = \lfloor\text{abs}(x)\rfloor. The fixed point representation for $x$ is given by .. math:: Q_f{x} = \text{sign}(x) q(x_i) + \text{sign}(x) q(x_f) .. _fixed_point_simulator: The `Chopf` class enables the quantization of floating-point numbers into a fixed-point Qm.n format, where `m` is the number of integer bits (including the sign bit) and `n` is the number of fractional bits. This document describes the usage and provides examples for implementations in PyTorch, NumPy, JAX, and TensorFlow, each supporting six rounding modes: `nearest`, `up`, `down`, `towards_zero`, `stochastic_equal`, and `stochastic_proportional`. Overview -------- The simulator converts floating-point values into a fixed-point representation with a resolution of \(2^{-n}\) and a range of \([-2^{m-1}, 2^{m-1} - 2^{-n}]\). For the Q4.4 format used in the examples: - **Resolution**: \(2^{-4} = 0.0625\) - **Range**: \([-8.0, 7.9375]\) The quantization process scales the input by the resolution, applies the chosen rounding mode, reconstructs the fixed-point value, and clamps it to the valid range. Usage ----- Common Parameters ~~~~~~~~~~~~~~~~~ - **ibits**: Specifies the number of bits for the integer part, including the sign bit. - **fbits**: Defines the number of bits for the fractional part. - **rmode**: Selects the rounding method, defaulting to "nearest". PyTorch Version ~~~~~~~~~~~~~~~ The PyTorch implementation integrates with PyTorch tensors, making it suitable for machine learning workflows. **Initialization** Create an instance by setting the integer and fractional bit counts to define the Qm.n format. **Quantization** Quantize a tensor of floating-point values by invoking the quantization method, optionally specifying a rounding mode. The result is a tensor with quantized values. **Code Example**: .. code-block:: python # Initialize with Q4.4 format sim = Chopf(ibits=4, fbits=4) # Input tensor values = torch.tensor([1.7641, 0.3097, -0.2021, 2.4700, 0.3300]) # Quantize with nearest rounding result = sim.quantize(values, rounding_mode="nearest") print(result) NumPy Version ~~~~~~~~~~~~~ The NumPy version operates on NumPy arrays, offering a general-purpose quantization tool. **Initialization** Instantiate the simulator with the desired integer and fractional bit counts. **Quantization** Apply the quantization method to a NumPy array, with an optional rounding mode parameter, to obtain a quantized array. **Code Example**: .. code-block:: python # Initialize with Q4.4 format sim = Chopf(ibits=4, fbits=4) # Input array values = np.array([1.7641, 0.3097, -0.2021, 2.4700, 0.3300]) # Quantize with nearest rounding result = sim.quantize(values, rounding_mode="nearest") print(result) JAX Version ~~~~~~~~~~~ The JAX implementation uses JAX arrays and includes JIT compilation for performance, requiring a PRNG key for stochastic modes. **Initialization** Set up the simulator by defining the integer and fractional bits for the Qm.n format. **Quantization** Quantize a JAX array using the quantization method, specifying a rounding mode and, for stochastic modes, a PRNG key. The output is a quantized JAX array. **Code Example**: .. code-block:: python # Initialize with Q4.4 format sim = FPRound(ibits=4, fbits=4) # Input array values = jnp.array([1.7641, 0.3097, -0.2021, 2.4700, 0.3300]) # PRNG key for stochastic modes key = random.PRNGKey(42) # Quantize with nearest rounding (no key needed) result = sim.quantize(values, rounding_mode="nearest") print(result) TensorFlow Version ~~~~~~~~~~~~~~~~~~ The TensorFlow implementation operates on TensorFlow tensors, wrapping NumPy implementations with custom gradients for automatic differentiation support (STE). **Initialization** Create an instance by setting the integer and fractional bit counts to define the Qm.n format. **Quantization** Quantize a TensorFlow tensor of floating-point values by invoking the quantization method, optionally specifying a rounding mode. The result is a tensor with quantized values and gradient support via STE. **Code Example**: .. code-block:: python import tensorflow as tf # Initialize with Q4.4 format sim = Chopf(ibits=4, fbits=4) # Input tensor values = tf.constant([1.7641, 0.3097, -0.2021, 2.4700, 0.3300]) # Quantize with nearest rounding result = sim.quantize(values, rounding_mode="nearest") print(result) Examples -------- The following examples show the quantization of the input values `[1.7641, 0.3097, -0.2021, 2.47, 0.33]` in Q4.4 format across all rounding modes, consistent across PyTorch, NumPy, and JAX (with JAX using PRNG key 42 for stochastic modes). **Input Values** .. code-block:: text [1.7641, 0.3097, -0.2021, 2.47, 0.33] **Outputs by Rounding Mode** - **Nearest**: .. code-block:: text [1.75, 0.3125, -0.1875, 2.5, 0.3125] Rounds to the nearest representable value. - **Up**: .. code-block:: text [1.8125, 0.3125, -0.1875, 2.5, 0.375] Positive values round toward positive infinity, negative values toward negative infinity. - **Down**: .. code-block:: text [1.75, 0.25, -0.25, 2.4375, 0.3125] Positive values round toward negative infinity, negative values toward positive infinity. - **Towards Zero**: .. code-block:: text [1.75, 0.25, -0.1875, 2.4375, 0.3125] Truncates toward zero, reducing the magnitude. - **Stochastic Equal**: .. code-block:: text [1.8125, 0.3125, -0.25, 2.5, 0.3125] Randomly selects between floor and ceiling with equal probability (example with JAX PRNG key 42; varies otherwise). - **Stochastic Proportional**: .. code-block:: text [1.8125, 0.3125, -0.1875, 2.4375, 0.3125] Randomly selects between floor and ceiling, with probability proportional to the fractional part (example with JAX PRNG key 42; varies otherwise). This guide provides a clear introduction to using the `FPRound` classes, with practical examples formatted as code blocks for clarity.