Block Floating Point Formats

Block Floating Point (BFP) is a quantization format where a group of numbers shares a common exponent (scale factor), but each number has its own mantissa. This provides a good balance between compression efficiency and hardware simplicity.

Overview

What is Block Floating Point?

Block Floating Point (BFP) divides data into blocks and applies a shared exponent to all elements within each block. This is simpler than full floating-point but provides better dynamic range than fixed-point quantization.

Key Characteristics:

Shared Exponent: One exponent per block (typically 8 bits)
Individual Mantissas: Each element has its own mantissa (2-16 bits in the predefined formats)
Hardware-Efficient: Simpler than full floating-point arithmetic
Good Dynamic Range: Adapts to local data statistics
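The characteristics above can be illustrated with a small NumPy sketch of simulated BFP quantization. It is not Pychop's implementation: the function name `bfp_quantize_sketch` is hypothetical, and it assumes that blocks are taken from the flattened array, that the shared exponent is a power of two chosen from the block's largest magnitude, and that `mantissa_bits` includes the sign.

```python
import numpy as np

def bfp_quantize_sketch(x, mantissa_bits=8, block_size=32):
    """Simulate BFP: one power-of-two scale per block, signed integer mantissas."""
    flat = np.asarray(x, dtype=np.float64).ravel()
    pad = (-flat.size) % block_size              # zero-pad to a whole number of blocks
    blocks = np.pad(flat, (0, pad)).reshape(-1, block_size)

    # Shared exponent per block: smallest e such that max|x| < 2**e.
    max_abs = np.abs(blocks).max(axis=1, keepdims=True)
    _, exp = np.frexp(max_abs)                   # exp is 0 for an all-zero block

    # Grid spacing so that the signed mantissas fit in `mantissa_bits` bits.
    step = np.ldexp(1.0, exp - (mantissa_bits - 1))
    q_max = 2 ** (mantissa_bits - 1) - 1
    mant = np.clip(np.rint(blocks / step), -q_max - 1, q_max)

    # Dequantize immediately: the result has the input's shape.
    out = (mant * step).ravel()[: flat.size]
    return out.reshape(np.shape(x))

x = np.random.default_rng(0).standard_normal((4, 50))
x_q = bfp_quantize_sketch(x)
print(f"max abs error: {np.max(np.abs(x - x_q)):.3e}")
```

Because every element in a block shares one scale, a single large value coarsens the grid for its whole block, which is why block size trades compression against accuracy.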

BFP vs Other Formats:

Format	Memory	Dynamic Range	Hardware Cost	Best Use Case
BFP	Low	Good	Low	Edge devices, Inference
FP32	High	Excellent	High	Research, Training
FP16	Medium	Good	Medium	Training, Inference
INT8	Low	Poor	Low	Inference only
MX Formats	Low	Excellent	Medium	Advanced training

Architecture

BFP Structure

A BFP block consists of:

┌─────────────────────────────────────────────────┐
│         Block Floating Point Structure          │
├─────────────────────────────────────────────────┤
│  Shared Exponent (8 bits)                       │
├─────────────────────────────────────────────────┤
│  Element 1: Sign (1) + Mantissa (n bits)        │
│  Element 2: Sign (1) + Mantissa (n bits)        │
│  ...                                            │
│  Element N: Sign (1) + Mantissa (n bits)        │
└─────────────────────────────────────────────────┘

Example: BFP8 with block_size=32

1 shared exponent (8 bits)
32 elements × 8 bits each = 256 bits
Total: 264 bits for 32 elements
Compression vs FP16: 512/264 = 1.94x
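The arithmetic in this example generalizes to any mantissa width and block size. The helper names below are hypothetical, and the sketch assumes the same accounting as the example: each element costs exactly `mantissa_bits` bits and each block adds one shared exponent.

```python
def bfp_block_bits(mantissa_bits, block_size, exponent_bits=8):
    """Bits for one block: one shared exponent plus one mantissa per element."""
    return exponent_bits + block_size * mantissa_bits

def compression_ratio(baseline_bits, mantissa_bits, block_size, exponent_bits=8):
    """Baseline storage (16 for FP16, 32 for FP32) divided by BFP storage."""
    total = bfp_block_bits(mantissa_bits, block_size, exponent_bits)
    return baseline_bits * block_size / total

print(bfp_block_bits(8, 32))                    # 264
print(round(compression_ratio(16, 8, 32), 2))   # 1.94
```

Under this accounting the shared exponent adds `exponent_bits / block_size` bits per element, so larger blocks amortize it better.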

Predefined Formats

Pychop provides several predefined BFP formats optimized for different use cases:

Standard Formats

Format Name	Mantissa Bits	Block Size	Exponent Bits	Compression vs FP16	Use Case
`bfp16`	16	16	8	1.07x	High precision
`bfp12`	12	16	8	1.39x	Balanced
`bfp8`	8	32	8	1.94x	Recommended default
`bfp6`	6	32	8	2.56x	Aggressive compression
`bfp4`	4	32	8	3.76x	Ultra-low precision

Ultra-Low Precision Formats

Format Name	Mantissa Bits	Block Size	Exponent Bits	Compression vs FP16	Use Case
`bfp3`	3	64	8	5.82x	Extreme compression
`bfp2`	2	128	8	10.67x	Research only

Intel Flexpoint Compatible

Format Name	Mantissa Bits	Block Size	Exponent Bits	Compression vs FP16	Notes
`flexpoint16`	16	16	5	1.10x	Intel compatible
`flexpoint8`	8	32	5	1.97x	Intel compatible

Quick Start

Basic Usage

import pychop
import numpy as np

# Set backend (auto-detect by default)
pychop.backend('auto')

# Create test data
X = np.random.randn(1024, 768).astype(np.float32)

# Quantize with BFP8
from pychop import bfp_quantize
X_quantized = bfp_quantize(X, format='bfp8')

# Inspect the result: bfp_quantize returns an array with the input's shape and type
print(f"Original: {X.nbytes / 1024:.2f} KB")
print(f"Quantized maintains same shape: {X_quantized.shape}")

Using BFPTensor

from pychop import BFPTensor

# Create BFP tensor
bfp = BFPTensor(X, format='bfp8')

# Dequantize
X_reconstructed = bfp.dequantize()

# Get statistics
stats = bfp.statistics()
print(f"Compression: {stats['compression_ratio_fp16']:.2f}x vs FP16")
print(f"Memory saved: {stats['memory_saved_vs_fp16']:.1f}%")

# Compute error
mse = np.mean((X - X_reconstructed) ** 2)
print(f"MSE: {mse:.2e}")

Custom Formats

from pychop import create_bfp_spec, bfp_quantize

# Create custom 5-bit BFP format
custom_spec = create_bfp_spec(
    mantissa_bits=5,
    block_size=64,
    exponent_bits=8,
    name="my_bfp5"
)

# Use custom format
X_q = bfp_quantize(X, format=custom_spec)

# Or use tuple shorthand
X_q = bfp_quantize(X, format=(5, 64))  # (mantissa_bits, block_size)

Backend-Specific Usage

NumPy Backend

Pure NumPy implementation for inference and analysis:

import numpy as np
import pychop

pychop.backend('numpy')

X = np.random.randn(512, 512).astype(np.float32)
X_q = pychop.bfp_quantize(X, format='bfp8')

# Compute reconstruction error
error = np.mean((X - X_q) ** 2)
print(f"MSE: {error:.2e}")

PyTorch Backend (with STE)

PyTorch backend with Straight-Through Estimator for Quantization-Aware Training:

import torch
import pychop

pychop.backend('torch')

# Enable gradient tracking
X = torch.randn(128, 768, requires_grad=True)

# Quantize (automatic STE!)
X_q = pychop.bfp_quantize(X, format='bfp8')

# Backward pass - gradients flow through!
loss = X_q.sum()
loss.backward()

print(f"Gradient shape: {X.grad.shape}")
print(f"Gradient norm: {X.grad.norm():.2e}")
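For readers unfamiliar with the Straight-Through Estimator, the idea can be sketched in a few lines of PyTorch. This is an illustration of the general technique, not Pychop's code: `fake_quantize` is a hypothetical stand-in that rounds to a fixed grid rather than performing BFP quantization.

```python
import torch

def fake_quantize(x, step=0.125):
    """Stand-in quantizer: round to a fixed grid (not BFP)."""
    return torch.round(x / step) * step

def quantize_ste(x, step=0.125):
    # The forward value equals fake_quantize(x). The detached term carries no
    # gradient, so the backward pass treats the quantizer as the identity.
    return x + (fake_quantize(x, step) - x).detach()

x = torch.randn(4, 8, requires_grad=True)
quantize_ste(x).sum().backward()
print(x.grad)  # all ones: the gradient passed straight through the rounding
```

Without the detach trick, `torch.round` would yield a zero gradient almost everywhere and training would stall.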

Using BFP Quantizers in Models:

from pychop.tch.bfp_formats import BFPQuantizerSTE

class QuantizedModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.quantizer = BFPQuantizerSTE(format='bfp8')
        self.linear = torch.nn.Linear(768, 3072)

    def forward(self, x):
        x = self.quantizer(x)  # Quantize activations
        return self.linear(x)

model = QuantizedModel()
optimizer = torch.optim.Adam(model.parameters())

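# Training loop (dataloader and loss_fn are assumed to be defined elsewhere)
for batch, target in dataloader:
    optimizer.zero_grad()  # Clear gradients from the previous step
    output = model(batch)
    loss = loss_fn(output, target)
    loss.backward()  # STE handles gradients automatically!
    optimizer.step()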

Quantized Layers:

from pychop.tch.bfp_formats import BFPLinear

# Replace standard Linear with BFP quantized version
layer = BFPLinear(
    in_features=768,
    out_features=3072,
    weight_format='bfp8',      # Quantize weights
    quantize_input=True,        # Quantize input activations
    quantize_output=False       # Keep output in FP32
)

x = torch.randn(32, 768)
y = layer(x)  # Automatic quantization with STE

Model Conversion:

from pychop.tch.bfp_formats import convert_linear_to_bfp

# Load pretrained model
model = YourModel()

# Convert all Linear layers to BFP
model = convert_linear_to_bfp(
    model,
    format='bfp8',
    quantize_input=True,
    quantize_output=False,
    inplace=True
)

# Fine-tune with quantization
for epoch in range(num_epochs):
    train(model)  # Gradients flow through STE automatically

JAX Backend (with Custom VJP)

JAX backend with custom Vector-Jacobian Product for differentiation:

import jax
import jax.numpy as jnp
import pychop

pychop.backend('jax')

# Create data
key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (256, 512))

# Quantize
X_q = pychop.bfp_quantize(X, format='bfp8')

# Test gradient flow
from pychop.jx.bfp_formats import BFPQuantizerSTE

quantizer = BFPQuantizerSTE(format='bfp8')

def loss_fn(x):
    x_q = quantizer(x)
    return jnp.sum(x_q ** 2)

# Compute gradients (custom VJP handles this)
grad_fn = jax.grad(loss_fn)
grads = grad_fn(X)

print(f"Gradient shape: {grads.shape}")
print(f"Gradient norm: {jnp.linalg.norm(grads):.2e}")
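The custom-VJP mechanism itself can be sketched with plain JAX. This is a generic illustration, not Pychop's implementation: `round_ste` is a hypothetical function that rounds to multiples of 1/8 in the forward pass and passes the incoming gradient through unchanged in the backward pass.

```python
import jax
import jax.numpy as jnp

def _round_to_grid(x):
    """Stand-in quantizer: round to multiples of 1/8 (not BFP)."""
    return jnp.round(x * 8.0) / 8.0

@jax.custom_vjp
def round_ste(x):
    return _round_to_grid(x)

def _fwd(x):
    # No residuals are needed because the backward rule ignores the input.
    return _round_to_grid(x), None

def _bwd(_, g):
    # Straight-through: return the upstream gradient unchanged.
    return (g,)

round_ste.defvjp(_fwd, _bwd)

x = jnp.linspace(-1.0, 1.0, 5)
grads = jax.grad(lambda v: jnp.sum(round_ste(v) ** 2))(x)
print(grads)
```

With the straight-through rule the gradient of `sum(q(x) ** 2)` is `2 * q(x)`, rather than the zero gradient that differentiating `jnp.round` would give.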

Flax Integration:

from flax import linen as nn
from pychop.jx.bfp_formats import BFPDense

class QuantizedMLP(nn.Module):
    features: list

    @nn.compact
    def __call__(self, x):
        for feat in self.features[:-1]:
            x = BFPDense(
                features=feat,
                weight_format='bfp8',
                quantize_input=True
            )(x)
            x = nn.relu(x)

        x = BFPDense(features=self.features[-1])(x)
        return x

model = QuantizedMLP(features=[512, 256, 128, 10])

# Initialize
key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (32, 784))
variables = model.init(key, x)

# Forward pass with quantization
output = model.apply(variables, x)

TensorFlow Backend (with STE)

TensorFlow backend with Straight-Through Estimator for Quantization-Aware Training via tf.numpy_function() with custom gradients:

import tensorflow as tf
import pychop

pychop.backend('tensorflow')

# Create a trainable variable
X = tf.Variable(tf.random.normal([128, 768]))

# Quantize inside the tape so the operation is recorded (automatic STE!)
with tf.GradientTape() as tape:
    X_q = pychop.bfp_quantize(X, format='bfp8')
    loss = tf.reduce_sum(X_q)

# Backward pass - gradients flow through!
grads = tape.gradient(loss, X)

print(f"Gradient shape: {grads.shape}")

Using BFP Quantizers in Keras Models:

from pychop.tf.bfp_formats import BFPQuantizerSTE

class QuantizedModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.quantizer = BFPQuantizerSTE(format='bfp8')
        self.dense = tf.keras.layers.Dense(3072)

    def call(self, x):
        x = self.quantizer(x)  # Quantize activations
        return self.dense(x)

model = QuantizedModel()
optimizer = tf.keras.optimizers.Adam()

# Training loop (dataset and loss_fn are assumed to be defined elsewhere)
for batch, target in dataset:
    with tf.GradientTape() as tape:
        output = model(batch)
        loss = loss_fn(output, target)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

API Reference

Core Functions

bfp_quantize

bfp_quantize(data, format='bfp8', backend=None)

Quantize array to BFP format with automatic backend detection.

Parameters:

data (array-like) – Input data (numpy.ndarray, torch.Tensor, jax.Array, or tf.Tensor)
format (str, BFPSpec, or tuple(int, int)) – BFP format specification
backend (str, optional) – Force specific backend ('numpy', 'jax', 'torch', or 'tensorflow')

Returns:

Quantized data (same type as input)

Return type:

array-like

Format Options:

String: 'bfp8', 'bfp6', etc. (predefined formats)
Tuple: (mantissa_bits, block_size) for custom format
BFPSpec: Full specification object

Example:

import numpy as np
from pychop import bfp_quantize

X = np.random.randn(1024, 768)

# Predefined format
X_q = bfp_quantize(X, format='bfp8')

# Custom format
X_q = bfp_quantize(X, format=(6, 32))  # 6-bit mantissa, 32 elem/block

# Force backend
X_q = bfp_quantize(X, format='bfp8', backend='numpy')

Classes

BFPTensor

class BFPTensor(data, format='bfp8', backend=None)

Backend-agnostic BFP tensor wrapper.

Parameters:

data (array-like) – Input tensor
format (str, BFPSpec, or tuple) – BFP format specification
backend (str, optional) – Force specific backend

Methods:

dequantize()

Dequantize to original data type.

Returns: Reconstructed tensor
Return type: array-like
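To make the quantize/dequantize round trip concrete, here is a sketch of one possible packed representation. It is not how `BFPTensor` stores data internally: `bfp_encode` and `bfp_decode` are hypothetical names, mantissas are kept as `int8` (so `mantissa_bits` must be at most 8), and exponents are kept as `int16` for simplicity.

```python
import numpy as np

def bfp_encode(x, mantissa_bits=8, block_size=32):
    """Split x into blocks; return integer mantissas, per-block exponents, and shape."""
    flat = np.asarray(x, dtype=np.float32).ravel()
    pad = (-flat.size) % block_size
    blocks = np.pad(flat, (0, pad)).reshape(-1, block_size)
    _, exp = np.frexp(np.abs(blocks).max(axis=1))      # one exponent per block
    step = np.ldexp(np.float32(1.0), exp - (mantissa_bits - 1))[:, None]
    q_max = 2 ** (mantissa_bits - 1) - 1
    mant = np.clip(np.rint(blocks / step), -q_max - 1, q_max).astype(np.int8)
    return mant, exp.astype(np.int16), np.shape(x)

def bfp_decode(mant, exp, shape, mantissa_bits=8):
    """Rebuild floats: mantissa times the block's power-of-two step."""
    step = np.ldexp(np.float32(1.0), exp.astype(np.int32) - (mantissa_bits - 1))[:, None]
    values = mant.astype(np.float32) * step
    return values.ravel()[: int(np.prod(shape))].reshape(shape)

x = np.array([[1.0, -0.5, 0.25, 0.0]], dtype=np.float32)
mant, exp, shape = bfp_encode(x, mantissa_bits=8, block_size=4)
print(mant.tolist(), exp.tolist())     # [[64, -32, 16, 0]] [1]
print(bfp_decode(mant, exp, shape))    # values that lie on the grid are recovered exactly
```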

statistics()

Get quantization statistics.

Returns: Dictionary with statistics
Return type: dict

Statistics Keys:

format: Format name
mantissa_bits: Mantissa bits per element
block_size: Elements per block
num_blocks: Total number of blocks
compression_ratio_fp32: Compression vs FP32
compression_ratio_fp16: Compression vs FP16
bfp_memory_mb: BFP memory usage (MB)
memory_saved_vs_fp16: Memory saved vs FP16 (%)
bits_per_element: Average bits per element
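How these quantities relate to one another can be sketched with plain arithmetic. The function `bfp_stats_sketch` is hypothetical and only mirrors the key names listed above. It assumes a trailing partial block is padded to a full block and that each element costs exactly `mantissa_bits` bits, which may differ from what `statistics()` computes.

```python
import math

def bfp_stats_sketch(num_elements, mantissa_bits, block_size, exponent_bits=8):
    """Derive storage statistics for a tensor from the format parameters."""
    num_blocks = math.ceil(num_elements / block_size)
    total_bits = num_blocks * (exponent_bits + block_size * mantissa_bits)
    ratio_fp16 = 16 * num_elements / total_bits
    return {
        "num_blocks": num_blocks,
        "bits_per_element": total_bits / num_elements,
        "bfp_memory_mb": total_bits / 8 / 1024**2,
        "compression_ratio_fp16": ratio_fp16,
        # Memory saved follows from the ratio: 1 - 1/ratio, as a percentage.
        "memory_saved_vs_fp16": (1 - 1 / ratio_fp16) * 100,
    }

stats = bfp_stats_sketch(1024 * 768, mantissa_bits=8, block_size=32)
print(stats["num_blocks"], stats["bits_per_element"])   # 24576 8.25
```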

Example:

from pychop import BFPTensor

bfp = BFPTensor(X, format='bfp8')

# Reconstruct
X_reconstructed = bfp.dequantize()

# Get statistics
stats = bfp.statistics()
print(f"Compression: {stats['compression_ratio_fp16']:.2f}x")
print(f"Memory saved: {stats['memory_saved_vs_fp16']:.1f}%")
print(f"Blocks: {stats['num_blocks']}")

BFPSpec

class BFPSpec(name, mantissa_bits, block_size, exponent_bits=8, has_sign=True, use_subnormals=False)

BFP format specification.

Parameters:

name (str) – Format name
mantissa_bits (int) – Mantissa bits per element
block_size (int) – Elements per block
exponent_bits (int) – Shared exponent bits
has_sign (bool) – Whether elements have sign bits
use_subnormals (bool) – Whether to support subnormal numbers

Properties:

total_bits_per_block: Total bits for entire block
compression_vs_fp32: Compression ratio vs FP32
compression_vs_fp16: Compression ratio vs FP16

create_bfp_spec

create_bfp_spec(mantissa_bits, block_size, exponent_bits=8, name=None)

Create custom BFP format specification.

Parameters:

mantissa_bits (int) – Number of mantissa bits (1-32)
block_size (int) – Elements per block
exponent_bits (int) – Bits for shared exponent
name (str, optional) – Custom format name

Returns:

BFP format specification

Return type:

BFPSpec

Example:

from pychop import create_bfp_spec, bfp_quantize

# Create 5-bit BFP format
spec = create_bfp_spec(
    mantissa_bits=5,
    block_size=64,
    exponent_bits=8,
    name="my_bfp5"
)

# Use custom format
X_q = bfp_quantize(X, format=spec)

Utility Functions

print_bfp_format_table

print_bfp_format_table()

Print table of all predefined BFP formats.

Example:

from pychop import print_bfp_format_table

print_bfp_format_table()

Output:

==========================================================================================
Predefined BFP Formats
==========================================================================================
Name            Mantissa   Block Size   Exponent   Compress FP16   Total Bits
------------------------------------------------------------------------------------------
bfp16           16         16           8          1.07x            264
bfp12           12         16           8          1.39x            200
bfp8            8          32           8          1.94x            264
bfp6            6          32           8          2.56x            200
bfp4            4          32           8          3.76x            136
bfp3            3          64           8          5.82x            200
bfp2            2          128          8          10.67x           264
flexpoint16     16         16           5          1.10x            261
flexpoint8      8          32           5          1.97x            261
==========================================================================================

PyTorch-Specific API

BFPQuantizerSTE

class pychop.tch.bfp_formats.BFPQuantizerSTE(format='bfp8')

BFP quantizer with Straight-Through Estimator for QAT.

Automatically uses STE during training (requires_grad=True).

Parameters: format (str, BFPSpec, or tuple) – BFP format specification

Example:

import torch
from pychop.tch.bfp_formats import BFPQuantizerSTE

quantizer = BFPQuantizerSTE(format='bfp8')

x = torch.randn(32, 768, requires_grad=True)
x_q = quantizer(x)

loss = x_q.sum()
loss.backward()  # Gradients flow through STE

BFPLinear

class pychop.tch.bfp_formats.BFPLinear(in_features, out_features, bias=True, weight_format='bfp8', act_format=None, quantize_input=True, quantize_output=False)

Linear layer with BFP quantization.

Parameters:

in_features (int) – Input dimension
out_features (int) – Output dimension
bias (bool) – Whether to use bias
weight_format (str, BFPSpec, or tuple) – BFP format for weights
act_format (str, BFPSpec, or tuple, optional) – BFP format for activations (if None, uses weight_format)
quantize_input (bool) – Whether to quantize input
quantize_output (bool) – Whether to quantize output

Example:

import torch
from pychop.tch.bfp_formats import BFPLinear

layer = BFPLinear(
    in_features=768,
    out_features=3072,
    weight_format='bfp8',
    quantize_input=True,
    quantize_output=False
)

x = torch.randn(32, 768)
y = layer(x)  # Automatic quantization with STE

BFPConv2d

class pychop.tch.bfp_formats.BFPConv2d(in_channels, out_channels, kernel_size, weight_format='bfp8', act_format=None, quantize_input=True, quantize_output=False, **kwargs)

2D Convolution with BFP quantization.

Parameters:

in_channels (int) – Input channels
out_channels (int) – Output channels
kernel_size (int or tuple) – Convolution kernel size
weight_format (str, BFPSpec, or tuple) – BFP format for weights
act_format (str, BFPSpec, or tuple, optional) – BFP format for activations
quantize_input (bool) – Whether to quantize input
quantize_output (bool) – Whether to quantize output
kwargs (dict) – Other Conv2d parameters

Example:

import torch
from pychop.tch.bfp_formats import BFPConv2d

conv = BFPConv2d(
    in_channels=3,
    out_channels=64,
    kernel_size=3,
    weight_format='bfp8',
    quantize_input=True,
    padding=1
)

x = torch.randn(16, 3, 224, 224)
y = conv(x)

convert_linear_to_bfp

pychop.tch.bfp_formats.convert_linear_to_bfp(module, format='bfp8', quantize_input=True, quantize_output=False, inplace=True)

Convert all Linear layers in a model to BFP quantized versions.

Parameters:

module (torch.nn.Module) – Model to convert
format (str, BFPSpec, or tuple) – BFP format
quantize_input (bool) – Whether to quantize inputs
quantize_output (bool) – Whether to quantize outputs
inplace (bool) – Whether to modify in place

Returns:

Converted model

Return type:

torch.nn.Module
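Conversions of this kind are commonly implemented by walking the module tree and swapping each matching layer in place. The sketch below shows that pattern only; it is not Pychop's implementation. `RoundedLinear` and `swap_linear` are hypothetical names, and the fixed rounding grid stands in for BFP quantization.

```python
import torch
from torch import nn

class RoundedLinear(nn.Linear):
    """Hypothetical stand-in for a quantized layer: rounds weights on the fly with STE."""
    def forward(self, x):
        w_q = torch.round(self.weight * 64) / 64
        w = self.weight + (w_q - self.weight).detach()   # straight-through gradient
        return nn.functional.linear(x, w, self.bias)

def swap_linear(module):
    """Recursively replace every plain nn.Linear child, reusing its parameters."""
    for name, child in list(module.named_children()):
        if type(child) is nn.Linear:
            new = RoundedLinear(child.in_features, child.out_features,
                                bias=child.bias is not None)
            new.weight = child.weight
            new.bias = child.bias
            setattr(module, name, new)
        else:
            swap_linear(child)
    return module

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Sequential(nn.Linear(8, 2, bias=False)))
swap_linear(model)
print(model)
```

Reusing the original `Parameter` objects keeps pretrained weights intact, and matching on the exact type leaves layers that are not `nn.Linear` untouched.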

Example:

from pychop.tch.bfp_formats import convert_linear_to_bfp
import transformers

# Load pretrained model
model = transformers.AutoModelForCausalLM.from_pretrained("gpt2")

# Convert to BFP8
model = convert_linear_to_bfp(
    model,
    format='bfp8',
    quantize_input=True,
    quantize_output=False,
    inplace=True
)

# Fine-tune with BFP quantization
for epoch in range(num_epochs):
    train(model)

JAX-Specific API

BFPQuantizerSTE (JAX)

class pychop.jx.bfp_formats.BFPQuantizerSTE(format='bfp8')

BFP quantizer with custom VJP for JAX.

Parameters: format (str, BFPSpec, or tuple) – BFP format specification

Example:

import numpy as np
import jax.numpy as jnp
from pychop.jx.bfp_formats import BFPQuantizerSTE

quantizer = BFPQuantizerSTE(format='bfp8')

x = jnp.array(np.random.randn(256, 512))
x_q = quantizer(x)

BFPDense

class pychop.jx.bfp_formats.BFPDense(features, use_bias=True, weight_format='bfp8', quantize_input=True)

Dense layer with BFP quantization for Flax.

Parameters:

features (int) – Number of output features
use_bias (bool) – Whether to use bias
weight_format (str, BFPSpec, or tuple) – BFP format for weights
quantize_input (bool) – Whether to quantize input

Example:

from flax import linen as nn
from pychop.jx.bfp_formats import BFPDense

class MyModel(nn.Module):
    @nn.compact
    def __call__(self, x):
        x = BFPDense(features=512, weight_format='bfp8')(x)
        x = nn.relu(x)
        x = BFPDense(features=10)(x)
        return x

Advanced Usage

Format Comparison

Compare different BFP formats on the same data:

import numpy as np
from pychop import BFPTensor

X = np.random.randn(1024, 768).astype(np.float32)

formats = ['bfp16', 'bfp8', 'bfp6', 'bfp4']

print("Format Comparison")
print("="*80)
print(f"{'Format':<10} {'Compression':<15} {'MSE':<12} {'MAE':<12}")
print("-"*80)

for fmt in formats:
    bfp = BFPTensor(X, format=fmt)
    X_reconstructed = bfp.dequantize()
    stats = bfp.statistics()

    mse = np.mean((X - X_reconstructed) ** 2)
    mae = np.mean(np.abs(X - X_reconstructed))

    print(f"{fmt:<10} {stats['compression_ratio_fp16']:.2f}x{'':>11} "
          f"{mse:.2e}{'':>6} {mae:.2e}")

Memory Analysis

Analyze memory usage for different formats:

import numpy as np
from pychop import BFPTensor

X = np.random.randn(4096, 4096).astype(np.float32)

print("\nMemory Analysis")
print("="*80)
print(f"Original FP32: {X.nbytes / 1024**2:.2f} MB")
print(f"FP16 equivalent: {X.nbytes / 2 / 1024**2:.2f} MB")
print("-"*80)

for fmt in ['bfp8', 'bfp6', 'bfp4']:
    bfp = BFPTensor(X, format=fmt)
    stats = bfp.statistics()

    print(f"\n{fmt.upper()}:")
    print(f"  Memory: {stats['bfp_memory_mb']:.2f} MB")
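    # 'memory_saved_vs_fp32' is not among the documented keys; derive it from the FP32 ratio
    print(f"  Saved vs FP32: {(1 - 1 / stats['compression_ratio_fp32']) * 100:.1f}%")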
    print(f"  Saved vs FP16: {stats['memory_saved_vs_fp16']:.1f}%")
    print(f"  Compression: {stats['compression_ratio_fp16']:.2f}x vs FP16")

LLM Fine-Tuning Example

Complete example for fine-tuning LLMs with BFP quantization:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from pychop.tch.bfp_formats import convert_linear_to_bfp

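# Load model
# Note: Hugging Face GPT-2 builds its attention and MLP projections from Conv1D
# modules rather than torch.nn.Linear, so a Linear-only conversion may leave most
# layers untouched; check which modules were actually replaced.
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")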

# Convert to BFP8
model = convert_linear_to_bfp(
    model,
    format='bfp8',
    quantize_input=True,
    quantize_output=False,
    inplace=True
)

# Setup training
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)

# Training loop (num_epochs and dataloader are assumed to be defined elsewhere)
model.train()
for epoch in range(num_epochs):
    for batch in dataloader:
        input_ids = batch['input_ids'].to(device)
        labels = input_ids.clone()

        # Forward pass (automatic BFP quantization with STE)
        outputs = model(input_ids=input_ids, labels=labels)
        loss = outputs.loss

        # Backward pass (gradients flow through STE)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        print(f"Loss: {loss.item():.4f}")

# Save quantized model
torch.save(model.state_dict(), 'model_bfp8.pt')

Performance Tips

Choosing Block Size

Block size affects compression and accuracy:

Small blocks (8-16): Better accuracy, less compression
Medium blocks (32): Recommended default, good balance
Large blocks (64-128): Higher compression, lower accuracy

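import numpy as np
from pychop import bfp_quantize

X = np.random.randn(1024, 768).astype(np.float32)

# Test different block sizes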
for block_size in [8, 16, 32, 64, 128]:
    X_q = bfp_quantize(X, format=(8, block_size))
    mse = np.mean((X - X_q) ** 2)
    print(f"Block size {block_size}: MSE = {mse:.2e}")

Choosing Mantissa Bits

Mantissa bits control precision:

16 bits: Near-lossless, minimal compression
8 bits: Recommended for most tasks
6 bits: Aggressive compression, acceptable for inference
4 bits or fewer: Research/experimental
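The precision levels above follow from how the grid spacing scales with mantissa width: every bit removed doubles the spacing within a block. A short sketch of that relationship, assuming the mantissa width includes the sign and the block is scaled so that all magnitudes lie below `2**shared_exponent`; the helper name `mantissa_step` is hypothetical.

```python
def mantissa_step(mantissa_bits, shared_exponent=0):
    """Spacing between representable values within one block."""
    return 2.0 ** (shared_exponent - (mantissa_bits - 1))

for bits in (16, 8, 6, 4):
    step = mantissa_step(bits)
    # Round-to-nearest is off by at most half a step.
    print(f"{bits:>2} bits: step = {step:.6f}, max rounding error = {step / 2:.6f}")
```

Going from 8 to 4 bits makes the grid 16 times coarser, and the mean squared error of uniform rounding grows roughly with the square of the step.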

Backend Selection

Choose backend based on your needs:

# For inference (fastest)
pychop.backend('numpy')

# For training (STE support)
pychop.backend('torch')

# For JAX/Flax (custom VJP)
pychop.backend('jax')

# For TensorFlow/Keras (STE support)
pychop.backend('tensorflow')

# Auto-detect (recommended)
pychop.backend('auto')
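One common way to implement auto-detection is to inspect the module in which the input's type is defined. The sketch below illustrates that approach only; `detect_backend` is a hypothetical helper and the actual logic behind `pychop.backend('auto')` may differ.

```python
def detect_backend(data):
    """Guess the array library from the top-level module of the object's type."""
    root = type(data).__module__.split(".")[0]
    known = {"torch": "torch", "jax": "jax", "jaxlib": "jax", "tensorflow": "tensorflow"}
    # Anything unrecognized (NumPy arrays, lists, scalars) falls back to NumPy.
    return known.get(root, "numpy")

print(detect_backend([1.0, 2.0]))   # numpy
```

Checking the module name avoids importing PyTorch, JAX, or TensorFlow just to run an `isinstance` test, so only the library the caller already uses needs to be installed.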

Troubleshooting

Common Issues

Import Error:

# Error: cannot import name 'bfp_quantize'
# Solution: Update pychop
pip install --upgrade pychop

Memory Issues:

# For very large tensors, use smaller block sizes
X_q = bfp_quantize(X, format=(8, 16))  # Smaller blocks

Gradient Issues:

import torch
from pychop import bfp_quantize

# Ensure requires_grad=True for training
X = torch.randn(128, 768, requires_grad=True)
X_q = bfp_quantize(X, format='bfp8')

# Check gradient flow
loss = X_q.sum()
loss.backward()
assert X.grad is not None, "Gradients not flowing!"

Backend Issues:

# Check current backend
import pychop
print(pychop.get_backend())

# Reset backend
pychop.backend('auto')

FAQ

Q: What’s the difference between BFP and MX formats?

A: BFP uses one shared exponent per block, while MX formats use both a shared scale and individual exponents per element. BFP is simpler and more hardware-efficient, while MX provides better dynamic range.

Q: Can I use BFP for training?

A: Yes! The PyTorch and TensorFlow backends include Straight-Through Estimator (STE) support, enabling quantization-aware training, and the JAX backend achieves the same through a custom VJP.

Q: Which format should I use?

A: For most cases, BFP8 (8-bit mantissa, 32 elements/block) is recommended. It provides roughly 2x compression vs FP16 (1.94x, or 8.25 bits per element) with minimal accuracy loss.

Q: How does BFP compare to INT8?

A: BFP provides better dynamic range than INT8 at similar compression. BFP adapts to local data statistics by choosing one exponent per block, while INT8 typically uses a single scale per tensor or per channel.
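The effect of per-block scaling is easiest to see on data with an outlier. The NumPy sketch below compares a single global 8-bit scale with a BFP-style power-of-two scale per 32-element row. It is an illustration under simplifying assumptions (symmetric rounding, one block per row), and `quantize_with_scale` is a hypothetical helper rather than a Pychop function.

```python
import numpy as np

def quantize_with_scale(x, step, bits=8):
    """Round x to a signed `bits`-bit grid with the given spacing, then rescale."""
    q_max = 2 ** (bits - 1) - 1
    return np.clip(np.rint(x / step), -q_max - 1, q_max) * step

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 32)) * 0.01   # 64 blocks of small values
x[0, 0] = 100.0                            # one large outlier in block 0

# Global INT8: a single scale for the whole tensor, dictated by the outlier.
x_int8 = quantize_with_scale(x, np.abs(x).max() / 127)

# BFP-style: one power-of-two scale per row, so only block 0 is affected.
_, exp = np.frexp(np.abs(x).max(axis=1, keepdims=True))
x_bfp = quantize_with_scale(x, np.ldexp(1.0, exp - 7))

mse_int8 = np.mean((x - x_int8) ** 2)
mse_bfp = np.mean((x - x_bfp) ** 2)
print(f"global INT8 MSE: {mse_int8:.2e}, per-block MSE: {mse_bfp:.2e}")
```

With the global scale every small value rounds to zero, whereas with per-block scales the damage is confined to the block that contains the outlier.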

Q: Can I mix different formats in the same model?

A: Yes! You can use different formats for different layers:

from torch import nn
from pychop.tch.bfp_formats import BFPLinear

class MixedPrecisionModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Higher precision for first layer
        self.fc1 = BFPLinear(768, 3072, weight_format='bfp12')
        # Lower precision for middle layers
        self.fc2 = BFPLinear(3072, 3072, weight_format='bfp6')
        # Full precision for output
        self.fc3 = nn.Linear(3072, 768)

Q: Does BFP work with quantized models from PyTorch/TensorFlow?

A: BFP is independent of PyTorch/TensorFlow quantization. You can apply BFP quantization to any model, including already-quantized models.

References

Papers:

Intel Flexpoint: “Flexpoint: An Adaptive Numerical Format for Efficient Training of Deep Neural Networks” (2017) https://arxiv.org/abs/1711.02213
Google BFloat16: “BFloat16: The Secret to High Performance on Cloud TPUs” (2019) https://cloud.google.com/blog/products/ai-machine-learning/bfloat16-the-secret-to-high-performance-on-cloud-tpus
Block Floating Point for Neural Networks: “Training Deep Neural Networks with 8-bit Floating Point Numbers” (2018) https://arxiv.org/abs/1812.08011

Related Formats:

Microscaling formats - OCP Microscaling formats with better dynamic range
fixed_point - Fixed-point quantization (Chopf)
integer - Integer quantization (Chopi)

Note

For the latest updates and examples, see the Pychop GitHub repository.