Microscaling Quantization

MBFQuant: A Multiplier-Bitwidth-Fixed, Mixed-Precision Quantization Method for Mobile CNN-Based Applications

Abstract: Deploying Convolutional Neural Network (CNN)-based applications to mobile platforms can be challenging due to the conflict between the restricted computing capacity of mobile devices and the ...

IEEE

Data Quality-Aware Mixed-Precision Quantization via Hybrid Reinforcement Learning

Abstract: Mixed-precision quantization mostly predetermines the model bit-width settings before actual training due to the non-differential bit-width sampling process, obtaining suboptimal performance ...

GitHub

Nonuniform-to-Uniform Quantization

This repository contains the training code of N2UQ, introduced in our CVPR 2022 paper "Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation". In ...

GitHub

SDNQ Quantization

SD.Next Quantization provides full cross-platform quantization to reduce memory usage and increase performance for any device. Triton enables the use of optimized kernels for much better performance.
