Abstract: Deploying Convolutional Neural Network (CNN)-based applications to mobile platforms can be challenging due to the conflict between the restricted computing capacity of mobile devices and the ...
Abstract: Mixed-precision quantization mostly predetermines the model bit-width settings before actual training due to the non-differential bit-width sampling process, obtaining suboptimal performance ...
This repository contains the training code of N2UQ introduced in our CVPR 2022 paper: "Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation" In ...
SD.Next Quantization provides full cross-platform quantization to reduce memory usage and increase performance for any device. Triton enables the use of optimized kernels for much better performance.
Some results have been hidden because they may be inaccessible to you
Show inaccessible results