Computer Science > Machine Learning

arXiv:2401.14110 (cs)

[Submitted on 25 Jan 2024]

Title: Towards Cheaper Inference in Deep Networks with Lower Bit-Width Accumulators

Authors: Yaniv Blumenfeld, Itay Hubara, Daniel Soudry

Abstract: The majority of research on the quantization of Deep Neural Networks (DNNs) focuses on reducing the precision of tensors visible to high-level frameworks (e.g., weights, activations, and gradients). However, current hardware still relies on high-accuracy core operations, the most significant of which is the accumulation of products. This high-precision accumulation is gradually becoming the main computational bottleneck because, so far, the use of low-precision accumulators has led to a significant degradation in performance. In this work, we present a simple method to train and fine-tune high-end DNNs that allows, for the first time, the use of cheaper 12-bit accumulators with no significant degradation in accuracy. Lastly, we show that as we decrease the accumulation precision further, using fine-grained gradient approximations can improve the DNN accuracy.
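
To make the accumulation issue concrete, the following minimal sketch compares a dot product summed in a wide accumulator with the same dot product summed in a 12-bit one. The abstract does not specify the accumulator's number format, so the signed-integer, saturating accumulator used here is an assumption made purely for illustration, and it is not the authors' method. The helper names (`saturate`, `dot_with_accumulator`) and the operand values are hypothetical.

```python
def saturate(value, bits):
    """Clamp value to the range of a signed two's-complement integer of `bits` bits."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def dot_with_accumulator(xs, ws, acc_bits):
    """Dot product whose running sum is saturated to `acc_bits` after every add.

    Assumption: a saturating signed-integer accumulator; real hardware (and the
    paper) may use a different format or overflow behavior.
    """
    acc = 0
    for x, w in zip(xs, ws):
        acc = saturate(acc + x * w, acc_bits)
    return acc


# Hypothetical 4-bit signed operands (range -8..7), chosen only for illustration.
xs = [7] * 64  # 64 activations
ws = [7] * 64  # 64 weights

exact = dot_with_accumulator(xs, ws, 32)   # wide accumulator, no clipping
narrow = dot_with_accumulator(xs, ws, 12)  # 12-bit accumulator clips the sum

print(exact, narrow)
```

With these inputs the exact sum is 64 × 49 = 3136, which exceeds the largest signed 12-bit value (2047), so the narrow accumulator returns a clipped result. Errors of this kind are one plausible reason that naively narrowing the accumulator degrades accuracy.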

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR)
Cite as:	arXiv:2401.14110 [cs.LG]
	(or arXiv:2401.14110v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2401.14110

Submission history

From: Yaniv Blumenfeld
[v1] Thu, 25 Jan 2024 11:46:01 UTC (542 KB)
