Combining Weight Approximation, Sharing and Retraining for Neural Network Model Compression
Abstract
1 Introduction
2 Related Work on Model Size Reduction
2.1 Pruning
2.2 Quantization
2.3 Weight Sharing
3 Proposed Methods for Model Size Reduction
3.1 Weight Approximation in Floating-Point Weights Using Exponents
A1. Pruning Non-Salient Weights
A2. Weight Adjustment without Changing the Salient Exponents
A3. Weight Replacement Using Exponent-Mantissa Adjustment
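Steps A1–A3 above operate on the exponent field of IEEE-754 single-precision weights. As a plausible illustration only (the identifiers and the bit-level extraction below are assumptions, not the paper's implementation), the following sketch shows how the exponent of each float32 weight can be read out and how exponent frequencies, as examined later in Section 4.1.2, could be tallied.

```python
import numpy as np

def exponent_field(weights: np.ndarray) -> np.ndarray:
    """Raw 8-bit IEEE-754 exponent of each float32 weight (bias still included)."""
    bits = weights.astype(np.float32).view(np.uint32)
    return ((bits >> 23) & 0xFF).astype(np.int32)

def exponent_histogram(weights: np.ndarray) -> dict:
    """Frequency of each unbiased exponent (bias 127 removed) across a layer."""
    exps, counts = np.unique(exponent_field(weights) - 127, return_counts=True)
    return dict(zip(exps.tolist(), counts.tolist()))

# Illustrative layer: random values standing in for pre-trained weights.
rng = np.random.default_rng(0)
layer_weights = rng.normal(scale=0.05, size=1024).astype(np.float32)
print(exponent_histogram(layer_weights))
```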
3.2 Mantissa Approximation and ESART
| Real Float | Value Stored as Float | No. of Matching Decimal Digits |
|---|---|---|
| 0.1987376154 | 0.1987376213 | 7 |
| 1.987376154 | 1.987376213 | 6 |
| 19.87376154 | 19.87376213 | 5 |
| 198.7376154 | 198.7376099 | 4 |
| 1987.376154 | 1987.376099 | 3 |
| 19873.76154 | 19873.76172 | 2 |
| 198737.6154 | 198737.6094 | 1 |
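The table above illustrates that a 32-bit float preserves roughly seven significant decimal digits, so the number of correct digits after the decimal point shrinks as the magnitude grows. A minimal sketch of this round-trip effect is shown below; the helper and its naive character-wise digit comparison are illustrative and may differ by a digit from the paper's counting convention.

```python
import numpy as np

def matching_fraction_digits(real: float, stored: float, places: int = 10) -> int:
    """Naive count of agreeing digits after the decimal point (string comparison)."""
    a = f"{real:.{places}f}".split(".")[1]
    b = f"{stored:.{places}f}".split(".")[1]
    n = 0
    for da, db in zip(a, b):
        if da != db:
            break
        n += 1
    return n

value = 0.1987376154
for _ in range(7):
    stored = float(np.float32(value))   # store and read back as a 32-bit float
    print(f"{value:<14.10g} {stored:<14.10g} {matching_fraction_digits(value, stored)}")
    value *= 10                         # same digits, one decade larger in magnitude
```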
4 Experiments and Results
4.1 Approximation of Weights in Pre-Trained Models
4.1.1 Weight Approximation Using Exponent Magnitude.
4.1.2 Weight Approximation Using Exponent Frequencies.
4.1.3 Mantissa Approximation in Weights.
4.2 Results of ESART
4.2.1 Comparison of Models Generated after the Proposed Exponent- and Mantissa-Based Weight Approximations.
4.3 Comparison with Other Model Compression Methods
4.3.1 Accuracy.
4.3.2 Memory Savings.
| Model | Original Model Size (MB) | ESART Model Size (MB) |
|---|---|---|
| ResNet18 | 21.30 | 14.22 |
| VGG11-BN | 53.66 | 34.89 |
| DenseNet121 | 13.19 | 9.07 |
| MobileNetV2 | 4.23 | 2.60 |
4.4 Execution Time and Energy at Inference
4.4.1 Inference Time.
| Name | Pre-Trained (Original) | Pre-Trained (Cache Flush) | ESART-Processed (Original) | ESART-Processed (Cache Flush) |
|---|---|---|---|---|
| Self CPU time total | 8.224 ms | 11.648 ms | 12.706 ms | 13.008 ms |
| Self CUDA time total | 4.157 ms | 4.157 ms | 7.450 ms | 7.459 ms |
4.4.2 Energy Consumption.
4.4.3 Execution Overhead of ESART.
| Model | Training Time, 100 Epochs (min) | ESART Time, 10 Epochs (min) | Overhead (%) | Precision of Decimal Digits (k) |
|---|---|---|---|---|
| ResNet18 | 17.733 | 2.619 | 14.772 | 2 |
| VGG11-BN | 14.717 | 2.057 | 13.978 | 3 |
| DenseNet121 | 42.876 | 9.07 | 21.154 | 3 |
| MobileNetV2 | 35.067 | 7.513 | 21.425 | 2 |
4.5 Results on FPGA
| Name | Original | ESART with Pipeline |
|---|---|---|
| Clock Cycles | 6,090,756 | 5,336,520 |
| Power (W) | 0.249 | 0.24 |
| Resource Utilization on Zynq: BRAM_18K | 29 | 28 |
| Resource Utilization on Zynq: DSP48E | 25 | 28 |
| Resource Utilization on Zynq: FF | 6,447 | 6,277 |
| Resource Utilization on Zynq: LUT | 5,693 | 5,866 |
5 Conclusion and Future Work
References