Model Compression Methods and Their Applications
- Li, Zhijian
- Advisor(s): Xin, Jack
Abstract
Model compression is an essential technique for deploying deep neural networks on platforms with limited computing resources, such as mobile devices and autonomous vehicles. Despite the surge of studies on model compression, there are still plenty of open research issues in this area. This thesis adapts model compression techniques to adversarial attack and knowledge distillation settings, improves several existing methods, and develops a few novel algorithms.

We apply quantization-aware training to adversarially trained models and develop an algorithm that pursues both robustness and efficiency. Moreover, we discover a relationship between the sparsity induced by quantization and adversarial training, and we develop a trade-off loss function that increases this sparsity as well as the natural accuracy.

We improve quantized distillation by refining its loss function and introducing the feature affinity loss \cite{FALoss}. This loss, backed by the Johnson–Lindenstrauss lemma, can significantly improve the performance of knowledge distillation. We observe that the feature affinity loss adds computational cost that becomes non-negligible as the image resolution increases, so we propose a fast feature affinity loss that approximates the original loss efficiently and accurately.

Finally, we develop a novel algorithm that integrates channel pruning and quantization. This method pushes the limit of efficiency by searching for quantized weights under a sparsification constraint. We stabilize the algorithm with a complementary transformed-$\ell_1$ loss that prevents any layer from being pruned to all zeros. Users can choose the level of the sparsity constraint to trade off performance and efficiency.
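To make the feature affinity idea concrete, the following is a minimal PyTorch-style sketch of one common formulation: pairwise cosine-similarity (affinity) matrices over spatial locations are built for the student and teacher feature maps, and their difference is penalized. The function and variable names are illustrative, and the exact normalization may differ from the formulation used in the thesis.

    import torch
    import torch.nn.functional as F

    def feature_affinity_loss(feat_s, feat_t):
        # Illustrative sketch, not the exact formulation from the thesis.
        # feat_s, feat_t: (B, C, H, W) feature maps from student and teacher;
        # the channel counts may differ, but the spatial grids must match.
        B, C, H, W = feat_s.shape
        # Flatten the spatial grid to (B, H*W, C) and L2-normalize each
        # location's feature vector so inner products are cosine similarities.
        fs = F.normalize(feat_s.flatten(2).transpose(1, 2), dim=2)
        ft = F.normalize(feat_t.flatten(2).transpose(1, 2), dim=2)
        # Affinity matrices of shape (B, H*W, H*W).
        A_s = fs @ fs.transpose(1, 2)
        A_t = ft @ ft.transpose(1, 2)
        # Penalize the mismatch between student and teacher affinities.
        return ((A_s - A_t) ** 2).mean()

Note that the affinity matrices have $(HW)^2$ entries, which is why the cost grows quickly with image resolution and motivates the fast approximation mentioned above.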
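For reference, the transformed-$\ell_1$ penalty standard in the sparse optimization literature, which we assume is the basis of the complementary loss above, is defined for a parameter $a > 0$ as
\[
\rho_a(x) = \frac{(a+1)\,|x|}{a + |x|},
\]
which tends to the $\ell_0$ indicator as $a \to 0^{+}$ and to the $\ell_1$ penalty as $a \to \infty$. Summing $\rho_a$ over a layer's weights gives a smooth surrogate for sparsity; the complementary variant used in the thesis additionally keeps any single layer from being driven entirely to zero.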