IoTNet: An Efficient and Accurate Convolutional Neural Network for IoT Devices
Figure 1. Our model, dubbed IoTNet, is distinct from related works in that it uses pairs of 1 × 3 and 3 × 1 standard convolutions, rather than the 3 × 3 standard convolutions typically found in large models or the depth-wise separable convolutions used in efficiency-focused models.

Figure 2. MobileNet [3] uses depth-wise separable convolutions. DWise denotes depth-wise convolution. Skip connections are not used.

Figure 3. ShuffleNet [4] uses a 3 × 3 convolution for the depth-wise phase of the convolution, which is performed after a channel shuffle. DWConv denotes depth-wise convolution. This architecture uses skip connections.

Figure 4. MobileNetV2 [6] uses a 3 × 3 convolution for the depth-wise phase of the convolution and makes use of skip connections. DWise denotes depth-wise convolution.

Figure 5. LiteNet [20] takes an inception block and replaces one of the 1 × 2 convolutions and one of the 1 × 3 convolutions with their depth-wise counterparts, respectively.

Figure 6. EffNet [5] uses 1 × 3 and 3 × 1 depth-wise separable convolutions to reduce model complexity. DWConv denotes depth-wise convolution.

Figure 7. A standard convolution uses a kernel which extends through the entire depth of an input.

Figure 8. In the depth-wise phase, multiple kernels are used to cover the entire depth of an input, as each kernel spans only one channel.

Figure 9. In the point-wise phase, a standard convolution is performed on the intermediate output from the depth-wise phase.

Figure 10. Our network block contains a batch normalization, followed by a pair of 1 × 3 and 3 × 1 standard convolutions. Each convolution is preceded by a ReLU. Each block also contains a skip connection [13].

Figure 11. Our network width is controlled by a widening factor k. Resolution is reduced within the first blocks of groups two and three, if present.

Figure 12. Example images extracted from the CIFAR-10 data set.

Figure 13. Example images extracted from the SVHN data set.

Figure 14. Example images extracted from the GTSRB data set.

Figure 15. The imbalanced class distributions within the GTSRB data set.

Figure 16. CIFAR-10: the proposed model based on 1 × 3 and 3 × 1 convolution pairs compared with a 3 × 3-based approach. Both variants are scaled to match in FLOPs, ranging from 1 to 10 million.

Figure 17. SVHN: the proposed model based on 1 × 3 and 3 × 1 convolution pairs compared with a 3 × 3-based approach. Both variants are scaled to match in FLOPs, ranging from 1 to 10 million.

Figure 18. GTSRB: the proposed model based on 1 × 3 and 3 × 1 convolution pairs compared with a 3 × 3-based approach. Both variants are scaled to match in FLOPs, ranging from 1 to 10 million.
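The block described in Figures 10 and 11 is compact enough to express directly. Below is a minimal PyTorch sketch of one such block; the pre-activation ordering (batch norm first, each convolution preceded by a ReLU) and the "same" padding follow the captions above, while the class name and the omission of biases are our own assumptions rather than details taken from the paper.

```python
import torch
import torch.nn as nn

class IoTNetBlock(nn.Module):
    """One IoTNet block: batch norm, then a ReLU-preceded pair of 1 x 3
    and 3 x 1 standard convolutions, wrapped in a skip connection [13]."""

    def __init__(self, channels: int):
        super().__init__()
        self.bn = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)
        # Standard (full-depth) convolutions: each kernel spans every input
        # channel, unlike the depth-wise variants of Figures 2-6.
        self.conv1x3 = nn.Conv2d(channels, channels, kernel_size=(1, 3),
                                 padding=(0, 1), bias=False)
        self.conv3x1 = nn.Conv2d(channels, channels, kernel_size=(3, 1),
                                 padding=(1, 0), bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.bn(x)
        out = self.conv1x3(self.relu(out))
        out = self.conv3x1(self.relu(out))
        return out + x  # skip connection (He et al. [13])
```

The widening factor k of Figure 11 would simply scale `channels` when the groups of blocks are instantiated; downsampling between groups is handled outside the block by average pooling, as stated in the contributions below.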
Abstract
1. Introduction
- We propose a new architecture, namely IoTNet, designed specifically for performance-constrained environments such as IoT devices, smartphones and embedded systems. It trades off accuracy against computational cost differently from existing methods, employing novel pairs of 1 × 3 and 3 × 1 standard convolutions rather than depth-wise separable convolutions (a back-of-envelope cost comparison follows this list).
- An in-depth comparison of the proposed architecture against efficiency-focused models, including MobileNet [3], MobileNetV2 [6], ShuffleNet [4] and EffNet [5], has been conducted using the CIFAR-10 [7], Street View House Numbers (SVHN) [8] and German Traffic Sign Recognition Benchmark (GTSRB) [9] data sets. The empirical results indicate that the proposed block architecture, constructed exclusively from pairs of 1 × 3 and 3 × 1 standard convolutions with average pooling for downsampling, outperforms the current state-of-the-art depth-wise separable convolution-based architectures in terms of both accuracy and cost.
- A direct comparison of pairs of 1 × 3 and 3 × 1 standard convolutions against 3 × 3 standard convolutions has also been conducted. The empirical results indicate that our approach yields a more accurate and efficient architecture than a scaled-down large state-of-the-art network.
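To make the claimed cost reduction concrete, the following back-of-envelope multiply-accumulate counts compare the three layer types for a single layer; the feature-map and channel sizes are illustrative values of our choosing, not figures reported in the paper.

```python
# Multiply-accumulate (MAC) counts for one layer on an H x W feature map
# with C input and C output channels; illustrative sizes, our arithmetic.
H, W, C = 32, 32, 64

std_3x3 = H * W * C * C * (3 * 3)          # one 3 x 3 standard convolution
pair    = H * W * C * C * (1 * 3 + 3 * 1)  # 1 x 3 then 3 x 1 standard convs
dw_sep  = H * W * C * (3 * 3 + C)          # depth-wise 3 x 3 + point-wise 1 x 1

print(f"3 x 3 standard:        {std_3x3:>12,}")
print(f"1 x 3 / 3 x 1 pair:    {pair:>12,}  ({pair / std_3x3:.0%} of 3 x 3)")
print(f"depth-wise separable:  {dw_sep:>12,}")
```

The pair keeps full-depth (standard) kernels while cutting the kernel area from nine weights to six, roughly a one-third saving per layer; depth-wise separable convolutions are cheaper still, but, as the results below indicate, at a larger cost in accuracy per FLOP.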
1.1. Related Work
1.2. Distinction Between Standard Convolutions and Depth-Wise Separable Convolutions
1.2.1. Standard Convolution
1.2.2. Depth-Wise Separable Convolution
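In PyTorch, the distinction drawn in Sections 1.2.1 and 1.2.2 (and illustrated in Figures 7–9) maps onto the `groups` argument of `nn.Conv2d`. A minimal sketch, with channel counts chosen arbitrarily for illustration:

```python
import torch
import torch.nn as nn

c_in, c_out = 32, 64
x = torch.randn(1, c_in, 28, 28)

# Standard convolution (Figure 7): every 3 x 3 kernel extends through
# all c_in input channels.
standard = nn.Conv2d(c_in, c_out, kernel_size=3, padding=1)

# Depth-wise separable convolution: a per-channel 3 x 3 depth-wise phase
# (groups=c_in, Figure 8) followed by a 1 x 1 point-wise phase that mixes
# channels (Figure 9).
depth_wise = nn.Conv2d(c_in, c_in, kernel_size=3, padding=1, groups=c_in)
point_wise = nn.Conv2d(c_in, c_out, kernel_size=1)

assert standard(x).shape == point_wise(depth_wise(x)).shape  # same output shape
```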
2. Materials and Methods
2.1. Approach to Identify Candidate Models
2.2. Complexity Analysis
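FLOP figures such as those reported in the tables below are conventionally derived from per-layer multiply-accumulate counts. A small helper of the kind one might use (our sketch; the paper does not publish its counting code, and conventions differ, e.g., over whether multiplies and adds are counted separately):

```python
import torch.nn as nn

def conv2d_macs(layer: nn.Conv2d, out_h: int, out_w: int) -> int:
    """Multiply-accumulates for one Conv2d given its output spatial size.
    The groups term makes the same formula cover standard convolutions
    (groups=1) and depth-wise convolutions (groups=in_channels)."""
    kh, kw = layer.kernel_size
    return (out_h * out_w * kh * kw
            * layer.in_channels * layer.out_channels // layer.groups)

# e.g., a 3 x 1 standard convolution on a 32 x 32 map with 64 channels:
print(conv2d_macs(nn.Conv2d(64, 64, (3, 1), padding=(1, 0)), 32, 32))
```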
3. Results
3.1. Data Sets
3.1.1. CIFAR-10
3.1.2. SVHN
3.1.3. GTSRB
3.1.4. Training Scheme
3.2. Comparison Against Efficiency-Focused Benchmark Models
3.2.1. Evaluation Using CIFAR-10
3.2.2. Evaluation Using SVHN
3.2.3. Evaluation Using GTSRB
3.3. Evaluation Against 3 × 3 Standard Convolutions
3.3.1. Evaluation Against 3 × 3 Standard Convolution-Based Models Using CIFAR-10
3.3.2. Evaluation Against 3 × 3 Standard Convolution-Based Models Using SVHN
3.3.3. Evaluation Against 3 × 3 Standard Convolution-Based Models Using GTSRB
3.3.4. Computational Comparison
3.4. Discussion
4. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Wei, G.; Li, G.; Zhao, J.; He, A. Development of a LeNet-5 Gas Identification CNN Structure for Electronic Noses. Sensors 2019, 19, 217.
- Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv 2019, arXiv:1905.11946.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
- Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6848–6856.
- Freeman, I.; Roese-Koerner, L.; Kummert, A. EffNet: An efficient structure for convolutional neural networks. In Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 6–10.
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520.
- Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images; Technical Report; University of Toronto: Toronto, ON, Canada, 2009.
- Netzer, Y.; Wang, T.; Coates, A.; Bissacco, A.; Wu, B.; Ng, A. Reading Digits in Natural Images with Unsupervised Feature Learning. 2011. Available online: http://ufldl.stanford.edu/housenumbers/nips2011_housenumbers.pdf (accessed on 12 December 2019).
- Stallkamp, J.; Schlipsing, M.; Salmen, J.; Igel, C. The German Traffic Sign Recognition Benchmark: A multi-class classification competition. In Proceedings of the IEEE International Joint Conference on Neural Networks, San Jose, CA, USA, 31 July–5 August 2011; pp. 1453–1460.
- Karray, F.; Jmal, M.W.; Garcia-Ortiz, A.; Abid, M.; Obeid, A.M. A comprehensive survey on wireless sensor node hardware platforms. Comput. Netw. 2018, 144, 89–110.
- Saia, R.; Carta, S.; Recupero, D.R.; Fenu, G. Internet of entities (IoE): A blockchain-based distributed paradigm for data exchange between wireless-based devices. In Proceedings of the 8th International Conference on Sensor Networks, SENSORNETS 2019, Prague, Czech Republic, 26–27 February 2019; pp. 77–84.
- Castaño, F.; Strzelczak, S.; Villalonga, A.; Haber, R.E.; Kossakowska, J. Sensor Reliability in Cyber-Physical Systems Using Internet-of-Things Data: A Review and Case Study. Remote Sens. 2019, 11, 2252.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Zagoruyko, S.; Komodakis, N. Wide residual networks. arXiv 2016, arXiv:1605.07146.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems; NIPS: Vancouver, BC, Canada, 2012; pp. 1097–1105.
- Han, D.; Kim, J.; Kim, J. Deep pyramidal residual networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5927–5935.
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826.
- Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A.A. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017.
- Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
- He, Z.; Zhang, X.; Cao, Y.; Liu, Z.; Zhang, B.; Wang, X. LiteNet: Lightweight neural network for detecting arrhythmias at resource-constrained mobile devices. Sensors 2018, 18, 1229.
- Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv 2016, arXiv:1602.07360.
- Moody, G.B.; Mark, R.G. The impact of the MIT-BIH arrhythmia database. IEEE Eng. Med. Biol. Mag. 2001, 20, 45–50.
- Gaikwad, A.S.; El-Sharkawy, M. Pruning convolution neural network (SqueezeNet) using Taylor expansion-based criterion. In Proceedings of the 2018 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), Louisville, KY, USA, 6–8 December 2018; pp. 1–5.
- Singh, P.; Kadi, V.S.R.; Verma, N.; Namboodiri, V.P. Stability Based Filter Pruning for Accelerating Deep CNNs. In Proceedings of the 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa Village, HI, USA, 7–11 January 2019; pp. 1166–1174.
- Galar, M.; Fernández, A.; Barrenechea, E.; Bustince, H.; Herrera, F. Ordering-based pruning for improving the performance of ensembles of classifiers in the framework of imbalanced datasets. Inf. Sci. 2016, 354, 178–196.
- Lin, S.; Ji, R.; Li, Y.; Deng, C.; Li, X. Toward Compact ConvNets via Structure-Sparsity Regularized Filter Pruning. IEEE Trans. Neural Netw. Learn. Syst. 2019, 1–15.
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. LeNet-5, Convolutional Neural Networks. 2015, Volume 20, p. 5. Available online: http://yann.lecun.com/exdb/lenet (accessed on 15 November 2019).
- LeCun, Y.; Cortes, C. MNIST Handwritten Digit Database. 2010. Available online: http://yann.lecun.com/exdb/mnist/ (accessed on 15 November 2019).
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
- Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. (IJCV) 2015, 115, 211–252.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems; NIPS: Vancouver, BC, Canada, 2015; pp. 91–99.
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2014; pp. 740–755.
- Real, E.; Aggarwal, A.; Huang, Y.; Le, Q.V. Regularized evolution for image classifier architecture search. arXiv 2018, arXiv:1802.01548.
- Zhong, Z.; Yan, J.; Wu, W.; Shao, J.; Liu, C. Practical Block-Wise Neural Network Architecture Generation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 19–21 June 2018; pp. 2423–2432.
- Wang, B.; Sun, Y.; Xue, B.; Zhang, M. Evolving deep convolutional neural networks by variable-length particle swarm optimization for image classification. In Proceedings of the 2018 IEEE Congress on Evolutionary Computation (CEC), Rio de Janeiro, Brazil, 8–13 July 2018; pp. 1–8.
- Bochinski, E.; Senst, T.; Sikora, T. Hyper-parameter optimization for convolutional neural network committees based on evolutionary algorithms. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 3924–3928.
- Sun, Y.; Xue, B.; Zhang, M.; Yen, G.G. Automatically designing CNN architectures using genetic algorithm for image classification. arXiv 2018, arXiv:1808.03818.
- Watkins, C.J.; Dayan, P. Q-learning. Mach. Learn. 1992, 8, 279–292.
- Tokic, M. Adaptive ε-greedy exploration in reinforcement learning based on value differences. In Annual Conference on Artificial Intelligence; Springer: Berlin/Heidelberg, Germany, 2010; pp. 203–210.
- Baker, B.; Gupta, O.; Naik, N.; Raskar, R. Designing neural network architectures using reinforcement learning. arXiv 2016, arXiv:1611.02167.
- Goldberg, D.E.; Deb, K. A comparative analysis of selection schemes used in genetic algorithms. In Foundations of Genetic Algorithms; Elsevier: Amsterdam, The Netherlands, 1991; Volume 1, pp. 69–93.
- Liu, Z.; Wu, B.; Luo, W.; Yang, X.; Liu, W.; Cheng, K.T. Bi-Real Net: Enhancing the performance of 1-bit CNNs with improved representational capability and advanced training algorithm. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 722–737.
- Umuroglu, Y.; Fraser, N.J.; Gambardella, G.; Blott, M.; Leong, P.; Jahre, M.; Vissers, K. FINN: A framework for fast, scalable binarized neural network inference. In Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA, 22–24 February 2017; pp. 65–74.
- Wang, N.; Choi, J.; Brand, D.; Chen, C.Y.; Gopalakrishnan, K. Training deep neural networks with 8-bit floating point numbers. In Advances in Neural Information Processing Systems; NIPS: Vancouver, BC, Canada, 2018; pp. 7675–7684.
- Rastegari, M.; Ordonez, V.; Redmon, J.; Farhadi, A. XNOR-Net: ImageNet classification using binary convolutional neural networks. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2016; pp. 525–542.
- Cubuk, E.D.; Zoph, B.; Mane, D.; Vasudevan, V.; Le, Q.V. AutoAugment: Learning Augmentation Strategies From Data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 113–123.
- Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv 2015, arXiv:1502.03167.
- Glorot, X.; Bordes, A.; Bengio, Y. Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 11–13 April 2011; pp. 315–323.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; Lerer, A. Automatic Differentiation in PyTorch. In NIPS Autodiff Workshop; NIPS: Vancouver, BC, Canada, 2017.
- Tan, T.Y.; Zhang, L.; Lim, C.P.; Fielding, B.; Yu, Y.; Anderson, E. Evolving Ensemble Models for Image Segmentation Using Enhanced Particle Swarm Optimization. IEEE Access 2019, 7, 34004–34019.
- Tan, T.Y.; Zhang, L.; Lim, C.P. Adaptive melanoma diagnosis using evolving clustering, ensemble and deep neural networks. Knowl.-Based Syst. 2019, 187, 104807.
- Tan, T.Y.; Zhang, L.; Lim, C.P. Intelligent skin cancer diagnosis using improved particle swarm optimization and deep learning models. Appl. Soft Comput. 2019, 84, 105725.
Model | Kernel | Convolution Type | Emphasis | Methodologies and Strengths | Drawbacks |
---|---|---|---|---|---|
AlexNet [15] | mixed | standard | accuracy | Demonstrated how the model depth was essential for performance | Contained large kernels which are less efficient. Outperformed by subsequent studies |
ResNet [13] | 3 × 3 | standard | accuracy | Used skip connections to enable training deeper networks | A slim but deep state-of-the-art model, not designed for constrained environments |
Inception [17,18] | mixed | standard | accuracy | Trained deeper networks using sparsely connected network architectures, i.e., by using a variety of kernel sizes side by side | The employed side-by-side model increased model complexity |
WideResnet [14] | 3 × 3 | standard | accuracy | Demonstrated that widening a residual network can decrease its depth and improve its performance | A state-of-the-art model, not designed for constrained environments. Less efficient at smaller scales than our approach |
PyramidNet [16] | 3 × 3 | standard | accuracy | Gradually increasing the feature map size of deep networks led to performance improvements on ResNet | A deep state-of-the-art model, not designed for constrained environments. Gradual depth increase led to a larger model size |
MobileNet [3,6] | 3 × 3 | depth-wise | efficiency | Traded accuracy with efficiency by using depth-wise separable convolutions | Contained bottlenecks during downsampling which impeded data flow |
ShuffleNet [4] | 3 × 3 | depth-wise | efficiency | Shuffling channels helped information flowing when performing depth-wise separable convolutions | Shuffle resulted in additional operations and contained bottlenecks which impeded data flow |
EffNet [5] | 1 × 3 and 3 × 1 | depth-wise | efficiency | Factorized 3 × 3 depth-wise convolutions into 1 × 3 and 3 × 1 depth-wise convolutions to reduce complexity. Addressed bottlenecks of prior efficiency-focused models | Based on depth-wise separable convolutions which traded accuracy with efficiency less optimally than our approach |
LiteNet [20] | 1 × 2 and 1 × 3 | depth-wise and standard | efficiency | Combined ideas from Inception and MobileNet | A combination of drawbacks of Inception and MobileNet (see above) |
Ours | 1 × 3 and 3 × 1 | standard | efficiency | Factorized 3 × 3 into 1 × 3 and 3 × 1 standard convolutions to retain the strength of standard convolutions, i.e., superior performance while reducing model complexity | Designed for constrained environments and not to outperform state-of-the-art accuracy-focused models in extremely large configurations on GPU machines. |
Model | Widening Factor k | Mean Acc | Mil. FLOPs |
---|---|---|---|
EffNet V1 large | 0.99 | 85.02% | 79.8 |
MobileNet large | 0.14 | 78.18% | 11.6 |
ShuffleNet large | 0.14 | 77.90% | 11.1 |
EffNet V1 | 0.14 | 80.20% | 11.4 |
EffNet V2 | 0.22 | 81.67% | 18.1 |
MobileNetV2 | 0.20 | 76.47% | 16.4 |
IoTNet-3-4 | 0.7 | 89.9% | 9.9 |
MobileNet | 0.07 | 77.48% | 5.8 |
ShuffleNet | 0.06 | 77.3% | 4.7 |
IoTNet-3-2 | 0.68 | 87.19% | 4.2 |
Data Set | Total Sample Size | Training Samples | Test Samples | Image Resolution |
---|---|---|---|---|
CIFAR-10 | 60,000 | 50,000 | 10,000 | 32 × 32 |
SVHN | 99,289 | 73,257 | 26,032 | 32 × 32 |
GTSRB | 51,839 | 39,209 | 12,630 | 32 × 32 |
Model Name | Brief Description |
---|---|
IoTNet-g-n | The proposed model with g as the number of groups, and n as the number of blocks per group |
EffNet V1 | An implementation of EffNet [5]. Model architecture contains 1 × 3 and 3 × 1 depth-wise separable convolution and pooling-based blocks |
EffNet V1 large | As per EffNet V1 with two additional layers and more channels |
EffNet V2 | As per EffNet V1; also introduced in [5] in response to MobileNetV2. The model contains minor changes relating to network expansion and extension rates (depth and width), and replaces ReLU with leaky ReLU on the point-wise layers
MobileNet | An implementation of MobileNet [3] of varying widths. Model architecture contains 3 × 3 depth-wise separable convolutions |
MobileNet large | As per MobileNet implementation with two extra layers |
MobileNetV2 | An implementation of MobileNetV2 [6] of varying widths. Model contains 3 × 3 depth-wise convolutions and inverted residual structures where shortcut connections are between bottleneck layers |
ShuffleNet | An implementation of ShuffleNet [4] of varying widths. Model contains 3 × 3 depth-wise convolutions in addition to point-wise group convolution and channel shuffle |
ShuffleNet large | As per ShuffleNet implementation with two extra layers |
Model | Widening Factor k | Mean Acc | Mil. FLOPs |
---|---|---|---|
IoTNet-3-2 | 1.08 | 89.79% | 11 |
IoTNet-3-4 | 0.7 | 89.9% | 9.9 |
IoTNet-3-3 | 0.66 | 88.98% | 6.2 |
IoTNet-3-2 | 0.68 | 87.19% | 4.2 |
IoTNet-3-2 | 0.5 | 81.47% | 2.6 |
IoTNet-3-3 | 0.41 | 83.49% | 2.5 |
Model | Acc Improvement | FLOPs Saving |
---|---|---|
EffNet V1 large | 4.88% | 87.59% |
MobileNet large | 11.72% | 14.66% |
ShuffleNet large | 12.0% | 10.81% |
EffNet V1 | 9.7% | 13.16% |
EffNet V2 | 8.23% | 45.3% |
MobileNetV2 | 13.43% | 39.63% |
MobileNet | 9.71% | 27.59% |
ShuffleNet | 9.89% | 10.64% |
Model | Widening Factor k | Mean Acc | kFLOPs |
---|---|---|---|
IoTNet-3-5 | 0.14 | 89.22% | 499.7 |
IoTNet-3-2 | 0.21 | 88.4% | 474.3 |
Model | Widening Factor k | Mean Acc | kFLOPs |
---|---|---|---|
EffNet V2 | 0.34 | 87.3% | 1204.2 |
MobileNetV2 | 0.33 | 86.71% | 1162.8 |
EffNet V1 | 0.14 | 88.51% | 517.6 |
MobileNet | 0.22 | 85.64% | 773.4 |
ShuffleNet | 0.21 | 82.73% | 733.1 |
IoTNet-3-5 | 0.14 | 89.22% | 499.7 |
Model | Acc Improvement | FLOPs Saving |
---|---|---|
EffNet V2 | 1.92% | 58.5% |
MobileNetV2 | 2.51% | 57.03% |
EffNet V1 | 0.71% | 3.46% |
MobileNet | 3.58% | 35.39% |
ShuffleNet | 6.49% | 31.84% |
Model | Widening Factor k | Mean Acc | kFLOPs |
---|---|---|---|
IoTNet-3-2 | 0.22 | 93.17% | 531.0 |
IoTNet-3-5 | 0.15 | 90.57% | 531.5 |
IoTNet-3-3 | 0.18 | 91.84% | 427.1 |
IoTNet-3-3 | 0.15 | 88.25% | 342.1 |
IoTNet-3-3 | 0.13 | 88.72% | 323.9 |
IoTNet-3-1 | 0.24 | 73.33% | 310.3 |
IoTNet-3-2 | 0.18 | 88.82% | 301.6 |
Model | Widening Factor k | Mean Acc | kFLOPs |
---|---|---|---|
EffNet V2 | 0.3 | 90.4% | 704.5 |
MobileNetV2 | 0.31 | 90.74% | 710.7 |
MobileNet | 0.23 | 88.15% | 533.0 |
ShuffleNet | 0.23 | 88.99% | 540.7 |
IoTNet-3-2 | 0.22 | 93.17% | 531.0 |
EffNet V1 | 0.15 | 91.79% | 344.1 |
IoTNet-3-2 | 0.18 | 88.82% | 301.6 |
Model | Acc Improvement | FLOPs Saving |
---|---|---|
EffNet V2 | 2.77% | 24.63% |
MobileNetV2 | 2.43% | 25.28% |
MobileNet | 5.02% | 0.38% |
ShuffleNet | 4.18% | 1.79% |
EffNet V1 | −2.97% | 12.35%
Device | CPU | Memory | Operating System | Library |
---|---|---|---|---|
PC | Intel Core i7-2600K @ 4 GHz | 16 GB | Ubuntu 18.04 LTS | PyTorch 1
Raspberry Pi 3 Model B+ | ARM Cortex-A53 @ 1.4 GHz | 1 GB | Raspbian Buster 4.19 | PyTorch 1
Model | Widening Factor | Data Set | kFLOPs | Memory (MB) | Pi Time (ms) | PC Time (ms)
---|---|---|---|---|---|---|
IoTNet-3-4 | 0.7 | CIFAR-10 | 9900 | 392 | 87.50 | 0.78 |
IoTNet-3-2 | 0.68 | CIFAR-10 | 4200 | 192 | 46.09 | 0.39 |
IoTNet-3-5 | 0.14 | SVHN | 499.7 | 15 | 5.94 | 0.20 |
IoTNet-3-2 | 0.22 | GTSRB | 531.0 | 27 | 4.61 | 0.13 |
IoTNet-3-2 | 0.18 | GTSRB | 301.6 | 14 | 4.06 | 0.16 |
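Per-image latencies of the kind shown above can be gathered with a simple CPU timing harness; the paper does not include its measurement code, so the warm-up count, number of runs and input shape below are illustrative assumptions.

```python
import time
import torch

def mean_inference_ms(model: torch.nn.Module, runs: int = 100) -> float:
    """Average single-image CPU inference time in milliseconds."""
    model.eval()
    x = torch.randn(1, 3, 32, 32)  # one 32 x 32 RGB image, per the table above
    with torch.no_grad():
        for _ in range(10):        # warm-up to stabilise caches and threads
            model(x)
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
        elapsed = time.perf_counter() - start
    return elapsed / runs * 1000.0
```

On the Raspberry Pi in particular, thread count and CPU governor settings materially affect such measurements, so repeated runs are advisable.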
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).