research-article

SensiX++: Bringing MLOps and Multi-tenant Model Serving to Sensory Edge Devices

Published: 09 November 2023

Abstract

We present SensiX++, a multi-tenant runtime for adaptive model execution with integrated MLOps on edge devices, e.g., a camera, a microphone, or IoT sensors. SensiX++ operates on two fundamental principles: highly modular componentisation to externalise data operations with clear abstractions and document-centric manifestation for system-wide orchestration. First, a data coordinator manages the lifecycle of sensors and serves models with correct data through automated transformations. Next, a resource-aware model server executes multiple models in isolation through model abstraction, pipeline automation, and feature sharing. An adaptive scheduler then orchestrates the best-effort executions of multiple models across heterogeneous accelerators, balancing latency and throughput. Finally, microservices with REST APIs serve synthesised model predictions, system statistics, and continuous deployment. Collectively, these components enable SensiX++ to serve multiple models efficiently with fine-grained control on edge devices while minimising data operation redundancy, managing data and device heterogeneity, and reducing resource contention. We benchmark SensiX++ with 10 different vision and acoustics models across various multi-tenant configurations on different edge accelerators (Jetson AGX and Coral TPU) designed for sensory devices. We report on the overall throughput and quantified benefits of various automation components of SensiX++ and demonstrate its efficacy in significantly reducing operational complexity and lowering the effort to deploy, upgrade, reconfigure, and serve embedded models on edge devices.
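The idea of minimising data operation redundancy through feature sharing can be illustrated with a minimal sketch. This is not the SensiX++ implementation: the names `ModelSpec`, `DataCoordinator`, `TRANSFORMS`, and both transformations are hypothetical, chosen only to show how a coordinator might run each distinct transformation once per sensor frame and hand the shared result to every model that declares it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Frame = List[float]  # stand-in for one sensor sample (image, audio window, ...)


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical manifest entry: a model name and the transformation it needs."""
    name: str
    transform: str  # key into TRANSFORMS


def normalise(frame: Frame) -> Frame:
    """Scale values by the frame's peak (falls back to 1.0 for an all-zero frame)."""
    peak = max(frame) or 1.0
    return [x / peak for x in frame]


def downsample2(frame: Frame) -> Frame:
    """Keep every second value."""
    return frame[::2]


# Hypothetical transformation registry; real ones might resize or resample.
TRANSFORMS: Dict[str, Callable[[Frame], Frame]] = {
    "normalise": normalise,
    "downsample2": downsample2,
}


class DataCoordinator:
    """Runs each distinct transformation once per frame and shares the result."""

    def __init__(self, models: Iterable[ModelSpec]) -> None:
        self.models = list(models)
        self.transform_calls = 0  # counts transformations actually executed

    def dispatch(self, frame: Frame) -> Dict[str, Frame]:
        cache: Dict[str, Frame] = {}   # per-frame cache keyed by transformation
        inputs: Dict[str, Frame] = {}  # model name -> data ready for that model
        for model in self.models:
            if model.transform not in cache:
                cache[model.transform] = TRANSFORMS[model.transform](frame)
                self.transform_calls += 1
            inputs[model.name] = cache[model.transform]
        return inputs
```

With three tenant models, two of which declare the same transformation, `dispatch` executes only two transformations per frame and the two models receive the same object. A production runtime would also need isolation between tenants and eviction of cached data, which this sketch leaves out.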

Published In

ACM Transactions on Embedded Computing Systems, Volume 22, Issue 6
November 2023
428 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3632298
Editor: Tulika Mitra

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 November 2023
Online AM: 07 September 2023
Accepted: 29 July 2023
Revised: 28 July 2023
Received: 01 April 2022
Published in TECS Volume 22, Issue 6

Author Tags

  1. MLOps
  2. multi-tenancy
  3. model serving
  4. edge
