research-article

SensiX++: Bringing MLOps and Multi-tenant Model Serving to Sensory Edge Devices

Authors:

Utku Günay Acer,

Alessandro Montanari,

Fahim Kawsar

ACM Transactions on Embedded Computing Systems, Volume 22, Issue 6

Article No.: 98, Pages 1 - 27

https://doi.org/10.1145/3617507

Published: 09 November 2023

Abstract

We present SensiX++, a multi-tenant runtime for adaptive model execution with integrated MLOps on edge devices, e.g., a camera, a microphone, or IoT sensors. SensiX++ operates on two fundamental principles: highly modular componentisation to externalise data operations with clear abstractions and document-centric manifestation for system-wide orchestration. First, a data coordinator manages the lifecycle of sensors and serves models with correct data through automated transformations. Next, a resource-aware model server executes multiple models in isolation through model abstraction, pipeline automation, and feature sharing. An adaptive scheduler then orchestrates the best-effort executions of multiple models across heterogeneous accelerators, balancing latency and throughput. Finally, microservices with REST APIs serve synthesised model predictions, system statistics, and continuous deployment. Collectively, these components enable SensiX++ to serve multiple models efficiently with fine-grained control on edge devices while minimising data operation redundancy, managing data and device heterogeneity, and reducing resource contention. We benchmark SensiX++ with 10 different vision and acoustics models across various multi-tenant configurations on different edge accelerators (Jetson AGX and Coral TPU) designed for sensory devices. We report on the overall throughput and quantified benefits of various automation components of SensiX++ and demonstrate its efficacy in significantly reducing operational complexity and lowering the effort to deploy, upgrade, reconfigure, and serve embedded models on edge devices.
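To make the data coordinator and feature-sharing ideas concrete, the following is a minimal sketch, not the SensiX++ implementation. It assumes a hypothetical manifest document in which each model declares its sensor and the transformation it expects. The manifest fields, the `DataCoordinator` class, and the string-based stand-in transforms are all illustrative assumptions. The coordinator applies each (sensor, transform) pair once and shares the result among every model that requests it, which is one simple way to reduce redundant data operations.

```python
# Hypothetical manifest: field names are illustrative, not the SensiX++ schema.
MANIFEST = {
    "models": [
        {"name": "object_detector", "sensor": "camera", "transform": "resize_224"},
        {"name": "face_embedder", "sensor": "camera", "transform": "resize_224"},
        {"name": "keyword_spotter", "sensor": "microphone", "transform": "identity"},
    ]
}

# Stand-in transformations operating on strings so the sketch stays self-contained.
TRANSFORMS = {
    "resize_224": lambda sample: f"resized({sample})",
    "identity": lambda sample: sample,
}


class DataCoordinator:
    """Routes sensor samples to models, applying each transform only once."""

    def __init__(self, manifest, transforms):
        self.manifest = manifest
        self.transforms = transforms
        self.transform_calls = 0  # counts how many transforms actually ran

    def dispatch(self, samples):
        cache = {}   # (sensor, transform) -> transformed sample
        routed = {}  # model name -> input it should receive
        for model in self.manifest["models"]:
            key = (model["sensor"], model["transform"])
            if key not in cache:
                transform = self.transforms[model["transform"]]
                cache[key] = transform(samples[model["sensor"]])
                self.transform_calls += 1
            routed[model["name"]] = cache[key]
        return routed


coordinator = DataCoordinator(MANIFEST, TRANSFORMS)
routed = coordinator.dispatch({"camera": "frame0", "microphone": "chunk0"})
```

In this sketch the two camera models receive the same resized frame, so three models trigger only two transformations. A real runtime would also need to manage sensor lifecycles, scheduling across accelerators, and isolation, none of which is modelled here.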

Index Terms

Computer systems organization

Published In

ACM Transactions on Embedded Computing Systems Volume 22, Issue 6

November 2023

428 pages

ISSN:1539-9087

EISSN:1558-3465

DOI:10.1145/3632298

Editor:
Tulika Mitra
National University of Singapore, Singapore

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 09 November 2023

Online AM: 07 September 2023

Accepted: 29 July 2023

Revised: 28 July 2023

Received: 01 April 2022

Published in TECS Volume 22, Issue 6
