- research-article, November 2024
Harmonizing ML and Databases: A Symphony of Data (VLDB 2024 Keynote)
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Page 4556, https://doi.org/10.14778/3685800.3685918
Large language models (LLMs) are rapidly transforming the landscape of computing and daily life, demonstrating immense potential across diverse applications like natural language processing, machine translation, and code generation. This talk delves into ...
- research-article, November 2024
Reimagining Deep Learning Systems through the Lens of Data Systems
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 4531–4535, https://doi.org/10.14778/3685800.3685914
The high-profile success of Deep Learning (DL) at Big Tech companies, including recent Large Language Models (LLMs) such as the GPT and Llama families, has led to high demand among Web companies, consumer app companies, enterprises, healthcare, domain ...
- research-article, November 2024
OFL-W3: A One-Shot Federated Learning System on Web 3.0
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 4461–4464, https://doi.org/10.14778/3685800.3685900
Federated Learning (FL) addresses the challenges posed by data silos, which arise from privacy, security regulations, and ownership concerns. Despite these barriers, FL enables these isolated data repositories to participate in collaborative learning ...
A Demonstration of TENDS: Time Series Management System Based on Model Selection
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 4357–4360, https://doi.org/10.14778/3685800.3685874
The growth in sensor technologies, IoT devices, and information systems has opened up new opportunities for managing time series data across various domains. Despite significant progress, existing time series management systems face two crucial ...
MLOS in Action: Bridging the Gap Between Experimentation and Auto-Tuning in the Cloud
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 4269–4272, https://doi.org/10.14778/3685800.3685852
This paper presents MLOS (ML Optimized Systems), a flexible framework that bridges the gap between benchmarking, experimentation, and optimization of software systems. It allows users to create one-click benchmarking and experimentation scenarios for ...
BFTGym: An Interactive Playground for BFT Protocols
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 4261–4264, https://doi.org/10.14778/3685800.3685850
Byzantine Fault Tolerant (BFT) protocols serve as a fundamental yet intricate component of distributed data management systems in untrustworthy environments. BFT protocols exhibit different design principles and performance characteristics under varying ...
- research-article, November 2024
Workload Placement on Heterogeneous CPU-GPU Systems
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 4241–4244, https://doi.org/10.14778/3685800.3685845
The popularity of heterogeneous CPU-GPU processing has increased considerably in recent years. To efficiently utilize heterogeneous resources, data processing systems depend on an appropriate workload placement strategy to assign the right amount of ...
- research-article, November 2024
Efficient Training of Graph Neural Networks on Large Graphs
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 4237–4240, https://doi.org/10.14778/3685800.3685844
Graph Neural Networks (GNNs) have gained significant popularity for learning representations of graph-structured data. Mainstream GNNs employ the message passing scheme that iteratively propagates information between connected nodes through edges. ...
Consensus in Data Management: With Use Cases in Edge-Cloud and Blockchain Systems
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 4233–4236, https://doi.org/10.14778/3685800.3685843
Consensus is a fundamental problem in distributed systems, involving the challenge of achieving agreement among distributed nodes. It plays a critical role in various distributed data management problems. This tutorial aims to provide a comprehensive ...
- research-article, November 2024
Native Distributed Databases: Problems, Challenges and Opportunities
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 4217–4220, https://doi.org/10.14778/3685800.3685839
Native distributed databases, crucial for scalable applications, offer transactional and analytical prowess but face data intricacies and network challenges. Under the CAP theorem's constraints, latency and replication issues necessitate creative ...
DLRover-RM: Resource Optimization for Deep Recommendation Models Training in the Cloud
- Qinlong Wang,
- Tingfeng Lan,
- Yinghao Tang,
- Bo Sang,
- Ziling Huang,
- Yiheng Du,
- Haitao Zhang,
- Jian Sha,
- Hui Lu,
- Yuanchun Zhou,
- Ke Zhang,
- Mingjie Tang
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 4130–4144, https://doi.org/10.14778/3685800.3685832
Deep learning recommendation models (DLRM) rely on large embedding tables to manage categorical sparse features. Expanding such embedding tables can significantly enhance model performance, but at the cost of increased GPU/CPU/memory usage. Meanwhile, ...
- research-article, November 2024
OptScaler: A Collaborative Framework for Robust Autoscaling in the Cloud
- Ding Zou,
- Wei Lu,
- Zhibo Zhu,
- Xingyu Lu,
- Jun Zhou,
- Xiaojin Wang,
- Kangyu Liu,
- Kefan Wang,
- Renen Sun,
- Haiqing Wang
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 4090–4103, https://doi.org/10.14778/3685800.3685829
Autoscaling is a critical mechanism in cloud computing, enabling the autonomous adjustment of computing resources in response to dynamic workloads. This is particularly valuable for co-located, long-running applications with diverse workload patterns. ...
- research-article, November 2024
Resource Management in Aurora Serverless
- Bradley Barnhart,
- Marc Brooker,
- Daniil Chinenkov,
- Tony Hooper,
- Jihoun Im,
- Prakash Chandra Jha,
- Tim Kraska,
- Ashok Kurakula,
- Alexey Kuznetsov,
- Grant McAlister,
- Arjun Muthukrishnan,
- Aravinthan Narayanan,
- Douglas Terry,
- Bhuvan Urgaonkar,
- Jiaming Yan
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 4038–4050, https://doi.org/10.14778/3685800.3685825
Amazon Aurora Serverless is an on-demand, autoscaling configuration for Amazon Aurora with full MySQL and PostgreSQL compatibility. It automatically offers capacity scale-up/down (i.e., vertical scaling) based on a customer database application's needs. ...
- research-article, November 2024
X-Stor: A Cloud-Native NoSQL Database Service with Multi-Model Support
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 4025–4037, https://doi.org/10.14778/3685800.3685824
In recent years at Tencent, we have observed that the use of multiple NoSQL databases for storing business data with diverse models has led to increased programming and deployment costs, as well as inefficient maintenance and underutilized resources. In ...
- research-article, November 2024
Towards Millions of Database Transmission Services in the Cloud
- Hua Fan,
- Dachao Fu,
- Xu Wang,
- Jiachi Zhang,
- Chaoji Zuo,
- Zhengyi Wu,
- Miao Zhang,
- Kang Yuan,
- Xizi Ni,
- Guocheng Huo,
- Wenchao Zhou,
- Feifei Li,
- Jingren Zhou
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 4001–4013, https://doi.org/10.14778/3685800.3685822
Alibaba relies on its robust database infrastructure to facilitate realtime data access and ensure business continuity despite regional disruptions. To address these operational imperatives, Alibaba developed the Data Transmission Service (DTS), which ...
- research-article, November 2024
Transparent Migration from Datastore to Firestore
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 3960–3972, https://doi.org/10.14778/3685800.3685819
Datastore was one of Google's first cloud databases, launched initially as part of App Engine, and built over Google's internal Megastore database system. Firestore was launched in 2019, both a re-implementation of Datastore over Google's Spanner ...
- research-article, November 2024
ResLake: Towards Minimum Job Latency and Balanced Resource Utilization in Geo-Distributed Job Scheduling
- Xinchun Zhang,
- Aqsa Kashaf,
- Yihan Zou,
- Wei Zhang,
- Weibo Liao,
- Haoxiang Song,
- Jintao Ye,
- Yakun Li,
- Rui Shi,
- Yong Tian,
- Wei Feng,
- Binbin Chen,
- Zuzhi Chen,
- Tieying Zhang,
- Yongping Tang
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 3934–3946, https://doi.org/10.14778/3685800.3685817
At internet scale companies like ByteDance, data is generated and consumed at enormously high speed by many different applications. Achieving low latency on such big data jobs is an important problem. However, the naive approach of aggregating all the ...
- research-article, November 2024
A Flexible Forecasting Stack
- Tim Januschowski,
- Yuyang Wang,
- Jan Gasthaus,
- Syama Rangapuram,
- Caner Türkmen,
- Jasper Zschiegner,
- Lorenzo Stella,
- Michael Bohlke-Schneider,
- Danielle Maddix,
- Konstantinos Benidis,
- Alexander Alexandrov,
- Christos Faloutsos,
- Sebastian Schelter
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 3883–3892, https://doi.org/10.14778/3685800.3685813
Forecasting extrapolates the values of a time series into the future, and is crucial to optimize core operations for many businesses and organizations. Building machine learning (ML)-based forecasting applications presents a challenge though, due to non-...
- research-article, November 2024
TDSQL: Tencent Distributed Database System
- Yuxing Chen,
- Anqun Pan,
- Hailin Lei,
- Anda Ye,
- Shuo Han,
- Yan Tang,
- Wei Lu,
- Yunpeng Chai,
- Feng Zhang,
- Xiaoyong Du
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 3869–3882, https://doi.org/10.14778/3685800.3685812
Distributed databases have become indispensable in contemporary computing and data processing, owing to their pivotal role in ensuring high availability and scalability. They effectively cater to the requirements of data management and high-concurrency ...
Towards Resource Efficiency: Practical Insights into Large-Scale Spark Workloads at ByteDance
- Yixin Wu,
- Xiuqi Huang,
- Zhongjia Wei,
- Hang Cheng,
- Chaohui Xin,
- Zuzhi Chen,
- Binbin Chen,
- Yufei Wu,
- Hao Wang,
- Tieying Zhang,
- Rui Shi,
- Xiaofeng Gao,
- Yuming Liang,
- Pengwei Zhao,
- Guihai Chen
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 3759–3771, https://doi.org/10.14778/3685800.3685804
At ByteDance, where we execute over a million Spark jobs and handle 500PB of shuffled data daily, ensuring resource efficiency is paramount for cost savings. However, achieving optimization of resource efficiency in large-scale production environments ...