Computer Science > Programming Languages
[Submitted on 5 Nov 2023]
Title: Architecting Intermediate Layers for Efficient Composition of Data Management and Machine Learning Systems
Abstract: Modern data analytics workloads combine relational data processing with machine learning (ML). Most DBMSs handle these workloads by offloading the ML operations to external, specialized ML systems. While both DBMSs and ML systems go to great lengths to optimize performance for their specific workloads, significant performance is lost when they are used in combination, due to data movement across system boundaries, conversions between incompatible internal data formats, and the lack of cross-system optimizations.
A key idea for removing these bottlenecks is to integrate existing data manipulation systems with ML systems by building a common intermediate layer (IR). Although this idea has been explored before (e.g., Weld, Delite), previous attempts require significant re-engineering of existing systems and still fall short of best-of-breed performance on individual tasks (e.g., SQL, deep learning). Specifically, they rely on re-implementing existing systems on top of a generic set of operators, and they fail to match best-of-breed individual performance because the high-level optimizations lost in this generic IR cannot be recovered through compiler analysis.
We present Flern, the first intermediate-layer integration of DB and ML systems that is best-of-breed individually: competitive with the best compiled query engines such as HyPer on comprehensive relational benchmarks (TPC-H), competitive with TensorFlow and PyTorch on state-of-the-art ML models (e.g., DeepSpeech, SqueezeNet, Transformers), and a new state of the art for integrated workloads. A key realization is to architect intermediate layers around generative programming capabilities, which preserve high-level contextual information for cross-system optimizations and enable a variety of complex structures and optimizations to be built with minimal effort.
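To make the generative-programming idea concrete, the sketch below illustrates the general technique in Scala. It is not Flern's actual API; all names and types are hypothetical. Operators on both the relational and ML sides are code generators rather than interpreters, so composing a relational filter with a staged linear model yields a single fused loop, and high-level simplifications (e.g., dropping zero-weight features) happen at staging time, before any generated code runs.

```scala
// Minimal sketch of a generative-programming intermediate layer.
// Hypothetical names throughout; this is not Flern's API.
object StagedPipeline {
  // Staged expressions: building a query/model produces code, not results.
  sealed trait Exp { def code: String }
  case class Lit(v: Double) extends Exp { val code = v.toString }
  case class Sym(name: String) extends Exp { val code = name }
  case class Bin(op: String, a: Exp, b: Exp) extends Exp {
    val code = s"(${a.code} $op ${b.code})"
  }

  // Smart constructors apply high-level rewrites while the IR is built,
  // i.e., before the high-level structure is lowered away.
  def mul(a: Exp, b: Exp): Exp = (a, b) match {
    case (Lit(0.0), _) | (_, Lit(0.0)) => Lit(0.0) // zero-weight elimination
    case (Lit(1.0), x)                 => x
    case (x, Lit(1.0))                 => x
    case _                             => Bin("*", a, b)
  }
  def add(a: Exp, b: Exp): Exp = (a, b) match {
    case (Lit(0.0), x) => x
    case (x, Lit(0.0)) => x
    case _             => Bin("+", a, b)
  }

  // "Relational" side: a staged selection predicate over a row.
  def pricePredicate(row: String): Exp = Bin("<", Sym(s"$row.price"), Lit(100.0))

  // "ML" side: a staged linear model over features of the same row.
  def score(row: String, weights: Seq[Double]): Exp =
    weights.zipWithIndex.foldLeft(Lit(0.0): Exp) { case (acc, (w, i)) =>
      add(acc, mul(Lit(w), Sym(s"$row.f$i")))
    }

  // Fused code generation: one loop over the table, no intermediate
  // materialization or data-format conversion between the two systems.
  def generate(weights: Seq[Double]): String = {
    val row = "row"
    s"""for ($row <- table) {
       |  if (${pricePredicate(row).code}) emit(${score(row, weights).code})
       |}""".stripMargin
  }

  def main(args: Array[String]): Unit =
    // Zero weights disappear from the generated code at staging time.
    println(generate(Seq(0.0, 2.5, 1.0)))
}
```

Running the sketch prints a single fused loop whose body is `if ((row.price < 100.0)) emit(((2.5 * row.f1) + row.f2))`: the relational predicate and the ML scoring expression are compiled together, which is the kind of cross-system optimization the abstract argues a generic, low-level IR cannot recover after the fact.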