Research article

Analyzing workload trends for boosting triple stores performance

Authors: Ahmed Al-Ghezi, Lena Wiese

Volume 125, Issue C

https://doi.org/10.1016/j.is.2024.102420

Published: 18 October 2024

Abstract

The Resource Description Framework (RDF) is widely used to model web data. The scale and complexity of the modeled data have exposed performance challenges in RDF triple stores. Workload adaption is an important strategy for addressing these challenges at the storage level. Current workload-adaption approaches lack the necessary generalization of the problem and adapt only part of the storage layer to the workload (mostly replication). This leaves a large performance gap in other data structures (e.g., indexes and caches) that could benefit heavily from the same workload-adaption strategy. Moreover, most current approaches build workload statistics cumulatively, so the analysis process cannot tell whether workload items are old or recent. Such statistics do not capture the temporal trends that naturally occur in user queries, which causes the analysis to lag behind rapid changes in the workload. We present a novel universal adaption approach to the storage management of a distributed RDF store. The system aims to find optimal assignments of data to the different indexes, replicas, and join cache within the limited storage space. We present a cost model based on the workload, which often contains frequent patterns. The workload is analyzed dynamically and continuously to evaluate predefined rules that weigh the benefits and costs of every option for assigning data to the storage structures. The objective is to reduce query execution time by letting the different data containers compete for the limited storage space. By modeling the workload statistics as time series, we can apply well-known smoothing techniques that let the importance of the workload decay over time. This allows the universal adaption to stay tuned to changes in workload trends.
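The abstract does not say which smoothing technique is used, so the following is only a minimal sketch that assumes simple exponential smoothing, one well-known option. Each time window yields an access count per query pattern, and the score is updated as S_t = alpha * x_t + (1 - alpha) * S_(t-1), starting from zero. A pattern that stops appearing gets x_t = 0, so its score decays. The function name `smooth_workload`, the parameter `alpha`, and the pattern keys are hypothetical illustrations, not part of the described system.

```python
from collections import defaultdict


def smooth_workload(windows, alpha=0.5):
    """Exponentially smoothed access score per query pattern (illustrative sketch).

    windows: list of dicts, oldest first; each maps a pattern key to its
             access count within one time window.
    alpha:   smoothing factor in (0, 1]; larger values favour recent windows.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    scores = defaultdict(float)  # every pattern starts with a score of 0.0
    for window in windows:
        # Update every known pattern: ones absent from this window see x_t = 0
        # and decay, while ones present are pulled toward their new count.
        for key in set(scores) | set(window):
            scores[key] = alpha * window.get(key, 0) + (1 - alpha) * scores[key]
    return dict(scores)


# Hypothetical pattern ids: P1 was popular early, P2 is trending now.
windows = [{"P1": 10}, {"P1": 10, "P2": 2}, {"P2": 8}]
print(smooth_workload(windows, alpha=0.5))  # {'P1': 3.75, 'P2': 4.5}
```

With plain cumulative counts, P1 (20 accesses) would still outrank P2 (10 accesses). After smoothing, the recent pattern P2 ranks higher. This illustrates the kind of lag that decaying statistics are meant to avoid.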

Highlights

• A new approach to achieve unified workload awareness in a distributed RDF triple store.

• Our triple store dynamically determines its needs for indexes, replication, and cache.

• The algorithm combines indexes, replication, and cache in a single optimization problem.

• The importance of data triples is set by the workload and the used data structures.

• Smoothing techniques allowed the accumulated workload to follow recent changes.

• Multiple types of workload access rules overcame the effect of workload fluctuation.
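To illustrate how indexes, replicas, and cache entries could compete for one storage budget within a single optimization problem, here is a minimal sketch. It uses a greedy benefit-per-byte heuristic for a 0/1 knapsack problem. This heuristic is a substitute chosen for illustration and is not necessarily the authors' algorithm. The `Candidate` fields, the structure labels, and the benefit values are hypothetical; in practice, a benefit could be derived from smoothed workload scores and a cost model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One hypothetical option for placing data into a storage structure."""
    structure: str  # e.g. "index", "replica", "join_cache" (illustrative labels)
    data_id: str    # identifier of a triple partition or cached join result
    size: int       # storage cost, e.g. in bytes; must be positive
    benefit: float  # estimated workload-weighted saving in query time


def allocate(candidates, budget):
    """Greedily choose candidates by benefit per unit of storage.

    All structures compete for the same budget. Returns the chosen
    candidates and the total storage they consume.
    """
    candidates = list(candidates)
    if any(c.size <= 0 for c in candidates):
        raise ValueError("candidate sizes must be positive")
    ranked = sorted(
        (c for c in candidates if c.benefit > 0),  # skip options with no benefit
        key=lambda c: c.benefit / c.size,
        reverse=True,
    )
    chosen, used = [], 0
    for c in ranked:
        # Take the candidate only if it still fits into the shared budget.
        if used + c.size <= budget:
            chosen.append(c)
            used += c.size
    return chosen, used


candidates = [
    Candidate("index", "p1", 40, 80.0),       # ratio 2.0
    Candidate("replica", "p2", 50, 60.0),     # ratio 1.2
    Candidate("join_cache", "q7", 20, 50.0),  # ratio 2.5
    Candidate("index", "p3", 30, 15.0),       # ratio 0.5
]
chosen, used = allocate(candidates, budget=100)
print([c.data_id for c in chosen], used)  # ['q7', 'p1', 'p3'] 90
```

The greedy choice is fast but can miss the true optimum. Here the replica of p2 is skipped because it no longer fits once q7 and p1 are placed. An exact solver, such as dynamic programming over the budget, would trade speed for optimality.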

