research-article

A collaborative framework for tweaking properties in a synthetic dataset

Published: 01 August 2018

Abstract

Researchers and developers use benchmarks to compare their algorithms and products. For database systems, a benchmark must have a dataset D. To be application-specific, this dataset D should be empirical. However, a real D may be too small, or too large, for the benchmarking experiments. Therefore, D must first be scaled to the desired size.
Previous related work typically extracts a set of properties Π = {π1, ..., πn} from D, then uses Π to generate a synthetic dataset D~. Enforcing Π is thus meant to ensure that D~ is similar to D. However, this approach of having some monolithic software enforce all of π1, ..., πn becomes increasingly intractable as n increases. Our demonstration will present ASPECT, a framework that takes a different approach.
With ASPECT, a tool S0 first scales the dataset to the desired size. The resulting D~ can then be tweaked by tools T1, ..., Tn, where Tk enforces πk in D~.
At the demonstration, a visitor has a choice of (i) D, (ii) size scaler S0, (iii) the subset of properties to enforce, and (iv) the order of applying the tools for the chosen properties. The visitor can then see the enforcement error for each πk and the running time for each Tk.
A video of the demonstration is presented here: http://scaler.d2.comp.nus.edu.sg/
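
The workflow described in the abstract (scale first, then apply one tweaking tool per property in a chosen order, then inspect the error and running time per tool) can be illustrated with a minimal sketch. This is not ASPECT's actual code or API: the dataset is a toy single numeric column, and `scale_size`, `TweakingTool`, `mean_tool`, `cap_tool`, and `run_pipeline` are hypothetical names and properties chosen purely for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Dataset = List[float]  # toy stand-in for D: a single numeric column


def scale_size(data: Sequence[float], target_size: int) -> Dataset:
    """Hypothetical size scaler S0: repeat rows cyclically up to target_size."""
    if not data:
        raise ValueError("dataset must not be empty")
    return [float(data[i % len(data)]) for i in range(target_size)]


@dataclass
class TweakingTool:
    """Hypothetical tool T_k: enforces one property pi_k and measures its error."""
    name: str
    enforce: Callable[[Dataset], Dataset]
    error: Callable[[Dataset], float]


def mean_tool(target_mean: float) -> TweakingTool:
    """Assumed property: the column mean equals target_mean."""
    def mean(d: Dataset) -> float:
        return sum(d) / len(d)

    def enforce(d: Dataset) -> Dataset:
        shift = target_mean - mean(d)
        return [x + shift for x in d]

    return TweakingTool("mean", enforce, lambda d: abs(mean(d) - target_mean))


def cap_tool(cap: float) -> TweakingTool:
    """Assumed property: no value exceeds cap; error is the violating fraction."""
    def enforce(d: Dataset) -> Dataset:
        return [min(x, cap) for x in d]

    def error(d: Dataset) -> float:
        return sum(1 for x in d if x > cap) / len(d)

    return TweakingTool("cap", enforce, error)


def run_pipeline(
    data: Sequence[float], target_size: int, tools: Sequence[TweakingTool]
) -> Tuple[Dataset, Dict[str, float], Dict[str, float]]:
    """Scale with S0, apply the tools in the given order, then report
    each property's error on the final dataset and each tool's running time."""
    scaled = scale_size(data, target_size)
    timings: Dict[str, float] = {}
    for tool in tools:
        start = time.perf_counter()
        scaled = tool.enforce(scaled)
        timings[tool.name] = time.perf_counter() - start
    errors = {tool.name: tool.error(scaled) for tool in tools}
    return scaled, errors, timings


if __name__ == "__main__":
    d = [1, 2, 3, 10]
    # Targets taken from the toy D itself: its mean (4.0) and its maximum (10).
    for order in ([mean_tool(4.0), cap_tool(10)], [cap_tool(10), mean_tool(4.0)]):
        _, errors, timings = run_pipeline(d, 6, order)
        print([t.name for t in order], errors, timings)
```

In this toy setting, the tool applied last leaves its own property satisfied but may disturb a property enforced earlier, which is why the sketch measures every error on the final dataset rather than right after each tool runs.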

Published In

Proceedings of the VLDB Endowment, Volume 11, Issue 12
August 2018
426 pages
ISSN: 2150-8097

Publisher

VLDB Endowment
