research-article

Data Management Systems Research at TU Berlin

Published: 17 May 2019

Abstract

Data management systems research at TU Berlin is spearheaded by the Database Systems and Information Management (DIMA) Group, the Big Data Management (BigDaMa) Group, as well as the affiliated Intelligent Analytics for Massive Data (IAM) Research Group at the German Research Center for Artificial Intelligence (DFKI). Jointly, our research activities encompass a wide variety of database topics, including benchmarking, data integration, modern hardware, and scalable data processing.
As of Fall 2018, the team comprises three university professors, thirteen senior and postdoc researchers, twenty PhD students, and several research assistants. Among our notable accomplishments is the DFG-funded Stratosphere Research Unit, which laid the groundwork for what would later become Apache Flink. Since 2014, DIMA has also led the Berlin Big Data Center, one of only two BMBF-funded Big Data Competence Centers in Germany. In addition, DIMA co-directs the Berlin Center for Machine Learning, one of four BMBF-funded Machine Learning Competence Centers in Germany.

Published In

ACM SIGMOD Record, Volume 47, Issue 4
December 2018
34 pages
ISSN: 0163-5808
DOI: 10.1145/3335409
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States
