DOI:10.1145/2723372.2735360
research-article

Demonstrating "Data Near Here": Scientific Data Search

Published: 27 May 2015

Abstract

Prior work proposed "Data Near Here" (DNH), a data search engine for scientific archives that is modeled on Internet search engines. DNH performs a periodic, asynchronous scan of each dataset in an archive, extracting lightweight features that are combined to form a dataset summary. During a search, DNH assesses the similarity of the search terms to the summary features and returns to the user, at interactive timescales, a ranked list of datasets for further exploration and analysis. We will demonstrate the search capabilities and ancillary metadata-browsing features for an archive of observational oceanographic data. While comparing search terms to complete datasets might seem ideal, interactive search speed would be impossible with archives of realistic size. We include an analysis showing that our summary-based approach gives a reasonable approximation of such a "complete dataset" similarity measure.
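
The summarize-then-rank idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify which features DNH extracts or how it scores similarity, so everything below is an assumption made for illustration: each dataset is summarized by per-variable minimum and maximum values, and a query range is scored by how far it lies from the summarized range. The names `Summary`, `summarize`, `range_score`, and `search` are hypothetical and are not taken from DNH.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    """Lightweight dataset summary: variable name -> (min, max). Assumed feature set."""
    name: str
    ranges: dict


def summarize(name, records):
    """Scan a dataset once (offline) and keep only per-variable min and max."""
    ranges = {}
    for rec in records:
        for var, value in rec.items():
            lo, hi = ranges.get(var, (value, value))
            ranges[var] = (min(lo, value), max(hi, value))
    return Summary(name, ranges)


def range_score(query, summary):
    """Return 1.0 if the two ranges overlap; otherwise decay with the gap,
    measured in units of the query width. This scoring rule is an assumption."""
    q_lo, q_hi = query
    s_lo, s_hi = summary
    gap = max(0.0, s_lo - q_hi, q_lo - s_hi)
    width = max(q_hi - q_lo, 1e-9)  # avoid division by zero for point queries
    return 1.0 / (1.0 + gap / width)


def search(query, summaries):
    """Rank datasets by mean per-variable score, touching only the summaries."""
    results = []
    for s in summaries:
        scores = [
            range_score(q_range, s.ranges[var]) if var in s.ranges else 0.0
            for var, q_range in query.items()
        ]
        results.append((sum(scores) / max(len(scores), 1), s.name))
    # Highest score first; ties broken by dataset name for a stable order.
    return sorted(results, key=lambda r: (-r[0], r[1]))


# Made-up example data: two small "datasets" of depth (m) and temperature (deg C).
catalog = [
    summarize("cruise_a", [{"depth": 5.0, "temp": 12.0}, {"depth": 20.0, "temp": 9.5}]),
    summarize("cruise_b", [{"depth": 100.0, "temp": 4.0}, {"depth": 150.0, "temp": 3.0}]),
]
ranked = search({"depth": (0.0, 10.0), "temp": (10.0, 11.0)}, catalog)
for score, name in ranked:
    print(f"{name}: {score:.3f}")
```

In this sketch the search step never reads the underlying records, only the summaries built during the scan, which is what would keep query time independent of dataset size. A dataset whose ranges merely lie near the query still receives a nonzero score, so results are ranked rather than filtered.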


Published In

SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
May 2015
2110 pages
ISBN:9781450327589
DOI:10.1145/2723372

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. ranked data search
  2. scientific data

Qualifiers

  • Research-article

Conference

SIGMOD/PODS'15: International Conference on Management of Data
May 31 - June 4, 2015
Melbourne, Victoria, Australia

Acceptance Rates

SIGMOD '15 Paper Acceptance Rate 106 of 415 submissions, 26%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%
