DOI: 10.1145/3459637.3481898
research-article

CADRE: A Cloud-Based Data Service for Big Bibliographic Data

Published: 30 October 2021 Publication History

Abstract

Large bibliographic data sets hold the promise of revolutionizing the scientific enterprise when combined with state-of-the-science computational capabilities. Providing high-quality data services for large network datasets such as the Microsoft Academic Graph, which contains more than two billion citation links, poses significant difficulties for universities. Data systems based on the property graph model are capable of delivering efficient graph query services for large networks. However, real-life queries often combine multiple types of data models. To satisfy the needs of different user groups, we developed and deployed a cloud-based data system consisting of scalable graph and text-indexed query engines. For non-expert users, the property graph model also presents a technological barrier. To alleviate the steep learning curve, we designed an intuitive graphical user interface for query-building. For advanced users, a scalable notebook service in our platform provides a more flexible computing environment where the query results can be further analyzed. These systems form the data backbone of the Collaborative Archive and Data Research Environment (CADRE), which provides efficient and high-quality bibliographic data services to eleven large public universities in North America.
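To make the idea of a query that combines a text index with a graph traversal concrete, the following is a minimal in-memory sketch in Python. It is not CADRE's implementation and does not use the platform's actual engines or APIs; the paper IDs, titles, citation links, and function names are all hypothetical and chosen only for illustration. The text step finds papers whose title contains a keyword, and the graph step follows incoming citation edges one hop from those papers.

```python
from collections import defaultdict

# Toy data: paper IDs, titles, and citation links are invented for illustration.
PAPERS = {
    "p1": "graph databases for citation networks",
    "p2": "text indexing at scale",
    "p3": "citation analysis with property graphs",
    "p4": "cloud notebooks for data science",
}
CITES = [("p2", "p1"), ("p3", "p1"), ("p4", "p3")]  # (citing, cited)


def build_text_index(papers):
    """Map each lowercase title token to the set of paper IDs containing it."""
    index = defaultdict(set)
    for paper_id, title in papers.items():
        for token in title.lower().split():
            index[token].add(paper_id)
    return index


def build_cited_by(edges):
    """Map each cited paper to the set of papers that cite it."""
    cited_by = defaultdict(set)
    for citing, cited in edges:
        cited_by[cited].add(citing)
    return cited_by


def papers_citing_keyword(keyword, text_index, cited_by):
    """Text step: look up papers whose title contains the keyword.
    Graph step: collect the papers that cite any of those matches."""
    seeds = text_index.get(keyword.lower(), set())
    result = set()
    for seed in seeds:
        result |= cited_by.get(seed, set())
    return sorted(result)


text_index = build_text_index(PAPERS)
cited_by = build_cited_by(CITES)
print(papers_citing_keyword("citation", text_index, cited_by))
```

In a deployed system the two steps would presumably be served by separate engines (a text index and a graph database) rather than by Python dictionaries; the sketch only shows how the output of one kind of query can seed the other.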

Cited By

  • (2024) A Study on the Application of Cloud-Based Database Construction in Language Subject. Proceedings of the 3rd International Conference on Cognitive Based Information Processing and Applications–Volume 1, pp. 1-12. DOI: 10.1007/978-981-97-1975-4_1. Online publication date: 2-Jun-2024
  • (2023) Scholarly Data Share 2.0: Granular Access to Research Data. Practice and Experience in Advanced Research Computing 2023: Computing for the Common Good, pp. 177-180. DOI: 10.1145/3569951.3597585. Online publication date: 23-Jul-2023
  • (2022) Scholarly Data Share: A Model for Sharing Big Data in Academic Research. Practice and Experience in Advanced Research Computing 2022: Revolutionary: Computing, Connections, You, pp. 1-8. DOI: 10.1145/3491418.3530297. Online publication date: 8-Jul-2022

Published In

CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management
October 2021
4966 pages
ISBN:9781450384469
DOI:10.1145/3459637
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cloud platform
  2. computational reproducibility
  3. data sharing
  4. graph database
  5. science gateway

Qualifiers

  • Research-article

Funding Sources

  • Institute of Museum and Library Services

Conference

CIKM '21
Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Article Metrics

  • Downloads (Last 12 months): 18
  • Downloads (Last 6 weeks): 2
Reflects downloads up to 16 Nov 2024
