DOI: 10.1145/3041021.3053060
research-article

BD2K ERuDIte: the Educational Resource Discovery Index for Data Science

Published: 03 April 2017

Abstract

The field of data science has developed over the years to enable the efficient integration and analysis of the increasingly large amounts of data being generated across many domains, ranging from social media, to sensor networks, to scientific experiments. Numerous subfields of biology and medicine, such as genetics, neuroimaging, and mobile health, are witnessing a data explosion that promises to revolutionize biomedical science by yielding novel insights and discoveries. To address the challenges posed by biomedical big data, the National Institutes of Health (NIH) launched the Big Data to Knowledge (BD2K) initiative (datascience.nih.gov). An important component of this effort is the training of biomedical researchers. To this end, the NIH has funded the BD2K Training Coordinating Center (TCC). A core activity of the BD2K TCC is to develop a web portal (bigdatau.org) to provide personalized training in data science to biomedical researchers.
In this paper, we describe our approach and initial efforts in constructing ERuDIte, the Educational Resource Discovery Index for Data Science, which powers the BD2K TCC web portal. ERuDIte harvests a wealth of online resources for learning data science, aimed at both beginners and experts, including massive open online courses (MOOCs), videos of tutorials and research talks presented at conferences, textbooks, blog posts, and standalone web pages. Though the potential volume of resources is exciting, these online learning materials are highly heterogeneous in quality, difficulty, format, and topic. As a result, the field is intimidating to enter and difficult to navigate. Moreover, data science is a rapidly evolving field, so new materials and concepts appear constantly. ERuDIte applies data science techniques to build this data science index. This paper describes how ERuDIte uses data extraction, data integration, machine learning, information retrieval, and natural language processing techniques to automatically collect, integrate, describe, and organize existing online resources for learning data science.
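The abstract names the stages of the pipeline but not how they are implemented. What follows is a minimal sketch, under our own assumptions, of two of them. The first is integrating records harvested from heterogeneous sources into one common schema and removing duplicates. The second is indexing the result for keyword retrieval with TF-IDF. The `Resource` schema, the `FIELD_MAPS` source mappings, the field names, and the URL-based deduplication rule are all hypothetical. This is not ERuDIte's actual implementation, which the abstract does not describe.

```python
import math
import re
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical common schema that every harvested resource is mapped onto."""
    url: str
    title: str
    kind: str  # source type, e.g. "mooc" or "video"
    description: str


# Hypothetical per-source field mappings: each source names the same fields differently.
FIELD_MAPS = {
    "mooc": {"url": "link", "title": "course_name", "description": "summary"},
    "video": {"url": "video_url", "title": "talk_title", "description": "abstract"},
}


def normalize_url(url: str) -> str:
    """Canonicalize a URL so one resource harvested twice maps to one key."""
    url = url.strip().lower()
    url = re.sub(r"^https?://(www\.)?", "", url)  # ignore scheme and leading "www."
    return url.rstrip("/")


def integrate(raw_records):
    """Map (source, record) pairs onto Resource and drop duplicate URLs."""
    merged = {}
    for source, rec in raw_records:
        fields = FIELD_MAPS[source]
        res = Resource(
            url=rec[fields["url"]],
            title=rec[fields["title"]],
            kind=source,
            description=rec.get(fields["description"], ""),
        )
        merged.setdefault(normalize_url(res.url), res)  # first copy seen wins
    return list(merged.values())


def tokenize(text: str):
    return re.findall(r"[a-z0-9]+", text.lower())


class TfIdfIndex:
    """Tiny inverted index scored with TF-IDF over titles and descriptions."""

    def __init__(self, resources):
        self.resources = resources
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        for doc_id, res in enumerate(resources):
            text = res.title + " " + res.description
            for term, tf in Counter(tokenize(text)).items():
                self.postings[term][doc_id] = tf
        n_docs = len(resources)
        # Smoothed IDF: rarer terms weigh more; the +1 keeps terms in every doc nonzero.
        self.idf = {t: math.log(n_docs / len(d)) + 1.0 for t, d in self.postings.items()}

    def search(self, query: str, k: int = 3):
        """Return up to k resources ranked by summed tf * idf of the query terms."""
        scores = Counter()
        for term in tokenize(query):
            for doc_id, tf in self.postings.get(term, {}).items():
                scores[doc_id] += tf * self.idf[term]
        return [self.resources[doc_id] for doc_id, _ in scores.most_common(k)]


if __name__ == "__main__":
    raw = [
        ("mooc", {"link": "https://www.example.org/ml-course/",
                  "course_name": "Machine Learning",
                  "summary": "Intro to supervised learning"}),
        # The same course seen again via another source and URL spelling.
        ("video", {"video_url": "http://example.org/ml-course",
                   "talk_title": "Machine Learning lecture", "abstract": ""}),
        ("video", {"video_url": "https://example.org/talks/nlp",
                   "talk_title": "NLP for biomedical text",
                   "abstract": "Parsing clinical notes"}),
    ]
    resources = integrate(raw)
    index = TfIdfIndex(resources)
    print(len(resources))
    print([r.title for r in index.search("biomedical text")])
```

Running the demo prints `2`, because the course harvested twice collapses to one record, and then `['NLP for biomedical text']`. Keeping the first copy per normalized URL is only the simplest possible deduplication rule. Matching the same resource under different URLs or near-duplicate titles would need more than this sketch provides.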

      Information

      Published In

      ACM Other conferences
      WWW '17 Companion: Proceedings of the 26th International Conference on World Wide Web Companion
      April 2017
      1738 pages
      ISBN: 9781450349147

      Sponsors

      • IW3C2: International World Wide Web Conference Committee

      Publisher

      International World Wide Web Conferences Steering Committee

      Republic and Canton of Geneva, Switzerland

      Author Tags

      1. information integration
      2. machine learning
      3. online educational resources

      Qualifiers

      • Research-article

      Funding Sources

      • NIH

      Conference

      WWW '17
      Sponsor:
      • IW3C2

      Acceptance Rates

      WWW '17 Companion Paper Acceptance Rate 164 of 966 submissions, 17%;
      Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

      Bibliometrics & Citations

      Article Metrics

      • Downloads (last 12 months): 6
      • Downloads (last 6 weeks): 3
      Reflects downloads up to 16 Nov 2024

      Cited By

      • (2021) BD2K Training Coordinating Center's ERuDIte: The Educational Resource Discovery Index for Data Science. IEEE Transactions on Emerging Topics in Computing, 9(1):316-328. doi:10.1109/TETC.2019.2903466. Online publication date: 1-Jan-2021.
      • (2020) Indexing. In Information Retrieval: A Biomedical and Health Perspective, pp. 181-223. doi:10.1007/978-3-030-47686-1_4. Online publication date: 23-Jul-2020.
      • (2019) Linking educational resources on data science. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, pp. 9404-9409. doi:10.1609/aaai.v33i01.33019404. Online publication date: 27-Jan-2019.
      • (2019) Advancing the international data science workforce through shared training and education. F1000Research, 8:251. doi:10.12688/f1000research.18357.1. Online publication date: 4-Mar-2019.
      • (2017) VIM: A Big Data Analytics Tool for Data Visualization and Knowledge Mining. In 2017 IEEE International WIE Conference on Electrical and Computer Engineering (WIECON-ECE), pp. 224-227. doi:10.1109/WIECON-ECE.2017.8468939. Online publication date: Dec-2017.
