DOI: 10.1145/3184558.3186574

A User Centred Perspective on Structured Data Discovery

Published: 23 April 2018

Abstract

Structured data is becoming critical in every domain, and its availability on the web is increasing rapidly. Despite its abundance and variety of applications, we know very little about how people find data, understand it, and put it to use. This work aims to inform the design of data discovery tools and technologies from a user-centred perspective. We aim to better understand what types of information support people in finding and selecting data relevant to their respective tasks. We conducted a mixed-methods study of the workflow of data practitioners when searching for data. From this study, we identified textual summaries as a key element that supports the decision-making process in information-seeking activities for data. Based on these results, we performed a further mixed-methods study to identify the attributes people consider important when summarising a dataset. We found that text summaries are laid out according to common structures, contain four main information types, and cover a set of dataset features. We describe follow-up studies that are planned to validate these findings and to evaluate their applicability in a dataset search scenario.

Published In

WWW '18: Companion Proceedings of The Web Conference 2018
April 2018
2023 pages
ISBN: 9781450356404
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

  • IW3C2: International World Wide Web Conference Committee

Publisher

International World Wide Web Conferences Steering Committee

Republic and Canton of Geneva, Switzerland

Author Tags

  1. data discovery
  2. data portals
  3. data search
  4. human data interaction

Qualifiers

  • Research-article

Funding Sources

  • Marie Skłodowska-Curie ITN

Conference

WWW '18
Sponsor:
  • IW3C2
WWW '18: The Web Conference 2018
April 23 - 27, 2018
Lyon, France

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
