Abstract
This paper explores the suitability of Linked Data as a knowledge source for content selection. Content selection is a crucial subtask in Natural Language Generation: it determines which contents of a knowledge source are relevant to a given communicative goal. The online era has produced large amounts of generic knowledge, some of which has been made available as structured knowledge sources for computational natural language processing. This paper proposes a content selection model that uses one such generic structured source, DBpedia, the structured counterpart of the unstructured Wikipedia. The proposed model uses log likelihood to rank the contents of DBpedia Linked Data by their relevance to a communicative goal. We performed experiments with DBpedia as the Linked Data resource and two keyword datasets as communicative goals: keywords extracted from the QALD-2 training dataset were used to optimize parameters, and the QALD-2 testing dataset was used for testing. The results were evaluated against a verbatim-based selection strategy and showed that our model can perform 18.03% better than verbatim selection.
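The abstract does not spell out how the log likelihood score is computed. As an illustration only, the sketch below uses the corpus-comparison form of the statistic (observed versus expected term frequencies in a target and a reference collection), which is an assumption rather than the authors' confirmed formulation. The function names `log_likelihood` and `rank_terms` and the toy counts are hypothetical.

```python
import math


def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) score for one term.

    a: term frequency in the target collection (size c tokens)
    b: term frequency in the reference collection (size d tokens)
    Assumed corpus-comparison form; not taken from the paper itself.
    """
    # Expected frequencies if the term were equally likely in both collections.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    total = 0.0
    # A zero observed count contributes nothing (0 * ln 0 is treated as 0).
    if a > 0:
        total += a * math.log(a / e1)
    if b > 0:
        total += b * math.log(b / e2)
    return 2.0 * total


def rank_terms(target_counts, target_size, reference_counts, reference_size):
    """Return the target terms sorted by descending log-likelihood score."""
    scores = {
        term: log_likelihood(a, reference_counts.get(term, 0),
                             target_size, reference_size)
        for term, a in target_counts.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy counts: "capital" is distinctive for the target collection,
# "the" is equally common in both collections.
ranking = rank_terms({"capital": 10, "the": 10}, 100, {"the": 10}, 100)
```

In this form the score grows with any deviation from the expected frequency, so a term that is strongly under-represented in the target also scores highly; a ranking for relevance would typically keep only terms whose observed target frequency exceeds the expected one.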
Copyright information
© 2014 Springer International Publishing Switzerland
Cite this paper
Perera, R., Nand, P. (2014). The Role of Linked Data in Content Selection. In: Pham, DN., Park, SB. (eds) PRICAI 2014: Trends in Artificial Intelligence. PRICAI 2014. Lecture Notes in Computer Science, vol 8862. Springer, Cham. https://doi.org/10.1007/978-3-319-13560-1_46
DOI: https://doi.org/10.1007/978-3-319-13560-1_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13559-5
Online ISBN: 978-3-319-13560-1
eBook Packages: Computer Science