Abstract
Open and efficient sharing of information about food products and their ingredients is important for all parties in the chain, from manufacturers to consumers. There exists a public catalogue of some Russian food products (http://goodsmatrix.ru/) that is used by some manufacturers and consumers. Although the information is open, the site is difficult to use in several respects, e.g., interoperability, querying and linking, and these difficulties could be mitigated by Semantic Web technologies. This paper presents an approach and a project for extracting and publishing information about food products and for linking it to existing datasets in the Linked Open Data Cloud.
1 Introduction
The goal of this work is to create a 5-star open data dataset about Russian food products and their ingredients. Such work involves (a) food ontology development, (b) crawling of the existing sources, (c) publishing of the information as Linked Data and (d) linking to existing LOD datasets, such as AGROVOC [1] and DBpedia [2].
Based on a dataset created using Semantic Web technologies, new applications and services can be built: manufacturers can use it to standardise the names of ingredients, retailers can reuse the information in their e-shops, and developers can build applications that help customers decide which product to buy based on their health conditions or personal preferences.
2 Dataset Creation
The source of the information for FOODpedia is a web site called GoodsMatrix, a manually curated catalogue whose information comes mainly from manufacturers.
Extraction of food product information from GoodsMatrix goes through a pipeline that includes (a) crawling the web site using the Scrapy framework and a set of XPath expressions, (b) parsing the resulting data to extract information about energy values, ingredients and E-additives, (c) translating the name and description to English and (d) linking ingredient information to resources in the AGROVOC and DBpedia datasets.
The source code of the crawler and other artifacts are available in a GitHub repository.
Extraction of Ingredients. Ingredients are crawled as a list separated by a delimiter such as a comma or semicolon. There is, however, an open issue: different manufacturers rarely use the same name for the same ingredient, and some ingredients have more than a dozen alternative names. Usually such names differ only in word order or in missing or extra words; therefore we apply the Ratcliff-Obershelp algorithm [3] to measure string similarity and create a single resource for similar names.
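The splitting and grouping step can be sketched as follows. This is a minimal illustration, not the project's actual code: it relies on Python's `difflib.SequenceMatcher`, which the Python documentation describes as closely related to the gestalt pattern matching of Ratcliff and Obershelp. The function names, the greedy grouping strategy and the 0.7 threshold are assumptions made for this example.

```python
import re
from difflib import SequenceMatcher


def split_ingredients(raw: str) -> list[str]:
    """Split a raw ingredient string on commas or semicolons and normalise case."""
    parts = re.split(r"[,;]", raw)
    return [p.strip().lower() for p in parts if p.strip()]


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] from difflib's gestalt-style matcher."""
    return SequenceMatcher(None, a, b).ratio()


def group_similar(names: list[str], threshold: float = 0.7) -> list[list[str]]:
    """Greedy grouping: a name joins the first group whose first member is
    at least `threshold` similar, otherwise it starts a new group.
    The threshold is an arbitrary value chosen for illustration."""
    groups: list[list[str]] = []
    for name in names:
        for group in groups:
            if similarity(group[0], name) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


names = split_ingredients("Wheat flour, sugar; wheat flour premium, salt")
groups = group_similar(names)
```

Each resulting group would then correspond to a single ingredient resource. Note that this ratio is order-sensitive, so names that differ strongly in word order may need a lower threshold or token sorting before comparison.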
Extraction of E-additives. E-additives are food additives that have special identifiers called E numbers, such as E-100, E-201, etc., and are used in Europe, Russia and other countries. Since the identifiers have a well-defined structure, it is quite easy to find them in the ingredient list using regular expressions. The only issue is additives that have an E number but are listed on the package by name only, e.g. Curcumin.
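A regular expression for this step might look like the sketch below. The exact pattern used by the crawler is not given here, so this one is an assumption: it accepts an optional hyphen or space after the letter, three or four digits, an optional letter suffix, and (also an assumption) the Cyrillic letter Е as an alternative to the Latin E.

```python
import re

# Latin "E" or Cyrillic "Е" (U+0415), optional "-" or whitespace,
# 3-4 digits, optional letter suffix (as in "E150a").
E_NUMBER = re.compile(r"\b[E\u0415][-\s]?(\d{3,4}[a-z]?)\b", re.IGNORECASE)


def find_e_numbers(ingredients: str) -> list[str]:
    """Return E numbers found in the text, normalised to the form 'E100'."""
    return ["E" + m.group(1).lower() for m in E_NUMBER.finditer(ingredients)]


sample = "sugar, E-100, acidity regulator E330, E 150a, thickener E1422, \u0415-201"
found = find_e_numbers(sample)
```

Additives that appear by name only, such as Curcumin, are not caught by a pattern of this kind.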
Multilingual Support. The name and description of each crawled food product are translated to English with the help of the Yandex.Translate API.
Linking. Extracted E-additives and ingredients are linked to similar resources in the AGROVOC and DBpedia datasets.
AGROVOC is a multilingual agricultural thesaurus consisting of over 32 000 concepts available in 21 languages including Russian, which makes it a good candidate for linking. Ingredients are mapped to AGROVOC concepts automatically, but AGROVOC does not cover E numbers, so E-additives are mapped manually.
DBpedia is a good source of human-readable descriptions of concepts, so it is worthwhile to link E-additives and ingredients to its resources. This is not straightforward, however, because the ontology is generated semi-automatically; therefore the mapping is performed manually.
3 Ontologies
To represent food products and their ingredients, the Food Product Ontology (note 7) was developed, which extends GoodRelations (note 8) and the Food Ontology (note 9). Below is an example of a food product in Turtle:
The following is an example of an ingredient with links to similar resources in the AGROVOC and DBpedia datasets:
4 Publishing
The dataset is published using Pubby. The interface for human and machine consumption is available at http://foodpedia.tk. Using the SPARQL endpoint provided by the underlying Virtuoso triple store, users can pose complex queries over the data. In addition, a Linked Data Fragments [4] server offers a second query interface for high-availability querying. Finally, people can use a simple search interface (see Fig. 1) to find food products by barcode or name.
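As a rough illustration of querying over the SPARQL protocol, the sketch below builds a GET request URL for a barcode lookup. The endpoint address is a placeholder, and the use of the GoodRelations properties `gr:hasEAN_UCC-13` and `gr:name` is an assumption about how the dataset models barcodes and names; neither is confirmed by the text above.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Placeholder endpoint; the real address is not given in the text.
ENDPOINT = "http://example.org/sparql"


def barcode_query(barcode: str) -> str:
    """Build a SPARQL query that looks up a product by its 13-digit barcode.
    The property names are assumed, not taken from the dataset."""
    if not (barcode.isascii() and barcode.isdigit()):
        raise ValueError("barcode must contain ASCII digits only")
    return (
        "PREFIX gr: <http://purl.org/goodrelations/v1#>\n"
        "SELECT ?product ?name WHERE {\n"
        f'  ?product gr:hasEAN_UCC-13 "{barcode}" ;\n'
        "           gr:name ?name .\n"
        "}"
    )


def request_url(endpoint: str, query: str) -> str:
    """Encode the query as the `query` parameter of a SPARQL protocol GET request."""
    return endpoint + "?" + urlencode({"query": query})


url = request_url(ENDPOINT, barcode_query("4600000000000"))
```

The resulting URL can be fetched with any HTTP client, typically with an `Accept` header such as `application/sparql-results+json`.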
Licensing. All published data is openly licensed under the Creative Commons Attribution License in accordance with the Open Definition.
Notes
- 7.
@prefix fpr: <http://purl.org/foodontology#> .
- 8.
@prefix gr: <http://purl.org/goodrelations/v1#> .
- 9.
@prefix food: <http://data.lirmm.fr/ontologies/food#> .
References
1. Caracciolo, C., Stellato, A., Morshed, A., Johannsen, G., Rajbhandari, S., Jaques, Y., Keizer, J.: The AGROVOC linked dataset. Semant. Web 4(3), 341–348 (2013)
2. Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P.N., Hellmann, S., Morsey, M., van Kleef, P., Auer, S., Bizer, C.: DBpedia - a large-scale, multilingual knowledge base extracted from Wikipedia. Semant. Web 6(2), 167–195 (2015)
3. Ratcliff, J.W., Metzener, D.E.: Pattern matching: the gestalt approach. Dr. Dobb's J. 13(7), 46 (1988)
4. Verborgh, R., et al.: Querying datasets on the web with high availability. In: Mika, P., et al. (eds.) ISWC 2014, Part I. LNCS, vol. 8796, pp. 180–196. Springer, Heidelberg (2014). http://dx.doi.org/10.1007/978-3-319-11964-9_12
Acknowledgements
This work has been partially financially supported by the Government of the Russian Federation, Grant #074-U01.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Kolchin, M., Chistyakov, A., Lapaev, M., Khaydarova, R. (2015). FOODpedia: Russian Food Products as a Linked Data Dataset. In: Gandon, F., Guéret, C., Villata, S., Breslin, J., Faron-Zucker, C., Zimmermann, A. (eds) The Semantic Web: ESWC 2015 Satellite Events. ESWC 2015. Lecture Notes in Computer Science(), vol 9341. Springer, Cham. https://doi.org/10.1007/978-3-319-25639-9_17
DOI: https://doi.org/10.1007/978-3-319-25639-9_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25638-2
Online ISBN: 978-3-319-25639-9
eBook Packages: Computer Science, Computer Science (R0)