OPEN—Enabling Non-expert Users to Extract, Integrate, and Analyze Open Data

Katrin Braunschweig¹, Julian Eberius¹, Maik Thiele¹ & Wolfgang Lehner¹


Abstract

Government initiatives for more transparency and participation have led to an increasing amount of structured data on the web in recent years. Many of these datasets have great potential. For example, a situational analysis and meaningful visualization of the data can help point out social or economic issues and raise people's awareness. Unfortunately, the ad-hoc analysis of this so-called Open Data can prove very complex and time-consuming, partly due to a lack of efficient system support.

On the one hand, search functionality is required to identify relevant datasets. Common document retrieval techniques used in web search, however, are not optimized for Open Data and do not address the semantic ambiguity inherent in it. On the other hand, semantic integration is necessary to perform analysis tasks across multiple datasets. To do so in an ad-hoc fashion, however, requires more flexibility and easier integration than most data integration systems provide. It is apparent that an optimal management system for Open Data must combine aspects from both classic approaches.

In this article, we propose OPEN, a novel concept for the management and situational analysis of Open Data within a single system. In our approach, we extend a classic database management system, adding support for the identification and dynamic integration of public datasets. As most web users lack the experience and training required to formulate structured queries in a DBMS, we add support for non-expert users to our system, for example through keyword queries. Furthermore, we address the challenge of indexing Open Data.
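To make the idea of finding relevant datasets through keyword queries more concrete, the following is a minimal sketch of keyword search over a small collection of tables. It is not the OPEN system's indexing or ranking method. It assumes a simple inverted index over dataset names, column headers, and cell values, and ranks datasets by how many distinct query keywords they match. The dataset names, values, and functions (`DATASETS`, `build_index`, `keyword_search`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy corpus: dataset name -> (column headers, rows).
# All values are made up for illustration and are not real statistics.
DATASETS = {
    "unemployment_by_state": (["state", "year", "unemployment_rate"],
                              [["Saxony", "2011", "1.0"],
                               ["Bavaria", "2011", "2.0"]]),
    "school_budgets": (["district", "year", "budget_eur"],
                       [["Dresden", "2011", "100"]]),
}

def tokenize(text):
    """Lower-case a string and split it on underscores and whitespace."""
    return text.lower().replace("_", " ").split()

def build_index(datasets):
    """Inverted index: token -> set of dataset names containing that token
    in the dataset name, a column header, or a cell value."""
    index = defaultdict(set)
    for name, (headers, rows) in datasets.items():
        values = [name] + headers + [cell for row in rows for cell in row]
        for value in values:
            for token in tokenize(value):
                index[token].add(name)
    return index

def keyword_search(index, query):
    """Return dataset names ranked by the number of distinct query keywords
    they match (ties broken alphabetically); non-matching datasets are omitted."""
    scores = defaultdict(int)
    for token in set(tokenize(query)):
        for name in index.get(token, ()):
            scores[name] += 1
    return sorted(scores, key=lambda name: (-scores[name], name))

if __name__ == "__main__":
    index = build_index(DATASETS)
    print(keyword_search(index, "unemployment Saxony 2011"))
```

Here the query "unemployment Saxony 2011" matches all three keywords in `unemployment_by_state` but only "2011" in `school_budgets`, so the unemployment table ranks first. Matching on schema elements as well as values is one simple way to surface tables whose headers, rather than their contents, carry the user's terms. A real system would additionally have to deal with the semantic ambiguity of Open Data described above, for example synonyms and differently named columns.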


Similar content being viewed by others

Piveau: A Large-Scale Open Data Management Platform Based on Semantic Web Technologies

LinDA: A Service Infrastructure for Linked Data Analysis and Provision of Data Statistics

Exploring Open Data Portals for Geospatial Data Discovery Purposes

Notes

www.transparency.org.


Author information

Authors and Affiliations

Technische Universität Dresden, Nöthnitzer Str. 46, 01187, Dresden, Germany
Katrin Braunschweig, Julian Eberius, Maik Thiele & Wolfgang Lehner


Corresponding author

Correspondence to Maik Thiele.


About this article

Cite this article

Braunschweig, K., Eberius, J., Thiele, M. et al. OPEN—Enabling Non-expert Users to Extract, Integrate, and Analyze Open Data. Datenbank Spektrum 12, 121–130 (2012). https://doi.org/10.1007/s13222-012-0091-9


Received: 14 March 2012
Accepted: 12 May 2012
Published: 26 May 2012
Issue Date: July 2012
DOI: https://doi.org/10.1007/s13222-012-0091-9
