Materialization of web data sources

Chapter
DOI: 10.5555/2427336.2427342

Published: 01 January 2012

Abstract

Recent years have witnessed an exponential increase in the number of data services available on the Web. Many popular Web sites, including social networks, offer APIs for interacting with their information, and open data initiatives such as the Linked Data project promise to achieve the vision of the Web of data. Unfortunately, access to Web data is typically limited by the constraints imposed by the query interface and by technical limitations such as network latency or the number and frequency of allowed daily service invocations. Moreover, several sources may independently publish data about the same real-world objects; in such cases, combining them to assemble all available information about those objects requires duplicate removal, reconciliation, and integration. This paper describes various data materialization problems, defining properties such as source coverage and data alignment of the materialized data, and then focuses on a specific problem: the reseeding of data access methods, that is, using the information returned by previous calls as input to new calls in order to build a materialization of maximum size.
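The reseeding idea can be illustrated with a minimal sketch. This is not the chapter's algorithm: it is a naive breadth-first strategy, and the toy source `papers_by_author`, the helper `materialize`, and the `max_calls` budget are all hypothetical names introduced here. The source can only be queried by author name, so author names found in earlier results are reused as inputs to later calls until the call budget runs out or no new inputs remain.

```python
from collections import deque

# Hypothetical toy source: papers can be retrieved only by author name.
PAPERS = [
    {"id": 1, "authors": ["Ann", "Bob"]},
    {"id": 2, "authors": ["Bob", "Cid"]},
    {"id": 3, "authors": ["Cid", "Dee"]},
    {"id": 4, "authors": ["Eve"]},
]


def papers_by_author(author):
    """Stand-in for a remote access method with one mandatory input."""
    return [p for p in PAPERS if author in p["authors"]]


def materialize(access_method, seeds, extract_inputs, max_calls):
    """Breadth-first reseeding under a budget of max_calls invocations."""
    pending = deque(seeds)        # input values not yet submitted
    seen_inputs = set(seeds)      # every input value ever queued
    materialized = {}             # record id -> record (drops duplicates)
    calls = 0
    while pending and calls < max_calls:
        value = pending.popleft()
        calls += 1
        for record in access_method(value):
            materialized[record["id"]] = record
            # Reseed: values from this result become future inputs.
            for candidate in extract_inputs(record):
                if candidate not in seen_inputs:
                    seen_inputs.add(candidate)
                    pending.append(candidate)
    return materialized, calls


result, calls = materialize(
    papers_by_author, ["Ann"], lambda r: r["authors"], max_calls=10
)
small, small_calls = materialize(
    papers_by_author, ["Ann"], lambda r: r["authors"], max_calls=2
)
```

Starting from the single seed "Ann", the sketch reaches papers 1 to 3 but never paper 4, whose only author is not connected to the seed; this is the kind of coverage limit the abstract refers to. Lowering `max_calls` models a daily invocation quota and shrinks the materialization accordingly.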

Cited By

  • (2015) A UI-Centric Approach for the End-User Development of Multidevice Mashups. ACM Transactions on the Web 9:3 (1-40). DOI: 10.1145/2735632. Online publication date: 16-Jun-2015


    Published In

    Search Computing: broadening web search
    January 2012
    255 pages
    ISBN: 9783642342127
    Editors: Stefano Ceri, Marco Brambilla

    Publisher

    Springer-Verlag

    Berlin, Heidelberg

    Qualifiers

    • Chapter
