Abstract
Identifying location-based information on the WWW, such as the street addresses of emergency service facilities, has become increasingly popular. However, current Web-mining tools such as Google’s crawler are designed to index whole webpages on the Internet; they do not treat location information, which has a finer granularity, as an indexable object. This often leads to low recall in the search results. To retrieve location-based information from the ever-expanding Internet and its almost-unstructured Web data, there is a need for an effective Web-mining mechanism that can extract the desired spatial data from the right webpages within the right scope. In this paper, we report our efforts towards automated location-information retrieval by developing a knowledge-based Web-mining tool, CyberMiner, that adopts (1) a geospatial taxonomy to determine the starting URLs and domains for the spatial Web mining, (2) a rule-based forward and backward screening algorithm for efficient address extraction, and (3) inductive-learning-based semantic analysis to discover patterns of street addresses of interest. The retrieval of the locations of all fire stations within Los Angeles County, California, is used as a case study.
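The abstract does not spell out the screening rules, so the following is only a minimal sketch of what a forward and backward screening pass over page text could look like, not CyberMiner's actual algorithm. It assumes a street-suffix token (such as "St" or "Ave") serves as the anchor, a house number is sought a few tokens backward, and a ZIP code is sought a few tokens forward. The suffix list, window sizes, function name, and sample sentence are all hypothetical.

```python
import re

# Hypothetical anchor vocabulary; a real system would use a much larger list.
STREET_SUFFIXES = {"st", "street", "ave", "avenue", "blvd", "boulevard",
                   "rd", "road", "dr", "drive"}
ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$")   # US ZIP or ZIP+4
NUMBER_RE = re.compile(r"^\d{1,6}$")       # candidate house number
PUNCT = ".,;"


def extract_addresses(text, max_back=4, max_forward=6):
    """Return candidate street addresses found around street-suffix anchors."""
    tokens = text.split()
    found = []
    for i, tok in enumerate(tokens):
        if tok.strip(PUNCT).lower() not in STREET_SUFFIXES:
            continue  # not an anchor token
        # Backward screening: nearest house number within max_back tokens.
        start = None
        for j in range(i - 1, max(i - max_back, 0) - 1, -1):
            if NUMBER_RE.match(tokens[j].strip(PUNCT)):
                start = j
                break
        if start is None:
            continue
        # Forward screening: nearest ZIP code within max_forward tokens.
        end = None
        for k in range(i + 1, min(i + max_forward, len(tokens) - 1) + 1):
            if ZIP_RE.match(tokens[k].strip(PUNCT)):
                end = k
                break
        if end is None:
            continue
        found.append(" ".join(tokens[start:end + 1]).strip(PUNCT))
    return found


# Invented sample text, used only to exercise the sketch.
sample = "Fire Station 21 is located at 1234 N. Main St., Glendora, CA 91741. Call us."
print(extract_addresses(sample))
```

Anchoring on a cheap, high-precision token and only then scanning outward keeps the pass linear in page length, which is one plausible reading of "efficient" in the abstract; candidates missing either a house number or a ZIP code are simply dropped.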
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, W., Goodchild, M.F., Church, R.L., Zhou, B. (2012). Geospatial Data Mining on the Web: Discovering Locations of Emergency Service Facilities. In: Zhou, S., Zhang, S., Karypis, G. (eds) Advanced Data Mining and Applications. ADMA 2012. Lecture Notes in Computer Science, vol 7713. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35527-1_46
DOI: https://doi.org/10.1007/978-3-642-35527-1_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35526-4
Online ISBN: 978-3-642-35527-1
eBook Packages: Computer Science (R0)