Abstract
This paper proposes a knowware-based supervised machine learning technique for the domain-specific regression and classification of Web documents. The technique is simple because it relies only on word counting, without natural language understanding or complicated statistical techniques. Starting from a domain sub-division tree whose nodes are assigned a training set of documents, the algorithm produces a labeled classification tree with a characteristic vector for each node. This tree can then be used to classify any number of documents in that domain. A tool for developing Web portals is also provided, which builds a Web station that displays the resulting treelike library of documents.
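The abstract describes the pipeline only at a high level, so the following Python sketch (3.9+) illustrates the general idea under stated assumptions; it is not the authors' KACTL implementation. The names `Node`, `train`, and `classify`, the regex tokenizer, the choice to sum word counts over a node's whole subtree as its characteristic vector, cosine similarity as the matching score, and greedy top-down descent are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def word_counts(text: str) -> Counter:
    """Count lowercase alphanumeric tokens: plain word counting, no NLU (assumed tokenizer)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class Node:
    """One node of a hypothetical domain sub-division tree."""
    label: str
    children: list["Node"] = field(default_factory=list)
    training_docs: list[str] = field(default_factory=list)
    vector: Counter = field(default_factory=Counter)  # characteristic vector


def train(node: Node) -> Counter:
    """Assumption: a node's characteristic vector is the summed word counts of
    its own training documents plus those of all its descendants."""
    total = Counter()
    for doc in node.training_docs:
        total.update(word_counts(doc))
    for child in node.children:
        total.update(train(child))
    node.vector = total
    return total


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(root: Node, text: str) -> list[str]:
    """Descend greedily from the root, always following the most similar child,
    and return the path of labels from the root to the chosen node."""
    doc = word_counts(text)
    path = [root.label]
    node = root
    while node.children:
        best = max(node.children, key=lambda child: cosine(doc, child.vector))
        if cosine(doc, best.vector) == 0.0:
            break  # no child shares any word with the document; stop here
        node = best
        path.append(node.label)
    return path


# Toy example: a two-leaf "sports" domain with one training document per leaf.
root = Node("sports", children=[
    Node("football", training_docs=["goal striker penalty football match"]),
    Node("tennis", training_docs=["serve racket tennis set match"]),
])
train(root)
print(classify(root, "a late penalty decided the football match"))  # ['sports', 'football']
```

Greedy descent keeps classification cost proportional to tree depth times branching factor; a real system might instead compare a document against every node, or stop at an internal node when no child is clearly better than its parent.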
Partially supported by the 973 Program (2009CB320701), NSFC (61073023), and the Tsinghua-Tengxun Lab Project.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lu, R., Huang, Y., Sun, K., Chen, Z., Chen, Y., Zhang, S. (2012). KACTL: Knowware Based Automated Construction of a Treelike Library from Web Documents. In: Wang, F.L., Lei, J., Gong, Z., Luo, X. (eds) Web Information Systems and Mining. WISM 2012. Lecture Notes in Computer Science, vol 7529. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33469-6_80
DOI: https://doi.org/10.1007/978-3-642-33469-6_80
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33468-9
Online ISBN: 978-3-642-33469-6
eBook Packages: Computer Science, Computer Science (R0)