KACTL: Knowware Based Automated Construction of a Treelike Library from Web Documents

Ruqian Lu, Yu Huang, Kai Sun, Zhongxiang Chen, Yiwen Chen & Songmao Zhang

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7529)

Included in the following conference series:

International Conference on Web Information Systems and Mining


Abstract

This paper proposes a knowware-based supervised machine learning technique for domain-specific regression and classification of Web documents. The technique is simple because it relies only on word counting, without natural language understanding or complicated statistical techniques. Starting from a domain sub-division tree whose nodes are assigned a training set of documents, the algorithm produces a labeled classification tree with a characteristic vector for each node. This tree is then used to classify any number of documents in that particular domain. A tool for developing a Web portal is also provided to build a Web station that displays the final treelike library of documents.
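The pipeline summarized in the abstract (a domain tree, training documents attached to its nodes, a word-count characteristic vector per node, then classification of new documents) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the names `Node`, `build_vectors`, and `classify`, the choice to sum counts over each subtree, the cosine similarity, and the greedy top-down descent. The paper's actual construction of characteristic vectors may differ.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def word_counts(text):
    """Lower-case the text and count alphabetic tokens (plain word counting)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


@dataclass
class Node:
    """One node of the hypothetical domain sub-division tree."""
    label: str
    children: list = field(default_factory=list)
    training_docs: list = field(default_factory=list)
    vector: Counter = field(default_factory=Counter)  # characteristic vector


def build_vectors(node):
    """Bottom-up pass: a node's vector sums counts from its own documents
    and from every node in its subtree (an assumed aggregation rule)."""
    node.vector = Counter()
    for doc in node.training_docs:
        node.vector.update(word_counts(doc))
    for child in node.children:
        node.vector.update(build_vectors(child))
    return node.vector


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(root, text):
    """Descend from the root, following the most similar child at each level."""
    doc = word_counts(text)
    node = root
    while node.children:
        best = max(node.children, key=lambda c: cosine(doc, c.vector))
        if cosine(doc, best.vector) == 0.0:
            break  # no child shares a word with the document; stay here
        node = best
    return node.label


# Toy tree with made-up training text, for illustration only.
root = Node("sports", children=[
    Node("football", training_docs=["goal striker penalty goal keeper"]),
    Node("tennis", training_docs=["serve racket volley serve ace"]),
])
build_vectors(root)
print(classify(root, "The striker scored a late goal"))
```

In this sketch a document that shares no words with any child stays at the current node, so unmatched documents remain at the most specific category that still fits them.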

Partially supported by 973 Program 2009CB320701, NSFC 61073023 and Tsinghua-Tengxun Lab Project.



Author information

Authors and Affiliations

Institute of Computing Technology, CAS Key Lab of IIP, China
Ruqian Lu, Yu Huang, Zhongxiang Chen & Songmao Zhang
Academy of Mathematics and Systems Science, CAS Key Lab of MADIS, China
Ruqian Lu, Kai Sun & Songmao Zhang
Tianjin University, China
Yiwen Chen


Editor information

Editors and Affiliations

Department of Business Administration, Caritas Institute of Higher Education, 18 Chui Ling Road, Tseung Kwan O, Hong Kong, China
Fu Lee Wang
School of Computer and Information Engineering, Shanghai University of Electric Power, 200090, Shanghai, China
Jingsheng Lei
Department of Computer and Information Science, University of Macau, Av. Padre Tomás Pereira, Taipa, Macau, China
Zhiguo Gong
School of Computer, Shanghai University, 200444, Shanghai, China
Xiangfeng Luo

About this paper

Cite this paper

Lu, R., Huang, Y., Sun, K., Chen, Z., Chen, Y., Zhang, S. (2012). KACTL: Knowware Based Automated Construction of a Treelike Library from Web Documents. In: Wang, F.L., Lei, J., Gong, Z., Luo, X. (eds) Web Information Systems and Mining. WISM 2012. Lecture Notes in Computer Science, vol 7529. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33469-6_80


DOI: https://doi.org/10.1007/978-3-642-33469-6_80
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33468-9
Online ISBN: 978-3-642-33469-6
eBook Packages: Computer Science, Computer Science (R0)
