An Efficient Multiclass Classifier Using On-Page Positive Personality Features for Web Page Classification for the Next Generation Wireless Communication Networks

Published: 01 March 2017

Abstract

Over the years, wireless communication networks have been widely used by a large community of users in a wide variety of applications such as intelligent transportation systems, energy management, safety, and security. However, the large number of user requests can create performance bottlenecks in parts of the network with respect to QoS parameters such as congestion and network delay. Hence, an efficient classification technique is required to reduce congestion in the network so that the throughput of various applications can be increased. Classification helps in searching, sorting, retrieving, and querying documents over wireless networks. The World Wide Web (WWW) contains a huge repository of information in the form of web pages, and the size of the Internet is growing day by day. This makes it challenging to collect and process the relevant information for a particular domain, and traditional text classification techniques are difficult to apply to rapidly growing web-based content. Hence, novel approaches and techniques need to be devised to reduce the manual effort in web page classification. With these points in mind, this paper proposes a novel multiclass classifier based on the unique personality features of web pages of a particular domain category for next generation wireless networks. Personality features are collected and assigned weights in the proposed scheme, and the proposed classifier is then trained on these features. The results show that the proposed classifier successfully classified news, education, resume, online shopping, and research web pages from a large database repository. The accuracy of the proposed classifier is found to be satisfactory on a large data set of different categories. Also, there is a 10-15% overall performance gain using the proposed scheme in comparison to other existing schemes.
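
The idea of collecting on-page features, weighting them, and picking a category can be illustrated with a minimal sketch. The abstract does not list the actual personality features, their weights, or the training procedure, so everything below is an assumption made for illustration: the terms and weights in `FEATURE_WEIGHTS` are hypothetical, the helpers `tokenize`, `score_page`, and `classify` are invented names, and hand-set weights stand in for the training step. It is not the paper's classifier.

```python
import re
from collections import Counter

# Hypothetical per-category feature weights. The five categories come from
# the abstract; the terms and weight values are illustrative assumptions.
FEATURE_WEIGHTS = {
    "news": {"headline": 2.0, "reporter": 1.5, "breaking": 1.0},
    "education": {"admission": 2.0, "faculty": 1.5, "course": 1.0},
    "resume": {"skills": 2.0, "objective": 1.5, "experience": 1.0},
    "shopping": {"cart": 2.0, "checkout": 1.5, "price": 1.0},
    "research": {"abstract": 2.0, "references": 1.5, "doi": 1.0},
}


def tokenize(text):
    """Lowercase the page text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def score_page(text, weights=FEATURE_WEIGHTS):
    """Return a weighted feature score for every category."""
    counts = Counter(tokenize(text))
    return {
        category: sum(w * counts[term] for term, w in features.items())
        for category, features in weights.items()
    }


def classify(text, weights=FEATURE_WEIGHTS):
    """Pick the highest-scoring category, or None if no feature matched."""
    scores = score_page(text, weights)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

For example, `classify("Add to cart and proceed to checkout. Price: $10")` selects the `"shopping"` category because only that category's hypothetical features occur on the page. A trained classifier would learn the weights from labelled pages instead of fixing them by hand.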

Published In

Wireless Personal Communications: An International Journal, Volume 93, Issue 2
March 2017
306 pages

Publisher

Kluwer Academic Publishers
United States

Author Tags

1. Accuracy
2. Classifier
3. Internet of Things
4. Multiclassifier
5. Webpage classification