Enhancing Decision Boundary Setting for Binary Text Classification

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11320)

Included in the following conference series:

Australasian Joint Conference on Artificial Intelligence


Abstract

Text classification is the task of assigning text documents to predefined classes using a classifier that learns from training samples, which may be labelled or unlabelled. Binary text classifiers provide a way to separate relevant documents from a large dataset. However, existing binary text classifiers are not grounded in reality because they suffer from overfitting: they try to find a clear boundary between relevant and irrelevant objects rather than understand the decision boundary. In practice, the decision boundary cannot be described as a clear boundary because of the numerous uncertainties in text documents. This paper addresses the issue by proposing a model based on the sliding window technique (SW) and the Support Vector Machine (SVM) to deal with the uncertain boundary and to improve the effectiveness of binary text classification. The model sets the decision boundary by dividing the training documents into three distinct regions (positive, boundary, and negative) to ensure the certainty of the knowledge extracted to describe relevant information. It then organizes the training samples for the learning task to build a classifier based on multiple SVMs. Experimental results on the standard dataset Reuters Corpus Volume 1 (RCV1) and TREC topics for text classification show that the proposed model significantly outperforms six state-of-the-art baseline models in binary text classification.
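The three-region idea in the abstract can be illustrated with a small sketch. The abstract does not give the authors' procedure, so everything below is an assumption made for illustration: the function name `partition_three_regions`, the use of a precomputed relevance score per training document, the fixed window size, and the rule that any window containing both labels marks uncertain territory. It is not the paper's algorithm, and the SVM training step is left out.

```python
from typing import List, Tuple


def partition_three_regions(
    scored_docs: List[Tuple[float, int]], window: int = 2
) -> Tuple[List[int], List[int], List[int]]:
    """Split documents into (positive, boundary, negative) index lists.

    scored_docs holds (score, label) pairs; label 1 = relevant, 0 = irrelevant.
    Hypothetical rule (an assumption, not taken from the paper): rank by score,
    slide a fixed-size window down the ranking, and treat every position
    covered by a window with mixed labels as part of the boundary region.
    """
    # Rank document indices by descending score.
    order = sorted(
        range(len(scored_docs)), key=lambda i: scored_docs[i][0], reverse=True
    )
    labels = [scored_docs[i][1] for i in order]

    # Mark ranked positions that fall inside at least one mixed-label window.
    mixed = [False] * len(order)
    for start in range(max(len(order) - window + 1, 1)):
        chunk = labels[start:start + window]
        if 0 < sum(chunk) < len(chunk):
            for pos in range(start, start + len(chunk)):
                mixed[pos] = True

    if not any(mixed):
        # No mixed window found: fall back to splitting by label, no boundary.
        positive = [i for i in order if scored_docs[i][1] == 1]
        negative = [i for i in order if scored_docs[i][1] == 0]
        return positive, [], negative

    first = mixed.index(True)
    last = len(mixed) - 1 - mixed[::-1].index(True)
    # Above the first mixed position: confident high-score region.
    # Below the last mixed position: confident low-score region.
    return order[:first], order[first:last + 1], order[last + 1:]


docs = [(0.9, 1), (0.8, 1), (0.7, 1), (0.6, 0), (0.5, 1), (0.4, 0), (0.3, 0), (0.2, 0)]
positive, boundary, negative = partition_three_regions(docs, window=2)
```

In a setup like this, each region could then supply training samples to its own SVM, which is one possible reading of "multiple SVMs"; the abstract alone does not say how the regions are actually used.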



Author information

Authors and Affiliations

School of EECS, Queensland University of Technology, Brisbane, QLD, Australia
Aisha Rashed Albqmi, Yuefeng Li & Yue Xu
Department of CS, Taif University, Taif, Saudi Arabia
Aisha Rashed Albqmi


Corresponding author

Correspondence to Aisha Rashed Albqmi.

Editor information

Editors and Affiliations

University of Canterbury, Christchurch, New Zealand
Tanja Mitrovic
School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
Bing Xue
RMIT University, Melbourne, VIC, Australia
Xiaodong Li


About this paper

Cite this paper

Albqmi, A.R., Li, Y., Xu, Y. (2018). Enhancing Decision Boundary Setting for Binary Text Classification. In: Mitrovic, T., Xue, B., Li, X. (eds) AI 2018: Advances in Artificial Intelligence. AI 2018. Lecture Notes in Computer Science, vol 11320. Springer, Cham. https://doi.org/10.1007/978-3-030-03991-2_72


DOI: https://doi.org/10.1007/978-3-030-03991-2_72
Published: 10 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03990-5
Online ISBN: 978-3-030-03991-2
eBook Packages: Computer Science, Computer Science (R0)
