DOI: 10.1145/355214.355225

Character cluster based Thai information retrieval

Published: 01 November 2000

Abstract

Some languages, including Thai, Japanese, and Chinese, do not mark word boundaries explicitly. The resulting word boundary ambiguity reduces the accuracy of information retrieval. This paper proposes a new technique, called character clustering, that reduces word boundary ambiguity in Thai documents and hence improves search efficiency. To evaluate it, a set of experiments on Thai newspapers is conducted with both non-indexing and indexing search approaches. The experimental results show that our method outperforms the traditional methods on all measures in both approaches.

    Published In

    IRAL '00: Proceedings of the fifth international workshop on Information retrieval with Asian languages
    November 2000
    220 pages
    ISBN:1581133006
    DOI:10.1145/355214
    Chairmen: Kam-Fai Wong, Dik L. Lee, Jong-Hyeok Lee
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Thai document
    2. character cluster
    3. indexing and non-indexing information retrieval

    Qualifiers

    • Article

    Conference

    IRAL00

    Cited By

    • (2023) Character-Based Thai Word Segmentation with Multiple Attentions. Journal of Natural Language Processing 30:2 (372-400). DOI: 10.5715/jnlp.30.372. Online publication date: 2023
    • (2022) Type Linking for Query Understanding and Semantic Search. Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (3931-3940). DOI: 10.1145/3534678.3539067. Online publication date: 14-Aug-2022
    • (2022) Thai Named Entity Recognition Using BiLSTM-CNN-CRF Enhanced by TCC. IEEE Access 10 (53043-53052). DOI: 10.1109/ACCESS.2022.3175201. Online publication date: 2022
    • (2021) Text generation by probabilistic suffix tree language model. 2021 16th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP) (1-4). DOI: 10.1109/iSAI-NLP54397.2021.9678167. Online publication date: 21-Dec-2021
    • (2021) Hybrid Deep Learning Models for Thai Sentiment Analysis. Cognitive Computation 14:1 (167-193). DOI: 10.1007/s12559-020-09770-0. Online publication date: 4-Mar-2021
    • (2019) Thai Keyword Extraction using TextRank Algorithm. 2019 14th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP) (1-6). DOI: 10.1109/iSAI-NLP48611.2019.9045523. Online publication date: Oct-2019
    • (2018) Improving Thai Word and Sentence Segmentation Using Linguistic Knowledge. IEICE Transactions on Information and Systems E101.D:12 (3218-3225). DOI: 10.1587/transinf.2018EDP7016. Online publication date: 1-Dec-2018
    • (2018) Multi-Candidate Word Segmentation using Bi-directional LSTM Neural Networks. 2018 International Conference on Embedded Systems and Intelligent Technology & International Conference on Information and Communication Technology for Embedded Systems (ICESIT-ICICTES) (1-6). DOI: 10.1109/ICESIT-ICICTES.2018.8442053. Online publication date: May-2018
    • (2017) Burmese word segmentation with Character Clustering and CRFs. 2017 14th International Joint Conference on Computer Science and Software Engineering (JCSSE) (1-6). DOI: 10.1109/JCSSE.2017.8025934. Online publication date: Jul-2017
    • (2017) TLex+: A Hybrid Method Using Conditional Random Fields and Dictionaries for Thai Word Segmentation. Recent Advances and Future Prospects in Knowledge, Information and Creativity Support Systems (112-125). DOI: 10.1007/978-3-319-70019-9_10. Online publication date: 2-Dec-2017