Systematic Literature Review: Corpus Linguistics in Indonesia

  • Conference paper
HCI International 2022 Posters (HCII 2022)

Abstract

Native languages in many countries are at risk of extinction, and Indonesia is no exception. A lack of awareness among the younger generation and the effects of globalization are cited as contributing factors. The fewer speakers a local language has, the more limited its linguistic corpus will be. As much of the literature notes, the languages of Indonesia share this problem: they lack a linguistic corpus. This lack makes research difficult, especially research based on artificial intelligence or natural language processing. This study conducted a systematic literature review of corpus linguistics in Indonesia. Its main purpose is to gather information from scientific articles covering both the development and the use of linguistic corpora in Indonesia. The results show that the corpora mostly drew on primary sources such as social media, Wikipedia, or previous similar research. Not a single article discusses the development of a corpus of languages in Indonesia. Research on developing such a corpus therefore remains wide open.

Notes

  1. https://garuda.kemdikbud.go.id/

  2. https://ieeexplore.ieee.org/Xplore/home.jsp

  3. https://dl.acm.org

  4. https://link.springer.com

  5. https://corpora.uni-leipzig.de/en?corpusId=ind_mixed_2013

  6. https://github.com/famrashel/idn-treebank

Author information

Corresponding author

Correspondence to Suwanto Raharjo.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Raharjo, S., Utami, E., Yusa, M., Sutanta, E. (2022). Systematic Literature Review: Corpus Linguistics in Indonesia. In: Stephanidis, C., Antona, M., Ntoa, S. (eds) HCI International 2022 Posters. HCII 2022. Communications in Computer and Information Science, vol 1580. Springer, Cham. https://doi.org/10.1007/978-3-031-06417-3_50

  • DOI: https://doi.org/10.1007/978-3-031-06417-3_50

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-06416-6

  • Online ISBN: 978-3-031-06417-3

  • eBook Packages: Computer Science, Computer Science (R0)
