Abstract
Native languages in many countries are currently at risk of extinction, and Indonesia is no exception. A lack of awareness among the younger generation and the effects of globalization are cited as contributing factors. The fewer speakers a local language has, the more limited its linguistic corpus will be. The languages in Indonesia face the same problem reported throughout the literature: the lack of a linguistic corpus. This lack makes research difficult, especially research based on artificial intelligence or natural language processing. This study conducted a systematic literature review of corpus linguistics in Indonesia. Its main purpose is to gather information from scientific articles covering both the development and the use of linguistic corpora in Indonesia. The results show that the corpora mostly used primary sources such as social media, Wikipedia, or previous similar research. Not a single article discusses the development of a corpus of the languages in Indonesia. Thus, research on developing such a corpus is still wide open.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Raharjo, S., Utami, E., Yusa, M., Sutanta, E. (2022). Systematic Literature Review: Corpus Linguistics in Indonesia. In: Stephanidis, C., Antona, M., Ntoa, S. (eds) HCI International 2022 Posters. HCII 2022. Communications in Computer and Information Science, vol 1580. Springer, Cham. https://doi.org/10.1007/978-3-031-06417-3_50
DOI: https://doi.org/10.1007/978-3-031-06417-3_50
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06416-6
Online ISBN: 978-3-031-06417-3
eBook Packages: Computer Science, Computer Science (R0)