Book Genre Classification Based on Reviews of Portuguese-Language Literature

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13208)

Included in the following conference series:

International Conference on Computational Processing of the Portuguese Language

Abstract

Automatic book genre classification is a hard task because it requires either the whole book’s content or a high-quality summary, which is challenging to write automatically. Online reviews, on the other hand, are an accessible resource that readers use to evaluate a book or get a general sense of it, including its genre. As the number of book reviews keeps growing, using this information for genre classification requires a robust solution that can handle high volumes of data. In this context, we introduce a model for automatically classifying book genres by analyzing online text reviews. We build a dataset of compiled texts from online book reviews. Then, we use multiple machine learning algorithms to categorize a book into a specific genre. This process enables us to compare algorithms and identify the best classifiers. The best-performing machine learning algorithm completed the task with an accuracy of 96%, which suggests that the proposed model is suitable for various information retrieval systems, given its high accuracy and efficiency.
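
The comparison step can be pictured with a minimal sketch. Everything in it is hypothetical: the genre labels, the classifier names, and the predictions are made up for illustration and are not the paper's models or data. The metrics follow the definitions in note 10 for one genre treated as the positive class, and the majority-class baseline is written in the spirit of the prior-strategy dummy classifier mentioned in note 8. Returning 0.0 when a denominator is zero is an assumption of this sketch.

```python
from collections import Counter


def binary_metrics(gold, predicted, positive):
    """Precision, recall, accuracy and F1 for one genre treated as positive."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    tn = len(pairs) - tp - fp - fn
    # Assumption: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


def prior_baseline(train_labels, n_items):
    """Always predict the most frequent training genre (majority-class floor)."""
    most_common_genre, _ = Counter(train_labels).most_common(1)[0]
    return [most_common_genre] * n_items


# Hypothetical toy data: training genres and gold genres for five test books.
train_labels = ["romance", "romance", "poetry", "romance", "drama"]
gold = ["romance", "poetry", "drama", "romance", "poetry"]

predictions_by_name = {
    "baseline_prior": prior_baseline(train_labels, len(gold)),
    "classifier_a": ["romance", "poetry", "drama", "romance", "drama"],
    "classifier_b": ["romance", "poetry", "romance", "poetry", "poetry"],
}

# Score every classifier on the "romance" genre and pick the most accurate one.
results = {
    name: binary_metrics(gold, preds, positive="romance")
    for name, preds in predictions_by_name.items()
}
best = max(results, key=lambda name: results[name]["accuracy"])

for name, scores in results.items():
    print(name, {metric: round(value, 2) for metric, value in scores.items()})
print("best by accuracy:", best)
```

A classifier is only interesting if it clears the baseline row, which is why the baseline is scored with exactly the same function as the other candidates.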

Notes

1. Domínio Público: https://www.dominiopublico.gov.br.
2. Projecto Adamastor: https://projectoadamastor.org.
3. BLPL: https://www.literaturabrasileira.ufsc.br.
4. On Goodreads, a bookshelf is a list where one can add or remove books to facilitate reading, similar to a real-life bookshelf where one keeps books.
5. https://pypi.org/project/translate.
6. TF-IDF is a numerical statistic that reflects how important a word is to a document within a collection or corpus. Its value increases proportionally to the number of times a word appears in the document and is offset by the number of documents that contain it (source: Wikipedia).
7. Most classifiers are described in Data Mining textbooks such as [6, 19].
8. As implemented by sklearn.dummy.DummyClassifier with prior as the strategy.
9. Scikit Learn: https://scikit-learn.org/stable.
10. Standard metrics for evaluating models, common in Information Retrieval. Precision = $truePositive/predictedPositive$. Recall = $truePositive/totalActualPositive$. Accuracy = $(truePositive+trueNegative)/(Positive+Negative)$. F1 = $2\times ((Precision\times Recall)/(Precision+Recall))$.
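
As an illustration of the TF-IDF statistic described in note 6, here is a minimal sketch in plain Python. It assumes one common textbook variant, tf = count / document length and idf = ln(N / df), which is not necessarily the weighting used in the paper; libraries such as scikit-learn apply a smoothed variant by default. The review snippets are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute TF-IDF weights for a list of tokenized documents.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(documents)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical compiled review snippets, already lowercased and tokenized.
reviews = [
    "um romance lindo um amor eterno".split(),
    "um poema curto e lindo".split(),
    "um drama sobre amor e perda".split(),
]

for index, doc_weights in enumerate(tf_idf(reviews)):
    top_terms = sorted(doc_weights.items(), key=lambda kv: kv[1], reverse=True)[:2]
    print(index, top_terms)
```

In this toy corpus the word "um" occurs in every snippet, so its weight is zero, while a word that occurs in a single snippet, such as "romance", receives the highest weight there. That is the offsetting effect the note describes.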

References

Akalp, H., Cigdem, E.F., Yilmaz, S., Bölücü, N., Can, B.: Language representation models for music genre classification using lyrics. In: ISEEIE - International Symposium on Electrical, Electronics and Information Engineering, pp. 408–414. ACM, Seoul, Republic of Korea (2021). https://doi.org/10.1145/3459104.3459171
Altszyler, E., Sigman, M., Fernández Slezak, D.: Comparative study of LSA vs Word2Vec embeddings in small corpora: a case study in dreams database, October 2016
Catharin, L.G., Feltrim, V.D.: Finding opinion targets in news comments and book reviews. In: Villavicencio, A., et al. (eds.) International Conference on Computational Processing of the Portuguese Language (PROPOR). LNCS, vol. 11122, pp. 375–384. Springer, Canela, Brazil (2018). https://doi.org/10.1007/978-3-319-99722-3_38
Dumais, S.T., Furnas, G.W., Landauer, T.K., Deerwester, S.C., Harshman, R.A.: Using latent semantic analysis to improve access to textual information. In: SIGCHI Conference on Human Factors in Computing Systems, pp. 281–285. ACM, Washington, D.C. (1988). https://doi.org/10.1145/57167.57214
Freitas, C., Motta, E., Milidiú, R., César, J.: Sparkling vampire... lol! annotating opinions in a book review corpus. In: Aluisio, S.M., Tagnin, S.E. (eds.) New Language Technologies and Linguistic Research: A Two-Way Road, pp. 128–146. Cambridge Scholars Publishing, Newcastle upon Tyne (2014)
Han, J., Kamber, M., Pei, J.: Data Mining: Concepts and Techniques, 3rd edn. Morgan Kaufmann Publishers, Waltham (2012)
Hartmann, N., Cucatto, L., Brants, D., Aluísio, S.: Automatic Classification of the Complexity of Nonfiction Texts in Portuguese for Early School Years. In: Silva, J., Ribeiro, R., Quaresma, P., Adami, A., Branco, A. (eds.) PROPOR 2016. LNCS (LNAI), vol. 9727, pp. 12–24. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-41552-9_2
Jelodar, H., et al.: A NLP framework based on meaningful latent-topic detection and sentiment analysis via fuzzy lattice reasoning on youtube comments. Multim. Tools Appl. 80(3), 4155–4181 (2020). https://doi.org/10.1007/s11042-020-09755-z
Lozano, L.C., Planells, S.C.: Best Books Ever Dataset. Zenodo, November 2020. https://doi.org/10.5281/zenodo.4265096
Omar, A.: Classificação de gêneros literários: uma sinergia metodológica de modelagem computacional e semântica lexical. Texto Livre: Linguagem e Tecnologia 13, 83–101 (2020). https://doi.org/10.35699/1983-3652.2020.24396
Ozsarfati, E., Sahin, E., Saul, C.J., Yilmaz, A.: Book genre classification based on titles with comparative machine learning algorithms. In: IEEE International Conference on Computer and Communication Systems (ICCCS), pp. 14–20 (2019). https://doi.org/10.1109/CCOMS.2019.8821643
Rinaldi, A.M., Russo, C., Tommasino, C.: Web document categorization using knowledge graph and semantic textual topic detection. In: Computational Science and Its Applications (ICCSA). Springer, Cham (2021). https://doi.org/10.1007/978-3-030-24311-1
Silva, M., Scofield, C., Moro, M.: PPORTAL: public domain Portuguese-language literature Dataset. In: Anais do III Dataset Showcase Workshop, Brazilian Symposium on Databases, pp. 77–88. SBC, Rio de Janeiro, Brazil (2021). https://doi.org/10.5753/dsw.2021.17416
Silva, M.O., Scofield, C., Moro, M.M.: PPORTAL: Public domain Portuguese-language literature Dataset, August 2021. https://doi.org/10.5281/zenodo.5178063
Sobkowicz, A., Kozłowski, M., Buczkowski, P.: Reading book by the cover - book genre detection using short descriptions. In: Gruca, A., et al. (eds.) Man-Machine Interactions 5. ICMMI 2017. Advances in Intelligent Systems and Computing, vol. 659, pp. 439–448. Springer (2018)
Veiga, A., Candeias, S., Celorico, D., Proença, J., Perdigão, F.: Towards automatic classification of speech styles. In: de Medeiros Caseli, H., et al. (eds.) International Conference on Computational Processing of the Portuguese Language (PROPOR). LNCS, vol. 7243, pp. 421–426. Springer, Coimbra, Portugal (2012). https://doi.org/10.1007/978-3-642-28885-2_47
Xu, Z., Liu, L., Song, W., Du, C.: Text genre classification research. In: International Conference on Computer, Information and Telecommunication Systems (CITS), pp. 175–178 (2017). https://doi.org/10.1109/CITS.2017.8035329
Ying, T.C., Doraisamy, S., Abdullah, L.N.: Genre and mood classification using lyric features. In: International Conference on Information Retrieval & Knowledge Management, pp. 260–263. IEEE, Kuala Lumpur, Malaysia (2012). https://doi.org/10.1109/InfRKM.2012.6204985
Zaki, M.J., Meira Jr, W.: Data Mining and Machine Learning: Fundamental Concepts and Algorithms, 2nd edn. Cambridge University Press, London (2020)

Acknowledgments

This work was funded by CNPq and FAPEMIG, Brazil.

Author information

Authors and Affiliations

Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
Clarisse Scofield, Mariana O. Silva, Luiza de Melo-Gomes & Mirella M. Moro

Authors

Clarisse Scofield
Mariana O. Silva
Luiza de Melo-Gomes
Mirella M. Moro

Corresponding author

Correspondence to Mirella M. Moro.

Editor information

Editors and Affiliations

Universidade de Fortaleza, Fortaleza, Brazil
Vládia Pinheiro
CiTIUS - Universidade de Santiago de Compostela, Santiago de Compostela, Spain
Pablo Gamallo
Universidade Nova de Lisboa, Lisbon, Portugal
Raquel Amaro
University of Sheffield, Sheffield, UK
Carolina Scarton
INESC-ID, Lisbon, Portugal
Fernando Batista
Federal University of São Carlos, São Carlos, Brazil
Diego Silva
University of Lisbon, Lisbon, Portugal
Catarina Magro
Sentimonitor, Porto Alegre, Brazil
Hugo Pinto

About this paper

Cite this paper

Scofield, C., Silva, M.O., de Melo-Gomes, L., Moro, M.M. (2022). Book Genre Classification Based on Reviews of Portuguese-Language Literature. In: Pinheiro, V., et al. (eds.) Computational Processing of the Portuguese Language. PROPOR 2022. Lecture Notes in Computer Science, vol 13208. Springer, Cham. https://doi.org/10.1007/978-3-030-98305-5_18

DOI: https://doi.org/10.1007/978-3-030-98305-5_18
Published: 16 March 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-98304-8
Online ISBN: 978-3-030-98305-5
eBook Packages: Computer Science, Computer Science (R0)
