Classification of Short Scientific Texts

Scientific and Technical Information Processing. 2023. Vol. 50. No. 3. P. 176–183.

Kusakin I. K., Fedorets O. V., Romanov A. Y.

This paper discusses modern approaches to natural language processing and the application of machine learning models to the task of classifying short scientific texts in Russian. The study analyzes methods for vectorizing textual information, the selection of a model for scientific paper classification, and the training of the BERT language model on the domain of scientific texts. The paper presents the results of experiments on training scientific article classification models at the first and second levels of the Russian State Rubricator of Scientific and Technical Information (SRSTI).

Research target: Computer Science
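The abstract above combines a text vectorization step with a classifier that assigns rubrics. The following is a rough illustration only, not the authors' pipeline (which also trains BERT). It vectorizes short texts with TF-IDF and assigns each new text to the rubric whose summed TF-IDF vector is closest by cosine similarity, a nearest-centroid classifier swapped in for brevity. The toy documents, the rubric labels `informatics` and `physics`, and all function names are hypothetical. A real system would use SRSTI codes as labels and far more data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive lowercase whitespace tokenizer; Russian texts would normally
    # also need lemmatization and stop-word removal.
    return text.lower().split()


def fit_idf(docs):
    # Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def unit(vec):
    # Scale a sparse vector to unit L2 length (empty vectors stay empty).
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def tfidf(text, idf):
    # Raw term counts weighted by IDF; terms unseen in training are dropped.
    counts = Counter(t for t in tokenize(text) if t in idf)
    return unit({t: c * idf[t] for t, c in counts.items()})


def fit_centroids(docs, labels, idf):
    # Sum the TF-IDF vectors of each rubric, then normalize the sum.
    sums = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        sums[label].update(tfidf(doc, idf))
    return {label: unit(dict(vec)) for label, vec in sums.items()}


def cosine(a, b):
    # Both vectors are unit length, so the dot product equals cosine similarity.
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def predict_rubric(text, centroids, idf):
    vec = tfidf(text, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical toy corpus with illustrative rubric labels.
docs = [
    "neural network training for text classification",
    "transformer language model for text",
    "quantum field theory and particle physics",
    "laser spectroscopy of atoms in physics",
]
labels = ["informatics", "informatics", "physics", "physics"]

idf = fit_idf(docs)
centroids = fit_centroids(docs, labels, idf)
print(predict_rubric("language model classification of short texts", centroids, idf))  # → informatics
```

Nearest-centroid classification is used here only because it fits in a few lines of standard-library code. Logistic regression, gradient boosting, or a fine-tuned transformer would take the place of `predict_rubric` in a realistic setup.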

Study of Machine Learning Methods for Classifying Scientific Texts in Russian

Kusakin I. K., Fedorets O. V., Romanov A., Scientific and Technical Information. Series 2: Information Processes and Systems 2022 Vol. 12 P. 6–9

This paper discusses modern approaches to natural language processing and the application of artificial intelligence technologies to the task of classifying scientific texts in Russian. The report contains an analysis of implementations of text vectorization methods and a description of experiments with training various classifier models, from classical machine learning algorithms to neural network transformer architectures. ...

Added: January 31, 2023

Using BERT to Classify Short Scientific Texts in Russian

Kusakin I. K., Tsurupa A. M., Almakaev A. V. et al., in: NTI-2022. Scientific Information in the Modern World: Global Challenges and National Priorities: Proceedings of the 10th Scientific Conference with International Participation Dedicated to the 70th Anniversary of VINITI RAS, Moscow, October 25–26, 2022. M.: VINITI RAS, 2022. P. 103–109.

This work studies approaches to training BERT-based classifiers of scientific articles, with the aim of building an application that uses the best models within the VINITI RAS infrastructure. To this end, the BERT language model was trained on a specialized corpus of scientific texts for subsequent use ...

Added: January 31, 2023

Artificial Intelligence. RCAI 2021. Lecture Notes in Computer Science

Springer, 2021.

This book constitutes the proceedings of the 19th Russian Conference on Artificial Intelligence, RCAI 2021, held in Moscow, Russia, in October 2021. The 19 full papers and 7 short papers presented in this volume were carefully reviewed and selected from 80 submissions. The conference deals with a wide range of topics, categorized into the following topical ...

Added: October 28, 2021

Comparative analysis of classification methods for text in UDC code generation problem for scientific articles

Lomotin K. E., Kozlova E. S., Romanov A., in: Information Innovative Technologies: Materials of the International Scientific-Practical Conference. M.: Association of graduates and employees of AFEA named after prof. Zhukovsky, 2017. P. 359–363.

The research studies the applicability of the most relevant modern classification methods to the automatic generation of Universal Decimal Classification (UDC) codes for an arbitrary scientific article. The following methods are considered as classifiers: artificial neural network, logistic regression, naive Bayesian classifier and metrical ...

Added: July 30, 2017
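One of the classifiers compared in the UDC study above is a naive Bayesian classifier. Below is a minimal multinomial naive Bayes sketch with add-one (Laplace) smoothing. It assumes whitespace tokenization and uses hypothetical training texts. The labels `004` and `53` are UDC-style codes chosen for illustration and do not come from the paper.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial naive Bayes over word counts with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.n_docs = len(docs)
        return self

    def log_joint(self, doc, label):
        # Unnormalized log posterior: log P(c) + sum of log P(w | c).
        # Words never seen in training are skipped.
        counts = self.word_counts[label]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        score = math.log(self.class_counts[label] / self.n_docs)
        for w in doc.lower().split():
            if w in self.vocab:
                score += math.log((counts[w] + self.alpha) / denom)
        return score

    def predict(self, doc):
        return max(self.class_counts, key=lambda c: self.log_joint(doc, c))


# Hypothetical training texts with UDC-style labels chosen for illustration.
docs = [
    "neural network for text classification",
    "logistic regression for text",
    "thermal physics of gases",
    "optics and laser physics",
]
labels = ["004", "004", "53", "53"]

model = MultinomialNB().fit(docs, labels)
print(model.predict("text classification with a neural network"))  # → 004
```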

Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning

Berlin: Association for Computational Linguistics, 2016.

The 2016 Conference on Computational Natural Language Learning is the twentieth in the series of annual meetings organized by SIGNLL, the ACL special interest group on natural language learning. CoNLL 2016 will be held on August 11-12, 2016, and is co-located with the 54th annual meeting of the Association for Computational Linguistics (ACL) in Berlin, ...

Added: November 12, 2016

Analysis of Images, Social Networks and Texts. 5th International Conference, AIST 2016, Yekaterinburg, Russia, April 7-9, 2016, Revised Selected Papers. Communications in Computer and Information Science

Switzerland: Springer, 2017.

This book constitutes the proceedings of the 5th International Conference on Analysis of Images, Social Networks and Texts, AIST 2016, held in Yekaterinburg, Russia, in April 2016. The 23 full papers, 7 short papers, and 3 industrial papers were carefully reviewed and selected from 142 submissions. The papers are organized in topical sections on machine ...

Added: October 19, 2016

Modern Problems and Trends in Computational Linguistics

Toldova S., Lyashevskaya O., Voprosy Jazykoznanija 2014 No. 1 P. 120–145

This paper is an overview of current issues and trends in computational linguistics, based on the materials of the COLING 2012 conference on computational linguistics. Modern approaches to traditional NLP domains such as POS tagging, syntactic parsing, and machine translation are discussed. The highlights of automated information extraction, such as fact extraction, ...

Added: October 15, 2013

9th Russian Summer School in Information Retrieval (RuSSIR 2015)

Braslavski P., Markov I., Pardalos P. M. et al., ACM SIGIR Forum 2016 Vol. 49 No. 2 P. 72–79

This paper reports on the 9th Russian Summer School in Information Retrieval (RuSSIR 2015). ...

Added: February 27, 2017

Machine learning approach for scientific and technical expertise

Belov A. V., Egorova E. A., Bulletin of D. Serikbayev East Kazakhstan Technical University 2023 No. 4 P. 92–102

A scientific and technical expert review requires analyzing the texts of research reports. The analysis determines whether the work being conducted belongs to the class of research and development work in the field of IT. This article discusses the tasks of binary ...

Added: March 9, 2024

Breaking Sticks and Ambiguities with Adaptive Skip-gram

Bartunov S., Kondrashkin D. A., Osokin A. et al., Series arXiv:1502.07257 "Computation and Language". 2015.

The recently proposed Skip-gram model is a powerful method for learning high-dimensional word representations that capture rich semantic relationships between words. However, Skip-gram, like most prior work on learning word representations, does not take word ambiguity into account and maintains only a single representation per word. Although a number of Skip-gram modifications were proposed to ...

Added: November 5, 2015
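The Skip-gram model described above learns word vectors by predicting the words that surround each word within a sliding window. The sketch below shows only that data-preparation step: extracting (center, context) training pairs. The window size and the function name are illustrative assumptions. The multi-prototype extension proposed in the paper (Adaptive Skip-gram) is not shown.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every word within `window` positions of the center."""
    pairs = []
    for i, center in enumerate(tokens):
        # Clip the window at the sentence boundaries.
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


sentence = "he sat on the river bank".split()
for center, context in skipgram_pairs(sentence, window=1):
    print(center, "->", context)
```

A single-prototype model such as Skip-gram folds every pair containing an ambiguous word like "bank" into one vector, whichever sense is meant. That is the limitation the abstract points to.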

Applying Machine Learning Methods to the Problem of Automatic UDC Classification of Articles

Romanov A., Lomotin K. E., Kozlova E. S., Information Technologies 2017 Vol. 23 No. 6 P. 418–423

The paper deals with the applicability of modern machine learning methods to the problem of automatically generating UDC codes for scientific articles. The classifiers considered include artificial neural networks, logistic regression, and boosting. Graph algorithms and a prototype software module for generating UDC codes are designed. ...

Added: July 30, 2017
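Logistic regression is one of the classifiers named in the abstract above. For more than two UDC classes it generalizes to multinomial (softmax) logistic regression. The sketch below trains such a model on binary bag-of-words features with plain stochastic gradient descent. The learning rate, the number of epochs, the toy texts, and the UDC-style labels are all illustrative assumptions. The intercept term is omitted for brevity.

```python
import math


def featurize(doc, vocab):
    # Binary bag-of-words features (no intercept term, to keep the sketch short).
    words = set(doc.lower().split())
    return [1.0 if w in words else 0.0 for w in vocab]


def softmax(scores):
    # Subtract the maximum score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_softmax_regression(docs, labels, epochs=200, lr=0.5):
    vocab = sorted({w for d in docs for w in d.lower().split()})
    classes = sorted(set(labels))
    X = [featurize(d, vocab) for d in docs]
    W = [[0.0] * len(vocab) for _ in classes]  # one weight row per class
    for _ in range(epochs):
        for x, y in zip(X, labels):
            probs = softmax([dot(row, x) for row in W])
            for k, c in enumerate(classes):
                # Cross-entropy gradient for row k: (p_k - 1[y == c]) * x
                err = probs[k] - (1.0 if y == c else 0.0)
                W[k] = [wi - lr * err * xi for wi, xi in zip(W[k], x)]
    return vocab, classes, W


def predict_udc(doc, vocab, classes, W):
    x = featurize(doc, vocab)
    scores = [dot(row, x) for row in W]
    return classes[scores.index(max(scores))]


# Hypothetical training texts with UDC-style labels chosen for illustration.
docs = [
    "neural network text classification",
    "language model for text",
    "laser optics physics",
    "thermal physics of gases",
    "algebra and number theory",
    "probability theory and statistics",
]
labels = ["004", "004", "53", "53", "51", "51"]

vocab, classes, W = train_softmax_regression(docs, labels)
print(predict_udc("number theory", vocab, classes, W))  # → 51
```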

Forecasting the Thermal Modes of Spacecraft Equipment

Istratov A., Khomenko I. I., Pogodin A. V. et al., Vestnik NPO im. S.A. Lavochkina 2017 No. 4 P. 68–75

An approach is considered for forecasting the temperature readings of spacecraft equipment during the execution of a science program in order to prevent overheating. Algorithms are proposed for processing the data accumulated during spacecraft operation to determine the temperature values of components at specified points in time. An implementation of the software system is presented. The experiments conducted confirmed that anomalous thermal operating modes of the spacecraft can be detected. ...

Added: January 12, 2018

Intelligent Data Processing. 11th International Conference, IDP 2016, Barcelona, Spain, October 10–14, 2016, Revised Selected Papers

Switzerland: Springer, 2019.

This book constitutes the refereed proceedings of the 11th International Conference on Intelligent Data Processing, IDP 2016, held in Barcelona, Spain, in October 2016. The 11 revised full papers were carefully reviewed and selected from 52 submissions. The papers of this volume are organized in topical sections on machine learning theory with applications; intelligent data processing in life ...

Added: February 8, 2020

Proceedings of Analysis of Images, Social Networks and Texts – 7th International Conference, AIST 2018, Moscow, Russia, July 5-7, 2018, Revised Selected Papers. Lecture Notes in Computer Science

Berlin: Springer, 2018.

This book constitutes the proceedings of the 7th International Conference on Analysis of Images, Social Networks and Texts, AIST 2018, held in Moscow, Russia, in July 2018. The 29 full papers were carefully reviewed and selected from 107 submissions (of which 26 papers were rejected without being reviewed). The papers are organized in topical sections on ...

Added: September 5, 2018

Multiple features for clinical relation extraction: A machine learning approach

Alimova I., Tutubalina E., Journal of Biomedical Informatics 2020 Vol. 103 P. 1–9

Relation extraction aims to discover relational facts about entity mentions from plain texts. In this work, we focus on clinical relation extraction; namely, given a medical record with mentions of drugs and their attributes, we identify relations between these entities. We propose a machine learning model with a novel set of knowledge-based and BioSentVec embedding ...

Added: October 28, 2020

Analysis of Images, Social Networks and Texts. Third International Conference, AIST 2014, Yekaterinburg, Russia, April 10-12, 2014, Revised Selected Papers

Berlin: Springer, 2014.

This book constitutes the proceedings of the Third International Conference on Analysis of Images, Social Networks and Texts, AIST 2014, held in Yekaterinburg, Russia, in April 2014. The 11 full and 10 short papers were carefully reviewed and selected from 74 submissions. They are presented together with 3 short industrial papers, 4 invited papers and ...

Added: November 13, 2014

Pulse of the Nation: Observable Subjective Well-Being in Russia Inferred from Social Network Odnoklassniki

Smetanin S., Mathematics 2022 Vol. 10 No. 16 Article 2947

Policymakers and researchers worldwide are interested in measuring the subjective well-being (SWB) of populations. In recent years, new approaches to measuring SWB have begun to appear, using digital traces as the main source of information, and show potential to overcome the shortcomings of traditional survey-based methods. In this paper, we propose the formal model for ...

Added: August 15, 2022

Style transfer in NLP: a framework and multilingual analysis with Friends TV series

Tikhonova M., Telesheva E., Mirzoev S. et al., in: 2021 International Conference Engineering and Telecommunication (En&T). IEEE, 2022. P. 1–6.

Style transfer is an important and rapidly developing area of natural language processing. More and more methods and models are being proposed that make it possible to generate text in a predefined style. In this paper we propose a framework for style transfer based on the “Friends” TV series. The trained models are able to mimic one of ...

Added: May 21, 2022

Recent Trends in Analysis of Images, Social Networks and Texts. 9th International Conference, AIST 2020, Skolkovo, Moscow, Russia, October 15–16, 2020 Revised Supplementary Proceedings

Springer, 2021.

This book constitutes revised selected papers from the 9th International Conference on Analysis of Images, Social Networks and Texts, AIST 2020, held during October 15-16, 2020. The conference was planned to take place in Moscow, Russia, but changed to an online format due to the COVID-19 pandemic. The 27 full papers and 4 short papers presented ...

Added: October 7, 2020

Texterra: Infrastructure for Text Analysis

Turdakov D., Astrakhantsev N. A., Nedumov Ya. R. et al., Proceedings of the Institute for System Programming of the RAS 2014 Vol. 26 P. 421–438

The paper presents a framework for fast text analytics developed during the Texterra project. Texterra is a technology for multilingual text mining based on novel text processing methods that exploit knowledge extracted from user-generated content. It delivers a fast, scalable solution for text mining without expensive customization. Depending on the use case, Texterra could be utilized ...

Added: November 6, 2017

Texterra: A framework for text analysis.

Kuznetsov S. D., Turdakov D. Yu., Astrakhantsev N. A. et al., Programming and Computer Software 2014 Vol. 40 No. 5 P. 288–295

A framework for fast text analysis, developed as part of the Texterra project, is described. Texterra provides a scalable solution for fast text processing based on novel methods that exploit knowledge extracted from the Web and text documents. For the developed tools, details of the project, use cases, and ...

Added: November 26, 2017

The Applications of Sentiment Analysis for Russian Language Texts: Current Challenges and Future Perspectives

Smetanin S., IEEE Access 2020 Vol. 8 P. 110693–110719

Sentiment analysis has become a powerful tool for processing and analysing opinions expressed on a large scale. While the application of sentiment analysis to English-language content has been widely examined, its application to the Russian language remains less well studied. In this survey, we comprehensively reviewed the applications of sentiment analysis of Russian-language content and ...

Added: June 24, 2020

Supplementary Proceedings of the 3rd International Conference on Analysis of Images, Social Networks and Texts (AIST 2014)

Ekaterinburg: CEUR Workshop Proceedings, 2014.

AIST'2014 is an international data science conference on Analysis of Images, Social Networks, and Texts. Traditionally, the conference is held annually in Yekaterinburg, Russia. The conference is intended for computer scientists and practitioners whose research interests involve Internet mathematics and other related fields of data science. List of topics (non-exhaustive): Applications of Data Mining and Machine ...

Added: August 28, 2014

Artificial Intelligence and Natural Language. AINL 2020. Communications in Computer and Information Science

Springer, 2020.

Added: September 8, 2020