Abstract:
Numerous public databases now collect and disseminate biological activity data from literature and patents, forming the basis for chemogenomics and novel scoring functions. However, data quality is often compromised because the same values are cited multiple times across studies with varying protocols. To address this issue, we combined an XGBoost model with a BERT-based NLP approach and a distance-based out-of-distribution (OOD) detection method to improve classification accuracy and exclude review articles.
Citation:
Ya. V. Timofeev, A. M. Mrasov, M. V. Panova, F. N. Novikov, I. V. Svitanko, “How to stop worrying and love multiple citation experimental data”, Mendeleev Commun., 35:2 (2025), 224–227