Seminars 2024

Seminar of the Laboratory of Artificial Intelligence for Computational Biology

Date: 24 May 2024

Topic: "Identification of Nanoplastics by Mass Spectrometry"

Speaker: Burmak Karina Sergeevna, research intern at the Laboratory of Artificial Intelligence for Computational Biology.

Abstract:
Micro- and nanoplastics are currently a cause for concern in the scientific community. They can accumulate in human tissues and organs and may also cause serious diseases. Increasing attention is therefore being paid to methods for detecting nanoplastics in human biological fluids. One such method is mass spectrometry, which makes it possible to determine the quantitative and qualitative composition of a sample accurately, even at low concentrations. However, at present there are no computational algorithms or methods for identifying and annotating nanoplastics. This seminar presents a Python library designed and implemented for the identification of polystyrene sulfonate nanoplastics and perfluoroalkyl compounds by mass spectrometry.

Topic: "Deep Learning Methods for Predicting Peptide Retention Time"

Speaker: Shinkarev Elisey Sergeevich, research intern at the Laboratory of Artificial Intelligence for Computational Biology.

Abstract:

This research analyzes the application of deep learning methods to predicting peptide retention time and examines two existing models, AutoRT and DeepLC. These models play a key role in improving the accuracy and reliability of peptide identification. They rely on deep learning techniques and on training with large datasets of annotated peptide spectra, which allows them to capture complex patterns and features in the spectra and thus to predict the quantity of interest, retention time, more accurately. As part of the work, experiments will be run on our dataset: the outputs of the above models on peptides and on their modifications will be examined, followed by a comparison of the resulting distributions and an analysis of the results. In other words, the study asks how the predictions for a peptide and for its modified form differ depending on the model used.

A recording of the seminar is available via the link.

Seminar of the Laboratory of AI for Computational Biology

Date: 5 June 2024

Speaker: Saradva Khushbu Narottambhai, research intern at the Laboratory of Artificial Intelligence for Computational Biology

Title: "Benchmarking the Thermo Fisher Orbitrap Astral Mass Spectrometer"

Abstract: The complexity of human physiology arises from well-orchestrated interactions between trillions of single cells in the body. Mass spectrometry (MS)-based proteomics has emerged as a powerful tool for comprehensive protein analysis, including single-cell applications. The use of mass spectrometers in the analysis of biological samples has become ubiquitous, making them indispensable in proteomics research. A keen curiosity to unravel the intricacies of the proteome has been a driving force behind the development of innovative technologies that continually expand the capabilities of mass spectrometry. As a result, mass spectrometry can now address an ever-expanding array of biological questions, catalysing advances in our understanding of complex biological systems. However, challenges remain in throughput and proteomic depth, and these must be met to maximize the biological impact of single-cell proteomics by mass spectrometry (SCPMS) workflows. This study leverages a novel high-resolution accurate-mass (HRAM) instrument platform that combines an Orbitrap analyzer with an innovative HRAM Asymmetric Track Lossless (Astral) analyzer. The Astral analyzer offers high sensitivity and resolution through lossless ion transfer and a unique flight track design. The ultimate goal is to evaluate the performance of the Thermo Scientific Orbitrap Astral MS using data-independent acquisition (DIA) and to assess proteome depth and quantitative precision for ultra-low-input samples. The Orbitrap Astral MS has the potential to revolutionize protein discovery and precision medicine by enabling large-scale studies and faster insights.
Because the instrument can handle high sample volumes, researchers can uncover new insights across a wide range of disciplines, ultimately advancing our understanding of complex diseases and paving the way for precision medicine solutions. Novel proteomics techniques have also been created that take advantage of each HRAM analyzer's strengths. For example, the Orbitrap analyzer can execute full scans with high dynamic range and resolution, while the Astral analyzer acquires fast and sensitive HRAM MS/MS scans in synchrony.

Speaker: Sufiyan Muhammad, research intern at the Laboratory of Artificial Intelligence for Computational Biology

Title: Analysis of Semi Correct Annotations for False Discovery Rate Control in Tandem Mass Spectrometry Data

Abstract: With an emphasis on peptide identification and false discovery rate (FDR) control, this thesis investigates the development and applications of mass spectrometry (MS) in proteomics. We highlight the significance of MS/MS techniques and MS technology for proteomics by examining their development over time. The potential of computational methods, such as artificial intelligence (AI) and machine learning, to improve spectrum matching and MS/MS data analysis is investigated. Our methodology involves the use of AI models, careful experimental settings, and data processing techniques to improve the accuracy of peptide identification and FDR estimation. The findings show that, compared with the peptide-reverse and peptide-shuffle approaches, the de Bruijn decoy generation method consistently produces more peptide-spectrum matches (PSMs) across various FDR levels. According to ROC analysis, the de Bruijn method provides a more accurate representation of incorrect target PSMs, particularly at strict FDR levels. Incorporating AI models into data processing makes a dependable approach to real-time FDR calculation possible, which results in significant improvements in identification accuracy and efficiency. The implications of these findings for proteomics are discussed, with an emphasis on how they could improve biomarker identification and disease diagnosis. In addition, the study's limitations and potential future research directions are discussed, providing recommendations for advancing MS-based proteomics. By offering a thorough assessment of MS/MS methods and proposing novel computational techniques to improve peptide identification and FDR control, this study advances the field.
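For readers unfamiliar with target-decoy FDR control, the general idea behind the reverse and shuffle decoy strategies mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the speaker's implementation: the function names are hypothetical, keeping the C-terminal residue fixed is an assumed (though common) convention, the FDR estimate is the simple decoy-to-target ratio, and the de Bruijn decoy method is not shown.

```python
import random


def reverse_decoy(peptide: str) -> str:
    """Peptide-reverse decoy: reverse all residues except the C-terminal one.

    Keeping the last residue in place is an assumption made for this sketch.
    """
    if len(peptide) < 2:
        return peptide
    return peptide[:-1][::-1] + peptide[-1]


def shuffle_decoy(peptide: str, seed: int = 0) -> str:
    """Peptide-shuffle decoy: randomly permute all residues except the last."""
    if len(peptide) < 2:
        return peptide
    rng = random.Random(seed)  # seeded so the decoy is reproducible
    prefix = list(peptide[:-1])
    rng.shuffle(prefix)
    return "".join(prefix) + peptide[-1]


def estimate_fdr(psms, threshold: float) -> float:
    """Estimate FDR among PSMs scoring at or above `threshold`.

    `psms` is an iterable of (score, is_decoy) pairs. The estimate is
    (#decoy PSMs) / (#target PSMs) above the threshold, or 0.0 when no
    target PSM passes.
    """
    targets = 0
    decoys = 0
    for score, is_decoy in psms:
        if score >= threshold:
            if is_decoy:
                decoys += 1
            else:
                targets += 1
    return decoys / targets if targets else 0.0


if __name__ == "__main__":
    print(reverse_decoy("PEPTIDEK"))
    print(shuffle_decoy("PEPTIDEK", seed=1))
    example = [(0.9, False), (0.8, False), (0.7, True), (0.6, False), (0.5, True)]
    print(estimate_fdr(example, threshold=0.6))
```

Lowering the score threshold admits more target PSMs but also more decoys, which is why decoy strategies are typically compared by the number of PSMs accepted at a fixed FDR level.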

A recording of the seminar is available via the link.

Seminar of the Laboratory of AI for Computational Biology

Date: 6 June 2024

Speaker: Poderni Afina Yurievna

Title: "Deep Neural Networks for Peptide Identification in Data Independent Acquisition Mass Spectrometry"

Abstract: Mass spectrometry is an analytical technique for the quantification and structural determination of molecules, widely used in fields such as medicine, cosmetology, and marine sciences. It was developed over 100 years ago but continues to evolve alongside computer technologies. Mass spectrometers produce a huge amount of information that needs to be analyzed and stored, and researchers have proposed various approaches to data acquisition and preprocessing. A notable advance in this domain is data-independent acquisition (DIA) mass spectrometry, which minimizes data loss and looks promising for improving the quality of peptide identification. It does increase the size and complexity of the data, but as technology progresses this problem is becoming less significant, and researchers' interest in DIA mass spectrometry is growing. The use of machine learning algorithms and deep neural networks in computational biology offers a promising direction for MS data analysis. Mass spectrometry data is prone to sudden appearances and disappearances of spectral ions, but at the same time, due to its chemical nature, it has many connected components, so neural networks could generalize and find hidden patterns in the data. The goal of this work is to develop a convolutional neural network for preprocessing DIA spectra that helps to improve the quality of peptide identification. For this purpose, basic methods and tools for MS analysis were studied and implemented in the pipeline along with the model training. This thesis presents the results of experiments on training the model with different parameters on a DIA dataset consisting of mass spectrometry experiments with Saccharomyces cerevisiae (baker's yeast), which helped to improve the quality of peptide identification for low-resolution data.

Speaker: Sulimov Daniil Andreevich

Title: PEFT (Parameter-Efficient Fine-Tuning) for GPT-like Deep Models to Reduce Hallucinations and to Improve Reproducibility in Scientific Text Generation Using Stochastic Optimization Techniques

Abstract: Large Language Models (LLMs) have demonstrated impressive performance in a variety of language-related tasks, including text generation, machine translation, and text summarization. Sometimes, however, the result produced by an LLM turns out to be inaccurate. This thesis aims to fine-tune an existing LLM, GPT-2 by OpenAI, to reduce the model's hallucinations and increase the reproducibility of its answers in the mass spectrometry domain. The research drew on data engineering, stochastic modelling, data science, and statistics. I used two servers for all experiments: the cHARISMa Higher School of Economics (HSE) server for fine-tuning and the AI for Computational biology (AIC) server, where I ran the Docker images necessary for data preprocessing. Our fine-tuned model was named MassSpecGPT (MS-GPT). The thesis includes a novel approach to computing a reproducibility score and uses the Wilcoxon rank-sum test to compare the fine-tuned MS-GPT model against the base GPT-2 by OpenAI in terms of reproducibility. The selection of optimal parameters (optimizer, learning rate) was based on several factors: validation error, run time, random-access memory (RAM) usage, and electricity usage. The model was fine-tuned with Low-Rank Adaptation of Large Language Models (LoRA) adapters, currently a state-of-the-art (SOTA) method. I used common Natural Language Generation (NLG) evaluation metrics to compare the models' accuracy: Bilingual Evaluation Understudy (BLEU), Recall-Oriented Understudy for Gisting Evaluation (ROUGE), and Perplexity. As a result of the research, the BLEU score increased from 0.33 to 0.34, ROUGE-1 from 0.42 to 0.44, and ROUGE-L from 0.57 to 0.62; Perplexity fell from 13586.37 to 10092.12, and the reproducibility score rose from 0.83 to 0.84. The changes in Perplexity and in the reproducibility score were statistically significant at the 5% significance level.
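As background for the Perplexity figures quoted in this abstract, perplexity is conventionally defined as the exponential of the average negative log-likelihood per token, so lower values mean the model finds the text less surprising. The sketch below is a generic illustration of that definition and not the thesis code; the function name and the assumption that per-token natural-log probabilities are already available are both hypothetical.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities.

    Computes exp(-(1/N) * sum(log p_i)) over the N tokens of a text.
    """
    if not token_log_probs:
        raise ValueError("at least one token log-probability is required")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


if __name__ == "__main__":
    # A model that assigns probability 1/4 to every token of a 10-token text.
    uniform = [math.log(0.25)] * 10
    print(perplexity(uniform))
```

A model that spreads probability uniformly over k choices at each step has a perplexity of k, which gives an intuitive scale for reading reported values.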

A recording of the seminar is available via the link.

