Quantitative Biology > Genomics

arXiv:2311.03429 (q-bio)

[Submitted on 6 Nov 2023 (v1), last revised 8 Nov 2023 (this version, v2)]

Title: ProPath: Disease-Specific Protein Language Model for Variant Pathogenicity

Abstract: Classifying genetic variants as pathogenic or benign remains a pivotal challenge in clinical genetics. Recently, protein language models have improved generic variant effect prediction (VEP) accuracy through weakly supervised or unsupervised training. However, these VEPs are not disease-specific, which limits their adaptation at the point of care. To address this problem, we propose a disease-specific PROtein language model for variant PATHogenicity, termed ProPath, which captures the pseudo-log-likelihood ratio in rare missense variants through a siamese network. We evaluate ProPath against pre-trained language models using clinical variant sets in inherited cardiomyopathies and arrhythmias that were not seen during training. ProPath surpasses the pre-trained ESM1b with an improvement of over 5% in AUC on both datasets, and it outperforms all baselines on both datasets. ProPath thus offers a potent disease-specific variant effect predictor, particularly valuable for disease associations and clinical applicability.
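The log-likelihood ratio idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes a language model has already produced a probability for each amino acid at the variant position; the probability table, the function name `log_likelihood_ratio`, and the example substitutions are all hypothetical, and this is a common zero-shot scoring scheme rather than the paper's exact formulation or its siamese architecture.

```python
import math


def log_likelihood_ratio(position_probs, wild_type, mutant):
    """Return log p(mutant) - log p(wild_type) at a single sequence position.

    A strongly negative value means the model finds the mutant residue far
    less plausible than the wild-type residue at this position.
    """
    return math.log(position_probs[mutant]) - math.log(position_probs[wild_type])


# Hypothetical model output (toy values, not from the paper): predicted
# probabilities for three amino acids at one masked position.
toy_probs = {"A": 0.70, "V": 0.20, "P": 0.10}

# Score two hypothetical missense substitutions away from wild-type alanine.
score_a_to_v = log_likelihood_ratio(toy_probs, wild_type="A", mutant="V")
score_a_to_p = log_likelihood_ratio(toy_probs, wild_type="A", mutant="P")

print(f"A->V score: {score_a_to_v:.3f}")
print(f"A->P score: {score_a_to_p:.3f}")
```

In this toy setting both substitutions receive negative scores, and the less probable residue receives the lower one; a threshold or a downstream classifier would be needed to turn such scores into pathogenic versus benign calls.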

Comments:	Accepted by MLCB 2023
Subjects:	Genomics (q-bio.GN); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2311.03429 [q-bio.GN]
	(or arXiv:2311.03429v2 [q-bio.GN] for this version)
	https://doi.org/10.48550/arXiv.2311.03429

Submission history

From: Huixin Zhan
[v1] Mon, 6 Nov 2023 18:43:47 UTC (1,508 KB)
[v2] Wed, 8 Nov 2023 04:35:37 UTC (1,508 KB)
