DOI: 10.1145/3156346.3156354
research-article

Position-Residue Specific Dynamic Gap Penalty Scoring Strategy for Multiple Sequence Alignment

Published: 07 December 2017

Abstract

Multiple Sequence Alignment (MSA) is a basic tool for biological sequence analysis and a crucial step used by biologists to analyze phylogeny, gene regulation, and homology markers, to support drug discovery, and to predict protein structure and function. Effective alignment of multiple biologically relevant sequences is still an open problem. The accuracy of MSA depends heavily on the scoring function, which aligns a given residue to its appropriate position during alignment. The scoring function has three possible cases when scoring a pair of aligned symbols: i) a residue with the same residue, ii) a residue with a different residue, and iii) a residue with a gap. A number of biologically meaningful approaches have been developed for the first two cases. For the third case, however, most approaches use a default gap penalty score, which is provided as an input by an expert. In this study, we propose a new, biologically relevant, position-residue specific dynamic scoring approach for the gap penalty. The Position-Residue Specific Dynamic Gap Penalty (PRSDGP) scoring function is tested on the BAliBASE benchmark dataset. The proposed PRSDGP scoring approach is compared with the CLUSTAL O program, and the Quality metric improvement ranges from 46.2% to 81.5%.
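
To make the three scoring cases concrete, the following is a minimal sketch of a conventional sum-of-pairs score that uses a single fixed gap penalty, which is the baseline the abstract contrasts with. It is not the PRSDGP function, whose formula is not given on this page. The names `pair_score` and `sum_of_pairs`, the toy match and mismatch values, and the choice to score gap-gap pairs as zero are all assumptions made for illustration; a real aligner would typically use a substitution matrix for cases i and ii.

```python
from itertools import combinations

GAP = "-"


def pair_score(a, b, match=2, mismatch=-1, gap_penalty=-3):
    """Score one aligned pair of symbols (illustrative values only)."""
    if a == GAP and b == GAP:
        return 0  # assumption: gap-gap pairs contribute nothing
    if a == GAP or b == GAP:
        return gap_penalty  # case iii: residue aligned with a gap
    if a == b:
        return match  # case i: residue aligned with the same residue
    return mismatch  # case ii: residue aligned with a different residue


def sum_of_pairs(alignment, **score_params):
    """Sum pair_score over every column and every pair of rows."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            total += pair_score(a, b, **score_params)
    return total


if __name__ == "__main__":
    toy_alignment = ["AC-T", "ACGT", "A-GT"]
    # The same alignment under two different expert-chosen gap penalties.
    print(sum_of_pairs(toy_alignment))
    print(sum_of_pairs(toy_alignment, gap_penalty=-1))
```

In this sketch the gap penalty is one constant for every column and residue, so the alignment score shifts with whatever value the user supplies. A position-residue specific scheme would instead compute the penalty from the column and the residue involved.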

Cited By

  • (2020) ProgSIO-MSA: Progressive-based single iterative optimization framework for multiple sequence alignment using an effective scoring system. Journal of Bioinformatics and Computational Biology 18:02 (2050005). DOI: 10.1142/S0219720020500055. Online publication date: 6-May-2020

Published In

CSBio '17: Proceedings of the 8th International Conference on Computational Systems-Biology and Bioinformatics
December 2017
83 pages
ISBN:9781450353502
DOI:10.1145/3156346
© 2017 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

In-Cooperation

  • SOICT: School of Information and Communication Technology - HUST
  • NAFOSTED: The National Foundation for Science and Technology Development
  • KMUTT: King Mongkut's University of Technology Thonburi

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Multiple sequence alignment
  2. gap penalty
  3. scoring function

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

CSBio '17

Acceptance Rates

Overall Acceptance Rate 23 of 37 submissions, 62%
