CN105219765A - Method and apparatus for constructing genomes using protein sequences - Google Patents
Method and apparatus for constructing genomes using protein sequences
- Publication number
- CN105219765A (application CN201510755855.XA)
- Authority
- CN
- China
- Prior art keywords
- sequence
- genome
- protein sequence
- protein
- genomic fragment
- Legal status
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention provides a method and apparatus for constructing genomes using protein sequences. Specifically, it provides a method for splicing (joining) genome sequences based on protein sequences, comprising the steps of screening fragmented protein sequences, ordering and screening the alignment regions on each protein sequence, screening genome sequence splices based on maximum linking evidence, and forming new genome sequences. The invention rests on the statistical principle that the splice supported by the most linking evidence is the most reliable one, and it can splice genomes using protein sequences that include homologous proteins. The protein-based genome assembly method of the invention can make use of published protein sequences to improve genome completeness. The protein sequences may belong to the same species as the genome sequences or to closely related species, and may come from public databases or be generated experimentally by the user. The invention also provides an apparatus implementing the method.
Description
Technical field
The present invention relates to genetics and bioinformatics, and in particular to a method and apparatus for constructing genomes.
Background technology
Whole-genome assembly of a species currently relies on the shotgun strategy. After several libraries with different insert sizes are constructed, the genome is first assembled from the short-insert library, and libraries with progressively longer inserts are then used to extend the assembly, so that the genome sequences grow step by step. However, genomes built with the shotgun strategy cannot completely cover all genes.
Protein biosynthesis comprises two biological processes: transcription and translation. First, transcription uses the genome as a template, and removal of introns yields mature messenger RNA (mRNA). Then, in translation, the base order (nucleotide sequence) of the mature mRNA is decoded according to the genetic code and the central dogma, producing the corresponding protein sequence. If the genome is incomplete, one protein sequence will be split into two or more alignment regions scattered across several genome sequences. Using these regions and their positions on the protein sequence, the corresponding genome sequences can be joined back together in series to form longer genome sequences, so that the previously scattered protein sequence becomes fully covered. A method and apparatus for assembling genome sequences from protein data is therefore feasible, and developing one can improve genome completeness.
For clarity, the technical terms used in this specification are first defined as follows.
Alignment region: in this specification, a region in which a protein sequence and a genome sequence are similar or identical. Because genome assembly is still incomplete, one protein sequence may be split into multiple alignment regions located on multiple genome sequences.
Length of a protein sequence: the total number of amino acids in the protein.
Length of an alignment region: the number of protein amino acids within the alignment region.
Relative position of an alignment region: the position of the alignment region relative to the whole protein sequence.
Absolute position of an alignment region: the position of the alignment region on the genome sequence.
Interval between alignment regions: for two consecutive alignment regions i and j located on genome sequences A and B respectively, their distance, which equals (length of genome sequence A − position of region i on A + position of region j on B).
Sequence coverage: the length of an alignment region divided by the length of the protein sequence.
Genome sequence splicing: joining two or more genome sequences in order according to the positions of their alignment regions on the protein sequence.
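The two quantitative definitions above, sequence coverage and the interval between alignment regions, can be illustrated with a minimal Python sketch. It assumes positions are given in base pairs on the forward strand and ignores orientation, which the text does not specify; the function names and example numbers are hypothetical.

```python
def sequence_coverage(region_len_aa: int, protein_len_aa: int) -> float:
    """Fraction of the whole protein covered by one alignment region."""
    return region_len_aa / protein_len_aa


def region_interval(len_a: int, pos_i_on_a: int, pos_j_on_b: int) -> int:
    """Interval between consecutive alignment regions i (on genome sequence A)
    and j (on genome sequence B), assuming the end of A is joined to the start
    of B: length of A - position of i on A + position of j on B."""
    return len_a - pos_i_on_a + pos_j_on_b


if __name__ == "__main__":
    # Hypothetical: a 120-aa alignment region of a 400-aa protein.
    print(sequence_coverage(120, 400))  # 0.3
    # Region i lies 10 kb before the end of a 150 kb sequence A and
    # region j lies 5 kb into sequence B, so the implied gap is 15 kb.
    print(region_interval(150_000, 140_000, 5_000))  # 15000
```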
Summary of the invention
The present invention aims to solve at least one of the technical problems in the prior art. To this end, it proposes a method and apparatus for splicing genomes based on protein sequences, built on the statistical principle that the splice supported by the most linking evidence is the most reliable one.
According to one aspect of the present invention, a method for splicing genomes based on protein sequences is provided, comprising the following steps:
(1) Screening fragmented protein sequences
Align protein sequences to genome sequences to obtain the relative position of each alignment region on its protein sequence and its absolute position on the genome sequence;
Remove protein sequences with excessively high sequence coverage and protein sequences that align to only one genome sequence, so that each remaining protein sequence aligns to multiple genome sequences and none of its alignment regions completely covers the whole protein, thereby obtaining the fragmented protein sequences;
(2) Ordering and screening alignment regions on the protein sequence
According to the relative positions, on the protein sequence, of the alignment regions of each fragmented protein sequence, arrange the genome sequences corresponding to those alignment regions in ascending order;
Calculate the interval between each pair of consecutive alignment regions, and retain the alignment-region splices whose interval is less than 200 kb together with the corresponding genome sequence splices;
Take the fragmented protein sequences corresponding to each genome sequence splice as the linking evidence for that genome sequence splice;
(3) Screening genome sequence splices based on maximum linking evidence
In a genome sequence splice, the sequence that is followed by a new genomic fragment is the origin sequence, and the sequence that is preceded by a new genomic fragment is the terminal sequence;
A genome sequence that is followed by a new genomic fragment but not preceded by one is a start node; a genome sequence that is preceded by a new genomic fragment but not followed by one is an end node; a genome sequence that is both preceded and followed by new genomic fragments is an intermediate node;
For each origin sequence and each terminal sequence, retain the splice with the most linking evidence;
(4) Forming new genome sequences
Among the splices finally retained in step (3), take each genomic fragment that can serve only as a start node as a starting point, select its successor intermediate node, then select the next intermediate node for that intermediate node, and so on until an end node is reached;
Join the genomic fragments in series, in the order in which the genome sequences are connected, to form longer genomic fragments.
Specifically, for example, protein sequences (say a ... z) are aligned to the genome, yielding the relative positions of the alignment regions on the protein sequences and their absolute positions on the genome sequences (say A ... Z). Under the screening principle of step (1), the proteins that pass the filter are fragmented protein sequences: each aligns to multiple genome sequences, and none of its alignment regions completely covers the whole protein. The proteins remaining after this screening can serve as linking evidence for the subsequent genome sequence splicing.
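The screening of step (1) can be sketched as below. Each alignment is assumed to be a tuple of (protein ID, genome sequence ID, alignment-region length in amino acids, protein length in amino acids); this record layout, the function name and the toy data are hypothetical. Following Embodiment 1, a region counts as covering the protein when its coverage reaches the 90% threshold.

```python
from collections import defaultdict


def fragmented_proteins(alignments, max_coverage=0.9):
    """Return the IDs of proteins that align to at least two genome sequences
    and whose every alignment region covers less than max_coverage of the protein."""
    regions_by_protein = defaultdict(list)
    for protein, genome_seq, region_len, protein_len in alignments:
        regions_by_protein[protein].append((genome_seq, region_len / protein_len))

    kept = set()
    for protein, regions in regions_by_protein.items():
        # One region already covers (almost) the whole protein: not fragmented.
        if any(cov >= max_coverage for _, cov in regions):
            continue
        # All regions lie on a single genome sequence: no linking evidence.
        if len({genome_seq for genome_seq, _ in regions}) < 2:
            continue
        kept.add(protein)
    return kept


alignments = [
    ("a", "A", 100, 400), ("a", "B", 150, 400),  # split across A and B: kept
    ("b", "C", 380, 400),                        # 95% coverage: removed
    ("c", "D", 100, 400), ("c", "D", 120, 400),  # single genome sequence: removed
]
print(fragmented_proteins(alignments))  # {'a'}
```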
Next, for each retained protein and its alignment regions, arrange the corresponding genome sequences in ascending order of the alignment regions' relative positions on the protein sequence. For example, suppose protein a is a retained fragmented sequence with alignment regions 1, 2, 3 and 4, located on genome sequences A, B, C and D respectively. If, in ascending order of relative position on the protein, the regions are 4, 2, 1 and 3, then the corresponding genome sequence order is D, B, A and C.
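The ordering in this example amounts to a sort by relative position. The numeric positions below are hypothetical values chosen only so that the ascending order of the regions is 4, 2, 1, 3, as in the text; the function name is also hypothetical.

```python
def genome_order(regions):
    """regions: (region ID, relative position on the protein, genome sequence).
    Return the genome sequences sorted by the regions' positions on the protein."""
    return [genome_seq for _, _, genome_seq in sorted(regions, key=lambda r: r[1])]


# Protein a: regions 1-4 lie on genome sequences A-D (positions are hypothetical).
regions_of_a = [(1, 210, "A"), (2, 95, "B"), (3, 330, "C"), (4, 12, "D")]
print(genome_order(regions_of_a))  # ['D', 'B', 'A', 'C']
```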
Because a protein sequence may be misaligned to a genome sequence, the interval between two consecutive alignment regions can become excessively large. If the interval between two consecutive alignment regions is less than 200 kb, the splice between them is considered reliable and the corresponding genome sequence splice is retained. The four alignment regions above produce three connections: 4->2, 2->1 and 1->3, corresponding to the genome splices D->B, B->A and A->C. Suppose that, among 4->2, 2->1 and 1->3, the interval of 2->1 exceeds 200 kb; the order 2->1 may then be wrong, so according to the present invention only 4->2 and 1->3 are retained, together with the corresponding genome sequence splices D->B and A->C. Protein sequence a thus serves as linking evidence for the genome sequence splices D->B and A->C.
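The interval screening can be sketched as follows, using the interval defined earlier (length of A − position of i on A + position of j on B). Each region is reduced to a single position on its genome sequence, strand is ignored, and the sequence lengths and positions are hypothetical values chosen so that only B->A exceeds 200 kb. Skipping consecutive regions on the same sequence is an added assumption, since the text does not discuss that case.

```python
MAX_INTERVAL_BP = 200_000  # 200 kb threshold from step (2)


def reliable_splices(ordered_regions, seq_len, max_interval=MAX_INTERVAL_BP):
    """ordered_regions: (genome sequence, position of the region on it), already
    sorted by relative position on the protein; seq_len: sequence ID -> length (bp).
    Return the genome sequence splices whose interval is below max_interval."""
    kept = []
    for (seq_a, pos_i), (seq_b, pos_j) in zip(ordered_regions, ordered_regions[1:]):
        if seq_a == seq_b:
            continue  # assumption: regions on the same sequence give no splice
        interval = seq_len[seq_a] - pos_i + pos_j
        if interval < max_interval:
            kept.append((seq_a, seq_b))
    return kept


seq_len = {"D": 150_000, "B": 400_000, "A": 120_000, "C": 90_000}
ordered = [("D", 140_000), ("B", 30_000), ("A", 2_000), ("C", 8_000)]
# Intervals: D->B 40 kb, B->A 372 kb (dropped), A->C 126 kb.
print(reliable_splices(ordered, seq_len))  # [('D', 'B'), ('A', 'C')]
```

The claims use a strict "less than 200 kb", whereas step 05 of Embodiment 1 says "less than or equal to"; this sketch follows the claims.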
In the third step, according to the present invention, each genome sequence in a splice has one of two attributes: origin sequence or terminal sequence. For example, in the splice D->B of two genome sequences, D is the origin sequence and B is the terminal sequence. A given genome sequence may be the origin sequence of several genome sequence splices; according to the present invention, only the splice with the most linking evidence is retained. For example, genome sequence D as an origin sequence may have several possible splices, such as D->B, D->K and D->M; if these are supported by 5, 3 and 2 proteins of linking evidence respectively, D->B is retained. The same procedure is applied to each genome sequence as a terminal sequence. For example, genome sequence D as a terminal sequence may have several possible splices, such as P->D, T->D and S->D; if these are supported by 5, 3 and 2 proteins respectively, P->D is retained.
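Counting linking evidence and keeping the best-supported splice can be sketched as below. The text does not say whether the origin-side and terminal-side selections are independent or sequential; this sketch assumes they run one after the other, which leaves every genome sequence with at most one successor and one predecessor. Ties go to the splice encountered first. Function names are hypothetical.

```python
from collections import Counter


def count_evidence(splices_by_protein):
    """splices_by_protein: {protein: [(origin, terminal), ...]}.
    Each protein counts once as evidence for each splice it supports."""
    evidence = Counter()
    for splices in splices_by_protein.values():
        evidence.update(set(splices))
    return evidence


def select_best_splices(evidence):
    """Keep each origin's best-supported splice, then each terminal's."""
    best_out = {}
    for (origin, terminal), n in evidence.items():
        if origin not in best_out or n > evidence[best_out[origin]]:
            best_out[origin] = (origin, terminal)
    best_in = {}
    for origin, terminal in best_out.values():
        n = evidence[(origin, terminal)]
        if terminal not in best_in or n > evidence[best_in[terminal]]:
            best_in[terminal] = (origin, terminal)
    return set(best_in.values())


# Evidence counts from the example in the text.
evidence = Counter({("D", "B"): 5, ("D", "K"): 3, ("D", "M"): 2,
                    ("P", "D"): 5, ("T", "D"): 3, ("S", "D"): 2})
print(sorted(select_best_splices(evidence)))  # [('D', 'B'), ('P', 'D')]
```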
Finally, the retained genome splices are connected to form new genome sequences. Each genomic fragment from the previous step that can serve only as a start node is taken as a starting point; its successor intermediate node is selected from the retained genome splices, then the next intermediate node is selected for that node, and so on until an end node is reached. The genomic fragments are then joined into longer genomic fragments in the order in which the genome sequences are connected. For example, if the retained genome splices are D->B and P->D, the genome order after joining is P->D->B.
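Classifying the genome sequences and walking from each start node to an end node can be sketched as follows. It assumes the retained splices give each sequence at most one successor and one predecessor (as after step (3)); the repeated-node guard is an added safety measure, and sequences lying only on a closed cycle have no start node and are not joined. Function names are hypothetical.

```python
def classify_nodes(splices):
    """Split genome sequences into start, end and intermediate nodes."""
    origins = {origin for origin, _ in splices}
    terminals = {terminal for _, terminal in splices}
    return origins - terminals, terminals - origins, origins & terminals


def build_scaffolds(splices):
    """Follow successors from every start node until an end node is reached."""
    successor = dict(splices)
    starts, _, _ = classify_nodes(splices)
    scaffolds = []
    for start in sorted(starts):
        chain, seen = [start], {start}
        # Stop at an end node, or on a repeated node if the input is inconsistent.
        while chain[-1] in successor and successor[chain[-1]] not in seen:
            nxt = successor[chain[-1]]
            chain.append(nxt)
            seen.add(nxt)
        scaffolds.append(chain)
    return scaffolds


retained = {("D", "B"), ("P", "D")}
print(classify_nodes(retained))   # ({'P'}, {'B'}, {'D'})
print(build_scaffolds(retained))  # [['P', 'D', 'B']]
```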
According to one embodiment of the present invention, sources of protein sequences include: (i) published or publicly released protein sequences of the species itself; (ii) protein sequences of related species; (iii) for species without a protein database, protein sequences predicted from published transcriptome data.
According to another embodiment of the present invention, the protein sequences come from a public database, such as NCBI, UniProt or Ensembl, or are obtained by translating transcriptome sequencing data.
According to the present invention, the alignment software used to align protein sequences to the genome in step (1) is preferably the BLAT sequence alignment program, with parameters -q=prot and -t=dnax.
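BLAT writes its alignments in PSL format, whose 21 tab-separated columns include the query (protein) name, size, start and end in columns 10 to 13 and the target (genome sequence) name, size, start and end in columns 14 to 17. A minimal parser turning PSL lines into alignment regions might look like this. The `Region` record, the function name and the sample line are hypothetical; the input is assumed to come from an invocation such as `blat -t=dnax -q=prot genome.fa proteins.fa out.psl`, and strand and block structure are ignored.

```python
from typing import Iterable, Iterator, NamedTuple


class Region(NamedTuple):
    protein: str
    protein_len: int   # qSize, in amino acids
    prot_start: int    # qStart (0-based)
    prot_end: int      # qEnd
    genome_seq: str
    genome_len: int    # tSize, in bp
    genome_start: int  # tStart (0-based)
    genome_end: int    # tEnd

    @property
    def coverage(self) -> float:
        """Sequence coverage of this alignment region."""
        return (self.prot_end - self.prot_start) / self.protein_len


def parse_psl(lines: Iterable[str]) -> Iterator[Region]:
    """Yield one Region per PSL alignment line, skipping header lines."""
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if len(f) != 21 or not f[0].isdigit():
            continue  # "psLayout" header, column titles, dashes or blank line
        yield Region(f[9], int(f[10]), int(f[11]), int(f[12]),
                     f[13], int(f[14]), int(f[15]), int(f[16]))


sample = ["psLayout version 3", "", "\t".join([
    "120", "0", "0", "0", "0", "0", "1", "300", "++", "protA", "400", "10", "130",
    "scaf1", "150000", "139000", "139660", "2", "60,60,", "10,70,", "139000,139480,"])]
region = next(parse_psl(sample))
print(region.genome_seq, region.coverage)  # scaf1 0.3
```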
Preferably, in step (1), excessively high sequence coverage means sequence coverage above 90%. Proteins with sequence coverage above 90% are removed and proteins with coverage below 90% are retained; that is, the sequence coverage of every alignment region of a retained protein must be below 90%. The proteins retained in this step are the data source for implementing the invention.
When screening reliable genome splices, the present invention applies the maximum-linking-evidence screening principle. The beneficial technical effects of the present invention are:
(1) genome sequence length and completeness can be effectively improved;
(2) fragmented protein sequences can be linked together, improving the completeness of protein sequences on the genome.
According to a further aspect of the present invention, an apparatus implementing the above method is also provided, comprising the following parts:
(1) Fragmented protein sequence screening unit: used to align protein sequences to genome sequences to obtain the relative position of each alignment region on its protein sequence and its absolute position on the genome sequence; and to remove protein sequences with excessively high sequence coverage and protein sequences that align to only one genome sequence, so that each remaining protein sequence aligns to multiple genome sequences and none of its alignment regions completely covers the whole protein, thereby obtaining the fragmented protein sequences.
(2) Alignment region ordering and screening unit: connected to the fragmented protein sequence screening unit; used to arrange, for each fragmented protein sequence, the genome sequences corresponding to its alignment regions in ascending order of the regions' relative positions on the protein sequence; to calculate the interval between each pair of consecutive alignment regions and retain the alignment-region splices whose interval is less than 200 kb together with the corresponding genome sequence splices; and to take the fragmented protein sequences corresponding to each genome sequence splice as the linking evidence for that splice.
(3) Genome sequence splice screening unit: connected to the alignment region ordering and screening unit and applying the maximum-linking-evidence screening principle. In a genome sequence splice, the sequence followed by a new genomic fragment is the origin sequence and the sequence preceded by a new genomic fragment is the terminal sequence; a genome sequence followed but not preceded by a new genomic fragment is a start node, one preceded but not followed by a new genomic fragment is an end node, and one both preceded and followed by new genomic fragments is an intermediate node. The unit retains, for each origin sequence and each terminal sequence, the splice with the most linking evidence.
(4) New genome sequence forming unit: connected to the genome sequence splice screening unit; used, for the splices finally retained by unit (3), to take each genomic fragment that can serve only as a start node as a starting point, select its successor intermediate node, then select the next intermediate node, and so on until an end node is reached; and to join the genomic fragments in series, in the order in which the genome sequences are connected, into longer genomic fragments.
Additional aspects and advantages of the present invention are set out in part in the following description, will in part become apparent from it, or will be learned through practice of the invention.
Embodiment
Embodiments of the present invention are described in detail below. They are exemplary, serve only to explain the present invention, and shall not be construed as limiting it.
Embodiment 1
Assembling the zebrafish genome sequence using zebrafish protein sequences from the Ensembl Genome Browser.
Materials: 37298 zebrafish genome sequences in FASTA format were downloaded from the website of the U.S. National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov/); their mean length is 143274 bp. Protein data for 43153 zebrafish proteins were downloaded from the Ensembl Genome Browser (www.ensembl.org) website.
(1) Screening fragmented protein sequences
Step 01: Download the BLAT (BLAST-Like Alignment Tool) program from the University of California, Santa Cruz (http://hgdownload.cse.ucsc.edu/admin/exe/) and select the standalone version. Using each protein as a query sequence and the genomic fragments as target sequences, with parameters -q=prot and -t=dnax and all other parameters at their defaults, align the 43153 protein sequences to the 37298 genomic fragments. The results show that 43118 proteins align to the genomic fragments.
Step 02: Calculate the sequence coverage of each alignment region and retain the proteins whose sequence coverage is less than 90%.
Step 03: Among the proteins retained in step 02, remove those that align to only one genomic fragment, and retain the proteins that align to two or more genome sequences, together with their alignment regions. After this step, 13858 alignment regions remain.
(2) Ordering and screening alignment regions on the protein sequence
Step 04: For the alignment regions retained in step 03, sort the regions of each protein in ascending order of their relative positions on that protein.
Step 05: For each alignment region in a protein sequence, calculate its distance to the next alignment region; if the distance is less than or equal to 200 kb, retain both alignment regions, otherwise remove them.
Step 06: Use each region-to-region connection retained in step 05 as linking evidence for the corresponding genome splice.
(3) Screening genome splices based on maximum linking evidence
Step 07: For each genome sequence retained in step 06, select, following the method of the present invention, the origin sequence and the terminal sequence with the most linking evidence. After this step, 5998 reliable genome sequence splice relations are produced.
These genome sequences are classified, according to the method of the present invention, into three classes: (i) start nodes, (ii) end nodes and (iii) intermediate nodes.
(4) Connecting the retained genome splices to form new genome sequences
Step 08: Take each genome sequence of class (i) from step 07 as a starting point and search the genome sequences of classes (ii) and (iii) for a genome sequence that can be spliced to it, forming a genome sequence connection. Use the genome sequence just found as the new starting point and continue searching in the same way until no further genome sequence can be connected. Splice the genome sequences into longer genome sequences in the order of these connections, completing the genome assembly. This step produces 3428 new genome sequences.
Results: the assembled zebrafish genome comprises 31304 sequences, 16.07% fewer than before; their mean length is 169286 bp, an increase of 18.16%.
Embodiment 2
Assembling the nematode genome sequence using nematode protein sequences
30250 nematode protein sequences and 3267 genome sequences were downloaded from the Ensembl Genome Browser website.
The nematode genome was assembled following steps 01 to 08 of Embodiment 1.
Results: before assembly, the mean length of the nematode genome sequences was 36490 bp. After assembly with protein sequences, the mean length was 43454 bp, an increase of 19.1%, and the number of sequences decreased to 2557.
Embodiment 3
Assembling the human genome sequence using human proteins
141032 human protein sequences in FASTA format were downloaded from the Swiss-Prot section of the UniProtKB database. 27416 human genome sequences in FASTA format were downloaded from the website of the U.S. National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov/).
The human genome sequences were then spliced following steps 01 to 08 of Embodiment 1.
Results: before assembly, the 27416 human genome sequences had a mean length of 142356 bp. After assembly with this method, the mean length was 173197 bp, an increase of 21.7%, and the number of sequences decreased to 20905.
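The percentage changes reported in the three embodiments can be rechecked from their before and after figures. This small sketch uses only numbers stated in the embodiments; the function name is hypothetical.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100


reported = {
    "zebrafish, sequence count": (37298, 31304),
    "zebrafish, mean length (bp)": (143274, 169286),
    "nematode, mean length (bp)": (36490, 43454),
    "human, mean length (bp)": (142356, 173197),
}
for label, (before, after) in reported.items():
    print(f"{label}: {percent_change(before, after):+.2f}%")
# Prints -16.07%, +18.16%, +19.08% and +21.66%, which match the reported
# 16.07%, 18.16%, 19.1% and 21.7% after rounding.
```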
The embodiments above use the zebrafish, nematode and human genomes as examples. The principle and method of the present invention can of course also be applied to genome sequence assembly for other organisms.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if such changes and modifications fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.
Claims (10)
1. A method for splicing genomes based on protein sequences, comprising the following steps:
(1) Screening fragmented protein sequences
Aligning protein sequences to genome sequences to obtain the relative position of each alignment region on its protein sequence and its absolute position on the genome sequence,
Removing protein sequences with excessively high sequence coverage and protein sequences that align to only one genome sequence, so that each remaining protein sequence aligns to multiple genome sequences and none of its alignment regions completely covers the whole protein, thereby obtaining the fragmented protein sequences;
(2) Ordering and screening alignment regions on the protein sequence
According to the relative positions, on the protein sequence, of the alignment regions of each fragmented protein sequence, arranging the genome sequences corresponding to those alignment regions in ascending order,
Calculating the interval between each pair of consecutive alignment regions, and retaining the alignment-region splices whose interval is less than 200 kb together with the corresponding genome sequence splices,
Taking the fragmented protein sequences corresponding to each genome sequence splice as the linking evidence for that genome sequence splice;
(3) Screening genome sequence splices based on maximum linking evidence
In a genome sequence splice, the sequence followed by a new genomic fragment being the origin sequence, and the sequence preceded by a new genomic fragment being the terminal sequence,
A genome sequence followed but not preceded by a new genomic fragment being a start node, a genome sequence preceded but not followed by a new genomic fragment being an end node, and a genome sequence both preceded and followed by new genomic fragments being an intermediate node,
Retaining, for each origin sequence and each terminal sequence, the splice with the most linking evidence;
(4) Forming new genome sequences
Among the splices finally retained in step (3), taking each genomic fragment that can serve only as a start node as a starting point, selecting its successor intermediate node, then selecting the next intermediate node for that intermediate node, and so on until an end node is reached,
Joining the genomic fragments in series, in the order in which the genome sequences are connected, to form longer genomic fragments.
2. The method according to claim 1, wherein the sources of the protein sequences include: (i) published or publicly released protein sequences of the species itself; (ii) protein sequences of related species; (iii) for species without a protein database, protein sequences predicted from published transcriptome data.
3. The method according to claim 1, wherein the protein sequences come from a public database or are obtained by translating transcriptome sequencing data.
4. The method according to claim 1, wherein the alignment software used to align the protein sequences to the genome in step (1) is the BLAT sequence alignment program, with parameters -q=prot and -t=dnax.
5. The method according to claim 1, wherein, in step (1), excessively high sequence coverage means sequence coverage above 90%.
6. An apparatus for splicing genomes based on protein sequences, characterized by comprising:
(1) Fragmented protein sequence screening unit
Used to align protein sequences to genome sequences to obtain the relative position of each alignment region on its protein sequence and its absolute position on the genome sequence,
And to remove protein sequences with excessively high sequence coverage and protein sequences that align to only one genome sequence, so that each remaining protein sequence aligns to multiple genome sequences and none of its alignment regions completely covers the whole protein, thereby obtaining the fragmented protein sequences;
(2) Alignment region ordering and screening unit
The alignment region ordering and screening unit is connected to the fragmented protein sequence screening unit,
Used to arrange, according to the relative positions on the protein sequence of the alignment regions of each fragmented protein sequence, the genome sequences corresponding to those alignment regions in ascending order,
To calculate the interval between each pair of consecutive alignment regions, and retain the alignment-region splices whose interval is less than 200 kb together with the corresponding genome sequence splices,
And to take the fragmented protein sequences corresponding to each genome sequence splice as the linking evidence for that genome sequence splice;
(3) Genome sequence splice screening unit
The genome sequence splice screening unit is connected to the alignment region ordering and screening unit and applies the maximum-linking-evidence screening principle,
In a genome sequence splice, the sequence followed by a new genomic fragment is the origin sequence, and the sequence preceded by a new genomic fragment is the terminal sequence,
A genome sequence followed but not preceded by a new genomic fragment is a start node, a genome sequence preceded but not followed by a new genomic fragment is an end node, and a genome sequence both preceded and followed by new genomic fragments is an intermediate node,
The unit retains, for each origin sequence and each terminal sequence, the splice with the most linking evidence;
(4) New genome sequence forming unit
The new genome sequence forming unit is connected to the genome sequence splice screening unit,
Used, for the splices finally retained by unit (3), to take each genomic fragment that can serve only as a start node as a starting point, select its successor intermediate node, then select the next intermediate node for that intermediate node, and so on until an end node is reached,
And to join the genomic fragments in series, in the order in which the genome sequences are connected, into longer genomic fragments.
7. The apparatus according to claim 6, wherein the sources of the protein sequences include: (i) published or publicly released protein sequences of the species itself; (ii) protein sequences of related species; (iii) for species without a protein database, protein sequences predicted from published transcriptome data.
8. The apparatus according to claim 6, wherein the protein sequences come from a public database or are obtained by translating transcriptome sequencing data.
9. The apparatus according to claim 6, wherein the alignment software used in the fragmented protein sequence screening unit to align the protein sequences to the genome is the BLAT sequence alignment program, with parameters -q=prot and -t=dnax.
10. The apparatus according to claim 6, wherein, in the fragmented protein sequence screening unit, excessively high sequence coverage means sequence coverage above 90%.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510755855.XA CN105219765A (en) | 2015-11-09 | 2015-11-09 | Method and apparatus for constructing genomes using protein sequences |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105219765A true CN105219765A (en) | 2016-01-06 |
Family ID: 54989035
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510755855.XA Pending CN105219765A (en) | Method and apparatus for constructing genomes using protein sequences | 2015-11-09 | 2015-11-09 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105219765A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102206704A (en) * | 2011-03-02 | 2011-10-05 | 深圳华大基因科技有限公司 | Method and device for assembling genome sequence |
CN102789553A (en) * | 2012-07-23 | 2012-11-21 | 中国水产科学研究院 | Method and device for assembling genomes by utilizing long transcriptome sequencing result |
CN104657628A (en) * | 2015-01-08 | 2015-05-27 | 深圳华大基因科技服务有限公司 | Proton-based transcriptome sequencing data comparison and analysis method and system |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106055925A (en) * | 2016-05-24 | 2016-10-26 | 中国水产科学研究院 | Method and apparatus for assembling genome sequence based on transcriptome paired-end sequencing data |
CN106055925B (en) * | 2016-05-24 | 2018-09-18 | 中国水产科学研究院 | The method and apparatus for assembling genome sequence based on transcript profile both-end sequencing data |
CN107784200A (en) * | 2016-08-26 | 2018-03-09 | 深圳华大基因研究院 | A kind of method and apparatus for screening novel C RISPR Cas systems |
CN107784200B (en) * | 2016-08-26 | 2020-11-06 | 深圳华大生命科学研究院 | Method and device for screening novel CRISPR-Cas system |
CN108897986A (en) * | 2018-05-29 | 2018-11-27 | 中南大学 | A kind of genome sequence joining method based on protein information |
CN108897986B (en) * | 2018-05-29 | 2020-11-27 | 中南大学 | Genome sequence splicing method based on protein information |
CN112634988A (en) * | 2021-01-07 | 2021-04-09 | 内江师范学院 | Python language-based gene variation detection method and system |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 2016-01-06 |