fRNAdb: a platform for mining/annotating functional RNA candidates from non-coding RNA sequences.

Kin T¹, Yamada K, Terai G, Okida H, Ono Y

Affiliations

1. Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology (AIST) Aomi 2-42, Koto-ku, Tokyo 135-0064, Japan.

Nucleic Acids Research, 11 Nov 2006, 35(Database issue):D145-8
https://doi.org/10.1093/nar/gkl837 PMID: 17099231 PMCID: PMC1669753


Abstract

There is an abundance of transcripts that code for no particular protein and that remain functionally uncharacterized. Some of these transcripts may have novel functions, while others might be junk transcripts. Unfortunately, the experimental validation of such transcripts to find functional non-coding RNA candidates is very costly. Therefore, our primary interest is to computationally mine candidate functional transcripts from a pool of uncharacterized transcripts. We introduce fRNAdb: a novel database service that hosts a large collection of non-coding transcripts, including annotated and non-annotated sequences from the H-inv database, NONCODE and RNAdb. A set of computational analyses has been performed on the included sequences. These analyses include RNA secondary structure motif discovery, EST support evaluation, cis-regulatory element search and protein homology search, among others. fRNAdb provides an efficient interface that helps users select particular transcripts by their own criteria in order to identify functional RNA candidates. fRNAdb is available at http://www.ncrna.org/


Nucleic Acids Res. 2007 Jan; 35(Database issue): D145–D148.

Published online 2006 Nov 11. https://doi.org/10.1093/nar/gkl837

PMCID: PMC1669753

PMID: 17099231


Taishin Kin,¹,* Kouichirou Yamada,³ Goro Terai,² Hiroaki Okida,² Yasuhiko Yoshinari,⁴ Yukiteru Ono,³ Aya Kojima,² Yuki Kimura,³ Takashi Komori,² and Kiyoshi Asai¹,⁵


INTRODUCTION

fRNAdb is a database that helps in annotating non-coding transcripts acquired from three publicly available databases: H-inv, human full-length non-coding cDNAs (1); NONCODE, experimentally validated non-coding transcripts (2); and RNAdb, non-coding transcripts curated from the literature, the human chromosome 7 project and the RIKEN antisense pipeline, together with other putative non-coding RNAs (3). Details are shown in Table 1. Each transcript is analyzed for various features, such as maximum ORF length, the number of protein homologs, the average conservation score, transcription regulatory element motifs and the existence of CpG islands (listed in Table 2), that help in picking out promising non-coding candidates. Transcripts can be filtered with fRNAdb's main listing interface in many different ways (see Figure 1). This main listing interface is linked to our custom UCSC Genome Browser (4) for functional RNAs, which is equipped with our original custom tracks specific to the screening of functional RNAs. Users can inspect a transcript of interest in a genomic view, with rich genomic information surrounding the mapped transcript. The information includes the UCSC original tracks, such as known genes, genome conservation and Affymetrix transcriptome tracks (5), and our original tracks, such as conserved potential secondary structure, existence of known RNA secondary structure motifs and regions with significant RNA secondary structure Z-scores (for details see Table 3).


Figure 1

The first page shows a set of selection interfaces (A) and the listing table of 13 693 transcripts (B).

Table 1

Data sources of fRNAdb

Source	Num. seq. (mapped)
H-inv 2.0 (non-protein coding transcripts)	5489 (5217)
NONCODE	5339 (576)
RNAdb	2865 (1306)
RNAdb (literature curation)	1446 (524)
RNAdb (human chromosome 7 project)	306 (299)
RNAdb (RIKEN antisense pipeline)	1113 (486)
Total	13 693 (7102)

Table 2

List of attributes

S. no.	Description	Number of transcripts	Min/max
1	Length of the sequence (nt)	13 693	15/107 797
2	Number of exons	7166	0/60
3	Number of overlapping ESTs	4184	0/6490
4	Number of mapped positions	7158	0/892
5	GC-content (%)	13 693	4/87
6	Maximum length of potential ORF (amino acids)	12 655	0/1664
7	Percentage of bases that is covered with repeat elements	6460	0/100
8	Repeat elements reside proximal upstream/downstream	2219
9	Known gene that is a potential sense/antisense of this transcript (exon overlapping required)	936
10	Number of protein homologs (GenBank NR)	5811	0/250
11	Known gene that includes this transcript within its intron	951
12	Known gene region that overlaps with the mapping extent of this transcript (strand not considered)	4245
13	Known gene that overlaps with this transcript within its intron in different strand	965
14	Known gene where this transcript is possibly a part of its 3′-UTR	757
15	Known gene where this transcript is possibly a part of its 5′-UTR	77
16	Known gene within upstream 5 kb	1011
17	Known gene within downstream 5 kb	402
18	Average conservation score over the mapped exonic region	6184	0/93
19	Maximum conservation score over the mapped exonic region	5741	0/98
20	Maximum conservation score within 500 base upstream from the mapped 5′ terminal	6878	0/255
21	Overlapping UCSC ultra conserved region	24	0/4
22	Number of canonical splice signals in this transcript	751	0/30
23	Number of poly(A) signals in this transcript	8081	0/199
24	Number of CpG island	1353	0/4
25	Associated transposon free region	1137
26	Number of RFAM known RNA motifs in this transcript	5511	0/12
27	Number of RNAz predictive RNA motifs in this transcript	1185	0/24
28	Number of EvoFold predictive RNA motifs in this transcript	888	0/7
29	Maximum Z-score of RNA secondary structure over this transcript. Raw scores lower than −6 are significant; higher scores are considered insignificant. Stored score = raw score × −10	252	0.0/121.0
30	Number of cell lines responding to Affy probes in exon regions of this transcript (Affymetrix Transcriptome Phase 2 Tiling Array Analyses)	1593	0/11

The number of applicable transcripts and the range of the attributes are shown.
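
Attribute 29 in Table 2 stores the Z-score in a transformed form. The following minimal sketch only illustrates that arithmetic; the function names are hypothetical and are not part of fRNAdb. A raw Z-score is multiplied by −10, so the raw significance cutoff of −6 corresponds to a stored value above 60.

```python
def stored_score(raw_z: float) -> float:
    """Convert a raw Z-score to the stored form (raw score x -10)."""
    return raw_z * -10.0


def raw_score(stored: float) -> float:
    """Recover the raw Z-score from a stored value."""
    return stored / -10.0


def is_significant(stored: float, raw_cutoff: float = -6.0) -> bool:
    """A raw Z-score lower than the cutoff is significant.

    Because of the sign flip, this is a stored value *greater* than
    the stored form of the cutoff (60 for the default cutoff of -6).
    """
    return stored > stored_score(raw_cutoff)


if __name__ == "__main__":
    for raw in (-7.5, -6.0, -4.5):
        s = stored_score(raw)
        print(raw, s, is_significant(s))
```

Under this reading, the maximum stored value of 121.0 in Table 2 corresponds to a raw Z-score of about −12.1.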

Table 3

Functional RNA-specific tracks

Track	Description
RNAz folds (15)	Secondary structure annotation of RNAz
ENOR (16)	ENOR (expressed non-coding region) [lifted from mm5]
Erdmann (6)	Erdmann non-coding RNAs
NONCODE (2)	Mapping information of NONCODE RNAs
RNAdb (3)	Mapping information of RNAdb RNAs
RNA Clusters	Small RNA genes often reside close to each other forming clusters. This track represents computationally identified RNA clusters in human genome
Rfam seed folds	Genomic search results with INFERNAL and covariance models generated from RFAM seeds
Rfam full	BLAT mapping results for RFAM full sequence dataset
antisense ChenJ NAR2004 (17)	Sense–antisense pairs among UCSC known genes
tRNAscan-SE (18)	tRNA genes predicted by tRNAscan-SE
Ultra conserved elements (19)	100% conserved elements (≥200 bp) in human, rat and mouse
Ultra conserved elements 17 way	100% conserved elements in 17 vertebrates (longer than 50 bp)
Transposon free region (20)	Regions longer than 5 kb or 10 kb containing no LINEs, SINEs and LTRs
Human accelerated region (14)	HAR non-coding gene candidates predicted by (14)
Z-score	Regions with Z-score lower (lower is better) than −6 (actual track score = Z-score × −10)


fRNAdb

fRNAdb provides two types of interfaces. The first page presents a list of all transcripts rendered as a table with 35 columns, including ones for the attributes described in Table 2 (Figure 1B). A tabbed control panel placed above the table presents five tabs labeled ‘Basic’, ‘DB/ID’, ‘Expert’, ‘Sort’ and ‘Column’ (Figure 1A). The Basic tab contains the basic filters: a collection of frequently used filters that provide simple and quick selection of transcripts matching common criteria for functional non-coding RNAs. For example, one can check ‘Mapped’ to select only genome-mapped transcripts; ‘Well conserved at best (Max > 50%)’ for transcripts whose maximum conservation score among 17 vertebrates (4) exceeds 50% in their exonic regions; ‘EST-supported’ for reliable expression evidence; ‘Tiny ORF (<40 aa)’ to enrich for non-coding transcripts; ‘Low Repeat Coverage (<30%)’ to limit repeat element contamination; ‘No protein homolog’ as another condition that enriches for non-coding transcripts; and ‘No overlapping known gene’ to remove the possibility that the transcript is part of a protein-coding gene transcript. After the boxes are checked, the ‘refresh’ button runs the filtering and presents the results. Our example conditions yield nine hits: one H-inv non-protein-coding cDNA and eight RNAdb literature-curated miRNAs. In other words, these criteria match real functional RNAs and also indicate that one non-coding transcript shares the same properties. Clicking on the ID of this transcript produces the detailed view shown in Figure 2. This feature visualizer shows a graphical representation of a variety of sequence elements found in the transcript, including cis-regulatory elements, repeat elements, EST mapping regions and six-frame stop codon positions. There are many different ways to filter these non-coding transcripts, and there are many more potential candidates hidden in this dataset. More details of the basic filters are provided on the website.
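
The basic filters described above can be thought of as a conjunction of simple predicates over per-transcript attributes. The sketch below is illustrative only: the `Transcript` record, its field names and the two example records are hypothetical and do not reflect fRNAdb's actual schema or data, and reading ‘EST-supported’ as "at least one overlapping EST" is our assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Transcript:
    # Hypothetical record; field names are illustrative, not fRNAdb's schema.
    frnadb_id: str
    mapped: bool
    max_conservation: float       # maximum conservation score (%)
    est_count: int                # number of overlapping ESTs
    max_orf_aa: int               # maximum potential ORF length (amino acids)
    repeat_coverage: float        # bases covered by repeat elements (%)
    protein_homologs: int
    overlapping_known_genes: int


# One predicate per checkbox label quoted in the text.
BASIC_FILTERS: Dict[str, Callable[[Transcript], bool]] = {
    "Mapped": lambda t: t.mapped,
    "Well conserved at best (Max > 50%)": lambda t: t.max_conservation > 50,
    "EST-supported": lambda t: t.est_count > 0,  # assumed interpretation
    "Tiny ORF (<40 aa)": lambda t: t.max_orf_aa < 40,
    "Low Repeat Coverage (<30%)": lambda t: t.repeat_coverage < 30,
    "No protein homolog": lambda t: t.protein_homologs == 0,
    "No overlapping known gene": lambda t: t.overlapping_known_genes == 0,
}


def apply_filters(transcripts: Iterable[Transcript],
                  checked: Iterable[str]) -> List[Transcript]:
    """Keep transcripts that satisfy every checked filter."""
    names = list(checked)
    return [t for t in transcripts
            if all(BASIC_FILTERS[name](t) for name in names)]


if __name__ == "__main__":
    # Made-up records for demonstration only.
    demo = [
        Transcript("EXAMPLE1", True, 72.0, 3, 25, 10.0, 0, 0),
        Transcript("EXAMPLE2", True, 35.0, 0, 210, 55.0, 4, 1),
    ]
    hits = apply_filters(demo, BASIC_FILTERS.keys())
    print([t.frnadb_id for t in hits])
```

Keeping each checkbox as an independent predicate mirrors the way the filters are combined in the text: every checked condition must hold for a transcript to be listed.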


Figure 2

mRNA view of a transcript. Regulatory elements, EST positions, splice positions, repeat elements and six-frame stop codons are visualized along the full span of a cDNA.
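
As an illustration of what "six-frame stop codon positions" means, the sketch below scans the three reading frames of a sequence and the three frames of its reverse complement for the standard stop codons. It is not the code behind the feature visualizer; the function names and the coordinate convention (0-based offsets, with reverse-strand offsets counted on the reverse complement) are our assumptions.

```python
from typing import Dict, List

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def stop_codon_positions(seq: str) -> Dict[int, List[int]]:
    """Map each of the six frames to the start offsets of its stop codons.

    Frames +1, +2, +3 are read on the given strand; frames -1, -2, -3 are
    read on the reverse complement, and their offsets refer to that strand.
    """
    forward = seq.upper().replace("U", "T")  # accept RNA or DNA letters
    frames: Dict[int, List[int]] = {}
    for strand, s in ((1, forward), (-1, reverse_complement(forward))):
        for offset in range(3):
            frames[strand * (offset + 1)] = [
                i for i in range(offset, len(s) - 2, 3)
                if s[i:i + 3] in STOP_CODONS
            ]
    return frames


if __name__ == "__main__":
    print(stop_codon_positions("ATGTAAGCTTCA"))
```

A frame with few or no long stop-free stretches is consistent with a transcript that lacks a substantial ORF, which is why this view complements the ‘Tiny ORF’ filter.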

The rest of the tabs offer additional functionality to further improve usability. The DB/ID tab contains DB selection and ID selection boxes. The DB selection box allows you to limit the target databases to a subset of the currently available databases: H-inv, NONCODE and RNAdb. The ID selection box lets you choose target transcripts that match given string patterns. For example, specifying ‘FR000001’ (an fRNAdb ID) in this box limits the target to transcript FR000001 alone. The wild-card ‘%’ is allowed for pattern matching: specifying ‘LIT%’ limits the search to targets whose original IDs start with ‘LIT’. The string pattern is matched against the ID, Acc. and Original columns. The Expert tab provides an interface for specifying multiple conditions, which lets you perform more complex filtering than the basic filters. Please refer to the website for more details about the expert filters. The Sort tab has a sorting interface that lets you sort the table with multiple sorting keys. The Column tab allows you to limit the visible columns of the main listing table. Since the 35-column table is too wide for ordinary browsers to display on a single screen, you can narrow the table with this interface for better visibility.
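
The ‘%’ wild-card behaves like the multi-character wild-card of SQL `LIKE`. The sketch below shows one way such a pattern could be matched against the three columns named in the text. It is an assumption-laden illustration, not fRNAdb's implementation: the helper names and the example rows are made up, matching is assumed to be case-sensitive, and a pattern without ‘%’ is assumed to require an exact match.

```python
import re
from typing import Dict, Iterable, List

# Columns the text says the pattern is matched against.
SEARCHED_COLUMNS = ("ID", "Acc.", "Original")


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a pattern using '%' as a wild-card into a regular expression.

    Every other character is escaped so that it matches literally.
    """
    literal_parts = (re.escape(part) for part in pattern.split("%"))
    return re.compile(".*".join(literal_parts), re.DOTALL)


def select_by_id(rows: Iterable[Dict[str, str]],
                 pattern: str) -> List[Dict[str, str]]:
    """Keep rows in which any searched column fully matches the pattern."""
    rx = like_to_regex(pattern)
    return [row for row in rows
            if any(rx.fullmatch(row.get(col, "")) for col in SEARCHED_COLUMNS)]


if __name__ == "__main__":
    # Made-up rows for demonstration only.
    rows = [
        {"ID": "FR000001", "Acc.": "ACC0001", "Original": "LIT0001"},
        {"ID": "FR000002", "Acc.": "ACC0002", "Original": "HIT0002"},
        {"ID": "FR000010", "Acc.": "ACC0010", "Original": "LIT0010"},
    ]
    print([r["ID"] for r in select_by_id(rows, "FR000001")])
    print([r["ID"] for r in select_by_id(rows, "LIT%")])
```

Using a full match rather than a substring search is what makes ‘FR000001’ select a single transcript while ‘LIT%’ selects every ID with that prefix.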


UCSC GENOME BROWSER FOR FUNCTIONAL RNAs

We mirrored the UCSC Genome Browser and added our custom tracks specific to functional RNAs and miRNAs, as shown in Tables 3 and 4. Most of the tracks have their own sources and reference papers. Our original tracks are RNA clusters, Rfam seed folds, tRNAscan-SE, Ultra Conserved Elements 17way and Z-score (details are shown in Table 3). In addition, we mapped RNA sequences from public functional RNA sequence databases including Erdmann (6), NONCODE, RNAdb and Rfam. The UCSC Genome Browser has several tracks for miRNA genes and targets, but we added more tracks, including miRBase (7) known miRNA genes, miRNAMap (8) and Berezikov's predicted miRNA genes (9), TarBase (10) known miRNA targets, and predicted miRNA targets from RNAhybrid (11), PicTar 4 species and 5 species (12), miRBase targets and T-ScanS miRNA targets (13). Our custom tracks can be downloaded with the Table browser, which can be accessed via the ‘Table’ menu of the UCSC Genome Browser.

Table 4

miRNA-specific tracks

Track	Description
Known miRNAs	miRBase known miRNAs
Predicted miRNAs	miRNAMap and Berezikov's predicted miRNAs
Known targets	TarBase experimentally verified miRNA target sites
Predicted targets	RNAhybrid, PicTar, miRBase and T-ScanS-predicted miRNA target sites

In the near future, fRNAdb will include more transcripts from other sequence databases and from non-coding gene prediction results. For example, the Human Accelerated Region (14) is currently included as a custom track of our Genome Browser; the sequences of these non-coding gene candidates will be included in fRNAdb. We will also add more attributes to fRNAdb, especially attributes representing expression patterns of the transcripts or of protein-coding genes related to the transcripts.


Acknowledgments

This research is partially supported by the Functional RNA project funded by the Ministry of Economy, Trade and Industry (METI). We thank Dr. Paul Horton for his kind help. Funding to pay the Open Access publication charges for this article was provided by the National Institute of Advanced Industrial Science and Technology (AIST).

Conflict of interest statement. None declared.


REFERENCES

1. Imanishi T., Itho T., Suzuki Y., O'Donovan C., Fukuchi S., Koyanagi K.O., Barrero R.A., Tamura T., Yamaguchi-Kabata Y., Tanino M., et al. Integrative annotation of 21,037 human genes validated by full-length cDNA clones. PLoS Biol. 2004;2:856–875.

2. Liu C., Bai B., Skogerbo G., Cai L., Deng W., Zhang Y., Bu D., Zhao Y., Chen R. NONCODE: an integrated knowledge database of non-coding RNAs. Nucleic Acids Res. 2005;33:D112–D115.

3. Pang K.C., Stephen S., Engstrom P.G., Tajul-Arifin K., Chen W., Wahlestedt C., Lenhard B., Hayashizaki Y., Mattick J.S. RNAdb—a comprehensive mammalian noncoding RNA database. Nucleic Acids Res. 2005;33:D125–D130.

4. Hinrichs A.S., Karolchik D., Baertsch R., Barber G.P., Bejerano G., Clawson H., Diekhans M., Furey T.S., Harte R.A., Hsu F., et al. The UCSC Genome Browser Database: update 2006. Nucleic Acids Res. 2006;34:D590–D598.

5. Cheng J., Kapranov P., Drenkow J., Dike S., Brubaker S., Patel S., Long J., Stern D., Tammana H., Helt G., et al. Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science. 2005;308:1149–1154.

6. Szymanski M., Erdmann V.A., Barciszewski J. Noncoding regulatory RNAs database. Nucleic Acids Res. 2003;31:429–431.

7. Griffiths-Jones S. miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res. 2006;34:D140–D144.

8. Hsu P.W., Huang H.D., Hsu S.D., Lin L.Z., Tsou A.P., Tseng C.P., Stadler P.F., Washietl S., Hofacker I.L. miRNAMap: genomic maps of microRNA genes and their target genes in mammalian genome. Nucleic Acids Res. 2006;34:D135–D139.

9. Berezikov E., Guryev V., van de Belt J., Wienholds E., Plasterk R.H., Cuppen E. Phylogenetic shadowing and computational identification of human microRNA genes. Cell. 2005;120:21–24.

10. Sethupathy P., Corda B., Hatzigeorgiou A.G. TarBase: a comprehensive database of experimentally supported animal microRNA targets. RNA. 2006;12:192–197.

11. Kuger J., Rehmsmeier M. RNAhybrid: microRNA target prediction easy, fast and flexible. Nucleic Acids Res. 2006;34:W451–W454.

12. Krek A., Grun D., Poy M.N., Wolf R., Rosenberg L., Epstein E.J., MacMenamin P., da Piedade I., Gunsalus K.C., Stoffel M., et al. Combinatorial microRNA target predictions. Nature Genet. 2005;37:495–500.

13. Lewis B.P., Burge C.B., Bartel D.P. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell. 2005;120:15–20.

14. Pollard K.S., Salama S.R., Lambert N., Lambot M.A., Coppens S., Pedersen J.S., Katzman S., King B., Onodera C., Siepel A., et al. An RNA gene expressed during cortical development evolved rapidly in humans. Nature. 2006;443:167–172.

15. Washietl S., Hofacker I.L., Lukasser M., Huttenhofer A., Stadler P.F. Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome. Nat. Biotechnol. 2005;23:1383–1390.

16. Furuno M., Pang K.C., Ninomiya N., Fukuda S., Frith M.C., Bult C., Kai C., Kawai J., Carninci P., Hayashizaki Y., et al. Clusters of internally primed transcripts reveal novel long noncoding RNAs. PLoS Genet. 2006;2:e37.

17. Chen J., Sun M., Kent W.J., Huang X., Xie H., Wang W., Zhou G., Shi R.Z., Rowley J.D. Over 20% of human transcripts might form sense-antisense pairs. Nucleic Acids Res. 2004;32:4812–4820.

18. Lowe T.M., Eddy S.R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25:955–964.

19. Bejerano G., Pheasant M., Makunin I., Stephen S., Kent W.J., Mattick J.S., Haussler D. Ultraconserved elements in the human genome. Science. 2004;304:1321–1325.

20. Simons C., Pheasant M., Makunin I.V., Mattick J.S. Transposon-free regions in mammalian genome. Genome Res. 2005;16:164–172.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
