some questions #2
Comments
Hi Benxia, I think there will be no problem starting a run directly after initializing the […]. The functions were wrapped from the original R code, and behave pretty much the same if you set the […]. Best, Rui
Thanks a lot.
Hello Rui,
2: mergeFeatures <<- input$merge_features # "protein_coding" or additional multi-exon transcript type, e.g. lincRNA. Based on your point, which option should I select? "protein_coding"?
3: mergeMethod <<- input$mergeMethod # by "exon" or "tu". Do you think "tu" is much better than "exon"? It seems that "tu" would generate much longer transcripts than "exon".
4: TSS_len <<- input$L_TSS # default 1000
Best,
Hi Benxia,
Different library preparation methods (home-made or a commercial kit, e.g. TruSeq) differ in strandedness. Here, for TT-seq with the Ovation kit, the first reads are on the forward strand. In some experiments they are on the reverse strand, which causes trouble for TU annotation unless the strandedness of each bam file is carefully specified. I added this boolean, "strandFlipped", which is set by comparing TU and gene strands, to handle bam files from different sources.
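To make the idea of comparing TU and gene strands concrete, here is a minimal base-R sketch. It is not the app's `TU_prep` code: the helper name `guess_strand_flipped`, the data-frame layout (`chr`, `start`, `end`, `strand`) and the majority-vote threshold `min_frac` are all assumptions made for illustration. Genuine antisense TUs also overlap genes on the opposite strand, so a simple majority vote like this is only a rough heuristic.

```r
# Hypothetical helper (not part of the app): guess whether a library is
# strand-flipped by checking how often TUs overlap genes on the opposite strand.
guess_strand_flipped <- function(tu, genes, min_frac = 0.5) {
  # tu, genes: data frames with columns chr, start, end, strand ("+" or "-")
  pairs <- merge(tu, genes, by = "chr", suffixes = c(".tu", ".gene"))
  # Keep only TU/gene pairs whose intervals overlap
  hit <- pairs$start.tu <= pairs$end.gene & pairs$end.tu >= pairs$start.gene
  pairs <- pairs[hit, , drop = FALSE]
  if (nrow(pairs) == 0) return(NA)  # nothing to compare
  # Assumption: call the library flipped when most overlapping pairs disagree
  opposite <- as.character(pairs$strand.tu) != as.character(pairs$strand.gene)
  mean(opposite) > min_frac
}

# Toy input: both TUs sit on the strand opposite to the gene they overlap
genes <- data.frame(chr = c("chr1", "chr1"), start = c(100, 5000),
                    end = c(2000, 9000), strand = c("+", "-"))
tu <- data.frame(chr = c("chr1", "chr1"), start = c(150, 5100),
                 end = c(2500, 9500), strand = c("-", "+"))
strandFlipped <- guess_strand_flipped(tu, genes)  # TRUE for this toy input
```

For real data the overlap step would normally be done with a genomic-interval package rather than a per-chromosome cross join, which is only practical for small tables.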
"protein_coding" (mRNA) is a multi-exon transcript type, the same as "lincRNA". I would recommend using both "protein_coding" and "lincRNA".
It depends on the purpose: whether the aim concerns transcription or the RNA itself. If it is to capture the entire transcription cycle, then the "tu" method is better (as in the first TT-seq paper). If the aim is to find non-coding RNA species around a coding region, then the "exon" method is more suitable. I added an extra step that clips the downstream sense RNA (in the termination window) off the coding region, since it has a different synthesis rate and confounds gene-level synthesis estimation. Downstream antisense ncRNA is then named according to gene regions (by exon) rather than by the entire TU.
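The recommendations so far can be summarised as a small settings sketch in R. The helper `choose_merge_settings` and its `aim` argument are hypothetical, and passing both feature types as a character vector is an assumption about how the app reads `mergeFeatures`; only the option values themselves come from this thread.

```r
# Hypothetical helper: map the analysis aim to the options discussed above.
choose_merge_settings <- function(aim = c("transcription", "rna")) {
  aim <- match.arg(aim)
  list(
    # Both multi-exon transcript types, as recommended above
    mergeFeatures = c("protein_coding", "lincRNA"),
    # "tu" for the whole transcription cycle, "exon" for ncRNA around genes
    mergeMethod = if (aim == "transcription") "tu" else "exon"
  )
}

settings <- choose_merge_settings("transcription")
settings$mergeMethod  # "tu"
```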
It actually can take a gene annotation file that matches the bam files, but I haven't fully tested all versions (Ensembl, GENCODE, etc.). Please let me know if there is any issue. I agree that developing an R package is a good option, while a graphical interface is, in my opinion, nice for reproducible research as well. This local Shiny app is computationally intensive, but the main issue is the downloader's limited permissions across platforms (it sometimes does not work on Ubuntu). Best, Rui
Thanks.
Hello Rui,
I still have several questions about the code:
1: # strandFlipped from function "TU_prep"; if any experiment prepared first-in-pair read on reverse strand, then "strandFlipped" is TRUE (strandFlipped <- FALSE). Could you explain this?
2: mergeFeatures <<- input$merge_features # "protein_coding" or additional multi-exon transcript type, e.g. lincRNA. Based on your point, which option should I select? "protein_coding"?
3: mergeMethod <<- input$mergeMethod # by "exon" or "tu". Do you think "tu" is much better than "exon"? It seems that "tu" would generate much longer transcripts than "exon".
Best,