
some questions #2

Open
BenxiaHu opened this issue Dec 6, 2022 · 5 comments

BenxiaHu commented Dec 6, 2022

Hello Rui,
I still have several questions about the code:
1: # strandFlipped from function "TU_prep"; if any experiment prepared first-in-pair read on reverse strand, then "strandFlipped" is TRUE (strandFlipped <- FALSE). Could you explain this?
2: mergeFeatures <<- input$merge_features # "protein_coding" or additional multi-exon transcript type, e.g. lincRNA. In your view, which option should I select? "protein_coding"?
3: mergeMethod <<- input$mergeMethod # by "exon" or "tu". Do you think "tu" is much better than "exon"? It seems that "tu" would generate much longer transcripts than "exon".

Best,

Owner
shaorray commented Dec 9, 2022

Hi Benxia,

I think you can start a run directly after initializing the input parameters from line 574.

The functions were wrapped from the original R code, and they behave pretty much the same if you set mergeMethod to "tu".

Best, Rui

BenxiaHu changed the title from "code" to "na" on Dec 9, 2022
Author
BenxiaHu commented Apr 18, 2023

Thanks a lot.

BenxiaHu changed the title from "na" to "some question" on Apr 18, 2023
BenxiaHu changed the title from "some question" to "some questions" on Apr 18, 2023
Author
BenxiaHu commented Apr 18, 2023

Hello Rui,
I still have several questions about the code:
1: # strandFlipped from function "TU_prep"; if any experiment prepared first-in-pair read on reverse strand, then "strandFlipped" is TRUE (strandFlipped <- FALSE). Could you explain this?

2: mergeFeatures <<- input$merge_features # "protein_coding" or additional multi-exon transcript type, e.g. lincRNA. In your view, which option should I select? "protein_coding"?

3: mergeMethod <<- input$mergeMethod # by "exon" or "tu". Do you think "tu" is much better than "exon"? It seems that "tu" would generate much longer transcripts than "exon".

4: TSS_len <<- input$L_TSS # default 1000
Promoter_len <<- input$L_Promoter # default 1000
TTS_len <<- input$L_TTS # default 1000
What is the difference between TSS_len and Promoter_len?

5: It seems this code can only analyze nascent RNA-seq data from human, not other species.

Best,

@shaorray
Owner

Hi Benxia,

  1. "strandFlipped" is TRUE (strandFlipped <- FALSE). Could you explain this?

Different library preparation methods (home-made or commercial kits, e.g. TruSeq) have inconsistent strandedness. Here, for TT-seq with the Ovation kit, the first reads are on the forward strand. In some experiments they might be on the reverse strand, which causes trouble for TU annotation unless the strandedness of each BAM file is carefully specified. I added this boolean, "strandFlipped", which is set by comparing TU and gene strands, to handle BAM files from different sources.
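
As a rough illustration of the idea (not the app's actual implementation), a strand-flip check could compare the strand of each TU with the strand of the annotated gene it overlaps, and flag the library as flipped when most pairs disagree. The function name `infer_strand_flipped`, its input vectors, and the 0.5 cutoff are all assumptions made for this sketch.

```r
# Hypothetical helper: decide whether a library is strand-flipped by
# comparing TU strands with the strands of the genes they overlap.
# `tu_strand` and `gene_strand` are parallel character vectors ("+" or "-"),
# one element per overlapping TU-gene pair. The 0.5 cutoff is an assumption.
infer_strand_flipped <- function(tu_strand, gene_strand, cutoff = 0.5) {
  stopifnot(length(tu_strand) == length(gene_strand),
            length(tu_strand) > 0)
  # Fraction of pairs where the TU lies on the opposite strand of its gene
  opposite_frac <- mean(tu_strand != gene_strand)
  opposite_frac > cutoff
}

# Library where TUs mostly agree with the gene strands (1 of 4 pairs differs)
infer_strand_flipped(c("+", "-", "+", "+"), c("+", "-", "+", "-"))

# Library where TUs mostly disagree with the gene strands (3 of 4 pairs differ)
infer_strand_flipped(c("-", "+", "-", "-"), c("+", "-", "+", "-"))
```

In practice the strand pairs would come from overlapping the called TUs with the gene annotation; that step is left out here to keep the sketch in base R.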

  2. Which option should I select? "protein_coding"?

"protein_coding" (mRNA) transcripts are multi-exon, as are "lincRNA" transcripts. I would recommend using both "protein_coding" and "lincRNA".

  3. Do you think "tu" is much better than "exon"?

It depends on the purpose: whether the aim is to study transcription or the RNA itself. If the goal is to capture the entire transcription cycle, the "tu" method is better (as in the first TT-seq paper). If the aim is to find non-coding RNA species around a coding region, the "exon" method is more suitable. I added an extra step that clips the downstream sense RNA (in the termination window) off the coding region, since it has a different synthesis rate and confounds gene-level synthesis estimation. Downstream antisense ncRNAs are then named according to gene regions (by exon) rather than by the entire TU.

  4. What is the difference between TSS_len and Promoter_len?

TSS_len covers the region downstream of the TSS, (0, TSS_len]. Promoter_len covers the upstream region, (-Promoter_len, 0].
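
To make the two windows concrete, here is a minimal sketch (an illustration, not code from the app) that converts the intervals above into genomic coordinates for a single TSS. It assumes 1-based coordinates, and that "upstream" and "downstream" follow the direction of transcription, so the windows are mirrored on the minus strand. `tss_windows` and its arguments are hypothetical names; only the defaults of 1000 come from the settings quoted in this thread.

```r
# Hypothetical helper: genomic start/end of the promoter and TSS windows
# for one TSS position, given its strand.
tss_windows <- function(tss, strand, TSS_len = 1000, Promoter_len = 1000) {
  stopifnot(strand %in% c("+", "-"))
  # Relative offsets along the direction of transcription
  promoter_off <- c(-Promoter_len + 1, 0)  # (-Promoter_len, 0]
  tss_off      <- c(1, TSS_len)            # (0, TSS_len]
  # Offsets are added on the "+" strand and subtracted on the "-" strand
  sgn <- if (strand == "+") 1 else -1
  list(
    promoter   = sort(tss + sgn * promoter_off),
    tss_region = sort(tss + sgn * tss_off)
  )
}

tss_windows(5000, "+")  # promoter 4001-5000, TSS region 5001-6000
tss_windows(5000, "-")  # promoter 5000-5999, TSS region 4000-4999
```

Under these assumptions the two windows never overlap, and the TSS position itself falls in the promoter window because that interval is closed at 0.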

  5. not other species

It can actually take any gene annotation file that matches the BAM files, but I haven't fully tested all versions (Ensembl, GENCODE, etc.). Please let me know if there is any issue.

I agree that developing an R package is a good option, although in my opinion a graphical interface is also nice for reproducible research. This local Shiny app is computationally intensive, but the main issue is the downloader's limited permissions across platforms (it sometimes does not work on Ubuntu).

Best,

Rui

Author
BenxiaHu commented Apr 20, 2023

Thanks.
The Shiny app cannot handle large BAM files (>40 GB).
Is it possible to take a bigWig file as input instead, since it is much smaller than a BAM file?
