I'm relatively new to bioinformatics and need to generate coverage plots from RNA-seq results.
From genome-aligned RNA-seq results (TopHat alignments), I was able to generate a BED (or txt) file indicating which genomic locations the sequence reads came from, using the coveragebed command from bedtools. In this case, I selected exon regions only, for the purposes of my experiment.
The result file (a huge table, ~4 GB in size) was then imported into R using the fread function provided by data.table, which by default returns a data.table (a kind of data.frame).
To generate coverage plots for individual genes, I select one gene, "Actb" in this example, by its name in column 8 (V8). This is how the data is organized:
Actb.coverage <-["Actb"]
V8 V1 V2 V3 V4 V5 V6 V7 V9 V10 V11
1: Actb chr5 142903116 142903797 uc009ajk.1 utr3 0 - NM_007393 1 0
2: Actb chr5 142903116 142903797 uc009ajk.1 utr3 0 - NM_007393 2 0
3: Actb chr5 142903116 142903797 uc009ajk.1 utr3 0 - NM_007393 3 0
4: Actb chr5 142903116 142903797 uc009ajk.1 utr3 0 - NM_007393 4 0
--
1879: Actb chr5 142906652 142906724 uc009ajk.1 utr5 5 - NM_007393 70 0
1880: Actb chr5 142906652 142906724 uc009ajk.1 utr5 5 - NM_007393 71 0
1881: Actb chr5 142906652 142906724 uc009ajk.1 utr5 5 - NM_007393 72 0
Each row represents one nucleotide.
So, in this simplified table, the unlabeled first column is the row number; it shows 1881 rows in total (meaning that the Actb gene consists of 1881 exonic nucleotides).
The next column, V8, is the gene name. V1 to V3 are the chromosome ID and the start and stop positions of the feature described in columns V5 and V6 (e.g. utr3, 0 means the first 3' UTR exon).
V7 is (-), indicating that the gene lies on the minus strand, i.e. it runs 3' --> 5' along the genome.
V11 contains the read count generated by coveragebed at the given nucleotide (which is what I want). The values are 0 in this table because there was no coverage at the first four and the last three nucleotides shown here.
Question 1
Therefore, to generate a simple coverage plot, I can put the numbers 1 to 1881 on the x-axis and the corresponding V11 values on the y-axis, something like this:
plot(Actb.coverage[,V0], Actb.coverage[,V11]) but as you can see, the very first column has no name (there is no V0), so I need an alternative solution.
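One possible way around the missing V0 column is to build the x values from the number of rows. The sketch below uses a small made-up data.frame named actb as a stand-in for the real Actb subset (the name and the coverage values are assumptions for illustration only); the same idea should carry over to the full table.

```r
# Toy stand-in for the Actb subset; only the coverage column (V11) matters here.
# The coverage values are invented for illustration.
actb <- data.frame(
  V8  = "Actb",
  V11 = c(0, 0, 3, 8, 12, 9, 4, 0)
)

# Row numbers are not a real column, so create the x values explicitly.
x <- seq_len(nrow(actb))

# Line plot of coverage against position within the concatenated exons.
plot(x, actb$V11, type = "l",
     xlab = "Exonic position (row number)",
     ylab = "Read coverage (V11)",
     main = "Actb")
```

Since plot() uses the index as x when it is given a single numeric vector, plot(actb$V11, type = "l") would likely produce the same picture with less typing.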
Question 2
Once this method works, I would like to add more options.
Is it possible to subdivide the x-axis based on columns 5 (V5) and 6 (V6)? For example, the 1881 nucleotides would be subdivided into:
utr3(V5)-0(V6),
utr3-1
cds-0
cds-1
cds-2
.
.
.
utr5-0
utr5-1
utr5-2
utr5-3
utr5-4
utr5-5
Each feature's length is determined by simply subtracting the value in column V2 from the value in column V3.
The resulting plot should be identical to the plot in Question 1, but with these subdivided features marked along the x-axis.
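A minimal sketch of one way to mark the features along the x-axis, assuming the rows are already in the order they should be plotted and that each feature occupies one consecutive block of rows. The toy data.frame actb and its values are invented for illustration; rle() is used here to find where each V5-V6 label starts and ends, which is only one of several possible approaches.

```r
# Toy stand-in: three features in row order (all values made up).
actb <- data.frame(
  V5  = rep(c("utr3", "cds", "utr5"), times = c(4, 5, 3)),
  V6  = rep(c(0, 0, 5), times = c(4, 5, 3)),
  V11 = c(0, 2, 5, 7, 9, 12, 11, 8, 6, 3, 1, 0)
)

x <- seq_len(nrow(actb))

# One label per row, e.g. "utr3-0", built from V5 and V6.
feature <- paste(actb$V5, actb$V6, sep = "-")

# Consecutive runs of the same label give each feature's extent in rows.
runs   <- rle(feature)
ends   <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1
mids   <- (starts + ends) / 2

# Same coverage plot as before, but with the default x-axis suppressed.
plot(x, actb$V11, type = "l", xaxt = "n",
     xlab = "", ylab = "Read coverage (V11)")

# Dashed lines between features, and feature names centred under each block.
abline(v = ends[-length(ends)] + 0.5, lty = 2, col = "grey")
axis(1, at = mids, labels = runs$values, las = 2)
```

With many short features the labels may overlap; reducing the text size with cex.axis or labelling only selected features are possible adjustments.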
I feel like it should be possible, but I don't know how to achieve it. I would appreciate your help.
Thank you very much
gdy
plotting using row numbers vs. select column from data.frame in R (Coverage plot from RNA-seq data)