Wednesday, 27 August 2014

Plotting row numbers vs. a selected column of a data.frame in R (coverage plot from RNA-seq data)


I'm relatively new to bioinformatics and need to generate coverage plots from RNA-seq results.


From genome-aligned RNA-seq results (alignments), I was able to generate a BED (or txt) file indicating which genomic location the sequence reads came from, using commands from . In this case, I selected exon regions only, for the purposes of my experiment.


The resulting file (a huge table, ~4 GB) was then imported into R as a "data.frame", using a function provided by .


To generate coverage plots of individual genes, I search column 8 (V8) for a gene, here "Actb" as an example, and this is how the data is organized:


Actb.coverage <-["Actb"]




        V8   V1        V2        V3         V4   V5 V6 V7        V9 V10 V11
    1: Actb chr5 142903116 142903797 uc009ajk.1 utr3  0  - NM_007393   1   0
    2: Actb chr5 142903116 142903797 uc009ajk.1 utr3  0  - NM_007393   2   0
    3: Actb chr5 142903116 142903797 uc009ajk.1 utr3  0  - NM_007393   3   0
    4: Actb chr5 142903116 142903797 uc009ajk.1 utr3  0  - NM_007393   4   0
   --
 1879: Actb chr5 142906652 142906724 uc009ajk.1 utr5  5  - NM_007393  70   0
 1880: Actb chr5 142906652 142906724 uc009ajk.1 utr5  5  - NM_007393  71   0
 1881: Actb chr5 142906652 142906724 uc009ajk.1 utr5  5  - NM_007393  72   0



Each row represents one nucleotide.


So, in this simplified table, the 0th column (with no label) shows that there are 1881 rows in total (meaning that the Actb gene consists of 1881 exon nucleotides).


The next column, V8, is the gene name. V1 to V3 give the chromosome ID and the start and stop sites of the feature identified by columns V5 and V6 (e.g. utr3, 0 means the first 3'UTR exon).


V7 is (-), indicating that the gene runs in the 3' --> 5' direction in the genome.


The V11 column contains the read count at each nucleotide (which is what I want). The values are 0 in this table because there was no coverage at the first four and the last three nucleotides shown here.


Question 1


To generate a simple coverage plot, I could therefore put the numbers 1 to 1881 on the x-axis and the corresponding V11 values on the y-axis, something like this:


plot(Actb.coverage[,V0], Actb.coverage[,V11])

However, as you can see, the very first column has no name (there is no V0), so I need an alternative solution.
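One possible way around the missing column name, sketched under assumptions: the unlabeled first column is only the row number that R prints, so it can be generated with seq_len(nrow(...)) instead of being selected. The toy table below is hypothetical (12 rows instead of 1881, made-up counts) and is built as a plain data.frame; `$V11` should select the column in the same way if the real object is a data.table.

```r
# Toy stand-in for the Actb subset (hypothetical values; the real table has 1881 rows).
Actb.coverage <- data.frame(
  V8  = "Actb",
  V5  = rep(c("utr3", "cds", "utr5"), times = c(4, 5, 3)),
  V6  = 0,
  V11 = c(0, 0, 2, 5, 7, 9, 8, 6, 3, 1, 0, 0)
)

# The "0th column" is just the row number, so create it instead of selecting it.
x <- seq_len(nrow(Actb.coverage))
y <- Actb.coverage$V11

# One vertical bar per nucleotide.
plot(x, y, type = "h",
     xlab = "Position within Actb exons (nt)",
     ylab = "Read count (V11)")
```

Here type = "h" draws one bar per nucleotide; type = "l" would draw a connected line instead.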


Question 2


Once this method works, I would like to add more options.


Is it possible to subdivide the x-axis based on columns 5 (V5) and 6 (V6)? For example, the 1881-nucleotide length would be subdivided into:

utr3-0 (V5-V6)
utr3-1
cds-0
cds-1
cds-2
.
.
.
utr5-0
utr5-1
utr5-2
utr5-3
utr5-4
utr5-5


Each feature's length is obtained by subtracting the value in column V2 from the value in column V3.
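The subtraction can be sketched as follows, assuming a plain data.frame with the same column names. The coordinates are made up, and the names feat and feat0 are hypothetical helpers. The sketch keeps one row per feature, takes V3 minus V2 as its length, and turns the running total into the first and last row number of each feature.

```r
# Hypothetical features with made-up coordinates.
feat0 <- data.frame(
  V2 = c(100, 200, 300),             # feature start
  V3 = c(104, 205, 303),             # feature stop
  V5 = c("utr3", "cds", "utr5"),
  V6 = c(0, 0, 0)
)

# One row per nucleotide, as in the table from the question.
Actb.coverage <- feat0[rep(seq_len(nrow(feat0)), times = feat0$V3 - feat0$V2), ]

# Collapse back to one row per feature, keeping the order of first appearance.
feat <- unique(Actb.coverage[, c("V2", "V3", "V5", "V6")])

feat$length <- feat$V3 - feat$V2             # nucleotides per feature
feat$end    <- cumsum(feat$length)           # last row number of each feature
feat$start  <- feat$end - feat$length + 1    # first row number of each feature
print(feat)
```

This relies on the rows being ordered feature by feature, as they appear to be in the excerpt above.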


The resulting plot should be identical to the plot in Question 1, but with those subdivided features marked along the x-axis.
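A minimal base-graphics sketch of such a plot, under the same assumptions (a hypothetical toy data.frame, rows of each feature stored contiguously): rle() finds the consecutive runs of each V5-V6 label, abline() marks the borders between features, and axis() writes the labels at the centre of each feature in place of the numeric x-axis.

```r
# Toy stand-in for the Actb subset (hypothetical values).
Actb.coverage <- data.frame(
  V5  = rep(c("utr3", "cds", "utr5"), times = c(4, 5, 3)),
  V6  = 0,
  V11 = c(0, 0, 2, 5, 7, 9, 8, 6, 3, 1, 0, 0)
)

x     <- seq_len(nrow(Actb.coverage))
label <- paste(Actb.coverage$V5, Actb.coverage$V6, sep = "-")

runs <- rle(label)                        # consecutive runs of identical labels
ends <- cumsum(runs$lengths)              # last row number of each feature
mids <- ends - runs$lengths / 2 + 0.5     # centre of each feature on the x-axis

# Same plot as before, but without the default numeric x-axis.
plot(x, Actb.coverage$V11, type = "h", xaxt = "n",
     xlab = "", ylab = "Read count (V11)")

# Dashed lines between neighbouring features, labels at feature centres.
abline(v = ends[-length(ends)] + 0.5, lty = 2, col = "grey50")
axis(1, at = mids, labels = runs$values, las = 2, tick = FALSE)
```

With many short features the labels may overlap; a smaller cex.axis or labelling only selected features are possible adjustments.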


I feel like it should be possible, but I don't know how to achieve this. I would appreciate your help.


Thank you very much


gdy








