Example I: arrayCGH and exon array data
In this first example, we illustrate how different genomic datasets can be visualized together in an integrated GenomeGraphs graphic. We use arrayCGH and Affymetrix exon array data and plot these together with genomic annotation from Ensembl.
We first load the GenomeGraphs package and one of its example datasets. This dataset contains copy number data and segmented copy number data, as well as exon array data for a small genomic region. Once the data are loaded, a gdObject is created for each data type, namely a Segmentation object containing the copy number segments, a GenericArray object containing the raw copy number data, an Ideogram object representing the relevant chromosome we are plotting, a GenericArray object containing the exon array data, and a GenomeAxis object for the genomic coordinate axis.
> library(GenomeGraphs)
> data('exampleData', package='GenomeGraphs')
> seg = makeSegmentation(value = segments,
start = segStart, end = segEnd, dp = DisplayPars(color = 'dodgerblue2', lwd = 2, lty = 'dashed'))
> copyNumber = makeGenericArray(intensity = cn, probeStart = probestart,
segmentation = seg, dp = DisplayPars(size = 3, color = 'seagreen', type="dot"))
> ideogram = makeIdeogram(chromosome = 3)
> expression = makeGenericArray(intensity = intensity, probeStart = exonProbePos,
dp = DisplayPars(color='darkred', type='point'))
> genomeAxis = makeGenomeAxis(add53 = TRUE, add35 = TRUE)
In a next step, genomic annotation information is retrieved on-line from Ensembl using the biomaRt package. We first connect to the Ensembl BioMart database and select the human (hsapiens) dataset. Then, we retrieve gene structures on the forward and reverse strands of the region we want to visualize.
> minbase = 180292097
> maxbase = 180492096
> mart = useMart('ensembl', dataset='hsapiens_gene_ensembl')
> genesplus = makeGeneRegion(start = minbase, end = maxbase, strand = '+', chromosome = '3', biomart = mart)
> genesmin = makeGeneRegion(start = minbase, end = maxbase, strand = '-', chromosome = '3', biomart = mart)
In a last step, the gdPlot function is called to plot instances of gdObject that were created above. The objects are given to gdPlot as a list and the order in the list will determine the plotting order from top to bottom. A minimum and maximum base position are also given as arguments to restrict the visualization to this particular genomic region. The plot produced from this example is shown in Figure 1.
> gdPlot(list(ideogram, expression, copyNumber, genesplus, genomeAxis, genesmin), minBase = minbase, maxBase = maxbase)
Example II: Transcript isoforms and exon array data
In a second example, we show how probe-level exon array data from the Affymetrix GeneChip® Human Exon 1.0 ST platform (data available from http://www.affymetrix.com), can be plotted along with gene models from Affymetrix as well as gene and transcript annotation from Ensembl. The data of the exon array are not plotted at the exact chromosomal location of the probes in order to clearly visualize alternative splicing events. Most of the exons are represented on the Human Exon 1.0 ST platform by four probes. The location of these four probes are equally spaced in the data plots. Each exon is separated by a vertical line and the exons are linked to their genomic location by connecting lines. This visualization makes it easy to relate alternative exon usage, as observed in the exon array data, to known alternative transcript isoforms in Ensembl (Figure 2). The region highlighted in the plot shows the exon that is not expressed in the samples. To generate this plot, we first create the different subclasses of gdObject, namely: Title, ExonArray, Gene, Transcript, and Legend objects. In addition, we make a custom annotation track using the AnnotationTrack class.
> data('unrData', package='GenomeGraphs')
> title = makeTitle(text ='ENSG00000009307', color = 'darkred')
> col = colorRampPalette(c('firebrick2','dodgerblue2'))(length(unrData[1,]))
> exon = makeExonArray(intensity = unrData, probeStart = unrPositions[,3], probeEnd = unrPositions[,4],
probeId = as.character(unrPositions[,1), nProbes = unrNProbes,
dp = DisplayPars(color = col, mapColor = 'dodgerblue2'), displayProbesets = FALSE)
> affyModel <- makeAnnotationTrack(start = unrPositions[,3], end = unrPositions[,4],
feature = "gene_model", group = "ENSG00000009307",
dp = DisplayPars(gene_model = "darkblue"))
> gene = makeGene(id = 'ENSG00000009307', biomart = mart)
> transcript = makeTranscript(id ='ENSG00000009307', biomart = mart)
>legend = makeLegend(text = c('affyModel','Ensembl Gene', 'Ensembl Transcript'),
fill = c('darkgreen','orange','cornflowerblue'), cex = 0.5)
In a second step, we use the RectangleOverlay class to create a highlighted region followed by the gdPlot function to produce the integrated plot.
> rOverlay = makeRectangleOverlay(start = 115085100, end = 115086500, region = c(3,5),
dp = DisplayPars(alpha = .2, fill = "olivedrab1"))
> gdPlot(list(title, exon, affyModel, gene, transcript, legend), minBase = 115061061, maxBase = 115102147, overlay = rOverlay)
The plot generated in this second example is shown in Figure 2.
Example III: Short read sequencing and tiling array data
In the final example, we show how complex and diverse sets of data can be integrated to facilitate joint analysis and draw biological conclusions by presenting data from various published datasets on yeast. First, we construct a list where each gdObject represents either annotation or a publicly available dataset. We have plotted data from Ensembl, an Illumina sequencing dataset [9], Affymetrix tiling array data [10], nucleosome position data [11], and conservation data across 7 related species [12].
> data("seqDataEx", package = "GenomeGraphs")
> str = seqDataEx$david [,"strand"] == 1
> biomart = useMart("ensembl", "scerevisiae_gene_ensembl")
> pList = list("-" = makeGeneRegion(chromosome = "IV", start = 1300000, end = 1310000,
strand = "-", biomart = biomart,
dp = DisplayPars(plotId = TRUE, idRotation = 0, cex = .5)),
makeGenomeAxis(dp = DisplayPars(byValue = 1e3, size = 3)),
"+" = makeGeneRegion(chromosome = "IV", start = 1300000, end = 1310000,
strand = "+", biomart = biomart,
dp = DisplayPars(plotId = TRUE, idRotation = 0, cex = .5)),
"Nagalakshmi" = makeBaseTrack(base = seqDataEx$snyder [, "location"], value = seqDataEx$snyder [, "counts"],
dp = DisplayPars(lwd = .3, color = "darkblue", ylim = c(0,300))),
"David +" = makeGenericArray(probeStart = seqDataEx$david [str, "location"],
intensity = seqDataEx$david [str, "expr", drop = FALSE],
dp = DisplayPars(pointSize = .5)),
"David -" = makeGenericArray(probeStart = seqDataEx$david [!str, "location"],
intensity = seqDataEx$david [!str, "expr", drop = FALSE],
dp = DisplayPars(color = "darkgreen", pointSize = .5)),
"Lee" = makeBaseTrack(base = seqDataEx$nislow [, "location"],
value = seqDataEx$nislow [, "evalue"], dp = DisplayPars(color="grey", lwd = .25)),
"Conservation" = makeBaseTrack(base = seqDataEx$conservation [, "location"],
value = seqDataEx$conservation [, "score"],
dp = DisplayPars(color="gold4", lwd = .25)))
Having constructed the list of elements we wish to plot, we now set up an overlay, using the RectangleOverlay class, to highlight a region of interest. Finally, we plot the result using gdPlot. Although configuring and designing the initial plot may seem laborious, once we have this basic structure we can easily produce plots for all regions of interest.
> rOverlay = makeRectangleOverlay(start = 1302105, end = 1302190, region = c(4,8), dp = DisplayPars(alpha = .2))
> gdPlot(pList, minBase = 1301500, maxBase = 1302500, overlay = rOverlay)
The plot produced in this third example is shown in Figure 3.