GenomeGraphs: integrated genomic data visualization with R (2024)

Example I: arrayCGH and exon array data

In this first example, we illustrate how different genomic datasets can be visualized together in an integrated GenomeGraphs graphic. We use arrayCGH and Affymetrix exon array data and plot these together with genomic annotation from Ensembl.

We first load the GenomeGraphs package and one of its example datasets. This dataset contains copy number data and segmented copy number data, as well as exon array data for a small genomic region. Once the data are loaded, a gdObject is created for each data type, namely a Segmentation object containing the copy number segments, a GenericArray object containing the raw copy number data, an Ideogram object representing the relevant chromosome we are plotting, a GenericArray object containing the exon array data, and a GenomeAxis object for the genomic coordinate axis.

> library(GenomeGraphs)

> data('exampleData', package='GenomeGraphs')

> seg = makeSegmentation(value = segments,

start = segStart, end = segEnd, dp = DisplayPars(color = 'dodgerblue2', lwd = 2, lty = 'dashed'))

> copyNumber = makeGenericArray(intensity = cn, probeStart = probestart,

segmentation = seg, dp = DisplayPars(size = 3, color = 'seagreen', type="dot"))

> ideogram = makeIdeogram(chromosome = 3)

> expression = makeGenericArray(intensity = intensity, probeStart = exonProbePos,

dp = DisplayPars(color='darkred', type='point'))

> genomeAxis = makeGenomeAxis(add53 = TRUE, add35 = TRUE)

In a next step, genomic annotation information is retrieved on-line from Ensembl using the biomaRt package. We first connect to the Ensembl BioMart database and select the human (hsapiens) dataset. Then, we retrieve gene structures on the forward and reverse strands of the region we want to visualize.

> minbase = 180292097

> maxbase = 180492096

> mart = useMart('ensembl', dataset='hsapiens_gene_ensembl')

> genesplus = makeGeneRegion(start = minbase, end = maxbase, strand = '+', chromosome = '3', biomart = mart)

> genesmin = makeGeneRegion(start = minbase, end = maxbase, strand = '-', chromosome = '3', biomart = mart)

In a last step, the gdPlot function is called to plot instances of gdObject that were created above. The objects are given to gdPlot as a list and the order in the list will determine the plotting order from top to bottom. A minimum and maximum base position are also given as arguments to restrict the visualization to this particular genomic region. The plot produced from this example is shown in Figure 1.

ArrayCGH and exon array data. The first track in this figure shows an ideogram of the human chromosome 3. The red marker highlights the plotted genomic region. The second track shows exon array data, where each data point corresponds to a probe measuring the expression level of an exon. The third track displays copy number data in green and segmented copy number data with dashed blue lines. Note the amplification which can be seen in both the copy number and exon array tracks, suggesting that the amplification event results in higher expression levels of the gene in this region. The bottom track shows the gene annotation data from Ensembl.

Full size image

> gdPlot(list(ideogram, expression, copyNumber, genesplus, genomeAxis, genesmin), minBase = minbase, maxBase = maxbase)

Example II: Transcript isoforms and exon array data

In a second example, we show how probe-level exon array data from the Affymetrix GeneChip® Human Exon 1.0 ST platform (data available from http://www.affymetrix.com), can be plotted along with gene models from Affymetrix as well as gene and transcript annotation from Ensembl. The data of the exon array are not plotted at the exact chromosomal location of the probes in order to clearly visualize alternative splicing events. Most of the exons are represented on the Human Exon 1.0 ST platform by four probes. The location of these four probes are equally spaced in the data plots. Each exon is separated by a vertical line and the exons are linked to their genomic location by connecting lines. This visualization makes it easy to relate alternative exon usage, as observed in the exon array data, to known alternative transcript isoforms in Ensembl (Figure 2). The region highlighted in the plot shows the exon that is not expressed in the samples. To generate this plot, we first create the different subclasses of gdObject, namely: Title, ExonArray, Gene, Transcript, and Legend objects. In addition, we make a custom annotation track using the AnnotationTrack class.

Transcript isoforms and exon array data. Probe-level exon array data is plotted in the top graphic. The data of the exon array is intentionally not plotted on the exact chromosomal location of the probes in order to clearly visualize alternative splicing events. Each line in the top track represents a different sample. Usually, there are four probes per exon on the Affymetrix GeneChip® Human Exon 1.0 ST platform, vertical gray lines group these four probes belonging to the same exon together. The blue connecting lines map these exons to gene models as defined by Affymetrix (green) and Ensembl (orange). Transcript isoforms known for this gene are plotted in dark blue. The region highlighted in the plot by an RectangleOverlay object shows the exon that is not expressed in the samples. One can see that this is a known alternatively spliced exon as annotated by Ensembl.

Full size image

> data('unrData', package='GenomeGraphs')

> title = makeTitle(text ='ENSG00000009307', color = 'darkred')

> col = colorRampPalette(c('firebrick2','dodgerblue2'))(length(unrData[1,]))

> exon = makeExonArray(intensity = unrData, probeStart = unrPositions[,3], probeEnd = unrPositions[,4],

probeId = as.character(unrPositions[,1), nProbes = unrNProbes,

dp = DisplayPars(color = col, mapColor = 'dodgerblue2'), displayProbesets = FALSE)

> affyModel <- makeAnnotationTrack(start = unrPositions[,3], end = unrPositions[,4],

feature = "gene_model", group = "ENSG00000009307",

dp = DisplayPars(gene_model = "darkblue"))

> gene = makeGene(id = 'ENSG00000009307', biomart = mart)

> transcript = makeTranscript(id ='ENSG00000009307', biomart = mart)

>legend = makeLegend(text = c('affyModel','Ensembl Gene', 'Ensembl Transcript'),

fill = c('darkgreen','orange','cornflowerblue'), cex = 0.5)

In a second step, we use the RectangleOverlay class to create a highlighted region followed by the gdPlot function to produce the integrated plot.

> rOverlay = makeRectangleOverlay(start = 115085100, end = 115086500, region = c(3,5),

dp = DisplayPars(alpha = .2, fill = "olivedrab1"))

> gdPlot(list(title, exon, affyModel, gene, transcript, legend), minBase = 115061061, maxBase = 115102147, overlay = rOverlay)

The plot generated in this second example is shown in Figure 2.

Example III: Short read sequencing and tiling array data

In the final example, we show how complex and diverse sets of data can be integrated to facilitate joint analysis and draw biological conclusions by presenting data from various published datasets on yeast. First, we construct a list where each gdObject represents either annotation or a publicly available dataset. We have plotted data from Ensembl, an Illumina sequencing dataset [9], Affymetrix tiling array data [10], nucleosome position data [11], and conservation data across 7 related species [12].

> data("seqDataEx", package = "GenomeGraphs")

> str = seqDataEx$david [,"strand"] == 1

> biomart = useMart("ensembl", "scerevisiae_gene_ensembl")

> pList = list("-" = makeGeneRegion(chromosome = "IV", start = 1300000, end = 1310000,

strand = "-", biomart = biomart,

dp = DisplayPars(plotId = TRUE, idRotation = 0, cex = .5)),

makeGenomeAxis(dp = DisplayPars(byValue = 1e3, size = 3)),

"+" = makeGeneRegion(chromosome = "IV", start = 1300000, end = 1310000,

strand = "+", biomart = biomart,

dp = DisplayPars(plotId = TRUE, idRotation = 0, cex = .5)),

"Nagalakshmi" = makeBaseTrack(base = seqDataEx$snyder [, "location"], value = seqDataEx$snyder [, "counts"],

dp = DisplayPars(lwd = .3, color = "darkblue", ylim = c(0,300))),

"David +" = makeGenericArray(probeStart = seqDataEx$david [str, "location"],

intensity = seqDataEx$david [str, "expr", drop = FALSE],

dp = DisplayPars(pointSize = .5)),

"David -" = makeGenericArray(probeStart = seqDataEx$david [!str, "location"],

intensity = seqDataEx$david [!str, "expr", drop = FALSE],

dp = DisplayPars(color = "darkgreen", pointSize = .5)),

"Lee" = makeBaseTrack(base = seqDataEx$nislow [, "location"],

value = seqDataEx$nislow [, "evalue"], dp = DisplayPars(color="grey", lwd = .25)),

"Conservation" = makeBaseTrack(base = seqDataEx$conservation [, "location"],

value = seqDataEx$conservation [, "score"],

dp = DisplayPars(color="gold4", lwd = .25)))

Having constructed the list of elements we wish to plot, we now set up an overlay, using the RectangleOverlay class, to highlight a region of interest. Finally, we plot the result using gdPlot. Although configuring and designing the initial plot may seem laborious, once we have this basic structure we can easily produce plots for all regions of interest.

> rOverlay = makeRectangleOverlay(start = 1302105, end = 1302190, region = c(4,8), dp = DisplayPars(alpha = .2))

> gdPlot(pList, minBase = 1301500, maxBase = 1302500, overlay = rOverlay)

The plot produced in this third example is shown in Figure 3.

Short read sequencing and tiling array data. Data plotted are Illumina sequencing data from Nagalakshmi et al. [9], tiling array data from David et al. [10], nucleosome data from Lee et al. [11], and conservation track data from Siepel et al. [12]. The semi-transparent box highlights a possible annotation error in SGD, as suggested by the occurrence of the transcript in multiple separate datasets. In addition, the conservation track data demonstrate corroborating evidence for the possibility of a longer gene.

Full size image
GenomeGraphs: integrated genomic data visualization with R (2024)
Top Articles
Latest Posts
Article information

Author: Kareem Mueller DO

Last Updated:

Views: 6127

Rating: 4.6 / 5 (66 voted)

Reviews: 81% of readers found this page helpful

Author information

Name: Kareem Mueller DO

Birthday: 1997-01-04

Address: Apt. 156 12935 Runolfsdottir Mission, Greenfort, MN 74384-6749

Phone: +16704982844747

Job: Corporate Administration Planner

Hobby: Mountain biking, Jewelry making, Stone skipping, Lacemaking, Knife making, Scrapbooking, Letterboxing

Introduction: My name is Kareem Mueller DO, I am a vivacious, super, thoughtful, excited, handsome, beautiful, combative person who loves writing and wants to share my knowledge and understanding with you.