Array-based Genome Comparison of Arabidopsis Ecotypes Using Hidden Markov Models





Three-state HMM with Gaussian emission densities for the analysis of Array-CGH data. The states of the HMM are drawn as circles labeled '-' (decreased), '=' (unchanged), and '+' (increased); they model the copy number of DNA segments in an ecotype relative to a reference genome. Arrows represent the transitions between states and cover all possible transitions in an Array-CGH profile. Each state is characterized by a Gaussian emission density: the density of the unchanged state (gray) has a mean of about zero, whereas the densities of the decreased state (green) and the increased state (red) have means clearly different from zero.
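As a rough illustration of the model described in the caption above, the following Python sketch decodes a sequence of log-ratios with a three-state HMM with Gaussian emissions, using the Viterbi algorithm. The means, standard deviations, self-transition probability, and uniform start distribution are hypothetical values chosen only for illustration. They are not the parameters trained by the TilingArrayAnalyzer, and the tool itself may decode profiles differently.

```python
import math

STATES = ["-", "=", "+"]  # decreased, unchanged, increased

# Hypothetical parameters for illustration only (not trained values).
MEANS = {"-": -1.5, "=": 0.0, "+": 1.5}
SDS = {"-": 0.5, "=": 0.5, "+": 0.5}
STAY = 0.9  # assumed probability of remaining in the same state


def log_gauss(x, mean, sd):
    """Log density of a Gaussian with the given mean and standard deviation."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def log_trans(a, b):
    """Log transition probability; all transitions between states are allowed."""
    if a == b:
        return math.log(STAY)
    return math.log((1 - STAY) / (len(STATES) - 1))


def viterbi(log_ratios):
    """Return the most probable state label for each log-ratio."""
    if not log_ratios:
        return []
    # Uniform start distribution (an assumption).
    score = {
        s: math.log(1 / len(STATES)) + log_gauss(log_ratios[0], MEANS[s], SDS[s])
        for s in STATES
    }
    back = []  # back-pointers, one dict per position after the first
    for x in log_ratios[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: score[p] + log_trans(p, s))
            pointers[s] = best
            new_score[s] = (
                score[best] + log_trans(best, s) + log_gauss(x, MEANS[s], SDS[s])
            )
        back.append(pointers)
        score = new_score
    # Trace the best path backwards from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


path = viterbi([0.1, -0.2, 1.6, 1.4, 1.7, 0.0, -1.6, -1.4])
```

Here `path` holds one state label per tile, so runs of '-' or '+' correspond to candidate segments with decreased or increased copy number.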

Overview

Motivation






Example comparison of segmentation results for DNA regions on chromosome 4 of ecotype C24 compared to Columbia. From left to right, separated by gray dashed lines: Region 1 [654,108 bp - 697,518 bp], Region 2 [1,305,320 bp - 1,324,132 bp], Region 3 [3,731,013 bp - 3,761,229 bp], and Region 4 [5,411,025 bp - 5,433,126 bp]. The two top plots show the segmentation results of the HMM approach for the two interleaved arrays, Array 1 and Array 2. Green dots mark tiles predicted by the HMM to have a decreased copy number, red dots mark tiles predicted to have an increased copy number, and black dots mark tiles predicted to have an unchanged copy number. Blue dashed lines highlight DNA segments that differ significantly from permuted data at a Score-value threshold of 0.01. The two bottom plots show the segmentation results of the segMNT algorithm for both arrays. Red dashed lines indicate that no segmentation was obtained. The two approaches yield quite different segmentations of these DNA regions. Here, the segMNT algorithm failed to identify segments with decreased or increased copy numbers, whereas the HMM approach clearly identifies segments with significantly decreased or increased copy numbers. In addition, these biologically interesting results are reproducible on both arrays.

Downloads


Case Study


  1. Download the TilingArrayAnalyzer

  2. Unpack the file TilingArrayAnalyzer.tar.gz

  3. You obtain the folder TilingArrayAnalyzer containing

  4. Here we analyze the data set 6486702.txt of ecotype C24 compared to Columbia stored in the directory TilingArrayAnalyzer/RawData

Training

Scoring

General Usage


Chr     Pos     Log-Ratio
chr1    108     1.87
chr1    216     -0.25
chr1    324     1.37
chr2    44      -1.66
chr2    88      0.66
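The three-column layout shown under General Usage (chromosome, position, log-ratio) could be read with a few lines of Python. This is only a sketch: it assumes a tab-separated file with a single header row, which the page does not state, and the function name `read_log_ratios` is hypothetical and not part of the TilingArrayAnalyzer.

```python
import csv
import io
from collections import defaultdict


def read_log_ratios(handle, delimiter="\t"):
    """Group (position, log-ratio) pairs by chromosome.

    Assumes one header row followed by rows of the form
    Chr, Pos, Log-Ratio; the tab delimiter is an assumption.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    next(reader, None)  # skip the header row
    profiles = defaultdict(list)
    for row in reader:
        if not row:
            continue  # ignore blank lines
        chrom, pos, ratio = row[:3]
        profiles[chrom].append((int(pos), float(ratio)))
    # Keep tiles in chromosomal order within each chromosome.
    for tiles in profiles.values():
        tiles.sort()
    return dict(profiles)


# The example rows from the table, written as tab-separated text.
example = (
    "Chr\tPos\tLog-Ratio\n"
    "chr1\t108\t1.87\n"
    "chr1\t216\t-0.25\n"
    "chr1\t324\t1.37\n"
    "chr2\t44\t-1.66\n"
    "chr2\t88\t0.66\n"
)
profiles = read_log_ratios(io.StringIO(example))
```

Grouping by chromosome keeps each Array-CGH profile separate, so that a segmentation method can be applied to one chromosome at a time.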


© Michael Seifert April 2009