|
Utilizing promoter pair orientations for HMM-based analysis of ChIP-chip data |
Supplementary information for our paper 'Utilizing promoter pair orientations for HMM-based analysis of ChIP-chip data' by Michael Seifert, Jens Keilwagen, Marc Strickert, and Ivo Grosse, presented at GCB 2008 in Dresden.
|
|
|
|
|
Fig. 1: Exemplary Hidden Markov Model with scaled transition matrices (SHMM) for the analysis of ChIP-chip profiles. The two states + and - model the potential of a promoter to be a target (+) or not to be a target (-) of the studied transcription factor. Each state is characterized by its Gaussian emission distribution. The arrows represent transitions between the ChIP-chip measurements of adjacent promoters on the DNA. Thick arrows characterize adjacent promoters in head-head orientation, and thin arrows model adjacent promoters that are not in head-head orientation. |
||
|
Overview |
Seed development of Arabidopsis thaliana as model system for the comparison of different methods of ChIP-chip data analysis
ChIPchipAnalyzer: A JAR-file for analyzing ChIP-chip data
Log-Fold-Change analysis (LFC)
Hidden Markov Model (HMM)
Hidden Markov Model with scaled transition matrices (SHMM)
ABI3 ChIP-chip data set
Shows how to use the ChIPchipAnalyzer in combination with the ABI3 data set.
Shows how a data set must be structured to apply the ChIPchipAnalyzer.
|
Motivation |
ChIP-chip is still far from being routinely used for Arabidopsis thaliana
ChIP-chip based on promoter arrays was established for the transcription factor ABI3 in the trilateral project ArabidoSeed
ABI3 is a fundamental regulator of seed development
Knowledge of ABI3 target genes is important for a better understanding of seed development
One of the bioinformatics tasks is to provide tools for the detection of these target genes
Log-Fold Change analysis (LFC)
Two-State Hidden Markov Model (HMM) modeling chromosomal locations of promoters
Two-State Hidden Markov Model with scaled transition matrices (SHMM) modeling chromosomal locations of promoters and promoter pair orientations
For the ABI3 ChIP-chip data, the SHMM is the more general model and should be preferred for the detection of ABI3 target genes (validated by independent Genevestigator expression data and transient assays)
We conjecture that SHMMs may also be useful for the analysis of other promoter array ChIP-chip data
|
Downloads |
ChIPchipAnalyzer (231 KB)
Implements LFC, HMMs, and SHMMs to analyze ChIP-chip data.
Requires Java 1.5 or higher.
Usage information
java -jar ChIPchipAnalyzer.jar
ABI3 data set (1.2 MB; download currently not available because the work on a biological paper is still in progress)
ABI3 experiments that we used in our case study.
|
Case Study |
Download the ChIPchipAnalyzer and the ABI3 data set to a folder CaseStudy on your system.
Unpack the ABI3 data set
Make sure that Java has enough heap space, e.g. by passing the JVM options -Xms256m -Xmx256m
Log-Fold-Change Analysis
Go to the directory CaseStudy
Start the ChIPchipAnalyzer
java -jar ChIPchipAnalyzer.jar -LFC -output true -dataSets 061220ABI3.txt 070222ABI3.txt 070223ABI3.txt 070420ABI3.txt 070427ABI3.txt
You obtain ranking lists in the directory CaseStudy based on decreasing log-ratios for each experiment.
061220ABI3.txt_ScoringLFC.txt
070222ABI3.txt_ScoringLFC.txt
070223ABI3.txt_ScoringLFC.txt
070420ABI3.txt_ScoringLFC.txt
070427ABI3.txt_ScoringLFC.txt
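The ranking idea behind the LFC analysis can be illustrated in a few lines of Python. This is only a sketch of sorting promoters by decreasing log-ratio. The function name `rank_by_log_ratio` and the in-memory rows are assumptions made for illustration; they are not part of the ChIPchipAnalyzer, and the format of its ranking lists may differ.

```python
def rank_by_log_ratio(rows):
    """Return (gene_id, log_ratio) pairs sorted by decreasing log-ratio.

    `rows` is an iterable of (gene_id, log_ratio) tuples. This helper is
    hypothetical and only mirrors the ranking criterion described above.
    """
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Toy values for illustration only (not taken from the ABI3 data set).
example = [("A", 1.87), ("B", -0.93), ("C", -0.88), ("D", 2.72)]
ranking = rank_by_log_ratio(example)
ranked_ids = [gene_id for gene_id, _ in ranking]
```

Here `ranked_ids` starts with the promoter that has the largest log-ratio, i.e. the strongest candidate under the LFC criterion.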
Standard HMMs
Go to the directory CaseStudy
Start the ChIPchipAnalyzer
java -jar ChIPchipAnalyzer.jar -startDistribution 0.9 0.1 -stateDurationScalingFactor 0.05 -means 0.0 2.0 -stds 1.0 0.5 -ess 4 -aprioriMeans 0.0 2.0 -scaleOfAprioriMeans 1000 1000 -shapeOfStandardDeviations 1 100 -scaleOfStandardDeviations 1E-4 1E-4 -output true -dataSets 061220ABI3.txt 070222ABI3.txt 070223ABI3.txt 070420ABI3.txt 070427ABI3.txt -outputFile HMM.txt
You obtain the trained standard HMM in the directory CaseStudy.
HMM.txt
You obtain ranking lists in the directory CaseStudy based on decreasing state posteriors of state '+' (target state) for each experiment.
061220ABI3.txt_ScoringHMM.txt
070222ABI3.txt_ScoringHMM.txt
070223ABI3.txt_ScoringHMM.txt
070420ABI3.txt_ScoringHMM.txt
070427ABI3.txt_ScoringHMM.txt
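To make the notion of a state posterior more concrete, the following Python sketch computes posteriors for a two-state HMM with Gaussian emissions using the forward-backward algorithm. It is a minimal illustration, not the ChIPchipAnalyzer implementation: the start distribution, means, and standard deviations are the initial values from the command above (not trained parameters), while the transition matrix and the toy log-ratios are assumptions chosen only for this example.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def state_posteriors(obs, start, trans, means, stds):
    """Forward-backward with per-step normalization; returns P(state | obs) per position."""
    n, k = len(obs), len(start)
    emis = [[gaussian_pdf(x, means[s], stds[s]) for s in range(k)] for x in obs]

    # Forward pass (each step is normalized to avoid numerical underflow).
    first = [start[s] * emis[0][s] for s in range(k)]
    alpha = [[v / sum(first) for v in first]]
    for t in range(1, n):
        cur = [emis[t][j] * sum(alpha[t - 1][i] * trans[i][j] for i in range(k))
               for j in range(k)]
        alpha.append([v / sum(cur) for v in cur])

    # Backward pass (also normalized; the constants cancel in the posterior).
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        cur = [sum(trans[i][j] * emis[t + 1][j] * beta[t + 1][j] for j in range(k))
               for i in range(k)]
        beta[t] = [v / sum(cur) for v in cur]

    posteriors = []
    for t in range(n):
        gamma = [alpha[t][s] * beta[t][s] for s in range(k)]
        posteriors.append([v / sum(gamma) for v in gamma])
    return posteriors


# State 0 is '-', state 1 is '+'.
start = [0.9, 0.1]                  # from -startDistribution
means, stds = [0.0, 2.0], [1.0, 0.5]  # from -means and -stds
trans = [[0.9, 0.1], [0.1, 0.9]]    # ASSUMPTION: illustrative matrix, not a trained one
log_ratios = [0.1, -0.3, 2.1, 1.9, 2.3, 0.2, -0.5]  # toy data, not ABI3 measurements

posteriors = state_posteriors(log_ratios, start, trans, means, stds)
plus_posteriors = [p[1] for p in posteriors]
# Rank positions by decreasing posterior of the target state '+'.
ranking = sorted(range(len(log_ratios)), key=lambda t: plus_posteriors[t], reverse=True)
```

Unlike the LFC ranking, the posterior of a promoter depends on the measurements of its chromosomal neighbors through the transition matrix, so a moderately high log-ratio inside a run of high values is ranked higher than the same value in isolation.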
SHMMs
Go to the directory CaseStudy
Start the ChIPchipAnalyzer
java -jar ChIPchipAnalyzer.jar -startDistribution 0.9 0.1 -stateDurationScalingFactor 0.05 -scalingFactor 4.0 -distanceThreshold 9000 -means 0.0 2.0 -stds 1.0 0.5 -ess 4 -aprioriMeans 0.0 2.0 -scaleOfAprioriMeans 1000 1000 -shapeOfStandardDeviations 1 100 -scaleOfStandardDeviations 1E-4 1E-4 -output true -dataSets 061220ABI3.txt 070222ABI3.txt 070223ABI3.txt 070420ABI3.txt 070427ABI3.txt -outputFile SHMM.txt
You obtain the trained SHMM in the directory CaseStudy.
SHMM.txt
You obtain ranking lists in the directory CaseStudy based on decreasing state posteriors of state '+' (target state) for each experiment.
061220ABI3.txt_ScoringSHMM4.0_9000.txt
070222ABI3.txt_ScoringSHMM4.0_9000.txt
070223ABI3.txt_ScoringSHMM4.0_9000.txt
070420ABI3.txt_ScoringSHMM4.0_9000.txt
070427ABI3.txt_ScoringSHMM4.0_9000.txt
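When the SHMM run is scripted, it can help to assemble the command and the expected ranking file names programmatically. The Python sketch below uses only the options shown in the command above and places the heap options before `-jar`, as is usual for JVM options. The helper names `shmm_command` and `expected_ranking_files` are hypothetical, and the assumption that the file name pattern `<data set>_ScoringSHMM<scalingFactor>_<distanceThreshold>.txt` holds for other parameter values is inferred from the file names listed above, not confirmed.

```python
DATA_SETS = [
    "061220ABI3.txt", "070222ABI3.txt", "070223ABI3.txt",
    "070420ABI3.txt", "070427ABI3.txt",
]


def shmm_command(data_sets, scaling_factor=4.0, distance_threshold=9000,
                 output_file="SHMM.txt", heap="256m"):
    """Build the argument list for the SHMM run of the case study."""
    return [
        "java", f"-Xms{heap}", f"-Xmx{heap}", "-jar", "ChIPchipAnalyzer.jar",
        "-startDistribution", "0.9", "0.1",
        "-stateDurationScalingFactor", "0.05",
        "-scalingFactor", str(scaling_factor),
        "-distanceThreshold", str(distance_threshold),
        "-means", "0.0", "2.0",
        "-stds", "1.0", "0.5",
        "-ess", "4",
        "-aprioriMeans", "0.0", "2.0",
        "-scaleOfAprioriMeans", "1000", "1000",
        "-shapeOfStandardDeviations", "1", "100",
        "-scaleOfStandardDeviations", "1E-4", "1E-4",
        "-output", "true",
        "-dataSets", *data_sets,
        "-outputFile", output_file,
    ]


def expected_ranking_files(data_sets, scaling_factor=4.0, distance_threshold=9000):
    """ASSUMPTION: name pattern inferred from the ranking files listed above."""
    return [f"{name}_ScoringSHMM{scaling_factor}_{distance_threshold}.txt"
            for name in data_sets]


command = shmm_command(DATA_SETS)
ranking_files = expected_ranking_files(DATA_SETS)
# To actually run the analysis from the directory CaseStudy:
#   import subprocess; subprocess.run(command, check=True)
```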
|
General Usage |
The ChIPchipAnalyzer can be applied to data sets structured in the following manner:
Headline with columns ID, Chr, Start, End, Strand, NextIsDirectNeighbor, and Log-Ratio
ID: Gene Identifier (Gene is represented by its promoter on the array)
Chr: Chromosome where the gene is located
Start: Start position of the gene
End: End position of the gene
Strand: Strand of the gene (1: Forward Strand; -1: Reverse strand)
NextIsDirectNeighbor: Is the next gene directly adjacent on the chromosome (TRUE or FALSE)?
Log-Ratio: Log-Ratio of TF vs. Control
Two IDs are in head-head orientation when the NextIsDirectNeighbor attribute of the first ID is TRUE and the Strand attributes are -1 (first ID) followed by 1 (second ID).
Columns must be separated by tab characters.
Missing values are not allowed.
All rows in the data set file must be sorted first by increasing values for Chr and second by increasing values for Start.
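Before running the ChIPchipAnalyzer on a new data set, the structural rules listed above can be checked with a small script. The following Python sketch is one possible check, not a tool shipped with the ChIPchipAnalyzer. The function name `check_data_set` is hypothetical, and treating Chr and Start as integers for the sort check is an assumption that fits numbered chromosomes such as those of Arabidopsis thaliana.

```python
import csv
import io

EXPECTED_HEADER = ["ID", "Chr", "Start", "End", "Strand",
                   "NextIsDirectNeighbor", "Log-Ratio"]


def check_data_set(text):
    """Return a list of problems found in a tab-delimited data set (empty if none)."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows or rows[0] != EXPECTED_HEADER:
        return ["header must be: " + ", ".join(EXPECTED_HEADER)]

    problems = []
    previous_key = None
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != len(EXPECTED_HEADER) or any(cell.strip() == "" for cell in row):
            problems.append(f"line {number}: missing value or wrong column count")
            continue
        if row[4] not in ("1", "-1"):
            problems.append(f"line {number}: Strand must be 1 or -1")
        if row[5] not in ("TRUE", "FALSE"):
            problems.append(f"line {number}: NextIsDirectNeighbor must be TRUE or FALSE")
        try:
            # ASSUMPTION: Chr and Start are integers; Log-Ratio is a number.
            key = (int(row[1]), int(row[2]))
            float(row[6])
        except ValueError:
            problems.append(f"line {number}: Chr, Start, and Log-Ratio must be numeric")
            continue
        if previous_key is not None and key < previous_key:
            problems.append(f"line {number}: rows are not sorted by Chr, then Start")
        previous_key = key
    return problems


header = "\t".join(EXPECTED_HEADER)
valid = header + "\nA\t1\t10\t12\t1\tFALSE\t1.87\nB\t1\t15\t16\t-1\tTRUE\t-0.93\n"
unsorted = header + "\nB\t1\t15\t16\t-1\tTRUE\t-0.93\nA\t1\t10\t12\t1\tFALSE\t1.87\n"
valid_problems = check_data_set(valid)
unsorted_problems = check_data_set(unsorted)
```

For a real file, the text can be read with `open(path, newline="").read()` and passed to `check_data_set`.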
| ID | Chr | Start | End | Strand | NextIsDirectNeighbor | Log-Ratio |
|---|---|---|---|---|---|---|
| A | 1 | 10 | 12 | 1 | FALSE | 1.87 |
| B | 1 | 15 | 16 | -1 | TRUE | -0.93 |
| C | 1 | 18 | 20 | 1 | FALSE | -0.88 |
| D | 2 | 12 | 20 | -1 | TRUE | 2.72 |
| E | 2 | 26 | 54 | -1 | FALSE | 0.03 |
| F | 5 | 13 | 27 | 1 | FALSE | 1.41 |
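The head-head rule can be applied to the example rows as follows. This Python sketch encodes the rule exactly as stated under General Usage (NextIsDirectNeighbor of the first ID is TRUE, Strand -1 followed by Strand 1). The function name `head_head_pairs` and the tuple layout are assumptions made for illustration and do not reflect how the ChIPchipAnalyzer stores its data internally.

```python
# Example rows as (ID, Chr, Start, End, Strand, NextIsDirectNeighbor, Log-Ratio).
ROWS = [
    ("A", 1, 10, 12, 1, False, 1.87),
    ("B", 1, 15, 16, -1, True, -0.93),
    ("C", 1, 18, 20, 1, False, -0.88),
    ("D", 2, 12, 20, -1, True, 2.72),
    ("E", 2, 26, 54, -1, False, 0.03),
    ("F", 5, 13, 27, 1, False, 1.41),
]


def head_head_pairs(rows):
    """Return (first ID, second ID) for consecutive rows in head-head orientation."""
    pairs = []
    for first, second in zip(rows, rows[1:]):
        next_is_neighbor = first[5]
        # Head-head: reverse-strand gene directly followed by a forward-strand gene.
        if next_is_neighbor and first[4] == -1 and second[4] == 1:
            pairs.append((first[0], second[0]))
    return pairs


pairs = head_head_pairs(ROWS)
print(pairs)  # prints [('B', 'C')]
```

In the example, only B and C form a head-head pair. D is directly adjacent to E, but both lie on the reverse strand, so this pair is not in head-head orientation.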
|
© Michael Seifert July 2008 |