TF ChIP-Seq
Genome-wide TF binding profiles were obtained from the Gene Expression Omnibus.

To standardise how TF binding sites were obtained from each dataset, peak calling was performed using an automated pipeline that we previously used to analyse the human CD34+ HSPC dataset [1]. Briefly, sequencing reads were aligned to the hg19 reference genome using BWA [2]. Peaks were called using three algorithms: HOMER [3], MACS [4] and Partek Genomics Suite (version 6.6). Only peaks called by two or more of the algorithms were kept in the final peak list. Aligned reads were also extended to 200 bp to generate TF-binding genome coverage profiles for visualisation on the UCSC Genome Browser.
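The "two or more algorithms" rule can be illustrated with a minimal Python sketch. It is not the pipeline's actual code: the function name `consensus_peaks`, the half-open (chrom, start, end) tuples and the overlap criterion (any shared base pair) are assumptions made for illustration, since the text above does not state how agreement between callers was defined.

```python
from collections import defaultdict


def consensus_peaks(peaks_by_caller, min_callers=2):
    """Return merged regions supported by at least `min_callers` callers.

    peaks_by_caller maps a caller name to a list of (chrom, start, end)
    tuples with half-open coordinates. Overlapping peaks are merged, and
    a merged region is kept only if peaks from enough distinct callers
    contributed to it (assumed definition of "called by two or more").
    """
    pooled = defaultdict(list)
    for caller, peaks in peaks_by_caller.items():
        for chrom, start, end in peaks:
            pooled[chrom].append((start, end, caller))

    consensus = []
    for chrom in sorted(pooled):
        cur_start = cur_end = None
        callers = set()
        for start, end, caller in sorted(pooled[chrom]):
            if cur_start is not None and start < cur_end:
                # Peak overlaps the current region: extend it.
                cur_end = max(cur_end, end)
                callers.add(caller)
            else:
                # Close the previous region before starting a new one.
                if cur_start is not None and len(callers) >= min_callers:
                    consensus.append((chrom, cur_start, cur_end))
                cur_start, cur_end, callers = start, end, {caller}
        if cur_start is not None and len(callers) >= min_callers:
            consensus.append((chrom, cur_start, cur_end))
    return consensus


example = {
    "HOMER": [("chr1", 100, 200), ("chr1", 500, 600)],
    "MACS": [("chr1", 150, 250)],
    "Partek": [("chr2", 10, 50)],
}
result = consensus_peaks(example)
print(result)
```

In this toy input only the chr1 region around 100-250 is supported by two callers; the peaks at chr1:500-600 and chr2:10-50 are each reported by a single caller and are dropped.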

To obtain a uniform set of TF binding sites across all TFs and all cell lines, all peaks were merged. Each merged region was then marked as either bound or unbound for each TF dataset. Finally, each merged TF binding region was associated with the two closest genes within a 100 kb region using GREAT [5].
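The merge-then-annotate step can be sketched as follows. This is an illustrative Python example under stated assumptions, not the code used to build the database: the dataset labels `TF_A` and `TF_B`, the function names, and the rule that a dataset is "bound" at a region when any of its peaks overlaps that region are all hypothetical.

```python
def merge_regions(peak_sets):
    """Merge overlapping peaks from all datasets into one region list.

    peak_sets maps a dataset name to a list of (chrom, start, end)
    tuples with half-open coordinates.
    """
    all_peaks = sorted(p for peaks in peak_sets.values() for p in peaks)
    merged = []
    for chrom, start, end in all_peaks:
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(region) for region in merged]


def binding_matrix(peak_sets):
    """Mark every merged region as bound (True) or unbound (False)
    for each dataset, based on simple interval overlap."""
    matrix = {}
    for chrom, start, end in merge_regions(peak_sets):
        matrix[(chrom, start, end)] = {
            name: any(c == chrom and s < end and e > start
                      for c, s, e in peaks)
            for name, peaks in peak_sets.items()
        }
    return matrix


# Hypothetical dataset labels and coordinates, for illustration only.
peak_sets = {
    "TF_A": [("chr1", 100, 200)],
    "TF_B": [("chr1", 150, 300), ("chr2", 5, 20)],
}
matrix = binding_matrix(peak_sets)
for region, status in matrix.items():
    print(region, status)
```

Storing one row per merged region and one Boolean per dataset gives every TF the same coordinate system, which is what makes combinatorial binding queries across datasets possible.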

Histone ChIP-Seq
Genome-wide binding catalogues for H3K27ac, H3K4me1 and H3K4me3 were obtained from the Epigenome Atlas Project (http://www.genboree.org/epigenomeatlas/index.rhtml) and the ENCODE project. Sequencing reads were downloaded in a pre-filtered and pre-aligned format. Read coverage within +/- 1 kb of the centre of each HCBR was calculated using BEDtools [6].
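As a rough illustration of the windowed coverage calculation, the Python sketch below counts reads overlapping a window of 1 kb on either side of a region's centre. The original analysis used BEDtools; the function names, the integer midpoint and the read-counting definition of coverage here are assumptions for illustration only.

```python
def centre_window(start, end, flank=1000):
    """Return the (start, end) of a window of `flank` bp on either side
    of the region's centre, clipped at position 0."""
    centre = (start + end) // 2
    return max(0, centre - flank), centre + flank


def count_reads_in_window(region, reads, flank=1000):
    """Count aligned reads that overlap the window around the centre.

    region and reads are (chrom, start, end) tuples with half-open
    coordinates.
    """
    chrom, start, end = region
    w_start, w_end = centre_window(start, end, flank)
    return sum(1 for c, s, e in reads
               if c == chrom and s < w_end and e > w_start)


# Toy region with centre 5200, so the window is 4200-6200.
region = ("chr1", 5000, 5400)
reads = [
    ("chr1", 4100, 4150),  # ends before the window
    ("chr1", 4190, 4240),  # overlaps the left edge
    ("chr1", 6199, 6250),  # overlaps the right edge
    ("chr1", 6200, 6250),  # starts exactly at the window end
    ("chr2", 5000, 5050),  # different chromosome
]
n_reads = count_reads_in_window(region, reads)
print(n_reads)
```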

Expression arrays
Genome-wide microarray expression data were obtained from GEO and ArrayExpress. The non-normalised expression data from each dataset were log2 transformed and quantile normalised. Where multiple microarray probes were available for one gene, the probe with the highest average expression value across all samples was selected to represent the expression of that gene. Genome-wide expression data comparing the stem and progenitor subfractions of human CD34+ cells from normal donors and AML patients were obtained from GEO (GSE24006). The Robust Multi-chip Average approach [7] was used for normalisation, and expression levels were summarised using Matlab (version 2012b). Probes were mapped to their respective genes, and expression values for each gene were stored in the database.
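The log2 transformation, quantile normalisation and probe selection described above can be sketched in plain Python. This is a minimal illustration, not the original implementation: the function names and the probe and gene identifiers are hypothetical, and ties between equal values are broken by row order rather than averaged, which is a simplification.

```python
import math
from statistics import mean


def log2_transform(matrix):
    """matrix is a list of probe rows, one value per sample."""
    return [[math.log2(v) for v in row] for row in matrix]


def quantile_normalise(matrix):
    """Give every sample (column) the same distribution by replacing
    each value with the mean, across samples, of the values that share
    its rank."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = [[matrix[r][c] for r in range(n_rows)] for c in range(n_cols)]
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [mean(sc[i] for sc in sorted_cols) for i in range(n_rows)]
    normalised = [[0.0] * n_cols for _ in range(n_rows)]
    for c, col in enumerate(columns):
        order = sorted(range(n_rows), key=lambda r: col[r])
        for rank, r in enumerate(order):
            normalised[r][c] = rank_means[rank]
    return normalised


def select_probes(probe_ids, probe_to_gene, matrix):
    """For each gene, keep the probe with the highest mean expression
    across all samples."""
    best = {}
    for probe, row in zip(probe_ids, matrix):
        gene = probe_to_gene[probe]
        avg = mean(row)
        if gene not in best or avg > best[gene][1]:
            best[gene] = (probe, avg)
    return {gene: probe for gene, (probe, _) in best.items()}


# Hypothetical probes, genes and intensities (3 probes x 2 samples).
probe_ids = ["p1", "p2", "p3"]
probe_to_gene = {"p1": "G1", "p2": "G1", "p3": "G2"}
raw = [[2, 4], [8, 16], [4, 2]]

normalised = quantile_normalise(log2_transform(raw))
chosen = select_probes(probe_ids, probe_to_gene, normalised)
print(normalised)
print(chosen)
```

After normalisation both sample columns contain the same set of values (1.0, 2.0 and 3.5), and probe p2 represents gene G1 because its mean expression exceeds that of p1.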

References
[1] Beck D, Thoms JAI, Perera D, et al. Genome-wide Analysis of Transcriptional Regulators in Human HSPCs Reveals a Densely Interconnected Network of Coding and Non-coding Genes. Blood. 2013; In press.

[2] Li H, Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010;26(5):589-595.

[3] Heinz S, Benner C, Spann N, et al. Simple Combinations of Lineage-Determining Transcription Factors Prime cis-Regulatory Elements Required for Macrophage and B Cell Identities. Molecular Cell. 2010;38(4):576-589.

[4] Zhang Y, Liu T, Meyer CA, et al. Model-based Analysis of ChIP-Seq (MACS). Genome Biology. 2008;9(9).

[5] McLean CY, Bristor D, Hiller M, et al. GREAT improves functional interpretation of cis-regulatory regions. Nat Biotechnol. 2010;28(5):495-501.

[6] Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26(6):841-842.

[7] Bolstad BM, Irizarry RA, Astrand M, Speed TP. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003;19(2):185-193.

How to cite
Chacon D, Beck D, Perera D, Wong JWH and Pimanda JE. BloodChIP: An Atlas of Genome-wide Transcription Factor Binding Profiles in Human Haematopoietic Stem/Progenitor Cells. Nucleic Acids Res. 2014 Jan;42(Database issue):D172-7. doi: 10.1093/nar/gkt1036