1. We develop a variety of computational and statistical tools for both low-level processing and high-level integration of high-throughput biological data sets, especially those generated by next-generation sequencing platforms.
For example, MAnorm, a model we developed, is one of the earliest computational models for quantitative comparison of ChIP-Seq data sets. It quantitatively characterizes the differential binding between two ChIP-Seq data sets in a robust and unbiased way, and then uses these quantitative measures, integrated with other types of data such as motif scanning results, to identify candidate cell type-specific regulators associated with the differential binding of a given factor. We are now extending it into a comprehensive package by adding a series of integrative tools for quantitatively comparing and integrating different types of -omic data sets, especially those related to transcriptional and epigenetic regulation. Meanwhile, we continue to improve the model so that it can be applied to more sophisticated comparisons, such as data sets generated in the same cell type but from different individuals. With these new models and tools, we hope to better understand:
(1) The key gene regulation mechanisms in different tissues and disease cell types, which can further help to identify potential targets for gene or chemical therapy;
(2) The epigenetic variations between individuals and how they are linked to genetic and phenotypic variations, which can serve as an epigenetic basis for personalized disease prediction and medicine design.
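To make the idea of quantitative comparison between two ChIP-Seq data sets more concrete, here is a minimal Python sketch of MA-style normalization. It is not the MAnorm package's code. It assumes that peaks shared by both samples can serve as a reference set, and that a robust line fitted to their M (log2 ratio) versus A (mean log2 intensity) values can then be used to rescale every peak. The function names, the pseudocount, and the bisquare-weighted fit are all illustrative assumptions.

```python
import numpy as np


def ma_values(reads_a, reads_b, pseudocount=1.0):
    """Return (M, A) per peak; the pseudocount (an assumption) avoids log(0)."""
    a = np.log2(np.asarray(reads_a, dtype=float) + pseudocount)
    b = np.log2(np.asarray(reads_b, dtype=float) + pseudocount)
    return a - b, 0.5 * (a + b)


def robust_line(x, y, n_iter=20, c=4.685):
    """Fit y ~ intercept + slope * x by iteratively reweighted least squares
    with Tukey bisquare weights (one illustrative choice of robust fit)."""
    X = np.column_stack([np.ones_like(x), x])
    w = np.ones_like(y)
    beta = np.zeros(2)
    for _ in range(n_iter):
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        resid = y - X @ beta
        # Robust scale estimate from the median absolute residual.
        scale = np.median(np.abs(resid)) / 0.6745
        if scale < 1e-12:  # essentially a perfect fit; stop early
            break
        u = resid / (c * scale)
        # Points far from the line get zero weight in the next iteration.
        w = np.where(np.abs(u) < 1, (1 - u**2) ** 2, 0.0)
    return beta  # (intercept, slope)


def normalized_m(reads_a, reads_b, is_common):
    """Subtract the trend fitted on common peaks from the M value of every peak."""
    is_common = np.asarray(is_common, dtype=bool)
    M, A = ma_values(reads_a, reads_b)
    intercept, slope = robust_line(A[is_common], M[is_common])
    return M - (intercept + slope * A)
```

Under these assumptions, a normalized M value near zero suggests similar binding in the two samples, while a large positive or negative value flags a candidate differential binding site. A real analysis would also need peak calling, read counting, and a significance estimate, none of which are shown here.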
2. We collaborate extensively with experimental biologists in wet labs to study the dynamics of gene regulation, which can be carried out by specific transcription factors or chromatin regulators, during tissue development and disease onset, using the latest high-throughput platforms such as genome, ChIP, DNase I, bisulfite and RNA sequencing, as well as gene expression and SNP microarrays.
3. Systems biology studies. By applying novel statistical, machine learning or data mining tools to public data sets, we hope to systematically discover unexplored functional elements such as regulatory elements or non-coding RNAs, and to infer unknown gene functions or uncharacterized associations between genes.