Our research focuses on the following two areas:

1. Genomic and epigenomic changes during crop domestication and polyploidization, and their functional consequences.

Polyploidization and domestication are major forces shaping modern crop genomes. Many important crops are polyploid; typical examples include wheat and peanut. Characterizing the major genomic and epigenomic changes that occur during both processes, and identifying the changes that determine desirable traits, is of great significance. This work is carried out in close collaboration with wet-lab colleagues.

2. Mechanisms underlying the specific regulation of epigenetic modifications.

Epigenetic modifications of the genome allow relatively stable yet reversible control of gene expression states, which is essential for organisms to adapt to dynamic developmental and environmental cues. How plants determine when and where to change the epigenome remains largely unknown. We are particularly interested in exploring the mechanisms that control the specificity of different Polycomb Group (PcG) members, using an integrative approach that combines molecular, genetic and computational tools.

Recent Publications

Note: *, co-first author; #, corresponding author; lab members’ names are in bold.

Software and Databases

CGT-seq: Core Genome Targeted Sequencing

CGT-seq uses epigenomic information from both active and repressive epigenetic marks to guide the assembly of the core genome, which is mainly composed of promoter and intragenic regions. The method is relatively easy to implement and showed high sensitivity and specificity in capturing the core genome of bread wheat.

CGT-seq reads covered 95% of intragenic regions and 89% of promoter regions in wheat. We further demonstrated in rice that CGT-seq captured hundreds of novel genes and regulatory sequences from a previously unsequenced ecotype.
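Coverage figures like those above can be read as the fraction of region bases overlapped by at least one read. The sketch below shows that calculation under stated assumptions: regions and aligned reads are half-open (start, end) intervals on a single chromosome, and coverage is counted per base. The function names and coordinates are hypothetical and are not taken from the CGT-seq pipeline itself.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome (assumption)


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_fraction(regions: List[Interval], reads: List[Interval]) -> float:
    """Fraction of region bases overlapped by at least one read (hypothetical helper)."""
    regs, rds = merge(regions), merge(reads)
    total = sum(e - s for s, e in regs)
    covered = 0
    j = 0
    for s, e in regs:
        # Skip reads that end before this region starts; regions are sorted,
        # so these reads cannot overlap any later region either.
        while j < len(rds) and rds[j][1] <= s:
            j += 1
        k = j
        # Sum the overlap with every (disjoint) read that starts before the region ends.
        while k < len(rds) and rds[k][0] < e:
            covered += min(e, rds[k][1]) - max(s, rds[k][0])
            k += 1
    return covered / total if total else 0.0


# Two toy regions (200 bp in total) and three reads: 60 bp are covered.
print(covered_fraction([(0, 100), (200, 300)], [(50, 150), (250, 260), (90, 120)]))
```

Merging reads first keeps overlapping reads from being counted twice, so the result is a true base-level fraction rather than a read-depth sum.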

By specifically enriching and sequencing regions within and near genes, CGT-seq provides a time- and resource-efficient approach to profiling functionally relevant regions in both sequenced and unsequenced populations with large genomes.

Go to CGT-seq >>>

GSHR: Gene Set-level Analyses of Hormone Responses in Arabidopsis

GSHR is a web server that provides analyses based on integrated hormone-response gene sets in Arabidopsis thaliana. We developed it to facilitate cross-study and cross-platform comparisons of transcriptomic responses to hormones.

GSHR is user-friendly and offers several features that distinguish it from similar tools:

1. GSHR focuses specifically on hormone-responsive genes in Arabidopsis thaliana. It provides hormone-response gene sets against which users can compare their own gene lists using Fisher's exact test.
2. Additional analysis tools, including cluster analysis, co-expression networks, and KEGG, GO and InterPro enrichment analyses, help users uncover the biological insights underlying their gene lists.
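Comparing a user gene list with a hormone-response gene set by Fisher's exact test amounts to asking whether their overlap is larger than expected by chance. A minimal sketch of the one-sided (over-representation) form, computed as the hypergeometric upper tail with exact integer binomials, is shown below. Whether GSHR uses the one- or two-sided test, and which background gene universe it uses, are assumptions here; the gene IDs are hypothetical placeholders.

```python
from math import comb
from typing import Iterable


def fisher_enrichment_p(user_genes: Iterable[str],
                        gene_set: Iterable[str],
                        universe: Iterable[str]) -> float:
    """One-sided Fisher's exact test (hypergeometric upper tail) for
    over-representation of gene_set among user_genes."""
    universe = set(universe)
    user = set(user_genes) & universe
    target = set(gene_set) & universe
    N = len(universe)        # background genes
    K = len(target)          # genes in the hormone-response set
    n = len(user)            # genes in the user's list
    k = len(user & target)   # observed overlap
    # P(X >= k) for X ~ Hypergeometric(N, K, n); math.comb returns 0 when r > n.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


universe = [f"g{i}" for i in range(10)]      # hypothetical 10-gene background
hormone_set = ["g0", "g1", "g2", "g3"]       # hypothetical response set
user_list = ["g0", "g1", "g2"]               # all three user genes fall in the set
print(fisher_enrichment_p(user_list, hormone_set, universe))  # 4 / 120
```

Restricting both lists to the shared universe before counting keeps the 2x2 table consistent; in practice the p-values for many gene sets would also be corrected for multiple testing.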

Go to GSHR >>>

CARMO: Comprehensive Annotation of Rice Multi-Omics

CARMO is a web-based platform providing comprehensive annotations for multi-omics data, including transcriptomic data sets, epigenomic modification sites, SNPs from genome re-sequencing, and the large gene lists derived from these omics studies. Well-organized results, as well as multiple tools for interactive visualization, are available through a user-friendly web interface.

The power of CARMO lies in its comprehensive collection and integration of information from both multi-omics data and diverse functional evidence in rice, which is further curated into gene sets and higher-level gene modules. In this way, high-throughput data can easily be compared across studies and platforms; notably, integrating multiple types of evidence enables high-confidence biological interpretation at the module level. Examples in the manuscript demonstrate that CARMO not only reproduces reported findings but also proposes novel functional insights for further experimental exploration.

Go to CARMO >>>

MAnorm: Quantitative Comparison of ChIP-Seq Data

ChIP-Seq is widely used to characterize genome-wide binding patterns of transcription factors and other chromatin-associated proteins. Although comparison of ChIP-Seq data sets is critical for understanding cell type-dependent and cell state-specific binding, and thus the study of cell-specific gene regulation, few quantitative approaches have been developed. Here, we present a simple and effective method, MAnorm, for quantitative comparison of ChIP-Seq data sets describing transcription factor binding sites and epigenetic modifications. The quantitative binding differences inferred by MAnorm showed strong correlation with both the changes in expression of target genes and the binding of cell type-specific regulators.
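The core idea of MA-based normalization can be sketched in a few lines: for each peak, compute M (log2 ratio of read counts between two samples) and A (mean log2 count), fit the M-versus-A trend on peaks common to both samples, and subtract that trend from every peak. The sketch below makes several assumptions that may differ from the published MAnorm: read counts per peak are already computed, a pseudocount of 1 is added, and ordinary least squares stands in for a robust regression. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def ma_values(r1: float, r2: float, pseudo: float = 1.0) -> Tuple[float, float]:
    """M = log2 fold change, A = mean log2 signal (pseudocount is an assumption)."""
    l1, l2 = math.log2(r1 + pseudo), math.log2(r2 + pseudo)
    return l1 - l2, 0.5 * (l1 + l2)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares y = a + b*x (stand-in for robust regression).
    Requires at least two distinct x values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def ma_normalize(common: List[Tuple[float, float]],
                 all_peaks: List[Tuple[float, float]]) -> List[float]:
    """common / all_peaks: (reads in sample 1, reads in sample 2) per peak.
    Fit the M-vs-A trend on common peaks, then remove it from every peak."""
    cm = [ma_values(r1, r2) for r1, r2 in common]
    a, b = fit_line([A for _, A in cm], [M for M, _ in cm])
    normalized = []
    for r1, r2 in all_peaks:
        M, A = ma_values(r1, r2)
        normalized.append(M - (a + b * A))  # residual = normalized M value
    return normalized


# Common peaks where sample 1 is uniformly twice as deep give a = 1, b = 0,
# so a peak with the same 2x ratio normalizes to 0 and an unchanged peak to -1.
print(ma_normalize([(1, 0), (3, 1), (7, 3)], [(7, 3), (3, 3)]))
```

Fitting on common peaks assumes that most shared binding sites do not truly change, so any systematic M-versus-A trend among them reflects technical bias such as sequencing depth.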

Go to MAnorm >>>

Motif-Scan: Scan Genomic Regions for Targets of Given Motifs and Perform Enrichment Analysis

(Under Construction) With the accumulation of ChIP-seq data across different cell types, an effective and accurate method is essential to unravel the relationship between regulator binding and epigenetic modifications in different cell types. We present an integrative computational toolkit, MAmotif, to infer cell type-specific regulators.

Based on the hypothesis that regions with greater epigenetic changes are more likely to be directly targeted by key cell type-specific regulators, we combine MAnorm’s quantitative comparison of two cell types with transcription factor binding site (TFBS) information to infer cell type-specific regulators. Here, MAnorm provides the model for quantitative comparison of ChIP-seq data between two cell types, while TFBSs are detected in the regions showing epigenetic changes by our newly developed motif-scanning package.

Our motif-scanning algorithm is a probabilistic model based on a position weight matrix (PWM): the score of motif A is calculated as the ratio of the probability of A occurring in the target sequence to its probability of occurring in the genomic background. A target sequence is defined as a motif A target when its score exceeds a threshold derived from the distribution of motif A scores calculated across the whole genome sequence. Once the epigenetic modification changes and TFBS information are prepared, several statistical tests and clustering methods are applied to determine the link between epigenetic modification changes and motif binding affinity in a specific cell type.
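The scoring and thresholding steps above can be illustrated with a standard PWM log-odds scan. In this sketch, each window's score is log2 of P(window | motif) / P(window | background) under a position-independent background; a sequence counts as a target when its best window exceeds a high quantile of the scores over all genome windows. The pseudocount, the 0.99 quantile, forward-strand-only scanning, and the toy motif and sequences are all assumptions for illustration, not details of the Motif-Scan package.

```python
import math
from typing import Dict, List

BASES = "ACGT"
Matrix = List[Dict[str, float]]


def pwm_log_odds(counts: List[Dict[str, float]], bg: Dict[str, float],
                 pseudo: float = 0.5) -> Matrix:
    """Per-position log2(P(base | motif) / P(base | background))."""
    matrix = []
    for col in counts:
        total = sum(col.get(b, 0.0) for b in BASES) + 4 * pseudo
        matrix.append({b: math.log2((col.get(b, 0.0) + pseudo) / total / bg[b])
                       for b in BASES})
    return matrix


def window_scores(seq: str, matrix: Matrix) -> List[float]:
    """Log-odds score of every full window on the forward strand; windows with non-ACGT bases are skipped."""
    w = len(matrix)
    scores = []
    for i in range(len(seq) - w + 1):
        win = seq[i:i + w]
        if all(c in BASES for c in win):
            scores.append(sum(matrix[j][c] for j, c in enumerate(win)))
    return scores


def genome_threshold(genome: str, matrix: Matrix, quantile: float = 0.99) -> float:
    """Score threshold taken from the genome-wide score distribution (quantile is assumed)."""
    s = sorted(window_scores(genome, matrix))
    return s[min(len(s) - 1, int(quantile * len(s)))]


def is_target(seq: str, matrix: Matrix, threshold: float) -> bool:
    """A sequence is a motif target if its best window scores above the threshold."""
    scores = window_scores(seq, matrix)
    return bool(scores) and max(scores) > threshold


uniform = {b: 0.25 for b in BASES}
motif = pwm_log_odds([{"A": 10}, {"C": 10}, {"G": 10}], uniform)  # toy "ACG" motif
thr = genome_threshold("T" * 200, motif)                           # toy background genome
print(is_target("GGACGTT", motif, thr), is_target("TTTTTT", motif, thr))
```

A genome-derived threshold makes the cutoff reflect how often a score would arise by chance in that genome's sequence composition; a full scanner would also score the reverse complement.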

Go to Motif-Scan >>>

CYPSI: P450 Protein Structure Database

The CYP Structure Interface (CYPSI) is a platform for cytochrome P450 (CYP) studies. CYPSI integrates the 3D structures of 266 A. thaliana CYPs predicted by three template-based modeling (TBM) methods: BMCD, which we developed specifically for CYP TBM, and two well-known web servers, MUSTER and I-TASSER. After careful template selection and optimization, the models built by BMCD were accurate enough for practical application, as we demonstrated with a docking example aimed at identifying the CYPs responsible for ABA 8′-hydroxylation. CYPSI also provides extensive resources for A. thaliana CYP structure and function studies, including 400 PDB entries for solved CYPs, 48 metabolic pathways associated with A. thaliana CYPs, 232 reported CYP ligands, and 18 A. thaliana CYPs docked with ligands (61 complexes in total). In addition, CYPSI supports searching for similar sequences and chemicals.

CYPSI provides comprehensive structure and function information for A. thaliana CYPs, which should facilitate investigations into the interactions between CYPs and their substrates. CYPSI has a user-friendly interface, which is available at http://bioinfo.cau.edu.cn/CYPSI.

Go to CYPSI >>>

