Plant Bioinformatics and Functional Epigenomics

Software and Database

Wheat-RegNet: An encyclopedia of common wheat hierarchical regulatory networks

Common wheat (Triticum aestivum, 2n = 6× = 42, AABBDD) is a staple crop worldwide. Elucidating its gene regulatory network provides essential information for mechanistic studies and for targeted manipulation of gene activity in breeding. This is challenging, however, given the extremely large (16 Gb) and complex allohexaploid genome of common wheat. Integrating multi-omics data is a compelling approach to constructing the hierarchical regulatory network. Here, we collected 189 transcription factor (TF) binding profiles, 90 epigenomic datasets, 2356 transcriptomes, and genome-wide association study (GWAS) data for 144 agronomic traits in common wheat, and integrated them using a machine learning approach to infer direct target genes and the hierarchical regulatory network.

Wheat-RegNet, a web-based platform, was then developed to provide four major functions: (i) to identify regulatory elements regulating input gene(s), and to infer their tissue and environmental response specificities; (ii) to identify the TFs responsible for regulating input gene(s) or locus/loci, as well as the associated GWAS traits; (iii) to construct the hierarchical regulatory network regulating input gene(s); and (iv) to browse hundreds of TF binding, epigenomic, and transcriptomic profiles of an input region or gene.

Well-organized results and multiple tools for interactive visualization are available through a user-friendly web interface. Wheat-RegNet is a highly useful resource for exploring gene regulatory information and for targeted manipulation, facilitating both hypothesis-driven research and breeding research in common wheat.

Go to Wheat-RegNet >>>

Triti-Map: A Snakemake-based pipeline for gene mapping in Triticeae

Triti-Map is a Snakemake-based pipeline for gene mapping in Triticeae. It contains a suite of user-friendly computational packages and a web interface that integrates multi-omics data from Triticeae species, including genomic, epigenomic, evolutionary and homologous information.

Triti-Map can efficiently explore trait-related genes or functional elements not present in the reference genome, and reduces the time and labor required for gene mapping in species with large genomes.

Go to Triti-Map >>>

CGT-seq: Core Genome Targeted Sequencing

CGT-seq employs epigenomic information from both active and repressive epigenetic marks to guide the assembly of the core genome, which is mainly composed of promoter and intragenic regions. The method is relatively easy to implement and displays high sensitivity and specificity for capturing the core genome of bread wheat.

CGT-seq reads covered 95% of intragenic regions and 89% of promoter regions in wheat. We further demonstrated in rice that CGT-seq captured hundreds of novel genes and regulatory sequences from a previously unsequenced ecotype.

Together, by specifically enriching and sequencing regions within and near genes, CGT-seq is a time- and resource-effective approach to profiling functionally relevant regions in sequenced and non-sequenced populations with large genomes.

Go to CGT-seq >>>

Plant Regulomics: Data-driven Interface for Retrieving Upstream Regulators

Plant Regulomics is a data-driven interface for retrieving upstream regulators from plant multi-omics data. It integrates 19,925 transcriptomic and epigenomic data sets and diverse sources of functional evidence (58,112 terms and 695,414 protein-protein interactions) from six plant species, namely Arabidopsis thaliana, Oryza sativa, Zea mays, Glycine max, Solanum lycopersicum and Triticum aestivum, along with the orthologous genes from 56 whole-genome sequenced plant species. These data were organized into gene modules and incorporated into a single statistical framework.

For any input gene list or set of genomic loci, Plant Regulomics retrieves the factors, treatments, and experimental/environmental conditions regulating the input from the integrated omics data. Additionally, multiple tools and interactive visualization are available through a user-friendly web interface.

Go to Plant Regulomics >>>

GSHR: Gene Set-level Analyses of Hormone Responses in Arabidopsis

GSHR is a web server that provides analyses based on integrated hormone response gene sets in Arabidopsis thaliana. We developed it to facilitate cross-study and cross-platform comparisons of transcriptomic changes in response to hormones.

GSHR is user-friendly and offers several features compared with similar tools:

1. GSHR focuses specifically on genes that respond to hormones in Arabidopsis thaliana. It provides hormone response gene sets that users can compare with their own gene lists using Fisher's exact test.

2. Other analysis tools are provided, including cluster analysis, co-expression network analysis, and enrichment analysis of KEGG, GO and InterPro terms, to help users uncover the biological insights underlying their gene lists.
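The gene list comparison in item 1 can be illustrated with a minimal sketch. It assumes a one-sided Fisher's exact test for over-representation, which is equivalent to the upper tail of the hypergeometric distribution; whether GSHR uses a one-sided or two-sided test is not stated here. The function names and the toy gene identifiers are hypothetical and are not part of GSHR.

```python
from math import comb


def fisher_enrichment_p(overlap, list_size, set_size, background_size):
    """One-sided Fisher's exact test p-value for over-representation.

    Probability of seeing at least `overlap` shared genes when a list of
    `list_size` genes is drawn at random from `background_size` genes, of
    which `set_size` belong to the reference gene set.
    """
    max_overlap = min(list_size, set_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(set_size, k) * comb(background_size - set_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def gene_set_enrichment(user_genes, reference_genes, background_genes):
    """Compare a user gene list with one reference gene set (hypothetical helper)."""
    background = set(background_genes)
    user = set(user_genes) & background            # ignore genes outside the background
    reference = set(reference_genes) & background
    overlap = len(user & reference)
    return overlap, fisher_enrichment_p(
        overlap, len(user), len(reference), len(background)
    )


# Toy data (made up): 20 background genes, 5 in the reference set, 5 in the user list.
background = [f"gene{i}" for i in range(20)]
reference_set = background[:5]
user_list = background[1:6]                        # shares 4 genes with the reference set

shared, p_value = gene_set_enrichment(user_list, reference_set, background)
print(shared, p_value)
```

In practice the background would be all annotated genes of the species, and p-values obtained across many gene sets would usually need multiple-testing correction.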

Go to GSHR >>>

CARMO: Comprehensive Annotation of Rice Multi-Omics

CARMO is a web-based platform providing comprehensive annotations for multi-omics data, including transcriptomic data sets, epi-genomic modification sites, SNPs from genome re-sequencing, and the large gene lists derived from these omics studies. Well-organized results, as well as multiple tools for interactive visualization, are available through a user-friendly web interface.

The power of CARMO lies in the comprehensive collection and integration of information from both multi-omics data and diverse functional evidence in rice, which were further curated into gene sets and higher-level gene modules. In this way, high-throughput data can easily be compared across studies and platforms, and, notably, integrating multiple types of evidence provides high-confidence biological interpretation at the module level. Examples in the manuscript demonstrated that CARMO not only reproduced reported evidence, but also proposed novel functional insights for further experimental exploration.

Go to CARMO >>>

MAnorm: ChIP-Seq data quantitative comparison

ChIP-Seq is widely used to characterize genome-wide binding patterns of transcription factors and other chromatin-associated proteins. Although comparison of ChIP-Seq data sets is critical for understanding cell type-dependent and cell state-specific binding, and thus the study of cell-specific gene regulation, few quantitative approaches have been developed. Here, we present a simple and effective method, MAnorm, for quantitative comparison of ChIP-Seq data sets describing transcription factor binding sites and epigenetic modifications. The quantitative binding differences inferred by MAnorm showed strong correlation with both the changes in expression of target genes and the binding of cell type-specific regulators.

Go to MAnorm >>>

Motif-Scan: scan genomic regions for targets of given motifs and perform enrichment analysis

With the accumulation of ChIP-seq data across different cell types, an effective and accurate method is essential to unravel the relationship between regulator binding and epigenetic modifications. We present an integrative computational toolkit, MAmotif, to infer cell type-specific regulators. Based on the hypothesis that regions with larger epigenetic changes are more likely to be directly targeted by key cell type-specific regulators, we combine MAnorm’s quantitative comparison of two cell types with transcription factor binding site (TFBS) information. MAnorm is a model for quantitative comparison of ChIP-seq data between two cell types, while TFBSs are detected in the regions of epigenetic change by our newly developed motif scanning package.

Our motif scan algorithm is a probabilistic model based on the position weight matrix (PWM): the score of motif A is calculated as the ratio of its probability of occurrence on the target sequence to its probability of occurrence on the genome background. A sequence is defined as a target of motif A when its score exceeds a threshold derived from the distribution of motif A scores calculated across the whole genome sequence.

Once the epigenetic modification changes and TFBS information are prepared, several statistical tests and clustering methods are applied to determine the linkage between epigenetic modification changes and motif binding affinity in a specific cell type.
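The PWM scoring described above can be sketched as follows. This is an illustration under stated assumptions, not the Motif-Scan implementation: the toy PWM, the uniform background frequencies, the 0.999 quantile used to derive a threshold, and all function names are hypothetical, and sequences are assumed to contain only A, C, G and T.

```python
# Toy 3-bp motif with consensus "TGA"; each row gives per-base probabilities.
PWM = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
]
# Assumed uniform genome background base frequencies.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def motif_ratio_score(window, pwm, background):
    """Ratio of P(window | motif) to P(window | genome background)."""
    motif_p = 1.0
    background_p = 1.0
    for position, base in enumerate(window):
        motif_p *= pwm[position][base]
        background_p *= background[base]
    return motif_p / background_p


def scan(sequence, pwm, background):
    """Score every window of motif width along one strand of the sequence."""
    width = len(pwm)
    return [
        (start, motif_ratio_score(sequence[start:start + width], pwm, background))
        for start in range(len(sequence) - width + 1)
    ]


def score_threshold(genome_scores, quantile=0.999):
    """Pick a cutoff from genome-wide scores (the quantile is an assumption)."""
    ordered = sorted(genome_scores)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]


def find_targets(sequence, pwm, background, threshold):
    """Start positions whose score exceeds the threshold."""
    return [start for start, score in scan(sequence, pwm, background) if score > threshold]


# A fixed cutoff is used here for brevity; normally it would come from
# score_threshold() applied to scores collected over the whole genome.
print(find_targets("ACTGAC", PWM, BACKGROUND, threshold=10.0))
```

A production scanner would typically work in log space to avoid underflow on long motifs, scan both strands, and handle ambiguous bases such as N.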

Go to Motif-Scan >>>

Zhang Lab