Overview

ZEAMAP is a multi-omics database for maize research and breeding, which integrates omics data generated from
527 elite inbred lines (in an association mapping panel, AMP) and 183 teosinte accessions. ZEAMAP includes:

  • genome assemblies and annotations of four inbred lines, B73, Mo17, SK and HuangZaoSi (HZS) and a
    teosinte accession (Zea mays ssp. mexicana);
  • expression patterns of tissues at different developmental stages of the same inbred line, and of the same
    tissue across different samples within the AMP;
  • three-dimensional chromatin interactions and open chromatin regions of B73, and population-level DNA
    methylation;
  • genetic variations including single-nucleotide polymorphisms (SNPs), small insertions and deletions
    (InDels) and large structural variations (SVs) generated from the deep sequencing of the AMP and the
    comparison among reference genome assemblies;
  • the phenotypes and metabolome of the AMP and the related loci mapped by genome-wide association studies
    (GWAS), expression quantitative trait locus (eQTL) and linkage analysis;
  • the population structure and pedigrees of each germplasm and the selective signals between different
    teosinte subspecies and maize.

ZEAMAP also provides comprehensive functional annotations for the annotated gene models in each assembly,
together with tools for users to search, analyze and visualize these omics data.


1 General




The top navigation menu gathers the general functions of the database, including links to the different
resources and tools, and entry points for users to register or log in.


The site-wide search tool can be accessed on the homepage and in the title bar of each page, so that users
can quickly search for features while browsing.


Wildcard search with *. Examples:

  1. genom* sequence (matches: genome, genomic, genomics ...)
  2. Lir*dron tulipifera (matches: Liriodendron tulipifera)

Fuzzy search: When you don’t know how to exactly spell the keywords, you can use fuzzy
search. Fuzzy search allows you to search for similar words. You use the ~ character at the end
of your keyword for fuzzy search (keyword~). Examples:

  1. sequeeence~ (matches: sequence)
  2. Alnus rhmifolia~ (matches: Alnus rhombifolia)

Regular expression search: wrapping keywords with forward slash (/). Examples:

  1. /transcriptom[a-z]+/ (matches: transcriptome, transcriptomes, transcriptomics ...)

Boolean operators: + and -. + means the term must be present; - means the term must not be present. Examples:

  1. +"green ash" +transcriptome -genome (excludes the word genome)
  2. +"green ash" -transcriptome +genome (includes the word genome)

AND, OR, NOT operator and combination search. Examples:

  1. "heat stress" AND ("Castanea mollissima" OR "green ash") NOT "heat shock"
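What the wildcard, fuzzy and regular expression examples above are expected to match can be checked offline
with the Python standard library. This is only a sketch of the matching behaviour, not the search engine
that ZEAMAP runs. The helper names are hypothetical, and the fuzzy limit of 2 edits is an assumption.

```python
import re
from fnmatch import fnmatchcase


def wildcard_match(pattern: str, word: str) -> bool:
    """Case-insensitive match where * stands for any run of characters.

    fnmatch also gives ? and [] a special meaning; the patterns shown in
    this manual only use *.
    """
    return fnmatchcase(word.lower(), pattern.lower())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def fuzzy_match(keyword: str, word: str, max_edits: int = 2) -> bool:
    """Assumed fuzzy rule: accept words within max_edits edits of the keyword."""
    return edit_distance(keyword.lower(), word.lower()) <= max_edits


def regex_match(pattern: str, word: str) -> bool:
    """The pattern is the text between the two forward slashes."""
    return re.fullmatch(pattern, word) is not None


if __name__ == "__main__":
    print(wildcard_match("genom*", "genomics"))
    print(fuzzy_match("sequeeence", "sequence"))
    print(regex_match("transcriptom[a-z]+", "transcriptomes"))
```

Trying a pattern locally in this way can help when a query returns fewer hits than expected.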

The search results are grouped into categories on the right for efficient data filtering.

The search results on the left provide basic information and links to the detailed feature pages.


2 Genomics

This resource gathers the maize and teosinte genomic datasets, including reference genome
assemblies, annotations and gene expression data. It also provides tools for browsing features, searching
sequences and visualizing datasets.


2-1 Species

This page contains basic information about the current species/germplasm, and related external links
such as genome assembly datasets and taxonomy information.


2-2 Genome features

We provide two tools for searching annotated genome features: Search genes and
Search features.


2-2-1 Search genes

This tool filters genes/mRNAs/proteins by their IDs or functional annotations.


2-2-2 Search features

This tool filters all annotated features by their feature IDs or locations.


2-2-3 Feature details

TODO: [Screen shots for gene detail page]


2-3 Genome browser

We provide two genome browsers for visualizing the genomic features: JBrowse and
WashU browser.


2-3-1 JBrowse

ZEAMAP provides a JBrowse instance for visualizing genomic data. Many online tutorials explain how to
use JBrowse, such as the JBrowse Documentation. There is also a JBrowse tutorial video with more
details on how to navigate and use JBrowse.


2-3-2 WashU browser

ZEAMAP also provides a WashU Epigenome Browser (version: v48-4-4) instance to better visualize the chromatin
interaction datasets. To learn more about how to navigate and use the WashU Epigenome Browser, please visit
its official documentation: WashU Epigenome Browser Documentation.


2-4 BLAST

Users can compare their protein or nucleotide query sequences with the genomes and annotated features in the
database using BLAST (Basic Local Alignment Search Tool). To learn more about BLAST, see the NCBI BLAST home
page.


Below is the interface for running BLAST. A BLAST search takes only four steps:

Step 1. Upload the query sequences

To perform a sequence search, paste your sequences into the query region or drag a sequence file onto the
query region. The sequence type (protein or nucleotide) is detected automatically.

Note: Both raw sequences and multi-FASTA format are supported when pasting from the clipboard,
but sequences uploaded from a file must be in FASTA format. Learn more about the FASTA
format here: https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=BlastHelp
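Before submitting, it can be useful to check a query locally. The sketch below parses raw or multi-FASTA
text and guesses the sequence type from its letters. It is an illustration only: the heuristic, the 0.9
threshold and the function names are assumptions, and ZEAMAP's own detection may work differently.

```python
from typing import Dict

# Letters treated as nucleotide codes by this heuristic (an assumption).
NUCLEOTIDE_LETTERS = set("ACGTUN")


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse raw or multi-FASTA text into {sequence_id: sequence}."""
    records: Dict[str, str] = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            # Use the first word of the header as the ID.
            current_id = header[0] if header else f"query_{len(records) + 1}"
            records[current_id] = ""
        else:
            if current_id is None:
                # Raw sequence pasted without a FASTA header.
                current_id = "query_1"
                records[current_id] = ""
            records[current_id] += line.upper()
    return records


def guess_sequence_type(sequence: str, threshold: float = 0.9) -> str:
    """Return 'nucleotide' if most letters are nucleotide codes, else 'protein'."""
    letters = [c for c in sequence.upper() if c.isalpha()]
    if not letters:
        raise ValueError("empty sequence")
    fraction = sum(c in NUCLEOTIDE_LETTERS for c in letters) / len(letters)
    return "nucleotide" if fraction >= threshold else "protein"
```

Running guess_sequence_type on every parsed record and confirming that all records share one type avoids
submitting a mixed query by mistake.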

Step 2. Select databases

After uploading your query sequences, you can select one or more databases to search.

Note: Only one type of database (Nucleotide or Protein) can be selected.

Step 3. Select parameters

The Advanced parameters input box allows you to run BLAST with custom parameters. Click
on the “?” button to view the available parameters. If you leave the box blank, the default parameters
will be used.

Learn more about BLAST parameters here: https://www.ncbi.nlm.nih.gov/books/NBK279684/

Step 4. Perform BLAST

Once you have finished the previous steps, the appropriate BLAST program (BLASTn, BLASTx, BLASTp
etc.) is selected automatically according to the types of your query sequences and the databases, and
the BLAST button changes accordingly. Click on the BLAST button to run the
analysis, and a status page will be shown.

After the BLAST search is done, you are taken to the result page automatically.
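The automatic program selection in Step 4 follows from the standard pairing of query type and database type
in BLAST. The function below is a minimal sketch of that pairing; its name and the string labels are
hypothetical, and it does not reproduce the ZEAMAP interface code.

```python
from typing import List


def select_blast_programs(query_type: str, database_type: str) -> List[str]:
    """Return the BLAST programs that fit a query type and a database type.

    Both arguments are either "nucleotide" or "protein".
    """
    programs = {
        # Nucleotide query against nucleotide database (tblastx compares
        # translated query against translated database).
        ("nucleotide", "nucleotide"): ["blastn", "tblastx"],
        # Translated nucleotide query against protein database.
        ("nucleotide", "protein"): ["blastx"],
        # Protein query against protein database.
        ("protein", "protein"): ["blastp"],
        # Protein query against translated nucleotide database.
        ("protein", "nucleotide"): ["tblastn"],
    }
    try:
        return programs[(query_type, database_type)]
    except KeyError:
        raise ValueError(
            f"unknown combination: {query_type!r} query, {database_type!r} database"
        ) from None
```

This also shows why only one type of database can be selected at a time: the program depends on a single
query type and a single database type.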


2-4-2 The BLAST result page

Below is an example of the BLAST result page. The result page can be divided into 8 parts:

Part 1: General information

This part contains general information about the job, including the version of the program, the
submission time, the database information and the parameters used.

Part 2: Category of query sequences

This part lists the results for each of your query sequences. Clicking on a query sequence ID
leads you to the details of that query sequence.

Part 3: Circos plot



This part is a circos plot showing the mapping between the query sequences and the similar
sequences in the database. Hovering over a ribbon shows the identity and E-value of that alignment.

Part 4: Download Category

This part allows you to download all the results to your local machine in different formats.

Part 5: Graphical overview

This part shows the BLAST hits of each query sequence. Each bar indicates one hit in the database, and
the color of the bar deepens as the hit gets stronger. Hovering over a bar shows the sequence ID and the
E-value of that hit, and clicking on a bar leads you to the detailed alignment information of that hit.

Part 6: Length distribution of hits



This is a histogram of the lengths of the similar sequences in the database. Hovering over the histogram
shows the ID, E-value and length of the sequence.

Part 7: List view

This is a list view of the BLAST hit results, including sequence name, query coverage, total score, E-value
and identity. Clicking on a sequence name leads you to the detailed alignment of that hit.

Part 8: Alignment details

A: Check the Select box to download only the results of the
selected records from Part 2. Clicking on Sequence shows the detailed
sequences. Clicking on FASTA and Alignment downloads the FASTA format
sequence and the alignment result, respectively.

B: The graphical overview and the alignment.


2-5 Synteny

The conserved syntenic blocks among the Zea genomes were analyzed using BLASTp and MCScanX (their
parameters), and the visualization is performed using the Tripal Synteny Viewer
(https://github.com/tripal/tripal_synview). The conserved syntenic blocks
between a selected chromosome of a genome and another genome can be displayed interactively in both a
circular and tabular layout, and the detailed gene information in each block is also displayed.


2-5-1 Select genomes

To get the synteny block information, you can either select a query chromosome and a target genome, or
request a specific block ID if you already know one. Then click on the Search button to get
the results.


2-5-2 Synteny block overview

The synteny viewer provides both circular and tabular layouts for the resulting blocks.

In the circular view, the blue bar indicates the query chromosome, the red bars indicate the target
genome and the ribbons indicate the synteny blocks. Hovering over a ribbon shows the block ID and the
regions, and clicking on a ribbon leads you to the detailed block information page. In the tabular layout,
clicking on a block ID also leads you to the detailed block information.


2-5-3 Detailed block information

The detailed synteny block information page includes 3 parts: the block information part, the
visualization part and the tabular view part. Hold the mouse over the visualization part and scroll
the mouse wheel to zoom in or out. Clicking on a gene ID in either the visualization or the tabular layout
leads to the gene details page.


2-6 Gene expression pattern

Given a set of gene IDs, you can search for their expression patterns across different tissues of one sample
(Reference expression) or across different samples of the same tissue (Population expression). Both the
gene set and the tissues/samples are clustered using the complete linkage method, and the result is
displayed as an interactive heatmap.
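For readers unfamiliar with complete linkage, the sketch below clusters expression profiles in pure Python.
At each step it merges the two clusters whose most distant members are closest. The Euclidean distance and
the function name are assumptions, since the distance measure used by ZEAMAP is not stated here.

```python
from itertools import combinations
from math import dist
from typing import Dict, List, Sequence, Tuple

Merge = Tuple[List[str], List[str], float]


def complete_linkage(profiles: Dict[str, Sequence[float]]) -> List[Merge]:
    """Agglomerative clustering with complete linkage.

    profiles maps a label (gene, tissue or sample) to its expression values.
    Returns the merge steps in order as (cluster_a, cluster_b, distance).
    """
    clusters: List[List[str]] = [[name] for name in profiles]
    merges: List[Merge] = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Complete linkage: the distance between two clusters is the
            # largest distance between any pair of their members.
            d = max(dist(profiles[x], profiles[y])
                    for x in clusters[a] for y in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    return merges
```

The order of the merges corresponds to the order in which branches join in a dendrogram, which is what
determines the row and column order of a clustered heatmap.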


2-6-1 Select panel

The query gene IDs can be separated by commas “,” or by newlines. After entering the query genes,
check the target tissues/samples and click on Search to get the result.
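A gene list exported from another program can be tidied before pasting it into the query box. The helper
below is a hypothetical convenience function, not part of ZEAMAP. It splits on commas and newlines, trims
whitespace and drops empty entries and duplicates while keeping the original order.

```python
import re
from typing import List


def parse_gene_ids(text: str) -> List[str]:
    """Split a comma- or newline-separated gene list into unique IDs."""
    seen = set()
    gene_ids: List[str] = []
    for token in re.split(r"[,\r\n]+", text):
        gene_id = token.strip()
        if gene_id and gene_id not in seen:
            seen.add(gene_id)
            gene_ids.append(gene_id)
    return gene_ids
```

Joining the returned list with ",".join(...) gives a string in one of the two accepted formats.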


2-6-2 Result panel

The resulting gene expression pattern is displayed as a heatmap. By default, both genes and tissues/samples
are clustered in the heatmap. You can download the image, sort tissues/samples
alphabetically and re-cluster tissues/samples using the control panels. Hovering over a cell of the
heatmap shows the detailed expression level of the gene in that tissue or sample. Clicking on a gene ID
leads to the detailed information page of that gene.


2-7 CRISPR

ZEAMAP provides a tabular layout to search for single-guide RNAs (sgRNAs) designed for CRISPR genome editing
experiments, including CRISPR/Cas9 (with NGG PAM) and CRISPR/Cpf1 (with TTV (V = A/G/C) and TTTV PAMs). The
sgRNAs were designed using CRISPR-Local, a local single-guide RNA (sgRNA) design tool for non-reference plant
genomes. The sgRNAs can be filtered by their target gene IDs or genomic regions.

The resulting table looks like this:

TODO: [A screenshot of the CRISPR table]

The meaning of each column is listed below:

Column 1: The name of the gene in which the sgRNA is located.

Column 2: The chromosome and the coordinate of the start position of the sgRNA.

Column 3: The sequence of the sgRNA.

Column 4: The on-target score of the sgRNA. (No scoring method is available for
Cpf1 sgRNAs; these are denoted by NA.)

Column 5: The number of off-target sites.

Column 6: The type of match between the sgRNA and off-target sites. (NM: no match found; U0: best
match found was a unique exact match; U1: best match found was a unique 1-error match; U2: best match
found was a unique 2-error match… R0: multiple exact matches found; R1: multiple 1-error matches found, no
exact matches; R2: multiple 2-error matches found, no exact or 1-error matches.)

Column 7: The number of exact, 1-error, 2-error, 3-error and 4-error matches found.

Column 8: The gene and position in which an exact match was found. (If there is no exact
match, this is denoted by NA.)

Column 9: The name of the exon in which the sgRNA is located (multiple exons are separated by “;”).

Column 10: Numbers separated by “:”, giving the TSS position, the exon start position, the
length of the exon, the position of the sgRNA relative to the exon and the position of the sgRNA relative
to the TSS, respectively.

Column 11: The highest off-target score between the sgRNA and all off-target sites. (No
off-target scoring method is available for Cpf1 sgRNAs; these are denoted by NA.)
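When working with an exported copy of this table, the coded values in Column 6 and Column 10 can be decoded
programmatically. The sketch below follows the column descriptions above. The function names, the dictionary
keys and the assumption that Column 10 holds exactly five integers are hypothetical.

```python
import re
from typing import Dict

POSITION_KEYS = (
    "tss_position",
    "exon_start",
    "exon_length",
    "sgrna_offset_in_exon",
    "sgrna_offset_from_tss",
)


def describe_match_type(code: str) -> str:
    """Translate a Column 6 code such as NM, U1 or R0 into words."""
    if code == "NM":
        return "no match found"
    match = re.fullmatch(r"([UR])(\d)", code)
    if match is None:
        raise ValueError(f"unrecognized match type: {code!r}")
    kind, errors = match.group(1), int(match.group(2))
    label = "exact" if errors == 0 else f"{errors}-error"
    if kind == "U":
        return f"best match was a unique {label} match"
    return f"multiple {label} matches found"


def parse_position_field(field: str) -> Dict[str, int]:
    """Split a Column 10 value into its five named numbers."""
    parts = field.split(":")
    if len(parts) != len(POSITION_KEYS):
        raise ValueError(f"expected 5 ':'-separated numbers, got {field!r}")
    return dict(zip(POSITION_KEYS, (int(p) for p in parts)))
```

Cells holding NA (Columns 4, 8 and 11) should be checked before any numeric conversion.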

The sgRNAs can also be browsed through JBrowse:

Double-clicking on an sgRNA leads to its detailed information:


3 Variations

The Variations module collects the genotypes and annotations of polymorphic variations, including SNPs,
InDels and SVs, among the AMP relative to the B73 reference genome, as well as a haplotype map generated
from the SNP genotype matrix.

Features of the ZEAMAP Variations module. In ZEAMAP, variants can be displayed in a table and filtered by
IDs, locations and annotated effects (A), with each resulting record linking to its
location in JBrowse (B). The popup for each variant in JBrowse shows detailed
information, including annotations and the genotype in each sample (inset in B). JBrowse also provides an
additional track that shows an overview of the genotype matrix (red: reference genotype; light blue:
homozygous genotype; light pink: heterozygous; grey: no call). (C) ZEAMAP also provides a
“Genotype Search” function to get genotype information by genomic regions and germplasms of
interest. The resulting matrix shows the variant positions, their IDs, reference and alternative
alleles, and the genotypes in each sample (“0” for the reference allele and “1” for the alternative allele;
red: homozygous reference genotype; blue: homozygous alternative genotype; green: heterozygous genotype).
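The 0/1 allele coding of the Genotype Search matrix can be interpreted as in the sketch below. The exact
text of a matrix cell is an assumption here (two allele codes separated by “/” or “|”, as in VCF files), and
the function name is hypothetical. The colours are those listed in the description above.

```python
import re

# Colours of the Genotype Search matrix, as described above.
GENOTYPE_COLORS = {
    "homozygous reference": "red",
    "homozygous alternative": "blue",
    "heterozygous": "green",
}


def classify_genotype(genotype: str) -> str:
    """Classify a biallelic genotype string such as 0/0, 0/1 or 1/1.

    Anything that is not two 0/1 allele codes is reported as "no call".
    """
    alleles = re.split(r"[/|]", genotype.strip())
    if len(alleles) != 2 or any(a not in ("0", "1") for a in alleles):
        return "no call"
    if alleles[0] != alleles[1]:
        return "heterozygous"
    if alleles[0] == "0":
        return "homozygous reference"
    return "homozygous alternative"
```

Because the AMP consists of inbred lines, most calls are expected to be homozygous, so heterozygous cells
are worth a second look.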


4 Genetics


4-1 Traits

ZEAMAP has collected phenotypic data from the AMP, including 21 agronomic traits, 31 kernel lipid
content-related traits, 19 kernel amino acid content-related traits and 184 known metabolites of maize
kernels. All these phenotypes can be searched and filtered by their threshold values using the “Search Trait
Evaluation” tool.

In ZEAMAP, both qualitative and quantitative traits can be searched by their trait values through “Search
Trait Evaluation”, which supports multiple filter conditions. The search result shows all the
germplasms that passed the filter conditions and their trait values.
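The same kind of multi-condition filtering can be reproduced on a downloaded trait table. The sketch below
is hypothetical: the data layout (a dictionary of germplasm names to trait values), the condition format
and the function name are all assumptions made for illustration.

```python
import operator
from typing import Dict, Iterable, List, Tuple

OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}

Condition = Tuple[str, str, float]  # (trait name, operator, threshold)


def filter_germplasm(
    trait_values: Dict[str, Dict[str, float]],
    conditions: Iterable[Condition],
) -> List[str]:
    """Return the germplasms whose trait values satisfy every condition.

    A germplasm with no value for a tested trait is excluded.
    """
    conditions = list(conditions)
    passed = []
    for germplasm, traits in trait_values.items():
        if all(
            trait in traits and OPERATORS[op](traits[trait], threshold)
            for trait, op, threshold in conditions
        ):
            passed.append(germplasm)
    return passed
```

Conditions are combined with a logical AND in this sketch, so each added condition can only narrow the
result.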


4-2 GWAS

We have identified loci significantly associated with these phenotypes using GWAS and provide a tabular data
search function to find specific loci by trait names, variant IDs, chromosome regions and significant P
values. ZEAMAP provides several tools to navigate the GWAS signals:

The signals can be searched by traits, variant IDs and variant locations, and filtered by significant P
values through the table browser. Each record in the search result has links to the GWAS visualization tools.



GWAS table browser in ZEAMAP. The GWAS signals could be searched by traits, variant IDs
and variant locations, and filtered by significant P values. Each record in the search result has links
to the GWAS visualization tools.

Three GWAS visualization tools (“GWAS-Single-Trait”, “GWAS-Multi-Trait” and “GWAS-Locus”) were developed to
better browse the GWAS results and compare the significant signals among different traits:



Schematic of the “GWAS-Single-Trait” tool. The trait and region of interest can be
queried through the top input boxes. Regions can be easily browsed through by clicking on the histogram
of the interactive “Navigational Manhattan Plot” track. The “Detailed Scatter Plot” track plots the
variants according to their chromosome locations and the significance of their P values. The color
of each dot indicates the LD r2 value between that variant and the reference variant (the purple diamond,
which can be reset by selecting the “Make LD reference” link on the popup page for each variant). The
bottom track shows the gene annotations in the selected region, with a popup for each gene element which
links to a detailed information page, genome browsers and the eQTL visualizer for that gene.
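The LD r2 value that colors each dot measures how strongly two variants are correlated across samples. One
common way to compute it is the squared Pearson correlation of genotype dosages (0, 1 or 2 copies of the
alternative allele), sketched below. This is offered as an illustration only. How ZEAMAP computes LD is not
stated here, and the function name is hypothetical.

```python
from math import fsum
from typing import Optional, Sequence


def ld_r2(
    dosages_a: Sequence[Optional[float]],
    dosages_b: Sequence[Optional[float]],
) -> float:
    """Squared Pearson correlation between two variants' genotype dosages.

    Samples with a missing call (None) at either variant are skipped.
    """
    pairs = [
        (a, b) for a, b in zip(dosages_a, dosages_b)
        if a is not None and b is not None
    ]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two samples with calls at both variants")
    mean_a = fsum(a for a, _ in pairs) / n
    mean_b = fsum(b for _, b in pairs) / n
    cov = fsum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = fsum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = fsum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        raise ValueError("r2 is undefined for a monomorphic variant")
    return cov * cov / (var_a * var_b)
```

Values close to 1 mean the two variants are nearly always inherited together, which is why dots in strong
LD with the reference variant tend to form a shared association peak.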



GWAS-Multi-Trait visualization tool in ZEAMAP. This tool displays GWAS signals of
multiple traits, with logic similar to the GWAS-Single-Trait tool. The only differences are that the traits
here are multi-selectable, and the colors in the detailed scatter plot indicate variants for different
traits rather than LD values. A “data layers” button has been added to the control panel of the detailed
scatter plot to fade, hide, order or remove certain trait layers.



GWAS-Locus visualization tool in ZEAMAP. This tool displays all significantly
associated signals between the query variant and all available traits, with a yellow highlight line and
a red dashed line indicating the general and detailed positions of the query variant, respectively.


4-3 eQTL

In ZEAMAP, we have collected cis-eQTL signals with a total of 18,039 gene expression patterns in maize
kernels, and provided a tabular tool to search and filter eQTL signals by gene IDs, gene locations,
distances from transcription start site (TSS), effect sizes, and significance values. A visualization tool
was also developed to browse all cis-eQTLs affecting the selected gene, with significance values, effect
size and pairwise LD information displayed interactively.

eQTL table browser in ZEAMAP. Using this tool, eQTL signals can be filtered by gene IDs and
locations, as well as by the distance from the transcription start site, the effect size (beta value) and the
significance (P value) of the most significant variant within each gene. The search result shows one
gene per record, with links to the visualization of each gene. Each record has a sub-table which lists
all the cis-eQTL signals significantly associated with this gene.



Schematic of the eQTL visualization tool. The significant cis-eQTL site for each gene
is sized by the significance of its P value and colored by the effect size (beta value). The heatmap
indicates pairwise LD r2 values of the variants.


4-4 Linkage

ZEAMAP has currently collected 12 published genetic maps constructed from different artificial maize
segregating populations using genotypes generated from the Illumina MaizeSNP50 BeadChip (Illumina Inc., San
Diego, CA, USA), as well as 813 quantitative trait loci (QTLs) identified from 15 plant architecture-related
traits.



Schematic of the TripalMap tool in ZEAMAP. This tool displays the detailed genetic
markers and mapped QTLs for each linkage group. Both the markers and the QTLs link to their own detailed
information page.


5 Populations

It is often useful to dissect the genetic diversity, population structure and pedigrees of maize lines for
both evolutionary studies and molecular breeding. ZEAMAP provides interactive information about the
population structures assessed by principal component analysis (PCA) and ancestries inferred from an
unsupervised clustering analysis using ADMIXTURE for the whole Zea population and each sub-population in the
database. We have also added a table that lists the origins or pedigree information for each sub-population.



Features of the ZEAMAP Populations module. (A) Interactive PCA diagram
(top two dot plots) and structure diagram (stacked bar plot). Each diagram is zoomable and shows
detailed information, including germplasm names and PCA/structure values when an element is moused over.
(B) A table browser is provided to search for germplasm by pedigree, origin and
subpopulation information.


6 Evolutions

In order to provide a general guide for adding new alleles from teosinte into maize breeding programs, ZEAMAP
provides selection signals and genetic affinities between maize and its two main teosinte relatives, Zea
mays ssp. mexicana and Zea mays ssp. parviglumis. The evolutionary selection signals can be browsed
graphically through an interactive “Selective signals browser”, which is similar to the aforementioned GWAS
viewer but with an additional Y-axis indicating the genetic variance by FST values. Signals can also be
analyzed and downloaded in a tabular format, or viewed in the WashU Epigenome Browser.



Evolution selective signals in ZEAMAP. (A) Selective signals browser.
The interactive histogram shows the distribution of XPCLR values within 500-kb windows along the
chromosomes, with the Y-axis indicating the maximum XPCLR score within each window. The detailed scatter
plot shows the detailed selective signals (dots) and the FST values (blue line) within the selected
region, with two horizontal dashed lines indicating the top 5% and top 10% XPCLR value cutoffs.
(B) A table browser is provided to search for selective signals by teosinte
subspecies and genomic regions. Each resulting record links to its position in the (C)
WashU Epigenome Browser.
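The two summaries shown in the browser, the maximum score per 500-kb window and the top 5% and top 10%
cutoffs, can be recomputed from a downloaded signal table. The sketch below assumes each signal is a
(position, score) pair on one chromosome. The function names and the ranking rule used for the cutoff are
assumptions, not the published ZEAMAP procedure.

```python
from math import ceil
from typing import Dict, Iterable, Sequence, Tuple


def window_maxima(
    signals: Iterable[Tuple[int, float]],
    window_size: int = 500_000,
) -> Dict[int, float]:
    """Return {window start: maximum score} for fixed-size windows."""
    maxima: Dict[int, float] = {}
    for position, score in signals:
        start = (position // window_size) * window_size
        if start not in maxima or score > maxima[start]:
            maxima[start] = score
    return maxima


def top_fraction_cutoff(scores: Sequence[float], fraction: float) -> float:
    """Return the lowest score that still belongs to the top `fraction`."""
    if not scores:
        raise ValueError("no scores given")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(scores, reverse=True)
    kept = max(1, ceil(len(ranked) * fraction))
    return ranked[kept - 1]
```

Calling top_fraction_cutoff(scores, 0.05) and top_fraction_cutoff(scores, 0.10) gives two thresholds that
play the role of the two dashed lines in the scatter plot.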


7 Epigenetics

In ZEAMAP, we have collected chromatin interaction maps associated with RNA polymerase II
occupancy and the histone mark H3K4me3 according to the B73 reference genome, open chromatin
regions based on micrococcal nuclease (MNase) digestion, histone acetylation and
methylation regions, and population DNA methylation information generated from the
third leaves at V3 of the 263 AMP inbred lines. This information can be accessed through a tabular data
browser or visualized through the WashU Epigenome Browser. For DNA methylation information from the AMP,
customized interfaces were developed to easily select multiple samples with differentially methylated
regions (DMRs) in the table browser and visualize both DMRs and DNA methylation sites in the WashU Epigenome
Browser.



Features of the ZEAMAP Epigenetics module. (A) Schematic of chromatin
interaction, chromatin accessibility and histone modification tracks displayed in the WashU Epigenome
Browser. (B) Populational DNA methylation table browser. This tool filters population
DNA methylation information by the DNA methylation type, germplasm and genomic region of interest, with
the resulting matrix displaying DMRs for each selected germplasm within the query region.
(C) Interface of the population DNA methylation genome browser. This interface provides
options to display DNA methylation information by DMRs or DNA methylation sites of the selected
germplasms within specified regions.