The WashU Virus Genome Browser is a publicly available resource for efficient visualization of viral genomics data. In response to the rapid global spread of SARS-CoV-2 and the resulting pandemic, research groups have been sequencing the ~29.9 kb viral genome and transcriptome, mapping RNA modifications, and predicting the locations of antigenic peptide sequences. This work gives the research community an abundance of data for understanding the virus’s pathology. As these data become publicly available, they are integrated into the WashU Virus Genome Browser. The datasets currently on the browser are described below, and this list is updated as new data are added. In addition to the included data tracks, the browser supports user-uploaded sequences and offers two visualization modes: a genomic track view and a phylogenetic tree view. We hope the WashU Virus Genome Browser will serve as an efficient tool that helps researchers better understand the disease.
NCBI Database:
- Data from SARS-CoV-2 strains hosted on NCBI GenBank are continuously added to the browser as SNV tracks after alignment to the reference strain (NC_045512.2) with the EMBOSS ‘stretcher’ tool. In this view, deviations from the reference appear as colored bars marking individual mutations, insertions, and deletions. Both complete and partial genomes are available for viewing.
- Data source: https://www.ncbi.nlm.nih.gov/nuccore
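To make the SNV track idea concrete, here is a minimal sketch of how substitutions, insertions, and deletions can be read off a pairwise alignment against NC_045512.2. It assumes the gapped reference and query rows have already been extracted from the alignment output as two equal-length strings with `-` for gaps. The function name `call_variants` and the choice to skip `N` calls are illustrative assumptions, not the browser's actual pipeline.

```python
from typing import List, Tuple

# (kind, 1-based reference position, reference bases, query bases)
Variant = Tuple[str, int, str, str]

def call_variants(ref_aln: str, qry_aln: str) -> List[Variant]:
    """Report SNVs, insertions and deletions from two equal-length gapped
    alignment rows (reference row first, '-' as the gap character)."""
    if len(ref_aln) != len(qry_aln):
        raise ValueError("alignment rows must have equal length")
    variants: List[Variant] = []
    ref_pos = 0  # reference bases consumed so far
    i = 0
    n = len(ref_aln)
    while i < n:
        r, q = ref_aln[i].upper(), qry_aln[i].upper()
        if r == "-" and q == "-":
            i += 1  # column gapped in both rows carries no information
        elif r == "-":
            # insertion in the query: take the whole run of reference gaps
            j = i
            while j < n and ref_aln[j] == "-":
                j += 1
            inserted = qry_aln[i:j].replace("-", "")
            # anchored at the last reference base before the insertion
            variants.append(("INS", ref_pos, "", inserted))
            i = j
        elif q == "-":
            # deletion in the query: take the whole run of query gaps
            j = i
            while j < n and qry_aln[j] == "-" and ref_aln[j] != "-":
                j += 1
            variants.append(("DEL", ref_pos + 1, ref_aln[i:j], ""))
            ref_pos += j - i
            i = j
        else:
            ref_pos += 1
            # assumption: ambiguous 'N' bases are not reported as SNVs
            if r != q and q != "N":
                variants.append(("SNV", ref_pos, ref_aln[i], qry_aln[i]))
            i += 1
    return variants

print(call_variants("ACGT-ACGTA", "ACCTTAC--A"))
# → [('SNV', 3, 'G', 'C'), ('INS', 4, '', 'T'), ('DEL', 7, 'GT', '')]
```

Coordinates are 1-based on the reference, and insertions are anchored to the reference base immediately before them, which is one common convention among several.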
Nextstrain Database:
- Data from SARS-CoV-2 strains hosted on Nextstrain are continuously added to the browser as SNV tracks, as described above. Both complete and partial genomes are available for viewing. Complete genomes can also be explored in an interactive phylogenetic tree view.
- Data source: http://data.Nextstrain.org/ncov.json
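The tree view draws on Nextstrain's JSON. As a hedged sketch, the snippet below collects strain (tip) names from such a file, assuming an Auspice v2-style layout in which the tree sits under a top-level `"tree"` key and each node has a `"name"` and an optional `"children"` list. `load_tips` and `tip_names` are hypothetical helpers, and the schema of the live file should be checked before relying on this.

```python
import json
from typing import Any, Dict, List

def tip_names(node: Dict[str, Any]) -> List[str]:
    """Collect leaf names left to right, using an explicit stack so that
    very deep trees do not hit Python's recursion limit."""
    tips: List[str] = []
    stack = [node]
    while stack:
        current = stack.pop()
        children = current.get("children", [])
        if children:
            # push in reverse so the leftmost child is visited first
            stack.extend(reversed(children))
        else:
            tips.append(current["name"])
    return tips

def load_tips(json_text: str) -> List[str]:
    data = json.loads(json_text)
    # assumption: Auspice v2 keeps the tree under "tree"; fall back to the root
    root = data.get("tree", data)
    return tip_names(root)
```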
GISAID Database:
- Data from SARS-CoV-2 strains provided by GISAID are continuously added to the browser as SNV tracks after alignment to the reference strain (NC_045512.2) with the EMBOSS ‘stretcher’ tool, as described above. Both complete and partial genomes are available for viewing.
- Data source: https://www.gisaid.org/
Other Virus Species:
- In addition to SARS-CoV-2, the browser hosts over 2,400 genomic sequences from other virus species, all downloaded from NCBI GenBank: 332 severe acute respiratory syndrome coronavirus (SARS-CoV) genomes, 551 Middle East respiratory syndrome coronavirus (MERS-CoV) genomes, and 1574 Ebola virus sequences.
- Data source: https://www.ncbi.nlm.nih.gov/nuccore
Diagnostics:
The “Diagnostics” collection currently houses two data hubs, “Primers” and “CRISPR-based diagnostic tests”, both of which contain annotations relevant to diagnostic testing.
Primers:
- The locations of CDC primers and WHO-listed non-CDC primers for detecting SARS-CoV-2 are available for viewing, along with associated metadata such as country.
- Data source: https://www.ncbi.nlm.nih.gov/nuccore
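Placing a primer on the reference amounts to searching for the primer and its reverse complement, allowing for IUPAC degenerate bases. The sketch below shows one way to do that with regular expressions; `locate_primer` and the other helpers are hypothetical, and this is not necessarily how the primer track was generated.

```python
import re
from typing import List, Tuple

# IUPAC nucleotide codes expanded to regular-expression character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]

def to_pattern(seq: str) -> str:
    return "".join(IUPAC[base] for base in seq.upper())

def locate_primer(reference: str, primer: str) -> List[Tuple[str, int, int]]:
    """Return (strand, start, end) for every match, 1-based and inclusive.
    '-' means the primer lies on the reverse strand (its reverse complement
    matches the reference as written), as is typical for reverse primers."""
    reference = reference.upper()
    hits = []
    for strand, probe in (("+", primer), ("-", reverse_complement(primer))):
        # zero-width lookahead so overlapping matches are all reported
        for m in re.finditer(f"(?={to_pattern(probe)})", reference):
            start = m.start() + 1
            hits.append((strand, start, start + len(probe) - 1))
    return hits
```

For real use, the `reference` argument would be the NC_045512.2 sequence and `primer` a published primer sequence.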
CRISPR-based diagnostic tests:
- This data hub consists of two tracks that display primer and guide RNA sequence locations for two assays: SHERLOCK and DETECTR.
- Data source (DOI): 10.1038/s41587-020-0513-4
Putative SARS-CoV-2 Immune Epitopes:
SARS-CoV-2 Epitopes Predicted to Bind HLA Class I Proteins:
- SARS-CoV-2 epitope predictions across HLA class I alleles are provided. Predicted antigenic peptide sequences from Campbell et al., 2020 were downloaded and searched against the SARS-CoV-2 reference (NC_045512.2) with tblastn (BLAST 2.3.0+). Locations with 100% identity to SARS-CoV-2 are displayed, except for one negative-strand hit, which is likely a false positive because there is no evidence that the sequence is transcribed.
- Data source (DOI): 10.1101/2020.03.30.016931
- Linear immune epitopes identified in SARS-CoV-1 and cataloged in the Immune Epitope Database and Analysis Resource (IEDB) that retain 100% sequence identity in SARS-CoV-2 are displayed in a single track.
- Data source (DOI): 10.1093/nar/gky1006
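The filtering step for the HLA class I track can be sketched as follows, assuming tblastn was run with tabular output (`-outfmt 6`, whose first ten default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send). The hypothetical `filter_exact_hits` keeps full-length, gap-free, 100% identity matches and drops every minus-strand hit (tblastn reports these with sstart > send), a blanket version of the negative-strand exclusion described above.

```python
import csv
import io
from typing import Dict, List, NamedTuple

class EpitopeHit(NamedTuple):
    peptide_id: str
    start: int  # 1-based nucleotide start on the reference
    end: int    # 1-based inclusive nucleotide end

def filter_exact_hits(tabular: str, query_lengths: Dict[str, int]) -> List[EpitopeHit]:
    """Keep full-length, gap-free, 100% identity plus-strand tblastn hits
    from '-outfmt 6' text; query_lengths maps peptide IDs to lengths (aa)."""
    hits: List[EpitopeHit] = []
    for row in csv.reader(io.StringIO(tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        (qseqid, _sseqid, pident, length, mismatch, gapopen,
         _qstart, _qend, sstart, send) = row[:10]
        if float(pident) < 100.0 or int(mismatch) > 0 or int(gapopen) > 0:
            continue
        if int(length) != query_lengths[qseqid]:
            continue  # partial match: the peptide is not fully contained
        s, e = int(sstart), int(send)
        if s > e:
            continue  # minus-strand frame: tblastn reports sstart > send
        hits.append(EpitopeHit(qseqid, s, e))
    return hits
```

Requiring the alignment length to equal the peptide length matters here: a 100% identity hit over only part of a peptide would otherwise pass the identity filter.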
Putative SARS-CoV-2 Epitopes:
- This data hub hosts 14 tracks drawn from several studies, including CD8 epitopes restricted to HLA-A*02:01, B-cell epitope predictions, CD4 T-cell epitope predictions, CD8 T-cell epitope predictions, putative CD8+ T-cell epitopes with widespread HLA binding, and putative N-terminal SARS-CoV-2 MHC-II epitopes.
- Data sources (DOIs):
  - 10.1101/2020.03.23.004176
  - 10.1101/2020.02.12.946087
  - 10.1101/2020.04.06.027805
  - 10.1101/2020.04.17.20061440
Sequence Variation:
- Dynamic tracks for visualizing sequence variation and diversity across strains over time.
- Data source (DOI): 10.1101/2020.02.07.939124
Recombination Events:
- Recombination events detected by junction-spanning RNA-seq.
- Data sources (DOIs): 10.1038/s41586-020-2008-3 and 10.1016/j.cell.2020.04.011
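Junction-spanning reads are usually recognized from spliced alignments, where a skipped stretch of reference is encoded by the `N` operation in the SAM CIGAR string. The sketch below, assuming SAM-style 1-based leftmost positions, shows one generic way to turn such alignments into junction coordinates and per-junction read counts. `junctions_from_cigar` and `count_junctions` are hypothetical, and the cited studies may have used different tools and filters.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def junctions_from_cigar(pos: int, cigar: str) -> List[Tuple[int, int]]:
    """Return (donor, acceptor) pairs for each 'N' skip. donor is the last
    reference base before the skip, acceptor the first base after it."""
    junctions = []
    ref = pos  # next reference base to be consumed
    for length_str, op in CIGAR_OP.findall(cigar):
        n = int(length_str)
        if op == "N":
            junctions.append((ref - 1, ref + n))
            ref += n
        elif op in "MD=X":
            ref += n  # these operations consume reference bases
        # I, S, H and P do not consume reference bases
    return junctions

def count_junctions(alignments: Iterable[Tuple[int, str]]) -> Counter:
    """Count supporting reads per junction from (pos, cigar) pairs."""
    counts: Counter = Counter()
    for pos, cigar in alignments:
        counts.update(junctions_from_cigar(pos, cigar))
    return counts
```

Aggregating by exact (donor, acceptor) pairs keeps the sketch simple; real pipelines often also merge nearby junctions and apply read-count thresholds.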
Viral RNA Modifications:
- RNA modifications detected using Nanopore direct RNA sequencing.
- Data source (DOI): 10.1016/j.cell.2020.04.011
Viral RNA Expression:
- Viral RNA expression measured by Nanopore.
- Data source (DOI): 10.1016/j.cell.2020.04.011
SARS-CoV-2 Host Transcriptional Responses Dataset:
- In addition to the viral genome alignments, the WashU Virus Genome Browser offers a view of host transcriptional responses to SARS-CoV-2 infection. On the browser landing page, select “Host transcriptional responses to SARS-CoV-2” from the “Featured Datahubs” drop-down menu. This opens the corresponding data hubs on the hg38 human genome in the WashU Epigenome Browser.
- Data source (DOI): 10.1016/j.cell.2020.04.026
Feedback, suggestions, or bug reports are welcome here: https://github.com/debugpoint136/WashU-Virus-Genome-Browser/issues