Sequencing for the BABS Genome is being performed at the Ramaciotti Centre for Genomics at UNSW, which is one of Australia’s top sequencing centres and has a long, rich history of genome sequencing.
The current gold standard for genome assembly is to combine three types of sequencing:
High coverage short read sequencing for accurate base calling of unique regions.
Long read sequencing for assembling complex and small repetitive regions of the genome.
Long range sequencing for scaffolding contigs across larger repetitive regions of the genome.
For the BABS genome, we will be using one of the latest technologies of each type:
Illumina NovaSeq and HiSeq X
Short read Illumina sequencing is still the starting point for sequencing large (>0.5 Gb) genomes. Although it is impossible to assemble short read data alone into a high-quality genome, it remains the most cost-effective technology in terms of high-quality bases sequenced per dollar. Illumina sequencing struggles with regions of biased base composition (such as extreme GC content), and short read assembly fails at repetitive regions. Nevertheless, it is possible to get a useful assembly of a large portion of the “unique” genome, which includes most of the protein-coding genes.
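To illustrate why repeats are a problem for short reads, here is a minimal Python sketch. It assumes a simplified de Bruijn graph (a common basis for short read assemblers, though not necessarily the method used for this project) and looks for places where the assembly path becomes ambiguous. The toy sequence, the function name `branching_nodes` and the values of `k` are all made up for illustration; real assemblers also have to handle sequencing errors and both DNA strands, which are ignored here.

```python
from collections import defaultdict


def branching_nodes(sequence, k):
    """Return the (k-1)-mers that have more than one distinct successor.

    Each k-mer in the sequence is treated as an edge from its first
    (k-1) bases to its last (k-1) bases. A node with several successors
    is a point where an assembler cannot tell which way to continue.
    """
    successors = defaultdict(set)
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        successors[kmer[:-1]].add(kmer[1:])
    return sorted(node for node, nxt in successors.items() if len(nxt) > 1)


# Hypothetical toy genome: an 11 bp repeat flanked by three unique regions.
REPEAT = "ACGTACGGTCA"
genome = "TTGACCA" + REPEAT + "GGATTCC" + REPEAT + "CTTAGAG"

# k shorter than the repeat: the path out of the repeat is ambiguous.
short_k = branching_nodes(genome, 6)

# k longer than the repeat: every overlap is unique, so no branching.
long_k = branching_nodes(genome, 14)
```

In this toy case, overlaps shorter than the repeat leave the graph with a branch at the end of the repeat, whereas overlaps that span the whole repeat do not. This is the same reason that longer reads help with repetitive regions.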
For the 2017 BABS genome, we are using two of the latest (and most cost-effective) Illumina sequencing platforms: the HiSeq X (XTen) and the new NovaSeq. These machines have a phenomenal output per run. The NovaSeq is being used for pure Illumina sequencing, whereas the HiSeq X is being used for the sequencing component of the 10X Genomics Linked Read sequencing (below).
PacBio Sequel Long Reads
Whole genome sequencing and assembly have been revolutionised by the development of long read sequencing technologies by Pacific Biosciences (PacBio) and Oxford Nanopore (MinION). With typical read lengths a hundred times longer than Illumina reads, long read sequencing enables resolution of many of the shorter repetitive regions in the genome.
Long read sequencing is still comparatively expensive and the budget does not stretch to a pure PacBio assembly this year. However, we will be getting some sequencing done on the new PacBio Sequel, which will help with scaffolding Illumina contigs. We also hope to be able to generate a pure PacBio mitochondrial genome; mitochondria are present in multiple copies per cell, which effectively increases the depth of coverage!
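The effect of mitochondrial copy number on depth can be shown with some back-of-the-envelope arithmetic. The Python sketch below assumes an idealised library in which every DNA molecule is equally likely to be sequenced. The function names and all of the numbers (sequencing yield, genome size, copies per cell) are hypothetical and are not figures from this project.

```python
def depth_of_coverage(total_bases, genome_size):
    """Average number of times each base of the reference is sequenced."""
    return total_bases / genome_size


def mito_depth(nuclear_depth, mito_copies_per_cell, ploidy=2):
    """Expected mitochondrial depth under idealised, unbiased sampling.

    Nuclear depth is measured against a haploid reference, so each of
    the `ploidy` nuclear copies in a cell contributes
    nuclear_depth / ploidy. The mitochondrial genome is sampled at that
    same per-copy rate, multiplied by its copy number per cell.
    """
    return nuclear_depth * mito_copies_per_cell / ploidy


# Hypothetical example: 5 Gb of long reads over a 2.5 Gb genome.
nuclear = depth_of_coverage(5e9, 2.5e9)

# Hypothetical copy number: 500 mitochondrial genomes per diploid cell.
mito = mito_depth(nuclear, mito_copies_per_cell=500)

print(f"nuclear depth: {nuclear:.0f}x, mitochondrial depth: {mito:.0f}x")
```

Under these made-up numbers, a nuclear depth that is far too low for assembly still corresponds to hundreds of reads across each mitochondrial base. Real data will deviate from this, since extraction and library preparation do not sample all molecules equally.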
10X Genomics Chromium Linked Reads
Due to the cost (and DNA requirements) of long read sequencing, there has been considerable effort in recent years to combine cost-effective Illumina short read sequencing with additional experimental approaches to leverage long-range information. The long range service offered by the Ramaciotti Centre is 10X Genomics Chromium linked read sequencing. Unlike PacBio or MinION, this does not contiguously sequence a long DNA molecule. Instead, it uses a clever barcoding system to link short reads back to their DNA molecule of origin. 10X Genomics software then uses this linkage to regenerate pseudo-long-reads that can be used for both genome assembly and haplotype phasing.
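The idea of linking short reads back to their molecule of origin can be sketched in a few lines of Python. This is a simplified illustration only and is not the 10X Genomics algorithm. It assumes that reads have already been mapped to contigs, that each read is given as a hypothetical `(barcode, contig, position)` tuple, and that reads sharing a barcode within an arbitrary `max_gap` of each other came from the same molecule.

```python
from collections import defaultdict


def group_reads_by_barcode(reads):
    """Collect the mapped (contig, position) hits for each barcode."""
    groups = defaultdict(list)
    for barcode, contig, position in reads:
        groups[barcode].append((contig, position))
    return groups


def infer_molecules(reads, max_gap=50_000):
    """Split each barcode's reads into putative source molecules.

    Reads with the same barcode are sorted by contig and position, and
    a new molecule is started whenever the contig changes or the gap to
    the previous read exceeds max_gap (an assumed, arbitrary cutoff).
    """
    molecules = []
    for barcode, hits in group_reads_by_barcode(reads).items():
        hits.sort()
        current = [hits[0]]
        for hit in hits[1:]:
            previous = current[-1]
            if hit[0] == previous[0] and hit[1] - previous[1] <= max_gap:
                current.append(hit)
            else:
                molecules.append((barcode, current))
                current = [hit]
        molecules.append((barcode, current))
    return molecules


# Hypothetical mapped reads: (barcode, contig, position).
reads = [
    ("AAAC", "chr1", 1_000),
    ("AAAC", "chr1", 21_000),
    ("GGTT", "chr2", 500),
    ("AAAC", "chr1", 900_000),
    ("AAAC", "chr1", 40_000),
]
molecules = infer_molecules(reads)
```

Here the three nearby `AAAC` reads are grouped into one putative molecule, while the distant `AAAC` read and the `GGTT` read each form their own. Each group spans a far longer stretch of DNA than any single short read, which is what makes the linkage useful for scaffolding and phasing.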