Monday, 28 August 2017

Where do our snakes come from?

The snakes we are sequencing for the BABS Genome project were kindly supplied by Nathan Dunstan at Venom Supplies as a collaborative contribution to Paul Waters and Denis O’Meally when they were at ANU. Thanks Nathan!

We have sequenced two tiger snake parents, originally caught in the southeast of South Australia (just north of Mt Gambier) in about 2004. They were bred at Venom Supplies, and we have also sequenced one of their babies (sex unknown), born in February 2013.

The brown snake was a female from a clutch of eggs from a gravid (pregnant) female caught locally in the Barossa.

Photo Credits

Tiger Snake (left): Teneche [CC BY-SA 3.0] | Brown Snake (right): Denis O'Meally.

Tuesday, 15 August 2017

Linked read sequencing is go!

We already have over four billion reads and 620 GB of NovaSeq Illumina data for our three tiger snakes; next week’s BABS3291 prac will look at some of the early ABySS assemblies of one of these snakes.
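As a rough sanity check on those numbers, here is a minimal sketch of the usual back-of-envelope sums: mean read length is total bases divided by read count, and expected depth of coverage is sequenced bases divided by genome size. It makes several assumptions that are not from the sequencing report: that "620 GB" means roughly 620 billion bases, that the data are split evenly between the three snakes, and that the genome is about 1.5 Gb (a placeholder, not a measured value for these snakes).

```python
def mean_depth(total_bases: float, genome_size: float) -> float:
    """Expected mean depth of coverage: sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Figures quoted in the post (approximate: "over four billion reads").
TOTAL_READS = 4e9
TOTAL_BASES = 620e9  # ASSUMPTION: "620 GB" read as ~620 billion bases

# ASSUMPTIONS for illustration only, not values from the project.
N_SNAKES = 3                 # data assumed to be split evenly
ASSUMED_GENOME_SIZE = 1.5e9  # hypothetical placeholder genome size (bp)

mean_read_length = TOTAL_BASES / TOTAL_READS
depth_per_snake = mean_depth(TOTAL_BASES / N_SNAKES, ASSUMED_GENOME_SIZE)

print(f"Mean read length: ~{mean_read_length:.0f} bp")
print(f"Depth per snake:  ~{depth_per_snake:.0f}x (assumed genome size)")
```

Swap in a better genome size estimate and the depth scales accordingly: halving the assumed genome size doubles the estimated depth.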

Phase 2 of the sequencing is now go! 10x Chromium linked read libraries were prepared at the Ramaciotti Centre for Genomics last week for one tiger snake and one eastern brown snake. These data promise to make genome assembly much easier and the resulting assemblies much more contiguous. We received notification today that the samples have arrived in the KCCG sequencing laboratory at the Garvan Institute for Illumina HiSeq X (“XTen”) sequencing.

Nobody knows how well linked read sequencing, which is optimised for human genomes, will work in a snake but we look forward to finding out!

Friday, 4 August 2017

Important considerations for sample preparation

Today we had a tutorial on the things you need to think about during a genome sequencing project. The first student suggestion for sample selection and handling is good advice for life:

Thursday, 3 August 2017

Sequencing technologies used for the BABS Genome

Sequencing for the BABS Genome is being performed at the Ramaciotti Centre for Genomics at UNSW, which is one of Australia’s top sequencing centres and has a long, rich history of genome sequencing.

The Gold Standard for genome assembly is currently to combine three technologies:

  1. High coverage short read sequencing for accurate base calling of unique regions.

  2. Long read sequencing for assembling complex and small repetitive regions of the genome.

  3. Long range sequencing for scaffolding contigs across larger repetitive regions of the genome.

We will be using a combination of three of the latest technologies, one for each of these roles, for the BABS genome:

Illumina NovaSeq and HiSeq X

Short read Illumina sequencing is still the starting point for sequencing large (>0.5 Gb) genomes. Although it is impossible to assemble short read data alone into a high-quality genome, it remains the most cost-effective technology in terms of high-quality bases sequenced per dollar. Illumina sequencing struggles with regions of the genome with certain compositional bias and short read assembly fails at repetitive regions. Nevertheless, it is possible to get a useful assembly of a large portion of the “unique” genome, which includes most of the protein-coding genes.

For the 2017 BABS genome, we are using two of the latest (and most cost-effective) Illumina sequencing platforms: the HiSeq X (XTen) and the new NovaSeq. These machines have a phenomenal output per run. The NovaSeq is being used for pure Illumina sequencing, whereas the HiSeq X is being used for the sequencing component of the 10X Genomics Linked Read sequencing (below).

PacBio Sequel

Whole genome sequencing and assembly has been revolutionised by the development of long read sequencing technologies by Pacific Biosciences (PacBio) and Oxford Nanopore (MinION). With typical read lengths a hundred times longer than Illumina reads, long read sequencing enables resolution of many of the shorter repetitive regions in the genome.

Long read sequencing is still comparatively expensive and the budget does not stretch to a pure PacBio assembly this year. However, we will be getting some sequencing done on the new PacBio Sequel, which will help with scaffolding Illumina contigs. We also hope to be able to generate a pure PacBio mitochondrial genome; mitochondria are present in multiple copies per cell, which effectively increases the depth of coverage!

10X Genomics Chromium Linked Reads

Due to the cost (and DNA requirements) of long read sequencing, there has been considerable effort in recent years to combine cost-effective Illumina short read sequencing with additional experimental approaches to leverage long-range information. The long range service offered by the Ramaciotti Centre is 10X Genomics Chromium linked read sequencing. Unlike PacBio or MinION, this does not contiguously sequence a long DNA molecule. Instead, it uses a clever barcoding system to link short reads back to their DNA molecule of origin. 10X Genomics software then uses this linkage to regenerate pseudo-long-reads that can be used for both genome assembly and haplotype phasing.
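To make the barcoding idea concrete, here is a toy sketch of the core bookkeeping: collecting short reads that carry the same barcode. This is not the 10X Genomics software or its file format; the read tuples, the four-base barcodes and the function name `group_reads_by_barcode` are all made up for illustration. In practice a barcode may be shared by more than one DNA molecule, so real pipelines also have to use where the reads fall in the assembly or reference.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical read record: (read_id, barcode, sequence).
Read = Tuple[str, str, str]


def group_reads_by_barcode(reads: Iterable[Read]) -> Dict[str, List[str]]:
    """Collect the IDs of all reads that share a barcode."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for read_id, barcode, _sequence in reads:
        groups[barcode].append(read_id)
    return dict(groups)


# Toy data: invented barcodes and sequences, far shorter than real ones.
toy_reads = [
    ("read1", "AAAC", "ACGTACGT"),
    ("read2", "GGTT", "TTGACCAA"),
    ("read3", "AAAC", "CGTACGTA"),
    ("read4", "GGTT", "GACCAATG"),
    ("read5", "AAAC", "GTACGTAC"),
]

linked = group_reads_by_barcode(toy_reads)
for barcode, read_ids in sorted(linked.items()):
    print(barcode, read_ids)
```

Reads that end up in the same group are candidates for having come from the same long DNA molecule, which is the long-range information that short reads alone cannot provide.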