Low coverage genome sequence and ovine 50k SNP chip creation

The consortium investigated several methods for producing low-coverage sequence of the ovine genome. In addition to simulations using bovine and ovine sequence, this work included two pilot studies to provide baseline information: Sanger resequencing of existing sheep sequence, and Roche 454 GS20 sequencing of previously sequenced sheep BACs.

An important aspect of the process is to identify SNPs and their genomic locations, estimate their minor allele frequency (MAF), and provide enough unique flanking sequence to design probes for their detection.

This was considered to be a challenge using existing technology with the following issues being identified:

Current Approach

The International Sheep Genomics Consortium's immediate objective has been to skim sequence the ovine genome in order to identify SNPs for a 50k SNP chip. Roche 454 FLX sequencing is a new technology based on pyrosequencing. Simulations based on the limited ovine genomic sequence available, together with results from pilot ovine resequencing projects, identified the following strategy: use Roche 454 FLX technology to produce 3x coverage of the whole ovine genome, made up of 0.5x shotgun sequence coverage from each of 6 ewes. Each animal represents a different breed. The resulting sequence was initially assembled using the bovine genome as a framework and was then reorganized using the virtual sheep genome to create a sheep genome sequence.
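The coverage figures in this strategy can be checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the 3.0 Gb genome size is an assumed round number, not a figure stated by the consortium.

```python
def total_coverage(per_animal_coverage: float, n_animals: int) -> float:
    """Combined fold coverage when every animal is sequenced to the same depth."""
    return per_animal_coverage * n_animals


def expected_bases(genome_size_bp: float, coverage: float) -> float:
    """Raw sequence (in bases) needed to reach a given fold coverage."""
    return genome_size_bp * coverage


# ASSUMPTION: a round 3.0 Gb genome size, used only for illustration.
ASSUMED_GENOME_SIZE_BP = 3.0e9

coverage = total_coverage(per_animal_coverage=0.5, n_animals=6)
bases = expected_bases(ASSUMED_GENOME_SIZE_BP, coverage)

print(f"combined coverage: {coverage:.1f}x")
print(f"expected raw sequence: {bases / 1e9:.1f} Gb")
```

Under that assumed genome size, six animals at 0.5x each give 3x combined coverage, which corresponds to roughly 9 Gb of raw sequence.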

The 454 approach has been used to produce more than 9 Gb of ovine sequence and to provide assembled and ordered sequence for approximately 76% of the unique portion of the ovine genome. It also allowed the detection of more than 590,000 probable SNPs with defined genomic locations, of which more than 270,000 were classified as "class A" SNPs (both alleles seen in two sheep). This pool is large enough to select from for the goal of a 50k ovine SNP chip comprising equally spaced SNPs. Based on available information, a 50k ovine SNP chip would be a resource in which the mean linkage disequilibrium (r²) between adjacent SNPs exceeds 0.25, which is suitable for whole genome association studies.
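The two selection ideas in this paragraph, filtering to "class A" SNPs and picking equally spaced markers, can be sketched as follows. This is not the consortium's pipeline. It assumes that "both alleles seen in two sheep" means each allele was observed in at least two different animals, and it uses a simple nearest-to-target rule for spacing. All function names and example data are hypothetical.

```python
from bisect import bisect_left


def is_class_a(allele_to_animals: dict[str, set[str]], min_animals: int = 2) -> bool:
    """ASSUMED definition: exactly two alleles, each seen in >= min_animals animals."""
    return len(allele_to_animals) == 2 and all(
        len(animals) >= min_animals for animals in allele_to_animals.values()
    )


def pick_evenly_spaced(positions: list[int], chrom_length: int, n_targets: int) -> list[int]:
    """For each of n_targets evenly spaced target points, pick the nearest unused SNP.

    Only the two SNPs flanking each target are considered, so a target is
    skipped if both of its neighbours have already been used.
    """
    positions = sorted(positions)
    spacing = chrom_length / n_targets
    used: set[int] = set()
    chosen: list[int] = []
    for i in range(n_targets):
        target = (i + 0.5) * spacing  # centre of the i-th interval
        j = bisect_left(positions, target)
        candidates = [k for k in (j - 1, j) if 0 <= k < len(positions) and k not in used]
        if not candidates:
            continue
        best = min(candidates, key=lambda k: abs(positions[k] - target))
        used.add(best)
        chosen.append(positions[best])
    return chosen


# Hypothetical example: two candidate SNPs and a toy 4 kb "chromosome".
snp_1 = {"A": {"Romney", "Texel"}, "G": {"Merino", "Awassi"}}
snp_2 = {"A": {"Romney", "Texel"}, "G": {"Merino"}}
print(is_class_a(snp_1), is_class_a(snp_2))

candidate_positions = [100, 800, 1100, 2500, 2600, 4000]
print(pick_evenly_spaced(candidate_positions, chrom_length=4000, n_targets=4))
```

A real chip design would also weigh probe design scores, MAF, and assay conversion rates; the spacing rule here only illustrates why a candidate pool several times larger than 50,000 gives room to choose.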

The 454 sequencing phase of the project was completed by AgResearch in New Zealand, which sequenced 3 sheep representing the Romney, Texel and Scottish Blackface breeds, and by Baylor HGSC in Houston, Texas, which sequenced 3 sheep representing the Merino, Poll Dorset and Awassi breeds.

Two additional components were added to the project to identify more SNPs, estimate their minor allele frequency more accurately, and improve the genome assembly. The first extension was to include ~4 Gbp of reduced representational sequencing (RRS) with an Illumina Genome Analyser (GA) to identify numerous additional SNPs and estimate their minor allele frequency using a technique outlined by Smith et al. (2008). The second extension has been to improve the assembly by creating paired-end reads of various insert sizes and sequencing lengths, using a combination of next generation and Sanger sequencing.
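As a rough illustration of how allele frequency can be estimated from sequence reads, the sketch below computes a MAF from the number of reads supporting each allele at one site. It is not the technique of Smith et al. (2008). It assumes that the reads come from pooled DNA of many animals, so that read proportions approximate allele frequencies, and the minimum depth threshold is an arbitrary illustrative value.

```python
from typing import Optional


def minor_allele_frequency(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads carrying the less common of two alleles."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return min(ref_reads, alt_reads) / total


def maf_if_deep_enough(ref_reads: int, alt_reads: int, min_depth: int = 10) -> Optional[float]:
    """Return the MAF estimate, or None when depth is below min_depth.

    ASSUMPTION: min_depth=10 is an arbitrary cut-off for illustration;
    shallow sites give very noisy frequency estimates.
    """
    if ref_reads + alt_reads < min_depth:
        return None
    return minor_allele_frequency(ref_reads, alt_reads)


# Hypothetical read counts at two sites.
print(maf_if_deep_enough(30, 10))  # 40 reads in total
print(maf_if_deep_enough(3, 1))    # only 4 reads in total
```

The depth filter reflects why deeper, targeted sequencing such as RRS gives more accurate frequency estimates than skim sequencing alone: at low depth, a handful of reads cannot distinguish a rare allele from a common one.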

Roche FLX skim sequencing method

Sequencing

Diagram of 454 sequencing, sheep genome reordering and SNP detection strategy

Assembly

Method used for creating MELDed ovine sequence using bovine sequence as a template

SNP Detection

SNP detection from aligned sequence reads

Additional Sequencing

Groups Involved

Time Frame

Sequence 3x coverage using 454 FLX completed January 2008
Assemble and create MELD 454 sequence completed February 2008
Illumina GA RRS sequencing completed May 2008
BAC and RRS SNP detection completed July 2008
Pilot testing of 454 SNPs completed July 2008
Pilot testing of Solexa SNPs completed August 2008
SNP Chip design completed August 2008
SNP Chip synthesis and initial testing completed December 2008