Low coverage genome sequence and ovine 60K SNP chip creation
The consortium has investigated various methods by which low sequence coverage of the ovine genome could be produced. In addition to simulations using bovine and ovine sequence, this work has included pilot studies using Sanger resequencing of existing sheep sequence and Roche 454 GS20 sequencing of previously sequenced sheep BACs to provide baseline information.

An important aspect of the process is to identify SNPs and their genomic locations, estimate their minor allele frequency (MAF), and provide sufficient known unique flanking sequence to design probes for their detection.

This is a challenge using existing technology; however, several aspects have already been identified:
  • The sequencing would make use of the virtual sheep genome to provide a framework for genome assembly.
  • The new sequencing technologies produce very short reads, so assembly needs a divide and conquer approach; even then, assembling through even modest lengths of repetitive sequence is a challenge.
  • The short contigs almost certainly need to be ordered and orientated against a reference genome such as the bovine.
  • The best technology for SNP detection and estimation of MAF provides insufficient genomic sequence for probe design and genome positioning.
Current Approach
The International Sheep Genomics Consortium’s immediate objective is to skim sequence the ovine genome to identify SNPs for a 60K SNP chip. Roche 454 FLX is a new sequencing technology based on pyrosequencing. Simulations based on the limited ovine genomic sequence available, together with results from pilot ovine resequencing projects, identified the following strategy: Roche 454 FLX technology would be used to produce 3x whole genome coverage, consisting of 0.5x shotgun sequence coverage from each of 6 ewes. Each animal would represent a different breed, and the resulting sequence would be assembled using the bovine genome as a framework and then reorganized into ovine order using the virtual sheep genome.

This approach has been estimated to provide assembled and ordered sequence for approximately 60% of the ovine genome. It would also detect 286,000 probable SNPs with defined genomic locations, of which 180,000 would potentially be available to select from when constructing a 60K ovine SNP chip comprising equally spaced SNPs. Based on available information, this would provide a resource in which the mean linkage disequilibrium (r²) between adjacent SNPs would exceed 0.25, which is suitable for whole genome association studies.
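For readers unfamiliar with the r² statistic quoted above, the following is a minimal sketch of how it can be computed for a pair of biallelic SNPs, assuming phased haplotypes coded 0/1 are available. The function names and the input layout are illustrative assumptions and are not taken from the consortium's analysis.

```python
from typing import Sequence


def r_squared(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """r^2 between two biallelic SNPs from phased haplotypes coded 0/1.

    hap_a[i] and hap_b[i] are the alleles carried by haplotype i at SNP A
    and SNP B (hypothetical input layout).
    """
    if len(hap_a) != len(hap_b) or len(hap_a) == 0:
        raise ValueError("haplotype vectors must be non-empty and of equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n  # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n  # frequency of allele 1 at SNP B
    # frequency of haplotypes carrying allele 1 at both SNPs
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


def mean_adjacent_r2(snps: Sequence[Sequence[int]]) -> float:
    """Mean r^2 over consecutive SNP pairs, given SNPs in map order."""
    if len(snps) < 2:
        raise ValueError("need at least two SNPs")
    pairs = list(zip(snps, snps[1:]))
    return sum(r_squared(a, b) for a, b in pairs) / len(pairs)
```

Two SNPs that always travel together give r² = 1, and two SNPs whose alleles are independent give r² = 0; the 0.25 figure above refers to the mean of this quantity over adjacent SNP pairs on the chip.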

Sequencing for this phase is nearing completion, with AgResearch in New Zealand sequencing the Romney, Texel and Scottish Blackface breeds and Baylor HGSC in Houston, Texas sequencing the Merino, Poll Dorset and Awassi breeds.

The project has recently had two additional components added to identify more SNPs as well as estimate their minor allele frequency more accurately and improve the genome assembly. The first extension is to include ~4 Gbp of reduced representational sequencing with an Illumina Solexa Genome Analyser to identify numerous additional SNPs and estimate their minor allele frequency using a technique outlined by Smith et al. (2008). The second extension is to improve assembly by creation of paired end reads of various insert sizes and sequencing lengths using a combination of next generation and Sanger sequencing.

Roche FLX skim sequencing method
Sequencing
  • Six female animals, each of a different breed, were selected (Fig. 1)
        - different breeds help identify SNPs with higher minor allele frequencies (MAF)
        - females chosen to equalise representation of the X chromosome
  • DNA isolated from white blood cells using standard Proteinase K digestion and salt/ethanol precipitation
  • Each animal sequenced to 0.5x genome coverage (1.5 Gbp) via Roche 454 FLX
  • Two 454 FLX libraries made per animal with each library titrated and the best used
Fig. 1. Diagram of 454 sequencing, sheep genome reordering and SNP detection strategy
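The coverage figures in the sequencing plan can be checked with a few lines of arithmetic. This is a small sketch only; the genome size of about 3 Gbp is not stated directly in the text but is implied by "0.5x genome coverage (1.5 Gbp)", and the function name is hypothetical.

```python
# Assumption: ovine genome size implied by the text (1.5 Gbp at 0.5x coverage).
GENOME_SIZE_BP = 1.5e9 / 0.5


def skim_coverage(n_animals: int, coverage_per_animal: float,
                  genome_size_bp: float = GENOME_SIZE_BP):
    """Return (bases sequenced per animal, combined fold coverage)."""
    bases_per_animal = coverage_per_animal * genome_size_bp
    total_coverage = n_animals * coverage_per_animal
    return bases_per_animal, total_coverage


bases_per_animal, total_coverage = skim_coverage(n_animals=6, coverage_per_animal=0.5)
print(f"{bases_per_animal / 1e9:.1f} Gbp per animal, {total_coverage:.1f}x combined")
```

Six animals at 0.5x each give the 3x combined coverage quoted in the current approach.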

Assembly
  • 454 reads repeat masked with an in-house repeats database consisting of Repbase bovine repeats coupled with CAP3-assembled ovine FLX sequence segments found to have >1000 hits in the bovine genome
  • Unique hits matched to location on bovine genome
  • MEGABLAST used with options -D 3 -t 21 -W 11 -q -3 -r 2 -G 5 -E 2 -s 56 -N 2 -F "m D" -U T
  • A hit is defined as unique when only a single hit occurred with an e-value of less than 1e-5, or when multiple hits were present and the ratio of the top hit's e-value to the second hit's e-value was less than 1e-20
  • Retrieved raw reads matching bovine scaffold segments (typically < 2 Mbp) and assembled using Newbler
  • Positioned and orientated Newbler ovine contigs on the bovine scaffold
  • Summarised as a virtual ovine sequence (MELD, see Fig 2)
  • Reordered MELDed ovine segments into ovine genome order using ovine BES information and the virtual sheep genome (VSG)
Fig. 2. Method used for creating MELDed ovine sequence using bovine sequence as a template
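The uniqueness rule in the assembly steps can be expressed as a small predicate. The sketch below reads the rule literally; the function name, the input format (a plain list of e-values per read) and the handling of ties at e = 0 are assumptions, not a description of the consortium's scripts.

```python
from typing import Sequence

E_VALUE_CUTOFF = 1e-5   # single-hit threshold from the text
E_RATIO_CUTOFF = 1e-20  # top/second e-value ratio threshold from the text


def is_unique_hit(e_values: Sequence[float]) -> bool:
    """Decide whether a read maps uniquely, given the e-values of all its hits."""
    if not e_values:
        return False  # no hit at all
    ranked = sorted(e_values)  # smallest (best) e-value first
    top = ranked[0]
    if len(ranked) == 1:
        return top < E_VALUE_CUTOFF
    second = ranked[1]
    if second == 0.0:
        # Assumption: two hits both reported as e = 0 cannot be told apart.
        return False
    return top / second < E_RATIO_CUTOFF


print(is_unique_hit([1e-30]))         # single strong hit
print(is_unique_hit([1e-50, 1e-10]))  # top hit far better than the runner-up
print(is_unique_hit([1e-30, 1e-25]))  # two similar hits
```

Under this reading, a read with several hits is kept only when its best hit is at least 20 orders of magnitude more significant than the next best.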

SNP Detection
  • Align sequence reads to 454 MELD sequence (Fig 3)
  • Filter high quality SNPs based on:
       - unique genomic match,
       - high quality sequence,
       - no flanking SNPs,
       - not within or adjacent to a homopolymeric run,
       - at least 2 reads of the minor allele, preferably from different animals,
       - at least 50 bp of flanking sequence on both sides
Fig. 3. SNP detection from aligned sequence reads
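The filter criteria listed above could be applied to candidate SNPs along the following lines. This is an illustrative sketch: the `CandidateSNP` record, the homopolymer length of 4, and the interpretation of "no flanking SNPs" as "no other candidate within the 50 bp flank" are all assumptions introduced for the example. Only the thresholds of 2 minor-allele reads and 50 bp of flank come from the text.

```python
from dataclasses import dataclass
from typing import Iterable

MIN_MINOR_READS = 2      # from the text
MIN_FLANK_BP = 50        # from the text
HOMOPOLYMER_MIN_LEN = 4  # assumption: run length treated as homopolymeric


@dataclass
class CandidateSNP:          # hypothetical record layout
    position: int            # 0-based index in the reference (MELD) sequence
    minor_allele_reads: int  # reads supporting the minor allele
    unique_match: bool       # reads map to a unique genomic location
    high_quality: bool       # base qualities pass the caller's threshold


def near_homopolymer(seq: str, pos: int, min_len: int = HOMOPOLYMER_MIN_LEN) -> bool:
    """True if pos lies within, or directly next to, a run of >= min_len identical bases."""
    i = 0
    while i < len(seq):
        j = i
        while j < len(seq) and seq[j] == seq[i]:
            j += 1
        # the run occupies [i, j); adjacent positions are i - 1 and j
        if j - i >= min_len and i - 1 <= pos <= j:
            return True
        i = j
    return False


def passes_filters(snp: CandidateSNP, reference: str,
                   all_snp_positions: Iterable[int]) -> bool:
    """Apply the listed criteria to one candidate SNP."""
    left_flank = snp.position
    right_flank = len(reference) - snp.position - 1
    has_flank = left_flank >= MIN_FLANK_BP and right_flank >= MIN_FLANK_BP
    has_neighbour = any(
        p != snp.position and abs(p - snp.position) <= MIN_FLANK_BP
        for p in all_snp_positions
    )
    return (
        snp.unique_match
        and snp.high_quality
        and not has_neighbour
        and not near_homopolymer(reference, snp.position)
        and snp.minor_allele_reads >= MIN_MINOR_READS
        and has_flank
    )
```

The preference for minor-allele reads from different animals is not modelled here; it would need per-read animal labels, which this sketch does not assume.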

Current modifications and extensions
  • The consortium is now also using reduced representational sequencing with Illumina Solexa genome analyser (Curt Van Tassell, Tim Smith & James Kijas pers comm).
  • 60 animals (primarily female), with ~1% of the genome sequenced to 20X depth per run
  • 4 Solexa sequencing runs of 1 Gbp with 35 bp reads should generate at least 150,000 high quality SNPs
  • Solexa sequences to be positioned on Roche 454 MELDed sequence to provide genome location of SNP and sufficient flanking sequence for probe design
  • We expect the majority of SNPs selected for use on the 60K chip will originate from this approach
  • Paired end reads to aid de novo assembly are being created on a limited trial basis by Baylor HGSC
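As a rough illustration of the numbers in the Solexa extension, the sketch below works out the read count implied by a 1 Gbp run of 35 bp reads and shows a naive read-count estimate of minor allele frequency at a single SNP. The MAF function is a simple proportion chosen for illustration; it is not the technique of Smith et al. (2008), and both function names are hypothetical.

```python
def reads_for_yield(yield_bp: int, read_length_bp: int) -> int:
    """Number of whole reads needed to produce the given yield."""
    return yield_bp // read_length_bp


def naive_maf(ref_reads: int, alt_reads: int) -> float:
    """Naive MAF estimate: fraction of reads carrying the rarer allele.

    Assumption: reads are sampled evenly across the pooled animals, which
    real pooled sequencing only approximates.
    """
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this SNP")
    return min(ref_reads, alt_reads) / total


reads_per_run = reads_for_yield(1_000_000_000, 35)
print(f"{reads_per_run:,} reads per run, {4 * reads_per_run:,} over 4 runs")
print(naive_maf(ref_reads=15, alt_reads=5))
```

At 20X depth, a single SNP is covered by only about 20 reads, so an estimate of this kind is coarse for any one site.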
Groups involved
  • Funding: Ovita (New Zealand), ISL Grant (Sydney University, Australia), Genesis Faraday (United Kingdom)
  • Roche 454 FLX sequencing: AgResearch, University of Otago and Baylor HGSC
  • Illumina Solexa reduced representational sequencing: CSIRO, Illumina, USDA
  • Assembly: AgResearch, Baylor HGSC, CSIRO
  • SNP detection: AgResearch, Baylor HGSC, CSIRO, USDA
Time Frame
  • Sequence 3X coverage using 454 FLX: complete early January 2008
  • Assemble and create MELD sequence: target late January 2008
  • Illumina RRS sequencing: target early February 2008
  • SNP detection and selection: target mid March 2008
  • SNP chip creation and validation: target end June 2008

Last modified: 5th January 2008
Maintainers: John McEwan, AgResearch, Jill Maddox, University of Melbourne
Email: jillm@rubens.its.unimelb.edu.au