1.5k Pilot SNP Array
The 1.5K pilot sheep SNP array project investigated methods for developing SNP arrays for sheep. Over 2,600 genomic targets with positions on the Virtual Sheep Genome (VSG) were resequenced using Sanger sequencing. A 1.5K Illumina SNP array was constructed and used to genotype sheep from 23 domestic breeds and two wild sheep species, as well as the International Mapping Flock (IMF). The genotyping information was used to investigate levels of polymorphism within the breeds and to develop an updated linkage map.
Information for all SNP is accessible through the Virtual Sheep Genome. The browser provides access to all the sequence trace files, sequence alignments, and primers used for analysis, and gives the genomic position and minor allele frequency for each SNP. A downloadable zip file containing the genotypes at 1,406 SNP from 28 sheep populations is available here. The datafile is formatted for analysis using the Arlequin program.
The pilot resequencing project demonstrated that the Sanger resequencing strategy was successful, but also that it was impractical, in both time and cost, for large-scale SNP discovery in sheep. Consequently, the ISGC has decided to use the cheaper Roche 454 and Illumina Solexa whole-genome shotgun and reduced representation sequencing instead of Sanger resequencing for the sheep HapMap project. However, the Sanger resequencing project and subsequent genotyping provided valuable information on sequence and breed diversity. More details of the approach used are given below.
The aim of this pilot project was to investigate methods for developing the first high-density genotyping array of SNPs evenly distributed across the sheep genome, with the best recovery of SNPs per unit of sequence and cost. Two main strategies were compared for creating a high-density sheep SNP chip: Sanger resequencing and next-generation sequencing methods (454, Illumina Genome Analyzer - Solexa).
Sanger resequencing strategy and 1.5k SNP array development
Step 1. Identify targets for resequencing
Two subsets of the sheep genome were available for resequencing:
- 370 K BAC end sequences (ISGC - USDA, AWI and MLA)
- 140 K expressed sequence tags (ESTs - Ovita)
Target selection was based on:
- targets being free of repetitive sequence
- equal spacing across the genome
- association with some known genes
Primers were designed using an automated pipeline.
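The selection criteria above (repeat-free targets, equally spaced across the genome) can be illustrated with a minimal sketch. It is not the pipeline used in the project: the `Target` record, the `has_repeat` flag and the fixed-window spacing rule are all assumptions made for illustration, and the gene-association criterion is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    # Hypothetical record for a candidate resequencing target.
    name: str
    chrom: str
    pos: int          # position in bp on the reference used for ordering
    has_repeat: bool  # True if repeat masking flagged the sequence


def select_targets(candidates, spacing_bp):
    """Keep repeat-free targets, at most one per spacing_bp window per chromosome."""
    chosen = []
    seen_windows = set()
    for t in sorted(candidates, key=lambda t: (t.chrom, t.pos)):
        if t.has_repeat:
            continue  # criterion: free of repetitive sequence
        window = (t.chrom, t.pos // spacing_bp)
        if window in seen_windows:
            continue  # criterion: roughly equal spacing across the genome
        seen_windows.add(window)
        chosen.append(t)
    return chosen


candidates = [
    Target("A", "chr1", 100, False),
    Target("B", "chr1", 500, False),
    Target("C", "chr1", 1500, True),
    Target("D", "chr1", 2500, False),
    Target("E", "chr2", 100, False),
]
selected_names = [t.name for t in select_targets(candidates, spacing_bp=1000)]
```

A fixed window is the simplest way to approximate even spacing; a real pipeline would more likely choose the window size from the genome length and the number of targets wanted.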
Step 2. Sequence through a panel of animals
- Texel - BAC library DNA donor (USDA Tim Smith)
- NZ Romney (AgResearch, John McEwan)
- Poll Dorset (sheepGENOMICS, Hutton Oddy, ram provided by George Carter)
- Merino (sheepGENOMICS, Hutton Oddy, ram provided by UNE)
- Lacaune (INRA, Andre Eggen)
- Gulf Coast Native (USU, Noelle Cockett & Jim Miller)
- Katahdin (USU, Noelle Cockett & Jim Miller)
- Red Maasai (ILRI, Olivier Hanotte)
- Awassi (University of Sydney, Herman Raadsma)
- a pool of the above 9 sheep
- 2 pools of 24 sheep per pool (pool 1 Texel/Romney, pool 2 Merino/Poll Dorset)
The collection of DNA samples from the animals and pools was completed in June 2007. The resequencing component of the pilot phase for 3,000 amplicons was then performed by the AGRF, and was completed in February 2007.
Step 3. Identify SNPs and design and create a SNP chip
The project resulted in 49,077 usable sequences (deposited in GenBank GSS and Trace archives, accessions ET114568-ET163644), with roughly 1 SNP detected per 250 bp of sequence. About 50% of these SNPs had properties of interest: appropriate spacing across the genome, a minor allele frequency greater than 0.15, and the ability to be multiplexed into a SNP chip.
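As an illustration of the minor allele frequency criterion, the sketch below computes the frequency of the second most common allele from a list of genotype calls and applies the 0.15 cut-off. The genotype encoding (two-character strings, `None` for a missing call) and the function names are assumptions, not the format used by the project.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Return the frequency of the second most common allele.

    genotypes: iterable of two-character strings such as 'AG' (assumed
    encoding); missing calls are given as None and are ignored.
    """
    counts = Counter()
    for g in genotypes:
        if g is None:
            continue
        counts.update(g)  # counts each of the two alleles in the call
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # no data, or monomorphic in this sample
    return sorted(counts.values(), reverse=True)[1] / total


def passes_maf_filter(genotypes, threshold=0.15):
    """True if the minor allele frequency is greater than the threshold."""
    return minor_allele_frequency(genotypes) > threshold


common_snp = ["AA", "AG", "GG", "AG", None]
rare_snp = ["AA", "AA", "AG", "AA"]
keep_common = passes_maf_filter(common_snp)
keep_rare = passes_maf_filter(rare_snp)
```

With these example calls, `common_snp` has a minor allele frequency of 0.5 and is kept, while `rare_snp` has 1 G allele out of 8 (0.125) and is excluded.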
Figure 1. SNPs identified from processing trace files
Figure 2. SNP chips can query many thousands of loci at one time
Step 4. 1.5k Illumina sheep SNP array and genotyping of selected sheep resources
A 1.5k pilot SNP chip was then designed and tested on 413 sheep from a range of breeds and various resources (including the IMF and the UTAH5000 sheep radiation hybrid panel) using Illumina technology at Johns Hopkins University and the University of Alberta. The genotyping work was completed in late 2007 and the data were analysed in 2008. Findings were compiled concerning:
- Minor allele frequencies across 23 breeds
- Genetic diversity within breeds
- Genetic distance between sheep populations
- Identification of a core set of 384 SNP which identify population substructure. The SNP type and flanking sequence for each marker can be obtained here.
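The page does not state which statistic was used to measure genetic diversity within breeds. One common choice for biallelic SNP is expected heterozygosity, 2p(1-p) averaged over loci, where p is the frequency of one allele in the breed. The sketch below assumes that measure; the function name and the example frequencies are illustrative only.

```python
def expected_heterozygosity(allele_freqs):
    """Mean of 2p(1-p) over biallelic loci.

    allele_freqs: list of frequencies (0..1) of one allele at each locus
    within a single breed.
    """
    if not allele_freqs:
        raise ValueError("at least one locus is required")
    for p in allele_freqs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie between 0 and 1")
    return sum(2 * p * (1 - p) for p in allele_freqs) / len(allele_freqs)


# Illustrative frequencies, not data from the project.
diverse_breed = expected_heterozygosity([0.5, 0.5])
less_diverse_breed = expected_heterozygosity([0.0, 0.5])
```

A breed in which every SNP is fixed for one allele gives 0, and the maximum of 0.5 is reached when both alleles are equally frequent at every locus.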
Progress on this work is discussed in fortnightly phone meetings (Tuesday 9:00 Australian Eastern Standard Time). To participate in these discussions, contact James Kijas.