Next-generation sequencing and computational genomic tools permit rapid, deep sequencing of hundreds to thousands of samples. Recently, rare variants of large effect have been recognized as conferring substantial risks for common diseases and complex traits in humans. There is considerable interest in sequencing limited genomic regions, such as sets of candidate genes and target regions identified by linkage and/or association studies. Sequencing large sample cohorts is essential to discover the full spectrum of genetic variants and to provide sufficient power to detect differences in allele frequencies between cases and controls. However, several technical and analytical challenges must be resolved before individual laboratories can apply next-generation sequencing efficiently to large sample sets. First, it remains expensive to sequence a large number of samples, despite substantial cost reductions in available technologies. Second, when the target region is tens to hundreds of kilobases or less per DNA sample, the smallest functional unit of a next-generation sequencer (for example, a single lane of an Illumina Genome Analyzer II (GAII) or HiSeq2000 flow cell) generates a wasteful excess of coverage. Third, methods for individually indexing hundreds to thousands of samples are challenging to develop and limited in efficacy. Fourth, generating sequence templates for target DNA regions in large numbers of samples is laborious and costly. Fifth, while pooling samples can reduce both labor and costs, it reduces sensitivity for identifying rare variants with currently available next-generation sequencing strategies and bioinformatics tools.
We have optimized a flexible and efficient strategy that combines a PCR-based amplicon-ligation method for template enrichment, sample pooling, and library indexing with novel quality and filtering algorithms for identifying rare variants in large sample cohorts. For validation of this strategy, we present data from sequencing 12 indexed libraries of 40 samples each (480 samples in total) on a single lane of a GAII Illumina sequencer. We used an alternative base-calling algorithm, Srfim, and an automated filtering program, SERVIC4E (Sensitive Rare Variant Identification by Cross-pool Cluster, Continuity, and tailCurve Evaluation), designed for sensitive and reliable detection of rare variants in pooled samples. We validated this strategy using Illumina sequencing data from an additional independent cohort of 480 samples. Compared with publicly available software, this strategy achieved an excellent combination of sensitivity and specificity for rare variant detection in pooled samples. It did so by substantially reducing the false positive and false negative variant calls that often confound next-generation sequencing. We anticipate that our pooling strategy and filtering algorithms can be easily adapted to other popular template-enrichment platforms, such as microarray capture and liquid hybridization.
We used a PCR-based amplicon-ligation method because PCR remains the most reliable method of template enrichment for selected regions in a complex genome. This approach ensures low cost and maximal flexibility in study design compared with other techniques. Additionally, PCR of pooled samples alleviates known technical issues associated with PCR multiplexing. We sequenced 24 exon-containing regions (250 to 300 bp each) of GRIP2, a gene on chromosome 3 encoding glutamate receptor interacting protein 2, in 480 unrelated individuals (Figure 1). The total targeted region is 6.7 kb per sample.
We pooled the 480 DNA samples at equal concentration into 12 pools of 40 samples each. This was done conveniently by combining the samples in the same column across five 96-well plates (8 wells per column × 5 plates = 40 samples, one pool for each of the 12 columns). We separately amplified each of the 24 regions for each pool, then normalized and combined the resulting PCR products at equal molar ratios. The 12 pools of amplicons were individually blunt-end ligated and randomly fragmented to construct sequencing libraries, each with a unique Illumina barcode. These 12 indexed libraries were combined at equal molar concentrations and sequenced on one lane of a GAII (Illumina) using a 47-bp single-end module. We aimed for 30-fold coverage of each allele. Examples of amplicon ligation, the distribution of fragmented products, and the 12 indexed libraries are shown in Figure 2.
Data analysis and variant calling
Sequence reads were mapped with Bowtie using strict alignment parameters (-v 3: the entire read must align with three or fewer mismatches). We chose strict alignment to focus on high-quality reads. Variants were called using SAMtools (algorithms are detailed in Materials and methods).
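The pooling design described above implies a few quantities that are easy to check by hand. The minimal Python sketch below is an illustration only and is not the authors' SERVIC4E method. It assumes diploid samples, perfectly equal contributions from every sample in a pool, and reads the "30-fold coverage for each allele" target as 30 reads per allele per position within a pool. It also uses an idealized binomial read-sampling model that ignores sequencing errors and PCR bias. The helper names `pool_design` and `prob_at_least_k_alt_reads` are hypothetical. Under these assumptions, a variant carried by a single heterozygous sample is 1 of 80 alleles in its pool (1.25%), and a pool needs about 2,400 reads per position, of which about 30 are expected to carry the variant.

```python
from math import comb

# Design parameters taken from the text.
SAMPLES_PER_POOL = 40
N_POOLS = 12
N_REGIONS = 24
COVERAGE_PER_ALLELE = 30
# Assumption: diploid samples contribute two alleles each.
PLOIDY = 2


def pool_design(samples_per_pool: int, n_pools: int, ploidy: int, cov_per_allele: int) -> dict:
    """Derive per-pool quantities from the pooling design (hypothetical helper)."""
    alleles_per_pool = samples_per_pool * ploidy
    return {
        "total_samples": samples_per_pool * n_pools,
        "alleles_per_pool": alleles_per_pool,
        # A variant carried by one heterozygous sample is 1 allele out of alleles_per_pool.
        "singleton_allele_fraction": 1 / alleles_per_pool,
        # Assumed reading of "30-fold per allele": per-position depth needed in one pool.
        "target_depth_per_pool": alleles_per_pool * cov_per_allele,
    }


def prob_at_least_k_alt_reads(depth: int, allele_fraction: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(depth, allele_fraction).

    Idealized model (assumption): equal contribution from every sample,
    no PCR bias, and no sequencing errors.
    """
    p_below_k = sum(
        comb(depth, i) * allele_fraction**i * (1 - allele_fraction) ** (depth - i)
        for i in range(k)
    )
    return 1.0 - p_below_k


if __name__ == "__main__":
    design = pool_design(SAMPLES_PER_POOL, N_POOLS, PLOIDY, COVERAGE_PER_ALLELE)
    print(design)
    # Per-sample target size range implied by 24 regions of 250-300 bp (6.7 kb lies inside it).
    print(N_REGIONS * 250, "-", N_REGIONS * 300, "bp")
    depth = design["target_depth_per_pool"]
    p = design["singleton_allele_fraction"]
    print("Expected variant reads for a singleton:", depth * p)
    print("P(at least 10 variant reads):", prob_at_least_k_alt_reads(depth, p, 10))
```

In this idealized model, a singleton is very likely to be sampled many times at the target depth. In practice, unequal pooling, amplification bias, and base-calling errors blur the separation between true rare variants and noise, which is the problem the filtering algorithms address.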
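Bowtie's -v mode, used above with a threshold of 3, accepts an alignment only if the entire read aligns end to end, without gaps, with at most v mismatches. Base qualities are ignored in this mode. The brute-force Python sketch below illustrates that acceptance rule on a toy reference, checking both the read and its reverse complement. It is not how Bowtie works internally: Bowtie searches an indexed genome rather than scanning every position. The function names are hypothetical.

```python
# Complement table for A/C/G/T/N; other characters are left unchanged.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (uppercased)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_mismatches(read: str, ref_window: str) -> int:
    """Positions where an ungapped read differs from a same-length reference window."""
    if len(read) != len(ref_window):
        raise ValueError("read and reference window must have the same length")
    return sum(1 for a, b in zip(read.upper(), ref_window.upper()) if a != b)


def ungapped_hits(read: str, reference: str, max_mismatches: int = 3) -> list:
    """Brute-force end-to-end, ungapped placements with <= max_mismatches.

    Returns (start, strand, mismatches) tuples, forward strand first.
    Base qualities are ignored, mirroring the idea behind a '-v 3' threshold;
    this is an illustration, not Bowtie's search algorithm.
    """
    hits = []
    n = len(read)
    for strand, query in (("+", read), ("-", reverse_complement(read))):
        for start in range(len(reference) - n + 1):
            mm = count_mismatches(query, reference[start:start + n])
            if mm <= max_mismatches:
                hits.append((start, strand, mm))
    return hits


if __name__ == "__main__":
    ref = "TTAACGTT"
    print(ungapped_hits("AACG", ref, max_mismatches=0))  # prints [(2, '+', 0), (4, '-', 0)]
```

Lowering `max_mismatches` discards reads with more sequencing errors, which trades some coverage for cleaner input to variant calling. This matches the stated rationale for choosing strict alignment.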