Comparison of PCA and DAPC methods for analysis of Iranian Buffalo population structure using SNPchip90k data

Document Type: Research Paper

Authors

1 Former Ph.D. Student, Department of Animal Sciences, Faculty of Agricultural Sciences, University of Tabriz, Iran

2 Assistant Professor, Department of Animal Sciences, University College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran

3 Professor, Department of Animal Sciences, University College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran

Abstract

Understanding population genetic structure is valuable for implementing breeding programs and, most importantly, for preserving genetic resources. Genomic data make it possible to examine the complex evolutionary history of populations and to reconstruct rare historical events. In this study, the structure of Iranian buffalo populations was examined using principal component analysis (PCA) and discriminant analysis of principal components (DAPC). For this purpose, 404 buffaloes from three breeds (North, Azari and Khozestani) were sampled and genotyped with the SNPChip 90k from Padano Company in Italy. Both PCA and DAPC gave a clear picture of the genetic structure of the studied populations. When the optimal number of clusters was assessed with the BIC criterion, K = 3 gave the best result in DAPC. Cross-validation for the number of retained principal components selected the first 50 components, which showed the lowest MSE. In this study, DAPC predicted the assignment of individuals to clusters, together with their membership probabilities, with 100% accuracy. PCA could not provide a group assessment, and DAPC outperformed PCA in separating the variance between populations. Because DAPC summarizes the genetic differentiation between groups while overlooking within-group variation, and so gives a clearer view of population structure, it can be applied as an alternative to PCA in quality control and in correcting for population stratification in GWAS.
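The two-step logic of DAPC described in the abstract (reduce the genotypes with PCA, then run a discriminant analysis on the retained components and assign individuals by membership probability) can be illustrated with a minimal sketch. This is not the authors' pipeline: the genotypes are simulated toy data, the function names (`simulate_genotypes`, `pca_scores`, `dapc_posteriors`) are hypothetical, the discriminant step is a plain linear discriminant analysis with a pooled within-group covariance, and the choice of 10 retained components is arbitrary rather than cross-validated.

```python
import numpy as np

def simulate_genotypes(n_per_group=40, n_snps=300, n_groups=3, seed=0):
    """Toy 0/1/2 genotypes for groups with differing allele frequencies (assumed data)."""
    rng = np.random.default_rng(seed)
    base = rng.uniform(0.2, 0.8, n_snps)
    X, y = [], []
    for g in range(n_groups):
        # Each group drifts away from a shared base allele frequency.
        freq = np.clip(base + rng.normal(0.0, 0.15, n_snps), 0.05, 0.95)
        X.append(rng.binomial(2, freq, size=(n_per_group, n_snps)))
        y.extend([g] * n_per_group)
    return np.vstack(X).astype(float), np.array(y)

def pca_scores(X, n_pcs):
    """PCA via SVD of the column-centered genotype matrix; returns PC scores."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_pcs] * S[:n_pcs]

def dapc_posteriors(Z, y):
    """Linear discriminant analysis on PC scores Z; returns group labels and
    per-individual membership probabilities."""
    groups = np.unique(y)
    means = np.array([Z[y == g].mean(axis=0) for g in groups])
    # Pooled within-group covariance of the retained components.
    resid = Z - means[np.searchsorted(groups, y)]
    W = resid.T @ resid / (len(Z) - len(groups))
    Winv_mu = np.linalg.solve(W, means.T)              # shape (n_pcs, n_groups)
    priors = np.array([(y == g).mean() for g in groups])
    # Linear discriminant score of each individual for each group.
    scores = Z @ Winv_mu - 0.5 * np.sum(means.T * Winv_mu, axis=0) + np.log(priors)
    scores -= scores.max(axis=1, keepdims=True)        # numerical stability
    post = np.exp(scores)
    post /= post.sum(axis=1, keepdims=True)
    return groups, post

X, y = simulate_genotypes()
Z = pca_scores(X, n_pcs=10)                # 10 retained PCs is an arbitrary choice
groups, post = dapc_posteriors(Z, y)
assigned = groups[post.argmax(axis=1)]     # assign each individual to its most probable group
accuracy = (assigned == y).mean()
print(f"retained PCs: {Z.shape[1]}, assignment accuracy: {accuracy:.2f}")
```

PCA alone yields only the coordinates in `Z`; the group assignment comes from the discriminant step, which is the distinction the abstract draws between the two methods. In practice the number of retained components would be tuned, for example by cross-validation as in the abstract, rather than fixed in advance.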

Keywords

