Impact of genotype imputation and different genomic architectures on the performance of random forest and threshold Bayes A methods for genomic prediction

Document Type : Research Paper


Assistant Professor of Genetics and Animal Breeding, Islamic Azad University, Astara Branch, Astara, Iran


Genomic selection using imputed genotypes can play an important role in increasing economic efficiency and the genetic improvement of threshold traits. The objective of this study was to investigate the accuracy of imputation and to evaluate its effect on the area under the receiver operating characteristic curve (AUROC) of the threshold BayesA (TBA) and random forest (RF) algorithms for discrete traits with different genomic architectures. Genomic data were simulated to reflect variation in heritability (0.30 and 0.10), number of QTL (108 and 1080) and linkage disequilibrium (LD; low and high) for 27 chromosomes. To simulate conditions close to reality, markers were randomly masked at missing rates of 50% and 90% in each scenario; the missing genotypes were then imputed and imputation accuracy was estimated. In the last step, either the original or the imputed genotypes were used to evaluate the AUROC of TBA and RF. Imputation accuracy improved as the level of LD increased and as the missing rate decreased. The overall average AUROC was 0.64 for RF and 0.66 for TBA. Compared with the original genotypes, using imputed genotypes with 50% and 90% missing rates decreased the average AUROC by about 0.013 and 0.02 for RF and by 0.0018 and 0.026 for TBA, respectively. Although TBA had the higher AUROC in the different scenarios overall, RF performed better with a large number of QTL. Generally, genomic prediction based on imputed genotypes (5K) can be implemented to reduce the cost of genomic evaluation.
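The evaluation steps described in the abstract (masking markers, scoring the imputation, and computing AUROC) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `mask_markers`, `concordance`, and `auroc` are hypothetical, masking whole marker columns is an assumption about how the missing rate was applied, and genotype concordance is only one possible measure of imputation accuracy, since the abstract does not say which measure was used. The imputation step itself is left to external software and is not shown.

```python
import random


def mask_markers(genotypes, missing_rate, seed=0):
    """Mask a random fraction of marker columns (assumed masking scheme).

    genotypes: list of rows, one per animal, each a list of 0/1/2 codes.
    Returns the masked copy (None = missing) and the masked column indices.
    """
    rng = random.Random(seed)
    n_markers = len(genotypes[0])
    n_masked = round(missing_rate * n_markers)
    masked_cols = set(rng.sample(range(n_markers), n_masked))
    masked = [
        [None if j in masked_cols else g for j, g in enumerate(row)]
        for row in genotypes
    ]
    return masked, sorted(masked_cols)


def concordance(true_genotypes, imputed_genotypes, masked_cols):
    """Fraction of masked genotypes that were imputed correctly."""
    total = 0
    correct = 0
    for true_row, imputed_row in zip(true_genotypes, imputed_genotypes):
        for j in masked_cols:
            total += 1
            correct += true_row[j] == imputed_row[j]
    return correct / total if total else float("nan")


def auroc(labels, scores):
    """AUROC as the probability that a random case (label 1) scores
    higher than a random control (label 0); ties count as one half."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    # Toy data only: 2 animals x 10 markers, not taken from the study.
    true = [[0, 1, 2, 1, 0, 2, 1, 1, 0, 2],
            [2, 1, 0, 0, 1, 2, 2, 1, 0, 1]]
    masked, cols = mask_markers(true, missing_rate=0.5)
    print("masked marker columns:", cols)
    print("toy AUROC:", auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

An AUROC of 0.5 corresponds to random ranking of cases and controls and 1.0 to perfect ranking, which gives a scale for the averages of 0.64 (RF) and 0.66 (TBA) reported above.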

