Genetic classification of Azari and North ecotype buffalo populations using the SVM method

Document Type : Research Paper

Authors

1 Ph.D. Student, Department of Animal Sciences, Faculty of Agricultural Sciences, University of Tabriz, Iran

2 Associate Professor, Department of Animal Sciences, Faculty of Agricultural Sciences, University of Tabriz, Iran

3 Professor, Department of Animal Sciences, University College of Agriculture & Natural Resources, University of Tehran, Karaj, Iran

4 Professor, Department of Animal Sciences, Faculty of Agricultural Sciences, University of Tabriz, Iran

Abstract

The purpose of this research was to classify buffaloes from different areas of the two ecotypes, Azari (West and East Azarbayjan and Ardabil provinces) and North (Guilan province), using the support vector machine (SVM) method. A total of 258 buffaloes were sampled and genotyped using the Axiom Buffalo 90K Genotyping Array at the Parco Tecnologico Padano laboratory in Italy. Two metrics, cross-validation accuracy and the area under the receiver operating characteristic curve (AUC), were used to assess the predictive performance of the SVM in classifying individuals. For classifying the regions of the two ecotypes (4 provinces), cross-validation and AUC gave 92% and 96%, respectively, showing that despite the difficulty of distinguishing individuals from neighboring provinces, the SVM method assigns animals to their herds with high accuracy. For classifying the two ecotypes, the corresponding values were about 96% and 98%, indicating that the two ecotypes are separated more easily than the individual provinces. Machine learning methods provide a class prediction for each individual, which can be useful in quality control and genetic studies.
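
The evaluation scheme described above (train an SVM, score it with cross-validation accuracy and AUC) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the genotypes are simulated, the group sizes, SNP count and allele-frequency shifts are hypothetical, and the classifier is a plain linear SVM trained with the Pegasos sub-gradient method, since the abstract does not state the kernel or software settings. All function names are made up for this example.

```python
import random

def simulate_genotypes(n_per_group, n_snps, rng):
    """Synthetic 0/1/2 genotype counts for two groups (labels -1 and +1)."""
    freq_a = [rng.uniform(0.2, 0.8) for _ in range(n_snps)]
    # Hypothetical drift: group B allele frequencies are shifted from group A.
    freq_b = [min(0.95, max(0.05, p + rng.uniform(-0.25, 0.25))) for p in freq_a]
    X, y = [], []
    for label, freqs in ((-1, freq_a), (1, freq_b)):
        for _ in range(n_per_group):
            X.append([(rng.random() < p) + (rng.random() < p) for p in freqs])
            y.append(label)
    return X, y

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def train_linear_svm(X, y, rng, lam=0.01, epochs=10):
    """Pegasos: stochastic sub-gradient descent on the regularised hinge loss."""
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            violated = y[i] * dot(w, X[i]) < 1.0
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (regulariser)
            if violated:  # hinge-loss step only for margin violations
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def auc(scores, labels):
    """Probability that a random +1 sample outscores a random -1 sample."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == -1]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def cross_validate(X, y, k, rng):
    """k-fold CV; returns (accuracy, AUC) over pooled held-out predictions."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    scores, labels = [], []
    for f in range(k):
        test = idx[f::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        # Centre each SNP with training-fold means only, then add a bias term.
        means = [sum(X[i][j] for i in train) / len(train) for j in range(len(X[0]))]

        def prep(row):
            return [v - m for v, m in zip(row, means)] + [1.0]

        w = train_linear_svm([prep(X[i]) for i in train], [y[i] for i in train], rng)
        scores += [dot(w, prep(X[i])) for i in test]
        labels += [y[i] for i in test]
    acc = sum((s > 0) == (l == 1) for s, l in zip(scores, labels)) / len(labels)
    return acc, auc(scores, labels)

rng = random.Random(0)
X, y = simulate_genotypes(n_per_group=60, n_snps=200, rng=rng)
accuracy, area = cross_validate(X, y, k=5, rng=rng)
print(f"CV accuracy = {accuracy:.2f}, AUC = {area:.2f}")
```

Accuracy counts only whether each held-out animal falls on the correct side of the decision boundary, whereas AUC measures how well the decision scores rank one group above the other. The two metrics can therefore differ, as they do in the figures reported in the abstract.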
