Genetic grouping of native Azari and North buffaloes using the SVM neural network method

Article type: Research article

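Authors

1 Ph.D. Student, Department of Animal Sciences, Faculty of Agriculture, University of Tabriz

2 Assistant Professor, Department of Animal Sciences, College of Agriculture and Natural Resources, University of Tehran, Karaj

3 Professor, Department of Animal Sciences, College of Agriculture and Natural Resources, University of Tehran, Karaj

4 Associate Professor, Department of Animal Sciences, Faculty of Agriculture, University of Tabriz

5 Professor, Department of Animal Sciences, Faculty of Agriculture, University of Tabriz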

Abstract

The aim of this study was to classify buffaloes from East Azerbaijan, West Azerbaijan, and Ardabil provinces (Azari ecotype) and from Guilan province (North ecotype), and ultimately to assess whether individuals from different regions can be separated with a machine learning method. A total of 258 buffaloes were sampled from different regions of the North and Azari ecotypes and genotyped in Italy using the Affymetrix 90K SNP chip. To assess the performance of the support vector machine method in classifying individuals, two metrics were applied: cross-validation and the area under the receiver operating characteristic curve (AUC). The cross-validation and AUC results for classifying individuals from the four regions were 92% and 96%, respectively, indicating that, despite the close relationship among individuals from different herds and the difficulty of separating them, the support vector machine method can assign individuals to their own herds with high accuracy. For the two ecotypes, the cross-validation and AUC results were 96% and 98%, respectively, indicating that the two ecotypes are more readily separated. Given these results and the individual-level predictions it provides, the machine learning method can be useful in quality control and genetic applications.

Keywords


Article Title [English]

Genetic classification of Azari and North ecotype Buffalo population using SVM method

Authors [English]

  • Zahra Azizi 1
  • Hossein Moradi Shahrbabak 2
  • Mohammad Moradi Shahrbabak 3
  • Abbas Rafat 4
  • Jalil Shodja 5
1 Ph.D. Student, Department of Animal Sciences, Faculty of Agricultural Sciences, University of Tabriz, Iran
2 Assistant Professor, Department of Animal Sciences, University College of Agriculture & Natural Resources, University of Tehran, Karaj, Iran
3 Professor, Department of Animal Sciences, University College of Agriculture & Natural Resources, University of Tehran, Karaj, Iran
4 Associate Professor, Department of Animal Sciences, Faculty of Agricultural Sciences, University of Tabriz, Iran
5 Professor, Department of Animal Sciences, Faculty of Agricultural Sciences, University of Tabriz, Iran
Abstract [English]

The purpose of this research was to classify buffaloes from different regions of two ecotypes, Azari (West Azarbayjan, East Azarbayjan, and Ardabil provinces) and North (Guilan province), using the support vector machine (SVM) method. A total of 258 buffaloes were sampled and genotyped using the Axiom Buffalo 90K Genotyping Array at the Parco Tecnologico Padano laboratory in Italy. Two metrics, cross-validation and the area under the receiver operating characteristic curve (AUC), were used to assess the predictive performance of the SVM in classifying individuals. For classification of the four regions (provinces) of the two ecotypes, the cross-validation and AUC results were 92% and 96%, respectively, showing that, despite the difficulty of distinguishing individuals from neighbouring provinces, the SVM method assigns animals to their herds with high accuracy. For classification of the two ecotypes, the corresponding results were 96% and 98%, indicating that the two ecotypes are more readily separated. The machine learning method provides a prediction for the classification of each individual, which can be useful in quality control and genetic studies.
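The evaluation scheme summarized in the abstract (an SVM classifier scored by cross-validation and by AUC) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the authors' data, software, or settings: the genotypes are simulated 0/1/2 codes for two hypothetical groups, the allele-frequency shift of 0.15 is arbitrary, and the linear SVM is trained with a simple Pegasos-style subgradient routine without a bias term (features are centred on the training-fold means instead). The function names `simulate_genotypes`, `train_linear_svm`, `auc`, and `cross_validate` are hypothetical.

```python
import random


def simulate_genotypes(n_per_group, n_snps, rng):
    """Synthetic 0/1/2 genotypes for two hypothetical groups (labels -1 and +1)."""
    base = [rng.uniform(0.2, 0.8) for _ in range(n_snps)]
    X, y = [], []
    for label in (-1, 1):
        for _ in range(n_per_group):
            # Shift allele frequencies by +/-0.15 so the groups differ (assumed effect size).
            X.append([(rng.random() < p + 0.15 * label) + (rng.random() < p + 0.15 * label)
                      for p in base])
            y.append(label)
    return X, y


def center(rows, means):
    """Subtract per-SNP means from every row."""
    return [[v - m for v, m in zip(row, means)] for row in rows]


def train_linear_svm(X, y, rng, lam=0.01, epochs=20):
    """Pegasos-style stochastic subgradient descent on the hinge loss (no bias term)."""
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (regularization step)
            if margin < 1.0:  # sample violates the margin: move towards it
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == -1]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_validate(X, y, k, rng):
    """k-fold cross-validation; returns accuracy and AUC of the held-out predictions."""
    order = list(range(len(X)))
    rng.shuffle(order)
    scores, labels = [], []
    for f in range(k):
        test_idx = order[f::k]
        held_out = set(test_idx)
        train_idx = [i for i in order if i not in held_out]
        # Centre each SNP on the training-fold mean only, to avoid leaking test data.
        means = [sum(X[i][j] for i in train_idx) / len(train_idx) for j in range(len(X[0]))]
        w = train_linear_svm(center([X[i] for i in train_idx], means),
                             [y[i] for i in train_idx], rng)
        for row, i in zip(center([X[i] for i in test_idx], means), test_idx):
            scores.append(sum(wj * xj for wj, xj in zip(w, row)))
            labels.append(y[i])
    accuracy = sum((s > 0) == (l == 1) for s, l in zip(scores, labels)) / len(labels)
    return accuracy, auc(scores, labels)


rng = random.Random(0)
X, y = simulate_genotypes(n_per_group=60, n_snps=60, rng=rng)
accuracy, area = cross_validate(X, y, k=5, rng=rng)
print(f"cross-validated accuracy = {accuracy:.2f}, AUC = {area:.2f}")
```

Accuracy depends on a single decision threshold (here, a score of zero), whereas AUC summarizes how well the scores rank members of one group above the other across all thresholds, which is one reason the two figures reported for the same classifier can differ. In this sketch the held-out scores from all folds are pooled before computing AUC, which is a common simplification.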

Keywords [English]

  • Buffalo
  • Classification
  • SNPChip 90K
  • Support vector machine