Identification of genes affecting growth traits in broiler chickens using linear regression and machine learning methods

Article type: Research article

Authors

1 Department of Animal Science, Faculty of Agriculture, Tarbiat Modares University, Tehran, Iran.

2 Department of Animal Science, Faculty of Agriculture, Tarbiat Modares University, Tehran, Iran.

3 Department of Animal Science, Faculty of Agriculture, Tarbiat Modares University, Tehran, Iran.

4 Agriculture Victoria Research, AgriBio, Centre for AgriBioscience, Bundoora, Victoria 3083, Australia.

Abstract

Knowledge of the association between single nucleotide polymorphisms and economically important traits is one of the key tools of breeding programs in the poultry industry. Genome-wide scans to discover single nucleotide polymorphisms (markers) associated with these traits are usually carried out with simple linear models, and because of some of the assumptions of these models, some markers may go undetected. This study was conducted to evaluate the efficiency of the random forest and gradient boosting methods, and to assess their performance against a linear model, in identifying markers associated with body weight at 6 and 9 weeks of age in second-generation (F2) broiler chickens obtained from reciprocal crosses between the commercial Arian line and native Urmia birds. The results showed that the two machine learning methods identified important markers for body weight traits, including GGaluGA308573, GGaluGA255033, Gga_rs13614212, Gga_rs13743072, GGaluGA258772, Gga_rs14034395, and Gga_rs13858398, which were associated with the genes MAP2, ACSL1, CAMSAP2, FAM117B, SLC4A4, TIMP4, and LncRNA, respectively. Cell division, growth control, regulation of cytoskeleton and microtubule structure, and transcriptional activity are the most important biological processes of these genes. Studying the new genes identified by the machine learning methods, which the linear model was unable to identify in the studied population, can open new insight into the genetic control of growth traits in broiler chickens. In addition, the important markers discovered can be used in genetic improvement programs for broiler chickens.


Article title [English]

Identification of genes affecting growth traits in broiler chickens using linear regression and machine learning methods

Authors [English]

  • Hossein Bani Saadat 1
  • Rasoul Vaez Torshizi 2
  • Ali Akbar Masoudi 3
  • Alireza Ehsani 3
  • Saleh Shahinfar 4
1 Department of Animal Science, Faculty of Agriculture, Tarbiat Modares University, Tehran, Iran.
2 Department of Animal Science, Agricultural Faculty, Tarbiat Modares University, Tehran, Iran.
3 Department of Animal Science, Faculty of Agriculture, Tarbiat Modares University, Tehran, Iran.
4 Agriculture Victoria Research, AgriBio, Centre for AgriBioscience, Bundoora, Victoria 3083, Australia.
Abstract [English]

Knowledge of the association between single nucleotide polymorphisms (SNPs) and economically important traits is a key tool for breeding programs in the poultry industry. Genome-wide studies to discover SNPs related to these traits are often conducted using simple linear models. However, because of certain assumptions of these models, some SNP markers may not be identified. This study aimed to evaluate the performance of random forest and gradient boosting methods, compared with a linear model, in identifying SNP markers associated with body weight at 6 and 9 weeks of age in F2 broiler chickens resulting from reciprocal crosses between the commercial Arian line and native Urmia birds. The results showed that the machine learning approaches identified important markers associated with body weight traits, such as GGaluGA308573, GGaluGA255033, Gga_rs13614212, Gga_rs13743072, GGaluGA258772, Gga_rs14034395, and Gga_rs13858398, which were related to the genes MAP2, ACSL1, CAMSAP2, FAM117B, SLC4A4, TIMP4, and LncRNA, respectively. The most important biological processes involving these genes are cell division, growth control, regulation of cytoskeleton and microtubule structure, and transcriptional activity. The identification of these novel genes by machine learning methods, which were not detected by linear models or by previous studies in this population, could provide new insights into the genetic control of growth traits in broiler chickens. Moreover, the important markers discovered can be used in genetic improvement programs for broiler chickens.

Keywords [English]

  • Single nucleotide polymorphisms
  • Genome-wide association studies
  • broiler chickens
  • machine learning

Extended Abstract

Introduction

Genome-wide association studies (GWAS) identify single nucleotide polymorphisms (SNPs) and their associations with genes influencing traits, and so provide the molecular information needed for marker and gene selection to improve quantitative traits. However, the most common approach for testing marker effects in GWAS is simple linear regression with a P-value threshold. This approach has several weaknesses: an inflated false positive rate, overestimation of marker effects, disregard of linkage disequilibrium between markers, an incorrect assumption of independence among markers, the assumption that all genomic variables follow a normal distribution, and neglect of interaction effects between SNP markers. GWAS based on machine learning methods is a suitable alternative for addressing these problems. The main objective of this study was to identify important and influential markers for body weight measured at 6 and 9 weeks of age in an F2 chicken population using the random forest (RF) and gradient boosting (GB) methods, and to compare them with the linear model (LM) approach.
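To make the single-marker approach concrete, the following minimal Python sketch regresses a phenotype on one SNP at a time and shows how small a per-test significance threshold becomes when tens of thousands of markers are tested. It is an illustration under stated assumptions, not the analysis used in the study: the data are simulated, the function names are hypothetical, and the Bonferroni correction is a common convention that the text does not say the authors applied. With 54,340 markers and a 0.05 family-wise level, the Bonferroni threshold is roughly 9.2e-07.

```python
import math
import random


def single_marker_t(genotypes, phenotypes):
    """Regress the phenotype on one SNP's allele count (0/1/2).

    Returns the estimated slope and its t-statistic.
    """
    n = len(genotypes)
    mean_x = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    if sxx == 0:  # monomorphic marker: no information
        return 0.0, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, phenotypes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(genotypes, phenotypes))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    t_stat = slope / std_err if std_err > 0 else math.inf
    return slope, t_stat


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def simulate(n_birds=312, n_markers=50, causal=0, effect=100.0, seed=1):
    """Simulated genotypes and phenotypes (assumption: one causal marker, normal noise)."""
    rng = random.Random(seed)
    geno = [[rng.randint(0, 1) + rng.randint(0, 1) for _ in range(n_markers)]
            for _ in range(n_birds)]
    pheno = [effect * row[causal] + rng.gauss(0.0, 100.0) for row in geno]
    return geno, pheno


geno, pheno = simulate()
# One regression per marker, each treated as independent of all others.
t_stats = [abs(single_marker_t([row[j] for row in geno], pheno)[1])
           for j in range(len(geno[0]))]
top_marker = max(range(len(t_stats)), key=t_stats.__getitem__)
threshold = bonferroni_threshold(0.05, 54340)
print(top_marker, threshold)
```

Because each marker is tested separately, this scheme cannot account for linkage disequilibrium or interactions between markers, which is the limitation the machine learning methods are meant to address.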

 

Materials and methods

Body weight records at 6 and 9 weeks of age from 312 F2 chickens, produced by reciprocal crosses between a fast-growing commercial Arian line and indigenous fowl from West Azerbaijan province, were used in this study. At 70 days of age, DNA was extracted from blood samples of the chickens and stored at -20°C. The DNA samples were genotyped with the Illumina 60k Chicken SNP BeadChip, containing 54,340 SNP markers, provided by Cobb Vantress in cooperation with Aarhus University, Denmark. The phenotypic data were adjusted for sex and hatch effects, and three methods (a linear model, random forest, and gradient boosting) were used to identify important markers. The top ten markers for the body weight traits were identified by each method. In addition, genes located within 1 Mb upstream and downstream of the three top markers identified by each method were determined using the NCBI and Ensembl databases and the chicken (Gallus gallus) reference genome.
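The workflow described above (adjust the phenotype for sex and hatch, then rank markers by importance) can be sketched in plain Python. This is a simplified illustration, not the software or settings used in the study. The adjustment subtracts sex-by-hatch group means as a stand-in for whatever fixed-effect model the authors fitted, the boosting uses single-split trees (stumps) with squared-error loss, and every function name, parameter value, and data point is hypothetical and simulated.

```python
import random
from collections import defaultdict


def adjust_for_sex_and_hatch(weights, sexes, hatches):
    """Subtract the mean of each sex-by-hatch group (assumed simplification)."""
    groups = defaultdict(list)
    for w, s, h in zip(weights, sexes, hatches):
        groups[(s, h)].append(w)
    means = {key: sum(vals) / len(vals) for key, vals in groups.items()}
    return [w - means[(s, h)] for w, s, h in zip(weights, sexes, hatches)]


def best_stump(genotypes, residuals):
    """Find the (marker, threshold) split with the largest squared-error reduction."""
    n = len(residuals)
    total = sum(residuals)
    best = (0.0, None, None, 0.0, 0.0)  # gain, marker, threshold, left mean, right mean
    for j in range(len(genotypes[0])):
        for thr in (0, 1):  # allele counts split as {<= thr} versus {> thr}
            left = [r for row, r in zip(genotypes, residuals) if row[j] <= thr]
            if not left or len(left) == n:
                continue
            s_left, n_left = sum(left), len(left)
            s_right, n_right = total - s_left, n - n_left
            gain = s_left ** 2 / n_left + s_right ** 2 / n_right - total ** 2 / n
            if gain > best[0]:
                best = (gain, j, thr, s_left / n_left, s_right / n_right)
    return best


def boosted_marker_ranking(genotypes, phenotypes, n_rounds=50, learning_rate=0.1):
    """Gradient boosting with stumps; importance is the summed gain per marker."""
    mean_y = sum(phenotypes) / len(phenotypes)
    residuals = [y - mean_y for y in phenotypes]
    importance = defaultdict(float)
    for _ in range(n_rounds):
        gain, j, thr, left_mean, right_mean = best_stump(genotypes, residuals)
        if j is None:  # no split improves the fit
            break
        importance[j] += gain
        # Each round fits the current residuals, then shrinks the update.
        residuals = [r - learning_rate * (left_mean if row[j] <= thr else right_mean)
                     for row, r in zip(genotypes, residuals)]
    return sorted(importance, key=importance.get, reverse=True)


def simulate(n_birds=312, n_markers=40, seed=7):
    """Simulated birds: markers 3 and 17 are causal (assumption for illustration)."""
    rng = random.Random(seed)
    geno = [[rng.randint(0, 1) + rng.randint(0, 1) for _ in range(n_markers)]
            for _ in range(n_birds)]
    sexes = [rng.choice("MF") for _ in range(n_birds)]
    hatches = [rng.randint(1, 5) for _ in range(n_birds)]
    weights = [1500.0 + (200.0 if s == "M" else 0.0) + 30.0 * h
               + 80.0 * row[3] + 50.0 * row[17] + rng.gauss(0.0, 60.0)
               for row, s, h in zip(geno, sexes, hatches)]
    return geno, sexes, hatches, weights


geno, sexes, hatches, weights = simulate()
adjusted = adjust_for_sex_and_hatch(weights, sexes, hatches)
top_ten = boosted_marker_ranking(geno, adjusted)[:10]
print(top_ten)
```

The ranking contains only markers that were selected in at least one boosting round, so it may hold fewer than ten entries on small or weakly informative data. A random forest would produce a comparable ranking from an ensemble of trees grown on bootstrap samples rather than from sequentially fitted residuals.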

 

Results and discussion

Identification of important markers and the corresponding genes using machine learning methods showed that some markers and genes influencing the traits may not be identified by a linear model, indicating that machine learning methods, including random forest and gradient boosting, are suitable tools for selecting important markers. The associated markers were GGaluGA308573, GGaluGA255033, Gga_rs13614212, Gga_rs13743072, GGaluGA258772, Gga_rs14034395, and Gga_rs13858398. By examining only the three most important markers from each method, new related genes were identified, namely MAP2, ACSL1, CAMSAP2, FAM117B, SLC4A4, TIMP4, and LncRNA, that the linear model had not detected in previous studies. According to the literature, these genes are involved in microtubule-stabilizing activity, cell shape, intestinal tissue during post-hatch development, enhancement of adipogenesis, tissue growth and morphogenesis, fatty acid metabolism and abdominal fat deposition, intramuscular fat content, meat tenderness and flavor, axon and dendrite development, organelle organization, bicarbonate secretion and absorption, intracellular pH, and induction of apoptosis (programmed cell death) in chickens.
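The step from a top marker to candidate genes amounts to an interval-overlap query: collect every annotated gene that overlaps a window of 1 Mb on either side of the marker position. The sketch below illustrates this in Python, assuming gene coordinates have already been exported from an annotation database into a simple list. The gene names, chromosomes, and positions are invented placeholders, not real chicken annotations, and the function name is hypothetical.

```python
WINDOW_BP = 1_000_000  # 1 Mb on each side of the marker

# Placeholder annotation: (gene name, chromosome, start, end). Not real coordinates.
TOY_GENES = [
    ("toy_gene_A", "1", 4_200_000, 4_260_000),
    ("toy_gene_B", "1", 6_500_000, 6_520_000),
    ("toy_gene_C", "2", 5_000_000, 5_050_000),
    ("toy_gene_D", "1", 5_990_000, 6_010_000),
]


def genes_near_marker(marker_chrom, marker_pos, genes, window=WINDOW_BP):
    """Return names of genes on the same chromosome that overlap the marker window."""
    start = max(0, marker_pos - window)
    end = marker_pos + window
    return [name for name, chrom, g_start, g_end in genes
            if chrom == marker_chrom and g_start <= end and g_end >= start]


# A hypothetical marker at position 5,000,000 on chromosome 1.
print(genes_near_marker("1", 5_000_000, TOY_GENES))
# → ['toy_gene_A', 'toy_gene_D']
```

Genes that only partly overlap the window, such as `toy_gene_D` here, are kept. Whether the study counted partial overlaps is not stated, so this is a design choice of the sketch.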

 

Conclusion

In this study, random forest, gradient boosting, and simple linear models were used to detect important markers associated with body weight at 6 and 9 weeks of age in broiler chickens genotyped with a 60k Chicken SNP BeadChip. The most important markers identified by the linear model were GGaluGA141221 and GGaluGA142838. For the machine learning methods, the top markers were GGaluGA308573, Gga_rs13743072, Gga_rs13614212, GGaluGA322130, and Gga_rs15763229. These markers were associated with genes that control several biochemical, physiological, and biological functions in chickens. The results indicate that machine learning algorithms identified new genes for body weight traits that had not previously been identified by linear models.
