Article type: Research article
Authors
1 Department of Animal Science, Faculty of Agriculture, Tarbiat Modares University, Tehran, Iran
2 Department of Animal Science, Faculty of Agriculture, Tarbiat Modares University, Tehran, Iran
3 Department of Animal Science, Faculty of Agriculture, Tarbiat Modares University, Tehran, Iran
4 Agriculture Victoria Research, Centre for AgriBioscience, Bundoora, Victoria 3083, Australia
Abstract
Knowledge of the associations between single nucleotide polymorphisms (SNPs) and economically important traits is a crucial tool in poultry breeding programs. Genome-wide studies that search for SNP variants related to these traits are usually conducted with simple linear models. However, because of the assumptions these models make, some SNP markers may go undetected. This study evaluated the performance of random forest and gradient boosting methods, compared with linear models, in identifying SNP markers associated with body weight at 6 and 9 weeks of age in F2 broiler chickens resulting from crosses between the commercial Arian line and native Urmia birds. The machine learning approaches identified important markers associated with body weight, namely GGaluGA308573, GGaluGA255033, Gga_rs13614212, Gga_rs13743072, GGaluGA258772, Gga_rs14034395, and Gga_rs13858398, which were related to the genes MAP2, ACSL1, CAMSAP2, FAM117B, SLC4A4, TIMP4, and an lncRNA, respectively. These genes are primarily involved in cell division, growth control, regulation of cytoskeleton structure and microtubules, and transcription activity, which were the most important biological processes identified. The identification of these novel genes by machine learning methods, which were not detected by linear models or by previous studies in this population, could provide new insights into the genetic control of growth traits in broiler chickens. Moreover, the significant markers discovered can be used in genetic improvement programs for broiler chickens.
Extended Abstract
Introduction
Genome-wide association studies (GWAS) identify single nucleotide polymorphisms (SNPs) and their associations with genes that influence traits, and thereby provide the molecular information needed for marker and gene selection to improve quantitative traits. However, the most common approach for testing marker effects in GWAS is simple linear regression with a P-value threshold. This approach has several drawbacks: an increased false positive rate, overestimation of marker effects, disregard for linkage disequilibrium between markers, an incorrect assumption of independence among markers, the assumption that all genomic variables follow a normal distribution, and neglect of interaction effects between SNP markers. GWAS based on machine learning methods is therefore a suitable alternative for addressing these problems. The main objective of this study was to identify important and influential markers for body weight measured at 6 and 9 weeks of age in an F2 chicken population using the random forest (RF) and gradient boosting (GB) methods, and to compare them with the linear model (LM) approach.
Materials and methods
Body weight records at 6 and 9 weeks of age from 312 F2 chickens, produced by two-way crossbreeding between the fast-growing commercial Arian line and indigenous fowl from West Azerbaijan province, were used in this study. At 70 days of age, DNA was extracted from blood samples and stored at -20°C. Each bird was genotyped using the 60k Illumina Chicken SNP BeadChip (54,340 SNP markers), provided by Cobb Vantress with the cooperation of Aarhus University, Denmark. The phenotypic data were adjusted for the effects of sex and hatch, and three methods (a linear model, random forest, and gradient boosting) were used to identify important markers. The top ten markers for the body weight traits were identified by each method. In addition, genes located within 1 Mb upstream and downstream of the three top markers identified by each method were determined from the chicken (Gallus gallus) reference genome using the NCBI and Ensembl databases.
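The linear-model part of this workflow can be illustrated with a minimal Python sketch. It is not the authors' code: the abstract does not name the software or the exact model, so the adjustment here simply subtracts sex-by-hatch cell means, markers are ranked by the R² of a single-marker regression on allele dosage, and all bird records, marker names, gene names, and coordinates are hypothetical toy values.

```python
from statistics import mean

def adjust_for_sex_and_hatch(weights, sexes, hatches):
    """Subtract each bird's sex-by-hatch cell mean (a simple stand-in for a fixed-effects fit)."""
    cells = {}
    for w, s, h in zip(weights, sexes, hatches):
        cells.setdefault((s, h), []).append(w)
    cell_mean = {cell: mean(values) for cell, values in cells.items()}
    return [w - cell_mean[(s, h)] for w, s, h in zip(weights, sexes, hatches)]

def r_squared(dosages, y):
    """R^2 of a simple linear regression of y on allele dosage (0, 1, or 2)."""
    gx, gy = mean(dosages), mean(y)
    sxx = sum((g - gx) ** 2 for g in dosages)
    syy = sum((v - gy) ** 2 for v in y)
    if sxx == 0 or syy == 0:  # monomorphic marker or constant phenotype
        return 0.0
    sxy = sum((g - gx) * (v - gy) for g, v in zip(dosages, y))
    return sxy * sxy / (sxx * syy)

def top_markers(snps, y, k=10):
    """Rank markers by single-marker R^2 and keep the k best."""
    scores = {name: r_squared(dosages, y) for name, dosages in snps.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

def genes_near(chrom, pos, genes, window=1_000_000):
    """Names of genes overlapping the interval pos +/- window on the same chromosome."""
    lo, hi = pos - window, pos + window
    return [name for name, (c, start, end) in genes.items()
            if c == chrom and start <= hi and end >= lo]

# Hypothetical toy data: 8 birds, 3 markers, 3 annotated genes.
weights = [2200, 2000, 2300, 2100, 1900, 1700, 1950, 1850]
sexes = ["M", "M", "M", "M", "F", "F", "F", "F"]
hatches = [1, 1, 2, 2, 1, 1, 2, 2]
snps = {
    "snp_a": [2, 0, 2, 0, 1, 0, 1, 0],
    "snp_b": [0, 1, 1, 0, 2, 2, 0, 1],
    "snp_c": [1, 1, 1, 1, 1, 1, 1, 1],  # monomorphic, carries no information
}
genes = {
    "GENE_A": ("1", 500_000, 600_000),
    "GENE_B": ("1", 3_000_000, 3_100_000),
    "GENE_C": ("2", 900_000, 950_000),
}

adjusted = adjust_for_sex_and_hatch(weights, sexes, hatches)
ranking = top_markers(snps, adjusted, k=3)
nearby = genes_near("1", 1_400_000, genes)  # hypothetical position of the top marker
print(ranking, nearby)
```

In a real analysis the gene coordinates would come from the NCBI or Ensembl annotation of the chicken reference genome, and the fixed effects would normally be fitted jointly rather than removed by cell means.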
Results and discussion
Identifying important markers and their corresponding genes with machine learning methods showed that some markers and genes influencing the traits may not be detected by a linear model, which indicates that random forest and gradient boosting are suitable tools for selecting important markers. The associated markers were GGaluGA308573, GGaluGA255033, Gga_rs13614212, Gga_rs13743072, GGaluGA258772, Gga_rs14034395, and Gga_rs13858398. By examining only the three most important markers from each method, new related genes such as MAP2, ACSL1, CAMSAP2, FAM117B, SLC4A4, TIMP4, and an lncRNA were identified that had not been detected by the linear model or in previous studies. According to the literature, these genes are involved in microtubule stabilization, cell shape, intestinal tissue development after hatch, enhancement of adipogenesis, tissue growth and morphogenesis, fatty acid metabolism and abdominal fat deposition, intramuscular fat content, meat tenderness and flavor, axon and dendrite development, organelle organization, bicarbonate secretion and absorption, intracellular pH, and the induction of apoptosis (programmed cell death) in chickens.
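One way a marker can escape a linear model while still being picked up by a tree-based method is a non-additive effect. The following Python sketch illustrates this under stated assumptions: it is a from-scratch gradient boosting routine with squared-error loss and one-split trees (stumps), not the software or settings used in the study, and the two markers and the phenotype are hypothetical toy data in which heterozygotes of `snp_over` are heavier than both homozygotes.

```python
from statistics import mean

def r_squared(dosages, y):
    """R^2 of a simple linear regression of y on allele dosage (0, 1, or 2)."""
    gx, gy = mean(dosages), mean(y)
    sxx = sum((g - gx) ** 2 for g in dosages)
    syy = sum((v - gy) ** 2 for v in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((g - gx) * (v - gy) for g, v in zip(dosages, y))
    return sxy * sxy / (sxx * syy)

def best_stump(snps, residuals):
    """Find the marker and dosage threshold whose split most reduces squared error."""
    centre = mean(residuals)
    base = sum((r - centre) ** 2 for r in residuals)
    best = None
    for name, dosages in snps.items():
        for thr in (0, 1):  # candidate splits: dosage <= 0 vs > 0, dosage <= 1 vs > 1
            left = [r for g, r in zip(dosages, residuals) if g <= thr]
            right = [r for g, r in zip(dosages, residuals) if g > thr]
            if not left or not right:
                continue
            ml, mr = mean(left), mean(right)
            sse = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
            gain = base - sse
            if best is None or gain > best[0]:
                best = (gain, name, thr, ml, mr)
    return best

def boosted_importance(snps, y, n_rounds=20, learning_rate=0.5):
    """Fit stumps to residuals round by round and sum the split gain per marker."""
    pred = [mean(y)] * len(y)
    importance = {name: 0.0 for name in snps}
    for _ in range(n_rounds):
        # For squared-error loss, the residuals are the negative gradient.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = best_stump(snps, residuals)
        if stump is None or stump[0] <= 1e-9:
            break
        gain, name, thr, ml, mr = stump
        importance[name] += gain
        pred = [p + learning_rate * (ml if g <= thr else mr)
                for p, g in zip(pred, snps[name])]
    return importance

# Hypothetical toy data: adjusted body weights for 8 birds and two markers.
y = [-50, -50, 50, 50, 50, 50, -50, -50]
snps = {
    "snp_over": [0, 0, 1, 1, 1, 1, 2, 2],  # heterozygote advantage, no additive trend
    "snp_weak": [0, 1, 1, 1, 2, 1, 1, 1],  # weak additive association
}

linear_scores = {name: r_squared(d, y) for name, d in snps.items()}
importance = boosted_importance(snps, y)
print(linear_scores)
print(importance)
```

In this toy case the single-marker regression gives `snp_over` an R² of zero and therefore ranks `snp_weak` first, whereas the boosted stumps assign most of the importance to `snp_over`. The example only illustrates the mechanism; it does not show that the markers reported above act in this way.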
Conclusion
In this study, random forest, gradient boosting, and simple linear models were used to detect important markers associated with body weight at 6 and 9 weeks of age in broiler chickens genotyped with a 60k Chicken SNP BeadChip. The most important markers identified by the linear model were GGaluGA141221 and GGaluGA142838. For the machine learning methods, the top markers were GGaluGA308573, Gga_rs13743072, Gga_rs13614212, GGaluGA322130, and Gga_rs15763229. These markers were associated with genes that control several biochemical, physiological, and biological functions in chickens. The results indicate that machine learning algorithms identified new genes for body weight traits that had not previously been identified by linear models.