Document Type : Research Paper
Authors
1 Department of Animal Science, Faculty of Agriculture, Tarbiat Modares University, Tehran, Iran.
2 Department of Animal Science, Agricultural Faculty, Tarbiat Modares University, Tehran, Iran.
3 Department of Animal Science, Faculty of Agriculture, Tarbiat Modares University, Tehran, Iran.
4 Agriculture Victoria Research, AgriBio, Centre for AgriBioscience, Bundoora, Victoria 3083, Australia.
Abstract
Keywords
Main Subjects
Extended Abstract
Introduction
Genome-wide association studies (GWAS) identify single nucleotide polymorphisms (SNPs) associated with genes that influence traits, and so provide the molecular information needed for marker and gene selection to improve quantitative traits. However, the most common approach for testing marker effects in GWAS is simple linear regression with a P-value for each marker. This approach has several weaknesses: an inflated false positive rate, overestimation of marker effects, disregard for linkage disequilibrium between markers, an incorrect assumption that markers are independent, an assumption that all genomic variables are normally distributed, and neglect of interactions among SNP markers. GWAS based on machine learning methods is a suitable alternative that addresses these problems. The main objective of this study was to identify important markers for body weight at 6 and 9 weeks of age in an F2 chicken population using the Random Forest (RF) and Gradient Boosting (GB) methods, and to compare them with the linear model (LM) approach.
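To make the single-marker approach concrete, the following is a minimal sketch of one SNP-by-SNP regression test and a Bonferroni threshold for the number of markers on the chip. It is illustrative only and is not the software used in this study: the function name, the toy data, the 0.05 family-wise level, and the use of a normal approximation in place of the t distribution are all assumptions made for the example.

```python
import math
from statistics import mean


def single_marker_test(genotypes, phenotypes):
    """Regress phenotype on allele count (0/1/2) for one SNP.

    Returns (slope, two-sided p-value). The p-value uses a normal
    approximation to the t distribution (an assumption of this sketch),
    which is only reasonable when the number of birds is large.
    """
    n = len(genotypes)
    g_bar, y_bar = mean(genotypes), mean(phenotypes)
    sxx = sum((g - g_bar) ** 2 for g in genotypes)
    if sxx == 0:
        # Monomorphic marker: no variation, so no effect can be estimated.
        return 0.0, 1.0
    sxy = sum((g - g_bar) * (y - y_bar) for g, y in zip(genotypes, phenotypes))
    slope = sxy / sxx
    intercept = y_bar - slope * g_bar
    rss = sum((y - intercept - slope * g) ** 2
              for g, y in zip(genotypes, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        # Perfect fit: the test statistic is unbounded.
        return slope, 0.0
    z = slope / se
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return slope, p_value


# Each marker is tested separately, so the significance level is usually
# corrected for the number of tests. Bonferroni is one common choice.
N_MARKERS = 54_340
bonferroni_threshold = 0.05 / N_MARKERS

# Toy data (hypothetical, far too small for the approximation to be
# trusted; it only demonstrates the mechanics).
toy_genotypes = [0, 0, 1, 1, 2, 2]
toy_weights = [10.0, 11.0, 12.5, 13.0, 15.0, 14.5]
slope, p_value = single_marker_test(toy_genotypes, toy_weights)
significant = p_value < bonferroni_threshold
```

Because every marker is fitted in isolation, a test of this kind cannot account for linkage disequilibrium or for interactions between markers, which is the limitation that motivates the tree-based methods.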
Materials and methods
Body weight records at 6 and 9 weeks of age from 312 F2 chickens were used. The birds came from a two-way cross between the fast-growing commercial Arian line and indigenous fowl from West Azerbaijan province. At 70 days of age, DNA was extracted from blood samples and stored at -20°C. Each bird was genotyped with the Illumina 60k Chicken SNP BeadChip, containing 54,340 SNP markers, provided by Cobb Vantress with the cooperation of Aarhus University, Denmark. The phenotypic data were adjusted for sex and hatch effects, and three methods (a linear model, random forest, and gradient boosting) were used to identify important markers. The top ten markers for the body weight traits were identified for each method. In addition, genes located within 1 Mb upstream and downstream of the three top markers identified by each method were determined from the chicken (Gallus gallus) reference genome using the NCBI and Ensembl databases.
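The window-based gene search described above can be sketched as follows. This is a minimal illustration, assuming gene coordinates have already been downloaded from an annotation database. The `Gene` record, the function name, and all gene names and coordinates are hypothetical placeholders, not values from this study.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # first base of the gene, in bp
    end: int    # last base of the gene, in bp


def genes_near_marker(chrom, pos, genes, flank_bp=1_000_000):
    """Return names of genes overlapping [pos - flank_bp, pos + flank_bp].

    A gene is reported when any part of it falls inside the window on
    the same chromosome as the marker.
    """
    lo, hi = pos - flank_bp, pos + flank_bp
    return [g.name for g in genes
            if g.chrom == chrom and g.start <= hi and g.end >= lo]


# Hypothetical annotation records (placeholders only).
annotation = [
    Gene("GENE_A", "4", 1_200_000, 1_250_000),  # inside the window
    Gene("GENE_B", "4", 3_500_000, 3_600_000),  # more than 1 Mb away
    Gene("GENE_C", "7", 1_900_000, 1_950_000),  # different chromosome
]

# Hypothetical top marker on chromosome 4 at position 2,000,000 bp.
hits = genes_near_marker("4", 2_000_000, annotation)
```

An overlap test is used rather than checking only the gene start, so that long genes that begin outside the window but extend into it are still reported.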
Results and discussion
Identifying important markers and their corresponding genes with machine learning methods showed that some markers and genes influencing the traits may not be detected by a linear model, indicating that random forest and gradient boosting are suitable tools for selecting important markers. The associated markers were GGaluGA308573, GGaluGA255033, Gga_rs13614212, Gga_rs13743072, GGaluGA258772, Gga_rs14034395, and Gga_rs13858398. By examining only the three most important markers from each method, new related genes such as MAP2, ACSL1, CAMSAP2, FAM117B, SLC4A4, TIMP4, and lncRNA were identified that had not been detected by the linear model in previous studies. According to the literature, these genes are involved in microtubule-stabilizing activity, cell shape, regulation of intestinal tissue during post-hatch development, augmentation of adipogenesis, tissue growth and morphogenesis, fatty acid metabolism and abdominal fat deposition, intramuscular fat content, meat tenderness and flavor, axon development, dendrite development, organelle organization, bicarbonate secretion and absorption, intracellular pH, and induction of apoptosis (programmed cell death) in chickens.
Conclusion
In this study, random forest, gradient boosting, and simple linear models were used to detect important marker associations with body weight at 6 and 9 weeks of age in broiler chickens using a 60k Chicken SNP BeadChip. The most important markers identified by the linear model were GGaluGA141221 and GGaluGA142838. For the machine learning methods, the top markers were GGaluGA308573, Gga_rs13743072, Gga_rs13614212, GGaluGA322130, and Gga_rs15763229. These markers were associated with genes that control several biochemical, physiological, and biological functions in chickens. The results indicate that machine learning algorithms identified new genes for body weight traits that had not previously been identified by linear models.