We use statistical models to differentiate between genetic and environmental influences on an individual's phenotype. These methods are useful in agriculture for selection purposes and in human medicine for the personalisation of treatment. Separating genetic and environmental effects on the phenotype becomes challenging when dealing with small herds that have low genetic connectedness. Typically, environmental effects are described using information about the herd to which the animal belongs. We can also include environmental variables such as climate, but such data are often not available. Alternatively, we can incorporate the effect of location, since we assume that geographically closer herds share some environmental influences. Information about spatial location can therefore help us distinguish between sources of variability in the phenotype.
In this thesis, we examine several approaches to modelling the effect of location on phenotype using linear mixed models, and we evaluate how much such modelling improves the estimation of animals' genetic values. We analyse a dataset of 30,314 observations from the population of brown cattle in Slovenia. To better reflect small-scale breeding programmes, which are characterised by smaller herds and poor genetic connectedness between herds, we focus on a subset of the data containing 3,800 observations. We model a standardised physical trait, chest girth, using selected fixed and random effects. We include up to three random effects, with the genetic effect present in all models. The baseline model G comprises only the genetic effect. The GH model includes genetic and herd effects, the GS model includes genetic and location effects, and the full GHS model includes all three random effects. The location effect is modelled using three different methods: the Besag regional model, the SPDE method, and the exponential covariance function. We estimate the models using the INLA method from the R-INLA package. The Besag model and the model with an exponential covariance function are also estimated in the BLUPF90 family of programs.
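As an illustration only, the full GHS model described above can be sketched in the generic notation of linear mixed models. The symbols are assumptions introduced here rather than the thesis's own notation: $\mathbf{X}$ and the $\mathbf{Z}$ matrices are design matrices, $\mathbf{A}$ is assumed to be a relationship matrix between animals, $d_{ij}$ is the distance between locations $i$ and $j$, and $\rho$ is a range parameter. The exponential covariance is shown as one of the three location models; the Besag and SPDE variants would replace the last line with their own covariance structure.

```latex
% Sketch of the GHS model; all notation is assumed, not taken from the thesis.
\begin{align}
\mathbf{y} &= \mathbf{X}\mathbf{b} + \mathbf{Z}_a\mathbf{a}
            + \mathbf{Z}_h\mathbf{h} + \mathbf{Z}_s\mathbf{s} + \mathbf{e}, \\
\mathbf{a} &\sim N(\mathbf{0}, \mathbf{A}\sigma_a^2), \quad
\mathbf{h} \sim N(\mathbf{0}, \mathbf{I}\sigma_h^2), \quad
\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2), \\
% Exponential covariance for the location effect (one of three options).
\operatorname{Cov}(s_i, s_j) &= \sigma_s^2 \exp\!\left(-\frac{d_{ij}}{\rho}\right).
\end{align}
```

Under this sketch, model G drops both $\mathbf{Z}_h\mathbf{h}$ and $\mathbf{Z}_s\mathbf{s}$, model GH drops only $\mathbf{Z}_s\mathbf{s}$, and model GS drops only $\mathbf{Z}_h\mathbf{h}$.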
We found that the full GHS model provides the best fit for the analysed subset of data according to the deviance information criterion (DIC). The baseline model G, which includes only the genetic effect as a random effect, fails to adequately distinguish between environmental and genetic influences on the phenotypic value, as it attributes too much of the variability to the genetic effect. Including the random herd effect allows us to account for differences at the farm level, while including the geographical location helps us explain the more general influence of climate and other environmental factors. We observe a positive trend in the difference between estimated random genetic values from models G and GHS, as well as from models GH and GHS, against the estimated random spatial effect. This suggests that models without the location effect overestimate the genetic values of animals from better environments or underestimate the genetic values of animals from poorer environments. Additionally, in the test sample we note a trend in the difference in predicted phenotypic values between models against the estimated random spatial effect. The three methods used for modelling the location effect give comparable results.
The estimates of genetic values and location random effects obtained from the BLUPF90 programs are comparable to those obtained using the INLA method. We recommend that geneticists and breeders include the spatial location of the farm in the modelling process in order to improve the separation of genetic and environmental effects.