• A comparison of 4 different machine learning algorithms to predict lactoferrin content in bovine milk from mid-infrared spectra

      Soyeurt, H.; Grelet, C.; McParland, Sinead; Calmels, M.; Coffey, M.; Tedde, A.; Delhez, P.; Dehareng, F.; Gengler, N.; European Union; et al. (American Dairy Science Association, 2020-10-22)
      Lactoferrin (LF) is a glycoprotein naturally present in milk. Its content varies throughout lactation, but also with mastitis; therefore it is a potential additional indicator of udder health beyond somatic cell count. Consequently, there is an interest in quantifying this biomolecule routinely. The first equations proposed in the literature to predict LF content from milk mid-infrared spectra were built using partial least squares regression (PLSR) because of the limited size of the data sets. The current study aimed to test 4 different machine learning algorithms using a large data set comprising 6,619 records collected across different herds, breeds, and countries. The first algorithm was a PLSR, as used in past investigations. The second and third algorithms used partial least squares (PLS) factors combined with a linear and polynomial support vector regression (PLS + SVR). The fourth algorithm also used PLS factors, but included in an artificial neural network with 1 hidden layer (PLS + ANN). The training and validation sets comprised 5,541 and 836 records, respectively. Although calibration performance was best for PLS + polynomial SVR, its validation performance was the worst. The 3 other algorithms had similar validation performances: their validation root mean squared error (RMSE) ranged between 162.17 and 166.75 mg/L of milk. However, the lower standard deviation of cross-validation RMSE and the better normality of the residual distribution observed for PLS + ANN suggest that this modeling was more suitable to predict the LF content in milk from milk mid-infrared spectra (validation R² = 0.60 and validation RMSE = 162.17 mg/L of milk). This PLS + ANN model was then applied to almost 6 million spectral records. The predicted LF showed the expected relationships with milk yield, somatic cell score, somatic cell count, and stage of lactation. The model tended to underestimate high LF values (higher than 600 mg/L of milk). However, if the prediction threshold was set to 500 mg/L, 82% of the validation samples with an LF content higher than 600 mg/L were detected. Future research should aim to increase the number of those extremely high LF records in the calibration set.
    • Mid-infrared prediction of lactoferrin content in bovine milk: potential indicator of mastitis

      Soyeurt, H.; Bastin, C.; Colinet, F. G.; Arnould, V.M.R; Berry, Donagh; Wall, E.; Dehareng, F.; Nguyen, H. N.; Dardenne, P.; Schefers, J.; et al. (Cambridge University Press, 2012-04-27)
      Lactoferrin (LTF) is a milk glycoprotein favorably associated with the immune system of dairy cows. Somatic cell count is often used as an indicator of mastitis in dairy cows, but knowledge of the milk LTF content could aid in mastitis detection. An inexpensive, rapid and robust method to predict milk LTF is required. The aim of this study was to develop an equation to quantify the LTF content in bovine milk using mid-infrared (MIR) spectrometry. LTF was quantified by enzyme-linked immunosorbent assay (ELISA), and all milk samples were analyzed by MIR. After discarding samples with a coefficient of variation between 2 ELISA measurements of more than 5% and the spectral outliers, the calibration set consisted of 2499 samples from Belgium (n = 110), Ireland (n = 1658) and Scotland (n = 731). Six statistical methods were evaluated to develop the LTF equation. The best method yielded a cross-validation coefficient of determination for LTF of 0.71 and a cross-validation standard error of 50.55 mg/l of milk. An external validation was undertaken using an additional dataset containing 274 Walloon samples. The validation coefficient of determination was 0.60. To assess the usefulness of the MIR predicted LTF, four logistic regressions using somatic cell score (SCS) and MIR LTF were developed to predict the presence of mastitis. The dataset used to build the logistic regressions consisted of 275 mastitis records and 13 507 MIR data collected in 18 Walloon herds. The LTF and the interaction SCS × LTF effects were significant (P < 0.001 and P = 0.02, respectively). When only the predicted LTF was included in the model, the prediction of the presence of mastitis was not accurate despite a moderate correlation between SCS and LTF (r = 0.54). The specificity and the sensitivity of models were assessed using Walloon data (i.e. internal validation) and data collected from a research herd at the University of Wisconsin – Madison (i.e. 5886 Wisconsin MIR records related to 93 mastitis events – external validation). Model specificity was better when LTF was included in the regression along with SCS than with SCS alone. Correct classification of non-mastitis records was 95.44% and 92.05% for the Wisconsin and Walloon data, respectively. The same conclusion was formulated from the Hosmer and Lemeshow test. In conclusion, this study confirms that an LTF indicator can be quantified from milk MIR spectra. It suggests the usefulness of this indicator, combined with SCS, to detect the presence of mastitis. Moreover, knowledge of milk LTF content could also help improve the nutritional quality of milk.
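
The first abstract above reports a validation RMSE and describes lowering the prediction threshold to 500 mg/L in order to catch samples whose reference LF content exceeds 600 mg/L. The following is a minimal Python sketch of how those two quantities could be computed. It is not the authors' code: the function names `rmse` and `detection_rate` are hypothetical, and the sample values are invented for illustration and are not data from either study.

```python
import math


def rmse(reference, predicted):
    """Root mean squared error between reference and predicted values."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def detection_rate(reference, predicted, true_cutoff=600.0,
                   prediction_threshold=500.0):
    """Share of samples with reference > true_cutoff whose prediction
    exceeds prediction_threshold (both in mg/L of milk)."""
    high = [p for r, p in zip(reference, predicted) if r > true_cutoff]
    if not high:
        raise ValueError("no sample exceeds the reference cutoff")
    return sum(p > prediction_threshold for p in high) / len(high)


# Invented illustrative values in mg/L of milk (assumption, not study data).
reference_lf = [150.0, 320.0, 610.0, 700.0, 850.0, 640.0]
predicted_lf = [180.0, 300.0, 520.0, 480.0, 690.0, 560.0]

print(rmse(reference_lf, predicted_lf))
print(detection_rate(reference_lf, predicted_lf))
print(detection_rate(reference_lf, predicted_lf, prediction_threshold=600.0))
```

With these invented values, the predictions for the high-LF samples fall below their references, so a prediction threshold equal to the 600 mg/L cutoff flags fewer of them than the lower 500 mg/L threshold does. This mirrors the underestimation of high values described in the abstract.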