The high strategic interest of greater legume cultivation for European agri-food systems is supported by the European Soya Declaration, signed by 14 EU Member States in June 2017, and by the European Parliament Resolution 2017/2116 (INI) of April 2018. This interest is also acknowledged by the ‘Strategic plan for innovation and research in the agricultural, food and forestry sector (2014-2020)’ approved by the Italian Ministry for Agricultural, Food and Forestry Policies.

Low and unstable yields are the main reason for the insufficient profitability of legume crops in Europe, and genetic improvement is acknowledged as the main avenue for reducing the production gap (De Visser et al., 2014, OCL 21 (4), D407; Magrini et al., 2016, Ecol. Econ. 126, 152-162). In Southern Europe, drought stress, which is the main cause of low and unstable yields of these and other crops, is increasing because of climate change (Alessandri et al., 2014, Sci. Rep. 4, 7211). Greater drought tolerance is therefore key to legume production, yield stability and adaptation to climate change in this region.

In Italy, alfalfa is the legume with the largest cropping area and the highest protein production per unit area (Annicchiarico, 2017, It. J. Agron. 12, 880). This crop also has agro-industrial importance (about 90,000 ha are used annually to produce dehydrated fodder and protein pellets) and good prospects for the production of protein isolates (Pilorgé and Muel, 2016, OCL 23 (4), D402). Soybean is the second most widely grown legume in Italy and is predicted to expand considerably in Europe (Pilorgé and Muel, 2016, OCL 23 (4), D402), but the European breeding effort on this crop has historically been very limited (partly because of international agreements such as the 1992 Blair House Agreement). Pea has high potential as an autumn-sown crop owing to recent breeding progress in lodging tolerance and grain yield (Annicchiarico, 2017, It. J. Agron. 12, 880).

The use of molecular markers to genetically improve traits of major agronomic interest, such as crop yield and drought tolerance, was limited at first by the unavailability, and then by the high cost, of large numbers of markers distributed across the genome. Large marker numbers are needed to exploit marker-trait associations for the many genes, each with a limited effect, that control such complex traits. Genomic selection, which relies on a model that jointly estimates the effects of the allelic variants of thousands of molecular markers (Meuwissen et al., 2001, Genetics 157, 1819-1829), has revolutionized animal breeding (Wiggans et al., 2017, Ann. Rev. Anim. Biosci. 5, 309-327) and has finally become applicable to plant breeding following the development of low-cost genotyping methods such as genotyping-by-sequencing (GBS) (Elshire et al., 2011, PLoS ONE 6, e19379).
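The joint estimation of many small marker effects can be illustrated with ridge regression, one common variant of this family of models. The sketch below is not taken from the cited studies: the data are simulated, and the function names, the sample sizes and the shrinkage value `lam` are assumptions chosen for illustration only.

```python
import numpy as np

def ridge_marker_effects(X, y, lam):
    """Jointly estimate all marker effects by ridge regression.

    X   : (n_genotypes, n_markers) matrix of allele counts
    y   : (n_genotypes,) vector of phenotypes
    lam : shrinkage parameter (assumed here; in practice it is tuned
          or derived from variance components)
    """
    mu = y.mean()
    col_means = X.mean(axis=0)
    Xc = X - col_means                      # centre each marker
    n = Xc.shape[0]
    # Dual form: solve an n x n system, convenient when markers >> genotypes.
    # It gives the same effects as (Xc'Xc + lam*I)^-1 Xc'(y - mu).
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(n), y - mu)
    beta = Xc.T @ alpha                     # one shrunken effect per marker
    return mu, col_means, beta

def predict(X_new, mu, col_means, beta):
    """Genomic prediction for new genotypes from the estimated effects."""
    return mu + (X_new - col_means) @ beta

# Simulated example (hypothetical sizes): 100 genotypes, 500 biallelic markers
rng = np.random.default_rng(0)
n, m = 100, 500
X = rng.integers(0, 3, size=(n, m)).astype(float)   # allele counts 0, 1, 2
true_beta = rng.normal(0.0, 0.1, size=m)            # many small effects
y = X @ true_beta + rng.normal(0.0, 1.0, size=n)

mu, col_means, beta = ridge_marker_effects(X, y, lam=100.0)
y_hat = predict(X, mu, col_means, beta)
```

The dual formulation is used because the number of markers typically far exceeds the number of phenotyped genotypes, so an n x n system is cheaper to solve than an m x m one.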

Pioneering studies of genomic selection for legume crop yield have been encouraging. The ability to predict the breeding values of pure lines (in autogamous species) or of parent plants of synthetic varieties (in allogamous species), estimated as the correlation between genomic predictions and phenotypic values in independent observations, was high (r ≥ 0.40) for soybean (Jarquín et al., 2014, BMC Genomics 15, 740, in the USA; Ma et al., 2016, Mol. Breed. 36, 113, in China; Duhnen et al., 2017, Crop Sci 57, 1325-1337, in France), chickpea (Roorkiwal et al., 2016, Front Plant Sci 7, 1666), pea under severe water stress (Annicchiarico et al., 2017, Plant Genome 10, 3835), and white lupin (Annicchiarico et al., 2019, Mol. Breed. 39, 142). It was moderate (r ≥ 0.30), but still advantageous over phenotypic selection in terms of expected genetic gains, for alfalfa (Annicchiarico et al., 2015, BMC Genomics 16, 1020) and pea (Annicchiarico et al., 2019, BMC Genomics 20, 603) in relatively favorable Italian environments. However, r values < 0.30 were reported in another study on soybean in the USA (Stewart-Brown et al., 2019, G3 9, 2253-2265) and in a study on pea in drought-prone locations of North Africa (Annicchiarico et al., 2020, Int. J. Mol. Sci. 21, 2414). Comparisons between genomic and phenotypic selection based on actual gains, which are of particular interest, are essentially lacking for legumes and very rare for other crops.
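Predictive ability in the sense used above, the correlation between genomic predictions and phenotypic values observed on genotypes not used for model training, is commonly estimated by cross-validation. The following is a minimal sketch under stated assumptions: simulated data, a ridge regression model, five folds, and hypothetical function names and parameter values. It does not reproduce the procedure of any cited study.

```python
import numpy as np

def ridge_fit_predict(X_train, y_train, X_test, lam):
    """Fit ridge regression on the training set and predict the test set."""
    mu = y_train.mean()
    means = X_train.mean(axis=0)
    Xc = X_train - means
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(len(y_train)), y_train - mu)
    return mu + (X_test - means) @ (Xc.T @ alpha)

def predictive_ability(X, y, lam, k=5, seed=0):
    """Mean over k folds of the correlation between predicted and
    observed phenotypes of held-out genotypes."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    r_values = []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False        # hold this fold out of training
        y_hat = ridge_fit_predict(X[train_mask], y[train_mask], X[test_idx], lam)
        r_values.append(np.corrcoef(y_hat, y[test_idx])[0, 1])
    return float(np.mean(r_values))

# Simulated polygenic trait (hypothetical sizes and effect variances)
rng = np.random.default_rng(1)
n, m = 150, 300
X = rng.integers(0, 3, size=(n, m)).astype(float)
y = X @ rng.normal(0.0, 0.1, size=m) + rng.normal(0.0, 1.0, size=n)

r = predictive_ability(X, y, lam=100.0)
```

Because the held-out phenotypes include non-genetic noise, r computed this way is a predictive ability rather than an accuracy for true breeding values.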

Very few genomic selection studies have addressed the protein content of legumes. Predictive ability for soybean protein content was very high in Stewart-Brown et al. (2019) and moderately high in Duhnen et al. (2017). Moderate predictive ability of clear practical interest has been reported for protein content and neutral detergent fiber (NDF) digestibility in Mediterranean alfalfa germplasm, in a study that also highlighted the polygenic control of these traits through a genome-wide association study (GWAS) (Biazzi et al., 2017, PLoS ONE 12, e0169234).

The predictive ability of genomic selection models can be influenced by various factors, such as the statistical model (Wang et al., 2018, Crop J. 6, 330-340), the missing data threshold and the method used to impute missing genotypic values (Nazzicari et al., 2016, Mol. Breed. 36, 69), and the genetic structure of the material. These factors must be studied in order to maximize predictive ability. An important additional aspect for autotetraploid species such as alfalfa is the impact of allele dosage estimation.
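The effect of the missing data threshold and of the imputation step can be made concrete with a small example. The sketch below assumes a genotype matrix with `np.nan` for missing calls, a hypothetical threshold of 30% missing data per marker, and simple marker-mean imputation; this is only one of several possible imputation methods and is not claimed to be the one recommended by the cited study.

```python
import numpy as np

def filter_and_impute(G, max_missing=0.3):
    """Drop markers above a missing-data threshold, then impute the rest.

    G           : (n_genotypes, n_markers) allele counts, np.nan = missing call
    max_missing : maximum tolerated fraction of missing calls per marker
                  (assumed value; must be below 1)
    """
    missing_rate = np.isnan(G).mean(axis=0)
    keep = missing_rate <= max_missing          # markers passing the threshold
    G_kept = G[:, keep]
    marker_means = np.nanmean(G_kept, axis=0)   # mean of the observed calls
    G_imputed = np.where(np.isnan(G_kept), marker_means, G_kept)
    return G_imputed, keep

# Toy matrix: 4 genotypes x 4 markers
G = np.array([
    [0.0,    np.nan, 2.0,    np.nan],
    [1.0,    np.nan, np.nan, 0.0],
    [2.0,    1.0,    2.0,    np.nan],
    [np.nan, np.nan, 1.0,    2.0],
])
G_imp, keep = filter_and_impute(G, max_missing=0.3)
# Markers 2 and 4 (75% and 50% missing) are discarded; the single missing
# call in each retained marker is replaced by that marker's observed mean.
```

Rerunning a cross-validation of the prediction model over a grid of `max_missing` values is one way to study how the threshold trades marker number against imputation error.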

GENLEG will exploit results, genotypic or phenotypic data, and plant material generated by other projects such as REFORMA, LEGATO, REMIX, INVITE, CAMA, COBRA, GENALFA, ZOOBIO2SYSTEMS and RGV-FAO Treaty. These resources include genomic selection models to be validated and/or applied to new test material, data useful for the construction of new models, and plant material to be used for genomic or phenotypic selection.