
More accurate research through harmonisation of in-situ crop data

Published on
May 20, 2025

Crop growth models calculate crop yields based on climate, soil, crop type and the chosen cultivation strategy. Field data is needed to make these models usable on a local scale. The problem is that this data is not easy to find. And even when it is available, it is often not described in a standardised way.

Hendrik Boogaard, a researcher at Wageningen Environmental Research, has specialised in applying models to agriculture for 30 years. ‘To simulate crop growth in the field, you need local data,’ he explains. ‘We call this in-situ data. This data is required for a model to work properly. You often calibrate a model over one or more years and test it against an independent dataset from another year.’
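The calibrate-then-test approach Hendrik describes can be sketched as a simple split of field records by year. This is only an illustration: the record layout and the function name `split_by_year` are assumptions made here, not part of any crop model or of AGROSTAC.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical record layout: (year of observation, observed value).
Record = Tuple[int, float]


def split_by_year(
    records: Iterable[Record], test_years: Set[int]
) -> Tuple[List[Record], List[Record]]:
    """Separate records into a calibration set and an independent test set.

    Records from the years in `test_years` are held back for testing;
    everything else is used for calibration.
    """
    calibration: List[Record] = []
    test: List[Record] = []
    for record in records:
        year = record[0]
        if year in test_years:
            test.append(record)
        else:
            calibration.append(record)
    return calibration, test


# Illustrative values only, not real measurements.
records = [(2019, 8.1), (2020, 7.9), (2021, 8.4)]
calibration_set, test_set = split_by_year(records, test_years={2021})
```

Holding back whole years, rather than random records, keeps the test data independent of the seasons the model was tuned on.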

Not easy to find

Hendrik believes that the problem with this data is that it is not easy to find: ‘It is often data that has been collected for a specific study. When the research data is published, future use is often not taken into account. Reuse requires good documentation: what other researchers are permitted to do with the data, for example. How the research data is described is also very important. To give an example: biomass can be expressed as fresh weight or dry weight. If this is not specified, it becomes difficult to use the data.’ He cites another example: ‘Phenology, how the crop develops from seed to maturity, is recorded on different scales in different studies. These often cannot simply be translated into the standard scale that is usually used for this purpose. So, there are quite a few snags when it comes to reusing research data.’
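The fresh-versus-dry-weight example can be made concrete with a minimal sketch. It assumes a hypothetical record type, `YieldRecord`, with an optional weight basis and moisture fraction; none of these names come from AGROSTAC. The only rule used is the standard relation that dry weight equals fresh weight multiplied by one minus the moisture fraction.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class YieldRecord:
    """Hypothetical record; field names are illustrative assumptions."""
    value_t_ha: float                          # reported value, tonnes per hectare
    basis: Optional[str]                       # "fresh", "dry", or None if undocumented
    moisture_fraction: Optional[float] = None  # water share of fresh weight, 0 to 1


def to_dry_weight(record: YieldRecord) -> float:
    """Return the value on a dry-weight basis, or raise if metadata is missing."""
    if record.basis == "dry":
        return record.value_t_ha
    if record.basis == "fresh":
        if record.moisture_fraction is None:
            raise ValueError("fresh-weight value needs a moisture fraction")
        if not 0.0 <= record.moisture_fraction < 1.0:
            raise ValueError("moisture fraction must be in [0, 1)")
        # Dry weight is what remains after removing the water share.
        return record.value_t_ha * (1.0 - record.moisture_fraction)
    # Without a documented basis the number cannot be interpreted safely.
    raise ValueError("weight basis not documented; value cannot be harmonised")
```

A record without a documented basis is rejected rather than guessed at, which mirrors the point made in the interview: an undocumented value is hard to reuse.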

AGROSTAC

In the In-situ Data project, part of the Wageningen Common Data Solutions programme, Hendrik and his colleagues set to work harmonising relevant in-situ crop data. As a first step, they mapped existing sets of crop data in collaboration with the Plant Sciences Group: ‘To start with, we looked for relevant datasets. That's a lot of work, because they are stored in many different places. We then looked at how the different kinds of data were described. Often, background information was provided in a readme file or a scientific publication. We developed a generic infrastructure, AGROSTAC, to harmonise and manage this crop data and make it available to fellow researchers in a standardised format via an API.’

It would be great if researchers took future use by other researchers into account when describing their research data. Hendrik argues that this is not always realistic: ‘In a project with the European Space Agency and the Flemish Institute for Technological Research, we are creating crop maps with a resolution of 10 by 10 metres. In another study, a resolution of 10 by 10 kilometres may suffice. For that researcher, a finer spatial resolution makes no sense.’ In addition, data often has to be published at the last minute. ‘Usually, no attention is paid to the interoperability and reusability of data.’

Global Yield Atlas

One of the practical cases worked on in the project is the Global Yield Atlas, an initiative of Wageningen University & Research, among others. ‘This atlas describes the difference between farmers’ current yields and the potential yield if they did everything perfectly. For maize production in Germany, we recalibrated the crop model used based on a large number of in-situ measurements that we found via AGROSTAC. This has resulted in a more realistic length of the growing season and an adjusted yield ceiling.’

Monitoring Agricultural Resources (MARS)

The EU programme Monitoring Agricultural Resources (MARS) uses meteorological data to monitor crop growth and predict yields. A crop growth model developed in Wageningen in the 1980s serves as the basis for this. The In-situ Data project also contributed to MARS, says Hendrik: ‘There was a strong need for in-situ data to better predict potato growth throughout Europe. We went in search of new data, then standardised and harmonised it and made it available.’

Yoda

The more datasets are added, the more interesting AGROSTAC will become for the plant science community, Hendrik expects: ‘We already have around 25 datasets, some of which are big, from different types of research. In recent years, we have found a number of interesting datasets that we didn't know existed. As our network becomes better known, it will reinforce itself.’ He expects Yoda, the IT solution for smart and efficient data organisation, to contribute to the further enrichment of AGROSTAC: ‘Wageningen scientists are already working with it. I imagine they will be able to tick a box if they want their data to be automatically shared with AGROSTAC.’

Generic design

In his opinion, a major advantage of AGROSTAC is its generic design: ‘We can easily add variables, such as data to improve sustainability management or data on the protein content of crops. Essentially, any observation with a time and location can be added. But we want to continue to focus on what is most needed.’
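The idea that any observation with a time and location can be added might be pictured as one generic record type, where the variable is just a named field rather than a fixed column. The sketch below is an assumption for illustration only: `Observation`, its fields, and `validate` are hypothetical and do not describe AGROSTAC's actual data model or API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    """Hypothetical generic observation: a value at a time and a place."""
    observed_on: date
    latitude: float    # decimal degrees, -90 to 90
    longitude: float   # decimal degrees, -180 to 180
    variable: str      # e.g. "grain_protein_content" (illustrative name)
    value: float
    unit: str          # e.g. "percent"


def validate(obs: Observation) -> Observation:
    """Check the minimal requirements: a valid location, a named variable and a unit."""
    if not -90.0 <= obs.latitude <= 90.0:
        raise ValueError("latitude out of range")
    if not -180.0 <= obs.longitude <= 180.0:
        raise ValueError("longitude out of range")
    if not obs.variable or not obs.unit:
        raise ValueError("variable name and unit are required")
    return obs


# Adding a new kind of variable needs no change to the record type itself.
# The values below are illustrative, not real measurements.
protein = validate(
    Observation(date(2024, 8, 1), 51.97, 5.67, "grain_protein_content", 11.5, "percent")
)
```

Because the variable name and unit travel with each value, a new kind of measurement, such as protein content, fits the same structure without a schema change.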