Bike fixes urgent data storage problem

Published on
October 30, 2024

What do you do when you need to transfer research data quickly, but the network does not cooperate? Data experts at WUR came up with a typically Dutch solution that is as simple as it is effective. What you need: a pile of hard drives, a bicycle and good panniers.

Three years ago, the Netherlands Plant Eco-Phenotyping Centre (NPEC) was opened, a joint venture between WUR and Utrecht University, both of which have branches at the centre. At the facility on Wageningen Campus, the appearance - the phenotype - of plants can be mapped down to the nanometre.

The data collected at NPEC every day must, of course, be stored securely. TraitSeeker, a robot that uses state-of-the-art cameras and sensors to acquire data in the field, made that a challenge. According to Tim van Daalen, data & sensor quality lead at NPEC, a day of driving around a field soon leaves a terabyte of data on the robot's hard drive. ‘The next day it is taken to another field. So that hard drive has to be emptied as quickly as possible,’ Tim says.

Image 1, TraitSeeker scanning a pixel farming experiment

Scality

Copying the data from the hard drive to the network with standard solutions could not be done overnight. As a result, the field robot could only be deployed hours later the next day. WUR Private cloud S3 object storage, a new storage service used by WUR, offered a solution. The advantage of this service is that it uses the connection bandwidth very efficiently, so data can be transferred at maximum speed. So when the servers in the NPEC facility also started to fill up, Tim wanted to use the storage service to move this data to WUR's data centre via fibre-optic cable.

Tim now effortlessly moves large amounts of data across the network every day. Yet he could not get this to work for moving data from NPEC. Long story short: due to certain network settings of the storage service provider, the connection to the network is lost during the transfer. Tim: ‘Those settings cannot be changed quickly. As a result, we couldn't get the data copied. And that was serious, because the hard drives were now full.’

Rethinking

He contacted FB-IT's storage team. ‘I said: “We have a problem. It won’t work via fibre-optic cable, so what can we do?”’ A serious bit of rethinking brought the solution: ‘Someone brings a stack of hard drives to NPEC on their bicycle, I put the research data from the facility's computers on them and then the hard drives go back in the panniers to the data centre, where the data is transferred.’

Behind the scenes, people are working hard to find a permanent solution. Meanwhile, the ‘data bike’ has been riding back and forth for three months. The trick works so well that Tim is keen to bring it to the attention of research groups that are also struggling with data storage: ‘I recently heard from a group who wanted to transfer a sizeable set of data over the line to the data centre. That would take three months, they said. If you use the bike, it's done in a few days.’
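The comparison Tim makes comes down to simple arithmetic. The sketch below is a rough illustration, not NPEC's actual figures: the dataset size, effective link throughput, drive capacity, copy speed and ride time are all assumed values, and the function names are hypothetical.

```python
import math

SECONDS_PER_DAY = 86_400


def network_days(dataset_tb: float, effective_mbit_per_s: float) -> float:
    """Days needed to send dataset_tb terabytes over a link with the
    given effective throughput (assumed constant)."""
    bits = dataset_tb * 1e12 * 8
    return bits / (effective_mbit_per_s * 1e6) / SECONDS_PER_DAY


def bike_days(dataset_tb: float, drive_tb: float, drives_per_trip: int,
              copy_mb_per_s: float, ride_s: float) -> float:
    """Days needed to move the data by bike.

    Simplifying assumptions: the drives of one trip are filled in
    parallel, emptied in parallel at the other end, and trips run one
    after another without pauses.
    """
    trips = math.ceil(dataset_tb / (drive_tb * drives_per_trip))
    copy_s = drive_tb * 1e12 / (copy_mb_per_s * 1e6)  # fill or empty one drive
    return trips * (2 * copy_s + ride_s) / SECONDS_PER_DAY


# Assumed example: 100 TB, a link that sustains 100 Mbit/s in practice,
# five 10 TB drives per trip, 200 MB/s copy speed, a 20-minute ride.
over_the_line = network_days(100, 100)
by_bike = bike_days(100, 10, 5, 200, 20 * 60)
print(f"network: {over_the_line:.1f} days, bike: {by_bike:.1f} days")
```

With these assumed numbers the network transfer takes roughly three months and the bike a few days, which matches the order of magnitude in Tim's example. In practice the outcome depends mostly on how fast the drives can be filled and emptied.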

Preventing research from grinding to a halt

According to Tim, WUR private cloud S3 object storage is still relatively unknown to researchers. But that will certainly change, he expects: ‘I see many opportunities, especially in terms of AI applications and analysing large data collections. But now many research groups still face problems in transferring data. Simply laying a fibre-optic cable is not the solution; all sorts of things also need to be set up and tested. Until these problems are solved structurally, we offer a solution with the data bike. With it, we can prevent research from stalling because the data analysis is delayed.’

With the private cloud solution, WUR has a new storage platform, alongside other platforms such as Isilon. Having data stored in different places can be confusing. iRODS, a new ‘layer’ in the data infrastructure, changes that. Tim: ‘We can link all storage systems to iRODS. There you will find your entire file list, no matter where those files are. Through iRODS you can retrieve all those files, and you can also store files through it. You can specify write permissions per person per folder and also set, for example, that others cannot delete certain files. This also means that we no longer need to store a backup somewhere.’

Image 2, data flow from the facilities to the researcher

Linking metadata

According to Tim, what makes iRODS even more interesting is that it can link metadata to data files: ‘For example, I can add that the researcher did a trial on photosynthesis efficiency of a particular tomato variety. That the trial lasted two months. And that the results are publicly available. Suppose another researcher wants to do the same trial, we can indicate that the trial has been done before. That way, we can hopefully avoid unnecessary, expensive trials.’
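iRODS stores such metadata as attribute-value-unit triples attached to a file. The sketch below illustrates that idea in plain Python; it does not use the iRODS API, and the class names, the `find` helper, the attribute names and the file path are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AVU:
    """One metadata entry: attribute, value and an optional unit."""
    attribute: str
    value: str
    unit: str = ""


@dataclass
class DataObject:
    """A stored file plus the metadata linked to it."""
    path: str
    metadata: set = field(default_factory=set)

    def add(self, attribute: str, value: str, unit: str = "") -> None:
        self.metadata.add(AVU(attribute, value, unit))


def find(catalogue, **criteria):
    """Return the paths of all objects that carry every attribute=value pair."""
    hits = []
    for obj in catalogue:
        pairs = {(m.attribute, m.value) for m in obj.metadata}
        if all((key, value) in pairs for key, value in criteria.items()):
            hits.append(obj.path)
    return hits


# Hypothetical record of the tomato trial from Tim's example.
trial = DataObject("/hypothetical/npec/tomato_trial.csv")
trial.add("trait", "photosynthesis efficiency")
trial.add("crop", "tomato")
trial.add("duration", "2", "months")
trial.add("access", "public")
catalogue = [trial]

# A researcher planning the same trial can check whether it already exists.
earlier_trials = find(catalogue, crop="tomato", trait="photosynthesis efficiency")
print(earlier_trials)
```

A search like this is what lets a data steward point a researcher to an earlier trial before a new, expensive one is started.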

After two years of trials, the application is ready for large-scale use. Like the data bike, iRODS can also make researchers' work a lot easier, Tim expects. His advice? ‘If you have problems with data storage, don't keep struggling with it. There is always a solution.’