Project process

Process established for metadata inventory

Date: March 23, 2023    Authors: Thaïsa van der Woude (ISRIC), Emily Toner (ISRIC)

The LSC Hubs project follows the FAIR principles for scientific data management and stewardship and has developed a process to enable project partners to describe their data systematically.

The Land Soil Crop Hubs (LSC Hubs) project follows the FAIR principles for scientific data management and stewardship. The FAIR principles provide guidelines to improve the Findability, Accessibility, Interoperability, and Reuse (FAIR) of digital assets. An important FAIR principle is describing data assets using a standard and sharing these records via a central searchable repository (i.e. a metadata catalogue). Therefore, we developed a process to enable project partners to describe their data systematically.

This process includes:

Step 1: Develop a metadata inventory template;

Step 2: Collect initial metadata with project partners;

Step 3: Obtain feedback from project partners;

Step 4: Implement the feedback in the template;

Step 5: Collect additional metadata from project partners;

Step 6: Create an Open Data Kit Form.

In the first step, a metadata inventory template was developed based on the ISO 19139:2007 standard. Because the metadata is collected in a standardized format, it can be uploaded smoothly to the LSC Hub. The template presents a subset of common ISO 19139:2007 metadata properties, extended with project-specific keywords. The template is an Excel file that collects the metadata properties shown in Table 1 below.

Table 1: Metadata properties based on ISO 19139:2007

Metadata item | Explanation
Identification | Unique identification of the dataset (a UUID, URN, or URI, such as a DOI) or dataset location (a specific data storage location, such as a folder on a personal computer, a shared hard disk, etc.)
Title | Short meaningful title
LSC category | Land Soil Crop domain
Data category | Generic data category according to AGROVOC
Abstract | Brief description or abstract that describes the dataset
Keywords | Keywords, separated by ‘;’. These are important words for the dataset, e.g., variable names, column names, key concepts, etc.
Authors | Person (last name(s), initials of first name(s), e.g., Jones, J.A.) or institute
Year | Creation year
Contact - Name | Name of the contact
Contact - Organization | Name of the organization
Contact - Role | Role within the organization
Contact - Email | Email
Who fills the form | Name, organization, contact. Fill in if this person is different from the contact person
Source | Reference to another dataset that is used as a source for this dataset. Reference a single dataset per line: Title; Date; or provide a DOI
Language | Language of the data and metadata. If the metadata is multilingual, multiple languages can be provided
Reference system | Spatial projection: drop-down list of options, including ‘unknown’ (the field can also be left empty if it is unknown)
Citation | References to articles that reference this dataset, one citation per line: Title; Authors; Date; or provide a DOI
Paper or Report | Reference to a scientific paper or report that used this dataset as a source. Needed info: Title; Date; or provide a DOI
Spatial resolution | Resolution (grid) or scale (vector)
Spatial format | Vector, Grid
Format | File format in which the data is maintained or published
Extent (geographic) | Geographical coverage (e.g., Global, Africa, Rwanda, Ethiopia, …)
Extent (category) | National, county/district/province, catchment, village, plot (farm)
Usage constraints | Indicates if there are legal usage constraints (license); free text and/or value from list
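
To illustrate how a filled-in row of the template could be checked before further processing, here is a minimal Python sketch. It is not the project's own code: the choice of required fields, the helper names `split_multi_value` and `check_row`, and the example row are all assumptions made for illustration. The only rule taken from Table 1 is that keywords are separated by ‘;’.

```python
# Hypothetical choice of mandatory fields; the template itself does not
# state which properties are required.
REQUIRED_FIELDS = ("Identification", "Title", "Abstract")


def split_multi_value(cell: str) -> list[str]:
    """Split a ';'-separated cell (e.g. Keywords) into trimmed, non-empty values."""
    return [part.strip() for part in cell.split(";") if part.strip()]


def check_row(row: dict[str, str]) -> list[str]:
    """Return the names of required fields that are missing or blank in a row."""
    return [name for name in REQUIRED_FIELDS if not row.get(name, "").strip()]


# Made-up example row, keyed by the property names from Table 1.
row = {
    "Identification": "urn:example:soil-ph-2020",
    "Title": "Example soil pH survey",
    "Abstract": "",
    "Keywords": "soil pH; topsoil ;; Rwanda",
}

print(split_multi_value(row["Keywords"]))  # ['soil pH', 'topsoil', 'Rwanda']
print(check_row(row))                      # ['Abstract']
```

A check of this kind could flag incomplete rows to the partner who filled in the form before any record is passed on.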

After the template was developed, the second step was for the project partners to fill in initial metadata. The template was made available through a central folder with shared editing capability. In the third step, project partners, including the Kenya Agricultural & Livestock Research Organization (KALRO), the Rwanda Agriculture and Animal Resources Development Board (RAB), ICRAF World Agroforestry and the International Union for Conservation of Nature (IUCN), provided feedback on the metadata inventory template. This feedback was used to improve the template. Our project partners are currently adding more metadata in the newest version of the metadata inventory template.

In addition, our project partners reported that the Excel file cannot be easily distributed to other stakeholders inside or outside their organisation. Therefore, we developed an Open Data Kit (ODK) form. ODK is an open-source tool that enables users to fill out forms and submit them to a central database. The metadata can then be automatically retrieved from the database and ingested into the catalogue. The initial ODK form is shown below.

[Figure: initial ODK form]

Once the metadata has been collected, it will be transformed into ISO 19139 records using Python scripts. Relevant and complete metadata records will be published in the data catalogues of the Hubs once these become operational. We are testing this process with an initial catalogue. The next step is to discuss which kind of catalogue each country prefers and to support the countries in implementing it.
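
As a rough idea of what such a transformation can look like, the sketch below builds a partial ISO 19139 skeleton from one collected record using only the Python standard library. It is an assumption-laden illustration, not the project's actual script: the function name `record_to_xml`, the helper `_text`, and the mapping of Identification, Title, Abstract and Keywords onto `fileIdentifier`, `CI_Citation/title`, `abstract` and `MD_Keywords` are choices made here. The output is deliberately incomplete (no contact, dates or language) and would not validate against the full schema.

```python
import xml.etree.ElementTree as ET

# Namespaces used by ISO 19139 XML encodings.
GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)


def _text(parent, tag, value):
    """Append <gmd:tag><gco:CharacterString>value</gco:CharacterString></gmd:tag>."""
    wrapper = ET.SubElement(parent, f"{{{GMD}}}{tag}")
    ET.SubElement(wrapper, f"{{{GCO}}}CharacterString").text = value
    return wrapper


def record_to_xml(record: dict) -> str:
    """Build a partial gmd:MD_Metadata document from one inventory record."""
    root = ET.Element(f"{{{GMD}}}MD_Metadata")
    _text(root, "fileIdentifier", record["Identification"])

    info = ET.SubElement(root, f"{{{GMD}}}identificationInfo")
    ident = ET.SubElement(info, f"{{{GMD}}}MD_DataIdentification")
    citation = ET.SubElement(ident, f"{{{GMD}}}citation")
    ci_citation = ET.SubElement(citation, f"{{{GMD}}}CI_Citation")
    _text(ci_citation, "title", record["Title"])
    _text(ident, "abstract", record["Abstract"])

    # Keywords arrive as one ';'-separated string, as in the template.
    keywords = [k.strip() for k in record.get("Keywords", "").split(";") if k.strip()]
    if keywords:
        descriptive = ET.SubElement(ident, f"{{{GMD}}}descriptiveKeywords")
        md_keywords = ET.SubElement(descriptive, f"{{{GMD}}}MD_Keywords")
        for keyword in keywords:
            _text(md_keywords, "keyword", keyword)

    return ET.tostring(root, encoding="unicode")


# Made-up example record.
print(record_to_xml({
    "Identification": "urn:example:soil-ph-2020",
    "Title": "Example soil pH survey",
    "Abstract": "Topsoil pH measurements (illustrative).",
    "Keywords": "soil pH; topsoil",
}))
```

A production script would also need to fill the remaining mandatory elements and validate the result against the ISO 19139 schemas before publishing it in a catalogue.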

More about this project:

LSC Hubs is a four-year project (2021-2024) supported through funding from the European Union’s Development of Smart Innovation through Research in Agriculture (DeSIRA) program, the Dutch Ministry of Foreign Affairs, and a contribution from ISRIC - World Soil Information. The project’s objective is to develop sustainable land, soil, and crop information hubs in national agricultural research organizations in East Africa. Ethiopia, Kenya and Rwanda will host the information hubs to enhance the effectiveness of national Agricultural Knowledge and Innovation Systems (AKIS) and contribute to rural transformation and climate-smart agriculture.