BioVeL Data Refinement Workflow (DRW)

The (Taxonomic) Data Refinement Workflow (DRW) helps you efficiently aggregate, integrate, and clean observational and specimen data sets from many different sources. The tool works across large geo-temporal, taxonomic, and environmental scales and prepares your data for use in scientific analyses such as species distribution analysis, species richness and diversity studies, species occurrence studies, species distribution modeling, historical analysis, taxonomic revisions, conservation assessments, and other spatio-temporal analyses.

The workflow includes a number of graphical user interfaces for viewing and interacting with the data. The output of each part of the workflow is compatible with the input of every other part, so you are free to run the parts in any sequence and to repeat steps as needed. This design also allows custom-built and third-party tools and applications to be integrated into the workflow easily.
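The idea that every part accepts the output of every other part can be illustrated with a minimal Python sketch. This is not the DRW implementation (which runs in Taverna); the record layout and the names `run_steps`, `drop_duplicates`, and `drop_missing_coordinates` are hypothetical and chosen only for illustration. Each step takes a list of records and returns a list of records, so steps can be chained in any order and repeated.

```python
from typing import Callable, Dict, List

# Hypothetical record layout: a plain dict with "name", "lat", and "lon" keys.
Record = Dict[str, object]
Step = Callable[[List[Record]], List[Record]]


def drop_missing_coordinates(records: List[Record]) -> List[Record]:
    """Keep only records that carry both a latitude and a longitude."""
    return [
        r for r in records
        if r.get("lat") is not None and r.get("lon") is not None
    ]


def drop_duplicates(records: List[Record]) -> List[Record]:
    """Remove exact duplicate records while preserving the original order."""
    seen = set()
    unique = []
    for r in records:
        # Keys are unique within a dict, so sorting the items is well defined.
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def run_steps(records: List[Record], steps: List[Step]) -> List[Record]:
    """Apply the steps in the given order; each output feeds the next input."""
    for step in steps:
        records = step(records)
    return records


records = [
    {"name": "Abies alba", "lat": 47.1, "lon": 8.5},
    {"name": "Abies alba", "lat": 47.1, "lon": 8.5},   # exact duplicate
    {"name": "Picea abies", "lat": None, "lon": 9.0},  # no latitude
]

cleaned = run_steps(records, [drop_duplicates, drop_missing_coordinates])
```

Because both steps share the same input and output type, swapping their order or running one of them twice is equally valid, which is the property the paragraph above describes.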

This workflow can be accessed through the BioVeL Portal.

This workflow can be combined with the Ecological Niche Modelling Workflows.

Developed by: 

Biodiversity Virtual e-Laboratory (BioVeL) (EU FP7 project)

Used data resources: 

The workflow accepts input data in a recognized format, and these data can be combined from various sources (e.g. occurrence retrieval services, local user data sets). Within the Taxonomic Name Resolution sub-workflow (see below), users can draw on standard checklists such as the Catalogue of Life (CoL), the Pan-European Species directories Infrastructure (PESI), and the World Register of Marine Species (WoRMS), as well as aggregated checklists such as the European Distributed Institute of Taxonomy (EDIT) and the GBIF Checklist Bank.

Web services: 

Currently, the data refinement workflow is composed of three distinct parts:

  1. Taxonomic Name Resolution/Occurrence retrieval: Taxonomic checklist web services are used for standardizing species lists and resolving synonyms: CoL, PESI, WoRMS, EDIT, and GBIF. Occurrence data are retrieved through GBIF.
  2. Geo-temporal data selection: Spatial and temporal selection services are provided by the web-based BioSTIF client.
  3. Data quality checks/Filtering: OpenRefine (formerly Google Refine) is used for accessing local and external filtering and cleaning functionalities.
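
As a rough illustration of what the first two parts do to the data, the following Python sketch resolves synonyms against a small in-memory mapping and then selects records by bounding box and year range. It does not call the CoL, PESI, WoRMS, EDIT, GBIF, or BioSTIF services; the `Occurrence` class, the `CHECKLIST` mapping, and both function names are assumptions made for this example only.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Occurrence:
    """Hypothetical occurrence record used only in this sketch."""
    name: str
    lat: float
    lon: float
    year: int


# Illustrative synonym -> accepted name mapping, standing in for a
# checklist lookup that the real workflow performs through web services.
CHECKLIST: Dict[str, str] = {
    "Felis leo": "Panthera leo",
}


def resolve_names(records: List[Occurrence],
                  checklist: Dict[str, str]) -> List[Occurrence]:
    """Replace known synonyms with the accepted name; keep other names as-is."""
    return [replace(r, name=checklist.get(r.name, r.name)) for r in records]


def select_geo_temporal(records: List[Occurrence],
                        bbox: Tuple[float, float, float, float],
                        years: Tuple[int, int]) -> List[Occurrence]:
    """Keep records inside the bounding box and the inclusive year range.

    bbox is (min_lat, min_lon, max_lat, max_lon).
    """
    min_lat, min_lon, max_lat, max_lon = bbox
    first_year, last_year = years
    return [
        r for r in records
        if min_lat <= r.lat <= max_lat
        and min_lon <= r.lon <= max_lon
        and first_year <= r.year <= last_year
    ]


occurrences = [
    Occurrence("Felis leo", -2.3, 34.8, 1998),
    Occurrence("Panthera leo", -1.5, 35.1, 2010),
    Occurrence("Panthera leo", 48.0, 2.0, 2005),
]

resolved = resolve_names(occurrences, CHECKLIST)
selected = select_geo_temporal(resolved, bbox=(-5.0, 30.0, 0.0, 40.0),
                               years=(2000, 2020))
```

In this sketch, name resolution runs first so that the later selection operates on one accepted name per taxon; the third part (quality checks and filtering) would then be applied to `selected` in the same record-list-in, record-list-out fashion.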

Technology or platform: 

The workflow has been developed to run in the Taverna automated workflow environment. In its current form (version 14), the workflow file (with the .t2flow extension) can be loaded and executed in the Taverna Workbench. Because the workflow depends on external libraries (written in Java) as well as on an instance of Google Refine, running it in the Taverna Workbench requires following the instructions on the page How to install and run DR workflows on Taverna Workbench.