PEMA: a Pipeline for Environmental DNA Metabarcoding Analysis

PEMA is a pipeline for two marker genes: 16S rRNA (microbes) and COI (eukaryotes). As input, PEMA accepts FASTQ files as returned by Illumina sequencing platforms. PEMA processes the reads from each sample and returns an OTU table with the taxonomies of the organisms found and their abundances in each sample. It also returns statistics and a FastQC diagram of read quality for each sample. Finally, for 16S, PEMA returns alpha and beta diversities and computes correlations between samples. This last step is performed with the phyloseq R package, which supports downstream analysis of 16S amplicon microbial profiles.
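
To make the alpha and beta diversity outputs concrete, the following is a minimal sketch of how such measures can be derived from an OTU table. It is not PEMA's implementation (PEMA delegates this step to phyloseq in R). The toy table, the sample names and the function names are all hypothetical, and the Shannon index and Bray-Curtis dissimilarity are used only as common examples of alpha and beta diversity measures.

```python
import math

# Hypothetical toy OTU table: one row per OTU, read counts per sample.
# Real PEMA output also carries a taxonomy for each OTU.
otu_table = {
    "OTU_1": {"sample_A": 10, "sample_B": 0},
    "OTU_2": {"sample_A": 10, "sample_B": 20},
}


def sample_counts(table, sample):
    """Return the read counts of every OTU for one sample."""
    return [row[sample] for row in table.values()]


def shannon(counts):
    """Alpha diversity: Shannon index (natural log) of one sample."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Beta diversity: Bray-Curtis dissimilarity between two samples."""
    denominator = sum(a) + sum(b)
    if denominator == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator


counts_a = sample_counts(otu_table, "sample_A")
counts_b = sample_counts(otu_table, "sample_B")
print("Shannon A:", shannon(counts_a))
print("Shannon B:", shannon(counts_b))
print("Bray-Curtis A vs B:", bray_curtis(counts_a, counts_b))
```

In this toy case, sample_A has two equally abundant OTUs and sample_B has a single OTU, so sample_A has the higher Shannon index and the two samples are partly dissimilar.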

For COI, PEMA offers two clustering algorithms: CROP and SWARM. For 16S, two approaches to taxonomy assignment are supported: alignment-based and phylogeny-based. For the latter, a reference tree with 1000 taxa was created using SILVA_132_SSURef, EPA-ng and RAxML-ng.

For more information on using PEMA, see our GitHub repository. PEMA is available on Docker Hub and Singularity Hub.

Developed by: 
Haris Zafeiropoulos and Evangelos Pafilis, IMBCC-HCMR
Used data resources: 
Study MGYS00001813: Investigation of the suitability of Remane's species minimum concept in a Mediterranean transitional waters ecosystem
Technology or platform: