Decision support for the identification of testate amoebae in microscopy images to detect peat presence in horticultural substrates

Serge Zaugg; Camille Vögeli; Lena Märki; Clément Duckert; Edward A.D. Mitchell

doi:10.1017/eds.2025.15


Published online by Cambridge University Press: 22 April 2025


Serge Zaugg*: Swiss Federal Institute of Metrology METAS, Bern, Switzerland
Camille Vögeli: Laboratory of Soil Biodiversity, University of Neuchâtel, Neuchâtel, Switzerland
Lena Märki: Swiss Federal Institute of Metrology METAS, Bern, Switzerland
Clément Duckert: Laboratory of Soil Biodiversity, University of Neuchâtel, Neuchâtel, Switzerland
Edward A.D. Mitchell: Laboratory of Soil Biodiversity, University of Neuchâtel, Neuchâtel, Switzerland
*Corresponding author: Serge Zaugg; Email: [email protected]

Article contents

Abstract
Impact Statement
Introduction
Methods
Results
Discussion
Conclusion
Open peer review
Author contribution
Competing interests
Data availability statement
Ethical statement
Funding statement
Footnotes
References

Abstract

Peat is formed by the accumulation of organic material in water-saturated soils. Drainage of peatlands and peat extraction contribute to carbon emissions and biodiversity loss. Most peat extracted for commercial purposes is used for energy production or as a growing substrate. Many countries aim to reduce peat usage, but this requires tools to detect its presence in substrates. We propose a decision support system based on deep learning to detect peat-specific testate amoebae in microscopy images. We identified six taxa that are peat-specific and frequent in European peatlands. The shells of two taxa (Archerella sp. and Amphitrema sp.) were well preserved in commercial substrates and can serve as indicators of peat presence. Images from surface and commercial samples were combined into a training set. A separate test set consisting exclusively of images from commercial substrates was also defined. Both datasets were annotated, and YOLOv8 models were trained to detect the shells. An ensemble of eight models was included in the decision support system. Test set performance (average precision) exceeded 0.8 for Archerella sp. and 0.7 for Amphitrema sp. The system processes thousands of images within minutes and returns a concise list of crops of the most relevant shells. This allows a human operator to quickly make a final decision regarding peat presence. Our method enables the monitoring of peat presence in commercial substrates. It could be extended to more species for applications in restoration ecology and paleoecology.

Keywords

carbon cycle; deep learning; decision support; microscopy; peatlands

Type: Application Paper
Information: Environmental Data Science, Volume 4, 2025, e25

DOI: https://doi.org/10.1017/eds.2025.15
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Open Practices: Open data
Copyright: © The Author(s), 2025. Published by Cambridge University Press

Impact Statement

Peatlands capture atmospheric carbon and are biodiversity hotspots. Their degradation through drainage and peat extraction contributes to carbon emissions and biodiversity loss. Extracted peat is often used as a substrate for gardening and horticulture. Many countries aim to reduce peat usage, but this requires tools to detect its presence in commercial substrates. In this context, we propose a decision support system based on deep learning that detects persistent shells of peat-specific testate amoebae in microscopy images. It processes thousands of images in a few minutes and returns a concise list of images centered on the most relevant shells. This allows an operator to make the final decision regarding peat presence efficiently and provides supporting evidence for that decision.

1. Introduction

1.1. Socio-economic background

Peat is formed by the accumulation of more or less decomposed organic matter in mires (peat-rich fens, raised bogs, or tropical swamps such as forested or papyrus swamps) under hydromorphic and anaerobic conditions. Peatlands play a key role in the global carbon cycle as important carbon sinks: they store around 30% of the world’s soil organic carbon, although they cover only around 3% of the Earth’s surface (Frolking et al., 2011; Xu et al., 2018). However, this natural carbon store is endangered by human activities. The degradation of peatlands through drainage and peat extraction leads to aerobic mineralization of the stored organic matter and thus to substantial greenhouse gas emissions. It has been estimated that the destruction of peatlands accounts for about 5% of global anthropogenic greenhouse gas emissions, corresponding to around 2 gigatons of CO₂ per year (Leifeld et al., 2019; Intergovernmental Panel on Climate Change (IPCC), 2023). The conservation and re-wetting of peatlands (and thus the restoration of their carbon storage capacity) are therefore among the most efficient actions to mitigate climate change (Temmink et al., 2023). Peatlands are also ecosystems of major importance for biodiversity conservation, supporting numerous rare species adapted to their specific site conditions. Additionally, peatlands play key roles in hydrological regulation: by storing water during rainy periods, they act as buffers against floods and help to maintain baseline water flow during dry periods. In this way, peatlands and other wetlands also contribute to stabilizing the regional climate by reducing temperature extremes (Ahmad et al., 2020). These sensitive ecosystems are being destroyed by drainage and peat extraction.

While most of the peat extracted in Europe is used for energy production (62%), the second most important use of peat (38%) is as horticultural substrate (growing media) (Hirschler et al., 2022). Several natural characteristics of peat make it suitable as a substrate ingredient or soil enhancer: its high capacity to store water and fertilizers, the structural stability provided by decay-resistant Sphagnum moss debris, its low pH, and its low levels of nutrients and pollutants. While global figures are difficult to obtain, it has been estimated that around 8 million tons of peat are extracted annually in Europe for use in horticulture (Hirschler et al., 2022). With increasing awareness of the value of intact peatlands, several countries and substrate producers aim to reduce their use of peat. The certification of peat-free substrates is currently based solely on the traceability of supply chains, which is not sufficient. Peat might still be added unintentionally due to a lack of specific controls in the supply chain. Establishing these controls incurs costs for producers, who are unlikely to implement them without clear incentives. Additionally, the fact that routine peat detection is currently not feasible facilitates potential fraud. To close this gap, we propose a decision support system based on deep learning that detects peat-specific testate amoebae in microscopy images in order to identify peat in commercial substrates.

It is worth noting that several alternative approaches could be used to detect peat. For instance, environmental ancient DNA (aDNA) is increasingly used in environmental monitoring. However, this approach is unlikely to work based on testate amoeba DNA alone, because shells extracted from peat are empty subfossils. Living cells are only found in the upper few centimeters and would therefore only be expected in fresh Sphagnum, such as that used for hanging baskets (Wilkinson, 2010). Moreover, aDNA analysis may not yield the same results as the analysis of subfossils preserved in the peat. In a comparative study using both approaches, DNA sequences of given plant taxa were recovered from only some of the samples in which the corresponding plant macrofossils were found. As a result, reconstructed assemblages of ancient communities may not correspond to any modern assemblage (Garcés-Pastor et al., 2019). Furthermore, it is challenging to disentangle modern from ancient DNA. Sequence capture is a possible method for this, but it is more time-consuming and much more expensive than microscopic examination (Fracasso et al., 2024). The presence of Sphagnum leaves could be a straightforward indicator of peat. However, in older peat the leaves are fragmented and degraded, which may impede their optical recognition based on morphological features. In contrast, the shells of testate amoebae, which are preserved unfragmented for extended periods, offer a reliable target for optical recognition. Testate amoeba species that build proteinaceous tests, such as the genera Archerella and Hyalosphenia, are especially well preserved, as has also been observed in lake sediments (Ruzicka, 1982). Nonetheless, morphological recognition of Sphagnum leaves may be effective for fresher peat samples. Although these two alternative approaches fall outside the scope of our study, they could complement our method. In the future, integrating the outputs of various approaches could lead to more robust predictions, and a comparative study of such methods would be useful.

1.2. Biological background

Testate amoebae are a common and diverse group of free-living amoeboid protists. The shell (called test) is either secreted (SiO₂, calcite, or protein) or built from recycled organic or mineral particles glued together with an organic cement, and it allows identification to species level (Meisterfeld, 2002). The shells remain after the death of the amoeba and, under some conditions (anoxia or volcanic deposition), may be preserved for millennia (Harnisch, 1927), millions of years (Boeuf et al., 1997; Barber et al., 2013), and even hundreds of millions of years (Porter et al., 2000; Schmidt et al., 2006; Morais et al., 2017). Testate amoebae are commonly used as bioindicators of present and past environmental conditions, especially in peatlands, where they serve mostly as hydrological indicators (water table depth) but also as indicators of pH and nutrient status (Mitchell et al., 2008; Swindles et al., 2019; Qin et al., 2021), as well as in freshwater habitats (Patterson et al., 2002; Velho et al., 2003; Yang et al., 2011; Nasser et al., 2020) and estuaries (Gehrels et al., 2006). Testate amoebae are also used as bioindicators in lakes, where they respond to nutrients and heavy metal pollution (Mitchell et al., 2008; Nasser et al., 2020).
Testate amoebae are also increasingly used to monitor peatland functioning (Frésard et al., 2023; Jassey et al., 2015) and restoration success (Creevy et al., 2023; Jauhiainen, 2002; Koenig et al., 2017; Laggoun-Défarge et al., 2008; Swindles et al., 2016; Valentine et al., 2013; Vickery et al., 2004), as well as to assess the impact of forest management (Krashevska et al., 2018). Peatlands are home to a high diversity of testate amoebae (Gilbert et al., 2006). In a recent monograph, Bankov and Todorov (Todorov et al., 2019) listed 175 testate amoeba species living in Sphagnum in Bulgaria. There are no compilations of testate amoeba diversity across broader regions or globally at high taxonomic resolution, but it is very likely that the total diversity of testate amoebae in peatlands worldwide is well over 200 species. However, not all species listed as occurring in Sphagnum or in peatlands are restricted to these habitats. Many species found in peatlands may also be found in acidic forest litter or freshwater habitats. This may in part be due to the existence of several morphologically similar species within a given morphotype. A detailed analysis of such a species complex (Nebela tincta group) in the Jura Mountains revealed that closely related species differed in their ecology, some being specific to forested peatlands while others occurred preferentially in wetter and more nutrient-rich habitats (Singer et al., 2018). Still, several taxa are clearly specific to Sphagnum-dominated peatlands, being frequent in Sphagnum and rare or absent from other habitats. These include several mixotrophic taxa (i.e., the genera Archerella and Amphitrema, and the species Heleopera sphagni and Placocista spinosa) that harbour endosymbiotic green algae (Chlorella) (Gomaa et al., 2014). This metabolism allows them to thrive in the nutrient-depleted habitats of peatlands.

1.3. Related deep learning work

Deep learning algorithms for image processing have developed rapidly in the past few years. Many mature algorithms are now available and have proven to achieve low error rates on difficult tasks. Deep learning algorithms have been successfully applied to microscopy images in a variety of health-related domains such as histopathology (Rączkowska et al., 2019; Senousy et al., 2021b, 2021a; Syrykh et al., 2020), bacterial cultures (Ferrari et al., 2017), and blood parasites (Paul et al., 2022; Abdurahman et al., 2021; Maturana et al., 2023; Krishnadas et al., 2022). Many deep learning methods have been developed for processing microscopy images in general (see Ma et al., 2023; Rani et al., 2022; Zhang et al., 2022 for reviews). A few studies have applied deep learning specifically to detect environmental microorganisms in microscopy images (Shao et al., 2022; Kosov et al., 2018; Zhang et al., 2021; Liang et al., 2021). We found only one study that focused on testate amoebae, but it concerned activated sludge rather than peatlands (Dziadosz et al., 2024).

In many applied scenarios, object detection (OD) models have proven their usefulness. A trained OD model automatically predicts rectangular regions (bounding boxes) of an image that contain the target, each together with a confidence value. The YOLO family of OD models was introduced in 2016 (Redmon et al., 2016) and has evolved rapidly since then (see Jiang et al., 2022, for a review). YOLO models were originally developed to detect everyday objects in photographs (humans, dogs, cars, apples). They have proven to be very general and have been used in diverse scenarios, including microscopy. A detailed description of the model architecture can be found in Bochkovskiy et al. (2020). For this study, we used YOLOv8 (https://pytorch.org/hub/ultralytics), which is well integrated into the Python ecosystem. Models from the YOLO family have been applied to microscopy images in various domains such as microbes in industrial sludge (Dziadosz et al., 2024), bacterial solutions in microfluidic chips (Sun et al., 2022), malaria parasites in blood (Paul et al., 2022; Abdurahman et al., 2021; Maturana et al., 2023; Krishnadas et al., 2022), and small algae and diatoms (Abdullah et al., 2022; Salido et al., 2020).

1.4. Objective

The objective of this study was to develop a method that detects peat-specific testate amoebae in microscopy images of horticultural substrate samples. The primary focus was on commercial substrates containing peat. The automated pipeline should batch-process images from multiple samples and extract small images (crops) of candidate testate amoeba morpho-taxa (that is, species or groups of species that share a very similar morphology and are thus difficult to tell apart). This will be part of a decision support system in which concise summaries of the crops are provided to human experts, who make the final decision on peat presence with minimal time and effort. The method is expected to enable large-scale monitoring of peat presence in commercial substrates and thus to certify the absence of peat below a given threshold.

1.5. Challenges

1.5.1. Data scarcity

Deep learning algorithms need representative data to learn from. In addition, the data must be carefully annotated to provide the ground truth needed for training and testing the algorithms. At the start of the project, image data of surface peat samples were available. However, no images were initially available from commercial growing substrates containing 100% peat (here called commercial peat). Thus, we had to acquire images and annotate them manually. This process was tedious and time-consuming, and only a modest dataset of over 7000 images (Table 2) could be acquired for this pilot study. Considering that the data must be split into training and test sets, this is a small dataset for deep learning. We argue that this problem is widespread in very specialized applications like ours, where application-specific annotated data are generally scarce.

1.5.2. Rarity of target

Commercial substrates are processed, and our images showed that the shells of testate amoebae in them were often degraded compared with those in surface samples. Optical features such as shape and texture are used by humans and deep learning algorithms alike to recognize objects. Degraded shells lose these features and tend to look more like other organic residues, which makes automated recognition by deep learning algorithms more challenging. Moreover, in automatically acquired images, which are by definition unselected, testate amoebae are rare (i.e., they cover only a very small proportion of the slide). Despite careful preparation, they are typically embedded in other soil particles or covered by other objects (e.g., mineral particles, organic residues). In peat substrates, plant residues of diverse shapes and textures were predominant. The task at hand was thus to detect rare objects hidden in a matrix of diverse other items of similar shape and color. This is challenging because the predominance and diversity of other items increase the risk of false positives. Given these challenges, this study provides a very conservative estimate of the potential for automatic identification of testate amoebae, as results will likely be better with less degraded material or with more images available to train the model.

2. Methods

2.1. Selection of species

For the detection of peat, we are interested in species that occur exclusively in peatlands. We selected species characteristic of peat based on extensive testate amoeba community data sets from Holarctic peatlands (Amesbury et al., 2016; Amesbury et al., 2018). We identified a set of species that together are found in most samples. Species were combined into classes (species groups) that could be unambiguously identified by several experts. These classes were then used to annotate the data and to train the deep learning models. Taking into consideration ease of identification, frequency of occurrence, and habitat specificity, we selected 10 classes (Table 1). For a more general assessment of the method, we also included species that are commonly found in peatlands but are not peat indicators.

Table 1. Taxonomic definition of classes and their status as peat indicators. *1 Nebela tincta, N. pechorensis, N. guttata, N. gimlii, N. rotunda, N. bohemica, N. collaris, N. minor.

Archerella flavum was by far the most frequent taxon found in commercial peat samples. This genus also has a highly distinctive shape, color, and appearance (Figures 1 and 2). Because degraded, folded, or heavily masked Archerella shells were also frequently observed, we defined a dedicated class for these (class Archerella degraded). This allowed us to train and validate models with “clear” examples (class Archerella sp), which makes the results easier to interpret, especially for non-experts: the trained model predicts clear Archerella shells, which end users can readily assess in the proposed decision support process. For the other eight classes, this distinction was not made because of their lower frequency in commercial samples.

Figure 1. Image obtained at 20-fold magnification from a commercial peat sample. A shell of Archerella sp. is indicated by a red arrow. Numerous unidentified plant residues are scattered across the image.

Figure 2. Individuals from the 10 morpho-species groups used in this study. (A) Archerella sp (B) Archerella degraded (C) Assulina sp (D) Amphitrema sp (E) Hyalosphenia elegans aggr (F) Hyalosphenia papilio (G) Heleopera sphagni (H) Planocarina carinata (I) Euglypha sp (J) Nebela combined.

2.2. Image acquisition

For sample preparation, a small volume (ca. 5–10 cm³) of substrate was mixed with water, shaken for 1 minute in a wide screw-capped jar, and filtered through a tea strainer. The material was then passed through an 80 μm mesh, which removes coarse particles with only marginal loss of testate amoebae. The filtrate was left to settle overnight, after which the clear supernatant was carefully poured off. The concentrate was then transferred to a tube. One drop of this concentrate was placed on a slide with a pipette and mixed with one drop of glycerol. Images were acquired under bright-field microscopy at 20-fold magnification with a camera mounted on the microscope and stored in TIFF format. As no automated microscope was available for this project, images were acquired manually. An early exploration showed that commercial peat samples contained a low density of testate amoeba shells. We therefore defined two complementary image acquisition procedures: grid search and active search.

In the grid search, a 5 × 10 grid of adjacent images was taken manually (Figure 3A). For each sample, 2 or 3 slides were imaged, resulting in 100 or 150 images per sample. Most images contain no testate amoeba (only plant remains), and when testate amoebae are present, they are generally not centered. The grid search mimics a realistic application scenario in which data are acquired by an automatic scanning microscope.
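The grid search lends itself to automation (Figure 3C). The following Python sketch generates stage offsets for one 5 × 10 grid of adjacent, non-overlapping fields of view and the resulting number of images per sample. The serpentine scan order, the micrometre coordinates, and the field height are illustrative assumptions rather than part of the protocol above; the field width of about 553 μm follows from the 1728-pixel image width and ~0.32 μm/pixel given in Section 2.3.

```python
def grid_positions(fov_w_um, fov_h_um, n_rows=5, n_cols=10):
    """Return (x, y) stage offsets in micrometres for adjacent fields of view.

    Serpentine order (an assumption): even rows run left to right, odd rows
    right to left, so the stage never makes a long return move between rows.
    """
    positions = []
    for r in range(n_rows):
        cols = range(n_cols) if r % 2 == 0 else reversed(range(n_cols))
        for c in cols:
            # Step by exactly one field of view, so neighbouring images abut.
            positions.append((c * fov_w_um, r * fov_h_um))
    return positions


def images_per_sample(n_slides, n_rows=5, n_cols=10):
    """Grid-search images per sample: one full grid per slide."""
    return n_slides * n_rows * n_cols


# Field width from Section 2.3; the height (1296 px, i.e. a 4:3 sensor) is hypothetical.
FOV_W_UM = 1728 * 0.32
FOV_H_UM = 1296 * 0.32

positions = grid_positions(FOV_W_UM, FOV_H_UM)
print(len(positions), images_per_sample(2), images_per_sample(3))  # 50 100 150
```

On an automated stage, each offset would be visited in turn and one image captured per position.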

Figure 3. Schematics of how images were obtained via manual grid and active search (A, B) and how they could be acquired via automated scanning in the future (C). Each red box schematically represents a single image.

In the active search, the whole slide was visually explored, but pictures were only taken when target amoeba species were found, with the amoebae centered in the picture (Figure 3B). Therefore, all images contain at least one individual. This procedure was designed to capture all target amoebae present on a slide. In addition, further images were taken with each observed amoeba placed close to either the bottom-left or the top-right corner of the image.

2.3. Image preparations

A standard image width of 1728 pixels was defined. The target magnification was 20-fold, which corresponds to approximately 0.32 μm/pixel. All images from Datasets 2 and 3 were acquired at 20-fold magnification by design, but some training images that had been acquired before the project started had 40-fold magnification. These images were downsized to a resolution of 0.32 μm/pixel and then padded to the standard width of 1728 pixels. For padding, we used images from the EMDS7 dataset (Yang et al., 2023). The original EMDS7 images were greenish, so we converted them to grayscale and then tinted them in a variety of soft colors to make them more diverse. All images were scrutinized by microbiology experts with a dedicated annotation tool (Roboflow), and the positions of peat-specific testate amoebae in the images were annotated as bounding boxes. No distinction was made between dead and living individuals because our focus was on detecting the shells. All amoebae that could be recognized were annotated, even if they were degraded or masked. The annotated images were exported as JPG files in the YOLO format. A summary of the classes is shown in Table 1.
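The rescaling step can be made concrete with a small calculation. The sketch below computes the size of a 40-fold image after downsizing to 0.32 μm/pixel and the padding needed to reach the 1728-pixel standard width. It assumes that the 40-fold images have half the pixel size of the 20-fold ones (about 0.16 μm/pixel) and that padding is split evenly between left and right; the text specifies neither. Only the geometry is computed here; the resampling itself and filling the padding with EMDS7 background would be done with an imaging library.

```python
def rescale_and_pad(width_px, height_px, src_um_per_px,
                    target_um_per_px=0.32, std_width_px=1728):
    """Return (new_w, new_h, pad_left, pad_right) for one image.

    Downsizing by src/target keeps the physical field of view constant while
    bringing the pixel size to target_um_per_px.
    """
    scale = src_um_per_px / target_um_per_px
    new_w = round(width_px * scale)
    new_h = round(height_px * scale)
    if new_w > std_width_px:
        raise ValueError(f"rescaled width {new_w} px exceeds standard width {std_width_px} px")
    pad_total = std_width_px - new_w
    pad_left = pad_total // 2          # even split is an assumption
    pad_right = pad_total - pad_left   # absorbs the odd pixel, if any
    return new_w, new_h, pad_left, pad_right


# Hypothetical 40-fold image of 2592 x 1944 px at ~0.16 um/pixel.
print(rescale_and_pad(2592, 1944, src_um_per_px=0.16))  # (1296, 972, 216, 216)
```

A 20-fold image that already has the standard geometry passes through unchanged, with zero padding on both sides.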

2.4. Data sets

Dataset 1 was derived from 16 Sphagnum moss samples collected in peatlands (i.e., not commercially processed). These data were available before the study as part of the image repository of the Laboratory of Soil Biodiversity. These images were rich: they contained testate amoebae of many species (Table 3), often alive and displaying diverse natural colors. Note that these images are not representative of commercially processed substrate and were therefore used only for training. Datasets 2 and 3 were taken exclusively from commercial substrates. In these images, the shells were typically empty, often more or less degraded, and less vividly colored than living specimens. Dataset 2 was derived from 17 commercial samples (11 peat, 6 non-peat; Table 2). From each sample, we prepared one suspension that was used to make 5 slides (2 for active search and 3 for grid search). Dataset 3 was derived from 16 commercial samples (12 peat, 4 non-peat; Table 2). From each sample, we prepared one suspension that was used to make 2 slides, each imaged with both active and grid search. Datasets 1 and 2 were combined to create the training set. In this way, the neural networks learn from both the rich data of Dataset 1 and the more representative data of Dataset 2. The training set consists of 5121 images from 33 independent samples. Dataset 3 was held out as a test set and was used to compare models and assess final performance. The test set consists of 2415 images from 16 independent samples.

Table 2. Count overview of all samples and images. Comm: Commercial; The letters L, C, and R (Left, Center, Right) refer to the actively chosen position of the shell in the images obtained with active search

2.5. Preparation for training sessions

In the training set (Datasets 1 and 2), the classes were imbalanced (Tables 3, 4, and 5). Class balance was improved by making multiple copies of images from rare classes. This static transform was performed prior to training and applied to the training set only. It improves learning because all classes are seen approximately the same number of times during training, which reduces the risk that the network focuses on a single class. Additionally, static data augmentation was applied to the training images, namely: random crop of 0 to 20 pixels, random rotation of −3 to 3 degrees, random mixing with EMDS7 images with weights from 0.0 to 0.5, and a small amount of elastic transform (local distortion). This adds diversity to the multiple copies of the same image. Note that training-time data augmentation was also used (see Section 2.6). The test set (Dataset 3) was not modified.
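A minimal sketch of the static balancing and augmentation plan is shown below. The copy rule (each image is copied according to the rarest class it contains, up to a cap), the cap of 20 copies, and the elastic-distortion magnitude are hypothetical choices, since the text does not state them; the ranges for crop, rotation, and EMDS7 mixing weight are those given above. The functions only produce a reproducible plan of (image, copy index, parameters); applying the transforms to pixels is left to an imaging library.

```python
import math
import random
from collections import Counter


def copies_per_image(image_labels, max_copies=20):
    """image_labels maps image id -> list of annotated class names.

    Heuristic: an image is copied ceil(n_most_frequent / n_rarest_in_image)
    times, so images holding rare classes are seen more often in training.
    """
    counts = Counter(c for labels in image_labels.values() for c in labels)
    target = max(counts.values())
    copies = {}
    for image_id, labels in image_labels.items():
        if not labels:
            copies[image_id] = 1  # background-only images are kept once
            continue
        rarest = min(counts[c] for c in labels)
        copies[image_id] = min(max_copies, math.ceil(target / rarest))
    return copies


def sample_static_augmentation(rng):
    """Draw one set of static augmentation parameters."""
    return {
        "crop_px": rng.randint(0, 20),           # random crop, 0-20 pixels
        "rotation_deg": rng.uniform(-3.0, 3.0),  # random rotation, -3 to 3 degrees
        # Intended blend: (1 - w) * image + w * EMDS7 image.
        "emds7_weight": rng.uniform(0.0, 0.5),
        "elastic_alpha": rng.uniform(0.0, 1.0),  # hypothetical "small" distortion
    }


def build_static_plan(image_labels, seed=0):
    """Return a reproducible list of (image_id, copy_index, params)."""
    rng = random.Random(seed)
    plan = []
    for image_id, n_copies in copies_per_image(image_labels).items():
        for k in range(n_copies):
            plan.append((image_id, k, sample_static_augmentation(rng)))
    return plan


labels = {
    "img1.jpg": ["Archerella sp"],
    "img2.jpg": ["Archerella sp"],
    "img3.jpg": ["Archerella sp"],
    "img4.jpg": ["Archerella sp", "Amphitrema sp"],
    "img5.jpg": [],
}
print(copies_per_image(labels))
# {'img1.jpg': 1, 'img2.jpg': 1, 'img3.jpg': 1, 'img4.jpg': 4, 'img5.jpg': 1}
```

Because each copy draws its own parameters, the duplicated images differ slightly, which is the purpose of combining oversampling with static augmentation.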

Table 3. Overview of number of manually annotated individuals per class in Dataset 1

Table 4. Active search shell counts from all commercial peat samples were used to estimate the number of shells per slide for each class. Left and right images were discarded, and only the center image was used, to avoid counting the same specimen three times.

Table 5. Grid search shell counts from all commercial peat samples were used to estimate the number of shells per image for each class

2.6. Training

We chose YOLOv8 as the object detection algorithm. In each training session, the training set was used to train one model. We performed several sessions with different random initializations. This allowed us to assess between-session variability and provided a basis for ensemble prediction. During training, images were resized to a width of 512 pixels, and training-time data augmentation was applied. The YOLOv8 defaults are designed for photographs and performed poorly on our images (see comparison in the Supplement). We modified the defaults as follows: (a) increased random rotation to a range between −180 and +180 degrees, (b) reduced random re-scaling to 0.2, (c) added a small amount of shear of up to 10 degrees, (d) activated up-down flip in addition to the left–right flip that is active by default, (e) reduced the mosaic transform probability to 0.2, and (f) increased the mix-up probability to 0.5 (full details in the Supplement). We trained for 500 epochs and used the weights of the last epoch for prediction (no early stopping). Label smoothing was applied. All 10 classes (Table 1) were used for training. A well-established principle of ensembles is that the base predictors should be as diverse as possible (for a recent review, see Ganaie et al., 2022). Therefore, the random seed was different in each session: this leads to different initial weights (except for pretrained models) and different random realizations of the training-time data augmentation. This is expected to yield trained models that behave differently on individual items but similarly on average. We assessed four YOLOv8 model sizes (NANO, SMALL, MEDIUM, LARGE) crossed with two weight initialization procedures: random initialization (RANDINIT) versus weights pretrained on the COCO dataset (COCOINIT).
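The augmentation changes listed above map onto training arguments of the ultralytics YOLOv8 Python package. The sketch below collects them in one dictionary and shows how one training session per random seed could be launched; it is an illustration, not the authors' configuration. The up-down flip probability of 0.5 and the dataset YAML path are assumptions, `patience` is set to the number of epochs so that early stopping cannot trigger, and label smoothing is left out because its value is not stated and its support differs between ultralytics releases.

```python
def yolo_train_overrides(seed, epochs=500, imgsz=512):
    """Non-default YOLOv8 training arguments described in Section 2.6."""
    return {
        "epochs": epochs,
        "imgsz": imgsz,       # images resized to 512 pixels
        "seed": seed,         # differs per session -> diverse ensemble members
        "patience": epochs,   # no early stopping: last-epoch weights are used
        "degrees": 180.0,     # (a) rotation in [-180, +180] degrees
        "scale": 0.2,         # (b) reduced random re-scaling gain
        "shear": 10.0,        # (c) shear up to 10 degrees
        "flipud": 0.5,        # (d) up-down flip (probability assumed)
        "fliplr": 0.5,        #     left-right flip, YOLOv8 default
        "mosaic": 0.2,        # (e) mosaic probability
        "mixup": 0.5,         # (f) mix-up probability
    }


def model_source(size="m", coco_init=False):
    """RANDINIT builds the architecture from YAML; COCOINIT loads COCO weights."""
    return f"yolov8{size}.pt" if coco_init else f"yolov8{size}.yaml"


def train_one_session(seed, data_yaml="peat_amoebae.yaml", size="m", coco_init=False):
    """Train one ensemble member; data_yaml is a hypothetical dataset config path.

    Requires `pip install ultralytics`.
    """
    from ultralytics import YOLO  # imported lazily so the helpers above run without it
    model = YOLO(model_source(size, coco_init))
    return model.train(data=data_yaml, pretrained=coco_init, **yolo_train_overrides(seed))


# One session per seed, e.g. for a MEDIUM/RANDINIT ensemble:
# for seed in range(10):
#     train_one_session(seed)
```

Keeping the overrides in a plain dictionary makes it easy to record exactly which settings each ensemble member was trained with.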

2.7. Performance evaluation

Performance was assessed by comparing the true bounding boxes with the predicted boxes. An Intersection over Union (IoU) value >0.5 between a predicted and a true box was required to count a prediction as a true positive. Precision-recall curves (PR-curves) were constructed by varying the threshold on the predicted confidence value. The Average Precision (AP) was then obtained from these curves with the trapezoidal rule. PR-curves and AP were computed for each class separately, but only for the 3 classes with sufficient instances in the test data. AP integrates model performance across all possible levels of the predicted confidence value and is useful for comparing models or training strategies. To assess real-world usefulness, we computed precision-confidence and recall-confidence curves separately. These two curves allow the confidence value to be interpreted in terms of false positives (precision) and missed targets (recall).
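To make the evaluation concrete, the sketch below matches predictions to the true boxes of one class with the IoU > 0.5 criterion, builds the PR-curve by sweeping the confidence threshold, integrates it with the trapezoidal rule, and reads off precision and recall at a single confidence threshold. Greedy matching in order of decreasing confidence, anchoring the curve at recall 0, and reporting precision 1.0 when no prediction passes the threshold are common conventions assumed here; the authors' implementation may differ in these details.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(preds, gts, iou_thr=0.5):
    """preds: list of (image_id, box, conf); gts: dict image_id -> list of true boxes.

    Returns (conf, is_true_positive) pairs sorted by decreasing confidence.
    Each true box can be matched at most once; unmatched predictions are false positives.
    """
    used = {img: [False] * len(boxes) for img, boxes in gts.items()}
    matched = []
    for img, box, conf in sorted(preds, key=lambda p: p[2], reverse=True):
        best_j, best_iou = -1, iou_thr
        for j, true_box in enumerate(gts.get(img, [])):
            overlap = iou(box, true_box)
            if not used[img][j] and overlap > best_iou:  # strictly > 0.5
                best_j, best_iou = j, overlap
        if best_j >= 0:
            used[img][best_j] = True
        matched.append((conf, best_j >= 0))
    return matched


def pr_curve(matched, n_true):
    """Precision and recall after each prediction, i.e. at every confidence threshold."""
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in matched:
        tp += is_tp
        fp += not is_tp
        precisions.append(tp / (tp + fp))
        recalls.append(tp / n_true)
    return recalls, precisions


def average_precision(recalls, precisions):
    """Trapezoidal area under the PR-curve, anchored at recall 0."""
    if not recalls:
        return 0.0
    r = [0.0] + recalls
    p = [precisions[0]] + precisions
    return sum((r[i] - r[i - 1]) * (p[i] + p[i - 1]) / 2 for i in range(1, len(r)))


def precision_recall_at(matched, n_true, threshold):
    """Precision and recall when only predictions with conf >= threshold are kept."""
    kept = [is_tp for conf, is_tp in matched if conf >= threshold]
    precision = sum(kept) / len(kept) if kept else 1.0
    return precision, sum(kept) / n_true


gts = {"img1": [(0, 0, 10, 10)], "img2": [(0, 0, 10, 10)]}
preds = [("img1", (0, 0, 10, 10), 0.9),
         ("img1", (50, 50, 60, 60), 0.8),   # no overlap: false positive
         ("img2", (1, 1, 10, 10), 0.6)]     # IoU = 0.81: true positive
matched = match_detections(preds, gts)
print(round(average_precision(*pr_curve(matched, n_true=2)), 4))  # 0.7917
print(precision_recall_at(matched, n_true=2, threshold=0.7))      # (0.5, 0.5)
```

Evaluating `precision_recall_at` over a range of thresholds yields the precision-confidence and recall-confidence curves used to interpret the confidence value.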

2.8. Final prediction module via ensemble

Based on the model comparison (details in Section 3.3), the final models were YOLOv8 MEDIUM models trained with random initialization of the weights (RANDINIT). In the final prediction module, we used 10 individual models trained in independent sessions. Each model predicts boxes and confidence values for each class. Predictions with a confidence <0.01 from an individual model were suppressed to avoid cluttering the process with many irrelevant boxes. Note that predicted boxes from different models that detect the same object are never perfectly aligned, and there is no built-in way to identify them as belonging to the same object. Therefore, we developed a post-processing method to associate the predictions that detected the same object. First, the IoU was computed for all pairs of predicted boxes of the same class within an image. A between-box distance was defined as D = 1 − IoU, such that perfectly overlapping boxes have D = 0 while non-overlapping boxes have D = 1. Second, predicted boxes that were sufficiently close to each other were aggregated via agglomerative hierarchical clustering (AHC) using D as the distance metric. This was done separately for each class and image, so that AHC only had to process a small number of boxes at a time. Confidence values of 0.0 were imputed for missing predictions (i.e., when fewer than 10 predictions were associated with an object). The value of 0.0 is the lowest possible and a natural choice when an object was not detected at all by an individual model. Finally, the mean of the 10 confidence values (including imputed zeros, if any) was used as the ensemble score, and the corresponding crop was extracted from the prediction with the maximum confidence value. To allow efficient decisions by a human expert, the crops were plotted on a single summary image ranked by their ensemble score (highest value first, left to right, then top to bottom).
Additionally, the ensemble score was color-coded from blue (high score) to red (low score). In this way, samples that produced only low-score crops are easily spotted. Examples of summary images are shown in Section 3.4.
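The association and scoring steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the linkage rule (average linkage), the cut height of 0.5 on D, and keeping only the highest confidence when one model contributes several boxes to the same cluster are hypothetical choices, because the text does not specify them. Boxes are assumed to be (x1, y1, x2, y2) pixel coordinates for a single class in a single image, and `cluster_boxes` and `ensemble_scores` are illustrative names.

```python
from itertools import combinations

MIN_CONF = 0.01  # individual predictions below this confidence are suppressed


def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cluster_boxes(boxes, cut=0.5):
    """Average-linkage AHC on D = 1 - IoU (assumed linkage and cut height).

    Merging stops when the two closest clusters are farther apart than `cut`.
    """
    clusters = [[i] for i in range(len(boxes))]

    def linkage(c1, c2):
        return sum(1.0 - iou(boxes[i], boxes[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        (p, q), d = min(
            (((a, b), linkage(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > cut:
            break
        clusters[p].extend(clusters[q])  # merge q into p (p < q, so index p stays valid)
        del clusters[q]
    return clusters


def ensemble_scores(preds, n_models=8, cut=0.5):
    """preds: list of (model_id, box, confidence) for one class in one image.

    Returns (ensemble_score, crop_box) pairs sorted by decreasing score.
    """
    preds = [p for p in preds if p[2] >= MIN_CONF]
    results = []
    for members in cluster_boxes([p[1] for p in preds], cut):
        per_model = {}
        for i in members:
            model_id, _, conf = preds[i]
            # assumption: if a model contributes several boxes, keep its highest confidence
            per_model[model_id] = max(per_model.get(model_id, 0.0), conf)
        # models without a prediction in this cluster contribute an imputed 0.0
        score = sum(per_model.get(m, 0.0) for m in range(n_models)) / n_models
        # the crop is taken from the single prediction with the highest confidence
        crop_box = max((preds[i] for i in members), key=lambda p: p[2])[1]
        results.append((score, crop_box))
    return sorted(results, key=lambda r: r[0], reverse=True)


# Toy example: six of eight models see the same shell with slightly shifted boxes
preds = [(m, (10 + 0.5 * m, 10, 50 + 0.5 * m, 50), 0.9) for m in range(6)]
preds += [(7, (12, 10, 52, 50), 0.005),    # suppressed because confidence < 0.01
          (3, (200, 200, 240, 240), 0.2)]  # an isolated, weak detection
for score, box in ensemble_scores(preds):
    print(round(score, 3), box)
```

In this toy case the first object receives an ensemble score of 6 × 0.9 / 8 = 0.675, because the two models that missed it contribute imputed zeros, while the isolated box scores 0.2 / 8 = 0.025 and would be ranked far down the summary image.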

3. Results

3.1. Testate amoeba in commercial non-peat samples

All 10 samples (1301 images) of peat-free substrate were searched for testate amoebae. Only a small number of testate amoeba shells were found; they were generally degraded or masked and thus more difficult to identify than most shells found in surface mosses. The following potential testate amoeba morpho-taxa were found: Centropyxis aerophila-type (N = 10), Euglypha sp. (N = 1), Assulina sp. (N = 1), and Difflugia lucida-type (N = 1). However, these identifications are partly tentative, as the shells were either partly degraded or not clearly visible in the images (e.g. due to their position or to masking by other particles). This result suggests that using an automated microscope to increase the number of images would improve the assessment of testate amoeba diversity and abundance in such samples.

3.2. Testate amoeba in commercial peat samples

For the descriptive analysis of commercial peat samples (this subsection), Datasets 2 and 3 were pooled. Dataset 1 was not included because it was obtained from unprocessed peat samples.

3.2.1. Archerellae found with active search

With active search, where the full slides were scrutinized, well-preserved Archerella sp. shells were found in 22 out of 23 commercial peat samples from Datasets 2 and 3 (Figure 4). The between-sample variability was considerable, with Archerella sp. counts ranging from 0 to 23 per sample. Overall, 258 well-preserved Archerella sp. shells (Dataset 2: 109, Dataset 3: 149) were found in 46 slides with active search (Dataset 2: 22, Dataset 3: 24). From this, we estimate a frequency of 5.61 shells per slide in commercial samples (Table 4).

Figure 4. Matrix view of testate amoeba distribution of all commercial peat samples (one sample per row, Dataset 2: 11 samples, Dataset 3: 12 samples). Values are the count of individual shells found with active and grid searches. Dataset 2: Grid and active search images from different slides. Dataset 3: Grid and active search images from the same slides. Count values are color-coded with shades of green for easier reading.

3.2.2. Archerellae found with grid search

With grid search, well-preserved Archerella sp. shells were found in 17 out of 23 samples from the pooled Datasets 2 and 3 (Figure 4). Note that with the manual grid search only a small part of each slide was scrutinized (Figure 3A), which explains the smaller counts. Overall, 62 well-preserved Archerella sp. shells (Dataset 2: 45, Dataset 3: 17) were found in 2902 grid search images (Dataset 2: 1705, Dataset 3: 1200 images). From this, we estimate a frequency of 0.021 shells per image (Table 5), which can be expected when automated scanning is applied to commercial samples. Archerella was by far the most frequent testate amoeba in commercial samples. Masked or degraded shells (Archerella degraded) were found less frequently than well-preserved Archerella sp. shells.
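The frequency estimates in Sections 3.2.1 and 3.2.2 are simple ratios of the reported counts. The short sketch below reproduces them from the totals given in the text; the variable names are illustrative only.

```python
# Active search: (well-preserved Archerella shells, slides) per dataset, as reported
active_search = {"Dataset 2": (109, 22), "Dataset 3": (149, 24)}

shells = sum(count for count, _ in active_search.values())  # pooled shell count
slides = sum(n for _, n in active_search.values())           # pooled slide count
shells_per_slide = shells / slides

# Grid search: 62 well-preserved shells in 2902 images (pooled totals as reported)
shells_per_image = 62 / 2902

print(f"{shells_per_slide:.2f} shells per slide (active search)")
print(f"{shells_per_image:.3f} shells per image (grid search)")
```

The per-image rate is the more useful quantity for planning automated acquisition, because the number of images per sample, not the number of slides, is what an automated scanner controls.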

3.2.3. Other taxa found with active search

Two other genera, Amphitrema sp. and Assulina sp., were consistently found in many commercial samples, but both were much less frequent than Archerella. With active search, Amphitrema shells were found in 13 out of 23 and Assulina shells in 19 out of 23 commercial samples (Figure 4). For both taxa, counts were low, with 0 to 5 individuals per sample. For Amphitrema and Assulina, we estimate frequencies in commercial samples of 0.74 and 0.89 shells per slide, respectively (Table 4).

3.2.4. Other taxa found with grid search

With grid search, shells of Amphitrema or Assulina were rarely found (Figure 4). The grid size of 100 or 150 images per sample was clearly insufficient to capture enough images of these two taxa. However, higher counts can be expected with the higher acquisition rate of automated microscopes.

3.2.5. Rarer taxa

A total of 13 shells of Hyalosphenia papilio, a strict peat indicator, were found with active search. Again, higher counts can be expected with the higher acquisition rate of automated microscopes. Finally, shells of the five other species were rarely found in commercial samples (Table 4, bottom rows).

3.3. Model performance

Models were trained with the training data (5121 images). Performance was then estimated by predicting the complete test data (2415 images: grid search and active search C, L, R; see Table 2 for details). Only three classes had counts in the test set large enough to estimate the average precision (AP), and only these are reported. Figure 5 gives a graphical overview of the results reported hereafter. The number of test instances was moderate for Archerella (N = 460) and small for Amphitrema (N = 62) and Assulina (N = 60). This is reflected in the variance between individual models (blue dots), which is smaller for Archerella than for the two other classes. First, for all model sizes and training procedures, the ensemble prediction (red dots) clearly and consistently outperformed the individual models. Weight initialization (RANDINIT vs COCOINIT) had no detectable impact on performance. The smallest model, NANO, performed worse than the three larger models. Among the SMALL, MEDIUM, and LARGE models, performance was comparable. The AP for Archerella was clearly above 0.8 for the ensemble models. For Amphitrema, AP was around 0.7 for the three larger ensemble models. For Assulina, AP was around 0.8 for the three larger ensemble models. The MEDIUM-RANDINIT configuration was chosen as the final model and used in the final prediction module to process the test data. The performance of individual models was stable across training sessions. Notably, the behavior of the score, represented by the shape of the PR curves, was also stable across individual models (Figure 6). The AP metric is reported by convention and is useful for comparing individual models with the ensemble, but it says little about practical usefulness. Therefore, separate precision and recall curves are provided in the next section.
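For readers less familiar with the metric, AP summarizes a precision-recall curve in a single number. The sketch below computes it from ranked detections using all-point interpolation, i.e. the area under the monotonically non-increasing precision envelope. It assumes that each detection has already been matched to the ground truth as a true or false positive (for example at an IoU threshold); the matching rule and interpolation used by the YOLOv8 tooling may differ, so this illustrates the metric rather than reproducing the reported values.

```python
def average_precision(scores, is_tp, n_gt):
    """All-point interpolated AP from detection scores and true/false-positive flags."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [0.0], [1.0]
    for i in order:  # walk down the ranked list, one detection at a time
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / n_gt)
        precisions.append(tp / (tp + fp))
    # precision envelope: make precision non-increasing as recall grows
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # sum of rectangle areas wherever recall increases
    return sum((recalls[k + 1] - recalls[k]) * precisions[k + 1]
               for k in range(len(recalls) - 1))


# Toy example: three detections, two ground-truth objects
ap = average_precision([0.9, 0.8, 0.7], [True, False, True], n_gt=2)
print(round(ap, 3))  # 0.833
```

Because AP integrates over all thresholds, it does not tell an operator what precision and recall to expect at the specific score threshold used to review crops.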

Figure 5. Average Precision (AP) of the models individually (blue dots) and the ensemble (red dots) for several model sizes and training procedures.

Figure 6. Precision-recall curves were obtained by predicting all test images with the 8 models individually (blue lines) and the ensemble (red line). Test data from grid and active search pooled (2415 images). Nbox (GT): number of annotated ground truth boxes.

3.4. Illustration of module with test data

This section shows the summaries automatically generated by the prediction module, together with estimates of the detection performance given as precision and recall (Figures 7 and 8). It is meant to illustrate a possible practical application of the module with data from the test set (Figures 9 to 15). More conventional performance metrics are reported in the previous section. The whole process, including prediction by an ensemble of eight models and post-processing, took 581 seconds for 1872 images on a GPU-accelerated desktop computer (HP Z4 workstation with NVIDIA A6000 GPU). This corresponds to 0.31 seconds per image; thus, 1000 images would take approximately 5 minutes.
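The throughput figures can be projected to larger workloads with simple arithmetic. The sketch below assumes, as a simplification, that runtime scales linearly with the number of images on the same hardware; batching, disk I/O, or a different GPU could change this.

```python
TOTAL_SECONDS = 581  # measured for prediction and post-processing (Section 3.4)
N_IMAGES = 1872

seconds_per_image = TOTAL_SECONDS / N_IMAGES


def runtime_hours(n_images):
    """Projected processing time, assuming a constant per-image cost."""
    return n_images * seconds_per_image / 3600


print(f"{seconds_per_image:.2f} s per image")
print(f"{runtime_hours(1000) * 60:.1f} min for 1000 images")
print(f"{runtime_hours(100_000):.1f} h for 100 samples x 1000 images")
```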

Figure 7. Precision and recall vs ensemble score obtained with the grid search test data (1600 images). Only available for Archerella because Amphitrema and Assulina had too few items in the test set. These curves allow us to estimate the errors if a particular threshold is applied to the ensemble score. Corresponding crops with ensemble scores can be seen in Figure 10.

Figure 8. Same as Figure 7 but obtained with 1872 images of test data from grid and active search. Actual crops with ensemble scores can be seen in Figures 11, 13, and 15.

Figure 9. Ground truth in the test set for N = 17 Archerella sp from grid search. Order is arbitrary. The 15 individuals that were detected are framed in green.

Archerella grid only: In the first evaluation, 1600 test images, only from grid search, were passed through the prediction module. In these images, the frequency of occurrence of Archerella shells was low, at about 1% (17 individuals in 1600 images). The 17 true Archerella shells are shown in Figure 9. We consider this a realistic scenario because the images result from a grid search. It also represents a difficult challenge, with only 17 targets hidden in 1600 images containing many items of different shapes and sizes (mostly more or less degraded plant remains). The prediction module returned a total of 208 crops (Archerella candidates). Using a detection threshold of 0.5 on the ensemble score, we obtained a recall of 0.71 and a precision of 1.00 (Figure 7). In other words, 71% of all true Archerella specimens were in the detected set and 29% were missed, while the detected set contained only true Archerella and no false positives. Using the lowest detection threshold of 0.0, 88% of the individuals were detected (recall: 15/17 = 0.88) at the cost of a lower precision of 0.07. The first row of the summary (12 crops) contains exclusively true Archerella (Figure 10). In the second to fourth rows, three more crops were true Archerella. All other crops, from the fifth row to the bottom, were not Archerella, and most had an ensemble score close to 0 (see supplement).
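Precision and recall at a given threshold follow directly from the ranked list of crops and the number of ground-truth shells. The sketch below shows the computation on invented toy data (the scores and labels are hypothetical, not taken from the test set); crops with a score at or above the threshold are accepted, so a threshold of 0.0 accepts every crop.

```python
def precision_recall_at(crops, n_ground_truth, threshold):
    """crops: list of (ensemble_score, is_true_target) for every candidate crop."""
    accepted = [is_true for score, is_true in crops if score >= threshold]
    tp = sum(accepted)
    # precision is undefined when nothing is accepted; return None in that case
    precision = tp / len(accepted) if accepted else None
    recall = tp / n_ground_truth
    return precision, recall


# Hypothetical toy data: 5 crops, 4 ground-truth shells (one was never detected)
crops = [(0.9, True), (0.6, True), (0.4, False), (0.1, True), (0.0, False)]
print(precision_recall_at(crops, 4, 0.5))
print(precision_recall_at(crops, 4, 0.0))
```

Applied to the reported counts at a threshold of 0.0, 15 true Archerella among 208 crops gives a precision of 15/208 ≈ 0.07 and a recall of 15/17 ≈ 0.88.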

Figure 10. Automatically detected candidates of Archerella sp. ranked by ensemble score (only top rows shown). Obtained from 1600 grid search images of the test set. The correctly detected shells are framed in green. Uncertain but likely Archerella are framed in yellow.

Archerella grid and active: In the second evaluation, the grid and active search (center only) test data were passed through the prediction module. The frequency of occurrence of Archerella per image was higher, at 9% (166 individuals in 1872 images). The prediction module returned a total of 450 crops (candidates), each with an ensemble score to which a decision threshold can be applied. Using a detection threshold of 0.5 on the ensemble score, we obtained a recall of 0.76 and a precision of 0.91 (Figure 8). Using the lowest detection threshold of 0.0, we obtained a recall of 0.96 at the cost of a lower precision of 0.35. The first 6 rows of the summary (Figure 11, full image in supplement) show that all top-ranked crops clearly belong to Archerella. Note that two items marked with a yellow frame were labeled as Archerella degraded because they were partially masked. For all practical purposes, these are not false positives; they also cause the bump in the precision curve in Figure 8. This illustrates the capacity of the system to extract and rank the most relevant items.

Figure 11. Automatically detected candidates of Archerella sp. ranked by ensemble score (only top rows shown). Obtained from the 1600 grid search and 272 active search images of the test set. Two items that were labeled as Archerella degraded due to masking are framed in yellow; the remaining 73 crops are correctly detected shells.

Amphitrema grid and active: The frequency of occurrence of Amphitrema shells per image was low, with only 1.3% of images containing an Amphitrema (24 individuals in 1872 images), which represents a difficult detection challenge. The 24 true Amphitrema shells are shown in Figure 12. The prediction module returned a total of 135 crops (candidates). Using a detection threshold of 0.5, we obtained a recall of 0.33 and a precision of 0.88 (Figure 8). Using the lowest detection threshold of 0.0, we obtained a recall of 0.79 at the cost of a lower precision of 0.14. In the first row of the summary, 8 out of 9 crops were true Amphitrema, and in the first three rows, 16 out of 28 crops were true Amphitrema (Figure 13).

Figure 12. Ground truth in the test set for N = 24 Amphitrema sp from grid search (top) and active search (bottom). Order is arbitrary. The 19 individuals that were detected are framed in green.

Figure 13. Automatically detected candidates of Amphitrema sp. ranked by ensemble score (only top rows shown). Obtained from 1600 grid search and 272 active search images of the test set. The correctly detected crops are framed in green.

Assulina grid and active: The frequency of occurrence of Assulina per image was low, with only 1.0% of images containing a shell (19 individuals in 1872 images), which represents a difficult detection challenge. The 19 true Assulina shells are shown in Figure 14. The prediction module returned a total of 92 crops (Assulina candidates). Using a detection threshold of 0.5, we obtained a recall of 0.79 and a precision of 0.94 (Figure 8). Using the lowest detection threshold of 0.0, we obtained a recall of 1.00 at the cost of a lower precision of 0.21. In the first row of the summary, 8 out of 11 crops were true Assulina, and in the first three rows, 17 out of 34 crops were true Assulina (Figure 15).

Figure 14. Ground truth in the test set for N = 19 Assulina sp. from active search (no individuals were found with grid search). Order is arbitrary. All 19 individuals were detected and are framed in green.

Figure 15. Automatically detected candidates of Assulina sp. ranked by ensemble score (only top 5 rows shown). Obtained from 1600 grid search and 272 active search images of the test set. The correct detections are framed in green.

4. Discussion

4.1. Shells preserved in commercial peat products

The shells of several peat-specific testate amoeba taxa were present in commercially processed substrates labeled as peat. For two of the smaller peat-specific taxa (Archerella and Amphitrema), well-preserved specimens were found in many samples. These two taxa can therefore be used as indicators of peat presence in commercially processed products. Another taxon, Assulina, which is common in peatlands but also frequent in acidic soils outside of peatlands (e.g. coniferous forests), was also frequently found in commercial substrates. This further confirms the resistance of small testate amoeba shells to industrial processing. This observation is not surprising, as Assulina is among the testate amoeba taxa most resistant to degradation; its shells are not even destroyed by the HF treatment used for pollen preparation (Payne et al., 2012). While a few shells of larger species were found (Hyalosphenia sp.), they were infrequent and often degraded. For this reason, they seem less suitable as indicators of peat in commercial products. However, if a larger number of images can be acquired automatically, we can expect to find enough well-preserved Hyalosphenia sp. shells to use them as indicators as well. Based on the above findings, we are confident that the automated detection of peat-specific testate amoebae is a valid method to detect peat in substrates.

4.2. Commercial non-peat products

A total of 400 images from commercial media labeled as peat-free were carefully checked. Several specimens of Centropyxis aerophila-type and one Difflugia lucida-type were found. Although these taxa can be found in peatlands, they are also frequent in other habitats, including caves (Yuri et al., 2012). The samples also contained many fungal spores and some conifer pollen. The fact that several testate amoeba specimens were found is encouraging, as it shows that shells can be recovered from such samples. The absence of the typical peat indicators can be interpreted as strong evidence that the samples were indeed peat-free. In addition, the absence of Assulina further suggests that the samples did not even originate from acidic litter, e.g. collected in coniferous forests. If the substrate is not acidic, this taxon would indeed not be expected, as it is clearly an indicator of low pH (Bonnet, 1991).

4.3. DNN models trained with small data

An ensemble of deep neural networks for object detection was successfully trained with a small dataset dedicated to a very specific microbiological application. The detection performance was high for the three classes evaluated. The final prediction module extracts small images centered on candidate amoebae (crops), together with ensemble scores that are used to rank the most relevant candidates first. This shows that mature object detection models are now available and can be successfully adapted to highly specialized domains. A large part of the development work consisted of fine-tuning the data augmentation procedure for the specific data. The most time-consuming and cumbersome part was the creation and curation of the datasets. This suggests that the bottleneck for developing useful DNN applications is now the availability of curated and annotated data rather than the algorithms.

4.4. Feasibility of large-scale analysis

In commercial samples, only about 2% of images contained an Archerella shell (Table 5). When present, the shell covers much less than 1% of the image and is embedded in residues (example in Figure 2). To obtain a clear statement regarding peat presence in each sample, we propose that at least 1000 images per sample be searched (at 0.021 shells per image, approximately 20 Archerella shells can then be expected). Performing this manually is tedious, time consuming, and simply not realistic for a large-scale analysis with hundreds of samples. Our decision support process automates the tedious detection and extraction steps and produces a concise summary as a ranked list of Archerella candidates, with the most promising items shown first. The final step, manual confirmation and counting by a human operator, is expected to take at most a few minutes per sample. The proposed process therefore significantly reduces the human effort required.
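The sample-size argument can be made explicit with a simple model. Assuming, hypothetically, that shells occur independently across images (a Poisson model with the grid search rate of 0.021 shells per image from Section 3.2.2), the expected count and the probability of finding no shell at all in n images are easy to compute. Real samples vary considerably (Figure 4), so this is an idealized sketch rather than a validated sampling protocol; `expected_shells` and `prob_no_shell` are illustrative names.

```python
import math

RATE_PER_IMAGE = 0.021  # Archerella shells per image, grid search estimate


def expected_shells(n_images, rate=RATE_PER_IMAGE):
    """Expected number of shells in n images."""
    return rate * n_images


def prob_no_shell(n_images, rate=RATE_PER_IMAGE):
    """Probability of observing zero shells under the assumed Poisson model."""
    return math.exp(-rate * n_images)


for n in (100, 1000):
    print(n, round(expected_shells(n), 1), f"{prob_no_shell(n):.2g}")

# A hypothetical sample with a ten times lower shell density
print(f"{prob_no_shell(1000, rate=0.0021):.2f}")
```

Under these assumptions, a 100-image grid would miss Archerella entirely in roughly one sample in eight (e^-2.1 ≈ 0.12), whereas with 1000 images a miss becomes very unlikely for a sample of typical density. The last line shows why a negative result is weaker evidence for samples whose shell density is much lower.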

Our decision support system is an important step towards the large-scale monitoring of commercial substrates from multiple sources. Let us assume, for instance, a scenario with 100 samples of 1000 images each. At 0.31 seconds per image, processing all 100′000 images on a single GPU-accelerated desktop computer would take around 8.6 hours. In addition, the process is reproducible and reliable because the final prediction module incorporates the expertise of highly specialized domain experts (CV, EM, CD) via carefully curated datasets. This would allow the decision support process to be performed safely by less experienced staff after a short training. The process is also transparent because the generated crops can be stored for each sample as evidence, allowing independent review of the decision by third parties if needed.

The decision support system requires many images per sample, which must be acquired by systematically scanning all the material on the slide. In this study, we did this manually, but for a large-scale analysis as proposed above this is not feasible. Thus, an automated scanning microscope will be needed. Note, however, that this would only accelerate the image acquisition per se, not the preliminary lab work to obtain sample material and prepare the slides.

4.5. Strengths and limitations

Our method is strong enough to demonstrate (rule in) the presence of peat. That is, if peat-specific testate amoeba shells are found, this represents strong evidence that the substrate contains peat. In addition, the method automatically provides supporting evidence in the form of crops, which can be interpreted by human experts. However, the frequency of peat-specific amoebae varies between samples, and we must assume that these taxa could be absent from peat of certain provenances. In that case, our method will not work. The method is thus weaker at ruling out the presence of peat: if no peat-specific shells are found, the substrate could still contain peat. Detection of testate amoebae is likely to work best with raised bog peat, as testate amoebae and other remains are better preserved there than in fen or blanket bog peat. Most commercial peat for horticulture is extracted from raised bogs due to its favorable properties (lower degradation, light weight, higher water-holding capacity). Therefore, our method is likely to be effective for many, but not all, commercial substrates.

Some commercial growing media contain recycled substrate with peat. Such substrates are less problematic from a sustainability point of view. We noted that two indicator amoeba taxa are robust to commercial processing. Their shells will probably also survive the recycling process, so our method might detect them. Fundamentally, this would not be a limitation of the method per se but rather a consequence of its good detection performance. Note also that any peat detection method would probably suffer from the same limitation.

4.6. Extension to other microorganisms

The present results show that deep neural networks for object detection can be adapted to effectively detect soil microorganisms in a very specific setting. The task was challenging due to the degraded state of the amoeba shells in commercial products and their low frequency of occurrence. Adding to the difficulty, the shells were embedded in plant residues and particles with diverse shapes and textures. This is likely to be the case in similar applications involving images of natural soil samples or products derived from them. Despite these complications, well-performing models could be trained without relying on pre-trained weights and with a rather small development dataset of a few thousand images. A major part of the workload was the manual acquisition of images. Fortunately, in future applications, this can be done with automated microscopy devices, which are expected to considerably reduce the human workload. The data annotation (bounding boxes) could be done efficiently thanks to user-friendly tools that can be used collaboratively via the web. We have also shown that a few thousand images, together with a carefully tuned data augmentation strategy, are sufficient to obtain actionable and useful models. We believe that this will translate to future projects and will keep the annotation burden within reasonable bounds. Finally, model selection and fine-tuning of the training procedure were time consuming. Fortunately, the developed procedure can be reused for new classes, so this work need not be repeated. Based on the above insights, extending the model to more microorganism classes seems feasible. For instance, models could be built to detect many taxa directly in soil samples. This would open new possibilities for large-scale assessments of soil biodiversity or the continuous monitoring of soil health at restoration sites.

4.7. Quantification of peat proportion

Horticultural growing substrate often contains a mix of peat and other materials (e.g. compost). It would be useful to estimate the proportion of peat in such mixtures. The present results (Table 4 and Figure 4) show that Archerella sp. was consistently present in natural peat from Europe. Another strict peat indicator, Amphitrema sp., was found in about half of the peat samples despite the small number of images available per sample. If a larger number of images per sample (>1000) can be gathered via automated microscopy, we can expect to find enough shells of peat indicator taxa to estimate the shell density in each sample accurately (e.g. number of shells per mg of substrate). Provided that the shell density in pure peat and its natural variation are known, this could be used to estimate the proportion of peat in the mixture. The natural variation will, however, add uncertainty to this estimate. The results obtained with a rather small number of images are promising. As a next step, it would therefore be interesting to assess the natural frequency of, in particular, Archerella sp. and Amphitrema sp. in peat extracted from many European provenances.
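The proposed estimate can be written as a simple mixing model. Assuming, hypothetically, that indicator shells come only from the peat fraction and that shell density scales linearly with the peat mass fraction p, the mixture density is d_mix = p · d_peat, so p = d_mix / d_peat. A known range of pure-peat densities then translates into a range for p. The function names and all numbers below are invented for illustration.

```python
def peat_fraction(d_mix, d_peat):
    """Estimated peat mass fraction from shell densities (shells per mg), clipped to [0, 1]."""
    if d_peat <= 0:
        raise ValueError("pure-peat shell density must be positive")
    return min(1.0, max(0.0, d_mix / d_peat))


def peat_fraction_range(d_mix, d_peat_low, d_peat_high):
    """Range of p implied by the natural variation of pure-peat shell density."""
    # a higher pure-peat density implies a lower peat fraction, and vice versa
    return peat_fraction(d_mix, d_peat_high), peat_fraction(d_mix, d_peat_low)


# Hypothetical values: 2 shells/mg in the mixture; pure peat between 4 and 8 shells/mg
print(peat_fraction_range(2.0, 4.0, 8.0))  # (0.25, 0.5)
```

Even with a perfectly measured mixture density, a factor-of-two spread in pure-peat density yields a factor-of-two spread in the estimated peat fraction, which is why the natural frequency in peat of different provenances matters.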

5. Conclusion

• Peat-specific testate amoebae were recovered from commercial substrates containing peat. Two taxa, Archerella and Amphitrema, are robust indicators of peat presence in such products.
• Deep neural networks were successfully trained and tested with a small application-specific data set. This illustrates the maturity of these algorithms for real-world applications.
• We propose a decision process where large image collections are automatically batch processed and human experts can quickly review a ranked list of small crops to make the final decision.
• Our method enables effective large-scale monitoring of peat presence in commercial substrates from multiple sources.
• To make the method practical, an automated image acquisition procedure will be necessary.

Open peer review

To view the open peer review materials for this article, please visit http://doi.org/10.1017/eds.2025.15.

Acknowledgements

We are very grateful to Gisela Umbricht for precious support with project administration and to Laura Tschümperlin for many fruitful discussions.

Author contribution

Conceptualization: L.M.; E.M.; S.Z. Methodology: S.Z.; E.M.; L.M. Formal analysis: S.Z. Funding acquisition: L.M. Project administration: L.M. Investigation: C.V.; C.D.; E.M. Data curation: C.V.; S.Z.; C.D. Writing - original draft: S.Z. Writing - review and editing: S.Z.; C.V.; L.M.; E.M. All authors approved the final submitted draft.

Competing interests

The authors declare none.

Data availability statement

The data (images and annotations) can be downloaded from https://zenodo.org/records/14609759

Ethical statement

The research meets all ethical guidelines, including adherence to the legal requirements of the study country.

Funding statement

Commissioned by Federal Office for the Environment (FOEN), CH-3003 Bern. The FOEN is an agency of the Federal Department of the Environment, Transport, Energy and Communications (DETEC). Disclaimer: This study was prepared under contract to the Federal Office for the Environment (FOEN). The contractor bears sole responsibility for the content.

Footnotes

This research article was awarded an Open Data badge for transparent practices. See the Data Availability Statement for details.

References

Abdullah, et al. (2022) Computer vision based deep learning approach for the detection and classification of algae species using microscopic images. Water 14(14), 2219.

Abdurahman, F, et al. (2021) Malaria parasite detection in thick blood smear microscopic images using modified YOLOV3 and YOLOV4 models. BMC Bioinformatics 22, 1–17.

Ahmad, S, et al. (2020) Long-term rewetting of degraded peatlands restores hydrological buffer function. Science of the Total Environment 749, 141571.

Amesbury, MJ, et al. (2016) Development of a new pan-European testate amoeba transfer function for reconstructing peatland palaeohydrology. Quaternary Science Reviews 152, 132–151.

Amesbury, MJ, et al. (2018) Towards a Holarctic synthesis of peatland testate amoeba ecology: Development of a new continental-scale palaeohydrological transfer function for North America and comparison to European data. Quaternary Science Reviews 201, 483–500.

Barber, A, et al. (2013) Euglyphid testate amoebae (Rhizaria: Euglyphida) from an Arctic Eocene waterbody: Evidence of evolutionary stasis in plate morphology for over 40 million years. Protist 164(4), 541–555.

Bochkovskiy, A, et al. (2020) Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934.

Boeuf, O, et al. (1997) Presence of testate amoebae (genus: Trinema), in the Upper Pliocene, discovered in the locality of Chilhac (Haute-Loire, France). In: Comptes Rendus de l’Academie des Sciences Series IIA Earth and Planetary Science 8.325, pp. 623–627.

Bonnet, L (1991) Ecologie de quelques Euglyphidae (Thécamoebiens, Filosea) des milieux édaphiques et paraédaphiques. I: Genres Corythion et Trinema. Bulletin de la Société d’histoire naturelle de Toulouse 127, 7–13.

Intergovernmental Panel on Climate Change (IPCC) (2023) Climate Change 2022 – Impacts, Adaptation and Vulnerability: Working Group II Contribution to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change. Cambridge University Press.

Creevy, AL, et al. (2023) Testate amoebae response and vegetation composition after plantation removal on a former raised bog. European Journal of Protistology 89, 125977.CrossRef Google Scholar PubMed

Dziadosz, M, et al. (2024) Microscopic studies of activated sludge supported by automatic image analysis based on deep learning neural networks. Journal of Ecological Engineering 25(4), 360–369.CrossRef Google Scholar

Ferrari, A, et al. (2017) Bacterial colony counting with convolutional neural networks in digital microbiology imaging. Pattern Recognition 61, 629–640.CrossRef Google Scholar

Fracasso, I, et al. (2024) Exploring different methodological approaches to unlock paleobiodiversity in peat profiles using ancient DNA. Science of the Total Environment 908, 168159.CrossRef Google Scholar PubMed

Frésard, A et al. (2023) Inferring northern peatland methane emissions from testate amoebae: A proof of concept study. Mires & Peat.Google Scholar

Frolking, S, et al. (2011) Peatlands in the Earth’s 21st century climate system. Environmental Reviews 19, 371–396.CrossRef Google Scholar

Ganaie, MA, et al. (2022) Ensemble deep learning: A review. Engineering Applications of Artificial Intelligence 115, 105151.CrossRef Google Scholar

Garcés-Pastor, S, et al. (2019) DNA metabarcoding reveals modern and past eukaryotic communities in a high-mountain peat bog system. Journal of Paleolimnology 62, 425–441.CrossRef Google Scholar

Gehrels, WR, et al. (2006) Distribution of testate amoebae in salt marshes along the North American East Coast. Journal of Foraminiferal Research 36(3), 201–214.CrossRef Google Scholar

Gilbert, D, et al. (2006) Microbial diversity in Sphagnum peatlands. Peatlands: Basin evolution and depository of records on global environmental and climatic changes.Google Scholar

Gomaa, F, et al. (2014) One alga to rule them all: Unrelated mixotrophic testate amoebae (Amoebozoa, Rhizaria and Stramenopiles) share the same symbiont (Trebouxiophyceae). Protist 165(2), 161–176.CrossRef Google Scholar PubMed

Harnisch, O (1927) Einige Daten zur rezenten und fossilen testaceen Rhizopodenfauna der Sphagnen. Archiv für Hydrobiologie 18(3), 245.Google Scholar

Hirschler, O, et al. (2022) Peat extraction, trade and use in Europe: A material flow analysis. Mires & Peat 28.Google Scholar

Jassey, VEJ, et al. (2015) An unexpected role for mixotrophs in the response of peatland carbon cycling to climate warming. Scientific Reports 5(1), 16931.CrossRef Google Scholar PubMed

Jauhiainen, S (2002) Testacean amoebae in different types of mire following drainage and subsequent restoration. European Journal of Protistology 38(1), 59–72.CrossRef Google Scholar

Jiang, P, et al. (2022) A review of Yolo algorithm developments. Procedia Computer Science 199, 1066–1073.CrossRef Google Scholar

Koenig, I, et al. (2017) Response of Sphagnum testate amoebae to drainage, subsequent re-wetting and associated changes in the moss carpet-results from a three year mesocosm experiment. Acta Protozoologica 56(3), 191–210.Google Scholar

Kosov, S, et al. (2018) Environmental microorganism classification using conditional random fields and deep convolutional neural networks. Pattern Recognition 77, 248–261.CrossRef Google Scholar

Krashevska, V, et al. (2018) Micro-decomposer communities and decomposition processes in tropical lowlands as affected by land use and litter type. Oecologia 187, 255–266.Google Scholar PubMed

Krishnadas, P, et al. (2022) Classification of malaria using object detection models. Informatics 9(4), 76.

Laggoun-Défarge, F, et al. (2008) Cut-over peatland regeneration assessment using organic matter and microbial indicators (bacteria and testate amoebae). Journal of Applied Ecology 45(2), 716–727.

Leifeld, J, et al. (2019) Intact and managed peatland soils as a source and sink of GHGs from 1850 to 2100. Nature Climate Change 9(12), 945–947.

Liang, C-M, et al. (2021) Environmental microorganism classification using optimized deep learning model. Environmental Science and Pollution Research 28, 31920–31932.

Ma, P, et al. (2023) A state-of-the-art survey of object detection techniques in microorganism image analysis: From classical methods to deep learning approaches. Artificial Intelligence Review 56(2), 1627–1698.

Maturana, CR, et al. (2023) iMAGING: A novel automated system for malaria diagnosis by using artificial intelligence tools and a universal low-cost robotized microscope. Frontiers in Microbiology 14, 1240936.

Meisterfeld, R (2002) The Illustrated Guide to the Protozoa: Order Arcellinida Kent, 1880 (Ed. by Leedale, GF, Bradbury, P and Lee, JJ).

Mitchell, EAD, et al. (2008) Testate amoebae analysis in ecological and paleoecological studies of wetlands: Past, present and future. Biodiversity and Conservation 17, 2115–2137.

Morais, L, et al. (2017) Carbonaceous and siliceous Neoproterozoic vase-shaped microfossils (Urucum Formation, Brazil) and the question of early protistan biomineralization. Journal of Paleontology 91(3), 393–406.

Nasser, NA, et al. (2020) Use of Arcellinida (testate lobose amoebae) arsenic tolerance limits as a novel tool for biomonitoring arsenic contamination in lakes. Ecological Indicators 113, 106177.

Patterson, RT, et al. (2002) Arcellaceans (thecamoebians) as indicators of land-use change: Settlement history of the Swan Lake area, Ontario as a case study. Journal of Paleolimnology 28, 297–316.

Paul, S, et al. (2022) A novel ensemble weight-assisted Yolov5-based deep learning technique for the localization and detection of malaria parasites. Electronics 11(23), 3999.

Payne, RJ, et al. (2012) Testate amoebae in pollen slides. Review of Palaeobotany and Palynology 173, 68–79.

Porter, SM, et al. (2000) Testate amoebae in the Neoproterozoic era: Evidence from vase-shaped microfossils in the Chuar Group, Grand Canyon. Paleobiology 26(3), 360–385.

Qin, Y, et al. (2021) Developing a continental-scale testate amoeba hydrological transfer function for Asian peatlands. Quaternary Science Reviews 258, 106868.

Rączkowska, A, et al. (2019) ARA: Accurate, reliable and active histopathological image classification framework with Bayesian deep learning. Scientific Reports 9(1), 14347.

Rani, P, et al. (2022) Machine learning and deep learning based computational approaches in automatic microorganisms image recognition: Methodologies, challenges, and developments. Archives of Computational Methods in Engineering 29(3), 1801–1837.

Redmon, J, et al. (2016) You only look once: Unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 779–788.

Ruzicka, E (1982) Die subfossile Testaceen des Krottensees (Salzburg, Oesterreich). Limnologica 1, 231–254.

Salido, J, et al. (2020) A low-cost automated digital microscopy platform for automatic identification of diatoms. Applied Sciences 10(17), 6033.

Schmidt, AR, et al. (2006) A microworld in Triassic amber. Nature 444(7121), 835–835.

Senousy, Z, et al. (2021a) 3E-Net: Entropy-based elastic ensemble of deep convolutional neural networks for grading of invasive breast carcinoma histopathological microscopic images. Entropy 23(5), 620.

Senousy, Z, et al. (2021b) MCUa: Multi-level context and uncertainty aware dynamic deep ensemble for breast cancer histology image classification. IEEE Transactions on Biomedical Engineering 69(2), 818–829.

Shao, R, et al. (2022) A novel hybrid transformer-CNN architecture for environmental microorganism classification. PLoS One 17(11), e0277557.

Singer, D, et al. (2018) Environmental filtering and phylogenetic clustering correlate with the distribution patterns of cryptic protist species. Ecology 99(4), 904–914.

Sun, L, et al. (2022) YOLO algorithm for long-term tracking and detection of escherichia coli at different depths of microchannels based on microsphere positioning assistance. Sensors 22(19), 7454.

Swindles, GT, et al. (2016) Evaluating the use of dominant microbial consumers (testate amoebae) as indicators of blanket peatland restoration. Ecological Indicators 69, 318–330.

Swindles, GT, et al. (2019) Widespread drying of European peatlands in recent centuries. Nature Geoscience 12(11), 922–928.

Syrykh, C, et al. (2020) Accurate diagnosis of lymphoma on whole-slide histopathology images using deep learning. npj Digital Medicine 3(1), 63.

Temmink, RJM, et al. (2023) Wetscapes: Restoring and maintaining peatland landscapes for sustainable futures. Ambio 52(9), 1519–1528.

Todorov, M, et al. (2019) An atlas of Sphagnum-dwelling testate amoebae in Bulgaria. Advanced Books 1, e38685.

Valentine, J, et al. (2013) The use of testate amoebae in monitoring peatland restoration management: Case studies from North West England and Ireland. Acta Protozoologica 52(3).

Velho, LFM, et al. (2003) Influence of environmental heterogeneity on the structure of testate amoebae (Protozoa, Rhizopoda) assemblages in the plankton of the upper Paraná river floodplain, Brazil. International Review of Hydrobiology: A Journal Covering all Aspects of Limnology and Marine Biology 88(2), 154–166.

Vickery, E, et al. (2004) Biomonitoring of peatland restoration using testate amoebae. In: 7th INTECOL International Wetlands Conference, Book of Abstracts. Utrecht, pp. 25–30.

Wilkinson, DM (2010) Have We Underestimated the Importance of Humans in the Biogeography of Free-Living Terrestrial Microorganisms?

Xu, J, et al. (2018) PEATMAP: Refining estimates of global peatland distribution based on a meta-analysis. Catena 160, 134–140.

Yang, H, et al. (2023) EMDS-7: Environmental microorganism image dataset seventh version for multiple object detection evaluation. Frontiers in Microbiology 14, 1084312.

Yang, Z-C, et al. (2011) Biomonitoring of testate amoebae (protozoa) as toxic metals absorbed in aquatic bryophytes from the Hg-Tl mineralized area (China). Environmental Monitoring and Assessment 176, 321–329.

Yuri, M, et al. (2012) Testate amoebae communities from caves of some territories in European Russia and North-Eastern Italy. Protistology 7(1), 42–50.

Zhang, J, et al. (2021) LCU-Net: A novel low-cost U-Net for environmental microorganism image segmentation. Pattern Recognition 115, 107885.

Zhang, J, et al. (2022) A comprehensive review of image analysis methods for microorganism counting: From classical image processing to deep learning approaches. Artificial Intelligence Review, pp. 1–70.

Table 1. Taxonomic definition of classes and their status as peat indicators. *1: Nebela tincta, N. pechorensis, N. guttata, N. gimlii, N. rotunda, N. bohemica, N. collaris, N. minor.

Table 3. Number of manually annotated individuals per class in Dataset 1.

Table 5. Shell counts from grid search of all commercial peat samples, used to estimate the number of shells per image for each class.

Figure 5. Average precision (AP) of the individual models (blue dots) and of the ensemble (red dots) for several model sizes and training procedures.
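Figure 5 summarizes detector quality as average precision (AP). As a reminder of what this single number aggregates, the following is a minimal Python sketch of all-point interpolated AP for one class. It assumes that each detection has already been labelled true or false positive by matching it to an annotated shell (e.g. with an IoU criterion); the function name `average_precision` is hypothetical, and the exact AP variant behind the reported values (IoU threshold, interpolation scheme) may differ, so this sketch need not reproduce them.

```python
from typing import List, Tuple


def average_precision(detections: List[Tuple[float, bool]], n_ground_truth: int) -> float:
    """All-point interpolated average precision for a single class (sketch).

    detections: (confidence score, is_true_positive) pairs; the TP/FP flag is
        assumed to come from a prior matching step against annotated shells.
    n_ground_truth: number of annotated individuals of this class.
    """
    if n_ground_truth <= 0:
        raise ValueError("n_ground_truth must be positive")

    # Rank detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)

    # Precision and recall after each ranked detection.
    recalls, precisions = [], []
    tp = fp = 0
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / n_ground_truth)
        precisions.append(tp / (tp + fp))

    # Precision envelope: make precision non-increasing from right to left.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum the interpolated precision over each recall increment.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Toy example: two annotated shells, one false positive ranked in between.
print(round(average_precision([(0.9, True), (0.8, False), (0.7, True)], n_ground_truth=2), 3))  # 0.833
```

Ground-truth shells that are never detected cap the final recall below 1 and therefore lower AP, which is why a detector that misses rare shells is penalized even if every returned crop is correct.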

Figure 8. Same as Figure 7, but obtained with 1872 test images from grid and active search. The corresponding crops with ensemble scores are shown in Figures 11, 13 and 15.

Figure 9. Ground truth in the test set for N = 17 Archerella sp. from grid search, in arbitrary order. The 15 individuals that were detected are framed in green.

Figure 12. Ground truth in the test set for N = 24 Amphitrema sp. from grid search (top) and active search (bottom), in arbitrary order. The 19 individuals that were detected are framed in green.

Figure 14. Ground truth in the test set for N = 19 Assulina sp. from active search (no individuals were found with grid search), in arbitrary order. All 19 individuals were detected and are framed in green.
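The captions of Figures 9, 12 and 14 give, per taxon, how many ground-truth shells in the test set were detected. The short Python sketch below turns those counts into a per-taxon detection rate (recall). It assumes that "detected" means framed in green in those figures; the helper `detection_rate` is hypothetical, and this simple ratio is a different metric from the average precision reported for the models.

```python
# (detected, ground truth) counts as stated in the captions of Figures 9, 12 and 14.
COUNTS = {
    "Archerella sp. (grid search)": (15, 17),
    "Amphitrema sp. (grid and active search)": (19, 24),
    "Assulina sp. (active search)": (19, 19),
}


def detection_rate(detected: int, ground_truth: int) -> float:
    """Fraction of ground-truth individuals that were detected (recall)."""
    if ground_truth <= 0:
        raise ValueError("ground_truth must be positive")
    if not 0 <= detected <= ground_truth:
        raise ValueError("detected must lie between 0 and ground_truth")
    return detected / ground_truth


for taxon, (detected, n) in COUNTS.items():
    print(f"{taxon}: {detected}/{n} = {detection_rate(detected, n):.3f}")
# Archerella sp. (grid search): 15/17 = 0.882
# Amphitrema sp. (grid and active search): 19/24 = 0.792
# Assulina sp. (active search): 19/19 = 1.000
```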

