Impact Statement
Peatlands capture atmospheric carbon and are biodiversity hotspots. Their degradation through drainage and peat extraction contributes to carbon emissions and biodiversity loss. Extracted peat is often used as a substrate for gardening and horticulture. Many countries aim to reduce peat use, but this requires tools to detect peat in commercial substrates. In this context, we propose a decision support system based on deep learning that detects the persistent shells of peat-specific testate amoebae in microscopy images. It processes thousands of images in a few minutes and returns a concise list of images centered on the most relevant shells. This allows an operator to decide efficiently whether peat is present and provides supporting evidence for that decision.
1. Introduction
1.1. Socio-economic background
Peat is formed by the accumulation of more or less decomposed organic matter in mires (peat-rich fens, raised bogs, or tropical swamps such as forested or papyrus swamps) under hydromorphic and anaerobic conditions. Peatlands play a key role in the global carbon cycle as important carbon sinks: they store around 30% of the world's soil organic carbon while covering only around 3% of the Earth's surface (Frolking et al., 2011; Xu et al., 2018). However, human activities endanger this natural carbon store. Drainage and peat extraction lead to aerobic mineralization of the stored organic matter and thus to substantial greenhouse gas emissions. The destruction of peatlands has been estimated to account for about 5% of global anthropogenic greenhouse gas emissions, corresponding to around 2 gigatons of CO₂ per year (Leifeld et al., 2019; IPCC, 2023). The conservation and rewetting of peatlands (and thus the restoration of their carbon storage capacity) are therefore among the most efficient actions to mitigate climate change (Temmink et al., 2023). Peatlands are also of major importance for biodiversity conservation, as they support numerous rare species adapted to their specific site conditions. In addition, peatlands play a key role in hydrological regulation: by storing water during rainy periods, they buffer floods and help maintain baseline water flow during dry periods. Peatlands and other wetlands thereby also help stabilize regional climate by reducing temperature extremes (Ahmad et al., 2020). Drainage and peat extraction destroy these sensitive ecosystems and the functions they provide.
While most of the peat extracted in Europe is used for energy production (62%), the second most important use (38%) is in horticultural substrates (growing media) (Hirschler et al., 2022). Several natural characteristics make peat suitable as a substrate ingredient or soil enhancer: its high capacity to store water and fertilizers, the structural stability provided by decay-resistant Sphagnum moss debris, its low pH, and its low levels of nutrients and pollutants. While global figures are difficult to obtain, around 8 million tons of peat are estimated to be extracted annually in Europe for horticultural use (Hirschler et al., 2022). With increasing awareness of the value of intact peatlands, several countries and substrate producers aim to reduce their use of peat. Currently, the certification of peat-free substrates relies solely on supply-chain traceability, which is not sufficient: peat might still be added unintentionally owing to a lack of specific controls in the supply chain. Establishing such controls incurs costs for producers, who are unlikely to implement them without clear incentives. Moreover, the fact that routine peat detection is currently not feasible facilitates potential fraud. To close this gap, we propose a decision support system based on deep learning that detects peat-specific testate amoebae in microscopy images in order to identify peat in commercial substrates.
It is worth noting that several alternative approaches could be used to detect peat. For instance, environmental ancient DNA (aDNA) is increasingly used in environmental monitoring. However, this approach is unlikely to work based on testate amoeba DNA alone, because shells extracted from peat are empty subfossils. Living cells are found only in the upper few centimeters of peat and would therefore be expected only in fresh Sphagnum, such as that used for hanging baskets (Wilkinson, 2010). Moreover, aDNA analysis may not yield the same results as the analysis of subfossils preserved in the peat. In a comparative study using both approaches, DNA sequences of given plant taxa were recovered from only part of the samples in which the corresponding plant macrofossils were found. As a result, reconstructed assemblages of ancient communities may not correspond to any modern assemblage (Garcés-Pastor et al., 2019). Furthermore, it is challenging to disentangle modern from ancient DNA. Sequence capture is one possible method for this, but it is more time-consuming and much more expensive than microscopic examination (Fracasso et al., 2024). The presence of Sphagnum leaves could be a straightforward indicator of peat. However, in older peat the leaves are fragmented and degraded, which may impede their optical recognition based on morphological features. In contrast, the shells of testate amoebae, which remain unfragmented for extended periods, offer a reliable target for optical recognition. Testate amoeba species that build proteinaceous tests, such as those of the genera Archerella and Hyalosphenia, are especially well preserved, as also observed in lake sediments (Ruzicka, 1982). Nonetheless, morphological recognition of Sphagnum leaves may be effective for fresher peat samples. Although these two alternative approaches fall outside the scope of our study, they could complement our method.
In the future, integrating the outputs of various approaches could lead to more robust predictions, and a comparative study of such methods would be useful.
1.2. Biological background
Testate amoebae are a common and diverse group of free-living amoeboid protists. Their shell (called a test) is either secreted (SiO₂, calcite, or protein) or built from recycled organic or mineral particles glued together with an organic cement, and it allows identification to species level (Meisterfeld, 2002). The shells remain after the death of the amoeba and, under some conditions (anoxia or volcanic deposition), may be preserved for millennia (Harnisch, 1927), millions of years (Boeuf et al., 1997; Barber et al., 2013), and even hundreds of millions of years (Porter et al., 2000; Schmidt et al., 2006; Morais et al., 2017). Testate amoebae are commonly used as bioindicators of present and past environmental conditions, especially in peatlands, where they serve mostly as hydrological indicators (water table depth) but also as indicators of pH and nutrient status (Mitchell et al., 2008; Swindles et al., 2019; Qin et al., 2021), as well as in freshwater habitats (Patterson et al., 2002; Velho et al., 2003; Yang et al., 2011; Nasser et al., 2020) and estuaries (Gehrels et al., 2006). In lakes, they are also used as bioindicators because they respond to nutrients and heavy metal pollution (Mitchell et al., 2008; Nasser et al., 2020).
Testate amoebae are also increasingly used to monitor peatland functioning (Frésard et al., 2023; Jassey et al., 2015) and restoration success (Creevy et al., 2023; Jauhiainen, 2002; Koenig et al., 2017; Laggoun-Défarge et al., 2008; Swindles et al., 2016; Valentine et al., 2013; Vickery et al., 2004), as well as to assess the impact of forest management (Krashevska et al., 2018). Peatlands are home to a high diversity of testate amoebae (Gilbert et al., 2006). In a recent monograph, Bankov and Todorov (Todorov et al., 2019) listed 175 testate amoeba species living in Sphagnum in Bulgaria. There are no compilations of testate amoeba diversity across broader regions or globally at high taxonomic resolution, but the total diversity of testate amoebae in peatlands worldwide is very likely well over 200 species. However, not all species listed as occurring in Sphagnum or in peatlands are restricted to these habitats. Many species found in peatlands also occur in acidic forest litter or freshwater habitats. This may partly reflect the existence of several morphologically similar species within a given morphotype. A detailed analysis of one such species complex (Nebela tincta group) in the Jura Mountains revealed that closely related species differed in their ecology, some being specific to forested peatlands while others occurred preferentially in wetter and more nutrient-rich habitats (Singer et al., 2018). Still, several taxa are clearly specific to Sphagnum-dominated peatlands, being frequent in Sphagnum and rare or absent from other habitats. These include several mixotrophic taxa (i.e. the genera Archerella and Amphitrema, and the species Heleopera sphagni and Placocista spinosa) that harbour endosymbiotic green algae (Chlorella) (Gomaa et al., 2014). This metabolism allows them to thrive in the nutrient-depleted habitats of peatlands.
1.3. Related deep learning work
Deep learning algorithms for image processing have developed rapidly in the past few years. Many mature algorithms are now available and have been shown to achieve low error rates on difficult tasks. Deep learning has been applied successfully to microscopy images in a variety of health-related domains, such as histopathology (Rączkowska et al., 2019; Senousy et al., 2021a, 2021b; Syrykh et al., 2020), bacterial cultures (Ferrari et al., 2017), and blood parasites (Paul et al., 2022; Abdurahman et al., 2021; Maturana et al., 2023; Krishnadas et al., 2022). Many deep learning methods have been developed for processing microscopy images in general (see Ma et al., 2023; Rani et al., 2022; Zhang et al., 2022 for reviews). A few studies have applied deep learning specifically to detect environmental microorganisms in microscopy images (Shao et al., 2022; Kosov et al., 2018; Zhang et al., 2021; Liang et al., 2021). We found only one study that focused on testate amoebae, but it dealt with activated sludge rather than peatlands (Dziadosz et al., 2024). In many applied scenarios, object detection (OD) models have proven useful. A trained OD model automatically predicts rectangular regions (bounding boxes) of an image that contain a target, each with a confidence value. The YOLO family of OD models was introduced in 2016 (Redmon et al., 2016) and has evolved rapidly since then (see Jiang et al., 2022 for a review). YOLO models were originally developed to detect everyday objects in photographs (humans, dogs, cars, apples). They have proven to be very versatile and have been used in diverse scenarios, including microscopy.
A detailed description of the model architecture can be found in Bochkovskiy et al. (2020). For this study, we used YOLOv8 (https://pytorch.org/hub/ultralytics), which is well integrated into the Python ecosystem. Models from the YOLO family have been applied to microscopy images in various domains, such as microbes in industrial sludge (Dziadosz et al., 2024), bacterial solutions in microfluidic chips (Sun et al., 2022), malaria parasites in blood (Paul et al., 2022; Abdurahman et al., 2021; Maturana et al., 2023; Krishnadas et al., 2022), and small algae and diatoms (Abdullah et al., 2022; Salido et al., 2020).
1.4. Objective
The objective of this study was to develop a method that detects peat-specific testate amoebae in microscopy images of horticultural substrate samples, with a primary focus on commercial substrates containing peat. The automated pipeline should batch-process images from multiple samples and extract small images (crops) of candidate testate amoeba morpho-taxa (i.e. species or groups of species that share a very similar morphology and are thus difficult to tell apart). It is intended as part of a decision support system in which condensed summaries of the crops are presented to human experts, who make the final decision on peat presence with minimal time and effort. The method is expected to enable large-scale monitoring of commercial substrates for the presence of peat and thus to certify that their peat content is below a given threshold.
1.5. Challenges
1.5.1. Data scarcity
Deep learning algorithms need representative data to learn from, and these data must be carefully annotated to provide the ground truth needed for training and testing. At the start of the project, images of surface peat samples were available. However, no images of commercial growing substrates containing 100% peat (here called commercial peat) were available. We therefore had to acquire and manually annotate images. This process was tedious and time-consuming, and only a modest dataset of just over 7000 images (Table 2) could be acquired for this pilot study. Since the data must be split into training and test sets, this is a small dataset for deep learning. We argue that this problem is widespread in very specialized applications like ours, where application-specific annotated data are generally scarce.
1.5.2. Rarity of target
Commercial substrates are industrially processed, and our images showed that the shells of testate amoebae were often degraded compared with those in surface samples. Humans and deep learning algorithms alike rely on optical features such as shape and texture to recognize objects. Degraded shells lose these features and tend to resemble other organic residues, which makes automated recognition more challenging. Moreover, in automatically acquired images, which are by definition unselected, testate amoebae are rare (i.e. they cover only a very small proportion of the slide). Despite careful preparation, they are typically embedded in other soil particles or covered by other objects (e.g. mineral particles, organic residues). In peat substrates, plant residues of diverse shapes and textures predominated. The task was therefore to detect rare objects hidden in a matrix of diverse other items of similar shape and color. This is challenging because the predominance and diversity of these other items increase the risk of false positives. Given these challenges, this study provides a conservative estimate of the potential for automatic identification of testate amoebae, as results will likely improve with less degraded material or with more training images.
2. Methods
2.1. Selection of species
For the detection of peat, we are interested in species that occur exclusively in peatlands. We selected species characteristic of peat based on extensive testate amoeba community datasets from Holarctic peatlands (Amesbury et al., 2016; Amesbury et al., 2018). We identified a set of species that, together, are found in most samples. Species were combined into classes (species groups) that several experts could identify unambiguously. These classes were then used to annotate the data and to train the deep learning models. Taking into account ease of identification, frequency of occurrence, and habitat specificity, we selected 10 classes (Table 1). For a more general assessment of the method, we also included species that are common in peatlands but are not peat indicators.
Table 1. Taxonomic definition of classes and their status as peat indicators. *1 Nebela tincta, N. pechorensis, N. guttata, N. gimlii, N. rotunda, N. bohemica, N. collaris, N. minor

Archerella flavum was by far the most frequent taxon in commercial peat samples. This genus also has a very characteristic shape, color, and appearance (Figures 1 and 2). Because degraded, folded, or heavily masked Archerella shells were also frequently observed, we defined a dedicated class for them (class Archerella degraded). This allowed us to train and validate models with “clear” examples (class Archerella sp), which makes the results easier to interpret, especially for non-experts: the trained model predicts clear Archerella shells, which are easier for end users of the proposed decision support process to understand. For the other eight classes, this distinction was not made because of their lower frequency in commercial samples.

Figure 1. Image obtained at 20-fold magnification from a commercial peat sample. A shell of Archerella sp. is indicated by a red arrow. Numerous unidentified plant residues are present throughout the image.

Figure 2. Individuals from the 10 morpho-species groups used in this study. (A) Archerella sp (B) Archerella degraded (C) Assulina sp (D) Amphitrema sp (E) Hyalosphenia elegans aggr (F) Hyalosphenia papilio (G) Heleopera sphagni (H) Planocarina carinata (I) Euglypha sp (J) Nebela combined.
2.2. Image acquisition
For sample preparation, a small volume (ca. 5–10 cm³) of the substrate was mixed with water, shaken for 1 minute in a wide screw-capped jar, and filtered through a tea strainer. The material was then passed through an 80 μm mesh, which removes coarse particles with only marginal loss of testate amoebae. The filtrate was left to settle overnight, after which the clear supernatant was carefully poured off. The concentrate was then transferred to a tube. One drop of this concentrate was placed on a slide with a pipette and mixed with one drop of glycerol. Images were acquired under bright-field microscopy at 20-fold magnification with a camera mounted on the microscope and stored in TIFF format. As no automated microscope was available for this project, images were acquired manually. An early exploration showed that commercial peat samples contained a low density of testate amoeba shells. We therefore defined two complementary image acquisition procedures: grid search and active search.
In the grid search, a 5 × 10 grid of adjacent images is taken manually (Figure 3A). For each sample, 2 or 3 slides were imaged, resulting in 100 or 150 images per sample. Most images typically contain no testate amoebae (only plant remains), and when testate amoebae are present, they are generally not centered. The grid search mimics a realistic application scenario in which data are acquired by an automated scanning microscope.

Figure 3. Schematics of how images were obtained via manual grid and active search (A, B) and how they could be acquired via automated scanning in the future (C). Each red box schematically represents a single image.
In the active search, the whole slide is explored visually, but pictures are taken only when target amoeba species are found, with the amoebae centered in the picture (Figure 3B). Therefore, all images contain at least one individual. This procedure was designed to capture all target amoebae present on a slide. In addition, further images were taken with each observed amoeba placed close to either the bottom-left or the top-right corner of the image.
2.3. Image preparation
A standard image width of 1728 pixels was defined. The target magnification was 20-fold, which corresponds to approximately 0.32 μm/pixel. All images from Datasets 2 and 3 were acquired at 20-fold magnification by design, but some training images that had been acquired before the start of the project had 40-fold magnification. These latter images were downsized to a resolution of 0.32 μm/pixel and then padded to the standard width of 1728 pixels. For padding, we used images from the EMDS7 dataset (Yang et al., 2023). The original EMDS7 images were greenish, so we converted them to grayscale and then to a variety of soft colors to make them more diverse. All images were examined by microbiology experts with a dedicated annotation tool (Roboflow), and the positions of peat-specific testate amoebae were annotated as bounding boxes. No distinction was made between dead and living individuals because our focus was on detecting the shells. All amoebae that could be recognized were annotated, even if they were degraded or masked. The annotated images were exported as JPG files with annotations in YOLO format. A summary of classes is shown in Table 1.
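The preparation steps above involve two small pieces of arithmetic: the resampling factor that brings a 40-fold image to 0.32 μm/pixel (with the padding then needed to reach 1728 pixels), and the normalized coordinates of the YOLO label format (one text line per box: class index, box center x and y, box width and height, all as fractions of the image size). The sketch below illustrates both. The 0.16 μm/pixel resolution used for the 40-fold example is an assumption (half the 20-fold value), not a measured value from this study, the image dimensions are hypothetical, and the helper names `rescale_geometry` and `to_yolo_line` are illustrative. The sketch computes sizes only; it does not resample pixels or place the EMDS7 filler.

```python
from dataclasses import dataclass

TARGET_UM_PER_PX = 0.32  # target resolution at 20-fold magnification (from the text)
TARGET_WIDTH = 1728      # standard image width in pixels (from the text)


@dataclass
class Geometry:
    scaled_width: int   # width after resampling to TARGET_UM_PER_PX
    scaled_height: int  # height after resampling
    pad_width: int      # filler pixels needed to reach TARGET_WIDTH


def rescale_geometry(width, height, src_um_per_px):
    """Output size when an image is resampled to TARGET_UM_PER_PX, plus the padding needed."""
    factor = src_um_per_px / TARGET_UM_PER_PX  # < 1 means the image is downsized
    w, h = round(width * factor), round(height * factor)
    if w > TARGET_WIDTH:
        raise ValueError("rescaled image is wider than the standard width")
    return Geometry(w, h, TARGET_WIDTH - w)


def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Pixel box -> YOLO label line 'class x_center y_center width height' (fractions of image size)."""
    xc = (x_min + x_max) / 2 / img_w
    yc = (y_min + y_max) / 2 / img_h
    bw = (x_max - x_min) / img_w
    bh = (y_max - y_min) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}"


# HYPOTHETICAL 40-fold image of 1728 x 1296 px, ASSUMED to be at 0.16 um/px
print(rescale_geometry(1728, 1296, 0.16))
print(to_yolo_line(3, 100, 200, 300, 400, 1728, 1296))
```

Under these assumptions the 40-fold image is halved to 864 × 648 pixels, leaving 864 pixels of width to be filled with EMDS7 material.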
2.4. Data sets
Dataset 1 was derived from 16 Sphagnum moss samples collected in peatlands (i.e. not commercially processed). These data were available prior to the study as part of the image repository of the Laboratory of Soil Biodiversity. These images were rich: they contained testate amoebae of many species (Table 3), which were often alive and displayed diverse natural colors. Note that these images are not representative of commercially processed substrates and were therefore used only for training. Datasets 2 and 3 were taken exclusively from commercial substrates. In these images, the shells were typically empty, often more or less degraded, and less vividly colored than living specimens. Dataset 2 was derived from 17 commercial samples (11 peat, 6 non-peat; Table 2). From each sample, we prepared one suspension that was used to make 5 slides (2 for active search and 3 for grid search). Dataset 3 was derived from 16 commercial samples (12 peat, 4 non-peat; Table 2). From each sample, we prepared one suspension that was used to make 2 slides, each imaged with both active and grid search. Datasets 1 and 2 were combined to create the training set, so that the neural networks learn from both the rich data of Dataset 1 and the more representative data of Dataset 2. The training set consists of 5121 images from 33 independent samples. Dataset 3 was held out as a test set and was used to compare models and assess final performance. The test set consists of 2415 images from 16 independent samples.
Table 2. Count overview of all samples and images. Comm: commercial. The letters L, C, and R (Left, Center, Right) refer to the actively chosen position of the shell in the images obtained with active search.

2.5. Preparation for training sessions
In the training set (Datasets 1 and 2), the classes were imbalanced (Tables 3, 4, and 5). Class balance was improved by making multiple copies of images from rare classes (oversampling). This static transform was performed before training and applied to the training set only. It improves learning because all classes are seen approximately the same number of times during training, which reduces the risk that the network focuses on a single class. Additionally, static data augmentation was applied to the training images: random cropping of 0 to 20 pixels, random rotation between −3 and 3 degrees, random blending with EMDS7 images with a weight between 0.0 and 0.5, and a small amount of elastic transform (local distortion). This adds diversity to the multiple copies of the same image. Note that training-time data augmentation was also used (see below). The test set (Dataset 3) was not modified.
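The oversampling step can be sketched as follows. The text does not say how the number of copies was chosen, so the rule below is a hypothetical choice for illustration: each class receives a factor equal to the count of the most frequent class divided by its own count (rounded up and capped), and each image is copied according to the rarest class it contains. The function names and the cap `max_copies` are likewise assumptions. In the described workflow, each copy would then receive its own random static augmentation so that copies are not identical.

```python
import math
from collections import Counter


def replication_factors(image_labels, max_copies=10):
    """Number of copies per training image so that rare classes are seen more often.

    image_labels: dict mapping image name -> list of class names annotated in it.
    HYPOTHETICAL rule: each class gets factor ceil(n_max / n_class), capped at
    max_copies; an image is copied according to the rarest class it contains.
    """
    counts = Counter(c for labels in image_labels.values() for c in labels)
    if not counts:
        return {img: 1 for img in image_labels}
    n_max = max(counts.values())
    factor = {c: min(max_copies, math.ceil(n_max / n)) for c, n in counts.items()}
    # images without annotations (background only) are kept once
    return {img: max((factor[c] for c in labels), default=1)
            for img, labels in image_labels.items()}


def oversample(image_labels, max_copies=10):
    """Expanded list of image names; each copy would then get its own static augmentation."""
    factors = replication_factors(image_labels, max_copies)
    return [img for img, k in factors.items() for _ in range(k)]


labels = {
    "img1": ["Archerella sp", "Archerella sp"],
    "img2": ["Archerella sp"],
    "img3": ["Assulina sp"],
    "img4": [],
}
print(replication_factors(labels))
print(oversample(labels))
```

In this toy example, Archerella sp (3 annotations) keeps a factor of 1 and Assulina sp (1 annotation) gets a factor of 3, so both classes end up with 3 annotated instances after copying.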
Table 3. Overview of number of manually annotated individuals per class in Dataset 1

Table 4. Active search shell counts from all commercial peat samples were used to estimate the number of shells per slide for each class. Left and right images were discarded, and only the center image was used, to avoid counting the same specimen three times.

Table 5. Grid search shell counts from all commercial peat samples were used to estimate the number of shells per image for each class

2.6. Training
As the object detection algorithm, we chose YOLOv8. In each training session, one model was trained on the training set. We performed several sessions with different random initializations. This allowed us to assess between-session variability and served as the basis for ensemble prediction. During training, images were resized to a width of 512 pixels, and training-time data augmentation was applied. The YOLOv8 defaults are tuned for photographs and performed poorly on our images (comparison in the Supplement). We therefore modified the defaults as follows: (a) random rotation increased to a range of −180 to +180 degrees, (b) random rescaling reduced to 0.2, (c) a small amount of shear of up to 10 degrees added, (d) up-down flip activated in addition to the left-right flip that is active by default, (e) mosaic transform probability reduced to 0.2, and (f) mix-up probability increased to 0.5 (full details in the Supplement). We trained for 500 epochs and used the weights of the last epoch for prediction (no early stopping). Label smoothing was applied. All 10 classes (Table 1) were used for training. A well-established principle of ensembles is that the base predictors should be as diverse as possible (see Ganaie et al., 2022 for a recent review). Therefore, the random seed differed between sessions: this leads to different initial weights (except for pretrained models) and to different random realizations of the training-time data augmentation. This is expected to yield trained models that behave differently on individual items but similarly on average. We assessed four YOLOv8 model sizes (NANO, SMALL, MEDIUM, LARGE) crossed with two weight initialization procedures: random initialization versus weights pretrained on the COCO dataset, abbreviated RANDINIT and COCOINIT.
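The training settings listed above map fairly directly onto Ultralytics YOLOv8 training arguments. The sketch below is an illustrative reconstruction, not the authors' configuration: the dataset file name, the up-down flip probability, and the label-smoothing value are assumptions, and argument names follow YOLOv8-era Ultralytics documentation, so some (label_smoothing in particular) may not be accepted by every release. Only `train_args` runs without Ultralytics installed; `train_ensemble` additionally needs the `ultralytics` package and a prepared dataset.

```python
def train_args(seed):
    """Ultralytics YOLOv8 training arguments for one session, following the list in the text.

    Values flagged ASSUMED are not stated in the text and are illustrative only.
    """
    return {
        "data": "peat_amoebae.yaml",  # HYPOTHETICAL dataset file (image paths + 10 class names)
        "epochs": 500,
        "imgsz": 512,                 # images resized to 512 px during training
        "patience": 500,              # >= epochs, so early stopping never triggers
        "seed": seed,                 # new seed per session: new init + new augmentation draws
        "degrees": 180.0,             # (a) random rotation in [-180, +180] degrees
        "scale": 0.2,                 # (b) reduced random re-scaling
        "shear": 10.0,                # (c) shear of up to 10 degrees
        "flipud": 0.5,                # (d) up-down flip; probability ASSUMED
        "fliplr": 0.5,                #     left-right flip (Ultralytics default)
        "mosaic": 0.2,                # (e) mosaic probability
        "mixup": 0.5,                 # (f) mix-up probability
        "label_smoothing": 0.1,       # ASSUMED value; the text only says it was applied
    }


def train_ensemble(n_models=10, size="m", randinit=True):
    """Train n_models independent sessions; return paths to the last-epoch weights."""
    from ultralytics import YOLO  # imported lazily so train_args works without Ultralytics

    last_weights = []
    for seed in range(n_models):
        # RANDINIT: build the architecture from its YAML (random weights);
        # COCOINIT: load the COCO-pretrained checkpoint instead.
        model = YOLO(f"yolov8{size}.yaml" if randinit else f"yolov8{size}.pt")
        model.train(**train_args(seed))
        last_weights.append(model.trainer.last)  # last-epoch checkpoint, not best.pt
    return last_weights
```

Varying only the seed keeps all other settings identical across sessions, so differences between ensemble members come from initialization and augmentation randomness alone, which is the source of diversity described above.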
2.7. Performance evaluation
Performance was assessed by comparing the true bounding boxes with the predicted boxes. An Intersection over Union (IoU) value > 0.5 was required to count a prediction as a positive detection. Precision-recall curves (PR-curves) were constructed by varying the threshold on the confidence value, and the Average Precision (AP) was obtained from these curves with the trapezoidal rule. PR-curves and AP were computed for each class separately, but only for the 3 classes with sufficient instances in the test data. AP integrates model performance across all levels of the predicted confidence value and is useful for comparing models or training strategies. To assess real-world usefulness, we computed precision-confidence and recall-confidence curves separately. These two curves interpret the confidence value in terms of false positives (precision) and missed targets (recall).
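The evaluation described above can be sketched for a single class as follows. IoU and the trapezoidal integration follow the text; the matching procedure (predictions processed by decreasing confidence, each true box matched at most once) is the usual convention but is an assumption here, as is starting the curve at recall 0 and precision 1. The function names are illustrative, and this is not the evaluation code used in the study.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds, truths, iou_thr=0.5):
    """AP for one class: area under the PR curve by the trapezoidal rule.

    preds:  list of (image_id, confidence, box)
    truths: list of (image_id, box)
    ASSUMED matching: predictions sorted by decreasing confidence, each true box
    matched at most once, and a match requires IoU > iou_thr.
    """
    if not truths:
        return 0.0
    matched, hits = set(), []
    for img, _conf, box in sorted(preds, key=lambda p: -p[1]):
        best, best_j = 0.0, None
        for j, (timg, tbox) in enumerate(truths):
            if timg == img and j not in matched:
                v = iou(box, tbox)
                if v > best:
                    best, best_j = v, j
        hit = best > iou_thr
        if hit:
            matched.add(best_j)
        hits.append(hit)

    # lowering the confidence threshold one prediction at a time traces the PR curve
    recall, precision, tp = [0.0], [1.0], 0  # ASSUMED starting point (0, 1)
    for k, hit in enumerate(hits, start=1):
        tp += int(hit)
        recall.append(tp / len(truths))
        precision.append(tp / k)
    return sum((recall[i] - recall[i - 1]) * (precision[i] + precision[i - 1]) / 2
               for i in range(1, len(recall)))


truths = [("a", (0, 0, 10, 10)), ("b", (0, 0, 10, 10))]
preds = [("a", 0.9, (0, 0, 10, 10)), ("b", 0.8, (20, 20, 30, 30)), ("b", 0.7, (0, 0, 10, 10))]
print(average_precision(preds, truths))
```

Here the second-ranked prediction misses, so precision drops to 0.5 before recovering, and the trapezoidal area is about 0.79 rather than 1.0.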
2.8. Final prediction module via ensemble
Based on the model comparison (details in Section 3.3), the final models were YOLOv8 MEDIUM trained with random weight initialization (RANDINIT). The final prediction module uses 10 individual models trained in independent sessions. Each model predicts boxes and confidence values for each class. Predictions with a confidence below 0.01 were suppressed to avoid cluttering the process with many irrelevant boxes. Note that predicted boxes from different models that detect the same object are never perfectly aligned, and there is no built-in way to identify them as belonging to the same object. We therefore developed a post-processing method to associate predictions of the same object. First, the IoU was computed for all pairs of predicted boxes of the same class within an image, and a between-box distance was defined as D = 1 − IoU, so that perfectly overlapping boxes have D = 0 and non-overlapping boxes have D = 1. Second, predicted boxes that were sufficiently close to each other were aggregated via agglomerative hierarchical clustering (AHC) using D as the distance metric. This was done separately for each class and image, so only a small number of boxes was processed by AHC at a time. Confidence values of 0.0 were imputed for missing predictions (i.e. when fewer than 10 predictions were associated with an object). The value 0.0 is the lowest possible and a natural choice when an object was not detected at all by an individual model. Finally, the mean of the 10 confidence values (including imputed zeros, if any) was used as the ensemble score, and the crop was extracted from the prediction with the highest confidence value. To allow efficient decisions by a human expert, the crops were plotted on a single summary image ranked by their ensemble score (highest first, from left to right and then from top to bottom).
Additionally, the ensemble score was color-coded from blue (high score) to red (low score). In this way, samples that produced only low-score crops are easily spotted. Examples of summary images are shown in Section 3.4.
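The association and scoring steps described above can be sketched for the boxes of one class in one image. The text specifies D = 1 − IoU, AHC, the 0.01 suppression threshold, zero imputation, the mean over 10 models, and the max-confidence crop; it does not specify the linkage criterion or the distance cut-off. Complete linkage and a cut-off of 0.5 (every pair of boxes within a cluster overlapping with IoU ≥ 0.5) are therefore assumptions, as is keeping only the best box when one model contributes two boxes to the same cluster. Function names are hypothetical, and the clustering is written out in plain Python for self-containment; `scipy.cluster.hierarchy` would be a practical alternative.

```python
from itertools import combinations

N_MODELS = 10    # ensemble size, as stated in the text
MIN_CONF = 0.01  # individual predictions below this confidence are suppressed (as in the text)
CUTOFF = 0.5     # ASSUMED cut-off on D = 1 - IoU; the text does not give a value


def iou(a, b):
    """Intersection over Union of two boxes (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def complete_linkage(boxes, cutoff):
    """Agglomerative clustering on D = 1 - IoU with complete linkage (ASSUMED linkage).

    Returns a list of clusters, each a list of indices into `boxes`.
    """
    clusters = [[i] for i in range(len(boxes))]
    while len(clusters) > 1:
        best = None
        for p, q in combinations(range(len(clusters)), 2):
            # complete linkage: distance between clusters = largest pairwise D
            d = max(1.0 - iou(boxes[i], boxes[j]) for i in clusters[p] for j in clusters[q])
            if best is None or d < best[0]:
                best = (d, p, q)
        d, p, q = best
        if d > cutoff:
            break  # no remaining pair of clusters is close enough to merge
        clusters[p] += clusters.pop(q)  # q > p, so popping q leaves index p valid
    return clusters


def ensemble_scores(preds):
    """Aggregate (model_id, confidence, box) predictions for ONE class in ONE image.

    Returns (ensemble_score, box) pairs ranked from highest to lowest score.
    """
    preds = [p for p in preds if p[1] >= MIN_CONF]
    results = []
    for members in complete_linkage([p[2] for p in preds], CUTOFF):
        best_per_model = {}
        for i in members:
            model_id, conf, _ = preds[i]
            # ASSUMPTION: if one model contributes several boxes, keep its highest confidence
            best_per_model[model_id] = max(conf, best_per_model.get(model_id, 0.0))
        # models absent from the cluster contribute an imputed 0.0, so divide by N_MODELS
        score = sum(best_per_model.values()) / N_MODELS
        # the crop is taken from the single prediction with the highest confidence
        crop_box = max((preds[i] for i in members), key=lambda p: p[1])[2]
        results.append((score, crop_box))
    return sorted(results, key=lambda r: r[0], reverse=True)
```

Complete linkage was chosen for the sketch because it guarantees that every pair of boxes in a cluster overlaps substantially, which keeps adjacent but distinct shells apart; single or average linkage would be equally defensible under different assumptions.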
3. Results
3.1. Testate amoeba in commercial non-peat samples
All 10 samples (1301 images) of peat-free substrate were searched for testate amoebae. Only a small number of testate amoeba shells were found, and they were generally degraded or masked and thus more difficult to identify than most shells found in surface mosses. The following potential testate amoeba morpho-taxa were found: Centropyxis aerophila-type (N = 10), Euglypha sp. (N = 1), Assulina sp. (N = 1), and Difflugia lucida-type (N = 1). However, these identifications are partly tentative, as the shells were either partly degraded or not clearly visible in the images (e.g. due to their position or because they were masked by other particles). This result suggests that using an automated microscope to increase the number of images would improve the assessment of testate amoeba diversity and abundance in samples.
3.2. Testate amoeba in commercial peat samples
For the descriptive analysis of commercial peat samples (this subsection), Datasets 2 and 3 were pooled. Dataset 1 was not included because it was obtained from unprocessed peat samples.
3.2.1. Archerellae found with active search
With active search, in which full slides were scrutinized, well-preserved Archerella sp shells were found in 22 of 23 commercial peat samples from Datasets 2 and 3 (Figure 4). Between-sample variability was considerable, with Archerella sp counts ranging from 0 to 23 per sample. Overall, 258 well-preserved Archerella sp shells (Dataset 2: 109, Dataset 3: 149) were found in 46 slides examined with active search (Dataset 2: 22, Dataset 3: 24). From this, we estimate a frequency of 5.61 shells per slide in commercial samples (Table 4).

Figure 4. Matrix view of testate amoeba distribution in all commercial peat samples (one sample per row; Dataset 2: 11 samples, Dataset 3: 12 samples). Values are counts of individual shells found with active and grid search. Dataset 2: grid and active search images from different slides. Dataset 3: grid and active search images from the same slides. Count values are color-coded in shades of green for easier reading.
3.2.2. Archerellae found with grid search
With grid search, well-preserved Archerella sp shells were found in 17 out of 23 samples from the pooled Datasets 2 and 3 (Figure 4). Note that with the manual grid search only a small part of each slide was scrutinized (Figure 3A), which explains the smaller counts. Overall, 62 well-preserved Archerella sp shells (Dataset 2: 45, Dataset 3: 17) were found in 2902 grid search images (Dataset 2: 1705, Dataset 3: 1200 images). From this, we estimate a frequency of 0.021 shells per image (Table 5), which is the rate to be expected when automated scanning is applied to commercial samples. Archerella was by far the most frequent testate amoeba in commercial samples. Masked or degraded shells (Archerella degraded) were found less frequently than well-preserved Archerella sp.
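The frequencies above are pooled rates: the total number of well-preserved shells divided by the total number of slides or images searched. A minimal sketch of that arithmetic, using the counts reported in Sections 3.2.1 and 3.2.2; the helper names are hypothetical, and the standard error assumes (as a simplification) Poisson-distributed counts, which ignores the considerable between-sample variability noted above.

```python
import math


def pooled_rate(total_count, total_units):
    """Pooled frequency: total shells found divided by total slides (or images) searched."""
    return total_count / total_units


def poisson_rate_se(total_count, total_units):
    """Approximate standard error of a pooled rate, assuming Poisson-distributed counts."""
    return math.sqrt(total_count) / total_units


# Active search (Section 3.2.1): 109 + 149 well-preserved Archerella shells in 22 + 24 slides.
per_slide = pooled_rate(109 + 149, 22 + 24)
per_slide_se = poisson_rate_se(109 + 149, 22 + 24)

# Grid search (Section 3.2.2): 62 well-preserved Archerella shells in 2902 images.
per_image = pooled_rate(62, 2902)

print(f"active search: {per_slide:.2f} +/- {per_slide_se:.2f} shells per slide")
print(f"grid search:   {per_image:.3f} shells per image")
```

The Poisson standard error is a lower bound on the real uncertainty, since shells are clustered by sample rather than spread evenly across slides.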
3.2.3. Other taxa found with active search
Two other genera, Amphitrema sp and Assulina sp, were consistently found in many commercial samples, but both were much less frequent than Archerella. With active search, Amphitrema shells were found in 13 out of 23 and Assulina shells in 19 out of 23 commercial samples (Figure 4). For both genera, counts were low, with 0 to 5 individuals per sample. For Amphitrema and Assulina, we estimate frequencies in commercial samples of 0.74 and 0.89 shells per slide, respectively (Table 4).
3.2.4. Other taxa found with grid search
With grid search, shells of Amphitrema or Assulina were rarely found (Figure 4). A grid of 100 or 150 images was clearly insufficient to capture enough images of these two genera. However, higher counts can be expected with the higher acquisition rate of automated microscopes.
3.2.5. Rarer taxa
A total of 13 shells of Hyalosphenia papilio, a strict peat indicator, were found with active search across all samples. Again, higher counts can be expected with the higher acquisition rate of automated microscopes. Finally, shells of the five other species were rarely found in commercial samples (Table 4, bottom rows).
3.3. Model performance
Models were trained with the training data (5121 images). Performance was then estimated by predicting the complete test data (2415 images: grid search and active search C, L, R; see Table 2 for details). Only three classes had enough instances in the test set to estimate the average precision (AP), so only these are reported. Figure 5 gives a graphical overview of the results reported below. The number of test instances was moderate for Archerella (N = 460) and small for Amphitrema (N = 62) and Assulina (N = 60). This is reflected in the variance between individual models (blue dots), which is smaller for Archerella than for the two other classes. First, for all model sizes and training procedures, the ensemble prediction (red dots) clearly and consistently outperformed the individual models. Weight initialization (RANDINIT vs COCOINIT) had no detectable impact on performance. The smallest model, NANO, performed worse than the three larger models, while SMALL, MEDIUM, and LARGE performed comparably. For the ensemble models, AP was clearly above 0.8 for Archerella. For the three larger ensemble models, AP was around 0.7 for Amphitrema and around 0.8 for Assulina. The MEDIUM-RANDINIT model was chosen as the final model and used in the final prediction module to process the test data. The performance of individual models was stable across sessions. Notably, the behavior of the score, represented by the shape of the PR curves, was also stable across individual models (Figure 6). AP is reported by convention and is useful for comparing individual models with the ensemble, but it says little about practical usefulness. Therefore, separate precision and recall curves are provided in the next section.
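AP summarizes a precision-recall curve in a single number. The exact evaluation protocol used here (box-matching criterion, interpolation scheme) is not restated in this section, so the following is only a sketch of the common all-point interpolated (Pascal VOC-style) AP, assuming detections have already been matched to ground-truth boxes and flagged as true or false positives. The function name `average_precision` is illustrative and not part of any particular library.

```python
import numpy as np


def average_precision(scores, is_tp, n_ground_truth):
    """All-point interpolated AP for one class.

    scores: confidence score of each detection
    is_tp: True where the detection was matched to a ground-truth box
    n_ground_truth: number of annotated boxes (missed boxes lower the recall)
    """
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    tp = np.asarray(is_tp, dtype=bool)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(~tp)
    recall = cum_tp / n_ground_truth
    precision = cum_tp / (cum_tp + cum_fp)

    # Add sentinels at recall 0 and 1, then take the precision envelope
    # (highest precision achievable at any recall >= r).
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]

    # Area under the stepwise envelope: recall increments times precision.
    return float(np.sum((r[1:] - r[:-1]) * p[1:]))


# Toy example: three detections, two of them correct, four ground-truth boxes.
print(average_precision([0.9, 0.8, 0.7], [True, False, True], n_ground_truth=4))
```

Computed per class from one model's detections or from ensemble scores, a function like this yields numbers of the kind shown in Figure 5, although implementations differ in detail (COCO-style evaluation, for example, uses 101-point interpolation).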

Figure 5. Average Precision (AP) of the models individually (blue dots) and the ensemble (red dots) for several model sizes and training procedures.

Figure 6. Precision-recall curves were obtained by predicting all test images with the 8 models individually (blue lines) and the ensemble (red line). Test data from grid and active search pooled (2415 images). Nbox (GT): number of annotated ground truth boxes.
3.4. Illustration of module with test data
This section shows the summaries that were automatically generated by the prediction module, together with estimates of detection performance given as precision and recall (Figures 7 and 8). It illustrates a possible practical application of the module with test-set data (Figures 9 to 15); more conventional performance metrics were reported in the previous section. The whole process, including prediction by an ensemble of eight models and post-processing, took 581 seconds for 1872 images on a GPU-accelerated desktop computer (HP Z4 workstation with NVIDIA A6000 GPU). This corresponds to 0.31 seconds per image, or approximately 5 minutes for 1000 images.
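The throughput figures follow from simple arithmetic. A minimal sketch, assuming processing time scales linearly with the number of images on the same hardware and with the same eight-model ensemble (function names are hypothetical):

```python
def seconds_per_image(total_seconds, n_images):
    """Average processing time per image from one timed batch."""
    return total_seconds / n_images


def projected_hours(n_images, sec_per_image):
    """Projected wall-clock time, assuming linear scaling with the number of images."""
    return n_images * sec_per_image / 3600


spi = seconds_per_image(581, 1872)  # timed batch reported above
print(f"{spi:.2f} s per image")
print(f"{projected_hours(1000, spi) * 60:.1f} min for 1000 images")
print(f"{projected_hours(100 * 1000, spi):.1f} h for 100 samples of 1000 images")
```

Real runs may deviate from this projection if image size, disk I/O, or post-processing cost differ between batches.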

Figure 7. Precision and recall vs ensemble score obtained with the grid search test data (1600 images). Shown only for Archerella, because Amphitrema and Assulina had too few instances in this subset. These curves can be used to estimate the expected errors when a given threshold is applied to the ensemble score. The corresponding crops with ensemble scores are shown in Figure 10.

Figure 8. Same as Figure 7 but obtained with 1872 images of test data from grid and active search. The corresponding crops with ensemble scores are shown in Figures 11, 13, and 15.

Figure 9. Ground truth in the test set for N = 17 Archerella sp from grid search. Order is arbitrary. The 15 individuals that were detected are framed in green.
Archerella grid only: In the first evaluation, only the 1600 grid search test images were passed through the prediction module. In these images, Archerella shells were rare, at about 1% (17 individuals in 1600 images). The 17 true Archerella shells are shown in Figure 9. We consider this a realistic scenario because the images result from a grid search. It is also a difficult challenge, with only 17 targets hidden in 1600 images containing many items of different shapes and sizes (mostly more or less degraded plant remains). The prediction module detected a total of 208 crops (Archerella candidates). Using a detection threshold of 0.5 on the ensemble score, we obtained a recall of 0.71 and a precision of 1.00 (Figure 7). In other words, 71% of all true Archerella specimens were in the detected set and 29% were missed, while the detected set contained only true Archerella and no false positives. Using the lowest detection threshold of 0.0, 88% of individuals (15 out of 17) were detected (recall: 15/17 = 0.88), at the cost of a lower precision of 0.07. The first row of the summary (12 crops) contains exclusively true Archerella (Figure 10). In the second to fourth rows, three more crops were true Archerella. All other crops, from the fifth row to the bottom, were not Archerella, and most had an ensemble score close to 0 (see supplement).
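The precision and recall values at a given threshold follow from counting the crops whose ensemble score is at or above it. A minimal sketch, assuming each crop has already been labeled as a true Archerella or not; the scores below are invented placeholders (not the study's data), chosen only so that the counts match the grid-only evaluation: 208 crops, 17 ground-truth shells, 12 true crops at or above 0.5 and 3 more below it. The function name is hypothetical.

```python
def precision_recall_at(scores, is_true, n_ground_truth, threshold):
    """Precision and recall when every crop with score >= threshold is accepted."""
    accepted = [truth for score, truth in zip(scores, is_true) if score >= threshold]
    n_true = sum(accepted)  # True counts as 1, False as 0
    # Convention: precision is reported as 1.0 when no crop is accepted.
    precision = n_true / len(accepted) if accepted else 1.0
    recall = n_true / n_ground_truth  # shells never proposed as crops count as misses
    return precision, recall


# Placeholder scores reproducing the grid-only counts (208 crops, 17 true shells).
scores = [0.9] * 12 + [0.3] * 3 + [0.05] * 193
is_true = [True] * 12 + [True] * 3 + [False] * 193

for threshold in (0.5, 0.0):
    p, r = precision_recall_at(scores, is_true, n_ground_truth=17, threshold=threshold)
    print(f"threshold {threshold}: precision {p:.2f}, recall {r:.2f}")
```

Sweeping the threshold from 0 to 1 with such a function traces precision and recall curves of the kind shown in Figures 7 and 8.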

Figure 10. Automatically detected candidates of Archerella sp ranked by ensemble score (only top rows shown). Obtained from 1600 grid search images of the test set. The correctly detected shells are framed in green. Uncertain but likely detections are framed in yellow.
Archerella grid and active: In the second evaluation, the grid and active search (center only) test data were passed through the prediction module. The frequency of occurrence of Archerella per image was higher, at 9% (166 individuals in 1872 images). The prediction module detected a total of 450 crops (candidates), each with an ensemble score to which a decision threshold can be applied. Using a detection threshold of 0.5 on the ensemble score, we obtained a recall of 0.76 and a precision of 0.91 (Figure 8). Using the lowest detection threshold of 0.0, we obtained a recall of 0.96 at the cost of a lower precision of 0.35. In the first six rows of the summary (Figure 11, full image in supplement), all top-ranked crops clearly belong to Archerella. Note that two items marked with a yellow frame were labeled as Archerella degraded because they were partially masked. For all practical purposes, these are not false positives; they are also what causes the bump in the precision curve in Figure 8. This illustrates the capacity of the system to extract and rank the most relevant items.

Figure 11. Automatically detected candidates of Archerella sp ranked by ensemble score (only top rows shown). Obtained from the 1600 grid and 272 active search images of the test set. Two items that were labeled as Archerella degraded due to masking are framed in yellow; the remaining 73 crops are correctly detected shells.
Amphitrema grid and active: Amphitrema shells were rare, with only about 1.3% of images containing one (24 individuals in 1872 images), which represents a difficult detection challenge. The 24 true Amphitrema shells are shown in Figure 12. The prediction module detected a total of 135 crops (candidates). Using a detection threshold of 0.5, we obtained a recall of 0.33 and a precision of 0.88 (Figure 8). Using the lowest detection threshold of 0.0, we obtained a recall of 0.79 at the cost of a lower precision of 0.14. In the first row of the summary, 8 out of 9 crops were true Amphitrema, and in the first three rows, 16 out of 28 crops were true Amphitrema (Figure 13).

Figure 12. Ground truth in the test set for N = 24 Amphitrema sp from grid search (top) and active search (bottom). Order is arbitrary. The 19 individuals that were detected are framed in green.

Figure 13. Automatically detected candidates of Amphitrema sp ranked by ensemble score (only top rows shown). Obtained from 1600 grid and 272 active search images of the test set. The correctly detected crops are framed in green.
Assulina grid and active: Assulina was rare, with only about 1.0% of images containing a shell (19 individuals in 1872 images), which represents a difficult detection challenge. The 19 true Assulina shells are shown in Figure 14. The prediction module detected a total of 92 crops (Assulina candidates). Using a detection threshold of 0.5, we obtained a recall of 0.79 and a precision of 0.94 (Figure 8). Using the lowest detection threshold of 0.0, we obtained a recall of 1.00 at the cost of a lower precision of 0.21. In the first row of the summary, 8 out of 11 crops were true Assulina, and in the first three rows, 17 out of 34 crops were true Assulina (Figure 15).

Figure 14. Ground truth in the test set for N = 19 Assulina sp from active search (no individuals were found with grid search). Order is arbitrary. All 19 individuals were detected and are framed in green.

Figure 15. Automatically detected candidates of Assulina sp ranked by ensemble score (only top 5 rows shown). Obtained from 1600 grid and 272 active search images of the test set. The correct detections are framed in green.
4. Discussion
4.1. Shells preserved in commercial peat products
The shells of several peat-specific testate amoeba taxa were present in commercially processed substrates labeled as peat. For two of the smaller peat-specific taxa (Archerella and Amphitrema), well-preserved specimens were found in many samples. These two taxa can therefore be used as indicators of peat presence in commercially processed products. Another taxon, Assulina, which is common in peatlands but also frequent in acidic soils outside of peatlands (e.g. coniferous forest), was also frequently found in commercial substrates. This further confirms the resistance of small testate amoeba shells to industrial processes. This observation is not surprising, as Assulina is among the testate amoeba taxa most resistant to degradation; it even survives the HF treatment used for pollen preparation (Payne et al., Reference Payne2012). A few shells of larger taxa (Hyalosphenia sp) were found, but they were infrequent and often degraded. For this reason, they seem less suitable as indicators of peat in commercial products. However, if a larger number of images can be acquired automatically, we can expect to find enough well-preserved Hyalosphenia sp shells to use them as indicators as well. Based on these findings, we are confident that the automated detection of peat-specific testate amoebae is a valid method to detect peat in substrates.
4.2. Commercial non-peat products
A total of 400 images from commercial media labeled as peat-free were carefully checked. Several specimens of Centropyxis aerophila-type and one Difflugia lucida-type were found. Although these taxa can occur in peatlands, they are also frequent in other habitats, including caves (Yuri et al., Reference Yuri2012). The samples also contained many fungal spores and some conifer pollen. Finding several testate amoeba specimens is encouraging, as it shows that shells can be recovered from such samples. The absence of the typical peat indicators can be interpreted as evidence that the samples were indeed peat-free. In addition, the absence of Assulina further indicates that the samples did not even come from acidic litter, such as that collected in coniferous forests. This taxon would not be expected in a non-acidic substrate, as it is a clear indicator of low pH (Bonnet, Reference Bonnet1991).
4.3. DNN models trained with small data
An ensemble of deep neural networks for object detection was successfully trained with a small dataset dedicated to a very specific microbiological application. Detection performance was high for the three classes that could be evaluated. The final prediction module extracts small images centered on candidate amoebae (crops), together with ensemble scores that are used to rank the most relevant candidates first. This shows that mature object detection models are now available and can be successfully adapted to highly specialized domains. A large part of the development work was fine-tuning the data augmentation procedure for these specific data. The most time-consuming and cumbersome part was the creation and curation of the datasets. This suggests that the bottleneck for developing useful DNN applications is now the availability of curated and annotated data rather than the algorithms.
4.4. Feasibility of large-scale analysis
In commercial samples, only about 2% of images contained an Archerella shell (Table 5). When present, the shell covers much less than 1% of the image and is embedded in residues (example in Figure 2). To obtain a clear statement regarding peat presence in each sample, we propose searching at least 1000 images per sample (in which we would expect approximately 20 Archerella shells). Doing this manually is tedious, time-consuming, and unrealistic for a large-scale analysis of hundreds of samples. Our proposed decision support process automates the tedious detection and extraction steps and produces a concise summary: a ranked list of Archerella candidates in which the most promising items are shown first. The final step, manual confirmation and counting by a human operator, is expected to take at most a few minutes per sample. The proposed process therefore significantly reduces the human effort required.
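The suggested 1000 images per sample can be motivated with a simple model. A minimal sketch, assuming shells are scattered independently across images at the pooled grid search rate of about 0.021 Archerella shells per image (a Poisson model) and that a fraction `recall` of the shells present is detected (e.g. the 0.76 observed at a threshold of 0.5). Both assumptions are simplifications: counts varied considerably between samples, so peat with fewer shells would need more images. Function names are hypothetical.

```python
import math


def expected_detections(n_images, rate_per_image, recall=1.0):
    """Expected number of detected shells in n images."""
    return n_images * rate_per_image * recall


def prob_no_detection(n_images, rate_per_image, recall=1.0):
    """Poisson probability of detecting zero shells in n images."""
    return math.exp(-expected_detections(n_images, rate_per_image, recall))


def images_needed(max_miss_prob, rate_per_image, recall=1.0):
    """Smallest n for which P(zero detections) <= max_miss_prob."""
    return math.ceil(-math.log(max_miss_prob) / (rate_per_image * recall))


rate = 0.021  # pooled grid search rate, Archerella shells per image
print(expected_detections(1000, rate))             # about 21 shells
print(prob_no_detection(1000, rate, recall=0.76))  # chance of missing peat entirely
print(images_needed(0.01, rate))                   # images for a 1% miss risk
```

Under these assumptions a sample at the pooled rate would almost never yield zero detections in 1000 images; the extra margin matters for samples at the low end of the observed range.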
Our decision support system is an important step toward enabling the large-scale monitoring of commercial substrates from multiple sources. Consider, for instance, a scenario with 100 samples, each with 1000 images. At 0.31 seconds per image, processing all 100′000 images would take around 8.6 hours on a single GPU-accelerated desktop computer. In addition, the process is reproducible and reliable because the final prediction module incorporates the expertise of highly specialized domain experts (CV, EM, CD) via carefully curated datasets. This would allow the decision support process to be safely performed by less experienced staff after brief training. The process is also transparent, because the generated crops can be stored for each sample as evidence, allowing independent review of the decision by third parties if needed.
The decision support system requires many images per sample, which must be acquired by systematically scanning all the material on the slide. In this study, we did this manually, but this is not feasible for a large-scale analysis as proposed above. Thus, an automated scanning microscope will be needed. Note, however, that this would only accelerate image acquisition itself, not the preliminary lab work of obtaining sample material and preparing the slides.
4.5. Strengths and limitations
Our method is strong enough to demonstrate (rule in) the presence of peat. That is, if peat-specific testate amoeba shells are found, this represents strong evidence that the substrate contains peat. In addition, the method automatically provides supporting evidence in the form of crops, which can be interpreted by human experts. However, the frequency of peat-specific amoebae varies between samples, and we must assume that these taxa could be absent from peat of certain provenances. In that case, our method will naturally not work. The method is thus weaker at ruling out the presence of peat: if no peat-specific shells are found, the substrate could still contain peat. Detection of testate amoebae is likely to work best with raised bog peat, as testate amoebae and other remains are better preserved there than in fen or blanket bog peat. Most commercial peat for horticulture is extracted from raised bogs due to its favorable properties (lower degradation, light weight, higher water holding capacity). Therefore, our method is likely to be effective for many, but not all, commercial substrates.
Some commercial growing media contain recycled substrate with peat. Such substrates are less problematic from a sustainability point of view. We observed that two indicator taxa are robust to commercial processing; their shells will probably also survive recycling, so our method might detect them. Fundamentally, this would not be a limitation of the method per se but rather a consequence of its good detection performance. Note also that any detection method for peat will probably suffer from the same limitation.
4.6. Extension to other microorganisms
The present results show that deep neural networks for object detection can be adapted to effectively detect soil microorganisms in a very specific setting. The task was challenging due to the degraded state of the amoeba shells in commercial products and their low frequency of occurrence. Adding to the difficulty, the shells were embedded in plant residues and particles with diverse shapes and textures. This is likely to be the case in similar applications involving images from natural soil samples or products derived from them. Despite these complications, well-performing models could be trained without relying on pre-trained models and with a rather small development dataset of a few thousand images. Manual image acquisition was a major part of the workload. Fortunately, in future applications, this can be done with automated microscopy devices, which are expected to reduce the human workload considerably. Data annotation (bounding boxes) could be done efficiently thanks to user-friendly tools that can be used collaboratively via the web. We have also shown that a few thousand images, together with a carefully tuned data augmentation strategy, are sufficient to obtain actionable and useful models. We believe that this will carry over to future projects and thus keep the annotation burden within reasonable bounds. Finally, model selection and fine-tuning of the training procedure were time-consuming. Fortunately, the developed procedure can be reused for new classes, so this work need not be repeated. Based on these insights, extending the model to more classes of microorganisms seems feasible. For instance, models could be built that detect many taxa directly in soil samples. This would open new possibilities for large-scale assessments of soil biodiversity or the continuous monitoring of soil health at restoration sites.
4.7. Quantification of peat proportion
Horticultural growing substrates often contain a mix of peat and other materials (e.g. compost). It would be useful to estimate the proportion of peat in such mixtures. The present results (Table 4 and Figure 4) show that Archerella sp was consistently present in natural peat from Europe. Another strict peat indicator, Amphitrema sp, was found in about half of the peat samples despite the small number of images available per sample. If a larger number of images per sample (>1000) can be gathered via automated microscopy, we can expect to find enough shells of peat indicator taxa to estimate the shell density in each sample accurately (e.g. number of shells per mg of substrate). Provided that the shell density in pure peat and its natural variation are known, this could be used to estimate the proportion of peat in the mixture. The natural variation will obviously add uncertainty to this estimate. The results obtained with a rather small number of images are quite promising. As a next step, it would therefore be interesting to assess the natural frequency of these taxa, especially Archerella sp and Amphitrema sp, in peat extracted from many European provenances.
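The proposed quantification amounts to a ratio of shell densities. A minimal sketch under loudly stated assumptions: shell density is taken to scale linearly with peat mass fraction, counts are Poisson, and the natural variation of pure-peat density is summarized by a coefficient of variation. The function name and all numbers in the example are hypothetical, not measured values.

```python
import math


def peat_fraction(shells_counted, mg_examined, density_pure_peat, cv_pure_peat):
    """Estimate the peat mass fraction of a mixture and its approximate standard error.

    shells_counted: indicator shells found in the examined material
    mg_examined: dry mass of substrate examined (mg)
    density_pure_peat: assumed mean shell density of pure peat (shells per mg)
    cv_pure_peat: assumed coefficient of variation of that density between peat sources
    """
    if shells_counted <= 0:
        raise ValueError("no shells counted: the fraction cannot be estimated this way")
    density_mix = shells_counted / mg_examined
    fraction = density_mix / density_pure_peat
    # First-order (delta-method) relative error: Poisson counting noise plus
    # natural variation of the reference density, treated as independent.
    relative_error = math.sqrt(1.0 / shells_counted + cv_pure_peat ** 2)
    return fraction, fraction * relative_error


# Hypothetical example: 20 shells in 2 mg of mixture, reference density 20 shells/mg, CV 30%.
fraction, se = peat_fraction(20, 2.0, density_pure_peat=20.0, cv_pure_peat=0.3)
print(f"estimated peat fraction: {fraction:.2f} +/- {se:.2f}")
```

With a realistic natural variation, the reference-density term quickly dominates the counting term, which is why characterizing shell density across many peat provenances would be the key next step.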
5. Conclusion
• Peat-specific testate amoebae were recovered from commercial substrates containing peat. Two taxa, Archerella and Amphitrema, are robust indicators of peat presence in such products.
• Deep neural networks were successfully trained and tested with a small application-specific dataset. This illustrates the maturity of these algorithms for real-world applications.
• We propose a decision process in which large image collections are automatically batch processed and human experts can quickly review a ranked list of small crops to make the final decision.
• Our method enables effective large-scale monitoring of peat presence in commercial substrates from multiple sources.
• To make the method practical, an automated image acquisition procedure will be necessary.
Open peer review
To view the open peer review materials for this article, please visit http://doi.org/10.1017/eds.2025.15.
Acknowledgements
We are very grateful to Gisela Umbricht for precious support with project administration and to Laura Tschümperlin for many fruitful discussions.
Author contribution
Conceptualization: L.M.; E.M.; S.Z. Methodology: S.Z.; E.M.; L.M. Formal analysis: S.Z. Funding acquisition: L.M. Project administration: L.M. Investigation: C.V.; C.D.; E.M. Data curation: C.V.; S.Z.; C.D. Writing - original draft: S.Z. Writing - review and editing: S.Z.; C.V.; L.M.; E.M. All authors approved the final submitted draft.
Competing interests
The authors declare none.
Data availability statement
The data (images and annotations) can be downloaded from https://zenodo.org/records/14609759
Ethical statement
The research meets all ethical guidelines, including adherence to the legal requirements of the study country.
Funding statement
Commissioned by Federal Office for the Environment (FOEN), CH-3003 Bern. The FOEN is an agency of the Federal Department of the Environment, Transport, Energy and Communications (DETEC). Disclaimer: This study was prepared under contract to the Federal Office for the Environment (FOEN). The contractor bears sole responsibility for the content.