Impact Statement
We present a new data analysis framework based on causality and explainability to help farmers adopt sustainable alternatives to traditional practices for agricultural management. The framework makes agricultural management more practical and trustworthy by providing clear, reliable predictions, advice tailored to specific fields, and impact assessment of recommended actions. In our example, this could lead to less reliance on harmful pesticides, helping to protect the environment and fight climate change. With this tool, farmers can make better-informed decisions that benefit their crops and the planet, promoting a healthier and more sustainable future.
1. Introduction
Digital agriculture integrates agricultural expertise with digital technologies, such as remote sensing, IoT, and data analytics, to effectively leverage diverse data sources like satellite imagery, weather forecasts, and soil health metrics. This approach promotes more sustainable, resilient, and profitable farming by enabling data-driven decisions across the agricultural value chain (Basso and Antle, 2020). It is also essential for adapting agriculture to our rapidly changing climate and mitigating its impact on climate change (Balasundram et al., 2023). Artificial Intelligence (AI) serves digital agriculture as the means to transform data into insights, estimations, forecasts, and recommendations that support decision-making and balance agriculture’s environmental, societal, and economic aspects. However, digital agriculture has remained largely confined to correlation-based AI, which excels at predictive tasks but cannot go further. In this context, we propose that digital agriculture exploit two underutilized branches of AI: causality and explainability. They can unlock capabilities beyond the continuous pursuit of prediction accuracy, provided that agricultural knowledge and practice are considered and integrated into the modeling and inference steps (Sitokonstantinou et al., 2024). Thus, causality and explainability bring to digital agriculture domain-aware robust models, explainable predictions, counterfactual reasoning, and the quantification of the effects of advice, action, and policy.
Pest management is a quintessential example in this context, demonstrating the valuable contributions that causality and explainability offer. Conventional pest management has been shown to contribute to climate change. Rising temperatures, intensifying ultraviolet radiation, and reduced relative humidity are expected to increase pest outbreaks and undermine the efficacy of pest control methods like host-plant resistance, bio-pesticides, and synthetic pesticides (Sharma and Prabhakar, 2014; Skendžić et al., 2021). Despite climate experts’ warnings, pesticide use in agriculture adversely affects public health (Boedeker et al., 2020) and contributes to the climate crisis. This impact includes: (i) greenhouse gas (GHG) emissions from pesticide production, packaging, and transportation (Audsley et al., 2009), (ii) compromised soil carbon sequestration (Xu et al., 2020), (iii) elevated GHG emissions from soil (Spokas and Wang, 2003; Marty et al., 2010; Heimpel et al., 2013), and (iv) contamination of adjacent soil and water ecosystems, resulting in biodiversity loss (Sharma et al., 2019).
Thus, a vicious cycle has been established between pesticides and climate change (Sharma et al., 2022). In response, the European Commission (EC) has taken action to reduce the use of all chemical and high-risk pesticides by 50% by 2030. Achieving such reductions requires adopting integrated pest management (IPM), which promotes sustainable agriculture and agroecology. IPM consists of eight principles inspired by the Food and Agriculture Organization (FAO) description. Barzman et al. (2015) condense these principles into prevention and suppression, monitoring, decision-making, non-chemical methods, pesticide selection, reduced pesticide use, anti-resistance strategies, and evaluation.
Data-driven methods have played a crucial role in optimizing pest management decisions. Some studies employ supervised machine learning techniques, such as Random Forests and Artificial Neural Networks (ANNs), satellite Earth observations, and in-situ data for pest presence prediction (Aparecido et al., 2019; Zhang et al., 2019). Others extend their models to include weather data (Skawsang et al., 2019). Recurrent Neural Networks (RNNs) capture temporal features from weather data, effectively handling unobservable counterfactual outcomes (Xiao et al., 2019). Iost Filho et al. (2022) highlight the extraction of fine-scale information for IPM using meteorological data, insect scouting records, machine learning, and remote sensing. Nanushi et al. (2022) propose an interpretable machine learning solution integrating numerical weather predictions, vegetation indices, and trap catch data for estimating Helicoverpa armigera presence in cotton fields. This approach enhances the decision-making aspect of IPM, shifting away from traditional threshold-based pesticide applications. The interpretability of these predictions enhances trust and allows for incorporating domain expertise in pest management decision-making.
2. Proposal
As Barzman et al. (2015) point out, threshold-based and “spray/don’t spray” advice is not enough. There is a need for a new class of digital tools that consider the entire set of IPM principles to truly enhance decision-making. In this direction, we propose a data analysis framework for IPM based on causality and explainability. It consists of short-term actionable advice for in-season interventions and long-term advice for supporting strategic farm planning (Figure 1).

Figure 1. Causal and explainable data analysis framework for enhanced IPM.
This way, we will upgrade the monitoring and decision-making IPM principles, leading to actionable advice for direct pest control interventions and assisting the selection of practices relevant to other IPM principles, such as the use of non-chemical methods and reduced pesticide dosage. Additionally, the proposed framework will better inform farmers about the potential impact of practices that, in turn, will enhance the IPM principle of prevention and suppression, for example, crop rotation, day of sowing, and no-tillage. Furthermore, our framework employs observational causal inference to continuously assess the recommendations above and satisfy the IPM principle of evaluation.
In this study, we apply the proposed framework, demonstrating its applicability and efficiency in a case study for pest management. While the case study is specific, it represents the general case of pest management in several crops and conditions, and the typical availability of data for such case studies.
3. Data
Our approach leverages diverse data sources to capture a comprehensive picture of past, present, and future agro-environmental conditions. This will enable us to improve the modeling and comprehension of pest dynamics.
3.1. Earth observations
We leverage biophysical and biochemical properties such as Leaf Area Index (LAI), Normalized Difference Vegetation Index (NDVI), chlorophyll content, as well as data on evapotranspiration and soil moisture. These factors play a crucial role in monitoring pest population dynamics. The data is derived from the Sentinel-1/2 and Terra/Aqua (MODIS) satellite missions that provide open access to optical multi-spectral and Synthetic Aperture Radar (SAR) images.
3.2. Terrain & soil characteristics
We incorporate data from open-access digital elevation models and information on topsoil physical properties and soil organic carbon content (de Brogniez et al., 2015; Ballabio et al., 2016). This allows us to include fixed or long-term characteristics specific to the area of interest.
3.3. Numerical weather predictions (NWP) and reanalysis of environmental datasets
Any high spatial resolution weather forecast can be used. We utilize a custom configuration of WRF-ARW (Skamarock et al., 2019) at a spatial resolution of 2 km. Hourly predictions are made, and for each trap location (i.e., where we have measurements about pest abundance), we obtain daily values for air (2 m) and soil temperature (0 m), relative humidity (RH), accumulated precipitation (AP), dew point (DP), and wind speed (WS). These parameters have been widely used in related work and are extremely valuable for learning from past (reanalysis) and future (NWP) pest states.
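The step from hourly forecasts to daily values per trap location can be sketched as follows. This is a minimal illustration only: the record keys (`time`, `t2m`, `rh`, `precip`) and the choice of daily mean, minimum, maximum, and sum as aggregates are assumptions, since the text does not specify how the daily values are derived.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_summary(hourly):
    """Collapse hourly records for one trap location into daily values.

    `hourly` is a list of dicts with hypothetical keys:
    time (datetime), t2m (deg C), rh (%), precip (mm fallen in that hour).
    """
    by_day = defaultdict(list)
    for rec in hourly:
        by_day[rec["time"].date()].append(rec)

    out = {}
    for day, recs in sorted(by_day.items()):
        out[day] = {
            "t2m_mean": mean(r["t2m"] for r in recs),
            "t2m_min": min(r["t2m"] for r in recs),
            "t2m_max": max(r["t2m"] for r in recs),
            "rh_mean": mean(r["rh"] for r in recs),
            "ap": sum(r["precip"] for r in recs),  # accumulated precipitation
        }
    return out


# Toy input: 24 hourly records for a single day (values are made up).
hourly = [
    {
        "time": datetime(2021, 7, 1, h),
        "t2m": 20.0 + h % 12,
        "rh": 50.0,
        "precip": 0.5 if h < 4 else 0.0,
    }
    for h in range(24)
]
daily = daily_summary(hourly)
```

The same function could be applied to reanalysis records for past dates and to forecast records for future dates, so that both enter the models in the same daily format.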
3.4. In-field measurements
In-field measurements involve ground observations of pest abundance using pheromone traps specifically designed for monitoring the cotton bollworm, known by the scientific name Helicoverpa armigera (H. armigera). These traps contain the active ingredients Z-11-hexadecen-1-al and Z-9-hexadecenal. The traps are used from the beginning of the first generation until the end of the season, with regular replacement every 4 to 6 weeks. The company Corteva Agriscience Hellas has established a dense (in time and space) trap network (Figure 2) that covers almost all areas in the Greek mainland where cotton is cultivated. The traps are strategically positioned at suitable distances from each other to prevent interference and ensure accurate data collection. An agronomist examines the traps and counts the trapped insects at regular intervals every 3–5 days. Corteva Agriscience Hellas provides historical data consisting of 398 trap sequences and 8202 unique data points from 2019 to 2022 (Table 1). They also provide auxiliary data on pesticide application, potential crop damage from pests, the severity of the damage, trap replacements, and scouter comments.

Figure 2. Trap distribution in the Greek mainland for 2019–2022. Colors indicate the different agroclimatic zones to which the traps in the dataset belong. These zones have been identified based on the study conducted by Ceglar et al. (2019).
Table 1. Summary of trap data

4. Approach and methods
4.1. Causal graph for representing domain knowledge
We constructed a causal graph (Figure 3) based on domain knowledge and expertise, denoted as
$ G $
, that represents the underlying causal relationships within the pest-farm ecosystem for the H. armigera case. The graph
$ G $
comprises vertices
$ V $
, which represent the variables in the system, and directed edges
$ E $
, which symbolize the cause-and-effect relationships between these variables. Besides helping us articulate domain knowledge, the causal graph
$ G $
will benefit the downstream technical analyses in various ways. For instance,
$ G $
will be employed for effect identification via graphical tests (Pearl, 2009), where the structure of
$ G $
is integral to discerning causal relationships. Conversely, in the case of estimating conditional average treatment effects within the potential outcomes framework,
$ G $
will be utilized as a conceptual guide for considering causal structures during the control phase. In invariant causal prediction, the graph will facilitate the construction of an accurate list of invariant features using causal parents of the target outcome. Moreover, the structural knowledge captured in
$ G $
could benefit invariant learning methods by guiding the environment
$ E $
definition. This diverse and tailored incorporation of
$ G $
is aimed at optimizing the utilization of domain knowledge according to the specifications and objectives of each analytical technique.

Figure 3. Causal graph of a pest-farm ecosystem for Helicoverpa armigera case.
Specifically, in the current case of the pest-farm ecosystem of H. armigera, various biotic and abiotic factors (Table 2) can influence the population dynamics
$ Y $
of H. armigera (Sharma et al., 2012). Temperature
$ T $
plays a crucial role, affecting the insect’s growth, development, fecundity, and survival (Howe, 1967). The size
$ SG $
of the first generation is related to the size of the second generation, and the Southern Oscillation Index
$ SOI $
has a significant correlation with the size of the first spring generation (Maelzer and Zalucki, 1999, 2000). Additionally, the life cycle
$ LC $
of H. armigera is temperature-dependent, with completion occurring between 17.5°C and 32.5°C (Mironidis and Savopoulou-Soultani, 2014). The presence of parasitoids and natural enemies in cotton cultivation is crucial to many IPM programs, including the control of H. armigera (Pereira et al., 2019). Many egg parasitoids of different families are known for their high parasitism
$ P $
rates and their effectiveness in reducing the population of H. armigera (Noor-ul-Ane et al., 2015). Nevertheless, parasitism rates are influenced by temperature and relative humidity (Kalyebi et al., 2005; Noor-ul-Ane et al., 2015). Moreover, the efficacy of spray application
$ Sp $
also impacts population dynamics (Wardhaugh et al., 1980). The efficacy of
$ Sp $
is significantly influenced by the plant growth stage
$ PGS $
. During the seedling stage, limited leaf surface area reduces spray coverage, while the vegetative stage offers more extensive leaf area, enhancing spray interception. However, dense canopies at later stages may impede spray penetration. Plant physiology also varies, affecting the absorption and translocation of sprayed substances (Fishel and Ferrell, 2010).
Table 2. Pest-farm ecosystem variables

Other environmental factors come into play as well. Precipitation
$ \mathit{Pr} $
affects the population size, with heavy precipitation leading to a decrease in the population (Ge et al., 2003). It also increases soil water content
$ SW $
, which affects the emergence rate of H. armigera similarly to air relative humidity
$ RHa $
(Fajun et al., 2003). The presence of fruiting organs during the plant growth stage
$ PGS $
is important for population dynamics, as they serve as oviposition sites for females (Fitt, 1989). Crop variety
$ V $
, such as transgenic Bt cotton, can suppress the second generation of H. armigera, while both different cropping systems
$ CS $
and adjacent crops
$ AC $
can influence the population structure (Wardhaugh et al., 1980; Gao et al., 2010; Lu et al., 2013). Finally, wind
$ W $
and wind direction play a significant role in the emergence of H. armigera, influencing the distance covered during migration from nearby locations. Additionally, wind conditions at the time of spraying
$ {W}_s $
can also impact the effectiveness of the intervention. These various factors collectively shape the population dynamics of H. armigera in a complex and interconnected manner as defined through domain knowledge and depicted in the causal graph (Figure 3).
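To make the role of the graph in downstream analyses concrete, the relationships described above can be encoded as a small directed graph and queried for causal parents and ancestors. The edge list below is only one reading of the text (for example, the edge from life cycle to population dynamics is an assumption), and the actual edge set of Figure 3 may differ; the helper names are hypothetical.

```python
# Edges (cause -> effect) transcribed from the relationships described in
# the text. This is an illustrative reading, not the definitive Figure 3.
EDGES = [
    ("T", "Y"), ("T", "LC"), ("LC", "Y"),          # temperature, life cycle
    ("SOI", "SG"), ("SG", "Y"),                    # generation sizes
    ("T", "P"), ("RHa", "P"), ("P", "Y"),          # parasitism
    ("PGS", "Sp"), ("Ws", "Sp"), ("Sp", "Y"),      # spray efficacy
    ("Pr", "Y"), ("Pr", "SW"), ("SW", "Y"), ("RHa", "Y"),
    ("PGS", "Y"), ("V", "Y"), ("CS", "Y"), ("AC", "Y"), ("W", "Y"),
]


def parents(node, edges=EDGES):
    """Direct causes of `node`."""
    return sorted({cause for cause, effect in edges if effect == node})


def ancestors(node, edges=EDGES):
    """All direct and indirect causes of `node`."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for cause, effect in edges:
            if effect == current and cause not in seen:
                seen.add(cause)
                stack.append(cause)
    return sorted(seen)


def is_acyclic(edges=EDGES):
    """Check that the edge list forms a DAG (Kahn's algorithm)."""
    nodes = {n for edge in edges for n in edge}
    indegree = {n: 0 for n in nodes}
    for _, effect in edges:
        indegree[effect] += 1
    queue = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while queue:
        current = queue.pop()
        visited += 1
        for cause, effect in edges:
            if cause == current:
                indegree[effect] -= 1
                if indegree[effect] == 0:
                    queue.append(effect)
    return visited == len(nodes)
```

Under this reading, `parents("Y")` lists the candidate invariant features for predicting population dynamics, while `ancestors("Y")` additionally returns indirect drivers such as the Southern Oscillation Index.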
4.2. Invariant & causal learning for robust pest prediction
Our goal is to predict near-future pest populations (
$ {Y}_{t+1} $
) using Earth observation (EO) and environmental data (
$ {X}_t $
) along with weather forecasts (
$ {W}_{t+1} $
) by learning the function
$ {y}_{t+1}=f\left({x}_t,{w}_{t+1}\right) $
. Pest management recommendations heavily depend on these predictions. Conventional machine learning methods (Aparecido et al., 2019; Skawsang et al., 2019; Xiao et al., 2019; Zhang et al., 2019), which often assume that data points are independent and identically distributed (i.i.d.), struggle to generalize to unseen environments, capture spatiotemporal variability, and adapt to climate change. These methods are prone to learning spurious correlations, limiting their effectiveness in dynamic and non-i.i.d. scenarios.
To address these challenges, we turn to causal learning (Schölkopf and von Kügelgen, 2022), which leverages domain knowledge and is grounded in the principle of independent causal mechanisms. This principle suggests that joint probabilities can be decomposed into separate mechanisms, each reflecting an underlying causal relationship that remains stable despite environmental changes. By incorporating this principle, our models can improve generalization and robustness across varying conditions.
We achieve this by integrating invariant learning with causality and categorizing dataset units into environments
$ E $
, defined as different agroclimatic zones or host crops (Figure 4). While
$ E $
influences the features
$ {x}_t,{w}_{t+1} $
, it does not directly affect the target
$ {Y}_{t+1} $
. Utilizing Invariant Causal Prediction (ICP) (Heinze-Deml et al., 2018), Directed Acyclic Graphs (DAGs), and Invariant Risk Minimization (IRM) (Arjovsky et al., 2019), we can select causal features, identify potential causal relationships, and capture latent causal structures. These tools allow us to build models that are effective in current conditions and adaptable to future environmental changes.

Figure 4. Invariant learning for robust predictions. Stable and accurate predictions in diverse environments, such as when H. armigera feeds on different crops exhibiting variations in phenotype, agricultural management practices, and spatial distribution. Traditional ML methods risk capturing spurious correlations, such as associating pest abundance with a specific crop (e.g., cotton) due to its higher frequency in the dataset, leading to biased predictions based on the underlying crop rather than true pest presence.
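The invariance idea behind these methods can be illustrated with a small simulation. The sketch below is not ICP or IRM themselves; it only shows, on synthetic data, that the regression slope of a causal feature stays stable across environments while the slope of a spurious feature (here a noisy effect of the target) does not. The environment names, coefficients, and noise levels are all assumptions.

```python
import random


def slope(xs, ys):
    """OLS slope of y on a single regressor x (with intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate_env(rng, n, x_shift, spurious_noise):
    """One synthetic environment: x1 causes y; x2 is a noisy effect of y.

    The environment shifts the distribution of x1 and the noise of x2,
    but leaves the mechanism y = 2 * x1 + noise untouched.
    """
    x1 = [rng.gauss(x_shift, 1.0) for _ in range(n)]
    y = [2.0 * a + rng.gauss(0.0, 0.5) for a in x1]
    x2 = [b + rng.gauss(0.0, spurious_noise) for b in y]
    return x1, x2, y


rng = random.Random(0)
# Hypothetical environments: (shift of x1, noise level of x2).
envs = {"zone_A": (0.0, 0.1), "zone_B": (1.0, 1.0), "zone_C": (-1.0, 3.0)}

slopes_causal, slopes_spurious = {}, {}
for name, (shift, noise) in envs.items():
    x1, x2, y = simulate_env(rng, 2000, shift, noise)
    slopes_causal[name] = slope(x1, y)      # close to 2.0 in every zone
    slopes_spurious[name] = slope(x2, y)    # varies strongly across zones


def spread(slopes):
    """Range of a slope across environments; small means invariant."""
    return max(slopes.values()) - min(slopes.values())
```

A feature whose relationship with the target changes this much between agroclimatic zones would be a poor candidate for a model expected to generalize to unseen zones, which is the intuition that ICP formalizes through statistical tests.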
4.3. Explainability & counterfactual reasoning for short-term advice
We define the problem as a binary classification of pest presence or absence at the next time step, using Earth observation (EO) data (
$ {X}_t $
) and weather forecasts (
$ {W}_{t+1} $
). The goal is to predict the pest population value at the next time step,
$ {Y}_{t+1} $
, by learning the function
$ {y}_{t+1}=f\left({x}_t,{w}_{t+1}\right) $
. To enhance the trustworthiness of our predictions, we employ Explainable Boosting Machines (EBM) (Nori et al., 2019). This glass-box model achieves high performance while providing inherent explanations at both global and local levels. EBM’s additive nature allows feature contributions to be sorted and visualized locally, for each individual prediction, and globally, to summarize how the model behaves as a function of each feature (Figure 5). This facilitates a better understanding of the primary drivers of the model and enhances trust in its outputs.

Figure 5. Explainability for trustworthiness enhancement, on the right, with local and global explanations of each prediction and general model behavior, respectively, & Counterfactual explanations as agricultural actionable recommendations on the left.
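The additive structure that makes such explanations possible can be illustrated with a toy model. In the sketch below, the shape functions, the intercept, and the feature names are hand-written assumptions standing in for what an EBM would learn from data; the point is only that the predicted logit is a sum of per-feature terms, so a local explanation is the sorted list of those terms.

```python
import math

# Hypothetical per-feature shape functions (an EBM would learn these).
SHAPES = {
    "t2m_mean": lambda v: 0.15 * (v - 25.0),
    "rh_mean": lambda v: -0.02 * (v - 60.0),
    "ndvi": lambda v: 2.0 * (v - 0.5),
    "ap": lambda v: -0.3 * v,
}
INTERCEPT = -0.5


def predict_proba(x):
    """Probability of pest presence from the additive logit."""
    logit = INTERCEPT + sum(f(x[name]) for name, f in SHAPES.items())
    return 1.0 / (1.0 + math.exp(-logit))


def local_explanation(x):
    """Per-feature contributions to the logit, largest magnitude first."""
    contribs = {name: f(x[name]) for name, f in SHAPES.items()}
    return sorted(contribs.items(), key=lambda kv: abs(kv[1]), reverse=True)


# One made-up observation for a trap location.
x = {"t2m_mean": 31.0, "rh_mean": 45.0, "ndvi": 0.7, "ap": 0.0}
probability = predict_proba(x)
explanation = local_explanation(x)
```

For this observation the temperature term dominates the explanation, followed by NDVI and relative humidity. A global explanation would instead plot each shape function over the range of its feature.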
To further bolster trust and provide actionable insights, we propose generating counterfactual examples as recommended interventions. Following the setup of Mothilal et al. (2020), we search for minimal perturbations to the feature values
$ \left({x}_t,{w}_{t+1}\right) $
that would alter the prediction to the desired class using the same model
$ f $
. These counterfactual examples represent proposed actions that could be implemented in natural farm systems, ensuring practicality and feasibility (Wachter et al., 2017; Mothilal et al., 2020). The approach ensures that the generated counterfactuals are close to the original input but predicted in the desired class, providing feasible and actionable recommendations for IPM (Figure 5).
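A minimal sketch of this search is given below. It uses an exhaustive grid over a few actionable features rather than the optimization procedure of Mothilal et al. (2020), and the classifier, the feature names, the candidate values, and the distance scales are all assumptions made for illustration. Features that cannot be acted upon (here the forecast temperature) are held fixed.

```python
import itertools
import math


def f(x):
    """Toy stand-in classifier: probability of pest presence."""
    logit = (
        -0.95
        + 0.2 * (x["t2m_mean"] - 25.0)
        + 3.0 * (x["ndvi"] - 0.5)
        - 4.0 * (x["soil_moisture"] - 0.3)
    )
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical actionable features: candidate values and a scale used to
# make changes in different features comparable.
ACTIONABLE = {
    "soil_moisture": ([0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50], 0.1),
    "ndvi": ([0.4, 0.5, 0.6, 0.7], 0.1),
}


def nearest_counterfactual(x, threshold=0.5):
    """Closest grid point (scaled L1 distance) predicted as pest-absent."""
    names = list(ACTIONABLE)
    grids = [ACTIONABLE[name][0] for name in names]
    best, best_cost = None, math.inf
    for values in itertools.product(*grids):
        candidate = dict(x, **dict(zip(names, values)))
        if f(candidate) >= threshold:
            continue  # prediction not flipped to the desired class
        cost = sum(abs(candidate[n] - x[n]) / ACTIONABLE[n][1] for n in names)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost


x = {"t2m_mean": 30.0, "ndvi": 0.6, "soil_moisture": 0.30}
counterfactual, cost = nearest_counterfactual(x)
```

In this toy setting the cheapest change that flips the prediction is raising soil moisture from 0.30 to 0.40, which would translate into an irrigation recommendation. Restricting the search to actionable features is what keeps the resulting counterfactual implementable on the farm.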
4.4. Heterogeneous treatment effects for long-term advice
We provide long-term pest prevention and suppression advice by assessing how agricultural practices (e.g., crop rotation, balanced fertilization, sowing dates) impact pest harmfulness and yield indices. Since different agro-environments may respond variably to the same practice, it is crucial to account for this heterogeneity. We estimate the conditional average treatment effect (CATE) following the potential outcomes framework (Rubin, 2005).
The CATE quantifies the difference in potential outcomes, represented as
$ \mathbb{E}\left[Y\left(T=1\right)-Y\left(T=0\right)|X\right] $
, where
$ Y(T) $
denotes the value of a random variable
$ Y $
(e.g., pest harmfulness and yield) if a unit is treated with treatment
$ T\in \left\{0,1\right\} $
. By controlling for field characteristics
$ X $
, which capture the heterogeneity across different agro-environmental conditions, we can better understand how specific practices affect outcomes in various contexts (Figure 6). This approach allows us to provide tailored and effective long-term IPM advice sensitive to each field’s unique conditions (Giannarakis et al., 2022).

Figure 6. Conditional Average Treatment Effect (CATE) as long-term personalized guidance. By accounting for each land unit’s unique characteristics, we can estimate a distinct treatment effect for each land unit. For example, differences in land characteristics can change the impact of fertilizer application on the risk of future pest emergence.
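The difference between an average effect and a conditional one can be illustrated on synthetic data. In the sketch below, the field characteristic (a hypothetical soil type), the effect sizes, and the randomized treatment assignment are assumptions; because treatment is randomized in the simulation, a difference in means within each stratum of $X$ recovers the CATE, whereas observational data would additionally require adjustment for confounders.

```python
import random
from statistics import mean

# Assumed ground-truth effects of a practice on a pest index, by soil type.
TRUE_CATE = {"sandy": -2.0, "clay": 1.0}


def simulate(rng, n):
    """Synthetic fields: (soil type X, treatment T, outcome Y)."""
    rows = []
    for _ in range(n):
        x = rng.choice(["sandy", "clay"])
        t = rng.randint(0, 1)  # randomized treatment assignment
        base = 10.0 if x == "sandy" else 6.0
        y = base + TRUE_CATE[x] * t + rng.gauss(0.0, 1.0)
        rows.append((x, t, y))
    return rows


def cate_by_stratum(rows):
    """E[Y | T=1, X=x] - E[Y | T=0, X=x] for each value x of X."""
    estimates = {}
    for x in sorted({r[0] for r in rows}):
        treated = [y for xi, t, y in rows if xi == x and t == 1]
        control = [y for xi, t, y in rows if xi == x and t == 0]
        estimates[x] = mean(treated) - mean(control)
    return estimates


rows = simulate(random.Random(1), 4000)
cate = cate_by_stratum(rows)
# Pooled average effect, which hides the opposite signs across soil types.
ate = mean(y for _, t, y in rows if t == 1) - mean(y for _, t, y in rows if t == 0)
```

Here the pooled estimate is mildly negative, while the stratified estimates show that the practice helps on one soil type and harms on the other, which is exactly the kind of distinction field-specific advice depends on.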
4.5. Causal inference for evaluating advice effectiveness
We employ causal inference techniques to assess the effectiveness of our pest control recommendations, adapting approaches recently introduced in agricultural contexts (Tsoumas et al., 2023). Specifically, in the case of pest management and with available panel data (Table 1), we utilize causal models such as difference-in-differences (DiDs) (Abadie, 2005), synthetic control (Abadie, 2021), and synthetic DiDs (Arkhangelsky et al., 2021) to quantify the treatment effect for units that adhered to our framework’s recommendations (treated units) compared to those that did not (control units). Historical intervention data, retrospectively annotated based on whether our framework recommended action, will serve as the basis for advice evaluation. Causal inference will be performed per environment to ensure comparability between treatment and control groups, in line with the parallel trends assumption (Lechner, 2011).
However, digital agriculture requires a two-level evaluation of interventions to disentangle the effectiveness resulting from the accuracy of the recommendation (for intervention) in terms of space–time from the inherent efficacy of the intervention. It is crucial to determine what effect, if any, is attributable to the space and time of application and what is due to the pesticide itself.
In this context, we conducted an initial analysis using the aforementioned panel data to quantify the impact of pesticide application on pest abundance in a real-world setting without expert system guidance, employing staggered DiDs with fixed effects (Eq. 4.1).
Table 3. Results of staggered DiDs with controls for unobserved heterogeneity at the unit and time levels by including fixed effects

Note: The table includes point estimates, 95% confidence intervals, and p-values. Numbers represent the increase/decrease of accumulated pest catches at the trap level after the intervention.
The staggered approach accounts for units receiving treatment at different periods. We include unit-fixed effects to control for each unit’s time-invariant characteristics and time-fixed effects to capture overall time trends that affect all units in each period. The unit of analysis is the plot where the pest trap is located, with periods modeled at the weekly level. Here,
$ {Y}_{it} $
represents the outcome variable, accumulated pest abundance, for each unit
$ i $
at the time
$ t $
, and
$ \mathrm{treated}\_{\mathrm{time}}_{it} $
is an indicator of whether the unit
$ i $
receives treatment (pesticide application) in a period
$ t $
(in a staggered manner across units). Specifically,
$ {\beta}_0 $
is the intercept,
$ {\beta}_1 $
is the treatment effect coefficient,
$ {\alpha}_i $
represents unit fixed effects,
$ {\gamma}_t $
captures time fixed effects, and
$ {\varepsilon}_{it} $
is the error term. Thus,
$ {\beta}_1 $
provides the average causal effect of the treatment (pesticide application) on the outcome (accumulated pest abundance) for treated units (ATT), as presented in Table 3 for each cultivation period from 2019 to 2022.
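For reference, the terms defined above assemble into the usual two-way fixed-effects specification. Written out as a reconstruction from those definitions, Eq. 4.1 would read

$ {Y}_{it}={\beta}_0+{\beta}_1\,\mathrm{treated}\_{\mathrm{time}}_{it}+{\alpha}_i+{\gamma}_t+{\varepsilon}_{it} $

The sketch below estimates ${\beta}_1$ on a synthetic balanced panel by double demeaning, which is equivalent to including unit and week dummies. The panel size, adoption weeks, effect size, and noise level are assumptions, and the treatment indicator is assumed to switch on in the week of first application and stay on afterwards.

```python
import random


def twfe_beta(y, d):
    """Two-way fixed-effects slope on a balanced panel.

    y[i][t] is the outcome and d[i][t] the 0/1 treatment indicator for
    unit i in period t. Removing unit means and period means (and adding
    back the grand mean) absorbs alpha_i and gamma_t.
    """
    n_units, n_periods = len(y), len(y[0])

    def demean(m):
        row = [sum(r) / n_periods for r in m]
        col = [sum(m[i][t] for i in range(n_units)) / n_units
               for t in range(n_periods)]
        grand = sum(row) / n_units
        return [[m[i][t] - row[i] - col[t] + grand for t in range(n_periods)]
                for i in range(n_units)]

    yd, dd = demean(y), demean(d)
    num = sum(yd[i][t] * dd[i][t]
              for i in range(n_units) for t in range(n_periods))
    den = sum(dd[i][t] ** 2
              for i in range(n_units) for t in range(n_periods))
    return num / den


# Synthetic staggered design: units start treatment in week 4, 6, or 8,
# or never (None). All numbers are made up for illustration.
rng = random.Random(42)
N_UNITS, N_WEEKS, TRUE_BETA = 60, 12, -3.0
start = [rng.choice([4, 6, 8, None]) for _ in range(N_UNITS)]
d = [[1.0 if s is not None and t >= s else 0.0 for t in range(N_WEEKS)]
     for s in start]
alpha = [rng.gauss(0.0, 2.0) for _ in range(N_UNITS)]   # unit effects
gamma = [0.5 * t for t in range(N_WEEKS)]               # common time trend
y = [[alpha[i] + gamma[t] + TRUE_BETA * d[i][t] + rng.gauss(0.0, 0.5)
      for t in range(N_WEEKS)] for i in range(N_UNITS)]

beta_hat = twfe_beta(y, d)
```

Because the simulated effect is the same for every unit and week, the estimate recovers the assumed value closely; the real trap data are unbalanced and noisier, so this sketch illustrates the estimator rather than reproducing Table 3.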

For the years 2021 and 2022, we observe a statistically significant reduction in pest abundance, while for 2019 and 2020, we find the opposite effect. At first glance, this contradiction may seem unusual, but several reasonable explanations could account for it. Since the data come from real-world agricultural practice, they likely reflect some of the following issues: (i) Some interventions may have been applied incorrectly in terms of timing and method, reducing or eliminating their efficacy in the pest-infested plots. This could lead to a biased estimate suggesting that the pest population increased after pesticide application (Figure 7). This occurs because the counterfactual is constructed by taking the growth trend from a plot without intervention, which might not experience the same infestation or pest pressure level. A mistreated plot that follows a steeper population increase, simply due to its higher infestation level, can thus create the false impression that pesticide application increases the pest population. (ii) Discussions with the data provider (Corteva Agriscience Hellas) indicated that the control group labels may be noisy. The company is confident in the labels for treated plots, as it receives this information directly from farmers. However, it cannot be as certain about the control group. Some farmers may have applied pest control practices in their plots but chose not to report them for various reasons, such as using less expensive pesticides from competitor companies or participating in eco-schemes prohibiting pesticide use. Consequently, we face a scenario of positively labeled and unlabeled data, a common issue in machine learning. (iii) The assumption of parallel trends may not hold universally, or unobserved confounders may vary over time and between units.

Figure 7. A visual example of DiDs for assessing the real-world impact of pesticide application. It demonstrates how, even when the parallel trends assumption holds in both conditions, applying an intervention (i.e., spray) at a non-recommended time can lead to unexpected effects compared to applying the intervention at the recommended time.
In a more robust causal analysis, we can address these issues technically or conceptually. Technically, we could retrospectively employ a recommendation system or consult experts, as described above, to annotate each time–space slot as favorable or unfavorable for intervention. Conceptually, we can accept reality and precisely define which causal effect we retrieve. In this case, the ATT in a real-world setting reflects varying levels of application accuracy, farmer skill, expert guidance, and timing. To address the second issue, we plan to use Positive-Unlabeled (PU) learning methods (Bekker and Davis, 2020) to train a classifier on the covariates outlined in Section 3. Using only the positively labeled (treated) units as ground truth and PU learning for training, this classifier will help establish a control group consisting only of unlabeled units that are classified as untreated with high confidence. Lastly, a formal investigation with statistical tests is required to retain only cases where the parallel trends assumption holds. Clear assumption statements should also be made regarding potential unobserved confounders that may vary by time and unit. By leveraging these techniques, we aim to rigorously evaluate the impact of our recommendations on pest control outcomes and attribute the effects to the right factors, providing robust evidence for the effectiveness of our framework in diverse agricultural environments.
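The control-group cleaning step can be sketched with a simple two-step PU heuristic. This is one possible instance of PU learning, not necessarily the method that will be adopted: a classifier is trained to separate labeled treated units from unlabeled ones, and only the unlabeled units it scores as least treatment-like are kept as controls. The single covariate, the class distributions, and the 50% retention rule are assumptions, and the hidden treatment flag exists only because the data are simulated.

```python
import math
import random


def fit_logistic(xs, labels, lr=0.1, epochs=300):
    """One-feature logistic regression fitted by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, s in zip(xs, labels):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - s) * x
            grad_b += p - s
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


rng = random.Random(7)
# Labeled treated plots (reported by farmers), one synthetic covariate each.
labeled_treated = [rng.gauss(2.0, 1.0) for _ in range(200)]
# Unlabeled plots: (covariate, hidden flag). 100 were treated but unreported.
unlabeled = (
    [(rng.gauss(2.0, 1.0), True) for _ in range(100)]
    + [(rng.gauss(-2.0, 1.0), False) for _ in range(300)]
)

# Step 1: train "labeled vs unlabeled" as a proxy for "treated vs not".
xs = labeled_treated + [x for x, _ in unlabeled]
labels = [1] * len(labeled_treated) + [0] * len(unlabeled)
w, b = fit_logistic(xs, labels)

# Step 2: keep the unlabeled plots that look least like treated plots.
ranked = sorted(unlabeled, key=lambda u: w * u[0] + b)
controls = ranked[: len(ranked) // 2]

contamination_all = sum(hidden for _, hidden in unlabeled) / len(unlabeled)
contamination_selected = sum(hidden for _, hidden in controls) / len(controls)
```

In the simulation, a quarter of the unlabeled plots are secretly treated, while the retained control group is almost free of them. With real data the hidden flag is unavailable, so the retention threshold would need to be chosen and validated with domain experts.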
5. Conclusions
In conclusion, this article presents a new framework integrating causality and explainability into digital agriculture, with a focus on enhancing pest management practices. By leveraging advanced data analysis techniques, such as causal inference and invariant learning, our approach addresses the limitations of conventional correlation-based models, providing more robust and transparent decision-making tools. This framework not only supports real-time pest control interventions but also facilitates strategic long-term planning by offering insights into the heterogeneous effects of various agricultural practices.
Our study illustrates how incorporating explainability can bolster farmers’ trust and adoption of sustainable practices like IPM. The framework’s use of counterfactual reasoning and explainable predictions ensures that farmers receive actionable, field-specific recommendations that can adapt to different environmental conditions. Additionally, the causal analysis embedded within our methodology allows for ongoing evaluation of the framework’s effectiveness, ensuring the recommendations are impactful and contribute positively to agricultural sustainability.
We believe that a successful application to pest management will highlight, in a tangible way, the broader potential of this framework to enhance digital agriculture and drive sustainable, evidence-based practices across the sector. Therefore, we plan to implement the ideas outlined in Section 4 using the data described in Section 3. In parallel, we are gathering additional in-situ data in collaboration with Corteva Agriscience Hellas to enrich our dataset for the same pest and crop, as well as independently for other crops and pests. Finally, we are exploring how this approach could be adapted to related areas.
Future research will aim to expand this framework beyond pest management, exploring its potential applications in other areas of digital agriculture, such as crop disease management and nutrient optimization. Additionally, integrating advanced machine learning models to account for real-time weather data and unforeseen environmental factors will further refine prediction accuracy. Developing user-friendly tools and interfaces that facilitate farmer interaction with these data-driven insights will be critical to fostering widespread adoption.
The growing demand for sustainable agriculture underlines the importance of integrating advanced data analysis frameworks like ours. By systematically quantifying and explaining the effects of agricultural interventions, this framework offers a promising pathway for enhancing the adoption of digital agriculture in alignment with global sustainability goals. This comprehensive, data-driven approach promises to make sustainable agricultural practices more practical, facilitating a transition to a resilient and environmentally conscious food system.
Open peer review
To view the open peer review materials for this article, please visit http://doi.org/10.1017/eds.2025.14.
Acknowledgments
We express our gratitude to Corteva Agriscience Hellas, particularly to Dr. George Zanakis, the Marketing & Development Manager, for their invaluable support, insights, trust, and provision of data.
Author contribution
Conceptualization: Ilias Tsoumas, Vasileios Sitokonstantinou; Methodology: Ilias Tsoumas, Vasileios Sitokonstantinou, Evagelia Lampiri; Software: Ilias Tsoumas; Formal analysis: Ilias Tsoumas; Investigation: Evagelia Lampiri; Data curation: Ilias Tsoumas; Writing – Original Draft: Ilias Tsoumas, Vasileios Sitokonstantinou, Evagelia Lampiri; Writing – Review & Editing: all authors; Visualization: Ilias Tsoumas. All authors approved the final submitted draft.
Competing interests
The authors declare none.
Data availability statement
Data availability is governed by the terms of the Memorandum of Understanding (MoU) between the National Observatory of Athens and Corteva Agriscience Hellas. Access to the data can be granted upon a formal written request to both entities. Corteva Agriscience Hellas reserves the right to make the final decision regarding access to the raw in-situ data.
Ethics statement
The research meets all ethical guidelines, including adherence to the legal requirements of the study country.
Funding statement
This research is funded by the European Union’s Horizon Europe Programme under Grant Agreement No. 101135422 (UNIVERSWATER—Ilias Tsoumas, Charalampos Kontoes) and supported by the European Union Horizon Europe Research and Innovation program under Grant Agreement #101070496 (Smart Droplets—Ioannis N. Athanasiadis). Vasileios Sitokonstantinou and Gustau Camps-Valls acknowledge support from the Generalitat Valenciana and the Conselleria d’Innovació, Universitats, Ciència i Societat Digital, through the project “AI4CS: Artificial Intelligence for Complex Systems: Brain, Earth, Climate, Society” (CIPROM/2021/56). Additionally, Gustau Camps-Valls acknowledges the support of the European Research Council (ERC) under the ERC Synergy Grant USMILE (Grant Agreement 855187).
Comments
I am pleased to submit our manuscript, titled “Leveraging Causality and Explainability in Digital Agriculture” for your consideration. This paper presents a novel framework that integrates causal inference and explainability into digital agriculture, with a focus on enhancing pest management. By leveraging causal AI, we aim to refine decision-making processes in pest control, promoting sustainable practices that align with Integrated Pest Management (IPM) principles. Our work underscores the potential of data-driven tools to reduce environmental impacts and foster climate-resilient agriculture. We believe this approach offers a transformative step toward climate-smart, sustainable pest management and would be of interest to readers focused on advancing digital agriculture and sustainable practices. Thank you for considering our work for publication.
Sincerely,
Ilias Tsoumas