1. Introduction
Each summer, surface meltwater flows into topographic depressions and forms numerous supraglacial lakes on the Greenland Ice Sheet (GrIS) (Sundal and others, Reference Sundal, Shepherd, Nienow, Hanna, Palmer and Huybrechts2009; Banwell and others, Reference Banwell, Caballero, Arnold, Glasser, Mac Cathles and DR2014; Dunmire and others, Reference Dunmire, Banwell, Wever, Lenaerts and Datta2021; Zhang and others, Reference Zhang2023). Some of these supraglacial lakes may undergo hydrofracturing at their bottoms and drain large volumes of meltwater through the ice sheet to the bed, consequently affecting ice sheet motion and stability (Das and others, Reference Das2008; Williamson and others, Reference Williamson, Willis, Arnold and Banwell2018b; Tuckett and others, Reference Tuckett2019). Therefore, accurately estimating the supraglacial lake depth is crucial for analyzing surface meltwater storage on the GrIS and consequently improving our understanding of the GrIS mass balance.
Until now, only a few field measurements of GrIS supraglacial lake depth have been reported. For example, Box and Ski (Reference Box and Ski2007) measured the depth of two supraglacial lakes (1.05–11.5 m and 1.25–10.0 m) on the western GrIS using a raft equipped with a depth sounder. Tedesco and Steiner (Reference Tedesco and Steiner2011) measured the depth of the Olivia supraglacial lake (max depth 4.6 m) on the western GrIS using remote-controlled boats equipped with GPS and sonar. Legleiter and others (Reference Legleiter, Tedesco, Smith, Behar and Overstreet2014) measured the depth of the Napoli supraglacial lake (max depth 10.5 m) on the southwestern GrIS using remote-controlled boats equipped with GPS and sonar. Fitzpatrick and others (Reference Fitzpatrick2014) measured the depth of two supraglacial lakes on the southwestern GrIS (max depth ∼ 12 and 16 m) using a raft equipped with GPS and a depth sounder. More recently, Lutz and others (Reference Lutz2024) measured the depth of four supraglacial lakes (max depth ∼ 14 m) in the northeastern GrIS using a self-built remote-controlled boat equipped with a sonar sensor. However, the spatiotemporal coverage of these field measurements has been limited, mainly owing to the harsh GrIS natural conditions.
In recent years, the Ice, Clouds, and Land Elevation Satellite-2 (ICESat-2) raises new prospects for estimating supraglacial lake depth. The ICESat-2 photon clouds can penetrate water and thereby capture the surface and bottom elevation of shallow water simultaneously with high precision (Ranndal and others, Reference Ranndal, Sigaard Christiansen, Kliving, Baltazar Andersen and Nielsen2021; Xu and others, Reference Xu, Ma, Ma, Zhao, Yang and Wang2021; Bernardis and others, Reference Bernardis2023). Previous studies distinguished the photon clouds of the supraglacial lake surface and bottom using kernel density estimation algorithms and then estimated the lake depth accordingly (Fair and others, Reference Fair, Flanner, Brunt, Fricker and Gardner2020; Datta and Wouters, Reference Datta and Wouters2021; Xiao and others, Reference Xiao, Hui, Cheng and Liang2023). However, one ICESat-2 photon track can only generate one corresponding depth profile; it cannot generate complete lake bathymetry. Therefore, in supraglacial lake studies (and more broadly, in nearshore shallow water studies), ICESat-2-derived water depths are commonly used for accuracy validation or are combined with passive optical satellite imagery to generate complete water bathymetry (Albright and Glennie, Reference Albright and Glennie2020; Ma and others, Reference Ma2020; Datta and Wouters, Reference Datta and Wouters2021; Hsu and others, Reference Hsu2021).
Currently, there are three remote sensing methods commonly used for estimating supraglacial lake depth (Table 1), namely the empirical formula method (EFM), radiative transfer method (RTM), and depression topography method (DTM). The EFM combines field or ICESat-2-derived lake depth measurements with the corresponding single- or dual-band reflectance of passive optical satellite imagery (e.g. Sentinel-2, Landsat-8, and Planet) to construct empirical regression formulas, and the obtained formula is then applied to calculate lake bathymetry (Legleiter and others, Reference Legleiter, Tedesco, Smith, Behar and Overstreet2014; Pope and others, Reference Pope2016; Datta and Wouters, Reference Datta and Wouters2021). The RTM builds on the physical principle that the radiation of incident light decreases as depth increases (Lyzenga, Reference Lyzenga1978; Philpot, Reference Philpot1989) and combines optical image bands (e.g. MODIS, Sentinel-2, and Landsat-8) with pre-defined coefficients (e.g. scattering attenuation coefficients of upward and downward light) to estimate depth variation (Ignéczi and others, Reference Ignéczi2016; Pope and others, Reference Pope2016; Melling and others, Reference Melling2024). The DTM generates ice surface depressions using the pre- or post-melt season digital elevation models (DEMs, e.g. WorldView-2 DEM, ArcticDEM and TanDEM-X) and intersects the depression topography with the satellite-mapped lake shorelines to estimate lake bathymetry (Moussavi and others, Reference Moussavi2016; Yang and others, Reference Yang, Smith, Fettweis, Gleason, Lu and Li2019; Lutz and others, Reference Lutz2024; Melling and others, Reference Melling2024).
Table 1. Summary of supraglacial lake depth estimation studies, including field measurements, ICESat-2-derived algorithms, empirical formula method (EFM), radiative transfer method (RTM), and depression topography method (DTM)

Until recently, these three methods have been widely used, so it is crucial for comparing their performances. Pope and others (Reference Pope2016) used the field-measured lake depth (max depth 5 m) on the southwestern GrIS to compare the performance of the RTM and the dual-band EFM. Moussavi and others (Reference Moussavi2016) used DTM to estimate the depth of 14 supraglacial lakes (max depth 7 m) on the southwestern GrIS, and the resultant depth was employed as validation data to compare with RTM and EFM. Melling and others (Reference Melling2024) compared the performance of ICESat-2, RTM, and DTM to estimate the depths of five supraglacial lakes (max depth 3.2–7.9 m) on the southwestern GrIS. Lutz and others (Reference Lutz2024) used DTM to estimate the depth of 5 supraglacial lakes (max depth ∼ 27.6 m) on the northeastern GrIS, providing validation data for comparison with RTM, ICESat-2-EFM and Sonar-EFM. Most recently, Zhou and others (Reference Zhou, Liang, Xiao, Li, Zheng and Cheng2025) used ICESat-2-derived depths from seven supraglacial lakes (max depth ∼ 3.4–8.5 m) as validation to compare the performance of RTM, EFM, and two machine learning algorithms. Although these comparisons provide valuable information to guide depth estimation method selection, we suggest that a more detailed comparison should be conducted. First, the ICESat-2-derived depth data can be used as validation data, whereas DTM should be employed as a comparative method rather than validation data (Moussavi and others, Reference Moussavi2016; Pope and others, Reference Pope2016; Yang and others, Reference Yang, Smith, Fettweis, Gleason, Lu and Li2019). Second, EFM, RTM, and DTM should all be involved in comparisons. Third, the depth ranges and numbers of the supraglacial lakes selected in most previous studies are limited, and more representative supraglacial lakes (particularly deep lakes, which may most challenge estimation methods) on the entire GrIS should be used. For these purposes, in this study, we select 35 supraglacial lakes with a large depth variation (max depth 2.3–12.3 m) from various regions of GrIS and use the ICESat-2-derived supraglacial lake depth as validation data to compare the performance of EFM, RTM, and DTM.
2. Data and study lakes
2.1. Data
ICESat-2 is equipped with the Advanced Topographic Laser Altimeter System (ATLAS), which utilizes green (wavelength 532 nm) laser light and single-photon sensitive detection to measure surface height along each of its six beams (Markus and others, Reference Markus2017). ICESat-2 produces footprints on the ground spaced ∼0.7 m along the track and is capable of penetrating water up to ∼40 m (Parrish and others, Reference Parrish, Magruder, Neuenschwander, Forfinski-Sarkozi, Alonzo and Jasinski2019; Ranndal and others, Reference Ranndal, Sigaard Christiansen, Kliving, Baltazar Andersen and Nielsen2021). ICESat-2 ATL03 is a global geolocated photon data product with a measurement accuracy of 5 cm and a precision of 13 cm (Brunt and others, Reference Brunt, Neumann and Smith2019), and this product can capture both surface and bottom photon clouds of lakes (Jasinski and others, Reference Jasinski2023). ICESat-2 ATL06 is a land ice elevation product produced by fitting the ATL03 photon data as a function of along-track distance with a linear model in each 40 m segment (Smith and others, Reference Smith2019), with a measurement accuracy of 3 cm and a precision of 9 cm (Brunt and others, Reference Brunt, Neumann and Smith2019). In this study, we use the ICESat-2 ATL06 data to determine the photon cloud scope of the ICESat-2 ATL03 and to remove a portion of noise photons, and the ICESat-2 ATL03 data to measure the depth of supraglacial lakes.
Sentinel-2A/B are earth observation satellites launched by European Space Agency Copernicus program, with a 290 km swath width and a revisit period of less than 5 days. The Sentinel-2A/B images cover 13 spectral bands, including visible, near-infrared (NIR), and shortwave infrared, with a 10 m spatial resolution for visible and NIR bands (Drusch and others, Reference Drusch2012). In this study, we use the Sentinel-2 Level-2A surface reflectance product to identify the extent of a supraglacial lake and to build EFM and RTM depth estimation formulas. All images are acquired through the Google Earth Engine (GEE) platform (https://earthengine.google.com/).
ArcticDEM, released by the Polar Geospatial Center, is a high-resolution digital surface model for the entire Arctic region. ArcticDEM is generated from WorldView-1/2/3 and GeoEye-1 optical stereo imagery using Surface Extraction software with TIN-based Search-Space Minimization, which has the validation accuracy of 0.2 m (Noh and Howat, Reference Noh and Howat2015). In this study, we use the 2 m-resolution ArcticDEM strip data (version s2s041) (Porter and others, Reference Porter2022) to derive ice surface depression topography. Importantly, the ArcticDEM data used in this study are made by using pre- or post-melt season optical stereo imagery, ensuring that the studied ice surface depressions are not covered by supraglacial lakes, following Moussavi and others (Reference Moussavi2016), Yang and others (Reference Yang, Smith, Fettweis, Gleason, Lu and Li2019), and Melling and others (Reference Melling2024).
2.2. Study lakes
We select 35 supraglacial lakes across various regions of the GrIS (Fig. 1), based on the following criteria: first, near-simultaneous (±3 days) ICESat-2 tracks and Sentinel-2 imagery are available. Second, ICESat-2 tracks do not cross floating ice on the supraglacial lake. Third, ArcticDEM data from the same year during the dry period (determined by visually inspecting near-simultaneous Sentinel-1/2 imagery in GEE) of the selected lake is available. The selected lakes cover a maximum depth range of 2.3 to 12.3 m, providing continuous depth observations (at least one lake per meter of depth range).

Figure 1. Thirty-five study supraglacial lakes on the Greenland ice sheet are used to compare three remote sensing depth estimation methods. Sentinel-2 images (R: NIR band, G: red band, B: green band) of these lakes are depicted in the subplots, with the red lines indicating the ICESat-2 track. The subplot in the left corner shows the locations of these supraglacial lakes on the gris marked with red dots.
These 35 lakes are ranked as Lakes 1–35 in terms of their maximum depth. Lakes 1–12 are classified as shallow lakes (max depth 2.3–4.8 m), while Lakes 13–35 are classified as deep lakes (max depth 5.4–12.3 m), based on a 5 m threshold for maximum lake depth. This 5 m threshold represents the maximum penetration depth of the red band and is widely used in previous supraglacial lake studies (Moussavi and others, Reference Moussavi2016; Pope and others, Reference Pope2016; Melling and others, Reference Melling2024). Therefore, we followed this depth threshold to make our results comparable with previous studies. The data used for depth estimation are listed in Table S1.
3. Methods
3.1. ICESat-2 Watta lake depth estimation method
We use the Watta algorithm (Datta and Wouters, Reference Datta and Wouters2021) to derive supraglacial lake depth profiles from ICESat-2 at the native 0.7 m resolution of ATL03 photon clouds. First, this algorithm selects the ATL03 photon clouds within a distance of 50 m around the elevation of the corresponding ICESat-2 ATL06 photons to remove a portion of noise photons. Second, a sliding window of 75 ATL03 photons is applied to generate a probability density curve of photon elevations within each window through kernel density estimation. The two obtained elevation values with the highest probability density within each window are assigned as the lake surface and bottom elevations, respectively. Third, to obtain more reliable lake bottom elevations, the Watta algorithm conducts kernel density estimation again in a larger window of 5,000 ATL03 photons to remove the outliers of the bottom elevations due to the low photon density there. Fourth, the depth of each photon point is calculated by subtracting the lake bottom elevation from the lake surface elevation and applying a refractive index of ∼0.75 to remove the refraction influence:

The corrected lake bottom elevation is then derived as:

In this study, we use Sentinel-2 green and NIR bands to calculate the Normalized Difference Water Index (NDWI = (green – NIR)/(green + NIR), green and NIR are the reflectance of green and NIR bands, respectively) (McFeeters, Reference McFeeters1996), with a threshold of 0.3 to extract water masks and generate supraglacial lake shorelines. The ICESat-2 ATL03 photon points within each lake shoreline are then extracted and stratified at 1 m intervals of lake depth to avoid uneven distribution of water depths at sample points due to random sampling. Next, 70% of the photon points from each meter depth are randomly selected to form the training dataset, with the remaining 30% points used as validation data, ensuring a balanced evaluation of model training and validation (Nguyen and others, Reference Nguyen2021).
3.2. EFM
The EFM attempts to establish exponential, quadratic, or logarithmic empirical regression formulas between field-measured or ICESat-2-derived lake depth and the corresponding pixel reflectance of optical remote sensing imagery (Box and Ski, Reference Box and Ski2007; Williamson and others, Reference Williamson, Banwell, Willis and Arnold2018a). The quadratic formula, widely used in previous studies (Legleiter and others, Reference Legleiter, Tedesco, Smith, Behar and Overstreet2014; Moussavi and others, Reference Moussavi2016; Pope and others, Reference Pope2016), is selected in this study for comparison. We build the quadratic empirical formula by fitting the Sentinel-2 water pixel reflectance and ICESat-2-derived supraglacial lake depth following Datta and Wouters (Reference Datta and Wouters2021):


where Z represents the estimated lake depth, a, b, and c are fitting parameters, and X represents the logarithm of the ratio of the reflectance of the two spectral bands, R1 and R2, in the Sentinel-2 imagery.
Moreover, we use the optimal band ratio analysis (OBRA) to select the optimal band combination following Legleiter and others (Reference Legleiter, Roberts and Lawrence2009). First, a total number of N(N–1)/2 band combination ratios are obtained by pairing N bands. These band ratios are calculated, and then the training datasets are used to establish empirical formulas, with their R2 and RMSE calculated. Next, the band combination with the highest R2 and the lowest RMSE is determined as the optimal choice, and the associated depth estimates are then compared with other methods.
We conduct two experiments to analyze the spatiotemporal extrapolation of the EFM. Spatially, we construct an overall empirical formula based on all 35 study lakes, and then apply the resultant empirical formula to estimate lake depths for each lake. This experiment allows us to analyze the spatial stability of the overall empirical formula over space. Temporally, we select the supraglacial lakes covered by two different ICESat-2 tracks on different days. For each selected lake, we use the early-day ICESat-2 and Sentinel-2 imagery to construct the EFM, then apply the resultant empirical formula to estimate lake depths from the late-day Sentinel-2 imagery. Finally, we validate the estimated depths using the late-day ICESat-2-derived depths. This experiment allows us to analyze the temporal stability of the empirical formula over different depth ranges as the study lakes expand or shrink over time.
3.3. RTM
The RTM builds on the physical principle that incident radiation decreases with increasing water depth (Philpot, Reference Philpot1989), and assumes that the lake has few impurities, small particles, uniform lake bed sediment, and a relatively calm water surface with minimal wind and wave influence (Sneed and Hamilton, Reference Sneed and Hamilton2007). These assumptions are generally met in clean meltwater lakes, and therefore, the RTM has been widely used for estimating GrIS supraglacial lake depth before the advent of ICESat-2 (Sneed and Hamilton, Reference Sneed and Hamilton2007; Georgiou and others, Reference Georgiou, Shepherd, McMillan and Nienow2009; Morriss and others, Reference Morriss2013; Langley and others, Reference Langley, Leeson, Stokes and Jamieson2016). One key consideration of RTM is to select appropriate spectral bands, and numerous studies have shown that the shorter wavelengths allow for estimating deeper water depth due to their slower attenuation in the water (Moussavi and others, Reference Moussavi2016; Pope and others, Reference Pope2016). The green and red bands are most commonly used in previous studies (Table 1), and thereby, we also select these two bands. The RTM formulas are as follows:



where Z represents the lake depth, Ad is the lake bottom reflectance,
${R_\infty }$ is the deep ocean reflectance, Rw is the lake pixel reflectance, g is the spectral attenuation coefficient, Ku and Kd represent the scattering attenuation coefficients of upward and downward light, respectively, a is the absorption coefficient for water, and b is the backscattering coefficient for water.
Ad is calculated as the average reflectance value in a 30 m-wide buffer around each lake shoreline following Moussavi and others (Reference Moussavi, Pope, Halberstadt, Trusel, Cioffi and Abdalati2020), and the standard deviation value is used to represent the uncertainty of Ad. We calculate the mean pixel reflectance in the deep ocean area of the imagery to estimate
${R_\infty }$; and if no deep water is available in the imagery, the neighboring coastal imagery from the same period and orbit under similar atmospheric conditions is used, following Sneed and Hamilton (Reference Sneed and Hamilton2007). Previous studies assumed that Kd = Ku and thereby g = 2Kd (Sneed and Hamilton, Reference Sneed and Hamilton2007; Banwell and others, Reference Banwell, Caballero, Arnold, Glasser, Mac Cathles and DR2014; Pope and others, Reference Pope2016). In this study, we use agreen = 0.0619 m−1 and ared = 0.429 m−1 (Pope and Fry, Reference Pope and Fry1997), and bgreen = 0.0012 m−1 and bred = 0.0006 m−1 (Buiteveld and others, Reference Buiteveld, Hakvoort and Donze1994) to calculate Kd, resulting in ggreen = 0.1250 m−1 and gred = 0.8586 m−1. For g = mKd, although m = 2 is most commonly used, Kirk (Reference Kirk1989) suggested that m should vary between 2 and 3.5 and Melling and others (Reference Melling2024) recommended m = 2.75. We conduct a sensitivity experiment to compare the estimated depths under different m values with a step size of 0.25.
3.4. DTM
The DTM estimates lake depth by intersecting the DEM-derived depression topography with the satellite-mapped lake shoreline (Moussavi and others, Reference Moussavi2016; Pope and others, Reference Pope2016; Yang and others, Reference Yang, Smith, Fettweis, Gleason, Lu and Li2019). First, we fill the ArcticDEM data acquired during the dry period of a supraglacial lake, and subtract the data before and after filling to obtain the depression topography raster, with each pixel value representing its elevation difference (i.e. height) from the depression outer boundary, following Karlstrom and Yang (Reference Karlstrom and Yang2016). Next, we intersect the satellite-mapped lake shoreline with the depression topography to calculate the height of the lake shoreline. Ideally, the obtained shoreline height values should be equal; however, due to the accuracy limitation of the ArcticDEM data, the obtained height values vary considerably (Yang and others, Reference Yang, Smith, Fettweis, Gleason, Lu and Li2019). Therefore, their average value is used to represent the lake shoreline height and their standard deviation to represent uncertainty. The average height value is subtracted from each depression topography pixel within the lake water mask to obtain the lake depth raster.
The ArcticDEM data and Sentinel-2 imagery we used are acquired at different dates, potentially impacting depth estimation due to changes in depression topography (Ignéczi and others, Reference Ignéczi, Sole, Livingstone, Ng and Yang2018). To mitigate this problem, we select ArcticDEM data obtained in the same year as the Sentinel-2 imagery to conduct method comparison. Moreover, to analyze the impact of ArcticDEM data selection on the lake depth estimation, we conduct a sensitivity experiment using multiple ArcticDEM data obtained ± 1 year from the Sentinel-2 imagery acquisition dates.
3.5. Indicators for validating accuracy
We validate the accuracy of the depths estimated by EFM, RTM, and DTM (Dest) with the ICESat-2-derived depth (DIS2), using the following indicators: the coefficient of determination (R2), bias, root mean square error (RMSE), and relative root mean squared error (RRMSE). The validation is formed using the 30% of the dataset, which is not involved in constructing the EFM. The evaluation metrics are calculated as follows:




where
$\overline {{D_{IS2}}} $ is the mean of the ICESat-2-derived depths (DIS2), and N is number of ICESat-2-derived depths used for the comparison.
For the DTM method specifically, we additionally calculate the underestimation ratio (UR) to quantify its tendency to underestimate lake depths. The UR is defined as:

where DDTM is the depth estimated by the DTM.
4. Results
4.1. Accuracy comparison of three methods
The ICESat-2-derived lake depth profiles are used to validate the performance of the three depth estimation methods, including the green-band EFM, green-band and red-band RTMs, and DTM (Figs. 2, 3, S1; Table S2). The results show that the green-band EFM is most accurate and stable, with an average of R2 = 0.93 ± 0.06, bias = 0.00 ± 0.03 m, and RMSE = 0.48 ± 0.25 m for the 35 lake profiles. The DTM yields the second-highest accuracy, with an average of R2 = 0.75 ± 0.18 and RMSE = 0.95 ± 0.45 m, tending to underestimate lake depth (bias = −0.60 ± 0.55 m). The green-band and red-band RTMs yield the lowest and most unstable accuracy, with an average of R2 = 0. 60 ± 0.09 and 0.48 ± 0.11, and RMSE = 2.79 ± 1.10 m and 2.74 ± 1.71 m, respectively. Additionally, the green-band RTM significantly overestimates lake depth (bias = 2.62 ± 1.13 m), whereas the red-band RTM significantly underestimates (−2.01 ± 1.65 m) lake depth.

Figure 2. Depth profiles of 35 study supraglacial lakes estimated using various methods, with ICESat-2-derived lake bottoms in black and the lake bottom elevation after refraction correction in blue, depths estimated by the empirical formula method (EFM) in cyan, the green-band radiative transfer method (RTM) in green, the red-band RTM in red, and the depression topography method (DTM) in Orange. The range around each profile represents the uncertainty of the method.

Figure 3. Comparison of the empirical formula method (EFM), the green-band and red-band radiative transfer method (RTM), and the depression topography method (DTM) with average (a) R2, (b) bias, (c) RMSE, and (d) RRMSE (with the error bars showing the standard deviation) calculated for 12 shallow lakes (lakes 1–12, in green), 23 deep lakes (lakes 13–35, in red), and all 35 lakes (in Orange).
Shallow and deep lakes are analyzed separately to further compare the three depth estimation methods (Fig. 3; Table S2). The green-band EFM and DTM show no significant difference in estimating shallow and deep lake depths (Mann–Whitney U-test, p > 0.05), hence their performances are not sensitive to depth variations. However, the green-band RTM shows a significant difference (Mann–Whitney U-test, p < 0.05), performing well for deep lakes but substantially overestimating shallow lake depth (R2 = 0.66 versus 0.42; bias = 2.37 m versus 3.41 m), while the red-band RTM also shows a significant difference (Mann–Whitney U-test, p < 0.05), performing well for shallow lakes but substantially underestimating deep lake depth (R2 = 0.63 versus 0.41; bias = −0.10 m versus −3.00 m). This finding suggests that the performance of RTMs is very sensitive to lake depth variations.
4.2. Band optimization and spatiotemporal extrapolation of the EFM
OBRA is used to determine the optimal band combination for EFM, and the results show that the single green-band EFM performs best across all 35 lakes (R2 = 0.92 ± 0.05 and RMSE = 0.48 ± 0.25 m) (Fig. 4). Dual blue-green bands, recommended by Moussavi and others (Reference Moussavi2016), perform very similarly to the single green band (R2 = 0.91 ± 0.08 and RMSE = 0.54 ± 0.37 m) (Figs. 4a, b). In contrast, the red-band and dual red-other bands performs well for shallow lakes (Figs. 4c, d) but least accurate for deep lakes (Figs. 4e, f). This finding is consistent with previous results showing that the red-band EFM is only suitable for estimating shallow lake depth (max depth < 5 m), while the green-band and dual green-blue bands are suitable for deep lakes (max depth > 5 m) (Moussavi and others, Reference Moussavi2016).

Figure 4. The R2 and RMSE of the empirical formula method (EFM) calculated for each paired combination of blue, green, and red bands. (a) and (b) are for all 35 study supraglacial lakes, (c) and (d) for 12 shallow lakes, and (e) and (f) for 23 deep lakes.
The reflectance-depth empirical formulas obtained from different lakes are considerably different (Fig. 5a; Table S2). Additionally, the overall empirical formula performs significantly poorer than the empirical formula constructed for each lake (R2 = 0.66 ± 0.24 versus 0.93 ± 0.06, RMSE = 1.46 ± 0.92 versus 0.48 ± 0.25 m, bias = 0.15 ± 1.45 versus 0.00 ± 0.03 m) (Fig. 5b). These findings indicate that the empirical formulas derived by the EFM vary considerably over space and its spatial extrapolation ability is limited.

Figure 5. (a) Relationship between green band reflectance and ICESat-2-derived depth for the training data of 35 study lakes. The red solid line represents the overall fitted empirical formula, and the black dashed lines represent the 35 fitted empirical formulas. (b) Comparison of the overall EFM estimated lake depths and the ICESat-2-derived lake depths for the validation data of 35 study lakes.
Among our 35 lakes, there are six lakes covered by two different ICESat-2 tracks on different days (Fig. 6a). When validating with the independent late-day ICESat-2 tracks, the accuracy of the EFM decreases but remains higher than the RTM and the DTM (Figs. 6d–g). The EFM performs well within the depth range of the data used for construction, with estimated depths closely matching the ICESat-2-derived depths. In contrast, when extrapolated beyond this range, the estimated depths exhibit significant bias (Figs. 6b, c). This finding indicates that the EFM constructed for a specific lake captures some useful information about snow/ice surfaces beneath and near the lake (Fig. 6a), thereby allowing for the temporal extrapolation.

Figure 6. Temporal validation of the EFM using ICESat-2 tracks on different days. (a) Sentinel-2 images (R: NIR band, G: red band, B: green band) of six supraglacial lakes with two different ICESat-2 tracks on different days. The green and red lines represent the ICESat-2 tracks for early and late days, respectively. (b) Relationship between early-day green band reflectance and ICESat-2-derived depth for the six study lakes, and the dashed lines indicate the six fitted empirical formulas. (c) Comparison of late-day ICESat-2-derived depth and late-day lake depth estimated by applying the six early-day empirical formulas (i.e. Late-day EFM). (d) bias, (e) R2, (f) RMSE, and (g) RRMSE (with the error bar showing the standard deviation) are calculated for the original EFM, RTM, DTM, and late-day EFM.
4.3. Sensitivity analysis of the attenuation coefficient in the RTM
The impacts of different spectral attenuation coefficient (g = mKd) values on the performance of the RTM are evaluated (Fig. 7). The results show that the green-band RTM is more sensitive to m values than the red-band RTM, consistent with findings of Moussavi and others (Reference Moussavi2016) and Melling and others (Reference Melling2024). Additionally, the commonly used m = 2 for pure water induces the green-band RTM to significantly overestimate both shallow and deep lake depths, and induces the red-band RTM to significantly underestimate deep lake depth. Increasing m values can improve the green-band’s performance for estimating both shallow and deep lake depth, with m = 3 minimizing the average bias for the green band RTM in deep (max depth > 5 m) lakes (Fig. 7a). However, increasing m values declines red-band’s performance for estimating both shallow and deep lake depth, with m = 2 minimizing the average bias for the red band RTM in shallow (max depth < 5 m) lakes (Fig. 7b). Moreover, the Ad value calculated using the average reflectance of pixels around the lake shoreline also affects the depth estimates. Pixels around the lake shoreline with notable reflectance differences induce a large standard deviation of Ad value and consequently high depth uncertainty, as shown by the large depth range of Lake 20 and Lake 24 (Fig. 2).

Figure 7. Sensitivity analysis of the m values in (a) green-band and (b) red-band radiative transfer methods (RTMs) based on 35 supraglacial lakes (max depth ∼ 2.3–12.3 m, depth range ∼ 10 m).
4.4. Uncertainty analysis of the multi-period arcticdem DTM
Multi-period ArcticDEM data is used to analyze the impact of depression topography changes on the performance of the DTM (Fig. 8). Although the ArcticDEM-derived height variations (represented as standard deviations, ∼0.2–1.6 m) of the lake shoreline pixels induce considerably large uncertainties in depth estimates of multi-period ArcticDEM DTMs (pale orange ranges in Fig. 2), their accuracy differences calculated from average height values of the lake shoreline pixels (orange lines in Fig. 2) remain smaller than their differences with other comparison methods (Fig. 2). For the 18 lakes with multi-period (2-5 times) strips of 2 m ArcticDEM data available (Fig. 8a), multi-period ArcticDEM DTMs are more accurate than the green-band and red-band RTMs, and particularly, for Lake 11 and 31, multi-period ArcticDEM DTM is even comparable to the best-performing green-band EFM (Figs. 8b, c). This finding suggests that the temporal changes of depression topography have limited impacts on the performance of the DTM, whereas the uncertainties caused by lake shoreline height estimates should be considered when conducting DTM.

Figure 8. (a) Comparison of depth profiles estimated by depression topography method (DTM) using multi-period arcticdems, with the maximum uncertainty range (in orange) induced by the height differences of lake shoreline pixels, and (b) and (c) show the corresponding depth bias and RMSE for multi-period DTM (the error bars show the standard deviation) and other comparison methods.
5. Discussion
5.1. Applicability of the EFM
Our comparison shows that the EFM performs best in estimating supraglacial lake depth on the GrIS (Fig. 3; Table S2). This is predictable because it relies on synchronous ICESat-2 data to build empirical formulas, whereas RTM and DTM do not. However, the reflectance-depth empirical formulas obtained from different lakes are considerably different (Fig. 5a; Table S2), and the overall empirical formula performs significantly poorer than the empirical formula constructed for each lake (Fig. 5b), indicating that the EFM is challenging for spatial extrapolation. In previous studies, only one global empirical formula was used to estimate supraglacial lake depth over long periods (Box and Ski, Reference Box and Ski2007; Fitzpatrick and others, Reference Fitzpatrick2014) and large study areas (Fitzpatrick and others, Reference Fitzpatrick2014; Zhang and others, Reference Zhang2023). We suggest that this is problematic and recommend using EFM in small areas within the depth range of the training data used to construct the EFM.
The spatial extrapolation of the EFM is mainly limited by variable atmospheric conditions (Zhou and others, Reference Zhou, Liang, Xiao, Li, Zheng and Cheng2025), illumination angles (Wang and Zender, Reference Wang and Zender2010), and snow/ice surfaces beneath and near lakes (Schröder and others, Reference Schröder, Neckel, Zindler and Humbert2020). We suggest two approaches to mitigate these limitations: first, relative radiometric calibration (Carbonneau and others, Reference Carbonneau, Lane and Bergeron2006; Pahlevan and others, Reference Pahlevan, Lee, Wei, Schaaf, Schott and Berk2014) can be used to eliminate spectral inconsistence caused by different atmospheric conditions and illumination angles. Second, ice surface can be classified into different representative zones (e.g. bare ice zone, dry snow zone, and wet snow zone) based on satellite albedo products (Nolin and Payne, Reference Nolin and Payne2007; Shimada and others, Reference Shimada, Takeuchi and Aoki2016), and subsequently variable reflectance-depth empirical formulas can be built for different zones.
5.2. Applicability of the RTM
The performance of the green-band and red-band RTMs for estimating shallow and deep lake depths is demonstrated, respectively. The red band with longer wavelength attenuates more rapidly in water, resulting in higher accuracy for estimating shallow (max depth < 5 m) lake depth (owing to more distinguishable reflectance variations) but significantly underestimating deep lake depth (owing to stronger attenuation of deep water), whereas the green band with shorter wavelength performs better in estimating deep (max depth > 5 m) lake depth but significantly overestimating shallow lake depth (Figs. 2, 3; Table S2). These results are consistent with previous findings (Moussavi and others, Reference Moussavi2016; Pope and others, Reference Pope2016; Melling and others, Reference Melling2024) but are more illustrative because lakes are explicitly classified as shallow and deep lakes and the corresponding quantitative over-/under- estimations are provided.
The variability of the spectral attenuation coefficient (g) limits the spatiotemporal extrapolation of the RTM. Melling and others (Reference Melling2024) recommended m = 2.75 (derived empirically as the average of the commonly used 2–3.5Kd range), whereas we recommend a different value, g = 3Kd for green band RTM in deep lakes and g = 2Kd for red band RTM in shallow lakes (Fig. 7). Spatiotemporal variation of g value (Kirk, Reference Kirk1989; Melling and others, Reference Melling2024) may be induced by the non-negligible impurities (e.g. ice algae, snow algae, and light-absorbing impurities) at the bottom of the supraglacial lakes (Wang and others, Reference Wang, Tedesco, Xu and Alexander2018; Tedstone and others, Reference Tedstone2020; Halbach and others, Reference Halbach2022), causing higher attenuation of the radiation. To handle the variability of g, an empirical reflectance threshold can be determined based on the red band to classify lake areas into shallow and deep areas. For example, based on our 35 study lakes, this threshold is ∼0.08 (the average red band reflectance of points deeper than 5 m). As such, in shallow lake areas (red band reflectance > 0.08), the red band RTM with g = 2Kd is recommended, while in deep lake areas (<0.08), the green band RTM with g = 3Kd is recommended.
Like EFM, in previous studies, the green-band and red-band RTMs are widely used to estimate supraglacial lake depth over long time periods (Georgiou and others, Reference Georgiou, Shepherd, McMillan and Nienow2009; Langley and others, Reference Langley, Leeson, Stokes and Jamieson2016) and large study areas (Banwell and others, Reference Banwell, Caballero, Arnold, Glasser, Mac Cathles and DR2014; Williamson and others, Reference Williamson, Arnold, Banwell and Willis2017). We urge caution for using RTMs in this way because large spatiotemporal variations of GrIS supraglacial lake depths (McMillan and others, Reference McMillan, Nienow, Shepherd, Benham and Sole2007; Sundal and others, Reference Sundal, Shepherd, Nienow, Hanna, Palmer and Huybrechts2009; Zhang and others, Reference Zhang2023) can greatly hinder the appropriate use of either the green-band or red-band RTM. Combining the RTM depth estimates from different bands may mitigate this problem, as suggested by Pope and others (Reference Pope2016), which used the arithmetic average of the red and the panchromatic band RTMs. However, we suggest that the improvement of band combinations is limited. If we combine green and red band RTMs, the overall RMSE values only slightly increase from 2.74–2.79 m to 1.86 m, whereas the improvement is not significant (Fig. S2). Alternatively, field-measured or ICESat-2-derived lake depth can be used to calibrate RTM’s parameters, as per Moussavi and others (Reference Moussavi2016), but this will make RTM dependent on depth sample data as the EFM, thereby reducing the universality of the RTM. Instead, we suggest building a prior knowledge base of GrIS supraglacial lake depths based on the lake area, which positively correlates with lake depth (Sundal and others, Reference Sundal, Shepherd, Nienow, Hanna, Palmer and Huybrechts2009; Williamson and others, Reference Williamson, Banwell, Willis and Arnold2018a), or using the depression area/depth (Yang and others, Reference Yang, Smith, Fettweis, Gleason, Lu and Li2019), which can directly reflect the possible maximum lake depth, and then selecting the green-band or red-band RTMs accordingly to calculate the depth of deep and shallow lakes, respectively.
5.3. Applicability of the DTM
In our comparison, the DTM yields the second-highest accuracy, indicating that it is a practical approach for estimating supraglacial lake depth, consistent with previous studies (Moussavi and others, Reference Moussavi2016; Pope and others, Reference Pope2016; Yang and others, Reference Yang, Smith, Fettweis, Gleason, Lu and Li2019). However, we find the DTM systematically underestimates lake depth (Fig. 3; Table S2). For our 35 study lakes, the UR is 14 ± 10%. This underestimation may be induced during the ArcticDEM product generation. The weak ice/snow texture in stereo satellite imagery may induce insufficient matching points and thereby considerable elevation bias, and the georeferenced matches between ArcticDEM and optical satellite imagery may also contain bias (Noh and Howat, Reference Noh and Howat2015, Reference Noh and Howat2017; Dai and others, Reference Dai2024). These two factors may explain the height uncertainties (0.2–1.6 m) of the lake shoreline (Fig. 2). Moreover, the denoising process of ArcticDEM generation (Noh and Howat, Reference Noh and Howat2017, Reference Noh and Howat2019) may over-smooth the bottoms of topographic depressions, thereby systematically underestimating lake depths (Fig. 3). Furthermore, numerous supraglacial lakes do not exhibit dry-period 2 m ArcticDEM strips to date, although the future release of new ArcticDEM data may partially address this limitation.
Therefore, we currently need to rely on the simple UR for the better use of the DTM. We acknowledge that the possible transferability of this ratio over space and time is limited, so we recommend using it only for quantifying the DTM uncertainty (e.g. 1.14*DDTM) rather than rectifying the original DTM estimates. For example, Yang and others (Reference Yang, Smith, Fettweis, Gleason, Lu and Li2019) used DTM to estimate the depth and volume of two supraglacial lakes and discovered that the regional climate models overestimated the meltwater runoff on the GrIS by 106–123% (40–55% after early July) in the melt season, while the overestimation decreases to 80–95% (23–36% after early July) if applying our UR. In the future, DTM can be combined with the ICESat-2 data to create a dataset of dry depression topography on the GrIS and determine the corresponding URs for DTM-based depth estimation.
5.4. Impact of optical satellite resolutions on depth estimation
The optical satellite imagery resolution also influences the performance of EFM, RTM, and DTM. One 10 m resolution Sentinel-2 pixel covers ∼14–20 ICESat-2 ATL03 points (∼0.7 m along-track resolution), with the specific number depending on the interaction angle between Sentinel-2 pixels and ICESat-2 tracks. As such, the obtained reflectance-depth relationship may struggle in capturing small depth variations. Very high resolution (VHR) multispectral imagery (e.g. 2.44 m QuickBird, 1.64 m GeoEye-1, and 1.24–2.40 m WorldView-2/3/4) is potential for building more detailed and accurate reflectance-depth relationships. Moreover, for the DTM, VHR imagery may delineate more precise lake shorelines and thereby significantly reduce lake shoreline uncertainties. However, narrow swath (∼10–20 km) VHR imagery often does not cover ocean regions, and thereby faces challenges to estimate deep ocean reflectance required for the RTM. In contrast, coarse resolution, wide swath imagery (e.g. 250 m MODIS, 500 m VIIRS, and 300 m Sentinel-3) may introduce large lake depth estimation uncertainty due to mixed pixel effects and imprecise shorelines.
6. Conclusion
In this study, we compare the performance of three remote sensing methods, namely the EFM, RTM, and DTM, to estimate supraglacial lake depth on the GrIS, with the ICESat-2-derived depth as validation data. Based on the comparison results, we recommend optimizing the use of these three methods. When field-measured depth or near-simultaneous ICESat-2 data are available, the green-band EFM is preferred, while we urge caution for the instability of empirical formulas and recommend using the EFM in small areas within the depth range of the training data used to construct the EFM. If no lake depth sample data are available, the DTM is recommended, yet an UR should be considered to quantify the DTM uncertainties. For the RTM, we recommend setting the spectral attenuation coefficient g = 3Kd for the green band in deep lakes and g = 2Kd for the red band in shallow lakes, and caution against the large-area use of the green-band or red-band RTMs. Viewing collectively, our method comparison may contribute to improving our understanding of GrIS surface meltwater storage and hydrological processes.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/jog.2025.23
Acknowledgements
We acknowledge support from the National Key R&D Program (2022YFB3903601) and the National Natural Science Foundation of China (42271320). ArcticDEMs and geospatial support were provided by the Polar Geospatial Center under NSF-OPP awards (1043681, 1559691, 1542736, 1810976 and 2129685).