Google Earth Engine variables
Google Earth Engine Data from eDNA Explorer
We augment each sampling location and date in a project with environmental metadata sourced from Google Earth Engine, using a tool we developed called Terradactyl. For each coordinate in a project, Terradactyl aggregates environmental data within a buffer surrounding the sample location. The buffer radius is set from the GPS uncertainty recorded for that coordinate in the project metadata; if no uncertainty is recorded, it defaults to 30 meters. Most datasets use temporal windows that look back from the sample date (e.g., 30 years, 10 years, or 1 year) to capture both long-term baselines and recent conditions.
Overview
The data we provide is organized into the following categories:
- Climate Related Data
- Land Cover, Vegetation & Biogeography
- Marine & Oceanographic Data
- Geospatial & Terrain Data
- Soil Properties
- Human Impact & Infrastructure
- Hydrology
- Air Quality
- Forest Structure
Climate Related Data
TerraClimate
A comprehensive dataset of monthly climate and climatic water balance for global terrestrial surfaces. This data, at a 4-kilometer resolution, includes measures of temperature, precipitation, soil moisture, evapotranspiration, and drought indices. We provide data from both 30-year and 10-year trailing windows to capture long-term climate normals and recent trends. Variables include:
- Standard climate variables: actual evapotranspiration, climate water deficit, Palmer Drought Severity Index, precipitation, runoff, soil moisture, solar radiation, snow water equivalent, min/max temperature, vapor pressure, wind speed
- 19 standard BioClim variables derived from temperature and precipitation patterns
- Climate variability metrics: precipitation annual coefficient of variation, driest year precipitation, hottest year mean temperature, severe drought frequency
- Ecological variables: growing degree days (5°C and 10°C baselines), growing season length, aridity index, soil moisture seasonality
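As an illustration of how metrics of this kind can be derived from monthly and annual series, here is a minimal sketch. It assumes the annual precipitation totals and monthly mean temperatures have already been extracted for a sample; the function names and the monthly-mean approximation of growing degree days are illustrative assumptions, not Terradactyl's actual implementation.

```python
from statistics import mean, pstdev

# Days per month in a non-leap year (assumption: leap days are ignored).
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def precip_annual_cv(annual_totals_mm):
    """Coefficient of variation (std / mean) of annual precipitation totals."""
    avg = mean(annual_totals_mm)
    if avg == 0:
        return None  # CV is undefined when no precipitation fell
    return pstdev(annual_totals_mm) / avg


def driest_year_precip(annual_totals_mm):
    """Precipitation total of the driest year in the window."""
    return min(annual_totals_mm)


def growing_degree_days(monthly_mean_temp_c, base_c=5.0):
    """Approximate annual growing degree days from 12 monthly mean temperatures.

    Each month contributes max(0, T - base) for every day in that month.
    """
    return sum(
        max(0.0, temp - base_c) * days
        for temp, days in zip(monthly_mean_temp_c, DAYS_IN_MONTH)
    )
```

Calling `growing_degree_days` once with `base_c=5.0` and once with `base_c=10.0` gives the two baselines listed above.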
MODIS Land Surface Temperature
Daily maximum and minimum land surface temperatures at 1km resolution, sourced from the MODIS sensor aboard the NASA Terra satellite. We provide annual statistics (mean, maximum, minimum, and diurnal range) from 3-year and 1-year trailing windows. Quality flags are used to ensure only high-quality observations are included.
Landsat 8 Surface Temperature
High-resolution (30-meter) thermal measurements from the USGS/NASA Landsat 8 satellite, providing peak season surface temperature from cloud-masked composites. Peak season is determined based on sample location: June-August for the Northern Hemisphere, January-March for the Southern Hemisphere.
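The hemisphere rule can be sketched as a small lookup. The function name is hypothetical, and treating a latitude of exactly 0 as Northern Hemisphere is an assumption the text does not settle.

```python
def peak_season_months(latitude_deg):
    """Return the peak-season calendar months (1-12) for a sample latitude."""
    if latitude_deg >= 0:
        # June-August for the Northern Hemisphere
        # (assumption: the equator is grouped with the north).
        return [6, 7, 8]
    # January-March for the Southern Hemisphere.
    return [1, 2, 3]
```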
CHIRPS Precipitation
Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) is a 35+ year quasi-global rainfall dataset. Spanning 50°S-50°N (and all longitudes), and starting in 1981, CHIRPS incorporates 0.05° resolution satellite imagery with in-situ station data to create gridded rainfall time series. We provide metrics on heavy precipitation frequency and rainfall intensity from 30-year and 10-year trailing windows.
GPM IMERG Precipitation
NASA's Global Precipitation Measurement (GPM) mission provides high-resolution (~11km) precipitation data at 30-minute intervals. We aggregate this to provide total annual precipitation, mean precipitation rate, and liquid precipitation probability from 1-year, 10-year, and 30-year trailing windows. This complements CHIRPS with higher temporal resolution and global coverage.
ERA5-Land Climate Reanalysis
ECMWF's ERA5-Land provides advanced climate variables at ~11km resolution. We extract soil temperature at multiple depths (0-7cm, 7-28cm), volumetric soil moisture, Leaf Area Index separated by vegetation type (low/high), energy fluxes (net solar radiation, evaporation, transpiration), and surface runoff. Data is provided from 10-year and 30-year trailing windows.
Land Cover, Vegetation & Biogeography
Land Cover & Land Use
Copernicus Global Land Cover
A global land cover dataset at 100-meter resolution. It provides both discrete land cover classifications and fractional cover for categories such as trees, shrubs, herbaceous vegetation, cropland, urban areas, bare ground, water bodies (permanent and seasonal), moss/lichen, and snow/ice. Data represents annual composites from 2015-2019.
ESA WorldCover
The ESA WorldCover map provides a global land cover map at 10m resolution for 2020. It includes 11 land cover classes: tree cover, shrubland, grassland, cropland, built-up, bare/sparse vegetation, snow and ice, permanent water bodies, herbaceous wetland, mangroves, and moss and lichen.
Dynamic World
Dynamic World is a 10m resolution, near real-time land use/land cover dataset that includes nine land cover classes. Each pixel includes probability values for all nine classes, allowing for nuanced analysis of mixed land cover types. We use 180-day composites to capture recent land cover conditions and apply confidence thresholding to ensure quality.
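A minimal sketch of confidence thresholding on a single pixel, assuming the nine class probabilities are already available as a list in the order of the Dynamic World probability bands. The 0.6 default mirrors the 60% threshold mentioned under Quality Control; the function name is hypothetical.

```python
# Dynamic World probability band names, in band order.
DW_CLASSES = [
    "water", "trees", "grass", "flooded_vegetation", "crops",
    "shrub_and_scrub", "built", "bare", "snow_and_ice",
]


def dominant_class(probabilities, min_confidence=0.6):
    """Return the most probable class name, or None if it is below the threshold."""
    if len(probabilities) != len(DW_CLASSES):
        raise ValueError("expected one probability per Dynamic World class")
    best = max(range(len(DW_CLASSES)), key=lambda i: probabilities[i])
    if probabilities[best] < min_confidence:
        return None  # low-confidence pixel: treated as masked
    return DW_CLASSES[best]
```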
Dynamic World Land Cover Change
This dataset detects recent, drastic changes in land cover by comparing the dominant land cover class from the two full calendar years prior to a sample's collection date. The output is a fractional value (0-1) representing the proportion of change within the sample area at 10m resolution. This enables detection of significant land use transformations such as deforestation, urbanization, or agricultural expansion.
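The change fraction can be sketched as a per-pixel comparison of dominant classes between the two years. This assumes each year has already been reduced to one dominant class label per pixel (with `None` for masked pixels) and that masked pixels are excluded from the denominator; both are assumptions for illustration.

```python
def land_cover_change_fraction(classes_year_a, classes_year_b):
    """Fraction (0-1) of valid pixels whose dominant class differs between years."""
    if len(classes_year_a) != len(classes_year_b):
        raise ValueError("both years must cover the same pixels")
    # Keep only pixels that have a valid class in both years.
    pairs = [
        (a, b)
        for a, b in zip(classes_year_a, classes_year_b)
        if a is not None and b is not None
    ]
    if not pairs:
        return None  # no valid pixels in the sample area
    changed = sum(1 for a, b in pairs if a != b)
    return changed / len(pairs)
```

For example, a four-pixel sample area in which one pixel moves from `"trees"` to `"built"` yields 0.25.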
Hansen Global Forest Change
The Hansen Global Forest Change dataset provides cumulative forest cover change from 2000-2024 at 30-meter resolution using Landsat imagery. We extract six forest change metrics:
Direct Measurements:
- Forest cover in year 2000 (0-100% tree canopy)
- Binary forest loss indicator (any loss 2001-2024)
- Year of forest loss (2001-2024)
Derived Metrics:
- Recent forest loss (2020-2024, last 5 years)
- Cumulative forest loss through 2020
- Forest loss rate (percentage of year 2000 forest lost)
This static dataset represents the complete forest change record over the 24-year period and is widely used for deforestation monitoring and forest conservation assessment.
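One way the derived metrics could be computed from the direct measurements is sketched below. It relies on the dataset's loss-year encoding (0 for no loss, 1-24 for loss in 2001-2024). The function name, the per-pixel list inputs, and the 30% canopy cutoff used to decide which pixels count as year-2000 forest are all assumptions for illustration; the source does not state how Terradactyl defines them.

```python
def forest_loss_metrics(treecover2000, lossyear, forest_threshold=30):
    """Summarise per-pixel Hansen values inside one sample buffer.

    treecover2000: percent canopy cover (0-100) per pixel.
    lossyear: 0 for no loss, 1-24 for loss in 2001-2024.
    forest_threshold: hypothetical canopy cutoff for counting a pixel as forest.
    """
    n = len(treecover2000)
    if n == 0 or n != len(lossyear):
        raise ValueError("inputs must be non-empty and the same length")
    forest = [cover >= forest_threshold for cover in treecover2000]
    forest_pixels = sum(forest)
    lost_forest = sum(1 for is_forest, year in zip(forest, lossyear) if is_forest and year > 0)
    return {
        # Share of pixels with loss recorded in 2020-2024.
        "recent_loss_fraction": sum(1 for year in lossyear if 20 <= year <= 24) / n,
        # Share of pixels with loss recorded in 2001-2020.
        "loss_through_2020_fraction": sum(1 for year in lossyear if 1 <= year <= 20) / n,
        # Percentage of year-2000 forest pixels that were later lost.
        "loss_rate_pct": 100.0 * lost_forest / forest_pixels if forest_pixels else None,
    }
```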
Vegetation & Productivity
Sentinel-2 Vegetation Analysis
Vegetation health metrics derived from ESA Sentinel-2 satellite imagery. This includes 10 spectral bands and five key vegetation indices:
- NDVI (Normalized Difference Vegetation Index): Overall vegetation greenness
- EVI (Enhanced Vegetation Index): Improved sensitivity in high biomass areas
- SAVI (Soil-Adjusted Vegetation Index): Reduces soil brightness influence
- NDMI (Normalized Difference Moisture Index): Vegetation water content
- NBR (Normalized Burn Ratio): Fire severity and recovery
Data is based on 180-day cloud-masked median composites. Resolution varies from 10-60 meters depending on the spectral band. Cloud cover filtering and quality masking ensure only high-quality observations are used.
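For reference, the five indices follow standard formulas. The sketch below applies them to surface reflectance values scaled to 0-1; for Sentinel-2 the inputs would typically be B2 (blue), B4 (red), B8 (NIR), B11 (SWIR1), and B12 (SWIR2). The EVI coefficients and the SAVI soil factor of 0.5 are the commonly used defaults and are assumed here, since the text does not state which values are applied.

```python
def _ratio(numerator, denominator):
    """Divide safely, returning None where the index is undefined."""
    return None if denominator == 0 else numerator / denominator


def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return _ratio(nir - red, nir + red)


def evi(nir, red, blue):
    """Enhanced Vegetation Index with the commonly used coefficients."""
    return _ratio(2.5 * (nir - red), nir + 6.0 * red - 7.5 * blue + 1.0)


def savi(nir, red, soil_factor=0.5):
    """Soil-Adjusted Vegetation Index; soil_factor is the L term."""
    return _ratio((1.0 + soil_factor) * (nir - red), nir + red + soil_factor)


def ndmi(nir, swir1):
    """Normalized Difference Moisture Index."""
    return _ratio(nir - swir1, nir + swir1)


def nbr(nir, swir2):
    """Normalized Burn Ratio."""
    return _ratio(nir - swir2, nir + swir2)
```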
EMIT Hyperspectral Imaging
NASA's Earth Surface Mineral Dust Source Investigation (EMIT) provides hyperspectral data from the International Space Station. With 285 spectral bands, EMIT enables unique measurements of vegetation stress, plant chemistry, and moisture content not possible with broader spectral sensors. We extract narrow-band reflectances, vegetation indices, moisture indices, and cellulose absorption features from 1-year median composites. Coverage is limited to 51.6°S-51.6°N due to ISS orbit.
MODIS Vegetation Indices
MODIS provides long-term vegetation monitoring at 500m resolution. We extract NDVI, EVI, and phenology metrics (growing season start/end, peak greenness) from 1-year, 3-year, and 10-year trailing windows. Quality flags ensure only good and marginal quality data is included.
Potential Fraction of Absorbed Photosynthetically Active Radiation (FAPAR)
This dataset provides the predicted monthly median FAPAR under Potential Natural Vegetation (based on PROBA-V FAPAR 2014-2017). It represents the theoretical maximum photosynthetic capacity for a location under natural vegetation conditions.
Biogeography
Potential Distribution of Biomes
Potential Natural Vegetation (PNV) biomes represent global predictions of vegetation types based on the BIOMES 6000 dataset's current biomes category. This dataset shows the vegetation cover in equilibrium with climate that would exist at a given location if not impacted by human activities. We provide 18 biome classes including various forest types (tropical, temperate, cold), woodlands, savannas, scrublands, steppes, deserts, and tundra variants.
Ecoregions and Realms
The RESOLVE Ecoregions dataset, updated in 2017, offers a depiction of the 846 terrestrial ecoregions that represent our living planet. We extract both the ecoregion name and biogeographic realm for each sample location based on spatial overlap analysis.
Marine & Oceanographic Data
Note: Marine datasets are automatically applied only to samples detected in water environments using ESA WorldCover land cover classification. Water detection uses a 500-meter buffer around each sample point.
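The water check can be sketched as follows, assuming the ESA WorldCover class codes of all pixels inside the 500-meter buffer have already been collected into a list. Class 80 is WorldCover's permanent water bodies class; the function name is hypothetical.

```python
PERMANENT_WATER = 80  # ESA WorldCover class code for permanent water bodies


def is_marine_sample(worldcover_classes_in_buffer):
    """True if any pixel within the 500 m buffer is classified as permanent water."""
    return any(code == PERMANENT_WATER for code in worldcover_classes_in_buffer)
```

Under this rule a shoreline sample whose buffer only partly overlaps water is still treated as a water sample.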
Ocean Physics
NOAA OISST Sea Surface Temperature
The NOAA Optimally Interpolated Sea Surface Temperature dataset provides daily SST at ~28km resolution (0.25 degrees). We extract mean annual SST, SST annual range, and mean sea ice concentration from 1-year, 10-year, and 30-year trailing windows. Temperature values are masked where ice concentration exceeds 50%.
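The ice masking step can be sketched as a filter applied before averaging. The function name is hypothetical, and treating a missing ice value as ice-free is an assumption.

```python
def masked_mean_sst(sst_c, ice_concentration_pct, ice_limit_pct=50.0):
    """Mean SST over observations where sea ice concentration is at most the limit."""
    valid = [
        temp
        for temp, ice in zip(sst_c, ice_concentration_pct)
        # Assumption: a missing ice value (None) means no ice was reported.
        if temp is not None and (ice is None or ice <= ice_limit_pct)
    ]
    return sum(valid) / len(valid) if valid else None
```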
HYCOM: Sea Temperature and Salinity
The Hybrid Coordinate Ocean Model (HYCOM) is a global ocean model providing daily data on sea surface temperature and salinity at ~10km resolution. We process this to monthly averages, then calculate annual means and seasonal ranges for both temperature (°C) and salinity (PSU - Practical Salinity Units). Data is available from 10-year and 1-year trailing windows. Note: 30-year windows are not available as HYCOM data only begins in 1992.
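A minimal sketch of the daily-to-monthly-to-annual aggregation, assuming daily values arrive as `(date, value)` pairs. Defining the seasonal range as the difference between the highest and lowest monthly mean is an assumption; the text does not spell out the definition.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def annual_mean_and_seasonal_range(daily_values):
    """Aggregate (date, value) pairs to monthly means, then summarise them.

    Returns (mean of monthly means, max monthly mean - min monthly mean).
    """
    by_month = defaultdict(list)
    for day, value in daily_values:
        by_month[(day.year, day.month)].append(value)
    monthly_means = [mean(values) for values in by_month.values()]
    if not monthly_means:
        return None, None
    return mean(monthly_means), max(monthly_means) - min(monthly_means)


# Example: two January days and one July day of sea surface temperature (°C).
sample = [(date(2020, 1, 1), 10.0), (date(2020, 1, 2), 12.0), (date(2020, 7, 1), 20.0)]
annual_mean, seasonal_range = annual_mean_and_seasonal_range(sample)
```

Averaging by month first keeps months with more valid days from dominating the annual mean.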
Marine Biology & Human Impact
Ocean Color SMI: MODIS Aqua Data
This dataset provides monthly global data on ocean properties at 4km resolution, specifically chlorophyll-a (a proxy for phytoplankton biomass) and Particulate Organic Carbon. We calculate annual means of both variables. Note: Because the MODIS-Aqua record available for this dataset ends in February 2022, all samples use a fixed 2021 baseline year regardless of sample date.
GFW (Global Fishing Watch) Daily Fishing Hours
This dataset describes fishing effort at 1km resolution, measured in hours of inferred fishing activity, broken down by gear type (drifting longlines, fixed gear, purse seines, squid jigger, trawlers, and other fishing). We calculate mean annual fishing hours for each gear type. Note: Due to data availability, all samples use a fixed 2016 baseline year regardless of sample date.
Geospatial & Terrain Data
Terrain
SRTM Digital Elevation Model
Elevation data is sourced from the Shuttle Radar Topography Mission (SRTM) at a resolution of 30 meters. We provide four terrain metrics:
- Elevation (meters above sea level)
- Slope (degrees)
- Aspect (degrees from North, 0-360)
- Topographic Heterogeneity (standard deviation of elevation in 5×5 pixel neighborhood)
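The heterogeneity metric can be sketched on a small elevation grid. The use of the population standard deviation and the clipping of windows at the grid edge are assumptions for illustration.

```python
from statistics import pstdev


def topographic_heterogeneity(elevation, row, col, size=5):
    """Standard deviation of elevation in a size x size window centred on (row, col).

    elevation is a list of equal-length rows; windows are clipped at grid edges.
    """
    half = size // 2
    rows = range(max(0, row - half), min(len(elevation), row + half + 1))
    cols = range(max(0, col - half), min(len(elevation[0]), col + half + 1))
    window = [elevation[r][c] for r in rows for c in cols]
    return pstdev(window)
```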
Global SRTM Landforms
The SRTM Landform dataset provides landform classes created by combining the Continuous Heat-Insolation Load Index (SRTM CHILI) and the multi-scale Topographic Position Index (SRTM mTPI) datasets. We provide 15 landform classes including peaks, ridges, mountains, cliffs, upper and lower slopes (with thermal variants), and valleys.
Soil Properties
OpenLandMap Soil Data
Soil data is available at 250-meter resolution from various OpenLandMap datasets. We provide data on multiple soil properties at two depth aggregations:
Depth Aggregations:
- 0-30cm weighted average: Represents the primary root zone using a weighted average of three depth layers (0-5cm, 5-15cm, 15-30cm)
- 0-5cm surface layer: Represents surface conditions
Continuous Variables (both depth aggregations provided):
- Soil pH in H₂O
- Organic carbon content (g/kg)
- Sand percentage
- Clay percentage
- Bulk density (cg/cm³)
- Water content at field capacity (volumetric %)
Categorical Variables (surface layer only):
- USDA Soil Taxonomy Great Group (200+ classification groups)
- USDA Soil Texture Class (12 classes: various clays, loams, silts, and sands)
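The 0-30cm aggregation can be sketched as a thickness-weighted average of the three layers. Weighting each layer by its thickness (5, 10, and 15 cm) is an assumption; the text says only that the average is weighted. The dictionary keys and function name are hypothetical.

```python
# Layer thickness in centimetres, used as the weight (assumption).
LAYER_THICKNESS_CM = {"0-5": 5, "5-15": 10, "15-30": 15}


def weighted_0_30cm(values_by_layer):
    """Thickness-weighted average of a soil property over the 0-30 cm profile."""
    total_thickness = sum(LAYER_THICKNESS_CM.values())
    weighted_sum = sum(
        values_by_layer[layer] * thickness
        for layer, thickness in LAYER_THICKNESS_CM.items()
    )
    return weighted_sum / total_thickness
```

With these weights the 15-30cm layer contributes half of the result, so a thin, atypical surface layer has limited influence on the root-zone value.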
SMAP Soil Moisture
NASA's Soil Moisture Active Passive (SMAP) mission provides high-frequency soil moisture observations at 9km resolution. We extract both surface (0-5cm) and root zone (0-100cm) soil moisture from 3-hourly observations, aggregating to weekly averages before calculating long-term statistics. Variables include mean and minimum soil moisture for both depth ranges, surface wetness index, root zone moisture percentile, and land surface temperature. Data is available from 1-year, 3-year, and 10-year trailing windows.
Human Impact & Infrastructure
Human Modification & Development
Human Influence Index
The CSP Global Human Modification (gHM) dataset provides a cumulative measure of human modification of terrestrial lands globally at 1-kilometer resolution. This index integrates 13 anthropogenic stressors to quantify the degree of human alteration. Data represents a 2017 baseline.
Population Density
Modeled population totals from WorldPop for the year 2020 at 100-meter resolution. Population density is provided as people per hectare.
Light Pollution
Monthly average radiance composite images using nighttime data from NASA's Black Marble product (VIIRS Day/Night Band). We separate light sources into two categories based on quality flags:
- Persistent lights: Stable infrastructure lighting (urban areas, roads)
- Ephemeral lights: Temporary lighting (fires, boats, short-term activities)
For each category, we provide mean and maximum annual nighttime radiance (nW/cm²/sr) from 1-year and 3-year trailing windows at 500-meter resolution.
Conservation & Protected Areas
World Database on Protected Areas
The World Database on Protected Areas (WDPA) is the most up-to-date and complete source of information on protected areas, updated monthly. It is managed by the United Nations Environment Programme's World Conservation Monitoring Centre (UNEP-WCMC). We extract the protected area designation name and IUCN management category for samples falling within or near protected areas.
Hydrology
HydroSHEDS Watersheds
HydroSHEDS is a mapping product that provides hydrographic information for regional and global-scale applications in a consistent format. We extract multiple watershed properties from the level 9 basin boundaries:
- Unique watershed identifier and Pfafstetter code
- Upstream drainage area (km²)
- Individual sub-basin area (km²)
- Distance to main outlet (km)
- Endorheic basin indicator (not endorheic, part of, or sink)
- Coastal basin indicator
- River order classification (main stem, tributary order)
JRC Global Surface Water
The Joint Research Centre's Global Surface Water dataset provides comprehensive long-term water dynamics from 37 years of Landsat observations (1984-2021) at 30-meter resolution. We extract seven water metrics:
- Water occurrence: Percentage of time water was present (0-100%)
- Maximum water extent: Binary indicator of maximum observed water extent
- Water change: Absolute and normalized change in water occurrence over the study period
- Water seasonality: Number of months water is typically present (0-12)
- Water recurrence: Percentage of time water returns after being present
- Water transition class: Classification of water presence patterns
GPM IMERG Precipitation
See the Climate Related Data section above for details on precipitation monitoring.
Air Quality
Sentinel-5P Air Quality
The Copernicus Sentinel-5P satellite provides atmospheric composition measurements at ~1.1km resolution. We extract seven air quality indicators from 1-year and 3-year trailing windows to characterize air pollution levels and atmospheric chemistry.
Pollutants Monitored:
- Nitrogen Dioxide (NO₂): Vehicle emissions and industrial combustion indicator
- Carbon Monoxide (CO): Incomplete combustion and biomass burning tracer
- Methane (CH₄): Natural and anthropogenic greenhouse gas emissions
- Sulfur Dioxide (SO₂): Volcanic emissions and industrial processes indicator
- Ozone (O₃): Total column ozone (stratospheric and tropospheric)
- Formaldehyde (HCHO): Indicator of biogenic and anthropogenic VOC oxidation
- Absorbing Aerosol Index: Detects absorbing aerosols (dust, smoke, volcanic ash)
Quality flags and cloud filtering are applied to ensure data reliability.
Forest Structure
GEDI Canopy Height and Vertical Structure
NASA's Global Ecosystem Dynamics Investigation (GEDI) uses spaceborne lidar technology from the International Space Station to measure forest vertical structure at 25-meter resolution. GEDI's laser pulses penetrate forest canopies to create detailed three-dimensional profiles of vegetation. We extract seven forest structure metrics from 1-year and 3-year trailing windows.
Coverage: Limited to 51.6°S-51.6°N latitude due to ISS orbital inclination.
Metrics Extracted:
- Canopy Top Height (mean and maximum): Represents the dominant canopy layer, useful for biomass estimation
- Canopy Median Height: Middle of the vertical canopy profile
- Lower Canopy Height: Characterizes understory and lower canopy layers
- Upper Canopy Height: Characterizes upper canopy layer
- Canopy Cover Height: Vertical distance from ground to canopy top, indicating forest maturity
- Tree Cover Percentage: Landsat-derived tree cover for context
These metrics support forest biomass estimation, carbon accounting, deforestation monitoring, and habitat quality assessment. Quality filtering ensures only high-quality lidar shots are used.
Data Processing & Quality Control
Temporal Processing
Terradactyl uses multiple temporal strategies to provide the most relevant environmental context for each sample:
Long-term Baselines: For stable climate characterization, datasets like TerraClimate, CHIRPS, and ERA5-Land are processed using 30-year trailing windows from the sample date to establish long-term climate normals.
Decadal Trends: To capture medium-term trends and variability, many datasets provide 10-year trailing windows that balance temporal coverage with recent conditions.
Recent Conditions: To capture the current environmental state, key datasets also provide shorter 1 to 3-year trailing windows that reflect recent observations.
Seasonal Analysis: For certain variables, 180-day composites are used to analyze growing season patterns and reduce the impact of seasonal variability.
Peak Season Analysis: Landsat 8 surface temperature specifically targets peak growing season (June-August in Northern Hemisphere, January-March in Southern Hemisphere) based on sample location.
Fixed Temporal Baselines: Some datasets use fixed reference periods due to data availability:
- Ocean Color (MODIS-Aqua): Fixed 2021 baseline (available record ends February 2022)
- Global Fishing Watch: Fixed 2016 baseline (data collection ended December 2016)
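The choice between a trailing window and a fixed baseline can be sketched as below. The dataset keys and the function name are hypothetical, and treating a fixed baseline as the full calendar year is an assumption.

```python
from datetime import date

# Fixed baseline years from the list above; the keys are hypothetical identifiers.
FIXED_BASELINE_YEARS = {"ocean_color": 2021, "global_fishing_watch": 2016}


def temporal_window(dataset, sample_date, years):
    """Return (start, end) dates for a dataset's lookback window."""
    fixed_year = FIXED_BASELINE_YEARS.get(dataset)
    if fixed_year is not None:
        # Fixed-baseline datasets ignore the sample date entirely.
        return date(fixed_year, 1, 1), date(fixed_year, 12, 31)
    try:
        start = sample_date.replace(year=sample_date.year - years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; fall back to 28 February.
        start = sample_date.replace(year=sample_date.year - years, day=28)
    return start, sample_date
```

For a sample collected on 2023-06-15, a 30-year window under this sketch runs from 1993-06-15 to 2023-06-15, while Ocean Color always resolves to calendar year 2021.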
Spatial Processing
Dynamic Sampling Areas: The analysis buffer around each sample point is dynamically sized according to the following rules:
- GPS Uncertainty: When provided, the buffer radius matches the GPS uncertainty associated with the sample coordinates. This ensures the sampling area reflects the actual location precision.
- Data Resolution Match: If the GPS uncertainty is smaller than the dataset's native resolution, we use the sample centroid only to avoid over-sampling the same pixel. If GPS uncertainty is larger, we apply a buffer to capture spatial variability.
- Default Buffer: When GPS uncertainty is not provided, a default 30-meter radius is used.
Multi-Resolution Integration: Data is integrated from a wide range of sources, with native resolutions grouped as follows:
- High resolution (10-30m): ESA WorldCover, Dynamic World, Sentinel-2, Landsat 8, SRTM terrain, JRC Surface Water
- Medium resolution (100-1000m): Copernicus Land Cover, WorldPop, VIIRS nighttime lights, MODIS, Global Fishing Watch
- Coarse resolution (4-28km): TerraClimate, CHIRPS, Ocean Color, HYCOM, SMAP, ERA5-Land, GPM, OISST
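The buffer sizing rules described under Dynamic Sampling Areas can be sketched as a single decision function. The function name and return shape are hypothetical. Two details are assumptions because the text does not settle them: the 30-meter default is compared against the dataset resolution in the same way as a recorded uncertainty, and an uncertainty exactly equal to the resolution is buffered.

```python
DEFAULT_RADIUS_M = 30.0  # used when no GPS uncertainty is recorded


def sampling_geometry(gps_uncertainty_m, native_resolution_m):
    """Return ("point", None) or ("buffer", radius_m) for one sample and dataset."""
    if gps_uncertainty_m is None:
        radius = DEFAULT_RADIUS_M
    else:
        radius = float(gps_uncertainty_m)
    if radius < native_resolution_m:
        # The buffer would fall inside a single pixel: sample the centroid only.
        return ("point", None)
    return ("buffer", radius)
```

Under these assumptions, a sample with no recorded uncertainty is buffered at 30 meters for a 10-meter dataset such as ESA WorldCover but sampled at the centroid for a 4-kilometer dataset such as TerraClimate.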
Quality Control & Data Limitations
Cloud & Quality Filtering: Optical satellite imagery is carefully filtered to ensure data quality:
- Sentinel-2: Pre-filtered to scenes with <80% cloud cover, then Scene Classification Layer masking removes clouds, shadows, and cirrus
- Landsat 8: Pixel-level quality flag masking removes cloud shadows and clouds
- MODIS: Quality assessment flags retain only good and marginal quality observations
- Dynamic World: Confidence thresholding (default 60%) removes low-confidence classifications
Temporal Compositing: To reduce the impact of poor-quality observations (e.g., from atmospheric haze or shadows), median or mean composites are created from time-series imagery:
- Sentinel-2 and Dynamic World: 180-day median composites
- EMIT: 1-year median composites to reduce atmospheric noise
- Landsat 8: Peak season median composites
- Climate datasets: Monthly or annual aggregations over multi-year periods
Marine Data Filtering: Datasets specific to marine environments (sea surface temperature, salinity, ocean color, fishing activity) are only processed for samples that are algorithmically determined to be located over water bodies. Water detection uses ESA WorldCover 2020 land cover classification with a 500-meter buffer around sample points. If any portion of this buffer contains permanent water (class 80), the sample is classified as marine.
Data Availability Limitations: Users should be aware that some datasets have spatial or temporal limitations:
Spatial Coverage:
- CHIRPS: 50°S to 50°N latitude only (no coverage at higher latitudes)
- EMIT and GEDI: 51.6°S to 51.6°N (limited by ISS orbit)
Temporal Coverage:
- MODIS-based datasets: Begin in year 2000
- HYCOM: Begins in 1992 (30-year baselines not available for recent samples)
- Ocean Color: Ended February 2022 (all samples use 2021 baseline)
- Global Fishing Watch: Ended December 2016 (all samples use 2016 baseline)
Environmental Conditions:
- Optical data: Can be sparse in persistently cloudy regions (e.g., tropical rainforests)
- High-latitude areas: May have gaps during polar winter when optical satellites cannot acquire data
- Ice-covered regions: Ocean temperature data masked where sea ice concentration exceeds 50%
Missing Data Handling: When environmental data is unavailable for a sample location (due to clouds, sensor gaps, or geographic extent), the corresponding variables will be recorded as null/missing in the output. Samples are not excluded due to missing environmental data; rather, the available subset of environmental variables is provided for each sample.
Summary
Dataset Coverage:
- 70+ environmental datasets organized into 9 thematic categories
- Multiple temporal windows: 180-day to 30-year lookback periods from sample date
- Spatial resolutions: 10 meters (ESA WorldCover, Dynamic World) to 28 kilometers (OISST)
- Global to regional coverage: Most datasets global, some with latitudinal constraints
Major Dataset Categories:
- Climate (8 datasets): TerraClimate, ERA5-Land, MODIS LST, Landsat 8, CHIRPS, GPM, CHILI
- Land Cover (6 datasets): Copernicus, ESA WorldCover, Dynamic World, Dynamic World Change, Hansen Forest Change, Biomes, Ecoregions
- Vegetation (5 datasets): Sentinel-2, EMIT, MODIS Vegetation, FAPAR
- Marine (4 datasets): OISST, HYCOM, Ocean Color, Global Fishing Watch
- Terrain & Soil (5 datasets): SRTM elevation/slope/aspect/heterogeneity, Landforms, OpenLandMap soil (14 variables), SMAP soil moisture
- Human Impact (4 datasets): Human Modification, Population, Nighttime Lights, Protected Areas
- Hydrology (3 datasets): HydroSHEDS watersheds, JRC Surface Water, GPM precipitation
- Air Quality (1 dataset): Sentinel-5P (7 pollutants)
- Forest Structure (1 dataset): GEDI (7 canopy metrics)
Data Processing:
- Temporal windows relative to sample date for trend analysis
- Dynamic spatial buffering based on GPS uncertainty
- Multi-resolution integration (10m to 28km)
- Extensive quality control (cloud filtering, QA flags, confidence thresholds)
- Marine sample auto-detection using ESA WorldCover
This comprehensive environmental characterization enables robust analysis of biodiversity-environment relationships across diverse ecosystems worldwide, supporting conservation research, ecological modeling, and environmental monitoring applications.