Understanding alpha and beta diversity
In addition to considering the individual organisms found in eDNA, you can look at the community of organisms identified as a whole and evaluate their alpha and beta diversity. The charts below integrate alpha and beta diversity measures with geospatial variables and other metadata to give you a holistic comparison of the samples you collected and the possible environmental drivers of those differences.
Beta Diversity is a measure of community similarity. It is used to compare how similar the communities of organisms are in one sample versus another. Learn more below.
Alpha Diversity is a measure of species richness, or how many different species there are in each sample. Learn more below.
Identifying environmental drivers through alpha and beta diversity
We help you start evaluating the impact of environmental drivers by integrating geospatial and other types of metadata into the visualizations of alpha and beta diversity below. Any metadata can be integrated into the analysis by coding the samples based on the variable of interest.
For example, you can explore how elevation might affect the beta diversity of samples by color coding each sample by its elevation and then plotting the similarity of the organism communities with that color coding applied. This is what is displayed in the charts below.
You can select either metadata that were associated with the samples in the metadata spreadsheet, such as site name or sampling depth, or earth variables that were associated with the data from Google Earth Engine. The charts then display the relationship these variables have with the diversity of each sample.
View a description of the environmental variables we offer from Google Earth Engine here.
AN AI BOOST FOR SELECTING VARIABLES
To help identify the metadata variables that might be the key drivers of species diversity in the study area, we have run an AI random forest algorithm on the eDNA data and the available metadata variables to identify the environmental factors that have the strongest influence on beta diversity. The area above the charts displays the variables we have prioritized and lets you click on each one to view the associated visualizations. We also display a high/medium/low tag for how relevant that variable is for explaining the patterns in the data. Note that for some projects, none of the variables have high relevance. In that case, we display the ones with the highest relevance from the set available for consideration.
How this data analysis algorithm works
The algorithm we use for variable prioritization follows these steps:
- Data preparation: Removes problematic columns with too many missing values or zero variance.
- Variable categorization: Separates environmental variables into continuous (temperature, depth) and categorical (habitat type, season) types.
- Correlation filtering: Removes highly correlated environmental variables (>80% correlation) to avoid redundancy.
- Missing value handling: Fills gaps in environmental data using a statistical method called median imputation. For continuous environmental variables (temperature, depth, etc.), missing values or placeholder values (-999999) are replaced with the median value of that specific environmental parameter across all valid samples. Categorical variables (habitat type, season) are left unchanged.
- Machine learning: Trains Random Forest models to predict diversity from environmental variables.
- Feature ranking: Calculates importance scores showing which environmental factors are most predictive.
Adding your own custom variables
You can include your own custom variables for any projects you own by adding them to the end of the metadata spreadsheet. If you are not sure how, please don’t hesitate to reach out to hello@ednaexperor.com for help using this powerful feature.
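The preparation steps of the variable prioritization algorithm described earlier (data preparation, correlation filtering, and median imputation) can be sketched in plain Python. This is an illustrative sketch, not the platform's actual code. The 50% missing-value cutoff, the function names, and the sample data are assumptions made for the example. The sketch also imputes before the correlation filter, because a correlation needs complete columns. The model training and feature ranking steps are left out; a library such as scikit-learn provides random forest models that report feature importance scores.

```python
from math import sqrt
from statistics import median, pvariance

PLACEHOLDER = -999999        # placeholder value named in the steps above
MAX_MISSING_FRACTION = 0.5   # assumed cutoff; the text only says "too many"
MAX_CORRELATION = 0.8        # the ">80% correlation" threshold from the steps


def is_missing(value):
    return value is None or value == PLACEHOLDER


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def prepare(columns):
    """columns maps each continuous variable name to its per-sample values."""
    cleaned = {}
    for name, values in columns.items():
        valid = [v for v in values if not is_missing(v)]
        if 1 - len(valid) / len(values) > MAX_MISSING_FRACTION:
            continue  # data preparation: too many missing values
        if pvariance(valid) == 0:
            continue  # data preparation: zero variance
        fill = median(valid)  # median imputation over the valid samples
        cleaned[name] = [fill if is_missing(v) else v for v in values]

    selected = {}
    for name, values in cleaned.items():
        # Correlation filtering: keep a variable only if it is not highly
        # correlated with a variable that was already kept.
        if all(abs(pearson(values, kept)) <= MAX_CORRELATION
               for kept in selected.values()):
            selected[name] = values
    return selected


# Hypothetical metadata for five samples.
samples = {
    "elevation_m": [100, 200, PLACEHOLDER, 400, 500],
    "temperature_c": [20.0, 18.0, 16.0, 14.0, 12.0],  # tracks elevation exactly
    "constant_flag": [1, 1, 1, 1, 1],                 # zero variance
    "mostly_missing": [None, None, None, 3.0, 4.0],   # 60% missing
    "rainfall_mm": [5.0, 9.0, 2.0, 8.0, 4.0],
}
print(list(prepare(samples)))
```

In this toy data only elevation and rainfall survive: the temperature column is dropped because it is almost perfectly correlated with elevation, and the other two columns fail the data preparation checks.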
How to understand the chart that compares differences in community composition between samples (Beta Diversity)
Beta Diversity is useful for quantifying the number of different organism communities that exist in a region and how different the organisms in each area are from each other. You can use beta diversity to understand how different species are spread out in different places -- i.e. how the composition of the community in each area differs. Imagine you have two different forests, one in the mountains and one in a valley. Beta diversity helps us understand how the types of trees, animals, and plants in these two forests compare to each other. If the mountain and the valley forest each have a lot of unique species and there is not much overlap, it means the beta diversity between them is high. On the other hand, if the mountain and valley forests have very similar species, the beta diversity is low because their biodiversity is similar.
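To make the forest comparison concrete, here is a minimal sketch that scores how different two communities are. It uses Jaccard dissimilarity, a common presence/absence measure. That choice is an assumption for illustration, since this page does not say which dissimilarity measure the charts use, and the species lists are made up.

```python
def jaccard_dissimilarity(community_a, community_b):
    """Return 0.0 for identical species lists and 1.0 for no shared species."""
    a, b = set(community_a), set(community_b)
    union = a | b
    if not union:
        return 0.0  # two empty samples are treated as identical
    return 1 - len(a & b) / len(union)


# Hypothetical species lists for the two forests.
mountain = {"pine", "marmot", "eagle", "lichen"}
valley = {"oak", "deer", "frog", "fern"}
valley_similar = {"pine", "marmot", "eagle", "fern"}

# No species in common: the highest possible beta diversity.
print(jaccard_dissimilarity(mountain, valley))
# Three of five species shared: much lower beta diversity.
print(jaccard_dissimilarity(mountain, valley_similar))
```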
When scientists study different communities of organisms, they often want to understand how these communities differ between locations and what might cause these differences. The charts we display help visualize these differences.
EXPLANATION OF EACH CHART DISPLAYED
Left Chart (PCoA):
- This is a PCoA (Principal Coordinates Analysis) chart, which is a visual representation of similarities and differences between samples
- Each point represents a sample from a different location
- Points that are close together have similar communities of organisms
- Points that are far apart have very different communities
- The large circles highlight clusters of samples that are similar to each other: the points within a circle have similar communities of organisms
- The axes (PCoA 1 and PCoA 2) show the main directions of variation in the data, with the percentages telling us how much of the variation each axis explains
Right Map:
- Shows the actual geographic locations where samples were collected
- Uses matching colors to connect the statistical patterns (left) to real locations (right)
- Helps us see if similar communities are geographically close or spread out
- Can reveal patterns like whether nearby locations tend to have similar communities
Chart Statistics:
These statistics tell us key information about the statistical significance of this chart:
F-Statistic
The F-statistic helps you decide if a model's overall fit is real or just due to chance. It's basically a ratio comparing the variation explained by the model vs. unexplained, random variation that naturally occurs. If the F-statistic is large, it means the differences between the group averages are much bigger than the random spread within each group.
R-squared (R²)
R-squared, or R², is a measure that tells you how well your statistical model fits the data. Think of it as a percentage that shows how much of the variation in one variable is explained by another variable or variables in your model.
How to Interpret the Value
The value of R-squared is always between 0 and 1 (or 0% and 100%).
- R² = 0: The model explains none of the variation. Knowing the independent variable(s) gives you no more information than just guessing the average value.
- R² = 1: The model explains all of the variation. All your data points fall perfectly on the regression line, meaning you can predict the dependent variable with 100% accuracy based on your model.
- 0 < R² < 1: The model explains a certain percentage of the variation. For example, an R² of 0.75 (or 75%) means that 75% of the variability in the dependent variable can be explained by your model, while the remaining 25% is due to other factors or random chance.
PERMANOVA Results
This is a statistical test that tells us if the groups we see are significantly different from each other.
- Permutations: The number of times the test randomly shuffled the data to verify the patterns (higher numbers like 999 give more reliable results)
- Number of groups: The number of groups the samples are divided into for the selected variable
- p-value: Tells us if the patterns are likely real or just random chance
  - p < 0.05 means the patterns are probably real
  - The smaller the p-value, the more confident we are in the results
- pseudo-F: Measures how strong the grouping pattern is (higher numbers mean stronger patterns)
This type of visualization helps us:
- Identify patterns in how organisms are distributed
- Understand if communities are similar or different
- Look for environmental factors that might explain these patterns
- Make predictions about what organisms we might find in similar locations
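The chart statistics described above (pseudo-F, R², and the permutation p-value) fit together in one calculation. The sketch below follows the standard one-factor PERMANOVA formulas, using only a distance matrix and a group label per sample. It is offered as an illustration under assumptions, not as the platform's implementation: the function names, the toy distances (built from made-up positions on a line), and the fixed random seed are all hypothetical.

```python
import random


def pseudo_f_and_r2(dist, groups):
    """Return (pseudo-F, R²) for a square distance matrix and group labels."""
    n = len(groups)
    labels = sorted(set(groups))
    # Total sum of squares from all pairwise distances.
    ss_total = sum(dist[i][j] ** 2
                   for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares, one term per group.
    ss_within = 0.0
    for label in labels:
        members = [i for i in range(n) if groups[i] == label]
        pairs = sum(dist[i][j] ** 2
                    for k, i in enumerate(members) for j in members[k + 1:])
        ss_within += pairs / len(members)
    ss_among = ss_total - ss_within
    a = len(labels)
    # Explained variation per group vs. unexplained variation per sample.
    pseudo_f = (ss_among / (a - 1)) / (ss_within / (n - a))
    return pseudo_f, ss_among / ss_total


def permanova(dist, groups, permutations=999, seed=0):
    rng = random.Random(seed)
    observed_f, r2 = pseudo_f_and_r2(dist, groups)
    shuffled = list(groups)
    at_least_as_large = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # break any real link between label and sample
        permuted_f, _r2 = pseudo_f_and_r2(dist, shuffled)
        if permuted_f >= observed_f:
            at_least_as_large += 1
    p_value = (at_least_as_large + 1) / (permutations + 1)
    return {"pseudo_f": observed_f, "r2": r2, "p_value": p_value,
            "permutations": permutations, "groups": len(set(groups))}


# Toy data: eight samples in two well-separated groups.
positions = [0.0, 0.1, 0.2, 0.15, 1.0, 1.1, 0.9, 1.05]
dist = [[abs(a - b) for b in positions] for a in positions]
groups = ["low"] * 4 + ["high"] * 4
print(permanova(dist, groups))
```

Because the two toy groups barely overlap, the result has a large pseudo-F, an R² close to 1, and a small p-value. The p-value can never be smaller than 1 / (permutations + 1), which is one reason more permutations give a more precise result.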
How to understand the chart that looks at diversity within each sample (Alpha Diversity)
This chart showcases the Alpha Diversity within each of the samples relative to a selected environmental variable pulled from Google Earth Engine. Alpha diversity is a concept used in ecology and biology to describe the diversity of species within a sample. Think of alpha diversity as a way to measure how many different types of animals, plants, or other organisms are found in that one sample.
For example, if you take a sample at one spot in a forest and find many different types of trees, birds, insects, and flowers in it, then that sample has high alpha diversity. But if you sample another part of the same forest and find only a few types of trees and a couple of animal species, then that sample has low alpha diversity. It means there's not as much variety of life in that specific spot.
Bottom line: alpha diversity helps us understand how rich or diverse the life is within a particular area.
In the chart, you will see alpha diversity graphed against the selected environmental variable. Samples that share a value for that variable may have similar numbers of species, showing that richness is stable for that environmental variable. Or they could have very different richness, showing that the effect of that variable is inconsistent.
There are 4 types of alpha diversity measures to explore on this chart:
- Chao1 is an estimate of richness that uses the number of DNA sequences retrieved for each organism found in a sample to estimate how many more organisms might be present but were not detected because we didn’t sequence everything we could have. We like this statistic because the sequences we get back are just a subsample of the total diversity that’s in the environment.
- Observed richness is the total number of unique organisms detected, regardless of whether they had few sequences or many.
- Shannon index describes how difficult it is to predict the taxonomic identity of a randomly chosen individual DNA sequence. A higher Shannon index means there is more richness and evenness in the community.
- Simpson index is based on the probability that two randomly chosen individual DNA sequences come from the same organism. The lower that probability, the higher the index.
Is bigger better? Yes. For each of these measures, the higher the number, the more alpha diversity there is for that variable.
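The four measures can be computed from the number of sequences assigned to each organism in a sample. The sketch below uses common textbook forms and is an illustration, not necessarily the exact formulas behind the chart. The assumptions are the bias-corrected form of Chao1, the natural logarithm for Shannon, and Simpson reported as 1 minus the probability of drawing the same organism twice, so that higher values mean more diversity. The function names and counts are hypothetical.

```python
from math import log


def observed_richness(counts):
    """Number of organisms with at least one sequence."""
    return sum(1 for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1: richness plus an estimate of unseen organisms."""
    singletons = sum(1 for c in counts if c == 1)  # seen exactly once
    doubletons = sum(1 for c in counts if c == 2)  # seen exactly twice
    unseen = singletons * (singletons - 1) / (2 * (doubletons + 1))
    return observed_richness(counts) + unseen


def shannon(counts):
    """Shannon index using the natural logarithm."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def simpson(counts):
    """1 minus the chance that two random sequences are the same organism."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


# Two hypothetical samples with four organisms each.
even_sample = [25, 25, 25, 25]   # sequences spread evenly
uneven_sample = [97, 1, 1, 1]    # one organism dominates

for name, counts in [("even", even_sample), ("uneven", uneven_sample)]:
    print(name, observed_richness(counts), chao1(counts),
          round(shannon(counts), 3), round(simpson(counts), 3))
```

Both samples have the same observed richness, but the even sample scores higher on Shannon and Simpson. The uneven sample gets a higher Chao1 because its three organisms seen only once suggest that more rare organisms went undetected.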
Environmental Variables
You can also read a description of the environmental variables we offer for each site here.