Unraveling Spatially Variable Genes: A Statistical Perspective on Spatial Transcriptomics

Dr. Owns

February 21, 2025

[

This article was written by Guanao Yan, a Ph.D. student in Statistics and Data Science at UCLA. Guanao is the first author of the Nature Communications review article [1].

Spatially resolved transcriptomics (SRT) is revolutionizing genomics by enabling high-throughput measurement of gene expression while preserving spatial context. Unlike single-cell RNA sequencing (scRNA-seq), which captures transcriptomes without spatial location information, SRT allows researchers to map gene expression to precise locations within a tissue, providing insights into tissue organization, cellular interactions, and spatially coordinated gene activity. The increasing volume and complexity of SRT data call for robust statistical and computational methods, making this field highly relevant to data scientists, statisticians, and machine learning (ML) professionals. Techniques such as spatial statistics, graph-based models, and deep learning have been applied to extract meaningful biological insights from these data.

A key step in SRT analysis is the detection of spatially variable genes (SVGs)—genes whose expression varies non-randomly across spatial locations. Identifying SVGs is crucial for characterizing tissue architecture, functional gene modules, and cellular heterogeneity. However, despite the rapid development of computational methods for SVG detection, these methods vary widely in their definitions and statistical frameworks, leading to inconsistent results and challenges in interpretation.

In our recent review published in Nature Communications [1], we systematically examined 34 peer-reviewed SVG detection methods and introduced a classification framework that clarifies the biological significance of different SVG types. This article provides an overview of our findings, focusing on the three major categories of SVGs and the statistical principles underlying their detection.

SVG detection methods aim to uncover genes whose spatial expression reflects biological patterns rather than technical noise. Based on our review of 34 peer-reviewed methods, we categorize SVGs into three groups: Overall SVGs, Cell-Type-Specific SVGs, and Spatial-Domain-Marker SVGs (Figure 2).

Image created by the authors, adapted from [1]. Publication timeline of 34 SVG detection methods. Colors represent three SVG categories: overall SVGs (green), cell-type-specific SVGs (red), and spatial-domain-marker SVGs (purple).

Methods for detecting the three SVG categories serve different purposes (Figure 3). First, detecting overall SVGs screens for informative genes for downstream analyses, including the identification of spatial domains and functional gene modules. Second, detecting cell-type-specific SVGs aims to reveal spatial variation within a cell type and helps identify distinct cell subpopulations or states within that type. Third, spatial-domain-marker SVG detection finds marker genes that annotate and interpret spatial domains that have already been detected. These markers help researchers understand the molecular mechanisms underlying spatial domains and assist in annotating tissue layers in other datasets.

Image created by the authors, adapted from [1]. Conceptual visualization of the three SVG categories: overall SVGs, cell-type-specific SVGs, and spatial-domain-marker SVGs. The left column shows a tissue slice with two cell types and three spatial domains. The right column shows an exemplar overall SVG, cell-type-specific SVG, and spatial-domain-marker SVG, respectively, with colors representing expression levels.

The relationship among the three SVG categories depends on the detection methods, particularly the null and alternative hypotheses they employ. If an overall SVG detection method uses the null hypothesis that a non-SVG’s expression is independent of spatial location and the alternative hypothesis that any deviation from this independence indicates an SVG, then its SVGs should theoretically include both cell-type-specific SVGs and spatial-domain-marker SVGs. For example, DESpace [2] is a method that detects both overall SVGs and spatial-domain-marker SVGs, and its detected overall SVGs must be marker genes for some spatial domains. This inclusion relationship holds true except in extreme scenarios, such as when a gene exhibits opposite cell-type-specific spatial patterns that effectively cancel each other out. However, if an overall SVG detection method’s alternative hypothesis is defined for a specific spatial expression pattern, then its SVGs may not include some cell-type-specific SVGs or spatial-domain-marker SVGs.
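The "cancel each other out" exception can be made concrete with a toy simulation. This is a minimal sketch under assumptions of our own, not an analysis from [1]: one gene, two cell types whose mean expression trends in opposite directions along a single spatial axis, and spots that each contain a fixed 50/50 mixture of the two types. Pearson correlation with the coordinate serves only as a simple stand-in for a test of a mean trend; it is not one of the reviewed methods, and all parameter values are hypothetical.

```python
import numpy as np

# Toy illustration (hypothetical numbers): one gene, two cell types, spots along a 1-D axis.
rng = np.random.default_rng(0)
n_spots = 500
x = rng.uniform(0.0, 1.0, n_spots)  # spatial coordinate of each spot

# Cell-type-specific mean expression with opposite spatial trends.
mu_a = 1.0 + 2.0 * x  # cell type A: increases along the axis
mu_b = 3.0 - 2.0 * x  # cell type B: decreases along the axis

# Expression contributed by each cell type at each spot, with Gaussian noise.
y_a = mu_a + rng.normal(0.0, 0.3, n_spots)
y_b = mu_b + rng.normal(0.0, 0.3, n_spots)

# Spot-level measurement, assuming every spot is a 50/50 mixture of the two types.
# The trends cancel: 0.5 * mu_a + 0.5 * mu_b equals 2.0 at every location.
y_spot = 0.5 * y_a + 0.5 * y_b

r_spot = np.corrcoef(x, y_spot)[0, 1]
r_a = np.corrcoef(x, y_a)[0, 1]
r_b = np.corrcoef(x, y_b)[0, 1]
print(f"spot-level r = {r_spot:.2f}, type A r = {r_a:.2f}, type B r = {r_b:.2f}")
```

The within-type correlations are strong while the spot-level correlation is close to zero, which is why a spot-level overall SVG test can miss such a gene. The cancellation here depends on exactly balanced cell-type proportions; unbalanced or spatially varying proportions would leave a detectable spot-level trend.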

To understand how SVGs are detected, we categorized the statistical approaches into three major types of hypothesis tests: 

  1. Dependence Test – Examines the dependence between a gene’s expression level and the spatial location. 
  2. Regression Fixed-Effect Test – Examines whether some or all of the fixed-effect covariates, for instance, spatial location, contribute to the mean of the response variable, i.e., a gene’s expression. 
  3. Regression Random-Effect Test (Variance Component Test) – Examines whether the random-effect covariates, for instance, spatial location, contribute to the variance of the response variable, i.e., a gene’s expression.

To further explain how these tests are used for SVG detection, we denote by $Y$ a gene's expression level and by $S$ the spatial location. The dependence test is the most general hypothesis test for SVG detection. For a given gene, it decides whether the gene's expression level $Y$ is independent of the spatial location $S$, i.e., the null hypothesis is:

$$H_0: Y \perp S$$

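As one illustration of a dependence test, the sketch below uses distance covariance, a statistic whose population value is zero if and only if $Y$ and $S$ are independent (assuming finite first moments), and calibrates it with a permutation test. The choice of statistic, the function names, and the simulated data are our own assumptions for illustration; the reviewed dependence-test methods use their own statistics. Permuting expression values across spots yields valid null draws only if spots are exchangeable under $H_0$.

```python
import numpy as np

def _double_center(d):
    """Double-center a pairwise distance matrix (subtract row/column means, add grand mean)."""
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def dcov_sq(y, s):
    """Squared sample distance covariance (V-statistic) between expression y (n,) and locations s (n, d)."""
    a = _double_center(np.abs(y[:, None] - y[None, :]))
    b = _double_center(np.linalg.norm(s[:, None, :] - s[None, :, :], axis=-1))
    return float((a * b).mean())

def dependence_test(y, s, n_perm=199, seed=0):
    """Hypothetical helper: permutation test of H0: Y independent of S via distance covariance."""
    rng = np.random.default_rng(seed)
    observed = dcov_sq(y, s)
    # Under H0 (exchangeable spots), shuffling expression across spots samples the null distribution.
    null = np.array([dcov_sq(rng.permutation(y), s) for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, float(p_value)

# Simulated data (hypothetical): 200 spots on the unit square.
rng = np.random.default_rng(1)
s = rng.uniform(0.0, 1.0, size=(200, 2))
y_svg = np.sin(2 * np.pi * s[:, 0]) + rng.normal(0.0, 0.3, 200)  # patterned gene
y_flat = rng.normal(0.0, 1.0, 200)                                # no spatial pattern
stat_svg, p_svg = dependence_test(y_svg, s)
stat_flat, p_flat = dependence_test(y_flat, s)
print(f"patterned gene p = {p_svg:.3f}, flat gene p = {p_flat:.3f}")
```

The pairwise distance matrices grow quadratically with the number of spots, so this sketch is meant for small examples only.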
There are two types of regression tests: fixed-effect tests, which assume that the effect of the spatial location is fixed, and random-effect tests, which assume that it is random. To explain these two types of tests, we use a linear mixed model for a given gene as an example:

$$Y_i = \beta_0 + x_i^\top \beta + z_i^\top \gamma + \epsilon_i,$$

where the response variable $Y_i$ is the gene's expression level at spot $i$, $x_i \in \mathbb{R}^p$ indicates the fixed-effect covariates of spot $i$, $z_i \in \mathbb{R}^q$ denotes the random-effect covariates of spot $i$, and $\epsilon_i$ is the random measurement error at spot $i$ with zero mean.

In the model parameters, $\beta_0$ is the (fixed) intercept, $\beta \in \mathbb{R}^p$ indicates the fixed effects, and $\gamma \in \mathbb{R}^q$ denotes the random effects, with zero mean and covariance matrix $\operatorname{Cov}(\gamma)$.

In this linear mixed model, the random effects are assumed to be independent of the random errors, and the random errors are assumed to be mutually independent.
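Because $\gamma$ and $\epsilon_i$ have zero means and are independent of each other, and $x_i$ and $z_i$ are treated as fixed, the model implies $\mathbb{E}[Y_i] = \beta_0 + x_i^\top \beta$ and $\operatorname{Var}(Y_i) = z_i^\top \operatorname{Cov}(\gamma)\, z_i + \operatorname{Var}(\epsilon_i)$. The Monte Carlo sketch below checks both identities for a single spot. The Gaussian distributions and every numeric value are hypothetical choices, since the model as stated only fixes means and covariances.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameter values for one gene.
beta0 = 1.0
beta = np.array([0.5, -1.0])            # fixed effects, p = 2
cov_gamma = np.diag([0.4, 0.2, 0.1])    # Cov(gamma), q = 3
sigma_eps = 0.5                         # so Var(eps_i) = 0.25

# Hypothetical covariates of a single spot i.
x_i = np.array([1.0, 2.0])
z_i = np.array([1.0, 0.5, 2.0])

# Draw gamma and eps independently (Gaussian is an assumption) over many replicates.
n_rep = 200_000
gamma = rng.multivariate_normal(np.zeros(3), cov_gamma, size=n_rep)  # shape (n_rep, 3)
eps = rng.normal(0.0, sigma_eps, size=n_rep)
y_i = beta0 + x_i @ beta + gamma @ z_i + eps

mean_theory = beta0 + x_i @ beta                   # 1 + 0.5 - 2.0 = -0.5
var_theory = z_i @ cov_gamma @ z_i + sigma_eps**2  # 0.4 + 0.05 + 0.4 + 0.25 = 1.1
print(f"mean: simulated {y_i.mean():.3f}, theory {mean_theory:.3f}")
print(f"var:  simulated {y_i.var():.3f}, theory {var_theory:.3f}")
```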

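Fixed-effect tests examine whether some or all of the fixed-effect covariates $x_i$ (which depend on the spatial locations $S$) contribute to the mean of the response variable, $\mathbb{E}[Y_i] = \beta_0 + x_i^\top \beta$. If none of the fixed-effect covariates contributes, then $\beta = 0$. The null hypothesis

$$H_0: \beta = 0$$

implies

$$\mathbb{E}[Y_i] = \beta_0 \quad \text{for every spot } i,$$

that is, the gene's mean expression does not vary across spatial locations.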
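A fixed-effect test can be sketched by taking $x_i$ to be a quadratic polynomial in the 2-D spot coordinates (a hypothetical choice of spatial covariates), dropping the random-effect term so that the model reduces to $Y_i = \beta_0 + x_i^\top \beta + \epsilon_i$, and comparing the full fit with the intercept-only fit through an F statistic. The p-value is computed by permutation, which assumes exchangeable errors under $H_0: \beta = 0$. Function names and simulated data are illustrative only.

```python
import numpy as np

def spatial_design(s):
    """Hypothetical fixed-effect covariates x_i: quadratic polynomial of the 2-D coordinates."""
    s1, s2 = s[:, 0], s[:, 1]
    return np.column_stack([s1, s2, s1**2, s2**2, s1 * s2])

def f_statistic(y, x):
    """F statistic comparing Y = beta0 + x^T beta + eps with the intercept-only model."""
    n, p = x.shape
    design = np.column_stack([np.ones(n), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss_full = np.sum((y - design @ coef) ** 2)
    rss_null = np.sum((y - y.mean()) ** 2)  # intercept-only fit, i.e. beta = 0
    return ((rss_null - rss_full) / p) / (rss_full / (n - p - 1))

def fixed_effect_test(y, s, n_perm=499, seed=0):
    """Permutation p-value for H0: beta = 0, assuming exchangeable errors (no random effect)."""
    rng = np.random.default_rng(seed)
    x = spatial_design(s)
    observed = f_statistic(y, x)
    null = np.array([f_statistic(rng.permutation(y), x) for _ in range(n_perm)])
    return float(observed), float((1 + np.sum(null >= observed)) / (n_perm + 1))

# Simulated data (hypothetical): 300 spots on the unit square.
rng = np.random.default_rng(2)
s = rng.uniform(0.0, 1.0, size=(300, 2))
y_trend = 2.0 * s[:, 0] + rng.normal(0.0, 0.5, 300)  # mean changes with location
y_flat = rng.normal(0.0, 0.5, 300)                   # constant mean
stat_trend, p_trend = fixed_effect_test(y_trend, s)
stat_flat, p_flat = fixed_effect_test(y_flat, s)
print(f"trend gene: F = {stat_trend:.1f}, p = {p_trend:.3f}")
print(f"flat gene:  F = {stat_flat:.1f}, p = {p_flat:.3f}")
```

Under Gaussian i.i.d. errors the statistic follows an $F(p, n - p - 1)$ distribution under $H_0$, so `scipy.stats.f.sf(observed, p, n - p - 1)` would give a parametric p-value in place of the permutation one.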

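Random-effect tests examine whether the random-effect covariates $z_i$ (which depend on the spatial locations $S$) contribute to the variance of the response variable, $\operatorname{Var}(Y_i)$, focusing on the decomposition

$$\operatorname{Var}(Y_i) = z_i^\top \operatorname{Cov}(\gamma)\, z_i + \operatorname{Var}(\epsilon_i),$$

and testing whether the contribution of the random-effect covariates, $z_i^\top \operatorname{Cov}(\gamma)\, z_i$, is zero. The null hypothesis

$$H_0: \operatorname{Cov}(\gamma) = 0$$

implies

$$\operatorname{Var}(Y_i) = \operatorname{Var}(\epsilon_i),$$

that is, all variation in the gene's expression is attributed to measurement error.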
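A random-effect (variance component) test can be sketched in the same style. Here we assume $\operatorname{Cov}(\gamma) = \tau I_q$, so the random-effect term $z_i^\top \gamma$ has spot-by-spot covariance $\tau Z Z^\top$, where $Z$ stacks the $z_i^\top$ as rows, and we further assume that $Z Z^\top$ equals a Gaussian kernel $K$ of the spot coordinates with a hypothetical length scale of 0.2. Testing $H_0: \tau = 0$ then uses the score-type statistic $Q = r^\top K r / \hat\sigma^2$, with $r$ the residuals from the intercept-only null fit. We calibrate $Q$ by permutation rather than by its asymptotic mixture-of-chi-squares distribution; this is a sketch, not the procedure of any specific reviewed method.

```python
import numpy as np

def gaussian_kernel(s, length_scale=0.2):
    """Assumed spot-by-spot kernel K_ij = exp(-||s_i - s_j||^2 / (2 * length_scale^2))."""
    sq_dist = np.sum((s[:, None, :] - s[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * length_scale**2))

def score_statistic(y, k):
    """Q = r^T K r / sigma2_hat, where r are residuals from the intercept-only null fit."""
    r = y - y.mean()
    sigma2_hat = np.mean(r**2)  # ML estimate of Var(eps_i) under H0
    return float(r @ k @ r / sigma2_hat)

def random_effect_test(y, s, n_perm=499, seed=0):
    """Hypothetical helper: permutation p-value for H0: tau = 0 (no spatial variance component)."""
    rng = np.random.default_rng(seed)
    k = gaussian_kernel(s)
    observed = score_statistic(y, k)
    # Under H0 the y_i are i.i.d., so shuffling them across spots samples the null law of Q.
    null = np.array([score_statistic(rng.permutation(y), k) for _ in range(n_perm)])
    return observed, float((1 + np.sum(null >= observed)) / (n_perm + 1))

# Simulated data (hypothetical): 200 spots on the unit square.
rng = np.random.default_rng(3)
s = rng.uniform(0.0, 1.0, size=(200, 2))
y_smooth = 1.0 + np.sin(2 * np.pi * s[:, 1]) + rng.normal(0.0, 0.3, 200)  # spatially structured
y_noise = 1.0 + rng.normal(0.0, 0.3, 200)                                 # no spatial structure
q_smooth, p_smooth = random_effect_test(y_smooth, s)
q_noise, p_noise = random_effect_test(y_noise, s)
print(f"structured gene: Q = {q_smooth:.1f}, p = {p_smooth:.3f}")
print(f"noise-only gene: Q = {q_noise:.1f}, p = {p_noise:.3f}")
```

Note that the test targets the variance of $Y_i$ through the kernel, not a particular mean trend, so the kernel and its length scale determine which spatial patterns it is sensitive to.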

Among the 23 methods that use frequentist hypothesis tests, dependence tests and random-effect regression tests have been primarily applied to detect overall SVGs, whereas fixed-effect regression tests have been used across all three SVG categories. Understanding these distinctions is key to selecting the right method for specific research questions.

Improving SVG detection methods requires balancing detection power, specificity, and scalability while addressing key challenges in spatial transcriptomics analysis. Future developments should focus on adapting methods to different SRT technologies and tissue types, as well as extending support for multi-sample SRT data to enhance biological insights. Additionally, strengthening statistical rigor and validation frameworks will be crucial for ensuring the reliability of SVG detection. Benchmarking studies also need refinement, with clearer evaluation metrics and standardized datasets to provide robust method comparisons.

References

[1] Yan, G., Hua, S.H. & Li, J.J. (2025). Categorization of 34 computational methods to detect spatially variable genes from spatially resolved transcriptomics data. Nature Communications, 16, 1141. https://doi.org/10.1038/s41467-025-56080-w

[2] Cai, P., Robinson, M. D., & Tiberi, S. (2024). DESpace: spatially variable gene detection via differential expression testing of spatial clusters. Bioinformatics, 40(2). https://doi.org/10.1093/bioinformatics/btae027

]

The post Unraveling Spatially Variable Genes: A Statistical Perspective on Spatial Transcriptomics appeared first on Towards Data Science.
