Data Preparation
Transcriptomics data
STAID requires the following input data:
scRNA-seq data (raw counts)
Must be provided as an AnnData object before deconvolution. The data should contain raw count values (not normalized or log-transformed).
Cell type annotations must be provided as a column in adata.obs (i.e., listed in adata.obs.keys()), e.g., sc_adata.obs['celltype'].
Spatial transcriptomics data (raw counts)
Must be provided as an AnnData object before deconvolution. The expression matrix should contain raw count values.
Spatial coordinates (e.g., spatial) should be included in adata.obsm.
Both datasets should share a common set of genes (overlapping gene symbols), which STAID uses to perform deconvolution.
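The requirements above can be checked before running deconvolution. The following is a minimal sketch, not part of STAID itself: the helper names looks_like_raw_counts and check_inputs are hypothetical, the default keys 'celltype' and 'spatial' are taken from the examples above, and the integer test is only a heuristic for detecting normalized or log-transformed values. It relies on attribute access alone (.X, .obs, .obsm, .var_names), so it should work on AnnData objects with dense or scipy sparse matrices.

```python
import numpy as np


def looks_like_raw_counts(X, tol=1e-8):
    """Heuristic: True if every stored value is a non-negative integer."""
    # scipy sparse matrices keep their stored values in .data;
    # anything else is treated as a dense array.
    values = X.data if hasattr(X, "tocsr") else X
    values = np.asarray(values, dtype=float)
    if values.size == 0:
        return True
    is_integer = np.all(np.abs(values - np.round(values)) <= tol)
    return bool(values.min() >= 0 and is_integer)


def check_inputs(sc_adata, st_adata, celltype_key="celltype", spatial_key="spatial"):
    """Return (problems, shared_genes); an empty problems list means all checks passed."""
    problems = []
    if not looks_like_raw_counts(sc_adata.X):
        problems.append("scRNA-seq .X does not look like raw counts")
    if not looks_like_raw_counts(st_adata.X):
        problems.append("spatial .X does not look like raw counts")
    if celltype_key not in sc_adata.obs.keys():
        problems.append(f"missing sc_adata.obs[{celltype_key!r}]")
    if spatial_key not in st_adata.obsm.keys():
        problems.append(f"missing st_adata.obsm[{spatial_key!r}]")
    # Genes present in both datasets, matched by gene symbol.
    shared = sorted(set(sc_adata.var_names) & set(st_adata.var_names))
    if not shared:
        problems.append("no overlapping gene symbols")
    return problems, shared
```

If the returned list of problems is empty, both objects can optionally be restricted to the shared genes with standard AnnData indexing, e.g. sc_adata[:, shared].copy(). Whether STAID performs this intersection internally is not stated here, so treat the subsetting step as optional.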
Example Datasets
The demo spatial transcriptomics data (human breast cancer Visium) are available at https://doi.org/10.5281/zenodo.4739739. The matching human breast cancer scRNA-seq reference datasets are available through the Gene Expression Omnibus under accession number GSE176078.
For convenience, we also provide a sorted version on Google Drive: Download from Google Drive.