Data Preparation

Transcriptomics data

STAID requires the following input data:

scRNA-seq data (raw counts)
- Must be provided as an AnnData object before deconvolution.
- The data should contain raw count values (not normalized or log-transformed).
- Cell type annotations must be stored as a column in adata.obs, e.g.:
```
sc_adata.obs['celltype']
```

Spatial transcriptomics data (raw counts)
- Must be provided as an AnnData object before deconvolution.
- The expression matrix should contain raw count values.
- Spatial coordinates should be stored in adata.obsm (e.g., adata.obsm['spatial']).

Both datasets should share a common set of genes (overlapping gene symbols), which STAID uses to perform deconvolution.

Example Datasets

The demo spatial transcriptomics data (human breast cancer Visium) are available at https://doi.org/10.5281/zenodo.4739739, and the matching human breast cancer scRNA-seq reference dataset is available through the Gene Expression Omnibus under accession number GSE176078.

For convenience, we also provide a sorted version on Google Drive: Download from Google Drive.