Data Preparation

Transcriptomics data

STAID requires the following input data:

  • scRNA-seq data (raw counts)

    • Must be provided as an AnnData object before deconvolution.

    • The data should contain raw count values (not normalized or log-transformed).

    • Cell type annotations must be stored as a column of adata.obs, e.g.:

      sc_adata.obs['celltype']
      
  • Spatial transcriptomics data (raw counts)

    • Must be provided as an AnnData object before deconvolution.

    • The expression matrix should contain raw count values.

    • Spatial coordinates should be stored in adata.obsm (e.g., adata.obsm['spatial']).

Both datasets should share a common set of genes (overlapping gene symbols), which STAID uses to perform deconvolution.
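
The requirements above can be checked before running deconvolution. The following is a minimal sketch, not part of STAID: the helper names looks_like_raw_counts and check_staid_inputs are hypothetical, and the default keys 'celltype' and 'spatial' are assumptions taken from the examples in this section. The helpers rely only on the .X, .obs, .obsm, and .var_names attributes of an AnnData object.

```python
import numpy as np


def looks_like_raw_counts(X, tol=1e-8):
    """Return True if every stored value is a non-negative integer."""
    # scipy CSR/CSC/COO matrices expose their stored values through .data;
    # dense arrays are checked directly.
    values = X.data if hasattr(X, "nnz") else X
    values = np.asarray(values, dtype=float)
    if values.size == 0:
        return True
    is_non_negative = np.all(values >= 0)
    is_integer = np.all(np.abs(values - np.round(values)) < tol)
    return bool(is_non_negative and is_integer)


def check_staid_inputs(sc_adata, st_adata,
                       celltype_key="celltype", spatial_key="spatial"):
    """Collect problems with the two inputs and list their shared genes.

    Hypothetical helper for illustration; it does not call any STAID function.
    """
    problems = []
    if celltype_key not in sc_adata.obs:
        problems.append(f"scRNA-seq data: obs['{celltype_key}'] is missing")
    if spatial_key not in st_adata.obsm:
        problems.append(f"spatial data: obsm['{spatial_key}'] is missing")
    if not looks_like_raw_counts(sc_adata.X):
        problems.append("scRNA-seq data: X does not look like raw counts")
    if not looks_like_raw_counts(st_adata.X):
        problems.append("spatial data: X does not look like raw counts")

    # Genes present in both datasets, in a reproducible order.
    shared_genes = sorted(set(sc_adata.var_names) & set(st_adata.var_names))
    if not shared_genes:
        problems.append("the two datasets share no gene symbols")
    return shared_genes, problems
```

Calling check_staid_inputs(sc_adata, st_adata) returns the shared gene list and a list of human-readable problems (empty if all checks pass). If an explicit subset is wanted, standard AnnData indexing such as sc_adata[:, shared_genes].copy() restricts an object to the shared genes; whether STAID requires this step or performs the intersection itself is not stated here.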


Example Datasets

The demo spatial transcriptomics data (human breast cancer Visium) are available at https://doi.org/10.5281/zenodo.4739739. The matched human breast cancer scRNA-seq reference datasets are available through the Gene Expression Omnibus under accession number GSE176078.

For convenience, we also provide a sorted version for download on Google Drive.