spacec.tools package
Module contents
- spacec.tools.adata_stellar(adata_train, adata_unannotated, celltype_col='cell_type', region_column=None, x_col='x', y_col='y', sample_rate=0.5, distance_thres=50, epochs=50, num_seed_class=3, key_added='stellar_pred', STELLAR_path='', max_memory_usage=16000000000.0, chunk_size=5000, wd=0.05, lr=0.001, seed=1, batch_size=1)
Apply the STELLAR algorithm to annotated and unannotated spatial single-cell data.
This function processes the input AnnData objects by preparing the training data, constructing graph edges based on spatial coordinates, and then running the STELLAR algorithm for label prediction. When a region column is provided, the edge computations are performed for each region separately and the resulting edges are concatenated.
- Parameters:
adata_train (AnnData) – The annotated single-cell data used for training.
adata_unannotated (AnnData) – The unannotated single-cell data for which predictions are desired.
celltype_col (str, optional) – Column name in adata_train.obs that contains the cell type labels, by default “cell_type”.
region_column (str or None, optional) – Column name to partition data into regions. If not None, edges are computed independently per region, by default None.
x_col (str, optional) – Column name in the AnnData objects denoting the x-coordinate, by default “x”.
y_col (str, optional) – Column name in the AnnData objects denoting the y-coordinate, by default “y”.
sample_rate (float, optional) – The rate at which to sample the training data (between 0 and 1), by default 0.5.
distance_thres (int, optional) – Distance threshold (in the same unit as the spatial coordinates) used to determine whether a pair of cells is connected, by default 50.
epochs (int, optional) – Number of training epochs for the STELLAR model, by default 50.
num_seed_class (int, optional) – Number of seed classes, added to the number of unique cell types, by default 3.
key_added (str, optional) – Key under which the predicted labels will be stored in adata_unannotated.obs, by default “stellar_pred”.
STELLAR_path (str, optional) – Filesystem path to the STELLAR repository. This path is added to sys.path, by default “”.
max_memory_usage (float, optional) – Maximum allowable memory usage in bytes when computing pairwise distances; if exceeded, the computation will be done in chunks, by default 1.6e10.
chunk_size (int, optional) – The size of chunks to use for edge computation when memory usage is high, by default 5000.
wd (float, optional) – Weight decay parameter for model optimization, by default 5e-2.
lr (float, optional) – Learning rate for model training, by default 1e-3.
seed (int, optional) – Seed used for reproducibility, by default 1.
batch_size (int, optional) – Batch size for model training, by default 1.
- Returns:
The unannotated AnnData object with an additional observation column (key_added) containing the predicted cell type labels.
- Return type:
AnnData
Notes
- The function performs the following steps:
Prints a citation reminder for the STELLAR algorithm.
Sets up the model arguments by parsing command-line-like arguments.
Prepares the training data by concatenating coordinate information and cell types, and builds a mapping between original and sampled indices.
Computes graph edges either globally or per region (if region_column is provided) using the provided spatial coordinates and distance threshold.
Constructs a GraphDataset and runs the STELLAR algorithm on it.
Returns the modified adata_unannotated with predictions stored in obs[key_added].
The function assumes that helper functions (e.g., stellar_get_edge_index) and necessary modules, including torch, argparse, and dataset utility modules, are available in the environment.
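The edge-construction step described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the helper names `edges_within_threshold` and `edges_per_region` are hypothetical, and the strict `<` comparison against the threshold is an assumption.

```python
import numpy as np


def edges_within_threshold(coords, distance_thres, chunk_size=5000):
    """Return (i, j) index pairs with i < j whose Euclidean distance is below the threshold.

    Rows are processed in chunks so that only a (chunk_size x n) distance
    block is held in memory at any time.
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    edges = []
    for start in range(0, n, chunk_size):
        block = coords[start:start + chunk_size]
        # Pairwise distances between this chunk and all points.
        diff = block[:, None, :] - coords[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=-1))
        rows, cols = np.nonzero(dist < distance_thres)  # assumption: strict "<"
        rows = rows + start
        keep = rows < cols  # drop self-pairs and mirrored duplicates
        edges.extend(zip(rows[keep].tolist(), cols[keep].tolist()))
    return edges


def edges_per_region(coords, regions, distance_thres, chunk_size=5000):
    """Compute edges inside each region separately and concatenate them."""
    coords = np.asarray(coords, dtype=float)
    regions = np.asarray(regions)
    all_edges = []
    for region in np.unique(regions):
        idx = np.flatnonzero(regions == region)
        local = edges_within_threshold(coords[idx], distance_thres, chunk_size)
        # Map region-local indices back to positions in the full array.
        all_edges.extend((int(idx[i]), int(idx[j])) for i, j in local)
    return all_edges


coords = [[0, 0], [30, 0], [0, 40], [500, 500], [520, 500]]
regions = ["A", "A", "B", "B", "B"]
global_edges = edges_within_threshold(coords, distance_thres=50, chunk_size=2)
region_edges = edges_per_region(coords, regions, distance_thres=50, chunk_size=2)
```

In this toy input, cells 0 and 2 are 40 units apart but belong to different regions, so the per-region variant does not connect them.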
- spacec.tools.cell_segmentation(file_name, channel_file, output_dir, output_fname='', seg_method='mesmer', nuclei_channel='DAPI', input_format='Multichannel', membrane_channel_list=None, cytoplasm_channel_list=None, size_cutoff=0, compartment='whole-cell', plot_predictions=False, model='cyto3', use_gpu=True, diameter=None, save_mask_as_png=False, model_path='./models', resize_factor=1, custom_model=False, differentiate_nucleus_cytoplasm=False, tile_size=4096, tile_overlap=128, tiling_threshold=5000, image_mpp=0.5, stitch_sigma=64, remove_tile_border_objects=True, feature_tile_size=4096, feature_tile_overlap=128, feature_memory_limit_gb=8, set_memory_growth=True)
Perform cell segmentation using Mesmer or Cellpose with optional tiling and feature extraction.
This function implements a complete segmentation pipeline including image loading, preprocessing, segmentation, mask stitching, and feature extraction. It handles large images through tiling and provides memory-optimized processing.
- Parameters:
file_name (str or Path) – Path to input image file or directory (Multichannel = multichannel TIFF, Channels = single-channel TIFFs in a directory, CODEX = CODEX format with channels, cycles, y, x)
channel_file (str or Path) – Path to channel names file (ignored if input_format == “Channels”)
output_dir (str or Path) – Base directory for output files
output_fname (str, optional) – Basename for output files, by default auto-generated
seg_method ({'mesmer', 'cellpose'}, optional) – Segmentation algorithm to use, by default ‘mesmer’
nuclei_channel (str, optional) – Name of the nuclei channel, by default ‘DAPI’
input_format ({'Multichannel', 'Channels', 'CODEX'}, optional) – Format of input data, by default ‘Multichannel’
membrane_channel_list (list of str, optional) – Channel names for membrane/whole-cell segmentation
cytoplasm_channel_list (list of str, optional) – Channel names for cytoplasm (Cellpose only)
size_cutoff (int, optional) – Minimum object size in pixels for feature extraction
compartment ({'whole-cell', 'nuclear'}, optional) – Segmentation compartment for Mesmer (ignored by Cellpose)
plot_predictions (bool, optional) – Whether to plot Mesmer predictions
model (str, optional) – Model name or path for Cellpose
use_gpu (bool, optional) – Whether to use GPU acceleration
diameter (float, optional) – Expected cell diameter for Cellpose in pixels (setting a value is recommended to speed up segmentation significantly - if you are unsure you can measure the average cell diameter in ImageJ)
save_mask_as_png (bool, optional) – Save Cellpose overlay as PNG
model_path (str or Path, optional) – Path for Mesmer model download/load
resize_factor (float, optional) – Factor to resize images before segmentation
custom_model (bool, optional) – Whether ‘model’ is a path to custom Cellpose model
differentiate_nucleus_cytoplasm (bool, optional) – Perform separate nuclear and whole-cell segmentation
tile_size (int, optional) – Size of tiles for segmentation in pixels
tile_overlap (int, optional) – Overlap between adjacent tiles in pixels
tiling_threshold (int, optional) – Image size threshold to enable tiling
image_mpp (float, optional) – Microns per pixel (for Mesmer)
stitch_sigma (float, optional) – Sigma for Gaussian blending during stitching
remove_tile_border_objects (bool, optional) – Remove objects touching tile borders
feature_tile_size (int, optional) – Tile size for feature extraction
feature_tile_overlap (int, optional) – Overlap for feature extraction tiles
feature_memory_limit_gb (float, optional) – Memory limit per channel for feature extraction
set_memory_growth (bool, optional) – Enable TensorFlow memory growth
- Returns:
- Dictionary containing:
‘img_ref’: Reference image
‘image_dict’: Channel images
‘masks’: Primary segmentation mask
‘masks_nuclei’: Nuclear mask (if differentiated)
‘masks_cytoplasm’: Cytoplasm mask (if differentiated)
‘features’: DataFrame of extracted features
‘features_nuclei/cytoplasm/whole_cell’: Region-specific features
‘features_combined’: Combined features from all regions
Returns None on critical error
- Return type:
dict or None
Notes
- Memory optimization strategies:
Tiling for large image segmentation
Memory-efficient feature extraction
Optional GPU memory growth
Cleanup of intermediate arrays
- The pipeline includes:
1. Image loading and preprocessing
2. Segmentation (tiled or full image)
3. Mask post-processing and stitching
4. Feature extraction and combination
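How tile_size, tile_overlap, and tiling_threshold could interact is sketched below. The function `tile_windows` is hypothetical, and two details are assumptions rather than documented behavior: tiling is triggered when the larger image dimension exceeds tiling_threshold, and tiles advance by tile_size minus tile_overlap.

```python
def tile_windows(height, width, tile_size=4096, tile_overlap=128, tiling_threshold=5000):
    """Return (y0, y1, x0, x1) windows covering an image of the given size."""
    # Assumption: images at or below the threshold are processed in one piece.
    if max(height, width) <= tiling_threshold:
        return [(0, height, 0, width)]

    stride = tile_size - tile_overlap
    if stride <= 0:
        raise ValueError("tile_overlap must be smaller than tile_size")

    def starts(length):
        # Step by the stride and stop once a tile reaches the image edge.
        positions = []
        pos = 0
        while True:
            positions.append(pos)
            if pos + tile_size >= length:
                break
            pos += stride
        return positions

    return [
        (y, min(y + tile_size, height), x, min(x + tile_size, width))
        for y in starts(height)
        for x in starts(width)
    ]


small = tile_windows(3000, 4000)   # below the threshold: a single window
large = tile_windows(6000, 9000)   # above the threshold: 2 x 3 overlapping tiles
```

Adjacent windows share tile_overlap pixels, which gives the stitching step a band in which masks from neighboring tiles can be reconciled.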
- spacec.tools.clustering(adata, clustering='leiden', marker_list=None, resolution=1, n_neighbors=10, reclustering=False, key_added=None, key_filter=None, subset_cluster=None, seed=42, fs_xdim=10, fs_ydim=10, fs_rlen=10, **cluster_kwargs)
Perform clustering on the given annotated data matrix.
- Parameters:
adata (AnnData) – The annotated data matrix of shape n_obs x n_vars. Rows correspond to cells and columns to stained markers.
clustering (str, optional) – The clustering algorithm to use. Options are “leiden” or “louvain”. Defaults to “leiden”.
marker_list (list, optional) – A list of markers for clustering. Defaults to None.
resolution (int, optional) – The resolution for the clustering algorithm. Defaults to 1.
n_neighbors (int, optional) – The number of neighbors to use for the neighbors graph. Defaults to 10.
reclustering (bool, optional) – If True, skip the computation of the neighbors graph and UMAP. This speeds up reclustering or running FlowSOM. Defaults to False.
key_added (str, optional) – The key name to add to the adata object. Defaults to None.
key_filter (str, optional) – The key name to filter the adata object. Defaults to None.
subset_cluster (list, optional) – The list of clusters to subset. Defaults to None.
seed (int, optional) – Seed for random state. Default is 42.
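fs_xdim (int, optional) – Number of nodes along the x dimension of the FlowSOM grid. Default is 10.
fs_ydim (int, optional) – Number of nodes along the y dimension of the FlowSOM grid. Default is 10.
fs_rlen (int, optional) – Number of passes over the training data (rlen) for FlowSOM. Default is 10.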
**cluster_kwargs (dict) – Additional keyword arguments for the clustering function.
- Returns:
The annotated data matrix with the clustering results added.
- Return type:
AnnData
- spacec.tools.filter_interactions(distance_pvals, pvalue=0.05, logfold_group_abs=0.1, comparison='condition')
Filters interactions based on p-value, logfold change, and other conditions.
- Parameters:
distance_pvals (pandas.DataFrame) – DataFrame containing p-values, logfold changes, and interactions for each comparison.
pvalue (float, optional) – The maximum p-value to consider for significance. Defaults to 0.05.
logfold_group_abs (float, optional) – The minimum absolute logfold change to consider for significance. Defaults to 0.1.
comparison (str, optional) – The comparison condition to filter by. Defaults to “condition”.
- Returns:
dist_table (pandas.DataFrame) – DataFrame containing logfold changes sorted into two columns by the comparison condition.
distance_pvals_sig_sub (pandas.DataFrame) – Subset of the original DataFrame containing only significant interactions based on the specified conditions.
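The kind of filtering described here can be sketched with plain pandas. The column names `interaction`, `p_value`, and `logfold_group`, the strict inequalities, and the helper `filter_sketch` are all assumptions made for illustration; the real distance_pvals table may be laid out differently.

```python
import pandas as pd

# Hypothetical column names; the real distance_pvals table may differ.
distance_pvals = pd.DataFrame(
    {
        "interaction": ["B --> T", "B --> T", "T --> Mac", "T --> Mac"],
        "condition": ["control", "treated", "control", "treated"],
        "p_value": [0.01, 0.20, 0.03, 0.04],
        "logfold_group": [0.50, -0.30, 0.05, -0.40],
    }
)


def filter_sketch(df, pvalue=0.05, logfold_group_abs=0.1, comparison="condition"):
    # Keep rows that pass both the p-value and the effect-size cutoffs.
    significant = df[
        (df["p_value"] < pvalue) & (df["logfold_group"].abs() > logfold_group_abs)
    ]
    # One row per interaction, one log-fold column per comparison level.
    table = significant.pivot(
        index="interaction", columns=comparison, values="logfold_group"
    )
    return table, significant


dist_table, sig = filter_sketch(distance_pvals)
```

Interactions that are significant in only one comparison level end up with a missing value in the other column of the pivoted table.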
- spacec.tools.identify_interactions(adata, cellid, x_pos, y_pos, cell_type, region, comparison, min_observed=10, distance_threshold=128, num_cores=None, num_iterations=1000, key_name=None, correct_dtype=False, aggregate_per_cell=True)
Identify significant cell-cell interactions based on spatial distances.
This function processes the input annotated data (adata) to compute observed triangulation distances and perform permutation testing to generate expected distances. It then compares the observed with expected mean distances using the Mann-Whitney U test to compute a p-value and a log-fold change for each pair of cell types. The results are stored back in the adata object and returned.
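The idea of comparing observed distances with a permuted baseline can be illustrated with a deliberately simplified sketch. It uses the mean distance over all pairs of two cell types rather than triangulation distances, shuffles labels to build the expected distribution, and reports only a log2 fold change; it does not run a Mann-Whitney U test. The helper names are hypothetical.

```python
import numpy as np


def mean_pair_distance(coords, labels, type_a, type_b):
    """Mean Euclidean distance over all (type_a cell, type_b cell) pairs."""
    a = coords[labels == type_a]
    b = coords[labels == type_b]
    diff = a[:, None, :] - b[None, :, :]
    return float(np.sqrt((diff ** 2).sum(axis=-1)).mean())


def permutation_logfold(coords, labels, type_a, type_b, num_iterations=200, seed=0):
    """Compare the observed mean distance with a label-shuffled baseline."""
    rng = np.random.default_rng(seed)
    observed = mean_pair_distance(coords, labels, type_a, type_b)
    expected = np.array(
        [
            mean_pair_distance(coords, rng.permutation(labels), type_a, type_b)
            for _ in range(num_iterations)
        ]
    )
    # Negative values mean the two types sit closer together than expected.
    logfold = float(np.log2(observed / expected.mean()))
    return observed, expected, logfold


# Toy layout: "A" and "B" cells are interleaved in one corner, "C" cells are far away.
coords = np.array(
    [[0, 0], [1, 0], [0, 1], [1, 1], [50, 50], [51, 50], [50, 51], [51, 51]],
    dtype=float,
)
labels = np.array(["A", "B", "A", "B", "C", "C", "C", "C"])
observed, expected, logfold = permutation_logfold(coords, labels, "A", "B")
```

Because the "A" and "B" cells are neighbors in this layout, the observed mean distance is far below the shuffled baseline and the log fold change is negative.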
- Parameters:
adata (AnnData) – Annotated data object that holds cell observation data (adata.obs).
cellid (str) – Column name to be used as the unique cell identifier.
x_pos (str) – Column name for the x-coordinate.
y_pos (str) – Column name for the y-coordinate.
cell_type (str) – Column name for cell type information.
region (str) – Column name for region information.
comparison (str) – Column name used to compare different conditions.
min_observed (int, optional) – Minimum number of observed distance measurements required for an interaction to be considered (default: 10).
distance_threshold (int, optional) – Maximum distance to consider when grouping cell interactions (default: 128).
num_cores (int, optional) – Number of CPU cores to use for parallel processing. Defaults to half of available cores if None.
num_iterations (int, optional) – The number of permutation iterations for generating expected distances (default: 1000).
key_name (str, optional) – Key under which the triangulation distances will be stored in adata.uns. If None, defaults to “triDist”.
correct_dtype (bool, optional) – Flag to convert coordinate and region columns to string types (default: False).
aggregate_per_cell (bool, optional) – Whether to first aggregate distances on a per-cell basis (default: True).
- Returns:
- A tuple containing:
distance_pvals (pandas.DataFrame): DataFrame with p-values and log-fold changes for each pair of cell types.
triangulation_distances_dict (dict): Dictionary containing observed and iterated triangulation distance DataFrames.
- Return type:
tuple
- spacec.tools.label_tissue(resized_im, lower_cutoff=0.012, upper_cutoff=0.025, savefig=False, showfig=True, output_dir='./', output_fname='')
Label the tissue in the given image.
- Parameters:
resized_im (ndarray) – The resized image to label.
lower_cutoff (float, optional) – The lower cutoff for the sobel filter, by default 0.012.
upper_cutoff (float, optional) – The upper cutoff for the sobel filter, by default 0.025.
savefig (bool, optional) – Whether to save the figure or not, by default False.
showfig (bool, optional) – Whether to show the figure or not, by default True.
output_dir (str, optional) – The directory to save the figure in, by default “./”.
output_fname (str, optional) – The filename to save the figure as, by default “”.
- Returns:
A DataFrame containing the labels from the segmentation.
- Return type:
DataFrame
- spacec.tools.launch_interactive_clustering(adata=None, output_dir=None)
Launch an interactive clustering application for single-cell data analysis.
- Parameters:
adata (AnnData, optional) – An AnnData object containing single-cell data. If provided, the data will be loaded automatically.
output_dir (str, optional) – The directory where the annotated AnnData object will be saved. Required if adata is provided.
- Returns:
main_layout – The main layout of the interactive clustering application.
- Return type:
panel.layout.Row
- Raises:
ValueError – If adata is provided but output_dir is not specified, or if output_dir is not a string.
- spacec.tools.ml_predict(adata_val, svc, save_name='svm_pred', return_prob_mat=False)
Predict labels for a given dataset using a trained Support Vector Classifier (SVC) model.
- Parameters:
adata_val (AnnData) – The validation data as an AnnData object.
svc (SVC) – The trained Support Vector Classifier model.
save_name (str, optional) – The name under which the predictions will be saved in the AnnData object, by default “svm_pred”.
return_prob_mat (bool, optional) – Whether to return the probability matrix, by default False.
- Returns:
If return_prob_mat is True, returns a DataFrame with the probability matrix. Otherwise, returns None.
- Return type:
DataFrame or None
- spacec.tools.ml_train(adata_train, label, test_size=0.2, random_state=0, nan_policy_y='omit', mode='accurate_SVC', showfig=True, figsize=(8, 6), n_neighbors=5)
Train a classifier (SVC, LinearSVC, or KNN) on the data.
- Parameters:
adata_train (anndata.AnnData) – The AnnData object containing the training data. The input features are expected in adata_train.X, and the target labels in adata_train.obs[label].
label (str) – The column name in adata_train.obs to use as the target variable.
test_size (float, optional) – The proportion of the dataset to include in the test split. Default is 0.2.
random_state (int, optional) – Seed for the random number generator. Default is 0.
nan_policy_y ({'omit', 'raise'}, optional) – Policy for handling NaNs in the target variable. ‘omit’ removes NaNs, ‘raise’ raises an error. Default is ‘omit’.
mode ({'accurate_SVC', 'fast_SVC', 'knn'}, optional) – The type of classifier to use. - ‘accurate_SVC’: Uses SVC with probability=True (slower, provides predict_proba). - ‘fast_SVC’: Uses LinearSVC with CalibratedClassifierCV (faster, optional predict_proba). - ‘knn’: Uses KNeighborsClassifier with specified n_neighbors. Default is ‘accurate_SVC’.
showfig (bool, optional) – Whether to display a heatmap of the classification report. Default is True.
figsize (tuple, optional) – Size of the figure if showfig is True. Default is (8, 6).
n_neighbors (int, optional) – Number of neighbors for KNN classifier. Default is 5.
- Returns:
svc – The trained classifier model. For ‘accurate_SVC’ and ‘fast_SVC’, the model supports predict_proba. For ‘knn’, only predict is available.
- Return type:
- Raises:
ValueError – If mode is not one of the allowed options, or if nan_policy_y is not ‘omit’ or ‘raise’.
Notes
The function handles NaNs in the target variable based on nan_policy_y.
For ‘accurate_SVC’, the classifier provides predict_proba for probability estimates.
For ‘fast_SVC’, the classifier uses LinearSVC with calibration for predict_proba.
For ‘knn’, the classifier uses KNeighborsClassifier with the specified n_neighbors.
The classification report is displayed as a heatmap if showfig is True.
The function prints progress messages (e.g., “Preparing training data!”, “Training now!”, etc.).
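The three modes and the NaN policy described above can be sketched directly with scikit-learn. This is an illustration under stated assumptions, not the package's code: `build_classifier` and `train_sketch` are hypothetical helpers, the estimators use default hyperparameters, and the input is a synthetic feature matrix rather than an AnnData object.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC, LinearSVC


def build_classifier(mode="accurate_SVC", n_neighbors=5):
    """Map a mode string to an untrained scikit-learn estimator."""
    if mode == "accurate_SVC":
        return SVC(probability=True)
    if mode == "fast_SVC":
        # LinearSVC has no predict_proba of its own; calibration adds it.
        return CalibratedClassifierCV(LinearSVC())
    if mode == "knn":
        return KNeighborsClassifier(n_neighbors=n_neighbors)
    raise ValueError(f"Unknown mode: {mode!r}")


def train_sketch(X, y, mode="accurate_SVC", test_size=0.2, random_state=0,
                 nan_policy_y="omit", n_neighbors=5):
    """Hypothetical stand-in for the training step; returns (model, test accuracy)."""
    if nan_policy_y not in ("omit", "raise"):
        raise ValueError("nan_policy_y must be 'omit' or 'raise'")
    model = build_classifier(mode, n_neighbors)  # validates mode before any work

    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=object)
    # A label counts as missing if it is None or NaN (NaN != NaN).
    missing = np.array([label is None or label != label for label in y])
    if missing.any():
        if nan_policy_y == "raise":
            raise ValueError("Target labels contain missing values")
        X, y = X[~missing], y[~missing]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y.astype(str), test_size=test_size, random_state=random_state
    )
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


# Two well-separated synthetic "cell types" with one missing label.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (60, 4)), rng.normal(6, 1, (60, 4))])
y = ["B cell"] * 60 + ["T cell"] * 60
y[0] = float("nan")

results = {
    mode: train_sketch(X, y, mode=mode)
    for mode in ("accurate_SVC", "fast_SVC", "knn")
}
```

Wrapping LinearSVC in CalibratedClassifierCV is what gives the fast mode probability estimates, at the cost of fitting several calibration folds.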
- spacec.tools.neighborhood_analysis(adata, unique_region, cluster_col, X='x', Y='y', k=35, n_neighborhoods=30, elbow=False, metric='distortion')
Compute cellular neighborhoods (CNs).
- Parameters:
adata (AnnData) – Annotated data matrix.
unique_region (str) – Column name identifying the region. Each region is one independent CODEX image.
cluster_col (str) – Column to compute CNs on, typically ‘celltype’.
X (str, optional) – X coordinate column name, by default “x”.
Y (str, optional) – Y coordinate column name, by default “y”.
k (int, optional) – Number of neighbors to compute, by default 35.
n_neighborhoods (int, optional) – Number of neighborhoods one ends up with, by default 30.
elbow (bool, optional) – Whether to test for optimal number of clusters and visualize as elbow plot or not, by default False. If set to True, the function will test 1 to n_neighborhoods and plot the distortion score in an elbow plot to assist the user in finding the optimal number of clusters.
metric (str, optional) – The scoring metric used to evaluate each candidate number of clusters for the elbow plot, by default “distortion”. Other options include “silhouette” and “calinski_harabasz”.
- Returns:
Annotated data matrix with updated neighborhood information.
- Return type:
AnnData
Notes
- The function performs the following steps:
1. Extracts relevant columns from the input AnnData object.
2. Computes dummy variables for the cluster column.
3. Groups data by the unique region and computes neighborhoods.
4. Optionally performs k-means clustering and visualizes the elbow plot if elbow is set to True.
5. Updates the input AnnData object with neighborhood labels and centroids.
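The dummy-variable and neighborhood-window steps above can be sketched for a single region as follows. The helper `neighborhood_windows` is hypothetical, it uses a brute-force distance matrix, and counting each cell as part of its own window is an assumption. The resulting composition vectors are what a clustering step would then group into n_neighborhoods classes.

```python
import numpy as np


def neighborhood_windows(coords, cell_types, k):
    """For each cell, count the cell types among its k nearest cells (itself included)."""
    coords = np.asarray(coords, dtype=float)
    categories, codes = np.unique(cell_types, return_inverse=True)
    one_hot = np.eye(len(categories))[codes]  # dummy variables, one row per cell

    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    nearest = np.argsort(dist, axis=1, kind="stable")[:, :k]  # k closest cells per row

    # Sum the dummy variables over each window to get a composition vector.
    return categories, one_hot[nearest].sum(axis=1)


coords = [[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [10, 11]]
cell_types = ["B", "B", "T", "Mac", "Mac", "T"]
categories, windows = neighborhood_windows(coords, cell_types, k=3)
```

Cells from the same spatial cluster end up with identical composition vectors here, so any clustering of the windows would place them in the same neighborhood.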
- spacec.tools.patch_proximity_analysis(adata, region_column, patch_column, group, min_cluster_size=80, x_column='x', y_column='y', radius=128, edge_neighbours=1, plot=True, savefig=False, output_dir='./', output_fname='', save_geojson=True, allow_single_cluster=True, method='border_cell_radius', concave_hull_length_threshold=50, concavity=2, original_unit_scale=1, tolerance_distance=0.001, key_name=None)
Performs a proximity analysis on patches of a given group within each region of a dataset.
This function processes an AnnData object by extracting its cell observations and performing proximity analysis on a specified cell group (e.g. a cell type or neighborhood) within each region. Depending on the chosen method (“border_cell_radius” or “hull_expansion”), the analysis applies DBSCAN clustering, identifies concave hull boundaries, and then either determines nearby cells based on a fixed search radius or uses a peripheral buffering approach. Optionally, the function can plot visualization of the analysis and save outputs (figures, CSV files, and GeoJSON).
- Parameters:
adata (AnnData) – The annotated data matrix of shape (n_obs x n_vars). Rows correspond to individual cells and columns to gene expression or other features.
region_column (str) – The name of the column in adata.obs that contains region information.
patch_column (str) – The name of the column in adata.obs that contains patch (or group) information.
group (str) – The specific group (e.g. cell type or patch identifier) on which the proximity analysis is to be performed.
min_cluster_size (int, optional) – The minimum number of cells required in a region to perform the analysis. Regions with fewer cells than this value will be skipped. Default is 80.
x_column (str, optional) – The column name in adata.obs corresponding to the x-coordinate of each cell. Default is “x”.
y_column (str, optional) – The column name in adata.obs corresponding to the y-coordinate of each cell. Default is “y”.
radius (int, optional) – The distance (in spatial units) within which points are considered to be in proximity. This value is multiplied by original_unit_scale. Default is 128.
edge_neighbours (int, optional) – The number of neighbouring edge points to consider when identifying proximity relationships. Default is 1.
plot (bool, optional) – Whether to generate and display visualizations of the proximity analysis. Default is True.
savefig (bool, optional) – Whether to save the generated figure to disk. Default is False.
output_dir (str, optional) – The directory in which to save output files (figures, CSVs, or GeoJSON files). Default is “./”.
output_fname (str, optional) – The filename prefix to use when saving figures. Default is an empty string.
save_geojson (bool, optional) – Whether to convert certain results to GeoJSON format and save them. Default is True.
allow_single_cluster (bool, optional) – If True, allows DBSCAN to assign all cells to a single cluster even if no separate clusters exist. Default is True.
method (str, optional) – The analysis method to use. Options are “border_cell_radius” (default) or “hull_expansion”. Each method applies a different strategy for proximity detection.
concave_hull_length_threshold (int, optional) – Threshold value used for generating the concave hull boundary. Default is 50.
concavity (int, optional) – Parameter specifying the degree of concavity when calculating the hull boundary. Default is 2.
original_unit_scale (int or float, optional) – A scaling factor to convert the radius from its given unit to the coordinate system unit. Default is 1.
tolerance_distance (float, optional) – Tolerance value for buffering in the peripheral analysis (used when method is “hull_expansion”). Default is 0.001.
key_name (str, optional) – The key under which the final proximity analysis results are stored in adata.uns. If not provided, defaults to “ppa_result”.
- Returns:
final_results (pandas.DataFrame) – A DataFrame containing the combined proximity analysis results from all processed regions. It includes, among other information, a newly generated “unique_patch_ID” column that concatenates the region, group, and patch identifier.
outlines_results (pandas.DataFrame) – A DataFrame containing the outline (or hull) points corresponding to the patches; useful for visualization or further spatial analysis.
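The fixed-radius search used by the “border_cell_radius” method can be illustrated with a small sketch. It covers only the final lookup, not the DBSCAN clustering or concave hull steps, and the helper `cells_near_border` and the inclusive `<=` comparison are assumptions.

```python
import math


def cells_near_border(border_points, cells, radius=128, original_unit_scale=1):
    """Return indices of cells within the scaled radius of any border point."""
    # The radius is multiplied by original_unit_scale before searching.
    search_radius = radius * original_unit_scale
    near = []
    for index, cell in enumerate(cells):
        if any(math.dist(cell, point) <= search_radius for point in border_points):
            near.append(index)
    return near


border_points = [(0.0, 0.0), (100.0, 0.0)]
cells = [(50.0, 50.0), (300.0, 0.0), (100.0, 120.0), (0.0, 500.0)]
near_default = cells_near_border(border_points, cells)
near_scaled = cells_near_border(border_points, cells, original_unit_scale=2)
```

Doubling original_unit_scale doubles the effective search radius, so the cell 200 units from the border is picked up only in the second call.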
- spacec.tools.remove_rare_cell_types(adata, distance_pvals, cell_type_column='cell_type', min_cell_type_percentage=1)
Remove cell types with a percentage lower than the specified threshold from the distance_pvals DataFrame.
- Parameters:
adata (AnnData) – Annotated data matrix.
distance_pvals (DataFrame) – DataFrame containing distance p-values with columns ‘celltype1’ and ‘celltype2’.
cell_type_column (str, optional) – Column name in adata containing cell type information, by default “cell_type”.
min_cell_type_percentage (float, optional) – Minimum percentage threshold for cell types to be retained, by default 1.
- Returns:
Filtered distance_pvals DataFrame with rare cell types removed.
- Return type:
DataFrame
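A minimal pandas sketch of this kind of filter is shown below. The helper `drop_rare_pairs` is hypothetical, `obs` stands in for adata.obs, the `p_value` column is made up for the example, and treating a pair as rare when either partner falls below the threshold is an assumption.

```python
import pandas as pd


def drop_rare_pairs(obs, distance_pvals, cell_type_column="cell_type",
                    min_cell_type_percentage=1):
    # Share of each cell type among all cells, in percent.
    percentages = obs[cell_type_column].value_counts(normalize=True) * 100
    rare = percentages[percentages < min_cell_type_percentage].index
    # Drop every pair in which either partner is a rare cell type.
    keep = ~distance_pvals["celltype1"].isin(rare) & ~distance_pvals["celltype2"].isin(rare)
    return distance_pvals[keep]


# "NK" makes up 0.5% of the cells and falls below the 1% default.
obs = pd.DataFrame({"cell_type": ["B"] * 120 + ["T"] * 79 + ["NK"] * 1})
distance_pvals = pd.DataFrame(
    {
        "celltype1": ["B", "B", "NK"],
        "celltype2": ["T", "NK", "T"],
        "p_value": [0.01, 0.02, 0.03],
    }
)
filtered = drop_rare_pairs(obs, distance_pvals)
```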
- spacec.tools.save_labelled_tissue(filepath, tissueframe, region='region', padding=50, downscale_factor=64, output_dir='./', output_fname='')
Save the labelled tissue from the given image.
- Parameters:
filepath (str) – The path to the image file.
tissueframe (DataFrame) – The DataFrame containing the labels from the segmentation.
region (str, optional) – The region to group by, by default “region”.
padding (int, optional) – The padding to add to the extracted tissue, by default 50.
downscale_factor (int, optional) – The factor to downscale the image by, by default 64.
output_dir (str, optional) – The directory to save the image in, by default “./”.
output_fname (str, optional) – The filename to save the image as, by default “”.
- Return type:
None
- spacec.tools.tm_viewer(adata, images_pickle_path, directory=None, region_column='unique_region', region='', xSelector='x', ySelector='y', color_by='cell_type', keep_list=None, include_masks=True, open_viewer=True, add_UMAP=True, use_jpg_compression=False)
Prepare and visualize spatial transcriptomics data using TissUUmaps.
- Parameters:
adata (AnnData) – Annotated data matrix.
images_pickle_path (str) – Path to the pickle file containing images and masks.
directory (str, optional) – Directory to save the output files. If None, a temporary directory will be created.
region_column (str, optional) – Column name in adata.obs that specifies the region, by default “unique_region”.
region (str, optional) – Specific region to process, by default “”.
xSelector (str, optional) – Column name for x coordinates, by default “x”.
ySelector (str, optional) – Column name for y coordinates, by default “y”.
color_by (str, optional) – Column name for coloring the points, by default “cell_type”.
keep_list (list, optional) – List of columns to keep from adata.obs, by default None.
include_masks (bool, optional) – Whether to include masks in the output, by default True.
open_viewer (bool, optional) – Whether to open the TissUUmaps viewer, by default True.
add_UMAP (bool, optional) – Whether to add UMAP coordinates to the output, by default True.
use_jpg_compression (bool, optional) – Whether to use JPEG compression for saving images, by default False.
- Returns:
list – List of paths to the saved image files.
list – List of paths to the saved CSV files.
- spacec.tools.tm_viewer_catplot(adata, directory=None, region_column='unique_region', x='x', y='y', color_by='cell_type', open_viewer=True, add_UMAP=False, keep_list=None)
Generate and visualize categorical plots using TissUUmaps.
- Parameters:
adata (AnnData) – Annotated data matrix.
directory (str, optional) – Directory to save the output CSV files. If None, a temporary directory is created.
region_column (str, optional) – Column name in adata.obs that contains region information. Default is “unique_region”.
x (str, optional) – Column name in adata.obs to be used for x-axis. Default is “x”.
y (str, optional) – Column name in adata.obs to be used for y-axis. Default is “y”.
color_by (str, optional) – Column name in adata.obs to be used for coloring the points. Default is “cell_type”.
open_viewer (bool, optional) – Whether to open the TissUUmaps viewer after generating the CSV files. Default is True.
add_UMAP (bool, optional) – Whether to add UMAP coordinates to the output data. Default is False.
keep_list (list of str, optional) – List of columns to keep from adata.obs. If None, defaults to [region_column, x, y, color_by].
- Returns:
List of paths to the generated CSV files.
- Return type: