spacec.tools package

Module contents

spacec.tools.adata_stellar(adata_train, adata_unannotated, celltype_col='coarse_anno3', x_col='x', y_col='y', sample_rate=0.5, distance_thres=50, epochs=50, key_added='stellar_pred', STELLAR_path='')[source]

Applies the STELLAR algorithm to the given annotated and unannotated data.

Parameters:
  • adata_train (AnnData) – The annotated data.

  • adata_unannotated (AnnData) – The unannotated data.

  • celltype_col (str, optional) – The column name for cell types in the annotated data. Defaults to ‘coarse_anno3’.

  • x_col (str, optional) – The column name for x coordinates in the data. Defaults to ‘x’.

  • y_col (str, optional) – The column name for y coordinates in the data. Defaults to ‘y’.

  • sample_rate (float, optional) – The rate at which to sample the training data. Defaults to 0.5.

  • distance_thres (int, optional) – The distance threshold for constructing edge indexes. Defaults to 50.

  • key_added (str, optional) – The key to be added to the unannotated data’s obs dataframe for the predicted results. Defaults to ‘stellar_pred’.

Returns:

The unannotated data with the added key for the predicted results.

Return type:

AnnData
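The distance_thres parameter suggests that cells lying closer together than the threshold are linked by graph edges. The following is a minimal sketch of that idea in plain Python, not the code STELLAR or spacec actually runs. The helper name build_edges, the strict "less than" comparison, and the sample coordinates are assumptions made for illustration.

```python
import math

def build_edges(coords, distance_thres=50):
    """Return index pairs (i, j) with i < j for points closer than distance_thres.

    Hypothetical helper for illustration only; not part of spacec or STELLAR.
    """
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            # Euclidean distance between the two cell centroids.
            if math.dist(coords[i], coords[j]) < distance_thres:
                edges.append((i, j))
    return edges

# Made-up (x, y) positions for four cells.
cells = [(0, 0), (30, 30), (100, 100), (120, 100)]
edges = build_edges(cells, distance_thres=50)
```

With these made-up positions, only the two nearby pairs end up connected. A smaller threshold yields a sparser graph.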

spacec.tools.cell_segmentation(file_name, channel_file, output_dir, output_fname='', seg_method='mesmer', nuclei_channel='DAPI', input_format='Multichannel', membrane_channel_list=None, size_cutoff=0, compartment='whole-cell', plot_predictions=True, model='cyto3', use_gpu=True, cytoplasm_channel_list=None, diameter=None, save_mask_as_png=False, model_path='./models', resize_factor=1, custom_model=False, differentiate_nucleus_cytoplasm=False)[source]

Perform cell segmentation on an image.

Parameters:
  • file_name (str) – The path to the image file.

  • channel_file (str) – The path to the file containing the channel names.

  • output_dir (str) – The directory where the output will be saved.

  • output_fname (str, optional) – The name of the output file. Default is an empty string.

  • seg_method (str) – The segmentation method to use. Options are ‘mesmer’ and ‘cellpose’. Default is ‘mesmer’.

  • nuclei_channel (str) – The name of the nuclei channel. Default is ‘DAPI’.

  • input_format (str) – The format of the input image. Options are ‘CODEX’ and ‘Multichannel’. Default is ‘Multichannel’.

  • membrane_channel_list (list of str, optional) – The names of the membrane channels.

  • size_cutoff (int, optional) – The size cutoff for segmentation. Default is 0.

  • compartment (str, optional) – The compartment to segment. Options are ‘whole-cell’ and ‘nuclei’. Default is ‘whole-cell’. This only applies to Mesmer.

  • plot_predictions (bool, optional) – Whether to plot the segmentation results. Default is True.

  • model (str, optional) – The model to use for segmentation. Default is ‘cyto3’. This only applies to Cellpose.

  • use_gpu (bool, optional) – Whether to use GPU for segmentation. Default is True. This only applies to Cellpose.

  • cytoplasm_channel_list (list of str, optional) – The names of the cytoplasm channels.

  • diameter (int, optional) – The diameter of the cells. Default is None, in which case the diameter is determined automatically. This only applies to Cellpose.

  • save_mask_as_png (bool, optional) – Whether to save the segmentation mask as a PNG file. Default is False.

  • model_path (str, optional) – The path to the model. Default is ‘./models’.

  • differentiate_nucleus_cytoplasm (bool, optional) – Whether to differentiate between nucleus and cytoplasm. Default is False.

Returns:

A dictionary containing the original image (‘img’), the segmentation masks (‘masks’), and the image dictionary (‘image_dict’).

Return type:

dict

spacec.tools.clustering(adata, clustering='leiden', marker_list=None, resolution=1, n_neighbors=10, reclustering=False, key_added=None, key_filter=None, subset_cluster=None, seed=42, fs_xdim=10, fs_ydim=10, fs_rlen=10, **cluster_kwargs)[source]

Perform clustering on the given annotated data matrix.

Parameters:
  • adata (AnnData) – The annotated data matrix of shape n_obs x n_vars. Rows correspond to cells and columns to stained markers.

  • clustering (str, optional) – The clustering algorithm to use. Options are “leiden” or “louvain”. Defaults to “leiden”.

  • marker_list (list, optional) – A list of markers for clustering. Defaults to None.

  • resolution (int, optional) – The resolution for the clustering algorithm. Defaults to 1.

  • n_neighbors (int, optional) – The number of neighbors to use for the neighbors graph. Defaults to 10.

  • reclustering (bool, optional) – Whether to recluster the data. Defaults to False.

  • key_added (str, optional) – The key name to add to the adata object. Defaults to None.

  • key_filter (str, optional) – The key name to filter the adata object. Defaults to None.

  • subset_cluster (list, optional) – The list of clusters to subset. Defaults to None.

  • seed (int, optional) – Seed for random state. Default is 42.

  • fs_xdim (int, optional) – X dimension for FlowSOM. Default is 10.

  • fs_ydim (int, optional) – Y dimension for FlowSOM. Default is 10.

  • fs_rlen (int, optional) – Rlen for FlowSOM. Default is 10.

  • **cluster_kwargs (dict) – Additional keyword arguments for the clustering function.

Returns:

The annotated data matrix with the clustering results added.

Return type:

AnnData

spacec.tools.filter_interactions(distance_pvals, pvalue=0.05, logfold_group_abs=0.1, comparison='condition')[source]

Filters interactions based on p-value, logfold change, and other conditions.

Parameters:
  • distance_pvals (pandas.DataFrame) – DataFrame containing p-values, logfold changes, and interactions for each comparison.

  • pvalue (float, optional) – The maximum p-value to consider for significance. Defaults to 0.05.

  • logfold_group_abs (float, optional) – The minimum absolute logfold change to consider for significance. Defaults to 0.1.

  • comparison (str, optional) – The comparison condition to filter by. Defaults to “condition”.

Returns:

  • dist_table (pandas.DataFrame) – DataFrame containing logfold changes sorted into two columns by the comparison condition.

  • distance_pvals_sig_sub (pandas.DataFrame) – Subset of the original DataFrame containing only significant interactions based on the specified conditions.
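The two thresholds can be read as a simple row filter: keep an interaction when its p-value is small enough and its absolute logfold change is large enough. The following is a minimal sketch of that rule on a list of dictionaries rather than a pandas DataFrame. The field names ("interaction", "pvalue", "logfold_group"), the strict inequalities, and the sample rows are assumptions for illustration and may differ from what the function uses internally.

```python
def filter_rows(rows, pvalue=0.05, logfold_group_abs=0.1):
    """Keep rows with a p-value below `pvalue` and an absolute logfold
    change above `logfold_group_abs`.

    Hypothetical helper; field names are illustrative, not spacec's schema.
    """
    return [
        r for r in rows
        if r["pvalue"] < pvalue and abs(r["logfold_group"]) > logfold_group_abs
    ]

# Made-up interaction records.
rows = [
    {"interaction": "A --> B", "pvalue": 0.01, "logfold_group": 0.50},
    {"interaction": "A --> C", "pvalue": 0.20, "logfold_group": 0.80},
    {"interaction": "B --> C", "pvalue": 0.03, "logfold_group": -0.05},
    {"interaction": "C --> A", "pvalue": 0.04, "logfold_group": -0.30},
]
significant = filter_rows(rows)
```

Here "A --> C" fails the p-value cutoff and "B --> C" fails the logfold cutoff, so only the other two rows remain. Negative logfold changes pass as long as their magnitude exceeds the threshold.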

spacec.tools.identify_interactions(adata, cellid, x_pos, y_pos, cell_type, region, comparison, iTriDist_keyname=None, triDist_keyname=None, min_observed=10, distance_threshold=128, num_cores=None, num_iterations=1000, key_name=None, correct_dtype=False)[source]

Identify interactions between cell types based on their spatial distances.

Parameters:
  • adata (AnnData) – Annotated data matrix.

  • cellid (str) – Identifier for cells.

  • x_pos (str) – Column name for x position of cells.

  • y_pos (str) – Column name for y position of cells.

  • cell_type (str) – Column name for cell type.

  • region (str) – Column name for region.

  • comparison (str) – Column name for comparison.

  • iTriDist_keyname (str, optional) – Key name for iterative triangulation distances, by default None

  • triDist_keyname (str, optional) – Key name for triangulation distances, by default None

  • min_observed (int, optional) – Minimum number of observed distances, by default 10

  • distance_threshold (int, optional) – Threshold for distance, by default 128

  • num_cores (int, optional) – Number of cores to use for computation, by default None

  • num_iterations (int, optional) – Number of iterations for computation, by default 1000

  • key_name (str, optional) – Key name for output, by default None

  • correct_dtype (bool, optional) – Whether to correct data type or not, by default False

Returns:

DataFrame with p-values and logfold changes for interactions.

Return type:

DataFrame
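The num_iterations parameter and the returned p-values suggest a resampling scheme in which observed distances are compared against distances obtained after shuffling cell type labels. The following is a generic label-permutation test written under that assumption; it is not the function's implementation. In particular, it averages over all cell pairs within distance_threshold instead of using triangulation, and the helper names, the one-sided p-value, and the sample data are hypothetical.

```python
import math
import random

def mean_distance(coords, labels, type_a, type_b, distance_threshold=128):
    """Mean distance over (type_a, type_b) pairs no farther apart than the
    threshold. Returns None when no such pair exists."""
    dists = []
    for i, a in enumerate(coords):
        if labels[i] != type_a:
            continue
        for j, b in enumerate(coords):
            if labels[j] == type_b:
                d = math.dist(a, b)
                if d <= distance_threshold:
                    dists.append(d)
    return sum(dists) / len(dists) if dists else None

def permutation_pvalue(coords, labels, type_a, type_b,
                       num_iterations=1000, seed=0):
    """One-sided p-value: how often shuffled labels give a mean distance
    at least as small as the observed one."""
    rng = random.Random(seed)
    observed = mean_distance(coords, labels, type_a, type_b)
    if observed is None:
        return None
    shuffled = list(labels)
    at_least_as_close = 0
    valid = 0
    for _ in range(num_iterations):
        rng.shuffle(shuffled)  # break the link between position and cell type
        permuted = mean_distance(coords, shuffled, type_a, type_b)
        if permuted is None:
            continue  # no qualifying pair in this shuffle
        valid += 1
        if permuted <= observed:
            at_least_as_close += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_close + 1) / (valid + 1)

# Made-up positions: every A cell sits 5 units from a B cell.
coords = [(0, 0), (5, 0), (200, 0), (205, 0),
          (400, 0), (405, 0), (600, 50), (900, 50)]
labels = ["A", "B", "A", "B", "A", "B", "C", "C"]
observed = mean_distance(coords, labels, "A", "B")
p = permutation_pvalue(coords, labels, "A", "B", num_iterations=200)
```

A small p-value in this sketch means the two cell types sit closer together than label shuffling would typically produce. More iterations give a finer-grained p-value at higher compute cost.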

spacec.tools.label_tissue(resized_im, lower_cutoff=0.012, upper_cutoff=0.025, savefig=False, showfig=True, output_dir='./', output_fname='')[source]

Label the tissue in the given image.

Parameters:
  • resized_im (ndarray) – The resized image to label.

  • lower_cutoff (float, optional) – The lower cutoff for the sobel filter, by default 0.012.

  • upper_cutoff (float, optional) – The upper cutoff for the sobel filter, by default 0.025.

  • savefig (bool, optional) – Whether to save the figure or not, by default False.

  • showfig (bool, optional) – Whether to show the figure or not, by default True.

  • output_dir (str, optional) – The directory to save the figure in, by default “./”.

  • output_fname (str, optional) – The filename to save the figure as, by default “”.

Returns:

A DataFrame containing the labels from the segmentation.

Return type:

DataFrame

spacec.tools.ml_predict(adata_val, svc, save_name='svm_pred', return_prob_mat=False)[source]

Predict labels for a given dataset using a trained Support Vector Classifier (SVC) model.

Parameters:
  • adata_val (AnnData) – The validation data as an AnnData object.

  • svc (SVC) – The trained Support Vector Classifier model.

  • save_name (str, optional) – The name under which the predictions will be saved in the AnnData object, by default “svm_pred”.

  • return_prob_mat (bool, optional) – Whether to return the probability matrix, by default False.

Returns:

If return_prob_mat is True, returns a DataFrame with the probability matrix. Otherwise, returns None.

Return type:

DataFrame or None

spacec.tools.ml_train(adata_train, label, test_size=0.33, random_state=0, model='svm', nan_policy_y='raise', showfig=True, figsize=(10, 8))[source]

Train an SVM model on the provided data.

Parameters:
  • adata_train (AnnData) – The training data as an AnnData object.

  • label (str) – The label to predict.

  • test_size (float, optional) – The proportion of the dataset to include in the test split, by default 0.33.

  • random_state (int, optional) – The seed used by the random number generator, by default 0.

  • model (str, optional) – The type of model to train, by default “svm”.

  • nan_policy_y (str, optional) – How to handle NaNs in the label, by default “raise”. Can be either ‘omit’ or ‘raise’.

  • showfig (bool, optional) – Whether to show the confusion matrix as a heatmap, by default True.

Returns:

The trained Support Vector Classifier model.

Return type:

SVC

Raises:

ValueError – If nan_policy_y is not ‘omit’ or ‘raise’.
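The nan_policy_y options can be illustrated with a small validation routine. The following is a minimal sketch, assuming that ‘omit’ drops observations with a missing label and ‘raise’ stops with an error when one is found. The helper name apply_nan_policy and the error messages are hypothetical; only the ValueError for an unknown policy is taken from the documentation above.

```python
import math

def apply_nan_policy(labels, nan_policy_y="raise"):
    """Return the indices of usable labels under the given NaN policy.

    Hypothetical helper for illustration; not part of spacec.
    """
    if nan_policy_y not in ("omit", "raise"):
        raise ValueError("nan_policy_y must be either 'omit' or 'raise'")

    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    missing = {i for i, value in enumerate(labels) if is_missing(value)}
    if missing and nan_policy_y == "raise":
        # Assumed behavior: refuse to train on partially labelled data.
        raise ValueError(f"{len(missing)} label(s) are missing")
    return [i for i in range(len(labels)) if i not in missing]

# Made-up labels with one missing entry.
labels = ["T cell", float("nan"), "B cell"]
kept = apply_nan_policy(labels, nan_policy_y="omit")
```

The returned indices could then be used to subset both the feature matrix and the label vector before splitting into training and test sets.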

spacec.tools.neighborhood_analysis(adata, unique_region, cluster_col, X='x', Y='y', k=35, n_neighborhoods=30, elbow=False, metric='distortion')[source]

Compute cellular neighborhoods (CNs).

Parameters:
  • adata (AnnData) – Annotated data matrix.

  • unique_region (str) – Each region is one independent CODEX image.

  • cluster_col (str) – Columns to compute CNs on, typically ‘celltype’.

  • X (str, optional) – X coordinate column name, by default “x”.

  • Y (str, optional) – Y coordinate column name, by default “y”.

  • k (int, optional) – Number of neighbors to compute, by default 35.

  • n_neighborhoods (int, optional) – Number of neighborhoods one ends up with, by default 30.

  • elbow (bool, optional) – Whether to test for the optimal number of clusters and visualize the result as an elbow plot, by default False. If set to True, the function tests 1 to n_neighborhoods clusters and plots the distortion score in an elbow plot to help the user find the optimal number of clusters.

  • metric (str, optional) – The metric to use when calculating distance between instances in a feature array, by default “distortion”.

Returns:

Annotated data matrix with updated neighborhood information.

Return type:

AnnData
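The role of k can be illustrated by counting cell types among each cell's nearest neighbors. The following is a minimal sketch under the assumption that each cell is described by the composition of its k nearest cells (itself included) and that these composition vectors are later grouped into n_neighborhoods clusters. The clustering step is omitted, and the helper name and sample data are hypothetical.

```python
import math
from collections import Counter

def neighbor_composition(coords, cell_types, k=3):
    """For each cell, count the cell types among its k nearest cells.

    The cell itself is at distance 0 and is therefore included.
    Hypothetical helper for illustration; not part of spacec.
    """
    windows = []
    for point in coords:
        # Rank all cells by distance to the current cell.
        order = sorted(range(len(coords)),
                       key=lambda j: math.dist(point, coords[j]))
        windows.append(Counter(cell_types[j] for j in order[:k]))
    return windows

# Made-up data: a mixed T/B group and a pure M group.
coords = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
cell_types = ["T", "T", "B", "M", "M", "M"]
windows = neighbor_composition(coords, cell_types, k=3)
```

Cells in the first group share a mixed T/B window while cells in the second group see only M, so a clustering step would separate them into two neighborhoods. This brute-force search is quadratic in the number of cells; a spatial index would be needed for real data.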

spacec.tools.patch_proximity_analysis(adata, region_column, patch_column, group, min_cluster_size=80, x_column='x', y_column='y', radius=128, edge_neighbours=3, plot=True, savefig=False, output_dir='./', output_fname='', key_name='ppa_result', plot_color='#6a3d9a')[source]

Performs a proximity analysis on patches of a given group within each region of a dataset.

Parameters:
  • adata (AnnData) – The annotated data matrix of shape n_obs x n_vars. Rows correspond to cells and columns to genes.

  • region_column (str) – The name of the column in the DataFrame that contains the region information.

  • patch_column (str) – The name of the column in the DataFrame that contains the patch information.

  • group (str) – The group to perform the proximity analysis on.

  • min_cluster_size (int, optional) – The minimum number of samples required to form a dense region. Default is 80.

  • x_column (str, optional) – The name of the column in the DataFrame that contains the x-coordinate. Default is ‘x’.

  • y_column (str, optional) – The name of the column in the DataFrame that contains the y-coordinate. Default is ‘y’.

  • radius (int, optional) – The radius within which to identify points in proximity. Default is 128.

  • edge_neighbours (int, optional) – The number of edge neighbours to consider. Default is 3.

  • plot (bool, optional) – Whether to plot the patches. Default is True.

  • savefig (bool, optional) – Whether to save the figure. Default is False.

  • output_dir (str, optional) – The directory to save the figure in. Default is “./”.

  • output_fname (str, optional) – The filename to save the figure as. Default is “”.

  • key_name (str, optional) – The key name to store the results in the AnnData object. Default is ‘ppa_result’.

Returns:

  • final_results (DataFrame) – A DataFrame containing the results of the proximity analysis.

  • outlines_results (DataFrame) – A DataFrame containing the outlines of the patches.
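The radius parameter can be pictured as a search distance around the points that outline a patch. The following is a minimal sketch of such a radius query in plain Python, assuming a cell counts as "in proximity" when it lies within radius of at least one outline point. The helper name, the inclusive comparison, and the coordinates are hypothetical and do not reproduce the function's internals.

```python
import math

def cells_in_proximity(patch_points, other_cells, radius=128):
    """Return indices of cells within `radius` of any patch outline point.

    Hypothetical helper for illustration; not part of spacec.
    """
    return [
        i for i, cell in enumerate(other_cells)
        if any(math.dist(cell, point) <= radius for point in patch_points)
    ]

# Made-up outline points of one patch and three surrounding cells.
patch_points = [(0, 0), (100, 0)]
other_cells = [(50, 50), (300, 0), (200, 60)]
nearby = cells_in_proximity(patch_points, other_cells, radius=128)
```

The second cell is 200 units from the closest outline point and is therefore excluded, while the other two fall inside the search radius.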

spacec.tools.save_labelled_tissue(filepath, tissueframe, region='region', padding=50, downscale_factor=64, output_dir='./', output_fname='')[source]

Save the labelled tissue from the given image.

Parameters:
  • filepath (str) – The path to the image file.

  • tissueframe (DataFrame) – The DataFrame containing the labels from the segmentation.

  • region (str, optional) – The region to group by, by default “region”.

  • padding (int, optional) – The padding to add to the extracted tissue, by default 50.

  • downscale_factor (int, optional) – The factor to downscale the image by, by default 64.

  • output_dir (str, optional) – The directory to save the image in, by default “./”.

  • output_fname (str, optional) – The filename to save the image as, by default “”.

Return type:

None
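The padding and downscale_factor parameters suggest that tissue regions are located on a downscaled image and then cut out of the full-resolution image with a margin. The following is a minimal sketch of that coordinate mapping, assuming the bounding box is scaled up first, padded second, and finally clamped to the image size. The helper name crop_bounds, the order of operations, and the example numbers are assumptions rather than the function's documented behavior.

```python
def crop_bounds(x_min, x_max, y_min, y_max, image_width, image_height,
                padding=50, downscale_factor=64):
    """Map a bounding box from the downscaled image to full-resolution
    pixel bounds, add padding, and clamp to the image extent.

    Hypothetical helper for illustration; not part of spacec.
    """
    left = max(x_min * downscale_factor - padding, 0)
    right = min(x_max * downscale_factor + padding, image_width)
    top = max(y_min * downscale_factor - padding, 0)
    bottom = min(y_max * downscale_factor + padding, image_height)
    return left, right, top, bottom

# Made-up bounding box on the downscaled image and full image size.
bounds = crop_bounds(2, 5, 0, 3, image_width=1000, image_height=800)
```

The top edge is clamped to 0 because the padded box would otherwise extend past the image border.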

spacec.tools.tm_viewer(adata, images_pickle_path, directory, region_column='unique_region', region='', xSelector='x', ySelector='y', color_by='celltype_fine', keep_list=None, include_masks=True, open_viewer=True, add_UMAP=True)[source]