spacec.tools package
Module contents
- spacec.tools.adata_stellar(adata_train, adata_unannotated, celltype_col='coarse_anno3', x_col='x', y_col='y', sample_rate=0.5, distance_thres=50, epochs=50, key_added='stellar_pred', STELLAR_path='')
Applies the STELLAR algorithm to the given annotated and unannotated data.
- Parameters:
adata_train (AnnData) – The annotated data.
adata_unannotated (AnnData) – The unannotated data.
celltype_col (str, optional) – The column name for cell types in the annotated data. Defaults to ‘coarse_anno3’.
x_col (str, optional) – The column name for x coordinates in the data. Defaults to ‘x’.
y_col (str, optional) – The column name for y coordinates in the data. Defaults to ‘y’.
sample_rate (float, optional) – The rate at which to sample the training data. Defaults to 0.5.
distance_thres (int, optional) – The distance threshold for constructing edge indexes. Defaults to 50.
key_added (str, optional) – The key to be added to the unannotated data’s obs dataframe for the predicted results. Defaults to ‘stellar_pred’.
- Returns:
The unannotated data with the added key for the predicted results.
- Return type:
AnnData
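To illustrate what distance_thres controls, the following is a minimal sketch of building an edge list between cells that lie closer together than a threshold. It is not taken from the STELLAR or SPACEc source; the helper name edges_within_threshold and the toy coordinates are assumptions made for illustration only.

```python
import numpy as np


def edges_within_threshold(xy, distance_thres=50):
    """Return (i, j) index pairs (i < j) of points closer than distance_thres.

    Hypothetical helper for illustration; not part of spacec.tools.
    """
    xy = np.asarray(xy, dtype=float)
    # Pairwise Euclidean distances via broadcasting: shape (n, n).
    diff = xy[:, None, :] - xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Keep the strict upper triangle so each undirected edge appears once.
    i, j = np.where(np.triu(dist < distance_thres, k=1))
    return list(zip(i.tolist(), j.tolist()))


# Toy coordinates: two close pairs that are far from each other.
coords = [(0, 0), (30, 0), (200, 0), (220, 10)]
edges = edges_within_threshold(coords, distance_thres=50)
```

With a threshold of 50, only cells within each close pair are connected; raising the threshold would add edges across the pairs.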
- spacec.tools.cell_segmentation(file_name, channel_file, output_dir, output_fname='', seg_method='mesmer', nuclei_channel='DAPI', input_format='Multichannel', membrane_channel_list=None, size_cutoff=0, compartment='whole-cell', plot_predictions=True, model='cyto3', use_gpu=True, cytoplasm_channel_list=None, diameter=None, save_mask_as_png=False, model_path='./models', resize_factor=1, custom_model=False, differentiate_nucleus_cytoplasm=False)
Perform cell segmentation on an image.
- Parameters:
file_name (str) – The path to the image file.
channel_file (str) – The path to the file containing the channel names.
output_dir (str) – The directory where the output will be saved.
output_fname (str, optional) – The name of the output file. Default is an empty string.
seg_method (str) – The segmentation method to use. Options are ‘mesmer’ and ‘cellpose’. Default is ‘mesmer’.
nuclei_channel (str) – The name of the nuclei channel. Default is ‘DAPI’.
input_format (str) – The format of the input image. Options are ‘CODEX’ and ‘Multichannel’. Default is ‘Multichannel’.
membrane_channel_list (list of str, optional) – The names of the membrane channels.
size_cutoff (int, optional) – The size cutoff for segmentation. Default is 0.
compartment (str, optional) – The compartment to segment. Options are ‘whole-cell’ and ‘nuclei’. Default is ‘whole-cell’. This only applies to Mesmer.
plot_predictions (bool, optional) – Whether to plot the segmentation results. Default is True.
model (str, optional) – The model to use for segmentation. Default is ‘cyto3’ (per the signature above). This only applies to Cellpose.
use_gpu (bool, optional) – Whether to use GPU for segmentation. Default is True. This only applies to Cellpose.
cytoplasm_channel_list (list of str, optional) – The names of the cytoplasm channels.
diameter (int, optional) – The diameter of the cells. Default is None, in which case the diameter is determined automatically. This only applies to Cellpose.
save_mask_as_png (bool, optional) – Whether to save the segmentation mask as a PNG file. Default is False.
model_path (str, optional) – The path to the model. Default is ‘./models’.
differentiate_nucleus_cytoplasm (bool, optional) – Whether to differentiate between nucleus and cytoplasm. Default is False.
- Returns:
A dictionary containing the original image (‘img’), the segmentation masks (‘masks’), and the image dictionary (‘image_dict’).
- Return type:
dict
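A call might look like the sketch below. It only uses parameter names from the signature above; the wrapper name segment_image, the MESMER_KWARGS dictionary, and the membrane marker ‘CD45’ are hypothetical, and the import is deferred so the snippet can be loaded without SPACEc installed.

```python
# Keyword arguments taken from the documented signature; values are illustrative.
MESMER_KWARGS = dict(
    seg_method="mesmer",
    nuclei_channel="DAPI",
    membrane_channel_list=["CD45"],  # hypothetical membrane marker name
    compartment="whole-cell",
    plot_predictions=False,
)


def segment_image(file_name, channel_file, output_dir):
    """Hypothetical wrapper: run Mesmer segmentation and return (img, masks)."""
    # Deferred import: SPACEc is only required when the function is called.
    from spacec import tools

    result = tools.cell_segmentation(
        file_name, channel_file, output_dir, **MESMER_KWARGS
    )
    # Keys documented in the Returns section: 'img', 'masks', 'image_dict'.
    return result["img"], result["masks"]
```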
- spacec.tools.clustering(adata, clustering='leiden', marker_list=None, resolution=1, n_neighbors=10, reclustering=False, key_added=None, key_filter=None, subset_cluster=None, seed=42, fs_xdim=10, fs_ydim=10, fs_rlen=10, **cluster_kwargs)
Perform clustering on the given annotated data matrix.
- Parameters:
adata (AnnData) – The annotated data matrix of shape n_obs x n_vars. Rows correspond to cells and columns to stained markers.
clustering (str, optional) – The clustering algorithm to use. Options are “leiden” or “louvain”. Defaults to “leiden”.
marker_list (list, optional) – A list of markers for clustering. Defaults to None.
resolution (int, optional) – The resolution for the clustering algorithm. Defaults to 1.
n_neighbors (int, optional) – The number of neighbors to use for the neighbors graph. Defaults to 10.
reclustering (bool, optional) – Whether to recluster the data. Defaults to False.
key_added (str, optional) – The key name to add to the adata object. Defaults to None.
key_filter (str, optional) – The key name to filter the adata object. Defaults to None.
subset_cluster (list, optional) – The list of clusters to subset. Defaults to None.
seed (int, optional) – Seed for random state. Default is 42.
fs_xdim (int, optional) – X dimension for FlowSOM. Default is 10.
fs_ydim (int, optional) – Y dimension for FlowSOM. Default is 10.
fs_rlen (int, optional) – Rlen for FlowSOM. Default is 10.
**cluster_kwargs (dict) – Additional keyword arguments for the clustering function.
- Returns:
The annotated data matrix with the clustering results added.
- Return type:
AnnData
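As a usage sketch under the signature above, the wrapper below runs Leiden clustering on a chosen marker subset. The wrapper name cluster_cells and the key name ‘leiden_1’ are hypothetical; the import is deferred so the snippet loads without SPACEc installed.

```python
def cluster_cells(adata, markers):
    """Hypothetical wrapper: Leiden clustering restricted to selected markers."""
    # Deferred import: SPACEc is only required when the function is called.
    from spacec import tools

    return tools.clustering(
        adata,
        clustering="leiden",
        marker_list=markers,   # list of marker names present in adata
        resolution=1,
        n_neighbors=10,
        key_added="leiden_1",  # hypothetical key name for the results
        seed=42,
    )
```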
- spacec.tools.filter_interactions(distance_pvals, pvalue=0.05, logfold_group_abs=0.1, comparison='condition')
Filters interactions based on p-value, logfold change, and other conditions.
- Parameters:
distance_pvals (pandas.DataFrame) – DataFrame containing p-values, logfold changes, and interactions for each comparison.
pvalue (float, optional) – The maximum p-value to consider for significance. Defaults to 0.05.
logfold_group_abs (float, optional) – The minimum absolute logfold change to consider for significance. Defaults to 0.1.
comparison (str, optional) – The comparison condition to filter by. Defaults to “condition”.
- Returns:
dist_table (pandas.DataFrame) – DataFrame containing logfold changes sorted into two columns by the comparison condition.
distance_pvals_sig_sub (pandas.DataFrame) – Subset of the original DataFrame containing only significant interactions based on the specified conditions.
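The thresholding described above can be sketched with plain pandas. The column names ‘interaction’, ‘pvalue’ and ‘logfold_group’, the helper name filter_significant, and the toy values are assumptions for illustration; the source does not state whether the cutoffs are inclusive, so strict comparisons are used here.

```python
import pandas as pd


def filter_significant(df, pvalue=0.05, logfold_group_abs=0.1):
    """Keep rows with a small p-value and a large enough absolute logfold change.

    Hypothetical helper; column names are assumed, not taken from SPACEc.
    """
    keep = (df["pvalue"] < pvalue) & (df["logfold_group"].abs() > logfold_group_abs)
    return df.loc[keep].reset_index(drop=True)


toy = pd.DataFrame(
    {
        "interaction": ["A --> B", "A --> C", "B --> C"],
        "pvalue": [0.01, 0.20, 0.03],
        "logfold_group": [0.8, 1.2, 0.05],
    }
)
significant = filter_significant(toy)
```

Only the first row passes both filters: the second fails the p-value cutoff and the third fails the logfold cutoff.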
- spacec.tools.identify_interactions(adata, cellid, x_pos, y_pos, cell_type, region, comparison, iTriDist_keyname=None, triDist_keyname=None, min_observed=10, distance_threshold=128, num_cores=None, num_iterations=1000, key_name=None, correct_dtype=False)
Identify interactions between cell types based on their spatial distances.
- Parameters:
adata (AnnData) – Annotated data matrix.
cellid (str) – Identifier for cells.
x_pos (str) – Column name for x position of cells.
y_pos (str) – Column name for y position of cells.
cell_type (str) – Column name for cell type.
region (str) – Column name for region.
comparison (str) – Column name for comparison.
iTriDist_keyname (str, optional) – Key name for iterative triangulation distances, by default None
triDist_keyname (str, optional) – Key name for triangulation distances, by default None
min_observed (int, optional) – Minimum number of observed distances, by default 10
distance_threshold (int, optional) – Threshold for distance, by default 128
num_cores (int, optional) – Number of cores to use for computation, by default None
num_iterations (int, optional) – Number of iterations for computation, by default 1000
key_name (str, optional) – Key name for output, by default None
correct_dtype (bool, optional) – Whether to correct data type or not, by default False
- Returns:
DataFrame with p-values and logfold changes for interactions.
- Return type:
DataFrame
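The general idea of comparing observed distances against label-shuffled distances can be sketched as a generic permutation test. This is not SPACEc's implementation (which, judging by the parameter names, works on triangulation distances); the function names, the one-sided p-value definition, and the toy data are all assumptions.

```python
import numpy as np


def mean_pair_distance(xy, labels, type_a, type_b, distance_threshold):
    """Mean distance between type_a and type_b cells closer than the threshold."""
    a = xy[labels == type_a]
    b = xy[labels == type_b]
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    d = d[d < distance_threshold]
    return d.mean() if d.size else np.nan


def permutation_test(xy, labels, type_a, type_b,
                     distance_threshold=128, num_iterations=1000, seed=0):
    """Return (observed mean distance, one-sided p-value, log2 fold change)."""
    rng = np.random.default_rng(seed)
    observed = mean_pair_distance(xy, labels, type_a, type_b, distance_threshold)
    # Shuffle cell-type labels while keeping positions fixed.
    permuted = np.array([
        mean_pair_distance(xy, rng.permutation(labels), type_a, type_b,
                           distance_threshold)
        for _ in range(num_iterations)
    ])
    permuted = permuted[~np.isnan(permuted)]
    # How often do shuffled labels give distances at least as short as observed?
    p_value = (np.sum(permuted <= observed) + 1) / (permuted.size + 1)
    logfold = np.log2(observed / permuted.mean())
    return observed, p_value, logfold


# Toy data: 40 randomly placed cells, half of type "A" and half of type "B".
rng = np.random.default_rng(1)
xy = rng.uniform(0, 100, size=(40, 2))
labels = np.array(["A"] * 20 + ["B"] * 20)
observed, p_value, logfold = permutation_test(xy, labels, "A", "B",
                                              num_iterations=200)
```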
- spacec.tools.label_tissue(resized_im, lower_cutoff=0.012, upper_cutoff=0.025, savefig=False, showfig=True, output_dir='./', output_fname='')
Label the tissue in the given image.
- Parameters:
resized_im (ndarray) – The resized image to label.
lower_cutoff (float, optional) – The lower cutoff for the sobel filter, by default 0.012.
upper_cutoff (float, optional) – The upper cutoff for the sobel filter, by default 0.025.
savefig (bool, optional) – Whether to save the figure or not, by default False.
showfig (bool, optional) – Whether to show the figure or not, by default True.
output_dir (str, optional) – The directory to save the figure in, by default “./”.
output_fname (str, optional) – The filename to save the figure as, by default “”.
- Returns:
A DataFrame containing the labels from the segmentation.
- Return type:
DataFrame
- spacec.tools.ml_predict(adata_val, svc, save_name='svm_pred', return_prob_mat=False)
Predict labels for a given dataset using a trained Support Vector Classifier (SVC) model.
- Parameters:
adata_val (AnnData) – The validation data as an AnnData object.
svc (SVC) – The trained Support Vector Classifier model.
save_name (str, optional) – The name under which the predictions will be saved in the AnnData object, by default “svm_pred”.
return_prob_mat (bool, optional) – Whether to return the probability matrix, by default False.
- Returns:
If return_prob_mat is True, returns a DataFrame with the probability matrix. Otherwise, returns None.
- Return type:
DataFrame or None
- spacec.tools.ml_train(adata_train, label, test_size=0.33, random_state=0, model='svm', nan_policy_y='raise', showfig=True, figsize=(10, 8))
Train an SVM model on the provided data.
- Parameters:
adata_train (AnnData) – The training data as an AnnData object.
label (str) – The label to predict.
test_size (float, optional) – The proportion of the dataset to include in the test split, by default 0.33.
random_state (int, optional) – The seed used by the random number generator, by default 0.
model (str, optional) – The type of model to train, by default “svm”.
nan_policy_y (str, optional) – How to handle NaNs in the label, by default “raise”. Can be either ‘omit’ or ‘raise’.
showfig (bool, optional) – Whether to show the confusion matrix as a heatmap, by default True.
- Returns:
The trained Support Vector Classifier model.
- Return type:
SVC
- Raises:
ValueError – If nan_policy_y is not ‘omit’ or ‘raise’.
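The nan_policy_y contract can be sketched as follows. Only the ValueError for an unknown policy is documented above; raising on NaN labels under ‘raise’ and dropping them under ‘omit’ are assumptions inferred from the option names, and apply_nan_policy is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def apply_nan_policy(labels, nan_policy_y="raise"):
    """Return a boolean mask of rows to keep for training.

    Hypothetical helper; behaviour for 'raise' and 'omit' is assumed.
    """
    if nan_policy_y not in ("omit", "raise"):
        # Documented behaviour: unknown policies are rejected.
        raise ValueError("nan_policy_y must be 'omit' or 'raise'")
    missing = pd.isna(labels)
    if nan_policy_y == "raise" and missing.any():
        # Assumed behaviour: refuse to train on missing labels.
        raise ValueError("label column contains NaN values")
    # Assumed behaviour for 'omit': keep only rows with a label.
    return ~missing


labels = pd.Series(["T cell", np.nan, "B cell"])
mask = apply_nan_policy(labels, nan_policy_y="omit")
```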
- spacec.tools.neighborhood_analysis(adata, unique_region, cluster_col, X='x', Y='y', k=35, n_neighborhoods=30, elbow=False, metric='distortion')
Compute cellular neighborhoods (CNs).
- Parameters:
adata (AnnData) – Annotated data matrix.
unique_region (str) – Each region is one independent CODEX image.
cluster_col (str) – Column to compute CNs on, typically ‘celltype’.
X (str, optional) – X coordinate column name, by default “x”.
Y (str, optional) – Y coordinate column name, by default “y”.
k (int, optional) – Number of neighbors to compute, by default 35.
n_neighborhoods (int, optional) – Number of neighborhoods one ends up with, by default 30.
elbow (bool, optional) – Whether to test for the optimal number of clusters and visualize the result as an elbow plot, by default False. If set to True, the function tests 1 to n_neighborhoods clusters and plots the distortion score in an elbow plot to help the user find the optimal number of clusters.
metric (str, optional) – The metric to use when calculating distance between instances in a feature array, by default “distortion”.
- Returns:
Annotated data matrix with updated neighborhood information.
- Return type:
AnnData
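The role of k can be illustrated with a small sketch that counts cell types among each cell's k nearest neighbours. This is a common way to build CN feature vectors but is not confirmed by the source as SPACEc's implementation; neighbor_composition is a hypothetical helper, and including the cell itself in its window is an assumption.

```python
import numpy as np


def neighbor_composition(xy, cell_types, k=3):
    """For each cell, count the cell types among its k nearest neighbours.

    Hypothetical helper; the window includes the cell itself.
    """
    xy = np.asarray(xy, dtype=float)
    cell_types = np.asarray(cell_types)
    categories = np.unique(cell_types)  # sorted unique cell types
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    # Indices of the k closest cells per row (distance 0 to itself comes first).
    nearest = np.argsort(dist, axis=1)[:, :k]
    # One column of counts per cell type.
    counts = np.stack(
        [(cell_types[nearest] == c).sum(axis=1) for c in categories], axis=1
    )
    return categories, counts


# Toy data: two well-separated groups of three cells each.
xy = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
types = ["T", "T", "B", "B", "B", "T"]
categories, counts = neighbor_composition(xy, types, k=3)
```

Each row of counts sums to k. Such composition vectors are typically clustered (for example with k-means) into the requested number of neighborhoods, which is presumably what n_neighborhoods sets.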
- spacec.tools.patch_proximity_analysis(adata, region_column, patch_column, group, min_cluster_size=80, x_column='x', y_column='y', radius=128, edge_neighbours=3, plot=True, savefig=False, output_dir='./', output_fname='', key_name='ppa_result', plot_color='#6a3d9a')
Performs a proximity analysis on patches of a given group within each region of a dataset.
- Parameters:
adata (AnnData) – The annotated data matrix of shape n_obs x n_vars. Rows correspond to cells and columns to genes.
region_column (str) – The name of the column in the DataFrame that contains the region information.
patch_column (str) – The name of the column in the DataFrame that contains the patch information.
group (str) – The group to perform the proximity analysis on.
min_cluster_size (int, optional) – The minimum number of samples required to form a dense region. Default is 80.
x_column (str, optional) – The name of the column in the DataFrame that contains the x-coordinate. Default is ‘x’.
y_column (str, optional) – The name of the column in the DataFrame that contains the y-coordinate. Default is ‘y’.
radius (int, optional) – The radius within which to identify points in proximity. Default is 128.
edge_neighbours (int, optional) – The number of edge neighbours to consider. Default is 3.
plot (bool, optional) – Whether to plot the patches. Default is True.
savefig (bool, optional) – Whether to save the figure. Default is False.
output_dir (str, optional) – The directory to save the figure in. Default is “./”.
output_fname (str, optional) – The filename to save the figure as. Default is “”.
key_name (str, optional) – The key name to store the results in the AnnData object. Default is ‘ppa_result’.
- Returns:
final_results (DataFrame) – A DataFrame containing the results of the proximity analysis.
outlines_results (DataFrame) – A DataFrame containing the outlines of the patches.
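What a radius query does can be sketched as below: given the coordinates of a patch, find the other cells lying within radius of any patch cell. This is a simplified illustration, not SPACEc's implementation (which also involves patch outlines and edge_neighbours); cells_near_patch and the toy coordinates are hypothetical.

```python
import numpy as np


def cells_near_patch(patch_xy, other_xy, radius=128):
    """Indices of cells in other_xy within radius of any patch cell.

    Hypothetical helper for illustration only.
    """
    patch_xy = np.asarray(patch_xy, dtype=float)
    other_xy = np.asarray(other_xy, dtype=float)
    # Distance from every non-patch cell to every patch cell: shape (m, n).
    dist = np.linalg.norm(other_xy[:, None, :] - patch_xy[None, :, :], axis=-1)
    # A cell is "in proximity" if its closest patch cell is within the radius.
    return np.where(dist.min(axis=1) <= radius)[0]


patch = [(0, 0), (10, 0), (0, 10)]
others = [(50, 50), (300, 300), (-100, 0)]
near = cells_near_patch(patch, others, radius=128)
```

Here the first and third cells fall inside the radius, while the cell at (300, 300) does not.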
- spacec.tools.save_labelled_tissue(filepath, tissueframe, region='region', padding=50, downscale_factor=64, output_dir='./', output_fname='')
Save the labelled tissue from the given image.
- Parameters:
filepath (str) – The path to the image file.
tissueframe (DataFrame) – The DataFrame containing the labels from the segmentation.
region (str, optional) – The region to group by, by default “region”.
padding (int, optional) – The padding to add to the extracted tissue, by default 50.
downscale_factor (int, optional) – The factor to downscale the image by, by default 64.
output_dir (str, optional) – The directory to save the image in, by default “./”.
output_fname (str, optional) – The filename to save the image as, by default “”.
- Return type:
None