SPACEc: Cell Segmentation

The aim of cell segmentation is to identify each cell within a given image and derive single cell data from the image. For that we provide two commonly used approaches: Deepcell Mesmer and Cellpose. Mesmer is a deep learning-enabled segmentation algorithm that works out-of-the-box for most multiplexed images. Apart from that we provide Cellpose that is a deep learning-enabled segmentation algorithm as well but provides different models that can be directly employed. Additionally, Cellpose allows users to easily train their own models.

For the purpose of this tutorial we will use Mesmer.

The steps of the script are:

  1. Deciding on the image channels for segmentation

  2. Running segmentation

  3. Quality control the segmented images

  4. Store the data for further processing

# import spacec first
import spacec as sp

#import standard packages
import os
import warnings
import matplotlib
import pickle
warnings.filterwarnings('ignore')
INFO:root: * TissUUmaps version: 3.1.1.6
# set the default color map to viridis, the below paramters can be chanaged
matplotlib.rcParams["image.cmap"] = 'viridis'
root_path = "/home/user/path/SPACEc/" # inset your own path
data_path = root_path + 'example_data/raw/' # where the data is stored

# where you want to store the output
output_dir = root_path + 'example_data/output/' # inset your own path
os.makedirs(output_dir, exist_ok=True)

If you want to use GPU acceleration for the segmentation you should ensure that CUDA is installed and the GPU detected by the python environment. If you have a compatible Nvidia card but never installed CUDA Toolkit before you can find installation instructions here: https://developer.nvidia.com/cuda-downloads

#check if GPU availability
!nvcc --version
!nvidia-smi
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
Tue May 28 17:17:29 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.67                 Driver Version: 550.67         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1070        Off |   00000000:01:00.0  On |                  N/A |
|  0%   49C    P2             40W /  180W |    1080MiB /   8192MiB |     10%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      3782      G   /usr/lib/xorg/Xorg                            382MiB |
|    0   N/A  N/A      3928      G   /usr/bin/gnome-shell                          148MiB |
|    0   N/A  N/A      5593      G   ...erProcess --variations-seed-version        245MiB |
|    0   N/A  N/A      6283      G   ...irefox/4259/usr/lib/firefox/firefox         95MiB |
|    0   N/A  N/A      7103      G   ...yOnDemand --variations-seed-version         67MiB |
|    0   N/A  N/A     61413      G   ...ures=SpareRendererForSitePerProcess         24MiB |
|    0   N/A  N/A     61713    C+G   warp-terminal                                  16MiB |
|    0   N/A  N/A    116255      C   ...m/miniforge3/envs/spacec/bin/python         84MiB |
+-----------------------------------------------------------------------------------------+
sp.hf.check_for_gpu()
INFO:cellpose.core:** TORCH CUDA version installed and working. **
GPU is available to Tensorflow
GPU is available to Pytorch

Cell segmentation

NOTE: Our segmentation function features a parameter called ‘input_format’. This parameter defines what input data the function accepts. If set to ‘Multichannel’ the function expects a single multichannel tiff file, if set to ‘Channels’ the function expects a folder with single Tiff files (no channelnames.txt required) and if set to ‘CODEX’ the function reads the output of the classic first gen CODEX setup.

Before committing to potentially time intense segmentation it might be useful to visualize the segmentation channels. In this tutorial we provide both nuclei and membrane channels. Especially if no general membrane marker is available it is useful to combine membrane markers as shown below.

# (optional, one can just use nuclei for segmentation)
# Visualize membrane channels to use for cell segmentation 

sp.pl.segmentation_ch(
    file_name = output_dir + 'reg001_X01_Y01_Z01.tif', # image for segmentation
    channel_file = data_path + 'channelnames.txt', # all channels used for staining
    output_dir = output_dir, #
    extra_seg_ch_list = ["CD45", "betaCatenin"], #default is None; if provide more than one channel, then they will be combined
    nuclei_channel = 'DAPI', # channel to use for nuclei segmentation
    input_format = 'Multichannel', 
)
../_images/0b3f4caf405773286b80a6940caf4953eff287d042830fe5b6e0c427d3bedc7a.png

After deciding on the channels for segmentation, segmentation is performed with the ‘cell_segmentation’ function. Besides choosing the channels the function allows to select the segmentation model. The function expects a multichannel tif file and a channel names file as input (please see our example data as an example). The segmentation output is stored as csv file.

# choose between cellpose or mesmer for segmentation
# first image
# seg_output contains {'img': img, 'image_dict': image_dict, 'masks': masks}
seg_output1 = sp.tl.cell_segmentation(
    file_name = output_dir + 'reg001_X01_Y01_Z01.tif',
    channel_file = data_path + 'channelnames.txt',
    output_dir = output_dir,
    seg_method ='mesmer', # cellpose or mesmer
    nuclei_channel = 'DAPI',
    output_fname = 'tonsil1',
    membrane_channel_list = ["CD45", "betaCatenin"], #default is None; if provide more than one channel, then they will be combined
    compartment = 'whole-cell', # mesmer # segment whole cells or nuclei only
    input_format ='Multichannel', # Phenocycler or codex
    resize_factor=1, # default is 1; if the image is too large, lower the value. Lower values will speed up the segmentation but may reduce the accuracy.
    size_cutoff = 0)
Create image channels!
Segmenting with Mesmer!
INFO:root:Converting image dtype to float
../_images/b18a639bb2632424688524a4726f8c173fc3f300bb2add065908ba9481fc0be1.png
Quantifying features after segmentation!
Processing channels: 100%|██████████| 59/59 [00:19<00:00,  3.02it/s]
Done!
# second image
# choose the method that is consistent of the first image for a more comparable result
# seg_output contains {'img': img, 'image_dict': image_dict, 'masks': masks}
seg_output2 = sp.tl.cell_segmentation(
    file_name = output_dir + 'reg002_X01_Y01_Z01.tif',
    channel_file = data_path + 'channelnames.txt',
    output_dir = output_dir,
    output_fname = 'tonsil2',
    seg_method ='mesmer', # cellpose or mesmer
    nuclei_channel = 'DAPI',
    membrane_channel_list = ["CD45", "betaCatenin"], #default is None #default is None; if provide more than one channel, then they will be combined
    input_format ='Multichannel', # Phenocycler or codex
    compartment = 'whole-cell', # mesmer # segment whole cells or nuclei only
    resize_factor=1, # default is 1; if the image is too large, lower the value. Lower values will speed up the segmentation but may reduce the accuracy.
    size_cutoff = 0) 
Create image channels!
Segmenting with Mesmer!
2024-05-28 17:21:40.525249: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-05-28 17:21:40.554122: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-05-28 17:21:40.554333: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-05-28 17:21:40.554927: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-28 17:21:40.555638: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-05-28 17:21:40.555836: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-05-28 17:21:40.555927: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-05-28 17:21:40.556138: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-05-28 17:21:40.556267: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-05-28 17:21:40.556391: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-05-28 17:21:40.556507: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3385 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1
INFO:root:Converting image dtype to float
2024-05-28 17:21:47.971129: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8902
../_images/386a080eb84fbaa0840755c3984a455c9bf80dfc6af9843f07600266c9eb5fe0.png
Quantifying features after segmentation!
Processing channels: 100%|██████████| 59/59 [00:14<00:00,  4.18it/s]
Done!

In addition to the mesmer segmentation that is used in our example you can use cellpose as shown in the example below.

seg_output_cellpose = sp.tl.cell_segmentation(
    file_name = output_dir + 'reg002_X01_Y01_Z01.tif',
    channel_file = data_path + 'channelnames.txt',
    output_dir = output_dir,
    output_fname = 'tonsil2',
    seg_method ='cellpose', # cellpose or mesmer
    model='cyto3', # cellpose model
    diameter=28, # average cell diameter (in pixels). If set to None, it will be automatically estimated.
    nuclei_channel = 'DAPI',
    membrane_channel_list = ["CD45", "betaCatenin"], #default is None #default is None; if provide more than one channel, then they will be combined
    input_format ='Multichannel', # Phenocycler or codex
    resize_factor=1, # default is 1; if the image is too large, lower the value. Lower values will speed up the segmentation but may reduce the accuracy.
    size_cutoff = 0) 
Create image channels!
Segmenting with Cellpose!
Selecting nucleus and membrane channel for segmentation.
INFO:cellpose.core:** TORCH CUDA version installed and working. **
INFO:cellpose.core:>>>> using GPU
INFO:cellpose.models:>> cyto3 << model set to be used
INFO:cellpose.models:>>>> loading model /home/tim/.cellpose/models/cyto3
INFO:cellpose.models:>>>> model diam_mean =  30.000 (ROIs rescaled to this size during training)
INFO:cellpose.models:channels set to [1, 3]
INFO:cellpose.models:~~~ FINDING MASKS ~~~
INFO:cellpose.models:>>>> TOTAL TIME 12.93 sec
diameter set to: 28
Quantifying features after segmentation!
Processing channels: 100%|██████████| 59/59 [00:14<00:00,  4.00it/s]
Done!

Viusalizing the segmentation result

Not every dataset works equally well with all segmentation models due to differences in tissue type, structure or image quality. Therefore, it is of major importance to check the segmentation results before continuing with the data analysis. the ‘show_masks’ function selects random tiles of a user defined size from the image to provide examples to evaluate the segmentation quality. If the segmentation quality is not acceptable a different model should be tried. For especially challenging datasets users can also try to retrain a model specifically for their images.

overlay_data1, rgb_images1 = sp.pl.show_masks(
    seg_output=seg_output1,
    nucleus_channel = 'DAPI', # channel used for nuclei segmentation (displayed in blue)
    additional_channels = ["CD45", "betaCatenin"], # additional channels to display (displayed in green - channels will be combined into one image)
    show_subsample = True, # show a random subsample of the image
    n=2, #need to be at least 2
    tilesize = 300,# number of subsamples and tilesize
    rand_seed = 1)
../_images/b18a639bb2632424688524a4726f8c173fc3f300bb2add065908ba9481fc0be1.png ../_images/8c7e622460198447b274fff9d42c48d91457039fe6c45a7039950c60f5e345b9.png
overlay_data2, rgb_images2 = sp.pl.show_masks(
    seg_output=seg_output2,
    nucleus_channel = 'DAPI', # channel used for nuclei segmentation (displayed in blue)
    additional_channels = ["CD45", "betaCatenin"], # additional channels to display (displayed in green - channels will be combined into one image)
    show_subsample = True, # show a random subsample of the image
    n=2, #need to be at least 2
    tilesize = 300, # number of subsamples and tilesize
    rand_seed = 3) 
../_images/386a080eb84fbaa0840755c3984a455c9bf80dfc6af9843f07600266c9eb5fe0.png ../_images/9c69f1e97c89e84a7888c5168eb0c0a701fc1e424f46e9757693acb390973779.png

Save the segmentation result

After successful segmentation, the images and masks can be stored in a pickle file for later easy access.

#Save segmentation output
with open(output_dir + 'seg_output_tonsil1.pickle', 'wb') as f:
    pickle.dump(seg_output1, f)

with open(output_dir + 'seg_output_tonsil2.pickle', 'wb') as f:
    pickle.dump(seg_output2, f)
    
#Save the overlay of the data
with open(output_dir + 'overlay_tonsil1.pickle', 'wb') as f:
    pickle.dump(overlay_data1, f)

with open(output_dir + 'overlay_tonsil2.pickle', 'wb') as f:
    pickle.dump(overlay_data2, f)