ThrombusSegmentator
This is the official documentation for the ThrombusSegmentator model, a deep-learning pipeline that
segments thrombus regions from CT (Computed Tomography) volumes. The model is packaged as a Tango artifact
under the name thrombussegmentator and is built on top of MONAI's SegResNet
architecture.
Description
ThrombusSegmentator is a Python package that implements a complete pipeline for binary semantic
segmentation of thrombus in CT imaging. Two SegResNet networks are trained and shipped together:
- a 3-D network that operates on the full CT volume, and
- a 2-D network that operates slice-by-slice on the axial planes of the same volume.
Both branches share identical preprocessing (CT intensity windowing, median smoothing, intensity rescaling) and are invoked jointly, producing two complementary segmentation masks for the same input. Packaging the network as an open and transparent Tango artifact lets it be trained, tracked and served through the standard Tango / MLflow tooling.
It provides:
- Model training & tracking: train the 2-D and 3-D segmentation networks and log them (together with their weights and inference signature) to an MLflow tracking server and model registry.
- File-based inference: download a CT volume, run both segmentation branches, and upload the resulting masks — all driven by pre-signed URLs.
- Explainability hook: an optional explainer that, in addition to the masks, returns raw logits and aggregate scores (thrombus_probability, confidence).
Once registered, the Tango driver exposes a single predict entrypoint that:
- Accepts a request mapped to a single-row pandas DataFrame carrying an input_file_url and an output_file_url
- Performs download, preprocessing, 2-D and 3-D inference, and post-processing in one call
- Returns a synchronous status envelope while writing the predicted segmentation masks to the output_file_url
Inputs and Outputs
Inputs
The model is invoked with file references, not inline payloads. The Tango request is translated by the
ModelMapper (src/thrombussegmentator/ml_models/model_mapper.py) into a single-row pandas DataFrame with
two string columns:
- input_file_url: pre-signed URL of the .npz file holding the CT volume to segment
- output_file_url: pre-signed URL the model POSTs the result to
Input file (.npz)
The file referenced by input_file_url is a NumPy .npz archive containing:
- volume: the CT volume as a 3-D NumPy array (e.g. shape (H, W, D))
- non_mdc_mean: scalar mean used to center the CT intensity window
- non_mdc_var: scalar variance used to derive the intensity window width
The non-contrast medium statistics can be estimated with TotalSegmentator from the predicted Hounsfield Units (HU) of the Inferior Vena Cava.
A working example is provided at src/data/input_file_example.npz (volume of shape (60, 60, 61),
non_mdc_mean = 250, non_mdc_var = 1000).
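As an illustration of the archive layout described above, the following sketch builds an input file in memory with NumPy. The helper name build_input_archive, the float32 dtype of the two scalars, and the synthetic volume are assumptions made for the example; only the three key names and the example shape and statistics come from this documentation.

```python
import io

import numpy as np


def build_input_archive(volume, non_mdc_mean, non_mdc_var):
    """Serialize a CT volume and its non-contrast statistics to .npz bytes."""
    volume = np.asarray(volume)
    if volume.ndim != 3:
        raise ValueError(f"expected a 3-D volume, got shape {volume.shape}")
    buffer = io.BytesIO()
    # Key names follow the input contract; the scalar dtype is an assumption.
    np.savez(
        buffer,
        volume=volume,
        non_mdc_mean=np.float32(non_mdc_mean),
        non_mdc_var=np.float32(non_mdc_var),
    )
    return buffer.getvalue()


# Synthetic stand-in with the same shape and statistics as the bundled example.
rng = np.random.default_rng(0)
fake_volume = rng.normal(250.0, np.sqrt(1000.0), size=(60, 60, 61)).astype(np.float32)
payload = build_input_archive(fake_volume, non_mdc_mean=250, non_mdc_var=1000)

# Round trip: list the stored keys and the volume shape.
with np.load(io.BytesIO(payload)) as archive:
    print(sorted(archive.files), archive["volume"].shape)
```

The resulting bytes are what would be uploaded to the location behind input_file_url.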
Preprocessing
Each input volume is, in order (thrombus_segmentato_predictor.py):
- CT-windowed using non_mdc_mean / non_mdc_var to map Hounsfield values into the [0, 255] range
- Median-smoothed (radius 1)
- Rescaled to the [-1, 1] intensity range
- Inferred with MONAI's SlidingWindowInferer over a single-channel input, using a ROI of 256×256×64 for the 3-D branch and 256×256 for the 2-D branch (volumes smaller than the ROI are padded).
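The first three steps above can be sketched in plain NumPy as follows. This is not the packaged implementation: the window formula (mean plus or minus n_std standard deviations), the n_std parameter, the cubic 3-D median neighbourhood, and all function names are assumptions chosen for illustration. The sliding-window inference step is not reproduced.

```python
import numpy as np


def ct_window(volume, mean, var, n_std=3.0):
    """Clip to mean +/- n_std * sqrt(var) and map linearly to [0, 255].

    n_std is hypothetical; the packaged model may derive the width differently.
    """
    half_width = n_std * np.sqrt(var)
    low, high = mean - half_width, mean + half_width
    clipped = np.clip(volume.astype(np.float32), low, high)
    return (clipped - low) / (high - low) * 255.0


def median_smooth(volume, radius=1):
    """Median over a cubic (2 * radius + 1)^3 neighbourhood with edge padding."""
    padded = np.pad(volume, radius, mode="edge")
    size = 2 * radius + 1
    d0, d1, d2 = volume.shape
    # One shifted view of the padded volume per neighbourhood offset.
    shifted = [
        padded[i:i + d0, j:j + d1, k:k + d2]
        for i in range(size)
        for j in range(size)
        for k in range(size)
    ]
    return np.median(np.stack(shifted), axis=0)


def rescale_to_unit_range(volume):
    """Map [0, 255] intensities linearly to [-1, 1]."""
    return volume / 127.5 - 1.0


def preprocess(volume, mean, var):
    """Windowing, then median smoothing, then rescaling, in that order."""
    return rescale_to_unit_range(median_smooth(ct_window(volume, mean, var)))


# Example on a small synthetic volume.
demo = np.random.default_rng(0).normal(250.0, 30.0, size=(8, 8, 8))
out = preprocess(demo, mean=250, var=1000)
print(out.shape, float(out.min()) >= -1.0, float(out.max()) <= 1.0)
```

Stacking all 27 shifted views keeps the sketch dependency-free but multiplies memory use; a library median filter would be the usual choice for full-size volumes.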
Outputs
Prediction Outputs
The model writes its result to output_file_url as a compressed .npz archive containing two
voxel/pixel-wise binary masks:
- pred_2d: mask produced by the 2-D branch
- pred_3d: mask produced by the 3-D branch
Both masks are uint8 arrays where:
- 0: background
- 1: thrombus
The decision is a hard argmax over the two output channels (no tunable threshold). The synchronous HTTP
response body is a small status envelope:
{ "status": "OK" }
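To make the hard argmax decision and the output archive layout concrete, here is a minimal NumPy sketch. It assumes channel-first logits (channel 0 is background, channel 1 is thrombus); the helper names logits_to_mask and pack_masks are hypothetical and not part of the package.

```python
import io

import numpy as np


def logits_to_mask(logits):
    """Convert (2, ...) logits to a uint8 mask: 0 = background, 1 = thrombus."""
    logits = np.asarray(logits)
    if logits.shape[0] != 2:
        raise ValueError("expected two output channels on axis 0")
    # argmax returns the first maximum, so exact ties resolve to background.
    return np.argmax(logits, axis=0).astype(np.uint8)


def pack_masks(pred_2d, pred_3d):
    """Write both masks into a compressed .npz archive held in memory."""
    buffer = io.BytesIO()
    np.savez_compressed(buffer, pred_2d=pred_2d, pred_3d=pred_3d)
    return buffer.getvalue()


# Toy logits for a 1 x 2 grid: channel 0 = background, channel 1 = thrombus.
logits = np.array([[[0.2, 2.0]], [[1.5, -1.0]]])
mask = logits_to_mask(logits)
archive_bytes = pack_masks(mask, mask)

# A consumer reads the masks back by key.
with np.load(io.BytesIO(archive_bytes)) as result:
    print(result["pred_3d"].dtype, int(result["pred_3d"].sum()))
```

Because the decision is an argmax rather than a threshold on the foreground probability, there is no operating point to tune on the consumer side.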
Explanation Outputs
When the optional ModelExplainer is enabled, predict returns a richer JSON object instead of writing a
file:
| Field | Meaning |
|---|---|
| pred_2d_mask | 2-D branch binary mask (list) |
| pred_3d_mask | 3-D branch binary mask (list) |
| logits_2d | Raw 2-D network logits |
| logits_3d | Raw 3-D network logits |
| thrombus_probability | Mean softmax probability of the foreground (thrombus) class (3-D) |
| confidence | Mean of the per-voxel maximum softmax probability (3-D) |
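The two aggregate scores in the table can be reproduced from the 3-D logits along the following lines. This is a sketch under stated assumptions: logits are taken to be channel-first with channel 1 as the thrombus class, and the function names are illustrative rather than those used by the ModelExplainer.

```python
import numpy as np


def softmax(logits, axis=0):
    """Numerically stable softmax along the channel axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def aggregate_scores(logits_3d):
    """Compute the two aggregate scores from (2, ...) channel-first logits."""
    probs = softmax(np.asarray(logits_3d, dtype=np.float64), axis=0)
    return {
        # Mean probability assigned to the foreground (thrombus) channel.
        "thrombus_probability": float(probs[1].mean()),
        # Mean of the winning-class probability at each voxel.
        "confidence": float(probs.max(axis=0).mean()),
    }


# Random logits for a small 4 x 4 x 4 volume.
demo_logits = np.random.default_rng(0).normal(size=(2, 4, 4, 4))
print(aggregate_scores(demo_logits))
```

With two classes, confidence is bounded below by 0.5, which it reaches only when both channels are equal at every voxel.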
Artifacts
A model training run logs the following artifacts to the MLflow tracking server / registry:
.
├── thrombus_segmentator/ # ThrombusModel (pyfunc) — runs 2-D + 3-D inference
│ └── artifacts/
│ ├── model2d # best_metric_model_2d.pth (SegResNet weights, 2-D)
│ └── model3d # best_metric_model_3d.pth (SegResNet weights, 3-D)
└── thrombus_segmentator_mapper/ # ModelMapper (pyfunc) — request/response mapping
The mapper is also registered under the model registry name thrombus_segmentator_mapper.
Model signature
The signature is inferred from src/data/model_file_signature.json and describes the file-reference
contract exchanged with the Tango driver:
{
"inputs": [
{ "name": "input_file_url", "type": "string" },
{ "name": "output_file_url", "type": "string" }
],
"outputs": [
{ "name": "input_file_url", "type": "string" },
{ "name": "output_file_url", "type": "string" }
]
}
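A client can check a request row against this contract before sending it. The sketch below uses only the standard library and embeds the signature shown above; check_row is a hypothetical helper, not part of the package or of MLflow.

```python
import json

# Signature copied from the section above.
SIGNATURE = json.loads("""
{
  "inputs": [
    {"name": "input_file_url", "type": "string"},
    {"name": "output_file_url", "type": "string"}
  ],
  "outputs": [
    {"name": "input_file_url", "type": "string"},
    {"name": "output_file_url", "type": "string"}
  ]
}
""")


def check_row(row, signature=SIGNATURE, section="inputs"):
    """Return a list of problems found in one request row (empty if it conforms)."""
    expected = {col["name"]: col["type"] for col in signature[section]}
    problems = [f"missing column: {name}" for name in expected if name not in row]
    problems += [f"unexpected column: {name}" for name in row if name not in expected]
    problems += [
        f"column {name} must be a string"
        for name, kind in expected.items()
        if kind == "string" and name in row and not isinstance(row[name], str)
    ]
    return problems


# The URLs here are placeholders, not real endpoints.
row = {
    "input_file_url": "https://example.com/input.npz",
    "output_file_url": "https://example.com/output.npz",
}
print(check_row(row))
```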
Build
Init project
Install requirements.
pip install -r requirements.txt
Run a development environment
Install requirements.
pip install -r requirements-dev.txt
If useful, install a local version of tango-interfaces.
pip uninstall tango-interfaces
pip install -e <path>/tango-interfaces/
Run tracking server and model registry
Run a local tracking server and model registry (see run_tracking_server.sh).
mlflow ui
Prepare the environment for training/serving
Export the following environment variables.
export MLFLOW_TRACKING_URI="http://127.0.0.1:5000"
export MLFLOW_EXPERIMENT_NAME="thrombus-segmentator"
Use these if the server requires authentication.
export MLFLOW_TRACKING_USERNAME=<username>
export MLFLOW_TRACKING_PASSWORD=<password>
Make a training run
With the environment variables above exported, launch a run (this takes some time because it
creates the environment). See run_train_model.sh for a ready-made script.
mlflow run --env-manager local ./src
The entrypoint (src/thrombussegmentator/main.py) accepts the following parameters:
- model_name (str, default thrombussegmentator): registered model name
- n_trials (int, default 5)
- test_size (float, default 0.2)
Note: the training configs (src/thrombussegmentator/configs/config_2d_model.json, config_3d_model.json) and training.py currently contain hardcoded, developer-local dataset and config paths (/Users/erichr/...). Point dataset_dir and the config paths at your own .npy volume and label corpus before training on real data.
Running local model server
Install requirements.
pip install -r requirements-dev-modelserver.txt
Run a model server
Access the experiment from the tracking server web UI at http://127.0.0.1:5000.

Copy the experiment run id.

Serve the model by name (see run_model_server.sh).
export MODEL_NAME="thrombussegmentator"
mlflow models serve -m "models:/$MODEL_NAME/latest" --enable-mlserver -p 5001
Alternatively, set the run id and serve by run.
export MLFLOW_RUN_ID=<runid>
mlflow models serve -m "runs:/$MLFLOW_RUN_ID/thrombus_segmentator" --enable-mlserver -p 5001
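Once the server is listening on port 5001, a request can be sent to it from Python. The sketch below assumes the server exposes MLflow's /invocations endpoint and accepts the dataframe_split JSON layout, which should be verified against the installed MLflow and MLServer versions. The helper names and the example.com URLs are placeholders.

```python
import json
import urllib.request


def build_invocation_request(input_file_url, output_file_url, host="http://127.0.0.1:5001"):
    """Build a POST request carrying one row with the two file-reference columns."""
    payload = {
        "dataframe_split": {
            "columns": ["input_file_url", "output_file_url"],
            "data": [[input_file_url, output_file_url]],
        }
    }
    return urllib.request.Request(
        url=f"{host}/invocations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def invoke(request, timeout=600):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder pre-signed URLs; replace them with real ones before sending.
request = build_invocation_request(
    "https://example.com/input.npz",
    "https://example.com/output.npz",
)
print(request.get_method(), request.full_url)
# invoke(request) sends the request; it needs a running model server.
```

The timeout is deliberately generous because a single call covers download, both inference branches and upload.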