Machine Learning Model

A Machine Learning (ML) Model is the entry-point class for performing inference with models trained in external frameworks or with the imfusion-ml-training Python module.

An ML Model consists of two main components:

  1. Data pipelines for preparing the input to be fed to the model and for post-processing the model prediction. These are specified by two OperationSequence objects, one for the pre-processing and one for the post-processing.

  2. An Engine that abstracts and interfaces with the external framework used to train the model, e.g. PyTorch. See the Engine page for more details.

Each of these components is configured via a YAML file that we refer to as the Inference YAML Config. Once configured, the MachineLearningModel performs inference when its predict(const DataItem& input) method is called.

First, the pre-processing operations are applied to the input DataItem, including any patch splitting operations if needed. The result is then fed into the Engine, which performs the actual inference. If the input image was split into patches, the model is applied to each patch sequentially and the results are recombined into a single prediction.

The Engine returns a DataItem with the same number of fields as the number of outputs specified in the Inference Yaml Config. The post-processing operations are then applied to the Engine output, including any recombination operations, and the final result is returned.
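
To make the flow concrete, here is a minimal C++ sketch of a single prediction. Only the names MachineLearningModel, DataItem and predict(const DataItem&) come from this page; the construction from a YAML file path and the setImage helper are assumptions for illustration and may differ in the actual SDK.

// Hypothetical construction of the model from its Inference YAML Config file
MachineLearningModel model("kidney_segmentation.yaml");

// Wrap the input image into a DataItem (the field naming rules are discussed below);
// setImage is an assumed helper, ctScan an already loaded SharedImageSet
DataItem input;
input.setImage("Input", ctScan);

// predict() runs the pre-processing, the Engine and the post-processing in a single call
DataItem result = model.predict(input);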

Let’s see how this can all be expressed with the Inference YAML Configuration.

Inference YAML Config

Below is a configuration file that sets up a Model for semantic segmentation. As an example, let’s assume we trained a model that segments the kidneys in a CT scan and, if a tumor is present, segments it as well. This is a standard use case, with one input and one prediction returned.

Version: 8
Type: NeuralNetwork
Name: KidneySegmentation
Description: Segmentation of kidneys and tumor if present. # Provides a short description of what the model is about
# The prediction output is an image like the input. Supported outputs are [Image, Vector, Keypoints, BoundingBoxes]
PredictionOutput: Image
# Semantic segmentation is a classification problem for every pixel in the input Image. Possible types are [Classification, Regression]
PredictionType: Classification

#############################################################################################
# Configure the backend framework to be used as Engine to perform the actual inference
#############################################################################################
Engine:
  # Specifies which backend framework should be used to run the trained/saved/traced model
  Name: torch # Could be [torch, onnx, tensorrt] from a C++ plugin, or a python engine [pytorch, pyonnxruntime, pyopenvino, coreml, or any custom engine]
  ModelFile: traced_model.pt # Path to the actual model file (could be an ONNX file), either relative to the current working directory or an absolute path
  ForceCPU: false # Set it to true if you want to perform the inference on the CPU instead of the GPU
  # ... here we can define other parameters specific to the Engine used, e.g. `Version: 2.2.2`
  Version: 2.2.2

# Controls the verbosity level of model logging
Verbose: false
# Maximum number of images to run through the network simultaneously
MaxBatchSize: 1
# Names of the different label values encoded as channels of the model prediction.
# In our example, 0: Background, 1: Kidney, 2: Tumor. Note that we don't name the Background
LabelNames: [Kidney, Tumor]

#############################################################################################
# Sequence of preprocessing operations run before the network
# (the list of available operations can be found in the Python documentation of the SDK)
#############################################################################################
PreProcessing:
  # If the image has a non-identity rotation matrix, bake the transformation to the image voxels
  # This is necessary because the engine implementation input is typically a tensor object, without any
  # notion of the image coordinate system.
  - BakeTransformation: {}
  # Resample the image to a fixed resolution of 1.5mm
  - Resample:
      resolution: 1.5
  # Normalize the image intensity values using percentile-based scaling
  - NormalizePercentile:
      min: 0.001
      max: 0.999
      clip: False
  # For pixelwise (fully convolutional) models, it might be necessary to split the input into sub-images
  # because of GPU memory constraints, especially for 3D volumes.
  # Each of those images will be fed into the network and the predictions will be recombined.
  # This operation can be removed for imagewise models.
  - SplitIntoPatches:
      patch_size: [96, 96, 96] # Size of the patches to extract from the input image, for UNets make sure it is a multiple of 2**(num_downsampling_layers)
      patch_step_size: 1.0
      padding_mode: Mirror # This might be needed if the image to split is actually smaller than the patch size

#############################################################################################
# Sequence of post-processing operations run
# after the network and the recombination of the sub-images
# (the list of available operations can be found in the Python documentation of the SDK)
#############################################################################################
PostProcessing:
  # Recombine the patches into a single prediction image
  - RecombinePatches:
      device: ForceCPU
      mode: Weighted
  - ResampleToInput: {} # Resample the prediction image back to the geometry of the original input image
  - ArgMax: {}  # Convert the multi-channel probability map to a label map

# Acknowledge any public or licensed database/codebase used in the model development. Omit this section when using private datasets
Acknowledgments:
  Dataset-Name: # name of the public dataset used, e.g. "Total Segmentator"
    Authors: <name of the authors with affiliation>
    Website: <url-to-database>
    License: <license type> # e.g. 'Creative Commons Attribution 4.0 International https://creativecommons.org/licenses/by/4.0/legalcode'
  # Whether the involved licensing allows commercial use of the model.
  CommercialUseAllowed: True

In the case of single-input models returning a single prediction, like the example above, a number of YAML fields can be omitted and take default values. The relevant fields for which it is useful to know the defaults are:

  • PreprocessingInputFields: if not specified, a single input field is assumed and is named [Input].

  • Engine.InputFields: if not specified, a single input field is assumed and is named [Input].

  • Engine.OutputFields: if not specified, a single output field is assumed and is named [Prediction].

This means that in the above example, the input data item given to the MachineLearningModel::predict function must have a single field named Input, and the output data item will have a single field named Prediction. This is the most common case; for convenience, there is also an overload of the predict method that takes a SharedImageSet as input and returns a SharedImageSet as output.
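
As an illustration only (the setter and accessor names below are assumptions, not the actual SDK API), the two equivalent calls for the kidney example could look like this:

// Variant 1: explicit DataItem with the default field names
DataItem item;
item.setImage("Input", ctScan);              // single input field, default name "Input"
DataItem out = model.predict(item);
auto labelMap = out.getImage("Prediction");  // single output field, default name "Prediction"

// Variant 2: convenience overload taking a SharedImageSet and returning a SharedImageSet
auto prediction = model.predict(ctScan);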

However, models can be more complex than that. The MachineLearningModel class and the Inference YAML Configuration support more generic multi-input, multi-output models of heterogeneous types, and both the pre-processing and the post-processing can produce DataItems with a different number of fields than they receive. Let’s define a more complex situation: assume we trained a model that detects metastases in brain MRI scans. The model takes as input three MRI sequences (T1, T2 and FLAIR), which are combined into a single multi-channel image, and produces a binary mask together with a set of bounding boxes delimiting each lesion. Furthermore, for the input pre-processing, we used a keypoint-based approach to align the input scans to a common frame of reference. Imagine for instance that we have another model detecting the Anterior and Posterior Commissures, and a method for determining the mid-sagittal plane of the brain. For the sake of the example, this method is encapsulated in an Operation called “AlignToACPCReference”, which takes an image and a set of keypoints as input and returns the aligned image.

Let’s see how this can all be expressed with the Inference YAML Configuration:

Version: 8
Type: NeuralNetwork
Name: MetastasesDetection
Description: Detection and Segmentation of brain metastases on MRI scan
# We output two predictions here, a segmentation mask and the lesion bounding boxes
PredictionOutput: [Image, BoundingBoxes]
# The number of prediction types has to match the number of prediction outputs. In this case,
# we have a semantic segmentation and a bounding box regression
PredictionType: [Classification, Regression]
# Since the pre-processing has a non-trivial pipeline, we need to specify the preprocessing input fields explicitly.
# The default value is [Input], which is automatically set when the input DataItem contains a single Element.
PreprocessingInputFields: [T1, T2, FLAIR, ACPCPoints] # Default value: [Input]
# The operations in the pre-processing must specify which fields they need to operate on
PreProcessing:
- MergeAsChannels: # merges the listed image fields into a single multi-channel image
    apply_to: [T1, T2, FLAIR] # the order matters here
    output_field: MRI
    remove_fields: True # the resulting DataItem now contains the fields [MRI, ACPCPoints]
- AlignToACPCReference: # Example of user-defined operation
    image_to_align: MRI
    reference_points: ACPCPoints
- Remove: # we don't need them anymore
    apply_to: ACPCPoints
# At this point we just have a single field in the DataItem: MRI. After this the operation sequence
# is the same as for a single input model
- Resample: # Resamples all the elements in the input to the desired target resolution
    resolution: 1.0
- NormalizeUniform:
    min: 0.0
    max: 1.0
# For pixelwise models, split the input into patches for processing
- SplitIntoPatches:
    roi_size: [96, 96, 96] # Size of the patches to extract from the input image, for UNets make sure it is a multiple of 2**(num_downsampling_layers)
    patch_step_size: 1.0
    padding_mode: Mirror # This might be used if the image to split is actually smaller than the roi size

Engine:
  Name: torch # Could be [torch, onnx, tensorrt]
  ModelFile: traced_model.pt # Path to the actual model file (could be an ONNX file)
  ForceCPU: false # Set it to true if you want to perform the inference on the CPU instead of the GPU
  InputFields: [MRI] # This has to match the field from the pre-processing that we want to use
  # Names of the output fields, this has to match the number of prediction outputs from the model
  # If this is not specified, a single output is assumed and the output field will be named [Prediction]
  OutputFields: [Tumor, LesionBoxes]
  # ... here we can define other parameters specific to the Engine used, e.g. `TorchVersion: 2.2.2`

# With multiple outputs, each output can have its own name mapping
LabelNames:
  Tumor: [MetastasesMask] # Display the value 1 of the label map as "MetastasesMask" instead of "1" in the DisplayOptions of the Tumor segmentation map.
  LesionBoxes: [Lesion] # Display the value 1 of the BoundingBoxes label as "Lesion" instead of "1" in the DisplayOptions of the LesionBoxes.

PostProcessing:
  # Recombine the patches into a single prediction image
  - RecombinePatches:
      device: ForceCPU
      mode: Weighted
  # We apply this operation only to the "Tumor" prediction output. If ``apply_to`` is not specified, the operation is applied to all fields that it can handle.
  # In this particular example, omitting the ``apply_to`` field is equivalent to ``apply_to: [Tumor]``, since neither the ``ResampleToInput`` nor the ``ArgMax``
  # operations handle ``BoundingBoxes`` elements.
  - ResampleToInput:
      apply_to: Tumor
  # we apply it only to the "Tumor" prediction output
  - ArgMax:
      apply_to: Tumor
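
For this multi-input, multi-output configuration, the DataItem passed to predict must contain the four fields declared in PreprocessingInputFields, and the returned DataItem contains the two fields declared in Engine.OutputFields. A hedged C++ sketch follows; the field setter and accessor names are assumptions for illustration.

// Build the input DataItem with the four fields declared in PreprocessingInputFields
DataItem item;
item.setImage("T1", t1Scan);
item.setImage("T2", t2Scan);
item.setImage("FLAIR", flairScan);
item.setKeypoints("ACPCPoints", acpcKeypoints); // keypoints from the ACPC detection model

DataItem out = model.predict(item);

// The returned DataItem contains the fields declared in Engine.OutputFields
auto tumorMask = out.getImage("Tumor");                  // segmentation label map
auto lesionBoxes = out.getBoundingBoxes("LesionBoxes");  // bounding boxes of the detected lesions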

Note

See the Changelog of the Inference YAML Configuration.