Machine Learning Model
A Machine Learning (ML) Model is the entry point class for performing inference with models trained in external frameworks or with the imfusion-ml-training Python module.
An ML Model consists of three main components:
Data pipelines for preparing the input to be fed to the model and for post-processing the model prediction. These are specified by two OperationSequence objects, one for the pre-processing and one for the post-processing.
A Sampling strategy to tile the input image into patches and to recombine the predictions of the individual tiles. This is necessary when the feature maps produced by the model would otherwise exceed the available system RAM/VRAM.
An Engine abstracting and interfacing the external framework used to train the model, e.g. PyTorch. See the Engine page for more details.
Each of these components is configured via a YAML file that we refer to as the Inference YAML Config.
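Schematically, these components map onto the top-level sections of the config. The outline below is only a condensed preview of the full examples discussed in the next section:

PreProcessing:   # OperationSequence preparing the input DataItem
  - ...
Sampling:        # how to tile the input and recombine the tile predictions
  - ...
Engine:          # backend framework running the trained model
  Name: torch
  ModelFile: traced_model.pt
PostProcessing:  # OperationSequence applied to the Engine output
  - ...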
Upon successful configuration, the MachineLearningModel performs the inference by calling the predict(const DataItem& input) method.
First, the pre-processing operations are applied to the input DataItem, then the sampling strategy is applied to the output of the pre-processing. The result is then fed into the Engine, which performs the actual inference.
The Engine returns a DataItem with the same number of fields as the number of outputs specified in the Inference YAML Config. The post-processing operations are then applied to the Engine output, and the final result is returned.
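For illustration, this call sequence could look roughly as follows in C++. Note that this is only a sketch: the header path, the config-path constructor and the DataItem field accessors used here are assumptions and may differ from the actual SDK API.

// Sketch only: the header path, the config-path constructor and the DataItem
// field accessors are assumptions and may differ from the actual SDK API.
#include <ImFusion/MachineLearning/MachineLearningModel.h>

using namespace ImFusion;

void runInference(const SharedImageSet& image)
{
    // Configure the model from its Inference YAML Config (hypothetical constructor).
    ML::MachineLearningModel model("model_config.yaml");

    // Wrap the input into a DataItem. With a default single-input configuration
    // the field is expected to be named "Input" (see the defaults discussed below).
    ML::DataItem input;
    input.setImage("Input", image);                  // hypothetical setter

    // Pre-processing, sampling, Engine inference and post-processing all happen inside predict().
    ML::DataItem output = model.predict(input);

    // With a default single-output configuration the result field is named "Prediction".
    auto prediction = output.getImage("Prediction"); // hypothetical getter
}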
Let’s see how this can all be expressed with the Inference YAML Configuration.
Inference YAML Config
Below is a configuration file that sets up a Model for semantic segmentation. As an example, let’s assume we trained a model that segments the kidneys in a CT scan and, if a tumor is present, detects it as well. This is a standard use case, with one input and one prediction returned.
Version: 6
Type: NeuralNetwork
Name: KidneySegmentation
Description: Segmentation of kidneys and tumor if present. # Provides a short description of what the model is about
# The prediction output is an image like the input. Supported outputs are [Image, Vector, Keypoints, BoundingBoxes]
PredictionOutput: Image
# Semantic segmentation is a classification problem for every pixel in the input Image. Possible types are [Classification, Regression]
PredictionType: Classification
#############################################################################################
# Configure the backend framework to be used as Engine to perform the actual inference
#############################################################################################
Engine:
  # Specifies which backend framework needs to be used to run the trained/saved/traced model
  Name: torch # Could be [torch, onnx, tensorrt] from a C++ plugin, or a python engine [pytorch, pyonnxruntime, pyopenvino, coreml, or any custom engine]
  ModelFile: traced_model.pt # Path to the actual model file (could be an ONNX file), either relative to the current working directory or an absolute path
  ForceCPU: false # Set it to true if you want to perform the inference on the CPU instead of the GPU
  # ... here we can define other parameters specific to the Engine used, e.g. `Version: 2.2.2`
  Version: 2.2.2
# Controls the verbosity level of model logging
Verbose: false
# Maximum number of images to run through the network simultaneously
MaxBatchSize: 1
# Names of the different label values encoded as channels of the model prediction.
# In our example, 0: Background, 1: Kidney, 2: Tumor. Note that we don't name the Background
LabelNames: [Kidney, Tumor]
#############################################################################################
# Sequence of preprocessing operations run before the network
# (all available operations are listed in the Python documentation of the SDK)
#############################################################################################
PreProcessing:
  # If the image has a non-identity rotation matrix, bake the transformation into the image voxels.
  # This is necessary because the input to the engine implementation is typically a tensor object,
  # without any notion of the image coordinate system.
  - BakeTransformation: {}
  # Resample the image to a fixed resolution of 1.5mm
  - Resample:
      resolution: 1.5
  # Normalize the image intensity values using percentile-based scaling
  - NormalizePercentile:
      min: 0.001
      max: 0.999
      clip: False
#############################################################################################
# For pixelwise (fully convolutional) models, it might be necessary to split the input into sub-images
# because of GPU memory constraints, especially for 3D volumes.
# Each of those images will be fed into the network and the predictions will be recombined.
# This section can be removed for imagewise models.
#############################################################################################
Sampling:
  # Maximum size of the sub-image (set to -1 if you never want to split the image)
  - MaxSizeSubdivision: 96
  # Some network architectures require each sub-image dimension to be a multiple of this number (e.g. UNet)
  - DimensionDivisor: 16
  # Remove the padding from the prediction (after recombination) that was performed on the model input
  - SkipUnpadding: false
  # Sub-images are extracted with an overlap (in pixels) in order to avoid border effects
  - PixelsOverlap: 32
  # Weigh the different contributions at each pixel of overlap regions based on their position
  - RecombineWeighted: true
  # How to pad the image when extracting sub-images at the border. This is also used
  # when the image needs to be padded to the next multiple of `DimensionDivisor`.
  - PaddingMode: Mirror # Possible values are [Mirror, Zero, Clamp]
#############################################################################################
# Sequence of post-processing operations run
# after the network and the recombination of the sub-images
# (all available operations are listed in the Python documentation of the SDK)
#############################################################################################
PostProcessing:
  - ResampleToInput: {} # Resample the prediction image back to the original image
  - ArgMax: {} # Convert the multi-channel probability map to a label map
# Acknowledge any public or licensed database/codebase used in the model development. Omit this section when using private datasets
Acknowledgments:
  Dataset-Name: # name of the public dataset used, e.g. "Total Segmentator"
    Authors: <name of the authors with affiliation>
    Website: <url-to-database>
    License: <license type> # e.g. 'Creative Commons Attribution 4.0 International https://creativecommons.org/licenses/by/4.0/legalcode'
    # Whether the involved licensing allows to use the model commercially.
    CommercialUseAllowed: True
In the case of single-input models returning a single prediction, like in the above example, there are a number of fields in the YAML that are omitted and defaulted. Relevant fields for which it can be useful to know the default value are:
PreprocessingInputFields: if not specified, a single input field is assumed and is named [Input].
Engine.InputFields: if not specified, a single input field is assumed and is named [Input].
Engine.OutputFields: if not specified, a single output field is assumed and is named [Prediction].
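Spelled out explicitly, the configuration above therefore behaves as if it contained the following entries:

PreprocessingInputFields: [Input]
Engine:
  InputFields: [Input]
  OutputFields: [Prediction]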
This means that in the above example, the input data item given to the MachineLearningModel::predict function shall have a single field named [Input], and the output data item will have a single field named [Prediction]. This is the most common case, and there is also an overload of the predict method that takes a SharedImageSet as input and returns a SharedImageSet as output.
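For this common single-input/single-output case, the SharedImageSet overload mentioned above keeps the call site very compact. As in the earlier sketch, the constructor, the file name and the exact return type are assumptions:

// Sketch only: constructor, file name and exact return type are assumptions.
ML::MachineLearningModel model("kidney_segmentation.yaml");

// The overload wraps the SharedImageSet into the [Input] field internally and
// returns the content of the [Prediction] field directly.
auto labelMap = model.predict(ctScan);   // ctScan: a SharedImageSet loaded elsewhere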
However, models can be more complex than that. The MachineLearningModel class and the Inference YAML Configuration support more generic multi-input, multi-output models of heterogeneous types, and both pre-processing and post-processing can produce DataItems with a different number of fields than their input. Let’s define a more complex situation. Let’s assume we trained a model that detects metastases in brain MRI scans. The model takes as input three MRI sequences (T1, T2 and FLAIR), which are combined into a single multi-channel image, and produces a binary mask together with a set of bounding boxes delimiting each lesion. Furthermore, for the input pre-processing, we used a keypoint-based approach to align the input scans to a common frame of reference. Imagine for instance that we have another model detecting the Anterior and Posterior Commissures, and a method for determining the mid-sagittal plane of the brain. For the sake of the example, this method is encapsulated in an Operation called “AlignToACPCReference”, which takes an image and a set of keypoints as input and returns the aligned image.
Let’s see how this can all be expressed with the Inference YAML Configuration:
Version: 6
Type: NeuralNetwork
Name: MetastasesDetection
Description: Detection and Segmentation of brain metastases on MRI scan
# We output two predictions here, a segmentation mask and the lesion bounding boxes
PredictionOutput: [Image, BoundingBoxes]
# The number of prediction types has to match the number of prediction outputs. In this case,
# we have a semantic segmentation and a bounding box regression
PredictionType: [Classification, Regression]
# Since the pre-processing has a non-trivial pipeline, we need to specify the preprocessing input fields explicitly.
# The default value is [Input], which is automatically set when the input DataItem contains a single Element.
PreprocessingInputFields: [T1, T2, FLAIR, ACPCPoints] # Default value: [Input]
# The operations in the pre-processing must specify which fields they need to operate on
PreProcessing:
  - MergeAsChannels: # merge the three MRI sequences into a single multi-channel image
      apply_to: [T1, T2, FLAIR] # the order matters here
      output_field: MRI
      remove_fields: True # the resulting DataItem now contains the fields [MRI, ACPCPoints]
  - AlignToACPCReference: # Example of a user-defined operation
      image_to_align: MRI
      reference_points: ACPCPoints
  - Remove: # the keypoints are no longer needed
      apply_to: ACPCPoints
  # At this point the DataItem contains a single field: MRI. From here on, the operation sequence
  # is the same as for a single-input model
  - Resample: # Resamples all the elements in the input to the desired target resolution
      resolution: 1.0
  - NormalizeUniform:
      min: 0.0
      max: 1.0
Engine:
  Name: torch # Could be [torch, onnx, tensorrt]
  ModelFile: traced_model.pt # Path to the actual model file (could be an ONNX file)
  ForceCPU: false # Set it to true if you want to perform the inference on the CPU instead of the GPU
  InputFields: [MRI] # This has to match the field from the pre-processing that we want to use
  # Names of the output fields, this has to match the number of prediction outputs from the model
  # If this is not specified, a single output is assumed and the output field will be named [Prediction]
  OutputFields: [Tumor, LesionBoxes]
  # ... here we can define other parameters specific to the Engine used, e.g. `TorchVersion: 2.2.2`
# With multiple outputs, each output can have its own name mapping
LabelNames:
  Tumor: [MetastasesMask] # Display the value 1 of the label map as "MetastasesMask" instead of "1" in the DisplayOptions of the Tumor segmentation map.
  LesionBoxes: [Lesion] # Display the value 1 of the BoundingBoxes label as "Lesion" instead of "1" in the DisplayOptions of the LesionBoxes.
# The sampling is performed on the pre-processing output
Sampling:
  # Maximum size of the sub-image (set to -1 if you never want to split the image)
  - MaxSizeSubdivision: 96
  # Some network architectures require each sub-image dimension to be a multiple of this number (e.g. UNet)
  - DimensionDivisor: 16
  # Remove the padding from the prediction (after recombination) that was performed on the model input
  - SkipUnpadding: false
  # Sub-images are extracted with an overlap (in pixels) in order to avoid border effects
  - PixelsOverlap: 32
  # Weigh the different contributions at each pixel of overlap regions based on their position
  - RecombineWeighted: true
  # How to pad the image when extracting sub-images at the border. This is also used
  # when the image needs to be padded to the next multiple of `DimensionDivisor`.
  - PaddingMode: Mirror # Possible values are [Mirror, Zero, Clamp]
PostProcessing:
  # We apply the post-processing only to the "Tumor" prediction output. If `apply_to` is not specified, the operation is applied to all fields that it can handle.
  # In this particular example, omitting the `apply_to` field is equivalent to `apply_to: [Tumor]`, since neither the `ResampleToInput` nor the `ArgMax`
  # operations handle `BoundingBoxes` elements.
  - ResampleToInput:
      apply_to: Tumor
  # we apply it only to the "Tumor" prediction output
  - ArgMax:
      apply_to: Tumor
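At the call site, the input DataItem now has to provide the four fields declared in PreprocessingInputFields, and the returned DataItem carries the two fields declared in Engine.OutputFields. As in the earlier sketches, the constructor, the file name and the field accessors below are assumptions:

// Sketch only: constructor, file name and field accessors are assumptions.
ML::MachineLearningModel model("metastases_detection.yaml");

ML::DataItem input;
input.setImage("T1", t1);                      // hypothetical setters; t1, t2, flair
input.setImage("T2", t2);                      // are SharedImageSets loaded elsewhere
input.setImage("FLAIR", flair);
input.setKeypoints("ACPCPoints", acpcPoints);  // keypoints from the AC/PC detection step

ML::DataItem output = model.predict(input);

// The output fields match Engine.OutputFields: [Tumor, LesionBoxes]
auto tumorMask   = output.getImage("Tumor");            // hypothetical getters
auto lesionBoxes = output.getBoundingBoxes("LesionBoxes");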
Note: See the Changelog of the Inference YAML Configuration.