Export
The Export tab allows you to export all datasets of the project in a standardized way. Both the images and the annotations are copied to the selected Export folder, and a text file listing all exported datasets, together with the paths of the original files, is saved.
The left panel of the window shows a summary of the project.
Selection
You can select whether you want to export the whole dataset or only a part of it. One typical choice is to export only the annotated datasets, but a more refined choice can be made using filters in the Database tab and selecting “Export only filtered datasets”.
Data Format
ImFusion Labels supports multiple output formats, including DICOM, MHD, NIFTI, PNG, etc. Some image formats can store multiple images per file. In such cases, the interface will ask you whether you want to encode the image and the labels in the same file or in two different ones.
Label Encoding
This group allows you to select how the labels will be encoded in the exported files.
The encoding depends entirely on the type of project. If multiple types are enabled, a drop-down can be used to choose which type of annotation will be exported.
Pixelwise Segmentation: The labels will be encoded as a label map whose dimensions match the dataset. The values are integers in the range [0;255], and a value is associated with each label type. This can be used to export the labels with various levels of granularity, for instance when some labels include each other (e.g. an organ and lesions inside) or when grouping them makes sense (e.g. encoding left kidney and right kidney as the same value).
Landmarks: Landmarks can be encoded as dense label maps, as multi-channel blobs, or as coordinates. In both “Label Maps” and “Multi-channels Blobs” mode, a label map of dimensions equal to the dataset is written. In “Label Maps” mode, it contains a single channel, and all voxels within the specified radius of a landmark are assigned a value depending on the type of the landmark. In “Multi-channels Blobs” mode, the label file contains as many channels as there are types of landmarks, and the data type of the label map is floating point. The voxels within the specified radius of a landmark are assigned a value within [0; 1] following a Gaussian, in the channel corresponding to the type of the landmark. The “sharp” option creates blobs with a sharper definition than the smooth Gaussian blobs. In “Coordinates” mode, a json file describing each landmark is written for each dataset. The json file contains the coordinates (in both world and pixel coordinates) of each landmark, its type and which frame it belongs to.
Bounding boxes: Bounding boxes can be encoded either as dense label maps or as coordinates. In “Label Map” mode, a label map of dimensions equal to the dataset will be created, where the bounding boxes are filled with a value depending on their type. If a voxel belongs to multiple bounding boxes, the first bounding box type will take precedence. In “Coordinates” mode, a json file describing each bounding box is written for each dataset. In particular, the json file contains the coordinates (in both world and pixel coordinates) of each bounding box, its type and which frame it belongs to.
Imagewise Classification: For each dataset, a label map with a single value (of dimensions 1x1x1) is written. Tag filters determine which value is associated with each dataset. If the tags of a dataset match multiple filters, the value associated with the first filter will be selected.
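The pixelwise and multi-channel blob encodings described above can be illustrated with a small NumPy sketch. It is not the code used by ImFusion Labels: the function names group_labels and gaussian_blobs, the sigma parameter, and the choice to merge overlapping blobs with a maximum are all assumptions made for illustration.

```python
import numpy as np

def group_labels(label_map, value_map):
    """Remap label values, e.g. to encode several labels as the same value."""
    lut = np.arange(256, dtype=np.uint8)  # identity lookup table for values in [0, 255]
    for old_value, new_value in value_map.items():
        lut[old_value] = new_value
    return lut[label_map]

def gaussian_blobs(shape, landmarks, num_types, radius, sigma):
    """Encode landmarks as one floating-point channel per landmark type.

    landmarks: iterable of (type_index, position), position in voxel indices.
    Voxels farther than `radius` from every landmark of a type stay at 0.
    """
    axes = [np.arange(n) for n in shape]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
    blobs = np.zeros((num_types, *shape), dtype=np.float32)
    for type_index, position in landmarks:
        dist2 = np.sum((grid - np.asarray(position)) ** 2, axis=-1)
        blob = np.exp(-dist2 / (2.0 * sigma ** 2))  # 1 at the landmark, decaying outwards
        blob[dist2 > radius ** 2] = 0.0             # cut off outside the radius
        blobs[type_index] = np.maximum(blobs[type_index], blob)
    return blobs

# Hypothetical label values: 1 = left kidney, 2 = right kidney, 3 = lesion
labels = np.array([[0, 1], [2, 3]], dtype=np.uint8)
merged = group_labels(labels, {2: 1})  # both kidneys now share value 1

# Two landmark types on a 9x9 image
heat = gaussian_blobs((9, 9), [(0, (4, 4)), (1, (2, 6))],
                      num_types=2, radius=3, sigma=1.5)
```

Each channel of heat peaks at 1 on its landmark and is exactly 0 outside the radius, which matches the [0; 1] range described for the “Multi-channels Blobs” mode.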
Split Training/Validation
Instead of generating a single data list as a text file for the whole database, ImFusion Labels can automatically split the database into a training and a validation subset. In that case, two data lists will be generated: one for training and one for validation.
The splitting can be configured in three different ways:
Random: This mode will randomly select a given percentage of the database entries as a training set and assign the rest as a validation set. For k-fold cross-validation, multiple splits can be generated this way. The Random Seed parameter makes this splitting reproducible: for any given seed, the splitting will always be the same (provided the database did not change). If the project has tags, the sampling of the datasets can take some of the tags into account to make sure that they are approximately balanced between training and validation.
Cross validation: This mode will create multiple training and validation splits that can be used to validate a model using cross validation. All the validation splits contain different data, and each data point appears in exactly one validation set. The Number of splits parameter indicates the number of distinct splits generated. The Random Seed parameter makes this splitting reproducible: for any given seed, the splitting will always be the same (provided the database did not change).
Tags: This mode is only available if the project has tags. It allows the user to define the training and validation subsets according to particular tags. This can be useful if the splitting needs to be manually defined by the user. A warning will appear if at least one of the datasets appears in both training and validation subsets.
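The Random and Cross validation modes can be sketched in a few lines of Python to show why a fixed seed gives a reproducible result. This is a minimal illustration and not the algorithm used by ImFusion Labels: the function names are hypothetical, and tag balancing is not covered.

```python
import random

def random_split(dataset_ids, train_fraction, seed):
    """Shuffle with a fixed seed, then cut into training and validation lists."""
    ids = sorted(dataset_ids)          # fixed starting order, so only the seed matters
    random.Random(seed).shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]

def cross_validation_splits(dataset_ids, num_splits, seed):
    """Return num_splits (training, validation) pairs with disjoint validation sets."""
    ids = sorted(dataset_ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::num_splits] for i in range(num_splits)]
    splits = []
    for i, validation in enumerate(folds):
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        splits.append((training, validation))
    return splits

ids = [f"case_{i:02d}" for i in range(10)]    # hypothetical dataset names
train, val = random_split(ids, 0.8, seed=42)  # 8 training, 2 validation entries
splits = cross_validation_splits(ids, num_splits=5, seed=0)
```

Calling either function again with the same seed and the same list of datasets returns the same split. Adding or removing a dataset changes the result, which corresponds to the "provided the database did not change" caveat.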
Pre-Processing Script
Note
This option is only available if a Python environment is correctly configured in the Settings.
Before the images and labels are resampled, you can apply custom pre-processing to the data via a Python script. The script must implement a function of the following form:
import imfusion as imf
import numpy as np

def process(imfImage, imfLabels):
    # Start from clones of the inputs
    newImage = imfImage.clone(True)
    newLabels = imfLabels.clone(True)
    # Apply the custom processing to newImage and newLabels here
    return newImage, newLabels
where the input arguments imfImage and imfLabels are ImFusion SharedImageSets representing respectively the data and the label map, and the expected returned values are a new pair of ImFusion SharedImageSets.
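As an illustration of the kind of operation such a script might perform, the following NumPy-only helper clips intensities to a percentile range and rescales them to [0, 1]. The helper name and its parameters are hypothetical. Converting between SharedImageSets and NumPy arrays is deliberately left out because it depends on the imfusion Python API.

```python
import numpy as np

def normalize_intensities(pixels, lower_percentile=1.0, upper_percentile=99.0):
    """Clip to a percentile range and rescale the result to [0, 1]."""
    low, high = np.percentile(pixels, [lower_percentile, upper_percentile])
    if high <= low:
        # Constant image: nothing to rescale
        return np.zeros_like(pixels, dtype=np.float32)
    clipped = np.clip(pixels, low, high)
    return ((clipped - low) / (high - low)).astype(np.float32)

# Stand-in for the pixel data of one image
pixels = np.arange(101, dtype=np.float64)
normalized = normalize_intensities(pixels)
```

Only the image would normally be normalized this way. The label map should be left unchanged so that the label values keep their meaning.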
Image Resampling
Before exporting the datasets, the software can apply image resampling, which makes any subsequent processing easier. You can resample all images either to a common size in pixels or to a common resolution.
Furthermore, some medical images contain information about the orientation of the patient. For a number of applications, taking this information into account is paramount to having a consistent dataset. The software allows you to apply this transformation before resampling the image so that you don’t have to worry about it in the exported dataset.
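The difference between the two resampling targets comes down to simple arithmetic, sketched below. The sketch assumes that the physical extent of an image is its size multiplied by its spacing and that this extent is preserved by the resampling. The function names are hypothetical, and the rounding used by ImFusion Labels may differ.

```python
def size_for_resolution(size, spacing, target_spacing):
    """Size in pixels needed to cover the same extent at a common resolution."""
    return tuple(max(1, round(n * s / t))
                 for n, s, t in zip(size, spacing, target_spacing))

def spacing_for_size(size, spacing, target_size):
    """Spacing that results from resampling the same extent to a common size."""
    return tuple(n * s / m for n, s, m in zip(size, spacing, target_size))

# Hypothetical CT volume: 512 x 512 x 120 voxels at 0.7 x 0.7 x 2.5 mm
size, spacing = (512, 512, 120), (0.7, 0.7, 2.5)

iso_size = size_for_resolution(size, spacing, (1.0, 1.0, 1.0))  # (358, 358, 300)
new_spacing = spacing_for_size(size, spacing, (256, 256, 128))  # about (1.4, 1.4, 2.34)
```

With a common resolution, every exported image has the same voxel spacing but the sizes differ between datasets. With a common size, the array shapes are identical but the spacing differs.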
Data Augmentation
ImFusion Labels also includes data augmentation features, which artificially increase the size of your database and make machine learning models trained on it more robust. If this option is enabled, ImFusion Labels will export each dataset several times with either predefined transformations (e.g. flipping the images) or random perturbations (e.g. random deformations).
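The principle can be sketched with NumPy as follows. This is an illustration only, not the augmentation performed by ImFusion Labels: the function name is hypothetical, and additive Gaussian noise is used as a simple stand-in for the random deformations mentioned above.

```python
import numpy as np

def augment(image, labels, rng, noise_std=0.05):
    """Return one augmented (image, labels) pair.

    The flip is applied to both arrays so that they stay aligned;
    the intensity noise is applied to the image only.
    """
    if rng.random() < 0.5:  # predefined transformation: flip along the last axis
        image = np.flip(image, axis=-1)
        labels = np.flip(labels, axis=-1)
    noisy = image + rng.normal(0.0, noise_std, size=image.shape)  # random perturbation
    return noisy.astype(np.float32), labels.copy()

rng = np.random.default_rng(seed=0)             # seeded for reproducibility
image = np.arange(12, dtype=np.float64).reshape(3, 4)
labels = (image > 5).astype(np.uint8)
copies = [augment(image, labels, rng) for _ in range(4)]  # four augmented versions
```

Geometric changes must be applied identically to the image and its label map, whereas intensity changes should never touch the labels.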
Reproducible Export
The data of a project can be re-exported easily using the command-line arguments of ImFusion Labels, for example as part of an automated pipeline. The following arguments are relevant:
--open-project: specify the path of the project to be exported (the folder which contains the project.xml file)
--export-data: indicate that the data of the project should be exported, instead of opening the project
--export-data-configuration (optional): specify an export configuration file. When exporting, the configuration file used is stored as export_options.xml. If this argument is not specified, the last saved state of the GUI will be used.
--export-data-folder (optional): specify the folder to which the data will be exported. Even though the folder is specified as part of the configuration, this argument can be used to override it.
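In an automated pipeline, the command line can be assembled from Python, for instance as sketched below. Several details are assumptions rather than documented behavior: the executable name ImFusionLabels, the paths, the double-hyphen prefix of the options, and passing each value as a separate argument after its option.

```python
import subprocess

def build_export_command(executable, project_dir, config_file=None, export_dir=None):
    """Assemble the argument list for a command-line export."""
    command = [executable, "--open-project", str(project_dir), "--export-data"]
    if config_file is not None:
        command += ["--export-data-configuration", str(config_file)]
    if export_dir is not None:
        command += ["--export-data-folder", str(export_dir)]
    return command

def run_export(command):
    """Launch the export and raise an error if the process fails."""
    subprocess.run(command, check=True)

# Hypothetical executable name and paths
command = build_export_command("ImFusionLabels", "/data/my_project",
                               export_dir="/data/export")
# run_export(command)  # uncomment to launch the export
```

Keeping the command as a list avoids shell quoting problems with paths that contain spaces.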