ImFusion SDK 4.3
#include <ImFusion/ML/Dataset.h>
Class for creating an iterable dataset by chaining data loading and transforming operations executed in a lazy fashion.
Example Usage:

Public Member Functions

| Return type | Member function | Description |
|---|---|---|
| | `Dataset(bool verbose = false)` | Constructs an empty dataset with no starting loader. |
| | `Dataset(const std::vector<FileReaderColumn>& dataLists, bool shuffle = false, bool verbose = false)` | Constructs a dataset from a list of h5 filenames to be loaded. |
| | `Dataset(std::unique_ptr<DataReader> reader, bool verbose = false)` | Constructs a dataset from an existing DataReader. |
| | `Dataset(const std::string& readerType, const Properties& readerProperties, bool verbose = false)` | Creates a Dataset by specifying the DataReader type and properties. |
| `std::optional<DataItem>` | `next()` | Returns the next item in the dataset. Once the dataset is exhausted, no element is returned. |
| `std::optional<size_t>` | `size() const` | Determines the overall size of the dataset. |
| `void` | `reset()` | Resets the dataset to the beginning of the data loading pipeline. |
| `void` | `reinit()` | Reinitializes the dataset to a state equivalent to its state right after construction, clearing all the state that survives reset() (i.e. data caches). |
| `Cardinality` | `cardinality() const` | Returns the cardinality of the dataset. |
| `Dataset&` | `read(const std::string& readerType, Properties readerProperties, bool verbose = false)` | |
| `Dataset&` | `batch(int batchSize, bool pad = false, int overlap = 0)` | Batches the next batchSize items into a single item before returning it. |
| `Dataset&` | `split(int numItems = -1)` | Splits the content of the SharedImageSets into SharedImageSets (SIS) that each contain a single image. |
| `Dataset&` | `repeat(int numEpochRepetitions, int numItemRepetitions = 1)` | Repeats the epoch numEpochRepetitions times and each individual item numItemRepetitions times. For both parameters, a value of -1 indicates infinite repetition. |
| `Dataset&` | `shuffle(int howMany = -1, int seed = -1)` | Shuffles the dataset. |
| `Dataset&` | `map(std::function<void(DataItem&)> func, int numParallelCalls = 1)` | Applies a mapping to each item of the dataset. |
| `Dataset&` | `map(const std::string& funcKey, int numParallelCalls = 1)` | Applies a mapping to each item of the dataset. This overload is useful when configuring the loading pipeline from properties: the user can register a mapping function and use its registry key to configure the map decorator. |
| `Dataset&` | `filter(std::function<bool(const DataItem&)> func)` | Filters the dataset according to a user-defined criterion. |
| `Dataset&` | `filter(const std::string& funcKey)` | Filters the dataset according to a user-defined criterion. This overload is useful when configuring the loading pipeline from properties: the user can register a filtering function and use its registry key to configure the filter decorator. |
| `Dataset&` | `prefetch(size_t prefetchSize, bool syncToGl = true)` | Prefetches data in a separate thread, independently of the regular pipeline. |
| `Dataset&` | `preprocess(const std::vector<Operation::Specs>& preprocPipeline, Phase execPhase = Phase::Always, int numParallelCalls = 1)` | Applies a preprocessing pipeline to each item of the dataset. |
| `Dataset&` | `preprocess(const std::vector<std::shared_ptr<Operation>>& operations, int numParallelCalls = 1)` | Applies a preprocessing pipeline to each item of the dataset. |
| `Dataset&` | `sample(const std::vector<Operation::Specs>& samplingPipeline, int samplerSelectionSeed = -1, int numParallelCalls = 1)` | Samples from each item of the dataset. |
| `Dataset&` | `sample(const std::shared_ptr<ImageROISampler>& sampler, int numParallelCalls)` | Samples from each item of the dataset. |
| `Dataset&` | `sample(const std::vector<std::shared_ptr<ImageROISampler>>& samplers, const std::optional<std::vector<float>>& weights, int samplerSelectionSeed = -1, int numParallelCalls = 1)` | Samples from each item of the dataset. |
| `Dataset&` | `memoryCache(bool makeExclusiveCPU = false, bool lazy = true, int compressionLevel = 0, bool shuffle = false, int numThreads = 2)` | Caches the already loaded dataset in memory. |
| `Dataset&` | `diskCache(const std::string& location, bool lazy = true, bool reload = false, bool compression = false, bool shuffle = false)` | Caches the loaded dataset persistently in a location on disk. |
| `template<typename Loader, typename... Params> Dataset&` | `chain(Params&&... params)` | Chains a custom-defined loader on top of the current list of loaders. |
| `void` | `buildPipeline(const std::vector<DataLoaderSpecs>& specsList, Phase configPhase = Phase::Always)` | Configures the decorator calls via a list of Properties. |
| `void` | `setRandomSeed(unsigned int seed)` | Seeds the data loading pipeline. |
| `bool` | `verbose() const` | |
| `void` | `setVerbose(bool verbose)` | |
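
The repeat() entry above combines two levels of repetition. The following is a minimal sketch of that combined effect on a finite list of integers. The helper repeatItems is hypothetical, it does not support the -1 (infinite) setting, and emitting an item's repetitions back to back is an assumption, since the ordering is not specified here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper modelling repeat(numEpochRepetitions, numItemRepetitions)
// on a finite epoch of ints. Infinite repetition (-1) cannot be materialized in a
// vector, so both counts must be >= 1 here.
// Assumption: the repetitions of one item are emitted consecutively.
std::vector<int> repeatItems(const std::vector<int>& epoch, int numEpochRepetitions, int numItemRepetitions = 1)
{
    assert(numEpochRepetitions >= 1 && numItemRepetitions >= 1);
    std::vector<int> out;
    for (int e = 0; e < numEpochRepetitions; ++e)        // replay the whole epoch
        for (int item : epoch)
            for (int r = 0; r < numItemRepetitions; ++r) // repeat each item in place
                out.push_back(item);
    return out;
}
```

Under these assumptions, repeatItems({1, 2, 3}, 2, 2) yields {1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3}.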

`explicit Dataset(const std::vector<FileReaderColumn>& dataLists, bool shuffle = false, bool verbose = false)`

Constructs a dataset from a list of h5 filenames to be loaded.

`std::optional<size_t> size() const`

Determines the overall size of the dataset.

`void reset()`

Resets the dataset to the beginning of the data loading pipeline.
This is useful, for instance, once the dataset has been completely consumed, e.g. at the end of a training epoch. Note that reset() does not necessarily restore the full state of the dataset: state such as data caches and random seeding survives this call.
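
Based only on the next() and reset() contract described here, a training loop drains the dataset until next() returns an empty optional and then calls reset() before the following epoch. The sketch below mimics that loop with a hypothetical ToyDataset of integers standing in for Dataset and DataItem; it models neither caching nor seeding.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a lazily evaluated dataset of ints: next() produces
// one item per call and std::nullopt at the end of the epoch; reset() rewinds.
class ToyDataset
{
public:
    explicit ToyDataset(std::vector<int> items) : m_items(std::move(items)) {}

    std::optional<int> next()
    {
        if (m_pos >= m_items.size())
            return std::nullopt; // epoch exhausted
        return m_items[m_pos++];
    }

    void reset() { m_pos = 0; } // back to the start of the pipeline

private:
    std::vector<int> m_items;
    std::size_t m_pos = 0;
};

// Consumes the dataset for numEpochs epochs and returns the sum of all items seen.
int sumOverEpochs(ToyDataset& dataset, int numEpochs)
{
    assert(numEpochs >= 0);
    int total = 0;
    for (int epoch = 0; epoch < numEpochs; ++epoch)
    {
        while (std::optional<int> item = dataset.next())
            total += *item; // stand-in for one training step
        dataset.reset();    // start the next epoch from the beginning
    }
    return total;
}
```

For a ToyDataset holding {1, 2, 3}, sumOverEpochs(dataset, 2) returns 12, and the following next() call yields 1 again because of the final reset().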

`void reinit()`

Reinitializes the dataset to a state equivalent to its state right after construction, clearing all the state that survives reset() (i.e. data caches).

`Dataset& batch(int batchSize, bool pad = false, int overlap = 0)`

Batches the next batchSize items into a single item before returning it.

| Parameter | Description |
|---|---|
| batchSize | Number of consecutive items to batch together. |
| pad | Whether to ensure that the last batch has the specified size. If true, the last item is repeatedly added to the batch until its size equals batchSize. |
| overlap | Number of overlapping items shared by consecutive batches. Example: batch(3, false, 1) turns [1, 2, 3, 4, 5, 6, 7, 8] into [{1, 2, 3}, {3, 4, 5}, {5, 6, 7}, {7, 8}]. |
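
The overlap example can be reproduced with plain index arithmetic: consecutive batches start batchSize - overlap items apart. The helper batchWithOverlap below is a hypothetical illustration, not the SDK's implementation; stopping as soon as a batch reaches the end of the input (so no trailing batch made only of already-seen items is emitted) is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper mirroring the documented batch(batchSize, pad, overlap)
// behaviour on ints. Assumption: iteration stops once a batch reaches the end of
// the input.
std::vector<std::vector<int>> batchWithOverlap(const std::vector<int>& items, int batchSize, bool pad = false, int overlap = 0)
{
    assert(batchSize > 0 && overlap >= 0 && overlap < batchSize);
    std::vector<std::vector<int>> batches;
    const std::size_t n = items.size();
    const std::size_t fullSize = static_cast<std::size_t>(batchSize);
    const std::size_t step = static_cast<std::size_t>(batchSize - overlap); // batches share `overlap` items
    for (std::size_t start = 0; start < n; start += step)
    {
        const std::size_t end = std::min(n, start + fullSize);
        std::vector<int> batch(items.begin() + static_cast<std::ptrdiff_t>(start),
                               items.begin() + static_cast<std::ptrdiff_t>(end));
        while (pad && batch.size() < fullSize)
            batch.push_back(batch.back()); // pad = true: repeat the last item
        batches.push_back(std::move(batch));
        if (end == n)
            break; // this batch reached the end of the input
    }
    return batches;
}
```

batchWithOverlap({1, 2, 3, 4, 5, 6, 7, 8}, 3, false, 1) returns {{1, 2, 3}, {3, 4, 5}, {5, 6, 7}, {7, 8}}, matching the documented example; with pad = true the last batch becomes {7, 8, 8}.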

`Dataset& split(int numItems = -1)`

Splits the content of the SharedImageSets into SharedImageSets (SIS) that each contain a single image.

| Parameter | Description |
|---|---|
| numItems | split returns only the first numItems items. |

`Dataset& shuffle(int howMany = -1, int seed = -1)`

Shuffles the dataset.

| Parameter | Description |
|---|---|
| howMany | Number of consecutive elements to be shuffled. Defaults to -1, which shuffles the entire dataset. |
| seed | Seed for the shuffling. |

| Exception | Condition |
|---|---|
| DataLoaderException | If howMany is not specified and the dataset is not countable. |
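
The description of howMany leaves the exact strategy open. One reading, sketched below with a hypothetical blockShuffle helper, shuffles consecutive blocks of howMany items independently using a seeded std::mt19937. The SDK might instead use a streaming shuffle buffer, and the sketch takes an unsigned seed rather than the int seed (default -1) of the real method.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical sketch of one reading of shuffle(howMany, seed): consecutive blocks
// of howMany items are shuffled independently; howMany = -1 shuffles everything at
// once, which needs the full item count up front (hence the countability requirement).
std::vector<int> blockShuffle(std::vector<int> items, int howMany, unsigned seed)
{
    assert(howMany != 0);
    std::mt19937 rng(seed); // a fixed seed gives a reproducible order
    const std::size_t block = howMany < 0 ? items.size() : static_cast<std::size_t>(howMany);
    for (std::size_t start = 0; start < items.size(); start += block)
    {
        const std::size_t end = std::min(items.size(), start + block);
        std::shuffle(items.begin() + static_cast<std::ptrdiff_t>(start),
                     items.begin() + static_cast<std::ptrdiff_t>(end), rng);
    }
    return items;
}
```

Each block keeps its own items and only their order changes; repeating a call with the same seed reproduces the same order on a given standard library implementation.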

`Dataset& map(std::function<void(DataItem&)> func, int numParallelCalls = 1)`

Applies a mapping to each item of the dataset.

| Parameter | Description |
|---|---|
| func | Mapping function to be applied to each DataItem. |
| numParallelCalls | How many asynchronous threads are used for the mapping. |

`Dataset& map(const std::string& funcKey, int numParallelCalls = 1)`

Applies a mapping to each item of the dataset. This overload is useful when configuring the loading pipeline from properties: the user can register a mapping function and use its registry key to configure the map decorator.

| Parameter | Description |
|---|---|
| funcKey | Name of the mapping function registered in the MapFuncRegistry. |
| numParallelCalls | How many asynchronous threads are used for the mapping. |

`Dataset& filter(std::function<bool(const DataItem&)> func)`

Filters the dataset according to a user-defined criterion.

`Dataset& filter(const std::string& funcKey)`

Filters the dataset according to a user-defined criterion. This overload is useful when configuring the loading pipeline from properties: the user can register a filtering function and use its registry key to configure the filter decorator.

| Parameter | Description |
|---|---|
| funcKey | Name of the filtering function registered in the FilterFuncRegistry. |
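
Both string-keyed overloads of map() and filter() rely on functions registered under a name beforehand. The registry pattern behind this can be sketched with plain std::map lookups. Item, mapRegistry, filterRegistry and runPipeline are hypothetical stand-ins, since the interfaces of MapFuncRegistry and FilterFuncRegistry are not shown on this page.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins: Item replaces DataItem, and the two maps play the role
// of the MapFuncRegistry and FilterFuncRegistry (their real interfaces differ).
struct Item
{
    int value = 0;
};

using MapFunc = std::function<void(Item&)>;
using FilterFunc = std::function<bool(const Item&)>;

std::map<std::string, MapFunc>& mapRegistry()
{
    static std::map<std::string, MapFunc> registry;
    return registry;
}

std::map<std::string, FilterFunc>& filterRegistry()
{
    static std::map<std::string, FilterFunc> registry;
    return registry;
}

// Resolves both keys (e.g. read from a Properties-based configuration), keeps the
// items the filter accepts and transforms each kept item in place. Filtering before
// mapping is a choice made for this sketch only.
std::vector<Item> runPipeline(const std::vector<Item>& input, const std::string& filterKey, const std::string& mapKey)
{
    const FilterFunc& keep = filterRegistry().at(filterKey); // std::out_of_range if unregistered
    const MapFunc& transform = mapRegistry().at(mapKey);
    assert(keep && transform && "registered entries must hold a callable");
    std::vector<Item> out;
    for (Item item : input) // copy, so the input stays untouched
    {
        if (!keep(item))
            continue;
        transform(item);
        out.push_back(item);
    }
    return out;
}
```

After registering a key "double" that doubles Item::value and a key "isOdd" that keeps odd values, runPipeline({{1}, {2}, {3}}, "isOdd", "double") returns items with values 2 and 6.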

`Dataset& prefetch(size_t prefetchSize, bool syncToGl = true)`

Prefetches data in a separate thread, independently of the regular pipeline. The user can define how many images are prefetched.

| Parameter | Description |
|---|---|
| prefetchSize | Size of the prefetch queue; up to prefetchSize images are preloaded. |
| syncToGl | If true, the images are synchronized to GPU memory after they have been prefetched. |
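
Prefetching in a separate thread is essentially a bounded producer/consumer queue. The hypothetical Prefetcher class below sketches that idea with std::thread and std::condition_variable: a worker pulls items from an upstream callable and keeps at most prefetchSize of them ready. It is not the SDK's implementation and ignores the GPU synchronization controlled by syncToGl.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>

// Hypothetical prefetching stage: a background thread pulls ints from an upstream
// source (std::nullopt = exhausted) and keeps at most prefetchSize of them queued.
class Prefetcher
{
public:
    Prefetcher(std::function<std::optional<int>()> source, std::size_t prefetchSize)
        : m_source(std::move(source)), m_capacity(prefetchSize)
    {
        assert(prefetchSize > 0 && "a zero-sized queue could never hold an item");
        m_worker = std::thread([this] { produce(); }); // start once all members exist
    }

    ~Prefetcher()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_stop = true;
        }
        m_cv.notify_all();
        m_worker.join();
    }

    // Blocks until an item is ready; returns std::nullopt once the source is exhausted.
    std::optional<int> next()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this] { return !m_queue.empty() || m_done; });
        if (m_queue.empty())
            return std::nullopt;
        int item = m_queue.front();
        m_queue.pop_front();
        m_cv.notify_all(); // there is room in the queue again
        return item;
    }

private:
    void produce()
    {
        while (true)
        {
            std::optional<int> item = m_source(); // load outside the lock
            std::unique_lock<std::mutex> lock(m_mutex);
            if (!item)
            {
                m_done = true; // upstream exhausted
                m_cv.notify_all();
                return;
            }
            m_cv.wait(lock, [this] { return m_queue.size() < m_capacity || m_stop; });
            if (m_stop)
                return;
            m_queue.push_back(*item);
            m_cv.notify_all();
        }
    }

    std::function<std::optional<int>()> m_source;
    std::size_t m_capacity;
    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::deque<int> m_queue;
    bool m_done = false;
    bool m_stop = false;
    std::thread m_worker;
};
```

The bound on the queue is what caps memory use: the worker blocks once prefetchSize items are waiting and resumes as soon as next() takes one.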

`Dataset& preprocess(const std::vector<Operation::Specs>& preprocPipeline, Phase execPhase = Phase::Always, int numParallelCalls = 1)`

Applies a preprocessing pipeline to each item of the dataset.

| Parameter | Description |
|---|---|
| preprocPipeline | Specifies how to perform the preprocessing. |
| execPhase | Specifies the execution phase of the preprocessing pipeline. |
| numParallelCalls | How many asynchronous threads are used for the preprocessing. |

`Dataset& preprocess(const std::vector<std::shared_ptr<Operation>>& operations, int numParallelCalls = 1)`

Applies a preprocessing pipeline to each item of the dataset.

| Parameter | Description |
|---|---|
| operations | Operations that perform the processing. |
| numParallelCalls | How many asynchronous threads are used for the preprocessing. |

`Dataset& memoryCache(bool makeExclusiveCPU = false, bool lazy = true, int compressionLevel = 0, bool shuffle = false, int numThreads = 2)`

Caches the already loaded dataset in memory.

| Parameter | Description |
|---|---|
| makeExclusiveCPU | If true, releases the GPU memory. |
| lazy | If true, caches items when they are requested; otherwise caches the whole dataset at initialization. |
| compressionLevel | Controls compression. Higher values compress more but are slower; 0 disables compression. |
| shuffle | Reshuffles the order of the cache every epoch. |
| numThreads | Number of items to prefetch from the cache in the background. Because the cache has to create copies of the data, fetching is more expensive than one might expect. Only has an effect if makeExclusiveCPU is true. |

| Exception | Condition |
|---|---|
| DataLoaderException | If the dataset is not countable. |
| std::bad_alloc | If the system runs out of memory. |
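
The lazy flag decides when the cache is filled. The hypothetical MemoryCache class below illustrates the difference on integer items loaded by index. Taking the number of items at construction reflects why a countable dataset is required; compression, shuffling and CPU/GPU handling are left out, and the class is not the SDK's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical cache over `size` items produced by `load(index)`: with lazy = true
// each item is loaded on first access, with lazy = false all items are loaded
// during construction.
class MemoryCache
{
public:
    MemoryCache(std::size_t size, std::function<int(std::size_t)> load, bool lazy)
        : m_load(std::move(load)), m_slots(size)
    {
        if (!lazy)
            for (std::size_t i = 0; i < size; ++i)
                m_slots[i] = m_load(i); // eager: fill the whole cache now
    }

    int get(std::size_t index)
    {
        assert(index < m_slots.size());
        if (!m_slots[index])
            m_slots[index] = m_load(index); // lazy: load on first access only
        return *m_slots[index];
    }

private:
    std::function<int(std::size_t)> m_load;
    std::vector<std::optional<int>> m_slots; // empty optional = not cached yet
};
```

With lazy = true, constructing the cache calls the loader zero times and each index is loaded once on first access; with lazy = false, all items are loaded during construction.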

`Dataset& diskCache(const std::string& location, bool lazy = true, bool reload = false, bool compression = false, bool shuffle = false)`

Caches the loaded dataset persistently in a location on disk.

| Parameter | Description |
|---|---|
| location | Path to the folder where all the data will be cached. |
| lazy | If true, caches items when they are requested; otherwise caches the whole dataset at initialization. |
| reload | Tries to reload the cache from a previous session. |
| compression | Enables saving with compression. |
| shuffle | Reshuffles the order of the cache every epoch. |

| Exception | Condition |
|---|---|
| DataLoaderException | If the dataset is not countable. |
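
`template<typename Loader, typename... Params> Dataset& chain(Params&&... params)` (inline)

Chains a custom-defined loader on top of the current list of loaders.

| Template parameter | Description |
|---|---|
| Loader | A data loader implementing the DataLoader interface. |

| Parameter | Description |
|---|---|
| params | The construction parameters of the Loader class. |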
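
chain<Loader>(params...) follows the decorator pattern: a new loader is constructed from the given parameters and placed on top of the existing ones. The sketch below shows that shape with a hypothetical Loader interface, a VectorLoader source, an AddOffset decorator and a Pipeline class. Passing the current top loader as the first constructor argument is an assumption, since the page only states that params are the Loader's construction parameters.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical, simplified loader interface standing in for DataLoader.
struct Loader
{
    virtual ~Loader() = default;
    virtual std::optional<int> next() = 0;
};

// Source loader: yields the items of a vector, then std::nullopt.
struct VectorLoader : Loader
{
    explicit VectorLoader(std::vector<int> items) : m_items(std::move(items)) {}
    std::optional<int> next() override
    {
        if (m_pos >= m_items.size())
            return std::nullopt;
        return m_items[m_pos++];
    }
    std::vector<int> m_items;
    std::size_t m_pos = 0;
};

// Custom decorator: pulls from the loader below it and adds a fixed offset.
struct AddOffset : Loader
{
    AddOffset(std::unique_ptr<Loader> parent, int offset) : m_parent(std::move(parent)), m_offset(offset) {}
    std::optional<int> next() override
    {
        std::optional<int> item = m_parent->next();
        if (item)
            *item += m_offset;
        return item;
    }
    std::unique_ptr<Loader> m_parent;
    int m_offset;
};

class Pipeline
{
public:
    explicit Pipeline(std::unique_ptr<Loader> source) : m_top(std::move(source)) {}

    // Same shape as Dataset::chain<Loader>(params...): the new loader is built from
    // the current top of the chain (an assumption) plus the forwarded params.
    template <typename L, typename... Params>
    Pipeline& chain(Params&&... params)
    {
        m_top = std::make_unique<L>(std::move(m_top), std::forward<Params>(params)...);
        return *this;
    }

    std::optional<int> next()
    {
        assert(m_top && "the chain always has a top loader");
        return m_top->next();
    }

private:
    std::unique_ptr<Loader> m_top;
};
```

Chaining AddOffset with 10 and then with 100 on top of a source yielding 1, 2, 3 produces 111, 112 and 113, since each decorator transforms what it pulls from the loader below it.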