ImFusion SDK 4.3
Monocular Depth Estimation

Guide to monocular depth estimation from single RGB images.


Monocular Depth Estimation Overview

This page provides detailed information and code examples for estimating depth from single RGB images using the MonocularDepthEstimationAlgorithm class. The algorithm converts 2D images into depth maps and can optionally generate 3D point clouds, making it useful for 3D reconstruction, augmented reality, and computer vision applications.

Monocular depth estimation is a computer vision technique that estimates the relative depth of objects in a scene from a single RGB image. Unlike stereo vision or structured-light systems, it does not require multiple cameras or special hardware, making it suitable for applications where only a single camera is available.

The MonocularDepthEstimationAlgorithm class provides a high-level interface for depth estimation, while the underlying MonocularDepthEstimation class handles the core estimation logic.
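
In its simplest form, the high-level interface reduces to three calls: construct the algorithm on the input images, run compute(), and take the output. A minimal sketch, assuming inputImages is a std::unique_ptr<SharedImageSet> as in the usage example below:

// Minimal invocation: estimate relative depth with the algorithm's default settings.
MonocularDepthEstimationAlgorithm algo(inputImages.get());
algo.compute();
OwningDataList results = algo.takeOutput();
std::unique_ptr<SharedImageSet> depth = results.extractFirstImage();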

Monocular Depth Estimation Usage

The following example demonstrates basic usage of the MonocularDepthEstimationAlgorithm, where we assume the following inputs:

  1. A set of images given as inputImages of type std::unique_ptr<SharedImageSet>.
  2. (Optional) An intrinsic camera matrix given as K of type Eigen::Matrix3d (see the sketch below for one way to assemble it).
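
For illustration, here is one way K could be assembled when the camera intrinsics are known from a prior calibration. This is a minimal sketch; the focal lengths and principal point below are placeholder values, not outputs of the SDK:

#include <Eigen/Core>
// Placeholder intrinsics: focal lengths (fx, fy) and principal point (cx, cy) in pixels.
const double fx = 800.0, fy = 800.0, cx = 320.0, cy = 240.0;
Eigen::Matrix3d K;
K << fx, 0.0, cx,
     0.0, fy, cy,
     0.0, 0.0, 1.0;
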
#include <ImFusion/Base/DataList.h>
#include <ImFusion/Base/SharedImageSet.h>
#include <ImFusion/Vision/CameraCalibrationDataComponent.h>
#include <ImFusion/Vision/MonocularDepthEstimationAlgorithm.h>
#include <memory>
#include <vector>
using namespace ImFusion;
// Create camera calibration data component
auto cameraCalibration = std::make_unique<CameraCalibrationDataComponent>();
cameraCalibration->p_K = K; // Set your camera matrix
// Add camera calibration to the image set
inputImages->addComponent(cameraCalibration.get());
// Create depth estimation algorithm on inputImages of data type std::unique_ptr<SharedImageSet>
MonocularDepthEstimationAlgorithm depthEstimator(inputImages.get());
// Configure algorithm parameters
depthEstimator.p_depthEstimatorName = "Depth Anything V2 Small"; // Select the Depth Anything V2 Small model.
depthEstimator.p_exportPointClouds = true; // Also generate point cloud (requires camera intrinsics).
// Run depth estimation
depthEstimator.compute();
// Extract results
OwningDataList output = depthEstimator.takeOutput();
// Access relative depth images
std::unique_ptr<SharedImageSet> depthResult = output.extractFirstImage();
// Process point clouds if enabled
std::vector<std::unique_ptr<PointCloud>> pointCloudsResult;
if (depthEstimator.p_exportPointClouds)
{
    auto pointClouds = output.extractAll<PointCloud>(Data::POINTSET);
    for (auto& pc : pointClouds)
    {
        if (pc)
            pointCloudsResult.emplace_back(std::move(pc));
    }
}
// depthResult contains the relative depth images
// pointCloudsResult contains the generated point clouds
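
If no intrinsics are attached, point-cloud export must stay disabled and the algorithm yields relative depth images only. A minimal sketch reusing only the calls shown above:

// Without a CameraCalibrationDataComponent, only relative depth can be produced.
MonocularDepthEstimationAlgorithm relativeOnly(inputImages.get());
relativeOnly.p_exportPointClouds = false; // point clouds require camera intrinsics
relativeOnly.compute();
OwningDataList relativeOutput = relativeOnly.takeOutput();
std::unique_ptr<SharedImageSet> relativeDepth = relativeOutput.extractFirstImage();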
See also
MonocularDepthEstimationAlgorithm

Custom Models

MonocularDepthEstimation allows running your own custom models through our machine learning engines.

This is done by tracing or scripting your Torch model, or exporting your ONNX model, and linking it through a configuration file.

Here is an example configuration file in .yaml format for MonocularDepthEstimation:

Version: '8.0'
Type: NeuralNetwork
Name: CustomModel
Description: Monocular Depth Estimation
Engine:
  Name: torch
  ModelFile: <path-to-model>
  ForceCPU: false
  Verbose: true
InputFields: [Image]
OutputFields: [Depth]
PredictionOutput: [Image]
DisableJITRecompile: true
PreprocessingInputFields: [Image]
PreProcessing:
  - MakeFloat: {}
  - NormalizeUniform: {}
Sampling:
  - SkipUnpadding: true

InputFields and OutputFields must use exactly the names shown above. In the current release, only the Depth output is exposed to the user, and the model must return it as its first element. Any additional outputs should still be appended to OutputFields (with their return types in PredictionOutput), but they will be discarded.

For further information on the machine learning configuration files, please refer to the Machine Learning Model page in our user documentation.
