I don't have experience generating a custom disparity map from the D415, but I can offer some input.
1. First of all, what I understand is that your main target is to improve the depth map, and that the 3D map will improve once you have a better depth map. Is that correct? Based on this understanding, I think we should compare the following to find the root cause of the problem:
(i) depth ground truth
(ii) SDK-generated depth map
(iii) your own depth map
You posted a 3D map, which illustrates the problem well. But the more software layers we involve in testing, the more confusion there can be about which layer is causing the problem. If you can share a depth map comparison image, we can be sure that the depth map is indeed the problem [and not the depth-to-3D conversion]. Comparing your custom depth map with the ground truth and the D415 depth map will also give more insight into the root cause. So, if possible, can you provide the above three images for comparison?
2. Coming back to depth map quality: note that although depth from stereo is theoretically a sound method of computing depth, in practice it has not yielded the best results so far. Prior to the Kinect, there was no good-quality depth camera. The Kinect revolutionized the depth camera market through its use of active IR. You can't expect to match the performance of a Kinect with stereo imaging alone. I'm sure you already know how the Kinect computes depth with active IR, but I'm sharing this link anyway.
There is also plenty of other material available on the technology the Kinect uses.
3. Since the Kinect relies on active IR, it only works indoors. RealSense cameras, on the other hand, have been designed to work both indoors and outdoors, even under water. The D415 uses a mix of active IR and stereo depth. You can optionally turn active IR on or off, or experiment with various parameters like laser power. While the Kinect is designed for very specific scene conditions, the D400 cameras target a broader range of conditions, so users have to play around with the settings and parameters to find what works best for them. One way to improve the quality of the D415-generated depth map is therefore to experiment with the various parameters and settings available in the RealSense SDK. The other way, of course, is to generate your own custom depth map.
4. Regarding your own custom depth map, one potential concern is that you will be relying solely on stereo imaging. Since you won't be using active IR, you may never be able to match the performance of the Kinect. In any case, it will be interesting to debug the issue you reported with your custom depth. To look further into this, I would appreciate it if you could post the three depth map images for comparison, i.e. the ground truth, the D415 depth map, and your custom depth map. Then I can look further at the parameters you use.
The SDK-generated depth map is fine - that is to say, I have generated 3D data from the depth map, used it to build full 3D reconstructions, and taken some measurements. The SDK gives me results similar to the ground truth. When you say you want to compare the "depth maps", I assume you mean converting the floating-point map of Z values into something normalized to fit an 8-bit image? I'm not really clear on what you're looking for there. I am fairly convinced that the disparity map is the issue, not the depth map.
As far as the rest of your comments about active IR and stereo imaging -
The Kinect works by calibrating a single camera with the dot pattern from the projector. This allows for generation of a "stereo" image, rectified from the projector's point of view to the camera's point of view. This allows for the generation of a disparity map created from the stereo image pair.
When it comes to the RealSense, as far as I know, the stereo cameras do not get calibrated with the projector dot pattern at all. There are 2 IR cameras as opposed to the 1 in the Kinect, so the second camera gets calibrated with the first camera which allows for rectifying the images. Without the projected dot pattern, reconstruction will still work as long as there are sufficient details in the scene. The projected pattern (aka, "active IR") is there simply to help guide the stereo reconstruction algorithm in areas where there are few details (a completely white wall with no distinguishing features, for example). There might be more going on behind the scenes with the D400-series, but that's my general understanding of how the technology works. Ultimately, I would expect to get better results than the Kinect with the D400-series since camera-to-camera calibration should be more accurate than projector-to-camera.
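For anyone following along, the rectified-stereo geometry described above reduces to Z = f x B / d: depth is focal length (in pixels) times baseline, divided by disparity. A minimal numpy sketch, with made-up numbers rather than actual D415 calibration values:

```python
import numpy as np

# Illustrative intrinsics only - NOT actual D415 calibration values.
focal_px = 1004.5   # focal length in pixels
baseline_mm = 50.0  # distance between the two IR cameras

# A toy disparity map (in pixels); a real one comes from stereo matching.
disparity = np.array([[10.0, 20.0],
                      [40.0,  0.0]])  # 0 marks "no match found"

# Depth from disparity for rectified stereo: Z = f * B / d.
# Guard against division by zero where matching failed.
with np.errstate(divide="ignore"):
    depth_mm = np.where(disparity > 0,
                        focal_px * baseline_mm / disparity,
                        0.0)

print(depth_mm)  # larger disparity -> nearer point
```

Note how small disparity errors translate into large depth errors at distance, which is why subpixel accuracy in the disparity map matters so much.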
For reference, here are the original IR images I used to generate the disparity map:
In case anyone's wondering, the images are dark to human eyes, but that's not a problem for the disparity generation code. At least, any issues from that would not manifest in the way I am seeing right now.
I'd be pretty interested to hear from someone at Intel for some clarification on the issue I'm having with disparity map generation. Is this the right forum for that, or is the GitHub "Issues" page more appropriate?
We currently do not support this use case. If you tell us what you are trying to achieve we may be able to tell you how to best setup the camera.
Intel Customer Support
I am trying to get more accurate results when it comes to capturing 3D data in the near range (ideally 0.2-0.6m, ignoring device-specific limits). I have been able to modify the advanced settings for the D415 in order to get semi-decent near data (see scan in earlier post), although the results are still not quite as good as I was expecting. Generating a custom disparity map was an attempt to mitigate some of the artifacts in the RealSense-generated data. The bumps I am seeing are oddly persistent - if it were purely noise, then the temporal filter would smooth out the bumps, but that is not the case. My thought was that better results could potentially be obtained by employing a more thorough algorithm for the disparity generation stage, but the fact that I do not seem to be able to use the IR images for this purpose is a bit of a bummer.
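For illustration of what that disparity-generation stage does, here is the simplest possible version: naive SAD block matching on a rectified pair. This is only a sketch of the idea, not the code I'm actually using - real algorithms (OpenCV's StereoSGBM, for example) add subpixel refinement, smoothness costs, and left-right consistency checks:

```python
import numpy as np

def block_match_disparity(left, right, block=3, max_disp=8):
    """Naive sum-of-absolute-differences block matching.

    Assumes `left` and `right` are rectified grayscale images, so
    corresponding points lie on the same row; for a point at left
    column x, the match in the right image is at column x - d.
    """
    h, w = left.shape
    half = block // 2
    disp = np.zeros((h, w), dtype=np.float32)
    for y in range(half, h - half):
        for x in range(half + max_disp, w - half):
            patch = left[y-half:y+half+1, x-half:x+half+1].astype(np.int32)
            best, best_d = None, 0
            for d in range(max_disp):
                cand = right[y-half:y+half+1,
                             x-d-half:x-d+half+1].astype(np.int32)
                sad = np.abs(patch - cand).sum()
                if best is None or sad < best:
                    best, best_d = sad, d
            disp[y, x] = best_d
    return disp

# Synthetic demo: a textured image shifted by a uniform 4-pixel disparity.
left = np.random.default_rng(1).integers(0, 256, size=(10, 24)).astype(np.uint8)
right = np.roll(left, -4, axis=1)
print(block_match_disparity(left, right)[5, 12])  # expect ~4
```

Integer-pixel matching like this is exactly what produces "staircase" depth; the subpixel interpolation step that fixes it is also where persistent bump artifacts can creep in.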
Ultimately, I may consider calibrating the device using OpenCV, retrieving the uncalibrated/unrectified 1080p IR images, and going from there. This would take most of the RealSense-based processing out of the picture; I am aware that this somewhat defeats the purpose of the device. In any case, "we currently do not support this use case" is a perfectly acceptable answer (even if it's not what I was hoping for), so I'm marking this topic "answered".
Thanks for your time.
Our team wants to help you further. Can you send an RGB image of what you are trying to scan? The distance is between .2m - .6m, correct?
Intel Customer Support
Here's a photo of the object with the D415 for size reference:
Ultimately, I am looking to scan body parts (heads, feet, hands, etc, but not full-body scans). I realize the D415 near range supposedly maxes out at 0.3m, but I am waiting on a D435 (ordered already, but unsure of when it will ship) which I believe is supposed to work closer to that 0.2m distance.
Thanks for raising the questions. When I look at the stereo images, I see vertical offsets in the projected IR pattern of up to 10 pixels (look in the area of the chin). But when I look at the marks and writing that you have made on the model, there don't seem to be any vertical offsets in your marks. The model looks like it is made of Styrofoam, and I am wondering if the IR isn't passing through the surface and reflecting from beneath in some areas, depending on the angle.
Try doing the analysis without the IR projector. I have my own rat-model approach to stereo analysis and would be interested in your pictures and results.
Also... the disparity map above... is that yours or from the camera?
I assume it is from your work... the 3D reconstruction sure looks like there is a math error somewhere... so a programming error?
I am waiting on my D435 and haven't done much poking around yet. But I was immediately confused about disparity values... I would think that if you have an error, it is in the process of going from pixel offsets to calculated disparity?
If you could post an image that has your found pixel offsets set to gray, I would like to try to put it through my process.
I would recommend further exploring and optimizing the settings of the RealSense camera.
It was very helpful that you sent an image. We can now reproduce this and show you what we see.
Here is the result with the D415 at ~47cm distance, at 1280x720 resolution, 30fps.
Here is the depth map, down-sampled by 3x in x-y.
And here it is with a small amount of post-processing, using our post-processing exponential running average with alpha=0.3 and delta=20, and an edge-preserving filter with alpha=0.6 and delta=8.
Also here is with texture applied:
And if you want to remove the IR pattern from the left RGB channel, you can do that too by using a proper color correction matrix:
Please check out this link for more on how to optimize the camera for your usage.
Thank you for this comprehensive feedback. That document is extremely useful, and it is not something I had come across until now. Is it linked anywhere? In any case, it is full of great info. The colour correction option for the D415 is definitely interesting.
While I am using the included tripod and a static object for some of my testing right now, ultimately we will want to move around the body parts we are scanning, likely in a handheld form factor. This does mean that a temporal filter would not work for us, but the edge-preserving smoothing could definitely be useful.
Since we are looking at potentially using this sensor as a handheld device, I am a bit leery of the rolling shutter cameras in the D415 due to motion artifacts. I will have to do some more testing with regards to this caveat. In terms of which one will be better for our purposes (D415 vs D435), it is probably going to come down to whether the better RMS results of the D415 (according to the document) outweigh the artifacts caused by movement and the rolling shutter. It's a shame that OV doesn't appear to offer the global shutter equivalent of the D415 cameras.
I should have some time later this week to attempt reproducing the results you have posted. Thanks again - extremely impressed with the support for these devices thus far.
Thank you for the kind feedback. The tuning whitepaper can be found at:
Intel Customer Support
Thanks for sharing this very informative document. I have some questions regarding the Depth Tuning whitepaper. Can you please respond?
(1) Under Section-1, the optimal depth resolution for the D435 is given as 848x480, while for the D415 it is given as 1280x720, despite the fact that both devices have a max depth resolution of 1280x720. What makes 848x480 the optimal depth resolution for the D435, whereas for the D415 the optimal resolution is 1280x720? Is it related to the fact that the IR sensor on the D435 is 1280x800, while the IR sensor on the D415 is 1920x1080?
(2) Under Section-3.e, what is HFOV - is it the horizontal FOV of the IR sensors? Under Section-3.e on page-3, the HFOV of the D435 is given as 90 degrees and the HFOV of the D415 as 65 degrees. But under Section-11.b on page-10, the HFOV of the D415 is instead given as 64 degrees and that of the D435 as 86 degrees. A table on the same page has yet another set of figures: 69.4 degrees for the D415 and 91.2 degrees for the D435. Which values are the correct ones to use?
(3) under Section-3.e, Theoretical limit on RMS error is given as
RMS error (mm) = Distance(mm)^2 x Subpixel / (focal-length(pixels) x Baseline(mm))
And there are plots taken with subpixel = 0.08. What exactly does subpixel refer to in the above equation, and what factors does its value depend on? i.e., what value should we use to calculate the RMS error for our setup?
1. Yes, the optimal resolution is related to the resolution of the input sensors. The Intel ASIC rectifies the images, aligns them, and then scales the output. A rule of thumb is that you lose >20% of the input resolution in creating the overlapping rectified images, due to manufacturing tolerances and lens distortion. The ASIC can scale the output to many different resolutions, but at some point the up-scaling yields zero net improvement. The D415 has 2MP input sensors, and the D435 has 1MP input sensors.
2. The depth output is nominally 90 degrees HFOV. We can look at being more consistent, but variations occur due to manufacturing tolerances.
3. The subpixel number is somewhat empirical. It is affected by the structure in the scene, the depth algorithm, the calibration quality, and the quality of the input images (e.g. in focus, well lit, high contrast). For a passive textured target (i.e. normal light illumination, no laser) we have measured less than 0.05, but in general expect <0.1 for a "good" unit. Note that areas with no texture (like a white wall) will have a much higher subpixel number, unless you project a texture onto them.
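As a worked example, plugging a subpixel figure of 0.08 into the whitepaper's formula quoted above, using the D415 numbers cited earlier in this thread (HFOV 65 degrees, 1280-pixel width, 50mm baseline - these are the thread's figures, not ones I am asserting independently):

```python
import math

def rms_error_mm(distance_mm, subpixel, focal_px, baseline_mm):
    """Theoretical RMS depth error: Distance^2 * Subpixel / (f * B)."""
    return distance_mm ** 2 * subpixel / (focal_px * baseline_mm)

# Focal length in pixels from HFOV: f = Xres / (2 * tan(HFOV/2))
focal_px = 1280 / (2 * math.tan(math.radians(65 / 2)))  # ~1004.6 px

print(rms_error_mm(500, 0.08, focal_px, 50))   # ~0.4 mm at half a metre
print(rms_error_mm(1000, 0.08, focal_px, 50))  # ~1.6 mm at one metre
```

Since distance enters squared, the error quadruples each time the distance doubles, which is why near-range capture is so much more forgiving than far-range.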
RealSense Group CTO, Intel
Thanks for clarifying the above. I have one last query on the whitepaper contents, where the computed values of MinZ do not match the values specified in the whitepaper. The computed MinZ values differ from the specified values by about 5.3% for the D435, and by about 11.35% for the D415. I am curious to know what contributes to this difference. Is it due to a different formula being used, different HFOV and baseline values, or the use of empirical rather than theoretical values? For details of the computed vs. specified MinZ values, please see below.
(1) As per the white paper
focal-length(pixels) = Xres(pixels) / (2 x tan(HFOV/2))
MinZ(mm) = focal-length(pixels) x Baseline(mm) / 126
(a) considering the D435 case, where
- HFOV is 90 degree
- Baseline is 55 mm
- and considering Xres 848
focal-length comes out to 424.35
and MinZ comes out to be 18.52 cm
This matches reasonably well with the value of 19.5 cm specified in the whitepaper (about 5.3% difference from the specified value).
(b) Also, considering the D415 case, where
- HFOV is 65 degree
- Baseline is 50 mm
- and considering Xres 1280
focal-length comes out to 1005.268
and MinZ comes out to be 39.89 cm
This is in fact quite different from the value of about 45 cm mentioned in the whitepaper (about 11.35% difference from the specified value).
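For convenience, here is the computation above as a short script, using the HFOV and baseline figures quoted in this thread (small discrepancies with my hand-computed focal lengths above presumably come from rounding in the HFOV values used):

```python
import math

def focal_px(xres, hfov_deg):
    """Focal length in pixels from horizontal FOV: Xres / (2 * tan(HFOV/2))."""
    return xres / (2 * math.tan(math.radians(hfov_deg) / 2))

def min_z_mm(f_px, baseline_mm, max_disp=126):
    """MinZ per the whitepaper; 126 is the maximum disparity searched."""
    return f_px * baseline_mm / max_disp

# D435: HFOV 90 deg, 848-pixel width, 55 mm baseline (thread figures)
f435 = focal_px(848, 90)        # ~424 px
print(min_z_mm(f435, 55) / 10)  # ~18.5 cm vs 19.5 cm in the whitepaper

# D415: HFOV 65 deg, 1280-pixel width, 50 mm baseline (thread figures)
f415 = focal_px(1280, 65)       # ~1004.6 px
print(min_z_mm(f415, 50) / 10)  # ~39.9 cm vs ~45 cm in the whitepaper
```

This makes it easy to re-run the check with the alternative HFOV figures from Section-11.b (64/86 degrees) or the table (69.4/91.2 degrees) to see which set best reproduces the specified MinZ values.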