This message was posted on behalf of Intel Corporation
The depth post processing capability may help significantly reduce the noise.
It is available (by default) in the viewer (since 2.9.0) and also available through code (code sample is coming).
The post-processing header file is https://github.com/IntelRealSense/librealsense/blob/ba01147d65db16fdf4da36a3e718fe81c8421034/include/librealsense2/h/rs_processing.h.
You can do a search for "post-processing" in https://github.com/IntelRealSense/librealsense and you will find where and how it is used.
Intel Customer Support
I have similar concerns. Post processing is not done in an ASIC in the camera as far as I understand, which means that we need to consume computer capacity for the post processing. Many realtime applications, such as ours, are already pushing the limits in terms of compute capacity. The cleaner and more accurate the data we get, the less capacity we need to waste on improving depth quality before doing what we really want to do. I have tried the post-processing filters and I can't find a reasonable realtime setting that gets close to the depth quality of the Kinect. Because we have a dynamic scene, we can't use the time averaging filter.
Another thing I notice is that nearby objects tend to coalesce (depth contrast is lost between them) much more easily than with the Kinect. If we define resolution as the smallest distance at which two points can be distinguished, then the Kinect outperforms Realsense by far, even though Realsense nominally has about twice the spatial resolution in terms of depth pixels. In our world, it's the practical resolution that matters.
Finally, cylindrical or spherical surfaces appear very flat. Look, for instance, at the body segments of a person standing about 1.5 m from the (D435) camera, which roughly corresponds to the distance necessary to get the full body length inside the FOV. Very limited depth extension compared to Kinect. With post processing (excluding time averaging), I think this might get even worse.
We had high expectations for this sensor. Any response to these concerns that may give us more hope?
Are you using the D415 in 1280x720 resolution? This would give the best depth error (smallest ripple), better than lower resolutions.
I suggest using this and then subsample using a "non-zero median" or "non-zero mean" in order to reduce X, Y resolution so you can data process all subsequent data faster.
Yes, there are a bunch of steps in stereo algorithms that result in a type of spatial convolution, so down-sampling by ~2x in each dimension will lead to minimal real degradation in X-Y resolution, but a great computation improvement for all subsequent processing, and if you use the recommended sub-sampling, it will clean up the data a bit as well.
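To make the suggested sub-sampling concrete, here is a minimal plain-Python sketch of a "non-zero mean" 2x down-sampler (a hypothetical helper, not part of librealsense; a real pipeline would use the library's decimation filter or vectorized array code). It assumes the depth image is a flat row-major list with 0 marking invalid pixels:

```python
def nonzero_mean_downsample(depth, w, h, factor=2):
    """Downsample a depth image (flat list, row-major, 0 = invalid)
    by averaging each factor x factor block over its non-zero pixels."""
    out_w, out_h = w // factor, h // factor
    out = [0] * (out_w * out_h)
    for oy in range(out_h):
        for ox in range(out_w):
            vals = []
            for dy in range(factor):
                for dx in range(factor):
                    v = depth[(oy * factor + dy) * w + (ox * factor + dx)]
                    if v:  # skip invalid (zero) depth samples
                        vals.append(v)
            # the output pixel is invalid only if the whole block was invalid
            out[oy * out_w + ox] = sum(vals) // len(vals) if vals else 0
    return out, out_w, out_h
```

Ignoring zeros in the average is what keeps invalid pixels from dragging valid neighbours toward zero, which is why a plain box filter would be worse here.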
Please do also try the different depth presets. Use the RS Viewer to try them out.
Finally, we do have other models of our depth cameras in-house for certain partners to evaluate that give better depth error at longer range, BUT we made the decision to release these two different models in volume - the D415 and D435. We found that these covered 80% of intended usages, while being sensitive to 1. size, 2. power and 3. cost:
The D435 is great because it has a 90 HFOV (horizontal) and uses 1MP Global Shutter monochrome stereo imagers.
The D415 is great because it has about 2x the depth resolution, uses 2MP color stereo imagers, and is smaller, but the trade-off is 65HFOV.
Thanks for this response, I'm in a similar position to the original poster. We're exploring the realsense camera as an alternative to the Kinect and currently hoping we can use it but struggling with the same issues regarding the depth data.
I'm trying to plug the camera into existing software written in .NET. Am I able to access the post-processing algorithms directly, or will I have to wrap the methods in the C++ library?
I'm interested in body scanning so we don't need very high detail but accurately capturing shape (of the torso for example) is pretty crucial.
I noticed the noise in the depth signal too and I implemented a simple filter to try and reduce the presence of the noise. I believe I have implemented a non-zero mean filter. That is, I averaged the depth signal over a number of frames, ignoring zero values.
This does a good job of reducing the noise but it also seems to be overly smoothing the point cloud across features. As I understand this, it may be that the noise is not in the depth direction only but is in the x and y directions too -- smoothing adjacent features.
This can be seen in scans I took of a mannequin. One with the Kinect 2.0, the other with the realsense D435 (haven't received 415 yet). I have filtered the D435 depth signal, the Kinect 2.0 is raw.
The images below show the Kinect, then the realsense and then a comparison of the two in terms of closest point distance. You can see the curvature of the mannequin is really smoothed for the realsense. The differences are clear on the comparison too, centimetres of difference.
Am I over-filtering? Is there a way of minimising noise without losing shape resolution? Any guidance gratefully received.
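For reference, the per-pixel temporal "non-zero mean" described above might look like the following plain-Python sketch (my own illustration, assuming frames are flat lists of depth values with 0 marking invalid pixels):

```python
def temporal_nonzero_mean(frames):
    """Average depth per pixel across frames, ignoring zero (invalid)
    samples; a pixel that is zero in every frame stays zero."""
    n = len(frames[0])
    out = [0] * n
    for i in range(n):
        vals = [f[i] for f in frames if f[i] != 0]
        if vals:
            out[i] = sum(vals) / len(vals)
    return out
```

One thing worth noting: near object edges a pixel can flicker between foreground and background depth across frames, so a mean blends the two into an intermediate value and rounds off edges; a per-pixel median over the same non-zero samples would tend to preserve edges better. That may account for some of the over-smoothing described above.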
Thanks for the advice Anders. I think we need to stick to the D435 since we capture dynamic events up to 8 m/s. A rolling shutter may give ambiguous results across the image, but I don't have any proof of this. Also, my impression was that the depth quality of the D415 was worse when I tested it. This could perhaps be because I had to have it at 1.75 m from the scene to fit everything inside the FOV. With the D435, the corresponding distance was 1.40 m. We would like to use a maximum sample rate of 60 or 90 fps, which allows us a maximum resolution of 848x480 of the depth data. I hope we can find a depth preset that satisfies our needs. However, when I look at the very first picture we took in our app (see below) at 1.5 m from the subject using the default setting (30 fps, 1280x720, no post processing), I can't help feeling some doubts. No facial features whatsoever, flat segments and vanishing wrists and ankles. I got the complete D400 sample kit here but as far as I understand, all modules perceive depth in the same way as the D415 and D435 cameras. I keep my fingers crossed.
I tried to reproduce this. What I did was tilt the camera into portrait mode (90 deg), so that a user could be closer. Since RMS depth error grows as the square of the distance, it is nice to have people as close as possible.
This is grabbed at 848x480 with the D435 at 90fps. To show it I did post-process slightly, primarily to down-sample it by 2x so the grid is more visible.
I am also attaching a small "home-made" movie. At least we can align on what we are both seeing.
In this video I capture with and without texture and with and without post-processing.
This is basically the performance. We can tweak settings to get rid of some "spray" around edges (feet start merging with the carpet). Trying some of the different depth presets will give you an idea.
Also, one approach to consider if you want even better resolution is to use multiple synced cameras pointing at different parts of the body (top and bottom for example). This allows users to get much closer which will greatly improve the depth.
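Merging the synced cameras' outputs comes down to expressing every camera's points in one common frame via a calibrated extrinsic (rotation plus translation). A minimal sketch of that step, with placeholder values you would replace with real calibration results:

```python
def transform_points(points, rotation, translation):
    """Apply a rigid transform (3x3 rotation as nested lists plus a
    3-vector translation) to a list of (x, y, z) points."""
    out = []
    for p in points:
        out.append(tuple(
            sum(rotation[r][c] * p[c] for c in range(3)) + translation[r]
            for r in range(3)
        ))
    return out

# Example: a second camera mounted 1 unit along z with no rotation.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
merged = transform_points([(1, 2, 3)], identity, (0, 0, 1))
```

Obtaining the rotation and translation (e.g. by registering a shared calibration target seen by both cameras) is the hard part; applying them is just this per-point multiply-and-add.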
Thanks for sharing Anders. I think we reach more or less the same quality. You get less shrinking of the distal segments, but that may be because you are closer to the camera. It is quite obvious to us now that we will have to work much harder with the Realsense depth data than we expected. Our conclusion is that we do have a significant loss of detail at full-body view distance compared to Kinect. I like your idea of recording different parts of the body with separate cameras, but it presents another challenge of course in terms of stitching it all together.