It is worth mentioning that Kinect and RealSense 400 Series cameras cannot be compared directly because they use different technologies. Kinect 1 uses Structured or Coded Light and Kinect 2 uses Time of Flight. The 400 Series cameras use Stereoscopic imaging based on left and right IR sensors.
Excellent tips for tuning and optimizing 400 Series cameras and the images that they capture can be found in the presentation document linked to below.
Looking at your video though, the most obvious candidate for the cause of your image problems is the number of what seem to be strip lights in the ceiling. Unlike ordinary bulbs, strip lights are usually fluorescent: they contain a gas that flickers at a frequency too fast for the human eye to see, and that flicker can cause noise in the stream.
FWIW - with the D435 you are unlikely to ever achieve depth data of the quality of the Kinect 2.0 - it's just not technically possible.
Read up on RMS error in the tuning documents: the distant parts of the frame are likely to fluctuate more and more the farther away the deepest part of the scene is.
Yes, even at 2m the errors are already alive and well in the depth data, which translates to the flutter and jitter you see in the background of the frame.
The key issue is how rapidly the D435 scales up the errors, which is why the D415 is currently the only viable solution for anything over 1.5m if you want anything close to a stable background.
The graph below estimates how depth RMS error is expected to increase with distance on the D415 (green line) and D435 (orange).
The D435 has greater RMS error over distance, though it does also have advantages such as its wider FOV, smaller minimum distance (MinZ) for closer scanning, and its global shutter, which copes better with fast motion.
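To get a rough feel for why the error grows so quickly with distance, here is a minimal sketch based on the commonly cited stereo relation: RMS error = distance² × subpixel / (focal length in pixels × baseline). Please treat the numbers as assumptions rather than official figures. The baselines (55mm and 50mm), horizontal FOVs (65° and 87°), the 1280-pixel width used for both cameras, and the 0.08 subpixel value are approximate nominal values chosen for illustration, and the function names are my own.

```python
import math

def focal_length_px(h_res_px, hfov_deg):
    """Focal length in pixels from horizontal resolution and horizontal FOV."""
    return 0.5 * h_res_px / math.tan(math.radians(hfov_deg) / 2)

def depth_rms_error_mm(distance_mm, baseline_mm, h_res_px, hfov_deg, subpixel=0.08):
    """Theoretical stereo depth RMS error: Z^2 * subpixel / (f * baseline)."""
    f = focal_length_px(h_res_px, hfov_deg)
    return distance_mm ** 2 * subpixel / (f * baseline_mm)

# Assumed nominal parameters, for illustration only.
CAMERAS = {
    "D415": {"baseline_mm": 55.0, "hfov_deg": 65.0, "h_res_px": 1280},
    "D435": {"baseline_mm": 50.0, "hfov_deg": 87.0, "h_res_px": 1280},
}

if __name__ == "__main__":
    for distance_m in (0.5, 1.0, 1.5, 2.0, 3.0, 4.0):
        row = [f"{distance_m:>4.1f} m"]
        for name, params in CAMERAS.items():
            err = depth_rms_error_mm(distance_m * 1000.0, **params)
            row.append(f"{name}: {err:6.2f} mm")
        print("  ".join(row))
```

Because the error scales with the square of the distance, doubling the range roughly quadruples the noise. With these assumed values the D435 comes out worse than the D415 at every distance, mainly because its wider FOV gives it a shorter focal length in pixels.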