I experimented with this once, doing hand joint tracking with the RealSense implementation in the Unity game engine. I found that although - like you said yourself - you needed to face the palm towards the camera to activate tracking, thereafter the camera could track the joints of the hand from above. The same principle applied to turning the hand towards the chest, with the palm facing away from the camera. Each time the camera lost tracking, though, you had to re-establish it with a palm-forwards pose before resuming the turned-away pose.
I seem to remember that gestures could still be recognized from above when the camera was looking at the back of the hand, but I'm not certain of this, as it was 3 years ago that I ran the test.
In the early days of RealSense, there was actually an exotic desktop PC called the HP Sprout that had a built-in F200 camera mounted overhead, facing down onto a flat scanning bed. The Sprout did not seem to support gesture inputs, though. HP has since updated it with a new model, the Sprout Pro, but that one uses a different camera for its depth scanning operations, so it should be avoided if RealSense depth scanning is specifically required.
I suspect that tracking is less likely to stall if you use blob tracking instead of joint tracking for the back of the hand, though blob tracking requires the camera to be much closer to the hand than joint tracking does. Blob tracking recognizes a large, flat-ish area of skin rather than fitting precise joint points.
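Conceptually, blob tracking just segments the largest connected patch of pixels that sits within an expected depth band and follows its centroid, rather than fitting a hand skeleton. Here is a minimal sketch of that idea in plain Python on a toy depth grid - this is illustrative only, not the actual RealSense SDK API (all names and the depth values are made up for the example):

```python
from collections import deque

# Toy "depth frame": 0 = no reading, positive = distance (mm) to a surface.
# A real frame would come from the camera SDK; this grid just illustrates
# the blob-tracking idea.
DEPTH = [
    [0, 0,   0,   0,   0, 0],
    [0, 400, 410, 405, 0, 0],
    [0, 395, 400, 402, 0, 0],
    [0, 0,   398, 400, 0, 0],
    [0, 0,   0,   0,   0, 0],
]

NEAR, FAR = 300, 500  # depth band (mm) in which we expect the hand to sit

def largest_blob(depth, near, far):
    """Return (pixel_count, (row, col) centroid) of the largest connected
    region whose depth falls inside [near, far]."""
    rows, cols = len(depth), len(depth[0])
    seen = [[False] * cols for _ in range(rows)]
    best = (0, (0.0, 0.0))
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or not (near <= depth[r][c] <= far):
                continue
            # BFS flood fill over 4-connected neighbours in the depth band
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and near <= depth[ny][nx] <= far):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(pixels) > best[0]:
                cy = sum(p[0] for p in pixels) / len(pixels)
                cx = sum(p[1] for p in pixels) / len(pixels)
                best = (len(pixels), (cy, cx))
    return best

size, centroid = largest_blob(DEPTH, NEAR, FAR)
print(size, centroid)  # 8 pixels in the blob, centroid at (1.875, 2.125)
```

The robustness intuition is that the back of the hand still forms one large connected region in the depth image even when the finger joints are occluded or ambiguous, so a centroid-of-blob tracker has far less to lose tracking over than a skeleton fitter does.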