I recently found a bug in the implementation of RCCE_reduce. The function works
correctly only under the assumption that the root node, which performs the
computation, does not need its input buffer after the call. This assumption
does not always hold, so the function can cause problems in some situations.
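
To make the problem concrete, here is a minimal sketch in plain C. It is not the actual RCCE source and makes no RCCE calls. It assumes the bug takes the form of the root accumulating the other contributions directly into its own input buffer. The function names reduce_sum_in_place and reduce_sum_preserving are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the reported behaviour (an assumption, not
   the RCCE code): the root adds the other contribution into its own
   input buffer and then copies the result to the output buffer. */
void reduce_sum_in_place(int *root_in, const int *other_in,
                         int *root_out, size_t n)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        root_in[i] += other_in[i];   /* overwrites the caller's input */
    memcpy(root_out, root_in, n * sizeof *root_out);
}

/* One possible fix: copy the root's input to the output buffer first and
   accumulate there, so the input buffer is left untouched. */
void reduce_sum_preserving(const int *root_in, const int *other_in,
                           int *root_out, size_t n)
{
    assert(n > 0);
    memcpy(root_out, root_in, n * sizeof *root_out);
    for (size_t i = 0; i < n; i++)
        root_out[i] += other_in[i];
}
```

Both functions produce the same sum in root_out. Only the second leaves root_in as the caller passed it, which matters whenever the root reuses that buffer after the reduction.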
I know there is a better implementation of RCCE_reduce in the rcce_comm library,
but I thought I should point out this bug so that it can be corrected in future