Usually scaling the input images to a range helps to initialize the weights better at the start of training. If we don’t scale the images during the training sometimes the error gradients may
explode which can result in unstable network preventing the network to learn on the training data. The training gets slower. Also, if we have different image sizes the model during the training may overfit to all the sizes and would not generalize. Regrading documentation referring a few below.
The following book discusses this subject
Deep Learning (Adaptive Computation and Machine Learning by Ian Goodfellow, Yoshua Bengio and Aaron Courville