Abstract
The most successful stochastic gradient methods in deep learning are incremental in nature and are executed on multiple Graphics Processing Units (GPUs) over large training data sets. The training data must be divided into mini-batches, each containing a subset of the data, to keep the processing of each iteration of the method computationally viable (calle…