Adapting Individual Learning Rates for SGD and Adam Optimizers
This thesis investigates integrating Rprop-style individual (per-parameter) learning rates into the SGD and Adam optimization methods. These individual learning rates are updated separately from the weights. A major focus of this work is the role of mini-batch size in updating these learning rates. Empirical analysis demonstrates that selecting an appropriate mini-batch size can accelerate convergence and improve performance.
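To make the idea concrete, below is a minimal sketch of SGD with Rprop-style individual learning rates: each parameter keeps its own step size, which is adapted separately from the weight update based on whether the gradient sign agrees between successive mini-batches. The function name `sgd_individual_lr_step`, the constants (`eta_plus`, `eta_minus`, the clamping range), and the exact combination rule are illustrative assumptions, not the thesis's final algorithm.

```python
import numpy as np

def sgd_individual_lr_step(w, grad, prev_grad, lr,
                           eta_plus=1.2, eta_minus=0.5,
                           lr_min=1e-6, lr_max=1.0):
    """One SGD step with Rprop-style per-parameter learning rates.

    w, grad, prev_grad, lr are NumPy arrays of equal shape.
    Returns the updated weights, the adapted learning rates, and the
    gradient to be used as prev_grad on the next mini-batch.
    (Illustrative sketch; constants and update rule are assumptions.)
    """
    sign_change = np.sign(grad) * np.sign(prev_grad)
    # Adapt each learning rate from the sign agreement of successive gradients:
    # same sign -> grow the step size, opposite sign -> shrink it.
    lr = np.where(sign_change > 0, lr * eta_plus,
         np.where(sign_change < 0, lr * eta_minus, lr))
    lr = np.clip(lr, lr_min, lr_max)
    # Standard SGD step, scaled element-wise by the individual learning rates.
    w = w - lr * grad
    return w, lr, grad
```

A usage loop would call this once per mini-batch, carrying `lr` and `prev_grad` forward across iterations; the same per-parameter adaptation could, under analogous assumptions, be applied to the step sizes inside Adam.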
