Open Conference Systems, Seminar Nasional Sanata Dharma Berbagi 2022

Font Size: 
THE EFFECTIVENESS OF THE MIXED EPSILON-GREEDY AND UPPER CONFIDENCE BOUND ALGORITHM FOR SOLVING MULTI-ARMED BANDIT PROBLEM
Hartono Hartono

Last modified: 2022-10-10

Abstract


The problem of multi-armed bandits is often encountered when someone has to make decisions in a sequence on several options whose effectiveness is not yet known. To be able to choose optimally, one must do exploration and exploitation first. Exploration is used to collect information regarding the effectiveness of each available option. On the other hand, exploitation is an activity to optimize the intended objective function by always choosing the best known option so far from exploration. These two processes must be carried out in a balanced and careful manner in order to obtain options that can optimize the desired objective function. One of the algorithms for conducting exploration and exploitation is the epsilon-greedy algorithm. This algorithm is often used because it is simple and easy to implement. However, this algorithm has some weaknesses. Because this algorithm usually only provides a small probability for exploration, the exploration is less than optimal. In addition, because the exploration factor has a constant value, exploration will also be carried out even though the optimal choice has been indicated from the feedback of the previous exploration. To overcome this, in this study, modifications to the epsilon-greedy algorithm will be made. The constant exploration factor will be replaced by a factor that is getting smaller from one stage to the next. This factor is taken from the exploration term in the upper confidence bound algorithm. The effectiveness of this mixed algorithm will be numerically simulated with a number of cases of the multi-armed bandit problem.


Keywords


epsilon-greedy algorithm, exploration and exploitation, multi-armed bandit problem, upper confidence bound algorithm