Automation is playing an increasingly important role in diverse areas, including environmental monitoring, energy management, healthcare, manufacturing, intelligence gathering, and defense. Policies for automated agents operating in complex and uncertain environments need to address several decision-making trade-offs. They should also adapt to the state of the environment as information is collected. We focus on foundational research on sequential decision-making in such uncertain environments. Our efforts are focused on the following two key themes:

1. Multiarmed Bandit Problems and Sequential Hypothesis Testing
2. Information-theoretic Search and Surveillance

1. Multiarmed Bandit (MAB) Problems and Sequential Hypothesis Testing

The multiarmed bandit (MAB) problem, in which a decision maker allocates a single resource by repeatedly choosing one among a set of competing alternative options, exemplifies the explore-versus-exploit trade-off, i.e., the choice between the most informative and the most rewarding actions. Sequential hypothesis testing concerns the speed-accuracy trade-off: deciding quickly versus deciding reliably on a set of alternatives.

Human Decision-making and Satisficing in MAB Problems. We developed Bayesian heuristics with provable performance guarantees for stochastic MAB problems with spatially embedded options, i.e., problems in which the set of options constitutes a physical landscape. The spatial embedding introduces a correlation scale across the values of the options. These heuristics are based on the so-called upper confidence bound and the Boltzmann action-selection rule. Using empirical data from human-subject experiments, we showed that the proposed heuristics capture human decision-making performance well, and that a person’s decision-making performance depends critically on the Bayesian priors, particularly on the correlation scale. A trained human operator’s decision making implicitly involves good correlation scales that can be learned from the operator’s actions and then used by automation operating in unknown and uncertain environments.

A satisficing objective replaces maximization and perfect learning with satisfaction and sufficient learning. In this spirit, we introduced the notion of satisficing in the multiarmed bandit problem.

Heavy-tailed and Non-stationary MAB Problems. We studied the stationary MAB problem with heavy-tailed reward distributions and analyzed the worst-case regret, which is the supremum of the regret over the set of admissible reward distributions. We proposed a novel algorithm that achieves a worst-case regret matching the lower bound while maintaining a distribution-dependent logarithmic regret. In a non-stationary MAB problem, the reward distribution associated with each arm is assumed to be time-varying, and the total variation in the expected rewards is subject to a variation budget. For this setting, we focused on both light-tailed and heavy-tailed reward distributions and analyzed the worst-case regret. We designed several Upper-Confidence Bound (UCB)-based policies that are order-optimal with respect to the minimax regret, i.e., the minimum worst-case regret. We also studied several algorithms, under the standard notion of regret, for certain classes of non-stationary environments.
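The UCB-based policies discussed above all share a common structure: pull the arm whose empirical mean plus an exploration bonus is largest. As a point of reference, here is a minimal sketch of the standard UCB1 index policy, not the specific heavy-tailed or non-stationary policies from our papers (those replace the empirical mean and confidence radius with robust or windowed estimates); the function and variable names are illustrative.

```python
import math
import random

def ucb1(pull, n_arms, horizon):
    """Generic UCB1 index policy (illustrative sketch).

    `pull(arm)` returns one stochastic reward. Heavy-tailed and
    non-stationary variants replace the running mean and the
    sqrt(2 log t / n) bonus with robust / windowed estimators.
    """
    counts = [0] * n_arms          # number of pulls per arm
    means = [0.0] * n_arms         # empirical mean reward per arm
    total_reward = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1            # initialize: pull each arm once
        else:
            # index = empirical mean + exploration bonus
            arm = max(range(n_arms),
                      key=lambda a: means[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]   # running mean
        total_reward += r
    return total_reward, counts

# Toy usage: two Bernoulli arms with success probabilities 0.3 and 0.7.
random.seed(0)
probs = [0.3, 0.7]
reward, counts = ucb1(lambda a: 1.0 if random.random() < probs[a] else 0.0,
                      n_arms=2, horizon=2000)
```

Because the exploration bonus shrinks as an arm is sampled, the policy concentrates its pulls on the better arm while still sampling the other at a logarithmic rate, which is the source of the logarithmic distribution-dependent regret mentioned above.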
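The speed-accuracy trade-off in sequential hypothesis testing described above can be illustrated by Wald's sequential probability ratio test (SPRT): accumulate the log-likelihood ratio of the observations and stop as soon as it crosses a threshold set by the desired error rates. This is a generic textbook sketch, not a specific procedure from our papers; `observe` and `logL` are placeholder names.

```python
import math
import random

def sprt(observe, logL, alpha=0.05, beta=0.05, max_steps=10000):
    """Wald's sequential probability ratio test (illustrative sketch).

    `observe()` draws one sample; `logL(x)` is log p1(x)/p0(x).
    Thresholds use Wald's approximations for error rates alpha, beta.
    """
    upper = math.log((1 - beta) / alpha)   # cross above: accept H1
    lower = math.log(beta / (1 - alpha))   # cross below: accept H0
    s = 0.0
    for n in range(1, max_steps + 1):
        s += logL(observe())               # accumulate evidence
        if s >= upper:
            return "H1", n
        if s <= lower:
            return "H0", n
    return "undecided", max_steps

# Toy usage: H0: p = 0.4 vs H1: p = 0.6, data drawn with p = 0.6.
random.seed(1)
p0, p1, p_true = 0.4, 0.6, 0.6
obs = lambda: 1 if random.random() < p_true else 0
llr = lambda x: (math.log(p1 / p0) if x == 1
                 else math.log((1 - p1) / (1 - p0)))
decision, n_samples = sprt(obs, llr)
```

Tightening `alpha` and `beta` widens the thresholds and so lengthens the test, which is exactly the deciding-quickly-versus-reliably trade-off.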