Venske S, de Almeida C and Delgado M. (2024). Metaheuristics and machine learning: an approach with reinforcement learning assisting neural architecture search. Journal of Heuristics. 30:3-4. (199-224). Online publication date: 1-Aug-2024.

https://doi.org/10.1007/s10732-024-09526-1

Du B, Qian K, Claudel C and Sun D. Multiagent Online Source Seeking Using Bandit Algorithm. IEEE Transactions on Automatic Control. 10.1109/TAC.2022.3232190. 68:5. (3147-3154).

https://ieeexplore.ieee.org/document/9999311/

Liu X, Zuo J, Wang S, Joe-Wong C, Lui J and Chen W. Batch-size independent regret bounds for combinatorial semi-bandits with probabilistically triggered arms or independent arms. Proceedings of the 36th International Conference on Neural Information Processing Systems. (14904-14916).

/doi/10.5555/3600270.3601354

Pesquerel F, Saber H and Maillard O. Stochastic bandits with groups of similar arms. Proceedings of the 35th International Conference on Neural Information Processing Systems. (19461-19472).

/doi/10.5555/3540261.3541750

Zhang R and Combes R. On the suboptimality of Thompson sampling in high dimensions. Proceedings of the 35th International Conference on Neural Information Processing Systems. (8345-8354).

/doi/10.5555/3540261.3540899