Abstract
This paper presents a simulation-optimization approach to strategic workforce planning based on deep reinforcement learning. A domain expert expresses the organization’s high-level, strategic workforce goals over the workforce composition, and a policy that optimizes these goals is then learned in a simulation-optimization loop. Any suitable simulator can be used, and we describe how one can be derived from historical data. Because the optimizer is driven by deep reinforcement learning, it optimizes directly for the high-level strategic goals. We compare the proposed approach with a linear programming-based approach on two types of workforce goals. The first type, called operational in this work, consists of a target workforce; it is relatively easy to optimize for but hard to specify in practice. The second, strategic type of goal is a possibly non-linear combination of high-level workforce metrics; such goals are easy for domain experts to specify but may be hard to optimize for with existing approaches. The proposed approach performs significantly better on the strategic goal and comparably on the operational goal, both for a synthetic and for a real-world organization. Because it directly optimizes workforce goals that may be non-linear in the workforce composition and composed of arbitrary workforce composition metrics, the approach has large potential for impact in the workforce planning domain.
Y. Smit and F. den Hengst—Authors contributed equally.
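To make the simulation-optimization loop described above concrete, the sketch below shows a minimal, hypothetical cohort-based workforce simulator with a gym-style reset/step interface and a reward built from high-level workforce metrics. All quantities here (number of cohorts, attrition rate, the span-of-control target, the head-count budget) are illustrative assumptions, not the authors' actual model; in the paper's setting the random hiring policy in the usage example would be replaced by a neural-network policy trained with a deep reinforcement learning algorithm such as PPO.

```python
# Minimal sketch of the simulation-optimization loop (illustrative assumptions only).
import numpy as np

class WorkforceSimEnv:
    """Cohort-based workforce simulator with a gym-style reset/step interface."""

    def __init__(self, n_cohorts=5, x_max=100, attrition=0.10, horizon=36, seed=0):
        self.n = n_cohorts          # cohorts, e.g. job grades (see note 2)
        self.x_max = x_max          # maximum head count per cohort
        self.attrition = attrition  # assumed per-period attrition probability
        self.horizon = horizon      # planning horizon in periods
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.t = 0
        # start from a random workforce composition
        self.state = self.rng.integers(0, self.x_max + 1, size=self.n)
        return self.state.copy()

    def step(self, hires):
        """hires[i]: number of external hires into cohort i this period."""
        leavers = self.rng.binomial(self.state, self.attrition)
        self.state = np.clip(self.state - leavers + hires, 0, self.x_max)
        self.t += 1
        return self.state.copy(), self._reward(), self.t >= self.horizon

    def _reward(self):
        # Strategic goal: a possibly non-linear function of high-level workforce
        # metrics. Illustrative here: penalize deviation from a span of control
        # of 7 (note 1) and from a total head-count budget of 300.
        managers, staff = self.state[-1], self.state[:-1].sum()
        span = staff / max(managers, 1)
        return -abs(span - 7.0) - 0.01 * abs(self.state.sum() - 300)

# Usage: roll out a random hiring policy. In the simulation-optimization loop,
# this policy would instead be a neural network trained with deep RL (e.g. PPO).
env = WorkforceSimEnv()
state, done, total_return = env.reset(), False, 0.0
while not done:
    hires = env.rng.integers(0, 5, size=env.n)
    state, reward, done = env.step(hires)
    total_return += reward
print("return of random hiring policy:", round(total_return, 2))
```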
Notes
1. The average number of direct reports of managers in the organization.
2. A metric to express responsibilities and expectations of a role in the organization, usually associated with compensation in some way.
3. For a model with \(n=30\) cohorts and \(X_{\text{max}}=100\) maximum employees per cohort, the number of transitions in the Markov chain is \(|\mathcal{S}\times \mathcal{S}| = \prod_{i=1}^{n}(X_{\text{max}}+1)^2 \approx 10^{120}\); a short numerical check of this estimate follows this list.
4. Code and data for the hypothetical use case are available at https://github.com/ysmit933/swp-with-drl-release. Real-life use case data will be made available upon request.
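As a quick sanity check on the order of magnitude in note 3, the illustrative snippet below recomputes the size of the transition space under the same assumptions (30 cohorts, at most 100 employees per cohort).

```python
# Illustrative check of the transition-count estimate in note 3.
from math import log10

n, x_max = 30, 100                 # cohorts and maximum head count per cohort
num_states = (x_max + 1) ** n      # |S|: each cohort holds 0..x_max employees
num_transitions = num_states ** 2  # |S x S|
print(round(log10(num_transitions)))  # -> 120, i.e. roughly 10^120
```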
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Smit, Y., den Hengst, F., Bhulai, S., Mehdad, E. (2023). Strategic Workforce Planning with Deep Reinforcement Learning. In: Nicosia, G., et al. Machine Learning, Optimization, and Data Science. LOD 2022. Lecture Notes in Computer Science, vol 13811. Springer, Cham. https://doi.org/10.1007/978-3-031-25891-6_9