There are a lot of blog posts talking about the fact that testing in prod should not be a taboo like it may have been in the 90s. I've read some of these [1] [2], I get the arguments in favour of it, and I want to try some experiments.
My question is -- how does one go about doing it _safely_? In particular, I'm thinking about data. Is it common practice to inject fabricated data into a prod system to run such tests? What's the best practice or prior art on doing this well?
Ultimately, I think this will end up looking like implementing SLIs and SLOs in PROD, but for some of my SLOs, I think I need to actually _fake_ the data in order to get the SLIs I need, so how to do this?
Suggestions appreciated -- thanks.
[1] https://increment.com/testing/i-test-in-production/
[2] https://segment.com/blog/we-test-in-production-you-should-too/
Some common ways to go about it:
- Feature flags: every new change goes into your codebase behind a flag. You can flip the flag for a limited set of users and do a broader rollout when ready.
- Staged rollouts: have staging/canary etc. environments and roll out new deployments to them first. Observe metrics and alerts to check if something is wrong.
- Beta releases: have a group of internal/external power users test your features before they go out to the world.