This is the meta task for the September 2023 Datacenter switchover (eqiad -> codfw).
Schedule
Switchover
- Services & Caching (Traffic): Tuesday, September 19th, 2023 14:00 UTC
- MediaWiki: Wednesday, September 20th, 2023 14:00 UTC
kamila@cumin1001 - Cookbook cookbooks.sre.discovery.datacenter pool all services in eqiad: Datacenter Switchover: eqiad repool - T345263 started.
Mentioned in SAL (#wikimedia-operations) [2023-09-27T14:08:01Z] <kamila@cumin1001> START - Cookbook sre.discovery.datacenter pool all services in eqiad: Datacenter Switchover: eqiad repool - T345263
Mentioned in SAL (#wikimedia-operations) [2023-09-27T14:22:31Z] <claime> Repooling eqiad services in progress - T345263
kamila@cumin1001 - Cookbook cookbooks.sre.discovery.datacenter pool all services in eqiad: Datacenter Switchover: eqiad repool - T345263 completed.
Mentioned in SAL (#wikimedia-operations) [2023-09-27T14:29:15Z] <kamila@cumin1001> END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all services in eqiad: Datacenter Switchover: eqiad repool - T345263
Mentioned in SAL (#wikimedia-operations) [2023-09-27T16:09:27Z] <kamila_> Pooled back eqiad for traffic after the DC switchover (T345263)
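For reference, the duration of the repool cookbook run can be worked out from the START and END SAL timestamps above. A minimal sketch, assuming the timestamps are UTC in the `YYYY-MM-DDTHH:MM:SSZ` form shown in the log; the helper name `parse_sal_timestamp` is made up for illustration and is not part of any Wikimedia tooling:

```python
from datetime import datetime, timezone


def parse_sal_timestamp(ts: str) -> datetime:
    """Parse a SAL-style UTC timestamp such as 2023-09-27T14:08:01Z (hypothetical helper)."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


# Timestamps copied from the START and END (PASS) log entries for the cookbook run.
start = parse_sal_timestamp("2023-09-27T14:08:01Z")
end = parse_sal_timestamp("2023-09-27T14:29:15Z")

elapsed = end - start
minutes, seconds = divmod(int(elapsed.total_seconds()), 60)
print(f"Repool cookbook ran for {minutes}min {seconds}s")
```

This covers only the cookbook run itself, not the later repool of eqiad for traffic logged at 16:09:27Z.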
All disruptive switchover-related work is finished and things are stable. The switchover went smoothly and had minimal user impact, while also uncovering issues that we need to know about in order to improve our infrastructure and processes. The read-only period lasted 2min 22s.
I am extending thanks to everyone involved in the preparation as well as everyone who was present during the switchover, helping monitor or fix issues. A special shoutout goes to everyone who contributed to MultiDC, which massively reduces switchover complexity, and to our DBAs, who do the really scary part. Thanks as well to Community Relations, who notified communities of the read-only window.
One of the issues we discovered during the switchover was a MW-on-k8s capacity issue in codfw. My colleagues were able to address it very quickly. This is the sort of issue that the switchover process is meant to find, so it worked as intended :-)