This has been replaced by the tofuinfratest project and can be removed.
Description
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | fnegri | T379133 SystemdUnitDown cloudcontrol1007:9100 The systemd unit opentofu-infra-diff.service on node cloudcontrol1007 has been failing for more than two hours.
Resolved | | rook | T379076 Remove tf-infra-test project
Event Timeline
Unfortunately OpenStack is terrible, and deleting a project does not seem to delete the VMs that were in that project. I noticed this because the /usr/local/bin/wmcs-dnsleaks script on the cloudcontrol nodes started to fail:
Nov 05 17:45:06 cloudcontrol1006 wmcs-dnsleaks[393289]: keystoneauth1.exceptions.http.NotFound: Could not find project: tf-infra-test. (HTTP 404)
A few other things to check when deleting a project are listed here: https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Projects_lifecycle#Deleting_a_project
Now I'm not sure how to find the names/ids of the VMs that were part of that project, because if I try wmcs-openstack server list --project tf-infra-test it fails with No project with a name or ID of 'tf-infra-test' exists.. 😆
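One way around the failed project lookup is to list servers across all projects and filter on the raw project_id field, which needs no lookup of the deleted project. A minimal sketch, assuming the server records are plain dicts with `id`, `name` and `project_id` keys (for example, parsed from the JSON output of an all-projects server listing); the function name, the record shape and the second sample record are hypothetical:

```python
from typing import Iterable


def servers_in_missing_projects(
    servers: Iterable[dict], existing_project_ids: Iterable[str]
) -> list[dict]:
    """Return the servers whose project_id is not a known project."""
    known = set(existing_project_ids)
    return [s for s in servers if s.get("project_id") not in known]


# Illustrative data only; the second record is made up.
servers = [
    {
        "id": "40560d4a-6b06-49be-bfcd-2565666ef95d",
        "name": "tf-bastion",
        "project_id": "tf-infra-test",
    },
    {"id": "hypothetical-id", "name": "other-vm", "project_id": "tofuinfratest"},
]
orphans = servers_in_missing_projects(servers, ["tofuinfratest"])
print([s["name"] for s in orphans])
```

The filter compares plain strings, so it keeps working after the project itself is gone.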
I'm logging off for today, @rook let me know if you manage to find and delete those VMs, in any case I don't think this is causing any serious issue so no rush to fix it today.
I believe 40560d4a-6b06-49be-bfcd-2565666ef95d is our system:
openstack server show 40560d4a-6b06-49be-bfcd-2565666ef95d
Field | Value
---|---
OS-DCF:diskConfig | MANUAL
OS-EXT-AZ:availability_zone | nova
OS-EXT-SRV-ATTR:host | cloudvirt1065
OS-EXT-SRV-ATTR:hostname | tf-bastion
OS-EXT-SRV-ATTR:hypervisor_hostname | cloudvirt1065.eqiad.wmnet
OS-EXT-SRV-ATTR:instance_name | i-0008f549
OS-EXT-SRV-ATTR:kernel_id |
OS-EXT-SRV-ATTR:launch_index | 0
OS-EXT-SRV-ATTR:ramdisk_id |
OS-EXT-SRV-ATTR:reservation_id | r-swt70ova
OS-EXT-SRV-ATTR:root_device_name | /dev/sda
OS-EXT-SRV-ATTR:user_data | None
OS-EXT-STS:power_state | Shutdown
OS-EXT-STS:task_state | None
OS-EXT-STS:vm_state | stopped
OS-SRV-USG:launched_at | 2024-06-20T13:21:14.000000
OS-SRV-USG:terminated_at | None
accessIPv4 |
accessIPv6 |
addresses | lan-flat-cloudinstances2b=172.16.5.68
config_drive |
created | 2023-06-26T17:00:34Z
description | run terraform from here
flavor | description=, disk='20', ephemeral='0', extra_specs.aggregate_instance_extra_specs:ceph='true', extra_specs.aggregate_instance_extra_specs:network-agent='ovs', extra_specs.quota:disk_read_iops_sec='5000', extra_specs.quota:disk_total_bytes_sec='200000000', extra_specs.quota:disk_write_iops_sec='500', id='g4.cores2.ram4.disk20', is_disabled=, is_public='True', location=, name='g4.cores2.ram4.disk20', original_name='g4.cores2.ram4.disk20', ram='4096', rxtx_factor=, swap='0', vcpus='2'
hostId | b5e697050cf670f9f86d5b7b77304ae3d457927bda9222b5d49bf493
host_status | UP
id | 40560d4a-6b06-49be-bfcd-2565666ef95d
image | debian-12.0-bookworm (deprecated 2023-11-27) (d753f466-40e4-471d-ba4b-7cbab78d827b)
key_name | None
locked | False
locked_reason | None
name | tf-bastion
progress | None
project_id | tf-infra-test
properties |
security_groups | name='default'
server_groups | []
status | SHUTOFF
tags |
trusted_image_certificates | None
updated | 2024-09-03T17:36:11Z
user_id | rook
volumes_attached |
Do we feel that running openstack server delete 40560d4a-6b06-49be-bfcd-2565666ef95d would be safe?
I think so... but the openstack API might have different opinions. I think it's worth trying.
Looks like it took:
openstack server show 40560d4a-6b06-49be-bfcd-2565666ef95d
No Server found for 40560d4a-6b06-49be-bfcd-2565666ef95d
I don't see that file on either cloudcontrol1005.eqiad.wmnet or cloudcontrol1007.eqiad.wmnet
We track project existence via opentofu for Cloud VPS.
The tf-infra-test project needs to be deleted from the tofu-infra repository, see https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/OpenTofu
Because it was deleted by hand, the project has been created again by the system.
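The recreation follows from how declarative tooling works in general: anything still declared but missing in the cloud gets planned for creation again. A toy model of that comparison, not the actual tofu-infra or OpenTofu code; the `plan` function is hypothetical and the project names are only sample data:

```python
def plan(declared: set[str], existing: set[str]) -> dict[str, list[str]]:
    """Compare declared projects with existing ones, as a declarative tool would."""
    return {
        "create": sorted(declared - existing),   # declared but missing: recreated
        "destroy": sorted(existing - declared),  # present but no longer declared
    }


# Deleting by hand only changes `existing`, so the project comes back.
by_hand = plan(
    declared={"tf-infra-test", "tofuinfratest"},
    existing={"tofuinfratest"},
)
# Removing it from the declaration first makes the tool do the deletion.
via_repo = plan(
    declared={"tofuinfratest"},
    existing={"tf-infra-test", "tofuinfratest"},
)
print(by_hand)
print(via_repo)
```

In this model the only durable way to remove a project is to remove it from the declared set, which matches the merge request route described above.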
I don't see that file on either cloudcontrol1005.eqiad.wmnet or cloudcontrol1007.eqiad.wmnet
My bad, I thought it was installed on all cloudcontrols, but apparently it's only installed in cloudcontrol1006.
I did run it now, and it's no longer failing.
fnegri opened https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/116
Remove project tf-infra-test
fnegri merged https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/116
Remove project tf-infra-test
It doesn't have a replacement without the - symbol, though it has also never really been implemented. So either keeping or removing it is an appropriate choice.
Thanks, I vote for removing it then, a new project without the - symbol can be created when/if we want to implement the tests in codfw.
fnegri merged https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/117
Remove tf-infra-dev project in codfw
Of course I made the same mistake and deleted tf-infra-dev without checking whether it contained servers or other resources.
@rook found two VMs (8bb53a94-dee2-4aec-8aeb-e752b8bb0cf0 and c31e3cdd-eece-44bb-bf2b-fa9d49bee66f) that belonged to tf-infra-dev. I have now deleted those with openstack server delete.
There might be other resources but it's tricky to find them. I discovered openstack provides the project cleanup command, but that only works on existing projects, not on deleted ones.
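A guard that would have caught both incidents is to refuse to delete a project while it still owns anything. A minimal sketch under stated assumptions: the inventory is a hypothetical dict mapping a resource type to the IDs found in the project, gathered however is convenient while the project still exists; both function names are made up for illustration:

```python
def leftover_resources(inventory: dict[str, list[str]]) -> dict[str, list[str]]:
    """Keep only the resource types that still have entries."""
    return {kind: ids for kind, ids in inventory.items() if ids}


def check_project_is_empty(project: str, inventory: dict[str, list[str]]) -> None:
    """Raise if the project still owns resources, so deletion is aborted."""
    leftovers = leftover_resources(inventory)
    if leftovers:
        raise RuntimeError(f"{project} still owns resources: {leftovers}")


# Illustrative inventory using the two VM IDs found in tf-infra-dev.
inventory = {
    "servers": [
        "8bb53a94-dee2-4aec-8aeb-e752b8bb0cf0",
        "c31e3cdd-eece-44bb-bf2b-fa9d49bee66f",
    ],
    "volumes": [],
}
try:
    check_project_is_empty("tf-infra-dev", inventory)
except RuntimeError as err:
    print(err)
```

Running the check before the project is removed matters because, as noted above, lookups by project stop working once it is gone.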