USCA_4 service issues after AC and power outages
Incident Report for Bandwagon Host Status
Resolved
Confirmed no outstanding issues. We are closing this incident.
Posted Jun 07, 2024 - 10:59 PDT
Update
Node 1121 is online now. All remaining VMs will start within 20 minutes.

We are performing a final verification of all systems to make sure there are no outstanding items before closing this incident.
Posted Jun 07, 2024 - 10:21 PDT
Update
Node 1121: The RAID controller has been replaced; however, it requires a firmware update. We're starting the update procedure.
Posted Jun 07, 2024 - 09:35 PDT
Update
Node 1123 has been restored. All VMs will start within 15 minutes.

The last node, 1121, is still being worked on.
Posted Jun 07, 2024 - 09:19 PDT
Update
Most nodes have been restored.
Node 1120 is starting; all VMs should be up within 15 minutes.

Node 1121 is having RAID issues; we are replacing the controller.
Node 1123 is having CPU issues; we are preparing a replacement chassis and will move all SSDs into it.
Posted Jun 07, 2024 - 08:20 PDT
Update
We have restored approx. 50% of affected VMs in USCA_4. Unfortunately, we are facing hardware failures due to the earlier datacenter overheating issue.
Posted Jun 07, 2024 - 07:04 PDT
Update
We are continuing to work on a fix for this issue.
Posted Jun 07, 2024 - 06:00 PDT
Identified
We have determined that the temperature sensors in all servers show erratic values after the last incident. We are going to gracefully shut down all equipment and attempt to fully reset the iDRAC controllers.
Posted Jun 07, 2024 - 05:23 PDT
Investigating
We have noticed degraded performance of the USCA_4 datacenter after the previous issue was resolved. We are investigating this further.
Posted Jun 07, 2024 - 05:14 PDT