Processing delays with accounts/ERP sync and order confirmations
Incident Report for Prospect CRM
Postmortem

Summary

Incident Start Time: 8:30am

Incident End Time: 9:30am

Business Services Impacted: Cloud integration services (including order confirmation) for a subset of customers

The Microsoft West Europe datacentre experienced an issue that forced a switch from utility power to generator power. A subset of the generators failed to take over as expected during the switch, which caused the impact. Microsoft engineers restored power at 9am and most services were restored by 9:15am; however, the incident remained live on their status page (https://azure.status.microsoft/en-gb/status/) for hours after this time. Microsoft have posted their own post-mortem covering the incident and their plans going forward: https://azure.status.microsoft/en-gb/status/history/

 

The product team quickly identified that Orchestrated Integration Ring 2 was timing out requests (due to the outage) and switched all cloud ERP integrations to point to Ring 1, which was still answering requests successfully. However, any order confirmation requests sent to Ring 2 during the outage had to be terminated, and customers were advised to retry the confirmation.

Timeline

At approximately 8:30am on 20/10/23, requests being sent to Ring 2 of the Orchestrated cloud integration servers were no longer completing, due to a power issue at Microsoft's West Europe datacentre.

The issue was flagged both by in-house alerting and by users. The product team were already aware and had notified the support team by the time user reports were coming in.

The Product Team Lead switched the cloud integrations pointing at Ring 2 over to Ring 1, which was not experiencing this issue, and subsequent requests completed successfully. The switch was completed by 9:30am. Any order confirmation requests sent to Ring 2 were unable to complete and, to avoid duplication, were terminated. This meant that the affected orders had to be requeued.
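
The failover described above can be sketched roughly as follows. This is an illustration only and not Prospect CRM's actual implementation: the Integration and Confirmation records, the status values and the fail_over function are all hypothetical names chosen for the example. It shows the two steps from the timeline: re-pointing integrations from the failing ring to the healthy one, and terminating (rather than replaying) in-flight confirmations so that no order is confirmed twice.

```python
from dataclasses import dataclass


@dataclass
class Integration:
    """Hypothetical record of which ring a customer's integration uses."""
    customer: str
    ring: int


@dataclass
class Confirmation:
    """Hypothetical order confirmation request sent to a ring."""
    order_id: str
    ring: int
    status: str = "in_flight"  # assumed values: in_flight, completed, terminated


def fail_over(integrations, confirmations, failed_ring, healthy_ring):
    """Re-point integrations away from failed_ring and terminate its
    in-flight confirmations. Returns the order IDs that must be retried."""
    # Step 1: move every integration on the failing ring to the healthy ring.
    for integration in integrations:
        if integration.ring == failed_ring:
            integration.ring = healthy_ring

    # Step 2: terminate requests that were stuck on the failing ring.
    # They are not replayed automatically, because it is unknown whether
    # the ERP side already processed them; replaying could duplicate orders.
    to_retry = []
    for confirmation in confirmations:
        if confirmation.ring == failed_ring and confirmation.status == "in_flight":
            confirmation.status = "terminated"
            to_retry.append(confirmation.order_id)
    return to_retry


if __name__ == "__main__":
    integrations = [Integration("acme", 2), Integration("globex", 1)]
    confirmations = [
        Confirmation("SO-1001", ring=2),
        Confirmation("SO-1002", ring=1, status="completed"),
    ]
    affected = fail_over(integrations, confirmations, failed_ring=2, healthy_ring=1)
    print("Orders to retry:", affected)
```

The returned list plays the role of the list of affected orders handed to the support team, so that customers can be told which confirmations to retry.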

 

A list of affected orders was provided to the support team to ensure that all directly affected customers had been informed and were able to resolve orders that had not confirmed successfully.

This incident was left open until Microsoft closed theirs at around 3pm. The product team plan to allow the regular update process to disperse integrations back to Ring 2 rather than perform a manual migration.

Posted Oct 25, 2023 - 11:41 BST

Resolved
This incident has been resolved. Microsoft have now successfully recovered the impacted services.
Posted Oct 20, 2023 - 14:40 BST
Update
Microsoft have restored power to the impacted infrastructure and are working on recovering the remaining services.

https://azure.status.microsoft/en-gb/status
Posted Oct 20, 2023 - 12:37 BST
Update
Microsoft have acknowledged the incident currently affecting their West Europe infrastructure and are working to restore service.

https://azure.status.microsoft/en-gb/status
Posted Oct 20, 2023 - 10:08 BST
Monitoring
A fix has been implemented and we are monitoring the results. Any orders that were confirmed to Cloud ERP systems prior to 9:30am and are still in the "Waiting to be Confirmed" status will need to be cancelled, restored and reconfirmed.
Posted Oct 20, 2023 - 09:42 BST
Identified
The issue has been identified and a fix is being implemented.
Posted Oct 20, 2023 - 09:22 BST
Investigating
Some customers may be experiencing delays with data synchronising between their accounts/ERP system and Prospect, including order confirmations. We are currently investigating the issue and will provide further updates as information becomes available.
Posted Oct 20, 2023 - 09:15 BST
This incident affected: Automation and Integration (Cloud Inventory Management Integration Services, Hybrid & On-Premise ERP Integrations).