Summary
Incident Start Time: 09:45am
Incident End Time: 11:15am
Business Services Impacted: CRM SPA app for a subset of customers
A scaling change made to the Azure infrastructure to mitigate database performance issues appears to have caused degraded CRM performance. The scaling change was reverted and one server was restarted, which caused a spike in response times for 5-10 minutes while it warmed up, but performance remained slow on a second server.
The second server was then restarted as well. It experienced a smaller spike while it warmed up, after which normal CRM performance resumed.
Timeline
09:45am CRM slowness was reported by customers. This was raised with our infrastructure team, who began investigating the issue.
10:12am The scaling changes made on Friday were reverted and one database instance was restarted.
10:25am The issue was raised again internally by staff, as the instance restart had not fully resolved the problem.
10:39am The DevOps team reported that a second instance, wn1mdwk0000V8, was still suffering from performance issues; it was rebooted at around 10:46am.
Staff were advised internally that the instances had been restarted and would continue to be slow for 5-10 minutes until they had warmed up.
The incident was marked as resolved at 11:17am, once several customers had confirmed that their performance issues were resolved.
Going Forward
We have identified the root cause of the issues caused by scaling the services, and have an updated deployment plan to prevent future scaling changes from causing performance spikes whenever a service is restarted.
The DevOps team has an update to review and deploy that works around the issue of Microsoft Azure routing requests to servers before they are ready. This should help resolve the performance spikes seen when updates are deployed or services are restarted. A rough sketch of the approach is shown below.
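As an illustration only (not the exact change the DevOps team will deploy), the sketch below assumes the CRM backend runs as a Node/Express service behind Azure App Service's Health check feature, which removes instances that fail a configured health path from load-balancer rotation. The /healthz path, the warmUp() helper, and the warm-up steps are hypothetical placeholders for whatever the real service needs to do before it can serve traffic quickly.

```typescript
// Readiness-gate sketch: a freshly restarted instance reports itself unhealthy
// until warm-up completes, so Azure stops routing customer requests to it
// while it is still cold. Assumes App Service Health check points at /healthz.
import express from "express";

const app = express();
let warmedUp = false;

// Illustrative warm-up: open connections and prime caches before declaring
// the instance ready. Replace with the real CRM warm-up steps.
async function warmUp(): Promise<void> {
  // await db.connect();                    // e.g. establish DB connection pool
  // await cache.preloadFrequentLookups();  // e.g. prime application caches
  warmedUp = true;
}

// Health check endpoint: return 503 until warm-up completes, 200 afterwards.
app.get("/healthz", (_req, res) => {
  res.status(warmedUp ? 200 : 503).send(warmedUp ? "ok" : "warming up");
});

const port = Number(process.env.PORT ?? 8080);
app.listen(port, () => {
  // Start warming up as soon as the process is listening.
  warmUp().catch((err) => console.error("warm-up failed", err));
});
```

With this pattern, a restarted instance still takes a few minutes to warm up, but customers are not routed to it until the health check passes, which should remove the response-time spikes seen during this incident.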