Investigating reports of Non-Streamed data missing from CPM Monitoring

Incident Report for Vyopta

Postmortem

The incident was caused primarily by an unhandled exception in a critical microservice responsible for data processing, which crashed the service. The crash went undetected, so data processing stalled and tasks were not completed within the expected timeframe.

Upon discovering the data processing stall, our operations team promptly began investigating the cause. To resolve the issue, we restarted the crashed microservice, which allowed data processing to resume. The investigation found no data loss.

We are continuously improving our monitoring infrastructure to include automated checks of microservice health and performance. This includes proactive monitoring of critical metrics, such as resource usage, response times, and error rates, so that anomalies and crashes are detected and raise alerts.
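As an illustration only, the kind of automated check described above might look like the following Python sketch. It is not the monitoring Vyopta runs: the `ServiceHealth` class, the `check_health` function, the service name, and both thresholds are hypothetical, chosen to show how a silent crash (no recent heartbeat) and an elevated error rate could each raise an alert.

```python
from dataclasses import dataclass


@dataclass
class ServiceHealth:
    """Heartbeat and request counters for one service (hypothetical model)."""
    name: str
    last_heartbeat: float  # timestamp, in seconds, of the last heartbeat seen
    requests: int = 0
    errors: int = 0

    def record_heartbeat(self, now: float) -> None:
        self.last_heartbeat = now

    def record_request(self, ok: bool) -> None:
        self.requests += 1
        if not ok:
            self.errors += 1


def check_health(svc: ServiceHealth, now: float,
                 max_silence_s: float = 60.0,
                 max_error_rate: float = 0.05) -> list[str]:
    """Return alert messages; an empty list means the service looks healthy.

    Both thresholds are assumed values for illustration.
    """
    alerts = []
    # A crashed or stalled service stops sending heartbeats.
    silence = now - svc.last_heartbeat
    if silence > max_silence_s:
        alerts.append(
            f"{svc.name}: no heartbeat for {silence:.0f}s (possible crash or stall)"
        )
    # A service that is up but failing shows a rising error rate.
    if svc.requests > 0:
        rate = svc.errors / svc.requests
        if rate > max_error_rate:
            alerts.append(
                f"{svc.name}: error rate {rate:.0%} exceeds {max_error_rate:.0%}"
            )
    return alerts
```

Run on a schedule against each critical service, a check of this shape would flag a crashed process once its heartbeat goes quiet for longer than the threshold, instead of relying on someone noticing missing data downstream.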

We regret the inconvenience caused by the data processing stall resulting from the undetected microservice crash, and we are committed to implementing the necessary measures to prevent such incidents in the future.

Posted Jul 06, 2023 - 12:01 CDT

Resolved

We have been monitoring the service since the fix was implemented and have not seen the issue recur. All Non-Streamed data going to CPM Monitoring has been processed, and no service delays have been observed.

Posted Jul 05, 2023 - 08:46 CDT

Monitoring

The service responsible for sending Non-Streamed data to CPM Monitoring had failed. The service has been restarted and Non-Streamed data is now flowing again. Because the service is still processing the accumulated data, Microsoft Teams and Cisco Call Manager data will be delayed until it catches up. We will monitor the progress and post an update once processing is complete.

Posted Jul 03, 2023 - 14:40 CDT

Investigating

We are currently investigating reports of Non-Streamed data missing from CPM Monitoring. This currently impacts Microsoft Teams calls and Cisco Call Manager.

Posted Jul 03, 2023 - 14:00 CDT

This incident affected: Commercial - my.vyopta.com (CPM Monitoring).