
Service unavailable across multiple instances [RESOLVED]


Victor


The issue that caused the service outage has been fixed. We are investigating to identify the root cause and will come back with updates. We are truly sorry for the trouble this issue has caused.

@Stephen.whittle, as with any issue like this, we need to investigate and understand the cause (what and how). Once this is done, we will post the details on https://status.hornbill.com/ and in this forum thread.


  • Victor changed the title to Service unavailable across multiple instances [RESOLVED]

@all

We have now completed the RCA for this incident. Details as follows:

On Friday, 24th January, at 10:47, our monitoring systems alerted us to a problem with our configuration servers. The root cause has been identified and relates to an error that caused a partial replication of our configuration database. During the incident, some customer instances were affected and exhibited extended API response times, which for all intents and purposes made those instances unavailable. The problem was detected and rectified by 10:58, 11 minutes after the first alert was raised.

We have levels of redundancy built into our configuration database deployment, but this was a new failure scenario that we had not previously seen, and our resilience strategy failed under these conditions. We have now implemented a fix and deployed it to production to ensure that, under similar circumstances in the future, we will not see the same failure mode.

We unreservedly apologize for any inconvenience this caused.


  • Victor locked, unpinned and unfeatured this topic
This topic is now closed to further replies.