Service unavailable across multiple instances [RESOLVED]

Victor · January 24, 2020

Service is unavailable across multiple instances. We are looking into this.

Stephen.whittle · January 24, 2020

Ours is back up, Victor, and seems OK performance-wise. Are you able to provide any assurance yet regarding the resolution?

Stuart Torres-Catmur · January 24, 2020

Ditto - up for us too

Victor · January 24, 2020

The issue that caused the service outage has been fixed. We are investigating what caused it and will come back with updates. We are truly sorry for the trouble this issue has caused.

@Stephen.whittle as with any issue like this, we need to investigate and understand the cause (what and how). Once this is done, we will post the details on https://status.hornbill.com/ and on this forum thread.

Stephen.whittle · January 24, 2020

Thanks as always @Victor

nasimg · January 24, 2020

Hornbill has stopped again

Victor · January 24, 2020

@nasimg we're looking into it

Victor · January 24, 2020

@nasimg your instance should now be back up and running

nasimg · January 24, 2020

Yes - hopefully it's going to stay up.

Victor · January 28, 2020

@all

We have now completed the RCA for this incident. Details as follows:

On Friday, 24th January, at 10:47, our monitoring systems alerted us to a problem with our configuration servers. The root cause has been identified and relates to an error that caused a partial replication of our configuration database. During this time, some customer instances were affected and exhibited extended API response times, which for all intents and purposes made those instances unavailable. The problem was detected and rectified by 10:58, 11 minutes after the first alert was raised.

We have levels of redundancy built into our configuration database deployment, but this was a new failure scenario that we had not previously seen, and our resilience strategy failed under these conditions. We have now implemented a fix and deployed it to production to ensure that, under similar circumstances in the future, we will not see the same failure mode.

We unreservedly apologize for any inconvenience this caused.
