error Service Manager Issues [RESOLVED]

Jeremy · May 13, 2019

So today we are experiencing the issue in the image below on several people's logins to various applications. There are also reports of slowness and 'cannot connect to database' errors. Is there something going on that we need to be aware of?

nasimg · May 13, 2019

Same here

Nasim

Jeremy · May 13, 2019

Searching is producing this error

davidrb84 · May 13, 2019

Checker on support says

stuartmclennan · May 13, 2019

Mine is also the same, can you please advise, we cannot gain access at all.

Victor · May 13, 2019

@all

We are experiencing an issue with one of our data servers. Our infrastructure team is working on this. Will keep you updated.

nasimg · May 13, 2019

Came back but has gone down again

Nasim

Victor · May 13, 2019

@Jeremy @nasimg @davidrb84 @stuartmclennan

~~The issue has been resolved and full functionality restored on all affected instances. Let us know if any issues still persist. We are looking to see what caused the issue.~~

EDIT: it appears the issue is not fully resolved yet, working on it...

stuartmclennan · May 13, 2019

@Victor Hi Victor, our system has gone down again.

Victor · May 13, 2019

@all

Please follow https://status.hornbill.com/ for further updates

Our infrastructure team is working on this. Will update with more info.

Victor · May 13, 2019

@all

Infrastructure team confirms the issue should now be resolved and full functionality restored. Let us know if you have any issues. We are looking to see what caused the issue and will update when we have more information.

We are deeply sorry for all the trouble this has caused.

Jeremy · May 13, 2019

Thanks @Victor for the update, we are sure that you are all working hard for us and dealing with our long list of demands!

Deen · May 15, 2019

@all

Our Infrastructure team have completed their analysis and have determined the root cause, as follows:

At 14:09 our monitoring systems alerted us simultaneously to a number of issues with around 10% of our customer instances. All issues were related to the performance of the underlying disks on a given node, which would have resulted in customers reporting below-expected performance or occasional disconnects. Our cloud team immediately identified the root cause as a disk concurrency issue affecting one of the underlying nodes and began reducing the load.

The issue was resolved by 14:13.

At 14:16 the same issue occurred again and we undertook the same steps to resolve it. This was finally resolved at 14:19.

The root cause has been identified as an issue with session cloning (usually during elevation of Flowcode) when the same or multiple sessions are repeatedly cloned in a very short time and these have a large volume of cached data. The chance of this combination of events is small.

This caused concurrency issues with the other instances running on the same node/disks.

We have now identified the root cause and have a development plan to prevent the issue going forward (Session Cloning will no longer copy cached data unless forced). We expect to see this change rolled out over the next few weeks. Given the low likelihood of this occurring, we do not see the need to produce a patch.
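For readers who want a picture of what "will no longer copy cached data unless forced" could mean, here is a rough sketch only. It is not Hornbill's code: the `Session` class, the `clone_session` function, and the `copy_cache` flag are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Session:
    """Hypothetical session object; not Hornbill's real data model."""
    user_id: str
    cache: Dict[str, Any] = field(default_factory=dict)


def clone_session(source: Session, copy_cache: bool = False) -> Session:
    """Clone a session, copying cached data only when explicitly forced.

    The `copy_cache` flag is an assumed name for the "unless forced" option.
    """
    clone = Session(user_id=source.user_id)
    if copy_cache:
        # Shallow copy: the clone gets its own dict, so adding or removing
        # keys on one session does not affect the other.
        clone.cache = dict(source.cache)
    return clone


original = Session(user_id="user-1", cache={"report": "x" * 1_000_000})
light = clone_session(original)                  # default: cache left empty
full = clone_session(original, copy_cache=True)  # forced: cache duplicated
```

With a default like this, repeatedly cloning a session that holds a lot of cached data stays cheap, and the larger copy only happens when a caller asks for it.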

We apologise for any inconvenience this may have caused.
