Jeremy Posted May 13, 2019 So today we are experiencing the issue in the image below on several people's logins to various applications. There are also reports of slowness and 'cannot connect to database' errors. Is there something going on that we need to be aware of?
stuartmclennan Posted May 13, 2019 Mine is the same. Can you please advise? We cannot gain access at all.
Victor Posted May 13, 2019 @all We are experiencing an issue with one of our data servers. Our infrastructure team is working on this. Will keep you updated.
Victor Posted May 13, 2019 @Jeremy @nasimg @davidrb84 @stuartmclennan The issue has been resolved and full functionality restored on all affected instances. Let us know if any issues still persist. We are looking to see what caused the issue. EDIT: it appears the issue is not fully resolved yet, working on it...
stuartmclennan Posted May 13, 2019 @Victor Hi Victor, our system has gone down again.
Victor Posted May 13, 2019 @all Please follow https://status.hornbill.com/ for further updates. Our infrastructure team is working on this. Will update with more info.
Victor Posted May 13, 2019 @all Infrastructure team confirms the issue should now be resolved and full functionality restored. Let us know if there are any issues. We are looking to see what caused the issue and will update when we have more information. We are deeply sorry for all the trouble this has caused.
Jeremy (Author) Posted May 13, 2019 Thanks @Victor for the update. We are sure that you are all working hard for us and dealing with our long list of demands!
Deen Posted May 15, 2019 @all Our Infrastructure team have completed their analysis and have determined the root cause as follows:

At 14:09 our monitoring systems alerted us simultaneously to a number of issues with around 10% of our customer instances. All issues were related to the performance of the underlying disks on a given node, which would have resulted in customers experiencing below-expected performance or occasional disconnects. Our cloud team immediately identified the cause as a disk concurrency issue affecting one of the underlying nodes and began reducing the load. The issue was resolved by 14:13.

At 14:16 the same issue occurred again and we undertook the same steps to resolve it. This finally ended at 14:19.

The root cause has been identified as an issue with session cloning (usually during elevation of Flowcode) when the same or multiple sessions are repeatedly cloned in a very short time and these have a large volume of cached data. The chance of this combination of events is small. This caused concurrency issues with the other instances running on the same node/disks.

We have a development plan to prevent the issue going forward (Session Cloning will no longer copy cached data unless forced) and we would expect to see this change rolled out over the next few weeks. Given the low likelihood of this recurring, we do not see the need to produce a separate patch.

We apologise for any inconvenience this may have caused.
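For readers curious what "Session Cloning will no longer copy cached data unless forced" might look like in practice, here is a minimal sketch in Python. It is purely illustrative: the `Session` class, the `cache` field, and the `copy_cache` flag are hypothetical names invented for this example and are not Hornbill's actual code or API. It only shows the general idea that a clone starts with an empty cache by default, so repeated cloning does not duplicate a large volume of cached data each time.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Session:
    """Hypothetical session object; an assumption, not Hornbill's data model."""
    user_id: str
    cache: Dict[str, Any] = field(default_factory=dict)

    def clone(self, copy_cache: bool = False) -> "Session":
        # Default: the clone starts with an empty cache, so cloning stays
        # cheap even when the original holds a lot of cached data.
        if not copy_cache:
            return Session(user_id=self.user_id)
        # Forced: deep-copy the cache so the clone is independent of the original.
        return Session(user_id=self.user_id, cache=copy.deepcopy(self.cache))


original = Session("u1", cache={"report": list(range(1000))})
light = original.clone()                # no cached data copied
full = original.clone(copy_cache=True)  # cached data copied on request
```

In this sketch the expensive copy only happens when the caller explicitly asks for it, which is one plausible way to reduce the load described above when many clones are created in a short time.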