Victor

Administrators
  • Content Count

    4,141
  • Days Won

    100

Victor last won the day on March 27

Victor had the most liked content!

Community Reputation

524 Excellent

6 Followers

About Victor

  • Rank
    Hornbill Sith Lord
  • Birthday 02/22/2010

Recent Profile Visitors

3,074 profile views
  1. @Anthony Albon have a read on our wiki here: https://wiki.hornbill.com/index.php/Using_the_Timesheet_Manager_Plugin and here: https://wiki.hornbill.com/index.php/Configuring_the_Timesheet_Manager_Plugin_for_Service_Manager. General info about TM is here: https://wiki.hornbill.com/index.php/Timesheet_Manager. Let us know if you have any further queries.
  2. @all On 26/03/2020 we deployed an automatic platform software update to our host controllers, which was applied to all live instances overnight. Code changes in this update caused IPC blocking behavior that made Hornbill APIs unresponsive and backlogged. The APIs would eventually time out and fail, which resulted in service disruption and outages across multiple customer instances. How this event unfolded: At 07:20 customers reported that their mailboxes were receiving email processing failure notifications. Initial investigation revealed that Hornbill was unable to process certain emails. Our initial measures focused on fixing this, and early indications were that the issue was resolved. At 08:30 it became apparent that the issue was not isolated to the mail service and that other areas were also affected (confirmed by reports of various issues with BPE). The initial measures also proved ineffective, as the issue resurfaced. Further investigation revealed that, for a reason unknown at the time, all HTTP connections were reporting failures, suggesting the issue was affecting the instance services. We deployed a further set of measures by restarting services on all affected instances. This resolved the issue only briefly: it resurfaced shortly after the restart, which indicated that the cause lay somewhere outside the instance services. At 08:45 we located the issue in the logging system managed by the host controller, and a decision was made to roll back the host controller update deployed the previous day and applied overnight. We immediately started the rollback on all live nodes and completed it at 10:20, when full service was restored across all customer instances. While we know the issue lies somewhere in the event logging mechanism, we don't yet have the full details of how the IPC blocking occurred and we are conducting further investigations.
As an interim measure, we will not update our host controllers until the issue is fully understood and code changes are in place to prevent it from happening again. Further measures include additional checks to ensure the event log mechanism will not cause this or similar issues in the future. We unreservedly apologise for all the trouble caused by this service outage. If you have any queries about this, or if we can be of assistance with anything else, please let us know.
  3. @Alberto M this is currently an intermittent issue, devs and cloud teams are looking into it... there seems to be something triggering a heavy load, cloud is fixing it but it reappears... we're looking into it.
  4. @Alberto M yes there were a few instances under heavy load due to a database server having a temporary issue. However it should be ok now...
  5. @Tina.Lapere Yes. Just to clarify what setting the Action does: for any attribute (like status) there is an action field. This action determines when the respective attribute is set: when a user is created, when the user is updated, or both. The LDAP import tool will go through all the records/users at source and import them into Hornbill. If the user from the source does not exist in Hornbill, it will be created. If the user exists, it will be updated. The action field only determines when the respective attribute is set. Hope it makes sense.
  6. @Tina.Lapere Yes, the command line is what the scheduler runs. There is an example on https://wiki.hornbill.com/index.php/LDAP_User_Import, at the bottom in Scheduling Overview. There is also a Testing Overview section that's worth looking through. Enabling, disabling and setting the various options in the "Command Line" interface will adjust the command line accordingly. Once you have a setup in place you will have a command ready to run. The fields marked in red are mandatory. For example, if you turn ON the dryrun argument, the command line will change to include dryrun=true, and if you enter a value for logprefix it will be reflected in the command line.
  7. When using v3, the "old config file" won't be used anymore... the import will pick up the config from Hornbill. There is a "Command Line" tab that shows you what needs to run and how.
  8. @Gareth Watkins sorry for the late reply. The issue you highlighted is not the reason why you are experiencing this. It's something else, which we also noticed internally and which is being looked at.
  9. @all The fix for this morning's issue has been deployed across all affected instances and full service has been restored on all of them. We are deeply sorry for all the trouble this has caused you. If any other issues have resulted from this disruption, please let us know via your ongoing support request (if you have one) or here so we can assist. We are also conducting a postmortem analysis of the incident, and once it is complete we will come back with the details here.
  10. @Paul Alexander thanks for the update. Just to confirm, I did see yours, but the main focus now is to ensure that service on all instances is restored. I will come back to each individual mention/issue/follow-up issue afterwards.
  11. @all The status page was updated as we are deploying a fix on all instances. The fix is being deployed instance by instance, so full service should be restored on each instance in turn. Deploying the fix is quite quick, so I would expect all instances to be fully operational in the next 20 min. If you still experience issues after this, please let me know. The fix does not apply retroactively, so if you have workflows in a suspended state or emails that were not processed, we will have to look at these individually.
  12. @Shamaila Naim no, we can see the issue from here. Your instance was addressed, you should no longer see any issues from now on, let us know please if you notice anything new... @Logan Graham yes, various functionality is affected...
  13. @all Some instances are fixed, some are still affected. We are working as fast as we can to restore the service everywhere. I am really sorry for this, but unfortunately we need to fix each instance individually, so some will be restored sooner and some with a slight delay. But I assure you we are working as fast as we can.
  14. @Paul Alexander see my above comment
  15. @all We will be restarting all affected instances so you might notice 1-2 min of complete downtime.
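
The Action field behaviour described in post 5 can be sketched as follows. This is only an illustrative model, not the import tool's actual code: the `Action` enum, the `attributes_to_set` function and the sample mapping are hypothetical names invented for this sketch, and the real tool's option labels may differ.

```python
from enum import Enum


class Action(Enum):
    # Hypothetical labels; the real tool's option names are not confirmed here.
    CREATE = "create"
    UPDATE = "update"
    BOTH = "both"


def attributes_to_set(mapping, user_exists):
    """Return the attributes that would be written for one source record.

    mapping: {attribute_name: (value, Action)}
    user_exists: True if the user is already in the target system (update),
                 False if the user is about to be created.
    """
    applied = {}
    for name, (value, action) in mapping.items():
        on_create = action in (Action.CREATE, Action.BOTH)
        on_update = action in (Action.UPDATE, Action.BOTH)
        # The record itself is always created or updated; the action only
        # decides whether this particular attribute is set in that pass.
        if (user_exists and on_update) or (not user_exists and on_create):
            applied[name] = value
    return applied


# Sample data, invented for illustration.
mapping = {
    "status": ("active", Action.CREATE),
    "email": ("jane@example.com", Action.BOTH),
    "phone": ("0123", Action.UPDATE),
}

new_user = attributes_to_set(mapping, user_exists=False)
existing_user = attributes_to_set(mapping, user_exists=True)
```

In this sketch a new user gets `status` and `email`, while an existing user gets `email` and `phone`, so a manually changed status would not be overwritten on later imports.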