
Keith Stevenson

Root Admin
Everything posted by Keith Stevenson

  1. All, Below is a list of common reasons why a message may fail to be delivered. The audit trail (shown above by clicking on the red mail icon against the message) will give you a status code showing the reason for the failure. Note that nearly all of these errors are outside of Hornbill's control, as they refer to the recipient's server (or a service such as MessageLabs or Mimecast that sits in front of the recipient's server for spam protection).
     420 - Timeout - The recipient's server timed out during sending and reset the connection. You should contact the recipient's mail server administrator and request that they increase their timeout.
     421 - Too Busy - The recipient's mail server is too busy to process the request. If this occurs repeatedly (we get this error every time we retry), you should contact the recipient's mail server administrator and request that they increase their server resources to accommodate the expected traffic.
     422 - Mailbox quota exceeded - The size of the mail (attachments) exceeds the recipient's mailbox quota. You should contact the recipient's mail server administrator and request that they increase the quota or free some space.
     450 - Unable to relay - The server you have specified in your SMTP configuration is unable to relay the message. You should either contact the SMTP server administrator and ask them to allow relaying or choose a different SMTP server. (Note that this will not occur if you are using Hornbill direct outbound email.)
     This list is not exhaustive but should cover most errors. If you have any questions concerning specific error codes not listed above, please feel free to ask. A more complete list, along with links to the appropriate RFCs, is available from https://www.iana.org/assignments/smtp-enhanced-status-codes/smtp-enhanced-status-codes.xhtml
     Kind Regards Keith Stevenson
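If it helps to keep these at hand, here is a minimal sketch in Python (not Hornbill code; the mapping simply restates the list above) of turning a failure status code into a readable explanation:

```python
# Minimal sketch (not Hornbill code): map the failure codes listed above to a
# short explanation and suggested action. The table simply restates the post.
FAILURE_REASONS = {
    420: ("Timeout", "Ask the recipient's mail server administrator to increase their timeout."),
    421: ("Too busy", "If this repeats on every retry, ask the recipient's administrator to add server resources."),
    422: ("Mailbox quota exceeded", "Ask the recipient to free space or request a larger quota."),
    450: ("Unable to relay", "Ask your SMTP server administrator to allow relaying, or use a different SMTP server."),
}

def explain_delivery_failure(status_code: int) -> str:
    """Return a readable explanation for a failed-delivery status code."""
    reason, action = FAILURE_REASONS.get(
        status_code,
        ("Unknown", "See the IANA enhanced status code registry linked above."),
    )
    return f"{status_code} - {reason}: {action}"

if __name__ == "__main__":
    print(explain_delivery_failure(421))
```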
  2. All, We have added new functionality to make it easier to see failed delivery emails. This is shown below as a red triangle and a count of unread messages against Sent Items. Note that this will now show all previously failed deliveries, not just new items, so even though the count may be high these could all be historic. In Hornbill the process for sending an email is as follows:
     The user clicks Send Email.
     The email goes into the Outbox and the server attempts to send it to the recipients.
     If the email is successfully sent to all recipients, it is moved into Sent Items and marked as Read.
     If the email is not sent to any of the specified recipients, it is retried. This happens up to 9 times, with the retry period getting longer each time: 1 minute, 5 minutes, 30 minutes, 60 minutes, 120 minutes, 240 minutes, 480 minutes, 960 minutes, 1440 minutes. The email remains in the Outbox during this period.
     If the email is still not sent to any of the specified recipients after all 9 tries, it is moved into Sent Items, marked as Failed (see below) and Unread, the counter against Sent Items is increased by 1 and the red triangle is shown.
     The failed message can be found by going into Sent Items, clicking the order-by control, choosing Status and then using the arrow to bring all unread items to the top. The failed delivery recipient is shown as a red mail icon (if your email had multiple recipients, several may be green with one or more red), as below. If you click on the red mail icon and choose Delivery Status you will be able to see the full audit trail of the attempts to deliver the message. There may be many reasons why an email has failed to deliver, from misconfiguration to a non-existent sender (see the subsequent post where I explain some of the reasons in greater detail). Once you have reviewed the failed email and decided on the root cause, you can highlight the email and choose Mark As Read, removing the item from the count of failures. We hope this clarifies the new notification. Kind Regards Keith Stevenson
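To make the retry schedule above concrete, here is a minimal sketch assuming each interval is measured from the previous attempt (the post does not state this explicitly, and this is not Hornbill's implementation):

```python
from datetime import datetime, timedelta

# Retry intervals (in minutes) as described above: up to 9 retries with an
# increasing wait before each attempt. These values restate the post; the
# real platform behaviour may differ.
RETRY_INTERVALS_MINUTES = [1, 5, 30, 60, 120, 240, 480, 960, 1440]

def retry_times(first_attempt: datetime) -> list[datetime]:
    """Return the times at which each retry would be made, assuming each
    interval is measured from the previous attempt."""
    times = []
    current = first_attempt
    for minutes in RETRY_INTERVALS_MINUTES:
        current = current + timedelta(minutes=minutes)
        times.append(current)
    return times

if __name__ == "__main__":
    for i, t in enumerate(retry_times(datetime.now()), start=1):
        print(f"Retry {i}: {t:%Y-%m-%d %H:%M}")
```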
  3. All, That query looks a little wrong and it's unlikely to ever return, so it will most likely time out. We will review your requirements and post back a better solution shortly. In the meantime can we ask you not to run the above, as it may degrade your instance's performance whilst it locks the tables it is querying. Kind Regards Keith Stevenson
  4. Luckily the temperature has been too hot to turn on the main PC (it has more radiators than the house for cooling, and when it's already 30C inside at 5AM adding more heat is a bad idea), so I haven't been on Steam for over 2 weeks (which for me is near eternity). That said, I still have over 1k games that I haven't played yet, so I'm happy to wait for the Winter sale...
  5. Josh, I know a couple of our customers on roll-out used the idea of a lottery and informed their end users that for every call logged via the portal they would be entered into a draw (it doesn't have to be much, but something like an Alexa or Google Home as the prize). People soon stopped emailing and instead opted for the chance to win something. Kind Regards Keith Stevenson
  6. All, Please see below the full post mortem. We have already taken steps to ensure that this issue cannot occur again and have re-escalated the issue to developers in the CentOS team.
     HTL INCIDENT REPORT 020720180001
     At 15:20 on Monday 2nd July 2018 our monitoring detected slow queries and virtual cache locks on one of our database servers. The root cause was found to be an XFS cache\write issue which was preventing subsequent writes; all queries were therefore stacked as pending in memory. The resolution was to clear the cache locks, which was performed at 15:24, and all was resolved at 15:30. A similar type of issue has occurred before and unfortunately has not been resolved in a later kernel as anticipated. Post incident we have now set a scheduled job to clear the cache\locks every 30 days (given that we have only ever seen this issue 3 times over 2 years on less than 2% of database servers, we expect this frequency to be more than sufficient) and escalated this to the OS developers. The schedule for clearing will be reviewed every few days up until the 30 days to ensure this is sufficient. Kind Regards Keith Stevenson
  7. All, The issue has now been resolved and the instances affected are once again operating as expected. We will provide a full post mortem shortly. Kind Regards
  8. Dear Ricky, Over the last few days we have noticed a number of issues with logging into Office365. Mostly this is intermittent, and Microsoft report the following. If you are sure that the credentials are correct and IMAP login has been permitted for the given user in Office365, then the below is most likely the root cause, and if so contacting Microsoft is the best course of action. Kind Regards Keith Stevenson
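For anyone who wants to rule out a credentials or IMAP-access problem independently of Hornbill, a minimal manual check is sketched below (a generic example, not a Hornbill tool; outlook.office365.com is the standard Office365 IMAP host, and the mailbox and password are placeholders):

```python
import imaplib

# Quick manual check that IMAP login to Office365 works for a mailbox,
# independently of Hornbill. Replace the placeholder credentials.
HOST = "outlook.office365.com"          # standard Office365 IMAP endpoint
USER = "servicedesk@example.com"        # hypothetical mailbox
PASSWORD = "app-or-account-password"    # placeholder

try:
    with imaplib.IMAP4_SSL(HOST, 993) as conn:
        conn.login(USER, PASSWORD)
        status, _ = conn.select("INBOX", readonly=True)
        print("Login OK, INBOX select:", status)
except imaplib.IMAP4.error as exc:
    print("IMAP login failed:", exc)
```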
  9. The Steam Summer sale is not good... I already have too many games that I won't have time to play, but they're on sale so they must be purchased. I need more time.
  10. All, The fix has now been tested (a one-line change, so only a small amount of testing was needed) and deployed to live. A fuller post mortem on how this occurred and why the issue was not caught in testing will be provided in due course. Kind Regards Keith Stevenson
  11. All, We have now produced a fix on our dev servers and are testing it to ensure it functions as expected. Once confirmed we will push this out to live. We expect this to take no longer than 1 hour (worst case) and will provide a further update in 15 minutes. Kind regards Keith Stevenson
  12. All, Thanks for the posts. We have identified the root cause and are in the process of providing a fix. We will update this post in the next 20 minutes. Kind Regards Keith Stevenson
  13. Stuart, Thanks for the post. We can confirm that we only support open protocols and not Microsoft's own proprietary protocols. Below is the full list of supported protocols (the encrypted ones are POP3 over SSL\TLS and IMAP4 over SSL\TLS): https://wiki.hornbill.com/index.php/E-Mail_Protocol_Support Kind Regards Keith Stevenson
  14. Richard, Just to clarify, this is not a security incident with Hornbill but with your Exchange server. Hornbill fully supports (and has done since day 1) TLS and SSL for POP, IMAP and SMTP. Once you have enabled this on your Exchange server you can simply change the connector to utilise the required encryption via the Admin portal. Kind Regards Keith Stevenson
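As a quick way to confirm that a mail server is actually offering encryption before changing the connector, here is a minimal, generic sketch (not a Hornbill feature; the host name is a placeholder) that checks STARTTLS on the SMTP submission port:

```python
import smtplib
import ssl

# Generic check (not a Hornbill feature) that a mail server accepts STARTTLS
# on the SMTP submission port before switching a connector to use encryption.
HOST = "mail.example.com"   # placeholder: your Exchange server
PORT = 587

with smtplib.SMTP(HOST, PORT, timeout=10) as smtp:
    smtp.ehlo()
    if smtp.has_extn("starttls"):
        smtp.starttls(context=ssl.create_default_context())
        smtp.ehlo()
        print("STARTTLS negotiated successfully")
    else:
        print("Server does not advertise STARTTLS")
```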
  15. All, Thanks for the post. This was due to expire today, as the Trust page is being moved to www.hornbill.com for the general information and https://status.hornbill.com/ for the site status\uptime and outage reporting. We are just awaiting the announcement to go out. Was there anything specific you were after that I can provide in the meantime? Kind regards Keith Stevenson
  16. I'm a casual gamer.
     Main Machine: i7-4770K CPU 3.50GHz @ 4.6GHz, 32 GB RAM, dual AMD Fury X, 256GB SSD (C), 3TB SCSI (D), 990GB SSD (E), 48TB in NAS, dual Samsung 28" 4K monitors, Razer Firefly mousemat, Razer Naga mouse, Razer BlackWidow Chroma keyboard, HTC Vive
     Main Laptop: MS Surface Book - i7, 16GB RAM, 512GB SSD
     Consoles: PS1/PS2/PS3/PS4, Xbox/Xbox 360/Xbox One, SNES
     Favorite Games: Anything by Bethesda
     Stats from Steam
  17. Shamaila, This should now be functioning. We noticed an issue with your instance but it has been resolved now (we will provide further details once we understand the root cause). Kind regards
  18. Rohit, We have completed the investigation and identified the root cause. For a given API request type the server was unable to connect to the underlying service over a secure socket (SSL). This caused an error which, being an unhandled exception, caused the service to fail. Our development team have already identified the offending lines of code and provided a fix, which will be deployed in the next update. (We have also checked that all other API request types already have the appropriate error handling in place.) We hope this clarifies your query. Kind Regards Keith Stevenson
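To illustrate the class of fix being described (a generic sketch only, not Hornbill's code), the idea is to wrap the SSL connection to the backing service in error handling so a failed handshake produces a handled error result rather than an unhandled exception that brings the service down:

```python
import socket
import ssl

# Generic illustration (not Hornbill's code): an SSL connection attempt to a
# backing service is wrapped in error handling so a failed handshake returns
# an error result instead of raising an unhandled exception.
def call_backing_service(host: str, port: int) -> dict:
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=5) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                tls.sendall(b"PING\n")
                return {"ok": True, "reply": tls.recv(1024)}
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        # Return a handled error to the caller rather than crashing.
        return {"ok": False, "error": str(exc)}
```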
  19. Rohit, We are reviewing the logs to understand what happened and will provide an update as soon as it's clear. Kind Regards
  20. Rohit, Can you confirm this is now working? Kind Regards
  21. Nasim, Thanks for the confirmation. Unfortunately the error message is somewhat generic and doesn't relate to the issues last week (see https://community.hornbill.com/topic/11663-hornbill-connectivity-issues/ for an overview of those). We will continue our investigation and look at providing a more specific error message. Kind Regards Keith Stevenson
  22. Nasim, We have looked at your instance and can see people\analysts performing requests at a fair rate. Can you confirm that this is still affecting people and, if so, does it affect all analysts? Kind regards Keith Stevenson
  23. As with any customer-impacting issue (performance or outage) we conduct a review\post mortem of the event to understand the root cause, identify any changes that are needed and put in place steps to ensure that the likelihood of a similar event in the future is reduced. This week we had three short time windows during which around 10% of our customers in production would have experienced issues (either slowness or timeout errors), so, wanting to be totally transparent, we are publishing more information about these three incidents.
     Hornbill has seen significant take-up as our customers adopt and expand use and find new and interesting ways of using our platform, APIs and integrations. We monitor all aspects of our systems and application performance and take the quality of the service very seriously. The ever-evolving patterns of use mean we are always tuning and optimising our systems. We are currently re-structuring our platform as we transition to a 100% microservices architecture, a fundamental function of which has to be service discovery, and to properly facilitate that we have had to roll out a new TCP addressing scheme. In parallel with that we are periodically introducing new compute resources (servers, storage, etc.) to ensure we can take advantage of the latest hardware to get optimum performance. It is natural for a service provider like us to make these types of changes and it is generally completely transparent to our customers. However, the overall root cause for the issues was ultimately a combination of unexpected individual outcomes of the work we are doing in these areas. In summary, the issues would have affected around 5% of our EU customers for a total of 2 hours, during which time they would have seen intermittent errors or slow performance (around 2% would have experienced both issues). So while on the face of it these were all the same problem, there were actually three completely separate issues that looked like the same thing.
     HTL INCIDENT REPORT 131120170001
     At 14:00 on Monday 13th Nov 2017 our monitoring systems detected slow queries on one of our primary database servers. The root cause was found to be exceptional disk load on its hypervisor due to migration of some services from one HV to another. Under normal circumstances a migration of this nature would be fine and quite fast, but there was also a high load (IOPs) on the underlying storage which slowed things right down. It was unwise to terminate the migration once started, so we had to make the decision to let it complete. The action was completed at 15:48 and all monitors reported normal. We do not migrate services like this every day, but the constant storage writes and the impact this had on the performance of the service for some customers was unexpected. We have now changed our process such that any virtual machine with either a disk over 100GB or high IOPs will not be moved during core hours for its physical location. We have good statistical information about when services are busy and quiet, so this is easy to plan for.
     HTL INCIDENT REPORT 141120170001
     At 09:45 on Tuesday 14th Nov 2017 a subset of customers started to report issues with failures due to some SQL queries failing. It was the same set of customers impacted as 131120170001 and the cause was down to zombied database connections. In our platform, all SQL queries are run over HTTP using our own in-house developed HTTP <-> MySQL proxy. The proxy handles connection pooling, reuse and database security. As a result of the migration of services and IP changes, our data service proxy ended up with TCP connections that it thought were still active, but were not. A simple restart of the data proxies resolved this issue, but it took us a few minutes to figure out what was happening. The resolution was to restart the data services on the affected nodes, and all connections were restored by 09:56. Post incident, we are making some code changes to the data service proxies to deal with unexpected zombie connections under these specific circumstances; this fix will be rolled out in the next week or so.
     HTL INCIDENT REPORT 141120170002
     At 17:20 on Tuesday 14th Nov 2017 monitoring detected slow queries on one of our database servers. The root cause was found to be a virtual disk lock applied by XFS which was preventing subsequent writes; all queries were therefore stacked as pending in memory. The system was performing slowly because it was reaching memory limits and swapping, which was slowing things down. Once we recognised the problem we cleared the lock at 17:26 and all monitors reported normal. The lock was caused, in part, by the original issue on Monday and the creation of the large delta file for the migrated virtual machine. Post incident a new monitoring check has been added to specifically look at the XFS cache\locks and alert on any issues before they would impact in the way they did.
     HTL INCIDENT REPORT 151120170001
     At 19:01 on Thursday 16th Nov 2017 customers reported that new login requests were unsuccessful. Existing connections remained functional. The root cause was found to be related to the IP address changes that were made earlier in the day, which were undetected until the DNS cache expiry around 19:00. There appears to be a reference in our frontend application code to a legacy API endpoint name which we changed some months ago, and the new IP scheme did not include this legacy endpoint. Post incident we have reduced the TTL on all DNS records internally and have reviewed our change process with the additional step of any IP changes requiring a corresponding DNS update. We are not expecting a recurrence of this type of problem because we have no expectation that there will ever be a requirement to change our IPv4 addressing scheme in the future.
     We apologise unreservedly for any inconvenience caused during this week, and we will continue to do everything in our power to ensure we do not see a repeat of the above issues. The nature of IT means these sorts of things happen from time to time, and when they do we take them very seriously and worry about them so our customers don't ever have to.
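To illustrate the kind of safeguard described in incident 141120170001 (a generic sketch only; this is not the data service proxy's actual code), a connection pool can validate each pooled connection with a cheap ping before reuse, dropping any zombie connections it finds:

```python
import queue
from typing import Any, Callable

# Generic illustration (not Hornbill's data-service proxy): guard a connection
# pool against "zombie" connections by validating each pooled connection with
# a cheap ping before reuse, and replacing it if the ping fails.
class ValidatingPool:
    def __init__(self, create: Callable[[], Any], ping: Callable[[Any], bool], size: int = 4):
        self._create = create
        self._ping = ping
        self._idle: queue.Queue = queue.Queue()
        for _ in range(size):
            self._idle.put(create())

    def acquire(self) -> Any:
        """Return a live connection, discarding any that fail the ping."""
        while not self._idle.empty():
            conn = self._idle.get()
            if self._ping(conn):
                return conn
            # Zombie connection: drop it and try the next one.
        return self._create()

    def release(self, conn: Any) -> None:
        self._idle.put(conn)

if __name__ == "__main__":
    # Dummy demo: "connections" are dicts, and the ping just checks a flag.
    pool = ValidatingPool(create=lambda: {"alive": True},
                          ping=lambda c: c["alive"], size=2)
    conn = pool.acquire()
    print("Got connection:", conn)
    pool.release(conn)
```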
  24. Giuseppe, Thanks for the reply. From the logs we can see that this stopped working at 00:10 this morning. Can you confirm that you can log in to the Outlook portal as servicedesk@datalogic.com (rather than logging in as another user and switching to the mailbox)? If so, re-entering the password on the mail connector properties in Hornbill Admin might resolve this. Kind regards Keith Stevenson