Workflow suspended indefinitely at suspend wait for category node - how do I stop this?


Estie

Hi,

We have a large number of suspended workflows which seem to have been suspended indefinitely at the 'suspend wait for category' node.

This seems to have been happening for a large number of incidents since we started using Hornbill back in 2021.

This means that the workflow is not processing other stages - importantly the resolve timer is not being marked, and we are not receiving feedback.

Our main incident process has a lock resolve node followed later by suspend wait for owner, priority, and category nodes in the initial assessment stage. 

The owner and priority suspend nodes seem to be working as expected.

The suspend wait for category node seems to have caused many of our workflows to be suspended indefinitely.

I have noticed the following in our BPM:

- There is no stage checkpoint for the category being set

- There is no expiry set on the suspend wait for category - but the suspend wait for priority node which works does not have an expiry set

Would these be the problem?  Should these options be set in the BPM?

If a manual action - i.e. adding a category - does not take place, what happens to the suspend node?

My understanding is that there is a max count loop of 1000 set on workflows.  What happens when this is reached?

Happy to upload a screenshot of the BPM if needed.

 

 


9 minutes ago, Stefania Tarantino said:

We have a large number of suspended workflows which seem to have been suspended indefinitely at the 'suspend wait for category' node.

The first step is a simple one - have these Requests been assigned a category?
If not, they will remain suspended until they are.


@Steve Giller- no they haven't.
However a large number of the incidents have been resolved or closed.

When investigating it appears that the lock resolution action has somehow been overridden.  This seems to happen when an incident is reassigned to another team without an owner (before resolution and without a category).

This is quite an issue for us.

 


@Stefania Tarantino

19 minutes ago, Stefania Tarantino said:

The owner and priority suspend nodes seem to be working as expected.

The suspend wait for category node seems to have caused many of our workflows to be suspended indefinitely.

Does this mean that in your view, the Wait For Category node does not work as expected? What is your expectation in regards to this node?

19 minutes ago, Stefania Tarantino said:

There is no expiry set on the suspend wait for category - but the suspend wait for priority node which works does not have an expiry set

Would these be the problem?  Should these options be set in the BPM?

What would be the problem? Is it that the workflows should have progressed to completion? If so, this is what @Steve Giller advised above: the workflow will stay suspended until the category is set on the request. If the node has no expiry period configured, it will wait indefinitely until the category is set. Whether the node should be configured with an expiry depends entirely on how you want or need it to work. If it should not wait indefinitely, set up an expiry that will progress the workflow automatically when the threshold is reached.

19 minutes ago, Stefania Tarantino said:

If a manual action - i.e. adding a category - does not take place, what happens to the suspend node?

As per above, the workflow will be suspended at this node indefinitely. Only setting the category will resume the workflow, no other action will do this.

 

19 minutes ago, Stefania Tarantino said:

My understanding is that there is a max count loop of 1000 set on workflows.  What happens when this is reached?

What is your understanding of "max count loop"? I think there is some confusion here in regards to "max count loop" and the suspend node...


@Victor @Steve Giller
 

What I would like is for the category to be set before the incident is progressed.  I am expecting that if the category is not set the incident will not be able to be resolved or closed given that we have a lock resolve action node in place in the BPM.

Perhaps the issue is not the suspend wait for category but the lock resolve action which somehow appears to be being overridden.

About a month ago we revoked a full access role from many of our analysts; that role may have been what allowed them to override the lock resolve action.

However we are still seeing incidents being resolved/closed without a category.

 

 


@Stefania Tarantino

2 minutes ago, Stefania Tarantino said:

What I would like is for the category to be set before the incident is progressed.  I am expecting that if the category is not set the incident will not be able to be resolved or closed given that we have a lock resolve action node in place in the BPM.

There are ways to achieve this, but I need to clarify that there is no specific built-in functionality to prevent closing a request without a category. Another thing to clarify is that workflows and requests are two independent entities that can do things and progress independently of each other. They can of course interact, and they can be designed to interact, but in essence they are separate entities with their own paths and lifetimes. This means that, for example, a request can be manually closed by an analyst while the workflow does something completely different, or is simply suspended at some point that has nothing to do with request closure. This is an important aspect to keep in mind when designing workflows.

If you want to enforce that the category is set on a request before it is closed, one approach is locking actions. You can have the workflow lock the Resolve action and unlock it once the category has been provided. As you noticed, there are SM roles that can override the lock, so analysts need to have the appropriate roles and rights for this to work. You would also need to ensure the request cannot be closed by other means, such as auto tasks, which can run from custom buttons.
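
As a rough illustration of the locking approach described above, here is a minimal Python sketch. It is a conceptual model only and not Hornbill's API: the `Request` class, the `OVERRIDE_ROLES` set, the "Analyst" role and all function names are hypothetical, chosen just to show why a lock holds for most users but not for users with an overriding role.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional, Set

# Hypothetical set of roles that may bypass a locked action (illustration only).
OVERRIDE_ROLES = {"Full Access"}


@dataclass
class Request:
    category: Optional[str] = None
    locked_actions: Set[str] = field(default_factory=set)


def lock_resolve(request: Request) -> None:
    """Workflow step: lock the Resolve action until a category is provided."""
    request.locked_actions.add("resolve")


def set_category(request: Request, category: str) -> None:
    """In this sketch, setting the category is what unlocks Resolve."""
    request.category = category
    request.locked_actions.discard("resolve")


def can_resolve(request: Request, user_roles: Iterable[str]) -> bool:
    """Resolve is allowed if it is unlocked, or if the user holds an override role."""
    if "resolve" not in request.locked_actions:
        return True
    return any(role in OVERRIDE_ROLES for role in user_roles)


if __name__ == "__main__":
    req = Request()
    lock_resolve(req)
    print(can_resolve(req, ["Analyst"]))      # locked, no override role
    print(can_resolve(req, ["Full Access"]))  # an override role bypasses the lock
    set_category(req, "Hardware")
    print(can_resolve(req, ["Analyst"]))      # unlocked once categorised
```

In a model like this, a request resolved through the override path never has its category set, which would leave a workflow that is waiting for the category suspended.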

12 minutes ago, Stefania Tarantino said:

However we are still seeing incidents being resolved/closed without a category.

If you have all the right configs in place, then this might be an issue that we should investigate, so if that's the case raise a support request with us.

What about the "max count loop"? This has nothing to do with suspend nodes so I wanted to clarify this as well, if needed.


42 minutes ago, Stefania Tarantino said:

When investigating it appears that the lock resolution action has somehow been overridden.

The most likely reason for this is Users with "Full Access" Roles - see the Access Control section of the Service Manager Business Process Workflow wiki page.

Again, if this is the reason, then whether you need to educate Users in your process or reduce the level of access some Users have would depend on your internal processes. It may be that a Custom Role is required to allow selective aspects of a "Full Access" Role.


22 minutes ago, Victor said:

@Stefania Tarantino

If you want to enforce that the category is set on a request before it is closed, one approach is locking actions. You can have the workflow lock the Resolve action and unlock it once the category has been provided. As you noticed, there are SM roles that can override the lock, so analysts need to have the appropriate roles and rights for this to work. You would also need to ensure the request cannot be closed by other means, such as auto tasks, which can run from custom buttons.

If you have all the right configs in place, then this might be an issue that we should investigate, so if that's the case raise a support request with us.

What about the "max count loop"? This has nothing to do with suspend nodes so I wanted to clarify this as well, if needed.

In that case given that we are already using the lock resolve action I will raise a support ticket.

I am not really sure what the max loop count is - I assumed that it is the maximum number of times a process is checked, e.g. to see whether a category has been set yet.

Is that the case?

 

22 minutes ago, Victor said:

@Stefania Tarantino

Another thing to clarify is that workflows and requests are two independent entities that can do things and progress independently of each other. They can of course interact, and they can be designed to interact, but in essence they are separate entities with their own paths and lifetimes. This means that, for example, a request can be manually closed by an analyst while the workflow does something completely different, or is simply suspended at some point that has nothing to do with request closure. This is an important aspect to keep in mind when designing workflows.

I have come to realise this and am a bit confused as to why this is the case.

I am not sure what the point of suspending the workflow is if the requests can work independently.  Something which is causing us extra workload.


@Stefania Tarantino I think the keyword here is can... the request and workflow can (and at many stages they do) work independently. Obviously this does not mean that a request and its associated workflow are completely independent; it only means that they can be or, better said, can perform actions independently at various points in their lifetimes.

I'll try and make this more clear with two examples:

First example: let's say we have a very simple scenario. An alert email creates a request. This request will always be assigned to team A, user X, will have the category NNN, and will email something to user or contact Y. Nothing more; this is how that particular service desk operates for this scenario. Here, you can have the workflow automate some actions so user X does not have to perform them manually. The workflow will initiate at the same time as the request (this is always the case). The workflow will then assign the request, set the category, send the email, and then end (complete). While the workflow performed these actions, there was no manual action on the request: no analyst updated it, set a priority, or otherwise acted on it in any shape or form. You might also notice that once the workflow completes, the request is still very much active. It is still in a "new" state and still appears in user X's queue. At this point user X can perform other actions on the request, for example set a priority or send another email to an interested party. User X can also resolve and close the request. All these actions performed by user X happened outside the workflow; they happened after the workflow had completed, thus the workflow had no influence in any shape or form on the request. From this perspective we can say that the workflow acted independently of the request (when it assigned, categorised and emailed) and the request acted independently of the workflow (when it was prioritised, resolved and closed).

Second example: a more common scenario, a "classic" auto-closure sequence. You have a request with an associated workflow. Upon closing the request, the service desk will wait for customer feedback for 2 working days. If the customer feedback is received within the timeframe, the request can be reopened, depending on the feedback. If no feedback is received after 2 days, an email is sent to the customer as a reminder, and the service desk will then wait for customer feedback for another 2 working days. If feedback is received within the new timeframe, the request can again be reopened, depending on the feedback. If no feedback is received after these further 2 days, no further actions are taken. Here you can have the workflow automate this sequence so the analyst does not have to specifically monitor, chase and progress the request based on the feedback. In the early stages, you can have the request and workflow doing a number of things. Then, when the request is closed, you can have the workflow perform the auto-closure sequence (wait for feedback, perhaps with an expiry; when "not ok" feedback is received, reopen the request; when the wait expires, send an email reminder and repeat from wait for feedback). You can see that once the request is closed, the workflow is still very much active and will perform a number of things. The analyst does not interact with the request anymore (unless it is reopened) and, assuming that "ok" feedback or no feedback is received, the workflow will wait and chase completely independently of the request; there is no other activity on this request after closure.
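
To make the second example a bit more concrete, here is a small Python sketch of that auto-closure sequence. It is only a simulation of the steps as described, not how a Hornbill workflow is actually built: the function name `auto_closure`, the `"ok"` / `"not ok"` feedback labels and the action strings are all made up for illustration.

```python
from typing import List, Optional, Sequence

MAX_WAITS = 2  # two waits of 2 working days each, as in the example above


def auto_closure(feedback_per_wait: Sequence[Optional[str]]) -> List[str]:
    """Simulate the sequence that runs after a request is closed.

    feedback_per_wait[i] is the feedback received during wait i:
    "ok", "not ok", or None if the wait expired with no feedback.
    Returns the list of actions the workflow would take, in order.
    """
    actions: List[str] = []
    for attempt in range(MAX_WAITS):
        feedback = feedback_per_wait[attempt] if attempt < len(feedback_per_wait) else None
        actions.append("wait for feedback (2 working days)")
        if feedback == "not ok":
            actions.append("reopen request")
            return actions
        if feedback == "ok":
            actions.append("end")
            return actions
        if attempt == 0:
            # First wait expired: send a reminder and loop back to waiting.
            actions.append("send reminder email")
    actions.append("no further action")
    return actions


if __name__ == "__main__":
    for step in auto_closure([None, "not ok"]):
        print(step)
```

Note that nothing in this sequence needs an analyst to touch the request, which is the sense in which the workflow carries on independently after closure.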

I hope this clarifies a bit how requests and workflows can (and many times will) perform actions independently. In most scenarios, the request and its workflow have many touching points and often interact with each other. If desired, they can be configured to work and behave in perfect sync, although this is not really practical and not the true purpose of a business process.

3 hours ago, Stefania Tarantino said:

Something which is causing us extra workload.

I would like to know more about how the suspend nodes are causing you extra work. Can you give more detail on this, please?

 

3 hours ago, Stefania Tarantino said:

I am not really sure what the max loop count is - I assumed that it is the maximum number of times a process is checked, e.g. to see whether a category has been set yet.

Max loop count is a mechanism that applies when you have a loop set in your workflow. Loops work with decision nodes, where a decision branch connects back to a node prior to the decision. An example of a loop would be in the auto-closure sequence I described above, where after the email reminder the workflow "loops back" to waiting for customer feedback. Max loop count specifies how many times a workflow can loop in a sequence, which is 1000.

Suspend nodes don't work with loops; there is no loop mechanism here, they are event based. This is where an event triggers an action. For the category example, the event/trigger is the analyst setting the category, which resumes the workflow, or the event/trigger can be the node expiry (if configured), which would also resume the workflow.
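
The difference between the two mechanisms can be sketched in a few lines of Python. This is a conceptual model, not Hornbill code: `SuspendNode`, `resume_reason`, `run_loop` and the event names are hypothetical. The thread also does not say what happens when the loop limit is reached, so raising an error here is purely an assumption.

```python
from typing import Callable, Optional

MAX_LOOP_COUNT = 1000  # the limit quoted in this thread


class SuspendNode:
    """Event-based wait: nothing is polled and nothing is counted."""

    def __init__(self, waits_for: str, expiry_seconds: Optional[float] = None):
        self.waits_for = waits_for
        self.expiry_seconds = expiry_seconds  # None means no expiry configured

    def resume_reason(self, event: Optional[str], elapsed_seconds: float) -> Optional[str]:
        """Return why the workflow resumes, or None if it stays suspended."""
        if event == self.waits_for:
            return "event"
        if self.expiry_seconds is not None and elapsed_seconds >= self.expiry_seconds:
            return "expired"
        return None  # still suspended, however much time has passed


def run_loop(step: Callable[[], bool], max_loops: int = MAX_LOOP_COUNT) -> int:
    """Loop via a decision branch: repeat `step` until it succeeds or the limit is hit."""
    for count in range(1, max_loops + 1):
        if step():
            return count
    # Assumption: the real behaviour at the limit is not described in the thread.
    raise RuntimeError("max loop count reached")


if __name__ == "__main__":
    node = SuspendNode("category_set")  # no expiry, like the node in this thread
    print(node.resume_reason(None, elapsed_seconds=10**9))  # stays suspended
    print(node.resume_reason("category_set", elapsed_seconds=0))
```

With no expiry configured, only the matching event ever resumes the node; the loop counter never comes into play.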

 


@Victor Thanks for the explanations above.

The extra work being caused goes back to my comments in the below post about the number of indefinitely suspended workflows which have accumulated and need analysing to understand the reason they are suspended. 
The suspended workflow screen is now regularly not responding which I think means it is timing out maybe due to the number of indefinitely suspended workflows in there.
Surely good housekeeping practice dictates that these workflows should be cleared out, rather than simply left there?

Also as per my earlier comments we need to resolve the issue of automated actions in the later stages of the BPM not being completed and ensure our reporting is accurate.

I think our process needs changing but I am not sure of the best way of doing this.

 


@Stefania Tarantino

13 minutes ago, Stefania Tarantino said:

The extra work being caused goes back to my comments in the below post about the number of indefinitely suspended workflows which have accumulated and need analysing to understand the reason they are suspended. 

I understand, but I don't see how the product has caused extra work? The workflows have been configured to be suspended indefinitely and they have, I don't see any fault here...

13 minutes ago, Stefania Tarantino said:

Surely good housekeeping practice dictates that these workflows should be cleared out, rather than simply left there?

Sorry, I am not sure what you mean here... workflows left where?

13 minutes ago, Stefania Tarantino said:

Also as per my earlier comments we need to resolve the issue of automated actions in the later stages of the BPM not being completed and ensure our reporting is accurate.

13 minutes ago, Stefania Tarantino said:

I think our process needs changing but I am not sure of the best way of doing this.

You can reach out to customer success, I am certain they can arrange a session with our product specialists to review your current configuration and problematic areas that you noticed.


@Stefania Tarantino running workflows(*) will be displayed in this list, which is its intended purpose. This allows any Hornbill administrator to review running or failed workflows, for example to identify issues, potential gaps, blockers, etc. in any workflow configuration. We would not want to empty this list. If we did, Hornbill admins or BP designers would have no access to running workflows and, in your case, you would not be able to access any (old) suspended workflow to identify why it is suspended and to correct and optimise this in future configurations. Why would you say it would be good practice to empty this list? I am trying to understand this because emptying the list, as the term implies, would only remove the workflows from view. Any problematic workflow would still exist in the instance, but you would not know about it, nor would you have access to it.

*we define a running workflow as any workflow that is not completed, failed or cancelled (this means a suspended workflow is a running workflow aka not completed)
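
Expressed as a quick Python sketch, the definition in this footnote looks like the following. The state names and the `workflows` dictionary are hypothetical labels for illustration, not values taken from the product.

```python
from typing import Dict, List

# States in which a workflow is no longer considered running, per the definition above.
TERMINAL_STATES = frozenset({"completed", "failed", "cancelled"})


def is_running(state: str) -> bool:
    """A workflow is running if it is not completed, failed or cancelled."""
    return state.lower() not in TERMINAL_STATES


def running_workflows(workflows: Dict[str, str]) -> List[str]:
    """Return the ids of workflows that count as running, in insertion order."""
    return [wf_id for wf_id, state in workflows.items() if is_running(state)]


if __name__ == "__main__":
    sample = {"wf1": "suspended", "wf2": "completed", "wf3": "active", "wf4": "failed"}
    print(running_workflows(sample))
```

Under this definition a suspended workflow counts as running, however long it has been waiting.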


  • 7 months later...

Afternoon All,

I have read through this conversation about three times and I am still none the wiser, though I am experiencing the exact same behaviour as the OP. I have been working through our Resolved tickets to see why they are not being set to Closed after the prescribed period or, in some instances, immediately on completion. I have looked at the Suspended list and have found examples where the workflow has stopped at the Suspend wait for Logging Category node. Further investigation of these examples revealed:

- The Tickets were not categorised;

- The User that actioned the Resolution does not have Access Control rights to override the Lock;

- The observation made previously where the override is allowed due to assignment seems also to be true in my examples. Specifically, assignment to an Owner, not necessarily assignment to a different team.

This really does need to be looked at, as we will end up with a potentially ever-growing Resolved list, which is inappropriate for our organisation, and probably most others, where there is an expectation for Tickets to move to a Closed state after an action or period of time.

Thanks

Osman


@Osman as I explained above, based on my current understanding of how this is designed to work, the suspended workflows end up suspended indefinitely due to a workflow/request setup misconfiguration. If you believe the configuration is correct but for some (unknown) reason the workflow does not behave as designed or some other functionality does not work as intended (e.g. override locked actions), please use the advertised channels to raise a support request with our team for further investigation (https://www.hornbill.com/support/)

 


  • 4 weeks later...
On 11/18/2022 at 10:21 AM, Estie said:

About a month ago we revoked a full access role from many of our analysts; that role may have been what allowed them to override the lock resolve action.

The Service Desk Admin Role also provides this right (for all Request Types).
It may be worth checking whether this is a factor.

 
