Here’s a common situation that could cause problems when pausing jobs and a tip to prevent it.
Before pausing a job, you should know how it can affect deduplication. In general, pausing a job does not change anything about the job itself, but that can change depending on which other jobs are running or will be run. Here is an example of something that can happen:
We start a job called Job A. Then we pause Job A and start another job (Job B) in the same custodian.
We are de-duplicating at the custodian level.
Let’s say Job A processed 10 files before it was paused, and one of those files also appears in Job B. Job B will list that item as a duplicate of the item in Job A.
Next, we let Job B finish, then unpause Job A and let it finish too.
The last item in Job A is a duplicate of an item in Job B. Job A will deduplicate it out, keeping the item from Job B.
In this case, both jobs have an item that was deduplicated from the other. Whichever job discovers the item first will be the reference, and all other jobs will mark their items as duplicates, even if one job finishes before the other or if a job is paused.
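The first-discovery rule above can be sketched in a few lines of Python. This is only a toy model, not how the product stores its data: the `CustodianIndex` class, the `process` method, and the file hashes are all hypothetical names chosen for illustration, and duplicates within a single job are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class CustodianIndex:
    """Toy custodian-level dedup index (hypothetical, for illustration only)."""
    # file hash -> name of the job that processed that hash first
    reference_job: dict = field(default_factory=dict)

    def process(self, job: str, file_hash: str) -> str:
        # The first job to process a hash becomes its reference. Pausing or
        # finishing order does not change an entry once it is recorded.
        owner = self.reference_job.setdefault(file_hash, job)
        return "kept" if owner == job else f"duplicate of {owner}"


index = CustodianIndex()

# Job A processes 10 files, then is paused. One hash is shared with Job B.
job_a_first_ten = [f"a{i}" for i in range(9)] + ["shared-1"]
for h in job_a_first_ten:
    index.process("Job A", h)

# Job B runs to completion while Job A is paused.
job_b_results = {h: index.process("Job B", h) for h in ["shared-1", "shared-2", "b0"]}

# Job A is unpaused; its last item matches one Job B already processed.
job_a_last = index.process("Job A", "shared-2")

print(job_b_results["shared-1"])  # duplicate of Job A
print(job_a_last)                 # duplicate of Job B
```

Each job ends up with one item deduplicated against the other, which matches the walkthrough: the reference is decided by who processed the item first, not by which job finished first.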
Let’s go over another scenario:
Let’s start Job A and let it get through the same 10 documents again, then pause it.
Now we start Job B and let it finish. One of Job B’s items is a duplicate of an item that Job A has already processed, so Job B’s copy gets deduplicated.
Now we delete Job A. What happens to the item in Job B? It is still marked as a duplicate, and will still be deduplicated.
Now we have a problem: an item is marked as a duplicate even though the job that held the original has been deleted. If we had deleted Job A before starting Job B, this would not have been an issue, but because the jobs were active at the same time, they are now “linked” by their duplicate items, and deleting one job potentially means re-running the other.
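To make the “linked” jobs concrete, here is a minimal Python sketch of the second scenario. It is a simplified model under stated assumptions, not the product’s implementation: `process`, `delete_job`, `orphaned_duplicates`, and the file hashes are hypothetical, and the model assumes (as described above) that deleting a job does not re-evaluate duplicate marks on other jobs.

```python
# Custodian-level index: file hash -> job that processed the hash first.
reference_job = {}
# Per-job results: job name -> {file hash: status}.
job_items = {"Job A": {}, "Job B": {}}


def process(job, file_hash):
    """Record an item; the first job to see a hash becomes its reference."""
    owner = reference_job.setdefault(file_hash, job)
    status = "kept" if owner == job else f"duplicate of {owner}"
    job_items[job][file_hash] = status
    return status


def delete_job(job):
    """Assumption: deletion removes the job's items but leaves other
    jobs' duplicate marks (and the index) untouched."""
    job_items.pop(job)


def orphaned_duplicates():
    """Items marked as duplicates whose reference job no longer exists."""
    return [
        (job, file_hash)
        for job, items in job_items.items()
        for file_hash, status in items.items()
        if status != "kept" and reference_job[file_hash] not in job_items
    ]


# Job A gets through 10 documents, then is paused.
for h in [f"a{i}" for i in range(9)] + ["shared"]:
    process("Job A", h)

# Job B runs to completion; "shared" is deduplicated against Job A.
for h in ["shared", "b0"]:
    process("Job B", h)

# Job A is deleted while Job B still points at it.
delete_job("Job A")

print(orphaned_duplicates())  # [('Job B', 'shared')]
```

In this model, any job that appears in the `orphaned_duplicates()` result is a candidate for re-running, because it is suppressing an item whose original no longer exists.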
Knowing this ahead of time, in case you need to pause a job, helps ensure that nothing important to your review gets missed.