DocumentTaskEngine Class

Provides functionality to extract content from hundreds to thousands of documents as a single task (see DocumentTaskSettings), or from "large" archives and mail store containers that deserve their own separate tasks. The DocumentTaskEngine is a highly parallel document extraction engine that completely unrolls and processes deep parent document/child document (attachments/embedded objects/media) hierarchies.

Definition

Namespace: OpenDiscoverSDK.Platform
Assembly: OpenDiscoverSDK (in OpenDiscoverSDK.dll) Version: 2026.2.6.0 (2026.02.06)
C#
public class DocumentTaskEngine
Inheritance
Object → DocumentTaskEngine

Remarks

The RunTask and RunTaskBlocking methods provide highly parallel, very fast processing of batches of documents, and can process a "large" archive or mail store as a single task. Large archives and mail stores can also be broken into multiple separate tasks; see IsPartitioned.

When processing hundreds to thousands of documents as a single task, the documents should not add up to more than 4-5 gigabytes in combined file size; otherwise the output document data archive (.dda) could become too large to read into memory (see DocumentDataArchiveReader). If processing more than 10 gigabytes of documents, break them into DocumentTaskSettings tasks of 2 to 4 gigabytes each. For "large" archives and mail stores ("large" being a subjective term), create a separate task for each one using the SingleArchive or SingleMailStore processing type, respectively.
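The size guidance above can be applied before any task is created. The following is a minimal sketch of one way to group input files into batches that stay under a size cap; it uses only the .NET base class library. TaskBatcher and SplitBySize are hypothetical names for illustration and are not part of OpenDiscoverSDK, and how each batch is then handed to DocumentTaskSettings is left to the SDK documentation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of OpenDiscoverSDK): groups files into
// batches whose combined size stays at or below a caller-chosen cap.
public static class TaskBatcher
{
    public static List<List<string>> SplitBySize(
        IEnumerable<(string Path, long Bytes)> files, long maxBatchBytes)
    {
        var batches = new List<List<string>>();
        var current = new List<string>();
        long currentBytes = 0;

        foreach (var (path, bytes) in files)
        {
            // Close the current batch if adding this file would exceed the cap.
            if (current.Count > 0 && currentBytes + bytes > maxBatchBytes)
            {
                batches.Add(current);
                current = new List<string>();
                currentBytes = 0;
            }

            // A single file larger than the cap ends up alone in its own batch.
            current.Add(path);
            currentBytes += bytes;
        }

        if (current.Count > 0)
        {
            batches.Add(current);
        }
        return batches;
    }
}
```

With a cap of, say, 3 gigabytes (3L * 1024 * 1024 * 1024), each returned batch could back one task. Files that end up alone in a batch because they exceed the cap are natural candidates for the separate "large" archive or mail store tasks described above.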

For archives, be aware of the expansion size before deciding how to process them. Archives can have very high compression ratios: for example, a 500 MB archive could expand into 50 GB of files. It is wise to test archives for their true expansion size before expanding or extracting them.
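For Zip archives, one way to estimate expansion size without extracting anything is to sum the uncompressed lengths recorded for each entry. The sketch below uses the standard System.IO.Compression.ZipArchive class from .NET, not an OpenDiscoverSDK API; ArchiveSizeCheck and MeasureZip are hypothetical names. It assumes the sizes declared in the archive are accurate, does not look inside nested archives, and does not cover other formats such as 7zip, Tar, or Rar.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper (not part of OpenDiscoverSDK): reports the compressed
// and declared uncompressed size of a Zip archive without extracting it.
public static class ArchiveSizeCheck
{
    public static (long Compressed, long Expanded) MeasureZip(Stream zipStream)
    {
        // leaveOpen: true so the caller keeps ownership of the stream.
        using var archive = new ZipArchive(zipStream, ZipArchiveMode.Read, leaveOpen: true);

        long compressed = 0;
        long expanded = 0;
        foreach (ZipArchiveEntry entry in archive.Entries)
        {
            compressed += entry.CompressedLength; // bytes stored in the archive
            expanded += entry.Length;             // declared bytes after extraction
        }
        return (compressed, expanded);
    }
}
```

Comparing the Expanded total against the task size guidance helps decide whether an archive belongs in a batch with other documents or in its own task. Because nested archives are counted only at their stored size, the result is a lower bound on the true expansion.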

Breaking large document processing sets into separate tasks aids in distributing work across multiple desktops or VMs, and also aids in re-queuing any failed tasks.

Note: DocumentTaskEngine can handle long file paths (>255 characters in length) for input documents.

Constructors

DocumentTaskEngine Constructor.

Properties

CurrentNumberOfArchivesInProcess If a task is running, returns the number of archives (Zip/7zip/Tar/Rar/etc) currently being processed.
CurrentNumberOfDatabasesInProcess If a task is running, returns the number of databases currently being processed.
CurrentNumberOfLargeDocumentsInProcess If a task is running, returns the number of "large" documents currently being processed. "Large" documents are defined by LargeDocumentCritera.
CurrentNumberOfMailStoresInProcess If a task is running, returns the number of mail stores (PST/OST/MBOX/etc) currently being processed.
DocumentMetadataToFileStoreQueueCount If a task is running, returns the current number of documents waiting to have their extracted metadata written to file store. This is the last step for a processed document.
EmbeddedAndTextToFileStoreQueueCount If a task is running, returns the current number of extracted embedded documents and documents with extracted text waiting to be written to file store.
ExtractedDocumentQueueCount If a task is running, returns the current number of extracted documents waiting to be processed.
InputDocumentQueueCount If a task is running, returns the current number of documents waiting to be read for processing.
IsFileStoreWriterComplete Returns true if a task is currently running and the task has completed writing all attachments and extracted text to ar or flat files; false if still busy or a task is not running.
IsolateCorruptDocument RESERVED - DO NOT USE OR SET PROPERTY. Reserved for internal testing.
IsProcessingDocumentsComplete Returns true if a task is currently running and the task has completed processing all documents; false if still busy or a task is not running.
IsReadingDocumentComplete Returns true if a task is currently running and the task has completed reading all the input documents; false if still has documents to read or a task is not running.
IsTaskRunning True if a task is currently being executed via a previous call to method RunTask.
LargeDocumentQueueCount If a task is running, returns the current number of "large" documents that require special processing that are waiting to be processed. "Large" documents are defined by LargeDocumentCritera.
NumInputDocuments If a task is running, returns the total number of input documents to be processed for this task.
ProcessedDocuments If a task is currently running, returns null. If the task has finished running, returns the input document hierarchy; that is, the input documents with any extracted child documents (embedded items, attachments, and container items) populated in ChildDocuments.
ReadDocumentQueueCount If a task is running, returns the current number of documents read and waiting to be processed.
TaskPercentComplete If a task is running, returns the task's estimated percent complete (0-100).
TotalArchivesProcessed If a task is running, returns the total number of archives (Zip/7zip/Tar/Rar/etc) that have been fully processed so far.
TotalDatabasesProcessed If a task is running, returns the total number of databases that have been fully processed so far.
TotalDocumentsProcessed If a task is running, returns the total number of documents that have been fully processed so far (includes extracted embedded and container items).
TotalInputDocumentsProcessed If a task is running, returns the total number of the task's input documents that have been fully processed so far (does NOT include extracted embedded and container items).
TotalMailStoresProcessed If a task is running, returns the total number of mail stores (PST/OST/MBOX/etc) that have been fully processed so far.

Methods

AbortTask Aborts the currently executing task started by RunTask or RunTaskBlocking. Aborting may cause the host to crash, so it should only be used to stop a rogue or long-running document task. Any cleanup, database updating, and task scheduler notifications should be done prior to calling this method.
Equals Determines whether the specified object is equal to the current object.
(Inherited from Object)
GetHashCode Serves as the default hash function.
(Inherited from Object)
GetType Gets the Type of the current instance.
(Inherited from Object)
RunTask Asynchronously executes the document task defined by the constructor DocumentTaskSettings argument.
RunTaskBlocking Executes the document task defined by the constructor DocumentTaskSettings argument synchronously (blocking).
ToString Returns a string that represents the current object.
(Inherited from Object)

Events

Completed Raised when the task has completed.
FatalException Raised when a fatal exception occurs.
LogUpdated Raised when the task log is updated.
LongProcessingDocumentWarning Raised as a warning when a document is taking a long time to process.

See Also