public class DocumentTaskEngine
The DocumentTaskEngine type exposes the following members.

Constructors

| Name | Description |
|---|---|
| DocumentTaskEngine | Constructor. |

Properties

| Name | Description |
|---|---|
| CurrentNumberOfArchivesInProcess | If a task is running, returns the number of archives (Zip/7zip/Tar/Rar/etc.) currently being processed. |
| CurrentNumberOfDatabasesInProcess | If a task is running, returns the number of databases currently being processed. |
| CurrentNumberOfLargeDocumentsInProcess | If a task is running, returns the number of "large" documents currently being processed. "Large" documents are defined by LargeDocumentCritera. |
| CurrentNumberOfMailStoresInProcess | If a task is running, returns the number of mail stores (PST/OST/MBOX/etc.) currently being processed. |
| DocumentMetadataToFileStoreQueueCount | If a task is running, returns the number of documents waiting to have their extracted metadata written to the file store. This is the last step for a processed document. |
| EmbeddedAndTextToFileStoreQueueCount | If a task is running, returns the number of extracted embedded documents and documents with extracted text waiting to be written to the file store. |
| ExtractedDocumentQueueCount | If a task is running, returns the number of extracted documents waiting to be processed. |
| InputDocumentQueueCount | If a task is running, returns the number of documents waiting to be read for processing. |
| IsFileStoreWriterComplete | Returns true if a task is currently running and the task has finished writing all attachments and extracted text to archive or flat files; false if it is still busy or a task is not running. |
| IsolateCorruptDocument | RESERVED - DO NOT USE OR SET PROPERTY. Reserved for internal testing. |
| IsProcessingDocumentsComplete | Returns true if a task is currently running and the task has finished processing all documents; false if it is still busy or a task is not running. |
| IsReadingDocumentComplete | Returns true if a task is currently running and the task has finished reading all input documents; false if it still has documents to read or a task is not running. |
| IsTaskRunning | True if a task is currently being executed via a previous call to method RunTask. |
| LargeDocumentQueueCount | If a task is running, returns the number of "large" documents that require special processing and are waiting to be processed. "Large" documents are defined by LargeDocumentCritera. |
| NumInputDocuments | If a task is running, returns the total number of input documents to be processed for this task. |
| ProcessedDocuments | If a task is currently running, returns null. Once the task has finished, returns the input document hierarchy; that is, extracted child documents (embedded documents, attachments, and container items), if any, are populated in ChildDocuments. |
| ReadDocumentQueueCount | If a task is running, returns the number of documents read and waiting to be processed. |
| TaskPercentComplete | If a task is running, returns the task's estimated percent complete (0-100). |
| TotalArchivesProcessed | If a task is running, returns the total number of archives (Zip/7zip/Tar/Rar/etc.) fully processed so far. |
| TotalDatabasesProcessed | If a task is running, returns the total number of databases fully processed so far. |
| TotalDocumentsProcessed | If a task is running, returns the total number of documents fully processed so far (includes extracted embedded and container items). |
| TotalInputDocumentsProcessed | If a task is running, returns the total number of the task's input documents fully processed so far (does NOT include extracted embedded and container items). |
| TotalMailStoresProcessed | If a task is running, returns the total number of mail stores (PST/OST/MBOX/etc.) fully processed so far. |

Methods

| Name | Description |
|---|---|
| AbortTask | Aborts the currently executing task started by RunTask or RunTaskBlocking. Aborting may cause the host to crash, so it should only be used to stop a rogue or long-running document task. Any cleanup, database updating, and task scheduler notifications should be done before calling this method. |
| CreateNistRdsDatabase | Creates a NIST National Software Reference Library (NSRL) Reference Data Set (RDS) database that DocumentTaskEngine can use to de-NIST documents while processing (see PerformNistCheck and NistRdsDatabasePath). |
| Equals | Determines whether the specified object is equal to the current object. (Inherited from Object) |
| GetHashCode | Serves as the default hash function. (Inherited from Object) |
| GetType | Gets the Type of the current instance. (Inherited from Object) |
| RunTask | Asynchronously executes the document task defined by the constructor's DocumentTaskSettings argument. |
| RunTaskBlocking | Synchronously (blocking) executes the document task defined by the constructor's DocumentTaskSettings argument. |
| ToString | Returns a string that represents the current object. (Inherited from Object) |

Events

| Name | Description |
|---|---|
| Completed | Task completed event. |
| FatalException | Fatal exception event. |
| LogUpdated | Task log updated event. |
| LongProcessingDocumentWarning | Long-processing document warning event. |

Remarks

See methods RunTask and RunTaskBlocking. These methods provide highly parallel, very fast processing of batches of documents, as well as processing of a "large" archive or mail store as a single task. Large archives and mail stores can also be broken into multiple separate tasks; see IsPartitioned.
When processing hundreds to thousands of documents as a single task, keep the combined file size of those documents under 4 to 5 gigabytes; otherwise the output document data archive (.dda) could become too large to read into memory (see DocumentDataArchiveReader). If processing more than 10 gigabytes of documents, break the documents into DocumentTaskSettings tasks of 2 to 4 gigabytes each. For "large" archives and mail stores ("large" being a subjective term), create separate tasks using the SingleArchive or SingleMailStore processing type, respectively.
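The size guidance above can be sketched as a simple greedy partition of input files into task-sized batches. This is a minimal, hypothetical sketch: `InputFile`, `TaskBatcher.Partition`, and both thresholds are illustrative names, not part of the DocumentTaskEngine API, and each resulting batch would still need to be turned into its own DocumentTaskSettings task.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical input record (not part of the DocumentTaskEngine API):
// a document path and its file size in bytes.
public record InputFile(string Path, long SizeBytes);

public static class TaskBatcher
{
    // Groups files, largest first, into batches whose combined size stays at or
    // below maxBatchBytes (next-fit). Any file at or above largeFileThreshold gets
    // a batch of its own, mirroring the advice to give "large" archives and mail
    // stores a separate task. A single file bigger than maxBatchBytes (but below
    // largeFileThreshold) also ends up alone in its batch.
    public static List<List<InputFile>> Partition(
        IEnumerable<InputFile> files, long maxBatchBytes, long largeFileThreshold)
    {
        var batches = new List<List<InputFile>>();
        var current = new List<InputFile>();
        long currentBytes = 0;

        foreach (var file in files.OrderByDescending(f => f.SizeBytes))
        {
            if (file.SizeBytes >= largeFileThreshold)
            {
                // "Large" item: its own task.
                batches.Add(new List<InputFile> { file });
                continue;
            }

            // Close the current batch if adding this file would exceed the limit.
            if (current.Count > 0 && currentBytes + file.SizeBytes > maxBatchBytes)
            {
                batches.Add(current);
                current = new List<InputFile>();
                currentBytes = 0;
            }

            current.Add(file);
            currentBytes += file.SizeBytes;
        }

        if (current.Count > 0)
        {
            batches.Add(current);
        }
        return batches;
    }
}
```

For example, `Partition(files, 4L << 30, 10L << 30)` keeps ordinary batches at or under 4 GB while any file of 10 GB or more gets a batch of its own. Next-fit over size-sorted files is chosen for simplicity; it does not guarantee the minimum number of batches.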
For archives, be aware of the expansion size before deciding how to process them. Archives can have very high compression ratios; for example, a 500 MB archive could expand into 50 GB worth of files. It is wise to test archives for their true expansion size before expanding or extracting them.
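For Zip archives only, one way to estimate expansion size without extracting anything is to sum the sizes recorded in the archive's directory using .NET's `System.IO.Compression.ZipArchive`. This is a hedged sketch: `ArchiveInspector` is a hypothetical helper, it does not cover 7zip/Tar/Rar, and it trusts the sizes the archive reports about itself rather than measuring an actual extraction.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper (not part of the DocumentTaskEngine API).
public static class ArchiveInspector
{
    // Sums the compressed and uncompressed sizes declared for every entry in a
    // Zip archive, without extracting any entry data. These values come from the
    // archive's own headers, so a malicious archive could misreport them.
    public static (long Compressed, long Uncompressed) GetZipSizes(Stream zipStream)
    {
        // leaveOpen: true so the caller keeps ownership of the stream.
        using var archive = new ZipArchive(zipStream, ZipArchiveMode.Read, leaveOpen: true);
        long compressed = 0;
        long uncompressed = 0;
        foreach (ZipArchiveEntry entry in archive.Entries)
        {
            compressed += entry.CompressedLength;
            uncompressed += entry.Length;
        }
        return (compressed, uncompressed);
    }

    // Returns uncompressed / compressed, or 0 when nothing is compressed
    // (for example, an empty archive).
    public static double ExpansionRatio(Stream zipStream)
    {
        var (compressed, uncompressed) = GetZipSizes(zipStream);
        return compressed == 0 ? 0 : (double)uncompressed / compressed;
    }
}
```

A caller could compare the uncompressed total against a chosen limit (such as the 2 to 4 gigabyte task size suggested above) and route oversized archives to a separate SingleArchive task. Because nested archives (a Zip inside a Zip) are counted at their compressed size, the total can still understate the full expansion.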
Breaking large document processing sets into separate tasks aids distribution across multiple desktops or VMs, and also aids re-queuing of any failed tasks.
Note: DocumentTaskEngine can handle long file paths (>255 characters in length) for input documents.