
DocumentDataArchiveReader Class

Reads all document records from an existing document data archive (.dda) into memory.
Inheritance Hierarchy
System.Object
  OpenDiscoverSDK.Platform.Archive.DocumentDataArchiveReader

Namespace: OpenDiscoverSDK.Platform.Archive
Assembly: OpenDiscoverSDK (in OpenDiscoverSDK.dll) Version: 2025.4.4.0 (2025.4.4)
Syntax
C#
public class DocumentDataArchiveReader : IDisposable

The DocumentDataArchiveReader type exposes the following members.

Constructors
| Name | Description |
| --- | --- |
| DocumentDataArchiveReader | Constructor. |
Properties
| Name | Description |
| --- | --- |
| ClassificationCount | Gets a dictionary keyed by IdClassification whose values are the number of documents with that file format classification. |
| ContentResultCount | Gets a dictionary that contains ContentResult as key and a ContentResultInfo as value. |
| CreationDate | Archive creation date (UTC). |
| DirectoryHierarchy | All document data in the input directory hierarchy. The hierarchy also contains document parent/child relationships. |
| DocumentArchiveFolderPath | The root folder of the document data archive. |
| DocumentByControlNumber | Gets all documents keyed by DocControlNumber, provided that the documents had DocControlNumber set. |
| DocumentByDocGuid | Gets a dictionary with DocGuid as key and the associated document as value. |
| EntityItemDocuments | All documents with at least 1 entity item found in extracted text and/or metadata. |
| ExcludedDocuments | All documents with Result set to ExcludedType. |
| FlatRecords | Gets all archive document entries as a flattened (non-hierarchical) list. |
| FormatIdCount | Gets a dictionary keyed by Id whose values are the number of documents with that file format identification. |
| HasReadErrors | True if there were errors reading the document data archive (.dda). |
| HierarchicalRecords | Gets all document data archive entries with parent/child hierarchy. |
| IssueDocuments | All documents whose Result is not Ok, EmptyFile, ExcludedType, or RequeueAsSeparateTask. |
| LongRunningDocuments | All documents with Result set to LongRunningProcessingError. |
| NistDocuments | All documents whose SHA1BinaryHash matches a SHA1 hash in the NIST hash database (see PerformNistCheck and NistRdsDatabasePath). |
| PdfDocumentsWithFailedPages | All PDF documents with at least 1 failed PDF page. |
| ReaderMode | The DocumentDataArchiveReaderMode of this instance. |
| ReadErrors | If HasReadErrors is true, this property holds the read error information. |
| RequeueDocuments | All documents with Result set to either RequeueAsSeparateTask or UserRequeueAsSeparateTask. |
| Settings | Task settings that were used to create this document data archive output. |
| SHA1BinaryHashMatchGroups | Gets a list of HashMatchGroup objects, each containing documents that have the same SHA1BinaryHash value. |
| SHA1ContentHashMatchGroups | Gets a list of HashMatchGroup objects, each containing documents that have the same SHA1ContentHash value. |
| TotalFlatRecordSize | Total size in bytes of all documents in FlatRecords. |
| TotalNumOfDocumentRecords | Total number of document records in the document data archive. |
| TotalSHA1BinaryHashMatches | Gets the total number of documents that have the same SHA1BinaryHash. |
| TotalSHA1ContentHashMatches | Gets the total number of documents that have the same SHA1ContentHash. |
| UnknownDocuments | All documents with FormatId set to either Unknown or UnknownCompoundFile. |
| Version | Archive format version. |
Methods
| Name | Description |
| --- | --- |
| Dispose | Dispose. |
| Equals | Determines whether the specified object is equal to the current object. (Inherited from Object) |
| GetDuplicateDocumentGroups | Gets all duplicate document groups present in the document data archive (.dda). |
| GetHashCode | Serves as the default hash function. (Inherited from Object) |
| GetType | Gets the Type of the current instance. (Inherited from Object) |
| ReadDocumentFromControlNumberIndex | Reads a document from the DocumentControlNumberIndex given by the 'docControlNumber' argument. This archive must have been constructed with ControlNumberIndexAndHeaderOnly mode, or else this method will throw an exception. |
| ToString | Returns a string that represents the current object. (Inherited from Object) |
Remarks

Dispose of this DocumentDataArchiveReader instance when you are done with it so that it releases its internal BinaryReader and all other resources.
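The disposal guidance above corresponds to the standard C# `using` pattern. The sketch below is self-contained and deliberately does not call the SDK: `StandInArchiveReader` and `DisposeExample` are hypothetical names invented for illustration, because the constructor arguments of DocumentDataArchiveReader are not listed on this page. Only the IDisposable contract and the internally owned BinaryReader are taken from the text above.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for illustration only; this is NOT the SDK class.
// Like the documented reader, it owns a BinaryReader and implements IDisposable.
public sealed class StandInArchiveReader : IDisposable
{
    private readonly BinaryReader _reader;

    public bool IsDisposed { get; private set; }

    public StandInArchiveReader(Stream archiveStream)
    {
        _reader = new BinaryReader(archiveStream);
    }

    // Reads one 32-bit value to show the reader is usable before disposal.
    public int ReadHeaderValue()
    {
        if (IsDisposed) throw new ObjectDisposedException(nameof(StandInArchiveReader));
        return _reader.ReadInt32();
    }

    public void Dispose()
    {
        if (IsDisposed) return;
        _reader.Dispose();   // also closes the underlying stream
        IsDisposed = true;
    }
}

public static class DisposeExample
{
    // Returns true if the reader was disposed when the using block exited.
    public static bool Run()
    {
        StandInArchiveReader captured;
        using (var reader = new StandInArchiveReader(new MemoryStream(BitConverter.GetBytes(7))))
        {
            captured = reader;
            reader.ReadHeaderValue();
        }   // Dispose() runs here, even if an exception is thrown inside the block
        return captured.IsDisposed;
    }
}
```

With the real class, the same shape would apply: construct the reader in a `using` statement, read the properties you need inside the block, and let the block exit release the reader.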

If a document data archive (.dda) is too large to read into memory, or if you do not need the extra summary information or hierarchical document relationships returned by this class, consider using the DDARecordReader class instead.

To use this class, keep each document processing task to 3-5 GB of total input size so that the whole document data archive can be read into memory. Large archives and mail stores should each have their own task; if one is still too large, partition it into smaller processing tasks (see IsPartitioned and TotalPartitions for more information).
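One way to keep tasks within such a size budget is to group input files greedily before submitting them. The sketch below is a generic illustration, not SDK functionality: `TaskBatcher`, `Batch`, and the `maxBatchBytes` parameter are hypothetical names, and the SDK's own partitioning mechanism (IsPartitioned, TotalPartitions) is not shown here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of OpenDiscoverSDK.
public static class TaskBatcher
{
    // Greedily groups (path, sizeInBytes) pairs, in input order, into batches
    // whose total size stays at or below maxBatchBytes. A single item that is
    // larger than the budget ends up alone in its own batch.
    public static List<List<string>> Batch(
        IEnumerable<KeyValuePair<string, long>> files, long maxBatchBytes)
    {
        var batches = new List<List<string>>();
        var current = new List<string>();
        long currentBytes = 0;

        foreach (var file in files)
        {
            // Start a new batch if adding this file would exceed the budget.
            if (current.Count > 0 && currentBytes + file.Value > maxBatchBytes)
            {
                batches.Add(current);
                current = new List<string>();
                currentBytes = 0;
            }
            current.Add(file.Key);
            currentBytes += file.Value;
        }

        if (current.Count > 0) batches.Add(current);
        return batches;
    }
}
```

An oversized item that lands alone in a batch corresponds to the case described above, where a large archive or mail store gets its own task and may still need to be partitioned.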

A document data archive (always named "DocumentDataArchive.dda") holds extracted document data (metadata, attributes, and so on) from processed documents. Documents stored in a document data archive file may also link either to individual external extracted text/attachment files or to external text/attachment archive files that act as compact containers for this information.

See Also