DocumentDataArchiveReader Class

Reads all document records from an existing document data archive (.dda) into memory.

Inheritance Hierarchy

System.Object
OpenDiscoverSDK.Platform.Archive.DocumentDataArchiveReader

Namespace: OpenDiscoverSDK.Platform.Archive
Assembly: OpenDiscoverSDK (in OpenDiscoverSDK.dll) Version: 2025.4.6.0 (2025.4.6)

Syntax

public class DocumentDataArchiveReader : IDisposable

The DocumentDataArchiveReader type exposes the following members.

Constructors

	Name	Description
	DocumentDataArchiveReader	Initializes a new instance of the DocumentDataArchiveReader class.

Properties

	Name	Description
	ClassificationCount	Gets a dictionary with IdClassification as key and, as value, the count of documents that have that file format classification.
	ContentResultCount	Gets a dictionary that contains ContentResult as key and a ContentResultInfo as value.
	CreationDate	Archive creation date (UTC).
	DirectoryHierarchy	All document data in input directory hierarchy. The hierarchy also contains document parent/child relationships.
	DocumentArchiveFolderPath	The root folder of the document data archive.
	DocumentByControlNumber	Gets all documents by DocControlNumber, provided that the documents had DocControlNumber set.
	DocumentByDocGuid	Gets a dictionary with DocGuid key and associated document value.
	EntityItemDocuments	All documents with at least 1 entity item found in extracted text and/or metadata.
	ExcludedDocuments	All documents with Result set to ExcludedType.
	FlatRecords	Gets all archive document entries as a flattened (non-hierarchical) list.
	FormatIdCount	Gets a dictionary with Id as key and, as value, the count of documents that have that file format identification.
	HasReadErrors	True if there were errors reading the document data archive (.dda).
	HierarchicalRecords	Gets all document data archive entries with parent/child hierarchy.
	IssueDocuments	All documents whose Result is not set to Ok, EmptyFile, ExcludedType, or RequeueAsSeparateTask.
	LongRunningDocuments	All documents that have Result values set to LongRunningProcessingError.
	NistDocuments	All documents whose SHA1BinaryHash matches a SHA1 hash in the NIST hash database (see PerformNistCheck and NistRdsDatabasePath).
	PdfDocumentsWithFailedPages	All PDF documents with at least 1 failed PDF page.
	ReaderMode	The DocumentDataArchiveReaderMode of this instance.
	ReadErrors	If HasReadErrors is true, this property will hold read error information.
	RequeueDocuments	All documents with Result set to either RequeueAsSeparateTask or UserRequeueAsSeparateTask.
	Settings	Task settings that were used to create this document data archive output.
	SHA1BinaryHashMatchGroups	Gets a list of HashMatchGroup that contain documents that have the same SHA1BinaryHash value.
	SHA1ContentHashMatchGroups	Gets a list of HashMatchGroup that contain documents that have the same SHA1ContentHash value.
	TotalFlatRecordSize	Total size in bytes of all documents in FlatRecords.
	TotalNumOfDocumentRecords	Total number of document records in document data archive.
	TotalSHA1BinaryHashMatches	Gets the total number of documents that have the same SHA1BinaryHash.
	TotalSHA1ContentHashMatches	Gets the total number of documents that have the same SHA1ContentHash.
	UnknownDocuments	All documents with FormatId set to either Unknown or UnknownCompoundFile.
	Version	Archive format version.

Methods

	Name	Description
	Dispose	Releases the internal BinaryReader and all other resources held by this instance.
	Equals	Determines whether the specified object is equal to the current object. (Inherited from Object)
	GetDuplicateDocumentGroups	Gets all duplicate document groups present in the document data archive (.dda).
	GetHashCode	Serves as the default hash function. (Inherited from Object)
	GetType	Gets the Type of the current instance. (Inherited from Object)
	ReadDocumentFromControlNumberIndex	Reads a document from the DocumentControlNumberIndex given by the 'docControlNumber' argument. This instance must have been constructed in ControlNumberIndexAndHeaderOnly mode; otherwise this method throws an exception.
	ToString	Returns a string that represents the current object. (Inherited from Object)

Remarks

Dispose of this DocumentDataArchiveReader instance when done with it so that it releases an internal BinaryReader and all resources.
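The disposal guidance above can be sketched with a using block. This is a minimal sketch, not the SDK's documented usage: the constructor's parameters are not listed on this page, so the reader is created by a caller-supplied factory (openReader, a hypothetical name), and the class and method names DdaReaderExample and Summarize are also illustrative. Only members named on this page (HasReadErrors, ReadErrors, TotalNumOfDocumentRecords) are used.

```csharp
using System;
using OpenDiscoverSDK.Platform.Archive;

static class DdaReaderExample
{
    // 'openReader' is a hypothetical caller-supplied factory. The constructor
    // arguments are not shown on this page, so none are assumed here.
    public static void Summarize(Func<DocumentDataArchiveReader> openReader)
    {
        // The using block calls Dispose when the block exits, even if an
        // exception is thrown, which releases the internal BinaryReader.
        using (DocumentDataArchiveReader reader = openReader())
        {
            if (reader.HasReadErrors)
            {
                // Details are available through the ReadErrors property.
                Console.WriteLine("The archive was read with errors; inspect ReadErrors.");
                return;
            }

            Console.WriteLine("Document records: " + reader.TotalNumOfDocumentRecords);
        }
    }
}
```

Because the class implements IDisposable, a using block is the usual C# way to guarantee that Dispose runs on every exit path.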

If a document data archive (.dda) is too large to read into memory, or if you do not need the extra summary information or hierarchical document relationships returned by this class, consider using the DDARecordReader class instead.

To use this class, keep document processing job tasks to 3-5 GB of total input size so that whole document data archives can be read into memory. Large archives and mail stores should have their own tasks, and if they are too large they should be partitioned into smaller processing tasks (see IsPartitioned and TotalPartitions for more information).

A document data archive (always named "DocumentDataArchive.dda") holds extracted document data (metadata/attributes/etc) from processed documents. Documents stored in a document archive file may also have links to either external individual extracted text/attachment files or links to external text/attachment archive files that act as compact archive containers for this information.

Reference

OpenDiscoverSDK.Platform.Archive Namespace