DocumentContent Class

Represents extracted document content.

Definition

Namespace: OpenDiscoverSDK.Interfaces.Content
Assembly: OpenDiscoverSDK.Interfaces (in OpenDiscoverSDK.Interfaces.dll) Version: 2026.2.6.0 (2026.02.06)
C#
[DataContractAttribute]
[KnownTypeAttribute(typeof(EmailDocumentContent))]
[KnownTypeAttribute(typeof(PdfDocumentContent))]
[KnownTypeAttribute(typeof(HtmlDocumentContent))]
[KnownTypeAttribute(typeof(ArchiveContent))]
[KnownTypeAttribute(typeof(MailStoreContent))]
[KnownTypeAttribute(typeof(DatabaseContent))]
[KnownTypeAttribute(typeof(BooleanProperty))]
[KnownTypeAttribute(typeof(DateTimeProperty))]
[KnownTypeAttribute(typeof(DoubleProperty))]
[KnownTypeAttribute(typeof(Int32Property))]
[KnownTypeAttribute(typeof(Int64Property))]
[KnownTypeAttribute(typeof(StringProperty))]
[KnownTypeAttribute(typeof(BooleanListProperty))]
[KnownTypeAttribute(typeof(DateTimeListProperty))]
[KnownTypeAttribute(typeof(DoubleListProperty))]
[KnownTypeAttribute(typeof(Int32ListProperty))]
[KnownTypeAttribute(typeof(Int64ListProperty))]
[KnownTypeAttribute(typeof(StringListProperty))]
public class DocumentContent
Inheritance
Object → DocumentContent

Remarks

This class is also the base class for the specialized document classes EmailDocumentContent, HtmlDocumentContent, PdfDocumentContent, ArchiveContent, MailStoreContent, and DatabaseContent. Each of these derived types carries additional extracted content specific to its document kind.
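Because callers usually receive a DocumentContent reference and must discover the derived type at run time, a type-pattern switch is one way to branch on it. The following is a minimal sketch, not SDK code: the empty classes are hypothetical stand-ins that only mirror the type names listed above so the example compiles on its own, and ContentDispatch.Describe is an invented helper. The sketch also assumes every derived type inherits directly from DocumentContent; if the real hierarchy has intermediate levels, the more specific cases would need to come first.

```csharp
using System;

// Hypothetical stand-ins for the SDK types, declared only to keep this sketch self-contained.
// The real classes live in OpenDiscoverSDK.Interfaces.Content and have many more members.
public class DocumentContent { }
public class EmailDocumentContent : DocumentContent { }
public class HtmlDocumentContent : DocumentContent { }
public class PdfDocumentContent : DocumentContent { }
public class ArchiveContent : DocumentContent { }
public class MailStoreContent : DocumentContent { }
public class DatabaseContent : DocumentContent { }

public static class ContentDispatch
{
    // Describe is a hypothetical helper (not an SDK method) that reports
    // which derived content type a DocumentContent reference actually holds.
    public static string Describe(DocumentContent content)
    {
        if (content == null) throw new ArgumentNullException(nameof(content));

        switch (content)
        {
            case EmailDocumentContent _: return "email";
            case HtmlDocumentContent _: return "html";
            case PdfDocumentContent _: return "pdf";
            case ArchiveContent _: return "archive";
            case MailStoreContent _: return "mail store";
            case DatabaseContent _: return "database";
            default: return "generic";
        }
    }
}
```

In real code each case would bind a variable (for example `case PdfDocumentContent pdf:`) and read the type-specific properties from it instead of returning a label.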

Constructors

DocumentContent Default constructor.
DocumentContent(IdResult) Constructor.

Properties

Attributes Document attributes. See DocumentAttributes for an enumeration of supported attributes.
ChildDocuments Child documents (attachments and embedded items). See remarks for the special cases of archives (.7z, .zip, etc.), media images, and mail stores (.pst, .ost, .mbox, etc.).
CustomMetadata Contains custom (user-defined) document metadata as a dictionary of metadata field names as keys and metadata field data as corresponding values.
EntityExtractionResult Document entity item extraction result.
ErrorMessage Gets or sets an error message associated with Result. This property is only set when Result is not set to Ok.
ErrorStackTrace Error (exception) stack trace associated with ErrorMessage. This property is only set when Result is not Ok and an internal exception was caught.
ExtractedText Extracted text. See remarks for limitations.
FileEntropy Shannon entropy of the document's bytes.
FormatId Document format identification result from the prior file identification step. This object was passed as input to the content extractor factory and is stored here for convenience.
HyperLinks Document hyperlinks.
IsEmailType If true, this document is an email document. Cast this DocumentContent object to EmailDocumentContent to get additional email-specific properties.
IsEncrypted If true, the document is encrypted.
IsHtmlType If true, this document is an HTML document. Cast this DocumentContent object to HtmlDocumentContent to get additional HTML-specific properties.
IsPdfType If true, this document is a PDF document. Cast this DocumentContent object to PdfDocumentContent to get additional PDF-specific properties.
LanguageIdResults Extracted text language identification results.
MD5BinaryHash MD5 binary document hash (hash of all document bytes).
MD5ContentHash MD5 content hash: a proprietary hash computed over only the content part of the document file format.
Metadata Contains standard (non-user-defined) document metadata as a dictionary of metadata field names as keys and metadata field data as corresponding values.
Password The password that decrypted the document, found by cycling through the supplied password list.
Result Gets or sets the result of the content extraction. Check this value to see if content extraction was successful.
SHA1BinaryHash SHA-1 binary document hash (hash of all document bytes).
SHA1ContentHash SHA-1 content hash: a proprietary hash computed over only the content part of the document file format.
SHA256BinaryHash SHA-256 binary document hash (hash of all document bytes).
SHA256ContentHash SHA-256 content hash: a proprietary hash computed over only the content part of the document file format.
TextSourceType Gets or sets the method by which the document text (if any) was acquired.
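
FileEntropy and the binary hash properties are both described as functions of all document bytes, so they can be approximated independently of the SDK for cross-checking. The sketch below uses only standard .NET calls; ByteStats, ShannonEntropy, and Sha256Hex are hypothetical names, and the choice of bits per byte (log base 2) and lowercase hexadecimal output is an assumption, since this page does not state the units or format the SDK uses. The proprietary content hashes cannot be reproduced this way.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper class, not part of the SDK.
public static class ByteStats
{
    // Shannon entropy in bits per byte: H = -sum(p_i * log2(p_i)),
    // summed over the byte values that actually occur in the data.
    // Ranges from 0 (all bytes identical) to 8 (all 256 values equally likely).
    public static double ShannonEntropy(byte[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        if (data.Length == 0) return 0.0;

        // Count how often each of the 256 possible byte values appears.
        var counts = new long[256];
        foreach (byte b in data) counts[b]++;

        double entropy = 0.0;
        foreach (long count in counts)
        {
            if (count == 0) continue; // absent values contribute nothing
            double p = (double)count / data.Length;
            entropy -= p * Math.Log(p, 2.0);
        }
        return entropy;
    }

    // SHA-256 over all bytes, returned as lowercase hex (format is an assumption).
    public static string Sha256Hex(byte[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(data);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

For a file on disk, the input could come from `System.IO.File.ReadAllBytes(path)`; very large files would be better served by a streaming variant, which this sketch omits.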

Methods

Equals Determines whether the specified object is equal to the current object. (Inherited from Object)
GetHashCode Serves as the default hash function. (Inherited from Object)
GetType Gets the Type of the current instance. (Inherited from Object)
ToString Returns a string that represents the current object. (Inherited from Object)

See Also