DocumentContent Class
Represents extracted document content.
Definition
Namespace: OpenDiscoverSDK.Interfaces.Content
Assembly: OpenDiscoverSDK.Interfaces (in OpenDiscoverSDK.Interfaces.dll) Version: 2026.2.6.0 (2026.02.06)
C#
[DataContractAttribute]
[KnownTypeAttribute(typeof(EmailDocumentContent))]
[KnownTypeAttribute(typeof(PdfDocumentContent))]
[KnownTypeAttribute(typeof(HtmlDocumentContent))]
[KnownTypeAttribute(typeof(ArchiveContent))]
[KnownTypeAttribute(typeof(MailStoreContent))]
[KnownTypeAttribute(typeof(DatabaseContent))]
[KnownTypeAttribute(typeof(BooleanProperty))]
[KnownTypeAttribute(typeof(DateTimeProperty))]
[KnownTypeAttribute(typeof(DoubleProperty))]
[KnownTypeAttribute(typeof(Int32Property))]
[KnownTypeAttribute(typeof(Int64Property))]
[KnownTypeAttribute(typeof(StringProperty))]
[KnownTypeAttribute(typeof(BooleanListProperty))]
[KnownTypeAttribute(typeof(DateTimeListProperty))]
[KnownTypeAttribute(typeof(DoubleListProperty))]
[KnownTypeAttribute(typeof(Int32ListProperty))]
[KnownTypeAttribute(typeof(Int64ListProperty))]
[KnownTypeAttribute(typeof(StringListProperty))]
public class DocumentContent
- Inheritance
- Object → DocumentContent
- Derived
Remarks
This class is also the base class for the specialized document classes EmailDocumentContent, HtmlDocumentContent,
PdfDocumentContent, ArchiveContent, MailStoreContent, and DatabaseContent.
These derived types carry additional extracted content specific to their document kind.
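The base/derived relationship described above can be sketched as a type check on a DocumentContent reference. This is a minimal illustration only: the empty classes below are hypothetical stand-ins so that the snippet compiles without the SDK (real code would reference the types in OpenDiscoverSDK.Interfaces.Content instead), and the `ContentKind.Describe` helper is an assumption that is not part of the SDK.

```csharp
using System;

// Hypothetical stand-ins for the SDK types, declared only so this sketch
// compiles on its own. Real code would use the classes from
// OpenDiscoverSDK.Interfaces.Content and would not redeclare them.
public class DocumentContent { }
public class EmailDocumentContent : DocumentContent { }
public class HtmlDocumentContent : DocumentContent { }
public class PdfDocumentContent : DocumentContent { }
public class ArchiveContent : DocumentContent { }
public class MailStoreContent : DocumentContent { }
public class DatabaseContent : DocumentContent { }

public static class ContentKind
{
    // Hypothetical helper: reports which derived type an instance is,
    // using C# type patterns instead of explicit casts.
    public static string Describe(DocumentContent content)
    {
        if (content == null) throw new ArgumentNullException(nameof(content));

        switch (content)
        {
            case EmailDocumentContent _: return "email";
            case HtmlDocumentContent _: return "html";
            case PdfDocumentContent _: return "pdf";
            case ArchiveContent _: return "archive";
            case MailStoreContent _: return "mailstore";
            case DatabaseContent _: return "database";
            default: return "generic";
        }
    }
}
```

With these stand-ins, `ContentKind.Describe(new PdfDocumentContent())` returns `"pdf"` and `ContentKind.Describe(new DocumentContent())` returns `"generic"`. In a real handler, each `case` would bind a variable (for example `case EmailDocumentContent email:`) to reach the members specific to that derived type.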
Constructors
| DocumentContent | Default constructor. |
| DocumentContent(IdResult) | Constructor that takes an IdResult argument. |
Properties
| Attributes | Document attributes. See DocumentAttributes for an enumeration of supported attributes. |
| ChildDocuments | Child documents (attachments/embedded items). See remarks for the special cases of archives (.7z, .zip, etc.), media images, and mail stores (.pst, .ost, .mbox, etc.). |
| CustomMetadata | Contains custom (user-defined) document metadata as a dictionary that maps metadata field names (keys) to metadata field data (values). |
| EntityExtractionResult | Document entity item extraction result. |
| ErrorMessage | Gets or sets an error message associated with Result. This property is only set when Result is not set to Ok. |
| ErrorStackTrace | Error (exception) stack trace associated with ErrorMessage. This property is only set when Result is not Ok and an internal exception was caught. |
| ExtractedText | Extracted text. See remarks for limitations. |
| FileEntropy | Shannon entropy of the document's bytes. |
| FormatId | Document format identification result from prior file identification (this value was an input to the content extractor factory and is stored here for convenience). |
| HyperLinks | Document hyperlinks. |
| IsEmailType | If true, this document is an email document. This DocumentContent object should be cast to an EmailDocumentContent to get additional email-specific properties. |
| IsEncrypted | If true, this document is encrypted. |
| IsHtmlType | If true, this document is an HTML document. This DocumentContent object should be cast to an HtmlDocumentContent to get additional HTML-specific properties. |
| IsPdfType | If true, this document is a PDF document. This DocumentContent object should be cast to a PdfDocumentContent to get additional PDF-specific properties. |
| LanguageIdResults | Extracted text language identification results. |
| MD5BinaryHash | MD5 binary document hash (hash of all document bytes). |
| MD5ContentHash | MD5 content hash: a proprietary hash computed over only the content part of the document file format. |
| Metadata | Contains standard (non-user-defined) document metadata as a dictionary that maps metadata field names (keys) to metadata field data (values). |
| Password | The password that decrypted the document, found by cycling through the supplied password list. |
| Result | Gets or sets the result of the content extraction. Check this value to see if content extraction was successful. |
| SHA1BinaryHash | SHA-1 binary document hash (hash of all document bytes). |
| SHA1ContentHash | SHA-1 content hash: a proprietary hash computed over only the content part of the document file format. |
| SHA256BinaryHash | SHA-256 binary document hash (hash of all document bytes). |
| SHA256ContentHash | SHA-256 content hash: a proprietary hash computed over only the content part of the document file format. |
| TextSourceType | Gets or sets the method by which the document text (if any) was acquired. |
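To make the "hash of all document bytes" and "Shannon entropy of the document's bytes" descriptions above concrete, the following sketch computes both with standard .NET calls only. It is not the SDK's implementation: the `ByteStats` class and its method names are hypothetical, the lowercase hexadecimal encoding is an assumption, and the entropy is expressed in bits per byte (base-2 logarithm, range 0 to 8), which may differ from the scale the SDK reports. The proprietary content hashes are not reproduced here.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper class; not part of OpenDiscoverSDK.
public static class ByteStats
{
    // SHA-256 over every byte of the input, returned as lowercase hex
    // (the output encoding is an assumption).
    public static string Sha256Hex(byte[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));

        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(data);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }

    // Shannon entropy H = -sum(p_i * log2(p_i)) over the 256 possible byte
    // values, in bits per byte. Empty input is defined here as 0.
    public static double ShannonEntropy(byte[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        if (data.Length == 0) return 0.0;

        // Count how often each byte value occurs.
        var counts = new long[256];
        foreach (byte b in data) counts[b]++;

        double entropy = 0.0;
        foreach (long count in counts)
        {
            if (count == 0) continue; // a zero-probability term contributes nothing
            double p = (double)count / data.Length;
            entropy -= p * Math.Log(p, 2.0);
        }
        return entropy;
    }
}
```

Values near 8 bits per byte generally indicate compressed or encrypted data, while a file consisting of one repeated byte value has an entropy of 0.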
Methods
| Equals | Determines whether the specified object is equal to the current object. (Inherited from Object) |
| GetHashCode | Serves as the default hash function. (Inherited from Object) |
| GetType | Gets the Type of the current instance. (Inherited from Object) |
| ToString | Returns a string that represents the current object. (Inherited from Object) |