HtmlDocumentContent Class

Extracted HTML document content.
Inheritance Hierarchy
System.Object
  OpenDiscoverSDK.Interfaces.Content.DocumentContent
    OpenDiscoverSDK.Interfaces.Content.HtmlDocumentContent

Namespace: OpenDiscoverSDK.Interfaces.Content
Assembly: OpenDiscoverSDK.Interfaces (in OpenDiscoverSDK.Interfaces.dll) Version: 2025.4.4.0 (2025.4.4)
Syntax
C#
[DataContractAttribute]
public class HtmlDocumentContent : DocumentContent

The HtmlDocumentContent type exposes the following members.

Constructors
HtmlDocumentContent: Constructor.
HtmlDocumentContent(IdResult): Constructor.
Properties
Attributes: Document attributes. See DocumentAttributes for an enumeration of supported attributes. (Inherited from DocumentContent)
BaseTarget: The HTML "base" element specifies the base URL/target for all relative URLs in an HTML document. This property holds the 'target' attribute value of the "base" tag, if it exists. This value is null if it is not found in the document.
BaseUrl: The HTML "base" element specifies the base URL/target for all relative URLs in an HTML document. This property holds the URL ('href') attribute value of the "base" tag, if it exists. This value is null if it is not found in the document.
ChildDocuments: Child documents (attachments/embedded items). See remarks for the special cases of archives (.7z, .zip, etc.), media images, and mail stores (.pst, .ost, .mbox, etc.). (Inherited from DocumentContent)
CustomMetadata: Custom (user-defined) document metadata, as a dictionary with metadata field names as keys and metadata field data as the corresponding values. (Inherited from DocumentContent)
EntityExtractionResult: Document entity item extraction result. (Inherited from DocumentContent)
ErrorMessage: Gets or sets an error message associated with Result. This property is only set when Result is not Ok. (Inherited from DocumentContent)
ErrorStackTrace: Error (exception) stack trace associated with ErrorMessage. This property is only set when Result is not Ok and an internal exception was caught. (Inherited from DocumentContent)
ExtractedText: Extracted text. See remarks for limitations. (Inherited from DocumentContent)
FileEntropy: Shannon entropy of the document's bytes. (Inherited from DocumentContent)
FormatId: Document format identification result from prior file identification. This object was an input to the content extractor factory and is stored here for convenience. (Inherited from DocumentContent)
HyperLinks: Document hyperlinks. (Inherited from DocumentContent)
ImageTags: HTML 'img' tag information.
IsEmailType: If true, this document is an email document. This DocumentContent object should be cast to an EmailDocumentContent to get additional email-specific properties. (Inherited from DocumentContent)
IsEncrypted: If true, the document is encrypted. (Inherited from DocumentContent)
IsHtmlType: If true, this document is an HTML document. This DocumentContent object should be cast to an HtmlDocumentContent to get additional HTML-specific properties. (Inherited from DocumentContent)
IsPdfType: If true, this document is a PDF document. This DocumentContent object should be cast to a PdfDocumentContent to get additional PDF-specific properties. (Inherited from DocumentContent)
LanguageIdResults: Language identification results for the extracted text. (Inherited from DocumentContent)
MD5BinaryHash: MD5 binary document hash (hash of all document bytes). (Inherited from DocumentContent)
MD5ContentHash: MD5 content hash, a proprietary hash computed only on the content part of the document file format. (Inherited from DocumentContent)
Metadata: Standard (non-user-defined) document metadata, as a dictionary with metadata field names as keys and metadata field data as the corresponding values. (Inherited from DocumentContent)
Password: The password that decrypted the document, found by cycling through the supplied password list. (Inherited from DocumentContent)
Result: Gets or sets the result of the content extraction. Check this value to see whether content extraction was successful. (Inherited from DocumentContent)
SHA1BinaryHash: SHA-1 binary document hash (hash of all document bytes). (Inherited from DocumentContent)
SHA1ContentHash: SHA-1 content hash, a proprietary hash computed only on the content part of the document file format. (Inherited from DocumentContent)
SHA256BinaryHash: SHA-256 binary document hash (hash of all document bytes). (Inherited from DocumentContent)
SHA256ContentHash: SHA-256 content hash, a proprietary hash computed only on the content part of the document file format. (Inherited from DocumentContent)
TextSourceType: Gets or sets the method by which the document text (if any) was acquired. (Inherited from DocumentContent)
Title: The HTML "title" element text. This value is null if it is not defined in the document.
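The IsHtmlType, Title, and BaseUrl descriptions above suggest a usage pattern: check the flag, cast to HtmlDocumentContent, then read the HTML-specific properties with null handling. The following is a minimal sketch of that pattern, not the SDK's own code. The two classes are hypothetical stand-ins reduced to the members used here, the string property types are assumptions, and HtmlContentExample, DescribeTitle, ResolveLink, and the documentUrl parameter are illustrative names that are not part of the SDK.

```csharp
#nullable enable
using System;

// Hypothetical stand-ins for the SDK types, reduced to the members used below.
// The real classes live in OpenDiscoverSDK.Interfaces.Content; the property
// types shown here are assumptions.
public class DocumentContent
{
    public bool IsHtmlType { get; set; }
}

public class HtmlDocumentContent : DocumentContent
{
    public string? Title { get; set; }    // null when the document has no "title" element
    public string? BaseUrl { get; set; }  // null when the document has no "base" href
}

public static class HtmlContentExample
{
    // Returns the title, or a placeholder when the content is not HTML or has no title.
    public static string DescribeTitle(DocumentContent content)
    {
        if (content.IsHtmlType && content is HtmlDocumentContent html)
        {
            return html.Title ?? "(untitled)";
        }
        return "(not HTML)";
    }

    // Resolves a relative link against BaseUrl when it is present,
    // and against the document's own (absolute) URL otherwise.
    public static Uri ResolveLink(HtmlDocumentContent html, Uri documentUrl, string href)
    {
        Uri baseUri = html.BaseUrl != null
            ? new Uri(documentUrl, html.BaseUrl)
            : documentUrl;
        return new Uri(baseUri, href);
    }
}
```

With the real SDK types, the stand-in classes would be dropped and the same checks applied to the DocumentContent object returned by content extraction. Resolution relies on the standard System.Uri(Uri, String) constructor, so documentUrl must be an absolute URI.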
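FileEntropy and the binary hash properties are both described as computed over all of the document's bytes. As a rough illustration of what such values represent, the sketch below computes Shannon entropy in bits per byte and a SHA-256 digest with standard .NET APIs. It is not the SDK's implementation: the logarithm base, the hex string formatting, and the names ByteLevelExample, ShannonEntropy, and Sha256Hex are all assumptions. The proprietary content hashes are not covered.

```csharp
using System;
using System.Security.Cryptography;

public static class ByteLevelExample
{
    // Shannon entropy in bits per byte: 0.0 for constant data,
    // approaching 8.0 for uniformly distributed byte values.
    public static double ShannonEntropy(byte[] data)
    {
        if (data.Length == 0)
        {
            return 0.0;
        }

        // Count how often each of the 256 possible byte values occurs.
        var counts = new long[256];
        foreach (byte b in data)
        {
            counts[b]++;
        }

        double entropy = 0.0;
        foreach (long count in counts)
        {
            if (count == 0)
            {
                continue;
            }
            double p = (double)count / data.Length;
            entropy -= p * Math.Log(p, 2);
        }
        return entropy;
    }

    // SHA-256 over all bytes, formatted as uppercase hexadecimal without separators.
    public static string Sha256Hex(byte[] data)
    {
        using (SHA256 sha = SHA256.Create())
        {
            return BitConverter.ToString(sha.ComputeHash(data)).Replace("-", "");
        }
    }
}
```

A call such as `ByteLevelExample.ShannonEntropy(File.ReadAllBytes(path))` (with `using System.IO;`) gives a value that can be compared loosely with FileEntropy. Exact agreement is not guaranteed, because the SDK's normalization is not documented here.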
Methods
Equals: Determines whether the specified object is equal to the current object. (Inherited from Object)
GetHashCode: Serves as the default hash function. (Inherited from Object)
GetType: Gets the Type of the current instance. (Inherited from Object)
ToString: Returns a string that represents the current object. (Inherited from Object)