IDocumentContentExtractor Interface

Document content extractor interface.

Namespace: OpenDiscoverSDK.Interfaces.Extractors
Assembly: OpenDiscoverSDK.Interfaces (in OpenDiscoverSDK.Interfaces.dll) Version: 2025.4.4.0 (2025.4.4)
Syntax
C#
public interface IDocumentContentExtractor : IContentExtractor, IDisposable

The IDocumentContentExtractor type exposes the following members.

Properties
Name | Description
ContentExtractorType (public property) | The derived, actual content extractor interface type. (Inherited from IContentExtractor)
Length (public property) | Gets the document's length in bytes. (Inherited from IContentExtractor)
SupportsChildrenExtraction (public property) | If true, this content extractor supports attachment, embedded item, or container item extraction. (Inherited from IContentExtractor)
SupportsContentHash (public property) | Gets whether the specific document content extractor implementing this interface supports MD5ContentHash and SHA1ContentHash calculation.
SupportsDecryption (public property) | If true, this content extractor supports decrypting password protected documents. (Inherited from IContentExtractor)
SupportsMetadataExtraction (public property) | If true, this content extractor supports metadata extraction. (Inherited from IContentExtractor)
SupportsTextExtraction (public property) | If true, this content extractor supports text extraction. (Inherited from IContentExtractor)
Tag (public property) | Allows user to associate an object with this content extractor.
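
The capability properties above are meant to be checked before requesting an operation. The following is a minimal sketch of that pattern, not SDK code: `ICapabilitySketch`, `FakePdfCapabilities`, and `CapabilityReport` are hypothetical stand-ins that only reuse the property names from the table, and the `bool` property types are an assumption based on the "If true" wording.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in that mirrors only the capability property names listed
// in the table. It is NOT the SDK interface; the bool types are an assumption.
public interface ICapabilitySketch
{
    bool SupportsTextExtraction { get; }
    bool SupportsMetadataExtraction { get; }
    bool SupportsChildrenExtraction { get; }
    bool SupportsDecryption { get; }
    bool SupportsContentHash { get; }
}

// Illustrative fake used only to exercise the helper below.
public sealed class FakePdfCapabilities : ICapabilitySketch
{
    public bool SupportsTextExtraction => true;
    public bool SupportsMetadataExtraction => true;
    public bool SupportsChildrenExtraction => true;
    public bool SupportsDecryption => false;
    public bool SupportsContentHash => true;
}

public static class CapabilityReport
{
    // Returns the names of the operations the extractor reports as supported,
    // in a fixed order, so callers can skip unsupported work up front.
    public static List<string> Supported(ICapabilitySketch extractor)
    {
        var ops = new List<string>();
        if (extractor.SupportsTextExtraction) ops.Add("text");
        if (extractor.SupportsMetadataExtraction) ops.Add("metadata");
        if (extractor.SupportsChildrenExtraction) ops.Add("children");
        if (extractor.SupportsDecryption) ops.Add("decryption");
        if (extractor.SupportsContentHash) ops.Add("hash");
        return ops;
    }

    public static void Main()
    {
        // Prints: text, metadata, children, hash
        Console.WriteLine(string.Join(", ", Supported(new FakePdfCapabilities())));
    }
}
```

With a real extractor instance, the same checks would be made against the SDK's own properties rather than this stand-in.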
Methods
Name | Description
Dispose (public method) | Performs application-defined tasks associated with freeing, releasing, or resetting unmanaged resources. (Inherited from IDisposable)
ExtractContent (public method) | Extracts the document's content.
OverrideContentExtractionSettings (public method) | Allows for overriding the ContentExtractionSettings object used by an IContentExtractor instance that was returned by a call to OpenDiscoverSDK.ContentExtractorFactory.GetContentExtractor. See remarks for limitations. (Inherited from IContentExtractor)
Events
Name | Description
ContentExtractionHeartbeat (public event) | Notification event that lets implementers of IContentExtractor know that content extraction is still in progress. See remarks. (Inherited from IContentExtractor)
Remarks
This interface is used to extract content from documents (e.g., Microsoft Office, OpenDocument, PDF, XPS, and email) that are NOT container types such as archives (e.g., ZIP, 7z) or mailstores (e.g., PST, OST, MBOX).
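
As a rough illustration of the document versus container distinction, the sketch below routes an extractor by its interface type and disposes document extractors with a `using` block. Everything in it is a hypothetical stand-in: `IExtractorSketch`, `IDocumentExtractorSketch`, `IContainerExtractorSketch`, the fake classes, and `Router` are not SDK types, and the parameterless, string-returning `ExtractContent` is an assumed signature, since this page does not show the real one.

```csharp
using System;

// Hypothetical stand-ins; names and signatures are assumptions for illustration.
public interface IExtractorSketch { }

// Mirrors the shape "IDocumentContentExtractor : IContentExtractor, IDisposable".
public interface IDocumentExtractorSketch : IExtractorSketch, IDisposable
{
    // Assumed signature; the SDK's actual parameters and return type are not shown here.
    string ExtractContent();
}

// Hypothetical marker for container types (archives, mailstores).
public interface IContainerExtractorSketch : IExtractorSketch { }

public sealed class FakeDocumentExtractor : IDocumentExtractorSketch
{
    public bool Disposed { get; private set; }
    public string ExtractContent() => "hello";
    public void Dispose() => Disposed = true;
}

public sealed class FakeArchiveExtractor : IContainerExtractorSketch { }

public static class Router
{
    // Documents are extracted directly and then disposed; containers are
    // reported so the caller can enumerate their items instead.
    public static string Handle(IExtractorSketch extractor)
    {
        if (extractor is IDocumentExtractorSketch doc)
        {
            using (doc) // Dispose runs even if extraction throws
            {
                return "document: " + doc.ExtractContent();
            }
        }
        if (extractor is IContainerExtractorSketch)
        {
            return "container: enumerate items instead";
        }
        return "unsupported";
    }

    public static void Main()
    {
        Console.WriteLine(Handle(new FakeDocumentExtractor())); // prints "document: hello"
        Console.WriteLine(Handle(new FakeArchiveExtractor()));  // prints "container: enumerate items instead"
    }
}
```

The type test stands in for whatever check an application performs on the extractor it receives; the `using` block reflects that the document extractor interface derives from IDisposable.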
See Also