
ILargeEncodedTextExtractor Interface

"Large" encoded text file content extractor.

Namespace: OpenDiscoverSDK.Interfaces.Extractors
Assembly: OpenDiscoverSDK.Interfaces (in OpenDiscoverSDK.Interfaces.dll) Version: 2025.4.4.0 (2025.4.4)
Syntax
C#
public interface ILargeEncodedTextExtractor : IContentExtractor, 
	IDisposable

The ILargeEncodedTextExtractor type exposes the following members.

Properties
Name	Description
ContentExtractorType	The derived, actual content extractor interface type. (Inherited from IContentExtractor)
Length	Gets the document's length in bytes. (Inherited from IContentExtractor)
SupportsChildrenExtraction	If true, this content extractor supports attachment, embedded item, or container item extraction. (Inherited from IContentExtractor)
SupportsDecryption	If true, this content extractor supports decrypting password protected documents. (Inherited from IContentExtractor)
SupportsMetadataExtraction	If true, this content extractor supports metadata extraction. (Inherited from IContentExtractor)
SupportsTextExtraction	If true, this content extractor supports text extraction. (Inherited from IContentExtractor)
Methods
Name	Description
Dispose	Performs application-defined tasks associated with freeing, releasing, or resetting unmanaged resources. (Inherited from IDisposable)
ExtractContent	Extracts content from a "large" encoded text file and optionally writes the file's text to the supplied stream, re-encoded as either UTF-16 or UTF-8 (which Unicode encoding is used depends on UseLargeDocumentUTF16Encoding).
OverrideContentExtractionSettings	Allows overriding the ContentExtractionSettings object used by an IContentExtractor instance that was returned by a call to OpenDiscoverSDK.ContentExtractorFactory.GetContentExtractor. See remarks for limitations. (Inherited from IContentExtractor)
Events
Name	Description
ContentExtractionHeartbeat	Notification event that lets users of an IContentExtractor know that content extraction is still in progress. See remarks. (Inherited from IContentExtractor)
Remarks

"Large" is a subjective term defined by the LargeDocumentCritera property value.

This content extractor interface will not set the ExtractedText property due to the "large" size of the encoded text file.

The only content that this extractor extracts is the MD5BinaryHash, SHA1BinaryHash, and SHA256BinaryHash hashes of the document. If the ExtractContent(Stream) argument 'textFileOutputStream' is not null, this content extractor will also write the text contents of the file to the supplied stream, re-encoded as either UTF-16 or UTF-8 (which Unicode encoding is used depends on UseLargeDocumentUTF16Encoding).
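
The calling pattern described above can be sketched as follows. This is a minimal illustration, not SDK code: ILargeTextExtractorSketch and FakeLargeTextExtractor are hypothetical stand-ins declared locally so the snippet compiles without the SDK, and the real ExtractContent signature and return type may differ. The fake simply re-encodes Latin-1 input as UTF-8 to mimic the "write the text to the supplied stream" behavior; it computes no hashes.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical stand-in for the documented interface (assumption: void return type).
public interface ILargeTextExtractorSketch : IDisposable
{
    void ExtractContent(Stream textFileOutputStream);
}

// Fake implementation: copies Latin-1 input to the output stream as UTF-8.
public sealed class FakeLargeTextExtractor : ILargeTextExtractorSketch
{
    private readonly Stream _source;

    public FakeLargeTextExtractor(Stream source) { _source = source; }

    public void ExtractContent(Stream textFileOutputStream)
    {
        // A null stream means "do not write the text anywhere".
        if (textFileOutputStream == null) return;

        using (var reader = new StreamReader(_source, Encoding.GetEncoding("ISO-8859-1"), false, 4096, leaveOpen: true))
        using (var writer = new StreamWriter(textFileOutputStream, new UTF8Encoding(false), 4096, leaveOpen: true))
        {
            // Copy in fixed-size chunks so the whole file is never held in memory.
            char[] buffer = new char[4096];
            int read;
            while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
                writer.Write(buffer, 0, read);
        }
    }

    public void Dispose() { _source.Dispose(); }
}

public static class Program
{
    public static void Main()
    {
        string inputPath = Path.GetTempFileName();
        string outputPath = Path.GetTempFileName();
        File.WriteAllBytes(inputPath, new byte[] { 0x63, 0x61, 0x66, 0xE9 }); // "café" in Latin-1

        // Both the extractor and the output FileStream are disposed by the using blocks.
        using (ILargeTextExtractorSketch extractor = new FakeLargeTextExtractor(File.OpenRead(inputPath)))
        using (var output = new FileStream(outputPath, FileMode.Create, FileAccess.Write))
        {
            extractor.ExtractContent(output);
        }

        Console.WriteLine(new FileInfo(outputPath).Length); // size in bytes of the UTF-8 copy
    }
}
```

Because the interface derives from IDisposable, wrapping the extractor in a using block releases its resources even if extraction throws. Passing null instead of a stream skips the text output entirely.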

Writing the original encoded file to a new Stream (which SHOULD be a FileStream due to the "large" size of the file) is only really useful if the file is not already in an easily indexable Unicode encoding.
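
One rough way to decide whether a re-encoded copy is worth writing is to check whether the file already starts with a Unicode byte order mark (BOM). The helper below is a sketch using only standard .NET types; UnicodeBomCheck and StartsWithUnicodeBom are hypothetical names, not part of the SDK. It is a heuristic only: a file with no BOM may still be valid UTF-8.

```csharp
using System;
using System.IO;

public static class UnicodeBomCheck
{
    // Returns true when the stream starts with a UTF-8, UTF-16 LE, or UTF-16 BE byte order mark.
    public static bool StartsWithUnicodeBom(Stream stream)
    {
        byte[] head = new byte[3];
        int read = 0;

        // Stream.Read may return fewer bytes than requested, so loop until full or end of stream.
        while (read < head.Length)
        {
            int n = stream.Read(head, read, head.Length - read);
            if (n == 0) break;
            read += n;
        }

        if (read >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF) return true; // UTF-8
        if (read >= 2 && head[0] == 0xFF && head[1] == 0xFE) return true; // UTF-16 LE (also the UTF-32 LE prefix)
        if (read >= 2 && head[0] == 0xFE && head[1] == 0xFF) return true; // UTF-16 BE
        return false;
    }

    public static void Main()
    {
        var utf8 = new MemoryStream(new byte[] { 0xEF, 0xBB, 0xBF, 0x68, 0x69 });
        var latin1 = new MemoryStream(new byte[] { 0x63, 0x61, 0x66, 0xE9 });

        Console.WriteLine(StartsWithUnicodeBom(utf8));   // True
        Console.WriteLine(StartsWithUnicodeBom(latin1)); // False
    }
}
```

When the check returns true, the original file can usually be indexed directly and null can be passed for 'textFileOutputStream'; otherwise a FileStream can be supplied to receive the Unicode copy.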

See Also