
UnsupportedFilterType Enumeration

Unsupported document filtering type.

Namespace: OpenDiscoverSDK.Interfaces.Settings
Assembly: OpenDiscoverSDK.Interfaces (in OpenDiscoverSDK.Interfaces.dll) Version: 2025.4.4.0 (2025.4.4)
Syntax
C#
[DataContractAttribute]
public enum UnsupportedFilterType
Members
Member name    Value    Description
None    0    No binary-to-text filtering. For this enum value, the SDK API method "ContentExtratorFactory.GetContentExtractor" will NOT return a content extractor interface for unsupported format types.
Unsupported    1

Perform binary-to-text filtering on unsupported/unknown document formats to extract text. If the unsupported format is encrypted, it will not be filtered (see UnsupportedAndEncrypted).

The binary-to-text filtering algorithm is proprietary and attempts to extract as much UTF-8, UTF-16LE (Latin languages only), and code page 1252 encoded text as possible from the document's binary content. In many cases, useful text for indexing or searching can be extracted from unknown, corrupted, or unsupported file formats using binary-to-text filtering.

For this enum value, the SDK API method "ContentExtratorFactory.GetContentExtractor" will return either an IUnsupportedExtractor or an ILargeUnsupportedExtractor interface, depending on the value of the LargeDocumentCritera property and the document's file size.
UnsupportedAndEncrypted    2

Perform binary-to-text filtering on unknown/unsupported document formats to extract text, even if the unsupported format is identified as encrypted.

For encrypted document formats, no meaningful text can be extracted via binary-to-text filtering unless parts of the document happen to reside in unencrypted regions (if any) of the document format. For encrypted formats, this setting is useful mainly for document forensic analysis rather than text extraction for indexing or searching. Unless you are doing document forensic analysis, it is recommended to use Unsupported instead.

For this enum value, the SDK API method "ContentExtratorFactory.GetContentExtractor" will return either an IUnsupportedExtractor or an ILargeUnsupportedExtractor interface, depending on the value of the LargeDocumentCritera property and the document's file size.
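
The member descriptions above amount to a small decision table. The following is a minimal sketch of that table, not the SDK's implementation: the enum is redeclared locally so the code compiles without the SDK, and FilterOutcome, FilterDecision.Decide, and the largeThresholdBytes parameter (a stand-in for whatever LargeDocumentCritera actually specifies) are hypothetical names introduced only for illustration. The comparison against a byte threshold is likewise an assumption.

```csharp
using System;

// Local mirror of the documented enum so this sketch compiles without the SDK.
public enum UnsupportedFilterType { None = 0, Unsupported = 1, UnsupportedAndEncrypted = 2 }

// Hypothetical outcome names; the SDK itself returns extractor interfaces.
public enum FilterOutcome { NotFiltered, UnsupportedExtractor, LargeUnsupportedExtractor }

public static class FilterDecision
{
    // largeThresholdBytes is an assumed stand-in for the LargeDocumentCritera setting.
    public static FilterOutcome Decide(UnsupportedFilterType setting, bool isEncrypted,
                                       long fileSizeBytes, long largeThresholdBytes)
    {
        // None: no binary-to-text filtering at all.
        if (setting == UnsupportedFilterType.None)
            return FilterOutcome.NotFiltered;

        // Unsupported: encrypted formats are skipped.
        if (isEncrypted && setting != UnsupportedFilterType.UnsupportedAndEncrypted)
            return FilterOutcome.NotFiltered;

        // Otherwise the extractor kind depends on the document's size (assumed rule).
        return fileSizeBytes >= largeThresholdBytes
            ? FilterOutcome.LargeUnsupportedExtractor
            : FilterOutcome.UnsupportedExtractor;
    }
}

public static class Demo
{
    public static void Main()
    {
        // Encrypted file with the Unsupported setting: skipped.
        Console.WriteLine(FilterDecision.Decide(UnsupportedFilterType.Unsupported, true, 1000, 5000));
        // Same file with UnsupportedAndEncrypted: filtered anyway.
        Console.WriteLine(FilterDecision.Decide(UnsupportedFilterType.UnsupportedAndEncrypted, true, 1000, 5000));
    }
}
```

The first call prints NotFiltered and the second prints UnsupportedExtractor, which matches the difference between the two enum values described above.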

Remarks

Unsupported, unknown, and corrupted documents can have text extracted via a proprietary binary-to-text extraction algorithm.

The binary-to-text filtering algorithm attempts to extract as much UTF-8, UTF-16LE (Latin languages only), and code page 1252 encoded text as possible from the document's binary content. In many cases, useful text for indexing or searching can be extracted from unknown, corrupted, or unsupported file formats using binary-to-text filtering.
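
To give a feel for what binary-to-text filtering means in general, here is a deliberately naive sketch in the spirit of the Unix strings utility. It is not the SDK's proprietary algorithm: it recognizes only runs of printable ASCII bytes (a subset common to UTF-8 and code page 1252) and ignores UTF-16LE entirely. The names NaiveBinaryToText and ExtractRuns, and the minimum run length of 4, are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class NaiveBinaryToText
{
    // Returns runs of printable ASCII bytes that are at least minLength long.
    public static List<string> ExtractRuns(byte[] data, int minLength = 4)
    {
        var runs = new List<string>();
        var current = new StringBuilder();

        foreach (byte b in data)
        {
            if (b >= 0x20 && b <= 0x7E)
            {
                // Printable ASCII: extend the current run.
                current.Append((char)b);
            }
            else
            {
                // Non-printable byte ends the run; keep it only if long enough.
                if (current.Length >= minLength) runs.Add(current.ToString());
                current.Clear();
            }
        }

        // Flush a run that reaches the end of the data.
        if (current.Length >= minLength) runs.Add(current.ToString());
        return runs;
    }
}

public static class Demo
{
    public static void Main()
    {
        byte[] sample =
        {
            0x00, 0x01, (byte)'I', (byte)'n', (byte)'v', (byte)'o', (byte)'i', (byte)'c', (byte)'e',
            0xFF, (byte)'o', (byte)'k', 0x00, (byte)'2', (byte)'0', (byte)'2', (byte)'5'
        };

        // Prints "Invoice" and "2025"; the two-byte run "ok" is below the minimum length.
        foreach (string run in NaiveBinaryToText.ExtractRuns(sample))
            Console.WriteLine(run);
    }
}
```

A minimum run length is the usual way such filters reduce noise, since short printable sequences occur by chance in binary data.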
