ContentExtractionSettings Class

Main document content extraction settings class.
Inheritance Hierarchy
System.Object
  OpenDiscoverSDK.Interfaces.Settings.ContentExtractionSettings
    OpenDiscoverSDK.Interfaces.Platform.Settings.DocumentTaskSettings

Namespace: OpenDiscoverSDK.Interfaces.Settings
Assembly: OpenDiscoverSDK.Interfaces (in OpenDiscoverSDK.Interfaces.dll) Version: 2025.4.4.0 (2025.4.4)
Syntax
C#
[DataContractAttribute]
public class ContentExtractionSettings

The ContentExtractionSettings type exposes the following members.

Constructors
Name                       Description
ContentExtractionSettings  Constructor.
Properties
Name                           Description
EmbeddedObjectExtraction       Embedded document/attachment and embedded office media extraction setting.
EntityExtractionSettings       Options for entity extraction in extracted text, metadata, and URLs.
ExtractionType                 Text and metadata extraction setting.
ExtractOfficeTrackedChanges    If true, appends tracked change information/text from office document formats (that support tracked changes) to the end of the document's extracted text; otherwise, tracked changes text is not appended to the document's extracted text.
Hashing                        Document hashing settings.
LanguageId                     Settings for language identification of extracted text.
LargeDocumentCritera           Defines the "large" document criteria, in bytes, that determine what type of content extractor the content extractor factory returns for "large" unknown/unsupported formats and for "large" encoded text based formats.
PdfDocument                    PDF document extraction settings.
TimeZoneAndEmail               Settings for document collection time zone and related extracted DateTime metadata and email extracted text DateTime display.
UnsupportedFiltering           Settings for binary-to-text filtering of unsupported/unknown document file formats.
UseLargeDocumentUTF16Encoding  Determines whether UTF-16 or UTF-8 encoding is used when writing the binary-to-text extracted text of a "large" (see LargeDocumentCritera) unknown/unsupported format, or when re-encoding a "large" encoded text file, to the provided Stream.
Methods
Name         Description
Equals       Determines whether the specified object is equal to the current object. (Inherited from Object)
GetHashCode  Serves as the default hash function. (Inherited from Object)
GetType      Gets the Type of the current instance. (Inherited from Object)
ToString     Returns a string that represents the current object. (Inherited from Object)
Remarks

An instance of this class is a required argument to the SDK API method ContentExtractorFactory.GetContentExtractor. It controls what type of content the IContentExtractor derived extraction interfaces extract from documents.
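As a minimal sketch of how such a settings instance might be prepared, the fragment below creates the object and turns on tracked-changes extraction. It rests on assumptions this page does not confirm: that the constructor takes no arguments, and that ExtractOfficeTrackedChanges is a settable bool (inferred from its "If true" description). It also needs a project reference to OpenDiscoverSDK.Interfaces.dll. The call to ContentExtractorFactory.GetContentExtractor is left as a comment because its parameter list is not given here.

```csharp
using OpenDiscoverSDK.Interfaces.Settings;

// Assumption: the constructor listed above is parameterless.
var settings = new ContentExtractionSettings();

// Assumption: a settable bool, inferred from the "If true, ..." description.
// When true, tracked-change text is appended to the end of the extracted text.
settings.ExtractOfficeTrackedChanges = true;

// Pass 'settings' to ContentExtractorFactory.GetContentExtractor along with
// whatever other arguments that method requires (see its own reference page).
```

The remaining properties are left at their defaults here, since their types and default values are not stated on this page.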
