ContentExtractionSettings Class
Main document content extraction settings class.
Definition
Namespace: OpenDiscoverSDK.Interfaces.Settings
Assembly: OpenDiscoverSDK.Interfaces (in OpenDiscoverSDK.Interfaces.dll) Version: 2026.2.6.0 (2026.02.06)
C#
[DataContractAttribute]
public class ContentExtractionSettings
- Inheritance: Object → ContentExtractionSettings
Remarks
An instance of this class is a required argument to the SDK API method ContentExtractorFactory.GetContentExtractor. It controls what type of content the IContentExtractor-derived extraction interfaces extract from documents.
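As a rough illustration of the remark above, the following sketch builds a settings object before it would be handed to the factory. It makes several assumptions that this page does not confirm: that the constructor takes no parameters, that the two properties shown are settable booleans, and that `true` for UseLargeDocumentUTF16Encoding selects UTF-16. The `ContentExtractionSettings` type in the `Sketch` namespace is a hypothetical stand-in declared only so the example compiles on its own, and `BuildSettings` is an illustrative helper name, not part of the SDK.

```csharp
using System;

namespace Sketch
{
    // Hypothetical stand-in for the SDK class, limited to the two properties
    // this page describes in true/false terms. With the real SDK, delete this
    // type and reference OpenDiscoverSDK.Interfaces.Settings instead.
    public class ContentExtractionSettings
    {
        public bool ExtractOfficeTrackedChanges { get; set; }
        public bool UseLargeDocumentUTF16Encoding { get; set; }
    }

    public static class Program
    {
        // Illustrative helper (not an SDK method): builds one settings
        // instance for an extraction run.
        public static ContentExtractionSettings BuildSettings()
        {
            return new ContentExtractionSettings
            {
                // Append tracked-change text to the extracted text.
                ExtractOfficeTrackedChanges = true,
                // Assumed from the property name: false selects UTF-8.
                UseLargeDocumentUTF16Encoding = false
            };
        }

        public static void Main()
        {
            ContentExtractionSettings settings = BuildSettings();

            // With the real SDK, 'settings' would now be passed to
            // ContentExtractorFactory.GetContentExtractor. That method's other
            // arguments are not listed on this page, so the call is omitted.
            Console.WriteLine(settings.ExtractOfficeTrackedChanges);
        }
    }
}
```

Building the settings in one place keeps the extraction options for a run together, which is convenient if the same instance is reused for many documents.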
Constructors
| ContentExtractionSettings | Constructor. |
Properties
| EmbeddedObjectExtraction | Embedded document/attachment and embedded office media extraction setting. |
| EntityExtractionSettings | Options for entity extraction in extracted text, metadata, and URLs. |
| ExtractionType | Text and metadata extraction setting. |
| ExtractOfficeTrackedChanges | If true, appends tracked change information/text from office document formats (that support tracked changes) to the end of the document's extracted text; otherwise, tracked changes text is not appended to the document's extracted text. |
| Hashing | Document hashing settings. |
| LanguageId | Settings for language identification of extracted text. |
| LargeDocumentCritera | Defines the "large" document criteria, in bytes, that determines what type of content extractor is returned by the content extractor factory for "large" unknown/unsupported formats and also "large" encoded text based formats. |
| PdfDocument | PDF document extraction settings. |
| TimeZoneAndEmail | Settings for the document collection time zone, the related extracted DateTime metadata, and the DateTime display in extracted email text. |
| UnsupportedFiltering | Settings for binary-to-text filtering of unsupported/unknown document file formats. |
| UseLargeDocumentUTF16Encoding | Determines whether UTF-16 or UTF-8 encoding is used when writing the binary-to-text extracted text of a 'large' (see LargeDocumentCritera) unknown/unsupported format, or when re-encoding a 'large' encoded text file, to the provided Stream. |
Methods
| Equals | Determines whether the specified object is equal to the current object. (Inherited from Object) |
| GetHashCode | Serves as the default hash function. (Inherited from Object) |
| GetType | Gets the Type of the current instance. (Inherited from Object) |
| ToString | Returns a string that represents the current object. (Inherited from Object) |