[DataContractAttribute] public class ContentExtractionSettings
The ContentExtractionSettings type exposes the following members.
Constructors

| Name | Description |
|---|---|
| ContentExtractionSettings | Constructor. |

Properties

| Name | Description |
|---|---|
| EmbeddedObjectExtraction | Settings for extraction of embedded documents/attachments and embedded office media. |
| EntityExtractionSettings | Options for entity extraction in extracted text, metadata, and URLs. |
| ExtractionType | Text and metadata extraction setting. |
| ExtractOfficeTrackedChanges | If true, appends tracked-change information/text from office document formats that support tracked changes to the end of the document's extracted text; otherwise, tracked-change text is not appended to the document's extracted text. |
| Hashing | Document hashing settings. |
| LanguageId | Settings for language identification of extracted text. |
| LargeDocumentCritera | Defines the "large" document criteria, in bytes. This value determines which type of content extractor the content extractor factory returns for "large" unknown/unsupported formats and for "large" encoded text-based formats. |
| PdfDocument | PDF document extraction settings. |
| TimeZoneAndEmail | Settings for the document collection time zone, the related extracted DateTime metadata, and the DateTime display in extracted email text. |
| UnsupportedFiltering | Settings for binary-to-text filtering of unsupported/unknown document file formats. |
| UseLargeDocumentUTF16Encoding | Determines whether UTF-16 or UTF-8 encoding is used when writing the binary-to-text extracted text of a "large" (see LargeDocumentCritera) unknown/unsupported format, or when re-encoding a "large" encoded text file, to the provided Stream. |

Methods

| Name | Description |
|---|---|
| Equals | Determines whether the specified object is equal to the current object. (Inherited from Object) |
| GetHashCode | Serves as the default hash function. (Inherited from Object) |
| GetType | Gets the Type of the current instance. (Inherited from Object) |
| ToString | Returns a string that represents the current object. (Inherited from Object) |

An instance of this class is a required argument to the SDK API method ContentExtractorFactory.GetContentExtractor. It controls what types of content the IContentExtractor-derived extraction interfaces extract from documents.
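
The LargeDocumentCritera and UseLargeDocumentUTF16Encoding descriptions in the properties table can be illustrated with a small, self-contained C# sketch. It does not call the SDK. The `LargeDocumentSketch` class and its `IsLarge`, `ChooseEncoding`, and `WriteText` helpers are hypothetical names introduced here, and both the `>=` comparison and the treatment of the encoding setting as a `bool` are assumptions, not documented SDK behavior.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration only; it is NOT part of the SDK.
public static class LargeDocumentSketch
{
    // Assumption: a document counts as "large" when its size in bytes
    // meets or exceeds the configured criteria value.
    public static bool IsLarge(long sizeInBytes, long largeDocumentCriteriaBytes)
    {
        return sizeInBytes >= largeDocumentCriteriaBytes;
    }

    // Mirrors the UTF-16 versus UTF-8 choice described for
    // UseLargeDocumentUTF16Encoding (assumed here to be a bool flag).
    // Both encodings are created without a byte order mark.
    public static Encoding ChooseEncoding(bool useUtf16)
    {
        if (useUtf16)
        {
            return new UnicodeEncoding(bigEndian: false, byteOrderMark: false);
        }
        return new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
    }

    // Writes text to the caller-provided stream in the chosen encoding,
    // leaves the stream open, and returns the number of bytes written.
    public static long WriteText(string text, Stream output, bool useUtf16)
    {
        long start = output.Position;
        using (var writer = new StreamWriter(output, ChooseEncoding(useUtf16), 1024, leaveOpen: true))
        {
            writer.Write(text);
        }
        return output.Position - start;
    }
}
```

With this sketch, writing the three-character string "abc" to a seekable stream such as a MemoryStream produces 3 bytes as UTF-8 and 6 bytes as UTF-16. For mostly ASCII text, the UTF-16 output is roughly twice the size, which may matter when the extracted text of a large document is written to a stream.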