Content |
The ContentExtractionSettings type exposes the following members.
| Name | Description | |
|---|---|---|
| EmbeddedObjectExtraction | Embedded document/attachment and embedded office media extraction setting. | |
| EntityExtractionSettings | Options for entity extraction in extracted text, metadata, and URLs. | |
| ExtractionType | Text and metadata extraction setting. | |
| ExtractOfficeTrackedChanges | If true, appends tracked change information/text from office document formats (that support tracked changes) to the end of the document's extracted text; otherwise, tracked changes text is not appended to the document's extracted text. | |
| Hashing | Document hashing settings. | |
| LanguageId | Settings for language identification of extracted text. | |
| LargeDocumentCritera | Defines the "large" document criteria, in bytes, which determine what type of content extractor the content extractor factory returns for "large" unknown/unsupported formats and for "large" encoded-text-based formats. | |
| PdfDocument | PDF document extraction settings. | |
| TimeZoneAndEmail | Settings for the document collection time zone, the related extracted DateTime metadata, and the DateTime display in extracted email text. | |
| UnsupportedFiltering | Settings for binary-to-text filtering of unsupported/unknown document file formats. | |
| UseLargeDocumentUTF16Encoding | Determines whether UTF-16 or UTF-8 encoding is used when writing the binary-to-text extracted text of a 'large' (see LargeDocumentCritera) unknown/unsupported format, or when re-encoding a 'large' encoded text file, to the provided Stream. | |
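
The interaction between LargeDocumentCritera and UseLargeDocumentUTF16Encoding can be pictured with a small, language-neutral model. The sketch below is not the SDK's API: the class `LargeDocumentSettings` and the functions `is_large`, `output_encoding`, and `write_extracted_text` are hypothetical stand-ins, and both the inclusive size comparison and the little-endian, BOM-less UTF-16 variant are assumptions made only for illustration.

```python
import io
from dataclasses import dataclass


@dataclass
class LargeDocumentSettings:
    """Hypothetical stand-in for two members of the table above (not the SDK type)."""
    large_document_criteria: int             # size threshold in bytes
    use_large_document_utf16_encoding: bool  # True -> UTF-16, False -> UTF-8


def is_large(size_in_bytes: int, settings: LargeDocumentSettings) -> bool:
    # Assumption: a document at or above the threshold counts as "large".
    return size_in_bytes >= settings.large_document_criteria


def output_encoding(settings: LargeDocumentSettings) -> str:
    # Assumption: little-endian UTF-16 without a byte order mark.
    return "utf-16-le" if settings.use_large_document_utf16_encoding else "utf-8"


def write_extracted_text(text: str, stream: io.BufferedIOBase,
                         settings: LargeDocumentSettings) -> None:
    # Encode the extracted text with the selected encoding and write it to the stream.
    stream.write(text.encode(output_encoding(settings)))


# Usage: a 2048-byte file is "large" under a 1024-byte threshold.
settings = LargeDocumentSettings(large_document_criteria=1024,
                                 use_large_document_utf16_encoding=True)
buffer = io.BytesIO()
if is_large(2048, settings):
    write_extracted_text("ab", buffer, settings)
```

With the flag set to true, each ASCII character occupies two bytes in the output stream, so choosing UTF-8 may roughly halve the output size for mostly-ASCII text.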