ContentExtractionSettings Properties

The ContentExtractionSettings type exposes the following members.

Properties
All properties listed are public.

Name    Description
EmbeddedObjectExtraction    Setting for extraction of embedded documents/attachments and embedded Office media.
EntityExtractionSettings    Options for entity extraction in extracted text, metadata, and URLs.
ExtractionType    Text and metadata extraction setting.
ExtractOfficeTrackedChanges    If true, tracked-change information/text from Office document formats that support tracked changes is appended to the end of the document's extracted text; otherwise, tracked-change text is not appended.
Hashing    Document hashing settings.
LanguageId    Settings for language identification of extracted text.
LargeDocumentCritera    Defines the "large" document criterion, in bytes, which determines the type of content extractor that the content extractor factory returns for "large" unknown/unsupported formats and for "large" encoded text-based formats.
PdfDocument    PDF document extraction settings.
TimeZoneAndEmail    Settings for the document collection time zone, the related extracted DateTime metadata, and the DateTime display in extracted email text.
UnsupportedFiltering    Settings for binary-to-text filtering of unsupported/unknown document file formats.
UseLargeDocumentUTF16Encoding    Determines whether UTF-16 or UTF-8 encoding is used when writing the binary-to-text extracted text of a "large" (see LargeDocumentCritera) unknown/unsupported format, or when re-encoding a "large" encoded text file, to the provided Stream.