ContentExtractionSettings Properties

The ContentExtractionSettings type exposes the following members.

Properties

	Name	Description
	EmbeddedObjectExtraction	Settings for extraction of embedded documents/attachments and embedded Office media.
	EntityExtractionSettings	Options for entity extraction in extracted text, metadata, and URLs.
	ExtractionType	Text and metadata extraction setting.
	ExtractOfficeTrackedChanges	If true, appends tracked-change text from Office document formats that support tracked changes to the end of the document's extracted text; otherwise, tracked-change text is not appended.
	Hashing	Document hashing settings.
	LanguageId	Settings for language identification of extracted text.
	LargeDocumentCritera	Defines the "large" document criteria, in bytes. These criteria determine which type of content extractor the content extractor factory returns for "large" unknown/unsupported formats and for "large" encoded text-based formats.
	PdfDocument	PDF document extraction settings.
	TimeZoneAndEmail	Settings for the document collection time zone, the related extracted DateTime metadata, and the display of DateTime values in extracted email text.
	UnsupportedFiltering	Settings for binary-to-text filtering of unsupported/unknown document file formats.
	UseLargeDocumentUTF16Encoding	Determines whether UTF-16 or UTF-8 encoding is used when writing the binary-to-text extracted text of a "large" (see LargeDocumentCritera) unknown/unsupported format, or when re-encoding a "large" encoded text file, to the provided Stream.
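
Remarks

The interplay between a byte-size criterion and a UTF-16/UTF-8 output choice can be illustrated with a small, language-neutral sketch. It does not use this library's API. The threshold value, the `is_large` and `encoded_size` helpers, and the "at or above the threshold" comparison are all assumptions made for illustration. The sketch only shows why the encoding choice matters for output size: UTF-8 is more compact for mostly ASCII text, while UTF-16 can be more compact for text dominated by CJK characters.

```python
# Hypothetical illustration only: none of these names belong to the SDK.

# Assumed example threshold (50 MiB); the real default is not stated in this page.
LARGE_DOCUMENT_THRESHOLD_BYTES = 50 * 1024 * 1024


def is_large(size_in_bytes: int, threshold: int = LARGE_DOCUMENT_THRESHOLD_BYTES) -> bool:
    """Return True when a document meets the assumed "large" criterion.

    Treating the boundary as inclusive (>=) is an assumption of this sketch.
    """
    return size_in_bytes >= threshold


def encoded_size(text: str, use_utf16: bool) -> int:
    """Return the number of bytes needed to write `text` in the chosen encoding.

    "utf-16-le" is used so that no byte order mark is counted.
    """
    encoding = "utf-16-le" if use_utf16 else "utf-8"
    return len(text.encode(encoding))


if __name__ == "__main__":
    for sample in ("hello", "日本語"):
        print(sample, encoded_size(sample, use_utf16=False), encoded_size(sample, use_utf16=True))
```

For the two samples, "hello" needs 5 bytes in UTF-8 and 10 in UTF-16, while "日本語" needs 9 bytes in UTF-8 and 6 in UTF-16, so the better choice depends on the language mix of the collection.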