PdfDocumentSettings.PageExtractedTextCriteria Property

Criterion for the minimum extracted text length (in characters) of a PDF page. See remarks.

Namespace: OpenDiscoverSDK.Interfaces.Settings
Assembly: OpenDiscoverSDK.Interfaces (in OpenDiscoverSDK.Interfaces.dll) Version: 2025.4.4.0 (2025.4.4)
Syntax
C#
[DataMemberAttribute]
public int PageExtractedTextCriteria { get; set; }

Property Value

Int32
Remarks

If ExtractionType is set to MetadataOnly, this property is ignored.

If the extracted text length of any PDF page is below the value of this property then the following data is added to PdfDocumentContent:

The PdfPageInfo information can help users who plan to implement OCR (optical character recognition) to augment text extraction determine which PDF pages, if any, are candidates for OCR.

The value of "1" (see below) is chosen as the default value because it is not uncommon to find a PDF page that is blank except for a page number. Users are encouraged to experiment on a PDF document collection and find values that work best for their particular needs.

Default property value: 1 [character]; at least 1 character of text must be extracted from each PDF page to pass this criterion (e.g., at least a page number on an otherwise blank PDF page).

Valid range: 0 - 500 [characters]; a value of 0 means that any length of text (including no text) passes this criterion, so no pages will be marked as failed due to page extracted text length.
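
The rule described in these remarks can be illustrated with a small stand-alone sketch. It does not call the SDK: PageTextCriteriaSketch and FindOcrCandidates are hypothetical names introduced only for this illustration, and the range check is an assumption about how a caller might guard the value, not a statement of what the SDK does with out-of-range input.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of OpenDiscoverSDK. It mirrors the rule
// described in the remarks so the effect of different values can be tested.
public static class PageTextCriteriaSketch
{
    public const int MinCriteria = 0;   // documented lower bound
    public const int MaxCriteria = 500; // documented upper bound

    // Returns the 1-based numbers of the pages whose extracted text length
    // is below the criteria value (the pages that would fail the check).
    public static List<int> FindOcrCandidates(IReadOnlyList<int> pageTextLengths, int criteria)
    {
        // Assumption: reject values outside the documented range.
        if (criteria < MinCriteria || criteria > MaxCriteria)
            throw new ArgumentOutOfRangeException(nameof(criteria), "Expected a value in the range 0 - 500.");

        var candidates = new List<int>();
        for (int i = 0; i < pageTextLengths.Count; i++)
        {
            if (pageTextLengths[i] < criteria)
                candidates.Add(i + 1);
        }
        return candidates;
    }
}
```

For page text lengths of 1200, 0, 3 and 0 characters, a criteria value of 1 selects pages 2 and 4, a value of 5 selects pages 2, 3 and 4, and a value of 0 selects no pages.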

See Also