
DocumentAttributes Enumeration

Document attributes give extra information about a document, such as whether it has hidden content, is password protected (encrypted), has macros, is an inline image (e.g., an inline email image), or has external document references.

Namespace: OpenDiscoverSDK.Interfaces.Content
Assembly: OpenDiscoverSDK.Interfaces (in OpenDiscoverSDK.Interfaces.dll) Version: 2025.4.4.0 (2025.4.4)
Syntax
C#
[DataContractAttribute]
public enum DocumentAttributes
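
The member values listed for this enumeration are sequential identifiers (1, 2, 6, 7, ...) rather than powers of two, and the declaration shows no [Flags] attribute, so combining or testing members with bitwise operators would presumably give misleading results. The following is a minimal sketch of that pitfall, not SDK code: `DocAttr` is a hypothetical local stand-in that copies three values from the member table, and holding a document's attributes in a `HashSet` is an assumption, since this page does not show how the SDK exposes them.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical local stand-in mirroring three DocumentAttributes members
// (values copied from the member table); this is not the SDK type itself.
public enum DocAttr
{
    PasswordProtected = 1,
    DefaultPassword = 2,
    ArchiveItemsPasswordProtected = 6
}

public static class NonFlagsDemo
{
    // Bitwise test: unreliable for sequential values, because 6 (binary 110)
    // happens to contain the bit pattern of 2 (binary 010).
    public static bool BitwiseContains(DocAttr value, DocAttr probe) =>
        (value & probe) == probe;

    // Set membership compares whole values, so there are no false positives.
    public static bool SetContains(ISet<DocAttr> attributes, DocAttr probe) =>
        attributes.Contains(probe);

    public static void Main()
    {
        // Assumed shape: the attributes of one document held as a set.
        var attributes = new HashSet<DocAttr> { DocAttr.ArchiveItemsPasswordProtected };

        // Prints "True": a false positive from the bitwise test.
        Console.WriteLine(BitwiseContains(DocAttr.ArchiveItemsPasswordProtected, DocAttr.DefaultPassword));

        // Prints "False": the set does not contain DefaultPassword.
        Console.WriteLine(SetContains(attributes, DocAttr.DefaultPassword));
    }
}
```

Treating each member as a discrete value and checking membership in a collection avoids this kind of accidental overlap.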
Members
| Member name | Value | Description |
| --- | --- | --- |
| PasswordProtected | 1 | Document is password protected (encrypted). If the document is an archive, this attribute means that the archive central directory is encrypted and information on archive items is not available unless decrypted with a password. |
| DefaultPassword | 2 | Document is password protected (encrypted) with the application default password. Some documents, such as Excel and PowerPoint files, are encrypted with their respective application default password under certain scenarios. Opening the document in the application does not require the password (the application decrypts automatically), but extracting content with third-party software does. |
| ArchiveItemsPasswordProtected | 6 | Archive has password protected (encrypted) items. |
| PersonName | 7 | Document has author, contributor, last-edited-by, or last-printed-by identifying metadata field(s) (does not include user defined metadata fields). |
| Macros | 8 | Office document has macros. |
| Comments | 9 | Document has user comments or notes (i.e., non-metadata comments/descriptions). |
| CustomMetadata | 10 | Document has custom (user defined) metadata fields. |
| RevisionTracking | 11 | Document has revisions being tracked. |
| ExternalFileAttachments | 12 | Document has externally referenced attachments (for example, OneNote 2010 files can have external .onebin attachments). |
| Template | 13 | Document is a template. |
| Headers | 14 | Document has page or sheet headers (not set for PowerPoint, all versions; not set for OpenDocument spreadsheets). |
| Footers | 15 | Document has page or sheet footers (not set for PowerPoint, all versions; not set for OpenDocument spreadsheets). |
| OfficeLinkedContent | 20 | Office 2007 or newer document has externally linked content, either as hyperlinks or OLE linked files. Also supported for PDF and OpenDocument formats. |
| OfficeEmbeddedDocuments | 21 | Office document has embedded document(s) (applies to Microsoft Office and OpenDocument formats). |
| OfficeEmbeddedPictures | 22 | Office document has embedded picture(s) (applies to Microsoft Office and OpenDocument formats). |
| OfficeEmbeddedMedia | 23 | Office document has embedded media files (applies to Microsoft Office 2007+ and OpenDocument formats). |
| OfficePictureLinkedContent | 24 | Office 2007 or newer document has a linked picture. |
| OfficeExternDataConnections | 25 | Office 2007 or newer document has external data connections. |
| OfficeCustomXmlData | 26 | Office 2007 or newer document has custom XML data parts. |
| OfficeWebExtensionAddIns | 27 | Office 2007 or newer document has web extensions (e.g., task pane add-ins). |
| OfficeModernComments | 28 | Office 365 document has user 'modern comments', which allow assigning tasks in comment threads and other features. |
| HiddenText | 35 | Document has text characters or textboxes formatted as hidden. |
| WorkbookProtected | 40 | Workbook is protected. |
| WorkbookProtectedWorksheets | 41 | Workbook has protected worksheets. |
| WorkbookHiddenWorksheets | 42 | Workbook has hidden worksheets. |
| WorkbookVeryHiddenWorksheets | 43 | Workbook has very hidden worksheets. |
| WorksheetHiddenRows | 44 | Worksheet has hidden rows. |
| WorksheetHiddenColumns | 45 | Worksheet has hidden columns. |
| WorksheetAutoFilters | 46 | Worksheet has auto-filters. |
| WorksheetPivotTables | 47 | Worksheet has pivot tables. |
| WorkbookExternalWorkbookReferences | 48 | Workbook has external spreadsheet references. |
| WorksheetThreadedComments | 49 | Workbook has threaded comments (this applies to Excel for Office 365, which changed the way comments work; comments are now threaded discussions). |
| PresentationHiddenSlides | 60 | Presentation document has hidden slides. |
| PresentationHasSpeakerNotes | 61 | Presentation document has speaker notes. |
| PdfPortfolio | 70 | PDF document is a PDF Portfolio/Package (after Acrobat 8.0, the term PDF Portfolio, rather than PDF Package, is used to describe any document that contains a collection dictionary). |
| PdfXFA | 71 | PDF document contains an XFA form. |
| PdfAcroForm | 72 | PDF document contains an AcroForm (non-XFA) form. |
| PdfHasFailedPages | 73 | PDF document contains one or more pages where text was not extracted due to a processing exception, or where the number of extracted text characters did not meet the PageExtractedTextCriteria number of characters. |
| DominoXmlHasEncryptedItems | 80 | Domino XML document (.dxl) has 'item' XML elements that are encrypted. |
| DominoXmlHasEncryptedAttachments | 81 | Domino XML document (.dxl) has one or more encrypted attachments. |
| DominoXmlHasNativeMimeFlag | 82 | Domino XML document (.dxl) has an 'item' XML element named $NoteHasNativeMime with a value of "1" (true). |
| DominoXmlHasNativeMimeElement | 83 | Domino XML document (.dxl) has an 'item' element with a <mime> child, i.e., exported DXL data is primarily stored in the <mime> element. |
| DominoXmlHasNativeMimeBody | 84 | Domino XML document (.dxl) has an item 'body' element with 'rawitemdata' (RFC-822) formatted data. |
| OutlookEmailHasRefAttachment | 100 | Outlook email object has attachment(s) that are referenced by a fully qualified file system path (HasReferenceAttachment). |
| OutlookEmailHasRefOnlyAttachment | 101 | Outlook email object has attachment(s) that are referenced by a fully qualified path (HasReferenceOnlyAttachment). |
| OutlookEmailHasWebRefAttachment | 102 | Outlook email object has attachment(s) that are referenced only by web API (HasWebReferenceAttachment). |
| DetectedSocialSecurityNumber | 200 | Detected possible social security number(s) in extracted text or metadata. |
| DetectedIndividualTaxpayerIdNumber | 201 | Detected possible Individual Taxpayer Identification Number(s) (ITIN) in extracted text or metadata. An ITIN is a tax processing number only available for certain nonresident and resident aliens, their spouses, and dependents who cannot get a Social Security Number (SSN). |
| DetectedCreditCard | 202 | Detected possible credit card number(s) in extracted text or metadata. |
| DetectedBankAccount | 203 | Detected possible bank account number(s) in extracted text or metadata. |
| DetectedIBAN | 204 | Detected possible international bank account number(s) (IBAN) in extracted text or metadata. |
| DetectedInvestmentAccount | 205 | Detected possible investment account number(s) in extracted text or metadata. |
| DetectedEmailAddress | 206 | Detected possible email address(es) in extracted text or metadata. |
| DetectedEmailAddressAndName | 207 | Detected possible email address associated with a person's name. |
| DetectedEmailAddressAndIPAddress | 208 | Detected email address and associated IP address. |
| DetectedPhoneNumber | 209 | Detected possible phone number(s) in extracted text or metadata. |
| DetectedAddress | 210 | Detected full physical address(es) in extracted text or metadata. |
| DetectedDateOfBirth | 211 | Detected possible date(s) of birth in extracted text or metadata. |
| DetectedDriversLicense | 212 | Detected possible driver's license number(s) in extracted text or metadata. |
| DetectedPassport | 213 | Detected possible passport number(s) in extracted text or metadata. |
| DetectedMaidenName | 214 | Detected possible (mother's) maiden name(s) in extracted text or metadata. |
| DetectedHealthCareNumberID | 215 | Detected possible health care insurance number/member ID in extracted text or metadata. |
| DetectedLicensePlateNumber | 216 | Detected possible vehicle license plate number in extracted text or metadata. |
| DetectedVehicleIdentificationNumber | 217 | Detected possible vehicle identification number (VIN) in extracted text or metadata. |
| DetectedSocialMediaAccount | 218 | Detected possible social media account name in extracted text or metadata. |
| DetectedCryptoCurrencyAddress | 219 | Detected possible cryptocurrency wallet address in extracted text or metadata. |
| DetectedIPv4Address | 230 | Detected IPv4 address(es) in extracted text, hyperlinks, or metadata. |
| DetectedIPv6Address | 231 | Detected IPv6 address(es) in extracted text, hyperlinks, or metadata. |
| DetectedMacAddress | 232 | Detected MAC address(es) in extracted text, hyperlinks, or metadata. |
| DetectedIMEINumber | 233 | Detected IMEI number in extracted text, hyperlinks, or metadata. |
| DetectedPassword | 270 | Detected possible password(s) in extracted text or metadata. |
| DetectedUsername | 271 | Detected possible login username(s) in extracted text or metadata. |
| DetectedNetworkName | 272 | Detected possible network, workstation, desktop, or computer name(s) in extracted text or metadata. |
| DetectedDatabaseCredential | 273 | Detected possible database credential(s) (or URL links to Azure, SharePoint, AWS, etc. storage) in extracted text or metadata. |
| DetectedMachineReadableZone | 274 | Detected machine-readable zone (MRZ) used by passports, immigration visas, travel documents, and driver's licenses. |
| DetectedCustomEntityItem | 300 | Detected user defined custom entity in extracted text or metadata. |
| TextTruncatedToMaxAllowable | 1000 | Indicates that the extracted text exceeded the maximum size of a .NET string (about 2 GB, i.e., 1,073,741,791 UTF-16 characters) and was truncated to that maximum. |
| TextTruncatedForDocumentStore | 1001 | RESERVED. Indicates that the extracted text exceeded the document store's per-document text storage limit. This attribute is not set by Open Discover SDK. It is reserved for user processing workflows that export processed documents into a document database such as Elasticsearch, RavenDB, MongoDB, etc. |
| EntityDetectionScanLimited | 1010 | Indicates that the entity detection scan on extracted text was limited to a maximum number of bytes (in the case of binary blobs) or characters (in the case of 'large' encoded text files). For "large" text files that exceed 200 million characters, only the first 200 million characters are scanned for entity items. For "large" unsupported binary files (blobs) that exceed 100 million bytes (100 MB), only the first 100 million bytes are scanned for entity items. |
| MaxNumberOfDocumentOcrPagesLimited | 2000 | RESERVED. Optical Character Recognition (OCR) of the document was limited to a user defined maximum number of document pages for OCR. This attribute only applies to Adobe PDFs and multi-page documents. |
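
The numeric values in the member table fall into blocks (for example, 200 to 219 for personal and financial identifiers, 230 to 233 for network and device identifiers, and 1000 and above for processing limits). The sketch below groups attribute values by those blocks, which may be handy for review triage. The block boundaries are read off the table and are an assumption rather than a documented contract, since later SDK versions could add members. `AttributeTriage`, `Categorize`, and the category names are hypothetical, and the code works on plain integer values so that it does not depend on the SDK assembly.

```csharp
using System;
using System.Linq;

public static class AttributeTriage
{
    // Maps a DocumentAttributes numeric value to a coarse review category.
    // Ranges are inferred from the member table (assumption, not SDK API).
    public static string Categorize(int value)
    {
        if (value == 1 || value == 2 || value == 6) return "Encryption";
        if (value >= 200 && value <= 219) return "PersonalOrFinancialData";
        if (value >= 230 && value <= 233) return "NetworkOrDeviceIdentifier";
        if (value >= 270 && value <= 274) return "CredentialOrMrz";
        if (value == 300) return "CustomEntity";
        if (value >= 1000) return "ProcessingLimit";
        return "ContentFeature";
    }

    public static void Main()
    {
        // Example values: Macros (8), WorkbookHiddenWorksheets (42),
        // DetectedCreditCard (202), DetectedUsername (271),
        // EntityDetectionScanLimited (1010).
        int[] documentAttributes = { 8, 42, 202, 271, 1010 };

        // Group the values by category and print one line per category.
        foreach (var group in documentAttributes.GroupBy(v => Categorize(v)))
        {
            Console.WriteLine($"{group.Key}: {string.Join(", ", group)}");
        }
    }
}
```

With the real SDK type, the same function could presumably be fed `(int)attribute` for each attribute reported on a document.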
See Also