Other

Supported Other file formats (IdClassification.Other - Other miscellaneous types)

  • All entries in table below are supported for file format identification.
  • 'X' in "Text" column indicates text extraction is supported for the file format.
  • 'X**' in "Text" column indicates text extraction is supported BUT binary-to-text filtering is used on partially parsed document records.
  • 'X' in "Metadata" column indicates metadata extraction is supported for the file format.
  • 'X' in "EmbeddedItem" column indicates embedded item/attachment extraction is supported for the file format.
  • 'X' in "ContentHash" column indicates a content hash is supported for the file format (see MD5ContentHash and SHA1ContentHash).

If a file format has no supported content extractor that extracts text, then optionally (this is the default) a binary-to-text content extractor is used to extract UTF-8, UTF-16, Windows-1252, and ASCII text from the binary data. In many cases, indexable text can be extracted from unknown document formats.
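
The binary-to-text fallback described above can be pictured as a "strings"-style scan over the raw bytes. The following is only a minimal sketch of that general idea, not the library's extractor: the function name `extract_text_runs`, the `MIN_RUN` threshold, and the restriction to printable ASCII and UTF-16LE runs are all assumptions made for illustration (Windows-1252 and multi-byte UTF-8 handling are left out).

```python
import re
from typing import List

MIN_RUN = 4  # assumed minimum run length; not a documented setting

# Runs of printable ASCII (plus tab), at least MIN_RUN bytes long.
_ASCII_RUN = re.compile(rb"[\x20-\x7e\t]{%d,}" % MIN_RUN)
# The same characters as UTF-16LE: each one followed by a zero byte.
_UTF16LE_RUN = re.compile(rb"(?:[\x20-\x7e\t]\x00){%d,}" % MIN_RUN)


def extract_text_runs(data: bytes) -> List[str]:
    """Return printable ASCII and UTF-16LE runs found in data, ordered by offset."""
    hits = []
    for match in _ASCII_RUN.finditer(data):
        hits.append((match.start(), match.group().decode("ascii")))
    for match in _UTF16LE_RUN.finditer(data):
        hits.append((match.start(), match.group().decode("utf-16-le")))
    hits.sort(key=lambda hit: hit[0])
    return [text for _, text in hits]


if __name__ == "__main__":
    # Hypothetical sample: binary noise around one ASCII run and one UTF-16LE run.
    sample = b"\x00\x01Hello world\xff\xfe" + "Report".encode("utf-16-le") + b"\x00\x00"
    for run in extract_text_runs(sample):
        print(run)
```

A minimum run length keeps short accidental byte sequences out of the result; a real extractor would likely apply further filtering before indexing.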

Other Supported File Formats

File Format Id Enum Value

Text

Metadata

EmbeddedItem

ContentHash

Description

Unknown

File format that could not be identified.

ContainerUnextractable

Special identification for child files of containers (e.g., ZIP archives) that could not be extracted from their container and therefore could not be identified.

EmptyFile

Empty file (file has 0 bytes of data).

HiddenEmptyAttachment

Internal ID Only: A Microsoft Outlook message attachment that is both hidden (Exchange property PidTagAttachmentHidden is 'true') and empty (no binary data). Identifying this type of file allows it to be excluded from extracted Outlook attachments.

OutlookEmailRefAttachment

X

Outlook email object attachment that is referenced only by a fully qualified file system path and whose file data is not contained in the message object.

OutlookEmailRefOnlyAttachment

X

Outlook email object attachment that is referenced only by a fully qualified file path and whose file data is not contained in the message (.msg) object.

OutlookEmailWebRefAttachment

X

Outlook email object attachment that is referenced only by a web API URL and whose file data is not contained in the message (.msg) object.

UnknownCompoundFile

X

X

X

OLE2 compound file format of unknown application type.

EmptyCompoundFile

File is a valid Microsoft compound file format (OLE2) but has no storages or streams and a CLSID = 00000000-0000-0000-0000-000000000000. Although sometimes found as an embedded file, this file has no useful content.

CorruptedCompoundFile

File is a corrupted Microsoft compound file format (OLE2) and the specific file format type could not be determined.

OleLinkedObject

X

OLE linked object compound file that usually only contains an "1Ole" stream. This object is found embedded in documents and describes a link to an external object such as an Excel or Word document.

MicrosoftOfficeTheme

X

Microsoft Office Theme (document theme) (.thmx).

WindowsShortcut

X

Windows Shortcut file (Shell Link Binary File Format) (.lnk).

SettingContent_MS

X

A special Windows 'shortcut' file that opens the Windows Settings panel (Windows 8 and above), which Windows 10 features in preference to the old Control Panel. Note: this file type embedded in an Office 365 document is a security concern.

AutomaticDestinations_MS

Jump List file used by Windows 7 that lets a user quickly view items recently edited by a program pinned to the taskbar by right-clicking its icon (.automaticDestinations-ms).

CustomDestinations_MS

Jump List file used by Windows 7 that lets a user quickly view items recently edited by a program pinned to the taskbar by right-clicking its icon (.customDestinations-ms).

MicrosoftOfficeOwnerFile

Temporary file that is usually hidden and is created by Microsoft Office when a previously saved Microsoft Office document is opened for editing, printing, or review. This temporary file is called the "owner file" and contains the user name of the person who opened the file. The file name begins with "~$" and the extension is the same as the original document.

MicrosoftOfficeOwnerFileOLE

Compound-file-formatted temporary file that is usually hidden and is created by Microsoft Office when a previously saved Microsoft Office document is opened for editing, printing, or review. This temporary file is called the "owner file" and contains the user name of the person who opened the file. The file name begins with "~$$" (for Visio) and the extension is the same as the original document with a prepended '~'.

MicrosoftScriptletComponent

Microsoft ActiveX control which is used to render HTML pages. This control may be found embedded in legacy Office documents and can be a security risk.

WindowsClipboard

Windows clipboard. The clipboard is usually just an in-memory object, but it may sometimes be saved to a file with a .clp extension (.clp).

WindowsCardfile

Windows Cardfile address book application (included with Microsoft Windows 1.0 through Windows Me and Windows NT 4) (.crd).

ThumbsDB

X

Windows thumbnail cache (or Thumbs.db format), a file format used by some versions of Microsoft Windows to store thumbnails of images (.db).

ThumbCacheDB

X

Windows Vista/7/8/10 thumbnail cache (.db).

ThumbCacheIndexDB

Windows Vista/7/8/10 thumbnail cache index (.db).

WindowsHelpFile

Windows Help File (.hlp).

WindowsRegistryFile

Microsoft Windows NT 4 (and later) Registry File (REGF) used to store system and application related data (.dat).

MicrosoftGraph

X

X

X

Microsoft Graph (originally known as Microsoft Chart) (.gra).

MicrosoftEquation

Microsoft Equation. This is the earliest Microsoft Equation version.

MicrosoftEquationEditor2

X

Microsoft Equation Editor 2.0 format. This format is found embedded in Office 97-2003 documents.

MicrosoftEquationEditor3

X

Microsoft Equation Editor 3.0 format. This format is found embedded in Office 97-2003 documents.

MicrosoftPhotoEditor3

X

X

X

Microsoft Photo Editor version 3.0 (image-editing application found in Microsoft Office 97–XP versions for Windows, classified as one of Microsoft Office Tools). This format is often found embedded in Office 97-2003 documents.

MicrosoftClipArtGallery

Microsoft Clip Art Gallery embedded object. This format is found embedded in Office 97-2003 documents and is generally considered an unimportant embedded item (i.e., junk).

MicrosoftWordArt

Microsoft WordArt embedded object. This format is found embedded in Microsoft Office documents; WordArt is decorative text that you can add to a document.

MicrosoftDraw1

Microsoft Draw 1.01 (packaged with Office).

MicrosoftDraw98

Microsoft Draw 98 (packaged with Office).

MicrosoftVBAProject

X

X

X

Microsoft VBA (Visual Basic for Applications) Project. This format is often found embedded in Microsoft Office documents.

MetafileOLE2Container

X

X

X

Metafile (.wmf) OLE2 compound file container. IPicture objects provide a language-neutral abstraction for bitmaps, icons, and metafiles.

DeviceIndependentBitmapOLE2Container

X

X

X

Device Independent Bitmap (.bmp) OLE2 compound file container. IPicture objects provide a language-neutral abstraction for bitmaps, icons, and metafiles.

EnhancedMetafileOLE2Container

X

X

X

Enhanced Metafile (.emf) OLE2 compound file container. IPicture objects provide a language-neutral abstraction for bitmaps, icons, and metafiles.

OutlookFileAttachment

X

Microsoft "Outlook File Attachment" - an OLE (compound file) wrapper around an attachment payload.

MicrosoftExchangeFolderShortcut

Microsoft Exchange public folder shortcut (.xnk).

WindowMediaPlayerCompressedSkin

Windows Media Player compressed 'skin' file (.wmz).

MicrosoftInfoPath

X

X

Microsoft InfoPath file (initially released as part of Microsoft Office 2003). InfoPath is an application used for designing, distributing, filling and submitting electronic forms containing structured data (.xsn).

OrgPlus

OrgPlus organizational chart file (Insperity Business Services, L.P.) (.opx).

SafariWebArchive

Mac OS Safari WebArchive file format (archived complete web pages) (.webarchive).

MicrographxClipArtIndex

Micrografx Clip Art Index or Palette (.sbj).

InstallShieldCAB

InstallShield installation software "CAB" format, a successor to the InstallShield Z format (this format is not the same as the Microsoft Cabinet format) (.cab).

InstallShieldZ

InstallShield installation software "Z" proprietary format (used in version 3 of InstallShield) (.z).

MicrosoftForms2_CheckBox

Microsoft Forms 2.0 Object Library Checkbox control (embedded item found in Microsoft Office Documents). Not considered useful for content extraction.

MicrosoftForms2_ComboBox

Microsoft Forms 2.0 Object Library ComboBox control (embedded item found in Microsoft Office Documents). Not considered useful for content extraction.

MicrosoftForms2_CommandButton

Microsoft Forms 2.0 Object Library CommandButton control (embedded item found in Microsoft Office Documents). Not considered useful for content extraction.

MicrosoftForms2_Form

Microsoft Forms 2.0 Object Library Form control (embedded item found in Microsoft Office Documents). Not considered useful for content extraction.

MicrosoftForms2_Frame

Microsoft Forms 2.0 Object Library Frame control (embedded item found in Microsoft Office Documents). Not considered useful for content extraction.

MicrosoftForms2_HtmlCheckBox

Microsoft Forms 2.0 Object Library HTML CheckBox control (embedded item found in Microsoft Office Documents). Not considered useful for content extraction.

MicrosoftForms2_HtmlHidden

Microsoft Forms 2.0 Object Library HTML Hidden control (embedded item found in Microsoft Office Documents). Not considered useful for content extraction.

MicrosoftForms2_HtmlImage

Microsoft Forms 2.0 Object Library HTML Image control (embedded item found in Microsoft Office Documents). Not considered useful for content extraction.

MicrosoftForms2_HtmlOption

Microsoft Forms 2.0 Object Library HTML Option control (embedded item found in Microsoft Office Documents). Not considered useful for content extraction.

MicrosoftForms2_HtmlPassword

Microsoft Forms 2.0 Object Library HTML Password control (embedded item found in Microsoft Office Documents). Not considered useful for content extraction.

MicrosoftForms2_HtmlReset

Microsoft Forms 2.0 Object Library HTML Reset control (embedded item found in Microsoft Office Documents). Not considered useful for content extraction.

MicrosoftForms2_HtmlSelect

Microsoft Forms 2.0 Object Library HTML Select control (embedded item found in Microsoft Office Documents). Not considered useful for content extraction.

MicrosoftForms2_HtmlSubmit

Microsoft Forms 2.0 Object Library HTML Submit control (embedded item found in Microsoft Office Documents). Not considered useful for content extraction.

MicrosoftForms2_HtmlText

Microsoft Forms 2.0 Object Library HTML Text control (embedded item found in Microsoft Office Documents). Not considered useful for content extraction.

MicrosoftForms2_HtmlTextArea

Microsoft Forms 2.0 Object Library HTML TextArea control (embedded item found in Microsoft Office Documents). Not considered useful for content extraction.

MicrosoftForms2_Image

Microsoft Forms 2.0 Object Library Image control (embedded item found in Microsoft Office Documents). Not considered useful for content extraction.

MicrosoftForms2_Label

Microsoft Forms 2.0 Object Library Label control (embedded item found in Microsoft Office Documents). Not considered useful for content extraction.

MicrosoftForms2_ListBox

Microsoft Forms 2.0 Object Library ListBox control (embedded item found in Microsoft Office Documents). Not considered useful for content extraction.

MicrosoftForms2_MultiPage

Microsoft Forms 2.0 Object Library Multi-Page control (embedded item found in Microsoft Office Documents). Not considered useful for content extraction.

MicrosoftForms2_OptionButton

Microsoft Forms 2.0 Object Library OptionButton control (embedded item found in Microsoft Office Documents). Not considered useful for content extraction.

MicrosoftForms2_ScrollBar

Microsoft Forms 2.0 Object Library ScrollBar control (embedded item found in Microsoft Office Documents). Not considered useful for content extraction.

MicrosoftForms2_TabStrip

Microsoft Forms 2.0 Object Library TabStrip control (embedded item found in Microsoft Office Documents). Not considered useful for content extraction.

MicrosoftForms2_TextBox

Microsoft Forms 2.0 Object Library TextBox control (embedded item found in Microsoft Office Documents). Not considered useful for content extraction.

MicrosoftForms2_ToggleButton

Microsoft Forms 2.0 Object Library ToggleButton control (embedded item found in Microsoft Office Documents). Not considered useful for content extraction.

MicrosoftForms2_SpinButton

Microsoft Forms 2.0 Object Library Spin Button control (embedded item found in Microsoft Office Documents). Not considered useful for content extraction.

MiniDumpFile

Windows MiniDump file used for reporting application crash data (.dmp;.mdmp).

AppleDesktopServicesStore

Apple Desktop Services Store, a file (hidden on macOS) that stores custom attributes of its containing folder (.DS_Store).
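
The MicrosoftOfficeOwnerFile and MicrosoftOfficeOwnerFileOLE entries in the table describe a naming convention: the owner file's name begins with "~$" (or "~$$" for Visio). As a rough illustration, a name-only pre-filter based on that convention might look like the sketch below. The helper `looks_like_owner_file` is hypothetical and is not how the library identifies these files; a name check alone cannot confirm the file's actual format.

```python
from pathlib import PureWindowsPath


def looks_like_owner_file(file_name: str) -> bool:
    """Name-only heuristic: True if the final path component starts with "~$".

    The "~$$" prefix described for Visio also starts with "~$", so it is covered.
    """
    # PureWindowsPath accepts both "\\" and "/" as separators on any platform.
    name = PureWindowsPath(file_name).name
    return name.startswith("~$") and len(name) > 2


if __name__ == "__main__":
    # Hypothetical file names used only to exercise the heuristic.
    for candidate in ["~$report.docx", r"C:\docs\~$$diagram.~vsd", "report.docx"]:
        print(candidate, looks_like_owner_file(candidate))
```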