DesktopPublishing

Supported DesktopPublishing file formats (IdClassification.DesktopPublishing - Desktop publishing document formats)

  • All entries in the table below are supported for file format identification.
  • 'X' in the "Text" column indicates that text extraction is supported for the file format.
  • 'X**' in the "Text" column indicates that text extraction is supported, but binary-to-text filtering is used on partially parsed document records.
  • 'X' in the "Metadata" column indicates that metadata extraction is supported for the file format.
  • 'X' in the "EmbeddedItem" column indicates that embedded item/attachment extraction is supported for the file format.
  • 'X' in the "ContentHash" column indicates that a content hash is supported for the file format (see MD5ContentHash and SHA1ContentHash).

If a file format has no supported content extractor that extracts text, then by default (this behavior is optional) a binary-to-text content extractor is used to extract UTF-8, UTF-16, Windows-1252, and ASCII text from the binary. In many cases, indexable text can be extracted from unknown document formats.
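To make the idea of binary-to-text filtering concrete, here is a minimal sketch in Python. It is not the extractor described on this page. It only illustrates the general technique of scanning raw bytes for runs of printable characters. The function name `binary_to_text` and the `MIN_RUN` threshold are hypothetical, and the sketch covers only ASCII and UTF-16LE runs, not full UTF-8 or Windows-1252 decoding.

```python
import re

# Hypothetical threshold: shorter runs are treated as noise.
MIN_RUN = 4

# Runs of printable ASCII bytes (space through tilde).
_ASCII_RUN = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_RUN)
# The same characters stored as UTF-16LE: each printable byte followed by 0x00.
_UTF16LE_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % MIN_RUN)


def binary_to_text(data: bytes) -> list[str]:
    """Return printable strings found in raw bytes, in order of appearance."""
    hits = []
    for m in _UTF16LE_RUN.finditer(data):
        hits.append((m.start(), m.group().decode("utf-16-le")))
    for m in _ASCII_RUN.finditer(data):
        hits.append((m.start(), m.group().decode("ascii")))
    hits.sort(key=lambda h: h[0])  # restore document order across both scans
    return [text for _, text in hits]
```

Text recovered this way carries no structure (no paragraphs, styles, or reading order guarantees), which is why it is best suited to indexing rather than display.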

DesktopPublishing Supported File Formats

File Format Id Enum Value | Text | Metadata | EmbeddedItem | ContentHash | Description

MSPublisherCompoundFileCorrupted

Microsoft Publisher compound file corrupted. Unable to determine specific format version (.pub).

MSPublisher98to2003

X

X

X

X

Microsoft Publisher 98-2003 (.pub).

MSPublisher2007to2016

X

X

X

X

Microsoft Publisher 2007-2016 (.pub).

MSPublisherMhtml

X

X

X

Microsoft Publisher exported as MHTML (.mht).

SerifPagePlus

X

X

X

Serif PagePlus desktop publishing (page layout) program developed by Serif (.ppp).

SerifWebPlus

X

X

X

Serif WebPlus website design program for Microsoft Windows (.wpp).

PageMaker

X

X

X

Adobe PageMaker desktop publishing file format (.pm3;.pm4;.pm5;.pm6;.p65;.pm7;.pmd).
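
The Publisher entries above refer to a "compound file". As a rough illustration of what a first-pass identification check might look like, the sketch below assumes this means the OLE Compound File Binary container, whose first eight bytes are D0 CF 11 E0 A1 B1 1A E1. The function name `looks_like_compound_file` is hypothetical and is not part of the API documented here.

```python
# Signature at offset 0 of an OLE Compound File Binary container.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_compound_file(header: bytes) -> bool:
    """Return True if the given leading bytes start with the compound file signature."""
    return header[:8] == CFB_SIGNATURE


def file_looks_like_compound_file(path: str) -> bool:
    """Read the first eight bytes of a file and check them against the signature."""
    with open(path, "rb") as f:
        return looks_like_compound_file(f.read(8))
```

A matching signature only shows that the container is a compound file. Many other formats share the same container, so telling a Publisher document apart from them, or recognizing a specific Publisher version, requires inspecting the streams inside it.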