
Archive

Supported Archive file formats (IdClassification.Archive: container document formats, usually compressed, that contain other documents, e.g., ZIP, 7z, TAR, and RAR).

  • All entries in table below are supported for file format identification.
  • 'X' in "Text" column indicates text extraction is supported for the file format.
  • 'X**' in "Text" column indicates text extraction is supported BUT binary-to-text filtering is used on partially parsed document records.
  • 'X' in "Metadata" column indicates metadata extraction is supported for the file format.
  • 'X' in "EmbeddedItem" column indicates embedded item/attachment extraction is supported for the file format.
  • 'X' in "ContentHash" column indicates a content hash is supported for the file format (see MD5ContentHash and SHA1ContentHash).

If a file format does not have a supported content extractor that extracts text, a binary-to-text content extractor can optionally be used (this is the default) to extract UTF-8, UTF-16, Windows-1252, and ASCII text from the binary. In many cases, indexable text can be extracted from unknown document formats.
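
The details of the binary-to-text extractor are not specified here. Purely as an illustration of the general idea (similar in spirit to the Unix strings tool), the Python sketch below scans a byte buffer for runs of printable ASCII and for UTF-16LE runs made of printable ASCII code units. The function name extract_text_runs and the minimum run length of 4 are hypothetical choices, and the sketch ignores UTF-8 multibyte and Windows-1252 high-range characters that a real extractor would also handle.

```python
import re

# Hypothetical minimum run length; not a documented setting of this library.
MIN_RUN = 4


def extract_text_runs(data: bytes, min_run: int = MIN_RUN) -> list[str]:
    """Return printable ASCII runs, then UTF-16LE runs, found in a binary blob."""
    runs = []
    # Printable ASCII (0x20-0x7E). This is also the ASCII subset of UTF-8 and Windows-1252.
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_run)
    runs.extend(m.group().decode("ascii") for m in ascii_re.finditer(data))
    # UTF-16LE runs limited to printable ASCII code units (each char byte followed by 0x00).
    utf16_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_run)
    runs.extend(m.group().decode("utf-16-le") for m in utf16_re.finditer(data))
    return runs
```

Short runs are discarded because random binary data frequently contains a few printable bytes in a row; the threshold trades recall for noise.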

Archive Supported File Formats

File Format Id Enum Value

Text

Metadata

EmbeddedItem

ContentHash

Description

ArchiveZipExe

X

Self-extracting ZIP archive executable (.exe).

ArchiveRarExe

X

Self-extracting RAR versions 2, 3, and 4 archive executable (.exe).

ArchiveRar5Exe

X

Self-extracting RAR version 5 archive executable (.exe).

Archive7ZipExe

X

Self-extracting 7z archive executable (.exe).

ArchiveLzhExe

X

Self-extracting LZH/LHA archive executable (.exe).

ArchiveZip

X

X

ZIP archive file format (supports lossless data compression) (.zip;.zipx).

ArchiveZipEmpty

An empty ZIP archive file containing no files (the ZIP contains only an "end of central directory" record) (.zip;.zipx).
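
An empty ZIP archive consists of nothing but the 22-byte end of central directory (EOCD) record, which begins with the signature PK\x05\x06. A minimal Python sketch of detecting that case, assuming the whole file is already in memory, could look like this; is_empty_zip is a hypothetical helper for illustration, not part of this library.

```python
import struct

EOCD_SIG = b"PK\x05\x06"  # end of central directory signature
EOCD_MIN_SIZE = 22        # fixed-size EOCD record when the comment is empty


def is_empty_zip(data: bytes) -> bool:
    """Heuristic: the file is a single EOCD record that lists zero entries."""
    if len(data) < EOCD_MIN_SIZE or not data.startswith(EOCD_SIG):
        return False
    # Fields after the 4-byte signature: this disk (2), central directory disk (2),
    # entries on this disk (2), total entries (2), central directory size (4),
    # central directory offset (4), comment length (2).
    (_disk, _cd_disk, entries_here, entries_total,
     cd_size, _cd_offset, comment_len) = struct.unpack_from("<HHHHIIH", data, 4)
    return (entries_here == 0 and entries_total == 0 and cd_size == 0
            and len(data) == EOCD_MIN_SIZE + comment_len)
```

Python's zipfile module produces exactly such a file when a ZipFile is opened in write mode and closed without adding any entries.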

ArchiveZipSplit

X

X

ZIP split archive segment. This segment (volume) is the end segment of a ZIP split archive and contains the archive's central directory (.zip;.zipx).

ArchiveZipSplitSegment

ZIP split archive segment (volume). File names usually follow one of these patterns: filename.zN, filename.zip.N, or filename.zxN, where N is the segment (volume) number (01, 02, 03, ...).
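
The naming conventions above can be recognized with a few regular expressions. The following Python sketch is only an illustration of those file name patterns; it says nothing about how this library identifies segments (which may rely on content rather than names), and zip_segment_number is a hypothetical helper.

```python
import re
from typing import Optional

# File name patterns from the description above; N is a zero-padded segment number.
_ZIP_SEGMENT_PATTERNS = [
    re.compile(r"^(?P<base>.+)\.z(?P<num>\d{2,})$", re.IGNORECASE),      # filename.z01
    re.compile(r"^(?P<base>.+)\.zip\.(?P<num>\d{2,})$", re.IGNORECASE),  # filename.zip.01
    re.compile(r"^(?P<base>.+)\.zx(?P<num>\d{2,})$", re.IGNORECASE),     # filename.zx01
]


def zip_segment_number(filename: str) -> Optional[tuple[str, int]]:
    """Return (base name, segment number) if the name looks like a ZIP split segment."""
    for pattern in _ZIP_SEGMENT_PATTERNS:
        m = pattern.match(filename)
        if m:
            return m.group("base"), int(m.group("num"))
    return None
```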

ArchiveRar4

X

X

RAR archive file format versions 2, 3, and 4 (.rar).

ArchiveRar4Split

X

X

RAR split archive file format versions 3 and 4. This is the first segment (volume) of the split archive (.rar).

ArchiveRar4SplitSegment

RAR split archive segment versions 3 and 4. This file is a segment (volume) of a RAR split archive. Usual file name pattern: "filename.partN.rar", where N is the segment number (01, 02, 03, etc.) (.rar).
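
When a directory holds several RAR volumes, the "filename.partN.rar" convention makes it possible to group them and put them in order. The Python sketch below is illustrative only: rar_part_number and group_rar_volumes are hypothetical helpers, and real tooling would also confirm each volume's content rather than trusting names alone.

```python
import re
from typing import Optional

# "filename.partN.rar", where N is zero-padded (01, 02, ... or 001, 002, ...).
_RAR_PART = re.compile(r"^(?P<base>.+)\.part(?P<num>\d+)\.rar$", re.IGNORECASE)


def rar_part_number(filename: str) -> Optional[int]:
    """Return the volume number encoded in the name, or None if it does not match."""
    m = _RAR_PART.match(filename)
    return int(m.group("num")) if m else None


def group_rar_volumes(filenames: list[str]) -> dict[str, list[str]]:
    """Group volume names by base name, ordered numerically by part number."""
    groups: dict[str, list[tuple[int, str]]] = {}
    for name in filenames:
        m = _RAR_PART.match(name)
        if m:
            groups.setdefault(m.group("base"), []).append((int(m.group("num")), name))
    return {base: [name for _, name in sorted(parts)] for base, parts in groups.items()}
```

Sorting on the parsed integer rather than the raw string keeps part10 after part02 even when padding widths differ.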

ArchiveRar5

X

X

RAR archive file format version 5 (.rar).
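
RAR 1.5 through 4.x archives and RAR 5 archives begin with different marker bytes (Rar!\x1a\x07\x00 versus Rar!\x1a\x07\x01\x00), so the major format generation can be read from the first eight bytes. The Python sketch below shows only that check; rar_version_from_signature is a hypothetical name. It cannot tell split or header-encrypted variants apart (that requires parsing header flags), and in self-extracting .exe files the marker follows an executable stub, so a startswith test does not apply there.

```python
from typing import Optional

RAR4_SIG = b"Rar!\x1a\x07\x00"      # RAR 1.5 through 4.x marker block
RAR5_SIG = b"Rar!\x1a\x07\x01\x00"  # RAR 5.0 signature


def rar_version_from_signature(data: bytes) -> Optional[int]:
    """Return 4 or 5 if data starts with a RAR signature, otherwise None."""
    if data.startswith(RAR5_SIG):
        return 5
    if data.startswith(RAR4_SIG):
        return 4
    return None
```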

ArchiveRar5Split

X

X

RAR split archive file format version 5. This is the first segment (volume) of the split archive (.rar).

ArchiveRar5SplitSegment

RAR split archive segment version 5. This file is a segment (volume) of a RAR split archive. Usual file name pattern: "filename.partN.rar", where N is the segment number (01, 02, 03, etc.) (.rar).

ArchiveRarLegacy

X

X

RAR legacy archive file format (no format documentation is known) (.rar).

ArchiveRar4Encrypted

X

X

RAR archive file format versions 3 or 4 with encrypted headers. For archives with encrypted headers, no archive metadata or archive item information is available until the password is applied (.rar).

ArchiveRar4SplitEncrypted

X

X

RAR split archive file format versions 3 or 4 with encrypted headers. This is the first segment (volume) of the split archive. For archives with encrypted headers, no archive metadata or archive item information is available until the password is applied (.rar).

ArchiveRar4SplitSegmentEncrypted

RAR split archive segment versions 3 or 4 with encrypted headers. This file is a segment (volume) of a RAR split archive. Usual file name pattern: "filename.partN.rar", where N is the segment number (01, 02, 03, etc.). For archives with encrypted headers, no archive metadata or archive item information is available until the password is applied (.rar).

ArchiveRar5Encrypted

X

X

RAR archive file format version 5 with encrypted headers. For archives with encrypted headers, no archive metadata or archive item information is available until the password is applied (.rar).

ArchiveRar5SplitEncrypted

X

X

RAR split archive file format version 5 with encrypted headers. This is the first segment (volume) of the split archive. For archives with encrypted headers, no archive metadata or archive item information is available until the password is applied (.rar).

ArchiveRar5SplitSegmentEncrypted

RAR split archive segment version 5 with encrypted headers. This file is a segment (volume) of a RAR split archive. Usual file name pattern: "filename.partN.rar", where N is the segment number (001, 002, 003, etc.). For archives with encrypted headers, no archive metadata or archive item information is available until the password is applied (.rar).

Archive7Zip

X

X

7-Zip archive file format (supports lossless data compression) (.7z).

Archive7ZipEncrypted

X

X

7-Zip archive file format with encrypted headers. For archives with encrypted headers, no archive metadata or archive item information is available until the password is applied (.7z).

Archive7ZipSplit

X

X

7-Zip split archive file format. This is the first segment (volume) of the split archive. Split 7-Zip segments usually follow the file name pattern "filename.7z.N", where N is the segment (volume) number (.7z).

Archive7ZipSplitSegment

7-Zip split archive segment (volume). This is the second or a later part (segment) of a 7-Zip split archive. Split 7-Zip segments usually follow the file name pattern "filename.7z.N", where N is the segment (volume) number (.7z).
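
One way to picture the difference between the first 7-Zip volume and later segments: only the first volume starts with the 6-byte 7z signature (37 7A BC AF 27 1C), while later volumes are raw continuations of the same byte stream. The Python sketch below combines that observation with the "filename.7z.N" naming convention. classify_7z_volume and its "first"/"segment" labels are hypothetical, and this is not a description of how this library performs identification.

```python
import re
from typing import Optional

SEVEN_ZIP_SIG = b"7z\xbc\xaf\x27\x1c"  # 6-byte 7z signature header
_SEVEN_ZIP_VOLUME = re.compile(r"^(?P<base>.+)\.7z\.(?P<num>\d+)$", re.IGNORECASE)


def classify_7z_volume(filename: str, head: bytes) -> Optional[str]:
    """Rough split-volume classification from the file name plus its first bytes."""
    if not _SEVEN_ZIP_VOLUME.match(filename):
        return None
    # The first volume carries the signature header; later volumes continue the
    # byte stream mid-archive and normally do not start with it.
    if head.startswith(SEVEN_ZIP_SIG):
        return "first"
    return "segment"
```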

ArchiveArc

ARC archive format (.arc).

ArchiveUnixCompress

X

X

Unix compress (LZW) archive file format. LZW is the algorithm behind the widely used Unix file compression utility 'compress' and is also used in the GIF image format (.Z).

ArchiveTar

X

X

TAR archive file format (.tar).

ArchiveStuffIt

StuffIt archive format (.sit).

ArchiveStuffItX

StuffIt X archive format (.sitx).

ArchiveLzh

X

X

LZH archive file format (.lzh).

ArchiveLZ4

LZ4 compression stream format (.lz4).

ArchiveLZFSE

Lempel-Ziv style data compression stream using Finite State Entropy coding format (LZFSE) (.lzfse).

ArchiveZstandard

Zstandard compressed file format (.zst).

ArchiveLZIP

lzip compressed archive file format (.lz).

ArchiveGZip

X

X

Gzip archive file format. Gzip is normally used to compress a single file (.gz;.tgz).

ArchiveLzma

X

X

LZMA raw archive format.

ArchiveRpm

X

X

RPM Package Manager (originally Red Hat Package Manager) software package format (.rpm).

ArchiveZoo

ZOO compressed archive (.zoo). Old and uncommon format.

ArchiveArj

X

X

ARJ (Archived by Robert Jung), a proprietary archive file format (.arj).

ArchiveBZip2

X

X

Bzip2 archive file format. Bzip2 only compresses single files and is not a file archiver (.bz2).

ArchiveMSCab

X

X

Microsoft cabinet archive file format (.cab).

ArchiveDebian

X

X

Open Debian software package format. Debian packages are standard Unix ar archives (.deb;.udeb).
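
Because Debian packages are standard Unix ar archives, their outer structure is simple to walk: an 8-byte "!<arch>\n" global header, then members that each have a fixed 60-byte header (name, timestamp, owner, group, mode, decimal size, and the two-byte terminator 0x60 0x0A) followed by data padded to an even length. The Python sketch below lists member names and sizes. list_ar_members is a hypothetical helper; it does not resolve GNU or BSD long-name tables or symbol tables, which .deb packages do not normally need.

```python
def list_ar_members(data: bytes) -> list[tuple[str, int]]:
    """Return (name, size) for each member of a Unix ar archive held in memory."""
    if not data.startswith(b"!<arch>\n"):
        raise ValueError("not an ar archive")
    members = []
    offset = 8  # skip the global header
    while offset + 60 <= len(data):
        header = data[offset:offset + 60]
        # Field layout: name 0-16, mtime 16-28, uid 28-34, gid 34-40,
        # mode 40-48, size 48-58, terminator 58-60.
        if header[58:60] != b"`\n":
            raise ValueError(f"bad member header at offset {offset}")
        # Names are space-padded; GNU ar also appends '/'. Special GNU members
        # such as "/" and "//" therefore come back as empty names here.
        name = header[0:16].decode("ascii").rstrip(" ").rstrip("/")
        size = int(header[48:58].decode("ascii").strip())
        members.append((name, size))
        # Member data is padded with one byte when its size is odd.
        offset += 60 + size + (size % 2)
    return members
```

For a typical .deb, the members are debian-binary, control.tar.* and data.tar.*, where the compression suffix of the two tarballs varies.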

ArchiveXar

X

X

eXtensible ARchive format (XAR), an open source archive file format introduced in Mac OS X 10.5 (.xar).

ArchiveXZ

X

X

xz archive file format (.xz).
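
Several single-stream compression formats in this table (Unix compress, LZ4, Zstandard, lzip, Gzip, Bzip2, and xz) start with well-known "magic" byte signatures. The Python sketch below sniffs those leading bytes. sniff_stream_format and its labels are hypothetical; a production identifier would validate further (for example, the block-size digit after "BZh"), and formats such as raw LZMA have no fixed signature, so they are not covered.

```python
from typing import Optional

# Well-known leading signatures of single-stream compression formats.
_STREAM_SIGNATURES = [
    (b"\x1f\x8b", "gzip"),          # .gz, .tgz
    (b"BZh", "bzip2"),              # .bz2
    (b"\xfd7zXZ\x00", "xz"),        # .xz
    (b"\x28\xb5\x2f\xfd", "zstd"),  # .zst (Zstandard frame)
    (b"\x04\x22\x4d\x18", "lz4"),   # .lz4 (LZ4 frame format)
    (b"LZIP", "lzip"),              # .lz
    (b"\x1f\x9d", "compress"),      # .Z (Unix compress, LZW)
]


def sniff_stream_format(head: bytes) -> Optional[str]:
    """Return a format label if head starts with a known signature, else None."""
    for signature, label in _STREAM_SIGNATURES:
        if head.startswith(signature):
            return label
    return None
```

A .tgz file sniffs as gzip here, because the TAR archive only becomes visible after decompression.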

ArchiveCpio

X

X

cpio archive file format, used primarily on Unix-like operating systems (.cpio).

ArchiveBlakHole

BlackHole archive file format (proprietary ZipTV compression format) (.bh).

ArchiveWinAce

WinAce compressed file (a proprietary compression algorithm format) (.ace).

ArchiveMSCompiledHelp

X

X

Microsoft compiled HTML Help file format (.chm).

ArchiveMSHelp2

Microsoft Help 2.x file format (not released as a general help platform) (.hxs).

HDF4

HDF4 general purpose file format to store and organize large amounts of data (.hdf4;.h4;.hdf).

HDF5v1

HDF5 version 1, general purpose file format to store and organize large amounts of data (.hdf5;.h5;.hdf).

HDF5v2

HDF5 version 2, general purpose file format to store and organize large amounts of data (.hdf5;.h5;.hdf).

ArchiveArUnix

X

X

The 'archiver' format, commonly known as 'ar', used by the Unix utility that groups files into a single archive file (.ar;.a;.lib).