Supported DataFile file formats (IdClassification.DataFile - Data and data serialization document formats)
If a file format does not have a supported content extractor that extracts text, then, optionally (this is the default), a binary-to-text content extractor is used to extract UTF-8, UTF-16, Windows-1252, and ASCII text from the binary data. In many cases, indexable text can be extracted from unknown document formats.
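The binary-to-text fallback can be approximated by scanning for runs of printable bytes, in the spirit of the Unix `strings` utility. The sketch below is not the SDK's extractor. The minimum run length (`MIN_RUN`), the function names, and the decode order (UTF-8 first, then Windows-1252) are assumptions. ASCII needs no separate pass because it is a subset of UTF-8, and UTF-16 handling is limited here to little-endian Basic Latin text.

```python
import re
from typing import List, Tuple

# Hypothetical minimum run length; a real extractor would tune this.
MIN_RUN = 4

# 8-bit runs: tab, printable ASCII, and high bytes (UTF-8 lead/continuation
# bytes or Windows-1252 letters such as 0xE9 = "é").
_EIGHT_BIT_RUN = re.compile(rb"[\x09\x20-\x7e\x80-\xff]{%d,}" % MIN_RUN)

# UTF-16LE runs, limited to Basic Latin: each printable ASCII byte is
# followed by a 0x00 high byte.
_UTF16LE_RUN = re.compile(rb"(?:[\x09\x20-\x7e]\x00){%d,}" % MIN_RUN)


def _decode_8bit(run: bytes) -> str:
    """Prefer UTF-8 (which also covers pure ASCII); fall back to Windows-1252."""
    try:
        return run.decode("utf-8")
    except UnicodeDecodeError:
        # cp1252 leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined, so replace them.
        return run.decode("cp1252", errors="replace")


def extract_text(data: bytes) -> List[Tuple[int, str]]:
    """Return (offset, text) pairs for text-like runs in a binary blob, in offset order."""
    hits = [(m.start(), _decode_8bit(m.group())) for m in _EIGHT_BIT_RUN.finditer(data)]
    hits += [(m.start(), m.group().decode("utf-16-le")) for m in _UTF16LE_RUN.finditer(data)]
    return sorted(hits)
```

Trying UTF-8 before Windows-1252 works reasonably well because valid multibyte UTF-8 sequences rarely occur by accident in Windows-1252 text; runs that fail UTF-8 validation fall back to a single-byte decode.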
File Format Id Enum Value | Text | Metadata | EmbeddedItem | ContentHash | Description |
|---|---|---|---|---|---|
Open Discover document data archive output file that stores extracted document metadata and attributes (.dda). | |||||
Open Discover attachment data archive output file that stores extracted container items and extracted document attachments (.ada). | |||||
Open Discover text data archive output file that stores extracted document text (.tda). | |||||
Binary Property List format for storing program settings and other data in Apple OS X, iOS, and NeXTSTEP applications (.plist). | |||||
XML Property List format for storing program settings and other data in Apple OS X, iOS, and NeXTSTEP applications (this XML format was introduced by Apple to replace the earlier format used in NeXTSTEP) (.plist). | |||||
X | JavaScript Object Notation (JSON) open standard format is a text-based format for transmitting data objects consisting of attribute–value pairs (.json). | ||||
X | JavaScript Object Notation for Linked Data (JSON-LD) format for encoding Linked Data using JSON (.jsonld). | ||||
Concise Binary Object Representation (CBOR) data format (.cbor). | |||||
Babylon Glossary Builder glossary file (.bgl). | |||||
GNU Gettext Machine Object file. MO (Machine Object) files are compiled, machine-readable PO (Portable Object) files (.mo;.gmo). | |||||
Microsoft Service Quality Monitoring file used to help monitor the quality of applications such as Windows Live Messenger and Microsoft Office (.sqm). | |||||
X | Mac OS BinHex 4.0 (binary-to-hexadecimal) format, used for sending binary files through email (.hqx). | ||||
X | AppleSingle version 1. This format contains both file contents and attributes. | ||||
X | AppleSingle version 2. This format contains both file contents and attributes. | ||||
X | AppleDouble resource fork version 1. The AppleDouble format keeps the data fork of the file in its original format and under its original filename; this file stores only the file attributes. | ||||
X | AppleDouble resource fork version 2. The AppleDouble format keeps the data fork of the file in its original format and under its original filename; this file stores only the file attributes. | ||||
X | X | Microsoft Binder (Microsoft Office 95, 97, and 2000. Discontinued after Office 2000) (.obd). | |||
X | Microsoft "oledata.mso" file, which allows HTML email clients other than Outlook to correctly render HTML email messages sent by Microsoft Outlook. MSO files can be contained in other formats and can themselves contain useful embedded objects such as MS Office documents. | ||||
X | Time-stamped data that is used to bind a file with one or more time-stamp tokens obtained for that file. A Cryptographic Message Syntax (CMS) envelope is used as the time-stamped data content envelope (.tsd). | ||||
X | Comma-separated values (CSV) file (.csv). | ||||
X | Tab-separated values (TSV) file (.tsv;.tab). | ||||
Apache Parquet is an open-source, column-oriented data file format (.parquet). | |||||
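How the SDK assigns these File Format Id values is not described here. As a rough illustration only, several of the binary formats listed above can be recognized from well-known signatures: `bplist` at the start of a binary property list, `PAR1` at both ends of a Parquet file, the GNU gettext MO magic number 0x950412de, the AppleSingle/AppleDouble magic numbers 0x00051600/0x00051607 followed by a version field, the optional CBOR self-describe tag (bytes D9 D9 F7), and the BinHex 4.0 header line. The function name `sniff_data_format` and the returned labels are hypothetical and are not the SDK's enum values. Text formats such as JSON, CSV, and TSV have no fixed signature and are not handled by this sketch.

```python
from typing import Optional


def sniff_data_format(data: bytes) -> Optional[str]:
    """Guess a data-file format from signature bytes.

    Returns a hypothetical label (not an SDK enum value), or None if no
    signature matches.
    """
    head = data[:1024]  # only the first 1 KiB is inspected for markers

    # Binary property list: "bplist" followed by a two-character version ("00").
    if head.startswith(b"bplist"):
        return "binary-plist"

    # Parquet: the "PAR1" magic appears at both the start and the end of the file.
    if len(data) >= 12 and data.startswith(b"PAR1") and data.endswith(b"PAR1"):
        return "parquet"

    # GNU gettext MO: magic 0x950412de, stored in either byte order.
    if head[:4] in (b"\xde\x12\x04\x95", b"\x95\x04\x12\xde"):
        return "gettext-mo"

    # AppleSingle (0x00051600) / AppleDouble (0x00051607), big-endian,
    # followed by a 4-byte version (0x00010000 or 0x00020000).
    if len(head) >= 8 and head[:4] in (b"\x00\x05\x16\x00", b"\x00\x05\x16\x07"):
        kind = "applesingle" if head[3] == 0x00 else "appledouble"
        version = int.from_bytes(head[4:8], "big") >> 16
        return f"{kind}-v{version}"

    # CBOR: detectable this way only when the optional self-describe tag is present.
    if head.startswith(b"\xd9\xd9\xf7"):
        return "cbor"

    # BinHex 4.0: text header line, which may be preceded by other text.
    if b"(This file must be converted with BinHex" in head:
        return "binhex4"

    # XML property list: an XML document whose root element is <plist>.
    if b"<plist" in head:
        return "xml-plist"

    return None
```

The checks run from the most specific binary signatures to the loosest text marker, so a binary file is not misclassified by the `<plist` substring test at the end.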