Markup
Supported Markup file formats (IdClassification.Markup - Markup document (e.g., XML or HTML))
If a file format has no supported content extractor that extracts text, a binary-to-text content extractor is optionally used (enabled by default) to extract UTF-8, UTF-16, Windows-1252, and ASCII text from the binary. In many cases, indexable text can be extracted from unknown document formats.
File Format Id Enum Value | Text | Metadata | EmbeddedItem | ContentHash | Description |
|---|---|---|---|---|---|
X | X | HyperText Markup Language (HTML) (.htm;.html). | |||
X | X | The fifth and current version of the HyperText Markup Language (HTML) standard (.htm;.html). | |||
X | X | Extensible Hypertext Markup Language (XHTML). | |||
Cascading Style Sheet (.css). | |||||
X | X | X | Microsoft Web Archive (MHT): a web page archive file format that is an MHTML (short for MIME HTML) document type (.mht;.mhtml). | ||
X | X | X | Generic MIME (RFC 822) format. | ||
X | X | X | Generic (non-email) secure MIME (S/MIME) clear-signed. Clear-signed MIMEs have MIME media type "multipart/signed" (.p7s). | ||
Generic (non-email) secure MIME (S/MIME) opaque-signed. Opaque-signed MIMEs have exactly one MIME entity and this MIME entity usually has the media type "application/pkcs7-mime" (.p7s). | |||||
X | X | X | Generic (non-email) secure MIME (S/MIME) with compression (.p7z;.txt). | ||
X | X | X | Generic (non-email) secure MIME (S/MIME) with encryption (enveloped-data) (.p7m;.txt). | ||
X | Extensible Markup Language (XML) file of unknown format/use. Includes files with XML-like markup that do not have XML declaration at beginning of file (.xml). | ||||
X | RSS (Rich Site Summary) feed format (.rss). | ||||
X | XML Schema Definition (.xsd;.xml). | ||||
Enriched text - a simple formatted-text format developed for MIME (Content-Type: "text/enriched" or "text/richtext"). | |||||
X | Wireless Markup Language (WML), an XML-based markup language intended for devices that implement the Wireless Application Protocol (WAP) specification, such as mobile phones (.wml). | ||||
X | MusicXML is an XML-based file format for representing Western musical notation (.mxl;.xml). | ||||
X | Mathematical Markup Language (MathML) XML file format (XML for describing mathematical notations) (.mml). | ||||
X | Mathcad XML document (.xmcd). | ||||
X | Windows Media Playlist (.wpl). | ||||
X | X | X | X | Domino XML (DXL) generic (unknown, document is missing 'form' attribute) export file format (.dxl). | |
X | X | X | X | Domino XML (DXL) custom form document (unknown 'form' attribute type) export file format (.dxl). | |
X | LaTeX markup language widely used in academia for the communication and publication of scientific documents (.tex). | ||||
Extensible Binary Meta Language (EBML) is a generalized file format for any kind of data, aiming to be a binary equivalent to XML (Matroska, WebM, and other formats are based on this format). | |||||
X | Extensible Metadata Platform Packet (XMP) metadata file format. XMP is an ISO standard, originally created by Adobe Systems Inc., for the creation, processing and interchange of standardized and custom metadata for all kinds of resources (.xmp). |
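
The binary-to-text fallback mentioned above the table can be pictured with a short sketch. This is not the SDK's implementation or API: `extract_text_runs`, `MIN_RUN`, and the two patterns are hypothetical names chosen for illustration, and the sketch only covers printable ASCII and ASCII-range UTF-16LE runs (multi-byte UTF-8 and Windows-1252 handling are left out for brevity).

```python
import re
from typing import List

# Hypothetical helper, not the SDK's API: pull printable runs out of raw bytes.
MIN_RUN = 4  # assumed minimum run length; shorter runs are treated as noise

# Printable ASCII (space through tilde), at least MIN_RUN bytes long.
ASCII_RUN = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_RUN)
# The same characters encoded as UTF-16LE: each one followed by a zero byte.
UTF16LE_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % MIN_RUN)


def extract_text_runs(data: bytes) -> List[str]:
    """Return printable text runs found in arbitrary binary data, in file order."""
    runs = []
    for match in UTF16LE_RUN.finditer(data):
        runs.append((match.start(), match.group().decode("utf-16-le")))
    for match in ASCII_RUN.finditer(data):
        runs.append((match.start(), match.group().decode("ascii")))
    # Order the runs by their byte offset so the output follows the file layout.
    runs.sort(key=lambda item: item[0])
    return [text for _, text in runs]


if __name__ == "__main__":
    sample = b"\x00\x01\xffHello, world\x02\x03" + "markup".encode("utf-16-le")
    for run in extract_text_runs(sample):
        print(run)
```

A minimum run length is a common way to keep random byte sequences out of the index. The value of 4 here is an assumption, not a documented default.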