Identify Document File Formats
The Open Discover® SDK file format identifier API (see DocumentIdentifier) identifies over 1,500 file formats by matching internal signatures specific to each format.
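To make the idea of an internal signature concrete: many formats begin with a fixed byte sequence (a "magic number"), such as "%PDF-" for PDF files or D0 CF 11 E0 A1 B1 1A E1 for OLE2 compound files such as legacy Word .doc files. The C# sketch below checks a stream's leading bytes against a few such signatures. It is a simplified illustration under our own assumptions, not how the SDK works internally. The SignatureSketch class and its format labels are hypothetical, and identifying 1,500+ formats takes far more than a prefix check (for example, a ZIP signature alone cannot tell .docx from .xlsx).

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical sketch of signature ("magic number") matching.
// This is NOT the Open Discover SDK's implementation; it only illustrates the idea.
public static class SignatureSketch
{
    // Each entry pairs a format label with the leading bytes that identify it.
    private static readonly (string Format, byte[] Magic)[] Signatures =
    {
        ("PDF", new byte[] { 0x25, 0x50, 0x44, 0x46, 0x2D }),                                   // "%PDF-"
        ("PNG", new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A }),
        ("OLE2 compound file", new byte[] { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 }), // legacy .doc/.xls/.ppt
        ("ZIP container", new byte[] { 0x50, 0x4B, 0x03, 0x04 }),                                // also .docx/.xlsx/.pptx
    };

    // Reads up to 8 bytes from the stream's current position and returns the first
    // matching format label, or "Unknown" when no signature matches.
    public static string Identify(Stream stream)
    {
        var header = new byte[8];
        int read = 0;

        // Stream.Read may return fewer bytes than requested, so loop until full or end of stream.
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0) break;
            read += n;
        }

        foreach (var (format, magic) in Signatures)
        {
            // Only compare when enough bytes were read to cover the whole signature.
            if (read >= magic.Length && header.Take(magic.Length).SequenceEqual(magic))
                return format;
        }
        return "Unknown";
    }
}
```

For example, SignatureSketch.Identify on a stream whose first bytes are "%PDF-1.7" returns "PDF", while a stream matching none of the signatures returns "Unknown", loosely mirroring the SDK's Id.Unknown result.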
The overloaded DocumentIdentifier.Identify method identifies a document's file format. It returns an IdResult object that contains the identified file format (see Id), the format's known file extensions, a description of the format, its classification (see IdClassification), the quality (confidence) of the identification (see IdMatchType), and more.
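IdMatchType reports how an identification was reached; the unit test below expects IdMatchType.SignatureAndExtension for a .doc file. As a rough illustration of why combining the two kinds of evidence matters, the following hypothetical C# sketch grades a result by whether a signature matched and whether the file's extension agrees with it. The MatchEvidence enum, the MatchEvidenceSketch class, and the extension table are our own assumptions for illustration; they are not the SDK's IdMatchType members or its matching logic.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical evidence levels for illustration only; NOT the SDK's IdMatchType.
public enum MatchEvidence { None, ExtensionOnly, SignatureOnly, SignatureAndExtension }

public static class MatchEvidenceSketch
{
    // Hypothetical table of format labels and the extensions they are usually saved with.
    private static readonly Dictionary<string, string[]> ExtensionsByFormat =
        new Dictionary<string, string[]>
        {
            ["PDF"] = new[] { ".pdf" },
            ["Word 97-2003"] = new[] { ".doc", ".dot" },
        };

    // signatureFormat: label found by signature matching, or null when no signature matched.
    // filePath: only its extension is used, compared case-insensitively.
    public static MatchEvidence Classify(string signatureFormat, string filePath)
    {
        string ext = Path.GetExtension(filePath).ToLowerInvariant();

        if (signatureFormat != null)
        {
            // A signature match is stronger when the extension agrees with it.
            bool extensionAgrees =
                ExtensionsByFormat.TryGetValue(signatureFormat, out var signatureExts) &&
                Array.IndexOf(signatureExts, ext) >= 0;
            return extensionAgrees ? MatchEvidence.SignatureAndExtension : MatchEvidence.SignatureOnly;
        }

        // No signature matched: fall back to whether any known format uses this extension.
        foreach (var knownExts in ExtensionsByFormat.Values)
        {
            if (Array.IndexOf(knownExts, ext) >= 0)
                return MatchEvidence.ExtensionOnly;
        }
        return MatchEvidence.None;
    }
}
```

For instance, Classify("Word 97-2003", @"C:\WordProcessing\Word2003.doc") returns MatchEvidence.SignatureAndExtension, while the same label paired with a .bin file name returns SignatureOnly. Keeping the two signals separate lets a caller treat an extension-only guess with more caution than a corroborated signature match.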
The following unit test example illustrates how to use the DocumentIdentifier.Identify method and also showcases most of the properties of the returned IdResult object:
var filePath = @"C:\WordProcessing\Word2003.doc";
using (var stream = File.OpenRead(filePath))
{
    var idResult = DocumentIdentifier.Identify(stream, filePath);

    // Which format was found, its category, and how the match was made
    Assert.IsTrue(idResult.ID == Id.Word2003);
    Assert.IsTrue(idResult.Classification == IdClassification.WordProcessing);
    Assert.IsTrue(idResult.MatchType == IdMatchType.SignatureAndExtension);

    // Other information about the identified document and its format
    Assert.IsTrue(idResult.IsEncrypted == false);
    Assert.IsTrue(idResult.MediaType == "application/msword");
    Assert.IsTrue(idResult.Description != null);
    Assert.IsTrue(idResult.PrimaryExtension != null);
    Assert.IsTrue(idResult.Extensions != null);
}
In the above example, DocumentIdentifier.Identify returns an IdResult object describing the identified file format. If the document's format cannot be identified, the IdResult.ID property is set to Id.Unknown.
For an example C# application that uses the DocumentIdentifier class, see our GitHub repository: Open Discover® SDK DocumentIdentifier Example