ContentExtractorFactory.GetContentExtractor Method

Returns a content extractor result for the given document, based on its file format identification result (see IdResult).

Namespace: OpenDiscoverSDK
Assembly: OpenDiscoverSDK (in OpenDiscoverSDK.dll) Version: 2025.4.6.0 (2025.4.6)

Syntax

public static ContentExtractorResult GetContentExtractor(
	Stream documentStream,
	IdResult docIdResult,
	string filePath,
	ContentExtractionSettings settings
)

Parameters

documentStream Stream: An open read-only stream to the document's contents. The stream's Position is automatically set back to 0 upon exit.
docIdResult IdResult: The document's file format identification result (see DocumentIdentifier).
filePath String: The document's full file path or file name. May be null or empty; for example, an attachment extracted from an Office document may exist only in memory as a MemoryStream instance and have no file system path unless the user saved it to disk after extraction. However, some document formats must be re-opened internally through a format-specific API that does not accept stream arguments, so callers SHOULD always pass the valid file path when one exists.
settings ContentExtractionSettings: The content extraction settings object.
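
The filePath guidance above can be sketched as a small wrapper that serves both on-disk and in-memory documents. This is a minimal sketch, not part of the SDK: the class ExtractorLookupSketch, the method GetExtractor, and the parameter name filePathOrNull are hypothetical. It also assumes that DocumentIdentifier lives in the OpenDiscoverSDK namespace and that DocumentIdentifier.Identify tolerates a null path in the same way GetContentExtractor is documented to.

```csharp
using System.IO;
using OpenDiscoverSDK; // assumption: DocumentIdentifier is also in this namespace

static class ExtractorLookupSketch
{
    // Hypothetical helper, not part of the SDK.
    // The caller owns the stream. Assumption: the returned extractor may still
    // read from the stream, so keep it open until extraction is finished.
    public static ContentExtractorResult GetExtractor(
        Stream documentStream,
        string filePathOrNull,
        ContentExtractionSettings settings)
    {
        // Assumption: Identify accepts a null or empty path for in-memory documents.
        var docIdResult = DocumentIdentifier.Identify(documentStream, filePathOrNull);

        // Pass the real path whenever one exists; pass null only when the
        // document has no file system path (for example, an extracted attachment).
        return ContentExtractorFactory.GetContentExtractor(
            documentStream, docIdResult, filePathOrNull, settings);
    }
}
```

For a file on disk, a caller would pass File.OpenRead(path) together with the same path. For an attachment held only in memory, a caller could pass a read-only stream such as new MemoryStream(bytes, writable: false) together with null.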

Return Value

ContentExtractorResult
A ContentExtractorResult object for the document that can be used to extract document content.

Example

This example shows the recommended pattern for using ContentExtractorFactory to obtain the format-specific interface for extracting a document's content.

using (var stream = File.OpenRead(filePath))
{
   // Step 1: Identify document format:
   var docIdResult = DocumentIdentifier.Identify(stream, filePath);

   // Step 2: Extract content from document (uses 'docIdResult' from above line to get correct content extractor):
   var docContentResult = ContentExtractorFactory.GetContentExtractor(stream, docIdResult, filePath, _contentConfig);

   if (docContentResult.HasError)
   {
       LogMessage(string.Format("Error getting content extractor for file format ID {0}: {1}", docIdResult.ID, docContentResult.Error));
   }
   else
   {
       var extractorType = docContentResult.ContentExtractor.ContentExtractorType;

       // Step 3: Convert base interface using above ContentExtractorType to a specific interface:
       switch (extractorType)
       {
           case ContentExtractorType.Archive:
               {
                   var archiveExtractor = (IArchiveExtractor)docContentResult.ContentExtractor;
                   // See help file "How To" section for how to use this interface
                   ...
               }
               break;
           case ContentExtractorType.Document:
               {
                   var documentExtractor = (IDocumentContentExtractor)docContentResult.ContentExtractor;
                   // See help file "How To" section for how to use this interface 
                   ...
               }
               break;
           case ContentExtractorType.Database:
               {
                   var databaseExtractor = (IDatabaseExtractor)docContentResult.ContentExtractor;
                   // See help file "How To" section for how to use this interface 
                   ...
               }
               break;
           case ContentExtractorType.MailStore:
               {
                   var mailStoreExtractor = (IMailStoreExtractor)docContentResult.ContentExtractor;
                   // See help file "How To" section for how to use this interface 
                   ...
               }
               break;
           case ContentExtractorType.DocumentStore:
               {
                   var docStoreExtractor = (IDocumentStoreExtractor)docContentResult.ContentExtractor;
                   // See help file "How To" section for how to use this interface 
                   ...
               }
               break;
           case ContentExtractorType.Unsupported:
               {
                   var unsupportedExtractor = (IUnsupportedExtractor)docContentResult.ContentExtractor;
                   // See help file "How To" section for how to use this interface 
                   ...
               }
               break;
           case ContentExtractorType.LargeUnsupported:
               {
                   var largeBlobUnsupportedExtractor = (ILargeUnsupportedExtractor)docContentResult.ContentExtractor;
                   // See help file "How To" section for how to use this interface 
                   ...
               }
               break;
           case ContentExtractorType.LargeEncodedText:
               {
                   var largeEncodedTextExtractor = (ILargeEncodedTextExtractor)docContentResult.ContentExtractor;
                   // See help file "How To" section for how to use this interface 
                   ...
               }
               break;
       }
   }
}

Reference

ContentExtractorFactory Class

OpenDiscoverSDK Namespace