ContentExtractorFactory.GetContentExtractor Method

Returns a content extractor result for the given document, selected using the document's file format identification result (see IdResult).

Namespace: OpenDiscoverSDK
Assembly: OpenDiscoverSDK (in OpenDiscoverSDK.dll) Version: 2025.4.4.0 (2025.4.4)
Syntax
C#
public static ContentExtractorResult GetContentExtractor(
	Stream documentStream,
	IdResult docIdResult,
	string filePath,
	ContentExtractionSettings settings
)

Parameters

documentStream  Stream
An open read-only stream to the document's contents. The stream's Position is automatically set back to 0 upon exit.
docIdResult  IdResult
The document's file format identification result (see DocumentIdentifier).
filePath  String
The document's full file path or file name. This argument can be null or empty: for example, an attachment extracted from an Office document may exist only in memory as a MemoryStream instance and have no file system path unless the user saved it to disk after extraction. However, some document formats may need to be re-opened internally by a format-specific API that does not support stream arguments, so callers should always set this argument to the valid file path when one is known and exists.
settings  ContentExtractionSettings
The ContentExtractionSettings object to use for content extraction.

Return Value

ContentExtractorResult
A ContentExtractorResult object for the document that can be used to extract the document's content.
Example

This example shows the pattern that should be used with ContentExtractorFactory to obtain the specific extractor interface for a document's format type.

C#
using (var stream = File.OpenRead(filePath))
{
   // Step 1: Identify document format:
   var docIdResult = DocumentIdentifier.Identify(stream, filePath);

   // Step 2: Extract content from document (uses 'docIdResult' from above line to get correct content extractor):
   var docContentResult = ContentExtractorFactory.GetContentExtractor(stream, docIdResult, filePath, _contentConfig);

   if (docContentResult.HasError)
   {
       LogMessage(string.Format("Error getting content extractor for file format ID {0}: {1}", docIdResult.ID, docContentResult.Error));
   }
   else
   {
       var extractorType = docContentResult.ContentExtractor.ContentExtractorType;

       // Step 3: Convert base interface using above ContentExtractorType to a specific interface:
       switch (extractorType)
       {
           case ContentExtractorType.Archive:
               {
                   var archiveExtractor = (IArchiveExtractor)docContentResult.ContentExtractor;
                   // See help file "How To" section for how to use this interface 
                   ...
               }
               break;
           case ContentExtractorType.Document:
               {
                   var documentExtractor = (IDocumentContentExtractor)docContentResult.ContentExtractor;
                   // See help file "How To" section for how to use this interface 
                   ...
               }
               break;
           case ContentExtractorType.Database:
               {
                   var databaseExtractor = (IDatabaseExtractor)docContentResult.ContentExtractor;
                   // See help file "How To" section for how to use this interface 
                   ...
               }
               break;
           case ContentExtractorType.MailStore:
               {
                   var mailStoreExtractor = (IMailStoreExtractor)docContentResult.ContentExtractor;
                   // See help file "How To" section for how to use this interface 
                   ...
               }
               break;
           case ContentExtractorType.DocumentStore:
               {
                   var docStoreExtractor = (IDocumentStoreExtractor)docContentResult.ContentExtractor;
                   // See help file "How To" section for how to use this interface 
                   ...
               }
               break;
           case ContentExtractorType.Unsupported:
               {
                   var unsupportedExtractor = (IUnsupportedExtractor)docContentResult.ContentExtractor;
                   // See help file "How To" section for how to use this interface 
                   ...
               }
               break;
           case ContentExtractorType.LargeUnsupported:
               {
                   var largeBlobUnsupportedExtractor = (ILargeUnsupportedExtractor)docContentResult.ContentExtractor;
                   // See help file "How To" section for how to use this interface 
                   ...
               }
               break;
           case ContentExtractorType.LargeEncodedText:
               {
                   var largeEncodedTextExtractor = (ILargeEncodedTextExtractor)docContentResult.ContentExtractor;
                   // See help file "How To" section for how to use this interface 
                   ...
               }
               break;
       }
   }
}
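
The filePath parameter description notes that a document may exist only in memory, for example as a MemoryStream holding an extracted attachment. The following is a minimal sketch of that case, not an official SDK sample. The class InMemoryDocumentSketch, the method ProcessInMemoryDocument, and the useExtractor callback are hypothetical names introduced here for illustration. The sketch also assumes that DocumentIdentifier.Identify accepts a null path and that the OpenDiscoverSDK namespace exposes all of the types used; adjust the using directives if they live elsewhere.

```csharp
using System;
using System.IO;
using OpenDiscoverSDK; // assumption: some SDK types may live in other namespaces

static class InMemoryDocumentSketch
{
    // Hypothetical helper (not part of the SDK): obtains a content extractor for a
    // document that is held only in memory and therefore has no file system path.
    public static void ProcessInMemoryDocument(
        byte[] documentBytes,
        ContentExtractionSettings settings,
        Action<ContentExtractorResult> useExtractor)
    {
        // Wrap the caller's bytes in a read-only MemoryStream.
        using (var stream = new MemoryStream(documentBytes, writable: false))
        {
            // No path exists for this item, so null is passed for filePath.
            // Assumption: DocumentIdentifier.Identify also tolerates a null path.
            string filePath = null;

            // Step 1: Identify the document format.
            var docIdResult = DocumentIdentifier.Identify(stream, filePath);

            // Step 2: Get the content extractor for the identified format.
            var docContentResult = ContentExtractorFactory.GetContentExtractor(
                stream, docIdResult, filePath, settings);

            if (docContentResult.HasError)
            {
                Console.Error.WriteLine(
                    "Error getting content extractor for file format ID {0}: {1}",
                    docIdResult.ID, docContentResult.Error);
                return;
            }

            // Step 3: Hand the result to the caller while the stream is still open,
            // in case the extractor reads from the stream lazily (assumption).
            useExtractor(docContentResult);
        }
    }
}
```

If the item is later saved to disk, passing that path instead of null follows the recommendation in the filePath description, since some formats may need to be re-opened by an API that does not accept streams.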
See Also