ContentExtractorFactory.GetContentExtractor Method
Returns a content extractor result for the given document using its document file format identification result (see IdResult).
Definition
Namespace: OpenDiscoverSDK
Assembly: OpenDiscoverSDK (in OpenDiscoverSDK.dll) Version: 2026.2.6.0 (2026.02.06)
C#
public static ContentExtractorResult GetContentExtractor(
Stream documentStream,
IdResult docIdResult,
string filePath,
ContentExtractionSettings settings
)
Parameters
- documentStream Stream
- An open read-only stream to the document's contents. The stream's Position is automatically set back to 0 upon exit.
- docIdResult IdResult
- The document's file format identification result (see DocumentIdentifier).
- filePath String
- The full file path or file name. This can be null or empty: for example, an attachment extracted from an Office document may exist only in memory as a MemoryStream instance and have no file system path unless the user saved it to disk after extraction. However, some document formats must be re-opened internally by a format-specific API that does not accept stream arguments, so callers SHOULD always pass the valid file path when one exists.
- settings ContentExtractionSettings
- The content extraction settings (a ContentExtractionSettings object).
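The in-memory case described for filePath can be sketched as follows. This is a minimal illustration, not part of the SDK: InMemoryExtraction and WithExtractor are hypothetical names, and it assumes that DocumentIdentifier.Identify accepts a null or empty path the same way GetContentExtractor is documented to. The extractor is used inside a callback so that the stream stays open while content is read.

```csharp
using System;
using System.IO;
using OpenDiscoverSDK; // namespace from the Definition section; some types may live in a sub-namespace

public static class InMemoryExtraction
{
    // Hypothetical helper (not an SDK method): obtains a content extractor result
    // for a document that exists only as bytes in memory, such as an attachment.
    public static void WithExtractor(
        byte[] content,
        string fileName,                     // may be null or empty when unknown
        ContentExtractionSettings settings,
        Action<ContentExtractorResult> use)
    {
        if (content == null) throw new ArgumentNullException(nameof(content));
        if (use == null) throw new ArgumentNullException(nameof(use));

        // writable: false yields a read-only stream over the buffer.
        using (var stream = new MemoryStream(content, writable: false))
        {
            // Assumption: Identify tolerates a null or empty path.
            var docIdResult = DocumentIdentifier.Identify(stream, fileName);

            var result = ContentExtractorFactory.GetContentExtractor(
                stream, docIdResult, fileName, settings);

            // Invoke the callback while the stream is still open, in case
            // the extractor reads from the stream lazily.
            use(result);
        }
    }
}
```

The caller still checks HasError on the result inside the callback before touching ContentExtractor.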
Return Value
ContentExtractorResult
A ContentExtractorResult object for the document that can be used to extract document content.
Example
This example shows the pattern that should be used with ContentExtractorFactory to obtain the specific extractor interface for a document's format type.
C#
// Note: _contentConfig (a ContentExtractionSettings instance) and LogMessage
// (a logging helper) are defined elsewhere by the calling code.
using (var stream = File.OpenRead(filePath))
{
    // Step 1: Identify document format:
    var docIdResult = DocumentIdentifier.Identify(stream, filePath);

    // Step 2: Get the content extractor for the document (uses 'docIdResult' from the line above to select the correct extractor):
    var docContentResult = ContentExtractorFactory.GetContentExtractor(stream, docIdResult, filePath, _contentConfig);
    if (docContentResult.HasError)
    {
        LogMessage(string.Format("Error getting content extractor for file format ID {0}: {1}", docIdResult.ID, docContentResult.Error));
    }
    else
    {
        var extractorType = docContentResult.ContentExtractor.ContentExtractorType;

        // Step 3: Convert the base interface to a specific interface using the above ContentExtractorType:
        switch (extractorType)
        {
            case ContentExtractorType.Archive:
                {
                    var archiveExtractor = (IArchiveExtractor)docContentResult.ContentExtractor;
                    // See help file "How To" section for how to use this interface
                    ...
                }
                break;
            case ContentExtractorType.Document:
                {
                    var documentExtractor = (IDocumentContentExtractor)docContentResult.ContentExtractor;
                    // See help file "How To" section for how to use this interface
                    ...
                }
                break;
            case ContentExtractorType.Database:
                {
                    var databaseExtractor = (IDatabaseExtractor)docContentResult.ContentExtractor;
                    // See help file "How To" section for how to use this interface
                    ...
                }
                break;
            case ContentExtractorType.MailStore:
                {
                    var mailStoreExtractor = (IMailStoreExtractor)docContentResult.ContentExtractor;
                    // See help file "How To" section for how to use this interface
                    ...
                }
                break;
            case ContentExtractorType.DocumentStore:
                {
                    var docStoreExtractor = (IDocumentStoreExtractor)docContentResult.ContentExtractor;
                    // See help file "How To" section for how to use this interface
                    ...
                }
                break;
            case ContentExtractorType.Unsupported:
                {
                    var unsupportedExtractor = (IUnsupportedExtractor)docContentResult.ContentExtractor;
                    // See help file "How To" section for how to use this interface
                    ...
                }
                break;
            case ContentExtractorType.LargeUnsupported:
                {
                    var largeBlobUnsupportedExtractor = (ILargeUnsupportedExtractor)docContentResult.ContentExtractor;
                    // See help file "How To" section for how to use this interface
                    ...
                }
                break;
            case ContentExtractorType.LargeEncodedText:
                {
                    var largeEncodedTextExtractor = (ILargeEncodedTextExtractor)docContentResult.ContentExtractor;
                    // See help file "How To" section for how to use this interface
                    ...
                }
                break;
        }
    }
}
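As a further illustration of steps 1 and 2, the sketch below applies the same calls to every file in a folder and counts how many files map to each ContentExtractorType. ExtractorTally and TallyFolder are hypothetical names introduced here, and the code assumes ContentExtractorType is an enum, as its use in the switch statement above suggests. It only reads the extractor type; this page does not show whether extractors need explicit cleanup, so consult the SDK documentation before using this in long-running code.

```csharp
using System.Collections.Generic;
using System.IO;
using OpenDiscoverSDK; // namespace from the Definition section; some types may live in a sub-namespace

public static class ExtractorTally
{
    // Hypothetical helper (not an SDK method): counts files per extractor type.
    public static Dictionary<ContentExtractorType, int> TallyFolder(
        string folder, ContentExtractionSettings settings)
    {
        var counts = new Dictionary<ContentExtractorType, int>();

        foreach (var filePath in Directory.EnumerateFiles(folder))
        {
            using (var stream = File.OpenRead(filePath))
            {
                // Step 1: identify the document format.
                var docIdResult = DocumentIdentifier.Identify(stream, filePath);

                // Step 2: get the content extractor result for that format.
                var result = ContentExtractorFactory.GetContentExtractor(
                    stream, docIdResult, filePath, settings);

                // Files whose extractor could not be created are skipped here;
                // real code would log result.Error instead.
                if (result.HasError)
                    continue;

                var type = result.ContentExtractor.ContentExtractorType;
                counts.TryGetValue(type, out var current); // current is 0 when the key is absent
                counts[type] = current + 1;
            }
        }

        return counts;
    }
}
```

A tally like this can help decide which of the specific extractor interfaces an application needs to handle for a given collection.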