Welcome to Open Discover® SDK for .NET 8
Open Discover SDK is a .NET application programming interface (API) that supports:
- Identifying file formats quickly and reliably using internal binary signatures (more than 1,600 file formats supported for identification)
- Extracting text from supported file formats and optionally identifying the languages present in the extracted text
- Extracting metadata from supported file formats (more than 4,500 known metadata fields extracted in total)
- Extracting attachments/embedded items from supported document formats
- Extracting items from archive containers (7-Zip, ZIP, RAR, TAR, and many more)
- Testing archives and archive items for true expansion size before extraction. This feature can help in malicious archive detection (e.g., 'compression bombs' and archives with intentionally modified item headers).
- Extracting email objects from mail store containers (PST, OST, OST2013, MBOX, etc.)
- Detecting and extracting sensitive item information from text and metadata such as social security numbers, credit card numbers, driver's license numbers, addresses, phone numbers, and much more.
- Detecting and extracting entity items of the supported types present in text and metadata.
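To make the first capability above more concrete, here is a minimal sketch of signature-based format identification: matching a file's leading bytes against known "magic" values instead of trusting its extension. This is not the Open Discover SDK API. `SignatureSniffer`, `Identify`, and `IdentifyFile` are hypothetical names used only for illustration, and the four-entry table stands in for a far larger signature set.

```csharp
using System;
using System.IO;

// Illustrative only: NOT the Open Discover SDK API.
// Shows the general idea of identifying a format from its leading bytes.
public static class SignatureSniffer
{
    // A tiny, hypothetical signature table of well-known magic numbers.
    private static readonly (string Name, byte[] Magic)[] Signatures =
    {
        ("PDF",  new byte[] { 0x25, 0x50, 0x44, 0x46, 0x2D }),                    // "%PDF-"
        ("PNG",  new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A }),
        ("ZIP",  new byte[] { 0x50, 0x4B, 0x03, 0x04 }),                          // "PK" 03 04
        ("GZIP", new byte[] { 0x1F, 0x8B }),
    };

    // Returns the name of the first signature that the header starts with.
    public static string Identify(ReadOnlySpan<byte> header)
    {
        foreach (var (name, magic) in Signatures)
        {
            if (header.StartsWith((ReadOnlySpan<byte>)magic))
                return name;
        }
        return "Unknown";
    }

    // Reads up to 16 leading bytes of a file and identifies them.
    public static string IdentifyFile(string path)
    {
        Span<byte> buffer = stackalloc byte[16];
        using var stream = File.OpenRead(path);
        // Short files simply yield fewer bytes; no exception is thrown at end of stream.
        int read = stream.ReadAtLeast(buffer, buffer.Length, throwOnEndOfStream: false);
        return Identify(buffer[..read]);
    }
}
```

Because the check looks only at content, a PDF renamed to `report.txt` would still be reported as `PDF` by this sketch. Real identification also has to handle signatures at non-zero offsets and container formats that share a common prefix, which this example does not attempt.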
The Open Discover SDK API is intended for developing higher-level document processing applications for:
- Full-text search, using the SDK for text/metadata/attachment/entity extraction
- Machine learning/AI requiring format identification and quality extracted text and metadata
- Text analytics
- Information governance
- Website crawling/full-text website search
- Enterprise search and content management
- IT departments: identifying, classifying, and deduplicating documents on file storage devices on-premises or in the cloud
- eDiscovery
- And more...
| Important |
|---|
| The .NET assemblies that make up Open Discover SDK are x64 release builds (not AnyCPU) due to x64 dependencies. Therefore, .NET applications that directly reference and use the SDK assemblies MUST also be x64 builds. |
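One common way to meet the x64 requirement is to set the platform target in the consuming application's project file. The fragment below is a sketch for a hypothetical console application. The `OutputType` and `TargetFramework` values are assumptions based on the .NET 8 target, and the SDK assembly references themselves are omitted.

```xml
<!-- Hypothetical consuming application; the relevant setting is PlatformTarget. -->
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net8.0</TargetFramework>
    <!-- Build as x64 rather than the default AnyCPU. -->
    <PlatformTarget>x64</PlatformTarget>
  </PropertyGroup>
</Project>
```

At run time, `Environment.Is64BitProcess` can serve as a quick sanity check that the host process is 64-bit before any SDK assembly is loaded.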