Welcome
Welcome to Open Discover® SDK for .NET 10
Open Discover SDK is a .NET application programming interface (API) that supports:
- Identifying file formats by their internal binary signatures, for reliable and very fast identification (more than 1,600 file formats supported)
- Extracting text from supported file formats and optionally identifying the languages present in the extracted text
- Extracting metadata from supported file formats (more than 4,500 known metadata fields in total)
- Extracting attachments/embedded items from supported document formats
- Extracting archive container items (.7z, .zip, .rar, .tar, and many more)
- Testing archives and archive items for their true expanded size before extraction, which can help detect malicious archives (e.g., 'compression bombs' and archives with intentionally modified item headers)
- Extracting email objects from mail store containers (PST, OST, OST2013, MBOX, etc.)
- Detecting and extracting sensitive information in text and metadata, such as Social Security numbers, credit card numbers, driver's license numbers, addresses, phone numbers, and much more
- Detecting and extracting information about supported entity types present in text and metadata
The Open Discover SDK API is designed for building higher-level document processing applications for:
- Full-text search, using the SDK for text, metadata, attachment, and entity extraction
- Machine learning/AI workloads that require format identification and high-quality extracted text and metadata
- Text analytics
- Information governance
- Website crawling/full-text website search
- Enterprise search and content management
- IT departments: identifying, classifying, and deduplicating documents on storage devices, whether on-premises or in the cloud
- eDiscovery
- And more...