Instantly Search Terabytes—Holiday Edition
By Elizabeth Thede, Special for The Daily Blaze
Everyone needs extra time during the busy holiday season. Say you need to find a nugget of data from two years ago to wrap up an account before the holidays. Or say you are trying to remember what you bought your brother last year. Instead of making you go through each file, email and email attachment that may hold the answer, text retrieval software lets you enter a search request and instantly locate any references anywhere to what you are looking for.
One such text retrieval product is dtSearch. The company has enterprise and developer products that run “on premises” or on cloud platforms to instantly search terabytes of “Office” files, PDFs, emails along with nested attachments, databases and online data. Because dtSearch can instantly search terabytes, many dtSearch customers are Fortune 100 companies and federal, state and international government agencies. But anyone with data to search can go to dtSearch.com and download a fully-functional 30-day evaluation copy.
To instantly search terabytes, the dtSearch software first builds an index cataloguing each unique word in a data set, along with that word’s location. Indexing is simple. Just point to the file directories on your own machine or on a shared server that dtSearch can access, and dtSearch will do the rest. dtSearch figures out for itself what types of files and other data you have. You could have an email with a ZIP or RAR attachment and inside that an MS Word file with an embedded MS Access database, and dtSearch will automatically recognize and index all of that.
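For readers curious what "an index cataloguing each unique word along with that word's location" can look like, here is a minimal Python sketch of a generic inverted index. It is an illustration only, not dtSearch's actual index format or API; the function names and sample file names are made up for the example.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each unique word to {document name: [word positions]}."""
    index = defaultdict(dict)
    for name, text in docs.items():
        for position, word in enumerate(tokenize(text)):
            index[word].setdefault(name, []).append(position)
    return index

# Hypothetical sample data standing in for files on disk.
docs = {
    "gift_list.txt": "Holiday gift list: coffee maker for my brother",
    "party_email.txt": "Holiday party on Friday, bring holiday candy",
}
index = build_index(docs)
print(index["holiday"])  # which documents mention "holiday", and where
```

Once the index exists, answering a query is a dictionary lookup rather than a scan through every file, which is the general reason indexed search is fast.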
Each index can hold up to a terabyte, and dtSearch can build and simultaneously search as many indexes as you want. For different end-users needing to search across the same data repository, dtSearch supports instant concurrent searching across terabytes on a network or in a web-based repository. After a search, each end-user can browse the full text of retrieved files, emails and the like with highlighted hits for easy navigation.
dtSearch has over 25 different search options. The most basic is natural language searching: just enter some words and dtSearch will rank retrieved data by hit density and rarity. Suppose you search for: holiday, coffee, candy. If coffee is just in a few documents or emails, but holiday and candy are everywhere, files that mention coffee, particularly those with denser coffee mentions, would receive a higher ranking.
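To make "hit density and rarity" concrete, the following Python sketch scores documents with a simple density-times-rarity formula in the spirit of classic TF-IDF weighting. The formula, the function names and the sample documents are assumptions chosen for illustration; dtSearch's own relevance ranking is not documented here and may work differently.

```python
import math
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def rank(docs, query_words):
    """Return document names ordered from most to least relevant."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    total = len(tokens)
    scores = {}
    for name, words in tokens.items():
        score = 0.0
        for term in query_words:
            containing = sum(1 for w in tokens.values() if term in w)
            if containing == 0 or not words:
                continue
            # Rarity: terms found in fewer documents get a larger weight.
            rarity = math.log((1 + total) / containing)
            # Density: share of this document's words that are the term.
            density = words.count(term) / len(words)
            score += density * rarity
        scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical documents: holiday and candy are everywhere, coffee is rarer.
docs = {
    "a.txt": "holiday candy sale ends this week at the store",
    "b.txt": "holiday candy and coffee coffee coffee",
    "c.txt": "holiday candy and one coffee order today",
}
print(rank(docs, ["holiday", "coffee", "candy"]))
```

In this toy data set the two files that mention coffee come out on top, and the one with the densest coffee mentions ranks first.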
For more structured searching, dtSearch has Boolean search options. A Boolean “or” / “any words” search works along the lines of natural language searching. dtSearch further supports Boolean “and” / “all words” search requests as well as Boolean “not” search requests. That way, you could search for only files that contain the words holiday and candy but that specifically do not mention coffee.
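The logic behind "and", "or" and "not" requests can be sketched in a few lines of Python. This is a generic illustration with made-up function and parameter names (all_of, any_of, none_of); it does not use or reproduce dtSearch's query syntax.

```python
import re

def words_in(text):
    """Return the set of lowercase words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def boolean_search(docs, all_of=(), any_of=(), none_of=()):
    """Return names of documents satisfying the and / or / not conditions."""
    hits = []
    for name, text in docs.items():
        words = words_in(text)
        if not all(w in words for w in all_of):
            continue  # "and": every listed word must be present
        if any_of and not any(w in words for w in any_of):
            continue  # "or": at least one listed word must be present
        if any(w in words for w in none_of):
            continue  # "not": none of the listed words may be present
        hits.append(name)
    return sorted(hits)

# Hypothetical sample data.
docs = {
    "memo.txt": "Holiday candy for the office",
    "order.txt": "Holiday candy and coffee order",
    "note.txt": "Coffee run at noon",
}
# Files with holiday and candy, but no mention of coffee.
print(boolean_search(docs, all_of=["holiday", "candy"], none_of=["coffee"]))
```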
You can also enter a phrase like “happy holidays” in any Boolean search request by using quotation marks to tell dtSearch you are looking for the phrase as a whole. The stemming feature finds word variants, such as “happiest holiday” in a search for “happy holidays.” Or do a wildcard search for coffee with a wildcard at the end to also find coffeecake.
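As a rough picture of phrase and wildcard matching, the Python sketch below checks for an exact word sequence and uses the standard library's fnmatch module for a trailing wildcard. The helper names are hypothetical, stemming is left out because it needs a language-specific stemmer, and none of this is dtSearch's own implementation.

```python
import fnmatch
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_phrase(text, phrase):
    """True if the words of the phrase appear consecutively in the text."""
    words, target = tokenize(text), tokenize(phrase)
    if not target:
        return False
    return any(
        words[i:i + len(target)] == target
        for i in range(len(words) - len(target) + 1)
    )

def wildcard_matches(text, pattern):
    """Return the distinct words in the text matching a pattern like 'coffee*'."""
    return sorted(
        {w for w in tokenize(text) if fnmatch.fnmatchcase(w, pattern.lower())}
    )

print(contains_phrase("Wishing you happy holidays!", "happy holidays"))
print(wildcard_matches("Coffee and coffeecake, not toffee", "coffee*"))
```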
Proximity searching finds a word or phrase that occurs within X words of another word or phrase. Depending on what you pick, the first word or phrase can be within X words of the second in either direction, or only within the X words just before it. And you can combine all these search options, such as looking for “happy holidays” within 19 words of “July 4” with no mention of “Labor Day.”
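The two flavors of proximity search, either direction or "first term before second term," can be sketched as follows in Python. To keep the example short it handles single words rather than phrases, and the function name and the ordered flag are assumptions for illustration, not dtSearch syntax.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def within(text, first, second, max_distance, ordered=False):
    """True if `first` occurs within max_distance words of `second`.

    With ordered=True, `first` must come before `second`.
    """
    words = tokenize(text)
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    for p1 in first_positions:
        for p2 in second_positions:
            gap = p2 - p1
            if ordered and 0 < gap <= max_distance:
                return True
            if not ordered and 0 < abs(gap) <= max_distance:
                return True
    return False

text = "Happy holidays from the team, and see you at the candy exchange"
print(within(text, "holidays", "candy", 10))                 # either direction
print(within(text, "candy", "holidays", 10, ordered=True))   # candy must come first
```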
Fuzzy search, adjustable from 0 to 10 for various degrees of fuzziness, locates words even if they are misspelled, such as in an email or an OCR’ed PDF. For example, if holidays is misspelled holiways, dtSearch would find that with a low level of fuzziness. Concept searching lets you look for holiday as a concept and find synonyms. Relevant to holiday shopping, dtSearch can also locate any credit card numbers in data, and show you where they are.
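Two of these ideas lend themselves to a short Python sketch: fuzzy matching, shown here with the textbook Levenshtein edit distance, and credit card detection, shown here with a digit pattern plus the standard Luhn checksum. Both are common general-purpose techniques chosen for illustration. How dtSearch maps its 0 to 10 fuzziness scale to misspellings, and how it recognizes card numbers, are not described in this article, so treat the details below as assumptions.

```python
import re

def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def fuzzy_find(text, term, max_edits):
    """Return words in the text within max_edits edits of the search term."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if edit_distance(w, term.lower()) <= max_edits]

def luhn_valid(digits):
    """Check a string of digits against the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_card_numbers(text):
    """Return 13 to 19 digit runs (spaces or dashes allowed) that pass Luhn."""
    candidates = re.findall(r"\b\d(?:[ -]?\d){12,18}\b", text)
    cleaned = [re.sub(r"[ -]", "", c) for c in candidates]
    return [c for c in cleaned if luhn_valid(c)]

print(fuzzy_find("Happy holiways to all", "holidays", max_edits=1))
# 4111 1111 1111 1111 is a widely used test card number, not a real account.
print(find_card_numbers("Paid with 4111 1111 1111 1111; order 1234567890123."))
```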
Advanced users can fine-tune term hit weighting by assigning positive or negative weighting and optionally adjusting for whether the hit is in the full-text of data or in specific metadata or near the top of a file. dtSearch also supports regular expression searching and hash value searching. For developers, the dtSearch Engine adds options for faceted or category drill-down searching through database metadata and the like as well as data classification options to granularly filter the search results each end-user sees.
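For a sense of what regular expression and hash value searching involve, here is a small Python sketch using the standard re and hashlib modules. The function names, the sample order-number pattern and the choice of SHA-256 are assumptions made for the example; the article does not say which hash algorithms or regular expression dialect dtSearch uses.

```python
import hashlib
import re

def regex_hits(docs, pattern):
    """Return {document name: [matches]} for documents matching the pattern."""
    compiled = re.compile(pattern)
    return {
        name: compiled.findall(text)
        for name, text in docs.items()
        if compiled.search(text)
    }

def hash_hits(files, known_hashes):
    """Return names of files whose SHA-256 digest is in the known set."""
    return sorted(
        name for name, data in files.items()
        if hashlib.sha256(data).hexdigest() in known_hashes
    )

# Hypothetical sample data.
docs = {"a.txt": "Order AB-1234 shipped", "b.txt": "No order here"}
print(regex_hits(docs, r"[A-Z]{2}-\d{4}"))

files = {"x.bin": b"holiday budget", "y.bin": b"other"}
known = {hashlib.sha256(b"holiday budget").hexdigest()}
print(hash_hits(files, known))
```

Hash searching of this kind identifies a file by its exact contents, so it finds copies even when the file name has changed.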
To try text retrieval software, go to dtSearch.com and download a fully-functional 30-day evaluation version. Then instantly search your own data for something holiday-related or otherwise.
RELATED: Kevin Price of the Price of Business show discusses the topic with Thede in a recent interview.