Beyond Searching for the Proverbial Needle in a Haystack


By the Price of Business Show, Hosted by Kevin Price.  The Price of Business is a media partner of this site. 

Sifting through a haystack to find a needle is a brute-force search operation. You dig and you dig, and no one place in the haystack is more likely than any other to hold the needle that you seek.

But text search software can work differently, in that it first builds a search index. A search index is more analogous to a treasure map. Once the index, or treasure map, is complete, dtSearch enterprise and developer products, for example, can search terabytes of data instantly to find anything you want: words, phrases, complex search requests, even credit card numbers.
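To make the treasure map idea concrete, here is a minimal sketch of an inverted index in Python. It is a generic illustration of the technique, not dtSearch's implementation or API; the function names and sample documents are made up for the example.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lowercase words (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each word to the set of document names that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical sample data, for illustration only.
docs = {
    "memo.txt": "The needle was lost in the haystack behind the barn.",
    "craft.txt": "Use a knitting needle for this pattern.",
    "farm.txt": "The haystack is behind the barn.",
}
index = build_index(docs)
print(search(index, "needle haystack"))
```

The index is built once, up front. After that, each query is a handful of dictionary lookups and set intersections rather than a fresh pass over every document, which is why an indexed search stays fast as the haystack grows.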

Indexing is easy. For dtSearch to build an index, you need only point to the file folders, email repositories or other data sources that you want to index. There is no need to tell dtSearch what data formats you have. dtSearch will figure that out for itself, whether the data is MS “Office” files, other “Office” files, PDFs, website data, databases, or emails with nested attachments. For example, if you have an email with a ZIP attachment, and inside the ZIP attachment are a PDF file and an MS Word file with an embedded spreadsheet, dtSearch will automatically recognize and fully parse that structure for indexing, so you can find the needle wherever it may reside in the haystack.

With more than 25 different search features, dtSearch can be a lot more flexible in searching than a brute-force effort to sift through a haystack. dtSearch can search for the word needle only if it appears in the phrase needle in a haystack. Or dtSearch can search for needle not w/27 words of knitting. Or dtSearch can search for needle with stemming activated and find needled and needling. With fuzzy searching, dtSearch can find needle misspelled as neadle. After a search, dtSearch can display the words needle and haystack inside a full view of a document, email or other item, even showing needle with a yellow highlight and haystack with a green highlight so you can quickly distinguish the needle from the haystack while browsing.
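Two of these features, proximity searching and fuzzy searching, can be sketched in a few lines of Python. This is an illustrative approximation only: the function names are hypothetical, the reading of "not w/27" used here (at least one occurrence of the first word with no occurrence of the second within 27 words) is an assumption, and dtSearch's actual matching rules may differ.

```python
import re

def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def not_within(text, term, other, distance):
    """True if some occurrence of `term` has no `other` within `distance` words.

    Assumed semantics for a "term not w/N other" style request.
    """
    words = tokenize(text)
    other_positions = [i for i, w in enumerate(words) if w == other]
    term_positions = [i for i, w in enumerate(words) if w == term]
    return any(
        all(abs(i - j) > distance for j in other_positions)
        for i in term_positions
    )

def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # delete from a
                           cur[j - 1] + 1,               # insert into a
                           prev[j - 1] + (ca != cb)))    # substitute
        prev = cur
    return prev[-1]

def fuzzy_matches(text, term, max_edits=1):
    """Return the words in text that are within `max_edits` edits of `term`."""
    return [w for w in tokenize(text) if edit_distance(w, term) <= max_edits]

print(not_within("The needle was lost in the haystack.", "needle", "knitting", 27))
print(not_within("Use a knitting needle here.", "needle", "knitting", 27))
print(fuzzy_matches("I found a neadle in the haystack.", "needle"))
```

A production search engine would run these checks against word positions stored in the index rather than re-reading the text, but the matching logic follows the same pattern.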

While combing through a haystack might take a while, in theory, given enough time, anyone can find the same needle. In contrast, dtSearch’s enterprise and developer customers typically demand granular security access. (For example, 4 out of 5 of the Fortune 500’s largest Aerospace and Defense companies are dtSearch customers. And dtSearch is used across federal, state and international government agencies.)

To support that, the dtSearch Engine, for example, has multiple options for data classification. That way, different users can all search the same data collection, with search results tailored to each individual user’s data access rights. So multiple people can enter a single query like needle in a haystack within 55 words of companyXYZ. But instead of everyone seeing all of the same hits, the dtSearch Engine can filter search results so that each user sees only the documents, emails, records and the like that they are authorized to view. That way, someone from the HR department searching for a candidate will retrieve a different data set from someone in the Legal department working on an e-discovery request, even if they both enter identical search requests.
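The idea of one query returning different results for different users can be sketched as follows. This is a simplified, hypothetical model in Python, not the dtSearch Engine API: the Doc class, the allowed_groups field and the search_for_user function are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    allowed_groups: frozenset  # hypothetical access label attached to each document

def search_for_user(docs, query, user_groups):
    """Return IDs of matching documents that the user is authorized to view."""
    terms = query.lower().split()
    hits = []
    for doc in docs:
        # Security filter: skip documents that share no group with the user.
        if not (doc.allowed_groups & user_groups):
            continue
        # Naive substring match on every term; a real engine would use its index.
        text = doc.text.lower()
        if all(term in text for term in terms):
            hits.append(doc.doc_id)
    return hits

# Hypothetical sample data, for illustration only.
docs = [
    Doc("resume-17.pdf", "Candidate led the needle in a haystack project at companyXYZ",
        frozenset({"hr"})),
    Doc("contract-42.docx", "Dispute over the needle in a haystack clause with companyXYZ",
        frozenset({"legal"})),
    Doc("newsletter.html", "Finding a needle in a haystack: companyXYZ joins us",
        frozenset({"hr", "legal"})),
]

query = "needle haystack companyXYZ"
print(search_for_user(docs, query, frozenset({"hr"})))
print(search_for_user(docs, query, frozenset({"legal"})))
```

Here the HR user and the Legal user type the same request but each sees only the documents their groups permit. Applying the filter inside the search, rather than hiding results afterwards, means unauthorized documents never reach the results list at all.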
