Logik is such a TREC-ee
Posted By Andy Wilson on July 30, 2009
Not the Star Trek kind of Trekkie. Well, maybe, considering the high likelihood that most participants (all tech companies) have seen all 7 movies. No offense; live long and prosper, TREC participants. This TREC is more about going where no text retrieval algorithm has gone before, and less about finding new planets, although you could make an argument for it…but I digress. TREC stands for the “Text REtrieval Conference” and is co-sponsored by the National Institute of Standards and Technology (NIST) and the U.S. Department of Defense. We are very proud to be a participant in the 2009 TREC study. So, what does that mean exactly?
TREC gave us a very LARGE data set (the Enron emails) and a subpoena, and said, "Figure it out." Easy enough, right? Maybe for the Enterprise crew this is easy, but for most eDiscovery companies, including us, this is a major challenge and one that has significant meaning. Our job will be to use what we know about search within discovery to find all the emails and attachments that are relevant to the subpoena. This requires more than just your standard set of Boolean keyword searches. We will need to use more powerful text retrieval algorithms to find the needles in the haystack.
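To make the "more than Boolean keyword searches" point concrete, here is a minimal sketch in Python. It is not what we are running for TREC; it just contrasts a strict Boolean AND search with a simple TF-IDF ranking, which I picked as a stand-in for "more powerful" retrieval. The three emails, the query, and the function names (`boolean_and`, `tfidf_rank`) are all made up for illustration.

```python
import math
from collections import Counter

# Made-up stand-ins for emails; not taken from the Enron data set.
docs = {
    "e1": "please shred the audit documents before the review",
    "e2": "lunch meeting moved to friday",
    "e3": "audit review scheduled for friday",
}


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


def boolean_and(query, docs):
    """Return ids of documents that contain every query term."""
    terms = set(tokenize(query))
    return [doc_id for doc_id, text in docs.items()
            if terms <= set(tokenize(text))]


def tfidf_rank(query, docs):
    """Rank documents by summed TF-IDF weight of the query terms they contain."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = sum(tf[t] * math.log(n_docs / df[t])
                    for t in tokenize(query) if t in tf)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)


query = "shred audit friday"
print(boolean_and(query, docs))  # no single email contains all three terms
print(tfidf_rank(query, docs))   # partial matches still come back, best first
```

The Boolean search comes back empty because no single email has all three words, while the ranked search still surfaces the partial matches and puts the one mentioning the rare term "shred" at the top. That is the needle-in-the-haystack problem in miniature.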
None of the participants are allowed to post their results publicly, even if they find every single document relevant to the subpoena. Although it’s probably every marketer’s dream to post the results (assuming they are good), TREC is smart not to allow it. Each participant is required to submit their results, the tools they used, etc. to TREC by September 7th, 2009. So, the clock is ticking. Hopefully, more advanced and accurate methods for text retrieval will come out of this process. If only the good people at NIST offered up a Netflix-like $1m prize (http://logiik.com/L)...sigh. Wish us luck.
Here is a brief intro to TREC, taken from their website: http://trec.nist.gov/
The Text REtrieval Conference (TREC), co-sponsored by the National Institute of Standards and Technology (NIST) and U.S. Department of Defense, was started in 1992 as part of the TIPSTER Text program. Its purpose was to support research within the information retrieval community by providing the infrastructure necessary for large-scale evaluation of text retrieval methodologies. In particular, the TREC workshop series has the following goals:
- to encourage research in information retrieval based on large test collections;
- to increase communication among industry, academia, and government by creating an open forum for the exchange of research ideas;
- to speed the transfer of technology from research labs into commercial products by demonstrating substantial improvements in retrieval methodologies on real-world problems; and
- to increase the availability of appropriate evaluation techniques for use by industry and academia, including development of new evaluation techniques more applicable to current systems.
TREC is overseen by a program committee consisting of representatives from government, industry, and academia. For each TREC, NIST provides a test set of documents and questions. Participants run their own retrieval systems on the data, and return to NIST a list of the retrieved top-ranked documents. NIST pools the individual results, judges the retrieved documents for correctness, and evaluates the results. The TREC cycle ends with a workshop that is a forum for participants to share their experiences.
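The pool-judge-evaluate cycle described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not NIST's actual evaluation software: the run names, document ids, pool depth, and relevance judgments are invented, and precision and recall at a cutoff are just two common measures I chose for the example.

```python
from typing import Dict, List, Set, Tuple


def build_pool(runs: Dict[str, List[str]], depth: int) -> Set[str]:
    """Union of the top-`depth` documents from every participant's ranked list."""
    pool: Set[str] = set()
    for ranked in runs.values():
        pool.update(ranked[:depth])
    return pool


def precision_recall(ranked: List[str], relevant: Set[str], k: int) -> Tuple[float, float]:
    """Precision and recall of the top-k results against the judged-relevant set."""
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ranked lists returned by two participating systems.
runs = {
    "system_a": ["d1", "d2", "d3", "d4"],
    "system_b": ["d3", "d5", "d1", "d6"],
}

# Step 1: pool the top results from every run.
pool = build_pool(runs, depth=2)

# Step 2: assessors judge only the pooled documents (judgments invented here).
judged_relevant = {"d1", "d5"}

# Step 3: score each run against the judgments.
for name, ranked in runs.items():
    p, r = precision_recall(ranked, judged_relevant, k=3)
    print(f"{name}: precision@3={p:.2f} recall@3={r:.2f}")
```

The point of pooling is that assessors only have to read documents that at least one system ranked highly, which is what makes judging a very large collection practical.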
This evaluation effort has grown in both the number of participating systems and the number of tasks each year. Ninety-three groups representing 22 countries participated in TREC 2003. The TREC test collections and evaluation software are available to the retrieval research community at large, so organizations can evaluate their own retrieval systems at any time. TREC has successfully met its dual goals of improving the state-of-the-art in information retrieval and of facilitating technology transfer. Retrieval system effectiveness approximately doubled in the first six years of TREC.
TREC has also sponsored the first large-scale evaluations of the retrieval of non-English (Spanish and Chinese) documents, retrieval of recordings of speech, and retrieval across multiple languages. TREC has also introduced evaluations for open-domain question answering and content-based retrieval of digital video. The TREC test collections are large enough so that they realistically model operational settings. Most of today’s commercial search engines include technology first developed in TREC.