Data Tsunami

Just the facts, please

  • Four terabytes of Japanese data
  • English and Japanese search terms
  • 14,000,000 pages for review
  • 8,000,000 pages produced to ITC
  • $500,000 in savings
  • < 2 months to complete
  • Happy client, happy customer

Challenge:

One of the world’s largest producers of wind turbines needed to collect, process, analyze, review and produce over four terabytes of “real” data (email and office files) to the ITC in a matter of months. What they had was a windfall of data in a mix of encodings, email formats (Lotus, Outlook, EML, text-based), and proprietary Japanese document formats. They clearly needed help. Our client, one of the top three IP law firms on the planet, was tasked with managing this complex process from beginning to end. The data was collected in Japan and the US from over 100 people. Because of the volume, keywords in both English and Japanese (in multiple encodings) were approved and had to be applied to the full data set after processing, which was a huge effort in its own right. Our client came to Logik to get the work done quickly and accurately.
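To make the bilingual keyword step concrete, here is a minimal sketch of how terms might be applied once each document's encoding is known. It is an illustration only, not the tooling used on this project: the search terms, the function names, and the choice of Unicode NFKC normalization are all assumptions.

```python
import unicodedata

# Hypothetical search terms; the approved terms are not given in this case study.
SEARCH_TERMS = ["turbine", "風力", "タービン"]


def normalize(text: str) -> str:
    """Fold case and width variants so equivalent spellings compare equal."""
    # NFKC maps half-width katakana and full-width Latin to their usual forms.
    return unicodedata.normalize("NFKC", text).casefold()


def matching_terms(raw: bytes, encoding: str, terms=SEARCH_TERMS) -> list[str]:
    """Decode a document with its known encoding and list the terms it contains."""
    text = normalize(raw.decode(encoding, errors="replace"))
    return [term for term in terms if normalize(term) in text]


# The same term is found whether it was typed in half-width or full-width
# katakana, and regardless of the byte encoding of the source file.
print(matching_terms("ﾀｰﾋﾞﾝ の点検".encode("shift_jis"), "shift_jis"))
print(matching_terms("風力発電 Turbine".encode("euc_jp"), "euc_jp"))
```

Decoding everything to Unicode first means one term list can serve every encoding, rather than keeping a separate byte pattern per encoding.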

So, what’s the problem?


What we did:

A project of this size and scope demands great project management. The first thing we did was assemble a team to work directly with our law firm client and the customer's upper management to devise a realistic schedule. Normally, four terabytes of data trickle in as collection proceeds over time; here, we were able to get all of the data delivered within a month. The schedule we created allowed us to provide massive rolling deliveries of data (hundreds of thousands of documents), so the client was never without documents to review (always a good thing).

The results:


Did you know?

  • That attorneys can be sanctioned for improperly handling eDiscovery processing? Search for Bray & Gillespie.

  • That Microsoft Outlook doesn’t actually compress data, so how can it possibly expand after processing?

  • That right-clicking on a file in Windows will alter the Last Accessed Date?

  • That many off-the-shelf eDiscovery programs cannot detect the encoding of documents and thus cannot properly handle foreign-language character sets?

  • That MS Excel 2007 supports over 1 million rows of data?

  • That Lotus Notes databases (.NSF) can contain non-email related content, like customer complaint forms and inventory records?

  • That most near-dupe technologies cannot group foreign-language documents together?

  • That hard drives can deteriorate in a few years if not used, because the disks need to spin?

  • That when requesting another party’s metadata, timing is everything?

  • That you can use a mapped drive letter (e.g. X:\) to gain access to a Windows file whose path has accidentally gone over the 260-character limit?

  • That Microsoft Excel (XLS) documents can print thousands and thousands of blank pages if your software doesn’t detect and remove those pages?

  • That Japanese documents can come in one of three different character sets?

  • That USB 3.0 is coming in 2009 and is 10 times faster than the current USB 2.0?

  • That collecting an image from a virtual machine can be much faster and easier than collecting one from a physical machine, thanks to the virtual machine’s snapshot features?

  • That there is no realistic way to redact native files without first converting the file to an image?
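
The encoding points in the list above can be illustrated with a small sketch. The list does not name the three Japanese character sets, so this example assumes the common legacy encodings Shift_JIS, EUC-JP and ISO-2022-JP, and adds UTF-8 as a fourth candidate. Trial decoding is a heuristic, not a guarantee: short or unusual byte sequences can be valid in more than one encoding. The function name is hypothetical.

```python
# Assumed candidates, in the order they are tried. ISO-2022-JP goes first
# because it is 7-bit and would otherwise be mistaken for plain ASCII/UTF-8.
CANDIDATES = ("iso2022_jp", "utf-8", "euc_jp", "shift_jis")


def guess_japanese_encoding(raw: bytes):
    """Return the first candidate encoding that decodes `raw` without error."""
    # Pure ASCII with no escape sequences needs no Japanese decoding at all.
    if raw.isascii() and b"\x1b" not in raw:
        return "ascii"
    for name in CANDIDATES:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None  # none of the assumed encodings fit


sample = "日本語のテキスト"
for encoding in ("shift_jis", "euc_jp", "iso2022_jp", "utf-8"):
    print(encoding, "->", guess_japanese_encoding(sample.encode(encoding)))
```

A production tool would normally combine a check like this with statistical detection and file-level metadata, since strict decoding alone cannot separate every ambiguous case.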