Japanese Data Tsunami

Just the facts, please

  • Four terabytes of Japanese data
  • English and Japanese search terms
  • 14,000,000 pages for review
  • 8,000,000 pages produced to ITC
  • $500,000 in savings
  • < 2 months to complete
  • Happy client, happy customer

Challenge:

One of the world’s largest producers of wind turbines needed to collect, process, analyze, review, and produce more than four terabytes of “real” data (email and office files) to the ITC within a matter of months. What they had was a wave of data full of different encodings, email formats (Lotus, Outlook, EML, text-based), and Japanese proprietary document formats. They clearly needed help. Our client, one of the top three IP law firms on the planet, was tasked with managing this complex process from beginning to end. The data was collected in Japan and the US from more than 100 people. Because of the volume, keywords in both English and Japanese (in multiple encodings) were approved, and they had to be applied to the full data set after processing: a huge effort on its own. Our client came to Logik to get the work done quickly and accurately.
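Applying the same English and Japanese terms across mixed encodings means each file has to be decoded to text before any keyword can match. The Python sketch below is a minimal illustration under stated assumptions, not the tooling used on this project: each file arrives as raw bytes, the candidate encoding list and the helper names decode_text and term_hits are hypothetical, and the first encoding that decodes without error wins. Production eDiscovery tools generally rely on statistical detection instead, because some byte streams decode cleanly under more than one Japanese encoding.

```python
# Hypothetical candidate list. Order matters because the first clean decode wins:
# - ISO-2022-JP first: its 7-bit bytes are also valid ASCII, so they would
#   otherwise "succeed" as UTF-8 and leave escape sequences in the text.
# - EUC-JP before cp932 (Microsoft's Shift_JIS): Shift_JIS kana and many kanji
#   use lead bytes between 0x81 and 0x9F, most of which EUC-JP rejects, while
#   EUC-JP text can decode, wrongly, as cp932.
CANDIDATE_ENCODINGS = ("iso2022_jp", "utf-8", "euc_jp", "cp932")


def decode_text(raw: bytes) -> tuple[str, str]:
    """Return (text, encoding) for the first candidate that decodes without error."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Nothing decoded cleanly: fall back to a lossy decode so the file stays searchable.
    return raw.decode("utf-8", errors="replace"), "utf-8 (lossy)"


def term_hits(raw: bytes, terms: list[str]) -> set[str]:
    """Return the approved search terms found in the decoded text (case-insensitive)."""
    text, _ = decode_text(raw)
    folded = text.casefold()
    return {term for term in terms if term.casefold() in folded}


print(decode_text("こんにちは".encode("cp932")))  # ('こんにちは', 'cp932')
print(term_hits("Wind turbine 特許".encode("utf-8"), ["特許", "TURBINE", "blade"]))
```

Recording which encoding each file decoded under also gives reviewers a way to spot-check files that fell through to the lossy fallback.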

So, what’s the problem?


What we did:

Great project management is needed for a project of this size and scope. The first thing we did was assemble a team to work directly with our law firm client and the customer’s upper management to devise a realistic schedule. Normally, four terabytes of data trickle in over time as collection proceeds; we were able to get all of the data delivered within a month. The schedule we created allowed us to provide massive rolling deliveries of data (hundreds of thousands of documents), meaning the client was never without documents to review (always a good thing).

The results:


Did you know?

  • That Microsoft Excel (XLS) files can print thousands and thousands of blank pages if your software doesn’t detect and suppress those pages?

  • That 1 gigabyte is 1,024 megabytes in the binary convention (formally a gibibyte), not the 1,000 megabytes of the decimal convention that drive manufacturers use?

  • That Lotus Notes has a soft-delete option that can take effect when you open an NSF, automatically deleting emails that were marked for soft deletion?

  • That documents have multiple dates, and file-system dates (e.g., Last Accessed Date) are usually unreliable because copying the files changes them?

  • That a PST file from Microsoft Outlook 2002 or earlier cannot exceed 2GB in size, otherwise it will be corrupted?

  • That the “All Documents” view in Lotus Notes doesn’t always reveal ALL the documents, because it is a query and can be modified?

  • That Gmail messages can be downloaded into Microsoft Outlook over a POP3 or IMAP connection?

  • That you can fit approximately 2.7 million single-page TIFF images on a 200 GB hard drive? That’s a lot more than you can fit in a banker’s box.

  • That, according to TREC, search terms generally miss more than 50% of the content that would be relevant?

  • That Mozilla Thunderbird emails can be easily processed by most eDiscovery applications?

  • That transporting your sensitive evidence in an unsafe container, like a cardboard box, is fine until that box is dropped on the floor or lands in a puddle?

  • That when requesting another party’s metadata, timing is everything?

  • That Lotus Notes databases (.NSF) can contain non-email related content, like customer complaint forms and inventory records?

  • That Microsoft Outlook MSG files retain their attachments after processing, increasing the amount of data you need to store on disk?

  • That your law firm’s litigation support department, if you have one, can most likely add tremendous value to your case if brought to the table at the beginning of discovery?
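Several of the numbers above are plain storage arithmetic. This Python sketch works them through under stated assumptions: the constant and function names are hypothetical, the 2 GB ceiling for Outlook 2002-and-earlier PST files is treated as 2 GiB, and the 200 GB drive is assumed to be labeled in decimal gigabytes (10^9 bytes), as hard drive manufacturers typically do.

```python
# Binary units: 1 GB = 1,024 MB here (IEC names these GiB and MiB).
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def mb_per_gb(binary: bool = True) -> int:
    """Megabytes per gigabyte under the binary (1,024) or decimal (1,000) convention."""
    return 1024 if binary else 1000


# Hypothetical threshold for legacy (Outlook 2002 and earlier) PST files,
# treating the 2 GB ceiling as 2 GiB.
PST_LIMIT_BYTES = 2 * GIB


def pst_at_risk(size_bytes: int) -> bool:
    """True if a legacy PST has reached the 2 GB ceiling."""
    return size_bytes >= PST_LIMIT_BYTES


def avg_bytes_per_image(drive_gb: float, image_count: int) -> float:
    """Average image size that lets image_count images fit on the drive.

    Assumes decimal gigabytes (10**9 bytes each) and ignores file-system overhead.
    """
    return drive_gb * 10**9 / image_count


print(mb_per_gb(), mb_per_gb(binary=False))        # 1024 1000
print(pst_at_risk(1_900_000_000))                  # False
print(round(avg_bytes_per_image(200, 2_700_000)))  # 74074
```

On those assumptions, the 2.7 million figure implies an average single-page TIFF of roughly 74 KB.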
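Saying search terms miss more than half of the relevant content is a statement about recall: the share of truly relevant documents that a search actually retrieves. A minimal Python sketch, assuming reviewers have already judged relevance for a sample of documents; the document IDs and the function name are hypothetical.

```python
def recall(retrieved: set[str], relevant: set[str]) -> float:
    """Fraction of the relevant documents that the search retrieved."""
    if not relevant:
        raise ValueError("recall is undefined when no documents are relevant")
    return len(retrieved & relevant) / len(relevant)


# Hypothetical review sample: 10 documents judged relevant by reviewers.
relevant_docs = {f"DOC{i}" for i in range(10)}
# The keyword search hit 4 of them, plus 2 documents that are not relevant.
keyword_hits = {"DOC0", "DOC1", "DOC2", "DOC3", "OTHER1", "OTHER2"}

print(recall(keyword_hits, relevant_docs))  # 0.4
```

Here the terms find 4 of the 10 relevant documents, a recall of 0.4, so 60% of the relevant content is missed. The non-relevant hits do not affect recall; they lower precision instead.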