Japanese Data Tsunami

Just the facts, please

  • Four terabytes of Japanese data
  • English and Japanese search terms
  • 14,000,000 pages for review
  • 8,000,000 pages produced to ITC
  • $500,000 in savings
  • Less than 2 months to complete
  • Happy client, happy customer

Challenge:

One of the world’s largest producers of wind turbines needed to collect, process, analyze, review and produce over four terabytes of “real” data (email and office files) to the ITC in a matter of months. What they had was a wave of data in a mix of character encodings, email formats (Lotus Notes, Outlook, EML, text-based) and proprietary Japanese document formats. They clearly needed help. Our client, one of the top three IP law firms on the planet, was tasked with managing this complex process from beginning to end. The data was collected in Japan and the US from over 100 people. Because of the volume, the approved keywords in both English and Japanese (in multiple encodings) had to be applied to the full data set after processing, a huge effort in its own right. Our client came to Logik to get the work done quickly and accurately.
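To illustrate why Japanese keywords across multiple encodings are harder than they sound, here is a minimal Python sketch. It is not the workflow used on this project. The candidate encoding list, the function names and the sample text are all assumptions made for illustration. The idea is to decode the raw bytes under each candidate encoding, normalize both the text and the keyword (so half-width and full-width forms compare equal), and only then search.

```python
import unicodedata

# Assumed candidate list: common encodings for Japanese text.
CANDIDATE_ENCODINGS = ("utf-8", "shift_jis", "euc_jp", "iso2022_jp")

def normalize(text: str) -> str:
    """Fold width variants (NFKC) and letter case so equivalent forms match."""
    return unicodedata.normalize("NFKC", text).casefold()

def matching_encodings(raw: bytes, keyword: str) -> list:
    """Return the candidate encodings under which `raw` decodes cleanly
    and contains `keyword`. Hypothetical helper, for illustration only."""
    needle = normalize(keyword)
    hits = []
    for enc in CANDIDATE_ENCODINGS:
        try:
            text = raw.decode(enc)  # strict: wrong guesses usually fail here
        except UnicodeDecodeError:
            continue
        if needle in normalize(text):
            hits.append(enc)
    return hits

# Made-up sample: the same sentence saved in two different encodings.
sentence = "風力タービンの設計"
sjis_hits = matching_encodings(sentence.encode("shift_jis"), "タービン")
euc_hits = matching_encodings(sentence.encode("euc_jp"), "タービン")
```

A term typed once can therefore be matched against files saved in different encodings. A production pipeline would also need encoding detection, tokenization, and handling for text that decodes "cleanly" under the wrong encoding.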

So, what’s the problem?


What we did:

Great project management is needed for a project of this size and scope. The first thing we did was assemble a team to work directly with our law firm client and upper management from the customer to devise a realistic schedule. Normally, four terabytes of data trickle in as collection proceeds over time; here, we were able to get all of the data delivered within a month. The schedule we created allowed us to provide massive rolling deliveries of data (hundreds of thousands of documents), so the client was never without documents to review (always a good thing).

The results:


Did you know?

  • That Microsoft PowerPoint (PPT) files have hidden speaker notes that will not be searchable unless they are extracted during processing?

  • That converting documents to TIFF might actually save you more time and money depending on your case?

  • That shipping sensitive data on a device (like a hard drive) in a cardboard box (like a bankers box) makes disk failure far more likely?

  • That Guidance EnCase images can be opened and mounted by other forensic software?

  • That a PST file from Microsoft Outlook 2002 or earlier cannot exceed 2GB in size without becoming corrupted?

  • That efficient and timely pre-trial eDiscovery is a huge strategic advantage in litigation?

  • That you should use the newest version of WinZip to compress your files, because WinZip will automatically preserve file-level dates/times?

  • That Lotus Notes emails (in comparison to Microsoft Outlook emails) usually contain a very high number of images embedded in the body text, like desktop screenshots?

  • That USB 3.0 is coming in 2009 and is 10 times faster than the current USB 2.0?

  • That running a front-end file-type filter on the visible document extension will likely miss many documents that match the criteria in content but don’t have the correct extension (e.g. myxlsdoc.xlerd)?

  • That you can fit approximately 2.7 million single-page TIFF images on a 200GB hard drive? That’s a lot more than you can fit in a bankers box.

  • That MAPI = Messaging Application Programming Interface, and it allows access to email content and metadata?

  • That it would take a team of 1,000 attorneys 100 years to review a petabyte of information?

  • That Bloomberg email systems keep attachments separate from the actual email, in a compressed .tar.gz file?

  • That someone saying they are Unicode compliant doesn’t necessarily mean they can truly handle foreign-language data?
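The point above about filtering on file extensions can be sketched in a few lines of Python. This is a rough illustration and does not describe any particular processing tool. The function names and the short signature table are assumptions chosen for the example, though the byte signatures themselves are the well-known leading bytes of each format. The filter identifies a file by its first bytes and ignores its name.

```python
from typing import Optional

# Leading "magic" bytes for a few formats common in collections.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # legacy .doc/.xls/.ppt/.msg
    (b"PK\x03\x04", "zip"),                          # .docx/.xlsx/.pptx, plain .zip
    (b"!BDN", "pst"),
    (b"II*\x00", "tiff"),                            # little-endian TIFF
    (b"MM\x00*", "tiff"),                            # big-endian TIFF
]

def sniff_type(header: bytes) -> Optional[str]:
    """Classify a file by its first bytes; None if no signature matches."""
    for magic, kind in SIGNATURES:
        if header.startswith(magic):
            return kind
    return None

def sniff_file(path: str) -> Optional[str]:
    """Read only the start of the file; the extension is never consulted."""
    with open(path, "rb") as fh:
        return sniff_type(fh.read(16))
```

With this approach, a legacy Excel workbook renamed to myxlsdoc.xlerd would still be reported as an OLE2 container. Telling a spreadsheet from a Word document inside that container takes a deeper look at its internal streams, which this sketch does not attempt.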