Operation Data Rescue

Just the facts, please

  • BV2000 screws up: 100GB of NSFs incorrectly converted to PSTs
  • Thousands of embedded images in emails are lost
  • Yikes: opposing counsel finds the mistake, not the client
  • Logik takes over after BV2000 fails
  • Logik re-processed the 100GB of NSFs natively in less than one week
  • Logik identified the emails with embedded images
  • Logik matched the other vendor’s data to our data in less than one week
  • 500,000 pages correctly re-produced in only two days
  • 20 AS400 backup tapes restored
  • 200GB of extra Lotus NSFs processed, searched and delivered in only two weeks

Challenge:

When opposing counsel finds critical mistakes in a law firm’s class-action document production before the firm does, that firm could be in a world of hurt if it doesn’t act quickly to right the wrong. Our law firm client wasn’t directly at fault for this mistake, but they were left to deal with the mess and make it right. The eDiscovery Big Vendor (who will go unnamed for obvious reasons – let’s call them “BV2000”), which had managed the class-action case from the beginning, was doing just fine processing standard email formats like Microsoft Outlook (.PST) databases. But when the client dumped a few hundred gigabytes of Lotus Notes databases (.NSF) on the vendor, things went downhill, fast. The vendor, the law firm and their customer had no clue what was about to happen.

The vendor followed their standard practice of converting the Lotus Notes email databases into MS Outlook PSTs. They did this because processing a PST file is a heck of a lot easier than processing Lotus NSFs, and their eDiscovery processing software didn’t fully support “native” NSF processing. What the vendor didn’t realize is that during the conversion from NSF to PST, valuable, highly relevant and sometimes privileged embedded images native to the Lotus email were lost. This would be easy to miss in a document review, as the reviewer would probably not know the emails had started out as NSFs and would have nothing to compare them to.

It took a rather clever reviewer on the opposing counsel side to find the error and sound the alarm. Here’s how they found the problem: although the images embedded in the original NSF emails were lost during conversion, the small bits of text in the emails that referenced those images (mostly database screenshots) were not. So, it looked a little odd to the reviewer when a small portion of the content in an email referenced an image that didn’t exist in the body or as an attachment. This led to finding hundreds more emails with similar references, but no image or attachment. Yikes.

On top of this embarrassing revelation, opposing counsel decided to pour it on with a request for more data from 20 new custodians – approximately 20 AS400 backup tapes’ worth. And the data on the tapes? 200GB of Lotus NSFs, naturally.

With the alarm sounded and new data being requested, the law firm, extremely upset with the first vendor, needed to find a different solution – and a fast one at that. Although our relationship with this law firm client was new, they turned to Logik knowing we had deep experience with native Lotus Notes processing.

So, what’s the problem:


What we did:

There were a lot of moving parts in this project, so our first priority was to get a grasp on what happened and create the proper scope of work the client needed us to complete. We ended up tackling this complex project in three phases:

Phase 1: Play matchmaker
First and foremost, our client needed to find all affected documents with embedded images in them. In order for us to do this, we needed to process the NSFs natively and identify any email with an embedded image (as opposed to an attached one). These emails were then flagged for matching. Then it got difficult. To match our data with the previous vendor’s data meant our engineers needed to buckle down and write some serious new code. You would think you could use an MD5 hash or something unique, but not with this data. To put it kindly, it was a [expletive] mess. It took us about a week to find all the matches, which ended up being a few thousand emails (more than opposing counsel had identified – a good thing we found that first).
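
The case study doesn’t say how our matching code worked, only that hashes were no help, presumably because a message converted to PST no longer shares bytes with its NSF original. As a rough illustration of the general idea (not the code we actually wrote), here is a minimal Python sketch that matches records on metadata likely to survive a conversion. The field names (`doc_id`, `sender`, `sent`, `subject`) and the choice of key are assumptions made for the example.

```python
import re
from collections import defaultdict


def match_key(record):
    """Build a comparison key from fields likely to survive a format conversion.

    Assumes each record is a dict with hypothetical keys:
    'sender', 'subject', and 'sent' (an ISO-style timestamp string).
    """
    sender = record["sender"].strip().lower()
    # Collapse runs of whitespace and ignore case in the subject line.
    subject = re.sub(r"\s+", " ", record["subject"]).strip().lower()
    # Keep the timestamp only to the minute: "2008-03-04T10:15:42" -> "2008-03-04T10:15"
    sent = record["sent"][:16]
    return (sender, sent, subject)


def match_records(flagged, produced):
    """Pair natively processed records with previously produced ones.

    Returns (matches, unmatched): matches maps each flagged doc_id to the
    list of produced doc_ids sharing its key; unmatched lists flagged
    doc_ids with no candidate at all.
    """
    index = defaultdict(list)
    for rec in produced:
        index[match_key(rec)].append(rec["doc_id"])

    matches, unmatched = {}, []
    for rec in flagged:
        candidates = index.get(match_key(rec), [])
        if candidates:
            matches[rec["doc_id"]] = candidates
        else:
            unmatched.append(rec["doc_id"])
    return matches, unmatched
```

A key like this can produce several candidates for one email, or none, so the unmatched and multi-candidate records would still need a second pass or a manual look. With messy data, that is likely where most of the effort goes.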

Phase 2: Production, round two
Now that we had the documents matched 100%, it was time to produce the documents again, but this time with correct formatting. Since we process Lotus NSFs natively, we preserve the original format of the email, even embedded images and tables. This makes it relatively easy for us to convert the documents to TIFF and maintain the original look and feel of the email – something the previous Big Vendor (aka BV2000) failed to do with the NSF-to-PST conversion. There was one small “gotcha”: the embedded images were HUGE screenshots, which meant that they would be cut off at the edge of the page when printed. So, we adjusted Gridlogik™ and forced the embedded images into the confines of the page layout. This ensured nothing would be cut off. Two days after the matching was complete, the new images were produced along with a file cross-referencing the previously produced documents.
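
Forcing an oversized screenshot into the page comes down to a little arithmetic: shrink it by whichever ratio is tighter, width or height, and never enlarge it. The Python sketch below shows that calculation only. It is not how Gridlogik™ is implemented, and the function name and pixel dimensions are made up for the example.

```python
def fit_to_page(width, height, max_width, max_height):
    """Return (new_width, new_height) scaled to fit inside the printable area.

    The aspect ratio is preserved and images that already fit are left
    alone (the scale factor is capped at 1.0). All values are in pixels.
    """
    scale = min(max_width / width, max_height / height, 1.0)
    # Truncate to whole pixels, but never collapse a dimension to zero.
    return max(1, int(width * scale)), max(1, int(height * scale))


# A 3000x2000 screenshot on a page with a 1500x2000 printable area
# is limited by its width and comes out at half size.
print(fit_to_page(3000, 2000, 1500, 2000))  # (1500, 1000)
```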

Phase 3: Wrap it all up
Getting the extra 200GB of NSFs off the 20 archaic AS400 backup tapes could be a case study by itself, but for now, we’re focusing on the data from the tapes. This data was no different from the original 100GB we re-processed, as there were thousands of emails with embedded images in them. The entire 200GB of NSFs was processed, searched and delivered (native only) in under two weeks. We tagged the delivered records as potentially privileged and included the privilege keywords that hit, greatly speeding up the review.


The results:



This was one of the more involved and complex clean-up projects we have ever done, and it was a great experience. Our client couldn’t be happier (ask us for a direct reference to them) and we’re sure opposing counsel was very disappointed in our client’s unusually quick and accurate response.

Did you know?

  • That you can easily reduce the amount of information to review by doing a domain name analysis on your data (e.g. remove all @amazon.com )?

  • That copying 5GB of tiny files is much slower than copying 1 large 5GB file?

  • That “Size” and “Size on Disk” are two different measurements when you right-click a file or folder and open its properties?

  • That reading through all of these “Did You Knows” qualifies you as an eDiscovery ninja?

  • That just because someone says they are Unicode compliant, it doesn’t necessarily mean they can truly handle foreign language data?

  • That MS Excel documents can have charts layered on top of each other, hiding potentially relevant data?

  • That many of the off-the-shelf eDiscovery programs cannot detect the encoding of documents and thus cannot properly handle foreign language character sets?

  • That Adobe Photoshop files contain multiple layers of information, most of which are hidden from view and cannot be seen without the use of Photoshop?

  • That 7-zip compression software has a better compression ratio than WinRAR or WinZIP?

  • That many of the off-the-shelf eDiscovery programs can only extract a limited number of embedded files?

  • That a 100MB text file will run to over 100,000 pages if printed?

  • That a PST file from Microsoft Outlook 2002 or earlier cannot exceed 2GB in size, otherwise it will be corrupted?

  • That USB 3.0 is coming in 2009 and is 10 times faster than the current USB 2.0?

  • That transporting your sensitive evidence in an unsafe container, like a cardboard box, is ok until that box is dropped on the floor or lands in a puddle?

  • That all Microsoft Office document formats can contain embedded files and that those files too can contain embedded files?
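
The domain name analysis mentioned in the first item above is simple to try on your own data. Here is a minimal Python sketch, assuming each message is a dict with a hypothetical `from` field holding the raw From header. It counts messages per sender domain so you can see what dominates, then drops the domains you decide are noise.

```python
from collections import Counter
from email.utils import parseaddr


def sender_domain(from_header):
    """Return the lower-cased domain of a From header, e.g. 'amazon.com'."""
    address = parseaddr(from_header)[1]
    return address.rpartition("@")[2].lower()


def domain_counts(messages):
    """Count messages per sender domain."""
    return Counter(sender_domain(m["from"]) for m in messages)


def drop_domains(messages, excluded):
    """Return only the messages whose sender domain is not excluded."""
    excluded = {d.lower() for d in excluded}
    return [m for m in messages if sender_domain(m["from"]) not in excluded]


messages = [
    {"from": "Amazon <orders@Amazon.com>"},
    {"from": "bob@client.example"},
    {"from": "ship@amazon.com"},
]
print(domain_counts(messages).most_common())
print(len(drop_domains(messages, ["amazon.com"])))
```

Which domains are safe to remove is a judgment call for the case team. The counts only show where the volume is.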