Maximum Page Count

Just the facts please

  • 270GB of NSFs and eDocs
  • 400 search terms
  • 300k docs delivered natively
  • 16,000,000 estimated pages (@ 53 pages/doc)
  • 140,000 documents to be produced
  • 11,100,000 pages TIFFd (@ 79 pages/doc)
  • Less than 3 weeks to process 270GB
  • Less than 1 week to TIFF 11.1 million pages
  • Less than 1 week to endorse/deliver 11.1 million pages
  • = 1.4 miles of pages
  • = 2.2m sticks of butter

Challenge:

It’s amazing how such a relatively small number of documents can explode into 1.4 miles of pages (we’ll explain in a moment). That’s the challenge our client, a worldwide document hosting company, faced recently. What’s even more amazing is that deadlines stay the same, regardless of how many pages your eDiscovery project generates. It’s probably not very fair or logical, but we don’t make the rules, so take it up with Uncle Sam.

We had just completed a quick turnaround (less than 3 weeks) natively processing 270GB of Lotus Notes (.NSF) email and Microsoft Office documents (eDocs) for our hosting partner when the request to start productions came in. De-duplication and running search terms on the emails dramatically reduced the data to a mere 300,000 documents for review, a fairly small number of documents considering the original 270GB volume of data. But the really telling statistic was the number of estimated pages Gridlogik™ collected. With more than 16,000,000 estimated pages and only 300,000 documents (an average of 53 pages per document), the page counts were off the charts. This count was nowhere near “industry averages.”

So what’s an average amount? A common question we hear is “How many pages are in a document?” quickly followed by “OK, then how many pages in a gigabyte?” The answers are simple and surprisingly boring: “Between 3 and 10 pages per document,” and “It depends.” Using these averages, the 300,000 documents would produce anywhere between 900,000 and 3,000,000 pages, a fraction of the 16,000,000 actually estimated. Yeah… wow. Lucky for us, we have Gridlogik, which extracts the native page count of every document before printing. This is extremely helpful for our clients’ discovery process, where knowing page counts ahead of time truly matters. Ultimately, deadlines rarely change even if the case explodes with pages.
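To see how far this project strayed from the rule of thumb, here is a quick back-of-the-envelope check. It uses only the figures quoted above; the function name `estimate_pages` is ours, made up for illustration.

```python
def estimate_pages(doc_count, pages_per_doc_low=3, pages_per_doc_high=10):
    """Return the (low, high) page range implied by a per-document average."""
    return doc_count * pages_per_doc_low, doc_count * pages_per_doc_high


docs = 300_000                 # documents left after de-duplication and search terms
estimated_pages = 16_000_000   # native page count reported for those documents

low, high = estimate_pages(docs)
actual_avg = estimated_pages / docs

print(f"Rule-of-thumb range: {low:,} to {high:,} pages")
print(f"Measured average: {actual_avg:.0f} pages per document")
print(f"Measured total is {estimated_pages / high:.1f}x the high-end estimate")
```

Even the generous end of the industry range undershoots this data set several times over, which is why a measured page count beats an average.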

The court gave our client some “tough love”: produce the relevant documents in less than one month. Our client, well aware of the 16,000,000 pages, turned to us for a solution. Smart client.

So, what’s the problem:


What we did:

We knocked it out of the park.

Since we were the processing provider on this project, getting up to speed on the production requirements was easy. We had all the native documents, metadata and tag lists ready to go. However, TIFFing and endorsing the 140,000 documents in the required tight time frame called for either super software or a herculean effort. Luckily, we already had that solved. It’s called Gridlogik.

Gridlogik was originally designed to be a TIFFing powerhouse, using all native applications to print complex and large file types like spreadsheets. The problem with this data, however, was that the responsive files needing to be TIFFd were mostly text-based source-code files. Text files are usually small in file size, but can generate thousands of pages. TIFFing a large text document meant the print driver needed to be extremely fast. OK, so we thought on our feet and cleared the hurdle: we modified one of our higher-speed print drivers to support streaming print spools. This enabled us to TIFF a 5,000-page text file in under 3 minutes. To put that into perspective, using a queue-based print driver to print a 5,000-page document might take 3 hours or more to complete.

Gridlogik did it in 3 minutes. That’s more than 1,666 pages per minute. Lightning fast. Insanely fast. With this added modification we were able to convert all 140,000 documents into 11,100,000 TIFFs in about a week. It took another week to complete the multi-level endorsements and deliver the data on 10 hard drives with 2 backup copies for our client. We really love what we do.
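For the curious, the speed comparison works out as follows. This is only a sanity check on the numbers quoted above, not a description of how the job was actually scheduled; the single-stream figure assumes one print stream running nonstop at the benchmark rate, which is our simplification.

```python
PAGES = 5_000
STREAMING_MINUTES = 3        # "under 3 minutes" with the streaming print spool
QUEUED_MINUTES = 3 * 60      # "3 hours or more" with a queue-based driver

streaming_rate = PAGES / STREAMING_MINUTES    # pages per minute
queued_rate = PAGES / QUEUED_MINUTES          # pages per minute
speedup = QUEUED_MINUTES / STREAMING_MINUTES  # same job, so the time ratio is the speedup

total_pages = 11_100_000
# Assumption: a single stream sustaining the benchmark rate around the clock.
single_stream_days = total_pages / streaming_rate / 60 / 24

print(f"Streaming driver: {PAGES // STREAMING_MINUTES:,} pages/min (rounded down)")
print(f"Queue-based driver: about {queued_rate:.0f} pages/min")
print(f"Speedup: at least {speedup:.0f}x")
print(f"One stream at that rate: {single_stream_days:.1f} days for {total_pages:,} pages")
```

Under that assumption the full production fits comfortably inside the "about a week" it took.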

The results:


So, how did we get 1.4 miles of pages? Here’s the math:

(((11,100,000 / 3,000) x 2) / 5,280)
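The same formula, spelled out in code so you can check it. The formula does not label its constants, so the names below are our reading of them: we assume 3,000 pages per stack (or box) and 2 feet per stack. Only the 5,280 feet in a mile is a given.

```python
TOTAL_PAGES = 11_100_000
PAGES_PER_STACK = 3_000   # assumption: the formula does not say what 3,000 represents
FEET_PER_STACK = 2        # assumption: the formula does not say what 2 represents
FEET_PER_MILE = 5_280

stacks = TOTAL_PAGES / PAGES_PER_STACK
feet = stacks * FEET_PER_STACK
miles = feet / FEET_PER_MILE

print(f"{stacks:,.0f} stacks x {FEET_PER_STACK} ft = {feet:,.0f} ft = {miles:.1f} miles")
```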

And while we’re on a roll, here are some other random equivalents:

Did you know?

  • That when requesting another party’s metadata, timing is everything?

  • That early case assessment (ECA) is a buzzword that means a myriad of different things depending upon who you are talking to?

  • That the “All Documents” view in Lotus Notes doesn’t always reveal ALL the documents, because it is a query and can be modified?

  • That Apple Macintosh files usually don’t have file extensions?

  • That documents have multiple dates, and that the file system level dates (e.g. Last Accessed Date) are usually unreliable because copying files can change them?

  • That it would take a team of 1,000 attorneys 100 years to review a petabyte of information?

  • That Microsoft Outlook PST files can contain foreign language characters even if the PST file isn’t Unicode?

  • That Bloomberg email systems can also contain instant messages and that all of the data is in simple text format?

  • That 7-Zip compression software has a better compression ratio than WinRAR or WinZip?

  • That Adobe Photoshop files contain multiple layers of information, most of which are hidden from view and cannot be seen without the use of Photoshop?

  • That hard drives can deteriorate in a few years if not used, because the disks need to spin?

  • That a PST file from Microsoft Outlook 2002 or earlier cannot exceed 2GB in size without becoming corrupted?

  • That iPods, iPhones, and BlackBerry devices can contain discoverable information?

  • That the European Union’s Directive on Data Protection mandates that any non-EU recipient of EU-based personal data must provide the required levels of privacy protection? Logik is Safe Harbor Certified.

  • That Microsoft Outlook PST files can be password protected?