Maximum Page Count
Just the facts, please
- 270GBs of NSFs and eDocs
- 400 search terms
- 300k docs delivered natively
- 16,000,000 estimated pages (@ 53 pages/doc)
- 140,000 documents to produce
- 11,100,000 pages TIFFd (@ 79 pages/doc)
- Less than 3 weeks to process 270GB
- Less than 1 week to TIFF 11.1 million pages
- Less than 1 week to endorse/deliver 11.1 million pages
- = 1.4 miles of pages
- = 2.2m sticks of butter
Challenge:
It’s amazing how such a relatively small number of documents can explode into 1.4 miles of pages (we’ll explain the math in a moment). That’s the challenge our client, a worldwide document hosting company, faced recently. What’s even more amazing is that deadlines stay the same regardless of how many pages your eDiscovery project generates. It’s probably not very fair or logical, but we don’t make the rules, so take it up with Uncle Sam.
Right after we completed a quick turnaround (less than 3 weeks) natively processing 270GB of Lotus Notes (.NSF) email and Microsoft Office documents (eDocs) for our hosting partner, they asked us to start productions. De-duplication and running search terms on the emails dramatically reduced the data to a mere 300,000 documents for review, a fairly small number considering the original 270GB volume. But the real telling statistic was the estimated page count Gridlogik™ collected. More than 16,000,000 estimated pages across only 300,000 documents (an average of 53 pages per document) put the page counts off the charts, nowhere near “industry averages.”
So what’s an average amount? A common question we hear is “How many pages are in a document?”, quickly followed by “OK, then how many pages are in a gigabyte?” The answers are simple and surprisingly boring: “Between 3 and 10 pages per document” and “It depends.” Using these averages, the 300,000 documents would produce anywhere between 900,000 and 3,000,000 pages. Yeah… wow. Lucky for us, we have Gridlogik, which extracts the native page count of every document before printing. That is extremely helpful in our clients’ discovery process, where knowing page counts ahead of time truly matters, because deadlines rarely change even if the case explodes with pages.
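Here is a quick back-of-the-envelope check of those numbers, as a minimal Python sketch. The function name `page_range` is purely illustrative, and the 3 to 10 pages-per-document range is the rule of thumb quoted above, not a measured figure.

```python
# Compare the "industry average" page range with the extracted page-count estimate.
DOCS_FOR_REVIEW = 300_000
ESTIMATED_PAGES = 16_000_000  # estimate quoted in this case study


def page_range(docs: int, low_ppd: int = 3, high_ppd: int = 10) -> tuple[int, int]:
    """Return (low, high) page totals for a pages-per-document rule of thumb."""
    return docs * low_ppd, docs * high_ppd


low, high = page_range(DOCS_FOR_REVIEW)
print(f"Rule of thumb: {low:,} to {high:,} pages")
print(f"Extracted estimate: {ESTIMATED_PAGES:,} pages")
print(f"Estimate vs. high end of the rule of thumb: {ESTIMATED_PAGES / high:.1f}x")
```

Even the high end of the rule of thumb undershoots the extracted estimate by more than five times.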
Our client received some “tough love” from the court: produce the relevant documents in less than one month. Well aware of the 16,000,000 pages, our client turned to us for a solution. Smart client.
So, what’s the problem:
- Must natively process, cull and deliver 270GBs of NSFs and eDocs to hosting review platform
- About 10% of the documents were identified as Chinese-language, which was not a problem
- Estimated page counts put the 300,000 docs over 16 million pages (@ 53 pages/doc), way, way off the “industry average”
- The 300,000 documents were reduced to 140,000 after review, but the page count was still estimated at over 11 million
- The client had less than one month to complete production, and required all 140,000 documents to be TIFFd, endorsed and delivered on hard drives
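To see why the review cut the document count far more than the page count, here is a minimal Python sketch of the average pages per document at each stage. It uses only the figures listed above; the `pages_per_doc` helper is hypothetical and exists just for illustration.

```python
# Average pages per document at each stage (figures from this case study).
stages = {
    "after culling (estimated)": (16_000_000, 300_000),
    "after review (produced)": (11_100_000, 140_000),
}


def pages_per_doc(pages: int, docs: int) -> float:
    """Average number of pages per document."""
    return pages / docs


for name, (pages, docs) in stages.items():
    print(f"{name}: {pages_per_doc(pages, docs):.0f} pages/doc")
```

Review removed more than half of the documents but only about 30% of the pages, so the documents left to produce were, on average, longer.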
What we did:
We knocked it out of the park.
Since we were the processing provider on this project, getting up to speed on the production requirements was easy. We had all the native documents, metadata and tag lists ready to go. However, TIFFing and endorsing the 140,000 documents in the required tight time frame called for super-software and a herculean effort. Oh, we have that solved: it’s called Gridlogik.
Gridlogik was originally designed to be a TIFFing powerhouse, using each file type’s native application to print complex and large files like spreadsheets. The problem with this data, however, was that most of the responsive documents that needed to be TIFFd were text-based source-code files. Text files are usually small, but they can generate thousands of pages, so TIFFing a large text document meant the print driver needed to be extremely fast. So we thought on our feet and solved the hurdle: we modified one of our higher-speed print drivers to support streaming print spools. This enabled us to TIFF a 5,000-page text file in under 3 minutes. To put that into perspective, printing a 5,000-page document through a queue-based print driver might take 3 hours or more.
Gridlogik did it in under 3 minutes. That’s roughly 1,667 pages per minute. Lightning fast. Insane fast. With this modification, we converted all 140,000 documents into 11,100,000 TIFF pages in about a week. It took another week to complete the multi-level endorsements and deliver the data to our client on 10 hard drives, plus 2 backup copies. We really love what we do.
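As a sanity check on those speeds, here is a minimal Python sketch comparing the single-file rate with the average rate needed to TIFF 11.1 million pages in one week. It assumes “about a week” means 7 days of round-the-clock processing; it says nothing about how Gridlogik actually scheduled files across machines, and the `pages_per_minute` helper is hypothetical.

```python
# Throughput arithmetic for the TIFF run (inputs from this case study).
MINUTES_PER_DAY = 24 * 60


def pages_per_minute(pages: int, minutes: float) -> float:
    """Average pages produced per minute."""
    return pages / minutes


# One 5,000-page text file TIFFd in 3 minutes.
single_file_rate = pages_per_minute(5_000, 3)
# Assumption: 11.1M pages spread evenly over 7 full days, 24 hours a day.
required_rate = pages_per_minute(11_100_000, 7 * MINUTES_PER_DAY)

print(f"Single streamed file: {single_file_rate:,.0f} pages/min")
print(f"Needed for 11.1M pages in 7 days: {required_rate:,.0f} pages/min")
```

Under that assumption, the job needs a sustained average of about 1,100 pages per minute, which is below the roughly 1,667 pages per minute seen on a single large text file.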
The results:
- 270GBs of NSFs and eDocs natively processed, searched and delivered in under 3 weeks
- De-duplication and searching culled the data to only 300,000 documents, but with estimated page counts of 16,000,000
- 140,000 documents TIFFd to over 11,100,000 pages, completed within a week
- 11,100,000 pages endorsed and delivered on 10 hard drives with 2 backup copies for the client
- Our client met the tight one-month production deadline with an extremely valuable week to spare
So, how did we get 1.4 miles of pages? Here’s the math:
((11,100,000 / 3,000) x 2) / 5,280 ≈ 1.4 miles
- 11,100,000 = total pages produced
- 3,000 = number of pages that fit into a 2-foot bankers box
- 3,700 = number of bankers boxes needed to hold 11.1m pages
- 2 = length, in feet, of 1 bankers box
- 7,400 = total length, in feet, of the filled bankers boxes lined up end to end
- 5,280 = the length, in feet, of 1 mile
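The same calculation as a short Python sketch, using only the constants listed above:

```python
# Reproduce the "1.4 miles of pages" estimate.
TOTAL_PAGES = 11_100_000
PAGES_PER_BOX = 3_000   # pages that fit in one bankers box
BOX_LENGTH_FT = 2       # length of one bankers box, in feet
FEET_PER_MILE = 5_280

boxes = TOTAL_PAGES / PAGES_PER_BOX   # number of boxes needed
feet = boxes * BOX_LENGTH_FT          # boxes lined up end to end
miles = feet / FEET_PER_MILE

print(f"{boxes:,.0f} boxes, {feet:,.0f} ft, {miles:.1f} miles")
```

Dividing 11,100,000 by 3,000 gives exactly 3,700 boxes, or 7,400 feet, and 7,400 / 5,280 works out to about 1.40 miles.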
And while we’re on a roll, here are some other random equivalents:
- 1,500 = number of bankers boxes you can fit into a shipping container
- 4,500,000 = total pages 1 shipping container can hold
- 2.5 = total shipping containers needed for 11.1m pages
- 2,200,000 = approximate number of sticks of butter you could fit into 3,700 bankers boxes
- Seriously, we could go on and on
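For the shipping-container figures, here is a minimal Python sketch built on the 1,500-boxes-per-container and 3,000-pages-per-box numbers quoted above. The butter figure is left out because the sticks-per-box assumption behind it isn’t stated.

```python
import math

# Shipping-container equivalents (figures from this case study).
TOTAL_PAGES = 11_100_000
PAGES_PER_BOX = 3_000
BOXES_PER_CONTAINER = 1_500

pages_per_container = BOXES_PER_CONTAINER * PAGES_PER_BOX  # pages per container
containers = TOTAL_PAGES / pages_per_container             # fractional container count

print(f"{pages_per_container:,} pages per container")
print(f"{containers:.2f} containers ({math.ceil(containers)} if you can't ship a partial one)")
```

The exact figure is about 2.47 containers, which rounds to the 2.5 above; in practice you would be booking 3.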