Schedule a Consult
Our eDiscovery experts are available around-the-clock to discuss your project or organization’s needs. Use this form to request a consultation, including the best availability to schedule a call.
Legal eDiscovery
Consistent, well managed eDiscovery for Legal
Leading law firms use our eDiscovery processing & hosting services to successfully complete their projects.
Get access to the best technology, people and service and get covered.
Stay Informed on eDiscovery
- Connect with Logik on LinkedIn
- Read up on our blog
- Follow DiscoveryBrain.com, our legal technology community & news website.
eDiscovery Services from Logik
Life is too short for bad eDiscovery. You need a team that stands behind you through thick and thin. A team that knows not just how to make you look good, but look GREAT. We bring our people and technology together with your legal expertise to form the ultimate eDiscovery machine.
You get fully-dedicated project managers with around-the-clock availability, flexible, competitive pricing and superior technical expertise. To exceed your expectations, our dedicated team maintains the same sharp focus on consistency and quality on everything from load files to reports throughout your project.
Ready to get started? Speak with an eDiscovery expert now.
Featured Benefits
Consistent, High Quality eDiscovery. Fewer Interruptions.
- World class service
Service without the complicated procedures or processes. We roll out the red carpet for everyone! Veteran project managers are available via phone, email or our project management software, around the clock.
- eDiscovery expertise and technology
Access our combined 100+ years of eDiscovery knowledge to solve your most challenging projects. Have our engineers mold our technology around your specific project’s needs. Have an uncommon file format that needs processing? No problem!
- Highly secure and highly available hosting
Your data is safe and sound and always available. Our datacenter is SAS 70 certified and bulletproof. Our hosting servers are state of the art and always on with industry-leading uptime.
- Flexible pricing
Per-gigabyte, per-document, per-page, per-custodian, or per-project. We get it and we’re flexible enough to put the right pricing forward that makes sense for your project. No rigid pricing models here and no epic-long invoices either. The simpler the better. Nickel and diming isn’t our thing.
Typical Engagements
- Master eDiscovery Service Agreements
Get all the eDiscovery services and prices ironed out before the next matter hits with pre-negotiated master service agreements. Even though the next eDiscovery matter is unpredictable, your eDiscovery partner doesn’t have to be.
- Matter-based Service Agreements
Engage us on a per-project or per-matter basis and get the same high quality service and product that other leading law firms are having success with. We’ll create a detailed and unique proposal to cover all your project needs. Contact us today for more details.
Success Stories
Data Tsunami Assisting one of the world's largest producers of wind turbines to collect, process, analyze, review and produce over four terabytes of data; deliverable to the ITC.
Beer for eDiscovery Beer and eDiscovery go together like hops and barley. Assisting large beverage manufacturers with merger and second requests.
Supporting the Big Bots Knock, knock. Who's there? One of the top 3 accounting firms in the world who’s desperately in need of eDiscovery assistance. Uh, you're joking, right? No, seriously, we need your help.
Maximum Page Count Helping a worldwide document hosting company manage 1.4 miles of pages.
Enterprise eDiscovery
A better way to manage eDiscovery
Leading modern enterprises use our customized eDiscovery plans to significantly minimize risk & cost.
Get access to the best technology, people & service and get covered.
eDiscovery Services from Logik
The legal ecosystem is becoming overwhelmingly complex and risky. Technology and the increase of electronically stored information (ESI) compound these risks. Gone are the days when all of your enterprise information was safely behind your firewall. Potentially discoverable information exists everywhere now. Social networks, cloud storage providers, mobile phones, tablet computers, and virtual machines are just a few places discoverable data can exist.
This creates a massive technical challenge when litigation or discovery happens. Yet many enterprises are still relying on their lawyers to take on this technical complexity. This is not what they went to school for.
Logik provides a better approach to help you manage the technical and legal challenges of eDiscovery effectively and consistently. We created technology and processes specifically for these eDiscovery challenges. This is what we went to school for. We’ve been doing eDiscovery and only eDiscovery since 2004.
Get Logik working for you. Speak with an eDiscovery expert now and see massive gains in your eDiscovery efficiency.
Featured Benefits
Consistent, High Quality eDiscovery. Fewer Interruptions.
- Partner with a leader
Prepare for eDiscovery by establishing a relationship with a proven provider before the need arises.
- Predictability, across the board
You get predictable service, product, and pricing without any surprises.
- Use the best tools
We engineer our own tools to a higher standard that outperforms other commercial packages. At your level of risk, there is no room for error.
- Tools your internal and external stakeholders know
Streamline the path of internal documents to outside counsel for review, with no new training required.
Reduce the Costs of Responding to Discovery
- Make business sense, be the hero
Let us take care of the capital expenditures for eDiscovery, so you can focus on building your strategy and not your IT infrastructure.
- The power to reduce and reuse
Advanced data culling techniques can reduce data volume by 80% or more BEFORE it gets to the attorneys for review. Take it a step further and use Logik as your eDiscovery document repository, so you never have to pay for processing the same document ever again.
- Focus on your core business
Reduce the number of teams, tools and skills you need to support your complex legal process. We’ll take care of the rapidly changing technology needed to make your eDiscovery projects successful.
Acquire Domain Expertise Right Out of the Box
- Veteran players
Our project managers have worked on everything from some of the largest and most complex cases imaginable down to the small and straightforward. We've seen it all, so you don't have to.
- Stay ahead of the eDiscovery curve
Leverage our eDiscovery intelligence anytime. We live and breathe this stuff and we’re always watching out for the next data source or document format that our clients should look out for.
- Get access to software that outperforms the rest
GridLogik, our eDiscovery processing engine, is the result of years of eDiscovery research and development to help our clients increase the accuracy of their data.
Engage Logik
- Enterprise eDiscovery Service Agreements
Create the plan that’s right for you and get a 100% predictable eDiscovery budget. No more unexpected legal budget surprises. Service, product, and price. It’s all controlled by you.
- Matter-based Service Agreements
Get all the eDiscovery services and prices ironed out before the next matter hits with pre-negotiated service agreements. Even though the next eDiscovery matter is unpredictable, your eDiscovery partner doesn’t have to be.
How To Be An Amazing eDiscovery Team Member
You’re sitting at your desk all day. Work is piling up. Pressure is piling up. And yet, somehow, you’re supposed to be a great eDiscovery team member. What do you do?
Take Initiative
For starters, even if your direct supervisor isn’t in the office or lives across the country, you never know who you could impress. When you’re going through the eDiscovery process, look for value. Take charge of your role and own it. You might feel like you’re the unimportant litigation support on the bottom of the totem pole, but if you show that your work is amazing and your opinions are supported, you won’t stay that way for long.
Create Efficient Workflow
The nonstop work of a law firm can sometimes clash with vendors, who don’t always have to work the long hours that lawyers put in each day. To prevent a mad rush in the end, ask good questions at the start of the eDiscovery process and create an efficient work strategy that will help everyone coordinate their role effectively.
Communicate Timelines
When work is crazy, you aren’t always thinking about getting data out the door to the vendor to process it at the beginning of the week. Sometimes, Friday at 5 p.m. rolls around, and you have just enough time to send it to the vendor before their workday ends. However, this can pose a problem for the vendor, who might not be ready for it at the end of the work week.
The best way to avoid this problem is to communicate your timeline with the vendor. Shoot them a quick email telling them what’s going on and when to be ready for the data. Even though they’re at the end of the command chain, they’re the ones that need to turn it around the quickest for you so you can still meet your deadline. By creating a project road map, you’ll allow them to prepare for what’s coming, where it’s coming from and when to expect it. This enables them to do a better job on your data, giving you a better and more useful product in the end.
Form Good Partnerships
Establishing peer trust and partnership with a vendor is a great way to encourage good work all around. Feeling good about who you work with translates to better work and happier people. It’s kind of like karma—if you put out good vibes and maintain a good working relationship with everyone, then good things will happen.
The next time you feel like you’re drowning in papers and arguments, or that nothing you do makes a difference, revisit these concepts and get inspired.
Originally published on Discovery Brain.
Three Resources A Discovery Attorney Can’t Live Without
The smartest move a busy discovery attorney can make is to use available resources wisely. But given the amount of resources available, figuring out which ones to focus on can be difficult. So, for clarity’s sake:
1. IT support
The world of discovery now consists predominantly of eDiscovery. Time is money, and with review attorneys and court-mandated deadlines lined up, it’s important that the process goes smoothly. The various databases, networks and applications required to manage the discovery process will require fast help if something doesn’t hum the way it’s supposed to. Having good IT support comes in handy when that happens, allowing you to meet your deadlines in spite of any technical problems that may pop up.
2. Litigation support staff
Consider this: today’s discovery data sets often involve millions of pages of content. If you have a week left on a deadline and 2 million pages to cover, how many sets of eyes should that be split out to? Make sure your lit support staff is the right size and is ready for the oncoming work.
3. Smartphone
You can’t (or shouldn’t) always live in your office. The space at work is really too small in most cases for an air mattress, and personal living space is a good thing. But within the context of litigation, important developments don’t always occur in the 7AM to 9PM window. That’s where your smartphone comes in handy: you can go home at the end of the day knowing that if a crucially important development happens, there’s an easy way to reach you via several different communication methods, all located in one handy device.
These resources may seem deceptively simple. But when they’re used in a smart, intentional way, they can propel a team forward to do the best work possible, making every discovery attorney’s life easier and more productive.
Originally published on Discovery Brain.
Tape Backup is Dead! Long Live Backup-to-Disk!
Disk backups are a lot more convenient than tape, since restoring all kinds of data can be completed much faster. What’s more, all of the latest deduplication technologies can be used with backup-to-disk, so you’re not storing 5, 10, 20, 100 copies of the same data in your backup rotation. Some may say that disk is clearly the way of the future.
If only that were true. There is one area where tape is still quite competitive with disk, and that’s in truly long-term storage—as in, ten years’ worth of long term. Information retention policies can be quite conservative these days, so needing to keep ten-year-old data is not uncommon. And tape does have some advantages over disk:
- It doesn’t need power in order to keep it around.
- It doesn’t need hardware upgrades every 3 to 5 years.
- The price per gigabyte is a lot lower than disk.
- It’s shelf stable over decade time-scales.
- It allows certain specialized use cases, such as write-once media in very large sizes.
If you have to keep data around for ten years, tape looks really attractive.
That said, the way tape is used in backup rotations has changed in recent years. More and more the daily backup duties are done by disk systems, and are then cascaded to tape for long-term archive. The modern tape drives are fast enough that the only way to keep them running at full speed is by spooling multiple, parallel backup sets out of the backup-to-disk system. Restores, when they happen, are similarly fast.
How does this impact discovery, though?
In olden days, discovery could produce actual tapes full of data to be sorted through. Frequently, these were tapes copied directly from the backup rotation. More recently, these were tapes containing the already-extracted contents of a discovery request, but put on tape for ease of transport. These days, such things get shipped on actual hard drives, if not transferred over the Internet directly.
The modern backup systems with large disk systems backed by huge deduplication logic can’t produce an equivalent to the backup tape of yore. The effort of creating such is very close to the effort needed to just extract the key bits of data being discovered, and if you’re doing that, you may as well copy it to a 2TB disk and overnight it (tape would be used for spite in this case). Even if a tape is created, it’ll be on a newer tape format where extraction should be relatively straightforward.
Is eDiscovery finished with tape? Not quite, but tape will be a lot less common. When working with discoverable data going back ten years, lazy entities may just pull the tape out of archive and pass it over rather than go through the effort of pulling out just the right information. Or worse, they produce the tape because they themselves can’t use the tape and hope the discovery process can make sense of it.
But while we definitely will see a lot less of it from here on out, we’re not done with tape yet.
Originally published on Discovery Brain.
Should Computer Forensics Experts be Licensed Private Investigators?
State jurisdictions are still split on the question, but Virginia says No.
States such as South Carolina contend that the computer forensics industry attracts a flow of contenders eager to cash in on lucrative disputes; contenders that may very well be experts in pulling information from a computer, and yet might be ignorant of the ethics and standards involved in handling personal information. These jurisdictions hold that computer forensics experts should be held to the same standards (including licensing and insurance coverage) as any other investigation agent or agency. Whether the motivation here is truly privacy oriented or fiscal, in effect this adds a number of hoops to jump through. The vast majority of states (including the District of Columbia) require a state-administered license in order to become a private investigator. While the requirements for this license aren’t particularly difficult, they usually include at least a certificate in criminal justice, hours of training, and a state-administered exam. This state vetting process creates certain legal exemptions, easing access to personal information.
Be that as it may, computer forensics practices rely more upon expertise in Information Systems than anything-gumshoe. In the context of litigation, computer forensics can be held to professional standards in more ways than one… and this is the position recently taken by the state of Virginia.
Virginia recently signed into law HB 2271 (Computer and digital forensic services; exempt from regulation as private security service business). As introduced, this law:
Exempts from regulation as a private security service business any individual engaged in (i) computer or digital forensic services or in the acquisition, review, or analysis of digital or computer-based information, whether for purposes of obtaining or furnishing information for evidentiary or other purposes or for providing expert testimony before a court, or (ii) network or system vulnerability testing, including network scans and risk assessment and analysis of computers connected to a network.
This is a question that the states have been kicking around for several years now. If this takes hold it could prevent an army of eDiscovery gumshoes! Probably a good thing.
Cooperation 101
Academics have rattled on about cooperation in discovery for years now. It’s a principle that, if it works, could make life easier for counsel and cheaper for the client. If cooperation is in everyone’s best interest and Sedona principles are popular, why can’t we all just…get along?
Steven Bennett’s recent NYSBA Journal article, “How can Courts Encourage Cooperation in Discovery?” takes a run through several factors that discourage cooperation between parties and suggests several approaches that should be taken by federal courts to encourage the process. A sampling of Bennett’s suggested incentives includes:
- Requiring Competence – There have been conversations we’ve had with counsel revealing that, surprising or not, often the individual making decisions regarding data handling and processing specifications does not have a handle on their own client’s information systems and record keeping infrastructure. It’s easy to imagine that this lack of knowledge hampers conversations between opposing counsel to the same extent. Bennett recommends that the courts identify certain essential points of competence and require counsel to be up to speed in those points throughout the discovery process.
- A Settlement Privilege – Will we ever be able to get away from calculated exchanges based on game theory? Not likely, if every word or move enacted during discovery provides the basis of a further court submission. Here, Bennett recommends that courts create a limited settlement privilege applicable to certain stages of discovery negotiations (such as communications in pursuit of the finalization of search term lists).
- Supervised Negotiations – One example Bennett points out would be a mandatory mediation rule for all discovery motions.
- Giving Examples – Writing judicial opinions including positive examples of discovery negotiations done well.
- Creating the Ethos – By naming and praising attorneys in their opinions for good behavior, or by promoting awards, courts can incentivize cooperation in discovery.
One interesting note: all of Bennett’s suggestions center around action taken by the courts. The Sedona Conference is certainly the important thought leader in this arena, but in reality it’s likely to be the court and the court’s authority that has the influence to tell parties to cooperate—or else.
Words of Advice

From your Mentor… all the judges of the Seventh Circuit
To all the young Chicago attorneys out there getting ready to argue a case, want a tip or two from your judge first? Who could be a better mentor? Don’t worry, no political corruption is involved. The Seventh Circuit Bar is gearing up to deliver these helpful tidbits through their new e-mentoring project.
A product of the Association’s Young Lawyer’s Committee, the project currently offers video-based mentoring sessions with over 45 judges and senior trial lawyers from throughout the circuit. Even better, the plan is to further develop this library into a collection including every Seventh Circuit district and magistrate judge. It used to be that only a lucky few had access to the knowledge held by these jurists. It’s still a lucky few: those who are members of the Seventh Circuit Bar Association. But hey—progress!
For a teaser video visit the Circuit’s webpage at http://www.7thcircuitbar.org/
Or, for a quicker sampling, according to Circuit Judge William Bauer, “We should not be so involved with our cases and our clients that we lose our own humanity and our knowledge or belief of what’s right and what’s wrong . . . When it’s appropriate, keep your mouth shut.”
The Future of Law
Where is this all going? Nicholas Parrella ran an article this summer in the New York Bar Association’s State Bar News speaking to the future of our profession. In it, Parrella discusses New York State Bar President Stephen P. Younger (Patterson Belknap Webb & Tyler LLP) and his new “Task Force on the Future of the Legal Profession.”
These days there has been a growing undercurrent of academics, practitioners and consultants trying to predict what the legal profession will become. Consultants Arthur Greene and Sandra Boyer’s article, titled “Professional Staffing in the 21st Century,” emphasizes practice management and proposes that the continued delivery of quality legal services will require a growing emphasis on finding and retaining support staff. This emphasis includes the use of paralegals to take on the more routine legal services, reserving the complex and higher-level work for the lawyers. Other practitioners have endorsed the growing importance of “eLawyering.” That is, a web-based presence ranging from basic (websites, preferably more than informational sites) to sophisticated (web-based law firms).
Younger states that “[a]fter weathering one of the worst years in recent memory due to the economic downturn, bar leaders across New York and globally are cognizant of the need to fundamentally change the way we as attorneys do business . . .” To that end, Younger’s new task force will focus on harnessing new technologies, training new lawyers, developing the means to a work-life balance in the face of a perpetually plugged-in virtual office, and reforming law firms. Regarding the “reforming law firms” prong of the task force’s focus, the billable hour has been pegged as problematic and potentially not in the clients’ best interests. As for the technologies prong, a special subcommittee of the task force will look specifically at technologies of the future and how to harness them.
Beam me up, Scotty.
Selling, We Were Born To Do It Right, So Why Is It So Difficult?
We were born to sell, right? When you need something as a little kid you beg, plead, insist, put on a sad face and hopefully you get what you want, right? My daughter is not even four and she has mastered the art of selling. Walking by the candy store she’ll say, “Daddy, can we get some candy?” My first response is a quick “No, keep walking.” What does she do? She starts her sales process with her body language. She slowly drops her head all-the-while looking at me out of the corner of her eye. She slumps her shoulders, arms hang loose, and BAM! Dad is sold right into the candy store without her uttering a word. Selling is an innate skill we all possess to some degree, and there are countless ways to approach it.
Most people are selling to their parents, family, and friends their whole lives. To play sports, to hang out with certain kids, to work, not to work, college choices, career choices, dating the right person. It’s endless. Selling is an inherent process that all people are involved in every day. Yet, why are a lot of people in the sales profession so bad at sales?
Let’s look at the eDiscovery industry. The average customer in this market gets bombarded with cold calls, spam, mailed literature, knocks on the door…and did I say cold calls? Sitting with one of my clients for the course of an hour would make your head spin. The phone is ringing off of the hook. Why? Because they don’t answer. Because they simply don’t have the time to answer all the calls they get.
Now this can be viewed as a good or bad problem to have. Good because they have the pick of the litter for services and product offerings, and bad because it is very difficult to manage the overwhelming need for salespeople to connect with them. Without products and services, most of us would be out of a job or unable to do our jobs effectively. What should be discussed is approach. Approach is key to successful selling and getting a customer to consider you and your company’s products or services. It is those with a bad approach who sour our customers and potential customers, and ruin it for the rest of us.
You hear the horror stories all the time. “Johnny the sales guy at XYZ vendor sneaked into our office and was handing out business cards.” The one I like the best happened to me when a lady called for our CEO. She said “Hi this is Amy, I recently spoke to your CEO’s assistant and she told me to call for him today at 10:00 AM.” Well, the funny part is, our CEO doesn’t even have an assistant. It was an old school tactic for getting to the decision maker. Rather than playing games that turn off prospective clients, use a professional approach to selling and try to always keep it that way.
Let’s take a look at some very common methods for gaining the attention of a prospect and ask ourselves, do they work or not:
Cold Calling, does it work? I have one question, do you like getting cold called? No you say? Okay next…
Cold Emailing, does it work? Sure. While probably not the most successful, it at least gives someone the chance to respond to you on their terms, whereas cold calling does not. The cold call is an unplanned, unexpected interruption to one’s day. Advice for cold emailing: keep it short, sweet, and to the point.
Networking, does it work? Absolutely. Networking is one of the single most important reasons that sales people are successful. Unless you have the most amazing new product (say, Viagra) and an insane amount of demand (tons of men over 50), you are going to have to get out there and really work at selling your product or service. Networking is a great medium to meet potential and current clients and give your elevator pitch in the hopes of a follow-up meeting. Networking also leads to referrals, and any successful salesperson will tell you that they get a lot of really good referrals.
Put yourself in the customer’s shoes. When someone calls you out of the blue trying to peddle you something, how do you feel? What are your actions? How would you like to be approached? When you have the answers to those questions, you probably have an idea of what type of selling you should be doing and what will be successful for you and your product or service.
Feel free to post your sales highlights, horror stories, or best practices so others can enhance that which we all must do in one capacity or another—SELL!
Good luck, have fun selling, and don’t forget what you can learn from a toddler!
Polish up your scripts with Optparse

If you’ve ever written an especially useful or popular script, you’ve noticed that features tend to creep into the codebase as you encounter variations in the input. As code evolves to handle more and more variation, you may notice that distinct ‘modes’ of operation arise. One way to accommodate these different modes is to use values hard-coded into the source. Values such as field delimiters, input paths, recursive operation and output paths are often wired directly into the operation of quickly-written scripts.
If you’re the only one that runs your code, this solution may be perfectly workable. However, distributing scripts to colleagues or clients, or trying to plug them into a larger framework, will quickly reveal its shortcomings. Users may have a difficult time determining which values to change, or inadvertently introduce errors. So, this approach is cumbersome at its best and error-prone at its worst.
In this post, we’ll discuss a module in the Python standard library called ‘optparse’.
Optparse makes your life easier (and your scripts more usable) by providing objects and methods that automate the process of building well-defined and documented command line interfaces or CLIs. Instantiating an object and providing a few input values is all you need to provide consistent, well-documented interfaces for any script you write.
The first step is to import the necessary objects. Optparse ships as part of the standard Python library, and imports with the following:
from optparse import OptionParser
This makes the OptionParser object available to the script. We’ll create a new instance of OptionParser by using Python’s constructor syntax:
parser = OptionParser()
Parser is now instantiated and ready to be populated with options, via calls to its add_option method. In order to properly illustrate the remaining usage, we’ll introduce a simple example and work with it for the remainder of the article.
Example: Folder size detector
Let’s assume that we want to create a script which will recursively count the number of files contained in a folder and any of its subfolders. This is fairly simple to implement using the walk generator function within Python’s os module. The following line is not directly relevant to optparse, but probably bears a more in-depth discussion:
for root, dirs, files in os.walk(dir):
Walk is a special type of Python function called a ‘generator’, which computes a small piece of a larger result and returns it in steps. In this case, the partial results are lists of subitems our script encounters as it traverses each folder in a directory structure. Use of a generator in this context is very efficient, as Python is only storing one step of the traversal at a time. Traversing a large directory (say your C:\ drive, for instance) without generators would consume a very large amount of memory.
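To make the step-by-step behaviour concrete, here is a minimal sketch of a hand-written generator. It is not part of the original script, and the names count_up_to and limit are made up purely for illustration. Each value is produced only when it is asked for, which is the same pattern os.walk follows with folders.

```python
def count_up_to(limit):
    """Yield the integers 1..limit one at a time instead of building a list."""
    current = 1
    while current <= limit:
        yield current      # execution pauses here until the next value is requested
        current += 1


gen = count_up_to(3)
first = next(gen)          # runs the body only as far as the first yield
rest = list(gen)           # drains whatever values are left
```

Nothing beyond the current value is held in memory at any point, which is why the same idea scales to very large directory trees.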
Walk makes it easy to determine file counts across all subfolders, because filenames are returned as a Python list. We can use the len() function to get the length of the list, thus obtaining our file count. Since the for loop is executing in steps, it is necessary to declare a variable outside of the loop to hold our result. The completed syntax is as follows:
subFiles = 0
for root, dirs, files in os.walk(dir):
    subFiles += len(files)
The for loop will iterate until it has traversed ‘dir’ and all its subdirectories. As it visits each folder, it will add that folder’s file count to the running total.
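If you would like to try the loop without pointing it at a real folder, the sketch below wraps it in a function and runs it against a throwaway directory tree. The function name count_files and the file names are assumptions made for this example only; they do not come from the finished script.

```python
import os
import tempfile


def count_files(top):
    """Return the number of files under `top`, including all subfolders."""
    total = 0
    for root, dirs, files in os.walk(top):
        total += len(files)
    return total


# Build a small temporary tree: two files at the top level, one in a subfolder.
with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "sub"))
    for rel in ("a.txt", "b.txt", os.path.join("sub", "c.txt")):
        with open(os.path.join(top, rel), "w") as handle:
            handle.write("x")
    total = count_files(top)
```

Using a temporary directory keeps the example repeatable, since the result does not depend on whatever happens to be on your disk.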
Back to Optparse
This script is functional, and potentially useful in a few different scenarios. However, it only gives a summary file count for the top-most directory. We could trivially modify the script and add the ability to print out file counts for all subfolders within the tree. Using optparse allows us to easily add a command line interface which preserves both modes.
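As a rough idea of what that second mode might look like, the sketch below records a count for every folder instead of a single total. The helper name count_files_per_folder and the dictionary it returns are assumptions for illustration, not the approach the finished script necessarily takes.

```python
import os
import tempfile


def count_files_per_folder(top):
    """Map each folder under `top` (inclusive) to the number of files directly in it."""
    counts = {}
    for root, dirs, files in os.walk(top):
        # Store paths relative to `top`, so the top folder itself appears as "."
        counts[os.path.relpath(root, top)] = len(files)
    return counts


# Temporary tree: one file at the top level, two in a subfolder.
with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt"), os.path.join("sub", "c.txt")):
        with open(os.path.join(top, rel), "w") as handle:
            handle.write("x")
    per_folder = count_files_per_folder(top)
    grand_total = sum(per_folder.values())
```

Summing the per-folder values gives back the summary count, so one traversal can serve both modes.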
The following paragraphs will scratch the surface of the optparse module by walking through two different examples. To get the full picture, read through the documentation.
parser = OptionParser()
parser.add_option("-v", "--verbose",
                  help="Recursively print file counts for this folder and all subfolders",
                  action="store_true",
                  dest="verbose")
This syntax adds an option to the ‘parser’ instance of OptionParser. The first line provides a list of switches which your script will accept. These will be familiar to OSX/*Nix users as short and long options. On the next line, add_option accepts a help string which is used to provide a brief description of the option and its usage. The action value can be one of several predefined strings (in this case, store_true sets a variable to true when the switch is present). Finally, the dest argument names the attribute that will receive the value on the options object returned by parse_args.
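A quick way to see what store_true and dest do is to hand parse_args an explicit argument list instead of relying on the real command line. The sketch below is illustrative only: the helper names build_parser and parse are invented here, and default=False is an addition (without it, the attribute is None when the switch is absent).

```python
from optparse import OptionParser

def build_parser():
    parser = OptionParser()
    parser.add_option("-v", "--verbose",
                      help="Recursively print file counts for this folder and all subfolders",
                      action="store_true",
                      dest="verbose",
                      default=False)  # assumption: added so the flag is a clean boolean
    return parser

def parse(argv):
    # parse_args accepts an explicit list, which is handy for experiments
    options, args = build_parser().parse_args(argv)
    # The value is read from the attribute named by dest
    return options.verbose, args

if __name__ == "__main__":
    print(parse(["-v", "/path/to/directory"]))
    print(parse(["--verbose", "/path/to/directory"]))
    print(parse(["/path/to/directory"]))
```

Both the short and the long switch set the same attribute, and anything that is not an option is returned in the positional argument list.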
parser.add_option("-o", "--output",
                  help="Specify name of file to write summary",
                  metavar="OFILE",
                  action="store",
                  type="string",
                  dest="outFile")
This example is similar in that it allows the user to specify short and long options, a help string and an action to perform on a destination. In this case, a string is stored in a field named outFile. The concept of a metavar is also used. A metavar is the placeholder name shown in the help output for the value an option expects. It is useful when you want to show the user an intuitive name (here, OFILE) that differs from both the option names and the destination.
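The difference between dest and metavar is easiest to see side by side. In this small sketch (the build_parser helper is a name invented for illustration), the parsed value is read from the dest attribute, while the metavar only shows up in the generated help text.

```python
from optparse import OptionParser

def build_parser():
    parser = OptionParser()
    parser.add_option("-o", "--output",
                      help="Specify name of file to write summary",
                      metavar="OFILE",
                      action="store",
                      type="string",
                      dest="outFile")
    return parser

if __name__ == "__main__":
    parser = build_parser()
    options, args = parser.parse_args(["-o", "out.txt", "/path/to/directory"])
    # The value lands on the attribute named by dest, not by metavar
    print(options.outFile)
    # metavar only changes how the option is displayed in the help text
    print(parser.format_help())
```

If the option is not supplied, options.outFile is None, which is a convenient signal for falling back to standard output.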
Putting it all together
# Import the necessary objects from
# the Python standard library
from optparse import OptionParser
# os provides the walk function
import os
import sys

# Options is declared globally so that it will be available
# to the entire script without being passed around. It will
# be populated with data later
options = None

# OS walking will be wrapped into a function
def countFiles(dir, destination):
    # Declare counter to aggregate results
    subFiles = 0
    # At each step os.walk yields a string and two lists
    # current_dir -> name of directory being explored
    # dirs -> subdirectories of current_dir
    # files -> list of files in the current dir
    for current_dir, dirs, files in os.walk(dir):
        # increment the counter with the number of files
        # in the current directory
        subFiles += len(files)
        # If the 'verbose' field is true (i.e., -v or --verbose is
        # used in the CLI invocation) the script will print out
        # any intermediate directories and file counts
        if options.verbose:
            destination.write(current_dir + ": " + str(len(files)) + "\n")
    return subFiles

if __name__ == "__main__":
    # Use the constructor to create a new option parser
    parser = OptionParser()
    # Add option aliases, documentation strings and behaviors
    # This will set a field named 'verbose' to true if it is used
    # on the command line
    parser.add_option("-v", "--verbose",
                      help="Recursively print file counts for this folder and all subfolders",
                      action="store_true",
                      dest="verbose")
    parser.add_option("-o", "--output",
                      help="Specify name of file to write summary",
                      metavar="OFILE",
                      action="store",
                      type="string",
                      dest="outFile")
    # parse_args returns two values
    # options -> object holding the state of flags from the CLI
    # args -> any positional arguments encountered after parsing options
    (options, args) = parser.parse_args()
    # The folder to explore is a required positional argument
    if len(args) < 1:
        parser.error("no directory specified")
    # Check to see if an output file was specified
    # if not, use standard out (print to the console)
    if options.outFile is not None:
        results = open(options.outFile, 'w')
    else:
        results = sys.stdout
    # Write the final result to the console or file
    results.write(args[0] + ": " + str(countFiles(args[0], results)) + "\n")
    # Close the output file if one was opened
    if results is not sys.stdout:
        results.close()
Invoking the script with the options “-o out.txt --verbose /path/to/directory” will create a text file named out.txt in the current directory which contains the results of exploring “directory” and all of its subfolders. Alternatively, using only “/path/to/directory” will calculate the partial results silently and then print the summed total to the screen. Finally, using “--help” will print a summary of the available options:
Usage: dirCounter.py [options]

Options:
  -h, --help            show this help message and exit
  -v, --verbose         Recursively print file counts for this folder and all
                        subfolders
  -o OFILE, --output=OFILE
                        Specify name of file to write summary
With a little forethought, modules like optparse make it easy to create user-friendly and self-documenting CLI scripts.
So you Want to do Business in Boston?
Never mind dropping your Rs, how’s your WISP?
And no, I don’t mean lisp. How’s your Written Information Security Plan?
Vigorous identity theft regulations introduced by the Massachusetts Office of Consumer Affairs and Business Regulation (201 CMR 17.00 et seq.) require any person or business that owns or licenses (receives, maintains, processes or accesses) personal information about a resident of the Commonwealth of Massachusetts to meet minimum standards in safeguarding that personal information, whether in paper or electronic form. Such parties must develop and implement a Written Information Security Plan to protect personal information in a manner fully consistent with industry standards and other applicable laws and regulations.
In this case “personal information” is defined as a Massachusetts resident’s first name and last name or first initial and last name in combination with any one or more of the following data elements that relate to such resident: (a) Social Security number; (b) driver’s license number or state-issued identification card number; or (c) financial account number, or credit or debit card number, with or without any required security code, access code, personal identification number or password, that would permit access to a resident’s financial account; provided, however, that “Personal information” shall not include information that is lawfully obtained from publicly available information, or from federal, state or local government records lawfully made available to the general public.
So what do you need to do? A few highlights pulled from the Regulations’ obligations include:
- Write, implement, maintain and monitor a “comprehensive written information security program”;
- Appoint an Information Security Manager or committee who will help your staff carry out the Information Security Plan and audit compliance regularly;
- Provide written notification when you know or have reason to know that there has been a security breach; and
- Dispose of personal information in a manner that precludes access to or reconstruction of the documents.
While this brings a little extra in the way of administrative oversight, it’s certainly doable... and worth it. Like Massachusetts health care reform, it’s likely a sign of things to come. Just look at the attention Facebook and Google have been enjoying, look at the fees paid to IT Security consultants, and there’s no arguing it: privacy’s stock is rising.
So check out the specific administrative, physical and electronic security measures required by the Regulations. Because if you’re not doing business in Boston now, you may be soon... and then you’ll be wicked late!
shameless promotion: Logik complies with Massachusetts’ standards
Social Media takes on SLAPP-happy litigation
Meet Justin Kurtz, an undergrad with more than 12,000 friends. Facebook friends. 12,000?! Why the popularity? Apparently Kurtz knows how to take a “SLAPP”, a Strategic Lawsuit Against Public Participation, that is. Or at least Kurtz and his 12,000 friends hope he knows how.
The story here is about a clash of two titans—the corporate or government plaintiffs willing to litigate to force a vocal critic to back down vs. the little guy channeling the power of social networks.
Mr. Kurtz states that, although he held a parking permit, a local towing company disabled his car alarm, towed his car from his apartment complex, and levied a $118 fine for the car’s release. Counsel for the towing company said the permit wasn’t visible. Either way, this kind of thing tends to irritate a car owner who feels wronged. Looking for a way to get his story out, Kurtz created a Facebook page called “Kalamazoo Residents against T&J Towing.”[1] Enter the power of social media: within two days of its creation 800 people had joined the group. It might have stopped there, but where Kurtz saw free speech, the towing company saw defamation… to the tune of $750,000 in damages.
The criticism against SLAPP suits is that often the plaintiff doesn’t necessarily even expect to win their fight in court; they expect to use their deeper pockets to pressure the defendant to clam up and avoid an expensive lawsuit. The strategy quite often works, but in this case, with the help of social media, the equation seems to have shifted. Kurtz’s collection of friends grew rapidly from 800 to over 12,000, and other social media outlets (Google Maps reviews, Yahoo! Local, and Yelp) have been enlisted with endless stories (true or otherwise) of apparent encounters with the plaintiff. Kurtz himself claims to have done nothing wrong: “The only thing I posted is what happened to me.”[2]
All this grassroots attention lends support to The Citizen Participation Act (H.R. 4364),[3] a federal bill sponsored by Representative Steve Cohen of Tennessee and Representative Charlie Gonzalez of Texas. The purpose of this proposed legislation is to enable a defendant who believes himself or herself to be the victim of a SLAPP to petition for dismissal of the suit. As enumerated by the Public Participation Project, the bill includes the following provisions:
1. Immunity for Petition Activity (for all petition activity performed without knowledge or reckless disregard of falsity);
2. Protections for Petition and Speech Activity (in connection with an issue of public interest);
3. Federal Removal Jurisdiction (in consideration of the fact that roughly half the states have no similar anti-SLAPP provisions, to allow for removal to federal jurisdiction when the defendant claims the defense of immunity under the pending Act);
4. Special Motion to Quash (to protect anonymous speech when the anonymous speaker’s personally identifying information is sought);
5. Fees and Costs (including reasonable attorney’s fees, for the party who prevails on a special motion to dismiss or quash);
6. Bankruptcy Non-Dischargeability of SLAPP and SLAPPBACK Awards (making fees awarded non-dischargeable in bankruptcy for successful SLAPP defendants and for defendants who are allowed to recover damages incurred in defending against a SLAPP); and
7. Exemptions (non-applicability of the bill to claims brought solely in the public interest or from advertising speech, to protect against abuse of the statute).[4]
[1] http://www.facebook.com/group.php?v=wall&gid=288159562692
[2] http://www.nytimes.com/2010/06/01/us/01slapp.html
[3] http://www.anti-slapp.org/?q=node/16
[4] Id.
There’s an App for That
There is soon to be an app for just about everything. Thankfully, the legal community isn’t about to be left behind. Looking for a few apps that can actually make you smarter? Here are a few in no particular order that have caught my attention over the past month.
LawStack
This stack is much more fun than the law stacks you remember from law school, and the app gives you more than simply a touch of sophistication in your iPhone. The free version comes pre-loaded with a nicely indexed collection including the US Constitution, the Federal Rules of Appellate Procedure, the Federal Rules of Bankruptcy Procedure, the Federal Rules of Civil Procedure, the Federal Rules of Criminal Procedure, and the Federal Rules of Evidence. You can add federal and local statutes to your stack via the app store, but those come along with an additional fee.
PocketJustice
Just like LawStack, this app avoids the use of the space bar. Also like LawStack it’s free! PocketJustice brings you concise abstracts of US Supreme Court decisions along with voting alignments and personal bios for all of the current and former Supreme Court Justices. The free version comes loaded with the top 100 constitutional law cases complete with media content including oral arguments. As always, you can pay for more if you want to: the current price for the entire Supreme Court constitutional law canon is $4.99.
BarMax
Then there’s the not-so-free end of the spectrum. I can’t easily think of a new law school grad who hasn’t shelled out for a complete bar review course. That hasn’t gone unnoticed. This team of Harvard lawyers means it when they say “Max.” So far as I’m aware, this app is the biggest (over a gigabyte of lectures, outlines and checklists) and the first to hit the four-figure price tag: One Thousand Dollars. Well, $999... But hey, who’s counting? Especially when the other major bar prep courses will run you four times that amount? Even still, there’s something nice about the word “free”. BarMax thought of that... and is offering a free MPRE app to get you hooked.
Should Wexis Fear?
Maybe it’s because I’m a legal research fan, maybe it’s because I like a deal, but William Manz’s recent article[1] in the New York State Bar Association Journal is pretty darn cool. The thing is, I’m not quite sure which part is the coolest.
Old, archived legal records and briefs have long been accessible only to lawyers and researchers who are willing to pay. As Manz points out, microfilm or microfiche records of New York Appellate Division cases only go back as far as the early 1970s. If you want to dig back a bit deeper, good luck! Google is working together with the Law Library Microform Consortium (LLMC) to change all that.
LLMC has been at work collecting archived materials in their physical format from law libraries. This is just the start, but it’s no small job. The New York Bar Association library worked for more than half a year, from January to August 2009, just to remove their volumes from basement stacks. From there Google has been arranging the shipment of materials to the Googleplex in Mountain View, California, where their own high-speed scanners digitize them and record their size, author and publication date. Ultimately these volumes will be freely accessible via Google Scholar, complete with Google’s user-friendly search engine functionality.
The fun doesn’t end there. Sadly less accessible to people like me, the original copies of the newly-scanned materials are being sent to Hutchinson, Kansas for permanent storage with Underground Vaults & Storage, Inc. Describing itself as “one of the most secure and elusive underground storage facilities in the world,” this secure facility uses portions of a salt mine 650 feet below the surface. We’re talking about more than 1.7 million square feet of storage space in cool, dry storage, where our legal heritage will live along with Hollywood archives and Cold War era documents. Their website boasts of “biometric scans, video cameras, redundant authorizations, steel vault doors, blind passwords, anonymous storage, restricted personnel access, infrared monitors, and more that we cannot reveal.”[2] As I understand it, access would even be a challenge for James Bond.
So should Wexis fear? Google isn’t exactly brand new to the game—Google Scholar has been gradually developing in content and functionality. But considering all the bells and whistles Westlaw and Lexis bring to the table, they’re not likely to be hurt. Those at risk, perhaps, are the collection of second-tier legal research services that cater to smaller firms and solos.
[1] William H. Manz, Recent Developments: Records and Briefs, N.Y. ST. B.A. J. 82, Feb. 2010, at 47.
[2] http://www.undergroundvaults.com/aboutus/hutchinson.cfm
Bootstrapped, Profitable, & Proud: Logik via 37signals
We were featured in 37signals’ Bootstrapped, Profitable, & Proud series this week. Here is a piece of the story…
From http://37signals.com/svn/posts/2385-bootstrapped-profitable-proud-logik
This is part of our series “Bootstrapped, Profitable, & Proud” which profiles companies that 1) have $1MM+ in revenues, 2) didn’t take VC, and 3) are profitable.
Q&A with Andy Wilson of Logik
What does your business do?
Logik helps companies find, organize, process, and make searchable terabytes of digital documents for legal discovery. I always say we sell digital aspirin to attorneys experiencing discovery migraines.

How successful is your business?
Financially, we’ve been very successful considering our size relative to the competition (most have close to or well over 100 employees, we have 16). We don’t reveal internal financials now, because 1. we are private and 2. we don’t want VCs beating down our door anymore after what happened with the 2009 Inc 500 ranking.
With that said, from 2005 to 2008 we grew revenue by 1,067% from $373,866 in 2005 to $4.4 million in 2008 with about $3 million in profit. We did that with 8 employees, a ton of servers, niche software, and 1 dog. This, minus the profit, is all public information now. We were ranked #181 overall on the 2009 Inc 500 survey and #1 for eDiscovery companies.
Getting on the Inc 500 was a great marketing tool for us, because it helped some of our more skeptical, on the fence, customers realize we were indeed legit despite our small size. Although we don’t reveal financials anymore, we have doubled our company size, moved to a new, bigger, and more open office space closer to our customers, and we are hiring for more engineers and support staff.
How did you get started?
In 2004 Sheng and I met for some quality Chinese food in Virginia to discuss what would become Logik. Prior to Logik, we were working for a small legal printing company helping to destroy rain forests. No seriously, we would print hundreds of thousands of emails to paper, so that massive legal teams could manually review each page. Very efficient (odd fact: I worked on the Microsoft antitrust litigation and at one point was printing out Bill Gates’ email for a few weeks. He is very long winded.) After a few years of doing this and inhaling enough toner to paint your entire house black a few times over I decided I needed out and to find a better way to solve this problem. I mean, why would you “print” electronic documents to paper? Why not just print them to PDF or TIF? Ah-ha!
So, after letting the Chinese food settle we got to work drawing out the process flow for our document processing software. We quit our jobs, cut back on expenses, leased some servers, and got to work. I have a CS background, but Sheng is the real engineer and created the first version in just a few months. We got our first real customer 9 months after we started. This is how we got started.

Logik founders Sheng Yang and Andy Wilson.
How did you fund yourself at first?
Savings and credit cards. Our total startup costs were less than $20,000. Funny enough, we are still funding ourselves in the same way, but with a lot more savings and less credit card debt.
Did you ever consider taking on any investors?
Yes. We created a presentation, met with a dozen or so well-known investors, and then decided to scrap the idea altogether. We realized that it didn’t make any sense to give up the control of our company for money. We couldn’t even figure out what we would spend the money on if we got more of it. So, we decided that, for us and for our culture, it would be best to keep growing organically. It was probably the best decision we’ve made yet.
To read the rest of the story on 37signals.com Click Here
The Perks of Flying Solo
Thinking of becoming what The American Lawyer[1] calls the Lone Wolf? Recession and all, the trend shows that many lawyers think this is the perfect time to go solo—or to go boutique. It comes down to two attractive perks: value for clients and autonomy for lawyers.
In terms of value, there’s nothing to match the experienced lawyer who decides to go small. These days even big clients are looking to capture economies in new places, and solo attorneys and start-up firms are reaping the benefits. Anchored by attorneys with field experience in larger blue-chip firms, these smaller players are becoming known for delivering conventional big-law quality coupled with unconventional flexibility in terms of billing. Smaller office space (with freedom from long-term lease agreements), pared-down staffing (making use of contract attorneys and virtual paralegals as needed), alternative billing schemes (using flat fees or bonus fees for successful outcomes in place of billable hours), streamlined legal claims and subbing in autonomy and ownership in the place of higher profits are all elements contributing to the rise of the Lone Wolf.
When it comes to flying solo, the growing numbers haven’t been missed by the American Bar Association. The ABA points out that over 30% of American attorneys are now solos, and yet fewer than 7% of these solos are members of the Association.[2] Wanting to capture that segment, the ABA will be cutting annual dues in half for solo practitioners as of September 1, 2010.
Of course not just any graduate is ready to go solo… for those new to the profession the notion of solo practice might just descend to the level of malpractice. Small firms require a full skill set ranging from legal experience to marketing abilities. But for those lawyers established enough to be able to bring clients along with them, small can be golden. Efficiency is the new black.
[1] For a full write-up on this topic see The American Lawyer, Economy Model, 57-61 (Feb. 2010).
[2] ABA Journal, ABA Halves Dues for Solos, 65 (March 2010).
ESI in ADR? It’s your call.
Recent blog posts have been popping up talking, with some alarm, about the rise of eDiscovery in ADR (Alternative Dispute Resolution). The idea seems to be that a once-friendly method for tabling business disputes is potentially being hamstrung by the encroachment of eDiscovery into the process. Granted, arbitration as an institution has developed or “matured” to such an extent that the old arguments for a faster, cheaper dispute resolution process often don’t ring true. But should the legal and business communities be alarmed?
With increasing frequency it’s becoming the reality that if you want to consider evidence at all, you’ll be considering electronic data. The argument goes that eDiscovery is today simply discovery… with an “e” appended to the front. This is easy to see when you look at the growing numbers of businesses of all sizes storing their records and communications primarily or exclusively as ESI (Electronically Stored Information). Think E-mail and spreadsheets. Enough said.
Without a doubt, ESI will be a growing presence in ADR. This is a natural and necessary progression in the history of arbitration (along with various other dispute resolution methods), and without this development arbitration would become somewhat archaic and ultimately hobbled.
Does this mean that the flexibility and speed associated with ADR are dead? No, it does not. Party autonomy, the ability to affect the scope and dimensions of your dispute resolution agreement before you sign it, will remain a fundamental element of ADR. Such autonomy is only lost when you’re dealing with unbalanced parties, such as when Party A says to Party B “if you want to do business with me, you’ll sign this dispute resolution agreement.” These scenarios are common, but in such scenarios Party A has always had the power to call the shots, including the power to choose traditional litigation with everything that entails. Whenever two companies deal with each other as equal Parties they maintain the ability to design a dispute resolution clause with an array of options. Those options should certainly include whether ESI will be handled, and how. This requires direct discussions between the Parties covering what is necessary or reasonable and in what circumstances.
An extremely common practice found in dispute resolution clauses is the rote adoption of the sets of rules developed by one or another of several dispute resolution or arbitration organizations. These include, among others, the American Arbitration Association, the Chartered Institute of Arbitrators, JAMS, the International Chamber of Commerce, and the London Court of International Arbitration. Fortunately these institutions have been developing their own guidelines and protocols for addressing eDiscovery in the ADR context, and many of these protocols reflect an approach to eDiscovery that is more constrained and streamlined than that found in traditional American civil litigation. As an example, take a look at the following clause entitled “Electronic Documents” taken from the ICDR (International Centre for Dispute Resolution) Guidelines for Arbitrators Concerning Exchanges of Information:
When documents to be exchanged are maintained in electronic form, the party in possession of such documents may make them available in the form (which may be paper copies) most convenient and economical for it, unless the Tribunal determines, on application and for good cause, that there is a compelling need for access to the documents in a different form. Requests for documents maintained in electronic form should be narrowly focused and structured to make searching for them as economical as possible. The Tribunal may direct testing or other means of focusing and limiting any search.[1]
[1] Taken from http://www.adr.org/si.asp?id=5288
Image Borrowed from: http://www.hobokenattorney.com/lawyer-attorney-1131734.html
ASAP Ale - World’s 1st eDiscovery Beer
Stressful, crazy day? Our beer makes it all ok! Introducing “ASAP Ale,” the world’s first eDiscovery beer. With the success of Redaction, which was our entrée into the world of hand-crafted libations, we figured making beer was the next logikal step for an eDiscovery vendor…naturally. A nice, cold one after an especially grueling day in eDiscovery can definitely put a person at ease, which is why a nicely stocked fridge at all times is a must.
The way we see it, we wouldn’t buy a 3rd party eDiscovery software to ease our client’s processing problems (HELLO Gridlogik!) so why should we drink someone else’s beer when we can make our own? ASAP Ale is an Imperial Ale weighing in at 8% alcohol, so it’s not for the faint of heart. It will be bottled in the middle of May with our fancy label slapped on it. Unlike Redaction’s monthly giveaway, the only way to get this prize is by stopping by our office and saying hello. We take our jobs and our clients very seriously, but we also like to kick back with a well-deserved cold one after a hard drive’s work (snort snort.) So don’t be shy, loosen that tie and come on by! 709 G Street NW, Washington DC 20001.
Capturing TIFF metadata
Image Courtesy of: http://www.cksinfo.com/clipart/construction/tools/magnifyingglasses/magnifying-glass-black-handle.png
Building from the same basic structure as the file system metadata gatherer (http://logik.com/whats_new/entry/capturing_file_system_metadata/), we can incorporate functionality to pull information from within the file. Once documents have been reviewed and produced, it is very common for them to be converted from their native or ‘dynamic’ form into a more static page-oriented form such as a TIF image. When the number of pages in a production approaches the millions, it becomes impossible to check every file by hand for small details like compression, page orientation and resolution. Using the ‘for’ loop from the previous example and incorporating a third-party library will make it possible to quickly generate a useful summary of all TIF images in a folder.
PIL
The Python Imaging Library (PIL - http://www.pythonware.com/products/pil/index.htm) is a general purpose image manipulation library for Python. It has classes and methods to parse, load and manipulate images in several different formats. We will demonstrate a very small part of the overall functionality, and it is well worth glancing through the documentation to figure out what else is possible.
All of the required files are bundled into a Windows installer which can be obtained from the Pythonware website (http://www.pythonware.com/products/pil/index.htm). Be sure to download the version that is appropriate for your particular installation of Python. Once the installer has completed, we’re able to import the Image module with
from PIL import Image
We’ll modify the Glob loop so that it will only grab files with a TIF extension and then use that list as the variable in the for loop. Note the slight difference between this example and the prior one (you’ll probably find yourself repeating or reusing patterns from time to time).
fileList = glob.glob(base_path + "*.tif")
Then, we can use the Image module’s open function, which takes a path and creates an in-memory representation of the image stored at that path.
im = Image.open(file)
If nothing’s gone wrong, the only thing left to do is access properties and methods of the im variable to get a summary of properties for every TIF image in the folder. In this case, we’ll be interested in the compression, resolution, orientation and the number of pages. PIL has built-in properties to handle most of these pieces of information.
| Field | Information/Format |
| --- | --- |
| im.format | Returns a string containing the format of the current image, e.g. “TIFF” |
| im.size | Returns the dimensions of the image as an ordered pair of pixel counts (width, height) |
| im.info | Returns a dictionary with different fields depending on the image |
Page Count
PIL does not have a built-in method or property for counting the number of pages in a file, so we’ll have to define our own. First, we’ll take a brief detour into a general programming topic called “Exception Handling.” The Image class in PIL has a method called seek() which accepts an integer as an argument and attempts to open that page. Trying to seek to page 35 in a one page document will cause the script to enter a special state known as an exception.
Look before you leap
Exceptions occur when programs do something that is unexpected or undefined. For instance, many languages raise a “divide by zero” exception when code attempts to divide a number by zero. Exceptions are different from program crashes in that code which is likely to raise an exception can be wrapped inside special blocks of code which will try to perform the operation, detect an exception if it occurs and then execute cleanup code in order to allow the program to keep executing without crashing. In Python, this special code is known as a try/except block.
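As a minimal sketch of a try/except block, here is the divide by zero case mentioned above. The function name safe_divide and the choice to return None on failure are illustrative assumptions, not part of the TIFF script.

```python
def safe_divide(numerator, denominator):
    try:
        # This line raises ZeroDivisionError when denominator is 0
        return numerator / denominator
    except ZeroDivisionError:
        # Recovery code runs instead of the program crashing
        return None

if __name__ == "__main__":
    print(safe_divide(10, 2))
    print(safe_divide(10, 0))
```

The second call would normally stop the program. Because the division is wrapped in a try/except block, the function returns a fallback value and execution continues.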
Since we have no way of determining where a particular document ends, we can take advantage of the fact that seek throws an exception. Essentially, we’ll just keep trying to move to the next page until an exception is raised. The following function keeps track of the number of pages successfully accessed with a counter variable.
def tifPageCount(tif):
    pageCount = 1
    try:
        while(1):
            tif.seek(pageCount)
            pageCount += 1
    except EOFError:
        pass
    return pageCount
Putting it all together
Here’s the full working code:
# imports functionality to enumerate files
import glob
# imports functionality to build file paths
import os
# imports functionality to read command line arguments
import sys
# imports the PIL module used to open images
from PIL import Image

# Using a provided Image object, continually seek to the next page until
# an EOFError is raised. Keep track of the successfully encountered
# pages with a counter variable
def tifPageCount(tif):
    # The first page is already loaded when the image is opened
    pageCount = 1
    # Code in this block will execute until the end of the image file
    # is reached
    try:
        while True:
            tif.seek(pageCount)
            pageCount += 1
    except EOFError:
        pass
    # try/except has completed, return the count
    return pageCount

if __name__ == "__main__":
    base_path = sys.argv[1]
    # The glob module will find all files on a certain path
    # which match the pattern provided (in this case we'll use
    # *.tif to match only tif images)
    fileList = glob.glob(os.path.join(base_path, "*.tif"))
    # Store the delimiter in this variable for convenience
    d = "|"
    # iterate over the list of tiff files
    for file in fileList:
        # create an image object
        im = Image.open(file)
        # pull relevant information out of the image object
        imgFmt = str(im.format)
        imgSize = str(im.size)
        imgInfo = str(im.info)
        # Call the page counting function
        numPages = str(tifPageCount(im))
        # print one delimited line per file
        print(file + d + imgFmt + d + imgSize + d + imgInfo + d + numPages)
Running this code in a folder of single-page TIFFs yields the following results:
ABC0131816.tif|TIFF|(2550, 3300)|{'compression': 'group4', 'dpi': (300, 300)}|1
ABC0131817.tif|TIFF|(2550, 3300)|{'compression': 'group4', 'dpi': (300, 300)}|1
ABC0131818.tif|TIFF|(2550, 3300)|{'compression': 'group4', 'dpi': (300, 300)}|1
ABC0131819.tif|TIFF|(2550, 3300)|{'compression': 'group4', 'dpi': (300, 300)}|1
ABC0131820.tif|TIFF|(2550, 3300)|{'compression': 'group4', 'dpi': (300, 300)}|1
ABC0131821.tif|TIFF|(2550, 3300)|{'compression': 'group4', 'dpi': (300, 300)}|1
This information can be used to quickly identify any abnormalities with compression, resolution or page orientation. Additionally, it is useful in determining page counts within a folder. This could easily be ingested into a database or column-oriented processing program like Excel as an effective and thorough QC technique.
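As a rough sketch of that QC step done in Python rather than Excel, the snippet below reads pipe-delimited report lines and flags any image whose resolution differs from the expected one. The function name find_abnormal and the second report line are made up for illustration, and the sketch assumes the info column is a plain dictionary literal like the sample output above.

```python
import ast
import csv
import io

# Report lines in the layout the script prints. The first is taken from the
# sample output; the second is invented to give the check something to find.
report = io.StringIO(
    "ABC0131816.tif|TIFF|(2550, 3300)|{'compression': 'group4', 'dpi': (300, 300)}|1\n"
    "ABC0131817.tif|TIFF|(1700, 2200)|{'compression': 'group4', 'dpi': (200, 200)}|2\n"
)

def find_abnormal(lines, expected_dpi=(300, 300)):
    """Return (filename, dpi, pages) for rows whose dpi differs from expected_dpi."""
    flagged = []
    for name, fmt, size, info, pages in csv.reader(lines, delimiter="|"):
        # info is the printed form of a dict, so literal_eval can rebuild it
        info_dict = ast.literal_eval(info)
        if info_dict.get("dpi") != expected_dpi:
            flagged.append((name, info_dict.get("dpi"), int(pages)))
    return flagged

flagged = find_abnormal(report)
print(flagged)
```

The same loop could just as easily compare the compression value or the page count against whatever the production specification calls for.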
Capturing File System Metadata
This script will be a little shorter than some of the previous examples. However, it represents a fairly common use case within the field of eDiscovery. As data moves from party to party in the collection/preservation stage of a matter, related files are often lumped into folders according to organizational need. Summaries of the information in these folders are often crucial to everything from formulating a review strategy to determining timelines. In this post, we’ll look at a technique for capturing file system metadata and collecting it for reporting purposes.
Glob
There are many different ways to traverse the file system with Python. One of the simplest methods uses the Glob module (http://docs.python.org/library/glob.html). The somewhat unintuitive name is a throwback to early Unix days, and refers to the process of finding all strings that match a particular pattern. To make Glob available to your script, it is first necessary to add the appropriate import statement to the top of the source file.
import glob
Then, it’s possible to make calls to the glob method within the glob module (just go with it) in order to start building lists of filenames. Once files are stored in lists, we’re free to start capturing information we care about. In order to build the list, we’ll use the following syntax:
fileList = glob.glob(base_path + "*")
Notice that we are supplying a “*” wildcard along with a base path. This causes the script to navigate to the folder we specify and make a list of any filenames that fit the pattern. This pattern matches everything, but we could just as easily use stricter conditions such as “*.tif” (all TIFF images) or “*\\natives\\*” (only files in a natives folder), depending on our specific task. Note that Glob can only operate on file or folder names, and it will only return results for the current folder, not its subfolders.
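To make the pattern behavior concrete, here is a small self-contained sketch that builds a throwaway folder (all file and folder names are invented) and runs a few patterns against it. The last call uses the recursive option that newer Python versions (3.5 and later) add to glob, which is one way to reach into subfolders if that is needed.

```python
import glob
import os
import tempfile

# Build a temporary folder so the example does not depend on any real path
with tempfile.TemporaryDirectory() as base:
    os.mkdir(os.path.join(base, "natives"))
    for name in ("a.tif", "b.tif", "notes.txt", os.path.join("natives", "c.tif")):
        open(os.path.join(base, name), "w").close()

    # "*" matches every entry (files and folders) directly inside base
    everything = sorted(os.path.basename(p)
                        for p in glob.glob(os.path.join(base, "*")))
    # "*.tif" matches only TIFF files directly inside base
    tifs = sorted(os.path.basename(p)
                  for p in glob.glob(os.path.join(base, "*.tif")))
    # "**" with recursive=True also descends into subfolders
    all_tifs = glob.glob(os.path.join(base, "**", "*.tif"), recursive=True)

print(everything)
print(tifs)
print(len(all_tifs))
```

Note that the plain "*.tif" pattern does not see c.tif inside the natives folder, while the recursive pattern does.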
Stat
The stat module (http://docs.python.org/library/stat.html) is named after yet another throwback to Unix. It is the name of a system call which was used to retrieve very detailed information about files in a file system. We will use a subset of its capability to capture MAC (modified, accessed, created) times and size for every file in a directory. The stat function itself lives in the os module, which must be imported with:
import os
This line allows calls to the stat function like the following:
fs_metadata = os.stat(path_to_file)
fs_metadata receives the result of the stat function, which consists of several pieces of metadata from the file whose full path is supplied as an argument. For purposes of this demonstration, we will access and save the size in bytes and the various times associated with each file in a folder. Once the assignment has occurred, it is possible to access various fields of information using the dot notation. For instance, accessing the file’s size is accomplished by accessing the “st_size” field.
sizeInBytes = fs_metadata.st_size
This will save a non-negative integer for later, when we print to a summary. Accessing the file’s MAC times is similar, as we can see from the example of accessing modified time (created and accessed will be demonstrated in the full source listing at the end of the article).
modTime = fs_metadata.st_mtime
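The fields above can be tried out on a throwaway file. This is only a sketch: the temporary file stands in for a real document, and the comment on st_ctime reflects the usual platform difference (creation time on Windows, last metadata change on Unix-like systems).

```python
import os
import tempfile

# Create a small temporary file so the example is self-contained
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    path_to_file = tmp.name

fs_metadata = os.stat(path_to_file)
sizeInBytes = fs_metadata.st_size   # 5 bytes were written above
modTime = fs_metadata.st_mtime      # last modification, as a float timestamp
accTime = fs_metadata.st_atime      # last access
creTime = fs_metadata.st_ctime      # creation on Windows, metadata change on Unix
print(sizeInBytes, modTime, accTime, creTime)

# Clean up the temporary file
os.remove(path_to_file)
```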
Working with Timestamps
There is one final loose end to tie up before the report will be satisfactory. Times reported by the stat module are stored internally as timestamps, or the number of seconds that have elapsed since a specific date. If we were to print any of the MAC times without modification, they would look something like “1258124917.17”. While this is perfectly suitable for sorting or comparison, it’s not very intuitive for human consumption. Fortunately, it’s fairly easy to implement a function which takes floating point numbers and converts them to a wide variety of date strings. Indeed, Python has date, time and datetime classes which split these entities into accessible fields and provide many methods for manipulating them. For brevity and simplicity, we will convert our MAC times to the ISO 8601 combined date and time format (http://en.wikipedia.org/wiki/ISO_8601#Combined_date_and_time_representations). This captures both the date and time and combines them into a string which will sort correctly in Excel.
After importing the datetime class from the datetime module, we can write a function which converts a floating point number into a datetime object and calls its isoformat() method.
def floatToTime(timestamp):
    return datetime.fromtimestamp(timestamp).isoformat()
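The claim that ISO strings sort correctly can be checked with a short sketch. The three timestamps are arbitrary values picked for illustration; the point is that sorting the converted strings as plain text gives the same order as sorting the original numbers.

```python
from datetime import datetime

def floatToTime(timestamp):
    return datetime.fromtimestamp(timestamp).isoformat()

# Three arbitrary timestamps, deliberately out of order
stamps = [1258124917.17, 946684800.0, 1000000000.0]
as_text = [floatToTime(t) for t in stamps]

# Sort the strings as text, and separately sort the numbers then convert
sorted_as_text = sorted(as_text)
sorted_by_number = [floatToTime(t) for t in sorted(stamps)]
print(sorted_as_text == sorted_by_number)
```

This works because the ISO layout puts the largest unit (the year) first and pads every field to a fixed width.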
Putting it all together
Here’s the full working code:
# imports functionality to enumerate files
import glob
# imports functionality to harvest filesystem metadata
import os
from stat import S_ISDIR
# imports functionality for converting times
from datetime import datetime
# imports functionality to read command line arguments
import sys
# Function which takes floating-point style timestamps and converts
# to an ISO-style string (YYYY-MM-DDTHH:MM:SS.Ms). These dates will
# sort properly if imported into a column-oriented data store like
# MS Excel
def floatToTime(timestamp):
    # use fromtimestamp method to convert the floating point
    # number into a ‘datetime’ object, then call that
    # object’s isoformat() method to give back a formatted
    # string
    return datetime.fromtimestamp(timestamp).isoformat()
if __name__ == "__main__":
    base_path = sys.argv[1] + "\\"
    # The glob module will find all files on a certain path
    # which match the pattern provided (in this case we’ll use
    # * to match everything)
    fileList = glob.glob(base_path + "*")
    # fileList has a list of files and folders that match the pattern
    # we will iterate over each in this for loop
    for file in fileList:
        # stat takes the full path to a file and returns an
        # object that contains many useful pieces of filesystem
        # metadata.
        fs_metadata = os.stat(file)
        # This if statement guards the print statement so that fs_metadata
        # will only be printed if the entry that we’re on is NOT a directory
        # In other words, information should only be printed out for files.
        if not S_ISDIR(fs_metadata.st_mode):
            # We’ll capture the file size by accessing a field of the
            # fs_metadata object.
            sizeInBytes = fs_metadata.st_size
            # We’ll access three fields from the fs_metadata object to
            # capture Modified, Accessed and Created times from the filenames
            # in the list
            # Note: the times are stored as a floating-point timestamp, so
            # we will use the conversion function to make it slightly more
            # human-readable
            modTime = floatToTime(fs_metadata.st_mtime)
            accTime = floatToTime(fs_metadata.st_atime)
            creTime = floatToTime(fs_metadata.st_ctime)
            # Finally, we’ll print all the values into a delimited format that
            # programs like Excel should be able to read easily
            print(file + "|" +
                  str(sizeInBytes) + "|" +
                  modTime + "|" +
                  accTime + "|" +
                  creTime)
Running this code in the “C:\Python31\DLLs” folder yields the following results:
C:\Python31\DLLs\bz2.pyd|68096|2009-08-17T17:03:50|2009-10-20T16:37:52.281250|2009-08-17T17:03:50
C:\Python31\DLLs\py.ico|19790|2007-12-06T08:47:58|2009-10-20T16:37:52.265625|2007-12-06T08:47:58
C:\Python31\DLLs\pyc.ico|19790|2007-12-06T08:47:58|2009-10-20T16:37:52.265625|2007-12-06T08:47:58
C:\Python31\DLLs\pyexpat.pyd|152576|2009-08-17T17:04:36|2009-10-20T16:37:52.281250|2009-08-17T17:04:36
C:\Python31\DLLs\select.pyd|11776|2009-08-17T17:04:46|2009-10-20T16:37:52.281250|2009-08-17T17:04:46
C:\Python31\DLLs\sqlite3.dll|302080|2009-08-13T19:57:14|2009-10-20T16:37:52.328125|2009-08-13T19:57:14
C:\Python31\DLLs\tcl85.dll|867328|2008-11-06T20:29:16|2009-10-20T16:37:52.343750|2008-11-06T20:29:16
C:\Python31\DLLs\tclpip85.dll|8192|2008-06-12T18:15:40|2009-10-20T16:37:52.343750|2008-06-12T18:15:40
This data can be redirected from the command prompt or written to a file and imported cleanly into Excel. Note that we used “|” as a delimiter, as it cannot appear in Windows path strings.
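If writing to a file directly is preferred over redirecting output, one possible approach is Python’s csv module with “|” as the delimiter. In this sketch an in-memory buffer stands in for the output file so that nothing is written to disk, and the two rows are shortened versions of the sample output above (path, size, modified time).

```python
import csv
import io

# Shortened rows based on the sample output: path, size in bytes, modified time
rows = [
    ("C:\\Python31\\DLLs\\bz2.pyd", 68096, "2009-08-17T17:03:50"),
    ("C:\\Python31\\DLLs\\py.ico", 19790, "2007-12-06T08:47:58"),
]

# io.StringIO stands in for a real file opened for writing
out = io.StringIO()
writer = csv.writer(out, delimiter="|", lineterminator="\n")
writer.writerows(rows)

report_lines = out.getvalue().splitlines()
print(report_lines)
```

A side benefit of csv.writer is that it will quote any field that happens to contain the delimiter, which plain string concatenation would not do.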
Post your Redaction taste results to Corkd
It took almost 2 years to fully taste our wine and boy were we pleased when we finally did. Redaction packs a powerful punch of sweet fruit and oaky flavors with a hint of that good ol’ fashioned Zin spiciness. At 14.5% alcohol, with deep red legs, you may want to take it easy sipping this Zin.
Some notes about the wine: the grapes we selected come from the Grist Vineyard in the Dry Creek Valley and are used in some of the highest quality brands in California. We wanted the wine to reflect our eDiscovery technology and service, unique and carefully handcrafted.
For all the people we’ve shared this wine with, we hope you enjoy it as much as we did! Please share your own experiences with Redaction on Corkd at:
http://corkd.com/wine/view/113667-2008-redaction
Uncle Sam to Project Managers: “I Want You”
Ben Bain’s article in Federal Computer Week is worthy of a read. His article highlights the Office of Inspector General’s (OIG) most recent report to Congress – a report including the ten most significant challenges faced by the National Archives and Records Administration. This top-ten list reads, for the most part, like a wish list of the skills and resources in high demand here in the world of eDiscovery:
1. Electronic Records Archives – How is the government’s document retention coming along? The OIG finds the agency’s success uncertain, stating that the system “has experienced delivery delays, budgeting problems, and contractor staffing problems.”
2. Records Management – Yet again, we’re talking about the issue of mushrooming electronic records.
3. IT Security – We’ve mentioned in this blog before that IT Security is the new goldmine in today’s and tomorrow’s economy.
4. Public Access to Records – With NARA’s role in declassification of public records, think document productions on a massive scale.
5. Storage Needs – Across the spread of federal agencies, NARA is looking to ensure compliance with record storage regulations.
6. Preservation Needs – Here we’re looking at an issue that is only getting bigger: a growing backlog of records to preserve.
7. Project Management – Technical oversight to ensure results and budget adherence is always in demand.
8. Physical and Holdings Security – Here we’re talking about the security of staff and records against natural and manmade disasters.
9. Contract Management and Administration – Teams of contractors working for NARA mean management and oversight challenges.
10. Workforce Issues – The OIG speaks here of the need to assess the agency’s needs so that they can hire and retain people with the necessary technical abilities.
The skill set sounds pretty familiar, doesn’t it?
Redaction, the world’s 1st eDiscovery wine, is here!
After almost 2 years in the making Logik Redaction is here. Redaction is the world’s 1st eDiscovery custom made wine. It’s a red zinfandel from Dry Creek Valley, CA. Weighing in at 14.5% alcohol with hints of vanilla and oak, it is a very tasty Zin (we had a wine tasting at 10am today and we’re what you call “experts”).
We made this wine for ourselves, friends, family, and our amazing clients. Tomorrow we will draw for the Redaction case giveaway, cross your fingers. If you get to taste Redaction, please let us know what you think about it. You may even find it in some local DC wine bars, so keep an eye out. Cheers!
The Cloud Computing Advancement Act?
The question: Is it just a pep talk to encourage someone else to act? Or does an actual draft of the proposed bill exist somewhere in Microsoft’s corporate legal department?
In January, Brad Smith (General Counsel of Microsoft) spoke at the Brookings Institution Policy Forum here in Washington, D.C. Mr. Smith came to Washington to speak with academics and industry leaders about something dear to Microsoft’s heart – cloud computing.[1] Urging the importance of a “safe and open cloud,” a need more recently underscored by the heavy impact of shadowy hackers on cloud computing, corporate security and international relations, Smith urged the passage of new legislation to “promote innovation, protect consumers, and provide the Executive Branch with the new tools needed for a new technology era” including the following:
- Improvements in privacy protection and data access rules to ensure users’ privacy, starting with reforming and strengthening the Electronic Communications Privacy Act to clearly define and provide stronger protections for consumers and business;
- Modernization of the Computer Fraud and Abuse Act so law enforcement has the tools it needs to go after malicious hackers and deter instances of online-based crimes;
- Truth-in-cloud-computing principles to ensure that consumers and businesses will know whether and how their information will be accessed and used by service providers and how it will be protected online;
- Pursuit of a new multilateral framework to address data access issues globally.[2]
It’s interesting to note that Mr. Smith pitched alternative approaches to regulating the industry: an industry level self-regulatory code vs. federal law administered by a federal agency (such as the F.T.C.). To me, this suggests that no bill yet exists to present to Congress. I have no doubt that when it’s ready to go it will be getting plenty of attention.
[1] The full text of Mr. Smith’s speech can be found here: http://blog.seattlepi.com/microsoft/library/20100120smithspeech.pdf
[2] Elements taken from http://www.microsoft.com/presspass/press/2010/jan10/1-20BrookingsPR.mspx.
Getting SaaSy with your Vendor
Choices, choices.. Trying to decide amongst all those competing SaaS Providers?
Today’s post is a direct hat tip to Joshua Poje, attorney and Research Specialist with the ABA’s Legal Technology Resource Center (LTRC), and coauthor of the LTRC’s legal technology blog “ABA Site-tation.” In January, Mr. Poje brought us The ABCs of Cloud-Based Practice Tools, including this list of 18 key questions to ask a SaaS Vendor before signing on the dotted line. Several of these questions apply equally to a potential IaaS Provider. It pays to ask a few questions, so go on, get SaaSy!
Drumroll, please:
- Do you offer a trial period or demo of your product?
- What training options are available for customers?
- How often are new features added to the product?
- How many attorneys are currently using your product?
- What hours is your tech support available?
- Do you offer a Service Level Agreement (SLA) and/or would you be willing to negotiate one with me?
- What types of guarantees and disclaimers of liability do you include in your Terms of Service (TOS)?
- How do you safeguard the privacy/confidentiality of stored data?
- Who has access to the firm’s data?
- Have you ever had a data breach?
- How often, and in what manner, is users’ data backed up?
- What is your company’s history – e.g., how long have you been in business, and where do you derive your funding?
- Can I remove or copy my data from your servers in a non-proprietary format?
- Where does the data reside – inside or outside of the United States?
- What happens to the firm’s data if the company fails?
- Do you require a contractual agreement for a certain length of service (e.g., 12 months, 24 months)?
- What is the pricing history for your product? That is, how often have rates been increased?
- Are there any incidental costs I should be aware of? [1]
[1] http://www.abanet.org/lpm/lpt/articles/ftr01103.shtml
Harmful if Swallowed
Have you finished digesting that data, sir?
Spoliation simply can’t get much worse than this. Following his arrest outside of a bank in Queens, New York this January, Florin Necula apparently swallowed a 4 GB Kingston flash drive in an attempt to keep Secret Service agents from discovering the evidence. Facing a charge for the use of a “skimmer” to collect ATM and credit card numbers, Necula’s bizarre version of spoliation also earned him a charge of obstruction of justice.
According to court documents, while in custody at the US Secret Service’s offices Necula grabbed the flash drive “which had been on his person at the time of his arrest, and swallowed. Doctors from the New York Downtown Hospital later removed the flash drive because they were concerned that Necula would be injured if they allowed the flash drive to remain inside him.”[1]
This procedure to “safely eject” the flash drive from Necula’s operating system took place four days after it had been ingested. It seems that the suspect was proving, shall we say, retentive.
Was your account information lodged in this man’s descending colon? I, for one, wouldn’t care to be the forensics expert to find out.
[1] Affidavit in support of search warrants, issued to the United States District Court, Eastern District of New York.
LFP…WTF?
In this post, we’ll build on the previous post’s technique of iterating through a file line-by-line. LFP files are an extremely common form of data interchange as document sets trade hands in litigation. Their popularity is probably due in part to their simplicity. As a review, LFP files are plain text files, where each record is a comma-delimited, newline terminated collection of five fields. Find more details on the file format or fields here (http://platinumlit.wik.is/%
Since the record structure is fairly simple and predictable, MS Access, Excel or SQL databases are popular choices for manipulating or exploring LFP files. These tools are certainly appropriate for the job; however, it is possible to exceed the storage capacity of Excel and even Access in certain extreme cases. At a minimum, each of these approaches requires a certain amount of overhead associated with importing the data. Python can offer a dramatic speedup for large LFP files or tasks (QC, reporting, etc.) that need to be performed repeatedly. We will work through a few such cases in the remainder of this post.
Sample Data
For the next several examples, we’ll use a small fictitious dataset comprised of the following records (if only they could all be this simple). The set consists of ten single-page TIFF images taken from three documents.
IM,ABC00001,D,0,@DEF1022104;
IM,ABC00002, ,0,@DEF1022104;DEF10221042\
IM,ABC00003, ,0,@DEF1022104;DEF10221042\
IM,ABC00004, ,0,@DEF1022104;DEF10221042\
IM,ABC00005, ,0,@DEF1022104;DEF10221042\
IM,ABC000006,D,0,@DEF1022104;
IM,ABC000007, ,0,@DEF1022104;DEF10221042\
IM,ABC000008, ,0,@DEF1022104;DEF10221042\
IM,ABC00009,D,0,@DEF1022104;
IM,ABC00010, ,0,@DEF1022104;DEF10221042\
Reporting: Document Statistics
Just as in the previous example, we’ll need to open the file for reading with the following line:
datFile = open("..\\testData\\sample.
Then, we’ll use a ‘for’ loop to iterate through each line of the file:
for line in datFile:
Finally, we’ll perform some string manipulation to transform each record into its individual fields. The rstrip method removes the newline from the end of each line, and split breaks a string into substrings, based on the supplied delimiter (a comma in this case). This is similar to Excel’s “Text to columns” function.
fields = line.rstrip("\r\n").split(",")
If line contains “IM,ABC00001,D,0,@DEF1022104;
- rstrip will remove the newline from the end of the line.
- split(“,”) will identify all commas in the string and build a list according to the delimiters
- Finally, fields will be set equal to a list containing the following fields (notice that field numbers start at 0):
| fields[0] | fields[1] | fields[2] | fields[3] | fields[4] |
| IM | ABC00001 | D | 0 | @DEF1022104;DEF10221042\0000; |
With this basic construct, we can now begin to add code to discover and track features from the data. Many features can be tracked simultaneously. For instance, it’s common to want to know how many pages and documents are represented by a particular LFP file. Page count for this data can be captured by initializing a counter variable outside of the loop and incrementing it with each line. Similarly, document count can be obtained by incrementing a counter every time a non-empty value in the third field is encountered.
numDocs = 0
numPages = 0
for line in datFile:
    fields = line.rstrip("\r\n").split(",")
    numPages += 1
    if fields[2] != " ":
        numDocs += 1
When the loop is finished iterating, numDocs and numPages should contain the appropriate values.
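To try the counting logic without a file on disk, the sketch below rebuilds the ten sample records in memory. The Bates numbers and document breaks follow the sample data, while the fifth field is replaced by a placeholder (“@VOL;PATH”) since it plays no part in the count. As a small variation, strip() is used so that an entirely empty third field would be treated the same as a single space.

```python
# Bates numbers and document breaks mirror the sample data set
bates = ["ABC00001", "ABC00002", "ABC00003", "ABC00004", "ABC00005",
         "ABC000006", "ABC000007", "ABC000008", "ABC00009", "ABC00010"]
doc_breaks = {"ABC00001", "ABC000006", "ABC00009"}

# The fifth field is a placeholder, not real volume or path data
lines = ["IM,%s,%s,0,@VOL;PATH\n" % (b, "D" if b in doc_breaks else " ")
         for b in bates]

numDocs = 0
numPages = 0
for line in lines:
    fields = line.rstrip("\r\n").split(",")
    # every record is one page
    numPages += 1
    # strip() makes both "" and " " count as "no document break"
    if fields[2].strip():
        numDocs += 1

print(numDocs, numPages)
```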
QC: Finding Abnormalities
If you look at the data, you will notice that there is one document which is seemingly named with a different convention than the others. Files starting with ABC000006 through ABC000008 are zero-padded to six places instead of five. This can be easily detected and fixed with Python.
We’ll start out by assuming that all Bates numbers in this production should have uniform prefixes and padding length. If that’s the case, then every Bates number in the file should be the same length, and adding code to detect otherwise is a simple matter, using Python’s built-in len() function.
if len(fields[1]) > 8:
    nonConformNum = fields[1]
    print(str(numPages) + ": " + nonConformNum)
This code checks for any Bates numbers that are comprised of more than eight characters (three characters of prefix plus 5 of padding). If any are encountered, the script will print the current value of numPages (which will be equivalent to the line number at any step in the loop) and the non-conforming Bates number. This is helpful, because it alerts us to the presence of non-conforming values and provides line numbers or values to aid the search. From this point, it’s only a little extra work to add code which fixes the problem.
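When the expected length is not known ahead of time, one possible variation is to let the data decide: count how often each length occurs and treat the most common one as the norm. This is a sketch of a different technique from the fixed threshold above, and it assumes that conforming numbers are in the majority.

```python
from collections import Counter

# Bates numbers from the sample data set
bates = ["ABC00001", "ABC00002", "ABC00003", "ABC00004", "ABC00005",
         "ABC000006", "ABC000007", "ABC000008", "ABC00009", "ABC00010"]

# Tally how many Bates numbers have each length
lengths = Counter(len(b) for b in bates)
# Treat the most frequent length as the expected one
expected_len = lengths.most_common(1)[0][0]

# Report (line number, value) for anything that deviates
outliers = [(n, b) for n, b in enumerate(bates, start=1)
            if len(b) != expected_len]
print(expected_len, outliers)
```

Unlike the greater-than test, this also catches numbers that are too short.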
Writing Files: Outputting the Fix
We’ve already determined a way to find non-conforming lines and established that the errors are ‘cosmetic’ and can be safely fixed without any further investigation. Since the piece of the string that we want to modify is in the middle, we can’t use simple functions such as left and right truncation available in programs like Excel and Access. We’ll need to take advantage of Python’s slicing notation, which provides a compact way of extracting a piece of a string. We’ve seen in prior examples that one element in a Python list is accessed by placing a number within [] brackets. Python also allows use of a range to return a sublist or, for strings, a substring. For example, we could isolate the prefix (the first three characters in a string of nine characters) by specifying the range 0:3.
batesPrefix = nonConformNum[0:3] #Will store ‘ABC’
We can use this same principle to capture the numerical portion of the Bates number and apply some extra commands to format it correctly at the same time.
batesNumber = nonConformNum[3:].lstrip("0").zfill(5)
As before, the compound statement to the right of the equal sign starts at the left and works to the right, one method at a time. It does the following three things inline:
| Command | Description | Data |
| nonConformNum[3:]. | Select all characters in the nonConformNum string, starting with the fourth character (index 3) | 000006 |
| lstrip("0"). | Strip all 0’s from the beginning of the resulting string | 6 |
| zfill(5) | Add the correct zero padding (five digits total) to the stripped-down version of the string | 00006 |
When all three statements have completed, batesNumber now contains the correctly padded numerical portion of the Bates number. These commands could be broken into multiple lines, but it is slightly more compact to represent them as a compound statement on one line, and we don’t need to save any of the intermediate results.
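The slice, strip and pad steps can be wrapped in a small helper and tried on a few values. The function name fix_padding and its default arguments are made up for this sketch, which assumes a three-character prefix and five digits of padding as in the sample data.

```python
# Hypothetical helper: re-pad the numeric part of a Bates number
def fix_padding(bates, prefix_len=3, digits=5):
    # everything before the numeric portion
    prefix = bates[:prefix_len]
    # drop leading zeros, then pad back out to the required width
    number = bates[prefix_len:].lstrip("0").zfill(digits)
    return prefix + number

for sample in ("ABC000006", "ABC00009", "ABC0000000"):
    print(sample, "->", fix_padding(sample))
```

The last sample shows an edge case: a number made up entirely of zeros is stripped to an empty string, and zfill pads it back to five zeros.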
All that’s left is to add code to handle the output of our corrected data. Assuming we’ll want to capture results in a new file, we’ll use a slight variation on the open statement which we’ve been using to open source data. This will need to be specified before the loop.
correctedFile = open("..\\testData\\sample_
This is almost identical to previous uses of the open command, with the exception of the ‘w’ parameter that is passed to the function. This tells Python that the file should be opened for writing. If the file does not already exist at this location, Python will create it and open it as a blank file. If it does exist, its contents will be deleted and it will be opened as a blank file. (Note: be *very* careful when opening files for writing in Python, as any pre-existing data will be LOST.) correctedFile will now be available for writing within the loop.
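For anyone nervous about clobbering an existing file, newer versions of Python (3.3 and later) offer an ‘x’ mode that refuses to open a file which already exists. The sketch below demonstrates both modes in a temporary folder; the file name corrected.lfp is invented for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "corrected.lfp")   # hypothetical name

    # "w" creates the file, or silently empties it if it already exists
    with open(target, "w") as out:
        out.write("first version\n")

    # "x" raises FileExistsError instead of overwriting
    try:
        with open(target, "x") as out:
            out.write("second version\n")
        overwritten = True
    except FileExistsError:
        overwritten = False

    # Confirm the original contents survived
    with open(target) as check:
        contents = check.read()

print(overwritten, repr(contents))
```

The with statement used here also closes each file automatically when its block ends.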
Before presenting the full code, we’ll present the join method, which saves a lot of typing if you’re outputting simple delimited records, like those found in an LFP. The syntax might look a little strange if you’re new to object-oriented programming, but it’s intuitive as long as you remember what you’re trying to accomplish.
",".join(fields)
Join takes a list as its argument and flattens it by gluing the items together, placing the string it is called on (the one in double quotes) between the fields. It takes data which once resided in compartmentalized and separate cells and flattens it into one string, with a marker to delineate the old boundaries. This is not unlike saving an Excel file to a CSV.
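A short round trip shows split and join working as opposites. The record below uses a placeholder fifth field (“@VOL;PATH”), and the second half illustrates a common stumbling block: join only accepts strings, so numbers need to be converted first.

```python
# Split a record into fields, change one, and glue it back together
record = "IM,ABC000006,D,0,@VOL;PATH"   # fifth field is a placeholder
fields = record.split(",")
fields[1] = "ABC00006"
rebuilt = ",".join(fields)
print(rebuilt)

# join raises TypeError on non-string items, so convert numbers with str()
counts = [3, 10]
summary = "|".join(str(n) for n in counts)
print(summary)
```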
Putting it all together
Here’s the full working code:
if __name__ == "__main__":
    # Open the LFP file for reading
    lfpFile = open("..\\testData\\sample.
    # Initialize counters outside of the line-by-line
    # iteration. These variables will keep track of
    # LFP features as the program steps through each line
    # of the file
    numDocs = 0
    numPages = 0
    # Variables to track QC steps while stepping through
    # the file
    nonConformingBates = 0
    correctedFile = open("..\\testData\\sample_
    print("Incorrect lines:")
    print("================")
    # Use a for loop to step through each line of the file
    for line in lfpFile:
        # this line applies two functions to the line
        # variable in order to normalize it for the remaining
        # steps. Method calls are applied left to right, in order.
        # 1) rstrip -> removes the newline character from each line
        # 2) split -> scans the string for supplied delimiter and
        # breaks it into substrings as it finds them
        fields = line.rstrip("\r\n").split(",")
        # Each line in this file corresponds to one page in the set
        numPages += 1
        # Non-empty field 2 means the start of a new document
        if fields[2] != " ":
            numDocs += 1
        # QC check to detect Bates numbers that are longer than 8
        # characters
        if len(fields[1]) > 8:
            # Assign the incorrect Bates number to a string
            nonConformNum = fields[1]
            # Print the line number and bad number for reporting
            print(str(numPages) + ": " + nonConformNum)
            # isolate the bates prefix by selecting the first three
            # characters of the sequence
            batesPrefix = nonConformNum[0:3]
            # pull out the numerical portion of the beg Bates number
            # and format it with the correct number of zeros
            batesNumber = nonConformNum[3:].lstrip("0").zfill(5)
            # Overwrite the incorrectly padded number in the field
            # list
            fields[1] = batesPrefix + batesNumber
            # use the join method to merge all fields together with commas,
            # restoring the newline that rstrip removed
            correctedFile.write(",".join(fields) + "\n")
            # back to the top of the loop
        else:
            # this case will be reached if the beg bates has the correct
            # number of characters, thus no processing is necessary
            # it can simply be copied over to the new file
            correctedFile.write(line)
            # back to the top of the loop
    # Display the final values of the variables
    print()
    print("Summary:")
    print("========")
    print("Number of Documents: " + str(numDocs) + ", Number of Pages: "
          + str(numPages))
Running this code with the sample input yields the following results:
Incorrect lines:
================
6: ABC000006
7: ABC000007
8: ABC000008
Summary:
========
Number of Documents: 3, Number of Pages: 10
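For readers who want something they can run end to end, here is a compact sketch of the same workflow packaged as a function. The function names, file names and the placeholder fifth field are all invented for illustration, the demo uses only three records, and the sketch assumes a three-character prefix with five digits of padding.

```python
import os
import tempfile

def fix_bates(bates, prefix_len=3, digits=5):
    """Re-pad the numeric part of a Bates number to `digits` places."""
    return bates[:prefix_len] + bates[prefix_len:].lstrip("0").zfill(digits)

def correct_lfp(in_path, out_path, expected_len=8):
    """Copy in_path to out_path, re-padding over-long Bates numbers.

    Returns (numDocs, numPages, list of (line number, bad Bates number)).
    """
    numDocs = 0
    numPages = 0
    bad = []
    # "with" closes both files even if an error occurs part-way through
    with open(in_path, "r") as lfpFile, open(out_path, "w") as correctedFile:
        for line in lfpFile:
            fields = line.rstrip("\r\n").split(",")
            numPages += 1
            if fields[2] != " ":
                numDocs += 1
            if len(fields[1]) > expected_len:
                bad.append((numPages, fields[1]))
                fields[1] = fix_bates(fields[1])
            # rstrip removed the newline, so put one back
            correctedFile.write(",".join(fields) + "\n")
    return numDocs, numPages, bad

# Demo on a throwaway file; "@VOL;PATH" is a placeholder fifth field
with tempfile.TemporaryDirectory() as folder:
    src = os.path.join(folder, "in.lfp")
    dst = os.path.join(folder, "out.lfp")
    with open(src, "w") as f:
        f.write("IM,ABC00005, ,0,@VOL;PATH\n")
        f.write("IM,ABC000006,D,0,@VOL;PATH\n")
        f.write("IM,ABC000007, ,0,@VOL;PATH\n")
    result = correct_lfp(src, dst)
    with open(dst) as f:
        corrected = f.read().splitlines()

print(result)
print(corrected)
```

Returning the counts and the list of bad numbers, rather than printing them inside the loop, leaves the caller free to decide how to report them.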
How to check a file for duplicate lines - part 2
This will just be a quick update to the last post. In the previous version of the duplicate record detector the input file is specified statically (or “Hard Coded”) inside the file. This means that the source code must be modified each time that users want to run analysis on a new load file.
Unlike compiled languages like C++ or Java, Python doesn’t have a lengthy build cycle associated with making changes, so editing the script isn’t too inconvenient. However, your users might not be comfortable directly modifying source code, and there’s also the potential to introduce bugs by changing the wrong line. Fortunately, Python provides a method for passing data to a program via the command line.
The sys Module
Python has a built-in module called sys for interacting with the interpreter and its runtime environment. sys is full of useful functionality, but we’ll just be using its argument passing support: command line arguments are available as the list sys.argv. Libraries are imported into Python scripts via the import statement.
Here’s the full working code:
import sys
import hashlib
import collections
# Defines a function that takes a string as its argument and returns the
# hexadecimal representation of its MD5 checksum
# In: A string
# Out: A string of hex characters corresponding to the checksum
def calculate_md5(inStr):
    #create an instance of the md5 object from
    #python’s hashlib
    md5Obj = hashlib.md5()
    #Convert the string to a series of raw bytes
    #assuming that it’s UTF-8 encoded
    md5Obj.update(bytes(inStr, 'utf8'))
    #Render the object as a hex encoded md5 hash value
    return md5Obj.hexdigest()
if __name__ == "__main__":
    #Default factory method which creates an empty
    #dictionary of lists
    lineDict = collections.defaultdict(list)
    #keep a counter variable to track which line
    #of the file we’re on
    i = 1
    #Create an iterable file object
    # use the sys library to pull arguments in from the command line
    datFile = open(sys.argv[1], 'r')
    #cycle through each line of the file
    for line in datFile:
        #Calculate the checksum of the record
        lineHash = calculate_md5(line)
        #Either create a new entry in the dictionary
        #or append to the list of lines with the same
        #check sum
        lineDict[lineHash].append(i)
        #Advance the counter to move to the next line
        i += 1
    # Finally, some code to print out the results
    #Print a title
    print("Duplicate Lines")
    #Cycle through each slot or ‘key’ in the dictionary
    for entry in lineDict:
        #If the length of the list is 2 or greater
        #print it out
        if len(lineDict[entry]) > 1:
            print(lineDict[entry])
A few notes on DOS
This code can be run from the DOS prompt with the following command:
C:\> python findDupLines.py "C:\Path to\theFile.txt"
There are a few important points to keep in mind when running Python scripts from the command prompt. The most important is that Python can find your script. The easiest way to ensure this is to change directories to the location where your script resides. In the example above, the findDupLines.py script would have to be located at the root of my C: drive. Also notice the double quotes (“) surrounding the C:\Path to\theFile.txt. These are necessary because of the space characters in the path. If any folders in your path contain spaces, double quotes are mandatory.
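Since users will now be typing the path themselves, it can help to check that an argument was actually supplied before trying to open it. The helper below is a sketch only; the function name get_input_path is invented, and the argument lists are simulated so the example runs without a command line (a real script would pass sys.argv).

```python
# Hypothetical helper: pull the load file path out of an argument list
def get_input_path(argv):
    """Return the load file path from argv, or None if it was not supplied."""
    # argv[0] is the script name, so exactly one extra item is expected
    if len(argv) != 2:
        print("Usage: python findDupLines.py <path to load file>")
        return None
    return argv[1]

# Simulated argument lists; in a real script use get_input_path(sys.argv)
missing = get_input_path(["findDupLines.py"])
supplied = get_input_path(["findDupLines.py", "C:\\Path to\\theFile.txt"])
print(missing, supplied)
```

Note that a quoted path containing spaces arrives as a single item in the list, which is why the double quotes matter at the prompt.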
The Ten Step Rain-Dance
In the legal arena, regardless of how long you’ve been in the game, it always comes down to “making it rain.”
Myra L. McKenzie, assistant general counsel in the Wal-Mart Stores, Inc. legal department, offers the following ten tips for rainmaking in her article In Order to Make Rain, You Have to Know How to Gather the Clouds: Tips for Young Lawyers on Client Development, printed in the American Bar Association Young Lawyers Division 101 Practice Series.
- 1. Do good work and always add value. [Produce a work product that is both timely and of high quality.]
- 2. Find out if you have a client development budget and use it. [A clear record of how this budget is used may lead to a larger budget.]
- 3. Be strategic. [Create and present a client development plan.]
- 4. Perfect your professional presentation. [How does your web biography look?]
- 5. Research your potential clients and their needs.
- 6. Carry business cards and use them.
- 7. Speak and speak often. [Build a reputation for competence in certain issues.]
- 8. Get active in the bar. [This can increase your visibility.]
- 9. Attend events frequented by in-house lawyers. [These are “face time” opportunities.]
- 10. Learn to “pitch.” [Practice closing the deal!]
The full text of McKenzie’s article can be found reprinted in the ABA’s periodical The Young Lawyer, Vol. 14, No. 5, p. 6 (Feb.-Mar. 2010).
Illustration from Holamun2
How to check a file for duplicate lines
In this edition of “eDiscovery-related Python Tricks,” we’ll cover some fundamental techniques and operations that you’ll likely find yourself using repeatedly. Suppose you’ve been given the task of merging load files from several productions together.
You’re fairly sure that merging several files together has left the load file with duplicate lines, but the file is large and this would be difficult to determine manually. While this example may seem a little contrived, it provides a simple setup for laying a foundation that will likely be reused when we get to more interesting examples.
Opening files
The first thing we’ll need to do is tell Python where it can find the file it will be reading. This is accomplished with the open function. In Python, open accepts a path and a few optional arguments, and returns a file object which the program can manipulate. Don’t worry too much if this terminology doesn’t make sense yet; you’ll get a feel for it. Assuming that the file resides at C:\loadFile\bigLoadFile.dat, we’d write the following code to open the file for reading:
datFile = open('C:\\loadFile\\bigLoadFile.dat', 'r')
This line looks for a file with the given name at the specified path. You’ll notice that the backslashes in the path are doubled (i.e., C:\\ instead of C:\). The “\” character is special in Python strings: it begins escape sequences such as \n (newline) and \t (tab). Writing “\\” tells Python that you mean one literal backslash, which is the folder separator in a Windows environment. The second argument, 'r', tells Python that this file will be open *only* for reading.
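If the doubled backslashes feel awkward, a raw string is another option. The sketch below is only an illustration: it shows that both spellings produce the same path, then opens a small stand-in file (created in a temporary folder so the example runs anywhere) using a with block, which closes the file automatically. The variable names are assumptions made for this example.

```python
import os
import tempfile

# An escaped path and a raw string (note the r prefix) spell the same text.
escaped = 'C:\\loadFile\\bigLoadFile.dat'
raw = r'C:\loadFile\bigLoadFile.dat'
print(escaped == raw)

# Create a tiny stand-in load file so there is something to open.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, 'bigLoadFile.dat')
with open(sample_path, 'w') as out_file:
    out_file.write('fee\nfoo\n')

# Open the file for reading; it is closed when the with block ends.
with open(sample_path, 'r') as dat_file:
    first_line = dat_file.readline()

print(repr(first_line))
```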
Iterating line-by-line
Since we’ve determined that we’re interested in finding duplicate lines, we’ll need a way to access each line within the file. Python provides a convenient method for doing this by making its file objects “iterable”. This is the computer science-y way of saying that Python can ask a file object for its next line, over and over, until the end of the file is reached (in Python 3, the built-in next() function does the asking). This can be wrapped into a construct called a “for loop”, which causes a Python program to execute a block of code once for each line. Code to access a file line by line would look like this:
for line in datFile:
“for” is a special word within Python programs which marks the beginning of the loop. “line” is an arbitrarily-chosen variable name which will hold the result of pulling successive lines out of datFile.
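Here is a small sketch of the loop in action. To keep it self-contained, an io.StringIO object stands in for the opened file (it can be looped over line by line in the same way), and enumerate supplies the line numbers that the finished script tracks with its own counter. The names here are illustrative only.

```python
import io

# A stand-in for an opened load file containing three lines.
dat_file = io.StringIO("fee\nfoo\nfoo\n")

numbered = []
# enumerate hands back (line number, line) pairs, counting from 1.
for line_number, line in enumerate(dat_file, start=1):
    numbered.append((line_number, line))

print(numbered)

# Rewind and pull a single line by hand with the built-in next().
dat_file.seek(0)
first = next(dat_file)
print(repr(first))
```

Notice that each line still carries its trailing newline character; the loop does not strip it.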
Dictionaries and MD5 Calculation
Now that we’ve set up the basic structure for looping through the entire file, it’s time to give some thought to the duplicate detection strategy. Since there is no limit to the number of records a load file can contain, nor any size limit on rows, we’ll want to come up with an efficient way to capture and represent the data. We’ll use Python’s dictionary data structure to keep track of the lines that we’ve seen so far.
lineDict = collections.defaultdict(list)
This line takes advantage of the Python collections library to create a dictionary which stores a list in each of its open slots. We will calculate an MD5 value for each record in the DAT file. For all practical purposes, identical MD5 values mean that the lines are identical (an accidental collision between two different lines is extraordinarily unlikely), so keeping track of line numbers and updating the dictionary accordingly should give a summary of identical lines. The only missing piece is a way to calculate an MD5 hash value, given a string. The following lines will be used to accomplish this:
def calculate_md5(inStr):
    md5Obj = hashlib.md5()
    md5Obj.update(bytes(inStr,'utf8'))
    return md5Obj.hexdigest()
Briefly, we’re taking advantage of another Python library to generate the hash values and returning them as a string.
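To get a feel for how the hash behaves, here is a short sketch. It restates the helper in a compact form (using snake_case names chosen for this illustration) and compares a few strings. The point to notice is that any difference at all, even a single extra space, produces a different hash.

```python
import hashlib

def calculate_md5(in_str):
    """Return the MD5 checksum of a string as 32 hex characters."""
    # encode() converts the text to raw bytes, assuming UTF-8.
    return hashlib.md5(in_str.encode('utf-8')).hexdigest()

# Identical lines always produce identical hashes.
same = calculate_md5("foo\n") == calculate_md5("foo\n")

# One extra space before the newline is enough to change the hash.
different = calculate_md5("foo\n") == calculate_md5("foo \n")

print(same)
print(different)
print(len(calculate_md5("a very long record " * 1000)))
```

The last line shows why the hash is handy as a dictionary key: the result is 32 characters long no matter how long the record is.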
Putting it all together
Here’s the full working code:
import hashlib
import collections
# Defines a function that takes a string as its argument and returns the
# hexadecimal representation of its MD5 checksum
# In: A string
# Out: A string of hex characters corresponding to the checksum
def calculate_md5(inStr):
    #create an instance of the md5 object from
    #python's hashlib
    md5Obj = hashlib.md5()
    #Convert the string to a series of raw bytes
    #assuming that it's UTF-8 encoded
    md5Obj.update(bytes(inStr,'utf8'))
    #Render the object as a hex encoded md5 hash value
    return md5Obj.hexdigest()
# Main Program starts HERE!!!!!
if __name__ == "__main__":
    #Default factory method which creates an empty
    #dictionary of lists
    lineDict = collections.defaultdict(list)
    #keep a counter variable to track which line
    #of the file we're on
    i=1
    #Create an iterable file object
    datFile = open('C:\\loadFile\\bigLoadFile.dat', 'r')
    #cycle through each line of the file
    for line in datFile:
        #Calculate the checksum of the record
        lineHash = calculate_md5(line)
        #Either create a new entry in the dictionary
        #or append to the list of lines with the same
        #check sum
        lineDict[lineHash].append(i)
        #Advance the counter to move to the next line
        i+=1
    #Close the file now that every line has been read
    datFile.close()
    # Finally, some code to print out the results
    #Print a title
    print("Duplicate Lines")
    #Cycle through each slot or 'key' in the dictionary
    for entry in lineDict:
        #If the length of the list is 2 or greater
        #print it out
        if len(lineDict[entry]) > 1:
            print(lineDict[entry])
Running this code with sample input:
1. fee
2. foo
3. foo
4. fee
5. few
6. few
7. few
8. few
9. fine
10. pine
Yields the following output:
Duplicate Lines
[5, 6, 7, 8]
[2, 3]
[1, 4]
As you can see, the script correctly captures and summarizes duplicate lines within the file. Notice that lines nine and ten do not appear in the output, because they are unique. (The order in which the groups print can vary between Python versions; since Python 3.7, dictionaries keep insertion order, so the groups print in order of first appearance.) Despite the fact that the sample input is small, the approach should scale up to very large files. Using the MD5 value instead of the actual string allows the algorithm to store a small, fixed amount of data per record regardless of its size, so it’s unlikely that you’ll hit memory limitations.
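If you’d like to reuse the technique in other scripts, one possible arrangement is to wrap it in a function. The sketch below is an illustration rather than a replacement for the script above: the function name find_duplicate_lines is made up, it accepts any iterable of lines (an opened file works, and so does a plain list), and it leans on enumerate for the line counter.

```python
import collections
import hashlib

def find_duplicate_lines(lines):
    """Group 1-based line numbers by each line's MD5; keep groups of 2 or more."""
    line_dict = collections.defaultdict(list)
    for line_number, line in enumerate(lines, start=1):
        digest = hashlib.md5(line.encode('utf-8')).hexdigest()
        # A missing key starts out as an empty list, so append always works.
        line_dict[digest].append(line_number)
    return [numbers for numbers in line_dict.values() if len(numbers) > 1]

# The ten sample lines from above, each with its trailing newline.
sample = ["fee", "foo", "foo", "fee", "few", "few", "few", "few", "fine", "pine"]
groups = find_duplicate_lines(line + "\n" for line in sample)
print(groups)
```

On Python 3.7 or later this prints [[1, 4], [2, 3], [5, 6, 7, 8]], the same groups as the output above, listed in order of first appearance.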
Again, even if you don’t ever anticipate having to perform this specific task, it provides the shell for tasks that have to be performed repeatedly. I hope that you’ve found this useful, and stay tuned for more eDiscovery-related Python Tricks.
Cloud Computing - the Winners and the Challenged
Do you hear that noise? No? That noise you may or may not hear is the sound of a quiet revolution well underway. The cloud computing revolution is bringing along with it a gradual but dramatic wave of change to the world of network infrastructure, IT servicing, and business models. This transformation speaks to different parties in different ways with the promise of efficiencies and cost containment on the one hand weighing in against security and hidden cost worries on the other hand.
By way of a definition, Gartner Inc., the Connecticut-based IT research and advisory company, offers the following five attributes of cloud computing: 1) Service-based, 2) Scalable and Elastic, 3) Shared, 4) Metered by Use, and 5) Uses Internet Technologies.
How does cloud computing affect you? Now is the time to find out and to plan your next move. Some factors to consider:
Automatic Winners:
- Data centers and cloud vendors. With the bulk of the migration to the cloud yet to come, it’s little wonder that Amazon, Microsoft, Apple, IBM and a host of others want to be in the game now. Houston, we are prepared for lift off.
- Merger & acquisition attorneys. Sure, there are plenty of small-to-medium sized cloud vendors to choose from. Yet the expected industry trajectory says get ready for a whole lot of M&A consolidation leading to cloud Goliaths.
- Early adopters of cloud technology. If you’re a start-up or a company anticipating significant IT infrastructure investments, an early migration to the cloud will allow you to bypass costly up-front hardware and IT fees.
- Efficiency hawks. An IBM white paper reports an approximate 5% utilization rate of commodity servers on average. Yes, IBM is ramping up its efforts to capture the cloud computing market, but when you consider the money lost in operational costs, server maintenance and management, you have to admit that this makes a compelling argument.
- Fans of Mobility. Do you like the idea of being able to access your software, applications and data remotely? No need to travel with your data or license multiple copies of software for home computing. Do you have a web browser? Just log in... from the moon!
- Data security consultants. Just because your data is “out there” doesn’t mean you’re willing to share it with greedy, prying eyes. Are you confused by all the options? Data security consultants certainly hope so.
- Encrypted networks are bound for growth.
- Compliance officers and attorneys. We’re back to the issue of data security. Getting it right and making sure that it stays that way just got more complicated. This will require a professional’s attention. Due diligence should be performed down the chain of providers, and contractual arrangements touching on data handling need to be in place between multiple parties.
The Challenged:
- Security (and Certainty). Where once you could sign a client’s contracts assuring a certain measure of data security and confidentiality with ease, now you are at the mercy of an outsourced support structure. Not only will you have to contend with your cloud provider’s handling of your client’s data, but you’ll also have to contend with your cloud provider’s outsourced support. Issues related to where data is located, how data is managed, and who could potentially access your data remain some of the greatest challenges to a wholesale migration to the cloud. How many borders is your data being transferred across? What are the legal implications?
- Data centers and cloud vendors. In the face of all of the cloud’s shiny promises, the key players have their work cut out for them to assuage the public’s concerns regarding data security and platform integrity.
- Infrastructure vendors will need to increasingly target the cloud vendors.
- Software vendors will need to contend with, or become, SaaS companies.
- Large corporations? An oft-cited McKinsey & Company report, “Clearing the Air on Cloud Computing” claims that large corporations could actually lose by adopting cloud computing. McKinsey claims that the efficiencies so attractive to small and medium-sized clients will pale when applied to a large corporation. The report points out that a combination of in-house virtual computing together with tax write-offs involving equipment depreciation offer greater savings. Read it yourself, but bear in mind that service offerings continue to change dynamically. Costs and savings are not static.
- In-house IT support staff. Exactly how many of these lost jobs will be replaced by new opportunities at remote datacenters? With efficiencies of scale, not enough. Especially when a significant number of those datacenters will be located off-shore.
Illustration by The Economist
We’re Hiring, Come Join Us!
Logik is growing and we are looking for more talented, smart, and amazing people to come join our small company. Learn more about the company here.
Here are the open positions:
“Data! Data! Data!” — a Posse List interview with Andy Wilson of Logik

This interview is part of the Posse List’s series “Data! Data! Data!” — Cures for a General Counsel’s ESI Nightmares”. For an introduction to the series click here.
Logik was humbled to be the first company interviewed. Below is a copy of that interview:
Start Interview
Logik is one of the more extraordinary companies to come onto the e-discovery scene. A dynamic company, they derive their name from “the formal systematic study of the principles of valid inference and correct reasoning, and; the interrelation or sequence of facts or events when seen as inevitable or predictable.” Or, in today’s parlance: they’re a “lean, mean e-discovery processing machine.”
Located in Washington, D.C., in beautiful loft space across from the National Portrait Gallery, the company counts AM Law 100 law firms and Fortune 500 corporations as its client base.
We had the opportunity to spend a few hours at Logik’s corporate headquarters talking with Andy Wilson.
TPL: Both you and your co-founder, Sheng Yang, were in school together at Virginia Tech, but you didn’t really know each other until after school, correct?
AW: Correct. I actually started college as an English major but moved to Computer Science, graduating with a degree in Business Information Technology in 2001. Sheng and I did meet briefly at Virginia Tech. We both worked at a web design company for a few months, but we didn’t really know each other.
TPL: But straight after school you went home to Kentucky and tried “the entrepreneurial thing” and did independent web design.
AW: Yep. After a year in Kentucky, I realized I wasn’t really in the right place to launch my web career, so I headed to D.C. (proceeds from my tax refund check happily in hand) and began to interview with a multitude of government agencies. The government was in a hiring frenzy with respect to tech/web people. Thing was, as I cruised the various floors of endless cubicles (think “Dilbert” here), I said to myself, “Andy, do you really want to do this?” and the answer was a clear “No.”
TPL: And an opportunity popped up at Driven?
AW: That’s right. I was hired at Driven as a “techie,” which I did for several months before moving into sales. At the time, Driven specialized in various aspects of litigation support including digital reproduction, paper discovery, scanning, copying, printing and graphic design. It was not “e-discovery” per se, but it was fascinating because we sold the accounts and the projects, and we actually did the scanning, printing, blowbacks, etc.
It was incredibly long hours, but I was in my early twenties and didn’t mind. I was able to bring in a few top AM Law 100 law firms in my first year, generating about $1 million in sales over my first 12 months. And it was at Driven that I reconnected with Sheng.
TPL: And an idea was formed ….
AW: Yep! I wanted to go into application development, to create an easy and scalable software platform to do eDiscovery processing. And I found in Sheng a kindred spirit. But, at the time, Driven did not want to become a software company, so we parted ways and Sheng and I went off to start an eDiscovery processing company.
TPL: With an idea … but no clients.
AW: Yep, no clients – we had nothing to sell yet! (laughs) Sheng and I spent about a year writing code and our business plan, living off our credit cards, my wife’s salary… and working from my dining room. In the spirit of Steve Jobs, Apple and his garage!!
TPL: And then enter … Superior Glacier, your first client.
AW: Superior Glacier is an end-to-end litigation support provider focused on marketing in New York, Chicago and Washington, D.C. They first came to us through a friend of a friend. We were expecting the usual “tell us about Logik” – you know, a simple introductory meeting. Which we did on my couch. But then they whip out this DVD and say “Well, we are having issues with this data, what can you do with it?” So, we loaded it up onto our server (and quite frankly we were a little apprehensive thinking “Man, I hope this works!”) and we ran it through the software we developed: Gridlogik™.
TPL: And, of course, Superior Glacier was more or less expecting your response to be “We’ll get back to you in a few days…”
AW: Exactly. Except our software went through its paces and produced the results they were looking for in about 30 minutes. All the files were accurately processed, converted, numbered and exported with ready to import load files.
TPL: And you blew them away.
AW: Pretty much, yes. I don’t think they were expecting two guys in a dining room to solve their eDiscovery problem, but we did.
They had been working on this data set for almost two weeks with no results. We pointed out the problems they were having and how our software identified and fixed them. After they thoroughly rummaged through the output to confirm the results, we got to talking about pricing. They were used to the industry standard, which at the time was to charge based on the number of gigabytes the data extracted to, post processing. (Note to readers: Data extraction is the process of breaking down structured and unstructured data into individual records or documents. For instance, saving attachments from emails as their own documents or extracting files from .zip files is considered data extraction. This process is time-consuming and can result in the original data set exploding in size, often doubling and sometimes tripling from the original size.)
We had then, and still do, a very simple pricing model. We built our technology in a way that allows us to price eDiscovery on the original, non-extracted data size. We engineered a data-mapping algorithm that quickly identifies all documents in a data set without actually extracting it. Basically, it’s our secret sauce. So, this attractive pricing model coupled with a new technology that was very capable got Superior Glacier thinking. They sent us our first “real” project the following week.
TPL: And they paid you $40,000?
AW: Just about, yes. Sheng and I were singing and dancing. We thought “wow, we really are onto something here!” All of our start-up costs were covered and we had enough money left over to buy some more servers.
TPL: Ah, yes, the giddy feeling from the first paying client.
AW: In fact, they flew us up to New York to discuss a potential acquisition. Then we KNEW we had something! However, we took the road less travelled, so to speak, and decided to keep Logik between Sheng and me. Superior Glacier is still a client today and we help them on a few legacy cases.
TPL: I imagine you have had a number of companies who want to license your technology (which we think can usually end badly, resulting in “brand smashing,” among other issues).
AW: There have been a few companies interested in licensing Gridlogik, but we have always turned them down. Right now we are a business-to-business services company. If we licensed our software, we lose control over technology and become only a software company. If someone else used Gridlogik and made mistakes, that would negatively affect our reputation and our brand. We always want to do it right.
TPL: So, we have a very happy Superior Glacier. And then Fried Frank comes on board.
AW: The DC litigation manager at Fried Frank had some complex processing problems that involved unified communications, specifically Bloomberg data. (Note to readers: for some background information on unified communications click here).
While analog is somewhat easy to analyze and parse, unified communication offers one enormous text file. That means you need to know how the software created the file so that you can break out the metadata, and so on. It’s much more complex. In the case of Fried Frank’s client, they had about 20 gigabytes of this stuff that needed to be reviewed. And of course, we were eager to do whatever we could to help our new client. So, we modified Gridlogik to quickly parse and piece all the data together into a reviewable format similar to what they were getting with Outlook email reviews. We finished the project in less than two days and the client was very, very happy.
TPL: And this led to Fried Frank referring Williams & Connolly, who referred you to Finnegan and Henderson, etc., etc.
AW: Exactly. We have done very little direct marketing. Almost all of our business, both for law firms and direct corporate clients, has come from referrals. Granted, it’s a somewhat slower customer acquisition process, but we find it beats cold-calling any day, and we’re fortunate to have very loyal, happy clients.
TPL: Tell us a bit about the first work for Finnegan — without mentioning the actual corporate client. You know, confidentiality and all that.
AW: Well, Finnegan had been working with a vendor (client’s choice) who had totally screwed up the data processing on a high-profile matter. The vendor had worked for months on it. There were missed deadlines, incorrect deliverables and poor communication throughout. Obviously, this would frustrate anyone, so Finnegan decided to look elsewhere for help. Logik was recommended to Finnegan by one of our clients and we ended up winning their confidence after processing some sample data. In under two months we re-processed all the data, matched up the already coded documents, and re-produced the data in a much cleaner and more consistent manner. And with that, Finnegan became a happy client of ours, too.
TPL: You do a pretty large amount of work for a major top 3 accounting firm, yes?
AW: We do, yes. That work is mainly for rapid “banking-related” document productions to the government. They also work with us on more complicated Lotus Notes projects that they would rather outsource to us. It’s a great working relationship that we value highly.
TPL: Tell us a little bit about your work with predictive coding, that is, the capability to use a small set of (partially) coded documents to predict document coding over the complete corpus. I believe Recommind has done a lot of work in this area.
AW: Sure. Predictive coding is going to be big in the next few years. It makes sense, considering the volume of data lawyers have to review in a finite time frame. To get our feet wet in this space, we participated in the 2009 TREC legal study. It was fascinating and quite challenging, but we learned many useful methodologies to help our clients use advanced machine learning techniques to apply predictive coding to their documents. Like everyone, we are new to this area, but we are putting more resources into it in the coming year. We’re pretty excited about it.
TPL: As we discussed, the big “new new” thing all of last year — at every event we covered — was early case assessment and winnowing relevant data down to reduce the number of documents to review. As the stats bear out, it is the most expensive part of the process. But now we have predictive coding, plus the work being done in computer assisted review as evidenced by Patrick Oot and Anne Kershaw’s study “Document Categorization in Legal Electronic Discovery: Computer Classification vs. Manual Review “, plus the work being done by Google and Microsoft on auto-categorization or auto-coding. Are we headed down the path to where machines can be statistically proven to be as accurate as human review? Is the technology getting to the point where we can also winnow out the eyeballs — contract attorney reviewers?
AW: There’s too much value in humans to take us out of the equation. Technology is just a means to an end, and I don’t think we will see document review sans humans in our lifetime. I do see document review teams getting smaller, focused and more tech-savvy, like your “special ops” reference. With the right set of tools, a small number of tech-savvy attorneys can rip through massive amounts of data in a very short amount of time. The days of expensive linear review are numbered.
TPL: But the legal industry is… to put it mildly … risk averse. Despite the lamentations of Richard Susskind and Jordan Furlong that the law profession needs to understand the tectonic changes that are occurring, change will only come slowly.
AW: “Risk averse” is definitely putting it mildly, but smaller, younger and more agile firms are starting to sprout up, willing to take on traditional practices with new billing structures and a willingness to use technology in the best interest of the one paying the bills: the client. This follows a similar path as Logik – a small, fast-paced and lean company that can deliver results in a new way using new tools and methods. Things change, and to stay prime you have to stay on the wave of that change.
TPL: In Malcolm Gladwell’s book Outliers, he mentions Skadden, Arps as an example of a firm that has taken an opportune period of time, and some cultural advantages, to give it an edge. He also talks about the “10,000 Hour Rule.” Sort of reminds us of Logik, yes?
AW: I think so. The 10,000 hour rule is an interesting concept. Basically, it comes down to timing, perseverance and practice, practice, practice. We were fortunate to start our company at the right time, just when eDiscovery was starting to get hot in 2004. We persevered through very tough times trying to validate our market and our existence being a niche processing player. And we got a lot of practice. In the first few years, we focused just on eDiscovery processing, exposing us to many unique situations that our competitors haven’t even come across yet. Unfortunately, it meant I saw my wife for maybe only two hours a day for a year. But you do what it takes to succeed. It was a lot of hard work, but also very fun.
TPL: You now have a multimillion dollar business, all done with 14 employees. But you are expanding. Given the nature of your operation, I imagine you need to consider “culture matching” to a great degree.
AW: We do. We have been looking to add an additional eDiscovery project manager … and, in fact, we advertised via The Posse List and got a great response, thank you! Fitting into the culture is super important to us. Logik isn’t for everyone, so it’s tough to find the right person. And we don’t compromise based on someone’s resume. We encourage people to check the website at logik.com for openings.
TPL: Ok, now, a key question: your Logikbot mascot, symbol [shown above] … just what is that thing?
AW: Ha! Logikbot. Well, he’s both, really. Logikbot is a metaphor for who we are: a small team of smart and motivated people offering great technology and service, taking on the established big vendors (we call them BV2000s). Unlike many companies in this space, we embrace the fact we are small and nimble. It’s a big advantage for us, so there is no reason to act all big and mighty. Our clients and our work speak for themselves. Logikbot is akin to the main character in Rudy… who doesn’t want to root for the little guy?
TPL: And this new site you have created called eDDstuff.com. That’s charity-driven, yes?
AW: Absolutely. We’re really happy that eDDstuff.com is a fun, charity-driven destination. We figured the eDiscovery industry needed funny and witty t-shirts, so we created about nine different designs from “eDiscovery ninja” to “eDiscovery nerd.” Ten percent of every purchase goes to a local DC charity, with the rest going to the vendor who makes the actual products. It’s been a huge success since we launched it and the orders have already started coming in.
TPL: Andy, it was a pleasure chatting with you. We appreciate the time you’ve taken.
AW: This has been great, let’s do it again soon. We have some very interesting things coming out in 2010 that we think your “special ops” team will really like.
End Interview
Leaders Portfolio Chats With Logik’s CEO About eDiscovery and Noodles
Leaders Portfolio, an online and radio distributed interview show (leadersportfolio.com), invited Logik’s CEO, Andy Wilson, to chat about how Logik started, what eDiscovery is, and various entrepreneurial experiences. The interview is about fifteen minutes long.
Preliminary Look at a Preliminary Report on Civil Rules
Emery G. Lee III and Thomas E. Willging of the Federal Judicial Center recently released their Preliminary Report to the Judicial Conference Advisory Committee on Civil Rules. You might be thinking to yourself, “Wow, a 191-page preliminary report... on Civil Rules... what’s in it for me?” A fair question, but actually this thing is pretty interesting.
To begin at the end, I was intrigued to find 77 pages of feedback from survey respondents classified according to the clients they represent: plaintiff attorneys, defendant attorneys and attorneys representing plaintiffs and defendants about equally. Predictably, voices from the plaintiff attorney sector are pointing out abuses of discovery perpetrated by defendant attorneys adversely affecting both the duration and cost of the process… while similar voices from across the aisle are complaining of discovery abuse on the part of plaintiff attorneys. (Maybe these guys could get together and talk?)
Moving on to the survey’s actual findings, here are a few intriguing statistics all taken from attorney feedback following closed cases:
- Figure 11 – By far, most attorneys (72.4% of plaintiff attorneys and 78.3% of defendant attorneys) reported no ESI disputes in their closed case.
- Figure 13 – Both plaintiff and defendant attorneys reported, for the most part, that the information produced by party-generated discovery was “just the right amount.”
- Figure 14 – A similar majority of both plaintiff and defendant attorneys found that their discovery costs were “just the right amount” when compared to their client’s stakes.
- Figure 17 – Most attorneys, regardless of which side of the “v.” they hail from, agree that “the parties in the named case were able to reduce the cost and burden of the named case by cooperating in discovery.”
- Figure 19 – When asked how discovery costs affected settlement, the most frequent response by far was “no effect.”
- Table 10 – In cases involving discovery costs, plaintiff attorneys estimate those costs to be a median of 1.6% of their client’s estimated stakes. Defendant attorneys estimate a median of 3.3%. (Perhaps this gives credence to the defendant attorneys’ perspective, as mentioned above in survey respondent feedback.)
- Figure 37 – Nearly all respondent attorneys agree that “attorneys can cooperate in discovery while still being zealous advocates for their clients.”
This is just a small sample of what the report has to offer. It’s worth taking a look.
Illustration by Carlos Castellanos © 2008
Logik Launches eDiscovery Apparel Website eDDstuff.com
Fun apparel & merchandise with a serious charitable attitude; perfect for everyone
For Immediate Release
WASHINGTON DC – November 11, 2009 – Just in time for the upcoming holidays, Logik, a Washington, DC-based eDiscovery company, is proud to announce the launch of the world’s first and only eDiscovery apparel and merchandise website at www.eDDstuff.com. Visitors of all ages will discover fun, hip designs professionally printed on a variety of apparel and merchandise, including comfy t-shirts, warm hoodies, large coffee mugs and even downloadable versions of the fun eDiscovery images for both computer and iPhone wallpapers.
“When we told our friends, family and clients that we were launching an eDiscovery clothing and merchandise website,” laughs Andy Wilson, Co-Founder and CEO of Logik, “the one question we heard over and over was… ‘Why? Who does that?’ When we first came up with the idea, it just made sense to us. Logik is energetic and passionate about what we do. We wanted a different way of showing that.”
So, why would an eDiscovery company launch a website featuring whimsical designs on clothing? There’s a simple answer to that: it’s just one more example of how Logik thinks differently than other companies.
“Like us, people who work on eDiscovery projects are passionate about their industry, whether they work for the government, a law firm or a corporation,” says Sheng Yang, Logik Co-Founder and CTO. “The eDDstuff website allows us to enjoy our shared passion. Maybe it’s more like we get to show we’re all part of the same club.”
In reality, eDDstuff.com is about more than eDiscovery stuff – it’s also about giving back. Ten percent of every purchase made on eDDstuff.com goes directly to Martha’s Table, a non-profit organization located in Washington, DC (the remainder of the purchase price goes to Zazzle.com, which actually makes the products). Martha’s Table’s mission is to help at-risk children, youth, families and individuals in the Washington, DC community improve their lives by providing educational programs, food, clothing and enrichment opportunities.
“You don’t have to be an ‘eDiscovery nerd,’ as one of the designs happily proclaims, to enjoy all the great things on eDDstuff. Buy a shirt for your mom, a hoodie for your honey and download the ‘eDiscovery Ninja’ for your iPhone. Wear your eDiscovery smarts with pride,” says Andy.
Get a Shirt. Give a Bite. Visit eDDstuff.com and pick up a mug or a hoodie to keep warm this winter, and give a little back.
About Logik:
Logik is an eDiscovery processing company located in Washington, DC. Number 181 on the 2009 Inc. 500, Logik helps corporations, law firms, government agencies and service providers simplify electronic data sought in discovery requests. Logik’s innovative and highly distributed processing platform, Gridlogik, was developed to process all kinds of unstructured and structured data sets such as email databases, spreadsheets, images and MS Office documents. Combined with its transparent pricing model, Logik offers customers the smart way to discover accurate results and make sense of processing costs. Find out more at logik.com. Media interested in setting up an interview with a representative from Logik should contact Logik by email or call 800-951-5507.
###
Making a Federal Case of the Duty to Produce
In our last post we had a look at the Duty to Preserve. Leaving that pickle behind, today we’re moving on to the Duty to Produce. Or, as the Federal Rules of Civil Procedure would term it, the Duty to Disclose.
From a federal context, the duty to disclose has been bundled up nice and tidily in Fed. R. Civ. P. 26. Rule 26 should be examined and addressed early when facing a potential lawsuit because, absent an exemption, some of the required disclosures must be made from the very outset – “without awaiting a discovery request” – including contact details for those who are likely to have discoverable information.
More interesting than Fed. R. Civ. P. 26(a)’s coverage of disclosure’s “whos, whats, whens and hows” are the following subsections and their coverage of Discovery’s Scope and Limits. Although discovery’s potential scope is broad[1], the limitations are numerous, including:
- [Upon proper showing,] A party need not provide discovery of [ESI] from sources that the party identifies as not reasonably accessible because of undue burden or cost . . . [that is, unless] the requesting party shows good cause . . . .[2]
- The court must limit discovery’s frequency or extent if it finds requests to be unreasonably cumulative or duplicative, if another source would be less burdensome, if the requesting party had ample prior opportunity to obtain the information sought, or if burden or expense outweighs the requested information’s likely benefit.[3]
- Both information that has been withheld from disclosure and information that has been produced may be subject to a claim of privilege or of protection as trial-preparation material.[4] (Refer to Fed. R. Evid. 502 for specific provisions related to privilege and work product.)
It’s important to note that these duties apply even to those who don’t have a pan on the fire. Fed. R. Civ. P. 34(c), citing Rule 45, points out that even “a nonparty may be compelled to produce documents . . . or to permit an inspection.”
Production format issues seem to have been hammered into the rule book and can be found repeated throughout Rules 26, 34 and 45. The Advisory Committee’s comment on this issue points out that:
- The rule does not require a party to produce [ESI] in the form in which it is ordinarily maintained, as long as it is produced in a reasonably usable form. But the option to produce in a reasonably usable form does not mean that a responding party is free to convert [ESI] from the form in which it is ordinarily maintained to a different form that makes it more difficult or burdensome for the requesting party to use the information efficiently in the litigation. If the responding party ordinarily maintains the information it is producing in a way that makes it searchable by electronic means, the information should not be produced in a form that removes or significantly degrades this feature.[5]
[1] See Fed. R. Civ. P. 26(b)(1).
[2] Fed. R. Civ. P. 26(b)(2)(B).
[3] Fed. R. Civ. P. 26(b)(2)(C).
[4] Fed. R. Civ. P. 26(b)(5).
[5] Fed. R. Civ. P. 34, Advisory Committee’s Note to the 2006 Amendment.
Happy Halloween!!
Logik + Equinix = Speed n Security
We are very excited to announce some big news at Logik. Our processing power (all of our servers and your data) is now housed in our new data-center at Equinix. If you drove through Chinatown on your way to work over the past few months, you may have noticed some street construction between 7th and 9th Streets. Sorry, that was us. We were installing a secure high-speed fiber line into our Equinix data-center.
Who’s Equinix, you ask? The world’s leading global data center and interconnection provider (Nasdaq: EQIX). Netflix, DoubleClick, Amazon, Google, and Adobe use Equinix to house their servers, and now Logik does too. www.equinix.com/company/customers

So, what does this mean for you?
Speed
The secure fiber line we had installed gives us gigabit Ethernet connectivity to our servers, even though they are now in Ashburn, Virginia. We also increased our internet bandwidth by 10x and can now provide lightning-fast FTP uploads/downloads at speeds up to 100 Mbps. Downloading 10+ gigabytes in just a few hours is now a reality (assuming you aren’t using dial-up).
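For a rough back-of-the-envelope check on those numbers, here is a minimal Python sketch that estimates the best-case transfer time for a given file size and line speed. It assumes decimal units (1 gigabyte = 8,000 megabits). The function name and the `efficiency` factor are hypothetical, added only for illustration; they are not taken from Logik’s actual setup.

```python
def transfer_time_seconds(size_gb: float, speed_mbps: float, efficiency: float = 1.0) -> float:
    """Estimate seconds needed to move size_gb gigabytes at speed_mbps megabits/second.

    efficiency is a hypothetical 0-1 factor for protocol overhead and shared
    links; 1.0 means a perfect, uncontended line.
    """
    if speed_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("speed must be positive and efficiency must be in (0, 1]")
    megabits = size_gb * 8 * 1000  # 1 GB = 1,000 MB = 8,000 megabits (decimal units)
    return megabits / (speed_mbps * efficiency)


if __name__ == "__main__":
    # Compare the same 10 GB download over three illustrative line speeds.
    for label, mbps in [("100 Mbps", 100), ("10 Mbps", 10), ("1.5 Mbps", 1.5)]:
        minutes = transfer_time_seconds(10, mbps) / 60
        print(f"10 GB at {label}: about {minutes:.0f} minutes")
```

The slower end of any connection sets the pace, so real-world times depend on the client’s own bandwidth as much as on ours.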
Security
When 80% of the east coast’s internet traffic flows through your data-center it’s safe to assume security at Equinix is top-notch. That being said, here are some of the security highlights: N+1 power redundancy, precision HVAC temperature controls, smoke detection units, fire suppression systems, environmental control, biometric hand-geometry scanners, monitoring via CCTV and tightly controlled access.
Spoliation and the Duties you Do - Preservation vs. Production
If you don’t want to spoliate all over yourself, it’s best to know how to do your duties.
Judge Grimm’s comments on the not-quite-twin duties of Preservation and Production in Goodman v. Praxair Services, Inc. come in the form of an easily overlooked footnote[1], but this is a sidebar worth looking into. Judge Grimm points out that there is “an important difference between the duty to preserve and the duty to produce . . . .”[2] This post, the first of a two-part series, will take a closer look at the duty to preserve.
Preserve.
Preservation is the duty with which spoliation comes into play. In Zubulake v. UBS Warburg LLC, (Zubulake V)[3], Judge Scheindlin’s ruling stated that “[s]poliation is the destruction or significant alteration of evidence, or the failure to preserve property for another’s use as evidence in pending or reasonably foreseeable litigation.”[4]
The Sedona Conference Working Group on Electronic Document Retention & Production illustrates litigation holds by stating that “whenever litigation [or a regulatory investigation or proceeding] is reasonably anticipated, threatened or pending against an organization [or natural person], that organization has a duty to preserve relevant information. This duty arises at the point in time when litigation is reasonably anticipated whether the organization is the initiator or the target of litigation.”[5]
Judge Grimm made further reference to Zubulake IV in spelling out this duty: “Once a party reasonably anticipates litigation, it is obligated to suspend its routine document retention/destruction policy and implement a ‘litigation hold’ to ensure the preservation of relevant documents.”[6]
Potentially the best definition of “relevant documents” can also be found in Zubulake IV, including:
[A]ny documents or tangible things (as defined by [Fed.R.Civ.P. 34(a)]) made by individuals “likely to have discoverable information that the disclosing party may use to support its claims or defenses.” . . . [A]lso . . . documents prepared for those individuals, to the extent those documents can readily be identified (e.g., from the “to” field in e-mails) . . . [A]lso . . . information that is relevant to the claims or defenses of any party, or which is “relevant to the subject matter involved in the action.”[7]
You may well have a comfortable grasp of the litigation hold-based duties to preserve, but preservation duties extend beyond the first, basic step of issuing a litigation hold. In July we saw this point reiterated in Pinstripe, Inc. v. Manpower, Inc.,[8] a hearing on the motion for sanctions against Pinstripe for failure to preserve documents relevant to a court proceeding. Again referencing Zubulake, the U.S. Magistrate Judge held that
. . . a party’s issuance of a litigation hold does not end its responsibilities in discovery. The party must see that the litigation hold is complied with, “monitoring the party’s efforts to retain and produce the relevant documents.” . . . This necessarily involves communication with all of the “key players” in the litigation.[9]
Finally, the Federal Rules of Civil Procedure create the back-door Safe Harbor for electronic information system maintenance, entitled Failure to Provide Electronically Stored Information: “Absent exceptional circumstances, a court may not impose sanctions under these rules on a party for failing to provide electronically stored information lost as a result of the routine, good-faith operation of an electronic information system.”[10]
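To make the interplay between a routine retention policy and a litigation hold a little more concrete, here is a minimal Python sketch of a purge job that skips anything belonging to a custodian under hold. Everything in it is an assumption made for illustration: the `Document` fields, the 90-day retention period, and the custodian names are hypothetical, and a real hold would usually be scoped by subject matter and date range as well as by custodian.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Document:
    doc_id: str
    custodian: str
    created: date


def select_for_deletion(docs, today, retention_days, custodians_on_hold):
    """Return the documents a routine retention policy may delete today.

    Documents owned by a custodian under a litigation hold are always kept,
    no matter how old they are.
    """
    cutoff = today - timedelta(days=retention_days)
    return [
        d for d in docs
        if d.created < cutoff and d.custodian not in custodians_on_hold
    ]


if __name__ == "__main__":
    # Hypothetical documents and custodians.
    docs = [
        Document("A", "alice", date(2009, 1, 15)),
        Document("B", "bob", date(2009, 1, 20)),
        Document("C", "alice", date(2009, 8, 20)),
    ]
    purge = select_for_deletion(docs, date(2009, 9, 1), 90, custodians_on_hold={"bob"})
    print([d.doc_id for d in purge])
```

In this sketch the hold simply overrides the age test, which mirrors the obligation to suspend routine destruction once litigation is reasonably anticipated.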
We’ll move on to the duty to produce in my next post.
[1] __ F.Supp.2d __, 2009 WL 1955805 at *17 n.13 (D.Md. July 7, 2009).
[2] Id.
[3] 2004 U.S. Dist. LEXIS 13574; 85 Empl. Prac. Dec. (CCH) P41,728.
[4] Id. at HN1.
[5] http://www.thesedonaconference.org/content/miscFiles/Legal_holds.pdf at 1.
[6] Goodman, FN 1, at *14 (quoting Thompson, 219 F.R.D. at 100, quoting Zubulake IV, 220 F.R.D. at 218).
[7] Zubulake IV, 220 F.R.D. at 217-18 (footnotes omitted).
[8] Pinstripe, Inc. v. Manpower, Inc., 2009 WL 2252131 (N.D.Okla.).
[9] Id. citing Zubulake v. UBS Warburg LLC, 229 F.R.D. 422, 432 (S.D.N.Y. 2004).
[10] Fed. R. Civ. P. 37(e).
Pics from the Logik Open House and Inc 500 Party
We threw a party in our new office to celebrate our new home and our Inc. 500 award (#181). In case you missed it, check out the pictures we posted. We had a good turnout: about 100 people showed up and enjoyed catered food from Occasions, pool, fresh drinks, and of course some Mario Kart racing on the Wii. Everyone had a great time. Check out the pics!
Logik Offers NO Sales Tax to Clients
We are excited to announce that, as of today, Logik has qualified to be a High Technology Company in the District of Columbia. Ok, so what does that mean?
It means Logik no longer applies SALES TAX (now at 6%, by the way) to any of our invoices. No, this is NOT an April Fools’ joke; it is totally legal and legit. It’s kind of like a big 6% discount across the board for all of our valued clients (and future clients, wink). This results in HUGE savings for many of our clients, as sales tax can really add up, especially for larger projects. Last year alone we tacked on $200,000 in sales tax.

We love discounts and we hope you do too. Please give us a call or write to us with any questions you may have about this change.
A Grimm View on Spoliation
Goodman v. Praxair
Would you be surprised to hear that Judge Paul Grimm, Chief United States Magistrate Judge for the U.S. District Court for the District of Maryland, holds the Parachutist Badge, the Meritorious Service Medal, the Army Commendation Medal and the Army Achievement Medal? I was. These just aren’t the usual images springing to mind when one thinks of the small handful of federal judges in the eDiscovery world who have been instrumental in getting eDiscovery’s rules of the game out there with clarity. Lawyers beware; you don’t want to be on his bad side.
So bombs away: True to form, Judge Grimm’s decision in Goodman v. Praxair Services, Inc.[1] packed a punch with its wealth of analysis and rules.
This was a case in which Goodman, a pro se litigant, filed suit for breach of contract based upon non-payment of a success fee. Marc Goodman was hired by the Tracer Research Corporation (“Tracer”) to help secure an Environmental Protection Agency exemption for Tracer’s products. Although Tracer succeeded in winning their desired exemptions, the company refused to pay Goodman the stated fee – stating that other third party consultants were solely responsible for obtaining the exemptions.
The opinion revolved around Goodman’s Motion for Spoliation Sanctions, filed in response to Tracer’s failure to institute a timely litigation hold and to Tracer’s destruction of computers (and files) after the duty to hold had been triggered.
Here is a quick look at a few eDiscovery take-aways from Goodman.
The Two Main Federal Law Sources of a Court’s Authority to Levy Sanctions on a Spoliator. Judge Grimm points to the following:
(1) First, there is the “court’s inherent power to control the judicial process and litigation, a power that is necessary to redress conduct ‘which abuses the judicial process.’”[2]
(2) Second, if the spoliation violates a specific court order or disrupts the court’s discovery plan, sanctions also may be imposed under Fed.R.Civ.P. 37.[3]
The Required Elements for Spoliation Sanctions. The proponent must prove:
(1) [T]he party having control over the evidence had an obligation to preserve it when it was destroyed or altered;
(2) the destruction or loss was accompanied by a “culpable state of mind;” and
(3) the evidence that was destroyed or altered was “relevant” to the claims or defenses of the party that sought the discovery of the spoliated evidence, to the extent that a reasonable factfinder could conclude that the lost evidence would have supported the claims or defenses of the party that sought it.[4]
The Culpability Requirement. The mens rea element included in number two above may be satisfied by one of the following three states of mind:
(1) Bad faith / knowing destruction. “Bad faith” as used here means “destruction for the purpose of depriving the adversary of the evidence.”[5] “Knowing destruction” has been related to “willful,” and “[d]estruction is willful when it is deliberate or intentional . . .”[6]
(2) [G]ross negligence, and
(3) [O]rdinary negligence.[7]
Sanctions Involving Money. When it comes to motions for a reimbursement of discovery costs and attorneys’ fees, four situations give rise to such awards:
(1) First, courts will award legal fees in favor of the moving party as an alternative to dismissal or an adverse jury instruction.
(2) Second, courts will grant discovery costs to the moving party if additional discovery must be performed after a finding that evidence was spoliated.
(3) Third, in addition to a spoliation sanction, a court will award a prevailing litigant the litigant’s reasonable expenses incurred in making the motion, including attorney’s fees.
(4) Fourth, in addition to a spoliation sanction, a court will award a prevailing litigant the reasonable costs associated with the motion plus any investigatory costs into the spoliator’s conduct.[8]
The Timeliness of a Spoliation Motion. Although this element of a motion for discovery sanctions isn’t covered by Fed.R.Civ.P. 37, the following factors have been considered in judicial assessments:
(1) [K]ey to the discretionary timeliness assessment of lower courts is how long after the close of discovery the relevant spoliation motion has been made . . . .[9]
(2) [A] court should examine the temporal proximity between a spoliation motion and motions for summary judgment.[10]
(3) [C]ourts should be wary of any spoliation motion made on the eve of trial.[11]
(4) [C]ourts should consider whether there was any governing deadline for filing spoliation motions in the scheduling order issued pursuant to Fed.R.Civ.P. 16(b) or by local rule.[12] [and]
(5) [T]he explanation of the moving party as to why the motion was not filed earlier should be considered.[13]
In sum, Judge Grimm says that these motions should ideally be filed during the discovery phase in order to accommodate the court’s determination of “when the duty to preserve commenced, whether the party accused of spoliation properly complied with its preservation duty, the degree of culpability involved, the relevance of the lost evidence to the case, and the concomitant prejudice to the party that was deprived of access to the evidence because it was not preserved.”[14]
[1] __ F.Supp.2d __, 2009 WL 1955805 (D.Md. July 7, 2009).
[2] Id. at *9, citing United Med. Supply Co. v. United States, 77 Fed. Cl. 257, 263-64 (2007), and Chambers v. NASCO, Inc., 501 U.S. 32, 45-46, 111 S.Ct. 2123, 115 L.Ed.2d 27 (1991).
[3] Id. citing United Med. Supply Co. v. United States, 77 Fed. Cl. 257 at 264 (2007).
[4] Id. at *12 (citing Thompson, 219 F.R.D. at 101, and Zubulake v. UBS Warburg, LLC, 220 F.R.D. 212, 220 (S.D.N.Y.2003)).
[5] Id. at *19, citing Poell v. Town of Sharpsburg, 591 F.Supp.2d 814, 820 (E.D.N.C. 2008).
[6] Id.
[7] Id. at *18.
[8] Id. at *22.
[9] Id. at *10, citing McEachron v. Glans, No. 98-CV-17 (LEK/DRH), 1999 WL 33601543, at *2 & n.3 (N.D.N.Y. June 8, 1999).
[10] Id.
[11] Id. citing Permasteelisa CS Corp. v. Airolite Co., LLC, No. 2:06-cv-569, 2008 WL 2491747, at *2-3 (S.D. Ohio June 18, 2008).
[12] Id.
[13] Id.
[14] Id. at *11.
Logik Gets New Digs
Logik is moving downtown this month! We will miss our Dupont office, especially since it was our first office and we put so much work into it. But every growing company has to move on and seek new office space to accommodate that growth. When we first set out to find our new home, we knew we wanted something different, just as we did with our Dupont location. We also needed it to be bigger and definitely to have more than one bathroom.
After a long search we found our new home at 707-709 G Street (it’s 2 buildings merged into 1). The space was designed and built by Faison architects, the former tenants. They put a ton of work into the space to make it open and inviting, so we didn’t need to do much more than paint it. We can comfortably fit 40 or so Logik people in the space and we plan on doing just that in the next 4 years. Here are the pics:
Loose Clicks Sink Ships
Need Another Reason to Review your Information Management System? You don’t think so? Here’s one anyway, provided on June 4, 2009, by the U.S. Court of Appeals (8th Cir.): American Boat Company, Inc. v. Unknown Sunken Barge.[1]
This case really should be subtitled “Are You Being Served?” – although it sadly lacks the ironic humor and the English accents.
In February a towboat company called American Boat lost one of its towboats, to the tune of $3 million in damages, in a collision with a hidden submerged barge on the lower Mississippi River. American Boat brought an action against the United States alleging negligence for failure to maintain a navigable channel. Facing a district court summary judgment for the U.S., counsel for American Boat filed a Motion to Amend Judgment or in the Alternative for Reconsideration.
At this point someone somewhere seems to have dropped the ball.
The District Court issued an adverse final ruling on both of American Boat’s motions. The court notified local counsel of this ruling through email only, via its new electronic notification system, a common practice when attorneys have signed up for a court’s electronic notification system. In this case, American Boat’s local counsel (but not their trial counsel) had signed up for electronic notification. American Boat apparently never received this message, and so failed to appeal this final order within the allowable period. Four months later, too late to file an appeal, American Boat’s attorneys learned of the adverse final ruling through the court’s Public Access to Court Electronic Records (PACER) website. Counsel for American Boat cried foul.
First, it was of no importance that the court sent American Boat’s trial counsel neither electronic nor written notice. The court said it sent electronic notice to the party’s local counsel, thus the party was deemed to be on notice. American Boat’s counsel filed affidavits claiming that the office of local counsel never in fact received the court’s automatically generated email notification. The court kept the denials rolling; finding that counsel had received timely electronic notice as reflected by court docket, the court denied American Boat’s Motion to Reopen. Forced to contend with a presumption of delivery for email sent by the court’s electronic notification system,[2] only an appeal to the Eighth Circuit won American Boat an evidentiary hearing to determine whether the email in question was ever truly received. Following the hearing, once again, American Boat’s Motion to Amend Judgment or for Reconsideration was denied. Expert analysis of counsel’s computers brought the court to the conclusion that although no sign of the email notification could be found on counsel’s hard drives, the law firm had in fact received notice of the district court’s ruling as reflected in the court’s docket entry.
Digging deeper, it seems that a staff member working for local counsel had the job of checking both her own email account and that of American Boat’s attorney. Expert examination of this staff member’s office hard drive revealed no trace of the court’s email Notice. On the other hand, evidence was presented that the Notice was correctly addressed by the court and was successfully received by the law firm’s ISP server. The law firm used an email software program that would POP mail from their ISP’s server to their own local storage once that email was accessed. By default such programs then delete the original record from the ISP server, meaning the email then exists only at its final point of destination. The firm had not changed this default setting, and any item of email resided only on the hard drive of the computer used to access it. Oddly enough, although the staff member sometimes used the firm’s front-desk computer to check emails, nobody went to the trouble to examine that particular hard drive. If the staffer opened the court’s email Notice from the front desk, this Notice would have never made it to her own office computer. An expert witness testified with “95 percent” certainty that this is what, in fact, happened.
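The download-and-delete behavior described above can be illustrated with a toy simulation. This is only a sketch in Python, not the firm’s actual mail software: the `MailServer` and `Workstation` classes, the `leave_on_server` flag, and the message text are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field


@dataclass
class MailServer:
    """Toy stand-in for the ISP's mail server (not a real POP3 server)."""
    messages: list = field(default_factory=list)


@dataclass
class Workstation:
    """A computer whose mail client keeps downloaded mail on its own hard drive."""
    name: str
    hard_drive: list = field(default_factory=list)

    def check_mail(self, server: MailServer, leave_on_server: bool = False) -> None:
        # Download every message on the server that this machine does not have yet.
        for message in server.messages:
            if message not in self.hard_drive:
                self.hard_drive.append(message)
        if not leave_on_server:
            # Download-and-delete: the server copy is removed after pickup.
            server.messages.clear()


def simulate(leave_on_server: bool) -> dict:
    """The front desk checks mail first, then the staff member's office computer."""
    server = MailServer(["Notice of final order"])
    front_desk = Workstation("front desk")
    office = Workstation("office")
    front_desk.check_mail(server, leave_on_server)
    office.check_mail(server, leave_on_server)
    return {"front desk": front_desk.hard_drive, "office": office.hard_drive}


if __name__ == "__main__":
    print("download and delete:", simulate(leave_on_server=False))
    print("leave on server:    ", simulate(leave_on_server=True))
```

With download-and-delete, only the first machine to check mail ever holds the Notice. With a leave-on-server setting, both machines end up with a copy.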
Based on such evidence, the 8th Circuit upheld the district court’s determination that American Boat did receive the court’s Notice, thus the Motion to Reopen was correctly denied. The 8th Circuit held that “[o]nce the electronic notifications reached the ISP, they were available to local counsel for American Boat, in the same way that a letter that has reached a post office box becomes available to the owner of that box.”[3] The court placed the burden to rebut the presumption of delivery of Notice upon plaintiffs,[4] and providing proof that one computer used by the law firm held no trace of said Notice was not proof adequate to rebut the presumption. Failing to rebut the presumption, American Boat bore the consequences.
So just in case you needed another reason to review your Information Management System, do you know if you’re being served? What you don’t know can really hurt you. Modifying your email software program and tweaking your backup storage policy to ensure retention of your records are small ways to avoid big pain.
[1] No. 08-2166 (8th Cir. 2009).
[2] See Kennell v. Gates, 215 F.3d 825, 829 (8th Cir. 2000).
[3] American Boat Company, Inc. v. Unknown Sunken Barge, No. 08-2166 (8th Cir. 2009).
[4] Am. Boat Co., 418 F.3d at 914 (8th Cir. 2005).
Searching…Sorting Through the Tool Box
You may know what you’re looking for, but do you know how to look?
For those who are engaged in eDiscovery, two cases touching on search methodologies that have held our attention over the past year include Magistrate Judge John Facciola’s decision in U.S. v. O’Keefe[1], and Magistrate Judge Paul Grimm’s decision in Victor Stanley, Inc. v. Creative Pipe, Inc.[2].
Facciola’s harangue regarding the complex nature of ESI searches may have assured his immortality, and it is too good to resist quoting yet again:
Whether search terms or ‘keywords’ will yield the information sought is a complicated question involving the interplay, at least, of the sciences of computer technology, statistics and linguistics…. Given this complexity, for lawyers and judges to dare opine that a certain search term or terms would be more likely to produce information than the terms that were used is truly to go where angels fear to tread.[3]
In this vein, Facciola noted that searching was best left to the experts. On the other hand, Grimm, emphasizing cross-party collaboration, sees the creation of search protocols as potentially falling within attorney competency – so long as the attorney has performed quality assurance testing on the methodology selected, can explain the rationale for selecting the methodology, and can show proper implementation.[4]
Facciola and Grimm come by their wariness honestly. The Text Retrieval Conference (TREC) series is a research body co-sponsored by the NIST and the IARPA (Logik is a 2009 TREC participant). The TREC Legal Track “focuses on evaluation of search technology for discovery of electronically stored information in litigation and regulatory settings.”[5] The Overview of the TREC 2008 Legal Track reports that “the consensus Boolean query found 42% of the highly relevant documents, on average per topic, . . . [and 33%] of all relevant documents.”[6] Further, negotiated Boolean keyword searches were found to be on par with the newer and more complex search methods tested.[7] In fact, keyword searches can be notably strengthened when they are performed in an iterative fashion: sampling the search results, and then adjusting the negotiated keywords to improve the results. Yet it has been observed that although various search methodologies may return a comparable percentage of recall, the actual responsive documents retrieved vary – allowing a higher rate of recall through the use of mixed search technologies on the same data set.[8]
This emerging data, along with recent judicial enthusiasm for the incorporation of concept searching[9], reinforces the idea that attorneys need to develop a comfortable working knowledge of the array of electronic data search technologies. The following non-exclusive list of search methodologies and vocabulary is intended as a reference for those who are finding their way through the terminological wrangle and getting to know the eDiscovery landscape:
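Recall, and the way mixing methods can raise it, is easy to see with a small worked example. The Python sketch below uses invented document IDs and two hypothetical result sets; the numbers are not TREC data.

```python
def recall(retrieved: set, relevant: set) -> float:
    """Fraction of the relevant documents that a search actually found."""
    if not relevant:
        raise ValueError("need at least one relevant document")
    return len(retrieved & relevant) / len(relevant)


# Hypothetical collection: documents 1 through 10 are the responsive ones.
RELEVANT = set(range(1, 11))

# Two hypothetical searches with similar recall but different hits.
BOOLEAN_HITS = {1, 2, 3, 4, 20, 21}
CONCEPT_HITS = {3, 4, 5, 6, 7, 30}

# Running both methods on the same data set and pooling the results.
COMBINED_HITS = BOOLEAN_HITS | CONCEPT_HITS

if __name__ == "__main__":
    for name, hits in [("Boolean", BOOLEAN_HITS),
                       ("Concept", CONCEPT_HITS),
                       ("Combined", COMBINED_HITS)]:
        print(f"{name} recall: {recall(hits, RELEVANT):.0%}")
```

Because the two result sets overlap only partly, the pooled set recovers more of the responsive documents than either search does alone.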
- Keyword Search: A search through a body of data for a stipulated word or set of words. Keywords are useful in finding documents containing a specific term.
- Boolean Search: Keyword searching with the aid of Boolean operators such as “AND”, “OR”, “NOT”, “W/#”, “( )”, “NEAR”, “TI=( )”, “BEFORE”, “AFTER”, “*”, or “!” (proximity designators, phrase designators, sequencing instructions, and word-trunk expanding instructions) to increase the searcher’s precision in included or excluded results.
- Fuzzy Logic: A search method using non-exact word matching to capture results that include variations or misspellings of the stipulated search terms.
- Concept Search: The use of sophisticated (and often proprietary) mathematical and linguistic analysis to return results pertaining to the concept and context suggested by your search term(s). The concept upon which your search results are based may or may not be literally present in your search terms, or in your search results.
- Algebraic Search: A search using mathematical models, including Boolean proximity operators, to interpret meaning in a document and to retrieve results accordingly.
- Clustering: The grouping of documents with related content into “clusters,” within which documents are often given a statistical ranking in their relationship to a template or seed document. These documents may be found to be related through an overlap of concepts and contexts, or through an overlap of specific terms. The use of this search method may provide the searcher access to the entire cluster, or may provide the searcher with related, alternative search terms.
- Concept and Categorization Tools: Search methods based on the use of a given thesaurus to return results from documents that express the same concept contained in the search term(s), in an alternative fashion.
- Linguistic Methods: Search methodologies that classify or select text documents based on a given taxonomy, ontology, or thesaurus.
- Naive Bayes Classifier: Based on Bayes’ theorem, a predictive relevance value is assigned to particular words according to their interrelationships, recurrence, a word’s position within a document, and proximity to other search terms.
- Ontologies: An ontology is similar to a taxonomy, but the relationships between terms need not be hierarchical and are broad (including synonyms and associated ideas). Using this search methodology, a searcher entering the term “tort” could pull results from documents containing the terms “litigation” or “damages.”
- Probabilistic Latent Semantic Analysis: In brief, this method of analysis (or indexing) uses a probabilistic model to retrieve text containing polysemy (words having multiple similar meanings) and synonymy (words having the same meaning).
- Probabilistic Search Models (including Bayesian Classifiers): Probability formulas, including Bayesian methods, are used to determine the relevance of documents within a search pool – often incorporating a term’s historical relevance to the particular search performed to rank the search results.
- Social Network Analysis: An analysis and mapping of the interactions or associations amongst sets of nodes (actors, people, entities, information sources) into a complex grid representation of a network. Significance may be found in various factors such as the centrality of a node.
- Taxonomies: The hierarchical classification of terms and ideas into categories or sets, and subcategories or subsets. The use of this tool enables the searcher, for example, to retrieve results from any subcategory of their search query. A search for “tort” could pull results from documents containing the terms “negligence” or “nuisance.”
- Vector Space Retrieval: A search methodology based on the Vector Space Model. This method measures the similarity between documents, premised upon the idea that similarity may be used to indicate relevance. The model represents various documents as vectors in space, with those deemed to be more similar being positioned closer together in space.
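To give a feel for how three of the methods in this list differ in practice, here is a minimal Python sketch of a Boolean search (AND and NOT only), a fuzzy search, and a vector space similarity score. It is a toy illustration, not any vendor’s implementation: the three sample documents are invented, the fuzzy matching leans on the standard library’s `difflib`, and the vector space score is plain cosine similarity over raw term counts.

```python
import re
from collections import Counter
from difflib import get_close_matches
from math import sqrt


def tokenize(text: str) -> list:
    """Lowercase a document and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def boolean_search(docs: dict, must=(), must_not=()) -> list:
    """Keyword search with AND (must) and NOT (must_not); terms must be lowercase."""
    hits = []
    for doc_id, text in docs.items():
        words = set(tokenize(text))
        if all(t in words for t in must) and not any(t in words for t in must_not):
            hits.append(doc_id)
    return hits


def fuzzy_search(docs: dict, term: str, cutoff: float = 0.8) -> list:
    """Return documents containing the term or a near-miss spelling of it."""
    return [
        doc_id for doc_id, text in docs.items()
        if get_close_matches(term, tokenize(text), n=1, cutoff=cutoff)
    ]


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Vector space model: compare two documents as term-frequency vectors."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical three-document collection, invented for illustration.
DOCS = {
    "memo": "The contract was breached and damages are owed",
    "email": "Please shred the contrat before the audit",
    "note": "Lunch order for the audit team",
}

if __name__ == "__main__":
    # Exact keyword matching misses the misspelled "contrat" in the email.
    print(boolean_search(DOCS, must=["contract"], must_not=["lunch"]))
    # Fuzzy matching picks up the misspelling as well.
    print(fuzzy_search(DOCS, "contract"))
    # A higher score means the two documents share more of their vocabulary.
    print(round(cosine_similarity(DOCS["email"], DOCS["note"]), 3))
```

The exact-match search finds only the memo, while the fuzzy search also catches the misspelled email, which is the kind of gap that makes pairing methods worthwhile.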
[1] U.S. v. O’Keefe, 537 F. Supp. 2d 14 (D.D.C. 2008).
[2] Victor Stanley, Inc. v. Creative Pipe, Inc., 250 F.R.D. 251 (D. Md. 2008).
[3] U.S. v. O’Keefe, 537 F. Supp. 2d 14, 24 (D.D.C. 2008).
[4] Victor Stanley, Inc. v. Creative Pipe, Inc., 250 F.R.D. 251 (D. Md. 2008) (citing The Sedona Conference Best Practices Commentary on the Use of Search & Information Retrieval Methods in E-Discovery, 8 Sedona Conf. J. 189 (2007)).
[5] http://trec.nist.gov/pubs/trec17/papers/LEGAL.OVERVIEW08.pdf.
[6] http://trec.nist.gov/pubs/trec17/papers/LEGAL.OVERVIEW08.pdf at 5.
[7] Jason Krause, In Search of the Perfect Search, A.B.A.J. (Apr. 2009), http://www.abajournal.com/magazine/in_search_of_the_perfect_search/.
[8] Id.
[9] See Disability Rights Council of Greater Wash. v. Wash. Metro. Area Transit Auth., 2007 WL 1585452 (D.D.C. June 1, 2007).
Is eDiscovery Processing a Commodity?
Lately I’ve heard quite a few people in the eDiscovery industry throw around the word commodity when discussing processing.
Which begs the question, is eDiscovery processing really a commodity?
If so, why so?
If not, why not?
I’ll start. First things first: my company (Logik) does just eDiscovery processing, so I have an obviously biased response, but I think it’s a logikal one. Because the market is saturated with over 600 eDiscovery vendors, buyers of eDiscovery services may find it hard to see any real difference and thus focus on price alone.
Which supports the commodity frame of mind. But, unlike grocery stores (that sell commodity goods), saturation in eDiscovery does not equal sameness.
As document formats continue to change and new eDiscovery services come about from processing (think: how do you collect and process data in the “Cloud”?), processing becomes more and more dynamic. This constant change in technology platforms alone makes processing a non-commodity service.
In order for eDiscovery processing to truly become a commodity, I think it needs to become indistinguishable from any other vendor’s product and technology offering.
Has your experience been the same with every eDiscovery vendor’s technology and product? Why do we still fill out RFPs about our technology, product, and service? If everything were equal among vendors, we wouldn’t need to do these. Price alone would determine everything.
This is my opinion. What’s yours?
Litigation Watch – the Fourth Circuit Whacks a Hack
Earlier this spring, the United States Court of Appeals for the Fourth Circuit took a notable stand to strengthen the Stored Communications Act (SCA).
In Van Alstyne v. Electronic Scriptorium,[1] the court broke new ground in SCA litigation by ruling that a civil litigant may be awarded attorney’s fees and punitive damages even in the absence of any proof of actual damages, although statutory damages will be withheld. In this case the court found that for more than a year a former employer had repeatedly accessed his former employee’s personal email account (as opposed to a company account with privacy waivers), thus violating her personal privacy.
The operative SCA subsection reads as follows:
(c) Damages.— The court may assess as damages in a civil action under this section the sum of the actual damages suffered by the plaintiff and any profits made by the violator as a result of the violation, but in no case shall a person entitled to recover receive less than the sum of $1,000. If the violation is willful or intentional, the court may assess punitive damages. In the case of a successful action to enforce liability under this section, the court may assess the costs of the action, together with reasonable attorney fees determined by the court.[2]
The court noted that the SCA’s provision for punitive damages and attorney’s fees, found in the second and third sentences of this subsection, lacks the limiting language “actual damages suffered” that can be seen in the SCA’s provision for statutory damages, found in the first sentence of this subsection.
Interestingly, this is a case in which the former employee had initiated employment claims against her former employer, and it seems that the former employer wanted to conduct his own unofficial, hacker-style eDiscovery. Although one would think it goes without saying, eDiscovery is discovery; rules do apply!
[1] 560 F.3d 199 (4th Cir. 2009).
[2] The Stored Communications Act, 18 U.S.C. § 2707(c) (1986).
Did “One Size Fits All” Ever Really Work?
The Final Report on the Joint Project of the American College of Trial Lawyers (ACTL) Task Force on Discovery and The Institute for the Advancement of the American Legal System (IAALS)
Change is in the air. On March 20, 2009, an eighteen-month collaboration between the ACTL and the IAALS came to fruition through their joint release of 29 Principles, marking the launch of a new nationwide movement to reform both federal and state rules of civil procedure. The Report includes Proposed Principles touching on eDiscovery (see specifically Principles 12 – 18), and in the coming months these two organizations, together with contributing members of the top echelon of the American and Canadian Trial bar, will be working to assist the implementation of these Principles into pilot projects in the U.S. civil justice system.
Flagging inefficiencies, disproportionate costs and delays, the Final Report emphasizes that the civil justice system is “in serious need of repair,” and that “[t]he traditional ‘one size fits all’ application of uniform rules to all cases . . . no longer works.” Many of us are left to wonder if, in fact, it ever really worked. We can watch for these 29 Principles, together with the Sedona Principles, to be instrumental in the retooling of the rules of civil procedure across the United States.
A complete list of the 29 Proposed Principles follows:
- The “one size fits all” approach of the current federal and most state rules is useful in many cases but rulemakers should have the flexibility to create different sets of rules for certain types of cases so that they can be resolved more expeditiously and efficiently.
- Notice pleading should be replaced by fact-based pleading. Pleadings should set forth with particularity all of the material facts that are known to the pleading party to establish the pleading party’s claims or affirmative defenses.
- A new summary procedure should be developed by which parties can submit applications for determination of enumerated matters (such as rights that are dependent on the interpretation of a contract) on pleadings and affidavits or other evidentiary materials without triggering an automatic right to discovery or trial or any of the other provisions of the current procedural rules.
- Proportionality should be the most important principle applied to all discovery.
- Shortly after the commencement of litigation, each party should produce all reasonably available nonprivileged, non-work product documents and things that may be used to support that party’s claims, counterclaims or defenses.
- Discovery in general and document discovery in particular should be limited to documents or information that would enable a party to prove or disprove a claim or defense or enable a party to impeach a witness.
- There should be early disclosure of prospective trial witnesses.
- After the initial disclosures are made, only limited additional discovery should be permitted. Once that limited discovery is completed, no more should be allowed absent agreement or a court order, which should be made only upon a showing of good cause and proportionality.
- All facts are not necessarily subject to discovery.
- Courts should consider staying discovery in appropriate cases until after a motion to dismiss is decided.
- Discovery relating to damages should be treated differently.
- Promptly after litigation is commenced, the parties should discuss the preservation of electronic documents and attempt to reach agreement about preservation. The parties should discuss the manner in which electronic documents are stored and preserved. If the parties cannot agree, the court should make an order governing electronic discovery as soon as possible. That order should specify which electronic information should be preserved and should address the scope of allowable proportional electronic discovery and the allocation of its cost among the parties.
- Electronic discovery should be limited by proportionality, taking into account the nature and scope of the case, relevance, importance to the court’s adjudication, expense and burdens.
- The obligation to preserve electronically-stored information requires reasonable and good faith efforts to retain information that may be relevant to pending or threatened litigation; however, it is unreasonable to expect parties to take every conceivable step to preserve all potentially relevant electronically stored information.
- Absent a showing of need and relevance, a party should not be required to restore deleted or residual electronically-stored information, including backup tapes.
- Sanctions should be imposed for failure to make electronic discovery only upon a showing of intent to destroy evidence or recklessness.
- The cost of preserving, collecting and reviewing electronically-stored material should generally be borne by the party producing it but courts should not hesitate to arrive at a different allocation of expenses in appropriate cases.
- In order to contain the expense of electronic discovery and to carry out the Principle of Proportionality, judges should have access to, and attorneys practicing civil litigation should be encouraged to attend, technical workshops where they can obtain a full understanding of the complexity of the electronic storage and retrieval of documents.
- Requests for admissions and contention interrogatories should be limited by the Principle of Proportionality. They should be used sparingly, if at all.
- Experts should be required to furnish a written report setting forth their opinions, and the reasons for them, and their trial testimony should be strictly limited to the contents of their report. Except in extraordinary cases, only one expert witness per party should be permitted for any given issue.
- A single judicial officer should be assigned to each case at the beginning of a lawsuit and should stay with the case through its termination.
- Initial pretrial conferences should be held as soon as possible in all cases and subsequent status conferences should be held when necessary, either on the request of a party or on the court’s own initiative.
- At the first pretrial conference, the court should set a realistic date for completion of discovery and a realistic trial date and should stick to them, absent extraordinary circumstances.
- Parties should be required to confer early and often about discovery and, especially in complex cases, to make periodic reports of those conferences to the court.
- Courts are encouraged to raise the possibility of mediation or other form of alternative dispute resolution early in appropriate cases. Courts should have the power to order it in appropriate cases at the appropriate time, unless all parties agree otherwise. Mediation of issues (as opposed to the entire case) may also be appropriate.
- The parties and the courts should give greater priority to the resolution of motions that will advance the case more quickly to trial or resolution.
- All issues to be tried should be identified early.
- These Principles call for greater involvement by judges. Where judicial resources are in short supply, they should be increased.
- Trial judges should be familiar with trial practice by experience, judicial education or training and more training programs should be made available to judges.
The full text of this final report can be found here.
As a practice tip it is important to keep in mind that these are, at present, Proposed Principles. Yet change is in the air, and this just might be a peek into the not-too-distant future of the American system of civil justice.
ALI Stung like a Bee
Microsoft and Linux urged ALI to Float like a Butterfly - ALI Stung like a Bee
Citing a need for flexibility of commercial law and freedom of contract, and hoping for a lighter touch, the Linux Foundation and Microsoft recently sent a joint open letter to the American Law Institute (ALI) urging a reconsideration of the ALI’s pending Principles of the Law of Software Contracts. Although competitors in the market, the two software providers came together to point out that the language of the ALI’s forthcoming Principles discriminated among business models, that it would be harmful to the climate of the law surrounding software provision and related services and support, and that its release should be delayed to allow further input from the software development and user community.
Microsoft and the Linux Foundation took issue primarily with § 3.05(b), which calls for a non-disclaimable implied warranty of no material hidden defects for all transferors that receive “money or a right to payment of a monetary obligation in exchange for the software . . .” The Linux Foundation points out the ambiguity of the concept of “free” under this language, given that providers of “free software” may yet be able to obtain payment (for example, through advertisement delivery or support services). Collectively, the authors cite inconsistencies of the ALI’s draft with the Uniform Commercial Code, general commercial law, and public policy. The authors of this open letter urge that the implied warranty of no material hidden defects should continue to be disclaimable, claiming that this “would cover individual contributors to open source projects, as most open source licenses disclaim warranties and indicate that the software code is provided ‘as is’.”
Despite the united appeals of industry players coming from opposite ends of the software licensing spectrum, in May the ALI unanimously approved the final draft of the Principles of the Law of Software Contracts. Perhaps motivated by the desire to place the risk of defective software on the party best able to manage that risk, § 3.05(b) as referenced above remains applicable in the approved draft. The membership of the ALI might argue that this section only applies in situations where the provider is aware of a material defect that is hidden from the consumer, yet software providers may well be concerned by the unpredictability of a court’s application of “hidden” or “material.”
One way or another, with the highly persuasive nature of the ALI’s Principles, courts are likely to start applying this particular release soon. You can read the Principles of the Law of Software Contracts for yourself, but don’t hold your breath for an open source version. The ALI’s website offers a download – but, of course, for a price.
Will Sotomayor Weigh In on eDiscovery?
Any edition of the high-court shuffle will always attract attention. Although it is rare to see the Supreme Court ruling specifically on a question of eDiscovery, Court watchers have been interested to see how the addition of Sotomayor might influence the Court in the event of a relevant controversy.
Having specialized in intellectual property while working with the firm of Pavia & Harcourt, and having at times touched on technology over the course of her more than 150 decisions, appeals court judge Sonia Sotomayor has created a record worth speculating over.
Bringing her history in intellectual property to bear, Sotomayor appeared comfortable in technology-based cases when she wrote a few Anticybersquatting Consumer Protection Act opinions in the early part of this decade. Examples can be found in Storey v. Cello Holdings, L.L.C.[1] and Mattel, Inc. v. Barbie-Club.com.[2]
Perhaps most specifically relevant to eDiscovery was Sotomayor’s opinion in Leventhal v. Knapek.[3] This wasn’t an eDiscovery case per se, yet it touched on closely related issues. Here the U.S. Court of Appeals for the Second Circuit (located in New York City) considered and rejected a challenge to the actions of a public employer in its search of an employee’s office computer for evidence of alleged work-related misconduct. Plaintiff Gary Leventhal worked as an accountant for the New York State Department of Transportation (DOT). An anonymous letter, in which plaintiff was not mentioned by name, complained of various acts of job-related misconduct in the DOT’s accounting department including tardiness, absence, and excessive personal pursuits and conversations during DOT work time. In response, the DOT conducted its own variation of eDiscovery, compelling a search of plaintiff’s office computer and the computers of other accounting employees for “non-standard” computer programs. In the face of a Fourth Amendment challenge brought by plaintiff, Sotomayor considered that although a public employee has a “reasonable expectation of privacy in the contents of his office computer,” in this case the search and seizure did not violate plaintiff’s Fourth Amendment rights.
The court acknowledged that “the Fourth Amendment protects individuals from unreasonable searches conducted by the Government, even when the Government acts as an employer.”[4] Sotomayor went on to clarify that “[t]he ‘special needs’ of public employers may, however, allow them to dispense with the probable cause and warrant requirement when conducting workplace searches related to investigations of work-related misconduct.”[5] Finally, Sotomayor stated that “[a] public employer’s search of an area in which an employee had a reasonable expectation of privacy is ‘reasonable’ when ‘the measures adopted are reasonably related to the objectives of the search and not excessively intrusive in light of’ its purpose.”[6]
In this context the Second Circuit ruled the DOT’s search of an employee’s office computer to be justified at its inception and reasonable in its scope – finding that “the searches of his computer were ‘reasonably related’ to the DOT investigation of allegations of [plaintiff’s] workplace misconduct.”
[1] Storey v. Cello Holdings, L.L.C., 347 F.3d 370 (2d Cir. 2003).
[2] Mattel, Inc. v. Barbie-Club.com, 310 F.3d 293 (2d Cir. 2002).
[3] Leventhal v. Knapek, 266 F.3d 63 (2d Cir. 2001).
[4] Nat’l Treasury Employees Union v. Von Raab, 489 U.S. 656, 665 (1989).
[5] Citing O’Connor v. Ortega, 480 U.S. 709, 719-26 (1987) (plurality opinion); id. at 732 (Scalia, J. concurring).
[6] Citing O’Connor, 480 U.S. at 726 (plurality opinion) (internal quotation marks omitted).
Just how itchy is that trigger finger?
Our tip of the hat to Ralph Losey for his early comments on Phillip M. Adams & Associates, L.L.C., v. Dell, Inc., a recent case that has been turning heads everywhere. This case is certainly worth a read, and although it touches on a topic covered in one of our earlier posts, the outcome was surprising enough to be worth exploring again.
This was a case in which the defendant was sanctioned for not implementing a litigation hold, thus eliminating emails and data dated as far back as 1999. The catch: the defendant apparently did not receive notice from the plaintiff of a potential infringement claim until 2005, and claims to have implemented a litigation hold from that point forward.
The Utah Magistrate judge reasoned that the entire computer and component manufacturing industry were essentially on notice of potential litigation (and as a result their litigation holds should have been triggered) in 1999 due to the presence of class action lawsuits against certain players in the industry in 1999 and 2000 based upon claims of defects in floppy disk controllers.
Moving on, the Magistrate found the defendant’s electronic information system architecture, which relied upon each employee to archive or delete their own emails and documents according to company practices, to be unreasonable. The Magistrate referred to the company’s system architecture as one “of questionable reliability which has evolved rather than been planned . . . .”[1] Even more to the point, the Magistrate did not find the defendant to be in possession or control of a coherent document retention policy.
At stake: the ability of companies to rely on the Safe Harbor protections of FRCP 37(e), which reads:
(e) Failure to Provide Electronically Stored Information. Absent exceptional circumstances, a court may not impose sanctions under these rules on a party for failing to provide electronically stored information lost as a result of the routine, good-faith operation of an electronic information system.[2]
A few problems leap out of the pages of this ruling:
- Ralph Losey does a nice job of pointing out that per Rule commentary and case precedent, the good-faith requirement of Rule 37(e) refers to destruction of electronically stored information prior to the triggering of a duty to preserve (rather than the subjective reasonability of the electronic information system itself). He also makes the point that Sedona does not indicate reasonability adjudications for records management systems as a prerequisite for Rule 37(e) protection. On the other hand, one should keep in mind that Sedona does mandate reasonability in email retention policies by saying that “[w]hatever their form, email retention policies must be reasonable in purpose and reasonable as applied.”[3]
- It is also interesting to note that even if there had been class actions in the industry, they had been directed against different companies for a product defect. If the defendant’s trigger for a litigation hold had been tripped, by the Magistrate’s reasoning it would be for an issue different than the one in question: patent infringement.
- By introducing dubious triggers to the litigation hold, this decision tends to weaken the Rule 37(e) Safe Harbor and promotes potentially cumbersome wide-angle data retention any time tangentially related lawsuits are taking place in a particular sector of the market.
- Based on the record in the text of the opinion, why didn’t the defendant do a little more to set up the elements of their Rule 37(e) defense to begin with?
On that last point, the Utah Magistrate judge seemed to be treading on more familiar ground in charging that “[the defendant] offers no statements from management-level persons explaining its practices, or existence of any policies.”[4] If not, why not?
The Magistrate went on to reference Guideline 1 of The Sedona Guidelines: Best Practice Guidelines & Commentary for Managing Information & Records in the Electronic Age (November 2007) as follows: “[a]n organization should have reasonable policies and procedures for managing its information and records.”[5] This built up to the Magistrate judge’s conclusion that “[t]he absence of a coherent document retention policy is a pertinent factor to consider when evaluating sanctions.”[6]
Even if the ruling comes across, on the whole, as heavy-handed, and even if this decision is reversed on appeal, these final points are important. While it’s an open guess as to what the Magistrate judge may consider to be a “coherent” document retention policy (or, for that matter, whether or not most of us would agree with him), a management-level explanation of the defendant’s practices and policies does not seem hard to deliver. The defendant claimed Rule 37(e) Safe Harbor protection against sanctions for pre-litigation elimination of electronically stored information. When making such a claim, always remember to build the elements of your defense in advance through careful implementation and oversight of a company-wide document retention policy.[7] Then, when you need them, you can argue the elements supporting your Rule 37(e) defense piece by piece.
[1] Phillip M. Adams & Associates, L.L.C., v. Dell, Inc., 2009 WL 910801 (D.Utah March 30, 2009).
[2] Fed. R. Civ. P. 37(e).
[3] The Sedona Conference Working Group on Electronic Document Retention & Production (WG1), The Sedona Conference Commentary on Email Management: Guidelines for the Selection of Retention Policy, 8 The Sedona Conference Journal 239, 240 (Fall 2007).
[4] Phillip M. Adams & Associates, L.L.C., v. Dell, Inc., 2009 WL 910801 (D.Utah March 30, 2009).
[5] Id.
[6] Id.
[7] See ESI Maintenance – Sailing the Safe Harbor, posted March 12, 2009.
How to make a quick-n-dirty histogram
Most people know that Microsoft Excel can produce a wide variety of charts in order to visualize data. However, if you find yourself needing to summarize more rows than Excel can load, or you need to use SQL for more flexible data manipulation, Microsoft Access also provides a feature called “pivot charts,” which allows users to generate quick visual summaries of queries.
We’ll start by importing a sample set of data which was obtained from the Internet. The data is in the form of comma-separated values (CSV), which is a common data interchange format.
Once the data is imported into its own table, we’ll create a new query in design view. This is a key step, as pivot charts are designed to visualize data from queries.
- We’ll add the table
- Select some columns that we’d like to summarize
- And be sure that the ‘Group By’ feature is turned on
- Let’s assume we want to know which position played in the most games for this data set
- Finally, we’ll preview the query
Now that our query is created, we’ll save it with a sensible name.
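For readers who think in SQL, the query built above amounts to a GROUP BY with a SUM. The sketch below reproduces that idea in Python using the standard-library sqlite3 module rather than Access. The table name, column names, and sample rows are all hypothetical stand-ins for the imported CSV.

```python
import sqlite3

# Hypothetical rows standing in for the imported CSV: (name, position, games).
rows = [
    ("Player A", "Outfield", 150),
    ("Player B", "Outfield", 140),
    ("Player C", "Catcher", 120),
    ("Player D", "Pitcher", 35),
    ("Player E", "Pitcher", 60),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, position TEXT, games INTEGER)")
conn.executemany("INSERT INTO players VALUES (?, ?, ?)", rows)

# The 'Group By' setting in the query designer corresponds to GROUP BY here.
totals = conn.execute(
    "SELECT position, SUM(games) AS total_games "
    "FROM players GROUP BY position ORDER BY total_games DESC"
).fetchall()
conn.close()

for position, total_games in totals:
    print(position, total_games)
```

Each output row is one position with its summed games, which is the shape of data a pivot chart expects: one category column and one value column.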
In order to display this data, we’ll need to open the View menu and select ‘Pivot Chart view’.
The Chart field list appears with the fields from our query. Clicking the drop-down at the bottom of the field palette shows the different areas of the chart where data can be added. Since we want to know how many games each position played, we’ll add Position to categories and sum of games played to the data area. We can easily see from this summary that the outfielders played more games than any other position.
There is a wide array of chart types to choose from, and these charts can be included in reports or other Office documents to add impact. It’s a quick and easy way to add visualization to data-based summaries.
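If Access isn’t handy, the same kind of at-a-glance summary can be roughed out as a text histogram. This is only a sketch: the function name and the per-position totals are invented for illustration, and the bars are simply scaled to the largest value.

```python
def text_histogram(totals, width=40):
    """Return one line per category, with a bar scaled to the largest value."""
    if not totals:
        return []
    largest = max(totals.values())
    label_width = max(len(label) for label in totals)
    lines = []
    # Largest category first, so the leader is visible at a glance.
    for label, value in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        bar_len = round(width * value / largest) if largest else 0
        lines.append(f"{label:<{label_width}} | {'#' * bar_len} {value}")
    return lines


# Hypothetical totals of games played per position, invented for illustration.
games_by_position = {"Outfield": 290, "Catcher": 120, "Pitcher": 95}

for line in text_histogram(games_by_position):
    print(line)
```

The longest bar always fills the full width, so the relative lengths of the other bars show how far behind each category is.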
Logik Named to Inc. 500 List of Fastest Growing Companies
Out of the 500 companies on this year’s list, Logik is the top eDiscovery company. Wow! We’re thrilled and humbled.

The Inc. 500 is a list of the fastest-growing private companies in the U.S. This award places Logik in company with past honorees such as Microsoft, Timberland, Intuit, Jamba Juice, Oracle, and Under Armour. We can only hope to reach the same success these companies have achieved.
We couldn’t have done this without the support of our family, friends, partners, vendors, and our ahhh-mazing clients. Thank you so much to everyone that has supported us over the years. We greatly appreciate it.
Click here to see the Inc. 500 page about our ranking
Here is the formal press release we issued about the ranking:
Logik Named to Inc. 500 List of Fastest Growing Companies
DC-Based eDiscovery Company Ranks #181 on the 2009 Inc. 500
For Immediate Release
WASHINGTON DC – August 12, 2009 – DC’s fastest-growing eDiscovery company, Logik, is thrilled and humbled to announce that Inc. magazine has ranked Logik as number 181 on the “2009 Inc. 500” list of fastest-growing companies in America. Out of the 500 companies on the list, Logik is the top eDiscovery company. There are two groups to thank: Logik employees and Logik customers. Both are directly responsible for the achievement.
“Logik started out as a desire to do something better,” says Andy Wilson, co-founder and CEO at Logik. “We worked for a long time in the eDiscovery industry, watching the large vendors shuffle along, and we saw the mind-boggling inefficiencies. We saw insanely high rates. We saw a better way and we took a chance on it.”
Logik began in the dining room of Andy Wilson’s apartment as a simple, straight-forward idea. Provide a faster, more accurate and more budget-friendly way to process eDiscovery data. The hard part was inventing the technology.
“Working out of a dining room has its advantages,” laughs Sheng Yang, Logik co-founder and CTO. “Late night after late night we had munchies and soda readily available. While the sugar fueled the work, it was the desire to start something new, interesting and compelling that drove us on day after day. The high ranking by Inc. magazine is a really nice compliment to the technologies we’ve developed to help fill the gap in the industry. What it tells us is that our customers see the value in what Logik offers. And that’s the best compliment there is.”
With themselves as the sum total of two employees in 2005, Andy and Sheng took their eDiscovery data process invention, GridLogik™, from the dining room table to the client table. An incredibly short four years later, Andy and Sheng are still driven to improve the industry they’ve helped to shape. The biggest difference now is that they work from an office in downtown Washington, DC, and keep their twelve employees busy with a ton of work, constant fun and ping-pong matches.
“We know very well the benefits of a creative, relaxed work-environment,” says Andy. “Our flat organization promotes the sharing of ideas and teamwork, team-think and, more important, individual contribution and inventiveness. The only principle we stand on is the one where we deliver exceptionally well for our customers. We have more new ideas coming from our team than we can ever hope to launch this year. So we work with our team to find the best ideas that offer our customers faster results and a better bottom-line. Because frankly, the better our customers do and the happier they are, the better we do as a company. The proof of that simple concept is the growth we continue to experience.”
That’s exactly the formula Logik continues to follow. Fast processing + smart work + happy employees x extremely happy customers = 181. At least, according to the 2009 Inc. 500 list.
“If you want to find out which companies are going to change the world, look at the Inc. 500,” said Inc. Editor Jane Berentson. “These are the most innovative, dynamic, fast-growth companies in the nation, the ones coming up with solutions to some of our most intractable ills, creating systems that let us conduct business faster and easier, and manufacturing products we soon discover we can’t live without. The Inc. 500 list is Inc. magazine’s tribute to American business ingenuity and ambition.”
“Though I have to say,” Sheng wistfully admits, “It’s nice to have a dinner party now without a server rack sitting next to the dining room table.”
About Logik:
Logik is an eDiscovery processing company located in Washington, DC. Logik helps corporations, law firms, government agencies and service providers simplify electronic data sought in discovery requests. Logik’s innovative and highly distributed processing platform, Gridlogik, was developed to process all kinds of unstructured and structured data sets such as email databases, spreadsheets, images and MS Office documents. Combined with their transparent pricing model, Logik offers customers the smart way to discover accurate results and make sense of processing costs. Find out more at logik.com.
Media interested in setting up an interview with a representative from Logik should call 800-951-5507.
###
In the Cloud, Warrants are for the Birds?
I see skies of blue – and clouds of white – a bright blessed subpoena! You mean warrant, right?
Nope. We respect you for trying, but they meant subpoena. (…what a wonderful world…)
In U.S. v. Weaver, a district court in the Seventh Circuit addressed the question of whether a court can, via subpoena, compel an Internet Service Provider’s (Microsoft’s) production of a subscriber’s opened emails which are less than 181 days old. 2009 WL 2163478 (C.D.Ill.). This was a case of first impression within the Seventh Circuit, and it clarified Theofel v. Farey-Jones, a previous Ninth Circuit ruling. 359 F.3d 1066 (9th Cir. 2004). Whereas the court in Theofel found that circumstances called for the use of a warrant, the court in Weaver said that a subpoena would suffice.
In Weaver, seeking Defendant’s Hotmail records, the Government submitted a trial subpoena to Microsoft requiring the production of “the contents of electronic communications [emails] (not in ‘electronic storage’ as defined by 18 U.S.C. § 2510(17)).” The Government specified that this would “include the contents of previously opened or sent email.” Microsoft, in turn, replied that due to its location (headquartered in Redmond, Washington, in the Ninth Circuit), it was bound by the Ninth Circuit precedent found in Theofel, which required the use of a warrant to obtain such records from an ISP.
The Hon. Jeanne Scott of the Central District of Illinois pointed to the Stored Communications Act, 18 U.S.C. § 2701, et seq., and the Wiretap Act, 18 U.S.C. § 2510, et seq., to resolve the issue. The language leading to a warrant requirement in Theofel was found in § 2703(a), stipulating that governmental entities requiring disclosure by providers of electronic communications service of electronic communications in electronic storage for 180 days or less must obtain and present a warrant based upon probable cause. Yet on the other hand, subsection (b) allows the government to procure similar emails less than 181 days old that are “held or maintained . . . solely for the purpose of providing storage or computer processing services to such subscriber or customer . . . .”
The question: in Weaver, were Defendant’s emails “in electronic storage” – subject to the warrant requirement – or were they “held or maintained . . . solely for the purpose of providing storage or computer processing services” (etc.) and thus available via subpoena?
As defined by the Wiretap Act, because the emails were opened, the only way they could have been in “electronic storage” would be if they were in storage “for purposes of backup protection of such communication . . . .” 18 U.S.C. § 2510(17)(B). This is where the facts in Weaver differ from those in Theofel. In Theofel the Ninth Circuit was addressing an email system in which users downloaded messages from their ISP to their local hard drive. With such systems, residual copies of an already-downloaded email remaining on the ISP’s server could be kept for backup protection until the user’s copy “expire[s] in the normal course.” Theofel, 359 F.3d at 1070. Yet today we’re seeing more and more use of web-based (cloud-based) email systems.
In Weaver, this was the situation addressed by the court. Here, the Defendant was using Microsoft’s cloud-based Hotmail email. The Ninth Circuit itself, in Theofel, pointed out that “[a] remote computing service might be the only place a user stores his messages; in that case, the messages are not stored for backup purposes.” Id.
So both courts appear to agree that web-based email falls into the provisions of § 2703(b), meaning the government is free to compel production from an ISP via subpoena. Or, per the same subsection, the government may even compel production without notice if it wishes to secure a warrant. But going even further, the Weaver court faulted the Ninth Circuit’s analysis as “unpersuasive” and as out of step with “legislative history and other provisions of the Act.” In drafting the Stored Communications Act, the drafters noted the case in which an addressee receives an email yet chooses to leave it in storage on the ISP’s server for later re-access. The drafters said that “such communication should continue to be covered by section 2702(a)(2)” – a section that reads identically to the provision allowing the Government access by trial subpoena.
In light of statements that have come from the executive branch along with various other court decisions referring to a diminished “expectation of privacy” in the use of cloud-based computing, the lowered level of Due Process protection represented by this decision really isn’t such a surprise.
Sailing the safe harbor
BRING OUT YOUR DEAD…documents. If your company goes to court, and your opponent’s discovery request includes dead files or electronic files previously deleted from your archives, have you secured safe harbor protections against court sanctions? In order to do so, here’s a quick set of guidelines:
- Adopt a company-wide document retention policy, defining the time frames within which specific categories of documents must be retained (according to file type, local and federal law, and industry standards).
- Eliminate files only after their defined retention period expires.
- Consistently implement this policy, throughout your company.
- In the event that you might reasonably anticipate litigation, implement a “legal hold” policy defining and executing the process by which relevant information is identified, preserved, and maintained for discovery purposes.[1]
- Enjoy safe harbor protections for files deleted in the course of database management (as defined by your document retention policy), falling outside of the context of the aforementioned legal hold.
- In this context, an adverse inference or other court sanction for spoliation of evidence would require the following three elements:
  - that the party having control over the evidence had an obligation to preserve it at the time it was destroyed;
  - that the records were destroyed with a ‘culpable state of mind’; and
  - that the destroyed evidence was ‘relevant’ to the party’s claim or defense such that a reasonable trier of fact could find that it would support that claim or defense.[2]
The corporate records you maintain as electronically stored information (ESI) – now including email, voice messages, proposals, sales documents, contracts, legal documents, tax records, employment records, Board minutes, and press releases amongst other important files – are both assets and potential burdens to your company.
Having extensive records at your fingertips will enable smooth operations by informing you in your transactions with existing and potential clients, by allowing market analysis and company forecasts, and potentially by protecting you in the event of a lawsuit.
Yet every coin has its flip-side. In many companies, massive proliferation of ESI threatens to bog down their storage capacities. Picture the deluge in physical terms: if one gigabyte of data would (roughly speaking) require enough pages of print-out to fill a pick-up truck to capacity, imagine the one thousand twenty-four trucks that would be required to hold a terabyte. Such volumes are often seen in today’s large enterprises.
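For anyone who wants to scale the pick-up truck picture to their own data volumes, here is a quick back-of-the-envelope sketch in Python. The one-truck-per-gigabyte ratio is only the rough figure used above, and the function name is made up for illustration.

```python
# Rough figure from the paragraph above: one gigabyte of print-out fills one pick-up truck.
TRUCKS_PER_GIGABYTE = 1
GIGABYTES_PER_TERABYTE = 1024  # binary convention, matching the 1,024 trucks above


def trucks_needed(terabytes: float) -> float:
    """Return the number of pick-up trucks of print-out for a given data volume."""
    return terabytes * GIGABYTES_PER_TERABYTE * TRUCKS_PER_GIGABYTE


print(trucks_needed(1))    # 1024
print(trucks_needed(0.5))  # 512.0
```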
Understandably, in this post-Enron and Sarbanes-Oxley era, when in doubt regarding the decision to retain or delete a document, many have chosen to avoid potential liability by opting to retain. Section 404 of the Sarbanes-Oxley Act of 2002 ties executive liability to, among other things, the presence of effective internal controls.[3] A reasonable and functioning document retention policy could be a relevant metric in the examination of these controls. Even more directly, the errant destruction of an electronic file may lead to the inability to produce a document requested during the discovery phase of a lawsuit. Rule 37 of the Federal Rules of Civil Procedure (Fed. R. Civ. P.) levies sanctions upon parties to a dispute (including an unfavorable default judgment) for a failure to make disclosures or to cooperate in discovery.[4]
This Scylla and Charybdis, the opposing hazards of run-away databases on the one hand and over-zealous ESI culling on the other hand, can be fairly easily avoided. Fed. R. Civ. P. 37(e) creates the following Safe Harbor for “Failure to provide Electronically Stored Information”:
Absent exceptional circumstances, a court may not impose sanctions under these rules on a party for failing to provide electronically stored information lost as a result of the routine, good-faith operation of an electronic information system.[5]
It is vital to note from the outset that this Rule does not open the doors to old-style record elimination – such an interpretation would simply amount to federal common law spoliation of evidence[6] – yet a Safe Harbor may be sailed via the well-crafted creation of and compliance with a document retention policy. Two questions that should be asked are: 1) How do I create a Safe Harbor for my company’s elimination of ESI? 2) How do I ensure that my company doesn’t sail beyond the boundaries of the Safe Harbor?
Creating your Safe Harbor begins with the drafting and implementation of your document retention policy. Your company’s document retention policy (including your digital document retention protocol) should be your best friend. By creating guidelines regarding the period of time during which different categories of documents should be maintained you will be able to free your hard drives from their accumulation of digital detritus, and at the same time you can ensure that your employees do not eliminate files relevant to a potential litigation (a.k.a. covering your back-side). In setting the time frames within which a particular file of ESI should be maintained, consideration must be given to local and federal laws,[7] to the standards of the industry within which your company fits, and to the type of ESI in question (see the types of corporate records discussed at the beginning of this article).
Next, a company must take care to put a “legal hold” on their routine when the shadow of litigation appears. Even if a file is scheduled for routine culling, the act of eliminating a piece of relevant ESI in such circumstances would remove the Safe Harbor protections of Fed. R. Civ. P. 37(e), exposing you fully to Rule 37’s sanctions (including the default judgment). The Sedona Conference Working Group on Electronic Document Retention & Production has produced an excellent commentary entitled The Trigger & The Process.[8] This commentary clarifies the circumstances in which a legal hold should be placed on ESI, providing eleven useful guidelines expanding upon the following statement:
The law has developed rules regarding the manner in which information is to be treated in connection with litigation. One of the principal rules is that whenever litigation [or a regulatory investigation or proceeding] is reasonably anticipated, threatened or pending against an organization [or natural person], that organization has a duty to preserve relevant information. This duty arises at the point in time when litigation is reasonably anticipated whether the organization is the initiator or the target of litigation.[9]
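To make the interplay between a retention schedule and a legal hold concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the category names, the retention periods, the `Record` fields and the idea of tracking a hold by custodian are hypothetical, and real periods and hold scopes should come from your attorney and your own policy.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention periods, in days, per record category.
# Real periods depend on local and federal law and on industry standards.
RETENTION_DAYS = {
    "email": 365 * 3,
    "tax_record": 365 * 7,
    "press_release": 365,
}


@dataclass
class Record:
    name: str
    category: str
    created: date
    custodian: str


def may_delete(record: Record, today: date, held_custodians: set[str]) -> bool:
    """Return True only if routine deletion is allowed for this record."""
    # A legal hold overrides the routine schedule (simplified here to a custodian list).
    if record.custodian in held_custodians:
        return False
    # Unknown categories are kept until someone classifies them.
    period = RETENTION_DAYS.get(record.category)
    if period is None:
        return False
    return today > record.created + timedelta(days=period)


today = date(2009, 8, 1)
old_email = Record("q1-forecast.msg", "email", date(2005, 1, 10), "alice")
print(may_delete(old_email, today, held_custodians=set()))      # True
print(may_delete(old_email, today, held_custodians={"alice"}))  # False
```

The point of the sketch is the order of the checks: the hold is tested before the schedule, so a file that is overdue for routine culling still survives once litigation is reasonably anticipated.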
So far, so good – but a policy per se will not be sufficient. Any compliance team will be quick to point out that the human element can be one of the trickier elements to tackle when implementing a new policy. As basic as it sounds, a strong dose of oversight is required to ensure that your policy is, in fact, executed. At the annual meeting of the Corporate Counsel Section of the New York State Bar Association, panelist Kenneth Rashbaum (of Fios, Inc.) pointed out that the “most critical aspect of record retention policies and e-mails is employee education . . . employees won’t follow what they don’t understand. . . .”[10] In addition to explaining what records fall into particular categories of your policy, another panelist (Eva L Jerome of Bryan Cave LLP) pointed out that upon implementation of a legal hold, “oral instruction immediately followed by a written one . . .” should be given to all potential data custodians, followed by “ongoing monitoring of compliance, including sending out periodic reminders of the hold and recertifications. . .”[11]
If you need to create, implement, or oversee a digital document retention protocol, discuss these issues with your attorney. Ensure compliance. Discover and sail the sheltered waters of your safe harbor. Put a legal hold on your ESI if the shadow of litigation presents itself.
[1] www.thesedonaconference.org/content/miscFiles/Legal_holds.pdf at 2.
[2] Zubulake v. UBS Warburg LLC, 229 F.R.D. 422, 430 (S.D.N.Y. 2004).
[3] Sarbanes-Oxley Act, Pub. L. No. 107-204, § 404, 116 Stat. 745 (2002).
[4] Fed. R. Civ. P. 37.
[5] Id. at (e).
[6] See, e.g., Silvestri v. General Motors, 271 F.3d 583 (4th Cir. 2001).
[7] See, e.g., Fair Labor Standards Act of 1938.
[8] www.thesedonaconference.org/content/miscFiles/Legal_holds.pdf
[9] Id. at 1.
[10] Alessandra Scalise, Corporate Counsel program reviews e-record management, N.Y. St. B.A. State Bar News, Mar.-Apr. 2009, at 24.
[11] Id. at 27. See also Zubulake v. UBS Warburg LLC, 229 F.R.D. 422, 432 (S.D.N.Y. 2004).
Electronic Document Management Systems in 2009
AIIM’S Revised Recommended Practices
Get ready for an acronym or two. Oh what the heck, make it seven. No Glossary Needed (NGN).
In June the non-profit Association for Information and Image Management (AIIM), an official ANSI-approved Standards Development Organization, approved and released the updated 2009 version of AIIM ARP-1-2009: Recommended Practice – Analysis, Selection, and Implementation of Electronic Document Management Systems (EDMS).
AIIM focuses on the tools and modes used in Enterprise Content Management (ECM – or enterprise-level data management) standards. This vendor-neutral report was prepared by industry experts and approved under a Standards Board including members of the U.S. District Courts, Microsoft, Adobe Systems Inc., and OpenText. The new practice guidelines address the analysis, selection and implementation procedures associated with electronic document management. Starting with a description of the technologies currently being used by companies to store and manage ESI, the report details current industry standards and finishes up with a review of industry best practices.
In their June 16 report, AIIM quoted former general counsel and securities regulator Virginia Jo Dunlap in stating that “[c]ompanies that will be facing any type of E-Discovery requests should pay close attention to ARP-1 as it provides guidance on the critical first steps toward being able to certify to courts or regulators that the documents produced are accurate.”
A complete copy of this report can be downloaded here.
Passing the bucks – when and why to expect cost-shifting
How do you feel about “going Dutch?” You may or may not have strong feelings about being asked to split a dinner tab, but my money says that you’ll have even stronger feelings about splitting a discovery “tab.” This is a brief look at when to expect cost-shifting in eDiscovery.
When?
From the outset, keep in mind that eDiscovery cost-shifting is an extraordinary remedy. Court modifications of discovery requests (including cost-shifting) are not a given. In fact, the benchmark decision of Zubulake I points out that in many typical discovery requests a consideration of cost-shifting would be wholly inappropriate.[1]
In general, courts should deny burdensome requests for data in the absence of a reasonable prospect that the data will contribute significantly to discovery.[2]
The remedy is more likely to arise where a request might be burdensome upon the recipient, but that burden is coupled with a justification – a demonstration of substantial need by the requesting party.[3] While a motion to limit a discovery request or to shift a portion of discovery costs to the requesting party remains a matter of court discretion, clear guidance has been provided in several compelling sources – allowing us the benefit of a few reasonable predictions.
Typically expect cost-shifting when…
- a party is compelled to recover and produce deleted data – deleted as a result of the routine, good-faith maintenance of an electronic information system;[4]
- a party is compelled to recover and produce data from recovery / backup tapes;[5]
- a party is compelled to recover and produce residual data;[6]
- a party is compelled to recover and produce legacy data;[7]
- the aggregate volume of data requested outstrips the needs of the requesting party;[8]
- the requesting party has disproportionately greater resources than the party from whom the data is sought;[9] or
- there is a lack of reasonable likelihood that the requested evidence will lead to the discovery of admissible evidence.[10]
Don’t expect cost-shifting when…
- a party could reasonably have anticipated litigation yet, failing to place a legal hold on relevant data, allowed that data to be deleted;[11]
- in spite of the fact that the production of certain data would be unduly burdensome, a party agreed to a stipulation ordering production of the data in question;[12] or
- the data requested is reasonably accessible, meaning compliance would not be unduly burdensome or costly.[13]
What?
Where do these factors come from? What does “reasonably accessible” mean? In a federal context, eDiscovery requests are at the discretion of the court. Fed. R. Civ. P. 26 notes that where the production of ESI is found to be unduly burdensome (where the ESI is not reasonably accessible) the court may “specify conditions for the discovery.”[14] So how do we recognize undue burden or cost, the lodestar for data that is not reasonably accessible? The Federal Rules of Civil Procedure, The Sedona Principles, and case law all shed light on these questions.
Reasonable Availability and Undue Burden in Context…
- Rule 26 provides a proportionality standard to be used when a court steps in to specify discovery conditions, incorporating the factors of IT feasibility, balancing its burden or expense against “its likely benefit, considering the needs of the case, the amount in controversy, the parties’ resources, the importance of the issues at stake . . . and the importance of the discovery in resolving the issues.”[15]
- Principle 13 of The Sedona Conference Working Group on Electronic Document Retention & Production states that “if the information sought is not reasonably available . . . in the ordinary course of business, then, absent special circumstances, the costs of retrieving and reviewing such electronic information may be shared by or shifted to the requesting party.”[16] The bald use of “reasonably available” and “ordinary course of business” may be vague on its own, but Comment 13.a. provides eight factors to determine whether cost-shifting should occur in the production of burdensome ESI:
  - whether the information is reasonably accessible as a technical matter without undue burden or cost;
  - the extent to which the request is specifically tailored to discover relevant information;
  - the availability of such information from other sources, including testimony, requests for admission, interrogatories, and other discovery responses;
  - the total cost of production, compared to the amount in controversy;
  - the total cost of production, compared to the resources available to each party;
  - the relative ability of each party to control costs and its incentive to do so;
  - the importance of the issues at stake in the litigation; and
  - the relative benefits to the parties of obtaining the information.[17]
- In Zubulake I, Judge Scheindlin wrote that “whether production of documents is unduly burdensome or expensive turns primarily on whether it is kept in an accessible or inaccessible format,”[18] and went on to clarify that metric with the following seven-point test:
  - The extent to which the request is specifically tailored to discover relevant information;
  - The availability of such information from other sources;
  - The total cost of production, compared to the amount in controversy;
  - The total cost of production, compared to the resources available to each party;
  - The relative ability of each party to control costs and its incentive to do so;
  - The importance of the issues at stake in the litigation; and
  - The relative benefits to the parties of obtaining the information.[19]
The first two factors are generally the weightiest, but factor six takes precedence if the case is one of broad, important impact.[20] This calculus is objective; a sampling of the requested data is required to allow an analysis of these factors.
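If you keep a checklist for this analysis, the ordering just described can be sketched in a few lines of Python. This is only a note-taking aid under stated assumptions: the function name and the `broad_public_impact` flag are hypothetical, and nothing here computes a legal outcome, which remains a matter of court discretion.

```python
# The seven Zubulake I factors, in the order listed above.
ZUBULAKE_FACTORS = [
    "Extent to which the request is specifically tailored to discover relevant information",
    "Availability of such information from other sources",
    "Total cost of production, compared to the amount in controversy",
    "Total cost of production, compared to the resources available to each party",
    "Relative ability of each party to control costs and its incentive to do so",
    "Importance of the issues at stake in the litigation",
    "Relative benefits to the parties of obtaining the information",
]


def review_order(broad_public_impact: bool) -> list[int]:
    """Return factor numbers (1-7) in the order they should be weighed."""
    order = list(range(1, len(ZUBULAKE_FACTORS) + 1))
    if broad_public_impact:
        # Factor six moves to the front when the case is one of broad, important impact.
        order.remove(6)
        order.insert(0, 6)
    return order


print(review_order(False))  # [1, 2, 3, 4, 5, 6, 7]
print(review_order(True))   # [6, 1, 2, 3, 4, 5, 7]
```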
What you should do…
Review these factors (and any corresponding state/local law) to see whether an eDiscovery request is likely to fall within a precedent for cost-shifting. It remains vital that in the shadow of anticipated litigation, you maintain viable records of your relevant data.[21] Be prepared, and take the mystery out of “going Dutch.”
[1] Zubulake v. UBS Warburg LLC, 217 F.R.D. 309 (S.D.N.Y. 2003) (drawing a distinction between accessible email files on optical discs and less accessible email files on backup tapes).
[2] The Sedona Conference Working Group on Electronic Document Retention & Production (WG1), The Sedona Principles: Second Edition, Best Practices Recommendations & Principles for Addressing Electronic Document Production, Comment 13.b. (June 2007).
[3] Id.
[4] Id. at Comment 13.a.
[5] Id.
[6] Id.
[7] Id.
[8] Id. See Fed. R. Civ. P. 26(b)(2)(C)(iii).
[9] Id. See Fed. R. Civ. P. 26(b)(2)(C)(iii).
[10] See Fed. R. Civ. P. 26(b)(2)(C), Fed. R. Civ. P. 26(c), see also Zubulake v. UBS Warburg LLC, 217 F.R.D. 309 (S.D.N.Y. 2003).
[11] See http://www.thesedonaconference.org/content/miscFiles/Legal_holds.pdf. See also Procter & Gamble Co. v. Haugen, 2003 WL 22080734, No. 1:95CV94 DAK (D. Utah August 19, 2003).
[12] In re Fannie Mae Securities Litigation, 552 F.3d 814 (D.C. Cir. 2009).
[13] Fed. R. Civ. P. 26(b)(2)(B).
[14] Fed. R. Civ. P. 26(b)(2)(B).
[15] Fed. R. Civ. P. 26(b)(2)(C).
[16] The Sedona Conference Working Group on Electronic Document Retention & Production (WG1), The Sedona Principles: Second Edition, Best Practices Recommendations & Principles for Addressing Electronic Document Production, at 67 (June 2007).
[17] Id. at Comment 13.a.
[18] Zubulake v. UBS Warburg LLC, 217 F.R.D. 309 (S.D.N.Y. 2003).
[19] Id. at 322.
[20] Id.
[21] See FN 9 supra, and accompanying text.
Reading the pulse of the industry
Open wide and say “ahh.” Um-hmm… interesting.
These days we seem to be surrounded by various pronouncements and diagnostics on the health of the economy. Sometimes these seem to be counter-intuitive. Consecutive months of increased spending (rising 0.5%) at the same time as a 1.3% fall in personal income? (Consumer spending rose again in June.)
More people are filing first-time claims for unemployment benefits, but the trend is improving? Somehow I think the idea that “the pace of decline [has] moderated” can cut both ways.
This climate has certainly had an effect in the realm of law, litigation and legal services. Newspapers from China to the UK are running stories on the New York information technology graduate who is suing to recover her college tuition after finding herself unemployed. Within the industry itself, we’ve heard plenty of news and advice regarding law firm dissolution and downsizing, layoffs, and associate/staff furloughs. This news often seems to find its way around through rumors and speculation, but a fair amount of advice comes from the ABA itself.
In the world of eDiscovery, the truth is that for those who are willing to keep pace with the cutting edge of technology and the rapid evolution of the law, the opportunities to succeed are not-so-hidden. George Socha and Tom Gelbmann (of the Socha-Gelbmann Electronic Discovery Survey) have pointed out the growth and strength of new, creative and innovative Electronic Data Discovery (EDD) providers with “strong, scalable, sophisticated advanced search tools.” The industry is growing along with them. Survey participants expect the eDiscovery market to expand by “about 30% throughout 2009 and about 25% in 2010.” The continuing increase in the volume of data processed by eDiscovery providers would seem to substantiate these expectations.
Gifted programmers, project managers, and attorneys have been steering their careers in this direction for some time now, finding plenty of scope for career development. Law firms and corporations have also been eager to hire experienced EDD professionals, yet for all the demand and need, the workers seem to be few. Socha and Gelbmann’s survey participants perceive there to be “no more than 100 to 200 lawyers in the entire country [who] really ‘get’ EDD.”
In this industry, opportunities are no longer reserved for seniority. Court decisions are beginning to spring up around the country, to the accompaniment of rewritten portions of the Federal Rules of Civil Procedure. In contrast to the more glacial pace at which most areas of the law develop, the law, substance and methods of eDiscovery have been on a growth spurt – struggling (and in some cases failing) to keep up with an even more rapid development in electronic communications. In fact, a sobering picture of the court’s reaction when an attorney hasn’t kept pace with the latest developments can be seen clearly in Chen v. Dougherty, 2009 WL 1938961 (W.D. Wash. July 7, 2009). Referring to an experienced litigator’s failure to submit search terms, the court said that her “inhibited ability to participate meaningfully in electronic discovery tells the Court that she has novice skills in this area and cannot command the rate of experienced counsel.”
This application of emerging law to an ever-developing sea of potential discovery sources (from social networking engines to cloud-based platforms for applications and data storage) has placed the digitally-native generation at a double advantage: 1) Comfort with the technology through immersion in the amorphous world of electronic social networking, applications and electronic search procedures; and, 2) A rare opportunity to begin a career on something close to level ground (that is, more level than a junior associate’s usual starting position) when it comes to knowing the rules of the game.
Sarek, Spock’s Dad
Logi(k) offers a serenity humans seldom experience in full.
How to hook it up right
Prevent data spoliation by using a simple write-blocking device. They are fairly cheap (~$270 @ tableau.com) and well worth the price considering spoiling data may just ruin your whole day.
Connecting a hard drive to a computer seems simple enough. But if you want to avoid modifying the metadata on the drive you will need to use a write-blocking device that prevents the computer from updating the metadata. This is very important, especially for legal discovery, where metadata should always be preserved to avoid spoliation.
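A write blocker stops changes from happening; a hash lets you show afterwards that nothing changed. Hashing is not something covered above, so treat the following Python sketch as a complementary illustration under that assumption. The function names are hypothetical, and the idea is to run it against a forensic image or copy of the drive, recording the hash at collection time.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large images do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unchanged(path: Path, baseline_hash: str) -> bool:
    """Compare the current hash with one recorded when the evidence was collected."""
    return sha256_of(path) == baseline_hash
```

Record the output of `sha256_of` when the image is first made, then call `unchanged` with that value whenever the image is handed off or processed.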
Logik is such a TREC-ee
Not the Star Trek kind of Trekky, well, maybe considering the high likelihood most participants (all tech companies) have seen all 7 movies. No offense, live long and prosper TREC participants. This TREC is more about going where no text retrieval algorithm has gone before, and less about finding new planets, although you could make an argument for it…but I digress. TREC stands for the “Text REtrieval Conference” and is co-sponsored by the National Institute of Standards and Technology (NIST) and U.S. Department of Defense. We are very proud to be a participant in their 2009 TREC study. So, what does that mean exactly?
TREC gave us a very LARGE data set, the Enron emails, along with a subpoena, and said figure it out. Easy enough, right? Maybe for The Enterprise crew this is easy, but for most eDiscovery companies, including us, this is a major challenge and one that has significant meaning. Our job will be to use what we know about search within discovery and find all the relevant emails and attachments that relate to the subpoena. This requires more than just your standard set of Boolean keyword searches. We will need to use more powerful text retrieval algorithms to find the needles in the haystack.
None of the participants are allowed to post their results publicly, even if they find every single document relevant to the subpoena. Although it’s probably every marketer’s dream to post the results (assuming they are good), TREC is smart to not allow it. Each participant is required to submit their results, the tools they used, etc. to TREC by September 7th, 2009. So, the clock is ticking. Hopefully, more advanced and accurate methods for text retrieval will come out of this process. If only the good people at NIST offered up a Netflix-like $1m prize (http://logiik.com/L)...sigh. Wish us luck.
Here is a brief intro to TREC taken from their website: http://trec.nist.gov/
The Text REtrieval Conference (TREC), co-sponsored by the National Institute of Standards and Technology (NIST) and U.S. Department of Defense, was started in 1992 as part of the TIPSTER Text program. Its purpose was to support research within the information retrieval community by providing the infrastructure necessary for large-scale evaluation of text retrieval methodologies. In particular, the TREC workshop series has the following goals:
- to encourage research in information retrieval based on large test collections;
- to increase communication among industry, academia, and government by creating an open forum for the exchange of research ideas;
- to speed the transfer of technology from research labs into commercial products by demonstrating substantial improvements in retrieval methodologies on real-world problems; and
- to increase the availability of appropriate evaluation techniques for use by industry and academia, including development of new evaluation techniques more applicable to current systems.
TREC is overseen by a program committee consisting of representatives from government, industry, and academia. For each TREC, NIST provides a test set of documents and questions. Participants run their own retrieval systems on the data, and return to NIST a list of the retrieved top-ranked documents. NIST pools the individual results, judges the retrieved documents for correctness, and evaluates the results. The TREC cycle ends with a workshop that is a forum for participants to share their experiences.
This evaluation effort has grown in both the number of participating systems and the number of tasks each year. Ninety-three groups representing 22 countries participated in TREC 2003. The TREC test collections and evaluation software are available to the retrieval research community at large, so organizations can evaluate their own retrieval systems at any time. TREC has successfully met its dual goals of improving the state-of-the-art in information retrieval and of facilitating technology transfer. Retrieval system effectiveness approximately doubled in the first six years of TREC.
TREC has also sponsored the first large-scale evaluations of the retrieval of non-English (Spanish and Chinese) documents, retrieval of recordings of speech, and retrieval across multiple languages. TREC has also introduced evaluations for open-domain question answering and content-based retrieval of digital video. The TREC test collections are large enough so that they realistically model operational settings. Most of today’s commercial search engines include technology first developed in TREC.
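The pooling step in that overview (NIST pools the top-ranked documents from each participant, judges them, then evaluates each run) can be sketched in a few lines of Python. This is only an illustration under assumptions: the team names, document IDs, pool depth and judgments are made up, and precision and recall are used here as standard retrieval measures rather than as a statement of exactly how TREC scores any particular track.

```python
def pool(runs: dict[str, list[str]], depth: int) -> set[str]:
    """Union of the top `depth` documents from every participant's ranked run."""
    pooled: set[str] = set()
    for ranked_docs in runs.values():
        pooled.update(ranked_docs[:depth])
    return pooled


def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Precision and recall of one run against the judged-relevant set."""
    hits = sum(1 for doc in retrieved if doc in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ranked runs from two participants.
runs = {
    "team_a": ["d1", "d2", "d3", "d4"],
    "team_b": ["d3", "d5", "d1", "d6"],
}
judging_pool = pool(runs, depth=2)  # documents the assessors would look at
relevant = {"d1", "d3"}             # hypothetical assessor judgments
print(sorted(judging_pool))                        # ['d1', 'd2', 'd3', 'd5']
print(precision_recall(runs["team_a"], relevant))  # (0.5, 1.0)
```

Pooling keeps the judging workload manageable: assessors only review documents that at least one system ranked highly, rather than the whole collection.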
Electronic gadget blackout
Planning to use your electronic exhibits in court? On June 29, 2009, the United States District Court for the Southern District of New York announced an interim measure that denied attorneys permission to bring their laptop computers (in addition to other electronic devices) through security and into the Courthouse short of a specific court order “authorizing a specific attorney to bring a specific electronic device into the building for a specific proceeding.”
This draconian little measure stemmed from the local judges’ concerns that laptops can contain bombs, and that personal electronic devices can make prohibited recordings during a proceeding. The new procedure came from the Southern District’s Ad Hoc Committee – superimposed upon the Court’s Local Civil Rule 1.8, reading “[n]o one other than court officials engaged in the conduct of court business [or federal prosecutors and defenders] shall bring any camera, transmitter, receiver, portable telephone or recording device into any courthouse or its environs without written permission of a judge of that court.” The big difference between the old rule and the new is that under Local Rule 1.8 it was common for judges to sign blanket orders allowing an array of unspecified electronic devices, but the interim rule requires a whole lot of specificity concerning the who, the what, and the when. Simply put, this means a good deal fewer tools in the courthouse.
Although this new policy affected only the US District Court for New York’s Southern District, it’s the big one. This is the largest of all US District Courts with up to 1/5 of all civil litigation pending before the Federal Courts. You can read about its history here. The oft-quoted Judge Learned Hand presided in this court from 1909 to 1924, and might once again be quoted in this context: “Life is made up of constant calls to action, and we seldom have time for more than hastily contrived answers.” (Speech in New York City, 1952.)
While this measure is in effect, greater advance planning is a must for attorneys wishing to display electronic evidence or other electronic exhibits, or simply to consult their calendar or case notes. On a positive note, it’s likely that this interim measure will never make it into the permanent Rules. The Federal Bar Council (among others) is submitting proposed amendments to Local Civil Rule 1.8 that would relax the standard and allow attorneys to use their electronic devices, subject to strict use conditions and perhaps to the use of a Secure Pass ID. The Southern District’s Ad Hoc Committee on Cell Phones met to receive public comment on July 29.
Logik.com is launched!
After six months of development, we are finally launching our new site, which you happen to be on now. Mind blowing, I know. Here is the formal press release we issued about the site:
NEW LOGIK WEBSITE ENLIGHTENS, EDUCATES AND ENTERTAINS
Enhanced resources on Logik.com include a rich library of eDiscovery articles, how-to videos, informative blog posts, and a detailed look into Logik’s service offerings and transparent pricing model.
For Immediate Release
WASHINGTON DC, July 28, 2009 - A new online destination has arrived for those seeking to learn more about Logik and related eDiscovery information with the redesigned Logik.com website. As an eDiscovery processing company, Logik.com’s goal is simple: get straight to the point on what they do, how they do it and why their clients love to work with them. Sounds like a pretty simple concept, but it’s one that Logik worked hard to implement and get right. What sets the Logik.com website apart from other eDiscovery company websites is how it makes information easy to find. The site is designed with customers (past, present and future), the media and job-seekers in mind; it’s graphically appealing and easy to navigate. Logik.com features insightful articles on the industry, instructional “how-to” and best-practice videos, technology updates, and eye-candy to please all visitors. And it features Logik’s new mascot for eDiscovery processing services: Logikbot.
“When we started to strategize about the new design for Logik.com, we knew we wanted a site that matched our focused work style and relaxed personalities. We aren’t like everyone else, and our new website clearly shows that,” said Andy Wilson, co-founder and CEO at Logik. “As our customers know, we maintain our sense of humor while working exceedingly hard for them, and we wanted our marketing to represent us in a full light. We chose vibrant imagery on the site and decided to showcase our new mascot, the Logikbot.”
The site also has an interesting, original promotion for an eDiscovery company: visitors can register to win a free case of Logik “Redaction” red wine. It’s not often you see a technology company making their own wine and then giving it away through an online raffle. “Andy and I are big fans of wine,” said Sheng Yang, Logik co-founder and CTO. “With the help of Crushpad, a wine-making business in San Francisco, we created our own wine for our employees and valued clients. We’ve already sampled it, and I have to say, we think people are going to really enjoy the rich and spicy flavor. Redaction will be ready to drink by early 2010, so we encourage our visitors to register to win a case and taste the fruits of our labor on Logik.com.”
Front and center on the new Logik.com website is a section entirely dedicated to Logik’s transparent eDiscovery pricing model, offering a detailed description of the model and what clients can expect to get for their money. “Most pricing models in the eDiscovery processing business are about as consistent and easy to understand as the congressional budget,” said Andy. “Not knowing what you are going to pay until you get the bill is extremely annoying, and it makes it hard to plan your annual budget. This is why we’ve designed our innovative eDiscovery technology to allow for 100% predictable processing costs. This means our clients always know how much they are going to spend for eDiscovery processing services before they send us the data. It sounds like a no-brainer, but it is a very rare pricing model in our industry and one our clients love.”
“We are a relatively young technology company—just five years—and we have accumulated a lot of relevant industry knowledge over the years,” said Andy. “We designed Logik.com to enable us to easily share our experiences. We hope people will learn some useful things on the website while getting a kick out of our take on eDiscovery marketing. And we think that’s a great combination.”
Discover the newly designed website at http://www.logik.com.
Logik.com is designed with the best Web standards in mind and should appeal to Logik’s client base as well as the general public. For the best experience possible, please ensure you have JavaScript enabled in your browser.
About Logik:
Logik (formerly known as Logik Systems) is an eDiscovery processing company located in Washington, DC. Logik helps corporations, law firms, government agencies and service providers simplify electronic data sought in discovery requests. Logik’s highly distributed processing platform, Gridlogik, was developed to process all kinds of unstructured and structured data sets such as email databases, spreadsheets, images and MS Office documents. Combined with their transparent pricing model, Logik offers customers the smart way to discover accurate results and make sense of processing costs. Find out more at logik.com.
Media interested in setting up an interview with a representative from Logik should call (800) 951-5507.
###
Data Tsunami
Just the facts please
- Four terabytes of Japanese data
- English and Japanese search terms
- 14,000,000 pages for review
- 8,000,000 pages produced to ITC
- $500,000 in savings
- < 2 months to complete
- Happy client, happy customer
Challenge:
One of the world’s largest producers of wind turbines needed to collect, process, analyze, review and produce over four terabytes of “real” data (email and office files) to the ITC in a matter of months. What they had was a windfall of data full of different encodings, email formats (Lotus, Outlook, EML, text-based), and Japanese proprietary document formats. They clearly needed help. Our client, one of the top three IP law firms on the planet, was tasked with managing this complex process from beginning to end. The data was collected in Japan and the US from over 100 people. Due to the volume of data, keywords in both English and Japanese (multiple encodings) were approved and needed to be applied to the large data set, post processing of course, a huge effort that needed help. Our client came to Logik to get the work done quickly and accurately.
So, what’s the problem?
- Four terabytes of emails and office files = tens of millions of documents pre-search
- Japanese documents have multiple encodings, making search and detection extremely difficult – plus the words need to be “tokenized” for accurate search
- ITC has tight deadlines and expects perfect productions without error
- Choosing a vendor that uses “extracted size” billing would double or triple the cost
- So…which documents are English and which are Japanese, Chinese, or Korean again?
What we did:
Great project management is needed for a project of this size and scope. The first thing we did was assemble a team to work directly with our law firm client and upper management from the customer to devise a realistic schedule. Normally, four terabytes of data trickles in as the data is collected over time – we were able to get all of the data delivered within a month’s time. The schedule we created allowed us to provide massive rolling deliveries of data (hundreds of thousands of documents), meaning the client was never without documents to review (always a good thing).
The results:
- Using language detection, we were able to flag all non-English documents with their respective language (e.g. Japanese, Chinese, Korean, etc.), thus facilitating a more efficient document review
- We delivered ~14,000,000 pages, post search, in native + TIFF format to our client for review
- Over 8,000,000 pages were flagged as responsive, numbered, endorsed and provided to the ITC
- Production of the 8,000,000 pages took less than 24 hours for us to complete, ready to be delivered to the ITC
- All data was processed, searched and delivered in under 2 months on a rolling delivery schedule, easily making the tight ITC deadline
- Against other bids for this project, we saved the client over $500,000 in processing fees
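The case study does not say how the language flagging was done. As a rough illustration only, the sketch below flags already-decoded text by the Unicode script ranges it contains. The function name `flag_language` and the heuristic itself are assumptions made for this example, not a description of Gridlogik, and a script check like this is far cruder than real language detection (for instance, Japanese text written only in kanji would be flagged as Chinese).

```python
def flag_language(text: str) -> str:
    """Coarse CJK flag based on Unicode script ranges (illustrative heuristic)."""
    has_kana = has_hangul = has_han = False
    for ch in text:
        cp = ord(ch)
        if 0x3040 <= cp <= 0x30FF:      # Hiragana and Katakana blocks
            has_kana = True
        elif 0xAC00 <= cp <= 0xD7A3:    # Hangul syllables
            has_hangul = True
        elif 0x4E00 <= cp <= 0x9FFF:    # CJK Unified Ideographs (shared by all three)
            has_han = True
    # Kana is specific to Japanese and Hangul to Korean; ideographs alone
    # are treated as Chinese, which is only a guess.
    if has_kana:
        return "Japanese"
    if has_hangul:
        return "Korean"
    if has_han:
        return "Chinese"
    return "Unflagged"
```

In a review workflow, a flag like this would typically be stored as a field on each document so reviewers can be assigned by language.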
Redaction Terms

By participating in the Logik Redaction giveaway, entrants agree that Logik and their designees and assignees and all of their respective officers, directors, employees, representatives and agents shall have no liability and entrant will indemnify, defend and hold Logik harmless from any liability, loss, injury, or damage to entrants themselves or any other person or entity, including personal injury, death or damage to personal or real property, to entrant or any other person or entity due in whole or in part, directly or indirectly, by reason of the acceptance, possession, use or misuse of the prize or participation in this Giveaway (wow, that’s a long sentence). Entrants further acknowledge that said parties have neither made nor are in any manner responsible or liable for any warranty, representation or guarantee express or implied, in fact or in law, relative to the prize, including, but not limited to, its quality or fitness for a particular purpose. Logik is based in the US, so entrants must be 21 years of age or older, sorry kids. Entering the giveaway multiple times does not improve your chances, sorry winos. Entrants must submit a valid email address that associates them with the eDiscovery industry.
Redaction


What is Logik Redaction?
Logik Redaction is a California red zinfandel created by Andy and Sheng for both our employees and our valued clients. Back in 2007 we decided that it would be a fun idea to create our own wine, so we hooked up with the amazing people at Crushpad in San Francisco and reserved a French Oak barrel to make our wine in.
Naturally, we named it Logik Redaction:
- Red = the sweet color of the wine (and in our logo)
- + Action = like Gridlogik™, the wine is designed to taste great and pack a punch (14.5% alcohol)
- = Redaction = A common term in the eDiscovery industry (to revise or edit)
Crushpad has a huge wine making facility right on the bay in downtown San Fran. They provided the tools and expertise and we gave them the instructions on what we wanted to make: a really tasty, fruity and fun red Zin. The people at Crushpad get everyone involved, so you can come down to help with the crushing, sipping, and bottling. It’s a lot of fun. If you are interested in making your own barrel please let them know Logik sent you (we get a free case of wine for all referrals!).
Logik Redaction Giveaway
We’ve been known to enjoy a nice glass of wine at the end of a long day. Actually, we like red wine so much, we decided to make our own! We want to share the goodness with you, and are giving away one bottle of our coveted and custom-made red Zinfandel every month.
Enter to win a bottle of Logik Redaction, our very own red Zin.
Now that’s Logik tuff!
It’s a bird, a plane, no…it’s a Pelican case? Have you ever wondered what would happen to a hard drive if you threw it out a four-story building? Odds are the hard drive would smash into a million little pieces that even the best forensic examiner couldn’t piece back together. BUT, what if you put that hard drive inside a plastic box, surrounded by impact foam?
Well, we wanted to find out if our Pelican cases were tuff enough to withstand the impact. We placed one of our Logik hard drives within one of our custom designed Pelican cases and started the launch sequence, 5, 4, 3, 2,...1 launch!
The team on the ground secured the dropped case and hooked up the drive. Sure enough, the case and hard drive were intact and still working, no shattered little pieces. So, what does all this prove exactly? We aren’t entirely sure, but one thing is certain, if you happen to drop one of our cases from your corner office the hard drive “should” be safe.
Be careful out there and watch out for falling Logik Pelican cases.
PS: We do eDiscovery better than we do film-making. But then, that’s why we do eDiscovery.
Beer for eDiscovery
Just the facts please
- 500GB of PSTs
- 70% cull rate
- 900,000 documents reviewed
- 1,000,000 pages produced
- Less than 2 weeks to process
- Less than 1 week to produce
- Happy, happy client
Challenge:
Beer and eDiscovery go together like hops and barley. Our law firm client had a large, very well known beverage company as a customer who was in the middle of a massive merger with another frosty beverage (beer) manufacturer. Then the DOJ handed them a rather large second request. Although these requests are extremely time-sensitive, clients can’t sacrifice quality for speed. This presents a rather difficult challenge for any company, especially if you are already over-budget on merger expenses.
The data in the request, about 500GB of Microsoft Outlook PSTs, was collected by the client. Since the client didn’t have enough time to limit the amount of redundancy while collecting the PSTs, duplicate emails and attachments slipped through the cracks, increasing the volume of data. In order to facilitate a speedy review, these duplicate documents needed to be pulled before review started.
Prior to the second request, the law firm had already contracted with an outside Attenex® provider for the processing, review and production. But their plate was already full and taking on another 500GB of email, which would very likely be hundreds of thousands if not millions of additional records, risked missing the DOJ deadline.
Although the 500GB of email needed to be loaded into Attenex, which provides a very fast way to review large sets of documents, our client turned to Logik for a solution to their time-critical issue.
So, what’s the problem:
- 500GB of email = approximately 3,000,000 documents before culling
- 30 days to review and produce to DOJ
- Large volume of duplicate documents
- Chosen vendor was already overloaded
What we did:
Logik has worked on fast-paced second requests before, so the incredibly tight turnaround wasn’t new to us, but the added Attenex element was an interesting twist we had little experience with. The client wanted us to process and reduce the data with Gridlogik™ and send only the unique parent emails in MSG format to the Attenex vendor. Ok, no problem. Then, our client requested on-going horizontal de-duplication (across the entire data set) to further reduce the data. Ok, no problem. Then they asked us if we could handle the TIFF productions to the DOJ, assuming we could match up the Attenex records with the Logik records. Again, no problem.
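How Gridlogik de-duplicates is not described here. A common way to de-duplicate across an entire data set, shown below as a minimal sketch, is to hash a normalized set of email fields and keep only the first record seen for each hash. The field names (`sender`, `subject`, `body`) and both function names are hypothetical choices for this example.

```python
import hashlib


def email_fingerprint(sender: str, subject: str, body: str) -> str:
    """Hash normalized fields so copies held by different custodians collide."""
    normalized = "\n".join(part.strip().lower() for part in (sender, subject, body))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def dedupe(emails: list[dict]) -> list[dict]:
    """Keep the first occurrence of each fingerprint across the whole set."""
    seen: set[str] = set()
    unique = []
    for email in emails:
        fp = email_fingerprint(email["sender"], email["subject"], email["body"])
        if fp not in seen:
            seen.add(fp)
            unique.append(email)
    return unique
```

Which fields go into the fingerprint is a design choice: hashing more fields (recipients, sent time, attachments) is stricter and removes fewer documents.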
Gridlogik is excellent with record keeping (every single document Gridlogik processes is tracked with a unique Logik ID). This made it very easy for the Attenex provider to send us exported Attenex XML files after a batch of documents was ready for production. We took the Attenex XML and easily paired the exported records with the Logik records. The matching process was fast and successful.
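Matching on a unique ID boils down to a dictionary lookup. The sketch below assumes the IDs have already been pulled out of the export (the XML layout is not shown in this write-up, so no parsing is attempted), and the `logik_id` field name and `match_records` function are hypothetical.

```python
def match_records(exported_ids: list[str], logik_records: list[dict]):
    """Pair exported IDs with processed records; report any IDs with no match."""
    by_id = {rec["logik_id"]: rec for rec in logik_records}
    matched = [by_id[i] for i in exported_ids if i in by_id]
    unmatched = [i for i in exported_ids if i not in by_id]
    return matched, unmatched
```

Returning the unmatched IDs separately makes it easy to check that nothing was dropped before a production goes out.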
The matched records were flagged, formatted and converted to TIFF for production to the DOJ. Since we set up the Concordance database according to strict DOJ specs, the client’s quality control process and subsequent production approval were quick and painless, and they easily met their deadline. That makes everyone happy.
The results:
- The 500GB of MS Outlook PSTs reduced by 70% to 900,000 unique documents
- The responsive documents yielded 1,000,000 pages produced to the DOJ
- Although received in batches, the entire data set was processed in less than two weeks
- After matching up all the Attenex/Logik records, production to DOJ took less than one week
- The client made the tight deadline with room to spare
- Although already over-budget, we kept the costs low with our predictable pricing
- Tasty frosty beverages continue to be served around the world
It was tempting to ask for payment in a lifetime supply of quality beer for managing such a fast-paced and complex DOJ second request, but we chose the more conventional route and went with a check. Yes, it may seem more boring, but you can bet that check went to good use for the entire team. Cheers!
Did you know:
- That reading through all of these “Did You Knows” qualifies you as an eDiscovery ninja?
- That it will take a team of 10 reviewers ~500 days to review 10,000,000 documents, assuming 2,000 documents/reviewer/day?
- That you can use a mapped drive letter (e.g. X:\) to gain access to a Windows file that has accidentally gone over the 256-character limit?
- That early case assessment (ECA) is a buzzword that means a myriad of different things depending upon who you are talking to?
- That Lotus Notes (in comparison to Microsoft Outlook) emails usually contain a very high number of embedded images in the body text of the email, like desktop screen-shots?
- That Microsoft Exchange (.edb) databases can be easily opened by a variety of software products?
- That AutoCad documents should be viewed in native, not TIFF, format because of their 3-dimensional layouts?
- That removing near duplicate documents without first reviewing them could risk missing important information?
- That efficient and timely pre-trial eDiscovery is a huge strategic advantage in litigation?
- That MAPI = Messaging Application Programming Interface, and it allows access to email content and metadata?
- That the “All Documents” view in Lotus Notes doesn’t always reveal ALL the documents, because it is a query and can be modified?
- That if you redact a document, you should re-OCR the document before producing the text of that document?
- That a journalist at the New York Times OCR’d 4 terabytes of TIFF images in under 24 hours with the use of Amazon’s EC2 cloud services?
- That instant messages are discoverable information and are slowly taking over email as the dominant form of business communication?
- That printing electronic files to paper is, in many cases, totally unnecessary and wasteful?
- That Adobe Photoshop files contain multiple layers of information, most of which are hidden from view and cannot be seen without the use of Photoshop?
- That not all OCR software is created equal and that many don’t work very well?
- That there is no realistic way to redact native files without first converting the file to an image?
- That transporting your sensitive evidence in an unsafe container, like a cardboard box, is ok until that box is dropped on the floor or lands in a puddle?
- That converting documents to TIFF might actually save you more time and money depending on your case?
- That MS Excel documents can have charts layered on top of each other, hiding potentially relevant data?
- That you can significantly cull large email collections just by isolating the domain name (e.g. @ebay.com)?
- That many enterprise search applications don’t extract embedded files?
- That keyword searching is more effective if you talk to the person who created the data before confirming the keywords?
- That Google just started performing OCR on PDF documents to make them Google-searchable in late 2008?
- That focusing on what NOT to collect can dramatically reduce your discovery costs?
- That most near-dupe technologies cannot group foreign language documents together?
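The review-time figure quoted above (10 reviewers, 2,000 documents per reviewer per day, 10,000,000 documents) is simple arithmetic, and it is handy to have as a function when planning a review. This is a back-of-the-envelope sketch: the function name is made up for illustration, and it assumes a constant review rate with no ramp-up, second-level review, or days off.

```python
def review_days(documents: int, reviewers: int, docs_per_reviewer_per_day: int) -> int:
    """Working days needed to review a collection, rounded up to whole days."""
    daily_capacity = reviewers * docs_per_reviewer_per_day
    # Ceiling division without floating point: -(-a // b)
    return -(-documents // daily_capacity)


# The figures quoted above: 10,000,000 / (10 * 2,000) = 500 days
days = review_days(10_000_000, reviewers=10, docs_per_reviewer_per_day=2_000)
```

Plugging in a culled collection instead (for example the 900,000 documents left after de-duplication in the Beer for eDiscovery case) shows how directly culling shortens review.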
“I am absolutely blown away by how fast you can process data.”
Senior Manager, Venable LLP
Supporting the Big Bots Knock, knock. Who's there? One of top 3 accounting firms in the world who’s desperately in need of eDiscovery assistance. Uh, you're joking, right? No, seriously, we need your help. Well then, come on in and stay awhile. That's just about how things happened, only they used a phone. Our now very new client [...]
Operation Data Rescue When opposing counsel finds critical mistakes in a law firm’s class-action document production before that law firm does, that firm could be in a world of hurt. That is, if they don't act quickly to right the wrong. Our law firm client wasn't at direct fault for this mistake, but they were left to deal with the mess and make it right [...]
Maximum Page Count It's amazing how such a relatively small amount of documents can explode into 1.4 miles of pages (we’ll explain in a moment). That's the challenge our client, a worldwide document hosting company, was faced with recently. What's even more amazing is that deadlines stay the same [...]
Logik Systems



