Category: how_tos
If you've ever written an especially useful or popular script, you've probably noticed that features tend to creep into the codebase as you encounter variations in the input. As the code evolves to handle more and more variation, distinct 'modes' of operation may emerge. One way to accommodate these modes is to hard-code values into the source. Field delimiters, the input path, whether to recurse into subfolders and the output path are often wired directly into quickly written scripts.
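A common alternative is to expose those values as command-line options. The sketch below uses the standard library's argparse module. The option names (--output, --delimiter, --recursive) and their defaults are hypothetical, chosen to mirror the examples above, and aren't taken from the original post.

```python
import argparse


def build_parser():
    """Expose values that quick scripts often hard-code as command-line options.

    The option names and defaults here are illustrative assumptions.
    """
    parser = argparse.ArgumentParser(description="Summarize files in a folder.")
    parser.add_argument("input_path", help="folder or file to process")
    parser.add_argument("-o", "--output", default="report.csv",
                        help="where to write the report (default: report.csv)")
    parser.add_argument("-d", "--delimiter", default=",",
                        help="field delimiter for the report (default: ',')")
    parser.add_argument("-r", "--recursive", action="store_true",
                        help="descend into subfolders")
    return parser
```

In a script you would call `args = build_parser().parse_args()` near the top and read `args.delimiter`, `args.recursive` and so on, so that a run such as `python summarize.py C:\productions -r -d "|"` (hypothetical script and path) switches modes without anyone touching the source. argparse also generates a `--help` message from the help strings for free.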
Building from the same basic structure as the file system
metadata gatherer (http://logik.com/whats_new/entry/capturing_file_system_metadata/),
we can incorporate functionality to pull information from within the file.
Once documents have been reviewed and produced, it is very common for them to
be converted from their native or ‘dynamic’ form into a more static
page-oriented form such as a TIF image. When the number of pages in a
production approaches the millions, it becomes impractical to check every file
by hand for small details like compression, page orientation and resolution.
Using the ‘for’ loop from the previous example and incorporating a third-party
library makes it possible to quickly generate a useful summary of all TIF images
in a folder.
This script will be a little shorter than some of the
previous examples. However, it represents a fairly common use case within the
field of eDiscovery. As data moves from party to party in the
collection/preservation stage of a matter, related files are often lumped into
folders according to organizational need. Summaries of the information in
these folders are often crucial to everything from formulating a review
strategy to determining timelines. In this post, we’ll look at a technique for
capturing file system metadata and collecting it for reporting purposes.
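A minimal sketch of that technique, assuming the standard library's os.walk and os.stat are enough for the report and that a CSV file is the desired output. The column names are hypothetical. Note that st_ctime means creation time on Windows but last metadata change on Unix-like systems, so the column is named to reflect both.

```python
import csv
import os
from datetime import datetime

FIELDS = ["path", "extension", "size_bytes", "created_or_changed", "modified"]


def collect_metadata(root):
    """Walk a folder tree and yield one metadata row per file."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            info = os.stat(full_path)  # reads metadata; doesn't open the file
            yield {
                "path": full_path,
                "extension": os.path.splitext(name)[1].lower(),
                "size_bytes": info.st_size,
                # Creation time on Windows; last metadata change on Unix.
                "created_or_changed": datetime.fromtimestamp(info.st_ctime).isoformat(),
                "modified": datetime.fromtimestamp(info.st_mtime).isoformat(),
            }


def write_report(root, report_path):
    """Write the collected metadata to a CSV file and return the row count."""
    count = 0
    with open(report_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=FIELDS)
        writer.writeheader()
        for row in collect_metadata(root):
            writer.writerow(row)
            count += 1
    return count
```

Write the report somewhere outside the folder being scanned; otherwise the report file itself shows up in the next run.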
LFP…WTF?
Posted by Adam Reilly on March 4, 2010
In this post, we’ll build on the
previous post’s technique of iterating through a file line-by-line.
LFP files are an extremely common form of data interchange as document
sets change hands in litigation. Their popularity is probably
due in part to their simplicity. As a review, LFP files are ...
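The excerpt's description of the LFP layout is cut off, so the sketch below makes no claims about what each field means. It only assumes one record per line with comma-delimited fields, and shows the line-by-line iteration pattern applied to a basic sanity check: flagging lines whose field count differs from the most common count. The function name is hypothetical, and a comma inside a field value would be miscounted under this simple split.

```python
from collections import Counter


def check_field_counts(path, delimiter=","):
    """Read a delimited load file line by line and flag irregular lines.

    Assumes one record per line; field meanings are not interpreted.
    Returns (most_common_field_count, [(line_number, field_count), ...]).
    """
    counts = []
    with open(path, encoding="utf-8", errors="replace") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines
            counts.append((line_number, len(line.split(delimiter))))
    if not counts:
        return None, []
    usual = Counter(n for _, n in counts).most_common(1)[0][0]
    irregular = [(num, n) for num, n in counts if n != usual]
    return usual, irregular
```

Iterating over the file object reads one line at a time, so even a very large load file never has to fit in memory.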
This will just be a quick update to the last post. In the previous version of the duplicate record detector, the input file is specified statically (or “hard coded”) inside the script. This means that the source code must be modified each time users want to run the analysis on a new load file.
Unlike compiled languages such as C++ or Java, Python doesn’t have a lengthy build cycle associated with making changes. While editing the script isn’t too inconvenient, your users might not be comfortable modifying source code directly, and there’s also the potential to introduce bugs by changing the wrong line. Fortunately, Python provides a method for passing data to a program via the command line...
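One standard way to do this is sys.argv, the list of command-line arguments, whose first element is the script name. The sketch below assumes the script takes exactly one argument, the load file path; the helper name is hypothetical, not the post's code.

```python
import sys


def get_input_path(argv=None):
    """Return the load file path supplied on the command line.

    Expects argv shaped like sys.argv: [script_name, load_file_path].
    Exits with a usage message if the path is missing or extra arguments
    are given.
    """
    argv = sys.argv if argv is None else argv
    if len(argv) != 2:
        script = argv[0] if argv else "script.py"
        raise SystemExit(f"usage: python {script} LOAD_FILE")
    return argv[1]
```

The hard-coded assignment at the top of the detector then becomes `input_path = get_input_path()`, and the script runs as, for example, `python dedupe.py merged.lfp` (hypothetical names). Raising SystemExit with a string prints the usage line to stderr and exits with status 1.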
In this edition of “eDiscovery-related Python Tricks,” we’ll cover some fundamental techniques and operations that you’ll likely find yourself using repeatedly. Suppose you’ve been given the task of merging load files from several productions together.
You’re fairly sure that merging several files together has left the load file with duplicate lines, but the file is large and this would be difficult to determine manually. While this example may seem a little contrived, it provides a simple setup for laying a foundation that will likely be reused when we get to more interesting examples...
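A minimal sketch of exact duplicate-line detection, assuming lines should match character for character (ignoring only the line ending) and that the file's distinct lines fit in memory. It uses collections.Counter to report duplicates and a set to write a de-duplicated copy that keeps the first occurrence of each line. Both function names are hypothetical; the post's own detector may work differently.

```python
from collections import Counter


def find_duplicate_lines(path):
    """Return {line_text: occurrences} for every line that appears more than once."""
    with open(path, encoding="utf-8", errors="replace") as f:
        counts = Counter(line.rstrip("\r\n") for line in f)
    return {line: n for line, n in counts.items() if n > 1}


def write_deduplicated(path, out_path):
    """Copy path to out_path, keeping only the first occurrence of each line."""
    seen = set()
    with open(path, encoding="utf-8", errors="replace") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            key = line.rstrip("\r\n")
            if key not in seen:  # set membership is a fast hash lookup
                seen.add(key)
                dst.write(key + "\n")
```

Keeping the first occurrence preserves the original order of the merged file, which matters for load files where records are expected in sequence.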
Most people know that Microsoft Excel can produce a wide variety of charts to visualize data. However, if you need to summarize more rows than Excel can load, or you need SQL for more flexible data manipulation, Microsoft Access also provides a feature called "pivot charts," which lets users generate a quick visual summary of a query.
We'll start by importing a sample set of data obtained from the Internet. The data is in the form of Comma Separated Values (CSV), a common data interchange format.
Prevent data spoliation by using a simple write-blocking device. They are fairly cheap (~$270 @ tableau.com) and well worth the price, considering that spoiling data may just ruin your whole day. Connecting a hard drive to a computer seems simple enough, but if you want to avoid modifying the metadata on the drive, you will need a write-blocking device that prevents the computer from writing to the drive, including the metadata updates it would otherwise make. This is very important, especially in legal discovery, where metadata should always be preserved to avoid spoliation.
Despite its simple appearance, the humble command shell can be an extremely powerful tool for automating repetitive or difficult system tasks. Many people are scared away by the lack of GUI elements, but this can be a tremendous asset in making processes consistent and repeatable.
The first command we'll look at may be familiar: most people have seen, heard of or used the dir command at some point. When run without any arguments, it prints a list of files in the current directory along with some file system metadata. You may not be aware that dir accepts several flags and parameters that modify its behavior. For instance, typing dir *.txt filters the list of files according to a pattern; in this case, it lists only files with a txt extension...