Polish up your scripts with Optparse

Posted By Adam Reilly on July 28, 2010


If you’ve ever written an especially useful or popular script, you’ve noticed that features tend to creep into the codebase as you encounter variations in the input.  As code evolves to handle more and more variation, you may notice that distinct ‘modes’ of operation arise.  One way to accommodate these different modes is to use values hard-coded into the source.  Values such as field delimiters, input paths, recursive operation and output paths are often wired directly into quickly-written scripts.

If you’re the only one who runs your code, this solution may be perfectly workable.  However, distributing a script to colleagues or clients, or trying to plug it into a larger framework, will quickly reveal its shortcomings.  Users may have a difficult time determining which values to change, or may inadvertently introduce errors.  This approach is cumbersome at best and error-prone at worst.

In this post, we’ll discuss a module in the Python standard library called ‘optparse’.

Optparse makes your life easier (and your scripts more usable) by providing objects and methods that automate the process of building well-defined and documented command line interfaces or CLIs.  Instantiating an object and providing a few input values is all you need to provide consistent, well-documented interfaces for any script you write.

The first step is to import the necessary objects.  Optparse ships as part of the standard Python library, and imports with the following:

from optparse import OptionParser

This makes the OptionParser object available to the script.  We’ll create a new instance of OptionParser by using Python’s constructor syntax:

 

parser = OptionParser()

Parser is now instantiated and ready to be populated with options, via calls to its add_option method.  In order to properly illustrate the remaining usage, we’ll introduce a simple example and work with it for the remainder of the article.

Example: Folder size detector

Let’s assume that we want to create a script which will recursively count the number of files contained in a folder and any of its subfolders.  This is fairly simple to implement using the walk generator function within Python’s os module.  The following line is not directly relevant to optparse, but probably bears a more in-depth discussion:

 

for root, dirs, files in os.walk(dir):

Walk is a special type of Python function called a ‘generator’, which computes a small piece of a larger result and returns it in steps.  In this case, the partial results are the lists of subdirectories and files our script encounters as it traverses each folder in a directory structure.  Use of a generator in this context is very efficient, as Python is only storing one step of the traversal at a time.  Traversing a large directory (say your C:\ drive, for instance) without generators would consume a very large amount of memory.
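To make the step-by-step behavior concrete, here is a minimal sketch that pulls a single result out of os.walk by hand.  The temporary folder and the file names in it are made up for illustration, and the example is written for Python 3.

```python
import os
import tempfile
import types

# Build a small throwaway tree so the example does not depend on any real path.
# The names "sub" and "a.txt" are purely illustrative.
with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "sub"))
    open(os.path.join(top, "a.txt"), "w").close()

    # Calling os.walk does not traverse anything yet; it hands back a generator.
    walker = os.walk(top)
    is_generator = isinstance(walker, types.GeneratorType)

    # Each call to next() produces one (directory, subdirectories, files) step.
    first_dir, first_subdirs, first_files = next(walker)

print(is_generator)    # True
print(first_subdirs)   # ['sub']
print(first_files)     # ['a.txt']
```

A for loop does the same thing, calling next() behind the scenes until the traversal is exhausted.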

Walk makes it easy to determine file counts across all subfolders, because filenames are returned as a Python list.  We can use the len() function to get the length of the list, thus obtaining our file count.  Since the for loop is executing in steps, it is necessary to declare a variable outside of the loop to hold our result.  The completed syntax is as follows:

 

subFiles = 0

for root, dirs, files in os.walk(dir):
  subFiles += len(files)

The for loop will iterate until it has traversed ‘dir’ and all its subdirectories.  As it visits each folder, it will increment the running total number of files.
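If you want to try the loop without pointing it at a real folder, the sketch below wraps it in a function and runs it against a temporary tree.  The function name count_files and the three sample files are assumptions made for this illustration, not part of the final script.

```python
import os
import tempfile


def count_files(top):
    """Return the number of files under top, including all subfolders."""
    total = 0
    for current_dir, dirs, files in os.walk(top):
        total += len(files)
    return total


# Create a throwaway tree with one file at each of three levels.
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "a", "b"))
    sample_files = (
        "one.txt",
        os.path.join("a", "two.txt"),
        os.path.join("a", "b", "three.txt"),
    )
    for relative in sample_files:
        open(os.path.join(top, relative), "w").close()

    total_found = count_files(top)

print(total_found)  # 3
```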

Back to Optparse

This script is functional and potentially useful in a few different scenarios.  However, it only gives a summary file count for the top-most directory.  We could trivially modify the script to print out file counts for all subfolders within the tree.  Using optparse allows us to easily add a command line interface which preserves both modes.
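As a rough sketch of that second mode, the loop can record a count for every folder it visits instead of only keeping a running total.  The helper name count_files_per_folder and the sample tree are hypothetical; the finished script later in this post prints the counts rather than returning them.

```python
import os
import tempfile


def count_files_per_folder(top):
    """Map each folder under top (including top itself) to its own file count."""
    counts = {}
    for current_dir, dirs, files in os.walk(top):
        counts[current_dir] = len(files)
    return counts


with tempfile.TemporaryDirectory() as top:
    sub = os.path.join(top, "sub")
    os.mkdir(sub)
    open(os.path.join(top, "root.txt"), "w").close()
    open(os.path.join(sub, "x.txt"), "w").close()
    open(os.path.join(sub, "y.txt"), "w").close()

    per_folder = count_files_per_folder(top)
    top_count = per_folder[top]
    sub_count = per_folder[sub]
    # The summary mode is just the sum of the per-folder counts.
    summary = sum(per_folder.values())

print(top_count, sub_count, summary)  # 1 2 3
```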

The following paragraphs will scratch the surface of the optparse module by walking through two different examples.  To get the full picture, read through the documentation.

 

parser = OptionParser()

parser.add_option("-v", "--verbose",
      help = "Recursively print file counts for this folder and all subfolders",
      action="store_true",
      dest="verbose")

This syntax adds an option to the ‘parser’ instance of OptionParser.  The first line provides the switches which your script will accept.  These will be familiar to OSX/*Nix users as short and long options.  On the next line, add_option accepts a help string which is used to provide a brief description of the option and its usage.  The action value can be one of several predefined strings (in this case, ‘store_true’ sets a value to true when the switch is present).  Finally, the dest argument names the attribute that will receive the value on the options object returned by parse_args.
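To see what the flag does without running anything from a shell, parse_args can be handed an explicit list of arguments in place of the real command line.  This is only a sketch; the path used here is a placeholder.

```python
from optparse import OptionParser

parser = OptionParser()
parser.add_option("-v", "--verbose",
                  help="Recursively print file counts for this folder and all subfolders",
                  action="store_true",
                  dest="verbose")

# With the switch present, the 'verbose' attribute is set to True and the
# remaining positional argument is returned separately.
options, args = parser.parse_args(["-v", "/path/to/directory"])
print(options.verbose, args)  # True ['/path/to/directory']

# Without the switch (and with no default given), the attribute stays None,
# which behaves as false in an if statement.
quiet_options, quiet_args = parser.parse_args(["/path/to/directory"])
print(quiet_options.verbose)  # None
```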

 

parser.add_option("-o", "--output",
            help = "Specify name of file to write summary",
            metavar = "OFILE",
            action="store",
            type="string",
            dest="outFile")

             
This example is similar in that it allows the user to specify short and long options, a help string and an action to perform on a destination.  In this case, a string is stored in a field named outFile.  The concept of a metavar is also used.  A metavar is the placeholder name shown in the help output for the value an option expects; it is useful when you want to present the user with an intuitive name that differs from both the option names and the destination.
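The short sketch below shows both halves of that description: the value lands in outFile, while the help text refers to it as OFILE.  The file name out.txt is just a sample value.  If metavar were left out, optparse would fall back to the upper-cased dest (OUTFILE) in the help text.

```python
from optparse import OptionParser

parser = OptionParser()
parser.add_option("-o", "--output",
                  help="Specify name of file to write summary",
                  metavar="OFILE",
                  action="store",
                  type="string",
                  dest="outFile")

# The string following the option is stored on the attribute named by dest.
options, args = parser.parse_args(["--output=out.txt"])
print(options.outFile)  # out.txt

# format_help() returns the same text that --help would print; the metavar
# appears there as the placeholder for the option's value.
help_text = parser.format_help()
print(help_text)
```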

Putting it all together

 

# Import the necessary objects from
# the python standard library
from optparse import OptionParser

# used to import the walk function
import os
import sys

# Options is declared globally so that it will be available
# to the entire script without being passed around.  It will
# be populated with data later
options = None

# OS walking will be wrapped into a function
def countFiles(dir, destination):

    # Declare counter to aggregate results
    subFiles = 0

    # os.walk yields a string and two lists for each directory
    #  current_dir ->  name of directory being explored
    #  dirs ->  subdirectories of current_dir
    #  files -> list of files in the current dir
    for current_dir, dirs, files in os.walk(dir):

        # increment the counter with the number of files
        # in the current directory
        subFiles += len(files)

        # If 'verbose' field is true (i.e., -v or --verbose is
        # used in the CLI invocation) the script will print out
        # any intermediate directories and file counts
        if options.verbose:
            destination.write(current_dir + ": " + str(len(files)) + "\n")

    return subFiles


if __name__ == "__main__":

    # Use the constructor to create a new option parser
    parser = OptionParser()

    # Add option aliases, documentation strings and behaviors
    #  This will set a field named 'verbose' to true if it is used
    #  on the command line
    parser.add_option("-v", "--verbose",
                      help="Recursively print file counts for this folder and all subfolders",
                      action="store_true",
                      dest="verbose")

    parser.add_option("-o", "--output",
                      help="Specify name of file to write summary",
                      metavar="OFILE",
                      action="store",
                      type="string",
                      dest="outFile")

    # parse_args returns two values
    #  options -> object whose attributes hold the state of flags from the CLI
    #  args -> any positional arguments left over after parsing options
    (options, args) = parser.parse_args()

    # The directory to explore is the first positional argument;
    # print a usage message and exit if it is missing
    if not args:
        parser.error("no directory specified")

    # Check to see if an output file was specified
    # if not, use standard out (print to the console)
    if options.outFile is not None:
        results = open(options.outFile, 'w')
    else:
        results = sys.stdout

    # Write the final result to the console or file
    results.write(args[0] + ": " + str(countFiles(args[0], results)) + "\n")

    # Close the output file if one was opened
    if results is not sys.stdout:
        results.close()

Invoking the script with the options “-o out.txt --verbose /path/to/directory” will create a text file in the current directory which contains the results of exploring “directory” and all of its subfolders.  Alternatively, using “/path/to/directory” on its own will calculate the partial results silently and then print the summed total to the screen.  Finally, using “--help” will print a summary of the operation:

Usage: dirCounter.py [options]

Options:
  -h, --help            show this help message and exit
  -v, --verbose         Recursively print file counts for this folder and all
                        subfolders
  -o OFILE, --output=OFILE
                        Specify name of file to write summary
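The first two invocations described above can also be checked from Python by passing the same arguments to parse_args as a list.  This is a sketch under a couple of assumptions: the build_parser helper is hypothetical, and prog="dirCounter.py" is set explicitly only so that the usage line matches the output shown above regardless of how the example is run.

```python
from optparse import OptionParser


def build_parser():
    """Hypothetical helper that assembles the same two options as the script."""
    parser = OptionParser(prog="dirCounter.py")
    parser.add_option("-v", "--verbose",
                      help="Recursively print file counts for this folder and all subfolders",
                      action="store_true",
                      dest="verbose")
    parser.add_option("-o", "--output",
                      help="Specify name of file to write summary",
                      metavar="OFILE",
                      action="store",
                      type="string",
                      dest="outFile")
    return parser


parser = build_parser()

# Equivalent of: dirCounter.py -o out.txt --verbose /path/to/directory
full_options, full_args = parser.parse_args(
    ["-o", "out.txt", "--verbose", "/path/to/directory"])
print(full_options.outFile, full_options.verbose, full_args)

# Equivalent of: dirCounter.py /path/to/directory
bare_options, bare_args = parser.parse_args(["/path/to/directory"])
print(bare_options.outFile, bare_options.verbose, bare_args)

# The usage line that heads the --help output.
usage_line = parser.get_usage().strip()
print(usage_line)
```

Passing “--help” to parse_args prints the help text and then exits the interpreter, which is why it is left out of the sketch.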

A little forethought and modules like optparse make it easy to create user-friendly and self-documenting CLI scripts.
