vim and screen, automagic titles

Previously, I discussed using multiuser screen so that I could concurrently access a shared screen session across multiple remote hosts (from work, from home, from my phone, etc).

I would like to augment screen such that the titles would always tell me what directory I'm currently in, as well as what program is running (if any). Additionally, if I'm editing a file in vim I would like to see the filename in the screen window title. If I have multiple vim buffers open (say, in tabs) I would like the screen window title set to whichever filename I'm currently editing.

GNU screen provides a shelltitle attribute that can get us partly there, you could add something like this to your screenrc,

# automagic window title
shelltitle ") |bash:"

In this example, screen will automatically fill in any currently running shell command as the window title. Importantly, the ") " must be the final characters on your command prompt. For most people, this is the '$' character, mine is still set to the smiley() cursor discussed previously. Everything after the '|' character will be the default screen title.

Unfortunately, while this approach does provide us a dynamic window name for running programs it does not show us the current directory and does nothing for vim (other than just to say "vim"). This approach, which may work for some, turned out to be a dead end. I had been searching for ways to get screen to update the window titles to the current directory and had almost given up.

Recently, I discovered this article, which provides a working (albeit complicated) approach.

Essentially, in the newer versions of bash we can use the trap command DEBUG, which will run command before every single shell command!

Additionally, we can set a screen window title on the command prompt by printing an escape sequence then the new title. So, we can run a bash function in the DEBUG trap that sets the title.

Sounds easy? Well, not really. The DEBUG trap is a bit heavy handed and using it to print escape characters can have odd effects involving BASH_COMMAND and PROMPT_COMMAND. Here is a working solution I've been using,

# turn off debug trap, turn on later if we're in screen
trap "" DEBUG

... rest of my .bashrc

# Show the current directory AND running command in the screen window title
# inspired from
if [ "$TERM" = "screen" ]; then
    export PROMPT_COMMAND='true'
    set_screen_window() {
      HPWD=`basename "$PWD"`
      if [ "$HPWD" = "$USER" ]; then HPWD='~'; fi
      if [ ${#HPWD} -ge 10 ]; then HPWD='..'${HPWD:${#HPWD}-8:${#HPWD}}; fi
      case "$BASH_COMMAND" in
            printf '\ek%s\e\\' "$HPWD:"
            printf '\ek%s\e\\' "$HPWD:${BASH_COMMAND:0:20}"
    trap set_screen_window DEBUG

In this case, I set PROMPT_COMMAND to true and make sure that my PS1 environment variable is not relying on PROMPT_COMMAND. The reason is because the BASH_COMMAND environment variable will be set to whatever the parent shell is currently running, and the DEBUG trap will fire every time the BASH_COMMAND changes (which is a lot, especially if you're executing a shell script).

Fortunately, anytime a command finishes, PROMPT_COMMAND will run, which in this case executes true, and I catch that in the case statement and set the title to the current directory. This effectively sets the title every time bash prints a command prompt.

If you execute a long running command, that screens window title will be set to that command, and as soon as the command finishes the title will change back.

The only remaining problem is vim. With the above approach, it almost works with vim. If you were in a directory named "foo" and ran "vim spam.txt", then the screen window title would be set to "foo:vim spam.txt". So far so good, but when you open additional files in vim, the title will still be say "foo:vim spam.txt".


The final step is to update your vimrc to set the titlestring, and with some tweaking vim will send the escape characters that screen recognizes to change the window title. Lastly, add an autocmd for all relevant events (opening a new file, switching tabs, etc), and you'll have a working solution,

" screen title
if &term == "screen"
  let &titlestring = "vim(" . expand("%:t") . ")"
  set t_ts=^[k
  set t_fs=^[\
  set title
autocmd TabEnter,WinEnter,BufReadPost,FileReadPost,BufNewFile * let &titlestring = 'vim(' . expand("%:t") . ')'

* to type ^[, which is an escape character, you need to enter CTRL+V <Esc>

With this approach, while vim is running it will effectively take over the job of updating the screen window title, for example,
As we switch tabs or open new files or change focus in a split screen, vim will update the screen window title to "vim(filename)" for the file that's being edited.

All of these changes (and more) can be found in my dotfiles in github

Posted in bash, shell tips, vim

node.js redirect with query string

Previously, I discussed javascript appending to query string, where we serialized an associative array to a query string. I would now like to leverage this technique within node.js as a redirect service.

Specifically, I am using express to make a web app in node.js and the app includes a redirect service, e.g.,

var app = express();
var redirectVars = {'foo':'spam and eggs', 'tracker':42 };

// redirect and append redirectVars
app.get('/redirect', function(request, result, next) {
  if(request.query.url) {
    var urle = request.query.url;
    var url = decodeURIComponent(urle);
    var firstSeperator = (url.indexOf('?')==-1 ? '?' : '&');

    var queryStringParts = new Array();
    for(var key in redirectVars) {
      queryStringParts.push(key + '=' + encodeURIComponent(redirectVars[key]));
    var queryString = queryStringParts.join('&');

    result.redirect(url + firstSeperator + queryString);
  result.send("400", "Bad request");

Usage of this service is as simple as,


Any external app could use this service, which will append server controlled query string variables to the redirected URL. This is useful for a redirect service that needs to dynamically construct query string variables, such as cross-domain authentication and authorization.

Importantly, in order to preserve an existing query string in the new-location, simply encode the entire URL string before sending it into the service, e.g.,

var new_location = encodeURIComponent("");
window.location = "" + new_location;

Using the above node.js example, this would have the effect of redirecting the user to

Posted in javascript

javascript appending to query string

I would like to append an associative array to a URL's query string. For whatever reason, there is no native javascript method to accomplish this task. This needs to be done manually or using a common web framework such as jQuery.

The first step is to serialize the associative array into a query string,

native javascript

With plain-old-javascript, you can do something like this,

var queryVars = {'foo':'bar', 'spam':'eggs', 'tracker':'yes' };

var queryStringParts = new Array();
for(var key in queryVars) {
  queryStringParts.push(key + '=' + queryVars[key]);
var queryString = queryStringParts.join('&');

The value of queryString will be



Since version 1.2 jQuery has supported the jQuery.param() function to serialize any array or object into a URL query string. The above example becomes,

var queryVars = {'foo':'bar', 'spam':'eggs', 'tracker':'yes' };

var queryString = jQuery.param(queryVars);


My favorite approach is the node.js querystring.stringify() function, I like this as it is easiest to remember,

var queryVars = {'foo':'bar', 'spam':'eggs', 'tracker':'yes' };

var queryString = querystring.stringify(queryVars);

Appending ? or &

In most cases you don't want to assume an input url does not already contain a query string, in fact, this would be a rather bad assumption. To get this to work you'll want to append your new query string to any existing query string using the & character, otherwise use the ? character. Here is an example,

function appendQueryString(url, queryVars) {
    var firstSeperator = (url.indexOf('?')==-1 ? '?' : '&');
    var queryStringParts = new Array();
    for(var key in queryVars) {
        queryStringParts.push(key + '=' + queryVars[key]);
    var queryString = queryStringParts.join('&');
    return url + firstSeperator + queryString;

var url = "something.html?q=test";
var queryVars = {'foo':'bar', 'spam':'eggs', 'tracker':'yes' };

var new_url = appendQueryString(url, queryVars);

The value of new_url will be


Posted in javascript

multiuser screen

Previously, I discussed using GNU screen as a window manager.

I would like to access my screen session concurrently from multiple hosts (say, at work, at home, and even remotely on my phone). I would also like to define default screens specific to one host.

Default screens can be configured easily in the .screenrc in your home directory. To keep things simple I use a shared screenrc file, available in this github repo, this is shared across multiple environments that often have different uses (between home and work computers). Host specific screenrc commands are defined in a special .screenrc_local, that is loaded from the main .screenrc as follows,

source .screenrc_local

In order to load default screens each with a specific initial command, I use the "screen" and "stuff" commands in my .screenrc_local, for example,

## default screens
screen -t bash 0

screen -t cloud 1
stuff "cd cloud/cloudsource/trunk/roles/; pushd ../../branches/staging/roles; dirs -v^M"

screen -t ecr/ 2
stuff "cd /mnt/sartre-data/ecr/; ll^M"

## go back to the first screen
select 0

Screen shot 2013-04-23 at 12.32.44 PM
With this configuration any new session will have those initial screens.

Whatever is in the "stuff" command will be typed automatically into the screen session. Add "^M" to send a hard return to execute the "stuff" command.

To enable multiuser mode in new screen sessions, add the following in your .screenrc

# enable multiuser screen
multiuser on

To enable multiuser mode in an existing screen session, press Ctrl-A : and enter "multiuser on", that is,

^A :multiuser on

A multiuser screen session can be joined by multiple connections concurrently. By default, only your user account can access the shared screen session. To join a multiuser session, use the following command from the shell,

$ screen -x sessionname

photoIf you don't enter a sessionname, the most recent session will be joined. If you use "-xR" a new session will be created if a multiuser session did not exist.

With this approach I can seamlessly switch to another computer or device, even in mid command.

Best of all, multiple connections can be active at the same time -- so for example you can have the same screen session open at home and in the office, as well as on your phone (typing commands on your phone knowing they're also showing on your home and work computer).

If you would like to allow other users to join your screen session, you would use the following commands, either in .screenrc or interactively using "Ctrl-A :"

acladd username

The other user can access this shared session using the following command,

$ screen -x owner/sessionname

Sharing a screen session with multiple users can get complicated; and because you'll need to setuid root on the screen binary, it's not a good security practice. However, within a trusted developer network on a shared host it's a very good way to collaborate. If you do wish to allow multiple users to share a single screen session, you'll need to run the following,

$ sudo chmod u+s `which screen`
$ sudo chmod 755 /var/run/screen

If you run into the following, "ERROR: Cannot open your terminal '/dev/pts/1' - please check." or something similar, this is likely because the current user did not login directly but instead performed a "su - username" and does not have access to the pts. An interesting hack I found here resolves this using the "script" command (which creates a new pts as the current user), that is,

script /dev/null
screen -x owner/sessionname

By default, all users will have full access to the shared session; able to type commands as the session owner. You can modify access by using "aclchg", or remove access with "acldel".

The "aclchg" command can apply to an entire session or to a specific window, e.g.,

## read only for entire session
aclchg username -w "#"

## full access to screen 0 only
aclchg username +rwx 0

As a simple shortcut, you can use aclchg to add a new user with specific (such as read-only) access.

Posted in bash, shell tips

scripting Photoshop for stop motion

I would like a simple and quick way to save a copy of an image in Photoshop, with an auto-incrementing filename. Ideally, a single button to capture a frame in a stop motion animation. In other words, I would like to save a copy of the working image as a JPEG without any interactive prompts and the filename will automatically increment a count.

For example, if I'm working with a file "test.psd", I want a single action that will save a copy "test_0001.jpg", and subsequent calls will save "test_0002.jpg", "test_0003.jpg", and so on.

By default, Photoshop will overwrite existing files, and it would be quite tedious to manually "Save As" for hundreds or thousands of images. Fortunately, Photoshop offers a scripting interface to call user defined scripts. Custom scripts can even be loaded into Photoshop and executed as an Action.

The following snippet can be saved as [Photoshop Directory]/Presets/Scripts/saveFrame.jsx, and after restarting Photoshop you should see "saveFrame" under File -> Scripts.


 * Scripted "save as" with incrementing filename
 *   e.g., test_0001.jpg, test_0002.jpg, ...
function main() { 
	if (!documents.length)
	cnt = 1;
    try {
        var Name = decodeURI(\.[^\.]+$/, '');
        var Path = decodeURI(activeDocument.path);
        var saveFrame = Path + "/" + Name + "_" + zeroPad(cnt,4) + ".jpg";
        // find the next available filename
        while ( File(saveFrame).exists ) {
            saveFrame = Path + "/" + Name + "_" + zeroPad(cnt,4) + ".jpg";
        // save as, change the default JPEG quality here as needed
        SaveJPEG(File(saveFrame), 9);
	} catch(e) {
        alert(e + "\r@ Line " + e.line);

function SaveJPEG(saveFile, jpegQuality) {
	var doc = activeDocument;
	if (doc.bitsPerChannel != BitsPerChannelType.EIGHT) 
		doc.bitsPerChannel = BitsPerChannelType.EIGHT;
	jpgSaveOptions = new JPEGSaveOptions();
	jpgSaveOptions.embedColorProfile = true;
	jpgSaveOptions.formatOptions = FormatOptions.STANDARDBASELINE;
	jpgSaveOptions.matte = MatteType.NONE;
	jpgSaveOptions.quality = jpegQuality; 
	activeDocument.saveAs(saveFile, jpgSaveOptions, true, Extension.LOWERCASE);

function zeroPad(n, s) { 
	n = n.toString(); 
	while (n.length < s) 
		n = '0' + n; 
	return n; 

Using Photoshop scripts you can automate any task and even create animation effects. In CS6 you can render a series of images as a video, alternatively, you can create the image frames in Photoshop and use ffmpeg to render the video.

If you want to use ffmpeg to render a series of images, you could use the following command,

$ ffmpeg -r 30 -f image2 -i test_%04d.jpg -vb 1M -r 30 test.webm

Here is a simple (90 frame loop) example animating a series of scripted lighting effects,

The above video is embedded in this page using the following html,

<video id="test_test" poster="test_0001.jpg" preload="auto" loop autoplay>
    <source src="test.mp4" type="video/mp4" />
    <source src="test.webm" type="video/webm" />
    <source src="test.ogv" type="video/ogg" />
    <object width="600" height="360" type="application/x-shockwave-flash" data="test.swf">
        <param name="movie" value="test.swf" />
        <img src="test_0001.jpg" width="600" height="360" alt="test" title="No video playback" />
Posted in html, javascript, shell tips

locking and concurrency in python, part 2

Previously, I created a "MultiLock" class for managing locks and lockgroups across a shared file system. Now I want to create a simple command-line utility that uses this functionality.

To start, we can create a simple runone() function that leverages MutliLock, e.g.,

def _runone(func, lockname, lockgroup, basedir, *args, **kwargs):
    ''' run one, AND ONLY ONE, instance (respect locking)

        >>> _runone(print, 'lock', 'locks', '.', 'hello world')
    lock = MultiLock(lockname, lockgroup, basedir)
    if lock.acquire():
        func(*args, **kwargs)

Any python function (with its *args and **kwargs) will be called if (and-only-if) the named lock was acquired. At a minimum, this guarantees that one (and only one) instance of the function can be called at a given time.

To make this slightly more magic, we can wrap this as a decorator function -- a decorator that accepts arguments,

def runone(lockname='lock', lockgroup='.locks', basedir='.'):
    ''' decorator with closure
        returns a function that will run one, and only one, instance per lockgroup
    def wrapper(fn):
        def new_fn(*args, **kwargs):
            return _runone(fn, lockname, lockgroup, basedir, *args, **kwargs)
        return new_fn
    return wrapper

The closure is used so that we can pass arguments to the decorator function, e.g.,

@runone('lock', 'lockgroup', '/shared/path')
def spam():
    #do work, only if we acquire /shared/path/lockgroup/lock 

Putting this all together, we can create a command-line utility that will execute any command-line program if (and only if) it acquires a named lock in the lockgroup. With such a utility we can add concurrency and fault-tolerance to any shell script that can be executed across all nodes in a cluster. This code is also available in this github repo.

import time, sys, subprocess, optparse, logging
from multilock import MultiLock

def runone(lockname='lock', lockgroup='.locks', basedir='.'):
    ''' decorator with closure
        returns a function that will run one, and only one, instance per lockgroup
    def wrapper(fn):
        def new_fn(*args, **kwargs):
            return _runone(fn, lockname, lockgroup, basedir, *args, **kwargs)
        return new_fn
    return wrapper

def _runone(func, lockname, lockgroup, basedir, *args, **kwargs):
    ''' run one, AND ONLY ONE, instance (respect locking)

        >>> _runone(print, 'lock', 'locks', '.', 'hello world')
    lock = MultiLock(lockname, lockgroup, basedir)
    if lock.acquire():
        func(*args, **kwargs)

if __name__ == '__main__':

    p = optparse.OptionParser('usage: %prog [options] cmd [args]')
    p.add_option('--lockname', '-l', dest="lockname", default='lock', help="the lock name, should be unique for this instance")
    p.add_option('--lockgroup', '-g', dest="lockgroup", default='.locks', help="the lockgroup, a collection of locks independent locks")
    p.add_option('--basedir', '-d', dest="basedir", default='.', help="the base directory where the lock files should be written")
    p.add_option('--wait', '-w', dest="wait", default=None, help="optional, wait (up till the number of seconds specified) for all locks to complete in the lockgroup")
    options, args = p.parse_args()

    if options.wait:
        lock = MultiLock(options.lockname, options.lockgroup, options.basedir)
    @runone(options.lockname, options.lockgroup, options.basedir)
    def _main():

Posted in python, shell tips, software arch.

locking and concurrency in python, part 1

I would like to do file-locking concurrency control in python. Additionally, I would like to provide a "run-once-and-only-once" functionality on a shared cluster; in other words, I have multiple batch jobs to run over a shared compute cluster and I want a simple way to provide fault tolerance for parallel jobs.

The batch jobs should leverage a locking mechanism with the following method signatures,

class Lock:

    def acquire(self)

    def release(self)

    def wait(self, timeout)

Using a shared filesystem, such as NFS, we can use file or directory locking, provided we can guarantee atomicity for the creation of the lock. I.e., only one host in a cluster can acquire a named lock. There are different ways to guarantee atomicity on file operation, depending on your filesystem.

One approach is os.makedir(), which is atomic on POSIX systems. Alternatively, you can use the following,

>>> fd ='foo.lock', os.O_CREAT|os.O_EXCL|os.O_RDWR)

This is atomic on most filesystems. Lastly, os.rename() is atomic on POSIX and most network file systems. In other words, if multiple hosts attempt the same os.rename operation on a shared file, only one will succeed and the others will raise on OSError.

In order to maximize fault-tolerance, we can create a lockfile with a hostname and process-id, rename the file, and then read the renamed file to verify the correct hostname and process-id. This will cover most all network shared filesystems (that may or may not be POSIX compliant). The following python snippet will perform this multi-lock,

class MultiLock:
    def __init__(self, lockname='lock'
        self.lockname = lockname
        self.lockfile = os.path.join(lockname, lockname + '.lock')
        self.lockedfile = os.path.join(lockname, lockname + '.locked')
        self.hostname = socket.gethostname() = os.getpid()
        self.fd = None

    def acquire(self):
        if not self.verify():
            logging.debug('you do not have the lock %s' %(self.lockedfile))
                logging.debug('attempt to create lock %s' %(self.lockfile))
                self.fd =, os.O_CREAT|os.O_EXCL|os.O_RDWR)
                os.write(self.fd, self.hostname+' '+str(
                logging.debug('attempt multilock %s' %(self.lockedfile))
                os.rename(self.lockfile, self.lockedfile)
                return self.verify()
            except OSError:
                logging.debug('unable to multilock %s' %(self.lockfile))
        return 0

    def verify(self):
        logging.debug('test if this is your lock, %s' %(self.lockedfile))
            self.fd =, os.O_RDWR)
            qhostname, qpid =, 1024).strip().split()
            if qhostname != self.hostname or int(qpid) != int(
                logging.debug('%s:%s claims to have the lock' %(qhostname, qpid))
                return 0
            logging.debug('success, you have lock %s' %(self.lockedfile))
            return 1
            logging.debug('you do not have lock %s' %(self.lockedfile))
            return 0

Furthermore, I would like a "lockgroup" such that I can create several locks in a group and a wait() function that will wait for all of the locks in a group to complete. In other words, we can start multiple jobs in parallel which can be distributed across the cluster (say, one per node) and then a wait() statement will wait for all jobs to complete.

Putting this all together, we can create a python "multilock" module with a "MultiLock" class, which is also available in this github repo, as follows,

import time, socket, shutil, os, logging, errno

class MultiLockTimeoutException(Exception):

class MultiLockDeniedException(Exception):

class MultiLock:
    def __init__(self, lockname='lock', lockgroup='.locks', basepath='.', poll=0.5):
        ''' MultiLock instance

            lockname: the name of this lock, default is 'lock'
            lockgroup: the name of the lockgroup, default is '.locks'
            basepath: the directory to store the locks, default is the current directory
            poll: the max time in seconds for a lock to be established, this must be larger
                  than the max time it takes to acquire a lock
        self.lockname = lockname
        self.basepath = os.path.realpath(basepath)
        self.lockgroup = os.path.join(self.basepath, lockgroup)
        self.lockfile = os.path.join(self.lockgroup, lockname, lockname + '.lock')
        self.lockedfile = os.path.join(self.lockgroup, lockname, lockname + '.locked')
        self.hostname = socket.gethostname() = os.getpid()
        self.poll = int(poll)
        self.fd = None

    def acquire(self, maxage=None):
        if not self.verify():
            logging.debug('you do not have the lock %s' %(self.lockedfile))
            if maxage:
                logging.debug('make sure that the lockgroup %s exists' %(self.lockgroup))
            except OSError as exc:
                if exc.errno == errno.EEXIST:
                    logging.error('fatal error trying to access lockgroup %s' %(self.lockgroup))
                logging.debug('attempt to create lock %s' %(self.lockfile))
                self.fd =, os.O_CREAT|os.O_EXCL|os.O_RDWR)
                os.write(self.fd, self.hostname+' '+str(
                logging.debug('attempt multilock %s' %(self.lockedfile))
                os.rename(self.lockfile, self.lockedfile)
                return self.verify()
            except OSError:
                logging.debug('unable to multilock %s' %(self.lockfile))
        return 0

    def release(self):
            if self.verify():
                    logging.debug('released lock %s, will try to clean up lockgroup %s' %(self.lockname, self.lockgroup))
                except OSError as exc:
                    if exc.errno == errno.ENOTEMPTY:
                        logging.debug('lockgroup %s is not empty' %(self.lockgroup))
            return self.cleanup()

    def verify(self):
        logging.debug('test if this is your lock, %s' %(self.lockedfile))
            self.fd =, os.O_RDWR)
            qhostname, qpid =, 1024).strip().split()
            if qhostname != self.hostname or int(qpid) != int(
                logging.debug('%s:%s claims to have the lock' %(qhostname, qpid))
                return 0
            logging.debug('success, you have lock %s' %(self.lockedfile))
            return 1
            logging.debug('you do not have lock %s' %(self.lockedfile))
            return 0

    def cleanup(self, maxage=None):
        ''' safely cleanup any lock files or directories (artifacts from race conditions and exceptions)
        if maxage and os.path.exists(os.path.dirname(self.lockedfile)):
                tdiff = time.time() - os.stat(os.path.dirname(self.lockedfile))[8]
                if tdiff >= maxage:
                    logging.debug('lock %s is older than maxage %s' %(os.path.dirname(self.lockedfile), maxage))
        if os.path.isfile(self.lockedfile):
            logging.debug('lock %s exists, checking hostname:pid' % (self.lockedfile))
            qhostname, qpid = (None, None)
                fh = open(self.lockedfile)
                qhostname, qpid =
            if self.hostname == qhostname:
                    if int(qpid) > 0:
                        os.kill(int(qpid), 0)
                except OSError, e:
                    if e.errno != errno.EPERM:
                        logging.error('lock %s exists on this host, but pid %s is NOT running, force release' % (self.lockedfile, qpid))
                        return 1
                        logging.debug('lock %s exists on this host but pid %s might still be running' %(self.lockedfile, qpid))
                    logging.debug('lock %s exists on this host with pid %s still running' %(self.lockedfile, qpid))
            return 0
        return 1

    def wait(self, timeout=86400):
        logging.debug('waiting for lockgroup %s to complete' %(self.lockgroup))
        timeout = int(timeout)
        start_time = time.time()
        while True:
                if (time.time() - start_time) >= timeout:
                    raise MultiLockTimeoutException("Timeout %s seconds" %(timeout))
                elif os.path.isdir(self.lockgroup):
                return 1
            except OSError as exc:
                if exc.errno == errno.ENOTEMPTY:
                elif exc.errno == errno.ENOENT:
                    logging.error('fatal error waiting for %s' %(self.lockgroup))

    def __del__(self):

    def __enter__(self):
        ''' pythonic 'with' statement

            >>> with MultiLock('spam') as spam:
            ...     logging.debug('we have spam')
        if self.acquire():
            return self
        raise MultiLockDeniedException(self.lockname)

    def __exit__(self, type, value, traceback):
        ''' executed after the with statement
        if self.verify():

We can use this class to manage locks and lockgroups across network file shares, next, I'd like to demonstrate a simple command-line utility that uses this functionality.

Posted in python, software arch.

zip archive in python

I would like to create zip archives within a python batch script. I would like to compress individual files or entire directories of files.

You can use the built-in zipfile module, and create a ZipFile as you would a normal File object, e.g.,

>>> foo = zipfile.ZipFile('', mode='w')
>>> foo.write('foo.txt')

Unfortunately, by default the zipfile is uncompressed. You can add multiple files and directories to your zipfile, which can be useful for archival, but they will not be compressed. In order to compress the files, you'll need to have the zlib library installed (it should already be installed in newer versions of python, 2.5 and greater). Simply use the ZIP_DEFLATED flag as follows,

>>> foo = zipfile.ZipFile('', mode='w')
>>> foo.write('foo.txt', compress_type=zipfile.ZIP_DEFLATED)

In order to archive an entire directory (and all its contents) you can use the os.walk function. This function will return a list of all files and subdirectories as a triple (root, dirs, files). You can iterate through the returned files as follows,

>>> foo = zipfile.ZipFile('', mode='w')
>>> for root, dirs, files in os.walk('/path/to/foo'):
...     for name in files:
...         file_to_zip = os.path.join(root, name)
...         foo.write(file_to_zip, compress_type=zipfile.ZIP_DEFLATED)

We can put this all together into a handy utility function that creates a compressed zipfile for any file or directory. This is also available in the following github repo.

def ziparchive(filepath, zfile=None):
    ''' create/overwrite a zip archive

        can be a file or directory, and always overwrites the output zipfile if one already exists

        An optional second argument can be provided to specify a zipfile name, 
        by default the basename will be used with a .zip extension

        >>> ziparchive('foo/data/')
        >>> zf = zipfile.ZipFile('', 'r')

        >>> ziparchive('foo/data/', 'foo/')
        >>> zf = zipfile.ZipFile('foo/', 'r')
    if zfile is None:
        zfile = os.path.basename(filepath.strip('/')) + '.zip'
    filepath = filepath.rstrip('/')
    zf = zipfile.ZipFile(zfile, mode='w')
    if os.path.isfile(filepath):
        zf.write(filepath, filepath[len(os.path.dirname(filepath)):].strip('/'), compress_type=zipfile.ZIP_DEFLATED)
        for root, dirs, files in os.walk(filepath):
            for name in files:
                file_to_zip = os.path.join(root, name)
                arcname = file_to_zip[len(os.path.dirname(filepath)):].strip('/')
                zf.write(file_to_zip, arcname, compress_type=zipfile.ZIP_DEFLATED)
Posted in python, shell tips

chaining ssh tunnels

Imagine you're working within a private home network and need to connect to an Oracle database within a corporate network accessible only through a bastion host hidden within the corporate network. Odd as that sounds, it's a typical network configuration, as follows:

The layout is very simple, when you're within the corporate network you must use the bastion host to access the vault (e.g., the Oracle database). The arrows in the above diagram represent the directionality of the various firewall rules, and in this case, these are SSH-only. A workstation within the corporate network would simply create an SSH-tunnel through the bastion to the vault.

Technically, there's a line of (completely separate) SSH connections from the private home network all the way to the vault, but there is absolutely no way for either side to talk directly. And further, there is no way for an Oracle client on the home network to connect all the way to the vault.

Each line could represent an SSH-tunnel, in which case, if we chain these tunnels together, then we could connect to the vault from the home network. This would allow an Oracle client (such as SQL Developer, Toad, or DbVisualizer) on the home network to connect through the SSH-tunnel chain to the vault.

step 1

For starters, we'll need all the arrows pointing the right way, we can use a reverse SSH tunnel from the corporate workstation, as follows

$ ssh -R 54321:localhost:22 user@internet-server

This will allow the internet-server to connect into the corporate workstation. However, this command must be run from within the corporate network. We can persist a reverse tunnel using the approach discussed in a previous article, see reverse ssh tunnel.

Persisting this reverse tunnel effectively points all arrows from the home network to the vault. At this point you could simply SSH from one host to another and eventually get to the vault, but this does not help us connect our Oracle client to the vault. We'll need to chain everything together, and we'll need to do this all from the home network.

step 2

Once an SSH-tunnel has been persisted between the corporate workstation and the internet-server, we can create a new tunnel from the home network into the corporate workstation. From the home network we'll issue the following command:

$ ssh -f user@internet-server -L 12345:localhost:54321 -N

Now we have a local port on the home network that connects into the corporate workstation, effectively chaining the reverse tunnel (tcp/54321) to a new tunnel (tcp/12345).

As an example, we could use this new tunnel as a SOCKS5 proxy, e.g.,

$ ssh -f -N -D 8080 user@localhost -p 12345

From the home network, we could now set our web browser to use a proxy on localhost:8080 to securely access the corporate network. This is already incredibly useful (replacing a VPN with SSH), but we still don't have access from the home network into the vault.

step 3

Now that the home network has a connection into the corporate workstation, we'll need to send a command to the workstation to create a tunnel through the bastion to the vault. From the home network we'll issue the following commands:

$ VAULT_TUNNEL="ssh -f user@bastion -L 1520:vault:1520 -N"
$ ssh user@localhost -p 12345 "$VAULT_TUNNEL"

This gives us a tunnel from the corporate workstation into the vault.

step 4

Now we can link the vault tunnel to the home network tunnel. From the home network we'll issue the following command:

$ ssh -f user@localhost -p 12345 -L 1520:localhost:1520 -N

What this does is create a new tunnel through the local tcp/12345 (which is the doubly-chained tunnel into the corporate network) to tcp/1520 on the other end (which itself is a tunnel into the vault). This new tunnel links everything together such that the home network now has a local port tcp/1520 into the Oracle database.

Simply point Toad (or SQL-Developer or DbVisualizer or whatever) to localhost:1520 on your home network and you'll be accessing the database through a triple-chained forward SSH tunnel.

all together

Since everything (except the reverse SSH tunnel) originates from the home network, we can create one shell command to establish this connection in one go. It's also a good idea to use an ssh agent across multiple hosts, but you could also just forward your auth credentials to avoid having to enter a password multiple times. Although in some cases password prompts may be unavoidable, for example, a decently secure bastion may not allow agent forwarding (or authorized_keys in general) and may even require secure keyfob access- in that case you'll simply have to enter the password when creating the chain of tunnels.

Posted in bash, shell tips, ssh

timeout command in python

I would like to add a timeout to any shell command such that if it does not complete within a specified number of seconds the command will exit. This would be useful for a any long-running command where I'd like it to die on its own rather than manually killing the long-running process.

There are several solutions to this, GNU coreutils provides a timeout command that does exactly like this, you could also use ulimit, or shell hacks such as

bash -c '(sleep 10; kill -9 $) & exec command'

However, I'd like to give the program a chance to exit gracefully before forcibly killing it. In other words, in a UNIX environment I would like to first send a SIGTERM signal to the long-running process before issuing a SIGKILL. This way the command has a chance to properly close any network or I/O before exiting.

You could use the following python recipe, also available here

#!/usr/bin/env python
"""Usage: [timeout] [command]
    timeout in seconds before killing [command]
    command is any command (and arguments) that you wish to timeout"""
import datetime, os, signal, subprocess, sys, time

# how long (in seconds) between sigterm and sigkill

def timeout_command(cmd, timeout):
    start =
    process = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    while process.poll() is None:
        now =
        if (now - start).seconds >= timeout:
            os.kill(, signal.SIGTERM)
            sys.stderr.write('TIMEOUT %s second%s, sent sigterm to %s %s\n' %(str(timeout), '' if timeout==1 else 's',, ' '.join(cmd)))
            if process.poll() is None:
                os.kill(, signal.SIGKILL)
                sys.stderr.write('process still running, sent sigkill to %s %s\n' %(, ' '.join(cmd)))
                os.waitpid(-1, os.WNOHANG)
            return 2
    return 0

def main(argv=None):
        if "-h" in argv or "--help" in argv:
            print __doc__
            return 0
        return timeout_command(sys.argv[2:], int(argv[1]))
        print >>sys.stderr, __doc__
        return 2

if __name__ == '__main__':

This could be run as follows,

tim@laptop:~/bin :) 2 sleep 3
TIMEOUT 2 seconds, sent sigterm to 9036 sleep 3

Or for the case where a SIGTERM was caught and ignored,

tim@laptop:~/bin :) 2 
TIMEOUT 2 seconds, sent sigterm to 9060
process still running, sent sigkill to 9060

Posted in python, shell tips

python slice and sql every Nth row

I would like to retrieve every Nth row of a SQL table, and I would like this accessed via a python slice function.

A python slice allows access to a list (or any object that implements a __getitem__ method) by a start, stop, and step -- for example,

>>> foo = range(100)
>>> foo[5]
>>> foo[5:10]
[5, 6, 7, 8, 9]
>>> foo[5:10:2]
[5, 7, 9]

The parameters of a slice behave identically to the builtin range() function,

>>> range(5,10,2)
[5, 7, 9]

Given a python class that accesses a database table, you may want to implement slice functionality directly into that class. This can be done by implementing a __getitem__ method, a very simple __getitem__ looks something like this:

def __getitem__(self, key):
    curs = self.db.cursor()
    return list(curs.execute('SELECT * FROM %s WHERE ROWID = %s' %(self.table, key+1)).fetchall()[0])

However, this fails to account for slices and will only work for single item retrievals. When only a single key is provided python will pass that key to __getitem__, and when slice parameters are provided python will pass a slice object. In other words, inspect the key to figure out if a slice is passed in, e.g.,

def __getitem__(self, key):
    if isinstance( key, slice ):
        ... slice acccess, e.g., foo[1:5], foo[:5], foo[0:10:2]
        ... single element access, e.g., foo[5]

A slice object will have three parameters: slice.start, slice.stop, and slice.step; corresponding to object[start:stop:step]

Going back to the SQL case, we'll need to dynamically construct a SQL statement that will access every Nth row between a start and stop.

The simplest way to do this is using a modulo operator (available in all major relational databases), for example, object[5:10:2] in sqlite

FROM table
  AND ROWID % 2 = 5%2

This approach allow you to retrieve every Nth row between an arbitrary start and stop, to put this into practice in python, a __getitem__ may be implemented as follows:

def __getitem__(self, key):
    curs = self.db.cursor()
    if isinstance( key, slice ):
        start = 0 if key.start is None else key.start
        stop = len(self) - key.start if key.stop is None else key.stop
        qstep = ''
        if key.step is not None:
           qstep = ' AND ROWID %% %s = %s ' %(key.step, (start+1) % key.step)
        res = curs.execute('SELECT * FROM %s WHERE ROWID > %s AND ROWID <= %s %s'
            %(self.table, start, stop, qstep)).fetchall()
        res = list(curs.execute('SELECT * FROM %s WHERE ROWID = %s' %(self.table, key+1)).fetchall()[0])
    return res

The above code is part of a sqlite python wrapper used for parsing large csv files, available here

Posted in data arch., mysql, oracle, python

sqlplus, utility scripts

I would like to use sqlplus in development projects, and I would like it to be easy to use.

First off, I want command-history and tab-completion. The easiest approach is to use rlwrap which uses the GNU readline library as a wrapper to command-line programs such as sqlplus.

If you install rlwrap, you can then set an alias in your .bashrc (or similar)

alias sqlplus='rlwrap sqlplus'

That's it, sqlplus will now have GNU readline compatible command-line history and editing! If you wish to access sqlplus directly without rlwrap (for loading scripts, etc), simply unalias, i.e.,

$ \sqlplus

I would also like intuitive, terse, and easy-to-remember commands to perform basic tasks like inspecting tables and schemas, viewing and editing packages of stored procedures, etc.

I started a github repository [] of utility scripts to make this easier.

Set your $SQLPATH environment variable to the directory containing these scripts, which allow for the following operations.

foo@DB> @show databases


2 rows selected.

Change to the SPAM schema,

foo@DB> @use spam

List all tables with 'abc' anywhere in the table name,

spam@DB> @show tables abc


3 rows selected.

List all packages with 'abc' anywhere in the package name,

spam@DB> @show packages abc


2 rows selected.

View the source code of a package of stored procedures,

spam@DB> @show code abc_date

PACKAGE             "ABC_DATE" AS


13 rows selected.

View the explain plan output (also consider 'set autotrace on')

spam@DB> explain plan for SELECT D.ID FROM DATES_DIM D WHERE SQL_DATE = '07-feb-12';


spam@DB> @explain

Plan hash value: 1171817064

| Id  | Operation                   | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT            |                  |     1 |    13 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| DATES_DIM        |     1 |    13 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | DATESDIMSQLDTUNQ |     1 |       |     1   (0)| 00:00:01 |

Predicate Information (identified by operation id):

   2 - access("SQL_DATE"='07-feb-12')

Edit a package in vim and automatically recompile

spam@DB> @editcode abc_date
... opens ABC_DATE.sql in vim

Package created.

Package body created.


Please see the github repository [] for periodic updates.

Posted in oracle, shell tips

screen and screenrc

I would like to use GNU screen as a window manager.

By default screen seems like little more than a persistent shell (that you can resume even after logging out). By itself, this is incredibly useful if you wish to access the exact same terminal session from different locations.

For example, on the command-line you can run screen or start a named session, i.e.,

$ screen -S sessionname

This starts a new shell, which you can detach from by pressing Ctrl-a and then d, i.e.,

^A d

Ctrl-a is the default command-key, you press Ctrl-a followed by a screen command. This is abbreviated as ^A in the screen manual.

By default, you'll also detach from screen by simply closing your terminal.

You can list existing screen sessions, and then re-attach to a screen session as follows,

$ screen -ls
There are screens on:
        23762.this      (Detached)
        23814.that      (Detached)
2 Sockets in /var/run/screen/S-timwarnock.

$ screen -r that

In this case I've re-attached to a screen session named "that".

If a screen session is already attached, and you want to takeover the session, you can do the following,

$ screen -d -r sessionname

If desired, you can use the -D option which will detach and logout the other terminal. You can also use -R to re-attach to a session or create one if it doesn't exist.

$ screen -d -r
 ... detach the current attached terminal and then re-attach
$ screen -D -r
 ... and logout the other terminal
$ screen -D -R
 ... and if no session, create one
$ screen -D -RR
 ... and if more than one session, use the first one

I prefer the following command,

$ screen -dRR

This will connect to the first session, politely detach as needed, or create a new one if no screen session exists.

Once you're attached to screen, you can run multiple windows, split the screen multiple ways, and various other useful tricks. At the very least, here are the most common commands for everyday use:

^A ? - help
^A c - create new window, or screen -t tabname
^A A - rename screen window
^A k - kill a screen window, or ^D (or exit) to properly exit
^A " - window list, change windows
^A n - next window, or ^A space
^A p - previous window
^AA - switch to previous screen (toggle)
^A [0-9] - go to window [0-9]
^A esc - copy mode (for scrollback), ] or esc to abort

If you want to get more advanced, and see what screen can really do, you'll want to save a .screenrc in your home directory. Otherwise managing multiple windows and split-screens in a single screen session can be quite confusing.

Here is an example screenrc,

# ^A :source ~/.screenrc
bind , source "$HOME/.screenrc"

# no default message, no audible bell
startup_message off
vbell on

# more usable scrollback
scrollback 5000

# detach on hangup
autodetach on

# make bold work
attrcolor b ".I"
attrcolor i "+b"

# hardstatus and caption
hardstatus alwayslastline "%{= wk}%-w%{= Bw} %n %t %{-}%+w %-= %H [ %Y-%m-%d %c:%s ]"
caption splitonly "%{= wK}%-w%?%F%{= bw}%:%{= Wk}%? %n %t %{-}%+w %-= "

# resize
bind + resize +5
bind - resize -5
bind = resize =

# bindings
bind % screen -t top top

Using the above screenrc with a resized split-screen looks like this:

The important bits of the screenrc are the hardstatus and caption. The hardstatus can set the bottom line in a terminal, in this case a window-list highlighting the active window, the date and time, and the hostname of the machine running the screen session.

The hostname is also on my command-prompt, but it's useful for me to know the hostname running the screen session separate from wherever the current shell happens to be logged in.

The caption in this case is set to only display when the screen is split. You can split a screen multiple times. In this case the caption will appear underneath its respective window.

The commands to split a window are as follows,

^A S – split screen
^A Q – return to non split view
^A Tab - move to the next window in split-display
^A :resize [n|+n|-n]

In the above screenrc, I've bound custom commands for resizing, this way you simply hit Ctrl-A followed by + or - to resize the active window (or = to return to the default size).

To copy+paste within screen, you enter "copy mode" and can view the scrollback buffer. Once in copy mode you hit enter to start selecting, and enter again to end your selection.

^A esc - copy mode, or ^A ]
^A ] - to paste

If you're using screen to manage several windows and want to be alerted when something changes on a given window, you can use monitoring. This simply alerts you of activity on a given window, or silence in the case you were expecting activity.

^A M - monitor for activity
^A _ - monitor for silence (default 30 seconds)

Lastly, you can log a screen window or do a simple grab of the contents currently on display, i.e.,

^A h - screen-grab
^A H - logging
Posted in bash, shell tips

nvl, ifnull, nullif, isnull, coalesce

I would like to programmatically handle NULL values in a database.

If I add or multiply a known value to a NULL, the result is still NULL. This is an appropriate result as a NULL value in a database schema means "unknown" or "nonexistent", so mathematic functions against known values and NULL should produce NULL.

That said, if I want a sum of line-items I could filter out NULL values, e.g.,


However, this could get difficult to manage in larger queries.

In Oracle, we can use NVL to get the above result

SELECT SUM(NVL(col,0)) FROM foo;

In MySQL and SQLite, we can use IFNULL


In Postgres we can use NULLIF


And in Microsoft SQL Server we can use ISNULL


While I prefer NVL for it's conciseness, I recommend COALESCE as it works in each of the above database engines. COALESCE accepts two or more parameters and simply returns the first non-null parameter, e.g.,


Accepting multiple parameters can be very useful in cases where we're aggregating otherwise mutually exclusive columns, e.g.,

SELECT SUM(COALESCE(eggs.weight, spam.weight, 0)) AS lonely_weight

In the above case, it's the sum weight of all eggs without spam, and spam without eggs.

Posted in data arch., mysql, oracle

oracle, limit results and pagination

I would like to limit query results in oracle to support pagination.

In postgres, mysql, and sqlite you can use the LIMIT and OFFSET modifier, e.g.,

mysql> SELECT login FROM users ORDER BY login LIMIT 2 OFFSET 70001;
| login       |
| acrobat11   | 
| acrobat12   | 
2 rows in set (0.03 sec)


In mysql and sqlite you can use a two-parameter version of LIMIT, and the format is

LIMIT offset, count

Which is identical to

LIMIT count OFFSET offset

In oracle you can leverage the magic ROWNUM variable, although the value of ROWNUM is assigned before an ORDER BY is applied (so the ORDER BY sql must be nested). There is a lengthy and very informative Ask Tom article that covers ROWNUM as well as optimization information.

For reference, to use ROWNUM as a LIMIT expression the general form is,


And a pagination query using ROWNUM as both LIMIT and OFFSET is,

  ( SELECT /*+ FIRST_ROWS(10) */  a.*, ROWNUM rnum
    FROM ( ... ) a
    WHERE ROWNUM < 20 )
WHERE rnum >= 10;

The oracle equivalent of the above mysql would be as follows,

foo_owner@FOO> SELECT login FROM
 2  ( SELECT /*+ FIRST_ROWS(2) */  a.login, ROWNUM rnum
 3    FROM ( SELECT login FROM users ORDER BY login ) a
 4    WHERE ROWNUM <= 70002 )
 5  WHERE rnum >= 70001;


2 rows selected.


Posted in data arch., mysql, oracle