sqlplus explain plan

I would like to view explain plan output in sqlplus.

One easy approach is to set autotrace. You can set autotrace on to show query statistics as well as an execution plan. You can also view only the execution plan, for example,

foo_owner@FOO> set autotrace on explain
foo_owner@FOO> SELECT COUNT(*) FROM tax_facts WHERE create_date <= TO_DATE('10-JAN-12', 'DD-MON-YY');


Execution Plan
Plan hash value: 3136400752

---------------------------------------------------------------------------------------
| Id  | Operation              | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |              |     1 |     8 |  1988   (5)| 00:00:36 |
|   1 |  SORT AGGREGATE        |              |     1 |     8 |            |          |
|*  2 |   INDEX FAST FULL SCAN | TAXFCT_CRTDT |  3774K|    28M|  1988   (5)| 00:00:36 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

   2 - filter("CREATE_DATE"<=TO_DATE('10-JAN-12','DD-MON-YY'))


Alternatively, you can put EXPLAIN PLAN FOR in front of your SQL statement, but viewing the plan output is less intuitive, e.g.,

foo_owner@FOO> EXPLAIN PLAN FOR  SELECT COUNT(*) FROM tax_facts WHERE create_date <= TO_DATE('10-JAN-12', 'DD-MON-YY');

Explained.

foo_owner@FOO> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

Plan hash value: 3136400752

---------------------------------------------------------------------------------------
| Id  | Operation              | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |              |     1 |     8 |  1988   (5)| 00:00:36 |
|   1 |  SORT AGGREGATE        |              |     1 |     8 |            |          |
|*  2 |   INDEX FAST FULL SCAN | TAXFCT_CRTDT |  3774K|    28M|  1988   (5)| 00:00:36 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

   2 - filter("CREATE_DATE"<=TO_DATE('10-JAN-12','DD-MON-YY'))

14 rows selected.

Posted in oracle

sqlplus, exploring schemas and data

I would like to get information about different schemas and tables in sqlplus, and I would like this to be as easy as mysql. It's less intuitive, but almost as easy.

To get a list of schemas (similar to mysql "show databases"), you can run the following,

foo_owner@FOO> SELECT username FROM all_users ORDER BY username;


5 rows selected.


To change to a different schema (similar to mysql "use database"), you can run the following,

foo_owner@FOO> ALTER SESSION SET CURRENT_SCHEMA = rpt_owner;

Session altered.


To view the tables (similar to mysql "show tables"), you can run the following,

foo_owner@FOO> SELECT table_name FROM tabs;


To view the columns of the table, it's the same as mysql, i.e.,

rpt_owner@FOO> DESC foo_user_group

 Name								   Null?    Type
 ----------------------------------------------------------------- -------- --------------------------------------------

And all the normal SQL you feel like running, e.g.,

foo_owner@FOO> SELECT COUNT(*) FROM foo_users;


Posted in oracle

sqlplus pagesize and linesize

I would like sqlplus output to be more readable.

Fortunately, you can adjust the pagesize and linesize variables on the fly, e.g.,

foo_owner@FOO> set pagesize 50000
foo_owner@FOO> set linesize 120

You can set pagesize to 0, which is very useful for scripted output, as it will not print any column headers and will only print the results. However, if you want to see column headers without repeating them every n lines, set pagesize to something reasonably high.

The linesize defaults to 80, which is fine on a 1980's dumb-terminal, but I prefer this set to a more reasonable width to fit my terminal screen.

You can add both of these lines to your login.sql file (sqlplus will look for login.sql in the current directory or in the directories listed in the $SQLPATH environment variable).

-- format output
set pagesize 50000
set linesize 120
Posted in oracle, shell tips

sqlplus command prompt (sqlprompt)

I would like to change the sqlplus command-prompt to something more useful than the default,

SQL> 

You can modify the sqlprompt variable as follows,

SQL> set sqlprompt "_USER'@'_CONNECT_IDENTIFIER> "

You can also use variables such as _PRIVILEGE and _DATE, although all I really want is the user and schema. Unfortunately, with the above technique, if you ever connect to a database using a full connection string (rather than a tnsnames.ora entry), this prompt will be a bit unwieldy, e.g.,

$ sqlplus foo_owner/foo_pass@'(DESCRIPTION=(ADDRESS_LIST=(load_balance=on)(failover=on)(ADDRESS=(PROTOCOL=TCP)(HOST='

SQL*Plus: Release - Production on Wed Jan 11 12:47:19 2012

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release - 64bit Production
With the Partitioning option

SQL> set sqlprompt "_USER'@'_CONNECT_IDENTIFIER> "        

I love that I can connect without a tnsnames.ora entry, but I would like a simplistic prompt and I want it loaded every time I start sqlplus.

Fortunately, sqlplus will look in the current directory and in the $SQLPATH environment variable for a login.sql, and execute it automatically. For example, if you include the following in your login.sql

DEFINE sqlprompt=none
COLUMN sqlprompt NEW_VALUE sqlprompt
-- populate the substitution variable, e.g., foo_owner@FOO
SELECT LOWER(USER) || '@' || UPPER(SYS_CONTEXT('userenv', 'db_name')) AS sqlprompt FROM dual;
SET SQLPROMPT '&sqlprompt> '
UNDEFINE sqlprompt

Then whenever you run sqlplus the prompt will be set accordingly, e.g.,

$ sqlplus foo_owner/foo_pass@FOO

SQL*Plus: Release - Production on Wed Jan 11 12:47:19 2012

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release - 64bit Production
With the Partitioning option

foo_owner@FOO> show user
Posted in oracle, shell tips

reverse ssh tunnel

I would like ssh access to a protected host that is not directly accessible on the Internet but does have outbound access. This is a common scenario in corporate networks, which often require a vpn for remote access; but in situations where vpn access is not available (e.g., I forgot my keyfob, or I don't want to install vpn software), a reverse ssh tunnel to a trusted host can be used instead.

Basically, a reverse ssh tunnel is set up by the private host (i.e., any host that's not directly accessible from the Internet). The private host connects to a trusted host (that is accessible on the Internet).

A reverse ssh tunnel is usually started with something like,

$ ssh -R 12345:localhost:22 user@example.com

This will create a tunnel from example.com back to the private host: port 12345 on example.com forwards to port 22 on the private host. In other words, you would simply login to example.com and issue the following command,

$ ssh user@localhost -p 12345

And you would now be logged into the private host.

The only problem is that you must establish the reverse tunnel from the private host. In order to keep this tunnel alive (without physically logging in to the private host), you can use a periodic cron, e.g.,

# e.g., crontab every 5 minutes
# */5 * * * * ~/.ssh/reverse_tunnel >/dev/null 2>&1
# -OR-
# */5 * * * * ~/.ssh/reverse_tunnel hostname port >/dev/null 2>&1

## set defaults
REMOTE_HOST=example.com
REMOTE_PORT=12345
SSH_KEY=~/.ssh/id_rsa

if [ $# == 2 ]; then
    REMOTE_HOST=$1
    REMOTE_PORT=$2
fi

## reverse tunnel command
REVERSE_TUNNEL="ssh -i $SSH_KEY -N -R $REMOTE_PORT:localhost:22 $REMOTE_HOST"

## start tunnel if not already running
pgrep -f -x "$REVERSE_TUNNEL" > /dev/null 2>&1 || $REVERSE_TUNNEL &

## test tunnel by executing a command on the remote host
ssh -i $SSH_KEY $REMOTE_HOST netstat -an | egrep "tcp.*:$REMOTE_PORT.*LISTEN" > /dev/null 2>&1
if [ $? -ne 0 ] ; then
   ## tunnel is dead; kill it so the next cron run restarts it
   pkill -f -x "$REVERSE_TUNNEL"
fi
This will maintain a tunnel on the public host to the private host. Now you can easily login to the private host anywhere that has access to the public host.

Posted in shell tips, ssh

ssh agent across multiple hosts

I would like secure single sign-on across multiple hosts. An easy way to do this is with ssh-agent; however, ssh-agent is a bit limited.

For example, the normal use of ssh-agent looks like this,

$ ssh-agent
SSH_AUTH_SOCK=/tmp/ssh-stSwW11394/agent.11394; export SSH_AUTH_SOCK;
echo Agent pid 11395;

You would need to set those environment variables to use the newly created agent, and then use ssh-add to add your credentials, e.g.,

$ eval `ssh-agent`
Agent pid 11464
$ ssh-add
Enter passphrase for /home/foobar/.ssh/id_dsa: 
Identity added: /home/foobar/.ssh/id_dsa (/home/foobar/.ssh/id_dsa)
$ ssh-add -l
1024 84:e3:23:7b:f4:22:a2:da:53:fb:04:19:67:78:2b:3d /home/foobar/.ssh/id_dsa (DSA)

From here you can access any host that has your public key in ~/.ssh/authorized_keys.

However, any new shell instances on this same host would have to go through this process. Ideally, I want to login ONCE and ONLY ONCE while working across multiple hosts.

The following code snippet can be added to your .bashrc which maintains a single agent across multiple shells per host, and also respects ssh auth forwarding. This code will also work in situations where your home directory is mounted across multiple hosts (maintaining an ssh-agent per host as needed, depending on where you login first).

# ssh agent -- for shared home directory across hosts
SSH_ENV="$HOME/.ssh/environment.$(hostname)"

function start_agent {
  echo "Starting a new ssh-agent on this host"
  ssh-agent | sed 's/^echo/#echo/' > ${SSH_ENV}
  chmod 600 ${SSH_ENV}
  . ${SSH_ENV} > /dev/null
  ssh-add
  echo succeeded
}

## ssh-agent
if [ -e "$SSH_AUTH_SOCK" ]; then
  echo "Using ${SSH_AUTH_SOCK}"
elif [ -f "${SSH_ENV}" ]; then
  echo "Using ${SSH_ENV}"
  . ${SSH_ENV} > /dev/null
  ps -ef | grep ${SSH_AGENT_PID} | grep ssh-agent$ > /dev/null || {
    echo "${SSH_ENV} agent is no longer running"
    start_agent
  }
else
  start_agent
fi

With this scenario you typically need to only login once per work session (regardless of which host you first login to). And with appropriate use of ssh auth forwarding, e.g., ssh -A user@host, you can jump around from host-to-host without constantly typing in the same password.

Posted in shell tips, ssh

python, finding recurring pairs of data

I would like to find all pairs of data that appear together at least 10 times.

For example, given a large input file of keywords:

>>> foo, bar, spam, eggs, durian, stinky tofu, ...
>>> fruit, meat, vinegar, sphere, foo, ...
>>> merlot, red, hearty, spam, oranges, durian, ...
>>> ...

"durian" and "spam" appear together twice and all other combinations appear together only once.

Rather than a brute-force approach counting through all possible keyword pairs, we can use python set operations to quickly determine the number of times two keywords appeared together.
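In miniature, the idea looks like this (a quick sketch using the example rows above):

```python
# Map each keyword to the set of line numbers it appears on,
# then intersect the sets to count co-occurrences
rows = [
    ['foo', 'bar', 'spam', 'eggs', 'durian', 'stinky tofu'],
    ['fruit', 'meat', 'vinegar', 'sphere', 'foo'],
    ['merlot', 'red', 'hearty', 'spam', 'oranges', 'durian'],
]
index = {}
for i, row in enumerate(rows):
    for word in row:
        index.setdefault(word, set()).add(i)

# lines where BOTH keywords appear
print(len(index['durian'] & index['spam']))  # → 2
```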

#!/usr/bin/env python
'''
Processes the given input and returns all pairs that appear together in 
at least 10 different lines.

Algorithm Explanation:

Hash each value such that each value maps to a 'set' of row indexes 
(i.e., each line in the file)

Next, rather than compare all possible combinations of values, create a 
shorter list of candidate values that have 10 or more row indexes.

The set-intersection will contain all rows that BOTH values were listed in; 
if the size of the set-intersection is at least 10, output this pair.
'''
import codecs, csv, argparse, sys

def setIntersection(infile):
    datafile = codecs.open(infile, 'r', 'utf-8')
    csv_data = csv.reader(datafile)
    data_sets = {}
    i = 0
    for row in csv_data:
        i = i+1
        for a in row:
            if a in data_sets:
                data_sets[a].add(i)
            else:
                data_sets[a] = set([i])
    candidates = {a:s for a,s in data_sets.items() if len(s) >= 10}
    pairc = {a for a in candidates.keys()}
    for a,s in candidates.items():
        pairc.discard(a)  # avoid self-pairs and duplicate (b, a) output
        for c in pairc:
            if len(s & candidates[c]) >= 10:
                print('%s, %s, %s' % (a,c,len(s & candidates[c])))

def main():
    p = argparse.ArgumentParser()
    p.add_argument('--output-file', '-o', default='')
    p.add_argument('--bitmap', '-b', action="store_true")
    p.add_argument('files', nargs='+')
    args = p.parse_args()

    if (args.output_file != ''):
        sout = sys.stdout
        fh = open(args.output_file, 'w')
        sys.stdout = fh

    for infile in args.files:
        if args.bitmap:
            bitmapIntersection(infile)
        else:
            setIntersection(infile)

    if (args.output_file != ''):
        sys.stdout = sout
        fh.close()

if __name__ == '__main__':
    main()

Alternatively, we could use a bitmap comparison and compute the Hamming Weight, e.g.,

def bitmapIntersection(infile):
    '''
    The bitmap example is more general purpose and easily ported to C.  In fact, 
    the bitCount function is a python version of computing Hamming Weight from K&R.

    A simple example illustrates the bitmap,

      keyword1 : [0 0 0 0 0 0 1 0 1 0 ... ]
      keyword2 : [0 1 1 0 0 0 1 1 1 0 ... ]
             & : [0 0 0 0 0 0 1 0 1 0 ... ] <- bitwise-AND
    '''
    datafile = codecs.open(infile, 'r', 'utf-8')
    csv_data = csv.reader(datafile)
    data_maps = {}
    i = 0
    for row in csv_data:
        for a in row:
            if a in data_maps:
                data_maps[a] |= (2 ** i)
            else:
                data_maps[a] = (2 ** i)
        i = i+1
    candidates = {a:b for a,b in data_maps.items() if bitCount(b) >= 10}
    pairc = {a for a in candidates.keys()}
    for a,b in candidates.items():
        pairc.discard(a)  # avoid self-pairs and duplicate (b, a) output
        for c in pairc:
            bits = bitCount(b & candidates[c])
            if bits >= 10:
                print('%s, %s, %s' % (a,c,bits))

def bitCount(int_type):
    ''' ported from Kernighan & Ritchie's "The C Programming Language" '''
    count = 0
    while int_type:
        int_type &= int_type - 1
        count += 1
    return count

A bitmap comparison is functionally identical to the set intersection but provides potential optimization enhancements such as using encoded vectors like the BitVector module.
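To see the Hamming-weight trick in isolation, here is a tiny sketch with toy bitmaps (not the script above):

```python
# Kernighan's trick: x &= x - 1 clears the lowest set bit,
# so the loop runs once per set bit
def bit_count(x):
    n = 0
    while x:
        x &= x - 1
        n += 1
    return n

durian = 0b101  # appears on lines 1 and 3
spam   = 0b101
print(bit_count(durian & spam))  # → 2
```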

Posted in python

python, analyzing csv files, part 2

Previously, we discussed analyzing CSV files, parsing the csv into a native python object that supports iteration while providing easy access to the data (such as a sum by column header).

For very large files this can be cumbersome, especially where more advanced analytics are desired.

I would like to keep the same simple interface but use an in-memory database connection, thus transforming the CSV files into database tables for deeper analysis.

For example, I would like to do the following (leveraging the builtin sqlite3 module):

>>> reports = Reports('/home/user/reports-1109')
>>> reports.billing_detail.sum('Tax Amount', {'Fiscal Period':'2011-09'})
>>> reports.tax_summary.sum('Amount', {'Fiscal Period':'2011-09'})
>>> len(reports.billing_detail)
>>> curs = reports.conn.cursor()
>>> curs.execute('SELECT name FROM sqlite_master WHERE type="table"').fetchall()
[(u'billing_detail',), (u'billing_summary',)]

This approach can be orders of magnitude faster for even the most basic analysis. Furthermore, this allows OLAP cube analysis of the data from the CSV files, e.g.,

>>> curs.execute('CREATE TABLE t_fact(id TEXT UNIQUE, b INT, t INT, r INT)').fetchall()
>>> curs.execute('CREATE INDEX IF NOT EXISTS idxt ON t_fact(id)').fetchall()
>>> ## load some data into the fact table
>>> curs.execute('''INSERT OR REPLACE INTO t_fact(id,b,t,r)
                SELECT bd.%(id)s as id, bd.ROWID as b, ts.ROWID as t, rf.ROWID as r
                FROM billing_detail bd
                LEFT OUTER JOIN tax_summary ts ON bd.%(id)s = ts.%(tax_id)s
                LEFT OUTER JOIN refunds rf ON bd.%(id)s = rf.%(ref_id)s
                ''' % query_dict).fetchall()
>>> ## e.g., find billing records without tax summaries
>>> billings_without_tax = curs.execute('SELECT id FROM t_fact WHERE t IS NULL').fetchall()

Using the same Report and Reports objects discussed previously, the code can be modified to leverage a database connection to support this type of analytics:

class Report(collections.Mapping):
    def __init__(self, filehint, table = None, conn = sqlite3.connect(':memory:')):
        self.filename = Reports.find_report(filehint)
        self.info = []
        self.headers = []
        self.table = table
        self.conn = conn
        self.indexes = []
        self._load()

    def _load(self):
        logging.debug('loading %s' %(self.filename))
        curs = self.conn.cursor()
        fh = open(self.filename)
        reader = csv.reader(fh)
        self.info = reader.next()
        self.headers = reader.next()
        columns = ', '.join(['c'+str(x) for x in range(len(self.headers))])
        columnTypes = ' TEXT, '.join(['c'+str(x) for x in range(len(self.headers))]) + ' TEXT'
        try:
            curs.execute('CREATE TABLE %s(%s)' %(self.table, columnTypes))
        except sqlite3.OperationalError as e:
            logging.debug('%s -- using existing table' %(e))
        curs.executemany('INSERT INTO %s (%s) VALUES(%s)' %(
            self.table, columns,
            '?, ' * (len(self.headers) -1) + '?'
        ), reader)

    def _column(self, key):
        if key.lower() not in [x.lower() for x in self.headers]:
            raise IndexError('%s not in %s'%(key, self.table))
        return 'c' + str([x.lower() for x in self.headers].index(key.lower()))

    def create_index(self, col):
        col = self._column(col)
        icol = 'i' + col
        if icol not in self.indexes:
            logging.debug('adding index %s to %s(%s)' %(icol, self.table, col))
            curs = self.conn.cursor()
            curs.execute('CREATE INDEX IF NOT EXISTS %s ON %s(%s)' %(icol, self.table, col))
            self.indexes.append(icol)

    def __getitem__(self, key):
        curs = self.conn.cursor()
        res = list(curs.execute('SELECT * FROM %s WHERE ROWID = %s' %(self.table, key+1)).fetchall()[0])
        return res

    def __iter__(self):
        curs = self.conn.cursor()
        self.__iter = curs.execute('SELECT * FROM %s' %(self.table))
        return self

    def next(self):
        return self.__iter.next()

    def __len__(self):
        curs = self.conn.cursor()
        ret = curs.execute('SELECT COUNT(*) FROM %s' %(self.table)).fetchall()[0][0]
        return ret

    def get(self, column, value):
        '''get rows where column matches value'''
        curs = self.conn.cursor()
        column = self._column(column)
        ret = curs.execute('SELECT * FROM %s WHERE %s = "%s"' %(self.table, column, value)).fetchall()
        return ret

    def sum(self, col, filter = {}):
        curs = self.conn.cursor()
        _where = []
        for k,v in filter.iteritems():
            _where.append(' %s = "%s" ' %(self._column(k),v) )
        ret = Decimal(str(curs.execute('SELECT SUM(%s) FROM %s %s' %(
            self._column(col), self.table,
            ' WHERE ' + ' AND '.join(_where) if _where else ''
        )).fetchall()[0][0]))
        return ret
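Stripped of the class machinery, the core trick is only a few lines; here is a self-contained sketch (with made-up sample data) of loading csv rows into an in-memory table and letting SQL do the aggregation:

```python
import csv, io, sqlite3

# Miniature of the approach: csv rows into an in-memory sqlite table,
# generic TEXT columns c0, c1, ..., then aggregate with SQL
data = "Transaction,Tax Amount\nT1,1.50\nT2,2.25\n"
reader = csv.reader(io.StringIO(data))
headers = next(reader)           # ['Transaction', 'Tax Amount']
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE billing_detail(c0 TEXT, c1 TEXT)')
conn.executemany('INSERT INTO billing_detail VALUES(?, ?)', reader)
total = conn.execute('SELECT SUM(c1) FROM billing_detail').fetchone()[0]
print(total)  # → 3.75
```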
Posted in data arch., python, software arch.

python, analyzing csv files, part 1

I would like to analyze a collection of CSV (comma-separated-values) files in python. Ideally, I would like to treat the csv data as a native python object.

For example,

>>> financial_detail = Report('financial-detail.csv')
>>> transactions = []
>>> for row in financial_detail:
...   transactions.append(row['Transaction'])
>>> financial_detail.sum('Tax Amount')

Additionally, I would like to easily retrieve row data by column name, e.g.,

>>> financial_detail = Report('financial-detail.csv')
>>> financial_detail[0]['Transaction']
>>> ## the above should be equivalent to
>>> financial_detail.headers.index('Transaction')
>>> financial_detail[0][7]

Given a directory of CSV files, I would like to retrieve these files (magically) by name, e.g.,

>>> reports = Reports('/directory/of/reports/')
>>> len(reports.sales_tax)
>>> len(reports.finance_detail)
>>> reports.sales_tax.sum('Amount') - reports.financial_detail.sum('Tax Amount')

This can be accomplished by leveraging the python builtin csv module, the collections.Mapping abstract base class, as well as the python magic __getattr__ methods, i.e.,

#!/usr/bin/env python
# vim: set tabstop=4 shiftwidth=4 autoindent smartindent:
import csv, glob, os, sys 
import collections
from decimal import Decimal

## set the logging level
import logging
logging.basicConfig(level=logging.DEBUG,
  format='%(levelname)s:%(filename)s:%(lineno)d -- %(message)s'
)

class Report(collections.Mapping):
    '''Interface into a CSV Report

    Parses the csv into an iterable collection and provides memory of pivot sums
    '''
    def __init__(self, *filehints):
        self.filename = Reports.find_report(*filehints)
        self.info = []
        self.headers = []
        self.data = []
        self.pivot_sums = {}
        self.load()

    def load(self):
        logging.debug('loading %s' %(self.filename))
        fh = open(self.filename)
        reader = csv.reader(fh)
        self.info = reader.next()
        self.headers = reader.next()
        for row in reader:
            self.data.append(row)
        self.curval = 0 

    def __getitem__(self, key):
        return VirtualRecord(self.data[key], self)

    def __iter__(self):
        self.curval = 0 
        return self

    def next(self):
        if self.curval < 0 or self.curval >= (len(self.data) - 1): 
            raise StopIteration
        else:
            self.curval += 1
            return VirtualRecord(self.data[self.curval], self)

    def __len__(self):
        return len(self.data)

    def sum(self, col):
        if col not in self.pivot_sums:
            logging.debug('computing sum for %s in %s' %(col, ', '.join(self.info)))
            sum = Decimal(0)
            index = col
            if isinstance(index, str):
                index = [x.lower() for x in self.headers].index(index.lower())
            for i in self.data:
                sum += Decimal(i[index]) if i[index] != '' else 0
            self.pivot_sums[col] = sum
        return self.pivot_sums[col]

class VirtualRecord():
    '''A virtual record within a Report object, only one instance exists at a time (not thread-safe) 
    to avoid the memory crunch from large report files
    '''
    _instance = None
    def __new__(self, *args, **kwargs):
        if not self._instance:
            self._instance = super(VirtualRecord, self).__new__(self)
        return self._instance

    def __init__(self, row, parent):
        self.row = row
        self.parent = parent

    def __getitem__(self, key):
        index = key
        if isinstance(key, str):
            index = [x.lower() for x in self.parent.headers].index(key.lower())
        return self.row[index]

    def __len__(self):
        return len(self.row)

class Reports:
    '''Magic collection of billing reports'''

    reports = {}

    def __init__(self, reportdir = '.'):
        self.reportdir = reportdir

    def __getattr__(self, report):
        '''magically find reports within self.reportdir

        For example, if there is a report 'sales-tax.2011-09.20111018.csv'
        You can access this with an attribute 'sales_tax', e.g.,
        >>> reports = Reports()
        >>> len(reports.sales_tax)
        '''
        filename = Reports.find_report(self.reportdir, report.replace('_', '?') + '.*.csv')
        if filename is None or not os.stat(filename):
            return None
        if report not in self.reports:
            self.reports[report] = Report(filename)
        return self.reports[report]

    @staticmethod
    def find_report(*hints):
        files = glob.glob( os.path.join(*hints) )
        if len(files) == 1:
            return files[0]
        elif len(files) == 0:
            logging.error('No reports found: find_report(%s)' %(os.path.join(*hints)))
        else:
            logging.error('Found too many matching reports: find_report(%s) => %s' %(os.path.join(*hints), ', '.join(files) ))
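Stripped down, the column-by-name lookup that makes this convenient works like this (a small self-contained sketch with made-up data):

```python
import csv, io

# Case-insensitive column lookup, as in VirtualRecord.__getitem__
data = "Transaction,Tax Amount\nT1,1.50\n"
reader = csv.reader(io.StringIO(data))
headers = next(reader)
row = next(reader)

def field(row, name):
    # map the header name to its position, ignoring case
    return row[[h.lower() for h in headers].index(name.lower())]

print(field(row, 'tax amount'))  # → 1.50
```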

Posted in data arch., python, software arch.

python, unique files by content

I would like to retrieve a list of unique files by content rather than by filename.

That is, if spam.txt and eggs.txt both contained the same contents I want only one of them to return. A very simple approach is to compute a SHA-1 checksum on each file, and build a dictionary with the checksum as the unique key.
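In miniature (with hypothetical filenames and contents), the checksum-keyed dictionary looks like this:

```python
import hashlib

# Keying a dict by content checksum collapses duplicate files:
# setdefault keeps the first filename seen for each distinct hash
contents = {'spam.txt': b'same bytes', 'eggs.txt': b'same bytes', 'ham.txt': b'other'}
unique = {}
for name, data in contents.items():
    unique.setdefault(hashlib.sha1(data).hexdigest(), name)
print(sorted(unique.values()))  # two names survive, one per distinct content
```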

#!/usr/bin/env python
# vim: set tabstop=4 shiftwidth=4 autoindent smartindent:
import hashlib, sys
import logging

def _dupdoc(filelist):
	'''returns a list of unique files (by content rather than filename)
	that is, if spam.txt and eggs.txt both contained the same contents, 
	only one filename will be returned
	'''
	shasums = {}
	for file in filelist:
		try:
			fh = open(file, 'rb')
			sha1 = hashlib.sha1(fh.read()).hexdigest()
			if sha1 not in shasums:
				shasums[sha1] = file
				logging.debug('%s %s' %(file, sha1))
		except IOError as e:
			logging.warning('could not open %s' %(file))
	uniquelist = [file for file in shasums.values()]
	return uniquelist

if __name__ == "__main__":
	'''command-line, accept either a list of files in STDIN
	or a single filename argument that contains a list of files
	'''
	filelist = []
	if len(sys.argv) > 1:
		fh = open(sys.argv[1], 'r')
		filelist = fh.readlines()
	else:
		filelist = sys.stdin.readlines()
	filelist = [file.strip() for file in filelist]
	uniques = _dupdoc(filelist)
	for file in uniques:
		print file

The commandline __main__ portion of the program expects an optional command line argument; if no argument is specified then a filelist will be read on STDIN, e.g.,

#  find test -type f | dupdoc
Posted in python, shell tips

python daemon

I would like to create a python daemon, completely detaching from the terminal or parent process, and yet retaining any log handlers through the python logging module. There is a wonderful example at cookbook-278731 of a well-behaved daemon, and see PEP 3143.

Borrowing from these examples, here is a simple daemonize function that will retain thread-safe logging handlers that were setup through the logging module. This is also available via github.

# vim: set tabstop=4 shiftwidth=4 autoindent smartindent:
'''Daemon (python daemonize function)

Detach process through double-fork (to avoid zombie process), close all
file descriptors, and set working directory to root (to avoid umount problems)
'''
import os, resource
import logging

# target environment
UMASK = 0 
MAXFD = 1024
if (hasattr(os, "devnull")):
    REDIRECT_TO = os.devnull
else:
    REDIRECT_TO = "/dev/null"

def daemonize():
    '''Detach this process and run it as a daemon'''
    try:
        pid = os.fork() #first fork
    except OSError, e:
        raise Exception, "%s [%d]" % (e.strerror, e.errno)

    if (pid == 0): #first child
        os.setsid()
        try:
            pid = os.fork() #second fork
        except OSError, e:
            raise Exception, "%s [%d]" % (e.strerror, e.errno)

        if (pid == 0): #second child
            os.chdir("/")
            os.umask(UMASK)
        else:
            os._exit(0) #exit first child
    else:
        os._exit(0) #exit parent

    #close all file descriptors except from non-console logging handlers
    maxfd = resource.getrlimit(resource.RLIMIT_NOFILE)[1]
    if (maxfd == resource.RLIM_INFINITY):
        maxfd = MAXFD
    filenos = []
    for handler in logging.root.handlers:
        if hasattr(handler, 'stream') and hasattr(handler.stream, 'fileno') and handler.stream.fileno() > 2:
            filenos.append( handler.stream.fileno() )
    for fd in range(0, maxfd):
        try:
            if fd not in filenos:
                os.close(fd)
        except OSError:
            pass

    #redirect stdin, stdout, stderr to null
    os.open(REDIRECT_TO, os.O_RDWR)
    os.dup2(0, 1)
    os.dup2(0, 2)


Simply call the daemonize() function within your application, e.g.,

import logging
import Daemon

logging.info('daemonize?')
Daemon.daemonize()
logging.info('daemonized! we are now safely detached')

From a shell, console logging will only appear before daemonize() was called, e.g.,

# python test/daemon.py 
INFO:daemon.py:4 -- daemonize?

And the logfile output:

# cat test.log
2011-09-27 13:26:02,712 INFO:daemon.py:4 -- daemonize?
2011-09-27 13:26:02,717 INFO:daemon.py:6 -- daemonized! we are now safely detached
Posted in python, software arch.

python logging

I would like customizable logging in python applications, and I would like to easily send log messages to multiple handlers without any modification of the application code. The built-in logging module provides a very robust and easy-to-use logging capability.

In its simplest form, log messages will be sent to the console with minimal formatting, e.g.,

>>> import logging
>>> logging.warning('this is a warning')
WARNING:root:this is a warning

In fact, your application and module code can simply use the built-in logging methods which can inherit logging handlers set by the main application code.

Below is an example of a logging configuration that creates three logging handlers:

import logging

# Log to file
logging.basicConfig(filename='test.log', level=logging.INFO,
    format='%(asctime)-15s %(levelname)s:%(filename)s:%(lineno)d -- %(message)s'
)

# Log to console
console = logging.StreamHandler()
console.setFormatter(logging.Formatter('%(levelname)s:%(filename)s:%(lineno)d -- %(message)s'))
logging.getLogger().addHandler(console)

# Log to syslog
from logging.handlers import SysLogHandler
syslog = SysLogHandler(address='/dev/log')
syslog.setFormatter(logging.Formatter('%(asctime)-15s %(levelname)s:%(filename)s:%(lineno)d -- %(message)s'))
logging.getLogger().addHandler(syslog)

Once this module is imported, any logging calls to the root logger [e.g., logging.warning(), logging.critical(), etc] will be processed by all three handlers. You can add as many log handlers as needed while your module code maintains a very simple logging interface, e.g.,

import logging

logging.debug('this is a debug msg')
logging.info('this is an info msg')
logging.warning('this is a warning msg')
logging.error('this is an error msg')
logging.critical('this is a critical error msg')

The console output will look like:

INFO:test_log.py:4 -- this is an info msg
WARNING:test_log.py:5 -- this is a warning msg
ERROR:test_log.py:6 -- this is an error msg
CRITICAL:test_log.py:7 -- this is a critical error msg

The filelog (and syslog) will look like:

2011-09-26 12:30:52,521 INFO:test_log.py:4 -- this is an info msg
2011-09-26 12:30:52,522 WARNING:test_log.py:5 -- this is a warning msg
2011-09-26 12:30:52,522 ERROR:test_log.py:6 -- this is an error msg
2011-09-26 12:30:52,522 CRITICAL:test_log.py:7 -- this is a critical error msg
Posted in python, software arch.

HTML5 canvas Mandelbrot

I would like to create an animated Mandelbrot visualization using JavaScript on an HTML5 <canvas> element. The Mandelbrot set, popularized by Benoît Mandelbrot, is the set of complex numbers c for which the sequence z_{n+1} = z_n^2 + c remains bounded. This is known as an escape function; that is, regardless of the size of n, z_n never "escapes". Computing the Mandelbrot set can be as computationally complex as desired for a given visualization.
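The escape test itself is only a few lines in any language; for example, a quick Python sketch of the same iteration (an illustration, not the page's code):

```python
# Iterate z = z^2 + c, counting iterations until |z| exceeds 2
# (points that never escape are in the set)
def escape_count(c, max_iter=100):
    z = 0j
    n = 0
    while n < max_iter and abs(z) <= 2.0:
        z = z * z + c
        n += 1
    return n

print(escape_count(0j))      # never escapes → 100
print(escape_count(1 + 1j))  # escapes after a couple of iterations
```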

In JavaScript, the escape function can be written as follows:

Mandelbrot.prototype.escapeFunc = function(x, y) {
  var r = 0.0, i = 0.0, m = 0.0;
  var j = 0;
  while ((j < this.max) && (m < 4.0)) {
    m = r * r - i * i;
    i = 2.0 * r * i + y;
    r = m + x;
    j += 1;
  }
  return j;
}

For a given HTML5 canvas element, such as

<canvas id="mandelbrot" width="512" height="512"></canvas>

A Mandelbrot set over the complex plane can be represented with the following object,

function Mandelbrot(m) {
  this.m = m;
  this.c = m.getContext("2d");
  this.width = m.width;
  this.height = m.height;
  this.max = 256;      // max iterations (example value)
  this.opacity = 1.0;  // fill opacity (example value)
  this.SX = -1.75;  // start value real
  this.EX = 0.6;    // end value real
  this.SY = -1.125; // start value imaginary
  this.EY = 1.125;  // end value imaginary
  this.xzoom = (this.EX - this.SX) / (this.width*1.0);
  this.yzoom = (this.EY - this.SY) / (this.height*1.0);
}

Given these functions, rendering a Mandelbrot set on an HTML5 canvas element is as simple as looping through each of the pixels of the canvas, calculating the escape value, and drawing the pixel. Here is a simple render function:

Mandelbrot.prototype.render = function() {
  var prev_h = 0;
  for (var x = 0; x < this.width; x=x+1) {
    for (var y = 0; y < this.height; y=y+1) {
      var esc = this.escapeFunc(this.SX + x*this.xzoom, this.SY + y*this.yzoom);
      var h = 360 * (esc/this.max);
      if (h != prev_h) {
        var perc = Math.floor(100*(h/360));
        this.c.fillStyle = 'hsla('+ h + ','+ (perc+100) +'%,'+ (60-perc) +'%,'+ this.opacity +')';
        prev_h = h;
      }
      this.c.fillRect(x, y, 1, 1); // draw this pixel as a 1x1 rect
    }
  }
}

If you have an HTML5 compatible browser you should see an animated example below:


Posted in html, javascript

javascript keyboard buffer, part 4

Previously, we created a JavaScript keyboard buffer that can edit text in a static html page as well as play back each key as it was typed.

I would like to combine all of this into a single JavaScript include called keylogger.js, and rather than specify each div that I want to edit I'd like to specify a classname such that all elements of that class will become editable, e.g.,

<div class="editable">
Spam and eggs are fantastic, although on second though...
</div>

<script src="keylogger.js"></script>
<script type="text/javascript" defer><!--
  editableByClassname('editable');
//--></script>

Please see the example, keylogger.html, that uses multiple editable div's in one page.

The editableByClassname(cls) function registers onclick events that will activate the keyboard buffer, editor, and replay functions for every element of the specified class.

// attach onclick event handlers by classname
function editableByClassname(cls) {
  var editable = $$(cls);
  for (var i = 0; i < editable.length; i++) {
    editable[i].onclick = initEditable;
  }
}

There are two utility functions, $ and $$. The single $ is a shortcut to document.getElementById and the double $$ returns all elements by classname. I.e.,

// shortcut to getElementById
function $(el) {
  return document.getElementById(el);
}

// a simple getElementsByClassname implementation
function $$(cl) {
  var retnode = [];
  var myclass = new RegExp('\\b'+cl+'\\b');
  var elem = document.getElementsByTagName('*');
  for (var i = 0; i < elem.length; i++) {
    var classes = elem[i].className;
    if (myclass.test(classes)) retnode.push(elem[i]);
  }
  return retnode;
}
Posted in css, html, javascript, software arch.

javascript keyboard buffer, part 3

Previously, we created a javascript keyboard buffer that can edit text in a static html page. Using those functions I'd like to add a buffer replay.

I would like to add an onclick event to a specified div such that it will activate the keyboard buffer and leverage the previously discussed handleKey() event handler, as well as the appendChar(el) function. E.g.,

// onclick event handler
// initialize this element for keyboard buffer
function initEditable() {
  var SPECIAL = [8, 32, 37, 39, 222];
  document.onkeydown = handleKey(function(k) {return SPECIAL.contains(k)});
  document.onkeypress = handleKey(function(k) {return !SPECIAL.contains(k)});
  selectElement(this);
}

targetDiv = $('target');
targetDiv.onclick = initEditable;

The function calls selectElement(this), which will modify the element to add a temporary link for starting a replay. Once selected, that div will be editable; I would like another temporary link that, when pressed, will remove all temporary links and remove the onkey events (making the div non-editable).

The following JavaScript will enable the target div to toggle between editable and non-editable:

// create <a> element
function newA(aid, alabel, ahref, fonclick) {
  var newA = document.createElement('a');
  newA.setAttribute('href', ahref);
  newA.setAttribute('id', aid);
  newA.innerHTML = alabel;
  newA.onclick = fonclick;
  return newA;
}

// prepare <div>, add 'replay' and 'save' links
function selectElement(el) {
  origText = el.innerHTML;
  el.innerHTML = '';
  var newDiv = document.createElement('div');
  newDiv.innerHTML = origText;
  newDiv.setAttribute('id', 'editSelected');
  el.appendChild(newDiv);
  el.appendChild(newA('areplay', 'replay', '#', replayKeys));
  el.appendChild(newA('aclear', 'save', 'javascript:clearSelected()', null));
  el.onclick = null;
}

// clear any <div> that is intercepting onkey events
// and clear the buffer
function clearSelected() {
  document.onkeypress = null;
  document.onkeyup = null;
  document.onkeydown = null;
  BUFFER = [];
  bp = 0;
  var el = $('editSelected');
  if (el) {
    var oldText = el.innerHTML;
    var pel = el.parentNode;
    pel.innerHTML = oldText;
    pel.onclick = initEditable;
  }
}

When the target div is clicked it will activate the keyboard buffer such that all keystrokes will appear in the innerHTML of that div and link to a replayKeys() function. The replayKeys() function simply needs to process the BUFFER according to a timer, e.g.,

TIMEOUT = 100;
// replay all keystrokes from buffer
function replayKeys() {
  el = $('editSelected');
  el.innerHTML = origText;
  bp = 0;
  replaying = setInterval(replayHandler, TIMEOUT);
}
function replayHandler() {
  el = $('editSelected');
  if (bp < BUFFER.length) {
    appendChar(el); // append the next buffered key to the div
  } else {
    replaying ? clearInterval(replaying) : false;
  }
}

The built-in setInterval() function will call a specified function periodically according to the TIMEOUT value in milliseconds, and will continue running until the clearInterval() function is called.

Next, I'd like to combine all of this into a single JavaScript include such that it can register multiple editable div's.

Posted in javascript