python slice and sql every Nth row

I would like to retrieve every Nth row of a SQL table, and I would like this accessed via a python slice function.

A python slice allows access to a list (or any object that implements a __getitem__ method) by a start, stop, and step -- for example,

>>> foo = range(100)
>>> foo[5]
5
>>> foo[5:10]
[5, 6, 7, 8, 9]
>>> foo[5:10:2]
[5, 7, 9]
>>> 

The parameters of a slice behave identically to the builtin range() function,

>>> range(5,10,2)
[5, 7, 9]
>>> 

Given a python class that accesses a database table, you may want to implement slice functionality directly into that class. This can be done by implementing a __getitem__ method, a very simple __getitem__ looks something like this:

def __getitem__(self, key):
    curs = self.db.cursor()
    return list(curs.execute('SELECT * FROM %s WHERE ROWID = %s' %(self.table, key+1)).fetchall()[0])

However, this fails to account for slices and will only work for single item retrievals. When only a single key is provided python will pass that key to __getitem__, and when slice parameters are provided python will pass a slice object. In other words, inspect the key to figure out if a slice is passed in, e.g.,

def __getitem__(self, key):
    if isinstance( key, slice ):
        ... slice acccess, e.g., foo[1:5], foo[:5], foo[0:10:2]
    else:
        ... single element access, e.g., foo[5]

A slice object will have three parameters: slice.start, slice.stop, and slice.step; corresponding to object[start:stop:step]

Going back to the SQL case, we'll need to dynamically construct a SQL statement that will access every Nth row between a start and stop.

The simplest way to do this is using a modulo operator (available in all major relational databases), for example, object[5:10:2] in sqlite

SELECT * 
FROM table
WHERE ROWID > 5 AND ROWID <= 10
  AND ROWID % 2 = 5%2

This approach allow you to retrieve every Nth row between an arbitrary start and stop, to put this into practice in python, a __getitem__ may be implemented as follows:

def __getitem__(self, key):
    curs = self.db.cursor()
    if isinstance( key, slice ):
        start = 0 if key.start is None else key.start
        stop = len(self) - key.start if key.stop is None else key.stop
        qstep = ''
        if key.step is not None:
           qstep = ' AND ROWID %% %s = %s ' %(key.step, (start+1) % key.step)
        res = curs.execute('SELECT * FROM %s WHERE ROWID > %s AND ROWID <= %s %s'
            %(self.table, start, stop, qstep)).fetchall()
    else:
        res = list(curs.execute('SELECT * FROM %s WHERE ROWID = %s' %(self.table, key+1)).fetchall()[0])
    curs.close()
    return res

The above code is part of a sqlite python wrapper used for parsing large csv files, available here github.com/timwarnock/csv_reports

This entry was posted in data arch., mysql, oracle, python. Bookmark the permalink.

Comments are closed.