iter_words

I would like to iterate over a stream of words, say, from STDIN or a file (or any random input stream). Typically, this is done like this,

def iter_words(f):
    for line in f:
        for word in line.split():
            yield word

And then one can simply,

for word in iter_words(sys.stdin):
    # do something

For a more concrete example, let's say we need to keep a count of every unique word in an input stream, something like this,

from collections import Counter
c = Counter

for word in iter_words(sys.stdin):
    c.update([word])

The only problem with this approach is that it will read data in line-by-line, which in most cases is exactly what we want, however, in some cases we don't have line-breaks. For extremely large data streams we will simply run out of memory if we use the above generator.

Instead, we can use the read() method to read in one-byte at a time, and manually construct the words as we go, like this,

def iter_words(sfile):
    chlist = []
    for ch in iter(lambda: sfile.read(1), ''):
        if str.isspace(ch):
            if len(chlist) > 0:
                yield ''.join(chlist)
            chlist = []
        else:
            chlist.append(ch)

This approach is memory efficient, but extremely slow. If you absolutely need to get the speed while still being memory efficient, you'll have to do a buffered read, which is kind of an ugly hybrid of these two approaches.

def iter_words(sfile, buffer=1024):
    lastchunk = ''
    for chunk in iter(lambda: sfile.read(buffer), ''):
        words = chunk.split()
        lastchunk = words[-1]
        for word in words[:-1]:
            yield word
        newchunk = []
        for ch in sfile.read(1):
            if str.isspace(ch):
                yield lastchunk + ''.join(newchunk)
                break
            else:
                newchunk.append(ch)
Posted in python

Punycode

I would like a webapp that supports UTF-8 URLs. For example, https://去.cc/叼, where both the path and the server name contain non-ASCII characters.

The path /叼 can be handled easily with %-encodings, e.g.,

>>> import urllib
>>> 
>>> urllib.parse.quote('/叼')
'/%E5%8F%BC'

Note: this is similar to the raw byte representation of the unicode string:

>>> bytes('/叼', 'utf8')
b'/\xe5\x8f\xbc'

However, the domain name, "去.cc" cannot be usefully %-encoded (that is, "%" is not a valid character in a hostname). The standard encoding for international domain names (IDN) is punycode; such that "去.cc' will look like "xn--1nr.cc".

The "xn--" prefix is the ASCII Compatible Encoding that essentially identifies this hostname as a punycode-encoded name. Most modern web-browsers and http libraries can decode this kind of name, although just in case, you can do something like this:

>>> 
>>> '去'.encode('punycode')
b'1nr'

In practice, we can use the built-in "idna" encoding and decoding in python, i.e., IRI to URI:

>>> p = urllib.parse.urlparse('https://去.cc/叼')
>>> p.netloc.encode('idna')
b'xn--1nr.cc'
>>> urllib.parse.quote(p.path)
'/%E5%8F%BC'

And going the other direction, i.e., URI to IRI:

>>> a = urllib.parse.urlparse('https://xn--1nr.cc/%E5%8F%BC')
>>> a.netloc.encode('utf8').decode('idna')
'去.cc'
>>> urllib.parse.unquote(a.path)
'/叼'
Posted in python, software arch.

Using getattr in Python

I would like to execute a named function on a python object by variable name. For example, let's say I'm reading in input that looks something like this:

enqueue 1
enqueue 12
enqueue 5
enqueue 9
sort
reverse
dequeue
print

Afterwards, we should see:

[9,5,1]

Let's say we need to implement a data structure that consumes this input. Fortunately, all of this behavior already exists within the built-in list datatype. What we can do is extend the built-in list to map the appropriate methods, like so:

class qlist(list):
    def enqueue(self, v):
        self.insert(0,v)

    def dequeue(self):
        return self.pop()

    def print(self):
        print(self)

The sort and reverse methods are already built-in to list, so we don't need to map them. Now, we simply need a driver program that reads and processes commands to our new qlist class. Rather than map out the different commands in if/else blocks, or use eval(), we can simply use getattr, for example:

if __name__ == '__main__':
    thelist = qlist()
    while line in sys.stdin:
        cmd = line.split()
        params = (int(x) for x in cmd[1:])
        getattr(thelist, cmd[0])(*params)
Posted in shell tips

Graph Search

I would like to discover paths between two nodes on a graph. Let's say we have a graph that looks something like this:

graph = {1: set([2, 3]),
         2: set([1, 4, 5, 7]),
         3: set([1, 6]),
         ...
         N: set([...] }

The graph object contains a collection of nodes and their corresponding connections. If it's a bi-directional graph, those connections would have to appear in the corresponding sets (e.g., 1: set([2]) and 2: set([1])).

Traversing this kind of data structure can be done through recursion, usually something like this:

def find_paths(from_node, to_node, graph, path=None):
    ''' DFS search of graph, return all paths between
        from_node and to_node
    '''
    if path is None:
        path = [from_node]
    if to_node == from_node:
        return [path]
    paths = []
    for next_node in graph[from_node] - set(path):
        paths += find_paths(next_node, to_node, graph, path + [next_node])
    return paths

Unfortunately, for large graphs, this can be pretty inefficient, requiring a full depth-first search (DFS), and storing the entire graph in memory. This does have the advantage of being exhaustive, finding all unique paths between two nodes.

That said, let's say we want to find the shortest possible path between two nodes. In those cases, you want a breadth-first search (BFS). Whenever you hear the words "shortest path", think BFS. You'll want to avoid recursion (as those result in a DFS), and instead rely on a queue, which in Python can be implemented with a simple list.

def find_shortest_path(from_node, to_node, graph):
    ''' BFS search of graph, return shortest path between
        from_node and to_node
    '''
    queue = [(from_node, [from_node])]
    while queue:
        (qnode, path) = queue.pop(0) #deque
        for next_node in graph[qnode] - set(path):
            if next_node == to_node:
                return path + [next_node]
            else:
                queue.append((next_node, path + [next_node]))

Because a BFS is guaranteed to find the shortest path, we can return the moment we find a path between to_node and from_node. Easy!

In some cases, we may have an extremely large graph. Let's say you're searching the Internet for a path between two unrelated web pages, and the graph is constructed dynamically based on scraping the links from each explored page. Obviously, a DFS is out of the question for something like that, as it would spiral into an infinite chain of recursion (and probably on the first link).

As a reasonable constraint, let's say we want to explore all the links up to a specific depth. This could be done easily. Simply add a depth_limit, as follows:

def find_shortest_path(from_node, to_node, graph, depth_limit=3):
    queue = [(from_node, [from_node])]
    while queue and depth_limit > 0:
        depth_limit -= 1
        (qnode, path) = queue.pop(0) #deque
        for next_node in graph[qnode] - set(path):
            if next_node == to_node:
                return path + [next_node]
            else:
                queue.append((next_node, path + [next_node]))
Posted in python, software arch.

python unittest

I would like to setup unit tests for a python application. There are many ways to do this, including doctest and unittest, as well as 3rd-party frameworks that leverage python's unittest, such as pytest and nose.

I found the plain-old unittest framework to be the easiest to work with, although I often run into questions about how best to organize tests for various sized projects. Regardless of the size of the projects, I want to be able to easily run all of the tests, as well as run specific tests for a module.

The standard naming convention is "test_ModuleName.py", which would include all tests for the named module. This file can be located in the same directory (package) as the module, although I prefer to keep the tests in their own subdirectory (which can easily be excluded from production deployments).

In other words, I end up with the following:

package/
 - __init__.py
 - Module1.py
 - Module2.py
 - test/
    - all_tests.py
    - test_Module1.py
    - test_Module2.py

Each of the test_*.py files looks something like this:

#!/usr/bin/env python
# vim: set tabstop=4 shiftwidth=4 autoindent smartindent:
import os, sys, unittest

## parent directory
sys.path.insert(0, os.path.join( os.path.dirname(__file__), '..' ))
import ModuleName

class test_ModuleName(unittest.TestCase):

    def setUp(self):
        ''' setup testing artifacts (per-test) '''
        self.moduledb = ModuleName.DB()

    def tearDown(self):
        ''' clear testing artifacts (per-test) '''
        pass

    def test_whatever(self):
        self.assertEqual( len(self.moduledb.foo()), 16 )


if __name__ == '__main__':
    unittest.main()

With this approach, the tests can be run by all_tests.py, or I can run the individual test_ModuleName.py.

The all_tests.py script also must add the parent directory on the path, i.e.,

#!/usr/bin/env python
# vim: set tabstop=4 shiftwidth=4 autoindent smartindent:
import sys, os
import unittest

## set the path to include parent directory
sys.path.insert(0, os.path.join( os.path.dirname(__file__), '..' ))

## run all tests
loader = unittest.TestLoader()
testSuite = loader.discover(".")
text_runner = unittest.TextTestRunner().run(testSuite)
Posted in python

HTML + CSS + JavaScript Lessons

I would like a very simple introduction to web development, from the basics of HTML and CSS, to the proper use of JavaScript; and all without getting bogged down in complicated textbooks.

I've been working with HTML, CSS, and JavaScript (as well as dozens of programming languages in more environments than I can remember) for over 20 years. While there are some excellent resources online (I recommend w3schools), I believe web development is a very simple topic that is often unnecessarily complicated.

I created a simple set of 9 lessons for learning basic web development. This includes HTML, CSS, and some simple JavaScript (including callback functions to JSONP APIs), everything you need to make and maintain websites.

You can find the lessons here
http://avant.net/lessons/

It's also available on Github
https://github.com/timwarnock/lessons

Posted in css, html, javascript

bash histogram

I would like to generate a streamable histgram that runs in bash. Given an input stream of integers (from stdin or a file), I would like to transform each integer to that respective number of "#" up to the length of the terminal window; in other words, 5 would become "#####", and so on.

You can get the maximum number of columns in your current terminal using the following command,

twarnock@laptop: :) tput cols
143

The first thing we'll want to do is create a string of "####" that is exactly as long as the max number of columns. I.e.,

COLS=$(tput cols);
MAX_HIST=`eval printf '\#%.0s' {1..$COLS}; echo;`

We can use the following syntax to print a substring of MAX_HIST to any given length (up to its maximum length).

twarnock@laptop: :) echo ${MAX_HIST:0:5}
#####
twarnock@laptop: :) echo ${MAX_HIST:0:2}
##
twarnock@laptop: :) echo ${MAX_HIST:0:15}
###############

We can then put this into a simple shell script, in this case printHIST.sh, as follows,

#! /bin/bash
COLS=$(tput cols);
MAX_HIST=`eval printf '\#%.0s' {1..$COLS}; echo;`

while read datain
do
  if [ -n "$datain" ]; then
    echo -n ${MAX_HIST:0:$datain}
    if [ $datain -gt $COLS ]; then
      printf "\r$datain\n"
    else
      printf "\n"
    fi
  fi
done < "${1:-/dev/stdin}"

This script will also print any number on top of any line that is larger than the maximum number of columns in the terminal window.

As is, the script will transform an input file into a crude histogram, but I've also used it as a visual ping monitor as follows (note the use of unbuffer),

twarnock@cosmos:~ :) ping $remote_host | unbuffer -p awk -F'[ =]' '{ print int($10) }' | unbuffer -p printHIST.sh
######
#####
########
######
##
####
#################
###
#####
#######

Posted in bash, shell tips

xmouse

I would like to remotely control my Linux desktop via an ssh connection (connected through my phone).

Fortunately, we can use xdotool.

I created a simple command-interpreter that maps keys to xdotool. I used standard video game controls (wasd) for large mouse movements (100px), with smaller movements available (ijkl 10px). It can toggle between mouse and keyboard, which allows you to somewhat easily open a browser and type URLs.

I use this to control my HD television from my phone, and so far it works great.

#!/bin/bash
#
#
: ${DISPLAY:=":0"}
export DISPLAY

echo "xmouse! press q to quit, h for help"

function print_help() {
  echo "xmouse commands:
  h - print help

  Mouse Movements
  w - move 100 pixels up
  a - move 100 pixels left
  s - move 100 pixels down
  d - move 100 pixels right

  Mouse Buttons
  c - mouse click
  r - right mouse click
  u - mouse wheel Up
  p - mouse wheel Down

  Mouse Button dragging
  e - mouse down (start dragging)
  x - mouse up (end dragging)

  Mouse Movements small
  i - move 10 pixels up
  j - move 10 pixels left
  k - move 10 pixels down
  l - move 10 pixels right

  Keyboard (experimental)
  Press esc key to toggle between keyboard and mouse modes
  
"
}

KEY_IN="Off"
IFS=''
while read -rsn1 input; do
  #
  # toggle mouse and keyboard mode
  case "$input" in
  $'\e') if [ "$KEY_IN" = "On" ]; then
           KEY_IN="Off"
           echo "MOUSE mode"
         else
           KEY_IN="On"
           echo "KEYBOARD mode"
         fi
     continue
     ;;
  esac
  #
  # keyboard mode
  if [ "$KEY_IN" = "On" ]; then
  case "$input" in
  $'\x7f') xdotool key BackSpace ;;
  $' ')  xdotool key space ;;
  $'')   xdotool key Return ;;
  $':')  xdotool key colon ;;
  $';')  xdotool key semicolon ;;
  $',')  xdotool key comma ;;
  $'.')  xdotool key period ;;
  $'-')  xdotool key minus ;;
  $'+')  xdotool key plus ;;
  $'!')  xdotool key exclam ;;
  $'"')  xdotool key quotedbl ;;
  $'#')  xdotool key numbersign ;;
  $'$')  xdotool key dollar ;;
  $'%')  xdotool key percent ;;
  $'&')  xdotool key ampersand ;;
  $'\'') xdotool key apostrophe ;;
  $'(')  xdotool key parenleft ;;
  $')')  xdotool key parenright ;;
  $'*')  xdotool key asterisk ;;
  $'/')  xdotool key slash ;;
  $'<')  xdotool key less ;;
  $'=')  xdotool key equal ;;
  $'>')  xdotool key greater ;;
  $'?')  xdotool key question ;;
  $'@')  xdotool key at ;;
  $'[')  xdotool key bracketleft ;;
  $'\\') xdotool key backslash ;;
  $']')  xdotool key bracketright ;;
  $'^')  xdotool key asciicircum ;;
  $'_')  xdotool key underscore ;;
  $'`')  xdotool key grave ;;
  $'{')  xdotool key braceleft ;;
  $'|')  xdotool key bar ;;
  $'}')  xdotool key braceright ;;
  $'~')  xdotool key asciitilde ;;
  *)     xdotool key "$input" ;;
  esac
  #
  # mouse mode
  else
  case "$input" in
  q) break ;;
  h) print_help ;;
  a) xdotool mousemove_relative -- -100 0 ;;
  s) xdotool mousemove_relative 0 100 ;;
  d) xdotool mousemove_relative 100 0 ;;
  w) xdotool mousemove_relative -- 0 -100 ;;
  c) xdotool click 1 ;;
  r) xdotool click 3 ;;
  u) xdotool click 4 ;;
  p) xdotool click 5 ;;
  e) xdotool mousedown 1 ;;
  x) xdotool mouseup 1 ;;
  j) xdotool mousemove_relative -- -10 0 ;;
  k) xdotool mousemove_relative 0 10 ;;
  l) xdotool mousemove_relative 10 0 ;;
  i) xdotool mousemove_relative -- 0 -10 ;;
  *) echo "$input - not defined in mouse map" ;;
  esac
  fi
done
Posted in bash

VLC remote control

Recently I was using VLC to listen to music, as I often do, and I wanted to pause without getting out of bed.

Lazy? Yes!

I learned that VLC includes a slew of remote control interfaces, including a built-in web interface as well as a raw socket interface.

In VLC Advanced Preferences, go to "Interface", and then "Main interfaces" for a list of options. I selected "Remote control" which is now known as "oldrc", and I configured a simple file based socket "vlc.sock" in my home directory as an experiment.

You can use netcat to send commands, for example,

twarnock@laptop:~ :) nc -U ~/vlc.sock <<< "pause"

Best of all VLC cleans up after itself and removes the socket file when it closes. The "remote control" interface is pretty intuitive and comes with a "help" command. I wrapped all of this in a shell function (in a .bashrc).

function vlcrc() {
 SOCK=~/vlc.sock
 CMD="pause"
 if [ $# -gt 0 ]; then
  CMD=$1
 fi
 if [ -S $SOCK ]; then
  nc -U $SOCK <<< "$CMD"
 else
  (>&2 echo "I can't find VLC socket $SOCK")
 fi
}

I like this approach because I can now use "vlc command" in a scripted environment. I can build playlists, control the volume, adjust the playback speed, pretty much anything VLC lets me do. I could even use a crontab and make a scripted alarm clock!

And of course I can "pause" my music from my phone while laying in bed. Granted, there's apps for more user friendly VLC smartphone remotes, but I like the granular control provided by a command line.

Posted in shell tips

datsize, simple command line row and column count

Lately I've been working with lots of data files with fixed rows and columns, and have been finding myself doing the following a lot:

Getting the row count of a file,

twarnock@laptop:/var/data/ctm :) wc -l lda_out/final.gamma
    3183 lda_out/final.gamma
twarnock@laptop:/var/data/ctm :) wc -l lda_out/final.beta
     200 lda_out/final.beta

And getting the column count of the same files,

twarnock@laptop:/var/data/ctm :) head -1 lda_out/final.gamma | awk '{ print NF }'
200
twarnock@laptop:/var/data/ctm :) head -1 lda_out/final.beta | awk '{ print NF }'
5568

I would do this for dozens of files and eventually decided to put this together in a simple shell function,

function datsize {
    if [ -e $1 ]; then
        rows=$(wc -l < $1)
        cols=$(head -1 $1 | awk '{ print NF }')
        echo "$rows X $cols $1"
    else
        return 1
    fi
}

Simple, and so much nicer,

twarnock@laptop:/var/data/ctm :) datsize lda_out/final.gamma
    3183 X 200 lda_out/final.gamma
twarnock@laptop:/var/data/ctm :) datsize lda_out/final.beta
     200 X 5568 lda_out/final.beta
twarnock@laptop:/var/data/ctm :) datsize ctr_out/final-theta.dat
    3183 X 200 ctr_out/final-theta.dat
twarnock@laptop:/var/data/ctm :) datsize ctr_out/final-U.dat
    2011 X 200 ctr_out/final-U.dat
twarnock@laptop:/var/data/ctm :) datsize ctr_out/final-V.dat
    3183 X 200 ctr_out/final-V.dat
Posted in bash, shell tips

Getting the most out of your ssh config

I typically find myself with voluminous bashrc files filled with aliases and functions for connecting to specific hosts via ssh. I would like an easier way to manage the various ssh hosts, ports, and keys.

I typically maintain an ssh-agent across multiple hosts, as well as various tunnels; reverse tunnels, and chained tunnels -- but I would like to simplify my normal ssh commands using an ssh config.

First, always remember to RTFM,

man ssh

This is an excellent starting point, the man page contains plenty of information on all the ins-and-outs of an ssh config.

To get started, simply create a plaintext file "config" in your .ssh/ directory.

Setting Defaults

$HOME/.ssh/config will be used by your ssh client and is able to set per-host defaults for username, port, identity-key, etc

For example,

# $HOME/.ssh/config
Host dev
    HostName dev.anattatechnologies.com
    Port 22000
    User twarnock
    ForwardAgent yes

On this particular host, I can now run

$ ssh dev

Which is much easier than "ssh -A -p 22000 twarnock@dev.anattatechnologies.com"

You can also use wildcards, e.g.,

Host *amazonaws.com *ec2.nytimes.com *.dev.use1.nytimes.com
    User root

which I find very useful for cases where usernames are different than my normal username.

Tunnels

Additionally, you can add tunneling information in your .ssh/config, e.g.,

Host tunnel.anattatechnologies.com
    HostName anattatechnologies.com
    IdentityFile ~/.ssh/anattatechnologies.key
    LocalForward 8080 localhost:80
    User twarnock

Even if you chose to use shell functions to manage tunnels, the use of an ssh config can help simplify things greatly.

Posted in shell tips, ssh

git, obliterate specific commits

I would like to obliterate a series of git commits between two points, we'll call these the START and END commits.

First, determine the SHA1 for the two commits, we'll be forcefully deleting everything in between and preserving the END exactly as it is.

Detach Head

Detach head and move to END commit,

git checkout SHA1-for-END

Reset

Move HEAD to START, but leave the index and working tree as END

git reset --soft SHA1-for-START

Redo END commit

Redo the END commit re-using the commit message, but on top of START

git commit -C SHA1-for-END

Rebase

Re-apply everything from the END

git rebase --onto HEAD SHA1-for-END master

Force Push

push -f
Posted in shell tips

vim: Visual mode

I have been using vim for years and am consistently surprised at the amazing things it can do. Vim has been around longer than I have been writing code, and its predecessor (Vi) is as old as I am.

Somehow through the years this editor has gained and continues to gain popularity. Originally, my interest in Vi was out of necessity, it was often the only editor available on older Unix systems. Yet somehow Vim nowadays rivals even the most advanced IDEs.

One of the more interesting aspects of Vim is the Visual mode. I had ignored this feature for years relying on the normal command mode and insert mode.

Visual Mode

Simply press v and you'll be in visual mode able to select text.

Use V to select an entire line of text, use the motion keys to move up or down to select lines of text as needed.

And most interestingly, use Ctrl-v for visual block mode. This is the most flexible mode of selection and allows you to select columns rather than entire lines, as shown below.
vim
In this case I have used visual block mode to select the same variable in 5 lines of code.

In all of these case, you can use o and O while selecting to change the position of the cursor in the select box. For example, if you are selecting several lines downwards and realize you wanted to grab the line above the selection box as well, just hit o and it will take you to the top of the selection.

In practice this is far easier and more powerful than normal mouse highlighting, although vim also supports mouse highlighting exactly as you would intuitively expect (where mouse highlighting enables visual mode).

What to do with a visual selection

All sorts of things! You could press ~ to change the case of the selection, you can press > to indent the selection (< to remove an indent), you can press y to yank (copy) the selection, d to delete the selection.

If you're in visual block mode and if you've selected multiple lines as in the example above, then you can edit ALL of the lines simultaneously. Use i to start inserting at the cursor, and as soon as you leave insert mode the changes will appear on each of the lines that was in the visual block.

Similarly, you can use the familiar a, A, and I to add text to every line of the visual block. You can use c to change each line of the visual block, r to replace the selection. This is an incredibly fast and easy way to add or replace text on multiple lines.

Additionally, you can use p to put (paste) over a visual selection. Once you paste over a visual selection, that selection will now be your default register (see :reg), which is extremely handy when you need to quickly swap two selections of text.

You can even use the visual block to limit the range of an otherwise global find and replace, that is,

:s/\%Vfind/replace/g

adding the \%V to the search limits the find and replace to the selection block.

More information is available in vim's help file,

:h visual-operators
Posted in shell tips, vim

vim: tags and tabs

Previously, I discussed using vim and screen with automagic titles.

The great part about working with vim and screen is that I can work from anywhere with minimal setup, and when working remotely I can pick up the cursor exactly where I left it -- I never have to worry about a remote terminal disconnecting.

I tend to avoid vim plugins as I like having a minimal setup on different hosts, I occasionally make an exception for NERDTree, but I find the default netrw easily workable. I keep my .vimrc and other dotfiles in github so I'm always a git clone away from getting my environment setup (in Linux, cygwin, OSX, etc).

With this in mind, I would like an easier way to navigate files in a project and if possible avoid non-standard vim plugins.

One of the most effective approaches I have found using the default (no plugins) vim is the combination of tags and tabs.

Tag files are generated by ctags (typically from exuberant ctags), which then vim can use as a keyword index into your source tree for any given project.

Generating Tags

I prefer to keep a single "tags" file at the root of each project directory, typically as follows,

$ ctags -R .

This will create a "tags" file in the current directory. For larger codebases these can get surprisingly large, but they are usually fast to generate. To manage these files you may consider using git hooks on

Telling vim about Tags

In vim you can load as many tag files as you like, the command is,

:set tags+=tags

where "tags" is the filename of the tags file.

The problem is, you won't want to type this every time you open vim, so add the following to your .vimrc,

set tags+=tags;$HOME

By adding the ";$HOME" to the set tags command, this will simply look for a "tags" file in the current working directory, and if it doesn't find one it will look in the parent directory and keep looking for a tags file all the way back to "$HOME". So if you're 10 directories deep within $HOME then it would search up to 10 directories looking for a "tags" file. You can replace $HOME with any base directory, in my case I keep all project source code in my $HOME directory.

Using Tags

Typing :tag text will search for files with the exact tag name, or you can use :tag /text to search for any tag that matches "text".

By default, vim opens the new files in a tag stack, you can use Ctrl-T to go back to the previous file -- alternatively you can navigate the files through the normal buffer commands, e.g.,

list open buffers
:ls

switch to a different buffer (from list)
:b #

unload (delete) a buffer
:bd #

You can also use put your cursor on the word you want to search and press Ctrl-] to go the file that matches the selected text, then use Ctrl-T to jump back. If you want to see all the files that match a tag, you can use :tselect text

I find navigating a tag stack and maintaining multiple buffers is a bit cumbersome, this is where tabs can really help out.

Using Tabs

Once you're in vim you can open a new tab with :tabedit {file} or :tabe {file} which will open the optionally specified file in a new tab (or open a new blank tab if no file is specified). Usually I use,

:tabe .

to open a new tab with the file browser in the current working directory.

With multiple tabs open you can use gt and gT to toggle thru the open tabs, or {i}gt to go to the i-th tab (starting at 1). You can re-order tabs using tabm # to a move a tab to a new position (starting at 0).

Most importantly, tabs work great with mouse enabled, simply click on a tab as you would intuitively expect, drag the tabs to re-order, or click the "X" in the upper-right to close.

Tabs meet Tags

I find the default tags behavior slightly cumbersome as I end up navigating the tag stack through multiple buffers open in one window.

When searching tags I want the file to always open in a new tab, or at least to open in a vertical split.

I have added the following to my .vimrc

map <C-\> :tab split<CR>:exec("tag ".expand("<cword>"))<CR>
map <C-]> :vsp <CR>:exec("tag ".expand("<cword>"))<CR>

This will effectively remap Ctrl-] to open the matching file in a vertical split. I can then close the vertical split or even move it to a new tab using Ctrl-w T.

However, mostly I use Ctrl-\ to open the matching file in a new tab.

Between tab and tag navigation I find this a very powerful way to manage even very large projects with default vim (rather than rely on an IDE).

Careful with Splits

One interesting thing about splits (vertical and horizontal, that is, :sp and :vsp) is that they will exist entirely within a tab window. In other words, a split occurs within only one tab.

You can close a split using Ctrl-w q, and if you need to navigate through multiple splits you can either use the mouse or Ctrl-w and then an arrow key (or h,j,k,l if you prefer).

In any given split, you can always move that file to a new tab using Ctrl-w T

Posted in shell tips, vim

vim and screen, automagic titles

Previously, I discussed using multiuser screen so that I could concurrently access a shared screen session across multiple remote hosts (from work, from home, from my phone, etc).

I would like to augment screen such that the titles would always tell me what directory I'm currently in, as well as what program is running (if any). Additionally, if I'm editing a file in vim I would like to see the filename in the screen window title. If I have multiple vim buffers open (say, in tabs) I would like the screen window title set to whichever filename I'm currently editing.

GNU screen provides a shelltitle attribute that can get us partly there, you could add something like this to your screenrc,

# automagic window title
shelltitle ") |bash:"

In this example, screen will automatically fill in any currently running shell command as the window title. Importantly, the ") " must be the final characters on your command prompt. For most people, this is the '$' character, mine is still set to the smiley() cursor discussed previously. Everything after the '|' character will be the default screen title.

Unfortunately, while this approach does provide us a dynamic window name for running programs it does not show us the current directory and does nothing for vim (other than just to say "vim"). This approach, which may work for some, turned out to be a dead end. I had been searching for ways to get screen to update the window titles to the current directory and had almost given up.

Recently, I discovered this article, which provides a working (albeit complicated) approach.

Essentially, in the newer versions of bash we can use the trap command DEBUG, which will run command before every single shell command!

Additionally, we can set a screen window title on the command prompt by printing an escape sequence then the new title. So, we can run a bash function in the DEBUG trap that sets the title.

Sounds easy? Well, not really. The DEBUG trap is a bit heavy handed and using it to print escape characters can have odd effects involving BASH_COMMAND and PROMPT_COMMAND. Here is a working solution I've been using,

# turn off debug trap, turn on later if we're in screen
trap "" DEBUG

...
... rest of my .bashrc
...

# Show the current directory AND running command in the screen window title
# inspired from http://www.davidpashley.com/articles/xterm-titles-with-bash.html
if [ "$TERM" = "screen" ]; then
    export PROMPT_COMMAND='true'
    set_screen_window() {
      HPWD=`basename "$PWD"`
      if [ "$HPWD" = "$USER" ]; then HPWD='~'; fi
      if [ ${#HPWD} -ge 10 ]; then HPWD='..'${HPWD:${#HPWD}-8:${#HPWD}}; fi
      case "$BASH_COMMAND" in
        *\033]0*);;
        "true")
            printf '\ek%s\e\\' "$HPWD:"
            ;;
        *)
            printf '\ek%s\e\\' "$HPWD:${BASH_COMMAND:0:20}"
            ;;
      esac
    }
    trap set_screen_window DEBUG
fi

In this case, I set PROMPT_COMMAND to true and make sure that my PS1 environment variable is not relying on PROMPT_COMMAND. The reason is because the BASH_COMMAND environment variable will be set to whatever the parent shell is currently running, and the DEBUG trap will fire every time the BASH_COMMAND changes (which is a lot, especially if you're executing a shell script).

Fortunately, anytime a command finishes, PROMPT_COMMAND will run, which in this case executes true, and I catch that in the case statement and set the title to the current directory. This effectively sets the title every time bash prints a command prompt.

If you execute a long running command, that screens window title will be set to that command, and as soon as the command finishes the title will change back.

The only remaining problem is vim. With the above approach, it almost works with vim. If you were in a directory named "foo" and ran "vim spam.txt", then the screen window title would be set to "foo:vim spam.txt". So far so good, but when you open additional files in vim, the title will still be say "foo:vim spam.txt".

.vimrc

The final step is to update your vimrc to set the titlestring, and with some tweaking vim will send the escape characters that screen recognizes to change the window title. Lastly, add an autocmd for all relevant events (opening a new file, switching tabs, etc), and you'll have a working solution,

" screen title
if &term == "screen"
  let &titlestring = "vim(" . expand("%:t") . ")"
  set t_ts=^[k
  set t_fs=^[\
  set title
endif
autocmd TabEnter,WinEnter,BufReadPost,FileReadPost,BufNewFile * let &titlestring = 'vim(' . expand("%:t") . ')'

* to type ^[, which is an escape character, you need to enter CTRL+V <Esc>

With this approach, while vim is running it will effectively take over the job of updating the screen window title, for example,
screen
As we switch tabs or open new files or change focus in a split screen, vim will update the screen window title to "vim(filename)" for the file that's being edited.

All of these changes (and more) can be found in my dotfiles in github

Posted in bash, shell tips, vim