datsize, simple command line row and column count

Lately I've been working with a lot of data files laid out in fixed rows and columns, and I keep finding myself doing the following:

Getting the row count of a file,

twarnock@laptop:/var/data/ctm :) wc -l lda_out/final.gamma
    3183 lda_out/final.gamma
twarnock@laptop:/var/data/ctm :) wc -l lda_out/final.beta
     200 lda_out/final.beta

And getting the column count of the same files,

twarnock@laptop:/var/data/ctm :) head -1 lda_out/final.gamma | awk '{ print NF }'
200
twarnock@laptop:/var/data/ctm :) head -1 lda_out/final.beta | awk '{ print NF }'
5568

I would do this for dozens of files, and eventually decided to wrap both steps in a simple shell function,

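function datsize {
    # Print "<rows> X <cols> <file>" for a whitespace-delimited data file.
    if [ -f "$1" ]; then
        local rows cols
        rows=$(wc -l < "$1")                          # number of lines
        cols=$(head -n 1 "$1" | awk '{ print NF }')   # fields on the first line
        echo "$rows X $cols $1"
    else
        echo "datsize: $1: not a regular file" >&2
        return 1
    fi
}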

Simple, and so much nicer,

twarnock@laptop:/var/data/ctm :) datsize lda_out/final.gamma
    3183 X 200 lda_out/final.gamma
twarnock@laptop:/var/data/ctm :) datsize lda_out/final.beta
     200 X 5568 lda_out/final.beta
twarnock@laptop:/var/data/ctm :) datsize ctr_out/final-theta.dat
    3183 X 200 ctr_out/final-theta.dat
twarnock@laptop:/var/data/ctm :) datsize ctr_out/final-U.dat
    2011 X 200 ctr_out/final-U.dat
twarnock@laptop:/var/data/ctm :) datsize ctr_out/final-V.dat
    3183 X 200 ctr_out/final-V.dat
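
Since I end up running this over several files in a row, one possible extension is a variant that takes any number of files and an optional field separator. The following is only a rough sketch: the name datsize_all and the -d flag are made up for illustration and are not part of the function above. It also counts rows with awk's NR instead of wc -l, so a final line without a trailing newline is still counted and the row count is printed without leading padding.

```shell
# Hypothetical variant of datsize: several files, optional separator.
# Usage: datsize_all [-d SEP] FILE...
datsize_all() {
    local sep=" "            # a single space means awk's default whitespace splitting
    if [ "$1" = "-d" ]; then
        sep=$2
        shift 2
    fi
    local f dims status=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "datsize_all: $f: not a regular file" >&2
            status=1
            continue
        fi
        # One awk pass: remember NF from the first line, report NR at the end.
        dims=$(awk -F"$sep" 'NR == 1 { cols = NF } END { printf "%d X %d\n", NR, cols }' "$f")
        echo "$dims $f"
    done
    return $status
}
```

With that in place, something like datsize_all lda_out/final.* would print one line per file, and datsize_all -d , some.csv would count comma-separated columns (some.csv being a placeholder name). Like the original, it only looks at the first line for the column count, so it assumes every row has the same number of fields.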