Graph Search

I would like to discover paths between two nodes on a graph. Let's say we have a graph that looks something like this:

graph = {1: set([2, 3]),
         2: set([1, 4, 5, 7]),
         3: set([1, 6]),
         ...
         N: set([...] }

The graph object contains a collection of nodes and their corresponding connections. If it's a bi-directional graph, those connections would have to appear in the corresponding sets (e.g., 1: set([2]) and 2: set([1])).

Traversing this kind of data structure can be done through recursion, usually something like this:

def find_paths(from_node, to_node, graph, path=None):
    ''' DFS search of graph, return all paths between
        from_node and to_node
    '''
    if path is None:
        path = [from_node]
    if to_node == from_node:
        return [path]
    paths = []
    for next_node in graph[from_node] - set(path):
        paths += find_paths(next_node, to_node, graph, path + [next_node])
    return paths

Unfortunately, for large graphs, this can be pretty inefficient, requiring a full depth-first search (DFS), and storing the entire graph in memory. This does have the advantage of being exhaustive, finding all unique paths between two nodes.

That said, let's say we want to find the shortest possible path between two nodes. In those cases, you want a breadth-first search (BFS). Whenever you hear the words "shortest path", think BFS. You'll want to avoid recursion (as those result in a DFS), and instead rely on a queue, which in Python can be implemented with a simple list.

def find_shortest_path(from_node, to_node, graph):
    ''' BFS search of graph, return shortest path between
        from_node and to_node
    '''
    queue = [(from_node, [from_node])]
    while queue:
        (qnode, path) = queue.pop(0) #deque
        for next_node in graph[qnode] - set(path):
            if next_node == to_node:
                return path + [next_node]
            else:
                queue.append((next_node, path + [next_node]))

Because a BFS is guaranteed to find the shortest path, we can return the moment we find a path between to_node and from_node. Easy!

In some cases, we may have an extremely large graph. Let's say you're searching the Internet for a path between two unrelated web pages, and the graph is constructed dynamically based on scraping the links from each explored page. Obviously, a DFS is out of the question for something like that, as it would spiral into an infinite chain of recursion (and probably on the first link).

As a reasonable constraint, let's say we want to explore all the links up to a specific depth. This could be done easily. Simply add a depth_limit, as follows:

def find_shortest_path(from_node, to_node, graph, depth_limit=3):
    queue = [(from_node, [from_node])]
    while queue and depth_limit > 0:
        depth_limit -= 1
        (qnode, path) = queue.pop(0) #deque
        for next_node in graph[qnode] - set(path):
            if next_node == to_node:
                return path + [next_node]
            else:
                queue.append((next_node, path + [next_node]))
This entry was posted in python, software arch.. Bookmark the permalink.