Fast path finding with priorityq

The power of Nostalgia

I really miss Weewar!    Weewar for those who don’t remember was a very very simple and light themed, turn-based, strategy war game.   There have been many clones since then but nothing comes close.  Here’s a video to give an idea of the mayhem:

So it all came flooding back and I embarked on what I would do when I am flooded with nostalgia.  I recreate it!    Plus now I had the added incentive of exposing my kids to some good old fashioned fun and strategy!

Getting there fast!

The first thing I wanted to teach my kid was around the concept of finding your way, say in a maze.   And here again, Weewar provides us with good life lessons.   Weewar, like other turn-based war games, offers different kinds of units (eg Soldier, Tank, Plane, Mortar etc) that, like other turn-based war games, can move around on different kinds of terrain (eg river, mountain, plains, swamp etc).    Each unit has several movement points (MP) and each terrain type has different costs (that would eat away at a unit’s movement cost).    So clearly it is strategically paramount that you get your units to the enemy base as quickly as possible.

Given the different costs and movement points and speeds, it requires a bit of careful thought.   You cannot adopt an easy approach where you pick the terrain with the best cost and hope it would get you there as that could be a trap that leads you to a “costly” mountain!   There needs to be a way to compute the cheapest/shortest path between a given two points and this is where Dijkstra’s shortest path algorithm comes in!   It looks something like (courtesy of Skienna’s The Algorithm Design Manual):

In line 7, in order to find a vertex with the smallest distance, a Priority Queue is usually used.   And in line 8, as the distances of new nodes are modified, they will be reorganized within the Priority Queue based on these distances (this is the decrease-key method on a Priority Queue).

Current hurdles

A python implementation of the above algorithm highlights two problems:

  1. Finding an element is inefficient (O(n))
  2. Adjusting the position of an item in the heap after the item has been modified (the decrease-key operation) is not possible.

All that can be done now is the child node will have to be removed (which is an O(n) operation) and then have to be re-inserted into the heap.   In the python’s heapq module a node cannot also be easily removed.

PriorityQ to the rescue

To fix this I have developed the priorityq python module.   This module has the following advantages over the traditional heapq module:

  1. Finding a value is an O(1) operation
  2. Once a value is updated, it can be re-adjusted without a removal.
  3. The heap storage strategy can be changed transparently without needing to peek into the internals of the implementation.
  4. Seamless integration of custom comparators.

The new implementation with the priorityq module is very simple and can be found here.

How it works?

The reason this works is delightfully simple.  Current implementations refer directly to the object that is stored in the heap.   What PriorityQ does is slightly different.  Instead of dealing with the objects directly, the objects are wrapped in an opaque handle which is what is returned to the user (of the library).

The user, in turn, can pass these handles back to the queue to, well, do things!    Though a value can be modified, the handle remains the same and more importantly, the handle (opaquely) contains all the information required by the custom heap storage implementation to find and identify an existing element.

Benchmarks

The examples folder (in the package) contains a few examples that can be used on real use cases.   Thankfully the 9th DIMACS international challenge was just the arena that provided a few really good graphs.

The example can be run as follows:

python examples/dijkstra/runner.py      

eg:

python examples/dijkstra/runner.py examples/dijkstra/data/USA-road-d.BAY.gr.gz   10

The above runs 10 queries over randomly selected source and destination nodes in a graph that contains about 320,000 nodes and 800,000 edges.   The times (as expected) were proportional to the number of nodes processed (ie searched and adjusted).  Given the different heap storage types the following times were recorded:

Storage Strategy Total Nodes Processed Average Nodes Per Second
List Heap

3,800,404

3,610
Binary Heap

4,857,736

26,361

Binomial Heap TBD TBD
Pairing Heap TBD TBD

What coming next?

In the upcoming parts of this series, I will talk about:

  1. The JS (and Swift) ports of this library
  2. More heap storage strategies (like binomial, pairing and Fibonacci heaps).
  3. Adding more “standard” algorithms that can leverage the “find” and “decrease-key” features.

Useful links

Python Heapq Module

Java PriorityQueue

PriorityQueue Java Source

The skyline problem

A Mercurial Branching Workflow

The StreetHawk code is broken up into multiple repositories, the backend webapp, the iphone UI, the iphone API layer that accesses the database and handles all logical/operational aspects of the iPhone app, the android app (with the same breakdown), tools etc.  Up until now all code has been checked straight into the “default” branch (HEAD).  Going forward we have decided to adopt the following scheme inspired by and based on workflows prescribed by Steve Losh and Vincent Driessen.  I would particularly recommend Steve’s tutorial on branching which was very very critical in helping me understand the general branching techniques in mercurial.  If anything this post is a summary of their article to act as a “manual” for ourselves and is prone to change given all the nasties we may find in the future.

Command Summary

Firstly a recap of some of the branching and tagging commands that we will use later on.

Creating a new branch

# Get into the branch from which the new branch is to be created.  
# This would usually be "default" if it is a new future development or "stable" if a bugfix
hg checkout <parent_branch_name>
# Create the new branch
hg branch <branch_name>
# commit it - at this point it is a local branch
hg commit -m "Added <branch_name> branch"
# push it to the server to create a remote branch
hg push -b <branch_name> --new-branch

Merging from another branch

Time to time you may find the need to merge from another branch.  This is simple.  This will ensure the source_branch is merged into the target_branch.  NOT the other way around (even after you do a push).

# Get into the branch "into" which you want changes merged 
hg checkout <target_branch>
# merge from the source branch
hg merge <source_branch>

Closing a branch

Some times (quite often actually) you may find branches created unnecessary.  To remove the branch:

# Get into the branch you want to close (by now you can see the general pattern here!)
hg checkout <target_branch>
# close it
hg commit --close-branch

Tagging a branch

Tag a branch on every release.  Simply do:

# Get into the branch you want to close
hg checkout <branch_name>
hg tag <tag_name>

The convention used here is: <PUBLIC_VERSION>_<PRIVATE_VERSION>

PUBLIC_VERSION = <MAJOR_VERSION>_<MINOR_VERSION>_<BUGFIX_NUMBER>
PRIVATE_VERSION = <PURPOSE>_<BUILD_NUMBER>
MAJOR_VERSION - indicates a back incompatible version
MINOR_VERSION - new features added
BUGFIX_NUMBER - bugfixes updates
PURPOSE - purpose of the build - eg "APPSTORE", "TESTFLIGHT", "ADHOC" etc...
BUILD_NUMBER - The build attempt for a given PUBLIC_VERSION and purpose.

Branching and Release Workflow

In this scheme we have the following branches:

“default” – This the dev branch that contains all the checked in features and bugfixes that are *ready* to go into the next stable release.

“stable” – The branch that tracks the stable production releases.  Code from “default” is always merged into stable before a release.

“feature_xyz” – New features are added and developed on their own branches (eg feature_camera, feature_location etc) and merged into “default” upon completion but NOT before they are ready to be accepted into are release.  So only merge these into default when they are ready for the next release.

“bugfix_xyz” – Similar to the feature branches but these are usually branched from stable and then merged back into both “default” and “stable”.  These are usually urgent changes/fixes to be applied on released code.

Roles

Given the above branches, an appropriate workflow would contain the following roles (could be the same person):

Developer – Is responsible for adding new features and bug fixes and most likely would be adding to “default” or “feature” branches (or onto stable via bugfix branches).

For example to add the new “awesome” feature, one would do:
# get into default
hg checkout default
# do an update
hg pull -u
# create the feature branch
hg branch feature_awesome
# commit it - at this point it is a local branch
hg commit -m "Added <branch_name> branch"
# add the awesome feature
# do a few commits A, B, C
# push it to the server to create a remote branch ONLY 
# when ready for release
hg push -b <branch_name> --new-branch

Release Creator/Owner – Creates new stable production releases.  Note DO NOT do a rebase here as that would detach local changes.  We specifically want to keep track of all past activity.

# get the latest on all branches
hg pull -u
# get into stable
hg checkout stable
# Merge default into stable to get all production ready 
# features into stable
hg merge default
# Do the testing and when time to release, tag it (you are still in stable)
hg tag <tag_name>
# push the new tag
hg push