Fast path finding with priorityq

The power of Nostalgia

I really miss Weewar! Weewar, for those who don't remember, was a very simple, light-themed, turn-based strategy war game. There have been many clones since then, but nothing comes close. Here's a video to give an idea of the mayhem:

So it all came flooding back, and I embarked on what I always do when flooded with nostalgia: I recreated it! Plus, now I had the added incentive of exposing my kids to some good old-fashioned fun and strategy!

Getting there fast!

The first thing I wanted to teach my kids was the concept of finding your way, say, in a maze. And here again, Weewar provides us with good life lessons. Weewar, like other turn-based war games, offers different kinds of units (e.g. Soldier, Tank, Plane, Mortar) that can move around on different kinds of terrain (e.g. river, mountain, plains, swamp). Each unit has a number of movement points (MP), and each terrain type has a different cost that eats away at a unit's movement points. So clearly, it is strategically paramount that you get your units to the enemy base as quickly as possible.

Given the different costs and movement points, getting there requires a bit of careful thought. You cannot adopt a greedy approach where you always pick the neighbouring terrain with the lowest cost and hope it gets you there, as that could be a trap that leads you to a "costly" mountain! There needs to be a way to compute the cheapest/shortest path between two given points, and this is where Dijkstra's shortest path algorithm comes in (Skiena's The Algorithm Design Manual has a great treatment): repeatedly pick the unvisited vertex with the smallest known distance, then update (relax) the distances of its neighbours.

To find the vertex with the smallest distance efficiently, a priority queue is usually used. And as the distances of neighbouring nodes are updated, those nodes need to be repositioned within the priority queue according to their new distances (this is the decrease-key operation on a priority queue).

Current hurdles

A straightforward Python implementation of the above algorithm, using the standard heapq module, highlights two problems:

  1. Finding an element is inefficient (O(n)).
  2. Adjusting the position of an item in the heap after the item has been modified (the decrease-key operation) is not supported.

All that can be done is to remove the node (itself an O(n) operation) and then re-insert it into the heap; and with Python's heapq module, a node cannot even be removed easily. The common workaround is to push a duplicate entry and skip stale entries as they surface.
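To make the problem concrete, here is a minimal Dijkstra sketch over the plain heapq module using that duplicate-entry workaround (the graph representation and all names are purely illustrative):

import heapq

def dijkstra(graph, source, target):
    # graph maps each node to a dict of {neighbour: edge_cost}
    dist = {source: 0}
    heap = [(0, source)]                      # (distance, node) pairs
    while heap:
        d, node = heapq.heappop(heap)         # vertex with the smallest distance
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue                          # stale duplicate; skip it
        for neighbour, cost in graph[node].items():
            new_dist = d + cost
            if new_dist < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_dist
                # No decrease-key available: push a duplicate entry
                # instead of adjusting the existing one.
                heapq.heappush(heap, (new_dist, neighbour))
    return float("inf")

Every duplicate push grows the heap, and every stale pop wastes time; this is exactly the overhead that a queue with find and decrease-key avoids.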

PriorityQ to the rescue

To fix this, I have developed the priorityq Python module. It has the following advantages over the traditional heapq module:

  1. Finding a value is an O(1) operation
  2. Once a value is updated, it can be re-adjusted without a removal.
  3. The heap storage strategy can be changed transparently without needing to peek into the internals of the implementation.
  4. Seamless integration of custom comparators.

The new implementation with the priorityq module is very simple and can be found here.

How does it work?

The reason this works is delightfully simple. Typical implementations refer directly to the object that is stored in the heap. PriorityQ does something slightly different: instead of dealing with the objects directly, it wraps each object in an opaque handle, and it is this handle that is returned to the user (of the library).

The user, in turn, can pass these handles back to the queue to, well, do things! Though a value may be modified, its handle remains the same, and more importantly, the handle (opaquely) contains all the information the custom heap storage implementation needs to find and identify the existing element.
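To make the idea concrete, here is a toy sketch of a handle-based binary heap (not priorityq's actual API, just an illustration of the technique): every swap keeps each handle's index fresh, which is what makes finding an element O(1) and adjusting it cheap.

class Handle:
    def __init__(self, value):
        self.value = value
        self.index = -1          # slot in the heap array, kept fresh on every swap

class HandleHeap:
    def __init__(self):
        self.items = []

    def push(self, value):
        handle = Handle(value)
        handle.index = len(self.items)
        self.items.append(handle)
        self._sift_up(handle.index)
        return handle            # the caller keeps this to adjust the entry later

    def adjust(self, handle, new_value):
        # No O(n) search needed: the handle already knows where it lives.
        old, handle.value = handle.value, new_value
        if new_value < old:
            self._sift_up(handle.index)
        else:
            self._sift_down(handle.index)

    def pop(self):
        top = self.items[0]
        last = self.items.pop()
        if self.items:
            self.items[0] = last
            last.index = 0
            self._sift_down(0)
        return top.value

    def _swap(self, i, j):
        self.items[i], self.items[j] = self.items[j], self.items[i]
        self.items[i].index, self.items[j].index = i, j

    def _sift_up(self, i):
        while i > 0 and self.items[i].value < self.items[(i - 1) // 2].value:
            self._swap(i, (i - 1) // 2)
            i = (i - 1) // 2

    def _sift_down(self, i):
        n = len(self.items)
        while True:
            smallest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and self.items[child].value < self.items[smallest].value:
                    smallest = child
            if smallest == i:
                return
            self._swap(i, smallest)
            i = smallest

A caller holds on to the handle returned by push and later calls adjust(handle, new_value) instead of searching the heap for the entry.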

Benchmarks

The examples folder (in the package) contains a few examples that exercise real use cases. Thankfully, the 9th DIMACS Implementation Challenge (on shortest paths) was just the arena for this, providing a few really good graphs.

The example can be run as follows:

python examples/dijkstra/runner.py <graph_file> <num_queries>

e.g.:

python examples/dijkstra/runner.py examples/dijkstra/data/USA-road-d.BAY.gr.gz 10

The above runs 10 queries over randomly selected source and destination nodes in a graph that contains about 320,000 nodes and 800,000 edges. The times (as expected) were proportional to the number of nodes processed (i.e. searched and adjusted). For the different heap storage strategies, the following times were recorded:

Storage Strategy    Total Nodes Processed    Average Nodes Per Second
List Heap           3,800,404                3,610
Binary Heap         4,857,736                26,361
Binomial Heap       TBD                      TBD
Pairing Heap        TBD                      TBD

What's coming next?

In the upcoming parts of this series, I will talk about:

  1. The JS (and Swift) ports of this library
  2. More heap storage strategies (like binomial, pairing and Fibonacci heaps).
  3. Adding more “standard” algorithms that can leverage the “find” and “decrease-key” features.

Useful links

Python Heapq Module

Java PriorityQueue

PriorityQueue Java Source

The skyline problem

Cyginstall – The Stateful Cygwin Installer

Hands up, those who have tried installing Cygwin on Windows, only to have the damn installer not completely download the sources, or who found that after selecting multiple mirrors, the installer would just get stuck on a slow or buggy one?

Well back up a bit.

Hands up, those who were *forced* to use Windows and hence had to install Cygwin?

Me and me again!

At work I am on a windoze box while having to ssh into a Linux box to do my development (let us not go there). Luckily, Cygwin has helped me in the past, but only because I managed to complete the installations out of sheer luck. For some reason (maybe limited intelligence) I could not select the fastest or most reliable mirror, and the installations kept bombing out after hours of what seemed like progress. It seems weird, but I would have thought that the installer, seeing which files it had already downloaded, would simply resume from there. But no such luck (or was it some option I missed?).

So I decided to write a downloader myself (you still have to call the Cygwin setup tool to install the sources). It HAD to be stateful: interrupt it at any point, and it should continue where it left off. So out of that came cyginstall.
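The gist of the statefulness is simple enough to sketch (a simplified illustration, not the actual cyginstall code): a file that already exists locally at its expected size is skipped, so an interrupted run picks up exactly where it left off.

import os
import urllib

def fetch(url, dest, expected_size):
    # Stateful resume: skip files that are already fully downloaded.
    if os.path.exists(dest) and os.path.getsize(dest) == expected_size:
        return
    urllib.urlretrieve(url, dest)

The real downloader does this per package, across multiple threads.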

It also has a few other goodies:

  1. Multi threaded (defaults to 5)
  2. Web based UI
  3. Easily select or deselect mirrors
  4. Works through proxy with NTLM authentication.
  5. Better reporting of download status and mirror health.

So please give it a go and let me know what you think. I managed to do a whole download, but that does not mean there are no bugs, so I would appreciate it if you could let me know of any and all issues that need fixing.

Also, the UI is very drab. Very basic, but drab, so any help or advice there would be gold.

Django/Appengine Coexistence, Part 1 – Environment Specific Settings.py

We've heard it all. AppEngine has uber scalability, and that too on demand. And it is based on Django. Clarification: the Python-based SDK is based on Django. And yes, it is a great environment to develop in. No issues. But the thing that still causes me a bit of concern is the datastore. Oh, it is fantastic, scalable and all that. But the limits are still limits. (I haven't even mentioned the 1000 limit on blobs and files, which is another big issue.) My main concern is committing myself to writing all my code for AppEngine and then being in a position where porting to Django would involve a total rewrite. Ok, there are projects like appenginepatch and so on, but to me that is trying to mold one paradigm into another. Why patch a key/value-pair datastore onto an SQL backend? Each has its own strengths, and you are better off writing interfaces that do things in backend-specific ways, so that the same app at run time chooses the right implementation based on the environment, rather than adapting one paradigm to another.

So anyway, my M.O. is to have a single project with the view and controller logic common to Django and AppEngine, but with separate model classes and APIs targeted at each backend. The models are accessed via API classes that do all the Django/AppEngine-specific magic behind the scenes, transparent to the views and the controllers. I have talked more about this in a previous post, but I did not provide a specific implementation. This is now open sourced as part of the Djapps project. At the moment it is still in its infancy, but it is a collection of apps for a lot of common tasks: external authentication, model accessors that work transparently on AppEngine or Django, and so on. It is still evolving, and I will talk about it in future posts.
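A tiny sketch of what I mean (all module and function names below are illustrative, not Djapps's actual API): the import itself detects the environment and binds the right backend implementation, so callers never know the difference.

# Illustrative only: choose a backend-specific implementation at import time.
try:
    from google.appengine.ext import db   # only importable on AppEngine
    USING_APPENGINE = True
except ImportError:
    USING_APPENGINE = False

if USING_APPENGINE:
    from myproject.gae_models import get_games_for_user      # hypothetical module
else:
    from myproject.django_models import get_games_for_user   # hypothetical module

# Views and controllers just call get_games_for_user(user), oblivious
# to which datastore answers.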

But to get started with all this, the first thing that needs working on is settings.py. For better or worse, this file is where all your app-specific settings go. So what I do is have a common_settings.py that I import from settings.py, plus an environment-specific settings file per environment, and I include one depending on what kind of environment I am on. Environments for me would be "appengine dev server", "appengine production server", "local django server", "production django server", etc.

So as an example, here is my settings.py.  Very stripped down:

# -*- coding: utf-8 -*-
import os
import djapps

# import all common settings - we will override them later on
from djapps.default_settings import *

# Make this unique, and don't share it with anybody.
SECRET_KEY = 'xxxxx'

# The root folder of the project
PROJ_ROOT           = os.path.abspath(os.path.split(__file__)[0])

TEMPLATE_DIRS = (
 # Put strings here, like "/home/html/django_templates" or "C:/www/django/templates".
 # Always use forward slashes, even on Windows.
 # Don't forget to use absolute paths, not relative paths.
 PROJ_ROOT + "/templates/"
)

import dev_profiles
# from dev_profiles.profile_dev_gae import *
from dev_profiles.profile_prod_gae import *
# from dev_profiles.profile_dev_nongae import *
# from dev_profiles.profile_prod_nongae import *

if not USING_APPENGINE: ROOT_URLCONF = 'myproject.urls'

In the above, dev_profiles is the folder where all my different environment-based settings files live (I call them profiles). So an example environment-specific settings file (profile_prod_gae from above) is:

URL_PATH_PREFIX     = ""
 
USING_APPENGINE     = True
SITE_URL            = "http://myproject.appspot.com"
SITE_NAME           = "myproject"
SERVE_STATIC        = False
USE_TEST_LOGIN      = False
DEBUG               = True
TEMPLATE_DEBUG      = DEBUG
PROJ_STATIC_HOST    = "/static"
DATABASE_ENGINE     = ''
DATABASE_NAME       = 'myproject'
DATABASE_USER       = 'xxx'
DATABASE_PASSWORD   = 'yyy'
DATABASE_HOST       = ''
DATABASE_PORT       = ''
FORMAT_PARAM        = "format"
 
EMAIL_HOST          = "localhost"
EMAIL_PORT          = 25
EMAIL_HOST_USER     = ""
EMAIL_HOST_PASSWORD = ""
 
MIDDLEWARE_CLASSES  = (
 'django.middleware.common.CommonMiddleware',
 'djapps.gaeutils.middleware.SessionMiddleware',
 'djapps.auth.local.middleware.LocalAuthMiddleware',
 'djapps.auth.openid.middleware.OpenIDAuthMiddleware',
 # 'djapps.auth.external.middleware.MultiSiteAuthMiddleware',
 'django.middleware.locale.LocaleMiddleware',
 'django.middleware.doc.XViewMiddleware',
)
 
TEMPLATE_CONTEXT_PROCESSORS = (
 "djapps.utils.context_processors.site_urls",
 "djapps.utils.context_processors.gae_local_auth",
 "core.context_processors.openid_user",
 'django.core.context_processors.request',
 "django.core.context_processors.debug",
 "django.core.context_processors.i18n",
)
 
INSTALLED_APPS = (
 'djapps.utils',
 'djapps.dynamo',
 'djapps.auth.local',
 'djapps.auth.external',
 'djapps.events',
 'profiles',
)

That's it. Nothing escapes customisation this way. In case you are wondering, djapps.default_settings is just the settings.py that comes when you first create a Django project (with startproject), or at least very close to it.

Mind you, this has served me very well on past projects and has not required any major rewrites. One thing that does annoy me is having to manually choose which profile to import (i.e. by uncommenting the line). You can get around this by checking an environment variable and importing a profile based on that, but again, just a personal preference.
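For instance, here is a minimal sketch of the environment-variable approach (the DJANGO_PROFILE variable name is my own invention here):

import os

# e.g. export DJANGO_PROFILE=profile_prod_gae before starting the server
_profile = os.environ.get("DJANGO_PROFILE", "profile_dev_gae")
_module = __import__("dev_profiles." + _profile, {}, {}, ["*"])
for _name in dir(_module):
    if not _name.startswith("_"):
        globals()[_name] = getattr(_module, _name)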

Let me know how this goes for you.

Google App Engine

Well, I've been spending the last week and a bit on my first AppEngine project. I had received my account on the day it was released (yes, a shameless boast there :D)… Unfortunately, due to exams, I never got around to doing anything with it.

So as a simple project, I figured I'd write (yet another online) chess game: WizChess. Note that there is already a Blitz Chess on the AppEngine samples page; I wanted to write one just to get a feel for the engine and, largely, to learn a lot of things myself. There are still a few things I have not implemented (e.g. on a castle the server updates, but the client does not properly update the UI; en passant is not implemented; timed games are not happening yet; chatting between players is not yet happening; and so on).

It was a very interesting experience. The documentation was quite nice and full, but I was still not fully satisfied. For some reason I found Django a lot more satisfying as a web-app dev platform. For the game's models, I had chosen to follow a more Django-style (relational) approach rather than harnessing the rich model extensions provided by AppEngine (mainly the Expando models). I did this because I wasn't sure I wanted to seal my app purely to the AppEngine platform. And after finishing the project, I am still not sure that I do.

I had got a whole bunch of puzzle games from here. I needed a way to populate the database with an initial data set, so my approach was to have a URL on the webapp (something like /resetdb) that would load the initial board templates (and, in the future, saved games and so on). However, since this loading would take some time, I had to resort to breaking it up into several parts and calling each of them individually. E.g., I'd normally have:

from google.appengine.ext import webapp

class ResetDB(webapp.RequestHandler):
    def get(self):
        for board in my_boards:       # my_boards: the initial board templates
            add_board_to_db(board)

Now, as the number of boards in my data set increased, the time to put them in the datastore naturally increased. But there seems to be a time limit on each request, which would make my request fail. So I had to resort to writing a client-side script that would call ResetDB with parameters (e.g. firstboard and lastboard), and then also break those up further as more saved games were restored, etc. An absolute pain! It is all good for AppEngine to be smart about things, but imposing assumptions severely limits the usefulness of the platform, and I hope this limit is relaxed in the future.
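Sketched out, the batched version looks something like this (the batch size, URL and parameter handling are made up for illustration):

from google.appengine.ext import webapp

class ResetDB(webapp.RequestHandler):
    def get(self):
        # Load only the slice [firstboard, lastboard) so each request
        # stays under the deadline.
        first = int(self.request.get("firstboard", 0))
        last = int(self.request.get("lastboard", 50))
        for board in my_boards[first:last]:
            add_board_to_db(board)

driven from a client-side script along the lines of:

import urllib

BATCH = 50
for start in range(0, len(all_boards), BATCH):
    urllib.urlopen("http://myapp.appspot.com/resetdb?firstboard=%d&lastboard=%d"
                   % (start, start + BATCH)).read()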

The next thing was the model APIs. The Expando model is awesome, but once again I was not happy about the lock-in. That was not my main concern, though: the querying API is extremely poor. Django has amazing querying facilities, where one can specify pretty much arbitrary query filters (ands, ors, nots, exclusions and so on). AppEngine does not support "not" in its queries. Aaaaaaaaaaaaaaaaah. Also, I couldn't find an easy way of chaining queries. E.g., to find a game where either player0 or player1 (my way of saying white or black) was the current user, I had to do this:

games_with_user = []
games = models.Game.all()
games_with_user.extend([g for g in games if g.player0 == current_user])
games_with_user.extend([g for g in games if g.player1 == current_user])

Yuck!  Same thing in Django happens with:

games_with_user = models.Game.objects.filter(Q(player0=current_user) | Q(player1=current_user))

(ok I did that syntax from memory so it may not be right, but it is something similar).

Now, the final thing that REALLY annoyed me was the lack of Comet (or any reverse-Ajax) support. Essentially, I poll the server every 5 seconds to get updates on whether the opponent has played. It would have been great for the server to notify the client, instead of having to keep polling and wasting requests (I believe there is a limit of 6.5 million requests a day, or is that 650K requests?).

But having said all this, the main (only?) advantage of the platform is the fact that we don't have to worry about scalability. And unfortunately, this is a HUGE advantage. Normally I could write an app and host it on my own boxes at home (if I want it to be free), but then I'd soon be hitting limits with 10 users, and I'd have to find a hosting provider and handle the scalability issues myself. Migrating my own code to AppEngine at THAT point would be quite a nightmare. Writing and prototyping on AppEngine allows me to fail fast. Looking at the pricing, AppEngine would charge about $40 per month per app (with reasonable traffic). If my app ever got to that stage, I could always port it out of AppEngine (as long as I don't use Expando and stick to Django-style object modelling, with which I don't lose any power).

Please let me know if you would like the sources. I am not sure how to upload files to WordPress here.

Also, I would really appreciate feedback on my learning here. As a Python and web-app noob, I'd be grateful if you pointed out any errors in my ways 😀

C++ – Python Bindings

Version 3.0 of omnifs saw the onset of Python bindings for the omnifs library (written in C++ using Curl). In this post I am going to talk about what I learnt writing Python bindings for C++. What this means, essentially, is that a library written in C++ can now be used from Python. This is ideal for situations where the core modules are written in a more efficient, lower-level language and then driven from a higher-level scripting language. This, some (including myself) believe, assists testing and development in general.

I will be referring to libomnifs, part of the omnifs package, in this post for any examples. I won't be showing the details of libomnifs; I will focus on what additions were required to enable the Python bindings.

I have used the Boost.Python library for creating the Python bindings to omnifs. There are other ways of doing this, but I won't go into them here. I chose Boost.Python as it was fairly well documented and easy to follow (plus it looked cool).

Prerequisites

  1. First, download and install the Boost library. I used version 1.34.0. Please follow the Boost installation guide on how to do this.
  2. Then read the bjam tutorial. bjam is the build system created and used by Boost. It is very powerful and can make life a lot simpler (once you get the hang of it).
  3. Read the Boost.Python tutorial. This is a must: it highlights everything one would need to create the bindings. I will only document the "extras" I learnt in the process and will try to keep the repetition to a minimum.
  4. Install Python if you do not already have it.

ONCE AGAIN: the Boost.Python tutorial is fantastic, so please have a look at it.

Step 1: Make a List of Exportable Classes

In my case, one of the classes I chose to export was OMNI_Action. The OMNI_Action class is the superclass of all actions that can be taken against/on the omnidrive server, for example "ListFolder", "UploadFile", etc.

The public interface of the OMNI_Action is (protected, static and private members are not shown):

class OMNI_Action
{
public:     // Virtual methods and constructors
    OMNI_Action(OMNI_Session &p_Session);
    virtual ~OMNI_Action();
    virtual bool Perform() = 0;

public:     // Non-virtual methods
    int Result();
    const char *Message();
    OMNI_Session *Session();
};

I have chosen the Action object as the example (rather than the Session object, which is required by all Actions) because it is a class that gets extended and thus requires more work to create the bindings:

  • Every OMNI_Action-derived class needs to implement only the Perform method (to, surprise surprise, perform the action).
  • Each OMNI_Action object also needs an OMNI_Session object: the session in which omnifs is running. This is another class that is exported but not shown.

Step 2: Include necessary Boost headers

Include the required boost headers as follows:

#include <boost/python/module.hpp>
#include <boost/python/def.hpp>
#include <boost/python/class.hpp>
#include <boost/python/args.hpp>
#include <boost/python/overloads.hpp>
#include <boost/python/docstring_options.hpp>
#include <boost/python/enum.hpp>
#include <boost/python/pure_virtual.hpp>

#include <boost/python/return_internal_reference.hpp>
#include <boost/python/copy_const_reference.hpp>
#include <boost/python/copy_non_const_reference.hpp>
#include <boost/python/return_value_policy.hpp>

or, alternatively, include ALL Python-related headers by including just python.hpp:

#include <boost/python.hpp>

Also enable the use of the python namespace:

using namespace boost::python;

Note that this can be avoided if you prefer to explicitly qualify items with the boost::python namespace each time.

Step 3: Create the module

All the generated python bindings happen inside a module definition. This is done as:

BOOST_PYTHON_MODULE(module_name)
{
    // Put binding definitions here - ie class exports and method exports
}

That is it. There is no need for a main function or anything else. Think of the BOOST_PYTHON_MODULE block as the main entry point of the generated shared object.

Step 4: Export the Classes

As shown in the “Exposing Classes” section of the boost.python tutorial, exporting OMNI_Action is simply a matter of doing:

class_<OMNI_Action>("Action")
    .def("Perform",
         pure_virtual(&OMNI_Action::Perform),
         "The function that performs the specific action.")
    .def("Result", &OMNI_Action::Result,
         "Returns the result of the action after \"Perform\" is called.")
    .def("Message", &OMNI_Action::Message,
         "Returns the message associated with the return result \n"
         "after \"Perform\" is called.")
    .def("Session", &OMNI_Action::Session, return_internal_reference<>(),
         "Reference to the session object on which the action "
         "object is valid.")
;

Now, there is one SMALL problem with the above: the Perform method is pure virtual, which means that, having no implementation, the class as declared cannot actually be exported. To get around this, a wrapper class (called ActionWrap) is created for OMNI_Action, and THIS wrapper class is exported instead, as shown below:


struct ActionWrap : OMNI_Action, wrapper<OMNI_Action>
{
    bool Perform()
    {
#if BOOST_WORKAROUND(BOOST_MSVC, <= 1300)   // Workaround for vc6/vc7
        return call<bool>(this->get_override("Perform").ptr());
#else
        return this->get_override("Perform")();
#endif
    }
};

That is it. We have now provided a proxy implementation of the Perform method, which simply dispatches to the derived class's Perform when invoked, and we have gotten rid of the pure virtual method.

The exporting of this class is:

class_<ActionWrap, boost::noncopyable>("Action",
    "A wrapper for classes deriving from the Action object.\n"
    "The Action class is the super class of all classes that \n"
    "perform work on the omnidrive server.", no_init)
    .def("Perform",
         pure_virtual(&OMNI_Action::Perform),
         "The function that performs the specific action.")
    .def("Result", &OMNI_Action::Result,
         "Returns the result of the action after \"Perform\" is called.")
    .def("Message", &OMNI_Action::Message,
         "Returns the message associated with the return result \n"
         "after \"Perform\" is called.")
    .def("Session", &OMNI_Action::Session, return_internal_reference<>(),
         "Reference to the session object on which the action "
         "object is valid.")
;

Note

  • how all the methods of the original OMNI_Action class are also exposed (ie Perform, Result, Message, and Session);
  • the noncopyable flag specifies that ActionWrap cannot be copied, while no_init specifies that it cannot be directly instantiated;
  • return_internal_reference<> specifies that the return value of the "Session" method is in fact a pointer to a class member. The lifetimes of the two objects are therefore tied together: the Action object cannot be destroyed while the returned Session reference is alive (a small illustration follows this list). Please refer to Call Policies in the Boost.Python tutorial for more information on this;
  • the string values are documentation strings (docstrings) in the exported Python module.
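As a quick Python-side illustration of that call policy (using the session and action objects from the example further below):

>>> action = omnipy.ActionCreateFolder(session, "new_folder1", "")
>>> s = action.Session()   # s points at state inside action
>>> del action             # action is kept alive as long as s exists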

Once this is done, the derived classes can be exposed normally.
Step 5: The Build Process

Once you have exposed all the classes in the above fashion, it is time for setting up your builds!

The builds are performed using bjam – the Boost build system. bjam looks for the file Jamroot in the current directory the same way that make looks for a Makefile.

bjam is a lot more complex than make and the bjam tutorial is an excellent source of all the information you are going to need.

I will go through my Jamroot file line by line:

1. Specify the Boost directory: This is the location where boost was installed.

use-project boost
: /opt/boost_1_34_0 ;

2. Specify the requirements and default builds. To make debug the default build, simply specify that in the default-build line.

project
: requirements <library>/boost/python//boost_python
: default-build release
;

3. Specify a dependency on the libomnifs C++ library. The C++ library is in fact built in ../bld/<build_mode>.

The following two rules state that in debug mode the debug build of libomnifs is to be used, and similarly, in release mode the release build of libomnifs is to be used, as indicated by the "variant" flags.

lib libomnifs
:
: <file><location of omnifs source tree>/bld/debug/libomnifs.so <variant>debug
;

lib libomnifs
:
: <file> <omnifs_build_dir_path>/bld/release/libomnifs.so <variant>release
;

4. This specifies the installation target. Unlike in "make", where installations are arbitrary actions, they have a bit more meaning in bjam. This rule simply specifies the installation location (in this case the current directory ".") and type (install a library instead of an executable):

install dist : omnipy : <location> . <install-type>LIB ;

5. Finally, the following specifies that the object being built is a python-extension library (with the name omnipy). Many other types of projects are also possible; please refer to the bjam tutorial for a full list of build types. The .cc files are the files across which I divided the export declarations. It is recommended to break up the class export declarations into multiple files so that changes in one class do not require recompiling the entire source, thereby speeding up the build process.

python-extension omnipy
: omnipy.cc omnipy-logger.cc omnipy-utils.cc omnipy-session.cc
omnipy-connection.cc omnipy-memory.cc omnipy-actions.cc libomnifs
;

Step 6 – In Action

Finally to see a glimpse of the library in action, simply start python and run the following:

— Import required modules including the one we just created

>>> import omnipy, sys, os

— create a session object by calling the Session constructor (refer to the actual distribution)
>>> session = omnipy.Session()

— create an actual action object – the action for creating a folder:

>>> action = omnipy.ActionCreateFolder(session, "new_folder1", "")

— Perform the action

>>> action.Perform()

and so on and so forth.

Please refer to the example.py and utils.py in the omnifs source distribution for more details.

Step 7: Gotchas and Todos

There are still a few things I haven't yet figured out. One of them is how to export FILE * and streams so that Python file handles could be used instead. Once I'm done with that, I'll put it up here as well.


Well, that concludes my little tip on how to create Python bindings (more a set of self-learning notes than anything else). I hope it saves you a lot of time. Please let me know if you find anything I am missing, forgot, or just plainly got wrong. I'd be glad to learn from it myself.