Activity Workshop
 

OSM Wrangler 2

Further development

The first version of OsmWrangler is obviously quite limited in what it can do with an OSM file, and deliberately so. It was only meant to "scratch an itch" by removing unwanted amenity nodes from the maps being loaded into my GPS. Since then, though, some other itches have appeared, so work was started on a more comprehensive filtering tool.

The new version isn't in a state to be released yet, and it's not yet clear whether it ever will be. It's written in Python rather than Java, mainly because Python is fun to program in and its regular expression matching and list processing are quite convenient for this kind of job. So it starts again from scratch, with a much wider set of aims:

Some of these functions are already implemented, such as lat/long filtering, tag filtering for nodes, and squishing. But removing or modifying ways depending on which of their nodes have been removed is a trickier problem, and it's not yet clear what the best way to handle it is. Should the whole way be deleted, should the removed nodes be dropped from the way, or should the already-removed nodes somehow be retrieved and put back so that the ways are complete?
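
As a rough illustration of the node-side filtering mentioned above, here is a minimal sketch using Python's standard xml.etree.ElementTree module. It is not the tool's actual code: the bounding box, the amenity=bench|waste_basket pattern and the names node_passes and filter_nodes are all hypothetical, and the sketch assumes the whole file fits in memory.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical filter settings, for illustration only.
BBOX = (47.0, 8.0, 48.0, 9.0)  # (min_lat, min_lon, max_lat, max_lon)
UNWANTED = re.compile(r"^amenity=(bench|waste_basket)$")


def node_passes(node, bbox=BBOX, unwanted=UNWANTED):
    """Return True if the node lies inside bbox and has no unwanted tag."""
    lat, lon = float(node.get("lat")), float(node.get("lon"))
    min_lat, min_lon, max_lat, max_lon = bbox
    if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
        return False
    # Match each tag as a "key=value" string against the regular expression.
    for tag in node.findall("tag"):
        if unwanted.match(f"{tag.get('k')}={tag.get('v')}"):
            return False
    return True


def filter_nodes(root):
    """Return the set of ids of all nodes that pass the filter."""
    return {n.get("id") for n in root.iter("node") if node_passes(n)}


SAMPLE = """<osm version="0.6">
  <node id="1" lat="47.5" lon="8.5"/>
  <node id="2" lat="47.5" lon="8.6"><tag k="amenity" v="bench"/></node>
  <node id="3" lat="50.0" lon="8.5"/>
</osm>"""

passed = filter_nodes(ET.fromstring(SAMPLE))
```

Here node 2 is dropped by the tag pattern and node 3 by the bounding box, leaving only node 1.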

Reasons not to do it

A bigger threat to finishing this Python-based OsmWrangler is that many of its intended applications can already be handled by other tools, so the project was more or less shelved. For example:

That means I can already extract ways and relations and convert them to GPX for use in Prune and other programs. And I don't feel the urge to split out my own sections of the planet.osm file when the CloudMade downloads are so convenient. All of this together put the Python version of OsmWrangler on hold, at least temporarily.

Reasons to do it after all

After I had decided to leave it and then thought about the problems some more, there seemed to be good reasons to extend it after all. These included considerations such as:

Current status

It's still an early prototype, but this command-line-only, Python-based tool can now not only filter on lat/long and node attributes (already reproducing what version 1 can do, and much more), but also apply the same filter to ways and relations. If an included way contains nodes which didn't pass the node filter, those extra nodes are added back into the output, but ways containing only one or no nodes that passed the filter are removed. The resulting OSM file is therefore consistent.
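
The way rule just described can be sketched like this. The sketch assumes the ways have already been parsed into a dict mapping each way id to its ordered list of node ids, and that the node filter has produced a set of passing node ids. The name filter_ways is hypothetical. Counting distinct passing nodes is an interpretation, not something stated above: it stops a closed way whose single surviving node appears as both its first and last node from counting twice.

```python
def filter_ways(way_refs, passed_nodes):
    """Keep ways with at least two distinct passing nodes.

    way_refs: dict of way id -> ordered list of node ids
    passed_nodes: set of node ids that passed the node filter
    Returns (kept way ids, node ids to write to the output).
    """
    kept_ways = []
    output_nodes = set(passed_nodes)
    for way_id, refs in way_refs.items():
        if len(set(refs) & passed_nodes) <= 1:
            continue  # too few surviving nodes: drop the whole way
        kept_ways.append(way_id)
        # Pull back in any nodes of this way that the node filter removed,
        # so every node the way references is present in the output.
        output_nodes.update(refs)
    return kept_ways, output_nodes


WAYS = {"10": ["1", "2", "3"], "11": ["3", "4"], "12": ["1", "5", "1"]}
PASSED = {"1", "2", "4"}
kept, nodes = filter_ways(WAYS, PASSED)
```

Way 10 survives and brings node 3 back with it, while ways 11 and 12 each have only one passing node and are dropped.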

Any relation can also be extracted to GPX format, with the nodes given in the order in which they appear in the relation. Duplicate nodes are removed and ways are reversed as necessary to simplify the output. Relations containing other relations are also handled.
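
One possible way to do this ordering is sketched below. It recursively expands sub-relations into an ordered list of ways, then joins the ways end to end, flipping any way whose last node meets the current end of the chain. It also drops the repeated node where two ways meet, which is one reading of "duplicate nodes are removed". The names flatten_members, chain_ways and to_gpx, the dict-based input and the choice of a GPX track (rather than a route) are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def flatten_members(rel_id, relations, ways, seen=None):
    """Expand a relation into an ordered list of way node-lists.

    relations: dict of relation id -> list of (member type, ref)
    ways: dict of way id -> ordered list of node ids
    """
    seen = set() if seen is None else seen
    if rel_id in seen:  # guard against cycles and repeated sub-relations
        return []
    seen.add(rel_id)
    result = []
    for mtype, ref in relations[rel_id]:
        if mtype == "way":
            result.append(list(ways[ref]))
        elif mtype == "relation":
            result.extend(flatten_members(ref, relations, ways, seen))
    return result


def chain_ways(way_lists):
    """Join ways end to end, flipping where needed, skipping shared nodes."""
    points, placed = [], 0
    for refs in way_lists:
        if not refs:
            continue
        ends = (refs[0], refs[-1])
        # The first way's direction is only known once the second arrives.
        if placed == 1 and points[-1] not in ends and points[0] in ends:
            points.reverse()
        if points and refs[-1] == points[-1]:
            refs = refs[::-1]  # this way runs the other way: flip it
        for ref in refs:
            if not points or points[-1] != ref:
                points.append(ref)
        placed += 1
    return points


def to_gpx(point_ids, coords, name):
    """Write the ordered node ids as a single GPX 1.1 track segment."""
    gpx = ET.Element("gpx", version="1.1", creator="sketch",
                     xmlns="http://www.topografix.com/GPX/1/1")
    trk = ET.SubElement(gpx, "trk")
    ET.SubElement(trk, "name").text = name
    seg = ET.SubElement(trk, "trkseg")
    for pid in point_ids:
        lat, lon = coords[pid]
        ET.SubElement(seg, "trkpt", lat=str(lat), lon=str(lon))
    return ET.tostring(gpx, encoding="unicode")


WAYS = {"a": ["1", "2", "3"], "b": ["5", "4", "3"], "c": ["5", "6"]}
RELATIONS = {"r1": [("way", "a"), ("relation", "r2")],
             "r2": [("way", "b"), ("way", "c")]}
COORDS = {str(i): (47.0 + i / 100, 8.0) for i in range(1, 7)}

order = chain_ways(flatten_members("r1", RELATIONS, WAYS))
gpx_text = to_gpx(order, COORDS, "r1")
```

In this example, way "b" is stored backwards and so gets flipped, and the result is the single run of nodes 1 to 6. Gaps between ways that don't touch are simply appended as they come, which a fuller version might want to handle differently.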

The squishing also works (obviously only for the OSM output) and reduces the file size quite nicely. On the other hand, some tools require the version attributes to be present before they will accept a file as valid OSM format, so squish at your own risk.
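
The text doesn't spell out exactly which attributes squishing removes. The minimal sketch below assumes it strips the per-object metadata attributes (version, timestamp, user, uid, changeset, visible) from every node, way and relation. The SQUISH_ATTRS tuple and the squish function are hypothetical names, not the tool's own.

```python
import xml.etree.ElementTree as ET

# Assumed set of metadata attributes to drop; the real tool's choice may differ.
SQUISH_ATTRS = ("version", "timestamp", "user", "uid", "changeset", "visible")


def squish(root, attrs=SQUISH_ATTRS):
    """Strip metadata attributes from every node, way and relation in place.

    Only the direct children of <osm> are touched, so the root element's
    own version="0.6" attribute (the format version) is kept.
    """
    for elem in root:
        if elem.tag in ("node", "way", "relation"):
            for name in attrs:
                elem.attrib.pop(name, None)
    return root


SAMPLE = ('<osm version="0.6"><node id="1" lat="47.5" lon="8.5" version="3" '
          'user="someone" uid="42" timestamp="2009-01-01T00:00:00Z" '
          'changeset="7"/></osm>')
squished = squish(ET.fromstring(SAMPLE))
squished_text = ET.tostring(squished, encoding="unicode")
```

Leaving "version" out of the tuple would be one way to trade a little space for compatibility with the stricter tools mentioned above.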

There's still plenty to do before release, and development seems to have stalled because of scalability and performance issues. If you have any suggestions or feedback in the meantime, please get in touch by email.