Tuesday, 13 March 2018

Erlang slave nodes and ssh login shells

The Erlang runtime environment is more similar to an operating system than to a traditional language runtime library. An Erlang "node" is an Erlang instance started with the flag -name (or -sname, for "short names" if your network does not rely on DNS). For example:
$ erl -sname foo
will give you an interactive Erlang shell with a prompt like this:
Eshell V9.2.1  (abort with ^G)
(foo@rocka)1>
and you can see from the prompt that this is running as a "node" with the node name foo@rocka. In practice, this means that networking is enabled, allowing Erlang processes on this node to communicate with processes on other nodes, either on the same host machine or on other machines, through the ordinary Erlang message passing mechanism.

Working with multiple nodes

If I open a separate console and start another node bar@rocka in the same way, I can then connect these two and start doing interesting multi-node stuff:
(foo@rocka)1> net_adm:ping(bar@rocka).
(foo@rocka)2> nodes().
For instance, if I have implemented a module mysrv that runs a simple counter server, I can load the code onto the other node like this - even if the nodes do not share a file system - and start it on that node.
(foo@rocka)3> BeamFile = code:which(mysrv).
(foo@rocka)4> {ok, BeamBin} = file:read_file(BeamFile).
(foo@rocka)5> rpc:call(bar@rocka, code, load_binary, [mysrv, "", BeamBin]).
(foo@rocka)6> Pid = spawn(bar@rocka, mysrv, run, []).
The last command should print something like <8112.88.0>, where the non-zero leftmost number in the Pid indicates that it runs on a node other than the local one. We can try to request numbers from the server to check that it is running:
(foo@rocka)7> mysrv:next(Pid).
(foo@rocka)8> mysrv:next(Pid).
(foo@rocka)9> mysrv:next(Pid).
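The mysrv module itself is not shown here, but a minimal counter server matching the transcript above might look something like this (a sketch; run/0 and next/1 are inferred from how they are used, and the message protocol is my own choice):
```erlang
%% Minimal sketch of a counter server compatible with the transcript
%% above. The real mysrv module is not shown in the post; run/0 and
%% next/1 are inferred from how they are used.
-module(mysrv).
-export([run/0, next/1]).

run() ->
    loop(0).

loop(N) ->
    receive
        {next, From, Ref} ->
            From ! {Ref, N},
            loop(N + 1)
    end.

next(Pid) ->
    Ref = make_ref(),
    Pid ! {next, self(), Ref},
    receive
        {Ref, N} -> N
    after 5000 -> timeout
    end.
```
Since message passing works transparently across nodes, next/1 behaves the same whether Pid lives on foo@rocka or bar@rocka.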
That's cool and all, but it requires that you first do the orchestration of starting the nodes on the various machines (and making sure they get restarted if need be). In some situations, it would be much nicer if one Erlang node could simply start another one, on demand. For example:
  • Having a node on one machine coordinate a number of worker nodes on a cluster of machines
  • Writing test suites for code that cooperates across two or more nodes
  • Protecting the main node from failure when running less dependable native code (NIFs or C drivers, or a WX-based GUI)
  • Not running certain operations in the same OS process context as the main Erlang node
And of course, you want to get notified if one of these nodes crashes, and you want them to disappear automatically if the main node dies.

Running slave nodes

Erlang has standard library support for starting and supervising such slave nodes, and it looks straightforward enough:
  slave:start(Hostname, Nodename, ErlangArgs)
for example:
  slave:start("rocka", "baz", "-s io write hello")
For this to work, you need help from your operating system to actually get the new nodes running, and by default it is assumed that the command for doing this is rsh. (The original Unix rsh command is deprecated these days, but on many machines the rsh name is simply a link to ssh.) You can, however, pass a flag to Erlang to tell it to use any other command, like this:
$ erl -sname foo -rsh /usr/bin/ssh
Easy! But there is a snag: the way this is implemented, you cannot pass additional flags to the command, and when given a command to run, ssh by default runs it in a non-interactive, non-login shell.

This means that your attempt to start a node on a remote machine will probably fail, because the erl command cannot be found in the path of a non-login shell (at least if you have standard .profile/.bashrc files). And -rsh "/usr/bin/ssh -l" does not work either, because Erlang treats the given string as a single path (which is allowed to contain spaces).

You may also be mystified by why the following might work:
(foo@rocka)1> slave:start("rocka","baz", "").
while this might not:
(foo@rocka)1> slave:start("localhost","baz", "").
bash: erl: command not found
The reason is that if the given hostname matches the hostname of the local node, the rsh command is bypassed, and the new node is simply launched as erl ... <UserArgs> instead of "<Rsh>" <Hostname> erl ... <UserArgs>. You probably had erl in your current path already, so it just worked. Also note that there is no way to specify an absolute path to the erl command - it has to be found in the path.

To get around this, you could set up a specific environment on all the remote machines, either by customizing their ssh configurations, or by creating custom user accounts for running the nodes. But if you want it to "just work", using a regular user account with no special modifications, how can you get around these limitations?

Starting the nodes in a login shell

The answer is to use a simple wrapper script like this one, which rearranges the arguments and runs the given command through a remote login shell (bash -l):
    #!/bin/sh
    host=$1
    shift
    exec /usr/bin/ssh "$host" "/bin/bash -l -c '$@'"
Put the above in a file myrsh, make it executable with chmod a+x myrsh, and then start Erlang as follows:
$ erl -sname foo -rsh myrsh
That's all! Now it should work, both for "localhost" and for any remote machine that you can log in to (assuming only that you have Erlang installed there):
(foo@rocka)1> slave:start("localhost","baz", "").
You could augment the script as you please - for example, adding logging of the arguments being passed to the command, which was something I used quite a lot while debugging this stuff. The only tricky part of all of this was figuring out what was actually happening in the different cases: how the actual command line was constructed, whether there are any straightforward alternatives for making ssh behave the way you want (not really), and whether your shell really should set up your user path only in a login shell (as far as I can tell, yes).

I hope this can be of help to anyone else who wants to try out the slave node feature, which understandably has been rather underused.

Monday, 3 November 2014

The Hobbit (1982 game) graphics engine in Javascript

One of the first computer games that really fascinated me was the 1982 text adventure based on The Hobbit, for the ZX Spectrum. I was 12 years old when I first played it, and it was immediately clear that this was different from any other game I'd seen. It was a text adventure, but some locations had illustrations in brightly coloured graphics, which although crude by today's standards really helped transport you to another world back in those days of mostly monochrome screens and text-only adventure games. There is much more to be said about what made the game so special - in fact, the graphics may be the least interesting point - but it's a starting point that will do for now.

There were over 20 illustrations in the game, each covering 2/3 of the screen area. The game itself only occupied 40K and a ZX Spectrum used just under 7K for the graphics memory (bitmap + colour information), so it was clear that you couldn't simply store whole images. Instead, they painted the illustrations from scratch each time, using a small set of graphics primitives, and each illustration was stored as a sequence of drawing instructions compactly encoded as bytes. Since it took many seconds to draw an image, you could easily follow what was happening and understand that this was how it worked.

I always wanted to see how their graphics engine worked, and make it possible to draw the illustrations without running the game in an emulator, but I have only ever found partial disassemblies of the game, and nothing at all about the graphics routines or the format they used. A week or so ago I decided to reverse-engineer it myself, for the hell of it. It turned out to be a bit tricky but not so bad after all. And of course, once I had figured it out, I wanted to reimplement it in a more portable language using a more portable graphics backend, i.e., Javascript and html canvas (something I've never had reason to touch before). This weekend I found some quiet time to implement it, and it works. As far as I have been able to tell, comparing to the game running in an emulator, the screens are rendered pixel perfect. If you too have been playing this game as a kid, you might find it interesting to look at the code on GitHub. Or just use the drop-down menu above to look at the images drawn in real-time in your browser, way faster than an old 3.5MHz Z80 could do it back in the day.

Monday, 30 December 2013

Rewriting history in Subversion with the help of Erlang

Note: This text was originally written for the redhoterlang.org blog in 2011, but since that is now defunct, I'm republishing it here to make it available online again. When I wrote this, we had only been using the resulting repository for a few months and our experience with git was quite limited. Now three years later, I can say that I'm very very happy indeed that I spent a lot of effort importing the entire development history into git. Being able to run "git log" or "git blame" and see every change back to when the initial code base was created almost 10 years ago is incredibly useful.

Our development department at Klarna has grown quite a lot the last year [2009-2010], and because we are trying to be as agile as we can, using scrum and kanban, this has meant more teams, more branches, and more coordination and merging of branches. When summer came, we knew we had a big batch of recruits due to start in August, and that we had to do something about our version control system before then, because it was getting to be a real problem.

Once upon a time, our R&D consisted of a handful of developers getting along fairly well using CVS. After a year or two, they took the step of converting the repository to Subversion, and that worked well enough for a couple of dozen developers, as long as we tended to follow the basic pattern of creating project branches from trunk and only merging them back to trunk when they were done. In Subversion, regularly merging out from trunk to a branch to keep it up to date works fine, but you really want to avoid merging a branch back to trunk except when you're done with it and you're going to remove the branch afterwards.

A typical problem we ran into was that we wanted to have separate per-sprint branches for each team, from which the developers in the team could create individual work branches that they merged back to the team branch at the end of the sprint. The team branches were then reviewed and tested, and reintegrated to trunk. But occasionally, a work branch was not mature enough to be allowed to be merged to the team branch in this sprint. Of course, you'd want to reschedule the work as part of the next sprint and keep the branch. But the team branch (from which the work branch was taken) will get reintegrated to trunk, and can't be kept around - Subversion gets severe hiccups if you try to keep working on it after reintegration. So what do you do with the work branch? There are a few tricks you can try in this case: some discard the history, and some call for manual tweaking of svn:mergeinfo. Neither is good, and both require time and effort by a local Subversion master.

Another annoyance was that Subversion had only had mergeinfo support for a short time, and most of our repository history had no mergeinfo properties. This meant that 'svn blame' was mostly useless, generally showing only who had performed the final merge to trunk. And even when you asked svn to do its best with the mergeinfo it had, it simply took forever to compute the annotations because of the amount of history it had to download from the server. The svn:mergeinfo properties themselves were also getting ridiculous: even though we only had had them for a year, the mergeinfo was already hundreds of lines long, due to all the work branches that had been created and reintegrated along the way. There were other problems as well, but to cut a long story short, Subversion no longer felt like a viable solution. Eventually we decided that Git was the way we wanted to go - Mercurial was the main contender - and that should mean that all we had to do was import the Subversion repository into Git using git-svn, right?

But things are never that easy. Far from it, in fact. Our initial attempts to use git-svn ran for a few hours and then aborted with some odd error message. Apparently, our Subversion history was just too complicated for git-svn to understand. We experimented with different options, tried using svndumpfilter to extract only the most important parts of the repository, and fiddled with graft points to recreate missing information. All to no avail - the result was so incomplete, it was not funny.

At this point, we could have decided to waste no more time and just make a clean import of the sources into Git, keeping the Subversion repository as read-only for historical purposes, but it felt worthwhile to try to import the entire history into git, so we would get full use of 'git blame' and not have to use a separate tool for digging up old details. There was only one way to do this: rewrite the Subversion history so that git-svn would accept it. But that was no simple task - there were many idiosyncrasies in our repository:
  • The first few thousand commits came from CVS, via cvs2svn.
  • At first we had a simple layout with trunk/tags/branches at the top level (and - probably because of the cvs2svn conversion - with an extra directory level under trunk, as in "trunk/foo/*").
  • Later, we reorganized the repository to use a typical multi-project layout: "foo/{trunk,tags,branches}", "bar/{trunk,tags,branches}", etc.
  • Some sub-projects that had originally been part of "foo", such as "foo/trunk/lib/baz" were moved out to be separate projects with their own trunk: "baz/{trunk,tags,branches}".
  • There were lots of commits where people had copied branches to the wrong locations (or copied from the wrong directory level) and then moved or deleted them again. On average, that probably happened once or twice each week. It's OK if you stay with Subversion forever, but if you want to change to another version control system, they become a big problem, making tools like git-svn very confused.
At first, I tried using a combination of svndumpfilter and various shell scripts and temporary files to pry apart the original commits, rewrite them, and join them again, but svndumpfilter is a very simple tool, and does no sanity checking: in general, you only discover consistency problems when you try to load the final dump file back into an empty repository and get an error message - usually after several hours of disk activity. (The original dump of our entire Subversion repository was about 10 GB in size.) The feedback loop was way too slow, and I had to change strategy. I had intentionally postponed thinking about writing my own down-and-dirty dumpfile filtering tool, because I knew that it could potentially be a huge timesink, but August was getting closer and I felt that I had no choice if we were to be able to switch to Git before the new recruits started working.

A good Subversion dumpfile filtering tool had to be able to:
  • Read dumpfiles, as created by 'svnadmin dump' and represent them internally in symbolic form suitable for easy matching and rewriting.
  • Write the possibly rewritten entries back to disk, in valid dumpfile format so that they can be read by 'svnadmin load'.
  • Track a lot of information to be able to perform consistency checks and give early warnings, in order to save time when writing a filter.
  • Handle reading and writing in a streaming fashion, since it was out of the question to load a 10 GB dump file into main memory. (I was doing this on my laptop, which had 4GB RAM.)
As it turned out, Erlang is a really good language for doing this sort of thing! I had expected that the parsing and output should be pretty simple, mainly because of Erlang's binary syntax, but the rest also turned out very well, even if I had to make a bunch of performance hacks. Most of all, the fact that you really want to work with structured symbolic data for a task like this, and not just plain strings, made it a great match for Erlang.
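To illustrate why the binary syntax is such a good fit (this is a sketch of the general idea, not the actual erlsvn code): dumpfile headers are lines like "Revision-number: 42", which split naturally into key/value pairs as binaries:
```erlang
%% Sketch only - not the erlsvn code. Splits a dumpfile header line
%% such as <<"Revision-number: 42">> into a key/value pair of binaries.
-module(dump_sketch).
-export([parse_header/1]).

parse_header(Line) when is_binary(Line) ->
    [Key, Value] = binary:split(Line, <<": ">>),
    {Key, Value}.
```
Working with binaries throughout also means the data read from disk is never expanded into character lists, which matters a lot at these volumes.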

First, hunting down and deciphering all the information I could find about the Subversion dumpfile format and implementing a parser and writer for it, took only a couple of days, and as usual, the hardest part was figuring out the little undocumented things by parsing some real dump files, writing them back out, and trying to load them into Subversion again. Once I had that working, I started working on doing the filtering/rewriting while streaming.

It turned out that in order to do all the necessary consistency checks - for example, checking that if you copy the path "foo/bar/x" to "foo/baz/x" in revision 17, the parent path "foo/baz" actually exists at that point - you really have to keep track of all existing paths in all revisions. To see why, imagine that in revision 54321, you look up the path "foo/baz/x" as it was in revision 17, and copy it to "foo/fum/x". Furthermore, in each revision, there may be tens of thousands of paths that were live at that point. This calls for some serious sharing of internal data structures - just the sort of thing that a functional language is good at! Basically, I needed to represent the structure of the repository in much the same way Subversion itself represents it: as a directory tree with file and directory nodes. Each revision is a pointer to the current top node in that revision, but most of the nodes are shared between revisions. (For speed, I used the process dictionary to map revisions to nodes - this still preserves the sharing.)
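The sharing idea can be sketched like this (my illustration, using maps for brevity - the code described here predates Erlang maps). Adding a path to a revision's tree rebuilds only the nodes along that path; all sibling subtrees are shared with the previous revision:
```erlang
%% Sketch of structure sharing between revisions (not the erlsvn code).
%% A directory is a map from name to subtree; a file maps to a leaf term.
-module(tree_sketch).
-export([add_path/3]).

%% add_path(Dir, PathComponents, Leaf) returns a new tree in which only
%% the nodes along the path are freshly built; all other subtrees are
%% shared with the input tree.
add_path(Dir, [Name], Leaf) ->
    Dir#{Name => Leaf};
add_path(Dir, [Name | Rest], Leaf) ->
    Sub = maps:get(Name, Dir, #{}),
    Dir#{Name => add_path(Sub, Rest, Leaf)}.
```
Storing each revision's root node (for example, keyed on the revision number in the process dictionary) then preserves this sharing automatically.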

Obviously, you don't want to store the contents of the files, only their paths, but you do want to check that the MD5 sum of the contents matches what the dumpfile metadata says that they should be - and if your filter modifies the contents, the metadata needs to be automatically updated in the output.

Furthermore, it turned out that in order to be certain that path copy operations were correct, you want to compare the source and target MD5 sums. If the information in the dumpfile says that "in rev. 54321, we copy foo/baz/x of revision 17, whose MD5 is A1B2...FEDC, to foo/fum/x", then you want to check that the file at foo/baz/x in revision 17 really has this checksum. This means that for each path in each revision, you also need to track its current MD5 sum. That's even more data that you only want to store once.

Even though I kept both the paths and the MD5 sums as binaries, not lists, I kept running out of memory: there were lots and lots of occurrences of the same character sequences. To get around this, I had to cache all binaries in the process dictionary: each time a path or MD5 is read from disk, we look it up in the process dictionary; if it is found, we take the old copy and let the newer one be garbage collected, and if it's not found, we store it in the dictionary. (Using atoms would not have been possible, because of the limits on the total number of atoms and on the length of atom names.)
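The caching trick can be sketched as a tiny helper (my illustration of the technique described above, not the actual erlsvn code):
```erlang
%% Sketch of the binary-caching trick: keep one canonical copy of each
%% path/MD5 binary in the process dictionary, so that duplicate copies
%% read from disk can be garbage collected.
intern(Bin) when is_binary(Bin) ->
    case get({intern, Bin}) of
        undefined ->
            put({intern, Bin}, Bin),
            Bin;
        Cached ->
            Cached   % reuse the existing copy; the new one gets GC'd
    end.
```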

This, and some other small things, made it possible to stream the entire 10 GB Subversion dump file of our repository (more than 38 thousand revisions) in under an hour on my measly laptop, keeping all the necessary metadata in memory and sanity-checking every rewrite I made. The program would fail quickly and report any inconsistencies, making it easy to keep adding rules until it all worked. In the end, a couple of hundred individual rewrite rules were needed to clean up our mess. The result could be loaded back into an empty Subversion repository, and handed over to git-svn, which imported it without complaints!

There was only one problem: the merge information was incomplete. For a large number of merge points, Git didn't know where they came from - only the log message hinted that a merge had occurred. If we had been using the standard svnmerge script for simplifying merges in Subversion, then git-svn should have been able to pick up the info from the generated commit messages - even from the time before svn:mergeinfo was implemented. However, we had for some reason always used a merge script of our own. Also, the script had been tweaked from time to time, so the commit messages it produced were not even always in the same form. I had to add functionality to the filter for extracting the correct merge information from our commit messages, and reinsert it as svn:mergeinfo properties for git-svn to use.

But there's another complication: for that to work, when you copy a path in Subversion for branching or tagging it is crucial that the copy gets the merge information from the source path. This meant that apart from MD5 sums, I now also had to track the complete merge information on all trunks, tags and branches, in all revisions. (Thankfully, there was no need to do this for all paths, just "root paths" - I scrubbed away all other svn:mergeinfo properties on sub-paths.) Once I had this working, things just fell in place, and git-svn was able to pick up practically all branchings and taggings that had happened, even from the bad old CVS days. And I was still running the entire conversion on my laptop!

We now had a complete and consistent history in Git, although it was not, strictly speaking, true: it looks as if we had the same nice directory layout all the way from the start. The drawback is that we can't check out a very old revision and expect it to build and run out of the box, because paths in Makefiles and in the code probably won't fully match the layout on disk. If that sort of thing is important to you, think carefully about each modification you make to the history: will it break an old build?

One of the first weeks in August, we announced that everybody should commit any work they had lying around, and on the Friday afternoon we disabled write access to the Subversion server. The following Monday morning, developers got to clone the new Git repository, and that was that. The first couple of weeks were a bit chaotic while people were getting to grips with Git, but we've never looked back.

The code I wrote for processing Subversion dumps is open source and can be found at https://github.com/richcarl/erlsvn. It is divided into a generic library module, and a user-defined filter module: there is a trivial filter example as well as a stripped-down version of the filter I wrote for our production repository. I've cleaned up the code a bit, so it shouldn't be too hard to read, but if you think some parts of it are ugly, remember that I was under hard time and space constraints when I wrote it. Feel free to improve on the interface as well as the coding style, and let me know if you use it, be it successfully or not.

- Richard

Monday, 3 January 2011

Guest blogging

[Edit: since the redhoterlang.org blog is now defunct, I have republished the text here.]

As mentioned in my previous post, I did some work this summer with manipulating Subversion dump files using Erlang. I've written a probably way too long text about this for the RedHotErlang blog maintained by my friend and colleague, the legendary Tobbe T.

If you're interested in this sort of thing - for example, if you're currently thinking about how to convert your messy Subversion repository to Git - you might want to read it. Or perhaps you just want something to help you go to sleep. Otherwise, you're forgiven if you don't follow that link.

Thursday, 23 December 2010

It's been a long time...

Wow - I've not updated the blog for a year and a half. Amazing how writing a book can turn you off all other kinds of written expression (apart from the odd Tweet). The book's been out for a month now and I'm quite happy with how it turned out. But following the release, I haven't had the energy to do anything creative when I get home after work, so I've just been playing video games and, well, not writing.

I've felt entitled to it, though: for the last year, I've not had time for anything else, so I've built up a huge backlog of books, films and games that I'm now suddenly able to start working my way through. The only problem with that is that it too can start to feel like work.

In particular, my long list of TODO-notes and feature requests for my pet programming projects has felt too overwhelming to try to pick up. In many cases, I've completely forgotten where I was, so it typically takes half a day just to look over the code and figure out how things work and where to continue hacking. But I'm building up to a point where I'm actually feeling the urge to program again. So maybe I'll be publishing some hacks again in a not too distant future.

It's been an eventful year in other respects as well: This summer I turned 40, and Elisabet and I got married in September (after 6 years of being engaged). At work, the company has kept growing like mad, and the size of our development department doubled (again), so we've been really busy. We also switched from Subversion to Git, and I should probably write about how we did that, because it turned out to be quite complicated.

Blogging is no longer à la mode, but I might be returning to it more regularly now that I have a life again. Meanwhile, have a merry Yule!

Monday, 22 June 2009

The midsummer report

While midsummer eve was as cold and foggy as any I can remember, these following days have been beautiful. The winds are still a bit chilly, which can be deceptive since it makes you forget how much sun you're actually exposing your poor skin to. I spent most of a day mowing the grass around the house, and got myself a really superb redneck tan.

An hour or two ago, a White-tailed eagle circled a few times over our house, as usual supremely unimpressed by the gang of annoyed seagulls that tried to chase it away. A cuckoo is, well, cuckooing, in the distance, and the water is glittering. In fact, it's bloody hard to get any real work done in a place like this.

Over and out.

Wednesday, 3 June 2009


When I planned my trip to California for the Erlang Factory conference, I followed a friend's advice and set aside a couple of days to go and see Yosemite Valley. I don't regret it - it was absolutely fantastic. Yosemite is about 4 hours drive east from San Francisco, up in the high Sierra Nevadas, and it's one of the most stunning landscapes you can imagine. Nothing I had read about it beforehand had prepared me for the feeling of actually being there and seeing this amazing valley in real life; not even Google Earth (although I recommend trying that for yourself - the 3D detail is really good). Of course I took a lot of photos, but it's not until now that I'm done going through the lot of them (discarding more than half), so I've waited with writing about the trip.

I stayed two nights in Curry Village, a camp site of permanent tent-cabins (complete with beds, but no heat) that are the cheapest option if you don't bring your own camping equipment and want to stay one or more nights in the valley. ("Cheap" of course being relative in a place like this; a night at the Ahwahnee Hotel will set you back $500, off season.) Since this was at the end of April - the 27th to 29th, more specifically - the nights could still be cold, but from what I could see on the web before the trip, it seemed like it would not be colder than 5-10 degrees centigrade. Well, it turned out that the 28th was a brilliant day followed by a clear, cold night, as can be seen from the graph here. My sleeping bag was not quite up to the task, and I can't remember ever shivering so much in the wee hours. Still no regrets, though.

One of the things about visiting Yosemite is that you get a whole load of instructions from pretty much everywhere about how to keep anything with a smell (food and other things) stored safely to avoid problems with bears breaking into cars, tents, etc. It's still not likely that you'll actually see a bear, but they can come roaming around the camp in the nighttime, they have an acute sense of smell, and have been known to even bend up car doors. So, on the day I arrived, I made sure to empty the car and put everything in the bear-safe locker, but there was a fair amount of activity around and it didn't really feel like any bears would ever come near the place. After getting settled, I walked over to the food pavilion to grab some dinner, and with the sun beginning to fade, I went for a stroll afterwards to see a bit of the place before dark. Following the edge of a part of the camp that had been blocked off due to some rock slides earlier, I walked for just a few minutes, and as I stopped and looked up among the trees and boulders, I suddenly realized that I was looking straight at a bear. Less than 100 meters away, I'm sure it was quite aware of me, but it didn't so much as dignify me with a look, not even when I took some pictures. It just slowly sauntered away, apparently circling the camp. A good start to the visit, I thought.

I didn't have any specific plan for the next day; I had simply brought my walking shoes and thought I'd improvise. Mainly, I expected to follow some simple trail around the valley floor, looking up at all the scenery. On a whim, though, I dropped by the hiking equipment store after breakfast and asked one of the guys there if he could recommend any particular trail and maybe sell me a map. Thanks to him, I decided to go on a route marked as "very strenuous", all the way to the top of the Upper Yosemite Fall, which he said was perfect for the season (spring being when the waterfalls are the most spectacular) and would give me superb views, much better than from the bottom of the valley. So with a map in hand, off I went towards the start of that trail, picking up some food and drink on the way. It was a brilliant day, and instead of trying to describe the hike here, I'll just point you to the pictures I took. I started out around 10 in the morning, was up on the top at midday, and got back down again at four, very tired but happy.

After the second, very cold night in Curry Village, I had to drive back to San Francisco again, but first I made a stop at one of the few places where you can see giant sequoia trees. They were quite impressive, but I think the feeling that remained with me was one of sadness - none of the surviving trees were nearly as grand as the ones you could see in the black-and-white photographs on the signs, with lumberjacks posing beside huge sequoias that they were in the process of chopping down.