The Fool on the Hill: More on modelling rivers

I've been writing a lot about politics recently - and with reason. But now it's time to be getting back to writing about software, and, specifically, about river flows again.

Computed river map. Ignore the vegetation, it's run only a few generations and does not yet show natural patterns.

I wrote almost a year ago that I had had the first glimmer of success with modelling river flows. Well, some success was right, but not enough success. I didn't have a software framework in which I could model other things I wanted to model in my world, nor one with which I could play flexibly. I also - because I was working with maps of my fictional world, and not the real world - couldn't assess how well my algorithms were working, particularly as I had persistent diagonal artefacts.

So I decided to restart working in Clojure, using a cellular automaton, and with maps of Britain - maps I'm familiar with. I've written about that, too. From the point of view of modelling settlement, it's working well enough to be promising. So now I'm back to river flows, and once again I'm having 'the first glimmerings of success', and once again, frustrations. The picture above shows where I'm up to.

As you can see, I'm still using the 8Km granularity map. I can now actually compute water flows on the 1Km granularity map, but I haven't yet even tried to render them - it's a massive amount of data. But the 8Km map shows both that my algorithm partly works, and that it really does not work anything like well enough.

What works and what doesn't

What works? You can see the River Clyde from about Biggar down to the Falls of Clyde - but there, it disappears. Bits of the mouth of the Clyde are there, but not joined up. The lower Ayr is not only there, but identified as navigable. All of that is promising. I am getting some bits of some rivers showing up, and in the right place.

What doesn't work? Well, 8Km is a horribly wrong scale for doing this sort of thing. Even 1Km granularity is much too coarse, but attempting to map rivers on an 8Km scale... very few of our rivers have enough flow of water to show up. But worse, I haven't implemented 'laking' (as I did in the earlier, Java implementation). When a river arrives at a local low spot, it cannot flow up hill, so it just vanishes. At this coarse a granularity such local low spots are bound to exist. Actually, they will at any granularity - landscapes are fractal. So I need to rewrite my algorithm to deal with local lowspots - fill them up with water until it spills over and follows the rest of the valley down.

Visualisation of the problem. Numbers represent altitude, intensity of blue represents water flow.

This explains why my map of Britain shows so few rivers. The Spey is clearly visible from Aberlour down to Fochabers; the Tweed is a feature in Peebleshire; the Cree and the Esk are visible and recognised as potentially navigable. In England, the Tyne is entirely missing but the Tees is present and marked as navigable; a bit of the Ribble is recognisable as is a bit of the Mersey; tiny bits of the Thames show up around Oxford; and there's a river running north-south, west of London, that might be the Chess or the Misbourne. The Exe and the Tamar are also present. But several of the major rivers of Britain are not present at all, and many of those that are, are, like the Clyde, present only for parts of their course.

The Algorithm

So what is my algorithm? I had hoped, by this time, to publish all my source code so that you can see it. Unfortunately, when I went through the polite step of running this past my present employers, I hit a snag. Although this is - quite obviously - of no relevance to their business and of no interest to them, they claim that it is their secret, proprietary code. Which means that I and my present employers will part; but in the meantime (and possibly afterwards) I cannot publish this code.

I'm damned if I see why I should not publish a description of the algorithm, though.

Firstly, rainfall. I'm not (yet) trying to model the variability of rainfall. I could; last autumn's algorithm recognised that in middle latitudes the wind is primarily a coriolis wind, from the west. It carries damp air from the ocean to the west, and precipitates that rain as it cools - which it does as it rises over higher ground. So west facing slopes recieve more of the potential rainfall than east facing, highlands than low lands, western areas than eastern. We can verify those statements by looking at actual rainfall maps, and they're fairly easy to model. I have not done that yet; my present code just dumps one unit of rain on every cell of the map.

Then, from each cell, I compute the amount of rain which flows through that cell recursively by summing the flows through each of its neighbours which have a greater altitude. This is the simple-minded observation that water flows down hill. And, of course, it does. Obviously, again, the flow value at which I decide a river exists is arbitrary; for this map, I've shown a river where the flow through the cell is thirty units or above - which is to say, where the cell has collected the rainfall from at least thirty upstream cells.

There are two problems with this algorithm as it exists at present. When water in the real world hits a low point then - disregarding the amount lost to evaporation, which in Britain we can do, although in my fictional world I can't entirely - it builds up in a lake until it overflows the lip of the local hollow and continues. I had hoped that river valleys would show up sufficiently strongly that I could get away without laking, but sadly they don't. At the point the Tweed peters out on my map, the altitude of the last cell in which it is visible is 49 (arbitrary units - a monochrome heightmap gives me only 256 gradations of altitude, and sea level is 10). The altitude of the next cell down valley is 50. Given these are average elevations over an 8Km square, it's not in the least impossible that that's correct, but it stops the flow.

Worse still, the last recognised cell of the Ribble is at elevation 28. The next cell down valley - to which the river should flow - is also 28. The Clyde peters out at elevation 35, and the next cell down valley is also 35. It's inevitably the case that long rivers have sections in which they don't fall 5 metres in 8 kilometres, even disregarding the coarse granularity of my elevation data. But again, my present algorithm doesn't cope.

Parallelism and mutable data

There's a second problem, though. This one is not a modelling problem, it's a software framework problem. I'm working, now, in Clojure, and on the whole I find that extremely comfortable; it's a lovely language in which to write exploratory code. And I've been working with a framework which allows me to map a function across a two dimensional array, passing the function two arguments: the whole array, and the cell on which it's to operate. It returns a new copy of that cell, and the mapping operation constructs a new array from all the new cells. This has served me very well during my exploration of modelling environmental change and human settlement, because this process inherently allows great parallelism. On my eight core desktop machine, I can see 5 cores simultaneously maxed out processing a map.

But, the parallelism works because in Clojure data is immutable. In order to process the flow for each cell, I have to process, recursively, the flow in each cell upstream of it. But because the map is immutable, I can't cache those results on the map, so I can only return the one cell I was considering; all those partial results upstream are lost and must be recomputed when those cells are themselves considered. That's obviously wasteful, wasting all, and more, of the time I saved by paralellism. I haven't yet, however, worked out a practical way to pass those partial results round so that they don't need to be recomputed. That's an issue; it's part of the tradeoff of using immutable data, and also, possibly, part of the matter of learning better to use functional languages. But it isn't a blocker. While running this on a one kilometre resolution map is costly in time, it does run.

Laking

'Laking' can't easily be hacked onto my present algorithm. If you allow the algorithm to peak over hilltops as it works upstream, you have a halting problem. What is needed is a second pass which finds low spots where there is a flow of greater-than-river-flow value, and reworks from there.

One possible algorithm which I shall explore would then find the lowest adjoining cell and propagate the flow to that cell and so downstream. That isn't strictly laking, that's a minimal-uphill algorithm; but it would, I believe, more or less accurately model the river flows.

A slight variation on the minimal-uphill algorithm would be to lower the lowest adjoining cell until it is lower than the cell in which the flow has stopped; that's a slightly more natural solution, with a possible risk of creating gorges where there are no gorges. Again, that's something to explore.

The final workaround is true laking. Let the water level rise - in all cells with the same altitude - until a way out is found. What I worry about laking on my relatively coarse models is that I will construct enormous lakes in unlikely places. It's also likely to turn the whole of East Anglia and Lincolnshire into essentially one lake - but if one interprets these large areas of water on flat land as fen rather than lake, that's more or less right..

Progress is being - very slowly - made.

Mind you, if it takes me two years to work out how to map rivers, this huge game world - which I really know I'll never finish in my lifetime - looks even further out of reach.

The Fool on the Hill

Sunday, 28 September 2014

More on modelling rivers

What works and what doesn't

The Algorithm

Parallelism and mutable data

Laking

No comments: