Thursday 19 April 2018

Mastering Revision Control

This morning, browsing Twitter as one does, I came across this tweet:
I initially assumed that the tweet was meant satirically, but it seems it was not so. After a wee bit of discussion, someone asked me to explain why this is a really bad idea:
So I shall.

Alice and Bob are working together on the same software project. Each makes a copy of the project's files in their local filestore, which these days tends to be on their own machine. Indeed, they usually make a clone of the whole repository on their local system, and this is a good thing - but for the present argument it's a detail.

Alice makes changes in two files, foo.src and bar.src; she saves these locally, runs tests locally, checks they pass, and commits her changes to revision control. The continuous integration system pulls the commit, builds it, runs the tests, verifies they pass, and all's good.

Bob makes changes to two files, bar.src and ban.src; he saves locally, runs tests locally, checks they pass, and tries to commit to revision control. And of course he can't immediately because Alice has already changed bar.src, so he has to pull Alice's changes, do a merge, fix the resulting issues and rerun the tests.
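For anyone who wants to see the mechanics, here is a rough sketch of why Bob is turned away. It is a toy model in Python, not Git itself: a branch is just a list of commit ids, the names (push, pull_and_merge, the commit ids) are made up for illustration, and a real merge produces a graph rather than the flat list used here.

```python
# Toy model, not Git: a branch is a list of commit ids, oldest first.

def is_fast_forward(remote, local):
    """A push is safe when the server's history is a prefix of the local one."""
    return local[:len(remote)] == remote

def push(remote, local):
    """Return the new server history, or refuse if commits would be discarded."""
    if not is_fast_forward(remote, local):
        raise RuntimeError("rejected: pull and merge first")
    return list(local)

def pull_and_merge(remote, local, merge_id):
    """Simplified merge: server commits, then local-only commits, then a merge commit."""
    local_only = [c for c in local if c not in remote]
    return list(remote) + local_only + [merge_id]

base = ["c0"]
alice = base + ["alice-foo-bar"]
bob = base + ["bob-bar-ban"]

server = push(base, alice)      # Alice gets there first: accepted

try:
    push(server, bob)           # Bob's history lacks Alice's commit
    bob_rejected = False
except RuntimeError:
    bob_rejected = True

bob = pull_and_merge(server, bob, "merge-alice-into-bob")
server = push(server, bob)      # now it extends the server history: accepted
```

The check is deliberately crude, but it captures the rule that matters: the server only accepts history that builds on what it already has.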

Now, that's the game. That's what we all do when working on collaborative teams. We try not to work simultaneously on the same bit of the system; we try not to trample on other people's work; but it happens. So Bob only grumbles a bit.

But it's late in the evening, Bob's tired. It's time to knock off. What to do? Well, in any sane shop, he's working in a feature branch so he pushes his feature branch up to the server, and goes home. But in this shop, management has dictated that they will ignore the last thirty years of software practice and experience and do everything in master. So Bob commits his work in progress to his local clone of the repository, and goes home.

Morning dawns bright and early, and Bob's in work raring to go. He powers up his machine, and... nothing. His local hard disk has died. Doesn't matter, all his work's in... oh.

That's a day's work lost.
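The difference between the two shops can be sketched in a few lines. Again this is a toy model in Python rather than Git: a repository is a dictionary from branch names to lists of commit ids, and the helper names and the branch name feature/ban are my own inventions for the purpose.

```python
# Toy model, not Git: a repository maps branch names to lists of commit ids.

def push_branch(local, server, branch):
    """Copy one branch from a local clone to the server (hypothetical helper)."""
    server[branch] = list(local[branch])

def recoverable(server, commit_id):
    """True if the commit can still be found somewhere on the server."""
    return any(commit_id in history for history in server.values())

# Master-only shop: Bob's unfinished merge stays in his local clone.
server_a = {"master": ["c0", "alice-foo-bar"]}
bob_a = {"master": ["c0", "alice-foo-bar", "bob-wip"]}
# Nothing is pushed; overnight the disk dies and bob_a is gone.
lost = not recoverable(server_a, "bob-wip")

# Feature-branch shop: the same work goes up on its own branch before he leaves.
server_b = {"master": ["c0", "alice-foo-bar"]}
bob_b = {"master": ["c0", "alice-foo-bar"], "feature/ban": ["c0", "bob-wip"]}
push_branch(bob_b, server_b, "feature/ban")
saved = recoverable(server_b, "bob-wip")

print(lost, saved)
```

In the second case master on the server is untouched, which is exactly why pushing half-finished work there is harmless.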

Meantime, Clarice is working in another part of the codebase, on a problem where the most efficient implementation isn't obvious. She builds an implementation, and it passes the tests, but she's not convinced it's optimal. So she commits her implementation to revision control. She then starts working on an alternative solution, completes it, and it passes all the tests, too. She wants to commit this to revision control, too, but management have dictated that everything shall be done in master, so she can't. Only one version or the other can be current.

So what does she do? Overwrite her first solution? Abandon her second solution?

Yes, of course she can commit her second over her first - the point of revision control is that you don't lose stuff - but in practice that means a decision is taken to prefer the second solution, because the nature of the growing edge of a software project is that a commit from several commits back on a branch is not going to be promoted to the head of the branch unless you have very serious breakage.
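To make Clarice's problem concrete, here is a small sketch of what 'current' means in each case. It is a toy model in Python, not Git: branches are lists of commit ids, and the branch and commit names are hypothetical.

```python
# Toy model, not Git: branch name -> list of commit ids, oldest first.

def create_branch(branches, name, start):
    """Start a new branch at the current head of an existing one."""
    branches[name] = list(branches[start])

def commit(branches, name, commit_id):
    """Append a commit to the named branch."""
    branches[name].append(commit_id)

def heads(branches):
    """The latest commit on every branch: the versions that are 'current'."""
    return {name: history[-1] for name, history in branches.items()}

# Master only: the second solution lands on top of the first.
master_only = {"master": ["c0"]}
commit(master_only, "master", "solution-a")
commit(master_only, "master", "solution-b")

# One branch per alternative: both stay current until someone chooses.
branched = {"master": ["c0"]}
for name, solution in [("feature/solution-a", "solution-a"),
                       ("feature/solution-b", "solution-b")]:
    create_branch(branched, name, "master")
    commit(branched, name, solution)

print(heads(master_only))
print(heads(branched))
```

With master only, the first solution is still in the history but is no longer the head of anything; with a branch per alternative, both are heads and the choice between them can be put off until there is evidence.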

Any software development which cannot be wholly automated involves design decisions and uncertainty. The cost to the developer of being experimental - of trying one idea, seeing under what circumstances it works well, seeing under what it works poorly, and trying another - needs to be as low as possible. If management puts needless difficulty in the way, that's bad management.

Software development time is expensive. Practices which cause work to be lost are highly undesirable. It needs to be as easy as possible to make sure work is not lost.

Git - and other modern revision control systems - are the product of decades of hard-won experience. What we've learned over those decades is that branching is good. Feature branches are especially good. They prevent trampling over other people's changes, and reduce stress and conflict in the team. They make it very easy to track which features are in the build. They make committing and pushing low-stress, low-cost activities.

Of course it should be the responsibility of every developer to ensure all tests pass before merging a feature into develop, but developers are human and sometimes things go wrong. We always need a branch which we know is production ready code, from which the production system can be rebuilt in case of failure. That's what, conventionally, the master branch is; and the whole point of having a develop branch is to have a branch into which all changes are merged, which can then be tested to demonstrate that it is production ready before merging into master.

Creative Commons Licence
The fool on the hill by Simon Brooke is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License