Tuesday, March 16, 2010

Value Proposition in Distributed Version Control Systems

Distributed version control systems have been around for a while, but they have only recently come into focus. They are becoming more and more mainstream and aim to completely replace centralized version control systems. Are we witnessing a major paradigm shift in source control?

So, what brought this on? What triggered the widespread adoption of distributed version control systems? What caused companies to abandon their mature, well-established processes and seek new solutions for managing their codebases?

The answer could be as simple as critical mass. We may very well be at the tipping point. As things stand today, it's a matter of time before centralized version control systems become obsolete. Linus Torvalds called the Subversion project "the most pointless project in history". He then went on to say that if you're using SVN you're "ugly and stupid". I wouldn't go that far, myself.

To understand where the need for a paradigm shift comes from, we need to look at the value proposition of distributed version control and the key business drivers behind its adoption.

The main difference between a distributed version control system and a centralized one is that there is no single central repository that everyone has to use. In fact, in a distributed version control system, each node can have a number of repositories. That means you can put a repository on your laptop, take it on the road, have full and very fast source control capabilities locally, and then synchronize with other repositories at will. You can do check-ins, logs and diffs locally, without being connected to a central repo. Another key differentiator is that, due to its distributed nature, you can perform non-linear version control (branching and merging) right in the local repo. This is a game-changer.
You really need to adapt your mindset to the new way of doing source control. The way of maintaining feature branches in a central repo is replaced by cloning, pulling/pushing and merging in a distributed version control system.
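
To make the clone, pull/push and merge cycle more concrete, here is a minimal toy model in Python. It is only a sketch of the idea and not how Git or Mercurial are implemented: the `Repo` class, its methods and the repository names are all made up for illustration, and a "commit" is just an id with a list of parents.

```python
class Repo:
    """Toy repository: commit id -> (parent ids, message). Hypothetical model."""

    def __init__(self, name):
        self.name = name
        self.commits = {}
        self.head = None
        self._next = 1

    def _add(self, parents, message):
        # Ids are prefixed with the repo name so they stay unique across clones.
        cid = f"{self.name}-{self._next}"
        self._next += 1
        self.commits[cid] = (tuple(parents), message)
        self.head = cid
        return cid

    def commit(self, message):
        # A check-in only touches local state; no central server is involved.
        parents = [self.head] if self.head else []
        return self._add(parents, message)

    def clone(self, name):
        # A clone carries the full history, not just a working copy.
        copy = Repo(name)
        copy.commits = dict(self.commits)
        copy.head = self.head
        return copy

    def ancestors(self, cid):
        # All commits reachable from cid, including cid itself.
        seen = set()
        stack = [cid] if cid else []
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.commits[current][0])
        return seen

    def pull(self, other):
        # Fetch every commit the other repo has, then reconcile the two heads.
        self.commits.update(other.commits)
        if other.head is None or other.head in self.ancestors(self.head):
            return self.head  # already up to date
        if self.head is None or self.head in self.ancestors(other.head):
            self.head = other.head  # fast-forward, no merge needed
            return self.head
        # Histories diverged: record a merge commit with two parents.
        return self._add([self.head, other.head], f"merge {other.name}")


origin = Repo("origin")
origin.commit("initial import")
laptop = origin.clone("laptop")
desktop = origin.clone("desktop")

laptop.commit("feature A")   # local check-in on the road
desktop.commit("feature B")  # independent local check-in elsewhere

laptop.pull(desktop)         # diverged histories, so this creates a merge commit
origin.pull(laptop)          # origin simply fast-forwards to the merged head
```

Note that `origin` is just another repository here. Nothing in the model makes it special, which is the point of the distributed approach.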

If we think of the key business drivers behind this shift, the main one is that larger and larger enterprises are becoming more agile. While agile has been around for a long time, it hasn't really become mainstream in larger enterprises or in highly regulated verticals, whether the regulation comes from legislation or from the industry itself. And for good reason: a high level of regulation means rigid organizational processes designed to ensure compliance. But even highly regulated industries are changing rapidly, and with that comes the need for businesses to respond by adapting those rigid processes. It makes sense, then, that centrally managed, strictly governed version control processes should give way to more flexible, agile-friendly distributed version control systems that offer far less friction and encourage feature-driven development.

So, how does Joe, chief decision-maker (CDM for short) of ACME Corp., decide to launch an initiative to investigate, evaluate, implement or adopt a distributed version control system?

He thinks in terms of short-term and long-term strategies. In the short term, Joe would like to marry the two in order to (a) make the transition from centralized to distributed as frictionless as possible, and (b) get the best of both worlds: centralized version control has worked well for a long time and he would hate to give up what it provides, but he also wants everything distributed has to offer. And Joe would be right. Keeping the transition in mind is key to short-term success. Luckily, most distributed version control systems in use offer integration with popular centralized version control systems. In the long term, however, Joe sees the need to replace the existing centralized version control system completely, in order to avoid the cost of managing two separate systems.

The best practice for piloting the adoption of a distributed VCS is to start by analyzing your source control workflow. Every company has its own process or workflow around source control. These workflows typically have things in common, such as branching strategies and release management, but each is tailored to the needs of the organization. So, look at how you do version control at your organization and try to envision how it could be implemented using a distributed VCS.
Once you have figured that part out, define the transitional process that includes both centralized and distributed version control. Here you would typically define how code from the centralized VCS is imported into the distributed VCS: when, from where, on which nodes, how often it is pushed back, by whom, and so on. For example, you could continue to create feature branches in SVN, import them on each node that needs to do check-ins, do distributed check-ins locally, and push them to the central repo once a day.
Once you have that in place, start to define the lateral processes, i.e. start connecting the nodes to each other more directly, in islands at first, without having to push to the central repo. A natural way to group nodes is by team or product. Try to assign a root node to each group. While root nodes go against the distributed paradigm, they help to manage the complexity of the network of repos during the transition. What you want to do long-term is let the nodes assume leadership naturally.
Eventually, your network will consist of isolated islands of fully distributed version control, with a central repo used for release management and the like. Some companies will stop here and continue to use this process. That's perfectly reasonable. Others will see the need to distribute the central repositories further, offloading them to other "significant nodes".
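
The island layout described above can be sketched as a small data structure. This is only an illustration under my own assumptions: the `Island` class, the `sync_route` function, the team and node names, and the rule that cross-island changes travel through root nodes and the central repo are all hypothetical, not features of any particular VCS.

```python
from dataclasses import dataclass, field


@dataclass
class Island:
    """A group of nodes (e.g. one team or product) with a designated root node."""
    name: str
    root: str
    members: set = field(default_factory=set)


def sync_route(islands, source, target, central="central"):
    """Return the repos a change passes through on its way from source to target."""
    # Map every node, roots included, to the island it belongs to.
    by_node = {}
    for island in islands:
        for node in island.members | {island.root}:
            by_node[node] = island

    if source == target:
        return [source]

    src, dst = by_node[source], by_node[target]
    if src is dst:
        # Lateral sync: nodes inside an island talk to each other directly.
        return [source, target]

    # Across islands, go up to the root, through central, and back down.
    route = [source]
    if source != src.root:
        route.append(src.root)
    route.append(central)
    if target != dst.root:
        route.append(dst.root)
    route.append(target)
    return route


islands = [
    Island("payments", root="pay-root", members={"alice", "bob"}),
    Island("search", root="search-root", members={"carol"}),
]

same_island = sync_route(islands, "alice", "bob")
cross_island = sync_route(islands, "alice", "carol")
from_root = sync_route(islands, "pay-root", "carol")
```

Writing the routes down like this, even on paper, makes it easier to see how much traffic still depends on the central repo and which root nodes could later be retired.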

Another thing you need to consider is which distributed VCS to use. You pretty much have two options: Git or Mercurial. These two are the most popular, and for good reason. I'm not going to go into the pros and cons of each; there are plenty of online resources on the topic. I will say that I have worked with both, and while Git is more powerful, it is also harder to adopt and to transition to from SVN, especially on Windows. Mercurial, on the other hand, works a little better out of the box and is easier to transition to, but is slightly less flexible. By all means, evaluate both.
