Sunday, March 21, 2010

Technology Review: Architecture Analysis in Visual Studio 2010

I recently had a chance to review the Visual Studio 2010 RC. I was mostly interested in the new architecture analysis capabilities that come in the form of the Architecture Explorer and dependency graph support.

I started by downloading the Visual Studio 2010 RC virtual machine, available from Microsoft. Note that the extracted archive is around 30GB and that the virtual machine needs at least 1.5GB of memory to run smoothly, so keep that in mind before giving it a try.

The download also comes with a couple of hands-on labs. I was particularly interested in the ones that cover dependency graph generation and the Architecture Explorer.

The starting point for using dependency graphs is the generator, available from the Architecture menu. You can generate graphs by assembly, namespace, type, or method, as well as specify visibility options. The generated graph is in DGML (Directed Graph Markup Language) form. While I wasn't too crazy about the graphics of the generated diagrams (they could use some polish), I was quite pleasantly surprised with the features of the DGML viewer.

I found the different diagram layouts (top-down, bottom-up, butterfly, clustering) particularly useful. Clustering organizes the graph around the nodes with the highest number of adjacent edges, which gives you a quick and easy way to identify "significant" or "critical" nodes. Butterfly organizes the nodes around a selected node into those with incoming edges and those with outgoing edges, which is particularly useful for spotting dependencies that point in the wrong direction in the architecture. I also liked the neighborhood mode, where you can start from a node and traverse only up to a selected number of levels deep.

On the other hand, I found hovering over an edge until the two arrows and a plus sign show up a little clunky. I would have liked to see a mode where you select two adjacent nodes and the viewer automatically selects the edge that connects them and shows some kind of icon to access the context menu. Still, it proved useful for creating new diagrams by concentrating on the details of the connected nodes. The grouping feature in the graph viewer works well, although, like I said, the graphics could use some polish. It allows you to show or hide detail by expanding and collapsing groups such as assemblies, classes, etc.
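Since DGML is just XML, you can also poke at a generated graph programmatically. Here's a minimal sketch that ranks nodes by the number of adjacent edges, which is essentially what the clustering layout highlights visually. The file name MyApp.dgml is hypothetical, and I'm assuming the standard DGML namespace that Visual Studio 2010 emits:

```python
# Rank DGML nodes by degree -- a rough stand-in for what the
# clustering layout does visually. "MyApp.dgml" is a hypothetical
# file exported from the dependency graph generator.
from collections import Counter
import xml.etree.ElementTree as ET

# Default namespace used by Visual Studio 2010 DGML files (assumption)
NS = "{http://schemas.microsoft.com/vs/2009/dgml}"

tree = ET.parse("MyApp.dgml")
degree = Counter()

# Every <Link> contributes one edge to both of its endpoints
for link in tree.getroot().iter(NS + "Link"):
    degree[link.get("Source")] += 1
    degree[link.get("Target")] += 1

# The highest-degree nodes are the "significant" or "critical" ones
for node_id, count in degree.most_common(10):
    print("%4d  %s" % (count, node_id))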

The Architecture Explorer feature complements the graph viewer well. It's a tool that lets you explore the solution and even external assemblies, traversing various relationships between objects: fetching classes, references, files, references to types, building call graphs, etc. The end result of the traversal is a path that you can simply plot as a graph in the graph viewer. Again, my biggest complaint is the user interface. It seems they tried to stuff a gigantic piece of UI into a panel, completely unnecessarily. Furthermore, it features auto-expanding and collapsing vertical areas that are only used for filtering the next node in the traversal, so the whole thing resembles an accordion. Traversal goes from left to right, with each step pushing the UI further to the left to make room for the next node, with that awkward vertical expand/collapse filter area between each step. In a word: a disaster. Whatever happened to not cluttering the user interface? I would like to have seen a top-down UI where each step collapses like in MS Outlook and the filter stays in one spot, something I could dock to the left side of the window and show or hide when needed. Left to right just doesn't make any sense.

So, how useful are these features? If you're doing a lot of static code analysis, somewhat. There have been other tools on the market for a long time, but it's still nice to see a tool that ships with Visual Studio. It's by no means a match for something like ReSharper, but it can come in handy for a quick architecture review. That said, that's about the extent of its usefulness. I would love to have seen refactoring support built right into the graph viewer. Reversing the direction of dependencies is perhaps one of the most common and most important refactoring operations, and it seems like the logical next step for Microsoft.

Tuesday, March 16, 2010

Value Proposition in Distributed Version Control Systems

While distributed version control systems have been around for a while, they haven't really come into focus until recently. It seems they're becoming more and more mainstream, threatening to completely replace centralized version control systems. Are we, then, witnessing a major paradigm shift in source control?

So, what brought this on? What triggered the widespread adoption of distributed version control systems? What caused companies to abandon their mature, well-established processes and seek new solutions for managing their codebases?

The answer could be as simple as critical mass. We may very well be at the tipping point. As it stands today, it's only a matter of time before all centralized version control systems become obsolete. Linus Torvalds called the Subversion project "the most pointless project in history" and went on to say that if you're using SVN you're "ugly and stupid". I wouldn't go that far myself.

To understand where the need for a paradigm shift is coming from, we need to look at the value proposition of distributed version control and the key business drivers behind its adoption.

The main difference between a distributed version control system and a centralized one is the absence of a single central repository used by everyone. In fact, in a distributed version control system, each node can have any number of repositories. That means you can put a repository on your laptop, take it on the road, have full and fast source control capabilities locally, and then synchronize with other repositories at will. You can do check-ins, logs, and diffs entirely locally, without ever connecting to a central repo. Another key differentiator is that, due to its distributed nature, you now have the ability to perform non-linear version control right in the local repo. This is a game-changer.
You really need to adapt your mindset to this new way of doing source control. Maintaining feature branches in a central repo is replaced by cloning, pulling/pushing, and merging in a distributed version control system.
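To make the disconnected workflow concrete, here's a rough sketch driving Git from Python. The repository URL, directory, and branch names are hypothetical; the point is that everything between the clone and the final push happens with no central server involved:

```python
# Sketch of a fully local, disconnected workflow using Git.
# The URL and names below are hypothetical placeholders.
import subprocess

def git(*args, cwd="myproject"):
    subprocess.run(["git"] + list(args), cwd=cwd, check=True)

# One-time: a clone gives you the *entire* repository, history and all
subprocess.run(["git", "clone", "https://example.com/myproject.git"],
               check=True)

# Everything below works offline -- no central repo involved
git("checkout", "-b", "feature-x")           # non-linear work in a local branch
git("commit", "--allow-empty", "-m", "WIP")  # a local check-in
git("log", "--oneline")                      # local history
git("diff", "HEAD~1")                        # local diff

# Synchronize with another repository only when you choose to
git("push", "origin", "feature-x")
```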

If we think about the key business drivers behind this shift, we're mostly talking about larger and larger enterprises becoming increasingly agile. While agile has been around for a long time, it hasn't really become mainstream in larger enterprises or in highly regulated verticals, whether the regulation is legislative or industry-driven. And for good reason: a high level of regulation means a rigid organizational process designed to ensure compliance. But even highly regulated industries are changing ever more rapidly, and with that comes the need for businesses to adapt their rigid organizational processes. It makes all the sense in the world, then, that centrally managed, strictly governed version control processes are being abandoned in favor of more flexible, agile-friendly, distributed version control systems that offer far less friction and encourage feature-driven development.

So, how does Joe, chief decision-maker (CDM for short) at ACME Corp., decide to launch an initiative to investigate, evaluate, and adopt a distributed version control system?

He thinks in terms of short-term vs. long-term strategies. Short-term, Joe would like to somehow marry the two approaches in order to (a) make the transition from centralized to distributed frictionless, and (b) get the benefits of both worlds: centralized has worked well for so long that he would hate to give up the great things it provides, but he would really like all the good stuff distributed has to offer. And Joe would be right; keeping the transition in mind is key for short-term success. Luckily, most distributed version control systems in use today offer integration with popular centralized version control systems. Long-term, however, Joe sees the need to completely replace the existing centralized version control system in order to avoid the cost of managing two completely separate systems.

Best practice for piloting the adoption of a distributed VCS starts with an analysis of your source control workflow. Every company has its own process or workflow around source control. These typically share commonalities, like branching strategies, release management, etc., but each is also tailored to the needs of the organization. So, look at how you do version control at your organization and try to envision how it could be implemented using a distributed VCS.
Once you've figured that part out, define a transitional process that includes both centralized and distributed version control. Here you would typically define how code from the centralized VCS is imported into the distributed VCS: when, from where, to which nodes, how often it is pushed back, by whom, etc. An example: continue to create feature branches in SVN, import them on each node that needs to do check-ins, do local distributed check-ins, and push them back to the centralized repo once a day.
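A minimal sketch of that daily sync, using git-svn as the bridge between the two worlds. The SVN URL and directory are hypothetical; adjust them to your own branch layout:

```python
# Daily SVN <-> Git sync sketch using git-svn.
# The URL and directory below are hypothetical placeholders.
import subprocess

SVN_URL = "https://svn.example.com/acme/branches/feature-x"  # hypothetical
WORKDIR = "feature-x"

def run(*cmd, cwd=None):
    subprocess.run(list(cmd), cwd=cwd, check=True)

# One-time: import the SVN feature branch into a local Git repo
run("git", "svn", "clone", SVN_URL, WORKDIR)

# During the day: normal local, distributed check-ins (git commit ...)

# Once a day: pull down new SVN revisions, replay local work on top,
# then push the local commits back to the central SVN repo
run("git", "svn", "rebase", cwd=WORKDIR)
run("git", "svn", "dcommit", cwd=WORKDIR)
```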
Once you have that in place, start to define the lateral processes, i.e., start connecting the nodes more directly, in islands at first, without having to push through the central repo. A natural way to group nodes is by team or product. Try to assign a root node to each group. While root nodes go against the distributed paradigm, they help manage the complexity of the network of repos while transitioning. What you want to do long-term is let nodes assume leadership naturally.
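In Git terms, forming an island is just a matter of wiring up remotes so the team can sync without touching the central repo. A sketch, with purely hypothetical remote names and URLs:

```python
# Sketch: wiring a developer clone to a team "root node" and a
# teammate so the island can sync laterally. All names and URLs
# below are hypothetical.
import subprocess

TEAM_REMOTES = {
    "team-root": "ssh://buildbox.example.com/repos/product-a.git",
    "alice":     "ssh://alice-laptop.example.com/repos/product-a.git",
}

for name, url in TEAM_REMOTES.items():
    subprocess.run(["git", "remote", "add", name, url], check=True)

# Lateral sync: pull a teammate's work directly...
subprocess.run(["git", "pull", "alice", "feature-x"], check=True)
# ...and publish to the team's root node, not the central repo
subprocess.run(["git", "push", "team-root", "feature-x"], check=True)
```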
Eventually, your network will consist of isolated islands of fully distributed version control, with the central repo used for release management and the like. Some companies will stop here and continue to use this process, which is perfectly reasonable. Others will see the need to distribute the central repositories further, offloading them to other "significant nodes".

Another thing you need to consider is which distributed VCS to use. You pretty much have two options: Git or Mercurial. These two are the most popular, and for good reason. I'm not going to go into the pros and cons of each; there are plenty of online resources on the topic. I will say that I have worked with both, and while Git is more powerful, it is also harder to adopt and transition to from SVN, especially on Windows. Mercurial, on the other hand, works a little better out of the box but is slightly less flexible, and it is also easier to transition to. By all means, evaluate both.