Monday, February 22, 2010

Technology Review: NServiceBus

This week I'll do the first in the Technology Review series. In this article I'll give you an overview of an open source service bus framework for .Net called NServiceBus. The man behind this framework is Udi Dahan, a well-known name in the industry. You can find the framework at http://www.nservicebus.com/. So let's start with the intro.

1. Introduction to NServiceBus

Like I said, NServiceBus is an open source enterprise service bus implementation in .NET. If you're not familiar with the concepts around enterprise service bus, service bus architecture or service oriented architecture, I'll try to give you a brief introduction here. But I do encourage you to read up on these topics as this will be a very high-level description.

An enterprise service bus is an architecture that allows distributed message exchange between applications or services while keeping those services loosely coupled. It does this by providing facilities such as message routing, reliability and failover, and by supporting different message exchange patterns such as publish/subscribe and request/response.

The backbone of every service bus implementation is the communication/reliability infrastructure, i.e. the queuing infrastructure. NServiceBus is written for MSMQ, Microsoft's message queuing service on the Windows Server platform.

So, what do you do with this framework? In essence, you build communicating services on top of it. An example would be services in support of an online store such as ordering, billing, manufacturing, shipping etc.

2. Features

The core features of NServiceBus are the messaging capabilities. These include the publisher/subscriber and request/response message exchange patterns. Pub/sub support also allows you to manage the subscriptions and persist them in a built-in subscription store such as a database or MSMQ, or to implement your own store. Request/response support allows you to use addressing to route messages to specific recipients and to send responses to the originators of the request.

It also supports a long-running message exchange pattern called a saga. A saga is essentially a long-running, persisted, message exchange protocol between services. You can think of it as service orchestration.

The messages that are sent over the bus can be implemented as .NET classes or interfaces.

Besides the core features, NServiceBus also allows for high configurability, building and wiring up objects using a dependency injection framework from an XML config file.

Along with the framework come a few utilities to help you get started. These include the generic host process for message handlers and the distributor utility used for load balancing.

3. API

The central interface in the NServiceBus API is IBus. This is your entry point to start messaging over the bus. The IBus interface has methods that support all message exchange capabilities:
  • Publish - publishes a message on the bus
  • Send - sends a message to the destination
  • Reply - replies with a message to the sender
  • Subscribe - subscribes to a message
All of the methods on IBus are generic, parameterized by the message type.

The main interface for message handlers is IHandleMessages<T>, generic over the message type T. IHandleMessages<T> derives from IMessageHandler<T>, which has one method, Handle. Implementers perform message handling in this method.
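To make the publish/subscribe and handler ideas above concrete, here is a minimal in-process sketch in Python. It is not the NServiceBus API: ToyBus, OrderPlaced and BillingHandler are hypothetical names invented for this illustration, and a real bus would route messages through queues between processes rather than call handlers directly.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class OrderPlaced:
    """Hypothetical message type, used only for this illustration."""
    order_id: int


class ToyBus:
    """In-process stand-in for a bus: routes each message by its type."""

    def __init__(self):
        self._handlers = defaultdict(list)  # message type -> list of handlers

    def subscribe(self, message_type, handler):
        self._handlers[message_type].append(handler)

    def publish(self, message):
        # Deliver to every handler subscribed to this exact message type.
        for handler in self._handlers.get(type(message), []):
            handler(message)


class BillingHandler:
    """Plays the role of a message handler with a single handle method."""

    def __init__(self):
        self.billed = []

    def handle(self, message):
        self.billed.append(message.order_id)


bus = ToyBus()
billing = BillingHandler()
bus.subscribe(OrderPlaced, billing.handle)
bus.publish(OrderPlaced(order_id=42))
```

The point of the sketch is only the shape: publishers know the message type, not the recipients, and each handler exposes one method that receives the message.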

To start sending messages you need to get an instance of IBus. This is done with dependency injection. Based on the endpoint configuration in the config file, NServiceBus will build up an IBus object and inject it into your endpoint configuration object.

When it comes to configuring endpoints, you have a choice of a few built-in classes to derive from. AsA_Publisher, for instance, provides the endpoint configuration for a message publisher. The dependency injection is typically done based on the configuration in the XML config file. There you would configure the MSMQ transport properties, such as the name of the queue and the number of worker threads, as well as, for clients, the mapping of message types to queues.

When it comes to saga support, NServiceBus comes with a Saga base class, generic over the saga data type. The saga data type contains the data about the state of the long-running message exchange orchestration process.
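The idea of saga data, state that outlives any single message, can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the NServiceBus Saga class: OrderSagaData, OrderSaga and the two handler methods are hypothetical, and a plain dict stands in for whatever store persists the saga between messages.

```python
from dataclasses import dataclass


@dataclass
class OrderSagaData:
    """State that must survive between messages (hypothetical fields)."""
    order_id: int
    billed: bool = False
    shipped: bool = False
    completed: bool = False


class OrderSaga:
    """Toy saga: completes once both expected messages have arrived."""

    def __init__(self, store):
        self._store = store  # dict standing in for a saga persister

    def _load(self, order_id):
        # Find the saga data for this order, or start a new saga.
        return self._store.setdefault(order_id, OrderSagaData(order_id))

    def handle_billed(self, order_id):
        data = self._load(order_id)
        data.billed = True
        self._finish_if_done(data)

    def handle_shipped(self, order_id):
        data = self._load(order_id)
        data.shipped = True
        self._finish_if_done(data)

    def _finish_if_done(self, data):
        if data.billed and data.shipped:
            data.completed = True


store = {}
OrderSaga(store).handle_shipped(7)  # messages may arrive in any order
OrderSaga(store).handle_billed(7)   # a fresh saga object picks up the stored state
```

Each message is handled by a new OrderSaga object on purpose: the continuity comes from the stored data, not from an object kept in memory.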

4. Usability

This is where I have a slight issue with this framework. While the design is very clean and abstracted, the API takes getting used to. You definitely don't want to give this to a junior developer.

It relies heavily on dependency injection to bootstrap the core objects and configuration and it is not super-clear what's happening behind the scenes. For instance, you always get the IBus object from DI, and what it does internally is it scans the assembly for types that implement certain marker interfaces to perform the wire-up.

There's quite a bit of friction, too. For instance, the pub/sub example implements the endpoint configuration for a publisher by inheriting from AsA_Publisher, and implementing IConfigureThisEndpoint and ISpecifyMessageHandlerOrdering (the last one being optional and used to specify the ordering). IConfigureThisEndpoint is a marker, and it isn't very clear what its purpose is until you read the documentation.

Similar concerns go for the IWantToRunAtStartup interface. Implementers of this interface are created using DI and invoked on startup, provided they are hosted in the utility host process.

Another interesting thing in the design of the API in NServiceBus is the message handler interface IHandleMessages<T>. It defines no operations itself; it simply derives from IMessageHandler<T>, which defines the Handle operation.

The messages-as-interfaces feature is interesting and, I would argue, perhaps unnecessary. It allows you to define the message type as a .NET interface and use the CreateInstance method of the IBus interface to create it. CreateInstance doesn't allow for immutable message interfaces, as message properties are assigned either from an action passed in or after the fact.

The saga support is, again, a little weird. Using the base Saga class creates tight coupling with the framework. I would rather have seen a marker interface, with dependency injection providing the base implementation as an object.

5. Overall Rating

I'm not a huge fan of MSMQ. So, just based on that, I can't in good conscience give a super-high rating to this framework. Not that it has anything to do with the quality of NServiceBus. But in general, MSMQ as an enterprise queuing infrastructure is at least questionable.

On one hand, the wide range of message exchange patterns supported makes it a really good candidate for implementing a generic ESB solution. If you need a framework that can withstand change in your environment, be it the introduction of new services or new messaging patterns, it is well suited. On the other hand, all the little utilities that are supposed to make hosting solutions based on it easier actually, I feel, make manageability worse, so this should be a consideration too. Not to mention issues around OS-level security and MSMQ configuration.

There's somewhat of a learning curve with NServiceBus, especially if you're new to concepts such as dependency injection, reflection, extension methods, generics. From that standpoint it isn't too well suited for a small-scale Q&D type project.

Here are my scores for NServiceBus:

Features: 7/10
Quality: 8/10
Usability: 5/10

6. Conclusion

I hope this review helps in choosing the right solution for your application. Keep in mind the other considerations you need to make, like cost, risks, technology and architectural alignment etc.

Sunday, February 14, 2010

Using Transactions with .Net/SQL Server

I was recently debugging a piece of .Net code that called into a legacy stored procedure that managed its own transaction. Turns out the .Net code was recently changed to utilize the .Net transaction manager, more specifically TransactionScope. And this is where things started falling apart...

If you're working on the .Net/SQL Server technology stack, you basically have two choices when it comes to transactions: use the .Net transaction manager or use native SQL Server transactions (i.e. T-SQL). Both are valid options, each more suitable in certain scenarios.

Here's how the .Net transaction manager works. In the .Net transaction manager framework, the SQL Server data provider acts as a durable transacted resource manager. What this means is that the transaction manager can do a two-phase commit with it if it needs to. Now, the transaction manager generally has two modes of operation: the so-called LTM, or lightweight transaction manager, and a full-blown MSDTC-based distributed transaction manager. An LTM-based transaction is essentially a single-resource-manager transaction which can benefit from the optimization of a single-phase commit. The transaction manager also has a special mode in which it allows a resource manager to promote its transaction to a distributed transaction, but only when needed, that is, when more durable resource managers get enlisted.

The .Net transaction manager can be used in two ways: explicitly and implicitly. Explicit transaction management means creating the transaction object yourself, for instance with the SqlTransaction class in ADO.NET or CommittableTransaction in System.Transactions. When doing explicit transactions, the caller has to create the transaction and then commit it or roll it back. Implicit transaction management is done using the TransactionScope class. TransactionScope marks a block of code as part of a transaction. Depending on whether there is nesting (i.e. the method was itself called inside a transaction scope) or the ambient transaction was created in some other way, as well as on the options passed to the constructor, the transaction scope may create a new transaction in the transaction manager or attach to the existing one. The transaction manager then in turn works with the resource managers for all connections established from the transaction scopes to manage the native resource manager transactions. When more than one resource manager is used in a single scope, the transaction is promoted to a distributed transaction and a full two-phase commit protocol is used. When using a single connection to a database, no distributed transaction is created.

On the other hand, T-SQL based transaction management is done using the BEGIN TRAN, COMMIT and ROLLBACK statements. A very important note here: T-SQL based transaction management applies only to local SQL Server transactions. No real nesting exists between the .Net transaction manager and T-SQL based transactions!

So when do you use one and when do you use the other?

Generally, I recommend using implicit transactions with the .Net transaction manager. They provide a lot of flexibility in where the transaction is defined and hide all the complexity of managing a native resource manager transaction. However, there are cases when T-SQL transactions are required. Imagine that the same T-SQL code needs to be shared by two different applications written on different technology stacks. Bringing transaction management down to the database is the only way to ensure consistency, especially if one or more of the technology stacks don't support transactions.

A typical scenario in using the TransactionScope would be when implementing a business layer that consumes multiple DAOs from the data access layer to perform an atomic operation. A good example would be a component for processing banking withdrawals and deposits. Placing the transaction scope where the data access operations are invoked makes it clear what the purpose of that scope is, while at the same time allowing for the possibility that the whole business layer operation is composed into a larger transaction scope in another business component.

Distributed transactions should be avoided. The performance and reliability implications simply negate the benefits. Not to mention there's a dependency on MSDTC middleware.

Microsoft does not recommend combining T-SQL transaction management with .Net transaction management as it may lead to inconsistent results. One of the common problems is that a sproc written with a T-SQL transaction performs a ROLLBACK, effectively setting @@TRANCOUNT to 0, followed by a transaction scope attempting to perform another rollback and failing. Another typical problem would be a transaction in a sproc doing a commit while the transaction scope performs a rollback due to an application level error.

If you're going to write a sproc in T-SQL that manages its own transaction, here's a good way to do it:

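DECLARE @trancount INT
DECLARE @errmsg NVARCHAR(4000), @errseverity INT, @errstate INT

BEGIN TRY

  -- Remember whether the caller already started a transaction
  SET @trancount = @@TRANCOUNT

  IF @trancount = 0
    BEGIN TRANSACTION

  -- Perform some work

  IF @trancount = 0
    COMMIT TRANSACTION

END TRY

BEGIN CATCH

  -- Roll back only the transaction this procedure started itself
  IF XACT_STATE() <> 0 AND @trancount = 0
    ROLLBACK TRANSACTION

  -- Re-raise the original error to the caller
  SELECT @errmsg = ERROR_MESSAGE(), @errseverity = ERROR_SEVERITY(), @errstate = ERROR_STATE()
  RAISERROR(@errmsg, @errseverity, @errstate)

END CATCH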
 
The previous T-SQL code will only start and commit/roll back its own transaction if @@TRANCOUNT was equal to 0 upon entering it. If you call this sproc from .Net inside a transaction scope, @@TRANCOUNT will be 1 and the sproc will let the .Net transaction manager handle the transaction. If you call it directly without an outer transaction, it will manage its own transaction. Use this method only if you absolutely have to write T-SQL transactions. Otherwise, stick to .Net.

Saturday, February 6, 2010

Distributed Caching in Enterprise Applications

This week we're getting back to an architecture topic.

Caching is one of the most neglected architecture concerns in enterprise applications. Not surprisingly, most people will think of it simply as a cross-cutting concern, one which you're not to consider until late in the cycle as it's not really driven by business needs, drivers and strategy, but is rather a purely architectural consideration. Caching can however be a huge contributor to success on a project.

The main architectural drivers behind caching are the quality attributes of performance and scalability. How much attention you pay to these quality attributes will in large part depend on the architecture method you're applying as well as the maturity of your architecture organization. In organizations that have a more mature architecture method, quality attributes are a core part of both the architecture development and the architecture evaluation processes (see ATAM). Needless to say, those organizations follow a process that guarantees that risks of failing to satisfy these quality attributes are caught early and mitigated accordingly. This is where caching comes into play.

Caching is one of the general approaches (or solutions) to mitigate the risks of poor performance or scalability. But caching is a very broad term. Following is an overview of main architecture considerations around creating a technical solution for a cache in an enterprise application.

First of all, we can't always address both performance and scalability effectively. So, it's important to separate these two quality attributes and clearly identify them, or rather clearly identify the risks associated with them. The meaning of these two quality attributes is well known, but it doesn't hurt to remind yourself. Performance is about response time, or how fast your application can process requests. Scalability is about the ability of the application to grow in terms of key resources (such as users, concurrent requests, data), while maintaining satisfactory performance.

Some caching solutions will address performance, some scalability, and some both. Just keep that in mind and concentrate on the quality attributes which present the higher probability/impact risk in your application.

There are different types of caching. Most types are available on all technology stacks. Most types are equally well aligned with all architectural styles. So you have a few to choose from, no matter what technology you're working on.

By far the predominant type of caching in enterprise applications today is distributed caching. Distributed caching involves storing the cached data on a tier separate from your application tier. This offers several advantages: you're offloading the storage of the cached data to a separate tier, therefore leaving more resources in the application tier for your application; you're allowing for scale-out approaches in the cache tier independent of the application tier, therefore maximizing scalability while minimizing the cost; and so on.

When talking about cache organizations, two main types are in use: replicated and partitioned. Both have a very specific purpose. A replicated cache keeps copies of the same data on multiple servers, while a partitioned cache spreads the data across the cluster. Replicated makes more sense when you need to cache data that is mostly static and used frequently, and you want it cached very near the place where it's used, for instance in a process on the same server. On the other hand, most business data falls into the category of semi-volatile data that changes on occasion and is accessed on occasion. This type of data fits best in the partitioned distributed cache, where data is stored on a separate tier, equally accessible to all servers in the application tier. And then there are hybrid approaches or multi-tier caching, where data can move from one cache tier to the next depending on its use.

What are some of the solutions you should consider? The answer to that depends on the technology stack. Some stacks have solutions that fit naturally, for instance Oracle Coherence on the Java stack. If you're looking for commercial off-the-shelf (COTS) solutions, Microsoft is coming out with its own distributed cache server called Velocity. By far my favorite distributed cache solution is based on the open-source tool called memcached.

Memcached is an ultra-fast distributed hash implementation. It stores values as streams of bytes and is accessed over a TCP-based protocol. It's implemented on both Unix-like OSes and Windows and is very commonly used to implement a general purpose distributed cache.

Here's how a memcached-based distributed cache works: it's based on a two-level hash. All servers in the cache tier are organized into the first-level hash, commonly a consistent hash. All data within one memcached server is organized into the second-level hash. This way the entire cache cluster behaves like one big distributed hash. Communication is TCP-based so it's ultra-fast and it scales almost linearly (up to a certain point of course, when network resources become the bottleneck).

It's not perfect, though. It's missing some very important features other COTS distributed cache solutions provide. For instance, there's no inherent locking. While this promotes good performance, it also presents a challenge when working with scaled-out application clusters. Some type of locking logic typically needs to be implemented on top (at the application layer). Another disadvantage is that there is no built-in failover. When a memcached server goes down, that entire chunk of the cache is invalidated. There are known techniques to introduce backup capabilities to memcached by doubling the cache space. This can solve the failover issue, but it does require more resources.
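The two-level hash described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not memcached or any of its client libraries: HashRing and ToyCluster are hypothetical names, the server addresses are made up, and a plain dict stands in for each server's own in-memory hash.

```python
import bisect
import hashlib


def _hash(value):
    # Stable hash so every client maps a given key to the same ring position.
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """First-level hash: a consistent hash ring that picks a server per key."""

    def __init__(self, servers, replicas=100):
        self._replicas = replicas  # ring points per server, to even out the load
        self._ring = []            # sorted list of (point, server)
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self._replicas):
            bisect.insort(self._ring, (_hash(f"{server}#{i}"), server))

    def remove(self, server):
        self._ring = [entry for entry in self._ring if entry[1] != server]

    def server_for(self, key):
        if not self._ring:
            raise LookupError("no cache servers available")
        # First ring point at or after the key's hash, wrapping around the end.
        index = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[index % len(self._ring)][1]


class ToyCluster:
    """Ring on top, one dict per server as the second-level hash."""

    def __init__(self, servers):
        self._ring = HashRing(servers)
        self._stores = {server: {} for server in servers}

    def set(self, key, value):
        self._stores[self._ring.server_for(key)][key] = value

    def get(self, key):
        return self._stores[self._ring.server_for(key)].get(key)


ring = HashRing(["cache-a:11211", "cache-b:11211", "cache-c:11211"])
keys = [f"customer:{n}" for n in range(1000)]
before = {key: ring.server_for(key) for key in keys}
ring.remove("cache-b:11211")  # simulate one cache server going down
after = {key: ring.server_for(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
```

With a consistent hash, the only keys in moved are the ones that lived on the removed server. That matches the failover behavior described above: losing a server invalidates its chunk of the cache and leaves the rest in place.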

So, how do you use something like memcached? Again, it depends on the technology stack. Typically, support for caching would be implemented as a cross-cutting layer in your application: a facility that allows your application to utilize the cache both for business objects and for supporting data. If we're talking about application architecture, more specifically layered architecture, one would usually provide some kind of caching at the entity or data object level. Retrieving objects from the distributed cache rather than from the database can reduce the load on your database by an order of magnitude. It also may or may not improve performance, depending on how well your application performed to begin with.
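Entity-level caching of the kind described above is often written as a read-through (cache-aside) lookup. The Python sketch below shows one way it might look. Everything in it is hypothetical: EntityCache, DictCache, load_customer and the key format are invented for the example, and DictCache stands in for a real distributed cache client.

```python
class DictCache:
    """Stand-in for a distributed cache client with get/set/delete."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


class EntityCache:
    """Looks in the cache first and falls back to the database on a miss."""

    def __init__(self, cache, load_from_database, prefix):
        self._cache = cache
        self._load = load_from_database
        self._prefix = prefix

    def _key(self, entity_id):
        return f"{self._prefix}:{entity_id}"

    def get(self, entity_id):
        key = self._key(entity_id)
        entity = self._cache.get(key)
        if entity is None:
            entity = self._load(entity_id)  # cache miss: go to the database
            if entity is not None:
                self._cache.set(key, entity)
        return entity

    def invalidate(self, entity_id):
        # Call after an update so the next read reloads fresh data.
        self._cache.delete(self._key(entity_id))


db_calls = []


def load_customer(customer_id):
    # Hypothetical data access call; records each database hit.
    db_calls.append(customer_id)
    return {"id": customer_id, "name": "Ada"}


customers = EntityCache(DictCache(), load_customer, prefix="customer")
first = customers.get(1)   # miss: loads from the database
second = customers.get(1)  # hit: served from the cache
```

After the two reads, db_calls holds a single entry, which is where the reduction in database load comes from.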

This was just a brief overview of some of the architectural considerations involved in creating a cache solution. Which product/solution you apply is going to depend on many factors: organizational policies, cost, architectural alignment etc.