Saturday, January 9, 2010

The Broken Promise of ORM Frameworks

How many times have you heard the words: "NHibernate writes better SQL than I do"?

I've heard those words a number of times, mostly from people who pride themselves on being masters of domain-driven design. So what's wrong with that statement?

It's actually pretty accurate. People who would say something like that probably do write bad SQL, so it is very likely that a framework would write better SQL than them. But there's an often overlooked fact hiding in that statement, one that demands architectural consideration. And that is, not only is NHibernate writing SQL for the developer, it's doing it at runtime. For an architect, that's a game-changer.

There really is no such thing as "better SQL". Seemingly well-written statements, including those written by NHibernate, can take down the database server if the schema is not optimized for them. So saying that a framework writes better SQL than a developer is more often an excuse not to look at query plans. Query plan analysis is a core competency of developers; some would say only database developers, but I'd say all developers. Every developer needs to be able to recognize a poorly optimized schema when they see a key lookup or index scan operator, let alone a table scan or an unnecessary join.
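For what it's worth, pulling a plan doesn't require special tooling. Here's a minimal C# sketch for SQL Server that asks for the actual execution plan as an extra XML result set; the connection string, the Customers table and its columns are illustrative assumptions, not part of any real schema:

    // Sketch: fetch the actual execution plan for a query from SQL Server.
    // The connection string and the Customers table are assumptions for illustration.
    using System;
    using System.Data.SqlClient;

    class PlanDump
    {
        static void Main()
        {
            using (var connection = new SqlConnection("Server=.;Database=Shop;Integrated Security=true"))
            {
                connection.Open();

                // Ask SQL Server to append the actual plan as an extra XML result set.
                using (var enablePlan = new SqlCommand("SET STATISTICS XML ON", connection))
                    enablePlan.ExecuteNonQuery();

                using (var query = new SqlCommand(
                    "SELECT Id, Name FROM Customers WHERE Name LIKE @name", connection))
                {
                    query.Parameters.AddWithValue("@name", "%smith%");

                    using (var reader = query.ExecuteReader())
                    {
                        while (reader.Read()) { /* consume the rows */ }

                        // The next result set is the showplan XML; scan it for key lookups,
                        // index scans, table scans and the like.
                        if (reader.NextResult() && reader.Read())
                            Console.WriteLine(reader.GetString(0));
                    }
                }
            }
        }
    }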

But ask yourself: how can a developer look at query plans if they don't know which SQL queries execute? Why wouldn't they know? Well, if they let NHibernate generate them at runtime, they have no way of telling at build time which SQL queries will execute. They would have to trace the execution at runtime, capture the SQL, and then figure out whether it's optimal. While that can catch every error, there's a fundamental problem with the process: SQL code review is typically done by people with specific competencies, and those people cost more and there are generally fewer of them. So having to trace the execution of applications just to figure out which statements execute isn't very efficient, especially on large-scale development projects.
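If you do have to trace at runtime, NHibernate at least gives you a hook for it. The sketch below uses an interceptor to collect every statement the session prepares; the Customer entity, its mapping and the session factory configuration are assumed to exist elsewhere:

    // Sketch: capture the SQL NHibernate generates at runtime via an interceptor.
    // The Customer mapping and session factory configuration are assumed.
    using System.Collections.Generic;
    using NHibernate;
    using NHibernate.SqlCommand;

    public class Customer
    {
        public virtual long Id { get; set; }
        public virtual string Name { get; set; }
    }

    public class SqlCaptureInterceptor : EmptyInterceptor
    {
        public readonly IList<string> Statements = new List<string>();

        // Called for every SQL statement NHibernate is about to prepare.
        public override SqlString OnPrepareStatement(SqlString sql)
        {
            Statements.Add(sql.ToString());
            return sql;
        }
    }

    public static class SqlCapture
    {
        public static IList<string> StatementsFor(ISessionFactory sessionFactory, long customerId)
        {
            var interceptor = new SqlCaptureInterceptor();

            using (var session = sessionFactory.OpenSession(interceptor))
            {
                // Trigger whatever data access you want to review.
                session.Get<Customer>(customerId);
            }

            // Hand the captured statements to whoever reviews the query plans.
            return interceptor.Statements;
        }
    }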

It all comes down to quality and risk. To ensure high quality, a project should follow a well-established process, and whether industry-standard or company-specific, that process will surely involve a review of the performance of data access components. I haven't yet seen a process that stays efficient and precise when the target keeps moving. And runtime SQL generation is a moving target from the code review perspective.

But there's another point here. Just because a framework like NHibernate generates SQL at runtime doesn't mean it prevents people from creating ad-hoc queries. In the case of NHibernate, those queries are expressed using its object query syntax. Ad-hoc SQL is bad in general, mostly because it violates the principle of predictability: a database is only designed as well as the SQL that runs against it is optimized for that design. Running arbitrary SQL against a statically built database can lead to undesirable consequences. That represents high-impact risk, and it becomes high-probability risk when the architecture doesn't allow for an efficient quality review process to mitigate it.
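To make that concrete, here's the kind of ad-hoc query the object query syntax makes trivially easy; the Customer entity from the earlier sketch is assumed, and the leading wildcard all but guarantees a scan that nobody ever reviewed:

    // Sketch: an ad-hoc criteria query, compiled to SQL at runtime with no review step.
    using System.Collections.Generic;
    using NHibernate;
    using NHibernate.Criterion;

    public static class AdHocQueries
    {
        public static IList<Customer> FindByNameFragment(ISession session, string fragment)
        {
            // The criteria are assembled on the fly; the leading wildcard
            // typically forces an index or table scan on the Name column.
            return session.CreateCriteria<Customer>()
                .Add(Restrictions.Like("Name", "%" + fragment + "%"))
                .AddOrder(Order.Asc("Name"))
                .List<Customer>();
        }
    }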

So, is all SQL generation bad? Absolutely not. But there needs to be a way to guarantee the optimal performance of the database. Certain types of SQL statements are "safe", mostly the ones involved in object graph traversal. Everything else needs to be statically compiled. SQL statements outside of the optimized and approved set should simply not be allowed.
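One way to keep that set closed, sketched below, is to only ever execute named queries defined up front in the mapping files, so the SQL they produce can be captured and its plans approved once. The query name, the mapping snippet and the Customer entity are illustrative assumptions:

    // Sketch: restricting execution to a fixed, pre-approved set of named queries.
    //
    //   <!-- assumed to live in Customer.hbm.xml -->
    //   <query name="Customer.ByName">
    //     from Customer c where c.Name = :name
    //   </query>
    using System.Collections.Generic;
    using NHibernate;

    public static class ApprovedQueries
    {
        public static IList<Customer> ByName(ISession session, string name)
        {
            // GetNamedQuery throws if the name was not registered at configuration time,
            // so nothing outside the approved set can slip through here.
            return session.GetNamedQuery("Customer.ByName")
                .SetString("name", name)
                .List<Customer>();
        }
    }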

However, most ORM frameworks violate this rule. They do, in fact, support programmable query construction, entity retrieval based on ad-hoc criteria, and so on. They are simply overly generic and flexible. There's nothing wrong with the ORM pattern; it is the most common data access pattern in use today. It's just used poorly. ORM is not, as some people think, a way to work with the database without writing a line of SQL.

So, what can we do to address these concerns?

When used a certain way, ORM frameworks can be very powerful. They can cut development time, improve consistency and therefore quality, and promote clean design, being themselves designed around SOLID principles.

Here are a few tips on how to avoid the pitfalls:
  1. Keep it constrained to the data access layer. Use it as a data access layer, not a domain.
  2. Limit ad-hoc queries. Concentrate on where the payoff is highest: boilerplate object graph traversal. Write more complicated SQL by hand.
  3. Review the query plans for every SQL statement that can execute by writing integration tests targeted at the data access methods. Try to achieve near 100% coverage.
  4. Don't forget to build domains on top of the data access. Apply SOLID principles and prevent people from calling into the data access layer directly by making it internal (see the sketch after this list).
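As a rough illustration of points 1 and 4, here's a sketch in which NHibernate lives behind an internal repository and only the domain-facing service is public; every type and member name here is an assumption made for the example, with the Customer entity from the earlier sketches:

    // Sketch: NHibernate stays inside an internal repository; the rest of the
    // application only sees the public service. Names are illustrative assumptions.
    using NHibernate;

    namespace Shop.DataAccess
    {
        // Internal: code outside this assembly cannot reach NHibernate or the session.
        internal class CustomerRepository
        {
            private readonly ISession session;

            public CustomerRepository(ISession session)
            {
                this.session = session;
            }

            public Customer GetById(long id)
            {
                // Plain object graph traversal: the "safe" kind of generated SQL.
                return session.Get<Customer>(id);
            }
        }

        // Public surface of the layer: callers depend on this, not on NHibernate.
        public class CustomerService
        {
            private readonly ISessionFactory sessionFactory;

            public CustomerService(ISessionFactory sessionFactory)
            {
                this.sessionFactory = sessionFactory;
            }

            public string GetCustomerName(long id)
            {
                using (var session = sessionFactory.OpenSession())
                {
                    var customer = new CustomerRepository(session).GetById(id);
                    return customer == null ? null : customer.Name;
                }
            }
        }
    }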
One last thing: if you're considering dynamic runtime schema optimization, don't. It's a migration nightmare, and no project manager in their right mind would allow it.
