Friday, May 6, 2011

My Beef with Databases (Part 2)

This is the second in a series of blog posts about my beef with relational databases today, and some musings on how we could make databases better.

I have a bone to pick with the databases of today. It has developed because of the frustrations I've had in writing, deploying, and interacting with databases over the years. In fact, I have three specific issues I want to talk about, and perhaps, come up with a few ideas to make databases great.

Last time, I described my number one issue, the code separation that databases enforce. That is the biggest issue primarily because the other two are related. So let's review my second biggest beef with databases as they exist today.

My data definition is not in my repository

Which, to a point, isn't completely accurate assessment of the situation. The definition of my data is in two places. Two very different, very disjointed places. And there is no mechanism for the two to know that the definition is different. This issue is exaggerated in the presence of ORMs or other tools that need to or want to understand the structure of the data in the database and depend on the programmer to describe the data definition.

Then the question becomes who is the authority on the data definition. On the theoretical side, the database is in charge of the definition and structure of the database. On the other side, the database is a tool for the application so the definition and structure should stay in the application. I'm a firm believer that is the proper case. My application should define all behaviors, structures and definitions.

When our applications connect to the database, both assumes that they are talking about the same data model. The reality is that often that is not the case. What's even worse is when the database is just subtly different, which means things will appear to happen correctly when it's not, potentially even corrupting data.

I can appreciate that this is a difficult problem in practice. That doesn't mean the problem should be ignored. Quite the opposite. My data is vital, which is why I use a database and not some mish-mash of files and /dev/null. I could accept the issue if I could use always use create scripts and not be forced to use a create script and then a series of alter scripts. Or, maybe even I could give it a create script, and it tells me what's different. That'd also be cool. But that doesn't happen. No one, to my knowledge, does that.

As I was describing this issue to my friend, he asks about object or document data stores, like what GAE uses or MongoDB. These things are cool, and to a degree they solve this issue, but in the general sense, they solve it by not enforcing data structure, making data integrity your responsibility. And that's entirely true, data integrity IS my responsibility. But I think that without the right tools, that responsibility becomes overly difficult and cumbersome. And when things are difficult, programmers are among the first to become lazy and ignore the problem. Ignore it until something goes wrong. And it will eventually go wrong.

I want to ask, why can't my application define my database?

No comments:

Post a Comment