10 things never to do with a relational database

The data explosion demands new solutions, yet the hoary old RDBMS still rules. Here's where you really shouldn't use it.

I am a NoSQLer and a big data guy. That's a nice coincidence, because as you may have heard, data growth is out of control

Old habits die hard. The relational DBMS still reigns supreme. But even if you're a dyed-in-the-wool, Oracle-loving, PL/SQL-slinging glutton for the medieval RAC, think twice, think many times, before using your beloved technology for the following tasks.

[ If you aren't going to use an RDBMS, which freaking database should you use? | See InfoWorld's comparative review of NoSQL databases. | Keep up with the latest developer news with InfoWorld's Developer World newsletter. ]

1. Search: Even the most dedicated Oracle shops tend not to use Oracle Text, the extension Oracle bought for its database but doesn't seem to develop very actively. Instead, you see a lot of people using complicated queries that are heavy on like and or operators. The results for these are ugly and the capabilities are weak -- and the processes for getting the data just the way Oracle needs it are tough. Outside of Oracle, many other RDBMS products don't have real search extensions.

Use the likes of Hibernate Search, Apache Solr, or even Autonomy. Do it for the performance of a better-fitting index. Do it for the capabilities of full-text search.

2. Recommendations: This was the ugliest part of ATG Commerce and other commerce products I've worked with. They capture a lot of data about the user from which they try to make recommendations. Where I've worked, the recommendations capability was almost always turned off for scalability reasons.

Consider the social network. If I want to recommend socks to you because your friend or your friend's friend bought socks, that gets ugly in the RDBMS. We're talking self-joined tables and multiple levels of querying. This is like two lines of code in a graph database like Neo4j. You can work around the RDBMS by pre-flattening social networks and doing odd manipulations to the data, but you'll lose its real-time nature.

3. High-frequency trading: You would think that trading systems would love the RDBMS because the data is at least in part transactional, right? Wrong. High-frequency traders were among the first people to adopt and, in some cases, create NoSQL approaches. Low latency is king for HFT. Sure, if you jumped through serious hoops, you can achieve low latency with your RDBMS, but it really wasn't designed for it.

Oracle tried to answer this by buying TimesTen, which attempts to combine an in-memory database with an RDBMS, but if you staple a goose to a truck you don't get an airplane. Instead, we see the HFT crowd use key-value stores like Riak or more complex solutions like Gemfire.

4. Product cataloguing: This isn't the exciting stuff you hear about, but one of the first nightmare SQL queries I ever wrote was for mapping product data. When I worked for a mobile telephone manufacturer, this was for cellphones -- except "Model XYZ" could mean several different actual phones, and any one of those went under different names in different markets. The same model could have completely different components. Managing these "classes" of devices didn't flatten very well. This was very much the kind of thing for which I could have used a graph database like Neo4j.

I had a very similar problem when I worked for a chemical company. We did some very dumb string mapping that was pretty laborious. Had we kept the product information in a graph database, it would have been simple to map. Even a document database like CouchBase 2.0 or MongoDB would have been nicer.

5. Users/groups and ACLs: To some degree, LDAP was the original NoSQL database. LDAP was designed for users, groups, and ACLs, and it can fit the problem like a glove. Sadly, many people have LDAP hangovers from when the technology was newer, and companies did terrible and monstrous things with it. Some companies have also built such a bureaucracy around it that many developers had no choice but to cheat by creating a database table. This defeats the purpose of centralized user access control. The "users" and "roles" tables should go away in any enterprise environment.

6. Log analysis: If you need a good demonstration of this, turn on the log analysis features of Hadoop or RHQ/JBossON for a small cluster of servers. Set the log level and log capture to anything other than ERROR. Do something more complex and life will be very bad. See, this kind of somewhat unstructured data analysis is exactly what MapReduce à la Hadoop and languages like PIG are for. It's unfortunate that the major monitoring tools are RDBMS-specific -- they really don't need transactions, and low latency is job No. 1.

7. Media repository: It may be OK to store your metadata (though it probably would be better in a document database like Couchbase 2.0 or MongoDB), but BLOBs in an RDBMS are still a pain after all of these years. You're better off using some kind of distributed storage or clustered file system for your images and other binaries. Sadly, many CMS engines still shove everything into an RDBMS.

8. Email: I know this firsthand. After running a project that attempted to integrate email and an RDBMS, I discovered what many others already knew: Email really is moderately unstructured data with metadata that is best stored another way. We optimized the RDBMS as much as possible, doing crazy things for BLOBs and more. Ultimately, email is about metadata, search, and content, none of which lend themselves to relational algebra, and you really don't need transactions here. The file system is fine, and metadata would be better off in document database.

9. Classified ads: A high-scale, lots of users in and out, with mostly short-and-sweet content. Ask Craigslist who uses the document database MongoDB. There's search, there's metadata, there's short-sweet content. Eventual consistency would be good enough here. For these kinds of documents, the best thing a database can do is get out of your way.

10. Time-series/forecasting: This is the most general of the 10, but it takes many forms, from commodities to quants and sunspots to weather. The issues surrounding time in relational databases are the stuff of legend. Sure, it has been done, and sure, after years of hacking around it, for the last decade or so we have temporal fields and functions that are merely deficient rather than woefully inadequate in most RDBMS implementations. That said, if time is your subject, then a MapReduce-friendly column family store like Cassandra may be a better solution. Datastax has specifically targeted its Cassandra distribution to support time-series data, as have other vendors.

Can you use the RDBMS for some or many of these? Sure -- I have and people continue to. However, is it a good fit? Not really. I expect the cranky old men to disagree, but tradition alone is not a good reason to stick with the old way of doing things.

This article, "10 things never to do with a relational database," was originally published at InfoWorld.com. Read more of Andrew C. Oliver's Strategic Developer blog, and keep up on the latest developments in application development at InfoWorld.com For the latest business technology news, follow InfoWorld.com on Twitter.

This story, "10 things never to do with a relational database " was originally published by InfoWorld.