Somebody is going to hate me: NoSPARQL
If you have recently attended one of those NoSQL/BigData conferences, you will have noticed that graph databases are usually treated as the eccentric cousin nobody understands. I believe there's some confusion about the role of GraphDBs within the NoSQL ecosystem, and I have tried to explain my view in this presentation.
First, GraphDBs are far from new. As Leonid Libkin put it in one of his recent talks, "Meet the new data model, same as the old data model": "In the (very) old days, the world of databases was a big mess, dominated by the network (graph) and the hierarchical (tree) data models. Then Codd came, and the nice and clean relational model replaced all others. In addition to providing a steady employment to many logicians, it created a $20.000.000.000/year business. We didn't live in that paradise for too long though: less than 30 years later, the world came back to the hierarchical model (XML). And graph-structured data hasn't been dormant all those years, although it was much less visible than the relational and XML models. Alberto Mendelzon was the first to revisit the graph model back in the 80s, and we saw more activity over the past 10-15 years."
Second, I believe you need to look at modern GraphDBs within their own ecosystem to fully understand them. Just for the sake of analogy, and admittedly lacking originality, I like to call this ecosystem NoSPARQL. Before today's GraphDBs such as Neo4j, OrientDB, DEX, InfiniteGraph, etc., we already had, and still have, a usable technology for handling graphs: triplestores. These databases are designed and built with a set-algebra mindset very similar to the one behind traditional relational databases. We have sets of triples, we join them, we put indices over them, and we write queries in SPARQL, a language that is essentially a dialect of SQL (meaning that you can translate SPARQL queries into SQL while preserving their semantics). And triplestores share the same drawbacks as relational databases.
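To make the "sets, joins, indices" mindset concrete, here is a minimal Python sketch of a toy in-memory triplestore. It is not how any particular triplestore is implemented: `TripleStore`, `pattern` and `hash_join` are hypothetical names, and a single dictionary keyed by predicate stands in for the real indices. Each triple pattern is answered from an index, and connecting two patterns through a shared variable is a join, just as it would be between two relational tables.

```python
from collections import defaultdict

class TripleStore:
    """Toy in-memory triplestore (hypothetical, for illustration): a set of
    (subject, predicate, object) triples plus one hash index keyed by predicate."""

    def __init__(self):
        self.triples = set()
        self.by_predicate = defaultdict(set)  # predicate -> {(subject, object)}

    def add(self, s, p, o):
        self.triples.add((s, p, o))
        self.by_predicate[p].add((s, o))

    def pattern(self, p):
        # Evaluate the triple pattern (?s, p, ?o) using the predicate index.
        return self.by_predicate.get(p, set())

def hash_join(left, right):
    # Join (a, b) pairs from `left` with (b, c) pairs from `right` on the shared b.
    by_b = defaultdict(list)
    for b, c in right:
        by_b[b].append(c)
    return {(a, b, c) for a, b in left for c in by_b.get(b, [])}

store = TripleStore()
store.add("alice", "knows", "bob")
store.add("bob", "knows", "carol")
store.add("bob", "worksAt", "acme")

# Roughly the SPARQL basic graph pattern: ?x :knows ?y . ?y :worksAt ?z
result = hash_join(store.pattern("knows"), store.pattern("worksAt"))
print(result)  # {('alice', 'bob', 'acme')}
```

Every additional hop in a path query adds another join, which is where the cost of this approach grows.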
Now, if we define the current NoSQL ecosystem as a group of technologies that avoid join-based operations and queries written in a declarative language, you will see that NoSPARQL is not such a bad name after all (at least as an analogy!). GraphDBs don't avoid relations; they embrace them in a way that stops them from being a computational problem, by making them explicit rather than reconstructing them implicitly through joins. To rephrase M. Rodriguez, graph databases are those databases that give efficient access to related data: moving from an element to its neighbours takes constant time, compared to the more expensive tree-based lookups (typically logarithmic in the size of the index) on which relational database indices are built. So: no joins at query time, but direct links at the storage level.
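The contrast between direct links and index lookups can be sketched in a few lines of Python, assuming an in-memory graph. `Vertex`, `link`, `neighbours_graph` and `EdgeTable` are hypothetical names, and a sorted list searched with `bisect` stands in for a relational B-tree index. In the pointer version, finding a vertex's neighbours just reads a list the vertex already holds; the table version first performs a binary search over all edges. Real graph databases store record offsets on disk rather than Python references, so treat this as an illustration of the access pattern only.

```python
import bisect

class Vertex:
    def __init__(self, name):
        self.name = name
        self.out = []  # direct references to adjacent Vertex objects

def link(a, b):
    a.out.append(b)

def neighbours_graph(v):
    # Graph-database style: follow stored links, no index lookup per hop.
    return [w.name for w in v.out]

class EdgeTable:
    # Relational style: a sorted (source, target) table acting like a tree index.
    def __init__(self, edges):
        self.rows = sorted(edges)

    def neighbours(self, src):
        # O(log n) binary search for the first row with this source, then scan.
        i = bisect.bisect_left(self.rows, (src,))
        out = []
        while i < len(self.rows) and self.rows[i][0] == src:
            out.append(self.rows[i][1])
            i += 1
        return out

alice, bob, carol = Vertex("alice"), Vertex("bob"), Vertex("carol")
link(alice, bob)
link(alice, carol)
link(bob, carol)
print(neighbours_graph(alice))  # ['bob', 'carol']

table = EdgeTable([("alice", "bob"), ("alice", "carol"), ("bob", "carol")])
print(table.neighbours("alice"))  # ['bob', 'carol']
```

Both return the same answer; the difference is that the per-hop cost of the first does not depend on how many edges the whole graph contains.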
They also provide a different way of accessing the data: a low-level API, analogous to get(key), put(key, value) and delete(key) in NoSQL datastores, instead of SPARQL. OK, it's still possible to query GraphDBs through SPARQL; after all, SPARQL is a declarative language for describing a graph traversal (the operation of exploring a graph), but it is not the only way.
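As a rough idea of what such a low-level API can look like, here is a minimal Python sketch of a property graph offering get/put/delete-style operations on vertices and edges, with the traversal written by hand instead of in a query language. The `Graph` class and its methods are hypothetical and are not the interface of Blueprints or of any specific GraphDB.

```python
import itertools

class Graph:
    """Hypothetical low-level property-graph API, for illustration only."""

    def __init__(self):
        self._ids = itertools.count()
        self.vertices = {}   # vertex id -> property dict
        self.out_edges = {}  # vertex id -> list of (label, target id)

    def add_vertex(self, **props):        # like put(key, value)
        vid = next(self._ids)
        self.vertices[vid] = dict(props)
        self.out_edges[vid] = []
        return vid

    def get_vertex(self, vid):            # like get(key)
        return self.vertices.get(vid)

    def add_edge(self, src, label, dst):
        self.out_edges[src].append((label, dst))

    def remove_vertex(self, vid):         # like delete(key)
        self.vertices.pop(vid, None)
        self.out_edges.pop(vid, None)
        # Also drop edges that point at the removed vertex.
        for edges in self.out_edges.values():
            edges[:] = [(l, d) for (l, d) in edges if d != vid]

    def out(self, vid, label):
        # Ids of vertices reachable from vid through edges with this label.
        return [d for (l, d) in self.out_edges.get(vid, []) if l == label]

g = Graph()
a = g.add_vertex(name="alice")
b = g.add_vertex(name="bob")
c = g.add_vertex(name="carol")
g.add_edge(a, "knows", b)
g.add_edge(b, "knows", c)

# Hand-written traversal: names of the people that alice's friends know.
fof = [g.get_vertex(v)["name"] for f in g.out(a, "knows") for v in g.out(f, "knows")]
print(fof)  # ['carol']
```

The traversal is ordinary code that follows edges step by step; nothing is joined at query time.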
TinkerPop is a community that is building and nurturing this ecosystem with:
- Blueprints: a JDBC for graph databases.
- Pipes: a dataflow framework using process graphs.
- Gremlin: a graph-based programming language.
- Rexster: a RESTful graph shell.
- And others.
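In the spirit of Pipes' dataflow approach, a traversal can be expressed as a chain of small, lazy processing steps. The sketch below imitates that style with Python generators; `out`, `keep`, `dedup` and `pipeline` are hypothetical names, and this is not the Pipes or Gremlin API, only an illustration of how a two-hop traversal reads as a sequence of steps rather than a join.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Stage = Callable[[Iterator[str]], Iterator[str]]  # one pipe: stream in, stream out

def out(graph: Dict[str, List[str]]) -> Stage:
    # Emit the out-neighbours of every vertex that flows in.
    def stage(vertices: Iterator[str]) -> Iterator[str]:
        for v in vertices:
            yield from graph.get(v, [])
    return stage

def keep(pred: Callable[[str], bool]) -> Stage:
    # Pass through only the elements that satisfy pred.
    def stage(items: Iterator[str]) -> Iterator[str]:
        for x in items:
            if pred(x):
                yield x
    return stage

def dedup() -> Stage:
    # Drop elements that have already been emitted.
    def stage(items: Iterator[str]) -> Iterator[str]:
        seen = set()
        for x in items:
            if x not in seen:
                seen.add(x)
                yield x
    return stage

def pipeline(*stages: Stage) -> Callable[[Iterable[str]], Iterator[str]]:
    # Chain stages lazily: each element flows through the chain on demand.
    def run(source: Iterable[str]) -> Iterator[str]:
        stream = iter(source)
        for stage in stages:
            stream = stage(stream)
        return stream
    return run

graph = {"alice": ["bob", "carol"], "bob": ["dave"], "carol": ["dave", "erin"]}
friends_of_friends = pipeline(out(graph), out(graph), dedup())
print(list(friends_of_friends(["alice"])))  # ['dave', 'erin']
```

Because every stage is a generator, no intermediate result set is materialized; elements are pulled through the chain one at a time.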
Go and check them out.