Google's Pregel paper is finally out. But what is Pregel? It has been mentioned in many posts about NoSQL, GraphDBs, Big Data, even the Facebook OpenGraph, so it looks like, apart from the hype buzz, there's a little confusion about what it is, what it does, what it's good for and what it certainly is not. Let's start with the latter.
Google Pregel is not a database (neither RDBMS nor NoSQL), nor a key-value store, nor any new means of storing data (big or small as it might be). Putting it in the same list with GraphDBs like Neo4j, HyperGraphDB or even Twitter's FlockDB is somewhat like putting MapReduce in the NoSQL group.
GraphDBs are storage systems that use a graph representation for data: each node represents an entity with a unique id, a type and properties, while an arc represents a relationship between two nodes and can itself have a type and properties. Think of a GraphDB as an RDBMS where instead of tables you have a graph. It's the Semantic Web's triple stores brought to general-purpose use.
Why would you use a GraphDB? Well, since you can describe your data in terms of entities and relationships, you avoid defining a schema as we know it. It's a step towards the schema-less representation of data that actual NoSQL systems provide.
Informally, you can describe your data with ER diagrams without translating them into tables, keeping the representation dynamic and avoiding the costs of schema redefinition in RDBMSs. Plus, it's very efficient and easy to write queries that get, for example, all the followers of a user, all the users he follows, the items related to him (tweets he wrote?) and maybe the users connected to his items (users retweeting or replying to his tweets?) in one go. That's basically what Twitter wants to achieve with FlockDB, but with a general GraphDB you can describe your data and the relationships between your data as you wish.
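To make the follower example concrete, here's a toy Python sketch of the idea: entities as nodes, typed relationships as labeled arcs, queries as traversals. The names and the in-memory list of triples are invented for illustration and don't reflect any actual GraphDB's API.

```python
# Relationships as (source, type, target) triples -- the graph itself.
edges = [
    ("alice", "follows",   "bob"),
    ("carol", "follows",   "bob"),
    ("bob",   "wrote",     "tweet42"),
    ("carol", "retweeted", "tweet42"),
]

def neighbors(node, rel, reverse=False):
    """All nodes linked to `node` by an arc of type `rel`, in either direction."""
    if reverse:
        return [s for s, r, t in edges if r == rel and t == node]
    return [t for s, r, t in edges if r == rel and s == node]

# The composite question from the text, answered by chained traversals:
followers = neighbors("bob", "follows", reverse=True)   # who follows bob
tweets = neighbors("bob", "wrote")                      # items related to him
engaged = [u for t in tweets                            # users tied to his items
           for u in neighbors(t, "retweeted", reverse=True)]
```

A real GraphDB indexes these traversals so they stay fast, but the query shape is the same.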
You store data in a GraphDB and you recall it in an easy and efficient way. So what's Pregel good for? What if you want to mine the data in the graph (e.g. Google's PageRank, Facebook's social network analysis, Twitter's retweeting/authority analysis)? Google reports that 80% of their distributed computation is based on MapReduce (Google Maps, search indexing, clustering in Google News, reports for Google Trends, Google Translate etc.), so we can only guess that the remaining 20% is based on Pregel, and the authors report they can work with graphs of billions of vertices. Plus, implementing PageRank takes just about 15 lines of code...
That's what Pregel is for. So what is it? Pregel is a system for large-scale graph processing. It provides a fault-tolerant framework for the execution of graph algorithms in parallel over many machines. Think of it as MapReduce re-thought for graph operations.
But what's wrong with MapReduce and graph algorithms? Nothing in particular, though it can lead to suboptimal performance, because the graph state has to be passed from one phase to the next, generating a lot of I/O. It also has some usability issues, as it doesn't provide a natural way to do per-vertex computation: in general, it's not easy to express graph algorithms in M/R. Pregel fills a gap, as there were no frameworks for graph processing that address both distribution and fault-tolerance.
Pregel's architecture is inspired by the Bulk Synchronous Parallel (BSP) model introduced by Valiant. BSP is a computational model for the execution of parallel algorithms on top of multiple sequential Von Neumann machines. It provides an abstraction, just like M/R, that allows the programmer to think about the parallel expression of his solution without the hassle of communication and memory allocation in a distributed system. Before we get into the details, I think two things have to be underlined.
First, again like M/R, although the model is used by Google to distribute computation among multiple computers, that's not a requirement: in principle BSP fits parallel programming on SMP or NUMA machines and mainframes just as well.
Second, although the model is used by Google to distribute graph processing, BSP can be used to distribute other kinds of algorithms, like matrix manipulation, just like M/R.
Ok, how does BSP work? I'll take the diagram and snippet from Wikipedia's BSP page:
“A BSP computer consists of processors connected by a communication network. Each processor has a fast local memory, and may follow different threads of computation.
A BSP computation proceeds in a series of global supersteps. A superstep consists of three ordered stages:
- Concurrent computation: Several computations take place on every participating processor. Each process only uses values stored in the local memory of the processor. The computations are independent in the sense that they occur asynchronously of all the others.
- Communication: At this stage, the processes exchange data between themselves.
- Barrier synchronisation: When a process reaches this point (the barrier), it waits until all other processes have finished their communication actions.
The figure below shows this in a diagrammatic form. The processes are not regarded as having a particular linear order (from left to right or otherwise), and may be mapped to processors in any way.”
Ok, basically at every superstep every processor executes the same algorithm on its data: its state and the incoming messages. At superstep t, every processor works on its state, which is the result of its computation at superstep t-1, and on the messages sent to it at superstep t-1. As a result of the computation at superstep t, the processor sends messages to other processors, and these messages will be the incoming messages at superstep t+1. And the cycle goes on. The barrier synchronisation is the moment when t becomes t+1.
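The cycle just described can be sketched in a few lines of Python. This is a toy single-process simulation of one superstep (all names are mine, not any BSP library's): each "processor" is a function of its state and inbox, and the barrier is implicit in the fact that messages sent at superstep t are only visible at t+1. Here three processors gossip the maximum value they've seen.

```python
def bsp_superstep(states, inboxes, compute):
    """Run one superstep: every processor computes on (state, inbox),
    and its outgoing messages are delivered only at the next superstep."""
    outboxes = {p: [] for p in states}
    for p in states:
        states[p], sent = compute(p, states[p], inboxes[p])
        for q, msg in sent:
            outboxes[q].append(msg)   # buffered until after the barrier
    return states, outboxes

# A small line topology: 0 -- 1 -- 2.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
states = {0: 5, 1: 1, 2: 9}
inboxes = {p: [] for p in states}

def compute(p, state, inbox):
    # state at t is built from the state and messages of t-1
    new_state = max([state] + inbox)
    return new_state, [(q, new_state) for q in neighbors[p]]

for _ in range(3):   # three supersteps: enough for 9 to reach every processor
    states, inboxes = bsp_superstep(states, inboxes, compute)
```

After three supersteps every processor's state is 9, because the maximum needed two hops to travel from processor 2 to processor 0, plus the initial round where inboxes were empty.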
It is easy to see that each computation should take approximately the same amount of time, otherwise a long-lasting computation will force all the others to sit idle.
How does Pregel implement BSP? Quoting Pregel's original paper: “The input to a Pregel computation is a directed graph in which each vertex is uniquely identiﬁed by a string vertex identiﬁer. Each vertex is associated with a modiﬁable, user deﬁned value. The directed edges are associated with their source vertices, and each edge consists of a modiﬁable, user deﬁned value and a target vertex identiﬁer.
A typical Pregel computation consists of input, when the graph is initialized, followed by a sequence of supersteps separated by global synchronization points until the algorithm terminates, and ﬁnishing with output.
Within each superstep the vertices compute in parallel, each executing the same user-deﬁned function that expresses the logic of a given algorithm. A vertex can modify its state or that of its outgoing edges, receive messages sent to it in the previous superstep, send messages to other vertices (to be received in the next superstep), or even mutate the topology of the graph. Edges are not ﬁrst-class citizens in this model, having no associated computation.
Algorithm termination is based on every vertex voting to halt. In superstep 0, every vertex is in the active state; all active vertices participate in the computation of any given superstep. A vertex deactivates itself by voting to halt. This means that the vertex has no further work to do unless triggered externally, and the Pregel framework will not execute that vertex in subsequent supersteps unless it receives a message. If reactivated by a message, a vertex must explicitly deactivate itself again. The algorithm as a whole terminates when all vertices are simultaneously inactive and there are no messages in transit.”
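The vote-to-halt protocol from the quote can be sketched as a minimal single-process simulation (all names invented for illustration). Here each vertex propagates the maximum value it has seen: a vertex votes to halt when a superstep leaves its value unchanged, an incoming message reactivates it, and the whole thing terminates exactly when no vertex is active and no message is in flight.

```python
graph = {0: [1], 1: [0, 2], 2: [1]}   # adjacency: vertex -> out-neighbours
value = {0: 3, 1: 6, 2: 2}
active = set(graph)                   # superstep 0: every vertex is active
inbox = {v: [] for v in graph}
superstep = 0

while active or any(inbox.values()):
    # a halted vertex is reactivated by the messages addressed to it
    running = active | {v for v in graph if inbox[v]}
    outbox = {v: [] for v in graph}
    next_active = set()
    for v in running:
        new = max([value[v]] + inbox[v])
        if new > value[v] or superstep == 0:
            value[v] = new
            for w in graph[v]:
                outbox[w].append(new)
            next_active.add(v)
        # otherwise the vertex votes to halt by dropping out of the active set
    inbox, active, superstep = outbox, next_active, superstep + 1
```

When the loop exits, every vertex holds the global maximum (6) and no messages are in transit, which is precisely the paper's termination condition.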
The mapping between BSP and Pregel is quite simple: each local computation of BSP maps to the user-defined function in Pregel, and the communication often, but not necessarily, follows the edge connectivity between nodes. The barrier synchronisation maps to Pregel's global synchronization points between supersteps, while termination is determined by the vote-to-halt mechanism of the active nodes.
From the perspective of the API, Pregel requires the implementation of the virtual Compute() method of the Vertex class. The Vertex class itself provides VoteToHalt(), SendMessageTo(), GetValue() and GetOutEdgeIterator(), plus the const methods superstep() and vertex_id().
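To give an idea of how this API feels, here is a rough Python transliteration. The method names come from the paper; the single-process runner and the toy graph around them are my own invention, standing in for the distributed runtime. The PageRank Compute() follows the spirit of the paper's ~15-line example.

```python
NUM_VERTICES = 3   # size of the toy graph below

class Vertex:
    """Python sketch of the paper's C++ vertex API (runner is invented)."""
    def __init__(self, vid, value, out_edges):
        self._id, self._value, self._out = vid, value, list(out_edges)
        self._halted, self._outbox, self._step = False, [], 0
    def GetValue(self): return self._value
    def GetOutEdgeIterator(self): return iter(self._out)
    def SendMessageTo(self, dest, msg): self._outbox.append((dest, msg))
    def VoteToHalt(self): self._halted = True
    def superstep(self): return self._step      # const in the C++ API
    def vertex_id(self): return self._id        # const in the C++ API
    def Compute(self, messages):                # the one method users override
        raise NotImplementedError

class PageRankVertex(Vertex):
    def Compute(self, messages):
        if self.superstep() >= 1:
            self._value = 0.15 / NUM_VERTICES + 0.85 * sum(messages)
        if self.superstep() < 30:               # fixed iteration count, as in the paper
            n = len(self._out)
            for dest in self.GetOutEdgeIterator():
                self.SendMessageTo(dest, self.GetValue() / n)
        else:
            self.VoteToHalt()

def run(vertices):
    """Single-process superstep loop standing in for the distributed runtime."""
    inbox = {vid: [] for vid in vertices}
    while any(not v._halted for v in vertices.values()) or any(inbox.values()):
        outbox = {vid: [] for vid in vertices}
        for vid, vx in vertices.items():
            if not vx._halted or inbox[vid]:
                vx._halted = False              # a message reactivates a vertex
                vx.Compute(inbox[vid])
                for dest, msg in vx._outbox:
                    outbox[dest].append(msg)
                vx._outbox = []
            vx._step += 1
        inbox = outbox
    return {vid: vx.GetValue() for vid, vx in vertices.items()}

edges = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = run({vid: PageRankVertex(vid, 1.0 / NUM_VERTICES, out)
             for vid, out in edges.items()})
```

Note how the user code only ever sees one vertex at a time; all the parallelism, message routing and synchronization live in the framework.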
Like M/R, it provides the possibility to define Combiners, which reduce message-passing overhead by combining messages together where semantically possible. Like Sawzall, Pregel provides Aggregators, which allow global communication by receiving values from multiple vertices, combining them and sending the result back to all the vertices. They are useful for statistics (think of a histogram of vertex degrees) or for global control (for example, an aggregator can collect all the vertices' PageRank deltas to evaluate the convergence condition).
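Both ideas fit in a few lines of Python (names invented, single-process): a combiner collapses all the messages headed to one vertex into a single message when the operation allows it, say a partial sum for PageRank contributions, while an aggregator reduces one value per vertex into a single global value.

```python
def combine(outbox, combiner):
    """Fold every per-destination message list into one message before shipping."""
    return {dest: [combiner(msgs)] for dest, msgs in outbox.items() if msgs}

# Three PageRank contributions bound for v1 become one partial sum.
outbox = {"v1": [0.25, 0.25, 0.5], "v2": [1.0], "v3": []}
combined = combine(outbox, sum)

def aggregate(per_vertex_values, op, initial):
    """Reduce one value contributed by each vertex into a global value."""
    acc = initial
    for x in per_vertex_values:
        acc = op(acc, x)
    return acc

# e.g. the largest PageRank delta this superstep, to test convergence globally
max_delta = aggregate([0.01, 0.004, 0.0007], max, 0.0)
```

The combiner only works because summing is associative and commutative; a combiner that needed to see individual messages would be semantically wrong.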
From the perspective of the architecture, Pregel follows a master/worker design, like most of the other Google frameworks. The master is responsible for partitioning the graph with a hash function based on the vertex ID (like hash(ID) mod #partitions, although a topology-aware partitioner might minimize communication between workers by keeping messages intra-machine), but it doesn't process any partition itself. At the beginning of the computation, the workers register with the master.
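The default scheme is easy to picture. In the sketch below, FNV-1a is my stand-in for whatever hash function the real system uses; the point is only that the assignment is deterministic, so every worker can compute where any vertex lives without asking the master.

```python
def partition_for(vertex_id, num_partitions):
    # FNV-1a: a stable string hash, so the assignment is reproducible
    h = 2166136261
    for byte in vertex_id.encode():
        h = ((h ^ byte) * 16777619) & 0xFFFFFFFF
    return h % num_partitions   # the paper's hash(ID) mod #partitions

# Assign six vertices to four workers.
workers = {w: [] for w in range(4)}
for vid in ["v0", "v1", "v2", "v3", "v4", "v5"]:
    workers[partition_for(vid, 4)].append(vid)
```

A topology-aware partitioner would replace partition_for() with something that co-locates densely connected vertices, at the cost of storing the mapping explicitly.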
Once the graph is partitioned and the partitions are assigned to the workers, the master issues the start of the superstep. Each worker loops through all of its active vertices, calling the Compute() method and delivering the messages collected in the previous superstep. The new messages are delivered before the end of the superstep, right before the worker tells the master the list of vertices that will be active in the next superstep.
After the computation halts, the master may ask the workers to dump their graph partitions to disk.
At the moment there are no other projects that handle graph computation at this scale. Parallel BGL and CGMlib can handle parallel processing on graphs, but they don't scale to this size and aren't fault-tolerant. Hama, an Apache incubator project, aims at developing a model similar to Pregel, but it's not complete yet.
So, where do I go now?