Distributed Messaging with ZeroMQ

“A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.” -Leslie Lamport

With the increased prevalence and accessibility of cloud computing, distributed systems architecture has largely supplanted more monolithic constructs. The implication of adopting a service-oriented architecture, of course, is that you now have to deal with a myriad of difficulties that previously never existed, such as fault tolerance, availability, and horizontal scaling. Another interesting layer of complexity is providing consistency across nodes, a problem that remains the subject of endless research. Algorithms like Paxos and Raft attempt to provide solutions for managing replicated data, while other solutions offer eventual consistency.

Building scalable, distributed systems is not a trivial feat, but it pales in comparison to building real-time systems of a similar nature. Distributed architecture is a reasonably well-understood problem, and the fact is that most applications have a high tolerance for latency. Few systems have a demonstrable need for real-time communication, but the few that do present an interesting challenge for developers. In this article, I explore the use of ZeroMQ to approach the problem of distributed, real-time messaging in a scalable manner while also considering the notion of eventual consistency.

The Intelligent Transport Layer

ZeroMQ is a high-performance asynchronous messaging library written in C++. It’s not a dedicated message broker but rather an embeddable concurrency framework with support for direct and fan-out endpoint connections over a variety of transports. ZeroMQ implements a number of different communication patterns like request-reply, pub-sub, and push-pull through TCP, PGM (multicast), in-process, and inter-process channels. The glaring lack of UDP support is, more or less, by design because ZeroMQ was conceived to provide guaranteed-ish delivery of atomic messages. The library makes no actual guarantee of delivery, but it does make a best effort. What ZeroMQ does guarantee, however, is that you will never receive a partial message, and messages will be received in order. This is important because UDP’s performance gains really only manifest themselves in lossy or congested environments.
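
To give a feel for the API, here is a minimal pub-sub sketch using the Python binding, pyzmq (ZeroMQ itself is a C++ library with bindings for most languages); the port and topic are arbitrary:

    import time
    import zmq

    context = zmq.Context()

    # Publisher: fans messages out to every connected subscriber over TCP.
    publisher = context.socket(zmq.PUB)
    publisher.bind("tcp://*:5556")

    # Subscriber: filters messages by topic prefix.
    subscriber = context.socket(zmq.SUB)
    subscriber.connect("tcp://localhost:5556")
    subscriber.setsockopt(zmq.SUBSCRIBE, b"prices")

    time.sleep(0.5)  # let the subscriber connect (the "slow joiner" caveat)

    publisher.send_multipart([b"prices", b"AAPL 524.81"])
    topic, body = subscriber.recv_multipart()  # always a complete message, in order
    print(topic, body)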

The comprehensive list of messaging patterns and transports alone makes ZeroMQ an appealing choice for building distributed applications, but it particularly excels due to its reliability, scalability, and high throughput. ZeroMQ and related technologies are popular within high-frequency trading, where packet loss of financial data is often unacceptable (iMatix, the company behind ZeroMQ, was responsible for moving JPMorgan Chase and the Dow Jones Industrial Average trading platforms to OpenAMQ). In 2011, CERN performed a study comparing CORBA, Ice, Thrift, ZeroMQ, and several other protocols for use in its particle accelerators and ranked ZeroMQ the highest.

[Figure: CERN middleware evaluation, ranking ZeroMQ highest]

ZeroMQ uses some tricks that allow it to outperform raw TCP sockets in terms of throughput, such as intelligent message batching, minimizing network-stack traversals, and disabling Nagle’s algorithm. By default (and when possible), messages are queued on the subscriber, which helps mitigate the problem of slow subscribers. When this isn’t sufficient, ZeroMQ employs a pattern called the “Suicidal Snail”: when a subscriber is running slow and can’t keep up with incoming messages, it detects that it has fallen behind and kills itself. “Slow” is determined by a configurable high-water mark. The idea is that it’s better to fail fast and allow the issue to be resolved quickly than to let stale data flow downstream. Again, think about the high-frequency trading use case.
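
The high-water mark is just a socket option, and the “suicide” is something the subscriber does to itself once it notices it has fallen behind. A rough pyzmq sketch of the idea, assuming the publisher stamps each message with a sequence number:

    import sys
    import zmq

    context = zmq.Context()
    subscriber = context.socket(zmq.SUB)
    subscriber.setsockopt(zmq.RCVHWM, 1000)    # queue at most ~1,000 messages locally
    subscriber.setsockopt(zmq.SUBSCRIBE, b"")
    subscriber.connect("tcp://localhost:5556")

    expected = None
    while True:
        seq, payload = subscriber.recv_multipart()
        seq = int(seq)
        if expected is not None and seq > expected:
            # Messages were dropped past the high-water mark; we're too slow.
            # Better to die loudly than to push stale data downstream.
            sys.exit("suicidal snail: missed %d messages" % (seq - expected))
        expected = seq + 1
        # ...process payload...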

A Distributed, Scalable, and Fast Messaging Architecture

ZeroMQ makes a convincing case for use as a transport layer. Let’s explore a little deeper to see how it could be used to build a messaging framework for use in a real-time system. ZeroMQ is fairly intuitive to use and offers a plethora of bindings for various languages, so we’ll focus more on the architecture and messaging paradigms than the actual code.

About a year ago, when I first started investigating ZeroMQ, I built a framework for real-time messaging and document syncing called Zinc. A “document,” in this sense, is any well-structured and mutable piece of data—think text document, spreadsheet, canvas, etc. While purely academic, the goal was to provide developers with a framework for building rich, collaborative experiences in a distributed manner.

The framework actually had two implementations: one backed by the native ZeroMQ library and one backed by the pure Java implementation, JeroMQ (in systems where near real-time is sufficient, JeroMQ is adequate and benefits from not requiring any native linking). It was designed to allow any transport layer to be swapped in, though.

Zinc is structured around just a few core concepts: Endpoints, ChannelListeners, MessageHandlers, and Messages. An Endpoint represents a single node in an application cluster and provides functionality for sending and receiving messages to and from other Endpoints. It has outbound and inbound channels for transmitting messages to peers and receiving them, respectively.

[Figure: Endpoint with inbound and outbound channels]

ChannelListeners essentially act as daemons listening for incoming messages when the inbound channel is open on an Endpoint. When a message is received, it’s passed to a thread pool to be processed by a MessageHandler. Thus, Messages are processed asynchronously yet in the order they are received, since, as mentioned earlier, ZeroMQ guarantees in-order message delivery. As an aside, this was before I began learning Go, which would make an ideal replacement for Java here as it’s quite well-suited to the problem :)

Messages are simply the data being exchanged between Endpoints, upon which we build Documents and DocumentFragments. A Document is the structured data defined by an application, while a DocumentFragment represents a partial Document, or delta, which can be as fine- or coarse-grained as needed.

Zinc is built around the publish-subscribe and push-pull messaging patterns. One Endpoint acts as the host of a cluster, while the others act as clients. With this architecture, the host acts as a publisher and the clients as subscribers. Thus, when the host fires off a Message, it’s delivered to every subscribing client in a multicast-like fashion. Conversely, clients also act as “push” Endpoints with the host being a “pull” Endpoint. Clients can then push Messages into the host’s Message queue, from which the host pulls in a first-in-first-out manner.
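
Zinc’s Endpoints are Java, but the socket topology is easy to sketch with pyzmq; the ports and message shape here are made up purely for illustration:

    import time
    import zmq

    ctx = zmq.Context()

    # Host Endpoint: publishes deltas to every client and pulls incoming changes.
    host_pub = ctx.socket(zmq.PUB)
    host_pub.bind("tcp://*:5556")
    host_pull = ctx.socket(zmq.PULL)
    host_pull.bind("tcp://*:5557")

    # Client Endpoint: subscribes to the host's deltas and pushes its own changes.
    client_sub = ctx.socket(zmq.SUB)
    client_sub.connect("tcp://localhost:5556")
    client_sub.setsockopt(zmq.SUBSCRIBE, b"")
    client_push = ctx.socket(zmq.PUSH)
    client_push.connect("tcp://localhost:5557")

    time.sleep(0.5)  # let the connections settle

    # A client pushes a change into the host's FIFO queue...
    client_push.send_json({"origin": "client-uuid", "fragment": "delta"})
    # ...and the host pulls it off and rebroadcasts it to all subscribers.
    host_pub.send_json(host_pull.recv_json())
    print(client_sub.recv_json())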

This architecture allows Messages to be propagated across the entire cluster—a client makes a change which is sent to the host, who propagates this delta to all clients. This means that the client who initiated the change will receive an “echo” delta, but it will be discarded by checking the Message origin, a UUID which uniquely identifies an Endpoint. Clients are then responsible for preserving data consistency if necessary, perhaps through operational transformation or by maintaining a single source of truth from which clients can reconcile.
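
In code, discarding the echo is just a matter of stamping each outgoing Message with the Endpoint’s UUID and dropping anything that comes back bearing your own; something along these lines (the handler names are hypothetical):

    import uuid

    ENDPOINT_ID = str(uuid.uuid4())  # uniquely identifies this Endpoint

    def on_delta(message):
        if message["origin"] == ENDPOINT_ID:
            return  # our own change echoed back by the host; drop it
        apply_fragment(message["fragment"])  # hypothetical application-level merge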

[Figure: Message propagation across the cluster]

One of the advantages of this architecture is that it scales reasonably well due to its composability. Specifically, we can construct our cluster as a tree of clients with arbitrary breadth and depth. Obviously, the more we scale horizontally or vertically, the more latency we introduce between edge nodes. Coupled with eventual consistency, this can cause problems for some applications but might be acceptable to others.

[Figure: Scaling the cluster as a tree of clients]

The downside is that this inherently introduces a single point of failure, a characteristic of the client-server model. One solution might be to promote another node when the host fails and rebalance the tree.

Once again, this framework was mostly academic and acted as a way for me to test-drive ZeroMQ, although there are some other interesting applications of it. Since the framework supports multicast message delivery via push-pull or publish-subscribe mechanisms, one such use case is autonomous load balancing.

Paired with something like ZooKeeper, etcd, or some other service-discovery mechanism, clients would be capable of discovering hosts, which act as load balancers. Once a client has discovered a host, it can request to become part of that host’s cluster. If the host accepts the request, the client can begin to send messages to the host (and, as a result, to the rest of the cluster) and, likewise, receive messages from the host (and the rest of the cluster). This enables clients and hosts to submit work to the cluster such that it’s processed in an evenly distributed way, and workers can determine whether to pass work further down the tree or process it themselves. Clients can choose to participate in load-balancing clusters at will and as they become available, making them largely autonomous. Clients could then be quickly spun up and spun down using, for example, Docker containers.

ZeroMQ is great for achieving reliable, fast, and scalable distributed messaging, but it’s equally useful for performing parallel computation on a single machine or several locally networked ones by facilitating in-process and inter-process communication using the same patterns. It also scales in the sense that it can effortlessly leverage multiple cores on each machine. ZeroMQ is not a replacement for a message broker, but it can work in unison with traditional message-oriented middleware. Combined with Protocol Buffers or another serialization method, ZeroMQ makes it easy to build extremely high-throughput messaging frameworks.
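
For instance, the same push-pull pattern works unchanged over the inproc transport to spread work across threads within a single process. A minimal sketch:

    import threading
    import time
    import zmq

    ctx = zmq.Context()

    # With inproc, the endpoint must be bound before anyone connects to it.
    ventilator = ctx.socket(zmq.PUSH)
    ventilator.bind("inproc://work")

    def worker():
        pull = ctx.socket(zmq.PULL)
        pull.connect("inproc://work")
        while True:
            job = pull.recv_json()        # jobs are distributed round-robin
            print("processed job", job["id"])

    for _ in range(4):
        threading.Thread(target=worker, daemon=True).start()

    for i in range(10):
        ventilator.send_json({"id": i})

    time.sleep(1)  # give the daemon workers a moment to drain the queue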

Real-Time Client Notifications Using Redis and Socket.IO

Backbone.js is great for building structured client-side applications. Its declarative event-handling makes it easy to listen for actions in the UI and keep your data model in sync, but what about changes that occur to your data model on the server? Coordinating user interfaces for data consistency isn’t a trivial problem. Take a simple example: users A and B are viewing the same data at the same time, while user A makes a change to that data. How do we propagate those changes to user B? Now, how do we do it at scale, say, several thousand concurrent users? What about external consumers of that data?

One of our products at WebFilings called for real-time notifications for a few reasons:

  1. We needed to keep users’ view of data consistent.
  2. We needed a mechanism that would alert users to changes in the web client (and allow them to subscribe/unsubscribe to certain events).
  3. We needed notifications to be easily consumable (beyond the scope of a web client, e.g. email alerts, monitoring services, etc.).

I worked on developing a pattern that would address each of these concerns while fitting within our platform’s ecosystem, giving us a path of least resistance with maximum payoff.

Polling sucks. Long-polling isn’t much better. Server-Sent Events are an improvement. They provide a less rich API than the WebSocket protocol, which supports bi-directional communication, but they do have some niceties like handling reconnects and operating over traditional HTTP. Socket.IO provides a nice wrapper around WebSockets while falling back to other transport methods when necessary. It has a rich API with features like namespaces, multiplexing, and reconnects, but it’s built on Node.js, which means it doesn’t plug into our Python stack very easily.

The solution I decided on was a library called gevent-socketio, which is a Python implementation of the Socket.IO protocol built on gevent, making it incredibly simple to hook into our existing Flask app.

The gevent-socketio solution really only solves a small part of the overarching problem by providing a way to broadcast messages to clients. We still need a way to hook these messages into our Backbone application and, more importantly, a way to publish and subscribe to events across threads and processes. The Socket.IO dispatcher is just one of potentially many consumers, after all.

The other piece of the solution is to use Redis for its excellent pubsub capabilities. Redis allows us to publish and subscribe to messages from anywhere, even from different machines. Events that occur as a result of user actions, task queues, or cron jobs can all be captured and published to any interested parties as they happen. We’re already using Redis as a caching layer, so we get this for free. The overall architecture looks something like this:

[Figure: Redis pub/sub notification architecture]

Let’s dive into the code.

Hooking gevent-socketio into our Flask app is pretty straightforward. We essentially just wrap it with a SocketIOServer.
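
A minimal sketch, assuming a Flask application named app:

    from socketio.server import SocketIOServer  # gevent-socketio

    from myapp import app  # hypothetical Flask application

    if __name__ == "__main__":
        # Serve the Flask app through gevent-socketio's WSGI server.
        SocketIOServer(("0.0.0.0", 8000), app, resource="socket.io").serve_forever()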

The other piece is registering client subscribers for notifications:
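
With gevent-socketio, that registration amounts to handing Socket.IO requests off to socketio_manage and mapping the namespace to its handler class (defined below); roughly:

    from flask import request
    from socketio import socketio_manage

    @app.route("/socket.io/<path:remaining>")
    def socketio_dispatch(remaining):
        # Hand the connection off to gevent-socketio and map the namespace
        # to the class that will handle its events.
        socketio_manage(request.environ, {"/notifications": NotificationsNamespace}, request)
        return app.response_class()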

NotificationsNamespace is a Socket.IO namespace we will use to broadcast notification messages. We use gevent-socketio’s BroadcastMixin to multicast messages to clients.
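
A sketch of what that namespace might look like (the Redis channel name and message format are assumptions):

    import json

    import redis
    from socketio.namespace import BaseNamespace
    from socketio.mixins import BroadcastMixin

    red = redis.StrictRedis()

    class NotificationsNamespace(BaseNamespace, BroadcastMixin):

        def listener(self):
            # Relay every message published to the Redis channel to all
            # Socket.IO clients connected to this namespace.
            pubsub = red.pubsub()
            pubsub.subscribe("notifications")
            for message in pubsub.listen():
                if message["type"] == "message":
                    self.broadcast_event("notification", json.loads(message["data"]))

        def recv_connect(self):
            self.spawn(self.listener)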

When a connection is received, we spawn a greenlet that listens for messages and broadcasts them to clients in the notifications namespace. We can then build a minimal API that can be used across our application to publish notifications.
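
The publishing side can then be as small as a helper along these lines (the function name is made up):

    import json

    import redis

    red = redis.StrictRedis()

    def notify(event, data):
        """Publish a notification for any interested consumer: the Socket.IO
        dispatcher above, an email alerter, a monitoring service, and so on."""
        red.publish("notifications", json.dumps({"event": event, "data": data}))

    # e.g. notify("document.updated", {"id": 42, "user": "alice"})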

Wiring notifications up to the UI is equally simple. To facilitate communication between our Backbone components while keeping them decoupled, we use an event-dispatcher pattern relying on Backbone.Events: a shared dispatcher object through which components trigger events and listen for them.

This pattern makes it trivial for us to allow our views, collections, and models to subscribe to our Socket.IO notifications because we just have to pump the messages into the dispatcher pipeline.

Now our UI components can subscribe and react to client- and server-side events as they see fit and in a completely decoupled fashion. This makes it very easy for us to ensure our client-side views and models are updated automatically while also letting other services consume these events.

Modularizing Infinitum: A Postmortem

In addition to getting the code migrated from Google Code to GitHub, one of my projects over the holidays was to modularize the Infinitum Android framework I’ve been working on for the past year.

Infinitum began as a SQLite ORM and quickly grew to include a REST ORM implementation, a REST client, a logging wrapper, a DI framework, an AOP module, and, of course, all of the framework tools needed to support these various functionalities. It evolved as I added more and more features in a semi-haphazard way. In my defense, the code was organized. It was logical. It made sense. There was no method, but there also was no madness. Everything was in an appropriately named package. Everything was coded to an interface. There was no duplicated code. However, modularity — in terms of minimizing framework dependencies — wasn’t really a consideration at the time, and the code all lived in a single project.

The Wild, Wild West

The issue wasn’t how the code was organized; it was how the code was integrated. The project was cowboy coding at its finest. I was the only stakeholder, the only tester, the only developer — judge, jury, and executioner. I was building it for my own personal use, after all. Consequently, there was no planning involved, unit testing was somewhere between minimal and non-existent, and what got done was entirely at my discretion. Ultimately, what was completed on any given day, more or less, came down to what I felt like working on.

What started as an ORM framework became a REST framework, which became a logging framework, which became an IOC framework, which became an AOP framework. All of these features, built from the ground up, were tied together through a context, which provided framework configuration data. More important, the Infinitum context stored the bean factory used for storing and retrieving bean definitions used by both the framework and the client. The different modules themselves were not tightly coupled, but they were connected to the context like feathers on a bird.

[Figure: Infinitum architecture, with modules tied to the context]

The framework began to grow large. It was only about 300KB of actual code (JARed without ProGuard compression), but it had a number of library dependencies, namely Dexmaker, Simple XML, and GSON, which together total over 1MB. Since it’s an Android framework, I wanted to keep the footprint as small as possible. Additionally, it’s likely that someone wouldn’t be using all of the features in the framework. Maybe they just need the SQLite ORM, or just the REST client, or just dependency injection. The way the framework was structured, they had to take it all or none of it.

A Painter Looking for a Brush

I began to investigate ways to modularize it. As I illustrated, the central problem lay in the fact that the Infinitum context had knowledge of all of the different modules and was responsible for calling and configuring their APIs. If the ORM is an optional dependency, the context should not need to have knowledge of it. How can the modules be decoupled from the context?

Obviously, there is a core dependency, Infinitum Core, which consists of the framework essentials. These are things used throughout the framework in all of the modules — logging, DI (I was originally hoping to pull out dependency injection as a separate module, but the framework relies heavily on it to wire up components), exceptions, and miscellaneous utilities. The goal was to pull out the ORM, REST, and AOP modules.

My initial approach was to try to use the decorator pattern to “decorate” the Infinitum context with additional functionality. The OrmContextDecorator would implement the ORM-specific methods, the AopContextDecorator would implement the AOP-specific methods, and so on. The problem with this was that it would still require the module-specific methods to be declared in the Infinitum context interface. Not only would they need to be stubbed out in the context implementation, but a lot of module interfaces would also need to be shuffled and placed in Infinitum Core in order to satisfy the compiler. The problem remained: the context still had knowledge of all the modules.

I had another idea in mind: maybe I could turn the Infinitum context from a single point of configuration into a hierarchical structure where each module has its own context as a “child” of the root context. The OrmContext interface could extend the InfinitumContext interface, providing ORM-specific functionality while still inheriting the core context methods. The implementation would then hold a reference to the parent context, so if it were unable to perform a certain piece of functionality, it could delegate to the parent. This could work. The Infinitum context has no notion of module X, Y, or Z, and, in effect, the control has been inverted. You could call it the Hollywood Principle — “Don’t call us, we’ll call you.”

[Figure: Infinitum context hierarchy]

There’s still one remaining question: how do we identify the “child” contexts and subsequently initialize them? The solution is to maintain a module registry. This registry will keep track of the optional framework dependencies and is responsible for initializing them if they are available. We use a marker class from each module, a class we know exists if the dependency is included in the classpath, to check its availability.

Lastly, we use reflection to instantiate the module context. I used an enum to maintain a registry of Infinitum modules and extended it with an initialize method that loads a context instance.
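
The actual registry is a Java enum, but the shape of the idea translates; this rough sketch (in Python purely for illustration, with invented module and class names) shows the availability check against a marker followed by reflective instantiation:

    import importlib
    import importlib.util
    from enum import Enum

    class Module(Enum):
        ORM = ("infinitum.orm", "OrmContext")
        AOP = ("infinitum.aop", "AopContext")
        WEB = ("infinitum.web", "RestContext")

        def is_available(self):
            # The "marker" check: is the optional dependency on the path at all?
            return importlib.util.find_spec(self.value[0]) is not None

        def initialize(self, parent):
            # Reflectively load and instantiate the module's child context.
            module = importlib.import_module(self.value[0])
            child = getattr(module, self.value[1])(parent)
            parent.add_child_context(child)
            return child

    # During post-processing: initialize whichever modules are present.
    # for m in Module:
    #     if m.is_available():
    #         m.initialize(root_context)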

The modules get picked up during a post-processing step in the ContextFactory. It’s this step that also adds them as child contexts to the parent.

New modules can be added to the registry without any changes elsewhere. As long as the context has been implemented, they will be picked up and processed automatically.

Once this architecture was in place, separating the framework into different projects was simple. Now Infinitum Core can be used by itself if only dependency injection is needed, the ORM can be included if needed for SQLite, AOP included for aspect-oriented programming, and Web for the RESTful web service client and various HTTP utilities.

We Shape Our Buildings, and Afterwards, Our Buildings Shape Us

I think this solution has helped to reduce some of the complexity. As with any modular design, not only is it more extensible, it’s more maintainable. Each module context is responsible for its own configuration, which certainly helped reduce complexity in the InfinitumContext implementation, since it was previously handling the initialization for the ORM, AOP, and REST pieces. It also worked out nicely with the move to GitHub (now that the code’s pushed to GitHub, I begin the laborious task of migrating the documentation over from Google Code), since I set up four discrete repositories, one for each module.

In retrospect, I would have made things a lot easier on myself if I had taken a more modular approach from the beginning. I ended up having to reengineer quite a bit, although once I had a viable solution, it actually wasn’t all that much work. I was fortunate in that I had things fairly well designed (perhaps not at a very high level, but in general) and extremely organized. It’s difficult to anticipate change, but chances are you’ll be kicking yourself if you don’t. I started the framework almost a year ago, and I never imagined it would grow to what it is today.