Distributed Messaging with ZeroMQ

“A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.” -Leslie Lamport

With the increased prevalence and accessibility of cloud computing, distributed systems architecture has largely supplanted more monolithic constructs. The implication of using a service-oriented architecture, of course, is that you now have to deal with a myriad of difficulties that previously never existed, such as fault tolerance, availability, and horizontal scaling. Another interesting layer of complexity is providing consistency across nodes, which itself is a problem surrounded with endless research. Algorithms like Paxos and Raft attempt to provide solutions for managing replicated data, while other solutions offer eventual consistency.

Building scalable, distributed systems is not a trivial feat, but it pales in comparison to building real-time systems of a similar nature. Distributed architecture is a well-understood problem and the fact is, most applications have a high tolerance for latency. Few systems have a demonstrable need for real-time communication, but the few that do present an interesting challenge for developers. In this article, I explore the use of ZeroMQ to approach the problem of distributed, real-time messaging in a scalable manner while also considering the notion of eventual consistency.

The Intelligent Transport Layer

ZeroMQ is a high-performance asynchronous messaging library written in C++. It’s not a dedicated message broker but rather an embeddable concurrency framework with support for direct and fan-out endpoint connections over a variety of transports. ZeroMQ implements a number of different communication patterns like request-reply, pub-sub, and push-pull through TCP, PGM (multicast), in-process, and inter-process channels. The glaring lack of UDP support is, more or less, by design because ZeroMQ was conceived to provide guaranteed-ish delivery of atomic messages. The library makes no actual guarantee of delivery, but it does make a best effort. What ZeroMQ does guarantee, however, is that you will never receive a partial message, and messages will be received in order. This is important because UDP’s performance gains really only manifest themselves in lossy or congested environments.

The comprehensive list of messaging patterns and transports alone makes ZeroMQ an appealing choice for building distributed applications, but it particularly excels due to its reliability, scalability, and high throughput. ZeroMQ and related technologies are popular within high-frequency trading, where packet loss of financial data is often unacceptable [1]. In 2011, CERN actually performed a study comparing CORBA, Ice, Thrift, ZeroMQ, and several other protocols for use in its particle accelerators and ranked ZeroMQ the highest.

[Figure: CERN study]

ZeroMQ uses some tricks that allow it to actually outperform TCP sockets in terms of throughput, such as intelligent message batching, minimizing network-stack traversals, and disabling Nagle’s algorithm. By default (and when possible), messages are queued on the subscriber, which attempts to avoid the problem of slow subscribers. However, when this isn’t sufficient, the ZeroMQ guide suggests a pattern called the “Suicidal Snail.” When a subscriber is running slow and is unable to keep up with incoming messages, it is convinced to kill itself. “Slow” is determined by a configurable high-water mark. The idea here is that it’s better to fail fast and allow the issue to be resolved quickly than to potentially allow stale data to flow downstream. Again, think about the high-frequency trading use case.
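The fail-fast idea can be sketched without ZeroMQ at all. The following is a rough in-memory illustration, not the ZeroMQ API: SnailSubscriber and its methods are hypothetical names, and the high-water mark is simply the backlog size at which the subscriber gives up.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// In-memory stand-in for a subscriber's inbound queue; not the ZeroMQ API.
class SnailSubscriber {
    private final int highWaterMark;                 // backlog size treated as "too slow"
    private final Queue<String> backlog = new ArrayDeque<>();
    private boolean alive = true;

    SnailSubscriber(int highWaterMark) {
        this.highWaterMark = highWaterMark;
    }

    // Called by the publishing side. Returns false once the subscriber has given up.
    boolean deliver(String message) {
        if (!alive) {
            return false;
        }
        if (backlog.size() >= highWaterMark) {
            // Fail fast instead of serving increasingly stale data.
            alive = false;
            backlog.clear();
            return false;
        }
        backlog.add(message);
        return true;
    }

    // Called by the consuming side; returns null when nothing is queued.
    String poll() {
        return backlog.poll();
    }

    boolean isAlive() {
        return alive;
    }
}

class SnailDemo {
    public static void main(String[] args) {
        // This subscriber never polls, so it falls behind after three messages.
        SnailSubscriber slow = new SnailSubscriber(3);
        for (int i = 0; i < 5; i++) {
            System.out.println("tick " + i + " accepted=" + slow.deliver("tick " + i));
        }
        System.out.println("alive=" + slow.isAlive());
    }
}
```

In a real deployment, the dead subscriber would be restarted and resynchronized rather than left to work through a stale backlog.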

A Distributed, Scalable, and Fast Messaging Architecture

ZeroMQ makes a convincing case for use as a transport layer. Let’s explore a little deeper to see how it could be used to build a messaging framework for use in a real-time system. ZeroMQ is fairly intuitive to use and offers a plethora of bindings for various languages, so we’ll focus more on the architecture and messaging paradigms than the actual code.

About a year ago, when I first started investigating ZeroMQ, I built a framework to perform real-time messaging and document syncing called Zinc. A “document,” in this sense, is any well-structured and mutable piece of data: think text document, spreadsheet, canvas, etc. While purely academic, the goal was to provide developers with a framework for building rich, collaborative experiences in a distributed manner.

The framework actually had two implementations, one backed by native ZeroMQ and one backed by the pure Java implementation, JeroMQ [2]. It was really designed to allow any transport layer to be used, though.

Zinc is structured around just a few core concepts: Endpoints, ChannelListeners, MessageHandlers, and Messages. An Endpoint represents a single node in an application cluster and provides functionality for sending and receiving messages to and from other Endpoints. It has outbound and inbound channels for transmitting messages to peers and receiving them, respectively.

[Figure: Endpoint]

ChannelListeners essentially act as daemons listening for incoming messages when the inbound channel is open on an Endpoint. When a message is received, it’s passed to a thread pool to be processed by a MessageHandler. Therefore, Messages are handed off for asynchronous processing in the order they are received, and as mentioned earlier, ZeroMQ guarantees in-order message delivery. As an aside, this was before I began learning Go, which would make for an ideal replacement for Java here as it’s quite well-suited to the problem :)

Messages are simply the data being exchanged between Endpoints, which we can build upon with Documents and DocumentFragments. A Document is the structured data defined by an application, while a DocumentFragment represents a partial Document, or delta, which can be as fine- or coarse-grained as needed.

Zinc is built around the publish-subscribe and push-pull messaging patterns. One Endpoint will act as the host of a cluster, while the others act as clients. With this architecture, the host acts as a publisher and the clients as subscribers. Thus, when a host fires off a Message, it’s delivered to every subscribing client in a multicast-like fashion. Conversely, clients also act as “push” Endpoints with the host being a “pull” Endpoint. Clients can then push Messages into the host’s Message queue, from which the host pulls in a first-in-first-out manner.

This architecture allows Messages to be propagated across the entire cluster—a client makes a change which is sent to the host, who propagates this delta to all clients. This means that the client who initiated the change will receive an “echo” delta, but it will be discarded by checking the Message origin, a UUID which uniquely identifies an Endpoint. Clients are then responsible for preserving data consistency if necessary, perhaps through operational transformation or by maintaining a single source of truth from which clients can reconcile.
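To make the echo handling concrete, here is a minimal in-memory sketch of the propagation flow. It is not Zinc’s actual code and does not touch ZeroMQ: Message, Client, and Host are simplified, hypothetical stand-ins, and direct method calls take the place of the push-pull and publish-subscribe sockets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// A delta tagged with the UUID of the Endpoint that created it.
final class Message {
    final UUID origin;
    final String delta;

    Message(UUID origin, String delta) {
        this.origin = origin;
        this.delta = delta;
    }
}

class Client {
    final UUID id = UUID.randomUUID();
    final List<String> applied = new ArrayList<>();

    // "Push" side: apply a change locally, then hand it to the host.
    void change(Host host, String delta) {
        applied.add(delta);
        host.pull(new Message(id, delta));
    }

    // "Subscribe" side: receive whatever the host publishes.
    void onPublished(Message message) {
        if (message.origin.equals(id)) {
            return; // echo of our own change, so discard it
        }
        applied.add(message.delta);
    }
}

class Host {
    private final List<Client> subscribers = new ArrayList<>();

    void subscribe(Client client) {
        subscribers.add(client);
    }

    // "Pull" side: take a client's Message and publish it to every subscriber.
    void pull(Message message) {
        for (Client client : subscribers) {
            client.onPublished(message);
        }
    }
}

class EchoDemo {
    public static void main(String[] args) {
        Host host = new Host();
        Client a = new Client();
        Client b = new Client();
        host.subscribe(a);
        host.subscribe(b);
        a.change(host, "insert 'x' at 0");
        System.out.println("a applied " + a.applied.size() + ", b applied " + b.applied.size());
    }
}
```

Without the origin check, client a would apply its own delta twice.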

[Figure: cluster]

One of the advantages of this architecture is that it scales reasonably well due to its composability. Specifically, we can construct our cluster as a tree of clients with arbitrary breadth and depth. Obviously, the more we scale horizontally or vertically, the more latency we introduce between edge nodes. Coupled with eventual consistency, this can cause problems for some applications but might be acceptable to others.

[Figure: scalability]

The downside is this inherently introduces a single point of failure characterized by the client-server model. One solution might be to promote another node when the host fails and balance the tree.

Once again, this framework was mostly academic and acted as a way for me to test-drive ZeroMQ, although there are some other interesting applications of it. Since the framework supports multicast message delivery via push-pull or publish-subscribe mechanisms, one such use case is autonomous load balancing.

Paired with something like ZooKeeper, etcd, or some other service-discovery protocol, clients would be capable of discovering hosts, which act as load balancers. Once a client has discovered a host, it can request to become a part of that host’s cluster. If the host accepts the request, the client can begin to send messages to the host (and, as a result, to the rest of the cluster) and, likewise, receive messages from the host (and the rest of the cluster). This enables clients and hosts to submit work to the cluster such that it’s processed in an evenly distributed way, and workers can determine whether to pass work on further down the tree or process it themselves. Clients can choose to participate in load-balancing clusters at their own will and when they become available, making them mostly autonomous. Clients could then be quickly spun up and spun down using, for example, Docker containers.

ZeroMQ is great for achieving reliable, fast, and scalable distributed messaging, but it’s equally useful for performing parallel computation on a single machine or several locally networked ones by facilitating in-process and inter-process communication using the same patterns. It also scales in the sense that it can effortlessly leverage multiple cores on each machine. ZeroMQ is not a replacement for a message broker, but it can work in unison with traditional message-oriented middleware. Combined with Protocol Buffers and other serialization methods, ZeroMQ makes it easy to build extremely high-throughput messaging frameworks.

  1. ZeroMQ’s founder, iMatix, was responsible for moving JPMorgan Chase and the Dow Jones Industrial Average trading platforms to OpenAMQ.
  2. In systems where near real-time is sufficient, JeroMQ is adequate and benefits by not requiring any native linking.

Bluetooth Blues

I spent the better part of two days working on Bluetooth connectivity for an Android app I’m developing. Going into it, I had virtually no experience working with Bluetooth, especially on Android. I quickly discovered some of the peculiarities of the platform’s Bluetooth API.

In addition to connecting to Bluetooth devices, the client wanted to pair and unpair from the app. The easy way out, and probably The Android Way™, would be to pass that responsibility off to the OS, à la an Intent:

This will bring up the Bluetooth settings menu, from which you can pair/unpair devices, but the problem is that it’s a complete context switch for the user—they are no longer in your application. I was looking to provide a more seamless experience so that the user didn’t have to leave the app at all to pair a device.

Device Discovery

The entry point for Bluetooth interaction in Android is through the BluetoothAdapter, which is used to orchestrate the device discovery process and fetch paired devices. Calling startDiscovery() will tell the adapter to start scanning for devices, and when one is found, an Intent will be fired off which can then be intercepted by a BroadcastReceiver.

The above code shows how the device discovery process is kicked off and how a BroadcastReceiver is registered to listen for discovery Intents. Note that the BroadcastReceiver is unregistered and discovery is canceled in onDestroy.

In order to react to discovery events, we must implement a BroadcastReceiver.

Device Pairing

Once you have a handle on the BluetoothDevice received in the BroadcastReceiver, how do you actually pair with it? Looking at the documentation for the class, you’ll see that there are no methods for doing this. This is where things start to get a little strange.

Diving into the source code for BluetoothDevice, you’ll actually find that there is functionality for doing pairing and unpairing, but the methods are hidden from the API using the @hide annotation. What’s more interesting is that the methods are, in fact, public.

Evidently, device pairing is intended to be performed only by platform applications, which is a little curious considering the permission needed to perform pairing, android.permission.BLUETOOTH_ADMIN, is accessible by third-party applications. Nonetheless, this means we actually can pair a BluetoothDevice, just not in the way the Android engineers intended.

To access the BluetoothDevice methods needed, createBond and removeBond, we can use reflection.
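A rough sketch of the reflective lookup, runnable in plain Java, follows. FakeDevice is a hypothetical stand-in for BluetoothDevice so the example works off-device, and pairDevice, unpairDevice, and BondingUtil are illustrative names. On Android, the same getMethod and invoke calls would be pointed at the real BluetoothDevice instance.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for android.bluetooth.BluetoothDevice so the sketch runs off-device.
class FakeDevice {
    private boolean bonded;

    public boolean createBond() {
        bonded = true;
        return true;
    }

    public boolean removeBond() {
        bonded = false;
        return true;
    }

    public boolean isBonded() {
        return bonded;
    }
}

class BondingUtil {
    // Returns true if the pairing process was kicked off, not that it completed.
    static boolean pairDevice(Object device) {
        return invokeNoArg(device, "createBond");
    }

    static boolean unpairDevice(Object device) {
        return invokeNoArg(device, "removeBond");
    }

    // Looks the method up by name at runtime, since it is absent from the compile-time API.
    private static boolean invokeNoArg(Object target, String methodName) {
        try {
            Method method = target.getClass().getMethod(methodName);
            return Boolean.TRUE.equals(method.invoke(target));
        } catch (ReflectiveOperationException e) {
            // The method is missing, inaccessible, or failed on this platform release.
            return false;
        }
    }

    public static void main(String[] args) {
        FakeDevice device = new FakeDevice();
        System.out.println("pair started: " + pairDevice(device));
        System.out.println("bonded: " + device.isBonded());
    }
}
```

Catching the reflective exceptions and returning false keeps the caller working if a future release renames or removes the method.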

The pairDevice method will prompt the user to enter a PIN for the discovered device, circumventing the need to open the Bluetooth settings. As such, the pairing does not actually complete until the correct PIN is entered. The boolean value returned from the method indicates whether the pairing process was successfully kicked off or not.

It goes without saying that this code, while functional, is volatile because these methods are technically not part of the public API, so they could change or disappear in future platform releases.

We can add an Intent filter to our BroadcastReceiver to listen for pairing events using the action BluetoothDevice.ACTION_BOND_STATE_CHANGED.

There are a few other hidden methods in BluetoothDevice, like cancelPairingUserInput, setPairingConfirmation, convertPinToBytes, and setPin, that you could potentially use to customize the pairing process or perform it programmatically, but use them at your own risk.

Once the devices are paired, they can be connected using one of BluetoothDevice’s createRfcommSocketToServiceRecord or createInsecureRfcommSocketToServiceRecord methods after determining the UUID to use, either with getUuids or fetchUuidsWithSdp (or, in most cases, using the well-known UUID 00001101-0000-1000-8000-00805F9B34FB).

It’s very likely that Android’s Bluetooth API is subject to change soon. It already has changed in some of the more recent releases, although I’m not entirely sure why Google isn’t providing a stable API for pairing. Jelly Bean 4.2 introduces a new Bluetooth stack, moving from BlueZ to a Broadcom solution, so my guess is that it’s related to this.

Implementing Spring-like Classpath Scanning in Android

One of the things that Spring 2.5 introduced back in 2007 was component scanning, a feature which removed the need for XML bean configuration and instead allowed developers to declare their beans using Java annotations. Rather than this:

We can do this:

It’s a pretty simple idea since Java makes it very easy to introspectively check a class’s annotations at runtime through its reflection API. Spring’s component scan feature also allows you to specify the base package(s) to scan for beans.
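As a small illustration of that runtime check, here is a sketch using a made-up @Bean annotation. It is only a stand-in for Spring’s @Component, and UserService and PlainHelper are hypothetical classes.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical marker annotation, standing in for Spring's @Component.
@Retention(RetentionPolicy.RUNTIME) // must be retained at runtime for reflection to see it
@Target(ElementType.TYPE)
@interface Bean {
    String value() default "";
}

@Bean("userService")
class UserService { }

class PlainHelper { }

class AnnotationCheck {
    // Returns the declared bean name, or null if the class is not annotated.
    static String beanNameOf(Class<?> type) {
        Bean bean = type.getAnnotation(Bean.class);
        return bean == null ? null : bean.value();
    }

    public static void main(String[] args) {
        System.out.println(beanNameOf(UserService.class));
        System.out.println(beanNameOf(PlainHelper.class));
    }
}
```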

The big question is how do we get access to the classes in the classpath, specifically, those in the desired package? Java SE doesn’t provide an API for doing it, but there are ways to accomplish this. The most common (if not the only) approach is to load classes by relying on the file system. We know that we can use the ClassLoader to load a class by its package-qualified name, so it becomes a matter of retrieving the file names.

Getting the classpath itself in Java SE is easy:
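As a minimal sketch (the class and method names are mine, for illustration), the java.class.path system property holds the raw classpath string, and File.pathSeparator splits it into individual entries:

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

class ClasspathEntries {
    // The raw classpath string, exactly as the JVM was given it.
    static String raw() {
        return System.getProperty("java.class.path", "");
    }

    // Splits the classpath on the platform separator (":" on Unix, ";" on Windows).
    static List<String> entries() {
        return Arrays.asList(raw().split(File.pathSeparator));
    }

    public static void main(String[] args) {
        System.out.println(raw());
        for (String entry : entries()) {
            System.out.println("  entry: " + entry);
        }
    }
}
```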

This will yield something that looks like “/Users/Tyler/Workspace/Test/bin:/Users/Tyler/Workspace/Test/lib/gson-2.1.jar”. Loading the files from here is pretty straightforward, as is filtering on the package name since it maps to a directory one-to-one.

Another similar approach is to use the ClassLoader to load the resources directly:
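A sketch of that approach might look like the following, where PackageResources and locate are illustrative names. It asks the ClassLoader for every classpath location containing the package’s directory path:

```java
import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class PackageResources {
    // Returns the classpath locations (directories or JAR entries) that contain the package.
    static List<URL> locate(String packageName) {
        String path = packageName.replace('.', '/');
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        try {
            return Collections.list(loader.getResources(path));
        } catch (IOException e) {
            // Treat an unreadable classpath as "nothing found" for this sketch.
            return new ArrayList<>();
        }
    }

    public static void main(String[] args) {
        for (URL url : locate("java.util")) {
            System.out.println(url);
        }
    }
}
```

Each returned URL still has to be walked, as a directory or as a JAR, to list the class files beneath it.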

Transition to Android

Unfortunately, these solutions don’t lend themselves to Android, which made implementing classpath scanning a little more difficult for Infinitum. The reason for this is, more or less, because of the way Android’s Dalvik VM is designed. When an Android application is compiled, the Dalvik bytecode is packaged into a file called “classes.dex” inside the APK. The good news is that the Android SDK provides an API for interacting with DEX files through the DexFile class.

In order to access classes.dex, we need a handle on the APK itself, which is actually quite easy to do:

The above code opens a DexFile for the running APK. Of course, this can have some performance implications. Opening the DexFile will potentially cause the VM to pass classes.dex through a process known as “dexopt”, which is a program that performs bytecode verification and optimization. This is an expensive process, but since we’re opening a DexFile for the APK itself, classes.dex should have already undergone this process, meaning dexopt won’t be run again.

The DexFile gives us access to the classes contained in classes.dex as an enumeration of strings representing the package-qualified class names. With this, we can iterate over the class names and load any which match the desired package.
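The filtering step can be sketched in plain Java. This is not Infinitum’s code: PackageScanner is a hypothetical name, and a hard-coded Enumeration stands in for the one DexFile.entries() would return on Android.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

class PackageScanner {
    // Loads every class whose fully qualified name falls under basePackage.
    static List<Class<?>> scan(Enumeration<String> classNames, String basePackage) {
        String prefix = basePackage + ".";
        List<Class<?>> matches = new ArrayList<>();
        while (classNames.hasMoreElements()) {
            String name = classNames.nextElement();
            if (!name.startsWith(prefix)) {
                continue; // outside the package we care about
            }
            try {
                matches.add(Class.forName(name));
            } catch (ClassNotFoundException | LinkageError e) {
                // Skip anything that cannot be loaded rather than aborting the scan.
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Stand-in for the class names a DexFile would enumerate.
        Enumeration<String> names = Collections.enumeration(Arrays.asList(
                "java.util.ArrayList", "java.lang.String", "java.util.HashMap"));
        System.out.println(scan(names, "java.util"));
    }
}
```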

This gets the job done, and it’s essentially how Infinitum accomplishes component scanning. However, it’s a very expensive operation. DexFile.entries() yields every class in the classpath — that is, every class in classes.dex — which includes not just application binaries, but also those of any libraries included.

It’s great that we can introspect every class in the classpath, but if we’re only interested in classes of a particular package, we’re out of luck. Every class is compiled into classes.dex and, short of decompiling it [1], there’s no way to pull out the classes we want without iterating over the entire classpath.

So, for now we settle for this somewhat inefficient solution. Nonetheless, it accomplishes what it needs to at the cost of maybe a few hundred milliseconds [2], so maybe it’s not such a bad approach in the grand scheme of things.

  1. Tools for decompiling DEX files exist, such as Baksmali, but doing such a thing at runtime, if it’s even possible, would arguably not gain you any performance benefits. Still, this is something worth exploring.
  2. On the emulator running on my MacBook Pro, the classpath scanning takes about 600 milliseconds, while on my Galaxy Nexus, it takes about 200 milliseconds.

Modularizing Infinitum: A Postmortem

In addition to getting the code migrated from Google Code to GitHub, one of my projects over the holidays was to modularize the Infinitum Android framework I’ve been working on for the past year.

Infinitum began as a SQLite ORM and quickly grew to include a REST ORM implementation,  REST client, logging wrapper, DI framework, AOP module, and, of course, all of the framework tools needed to support these various functionalities. It evolved as I added more and more features in a semi-haphazard way. In my defense, the code was organized. It was logical. It made sense. There was no method, but there also was no madness. Everything was in an appropriately named package. Everything was coded to an interface. There was no duplicated code. However, modularity — in terms of minimizing framework dependencies — wasn’t really in mind at the time, and the code was all in a single project.

The Wild, Wild West

The issue wasn’t how the code was organized, it was how the code was integrated. The project was cowboy coding at its finest. I was the only stakeholder, the only tester, the only developer — judge, jury, and executioner. I was building it for my own personal use after all. Consequently, there was no planning involved, unit testing was somewhere between minimal and non-existent, and what got done was at my complete discretion. Ultimately, what was completed any given day, more or less, came down to what I felt like working on.

What started as an ORM framework became a REST framework, which became a logging framework, which became an IOC framework, which became an AOP framework. All of these features, built from the ground up, were tied together through a context, which provided framework configuration data. More important, the Infinitum context stored the bean factory used for storing and retrieving bean definitions used by both the framework and the client. The different modules themselves were not tightly coupled, but they were connected to the context like feathers on a bird.

[Figure: Infinitum architecture]

The framework began to grow large. It was only about 300KB of actual code (JARed without ProGuard compression), but it had a number of library dependencies, namely Dexmaker, Simple XML, and GSON, which are over 1MB combined in size. Since it’s an Android framework, I wanted to keep the footprint as small as possible. Additionally, it’s likely that someone wouldn’t be using all of the features in the framework. Maybe they just need the SQLite ORM, or just the REST client, or just dependency injection. The way the framework was structured, they had to take it all or none.

A Painter Looking for a Brush

I began to investigate ways to modularize it. As I illustrated, the central problem lay in the fact that the Infinitum context had knowledge of all of the different modules and was responsible for calling and configuring their APIs. If the ORM is an optional dependency, the context should not need to have knowledge of it. How can the modules be decoupled from the context?

Obviously, there is a core dependency, Infinitum Core, which consists of the framework essentials. These are things used throughout the framework in all of the modules: logging, DI [1], exceptions, and miscellaneous utilities. The goal was to pull the ORM, REST, and AOP features out into separate modules.

My initial approach was to try to use the decorator pattern to “decorate” the Infinitum context with additional functionality. The OrmContextDecorator would implement the ORM-specific methods, the AopContextDecorator would implement the AOP-specific methods, and so on. The problem with this was that it would still require the module-specific methods to be declared in the Infinitum context interface. Not only would they need to be stubbed out in the context implementation, a lot of module interfaces would need to be shuffled and placed in Infinitum Core in order to satisfy the compiler. The problem remained; the context still had knowledge of all the modules.

I had another idea in mind. Maybe I could turn the Infinitum context from a single point of configuration to a hierarchical structure where each module has its own context as a “child” of the root context. The OrmContext interface could extend the InfinitumContext interface, providing ORM-specific functionality while still inheriting the core context methods. The implementation would then contain a reference to the parent context, so if it was unable to perform a certain piece of functionality, it could delegate to the parent. This could work. The Infinitum context has no notion of module X, Y, or Z, and, in effect, the control has been inverted. You could call it the Hollywood Principle — “Don’t call us, we’ll call you.”
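The parent-child delegation might be sketched as follows. The interfaces are heavily simplified and hypothetical: the real InfinitumContext and OrmContext expose far more, and getBean, getDatabaseName, RootContext, and SimpleOrmContext are names chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, pared-down version of the core context.
interface InfinitumContext {
    Object getBean(String name);
}

// A module context adds its own methods while inheriting the core ones.
interface OrmContext extends InfinitumContext {
    String getDatabaseName();
}

class RootContext implements InfinitumContext {
    private final Map<String, Object> beans = new HashMap<>();

    void register(String name, Object bean) {
        beans.put(name, bean);
    }

    @Override
    public Object getBean(String name) {
        return beans.get(name);
    }
}

class SimpleOrmContext implements OrmContext {
    private final InfinitumContext parent;
    private final Map<String, Object> ormBeans = new HashMap<>();

    SimpleOrmContext(InfinitumContext parent) {
        this.parent = parent;
    }

    void register(String name, Object bean) {
        ormBeans.put(name, bean);
    }

    @Override
    public Object getBean(String name) {
        Object bean = ormBeans.get(name);
        // Anything this module cannot resolve is delegated to the parent context.
        return bean != null ? bean : parent.getBean(name);
    }

    @Override
    public String getDatabaseName() {
        return "app.db";
    }
}
```

Note that RootContext never references SimpleOrmContext; the dependency points only from child to parent.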

[Figure: Infinitum context hierarchy]

There’s still one remaining question: how do we identify the “child” contexts and subsequently initialize them? The solution is to maintain a module registry. This registry will keep track of the optional framework dependencies and is responsible for initializing them if they are available. We use a marker class from each module, a class we know exists if the dependency is included in the classpath, to check its availability.

Lastly, we use reflection to create an instance of the module context. I used an enum to maintain a registry of Infinitum modules. I then extended the enum to add an initialize method which loads a context instance.
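A registry along these lines could look roughly like the sketch below. It is an assumption-laden illustration, not Infinitum’s code: the enum constants, ModuleContext, and DemoOrmContext are hypothetical, java.util.ArrayList stands in for a marker class that is present, and the com.example.missing names stand in for a module that is absent. A real registry would use string literals for both class names so the core has no compile-time dependency on any module.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical common type for module contexts.
interface ModuleContext {
    String moduleName();
}

// Stand-in for a context class shipped inside an optional module.
class DemoOrmContext implements ModuleContext {
    public DemoOrmContext() { }

    @Override
    public String moduleName() {
        return "orm";
    }
}

enum InfinitumModule {
    // (marker class expected on the classpath, context class to instantiate)
    ORM("java.util.ArrayList", DemoOrmContext.class.getName()),
    AOP("com.example.missing.AopMarker", "com.example.missing.AopContext");

    private final String markerClass;
    private final String contextClass;

    InfinitumModule(String markerClass, String contextClass) {
        this.markerClass = markerClass;
        this.contextClass = contextClass;
    }

    // The module is available if its marker class can be found.
    boolean isAvailable() {
        try {
            Class.forName(markerClass);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Reflectively creates the module's context, or returns null if it is unavailable.
    ModuleContext initialize() {
        if (!isAvailable()) {
            return null;
        }
        try {
            return (ModuleContext) Class.forName(contextClass)
                    .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            return null;
        }
    }

    // Initializes every module that is present, skipping the rest.
    static List<ModuleContext> initializeAvailable() {
        List<ModuleContext> contexts = new ArrayList<>();
        for (InfinitumModule module : values()) {
            ModuleContext context = module.initialize();
            if (context != null) {
                contexts.add(context);
            }
        }
        return contexts;
    }
}
```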

The modules get picked up during a post-processing step in the ContextFactory. It’s this step that also adds them as child contexts to the parent.

New modules can be added to the registry without any changes elsewhere. As long as the context has been implemented, they will be picked up and processed automatically.

Once this architecture was in place, separating the framework into different projects was simple. Now Infinitum Core can be used by itself if only dependency injection is needed, the ORM can be included if needed for SQLite, AOP included for aspect-oriented programming, and Web for the RESTful web service client and various HTTP utilities.

We Shape Our Buildings, and Afterwards, Our Buildings Shape Us

I think this solution has helped to reduce some of the complexity. As with any modular design, it’s not only more extensible but also more maintainable. Each module context is responsible for its own configuration, which reduced complexity in the InfinitumContext implementation, since it previously handled the initialization for the ORM, AOP, and REST pieces. It also worked out in that I made the switch to GitHub [2] by setting up four discrete repositories, one for each module.

In retrospect, I would have made things a lot easier on myself if I had taken a more modular approach from the beginning. I ended up having to reengineer quite a bit, although once I had a viable solution, it actually wasn’t all that much work. I was fortunate in that I had things fairly well designed (perhaps not at a very high level, but in general) and extremely organized. It’s difficult to anticipate change, but chances are you’ll be kicking yourself if you don’t. I started the framework almost a year ago, and I never imagined it would grow to what it is today.

  1. I was originally hoping to pull out dependency injection as a separate module, but the framework relies heavily on it to wire up components.
  2. Now that the code’s pushed to GitHub, I begin the laborious task of migrating the documentation over from Google Code.

The Importance of Being Idle

“Practice not-doing and everything will fall into place.”

It’s good to be lazy. Sometimes, in programming, it can also be hard to be lazy. It’s this paradox that I will explore today — The Art of Being Lazy. Specifically, I’m going to dive into a design pattern known as lazy loading by discussing why it’s used, the different flavors it comes in, and how it can be implemented.

Lazy loading is a pretty simple concept: don’t load something until you really need it. However, the philosophy can be generalized further: don’t do something until you need to do it. It’s this line of thinking that has helped lead to processes like Kanban and lean software development (and also probably got you through high school). Notwithstanding, this tenet goes beyond the organizational level. It’s about optimizing efficiency and minimizing waste. There’s a lot to be said about optimizing efficiency in a computer program, which is why The Art of Being Lazy is an exceedingly relevant principle.

They Don’t Teach You This in School

My first real job as a programmer was working as a contractor for Thomson Reuters. I started as a .NET developer (having no practical experience with it whatsoever) working on a web application that primarily consisted of C# and ASP.NET. The project was an internal configuration management database, which is basically just a big database containing information pertaining to all of the components of an information system (in this case, Thomson’s West Tech network, the infrastructure behind their legal technology division).

This CMDB was geared towards providing application-impact awareness, which, more or less, meant that operations and maintenance teams could go in and see what applications or platforms would be affected by a server going down (hopefully for scheduled maintenance and not a datacenter outage), which business units were responsible for said applications, and who the contacts were for those groups. It also provided various other pieces of information pertaining to these systems, but what I’m getting at is that we were dealing with a lot of data, and this data was all interconnected. We had a very complex domain model with a lot of different relationships. What applications are running on what app servers? Which database servers do they depend on? What NAS servers have what NAS volumes mounted on them? The list goes on.

Our object graph was immense. You can imagine the scale of infrastructure a company like Thomson Reuters has. The crux of the problem was that we were persisting all of this data as well as the relationships between it, and we wanted to allow users of this software to navigate this vast hierarchy of information. Naturally, we used an ORM to help manage this complexity. Since we were working in .NET, and many of us were Java developers, we went with NHibernate.

We wanted to be able to load, say, an application server, and see all of the entities associated with it. To the uninitiated (which, at the time, would have included myself), this might seem like a daunting task. Loading any given entity would result in loading hundreds, if not thousands, of related entities because it would load those directly related, then those related to the immediate neighbors, continuing on in what seems like a never-ending cascade. Not only would it take forever, but we’d quickly run out of memory! There’s simply no way you can deal with an object graph of that magnitude and reasonably perform any kind of business logic on it. Moreover, it’s certainly not scalable, so obviously this would be a very naive thing to do. The good news is that, unsurprisingly, it’s not necessary.

It’s Good to be Lazy

The solution, of course, as I’ve already hit you across the face with, is a design pattern known as lazy loading. The idea is to defer initialization of an object until it’s truly needed (i.e. accessed). Going back to my anecdote, when we load, for example, an application server entity, rather than eagerly loading all its associated entities, such as servers, applications, BIG-IPs, etc., we use placeholders. Those related entities are then loaded on-the-fly when they are accessed.

Lazy loading can be implemented in a few different ways: lazy initialization, ghost objects, value holders, and dynamic proxies. Each has its own trade-offs. I’ll talk about all of them, but I’m going to primarily focus on using proxies since it’s probably the most widely used approach, especially within the ORM arena.

Lazy initialization probably best illustrates the concept of lazy loading. With lazy initialization, the object to be lazily loaded is represented by a special marker value (typically null) which indicates that the object has yet to be loaded. Every call to the object will first check to see if it has been loaded/initialized, and if it hasn’t, it gets loaded/initialized. Thus, the first call to the object will load it, while subsequent calls will not need to. The code below shows how this is done.
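A minimal sketch, with a hypothetical Server entity and a fake loader in place of a real database query:

```java
import java.util.ArrayList;
import java.util.List;

class Server {
    private final long id;
    private List<String> applications; // null marks "not loaded yet"
    private int loadCount;             // only here to make the laziness observable

    Server(long id) {
        this.id = id;
    }

    List<String> getApplications() {
        if (applications == null) {
            // First access: load now, then reuse the result on later calls.
            applications = loadApplications();
        }
        return applications;
    }

    // Stand-in for a database query.
    private List<String> loadApplications() {
        loadCount++;
        List<String> result = new ArrayList<>();
        result.add("app-for-server-" + id);
        return result;
    }

    int getLoadCount() {
        return loadCount;
    }
}
```

This version is not thread-safe; concurrent callers could trigger the load more than once unless the accessor is synchronized.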

Ghost objects are simply entities that have been partially loaded, usually just having the ID populated so that the full object can be loaded later. This is very similar to lazy initialization. The difference is that the related entity is initialized but not populated.

A value holder is an object that takes the place of the lazily loaded object and is responsible for loading it. The value holder has a getValue method which does the lazy loading. The entity is loaded on the first call to getValue.
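One way to sketch a value holder is as a small generic wrapper around a loader function. The ValueHolder class below is illustrative, and using java.util.function.Supplier as the loader is my own choice rather than anything prescribed by the pattern.

```java
import java.util.function.Supplier;

class ValueHolder<T> {
    private final Supplier<T> loader; // knows how to fetch the real object
    private T value;
    private boolean loaded;

    ValueHolder(Supplier<T> loader) {
        this.loader = loader;
    }

    // Loads the value on the first call and returns the cached value afterwards.
    T getValue() {
        if (!loaded) {
            value = loader.get();
            loaded = true;
        }
        return value;
    }

    boolean isLoaded() {
        return loaded;
    }
}
```

A separate loaded flag, rather than a null check, lets the holder cache a legitimately null result without reloading it.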

The above solutions get the job done, but their biggest problem is that they are pretty intrusive. The classes have knowledge that they are lazily loaded and require logic for loading. Luckily, there’s an option which helps to avoid this issue. Using dynamic proxies [1], we can write an entity class which has no knowledge of lazy loading and yet still lazily load it if we want to.

This is possible because the proxy extends the entity class or, if applicable, implements the same interface, allowing it to intercept calls to the entity itself. That way, the object need not be loaded, but when it’s accessed, the proxy intercepts the invocation, loads the object if needed, and then delegates the invocation to it. Since proxying classes requires bytecode instrumentation, we need to use a library like Cglib.

First, we implement an InvocationHandler we can use to handle lazy loading.

Now, we can use Cglib’s Enhancer class to create a proxy.

Now, the first call to any method on foo will invoke loadObject, which in turn will load the object into memory. Cglib actually provides an interface for doing lazy loading called LazyLoader, so we don’t even need to implement an InvocationHandler.
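The same interception idea can be sketched with the JDK’s built-in java.lang.reflect.Proxy instead of Cglib. This is a swapped-in technique: JDK proxies can only implement interfaces, not extend classes, so Foo is an interface here. Foo, RealFoo, LazyLoadingHandler, and LazyProxies are all hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.function.Supplier;

interface Foo {
    String getName();
}

// The real entity has no knowledge of lazy loading.
class RealFoo implements Foo {
    @Override
    public String getName() {
        return "foo";
    }
}

class LazyLoadingHandler implements InvocationHandler {
    private final Supplier<?> loader;
    private Object target;
    private int loads; // only here to make the laziness observable

    LazyLoadingHandler(Supplier<?> loader) {
        this.loader = loader;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        if (target == null) {
            // First intercepted call: load the real object.
            target = loader.get();
            loads++;
        }
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // rethrow what the real method threw
        }
    }

    int getLoads() {
        return loads;
    }
}

class LazyProxies {
    // Creates a proxy that implements iface and defers loading to the handler.
    static <T> T lazy(Class<T> iface, LazyLoadingHandler handler) {
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] { iface }, handler);
        return iface.cast(proxy);
    }
}
```

Callers hold a Foo and never see the handler, which is what keeps this approach non-intrusive.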

ORM frameworks like Hibernate use proxies to implement lazy loading, which is one of the features we took advantage of while developing the CMDB application. One of the nifty things that Hibernate supports is paged lazy loading, which allows entities in a collection to be loaded and unloaded while it’s being iterated over. This is extremely useful for one-to-many and, in particular, one-to-very-many relationships.

Lazy loading was also one of the features I included in Infinitum’s ORM, implemented using dynamic proxies as well. [2] At a later date, I may examine how lazy loading is implemented within the context of an ORM and how Infinitum uses it. It’s a very useful design pattern and provides some pretty significant performance optimizations. It just goes to show that sometimes being lazy pays off.

  1. For more background on proxies themselves, check out one of my previous posts.
  2. Java bytecode libraries like Cglib are not compatible with the Android platform. Android uses its own bytecode variant.

Dalvik Bytecode Generation

Earlier, I discussed the use of dynamic proxies and how they can be implemented in Java. As we saw, a necessary part of proxying classes is bytecode generation. From the outset, something I wanted to include in Infinitum was lazy loading. I also wanted to provide support for AOP down the road. Consequently, it was essential to include some way to generate bytecode at runtime.

The obvious choice would be to use a library like Cglib or Javassist, but sadly neither of those would work. That’s because Android does not use a Java VM; it uses its own virtual machine called Dalvik. As a result, an Android app ultimately ships as Dalvik bytecode (.dex files) rather than Java bytecode (.class files). Since Cglib and Javassist are designed for Java bytecode manipulation, they do not work on the Android platform. [1]

What’s a programmer to do? Fortunately, some Googlers developed a new library for runtime code generation targeting the Dalvik VM called Dexmaker.

It has a small, close-to-the-metal API. This API mirrors the Dalvik bytecode specification giving you tight control over the bytecode emitted. Code is generated instruction-by-instruction; you bring your own abstract syntax tree if you need one. And since it uses Dalvik’s dx tool as a backend, you get efficient register allocation and regular/wide instruction selection for free.

Even better, Dexmaker provides an API for directly creating proxies called ProxyBuilder. If you followed my previous post on generating proxies, then using ProxyBuilder is a piece of cake. Similar to Java’s Proxy class, ProxyBuilder relies on an InvocationHandler to specify a proxy’s behavior.

Dexmaker enabled me to implement lazy loading and AOP within the Infinitum framework. It also opens up the possibility of using Mockito for unit testing in an Android environment because Mockito relies on proxies for generating mocks. [2]

  1. ASMDEX, a Dalvik-compatible bytecode-manipulation library, was released in March 2012, meaning Cglib could, in theory, be ported to Android since it relies on ASM.
  2. Infinitum is actually unit tested using Robolectric, which allows for testing Android code in a standard JVM.

Proxies: Why They’re Useful and How They’re Implemented

I wanted to write about lazy loading, but doing so requires some background on proxies. Proxies are such an interesting and useful concept that I decided it would be worthwhile to write a separate post discussing them. I’ve talked about them in the past, for instance on StackOverflow, so this will be a bit of a rehash, but I will go into a little more depth here.

What is a proxy? Fundamentally, it’s a broker, or mediator, between an object and that object’s user, which I will refer to as its client. Specifically, a proxy intercepts calls to the object, performs some logic, and then (typically) passes the call on to the object itself. I say typically because the proxy could simply intercept without ever calling the object.

[Figure: proxy]

A proxy works by implementing an object’s non-final methods. This means that proxying an interface is pretty simple because an interface is merely a list of method signatures that need to be implemented. This facilitates the interception of method invocations quite nicely. Proxying a concrete class is a bit more involved, and I’ll explain why shortly.

Proxies are useful, very useful. That’s because they allow for the modification of an object’s behavior and do so in a way that’s completely invisible to the user. Few know about them, but many use them, usually without even being aware of it. Hibernate uses them for lazy loading, Spring uses them for aspect-oriented programming, and Mockito uses them for creating mocks. Those are just three (huge) use cases of many.

JDK Dynamic Proxies

Java provides a Proxy class which implements a list of interfaces at runtime. The behavior of a proxy is specified through an implementation of InvocationHandler, an interface which has a single method called invoke. Its signature is Object invoke(Object proxy, Method method, Object[] args) throws Throwable.

The proxy argument is the proxy instance the method was invoked on. The method argument is the Method instance corresponding to the interface method invoked on the object. The last argument, args, is an array of objects which consists of the arguments passed in to the method invocation, if any.

Each proxy has an InvocationHandler associated with it, and it’s this handler which is responsible for delegating method calls made on the proxy to the object being proxied. This level of indirection means that methods are not invoked on an object itself but rather on its proxy. The example below illustrates how an InvocationHandler would be implemented such that “Hello World” is printed to the console before every method invocation.
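A handler along these lines might look like the following sketch. The class name HelloWorldHandler and the target field are my own naming for illustration:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Prints "Hello World" before every method call, then delegates to the real object.
class HelloWorldHandler implements InvocationHandler {
    private final Object target; // the object being proxied

    HelloWorldHandler(Object target) {
        this.target = target;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        System.out.println("Hello World");
        try {
            // Invoke on the target, not on the proxy argument.
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // surface the target's own exception to the caller
        }
    }
}
```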

This is pretty easy to understand. The invoke method will intercept any method call by printing “Hello World” before delegating the invocation to the proxied object. It’s not very useful, but it does lend some insight into why proxies are useful for AOP.

An interesting observation is that invoke provides a reference to the proxy itself, meaning if you were to instead call the method on it, you would receive a StackOverflowError because it would lead to an infinite recursion.

Note that the InvocationHandler alone is of no use. In order to actually create a proxy, we need to use the Proxy class and provide the InvocationHandler. Proxy provides a static method for creating new instances called newProxyInstance. This method takes three arguments: a class loader, an array of interfaces to be implemented by the proxy, and the proxy behavior in the form of an InvocationHandler. An example of creating a proxy for a List is shown below.
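Here is one way the List proxy could be created. ListProxyFactory and helloWorldList are hypothetical names, and the handler is written inline as a lambda to keep the sketch self-contained:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

class ListProxyFactory {
    @SuppressWarnings("unchecked")
    static List<String> helloWorldList() {
        List<String> target = new ArrayList<>(); // the real list behind the proxy

        // For brevity, exceptions thrown by the target are not unwrapped here.
        InvocationHandler handler = (proxy, method, args) -> {
            System.out.println("Hello World");
            return method.invoke(target, args);
        };

        return (List<String>) Proxy.newProxyInstance(
                ListProxyFactory.class.getClassLoader(), // loader that defines the proxy class
                new Class<?>[] { List.class },           // interfaces the proxy implements
                handler);                                // behavior of the proxy
    }
}
```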

The client invoking methods on the List can’t tell the difference between a proxy and its underlying object representation, nor should it care.

Proxying Classes

While proxying an interface dynamically is relatively straightforward, the same cannot be said for proxying a class. Java’s Proxy class is merely a runtime implementation of an interface or set of interfaces, but a class does not have to implement an interface at all. As a result, proxying classes requires bytecode manipulation. Fortunately, there are libraries available which help to facilitate this through a high-level API. For example, Cglib (short for code-generation library) provides a way to extend Java classes at runtime and Javassist (short for Java Programming Assistant) allows for both class modification and creation at runtime. It’s worth pointing out that Spring, Hibernate, Mockito, and various other frameworks make heavy use of these libraries.

Cglib and Javassist provide support for proxying classes because they can dynamically generate bytecode (i.e. class files), allowing us to extend classes at runtime in much the same way that Java’s Proxy can implement an interface at runtime.

At the core of Cglib is the Enhancer class, which is used to generate dynamic subclasses. It works in a similar fashion to the JDK’s Proxy class, but rather than using a JDK InvocationHandler, it uses a Callback for providing proxy behavior. There are various Callback extensions, such as InvocationHandler (which is a replacement for the JDK version), LazyLoader, NoOp, and Dispatcher.

This code is essentially the same as the earlier example in that every method invocation on the proxied object will first print “Hello World” before being delegated to the actual object. The difference is that MyClass does not implement an interface, so we didn’t need to specify an array of interfaces for the proxy.
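To get a feel for what a generated subclass amounts to, here is a hand-written equivalent. This is not Cglib code, just a plain Java sketch of the kind of class a library like Cglib produces at runtime, and the greet method and MyClassProxy name are assumptions for illustration:

```java
// A concrete class with no interface, as in the example above.
class MyClass {
    String greet(String name) {
        return "Hi, " + name;
    }
}

// What a proxy subclass boils down to: override, intercept, delegate.
class MyClassProxy extends MyClass {
    private final MyClass target;

    MyClassProxy(MyClass target) {
        this.target = target;
    }

    @Override
    String greet(String name) {
        System.out.println("Hello World"); // the interception logic
        return target.greet(name);         // delegate to the real object
    }
}
```

It also shows why final classes and final methods cannot be proxied this way: there is nothing for the subclass to override.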

Proxies are a very powerful programming construct which enables us to implement things like lazy loading and AOP. In general, they allow us to alter the behavior of objects transparently. In the future, I’ll dive into the specific use cases of lazy loading and AOP.

A Look at Spring’s BeanFactoryPostProcessor

One of the issues my team faced during my time at Thomson Reuters was keeping developer build times down. Many of the groups within WestlawNext had a fairly comprehensive check-in policy in that, after your code was reviewed, you had to run a full build which included running all unit tests and endpoint tests before you could commit your changes. This is a good practice, no doubt, but the group I was with had somewhere in the ballpark of 6000 unit tests. Moreover, since we were also testing our REST endpoints, it was necessary to launch an embedded Tomcat instance and deploy the application to it before those tests could execute.

Needless to say, build times could get pretty lengthy. I think I recall, at one point, it taking as long as 20 minutes to complete a full build. If a developer makes three commits in a day, that’s an hour of lost productivity. Extrapolate that out to a week and five hours are wasted, so you get the idea.

Of course, there were things we could do to cut down on that time — disabling the Cobertura and Javadoc Ant tasks for instance — but that only gets you so far. The annoying thing was that you typically had a Tomcat server running with the application already deployed, yet the build process started up a whole other instance in order to run the endpoint tests.

I explored the possibility of having endpoint tests run against a developer’s local server (or any server, in theory) by introducing a new property to the developer build properties file. It seems like a pretty simple concept: if the property doesn’t exist, run the tests normally by starting up an embedded Tomcat server. If it does exist, then simply route the HTTP requests to the specified host. Granted, it’s not going to significantly reduce the build time, but anything helps.

Unfortunately, it was not that simple. That’s because we couldn’t just run endpoint tests against the “live” app. The underlying issue was that our API, which we called ourselves from JavaScript and was also exposed to other consumers, relied on some other WestlawNext web services, such as user authentication and document services. We weren’t doing end-to-end integration testing; we were just testing our API. As a result, we used a separate Spring context which allowed the embedded Tomcat hook to deploy the application using client stubs in place of the actual web service clients.

So, things started to look a little moot. A developer would have to start their Tomcat server such that the client stub beans were registered with the Spring context in place of the normal client bean implementations. At the very least, it presented an interesting exercise. It was especially interesting because the client stubs were not part of the application’s classpath; they were separate from the app’s source and compiled to a bin-test directory.

Introducing the BeanFactoryPostProcessor

The solution I came up with was to implement one of Spring’s less glamorous (but still really neat) interfaces, the BeanFactoryPostProcessor. This interface provides a way for applications to modify their Spring context’s bean definitions before any beans get created. In my case, I needed to replace the client beans with their stub equivalents.

We start by implementing the interface, which has a single method, postProcessBeanFactory.

So the question is how do we implement registerClientStubBeans? This is the method that will overwrite the client beans in the application context, but in order to avoid the dreaded NoClassDefFoundError, we need to dynamically add the stub classes to the classpath.

The addClasspathDependencies method will add the stubs to the classpath, while getClientStubBeans will do just as its name suggests. I’ve also created a Bean class that will hold a bean name and its BeanDefinition. In order to register beans with the BeanFactory, we use the registerBeanDefinition method and pass in a bean name and corresponding BeanDefinition.

Let’s take a look at how we can add the stubs to the classpath at runtime.

It looks like there’s a lot going on here, but it’s actually not too bad. The addClasspathDependencies method is simply going to call addToClasspath to add some classes we need, which include the stubs in bin-test but also some libraries they rely on in the libs directory. The more interesting code is in the latter of the two methods, which is responsible for taking a File, which will be a .class file, and adding it to the classpath. We do that by getting the context ClassLoader and then, using reflection, we invoke the method “addURL” by passing in the .class URL we want to add.
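A word of caution if you try this on a newer JVM: since Java 9 the system class loader is no longer a URLClassLoader, and reflective access to addURL may be restricted. As a hedged alternative sketch (this is not the approach described above, and ClasspathExtender is a hypothetical name), you can create a child URLClassLoader that sees the extra directories and jars:

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.List;

class ClasspathExtender {
    // Builds a class loader that resolves classes from the given directories
    // or jar files first delegating to the parent, as URLClassLoader normally does.
    static URLClassLoader withExtraEntries(List<File> entries, ClassLoader parent)
            throws MalformedURLException {
        URL[] urls = new URL[entries.size()];
        for (int i = 0; i < entries.size(); i++) {
            // File.toURI() appends a trailing slash for existing directories,
            // which URLClassLoader needs in order to treat the URL as a directory.
            urls[i] = entries.get(i).toURI().toURL();
        }
        return new URLClassLoader(urls, parent);
    }
}
```

The resulting loader could then be installed with Thread.currentThread().setContextClassLoader(...) before the stub classes are looked up.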

Lastly, we need to implement the getClientStubBeans method, which returns a list of the bean definitions we want to register with the context.

Again, it’s a lot of code, but it’s not difficult to follow if you take it piece by piece. The getClientStubBeans method is going to get the directory in which the stub classes are located and pass it to buildBeanDefinitions. This method iterates over each file, extracts the file name (e.g. “com/foo/client/stub/WebServiceClientStub.class”) and converts it into a fully-qualified class name (e.g. “com.foo.client.stub.WebServiceClientStub”). Since we already added the stubs to the classpath, the class is then loaded by this name. Once the class is loaded, we can check if it is indeed a stub by introspectively looking for the ClientStub annotation (this custom annotation makes a bean eligible for auto-detection and specifies a bean name). If it is a stub, we use Spring’s handy BeanDefinitionBuilder to build a BeanDefinition for the stub.
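The file-name-to-class-name conversion is the only fiddly bit, so here is a small sketch of that step in isolation. ClassNames and fromRelativePath are hypothetical names, and the path is assumed to be relative to the root of the stub directory:

```java
class ClassNames {
    // Converts "com/foo/Bar.class" into "com.foo.Bar".
    static String fromRelativePath(String relativePath) {
        String suffix = ".class";
        if (!relativePath.endsWith(suffix)) {
            throw new IllegalArgumentException("Not a .class file: " + relativePath);
        }
        String withoutSuffix =
                relativePath.substring(0, relativePath.length() - suffix.length());
        // Handle both Unix and Windows path separators.
        return withoutSuffix.replace('/', '.').replace('\\', '.');
    }
}
```

The resulting name can be handed to Class.forName or to a class loader’s loadClass method.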

Now, when Spring initializes, it will detect this BeanFactoryPostProcessor and invoke its postProcessBeanFactory method, resulting in the client stubs being registered with the context in place of their respective implementations. It’s a pretty unique use case (and, frankly, not particularly useful for the given scenario), but it helps illustrate how the BeanFactoryPostProcessor interface can be leveraged.

Solving the Referential Integrity Problem

“A man with a watch knows what time it is. A man with two watches is never sure.”

I’ve been developing my open source Android framework, Infinitum, for the better part of 10 months now. It has brought about some really interesting problems that I’ve had to tackle, which is one of the many reasons I enjoy working on it so much.

Chicken or the Egg

Although it’s much more now, Infinitum began as an object-relational mapper which was loosely modeled after Hibernate. One of the first major issues I faced while developing the ORM component was loading object graphs. To illustrate what I mean by this, suppose we’re developing some software for a department store. The domain model for this software might look something like this:

As you can see, an Employee works in one Department, and, conversely, a Department has one or more Employees working in it, forming a many-to-one relationship and resulting in the class below.

Pretty straightforward, right? Now, let’s say we want to retrieve the Employee with, say, the ID 4028 from the database. Thinking about it at a high level and ignoring any notion of lazy loading, this appears to be rather simple.

  1. Perform a query on the Employee table.
  2. Instantiate a new Employee object.
  3. Populate the Employee object’s fields from the query result.

But there’s some handwaving going on in those three steps, specifically the last one. One of the Employee fields is an entity, namely department. Okay, this shouldn’t be a problem. We just need to perform a second query to retrieve the Department associated with the Employee (the result of the first query is going to include the Department foreign key; let’s assume it’s 14).

Then we just create the Department object, populate it and assign it to the respective field in the Employee.

Once again, there’s a problem. To understand why, it’s helpful to see what the Department class actually looks like.
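A stripped-down version of the two classes, with field names that are my assumptions and with ORM annotations and accessors omitted, makes the mutual references visible:

```java
import java.util.ArrayList;
import java.util.List;

class Employee {
    long id;
    String name;
    Department department; // many-to-one: each Employee works in one Department
}

class Department {
    long id;
    String name;
    List<Employee> employees = new ArrayList<>(); // one-to-many: back-reference to Employees
}
```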

Do you see what the issue is? In order to construct our Employee, we need to construct his Department. In order to construct his Department, we need to construct the Employee. Our object graph has a cycle that’s throwing us for an (infinite) loop.

Breaking the Cycle

Fortunately, there’s a pretty easy solution for this chicken-or-the-egg problem. We’ll make use of a HashMap to keep tabs on our object graph as we incrementally build it. This will make more sense in just a bit.

We’re going to use a HashMap keyed off of an integer hash where the map values will be the entities in the object graph.

The integer hash will be a unique value computed for each entity we need to load to fulfill the object graph. The idea is that we will store the partially populated entity in the HashMap to have its remaining fields populated later. Loading an entity will take the following steps:

  1. Perform query on the entity table.
  2. Instantiate a new entity object.
  3. Populate the entity object fields which do not belong to a relationship from the query result.
  4. Compute the hash for the partial entity object.
  5. Check if the HashMap contains the computed hash.
  6. If the HashMap contains the hash, return the associated entity object (this breaks any potential cycle).
  7. Otherwise, store the entity object in the HashMap using the hash as its key.
  8. Load related entities by recursively calling this sequence.
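The steps above can be sketched as follows. This is not Infinitum’s code: the in-memory maps stand in for database tables, and GraphLoader, its field names, and the hash helper are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Employee { long id; String name; Department department; }
class Department { long id; String name; List<Employee> employees = new ArrayList<>(); }

class GraphLoader {
    // Stand-ins for database tables (hypothetical data layout).
    final Map<Long, String> employeeNames = new HashMap<>();
    final Map<Long, Long> employeeDepartment = new HashMap<>(); // employee id -> department id
    final Map<Long, String> departmentNames = new HashMap<>();

    // Partially populated entities, keyed by an integer hash of type and primary key.
    private final Map<Integer, Object> graph = new HashMap<>();

    private static int hash(Class<?> type, long id) {
        return 31 * type.getName().hashCode() + Long.hashCode(id);
    }

    Employee loadEmployee(long id) {
        Employee e = new Employee();                 // steps 1-2: "query" and instantiate
        e.id = id;
        e.name = employeeNames.get(id);              // step 3: non-relationship fields only
        int key = hash(Employee.class, id);          // step 4
        Object existing = graph.get(key);            // step 5
        if (existing != null) {
            return (Employee) existing;              // step 6: this breaks the cycle
        }
        graph.put(key, e);                           // step 7
        e.department = loadDepartment(employeeDepartment.get(id)); // step 8
        return e;
    }

    Department loadDepartment(long id) {
        Department d = new Department();
        d.id = id;
        d.name = departmentNames.get(id);
        int key = hash(Department.class, id);
        Object existing = graph.get(key);
        if (existing != null) {
            return (Department) existing;
        }
        graph.put(key, d);
        // Load the related Employees by recursing into loadEmployee.
        for (Map.Entry<Long, Long> row : employeeDepartment.entrySet()) {
            if (row.getValue().longValue() == id) {
                d.employees.add(loadEmployee(row.getKey()));
            }
        }
        return d;
    }
}
```

Because each entity is stored in the map before its relationships are followed, any path that loops back finds the stored instance and stops.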

Going back to our Employee problem, retrieving an Employee from the database will take these steps:

  1. Perform query on the Employee table.
  2. Instantiate a new Employee object.
  3. Populate the Employee object fields which do not belong to a relationship from the query result.
  4. Compute the hash for the partial Employee object.
  5. Check if the HashMap contains the computed hash (it won’t).
  6. Store the Employee object in the HashMap using the hash as its key.
  7. Perform query on the Department table.
  8. Instantiate a new Department object.
  9. Populate the Department object fields which do not belong to a relationship from the query results.
  10. Compute the hash for the partial Department object.
  11. Check if the HashMap contains the computed hash (again, it won’t).
  12. Store the Department object in the HashMap using the hash as its key.
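  13. Load the Department’s Employees, which leads back to the Employee from step 1. Its hash is already in the HashMap, so the stored instance is returned instead of being loaded again. The cycle terminates, and the two objects in the HashMap, the Employee and the Department, end up fully populated and referencing each other.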

Considering the HashMap is not specific to any entity type (i.e. it will hold Employees, Departments, and any other domain types we come up with), how do we compute a unique hash for objects of various types? Moreover, we’re computing hashes for incomplete objects, so what gives?

Obviously, we can’t make use of hashCode() since not every field is guaranteed to be populated. Fortunately, we can take advantage of the fact that every entity must have a primary key, but, unless we’re using a policy where every primary key is unique across every table, this won’t get us very far. We will include the entity type as a factor in our hash code. Here’s the code Infinitum currently uses to compute this hash:
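I won’t claim the following is Infinitum’s exact implementation; it is a minimal sketch of the idea, combining the entity type and the primary key in the usual 31-multiplier style. EntityHash and compute are hypothetical names:

```java
class EntityHash {
    // Derives a hash from the entity's type and primary key only, so it can
    // be computed even when the rest of the entity is not yet populated.
    static int compute(Class<?> entityType, Object primaryKey) {
        final int prime = 31;
        int hash = 7;
        hash = prime * hash + entityType.getName().hashCode();
        hash = prime * hash + primaryKey.hashCode();
        return hash;
    }
}
```

Keep in mind that any int hash can collide in principle. A composite key made of the type and the primary key would rule that out entirely, at the cost of an extra object per lookup.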

This hash allows us to uniquely identify entities even if they have not been fully populated. Our cycle problem is solved!

Maintaining Referential Integrity

The term “referential integrity” is typically used to refer to a property of relational databases. However, when I say referential integrity, I’m referring to the notion of object references in an object graph. This referential integrity is something ORMs must keep track of; otherwise, you run into some big problems.

To illustrate this, say our department store only has one department and two employees who work in said department (this might defeat the purpose of a department store, but just roll with it). Now, let’s say we retrieve one Employee, Bill, from the database. Once again ignoring lazy loading, this should implicitly load an object graph consisting of the Employee, the Department, and the Employees assigned to that Department. Next, let’s subsequently retrieve the second Employee, Frank, from the database. Again, this will load the object graph.

Bill and Frank both work in the same Department, but if referential integrity is not enforced, objects can become out of sync.
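A tiny sketch shows how that happens when each query builds its own Department copy. The classes are simplified, and the department name "Shoes" is made up for the example:

```java
class Department {
    String name;
    Department(String name) { this.name = name; }
}

class Employee {
    String name;
    Department department;
    Employee(String name, Department department) {
        this.name = name;
        this.department = department;
    }
}

class OutOfSyncDemo {
    // Returns whether Bill and Frank still agree on the department name after a rename.
    static boolean departmentsAgreeAfterRename() {
        // Each "query" naively creates its own copy of the same database row.
        Employee bill = new Employee("Bill", new Department("Shoes"));
        Employee frank = new Employee("Frank", new Department("Shoes"));

        bill.department.name = "Footwear"; // rename through Bill's reference only

        return bill.department.name.equals(frank.department.name);
    }
}
```

Frank’s Department never sees the rename, because it is a different object that merely started out with the same data.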

The underlying problem is that there are two different copies of the Department object, but we must abide by the Highlander Principle in that “there can be only one.” Bill and Frank should reference the same instance so that, regardless of how the Department is dereferenced, it stays synced between every object in the graph.

In plain terms, when we’re retrieving objects from the database, we must be cautious not to load the same one twice. Otherwise, we’ll have two objects corresponding to a single database row and things will get out of sync.

Enter Identity Map

This presents an interesting problem. Knowing what we learned earlier with regard to the chicken-or-the-egg problem, can we apply a similar solution? The answer is yes! In fact, the solution we discussed earlier was actually masquerading as a fairly common design pattern known as the Identity Map, originally cataloged by Martin Fowler in his book Patterns of Enterprise Application Architecture.

The idea behind the Identity Map pattern is that, every time we read a record from the database, we first check the Identity Map to see if the record has already been retrieved. This allows us to simply return a new reference to the in-memory record rather than creating a new object, maintaining referential integrity.
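In its simplest form, an Identity Map might look like the sketch below. IdentityMap, its get method, and the loader callback are hypothetical, and I’m using a string key of type name plus ID rather than an integer hash to sidestep collisions:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class IdentityMap {
    private final Map<String, Object> entities = new HashMap<>();

    // Returns the cached instance for (type, id), loading it only on a miss.
    <T> T get(Class<T> type, long id, Function<Long, T> loader) {
        String key = type.getName() + "#" + id;
        Object cached = entities.get(key);
        if (cached != null) {
            return type.cast(cached); // same in-memory instance every time
        }
        T loaded = loader.apply(id);  // stand-in for the database read
        entities.put(key, loaded);
        return loaded;
    }
}
```

This simple version registers the entity only after it has been loaded, so on its own it does not break cycles. For that, the entity has to be registered before its relationships are followed, as in the steps described earlier.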

A secondary benefit to the Identity Map is that, since it acts as a cache, it reduces the number of database calls needed to retrieve objects, which yields a performance enhancement.

An Identity Map is normally tied to some sort of transactional context such as a session. This works exceedingly well for Infinitum because its ORM is built around the notion of a Session object, which can be configured as a scoped unit of work. The Infinitum Session contains a cache which functions as an Identity Map, solving both the cycle and the referential integrity issues.

It’s worth pointing out, however, that while an Identity Map maintains referential integrity within the context of a session, it doesn’t do anything to prevent incongruities between different sessions. This is a complex problem that usually requires a locking strategy, which is beyond the scope of this blog post.

Under the Hood

It may be helpful to see how Infinitum uses an Identity Map to solve the cycle problem. The method createFromCursor takes a database result cursor and transforms it into an instance of the given type. It makes use of a recursive method that goes through the process I outlined earlier. The call to loadRelationships will result in this recursion.

Entities are stored in the Session cache as they are retrieved, allowing us to enforce referential integrity while also preventing any infinite loops that might occur while building up the object graph.

So that’s it! We’ve learned to make use of the Identity Map pattern to solve some pretty interesting problems. We looked at how we can design an ORM to load object graphs that contain cycles as well as maintain this critical notion of referential integrity. We also saw how the Identity Map helps to give us some performance gain through caching. Infinitum’s ORM module makes use of this pattern in its session caching and many other frameworks use it as well. In a future blog entry, I will talk about lazy loading and how it can be used to avoid loading large object graphs.