More Environments Will Not Make Things Easier

Microservices are hard. They require extreme discipline. They require a lot more upfront thinking. They introduce integration challenges and complexity that you otherwise wouldn’t have with a monolith. But service-oriented design is an important part of scaling an organization: hundreds of engineers all working on the same codebase will only lead to angst and an inability to be nimble.

This requires a pretty significant change in the way we think about things. We’re creatures of habit, so if we’re not careful, we’ll just keep on applying the same practices we used before we did services. And that will end in frustration.

How can we possibly build working software that comprises dozens of services owned by dozens of teams? Instinct tells us full-scale integration. That’s how we did things before, right? We ran integration tests. So we run all of the services we depend on and develop our service against them. But it turns out the dozen or so services we depend on have their own dependencies too! This problem is not linear.

Okay, so we can’t run everything on our laptop. Instead, let’s just have a development environment that is a facsimile of production with everything deployed. This way, teams can develop their products against real, deployed services. The trade-off is teams need to provide a high level of stability for these “development” services since other teams are relying on them for their own development. If nothing works, development is hamstrung. Personally, I think this is a pretty reasonable trade-off because if we’re disciplined enough, it shouldn’t be hard to provide stable APIs. In fact, if we’re disciplined, it should be a requirement. This is why upfront thinking is critical. Designing your APIs is the most important thing you do. Service-oriented architecture necessitates API-driven development. Literally nothing else matters but the APIs. It reminds me of the famous Jeff Bezos mandate:

  1. All teams will henceforth expose their data and functionality through service interfaces.

  2. Teams must communicate with each other through these interfaces.

  3. There will be no other form of interprocess communication allowed: no direct linking, no direct reads of another team’s data store, no shared-memory model, no back-doors whatsoever. The only communication allowed is via service interface calls over the network.

  4. It doesn’t matter what technology they use. HTTP, Corba, Pubsub, custom protocols – doesn’t matter. Bezos doesn’t care.

  5. All service interfaces, without exception, must be designed from the ground up to be externalizable. That is to say, the team must plan and design to be able to expose the interface to developers in the outside world. No exceptions.

  6. Anyone who doesn’t do this will be fired.

  7. Thank you; have a nice day!

If we’re not disciplined, maintaining stability in a development environment becomes too difficult. So naturally, the solution becomes doubling down—we just need more environments. If every team just gets its own full-scale environment to develop against, no more stability problems. We get to develop our distributed monolith happily in our own little world. That sound you hear is every CFO collectively losing their shit, but whatever, they’re nerds and we’ve gotta get this feature to production!

Besides the obvious cost implications of this approach, perhaps the more insidious problem is that it causes teams to develop in a vacuum. In and of itself, this is not an issue, but for the undisciplined team that is not practicing rigorous API-driven development, it creates moving goalposts. A team will spend months developing its product against static dependencies only to hit a massive integration headache come production time. It’s pain deferral, plain and simple. That pain isn’t being avoided or managed; you’re just putting off dealing with instability and integration until the point where they are even more difficult to deal with. It is the opposite of the “fail-fast” mindset. It’s failing slowly, drawn out over months.

“We need to run everything with this particular configuration to test this, and if anyone so much as sneezes my service becomes unstable.” Good luck with that. I’ve got a dirty little secret: if you’re not disciplined, no amount of environments will make things easier. If you can’t keep your service running in an integration environment, production isn’t going to be any easier.

Similarly, massive end-to-end integration tests spanning numerous services are an anti-pattern. Another dirty little secret: integrated tests are a scam. With a big enough system, you cannot reasonably expect to write meaningful large-scale tests in any tractable way.

What are we to do then? With respect to development, get it out of your head that you can run a facsimile of production to build features against. If you need local development, the only sane and cost-effective option is to stub. Stub everything. If you have a consistent RPC layer—discipline—this shouldn’t be too difficult. You might even be able to generate portions of stubs.
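To make that concrete, here’s a minimal sketch of what a hand-rolled stub can look like. All of the names here (UserService, User, UserServiceStub) are hypothetical; the point is the shape: the stub implements the same interface as the real client, so the code under development can’t tell the difference.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.NoSuchElementException;

    /** A value object returned by the dependency (illustrative). */
    class User {
        private final String id;
        private final String name;

        User(String id, String name) {
            this.id = id;
            this.name = name;
        }

        String getId() { return id; }
        String getName() { return name; }
    }

    /** The RPC interface our service depends on (illustrative). */
    interface UserService {
        User getUser(String id);
    }

    /** An in-memory stand-in for the real client during local development. */
    class UserServiceStub implements UserService {
        private final Map<String, User> users = new HashMap<String, User>();

        /** Seeds the stub with canned data for a dev session or test. */
        UserServiceStub withUser(User user) {
            users.put(user.getId(), user);
            return this;
        }

        @Override
        public User getUser(String id) {
            User user = users.get(id);
            if (user == null) {
                // Mimic the real service's "not found" behavior.
                throw new NoSuchElementException("no user with id " + id);
            }
            return user;
        }
    }

With a consistent RPC layer, a stub like this is mechanical enough that you could plausibly generate it from the service’s interface definition.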

We used Google App Engine heavily at Workiva. App Engine is a PaaS encompassing numerous services—app server, datastore, task queues, memcache, blobstore, cron, mail—all managed by Google. We were doing serverless before serverless was even a thing. App Engine provides an SDK for developing applications locally on your machine. Numerous times I overheard someone who thought the SDK was just running a facsimile of App Engine on their laptop. In reality, it was running a bunch of stubs!

If you need a full-scale deployed environment, keep in mind that stability is the cost of entry. Otherwise, you’re just delaying problems. In either case, you need stable APIs.

With respect to integration testing, the only tractable solution that doesn’t lull you into a false sense of security is consumer-driven contract testing. We run our tests against a stub, but those tests are also captured in a consumer-driven contract. The API provider then runs the consumer-driven contract tests against its own service to ensure it isn’t breaking any downstream consumers.
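Here’s a rough, framework-free sketch of the mechanics, reusing the hypothetical UserService and stub from above (in practice you’d reach for an actual contract-testing tool like Pact): the consumer writes down its expectation as a contract, checks its stub against it, and the provider replays the exact same contract against the real service.

    /** One expectation a consumer has of the UserService provider (illustrative). */
    class UserContract {
        private final String userId;        // the request the consumer will make
        private final String expectedName;  // the response field the consumer relies on

        UserContract(String userId, String expectedName) {
            this.userId = userId;
            this.expectedName = expectedName;
        }

        /** Runs the expectation against any implementation, stub or real. */
        void verifyAgainst(UserService service) {
            User user = service.getUser(userId);
            if (!expectedName.equals(user.getName())) {
                throw new AssertionError("contract broken: expected name "
                        + expectedName + " but got " + user.getName());
            }
        }
    }

    class ContractCheck {
        public static void main(String[] args) {
            UserContract contract = new UserContract("42", "Alice");

            // Consumer side: develop and test against a stub that honors the contract.
            UserService stub = new UserServiceStub().withUser(new User("42", "Alice"));
            contract.verifyAgainst(stub);

            // Provider side: replay the same contract against the deployed service
            // before shipping a change, e.g. contract.verifyAgainst(realUserService);
        }
    }

The key property is that the same expectation runs on both sides, so the stub can’t silently drift from what the provider actually does.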

All of this aside, the broader issue is ensuring a highly disciplined engineering organization. Without this, the rest becomes much more difficult as pain-driven development takes hold. Discipline is a key part of doing service-oriented design and preventing things from getting out of control as a company scales. Moving to microservices means using the right tools and processes, not just applying the old ones in a new context.

A Look at Spring’s BeanFactoryPostProcessor

One of the issues my team faced during my time at Thomson Reuters was keeping developer build times down. Many of the groups within WestlawNext had a fairly comprehensive check-in policy: after your code was reviewed, you had to run a full build, including all unit tests and endpoint tests, before you could commit your changes. This is a good practice, no doubt, but the group I was with had somewhere in the ballpark of 6,000 unit tests. Moreover, since we were also testing our REST endpoints, it was necessary to launch an embedded Tomcat instance and deploy the application to it before those tests could execute.

Needless to say, build times could get pretty lengthy. At one point, I recall a full build taking as long as 20 minutes. If a developer makes three commits in a day, that’s an hour of lost productivity. Extrapolate that out over a week and five hours are wasted. You get the idea.

Of course, there were things we could do to cut down on that time — disabling the Cobertura and Javadoc Ant tasks for instance — but that only gets you so far. The annoying thing was that you typically had a Tomcat server running with the application already deployed, yet the build process started up a whole other instance in order to run the endpoint tests.

I explored the possibility of having endpoint tests run against a developer’s local server (or any server, in theory) by introducing a new property to the developer build properties file. It seems like a pretty simple concept: if the property doesn’t exist, run the tests normally by starting up an embedded Tomcat server. If it does exist, then simply route the HTTP requests to the specified host. Granted, it’s not going to significantly reduce the build time, but anything helps.
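The simple version would look something like this (the property name, file name, and helpers are all made up for illustration):

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.util.Properties;

    class EndpointTestTarget {
        /** Resolves where endpoint tests should be routed (property name illustrative). */
        static String resolve() throws IOException {
            Properties buildProps = new Properties();
            try (FileInputStream in = new FileInputStream("developer-build.properties")) {
                buildProps.load(in);
            }
            // If the property is absent, fall back to an embedded Tomcat instance.
            String host = buildProps.getProperty("endpoint.test.host");
            return host != null ? host : startEmbeddedTomcat();
        }

        /** Stand-in for the existing embedded-Tomcat hook; returns its base URL. */
        static String startEmbeddedTomcat() {
            // ...start Tomcat, deploy the app, return its base URL...
            return "http://localhost:8080";
        }
    }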

Unfortunately, it was not that simple, because we couldn’t just run endpoint tests against the “live” app. The underlying issue was that our API, which we called ourselves from JavaScript and also exposed to other consumers, relied on some other WestlawNext web services, such as user authentication and document services. We weren’t doing end-to-end integration testing; we were just testing our API. As a result, we used a separate Spring context that allowed the embedded Tomcat hook to deploy the application using client stubs in place of the actual web service clients.

So the idea started to look a little moot. A developer would have to start their Tomcat server such that the client stub beans were registered with the Spring context in place of the normal client bean implementations. At the very least, though, it presented an interesting exercise. It was especially interesting because the client stubs were not part of the application’s classpath; they were separate from the app’s source and compiled to a bin-test directory.

Introducing the BeanFactoryPostProcessor

The solution I came up with was to implement one of Spring’s less glamorous (but still really neat) interfaces, the BeanFactoryPostProcessor. This interface provides a way for applications to modify their Spring context’s bean definitions before any beans get created. In my case, I needed to replace the client beans with their stub equivalents.

We start by implementing the interface, which has a single method, postProcessBeanFactory.
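A minimal sketch of that, with an illustrative class name:

    import org.springframework.beans.BeansException;
    import org.springframework.beans.factory.config.BeanFactoryPostProcessor;
    import org.springframework.beans.factory.config.ConfigurableListableBeanFactory;

    /**
     * Swaps web service client beans for their stub equivalents before any
     * beans are instantiated.
     */
    public class ClientStubBeanFactoryPostProcessor implements BeanFactoryPostProcessor {

        @Override
        public void postProcessBeanFactory(ConfigurableListableBeanFactory beanFactory)
                throws BeansException {
            registerClientStubBeans(beanFactory);
        }

        // registerClientStubBeans and its helpers are fleshed out below.
    }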

So the question is how do we implement registerClientStubBeans? This is the method that will overwrite the client beans in the application context, but in order to avoid the dreaded NoClassDefFoundError, we need to dynamically add the stub classes to the classpath.
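Here’s roughly what that looks like as a sketch. The cast to BeanDefinitionRegistry is my framing: that interface, which the default bean factory implements, is what actually exposes registerBeanDefinition.

    private void registerClientStubBeans(ConfigurableListableBeanFactory beanFactory) {
        // The stubs live outside the app's classpath, so pull them in first;
        // otherwise loading their classes triggers NoClassDefFoundError.
        addClasspathDependencies();
        for (Bean bean : getClientStubBeans()) {
            // Overwrites any existing definition registered under the same name.
            ((BeanDefinitionRegistry) beanFactory)
                    .registerBeanDefinition(bean.getName(), bean.getDefinition());
        }
    }

    /** Simple holder pairing a bean name with its BeanDefinition. */
    private static class Bean {
        private final String name;
        private final BeanDefinition definition;

        Bean(String name, BeanDefinition definition) {
            this.name = name;
            this.definition = definition;
        }

        String getName() { return name; }
        BeanDefinition getDefinition() { return definition; }
    }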

The addClasspathDependencies method will add the stubs to the classpath, while getClientStubBeans will do just as its name suggests. I’ve also created a Bean class that will hold a bean name and its BeanDefinition. In order to register beans with the BeanFactory, we use the registerBeanDefinition method and pass in a bean name and corresponding BeanDefinition.

Let’s take a look at how we can add the stubs to the classpath at runtime.
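A sketch of those two methods (they live in the same post-processor class; imports for File, Method, URL, and URLClassLoader are omitted, and the directory constants are illustrative). Note the reflective addURL trick assumes the pre-Java 9 URLClassLoader:

    private static final String STUB_CLASS_DIR = "bin-test"; // compiled stubs
    private static final String LIB_DIR = "libs";            // jars the stubs depend on

    private void addClasspathDependencies() {
        // The directory of compiled stub classes...
        addToClasspath(new File(STUB_CLASS_DIR));
        // ...plus the libraries they rely on.
        File[] libs = new File(LIB_DIR).listFiles();
        if (libs != null) {
            for (File lib : libs) {
                addToClasspath(lib);
            }
        }
    }

    private void addToClasspath(File file) {
        try {
            URL url = file.toURI().toURL();
            // Pre-Java 9, the context ClassLoader is a URLClassLoader whose
            // protected addURL method we can pry open with reflection.
            URLClassLoader classLoader =
                    (URLClassLoader) Thread.currentThread().getContextClassLoader();
            Method addUrl = URLClassLoader.class.getDeclaredMethod("addURL", URL.class);
            addUrl.setAccessible(true);
            addUrl.invoke(classLoader, url);
        } catch (Exception e) {
            throw new IllegalStateException("Could not add " + file + " to classpath", e);
        }
    }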

It looks like there’s a lot going on here, but it’s actually not too bad. The addClasspathDependencies method simply calls addToClasspath for everything we need: the compiled stubs in bin-test, plus the libraries they rely on in the libs directory. The more interesting code is in the latter of the two methods, which is responsible for taking a File (the directory of compiled classes, or a jar they depend on) and adding it to the classpath. We do that by getting the context ClassLoader and then, using reflection, invoking its addURL method with the URL we want to add.

Lastly, we need to implement the getClientStubBeans method, which returns a list of the bean definitions we want to register with the context.
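A sketch of the remaining pieces, including the ClientStub annotation they key off (the recursive directory walk and the exact annotation shape are my assumptions):

    private List<Bean> getClientStubBeans() {
        return buildBeanDefinitions(new File(STUB_CLASS_DIR));
    }

    private List<Bean> buildBeanDefinitions(File dir) {
        List<Bean> beans = new ArrayList<Bean>();
        for (File file : listClassFiles(dir)) {
            // "com/foo/client/stub/WebServiceClientStub.class"
            //   -> "com.foo.client.stub.WebServiceClientStub"
            String path = dir.toURI().relativize(file.toURI()).getPath();
            String className = path
                    .substring(0, path.length() - ".class".length())
                    .replace('/', '.');
            try {
                // Loadable now because the stub directory was added to the classpath.
                Class<?> clazz =
                        Thread.currentThread().getContextClassLoader().loadClass(className);
                ClientStub stub = clazz.getAnnotation(ClientStub.class);
                if (stub != null) {
                    beans.add(new Bean(stub.value(),
                            BeanDefinitionBuilder.genericBeanDefinition(clazz)
                                    .getBeanDefinition()));
                }
            } catch (ClassNotFoundException e) {
                throw new IllegalStateException(e);
            }
        }
        return beans;
    }

    /** Recursively collects .class files under the given directory. */
    private List<File> listClassFiles(File dir) {
        List<File> files = new ArrayList<File>();
        File[] children = dir.listFiles();
        if (children == null) {
            return files;
        }
        for (File child : children) {
            if (child.isDirectory()) {
                files.addAll(listClassFiles(child));
            } else if (child.getName().endsWith(".class")) {
                files.add(child);
            }
        }
        return files;
    }

    /** Marks a class as a client stub and specifies the bean name to register it under. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface ClientStub {
        String value();
    }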

Again, it’s a lot of code, but it’s not difficult to follow if you take it piece by piece. The getClientStubBeans method gets the directory in which the stub classes are located and passes it to buildBeanDefinitions. That method iterates over each file, extracts the file name (e.g. “com/foo/client/stub/WebServiceClientStub.class”), and converts it into a fully-qualified class name (e.g. “com.foo.client.stub.WebServiceClientStub”). Since we already added the stubs to the classpath, the class can then be loaded by this name. Once the class is loaded, we check whether it is indeed a stub by introspectively looking for the ClientStub annotation (this custom annotation makes a bean eligible for auto-detection and specifies a bean name). If it is a stub, we use Spring’s handy BeanDefinitionBuilder to build a BeanDefinition for it.

Now, when Spring initializes, it will detect this BeanFactoryPostProcessor and invoke its postProcessBeanFactory method, resulting in the client stubs being registered with the context in place of their respective implementations. It’s a pretty unique use case (and, frankly, not particularly useful for the given scenario), but it helps illustrate how the BeanFactoryPostProcessor interface can be leveraged.