Getting big wins with small teams on tight deadlines

Part of what we do at Real Kinetic is give companies confidence to ship software in the cloud. Many of our clients are large organizations that have been around for a long time but who don’t always have much experience when it comes to cloud. Others are startups and mid-sized companies who may have some experience, but might just want another set of eyes or are looking to mature some of their practices. Whatever the case, one of the things we frequently talk to our clients about is the value of both serverless and managed services. We have found that these are critical to getting big wins with small teams on tight deadlines in the cloud. Serverless in particular has been key to helping clients get some big wins in ways others didn’t think possible.

We often get pulled into a company to help them develop and launch new products in the cloud. These are typically high-profile projects with tight deadlines. These deadlines are almost always measured in months, usually fewer than six. As a result, many of the executives and managers we talk to in these situations are skeptical of their team’s ability to execute on these timeframes. Whether it’s lack of cloud experience, operations and security concerns, compliance issues, staffing constraints, or some combination thereof, there’s always a reason why it can’t be done.

And then, some months later, it gets done.

Mental Model of the Cloud

The skepticism is valid. Often people’s mental model of the cloud is something like this:

A subset of typical cloud infrastructure concerns

More often than not, this is what cloud infrastructure looks like. In addition to what’s shown, there are other concerns. These include things like managing backups and disaster recovery, multi-zone or regional deployments, VM images, and reserved instances. It can be deceiving because simply getting an app running in this environment isn’t terribly difficult, and most engineers will tell you as much—these are the “day-one” costs. But engineers tend to be poor at estimating that work, and they routinely undervalue their own time. The minds of most seasoned managers, however, will usually go to the “day-two” costs—what are the ongoing maintenance and operations costs, the security and compliance considerations, and the staffing requirements? This is why we consistently see so much skepticism. If this is also your initial foray into the cloud, that’s a lot of uncertainty! A manager’s job, after all, is to reduce uncertainty.

We’ve been there. We’ve also had to manage those day-two costs. I’ve personally gone through the phases of building a complex piece of software in the cloud, having to maintain one, having to manage a team responsible for one, and having to help a team go through the same process as an outside consultant. Getting that perspective has helped me develop an appreciation for what it really means to ship software. It’s why we like to take a different tack at Real Kinetic when it comes to cloud.

We are big on picking a cloud platform and going all-in on it. Whether it’s AWS, GCP, or Azure—pick your platform, embrace its capabilities, and move on. That doesn’t mean there isn’t room to use multiple clouds. Some platforms are better than others in different areas, such as data analytics or machine learning, so it’s wise to leverage the strengths of each platform where it makes sense. This is especially true for larger organizations who will inevitably span multiple clouds. What we mean by going “all-in” on a platform, particularly as it relates to application development, is sidestepping the trap that so many organizations fall into: hedging their bets. For a variety of reasons, many companies will take a half measure when adopting a cloud platform by avoiding things like managed services and serverless. Vendor lock-in is usually at the top of their list of concerns. Instead, they end up with something akin to the diagram above, and in doing so, lose out on the differentiated benefits of the platform. They also incur significantly more day-two costs.

The Value and Cost of Serverless

We spend a lot of time talking to our clients about this trade-off. With managers, it usually resonates when we ask if they want their people focusing on shipping business value or doing commodity work. With engineers, architects, or operations folks, it can be more contentious. On more than a few occasions, we’ve talked clients out of using Kubernetes for things that were well-suited to serverless platforms. Serverless is not the right fit for everything, but the reality is many of the workloads we encounter are primarily CRUD-based microservices. These can be a good fit for platforms like AWS Lambda, Google App Engine, or Google Cloud Run. The organizations we’ve seen that have adopted these services for the correct use cases have found reduced operations investment, increased focus on shipping things that matter to the business, accelerated delivery of new products, and better cost efficiency in terms of infrastructure utilization.
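To make that concrete, here is a minimal sketch of the kind of CRUD microservice that tends to map cleanly onto a platform like Cloud Run. The Flask app and its in-memory store are purely illustrative stand-ins, not anything from a real engagement; a real service would use a managed datastore. The shape is what matters: a small, stateless HTTP service with no servers to manage.

```python
# A minimal, illustrative CRUD service of the sort that fits serverless
# container platforms like Cloud Run. The in-memory dict is a stand-in for a
# managed datastore such as Firestore or DynamoDB.
import os

from flask import Flask, jsonify, request

app = Flask(__name__)
notes = {}  # illustrative only; real state belongs in a managed datastore


@app.route("/notes/<note_id>", methods=["GET"])
def get_note(note_id):
    note = notes.get(note_id)
    if note is None:
        return jsonify({"error": "not found"}), 404
    return jsonify(note)


@app.route("/notes/<note_id>", methods=["PUT"])
def put_note(note_id):
    notes[note_id] = request.get_json()
    return jsonify(notes[note_id])


if __name__ == "__main__":
    # Cloud Run provides the port to listen on via the PORT environment variable.
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
```

Containerize that, hand it to the platform, and the scaling, patching, and most of the undifferentiated operational work in the diagram above becomes the provider’s problem.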

If vendor lock-in is your concern, it’s important to understand both the constraints and the trade-offs. Not all serverless platforms are created equal. Some are highly opinionated; others are not. In the early days, Google App Engine was highly opinionated, requiring you to use its own APIs to build your application. This meant moving an application built on App Engine was no small feat. Today, that is no longer the case; the new App Engine runtimes allow you to run just about any application. Cloud Run, a serverless container platform, allows you to deploy a container that can run anywhere. The switching costs are even lower. On the other hand, using a serverless database like Cloud Firestore or DynamoDB requires using a proprietary API, but APIs can be abstracted.
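As a rough sketch of what that abstraction can look like, the snippet below hides a proprietary datastore behind a small interface of our own. The UserRepository and FirestoreUserRepository names are hypothetical; only the Firestore client calls are the provider’s actual API.

```python
# A minimal sketch of abstracting a proprietary datastore API. The repository
# interface is ours and is what the rest of the application codes against;
# only the Firestore calls are provider-specific.
from __future__ import annotations

from abc import ABC, abstractmethod

from google.cloud import firestore  # pip install google-cloud-firestore


class UserRepository(ABC):
    """Application-facing interface; no provider types leak through it."""

    @abstractmethod
    def get(self, user_id: str) -> dict | None: ...

    @abstractmethod
    def save(self, user_id: str, data: dict) -> None: ...


class FirestoreUserRepository(UserRepository):
    """Keeps the Firestore-specific details in one place."""

    def __init__(self, client: firestore.Client | None = None):
        self._users = (client or firestore.Client()).collection("users")

    def get(self, user_id: str) -> dict | None:
        doc = self._users.document(user_id).get()
        return doc.to_dict() if doc.exists else None

    def save(self, user_id: str, data: dict) -> None:
        self._users.document(user_id).set(data)
```

If you later move to DynamoDB, you write another small adapter; the application code that depends on the interface doesn’t change.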

In order to decide if the trade-off makes sense, you need to determine three things:

  1. What is the honest likelihood you’ll need to move in the future?
  2. What are the switching costs—the amount of time and effort needed to move?
  3. What is the value you get using the solution?

These are not always easy things to determine, but the general rule is this: if the value you’re getting offsets the switching costs times the probability of switching—and it often does—then it’s not worth trying to hedge your bet. There can be a lot of hidden considerations, namely operations and development overhead and opportunity costs. It can be easy to forget about these when making a decision. In practice, vendor lock-in tends to be less about code portability and more about capability lock-in—think things like user management, Identity and Access Management, data management, cloud-specific features and services, and so forth. These are what make switching hard, not code.
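Here is a back-of-the-envelope version of that rule. The numbers are entirely made up for illustration; the point is the comparison, not the figures.

```python
# Illustrative only: compare the expected cost of switching against the value
# gained from using the managed/serverless solution in the meantime.
switching_probability = 0.10       # honest likelihood you'll actually move
switching_cost = 500_000           # dollars of effort to migrate if you do
expected_switching_cost = switching_probability * switching_cost  # 50,000

annual_value = 300_000             # ops/dev effort saved per year by adopting it
years = 3
total_value = annual_value * years  # 900,000

# If the value comfortably exceeds the expected switching cost, hedging isn't
# worth it.
print(total_value > expected_switching_cost)  # True
```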

Another concern we commonly hear with serverless is cost. In our experience, however, this is rarely an issue for appropriate use cases. While serverless can be more expensive in terms of cloud spend for some situations, this cost is normally offset by the reduced engineering and ongoing operations costs. Using serverless and managed services for the right things can be quite cost-effective. This may not always hold true, such as for large organizations who can negotiate with providers for committed cloud spend, but for many cases it makes sense.

Serverless isn’t just about compute. While people typically associate serverless with things like Lambda or Cloud Functions, it actually extends far beyond this. For example, in addition to its serverless compute offerings (Cloud Run, Cloud Functions, and App Engine), GCP has serverless storage (Cloud Storage, Firestore, and Datastore), serverless integration components (Cloud Tasks, Pub/Sub, and Scheduler), and serverless data and machine learning services (BigQuery, AutoML, and Dataflow). While each of these services individually offers a lot of value, it’s not until we start to compose them together in different ways that we really see the value of serverless appear.
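As a small sketch of that composition, the function below is the kind of glue those services encourage: a Pub/Sub-triggered Cloud Function that streams events into BigQuery. The project, dataset, and table names are hypothetical; the client library calls and the background-function signature are the real APIs.

```python
# Hypothetical glue between serverless pieces: Pub/Sub -> Cloud Function ->
# BigQuery. No servers, clusters, or capacity planning involved.
import base64
import json

from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()
TABLE_ID = "my-project.analytics.events"  # hypothetical project.dataset.table


def handle_event(event, context):
    """Entry point for a Pub/Sub-triggered background Cloud Function."""
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    errors = client.insert_rows_json(TABLE_ID, [payload])
    if errors:
        # Raising causes Pub/Sub to redeliver the message.
        raise RuntimeError(f"BigQuery insert failed: {errors}")
```

Everything on either side of that function, from message delivery and scaling to the analytics warehouse itself, is handled by the platform.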

Serverless vs. Managed Services

Some might consider the services I mentioned above “managed services”, so let me clarify that. We generally talk about “serverless” as the idea that the cloud provider fully manages and maintains the server infrastructure. This means the notions of “managed services” and “serverless” are closely related, but they are also distinct.

A serverless product is also managed, but not all managed services are serverless. That is to say, serverless is a subset of managed services.

Serverless means you stop thinking about the concept of servers in your architecture. This broadly encompasses words like “servers”, “instances”, “nodes”, and “clusters.” Continuing with our GCP example, these words would be associated with products like GKE, Dataproc, Bigtable, Cloud SQL, and Spanner. These services are decidedly not serverless because they entail some degree of managing and configuring servers or clusters, even though they are managed services.

Instead, you start thinking in terms of APIs and services. This would be things like Cloud Functions, Dataflow, BigQuery, Cloud Run, and Firestore. These have no servers or clusters. They are simply APIs that you interact with to build your applications. They are more specialized managed services.

Why does this distinction matter? It matters because of the ramifications it has for where we invest our time. Managing servers and clusters is going to involve a lot more operations effort, even if the base infrastructure is managed by the cloud provider. Much of this work can be considered “commodity.” It is not work that differentiates the business. This is the trade-off of getting more control—we take on more responsibility. In rough terms, the managed services that live outside of the serverless circle are going to be more in the direction of “DevOps”, meaning they will involve more operations overhead. The managed services inside the serverless circle are going to be more in the direction of “NoOps”. There is still work involved in using them, but the line of responsibility has moved upwards with the cloud provider responsible for more. We get less control over the infrastructure, but that means we can focus more on the business outcomes we develop on top of that infrastructure.

In fairness, it’s not always a black-and-white determination. Things can get a little blurry since serverless might still provide some degree of control over runtime parameters like memory or CPU, but this tends to be limited in comparison to managing a full server. There might also be some notion of “instances”, as in the case of App Engine, but that notion is much more abstract. Finally, some services appear to straddle the line between managed service and serverless. App Engine Flex, for instance, allows you to SSH into its VMs, but you have no real control over them. It’s a heavily sandboxed environment.

Why Serverless?

Serverless enables focusing on business outcomes. By leveraging serverless offerings across cloud platforms, we’ve seen product launches go from years to months (and often single-digit months). We’ve seen release cycles go from weeks to hours. We’ve seen development team sizes go from double digits to a few people. We’ve seen ops teams go from dozens of people to just one or two. It’s allowed these people to focus on more differentiated work. It’s given small teams of people a significant amount of leverage.

It’s no secret. Serverless is how we’ve helped many of our clients at Real Kinetic get big wins with small teams on tight deadlines. It’s not always the right fit, and there are always trade-offs to consider. But if you’re not at least considering serverless—and more broadly, managed services—then you’re not getting the value you should be getting out of your cloud platform. Keep in mind that it doesn’t have to be all or nothing. Find the places where you can leverage serverless in combination with managed services or more traditional infrastructure. You, too, will surprise and impress your managers and leadership.

What’s Going on with GKE and Anthos?

GCP’s Slippery Slide into Enterprise

When former Oracle exec Thomas Kurian took over for Diane Greene as Google Cloud’s CEO, a lot of people expressed concern about what this meant for the future of GCP. Vendor lock-in is already at the forefront of the minds of many cloud adopters, and Oracle is notorious for locking customers into expensive and prolonged contracts. However, I thought the move was smart on Google’s part.

Google has never been a customer-first company. While it has always been a technology leader, it struggles immensely with enterprise sales and support. It continues to have issues dogfooding its own products (Google’s products are typically built on internal versions of services not available to customers, while the external GCP versions are what customers actually use). This means its engineers don’t feel the same pain points that its customers experience, and its products lose out on a critical feedback loop (contrast this with Amazon, where AWS is treated as a separate company from Amazon.com and there is a mandate to build with the same services Amazon’s customers use). Customer empathy matters.

Now, most people probably wouldn’t characterize Oracle as a customer-first company, but it knows how to meet customers where they are and to sell in a way that resonates with enterprise decision makers. Historically, Google has approached sales engineering in a way that has failed to resonate with customers by attempting to map its superior technology offerings onto actual customer problems. Nothing could be more off-putting to a decision maker with a round hole than a sales engineer with a square peg telling them their hole is wrong.

Thomas Kurian was brought in to address these glaring issues for Google Cloud. Through restructuring and growing its sales organization, key leadership hires, and strategic acquisitions and partnerships, it’s clear he’s serious about fixing Google Cloud’s enterprise perception problem. Slowly but surely, Google is attempting to shift its culture from being technology-obsessed to customer-obsessed. And while Oracle is notorious when it comes to vendor lock-in, all signs thus far have pointed to Google more strategically embracing open APIs with things like GKE (Kubernetes), Traffic Director (Istio), ML Engine (Tensorflow), and Dataflow (Apache Beam). They are also starting to meet customers where they are with things like Dataproc (Apache Spark and Hadoop), Memorystore (Redis), and Cloud SQL (MySQL, PostgreSQL, and Microsoft SQL Server). Hell, they’ll even run Microsoft Active Directory for you now! Who says Google can’t do enterprise? So the future is bright for GCP, right? Maybe. What follows is speculation based on my own observations and anecdotal information.

There’s one thing that could change the outlook on all of this: Anthos. Anthos is GCP’s answer to hybrid-cloud solutions like Pivotal Cloud Foundry (PCF), AWS Outposts, or Azure Stack. It allows organizations to build and manage workloads across public clouds and on-prem by extending GKE. If multi-cloud is your thing and you hate money, these platforms all sound like pretty good things. But here’s the disconcerting thing about Anthos in particular: it’s becoming clear that GCP is deliberately blurring the lines between Anthos and GKE.

I received an email yesterday from GCP announcing that Binary Authorization is now generally available (GA). Binary Authorization is a neat security feature that ensures only trusted container images can be deployed to GKE. It’s been in beta for some time and now it’s GA with a six-month free trial starting today. Great! How much will it cost after the trial? Contact your sales representative. Wait, what? That’s because, starting on March 16, 2020, GKE clusters will need to be part of an Anthos-subscribed organization to enable Binary Authorization. If you choose not to upgrade to Anthos, you will not be able to turn on Binary Authorization on new clusters after that date.

This is a slippery slope for GCP. I can already foresee other features requiring an Anthos subscription just to use them in GKE, where GKE basically becomes an Anthos subscription funnel. Which features go into Anthos and which go into GKE? Now this is something I’d come to expect from Oracle. If GCP starts to roll differentiating features into Anthos instead of GKE, it could mark the beginning of the end.

While the lines between Anthos and GKE are becoming increasingly fuzzy, Google is clear about this particular feature:

Binary Authorization is a feature of the Anthos platform and use of Binary Authorization is included in the Anthos subscription.

That wasn’t clear, however, when I started using it with GKE and advising clients to use it there, completely irrespective of Anthos. This sets a very dangerous precedent.

What’s more alarming is that the marketing and product language on a number of GCP services and features has quietly replaced “GKE” with “Anthos” or, worse yet, “Anthos GKE.” For example, Cloud Run—which is still in beta—now says it can “run stateless containers on a fully managed environment or on Anthos.” Will I need an Anthos subscription to use Cloud Run with GKE once it goes GA? Based on the Binary Authorization move and the language updates, it seems likely. And looking at the GKE cluster setup wizard, it appears managed Istio might require one as well.

Anthos features listed in GKE cluster setup wizard

Which of these features is going to require a subscription next? We know Binary Authorization already does.

Security features listed in GKE cluster setup wizard

And how much does Anthos even cost? Contact sales. Not a good look for Kurian’s vision of openness and customer choice. As AWS CEO Andy Jassy puts it, no longer does the process of buying technology involve the purchase of heavy proprietary software with multi-year contracts that include annual maintenance fees. Now it’s about choice and ease of use, including letting customers turn things off if they’re not working. But choice also means not bundling all of your differentiating features into a massive contract. List prices for Anthos start at $10,000 per month per 100 virtual CPUs with a minimum one-year commitment. This is just for the software layer. It doesn’t include any of the underlying GCP infrastructure. Again, fine for organizations willing to throw similar sums of money at things like PCF or Outposts, but are plain old GKE users really going to get roped into this nonsense? Are they going to lose out on value-added features?
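To put that list price in concrete terms, here is some rough arithmetic. The fleet size is hypothetical and actual pricing is negotiated through sales, so treat this as illustrative only.

```python
# Rough arithmetic on the published list price: $10,000 per month per 100
# vCPUs, one-year minimum, software layer only. Illustrative, not a quote.
vcpus = 300                                  # hypothetical fleet size
price_per_100_vcpus_per_month = 10_000

monthly = (vcpus / 100) * price_per_100_vcpus_per_month  # 30,000
annual_minimum_commit = monthly * 12                     # 360,000
print(monthly, annual_minimum_commit)  # before any GCP infrastructure costs
```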

Either GCP has a well-thought-out strategy for GKE and Anthos (which, given Google’s history, is frankly unlikely) and is simply tone-deaf to how it would be perceived by people already skittish about a former Oracle exec taking the reins as CEO, or this will end in disaster. It’s entirely possible this is all just a misunderstanding and they are, in a misguided fashion, rebranding GKE to Anthos (it’s been renamed once already and GCP has a history of rebranding existing products), but requiring a subscription hidden behind a sales contact form in order to use basic features is spooky.

My hope is that there is some longer-term strategy at play and GCP is not moving to an enterprise-subscription model for what should be GKE features. Best case, Google is just muddying the waters as they’ve done in the past. Worst case, they’re steamrolling their entire platform strategy to make way for enterprise sales. That would be tragic for Google given GKE is still far and away the best managed Kubernetes service available. So what’s going on with GKE and Anthos?

Multi-Cloud Is a Trap

It comes up in a lot of conversations with clients: “We want to be cloud-agnostic.” “We need to avoid vendor lock-in.” “We want to be able to shift workloads seamlessly between cloud providers.” Let me say it again: multi-cloud is a trap. Outside of appeasing a few major retailers who might not be too keen on stuff running in Amazon data centers, I can think of few reasons why multi-cloud should be a priority for organizations of any scale.

A multi-cloud strategy looks great on paper, but it creates unneeded constraints and results in a wild-goose chase. For most, it ends up being a distraction, creating more problems than it solves and costing more money than it’s worth. I’m going to caveat that claim in just a bit because it’s a bold blanket statement, but bear with me. For now, just know that when I say “multi-cloud,” I’m referring to the idea of running the same services across vendors or designing applications in a way that allows them to move between providers effortlessly. I’m not speaking to the notion of leveraging the best parts of each cloud provider or using higher-level, value-added services across vendors.

Multi-cloud rears its head for a number of reasons, but they can largely be grouped into the following points: disaster recovery (DR), vendor lock-in, and pricing. I’m going to speak to each of these and then discuss where multi-cloud actually does come into play.

Disaster Recovery

Multi-cloud gets pushed as a means to implement DR. When discussing DR, it’s important to have a clear understanding of how cloud providers work. Public cloud providers like AWS, GCP, and Azure have a concept of regions and availability zones (n.b. Azure only recently launched availability zones in select regions, which they’ve learned the hard way is a good idea). A region is a collection of data centers within a specific geographic area. An availability zone (AZ) is one or more data centers within a region. Each AZ is isolated with dedicated network connections and power backups, and AZs in a region are connected by low-latency links. AZs might be located in the same building (with independent compute, power, cooling, etc.) or completely separated, potentially by hundreds of miles.

Region-wide outages are highly unusual. When they happen, it’s a high-profile event since it usually means half the Internet is broken. Since AZs themselves are geographically isolated to an extent, a natural disaster taking down an entire region would basically be the equivalent of a meteorite wiping out the state of Virginia. The more common causes of region failures are misconfigurations and other operator mistakes. While rare, they do happen. However, regions are highly isolated, and providers perform maintenance on them in staggered windows to avoid multi-region failures.

That’s not to say a multi-region failure is out of the realm of possibility (any more than a meteorite wiping out half the continental United States or some bizarre cascading failure). Some backbone infrastructure services might span regions, which can lead to larger-scale incidents. But while having a presence in multiple cloud providers is obviously safer than a multi-region strategy within a single provider, there are significant costs to this. DR is an incredibly nuanced topic that I think goes underappreciated, and cloud portability does little to minimize those costs in practice. You don’t need to be multi-cloud to have a robust DR strategy—unless, perhaps, you’re operating at Google or Amazon scale. After all, Amazon.com is one of the world’s largest retailers, so if your DR strategy can match theirs, you’re probably in pretty good shape.

Vendor Lock-In

Vendor lock-in, and the fear, uncertainty, and doubt that surround it, is another frequently cited reason for a multi-cloud strategy. Beau hits on this in Stop Wasting Your Beer Money:

The cloud. DevOps. Serverless. These are all movements and markets created to commoditize the common needs. They may not be the perfect solution. And yes, you may end up “locked in.” But I believe that’s a risk worth taking. It’s not as bad as it sounds. Tim O’Reilly has a quote that sums this up:

“Lock-in” comes because others depend on the benefit from your services, not because you’re completely in control.

We are locked-in because we benefit from this service. First off, this means that we’re leveraging the full value from this service. And, as a group of consumers, we have more leverage than we realize. Those providers are going to do what is necessary to continue to provide value that we benefit from. That is what drives their revenue. As O’Reilly points out, the provider actually has less control than you think. They’re going to build the system they believe benefits the largest portion of their market. They will focus on what we, a player in the market, value.

Competition is the other key piece of leverage. As strong as a provider like AWS is, there are plenty of competing cloud providers. And while competitors attempt to provide differentiated solutions to what they view as gaps in the market they also need to meet the basic needs. This is why we see so many common services across these providers. This is all for our benefit. We should take advantage of this leverage being provided to us. And yes, there will still be costs to move from one provider to another but I believe those costs are actually significantly less than the costs of going from on-premise to the cloud in the first place. Once you’re actually on the cloud you gain agility.

The mental gymnastics I see companies go through to avoid vendor lock-in, and the “reasons” they give for multi-cloud, always astound me. It’s baffling how much money companies are willing to spend on things that do not differentiate them in any way whatsoever and that, in fact, force them to divert resources from business-differentiating things.

I think there are a couple of reasons for this. First, as Beau points out, we have a tendency to overvalue our own abilities and undervalue our costs. This causes us to miscalculate the build versus buy decision. This is also closely related to the IKEA effect, in which consumers place a disproportionately high value on products they partially created. Second, as the power and influence in organizations have shifted from IT to the business—and especially with the adoption of a product mindset—it strikes me as another attempt by IT operations to retain control and relevance.

Being cloud-agnostic should not be an important enough goal that it drives key decisions. If that’s your starting point, you’re severely limiting your ability to fully reap the benefits of cloud. You’re just renting compute. Platforms like Pivotal Cloud Foundry and Red Hat OpenShift tout the ability to run on every major private and public cloud, but doing so—by definition—necessitates an abstraction layer that abstracts away all the differentiating features of each cloud platform. When you abstract away the differentiating features to avoid lock-in, you also abstract away the value. You end up with vendor “lock-out,” which basically means you aren’t leveraging the full value of services. Either the abstraction reduces things to a common interface or it doesn’t. If it does, it’s unclear how it can leverage differentiated provider features and remain cloud-agnostic. If it doesn’t, it’s unclear what the value of it is or how it can be truly multi-cloud.

Not to pick on PCF or Red Hat too much, but as the major cloud providers continue to unbundle their own platforms and rebundle them in a more democratized way, the value proposition of these multi-cloud platforms begins to diminish. In the pre-Kubernetes and containers era—aka the heyday of Platform as a Service (PaaS)—there was a compelling story. Now, with the prevalence of containers, Kubernetes, and especially things like Google’s GKE and GKE On-Prem (and equivalents in other providers), that story is getting harder to tell. Interestingly, the recently announced Knative was built in close partnership with, among others, both Pivotal and Red Hat, which seems to be a play to capture some of the value from enterprise adoption of serverless computing using the momentum of Kubernetes.

But someone needs to run these multi-cloud platforms as a service, and therein lies the rub. That responsibility is usually dumped on an operations or shared-services team who now needs to run it in multiple clouds—and probably subscribe to a services contract with the vendor.

A multi-cloud deployment requires expertise for multiple cloud platforms. A PaaS might abstract that away from developers, but it’s pushed down onto operations staff. And we’re not even getting into the security and compliance implications of certifying multiple platforms. For some companies who are just now looking to move to the cloud, this will seriously derail things. Once we get past the airy-fairy marketing speak, we really get into the hairy details of what it means to be multi-cloud.

There’s just less room today for running a PaaS that is not managed for you. It’s simply not strategic to any business. I also like to point out that revenues for companies like Pivotal and Red Hat are largely driven by services. These platforms act as a way to drive professional services revenue.

Generally speaking, the risk posed to businesses by vendor lock-in of non-strategic systems is low. For example, a database stores data. Whether it’s Amazon DynamoDB, Google Cloud Datastore, or Azure Cosmos DB—there might be technical differences like NoSQL, relational, ANSI-compliant SQL, proprietary, and so on—fundamentally, they just put data in and get data out. There may be engineering effort involved in moving between them, but it’s not insurmountable and that cost is often far outweighed by the benefits we get using them. Where vendor lock-in can become a problem is when relying on core strategic systems. These might be systems which perform actual business logic or are otherwise key enablers of a company’s business. As Joel Spolsky says, “If it’s a core business function—do it yourself, no matter what. Pick your core business competencies and goals, and do those in house.”

Pricing

Price competitiveness might be the weakest argument of all for multi-cloud. The reality is, as they commoditize more and more, all providers are in a race to the bottom when it comes to cost. Between providers, you will end up spending more in some areas and less in others. Multi-cloud price arbitrage is not a thing; it’s just something people pretend is a thing. For one, it’s wildly impractical. For another, it fails to account for volume discounts. As I mentioned in my comparison of AWS and GCP, it really comes down more to where you want to invest your resources when picking a cloud provider due to their differing philosophies.

And to Beau’s point earlier, the lock-in angle on pricing, i.e., a vendor locking you in and then driving up prices, just doesn’t make sense. First, that’s not how economies of scale work. And once you’re in the cloud, the cost of moving from one provider to another is dramatically less than when you were on-premise, so this simply would not be in providers’ best interest. They will do what’s necessary to capture the largest portion of the market, and competitive forces will drive Infrastructure as a Service (IaaS) costs down. Because of the competitive environment and desire to capture market share, pricing is likely to converge. For cloud providers to increase margins, they will need to move further up the stack toward Software as a Service (SaaS) and value-added services.

Additionally, most public cloud providers offer volume discounts. For instance, AWS offers Reserved Instances with significant discounts up to 75% for EC2. Other AWS services also have volume discounts, and Amazon uses consolidated billing to combine usage from all the accounts in an organization to give you a lower overall price when possible. GCP offers sustained use discounts, which are automatic discounts that get applied when running GCE instances for a significant portion of the billing month. They also implement what they call inferred instances, which is bin-packing partial instance usage into a single instance to prevent you from losing your discount if you replace instances. Finally, GCP likewise has an equivalent to Amazon’s Reserved Instances called committed use discounts. If resources are spread across multiple cloud providers, it becomes more difficult to qualify for many of these discounts.
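The arithmetic below is purely illustrative (the discount rates are invented, and real discounts depend on the services involved and the commitments you negotiate), but it shows the shape of the problem: splitting a steady workload across providers shrinks the portion of spend that qualifies for the deepest discounts.

```python
# Illustrative only: how splitting spend across providers can dilute volume
# and committed-use discounts. Discount rates are made up for the example.
on_demand_monthly = 100_000        # hypothetical steady-state compute spend

# All-in on one provider: the whole workload is steady enough to commit.
single_cloud = on_demand_monthly * (1 - 0.40)              # 60,000 / month

# Split 50/50: suppose only 60% of each half is steady enough to commit, and
# the smaller commitment earns a shallower discount.
half = on_demand_monthly / 2
split_cloud = 2 * (half * 0.6 * (1 - 0.25) + half * 0.4)   # 85,000 / month

print(single_cloud, round(split_cloud))
```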

Where Multi-Cloud Makes Sense

I said I would caveat my claim and here it is. Yes, multi-cloud can be—and usually is—a distraction for most organizations. If you are a company that is just now starting to look at cloud, it will serve no purpose but to divert you from what’s really important. It will slow things down and plant seeds of FUD.

Some companies try to do build-outs on multiple providers at the same time in an attempt to hedge the risk of going all-in on one. I think this is counterproductive and actually increases the risk of an unsuccessful outcome. For smaller shops, pick a provider and focus efforts on productionizing it. Leverage managed services where you can, and don’t use multi-cloud as a reason not to. For larger companies, it’s not unreasonable to have build-outs on multiple providers, but it should be done through controlled experimentation. And that’s one of the benefits of cloud: we can make limited investments and experiment without big up-front expenditures, something the multi-cloud PaaS offerings and service contracts tend to reintroduce.

But no, that doesn’t mean multi-cloud doesn’t have a place. Things are never that cut and dry. For large enterprises with multiple business units, multi-cloud is an inevitability. This can be a result of product teams at varying levels of maturity, corporate IT infrastructure, and certainly through mergers and acquisitions. The main value of multi-cloud, and I think one of the few arguments for it, is leveraging the strengths of each cloud where they make sense. This gets back to providers moving up the stack. As they attempt to differentiate with value-added services, multi-cloud starts to become a lot more meaningful. Secondarily, there might be a case for multi-cloud due to data-sovereignty reasons, but I think this is becoming less and less of a concern with the prevalence of regions and availability zones. However, some services, such as Google’s Cloud Spanner, might forgo AZ-granularity due to being “globally available” services, so this is something to be aware of when dealing with regulations like GDPR. Finally, for enterprises with colocation facilities, hybrid cloud will always be a reality, though this gets complicated when extending those out to multiple cloud providers.

If you’re just beginning to dip your toe into cloud, a multi-cloud strategy should not be at the forefront of your mind. It definitely should not be your guiding objective and something that drives core decisions or strategic items for the business. It has a time and place, but outside of that, it’s just a fool’s errand—a distraction from what’s truly important.