kubernetes – Brave New Geek

May 8, 2025

What is Koreo?

The platform engineering toolkit for Kubernetes

Last month we open sourced Koreo, our “platform engineering toolkit for Kubernetes.” Since then, we’ve seen a lot of interest from folks in the platform engineering and DevOps space. We’ve also gotten a lot of questions from people trying to better understand how Koreo fits into an already crowded landscape of Kubernetes tools. Koreo is a fairly complex tool, so it can be difficult to quickly grasp just what exactly it is, what problems it’s designed to solve, and how it compares to other, similar tools. In this post, I want to dive into these topics and also discuss the original motivation behind Koreo.

If you’ve read the Koreo website, you’ll know that we refer to it as “a new approach to Kubernetes configuration management and resource orchestration,” but what does this really mean? There are two parts to unpack here: configuration management and resource orchestration.

By far, the most common approach to configuration management in Kubernetes is Helm. Helm charts provide a convenient way to templatize and package up Kubernetes resources and make them configurable via values files. There are some challenges and limitations with Helm though. One is around how it works as a templating system. We talked about this in depth here so I won’t go into a ton of detail, but a challenge with Helm is when charts begin to grow beyond basic templating. Once you start to introduce logic and nested conditionals into your templates, they quickly become difficult to understand, maintain, and evolve. Helm is actually a pretty rudimentary approach because it really just treats configuration management as YAML string templating.

Kustomize can handle some of these cases better by letting you specify base resource manifests and then customize them through overlays and patches that are more structurally aware. However, this largely assumes that the configuration variations are static and don’t rely on computed values or business logic. In other words, as deployments grow in complexity, Kustomize’s reliance on static overlays can become cumbersome and limiting.

Helm and Kustomize are not resource orchestrators—they’re templating tools. If your resources have non-deterministic dependencies on each other, you’ll need to handle this sequencing yourself. For example, to create a VPC and then a Kubernetes cluster in AWS, you must first apply the VPC and then use something like Helm’s lookup function to retrieve its outputs (e.g. subnets) for use in templating the cluster. There are a lot of situations where the inputs of a resource require the outputs of a different resource. Some of these might be deterministic, perhaps like a CIDR block, but many are not. This dependency between resources is what I mean by resource orchestration. Most tools treat configuration management and resource orchestration as separate concerns: tools like Helm, Kustomize, Jsonnet, and Cue handle configuration management, while tools like Crossplane, Argo, and Kro address resource orchestration. But in practice, these responsibilities are often deeply intertwined. Especially in platform engineering, configuring resources almost always requires orchestrating them.

The Motivation Behind Koreo

Koreo was built to make platform engineers’ lives easier. We originally developed it out of a need to both configure and orchestrate cloud resources across GCP and AWS—using Config Connector and ACK—and integrate them cleanly into a broader internal developer platform. Helm wasn’t viable due to its lack of orchestration capabilities. Crossplane, while powerful, proved overly complex but also had several key limitations, namely it’s tightly coupled to Provider APIs and it offers limited support for multitenancy. Kro didn’t exist at the time, but even today it remains too rigid and inflexible for our needs. We also required a way to define standard, base configurations for resources, expose only certain configuration knobs to application teams, and give developers the ability to safely customize what they needed.

Additionally, we felt that much of the innovation in platform engineering and DevOps has thrown away a lot of the learnings and developments that have occurred in software engineering over the last several decades. In particular, many modern infrastructure and platform tools lack robust mechanisms for testing configuration and orchestration logic. Unlike application code, which benefits from well-established practices around unit testing, integration testing, and CI workflows, platform code is often fragile, opaque, and difficult to validate. There’s no easy way to test a change to a Helm chart or a Crossplane composition without applying it to the cluster and hoping for the best. This leads to brittle platforms, longer feedback cycles, and a higher risk of outages. With Koreo, we wanted to bring the rigor of software development—testable components, reproducible workflows, clear interfaces—into the platform space.

We initially built platforms using custom controllers implemented in Go and Python, which allowed us to move quickly. However, extending and iterating on the platform became increasingly difficult and time-consuming. In order to make it easier to implement new capabilities quickly, one of our developers built a lightweight framework to orchestrate workflows by mapping values between steps. Another developer sought to make our existing system more flexible with a simple approach to configuration templating. When we saw these two ideas, which were in two different code bases in two different parts of the system, we realized that they could work well together, and the idea for Koreo was born.

Koreo’s Approach

At the heart of Koreo are functions and workflows. Functions are small, reusable logic units that can either be pure and computational (we call these ValueFunctions) or side-effectful and capable of interacting with the Kubernetes API to create, modify, or read resources (ResourceFunctions). These are then composed into workflows, which define a graph of steps, conditions, and dependencies—kind of like a programmable controller but defined entirely in YAML.

We chose YAML deliberately because it doesn’t abstract what is actually being configured: Kubernetes resource manifests. We’ve found keeping the tool “true” to the underlying thing it manages reduces mental overhead for developers and makes it easier to reason about what is happening. This choice also brings some practical benefits. It’s easy to take an existing Kubernetes resource definition and use it as the basis for a ResourceFunction or ResourceTemplate. It also allows us to very cleanly test materialized resources. With FunctionTests, we can ensure resources are configured as expected based on inputs to the function. And by using ResourceTemplates, we can define standard base configurations and then overlay dynamic values on top. This approach to resource materialization allows us to decompose configuration into small, reusable components that can be tested both in isolation and as part of a larger workflow.

Koreo provides a unified approach to configuration management and resource orchestration. Workflows allow us to connect resources together by mapping the outputs of a resource to the inputs of another resource. For instance, let’s say you want to provision a GKE cluster, configure network policies, and deploy an application—all based on some higher-level spec like a tenant definition. With Koreo, you can define a workflow that reads the spec, provisions the GKE cluster, waits for it to be ready, extracts its outputs (e.g. endpoint, credentials), and passes them downstream to subsequent steps. Each step is explicit, composable, and testable.

In effect, Koreo lets you seamlessly integrate Kubernetes controllers and off-the-shelf operators like Config Connector and ACK into a cohesive platform. With this, you can build your own platform controllers without actually needing to implement a custom operator. Any resource in Kubernetes can be connected, and we can use Koreo’s control-flow primitives to implement workflows that are as simple as provisioning a few resources or as complex as automating an entire internal developer platform.

Why Koreo Matters

What makes Koreo different is that it treats platform engineering not as an afterthought to infrastructure management or application deployment, but as a discipline in its own right—one that deserves the same principles of clarity, testability, and modularity that we apply to software engineering. It’s not just about making YAML more bearable or chaining together a few CLI commands. It’s about enabling platform teams to build robust, composable, and predictable automation that supports developers without hiding important details or creating layers of abstraction that become liabilities over time.

Koreo lets you define your platform architecture as code, test it like code, and evolve it like code. It’s declarative, but programmable. It’s simple to start with, but powerful enough to scale across teams, environments, and cloud providers. It gives you a model for thinking about how platform components fit together and the control to manage that complexity as your needs grow.

What’s Next

Since open sourcing Koreo, we’ve been actively working on improving documentation, adding more examples, and expanding support for new use cases. We’re especially excited about adding support for managing resources in other Kubernetes clusters. This enables running Koreo as a centralized control plane in addition to a federated model. We’re also investing in better tooling around developer experience, like improved type checking, workflow debugging, and enhanced UI capabilities to visualize workflow executions.

If you’re a platform engineer, SRE, or DevOps practitioner struggling to manage the growing complexity of Kubernetes-based platforms, we’d love for you to give Koreo a shot. We’re just getting started, and we’d love your feedback.

Follow @tyler_treat

March 19, 2025March 20, 2025

Controller-Driven Infrastructure as Code

Harnessing the Kubernetes Resource Model for modern infrastructure management

Infrastructure as Code (IaC) revolutionized how we manage infrastructure, enabling developers to define resources declaratively and automate their deployment. However, tools like Terraform and CloudFormation, despite their declarative configuration, rely on an operation-centric model, where resources are created or updated through one-shot commands.

The evolution of IaC: From operations to controllers

In contrast, Kubernetes introduced a new paradigm with its controller pattern and the Kubernetes Resource Model (KRM). This resource-centric approach to APIs redefines infrastructure management by focusing on desired state rather than discrete operations. Kubernetes controllers continuously monitor resources, ensuring they conform to their declarative configurations by performing actions to move the actual state closer to the desired state, much like a human operator would. This is known as a control loop.

Kubernetes also demonstrated the value of providing architectural building blocks that encapsulate standard patterns, such as a Deployment. These can then be composed and combined to provide impressive capabilities with little effort—HorizontalPodAutoscaler is an example of this. Through extensibility, Kubernetes allows developers to define new resource types and controllers, making it a natural fit for managing not just application workloads but infrastructure of any kind. This enables you to actually provide a clean API for common architectural needs that encapsulates a lot of routine business logic. Extending this model to IaC is something we call Controller-Driven IaC.

Building on the Kubernetes controller model

Controller-Driven IaC builds upon the Kubernetes foundation, leveraging its controllers to reconcile cloud resources and maintain continuous alignment between desired and actual states. By extending Kubernetes’ principles of declarative configuration and control loops to IaC, this approach offers a resilient and scalable way to manage modern infrastructure. Integrating cloud and external system APIs into Kubernetes controllers enables continuous state reconciliation beyond Kubernetes itself, ensuring consistency, eliminating configuration drift, and reducing operational complexity. It results in an IaC solution that is capable of working correctly with modern, dynamic infrastructure. Additionally, it brings many of the other benefits of Kubernetes—such as RBAC, policy enforcement, and observability—to infrastructure and systems outside the cluster, creating a unified and flexible management framework. In essence, Kubernetes becomes the control plane for your entire developer platform. That means you can offer developers a self-service experience within defined bounds, and this can further be scoped to specific application domains.

This concept isn’t entirely new. Kubernetes introduced Custom Resource Definitions (CRDs) in 2017, enabling the creation of Operators, or custom controllers, to extend its functionality. Today, countless Operators exist to manage diverse applications and infrastructure, both within and outside of Kubernetes, including those from major cloud providers. For instance, GCP’s Config Connector, AWS’s ACK, and Azure’s ASO offer controllers for managing their respective platform’s infrastructure. However, just as operationalizing Kubernetes requires tooling and investment to build an effective platform, so too does implementing Controller-Driven IaC. Integrating these various controllers into a cohesive platform requires its own kind of orchestration. We need a way to program control loops—whether built-in Kubernetes controllers (like Deployments or Jobs), off-the-shelf controllers (like ACK or Config Connector), or custom controllers we’ve built ourselves.

Introducing Koreo: Programming control loops for modern platforms

There are tools such as Crossplane that take a controller-oriented approach to infrastructure, but they have their own challenges and limitations. In particular, we really need the ability to compose arbitrary Kubernetes resources and controllers, not just specific provider APIs. What if we could treat anything in Kubernetes as a referenceable object capable of acting as the input or output to an automated workflow, and without the need for building tons of CRDs or custom Operators? Additionally, it’s critical that resources can be namespaced rather than cluster-scoped to support multi-tenant environments and that the corresponding infrastructure can live in cloud projects or accounts separate from where the control plane itself lives.

To address these needs and deliver the full potential of Controller-Driven IaC, we’ve developed and open-sourced Koreo, a platform engineering toolkit for Kubernetes. Koreo is a new approach to Kubernetes configuration management and resource orchestration empowering developers through programmable workflows and structured data. It enables seamless integration and automation around the Kubernetes Resource Model, supporting a wide range of use cases centered on Controller-Driven IaC. Koreo serves as a meta-controller programming language and runtime that allows you to compose control loops into powerful abstractions.

*The Koreo UI showing a workflow for a custom AWS workload abstraction*

Koreo is specifically built to empower platform engineering teams and DevOps engineers by allowing them to provide Architecture-as-Code building blocks to the teams they support. With Koreo, you can easily leverage existing Kubernetes Operators or create your own specialized Operators, then expose them through powerful, high-level abstractions aligned with your organization’s needs. For example, you can develop a “StatelessCrudApp” that allows development teams to enable company-standard databases and caches with minimal effort. Similarly, you can build flexible automations that combine and orchestrate various Kubernetes primitives.

*An instance of the custom AWS workload abstraction*

Where Koreo really shines, however, is making it fast and safe to add new capabilities to your internal developer platform. Existing configuration management tools like Helm and Kustomize, while useful for simpler configurations, become unwieldy when dealing with the intricacies of modern Kubernetes deployments. They ultimately treat configuration as static data, and this becomes problematic as configuration evolves in complexity.

Koreo instead embraces configuration as code by providing a programming language and runtime with robust developer tooling. This allows platform engineers to define and manage Kubernetes configurations and resource orchestration in a way that is better suited to modern infrastructure challenges. It offers a solution that scales with complexity. A built-in testing framework makes it easy to quickly validate configuration and iterate on infrastructure, and IDE integration gives developers a familiar programming-like experience.

The future of infrastructure management is controller-driven

By harnessing the power of Kubernetes controllers for Infrastructure as Code, Koreo bridges the gap between declarative configuration and dynamic infrastructure management. It moves beyond the limitations of traditional IaC, offering a truly Kubernetes-native approach that brings the benefits of control loops, composability, and continuous reconciliation to your entire platform. With Koreo, you’re not just managing resources; you’re composing Kubernetes controllers to do powerful things like building internal developer platforms, managing multi-cloud infrastructure, or orchestrating application deployments and other complex workflows.

See what you can build with Koreo.

Follow @tyler_treat

February 12, 2024

Cloud without Kubernetes

I think it’s safe to say Kubernetes has “won” the cloud mindshare game. If you look at the CNCF Cloud Native landscape (and manage to not go cross eyed), it seems like most of the projects are somehow related to Kubernetes. KubeCon is one of the fastest-growing industry events. Companies we talk to at Real Kinetic who are either preparing for or currently executing migrations to the cloud are centering their strategies around Kubernetes. Those already in the cloud are investing heavily in platform-izing their Kubernetes environment. Kubernetes competitors like Nomad, Pivotal Cloud Foundry, OpenShift, and Rancher have sort of just faded to the background (or simply pivoted to Kubernetes). In many ways, “cloud native” seems to be equated with “Kubernetes”.

All this is to say, the industry has coalesced around Kubernetes as the way to do cloud. But after working with enough companies doing cloud, watching their experiences, and understanding their business problems, I can’t help but wonder: should it be? Or rather, is Kubernetes actually the right level of abstraction?

Going k8sless

While we’ve worked with a lot of companies doing Kubernetes, we’ve also worked with some that are deliberately not. Instead, they leaned into serverless—heavily—or as I like to call it, they’ve gone k8sless. These are not small companies or startups, they are name brands you would recognize.

At first, we were skeptical. Our team came from a company that made it all the way to IPO using Google App Engine, one of the earliest serverless platforms available. We have regularly espoused the benefits of serverless. We’ve talked to clients about how they should consider it for their own workloads (often to great skepticism). But using only serverless? For once, we were the serverless skeptics. One client in particular was beginning a migration of their e-commerce platform to Google Cloud. They wanted to do it completely serverless. We gave our feedback and recommendations based on similar migrations we’ve performed:

“There are workloads that aren’t a good fit.”

“It would require major re-architecting.”

“It will be expensive once fully migrated.”

“You’ll have better cost efficiency bin packing lots of services into VMs with Kubernetes.”

We articulated all the usual arguments made by the serverless doubters. Even Google was skeptical, echoing our sentiments to the customer. “Serious companies doing online retail like The Home Depot or Target are using Google Kubernetes Engine,” was more or less the message. We have a team of serverless experts at Real Kinetic though, so we forged ahead and helped execute the migration.

Fast forward nearly three years later and we will happily admit it: we were wrong. You can run a multibillion-dollar e-commerce platform without a single VM. You don’t have to do a full rewrite or major re-architecting. It can be cost-effective. It doesn’t require proprietary APIs or constraints that result in vendor lock-in. It might sound like an exaggeration, but it’s not.

Container as the interface

Over the last several years, Google’s serverless offerings have evolved far beyond App Engine. It has reached the point where it’s now viable to run a wide variety of workloads without much issue. In particular, Cloud Run offers many of the same benefits of a PaaS like App Engine without the constraints. If your code can run in a container, there’s a very good chance it will run on Cloud Run with little to no modification.

In fact, other than using the gcloud CLI to deploy a service, there’s nothing really Google- or Cloud Run-specific needed to get a functioning application. This is because Cloud Run uses Knative, an open-source Kubernetes-based platform, as its deployment interface. And while Cloud Run is a Google-managed backend for the Knative interface, we could just as well switch the backend to GKE or our own Kubernetes cluster. When we implement our Cloud Run services, we actually implement them using a Kubernetes Deployment manifest, shown below, and right before deploying, we swap Deployment for Knative’s Service manifest.

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    cloud.googleapis.com/location: us-central1
    service: my-service
  name: my-service
spec:
  template:
    spec:
      containers:
        - image: us.gcr.io/my-project/my-service:v1
          name: my-service
          ports:
            - containerPort: 8080
          resources:
            limits:
              cpu: 2
              memory: 1024Mi

This means we can deploy to Kubernetes without Knative at all, which we often do during development using the combination of Skaffold and K3s to perform local testing. It also allows us to use Kubernetes native tooling such as Kustomize to manage configuration. Think of Cloud Run as a Kubernetes Deployment as a service (though really more like Deployment and Service…as a service).

“Normal” businesses versus internet-scale businesses

What about cost? Yes, the unit cost in terms of compute is higher with serverless. If you execute enough CPU cycles to fill the capacity of a VM, you are better off renting the whole VM as opposed to effectively renting timeshares of it. But here’s the thing: most “normal” businesses tend to have highly cyclical traffic patterns throughout the day and their scale is generally modest.

What do I mean by “normal” businesses? These are primarily non-internet-scale companies such as insurance, fast food, car rental, construction, or financial services, not Google, Netflix, or Amazon. As a result, these companies can benefit greatly from pay-per-use, and those in the retail space also benefit greatly from the elasticity of this model during periods like Black Friday or promotional campaigns. Businesses with brick-and-mortar have traffic that generally follows their operating hours. During off-hours, they can often scale quite literally to zero.

Many of these businesses, for better or worse, treat software development as an IT cost center to be managed. They don’t need—or for that matter, want—the costs and overheads associated with platform-izing Kubernetes. A lot of the companies we interact with fall into this category of “normal” businesses, and I suspect most companies outside of tech do as well.

BYOP—Bring Your Own Platform

I’ve asked it before: is Kubernetes really the end-game abstraction? In my opinion, it’s an implementation detail. I don’t think I’m alone in that opinion. Some companies put a tremendous amount of investment into abstracting Kubernetes from their developers. This is what I mean by “platform-izing” Kubernetes. It typically involves significant and ongoing OpEx investment. The industry has started to coalesce around two concepts that encapsulate this: Platform Engineering and Internal Developer Platform. So while Kubernetes may have become the default container orchestrator, the higher-level pieces—the pieces constituting the Internal Developer Platform—are still very much bespoke. Kelsey Hightower said it best: the majority of people managing infrastructure just want a PaaS. The only requirement: it has to be built by them. That’s a problem.

Imagine having a Kubernetes cluster per Deployment. Full blast radius isolation, complete cost traceability, granular yet simple permissioning. It sounds like a maintenance nightmare though, right? Now imagine those clusters just being hidden from you completely and the Deployment itself is the only thing you interact with and maintain. You just provide your container (or group of containers), configure your CPU and memory requirements, specify the network and resource access, and deploy it. The Deployment manages your load balancing and ingress, automatically scales the pods up and down or canaries traffic, and gives you aggregated logs and metrics out of the box. You only pay for the resources consumed while processing a request. Just a few years ago, this was a futuristic-sounding fantasy.

The platform Kelsey describes above does now exist. From my experience, it’s a nearly ideal solution for those “normal” businesses who are looking to minimize complexity and operational costs and avoid having to bring (more like build) their own platform. I realize GCP is a distant third when it comes to public cloud market share so this will largely fall on deaf ears, but for those who are still listening: stop wasting time on Kubernetes and just use Cloud Run. Let me expand on the reasons why.

Easily and quickly get started with the cloud. Many of the companies we work with who are still in the midst of migrating to the cloud get hung up with analysis paralysis. Cloud Run isn’t a perfect solution for everything, but it’s good enough for the majority of cases. The rest can be handled as exceptions.
Minimize complexity of cloud environments. Cloud Run does not eliminate the need for infrastructure (there are still caches, queues, databases, and so forth), but it greatly simplifies it. Using managed services for the remaining infrastructure pieces simplifies it further.
Increase the efficiency of your developers and reduce operational costs. Rather than spending most of their time dealing with infrastructure concerns, allow your developers to focus on delivering business value. For most businesses, infrastructure is undifferentiated commodity work. By “outsourcing” large parts of your undifferentiated Internal Developer Platform, you can reallocate developers to product or feature development and reduce operational costs. This allows you to get the benefits of Platform Engineering with a fraction of the maintenance and overhead. Lastly, if you are a “normal” business that doesn’t operate at internet scale and has fairly cyclical traffic, it’s entirely likely Cloud Run will be cheaper than VM-based platforms.
Maintain the flexibility to evolve to a more complex solution over time if needed. This is where traditional serverless platforms and PaaS solutions fall short. Again, with Cloud Run there is no actual vendor lock-in, it’s just a Kubernetes Deployment as a Service. Even without Knative, we can take that Deployment and run it in any Kubernetes cluster. This is a very different paradigm from, say, App Engine where you wrote your application using App Engine APIs and deployed your service to the App Engine runtime. In this new paradigm, the artifact is a Plain Old Container. There are cases where Cloud Run is not a good fit, such as certain kinds of stateful legacy applications or services with sustained, non-cyclical traffic. We don’t want to be painted into a corner with these types of situations so having flexibility is important.

There are similar analogs to Cloud Run on other cloud platforms. For example, AWS has AppRunner. However, in my experience these fall short in terms of developer experience because of either lack of investment from the cloud provider or environment complexity (as I would argue is the case for AWS). Managed services like Cloud Run are one of the areas that GCP truly excels and differentiates itself.

Just use Cloud Run, seriously

I realize not everyone will be convinced. The gravitational pull of Kubernetes is strong and as a platform, it’s a safe bet. However, operationalizing Kubernetes properly—whether it’s a managed offering like GKE or not—requires some kind of platform team and ongoing investment. We’ve seen it approached without this where developers are given clusters or allowed to spin them up and fend for themselves. This quickly becomes untenable because standards are non-existent, security and compliance is unmanageable, and developer time is split between managing infrastructure and actual feature development.

If your organization is unable or unwilling to make this investment, I urge you to consider Cloud Run. There’s still work needed on the periphery to properly operationalize it, such as implementing CI/CD pipelines and managing accessory infrastructure, but it’s a much lower investment. Additionally, it provides an escape hatch—unlike App Engine or traditional PaaS solutions, there is no real switching cost in moving to Kubernetes if you need to in the future. With Cloud Run, serverless has finally reached a tipping point where it’s now viable for a majority of workloads rather than a niche subset. Unlike Kubernetes, it provides the right level of abstraction for most businesses building software. In my opinion, serverless is still not taken seriously due to preconceived notions, but it’s time to start reevaluating those notions.

Agree? Disagree? I’d love to hear your thoughts. If you’re an organization that would like to do cloud differently or are looking for the playbook to operationalize Google Cloud Platform, please get in touch.

Follow @tyler_treat

June 24, 2020

Using Google-Managed Certificates and Identity-Aware Proxy With GKE

Ingress on Google Kubernetes Engine (GKE) uses a Google Cloud Load Balancer (GCLB). GCLB provides a single anycast IP that fronts all of your backend compute instances along with a lot of other rich features. In order to create a GCLB that uses HTTPS, an SSL certificate needs to be associated with the ingress resource. This certificate can either be self-managed or Google-managed. The benefit of using a Google-managed certificate is that they are provisioned, renewed, and managed for your domain names by Google. These managed certificates can also be configured directly with GKE, meaning we can configure our certificates the same way we declaratively configure our other Kubernetes resources such as deployments, services, and ingresses.

GKE also supports Identity-Aware Proxy (IAP), which is a fully managed solution for implementing a zero-trust security model for applications and VMs. With IAP, we can secure workloads in GCP using identity and context. For example, this might be based on attributes like user identity, device security status, region, or IP address. This allows users to access applications securely from untrusted networks without the need for a VPN. IAP is a powerful way to implement authentication and authorization for corporate applications that are run internally on GKE, Google Compute Engine (GCE), or App Engine. This might be applications such as Jira, GitLab, Jenkins, or production-support portals.

IAP works in relation to GCLB in order to secure GKE workloads. In this tutorial, I’ll walk through deploying a workload to a GKE cluster, setting up GCLB ingress for it with a global static IP address, configuring a Google-managed SSL certificate to support HTTPS traffic, and enabling IAP to secure access to the application. In order to follow along, you’ll need a GKE cluster and domain name to use for the application. In case you want to skip ahead, all of the Kubernetes configuration for this tutorial is available here.

Deploying an Application Behind GCLB With a Managed Certificate

First, let’s deploy our application to GKE. We’ll use a Hello World application to test this out. Our application will consist of a Kubernetes deployment and service. Below is the configuration for these:

Apply these with kubectl:

$ kubectl apply -f .

At this point, our application is not yet accessible from outside the cluster since we haven’t set up an ingress. Before we do that, we need to create a static IP address using the following command:

$ gcloud compute addresses create web-static-ip --global

The above will reserve a static external IP called “web-static-ip.” We now can create an ingress resource using this IP address. Note the “kubernetes.io/ingress.global-static-ip-name” annotation in the configuration:

Applying this with kubectl will provision a GCLB that will route traffic into our service. It can take a few minutes for the load balancer to become active and health checks to begin working. Traffic won’t be served until that happens, so use the following command to check that traffic is healthy:

$ curl -i http://<web-static-ip>

You can find <web-static-ip> with:

$ gcloud compute addresses describe web-static-ip --global

Once you start getting a successful response, update your DNS to point your domain name to the static IP address. Wait until the DNS change is propagated and your domain name now points to the application running in GKE. This could take 30 minutes or so.

After DNS has been updated, we’ll configure HTTPS. To do this, we need to create a Google-managed SSL certificate. This can be managed by GKE using the following configuration:

Ensure that “example.com” is replaced with the domain name you’re using.

We now need to update our ingress to use the new managed certificate. This is done using the “networking.gke.io/managed-certificates” annotation.

We’ll need to wait a bit for the certificate to finish provisioning. This can take up to 15 minutes. Once it’s done, we should see HTTPS traffic flowing correctly:

$ curl -i https://example.com

We now have a working example of an application running in GKE behind a GCLB with a static IP address and domain name secured with TLS. Now we’ll finish up by enabling IAP to control access to the application.

Securing the Application With Identity-Aware Proxy

If you’re enabling IAP for the first time, you’ll need to configure your project’s OAuth consent screen. The steps here will walk through how to do that. This consent screen is what users will see when they attempt to access the application before logging in.

Once IAP is enabled and the OAuth consent screen has been configured, there should be an OAuth 2 client ID created in your GCP project. You can find this under “OAuth 2.0 Client IDs” in the “APIs & Services” > “Credentials” section of the cloud console. When you click on this credential, you’ll find a client ID and client secret. These need to be provided to Kubernetes as secrets so they can be used by a BackendConfig for configuring IAP. Apply the secrets to Kubernetes with the following command, replacing “xxx” with the respective credentials:

$ kubectl create secret generic iap-oauth-client-id \
--from-literal=client_id=xxx \
--from-literal=client_secret=xxx

BackendConfig is a Kubernetes custom resource used to configure ingress in GKE. This includes features such as IAP, Cloud CDN, Cloud Armor, and others. Apply the following BackendConfig configuration using kubectl, which will enable IAP and associate it with your OAuth client credentials:

We also need to ensure there are service ports associated with the BackendConfig in order to trigger turning on IAP. One way to do this is to make all ports for the service default to the BackendConfig, which is done by setting the “beta.cloud.google.com/backend-config” annotation to “{“default”: “config-default”}” in the service resource. See below for the updated service configuration.

Once you’ve applied the annotation to the service, wait a couple minutes for the infrastructure to settle. IAP should now be working. You’ll need to assign the “IAP-secured Web App User” role in IAP to any users or groups who should have access to the application. Upon accessing the application, you should now be greeted with a login screen.

Your Kubernetes workload is now secured by IAP! Do note that VPC firewall rules can be configured to bypass IAP, such as rules that allow traffic internal to your VPC or GKE cluster. IAP will provide a warning indicating which firewall rules allow bypassing it.

For an extra layer of security, IAP sets signed headers on inbound requests which can be verified by the application. This is helpful in the event that IAP is accidentally disabled or misconfigured or if firewall rules are improperly set.

Together with GCLB and GCP-managed certificates, IAP provides a great solution for serving and securing internal applications that can be accessed anywhere without the need for a VPN.

Follow @tyler_treat

September 17, 2019

What’s Going on with GKE and Anthos?

GCP’s Slippery Slide into Enterprise

When former Oracle exec Thomas Kurian took over for Diane Greene as Google Cloud’s CEO, a lot of people expressed concern about what this meant for the future of GCP. Vendor lock-in is already at the forefront of the minds of many cloud adopters, and Oracle is notorious for locking customers into expensive and prolonged contracts. However, I thought the move was smart on Google’s part.

Google has never been a customer-first company. While it has always been a technology leader, it struggles immensely with enterprise sales and support. It continues to have issues dogfooding its own products (Google’s products are typically built on internal versions of services not available to customers, then there are the external GCP versions that their customers actually use). This means its engineers don’t feel the same pain points that its customers experience and their products lose out on a critical feedback loop (contrast this with Amazon where AWS is treated as a separate company to Amazon.com, and there is a mandate to build with the same services Amazon’s customers use). Customer empathy matters.

Now, most people probably wouldn’t characterize Oracle as a customer-first company, but it knows how to meet customers where they are and to sell in a way that resonates with enterprise decision makers. Historically, Google has approached sales engineering in a way that has failed to resonate with customers by attempting to map its superior technology offerings onto actual customer problems. Nothing could be more off-putting to a decision maker with a round hole than a sales engineer with a square peg telling them their hole is wrong.

Thomas Kurian was brought in to address these glaring issues for Google Cloud. Through restructuring and growing its sales organization, key leadership hires, and strategic acquisitions and partnerships, it’s clear he’s serious about fixing Google Cloud’s enterprise perception problem. Slowly but surely, Google is attempting to shift its culture from being technology-obsessed to customer-obsessed. And while Oracle is notorious when it comes to vendor lock-in, all signs thus far have pointed to Google more strategically embracing open APIs with things like GKE (Kubernetes), Traffic Director (Istio), ML Engine (Tensorflow), and Dataflow (Apache Beam). They are also starting to meet customers where they are with things like Dataproc (Apache Spark and Hadoop), Memorystore (Redis), and Cloud SQL (MySQL, PostgreSQL, and Microsoft SQL Server). Hell, they’ll even run Microsoft Active Directory for you now! Who says Google can’t do enterprise? So the future is bright for GCP, right? Maybe. What follows is speculation based on my own observations and anecdotal information.

There’s one thing that could change the outlook on all of this: Anthos. Anthos is GCP’s answer to hybrid-cloud solutions like Pivotal Cloud Foundry (PCF), AWS Outposts, or Azure Stack. It allows organizations to build and manage workloads across public clouds and on-prem by extending GKE. If multi-cloud is your thing and you hate money, these platforms all sound like pretty good things. But here’s the disconcerting thing about Anthos in particular: it’s becoming clear that GCP is deliberately blurring the lines between Anthos and GKE.

I received an email yesterday from GCP announcing that Binary Authorization is now generally available (GA). Binary Authorization is a neat security feature that ensures only trusted container images can be deployed to GKE. It’s been in beta for some time and now it’s GA with a six-month free trial starting today. Great! How much will it cost after the trial? Contact your sales representative. Wait, what? That’s because starting on March 16, 2020, GKE clusters will need to be part of an Anthos-subscribed organization to enable Binary Authorization. If you choose not to upgrade to Anthos, starting March 16, 2020, you will not be able to turn on Binary Authorization on new clusters.

This is a slippery slope for GCP. I can already foresee other features requiring an Anthos subscription just to use them in GKE, where GKE basically becomes an Anthos subscription funnel. Which features go into Anthos and which go into GKE? Now this is something I’d come to expect from Oracle. If GCP starts to roll differentiating features into Anthos instead of GKE, it could mark the beginning of the end.

While the lines between Anthos and GKE are becoming increasingly fuzzy, Google is clear about this particular feature:

Binary Authorization is a feature of the Anthos platform and use of Binary Authorization is included in the Anthos subscription.

That wasn’t clear, however, when I started using it with GKE and started to advise clients to use it there, completely irrespective of Anthos. This sets a very dangerous precedent.

What’s more alarming is the marketing and product language on a number of GCP services and features have quietly replaced “GKE” with “Anthos” or, worse yet, “Anthos GKE.” For example, Cloud Run—which is still in beta—now says it can “run stateless containers on a fully managed environment or on Anthos.” Will I need an Anthos subscription to use Cloud Run with GKE once it goes GA? Based on the Binary Authorization move and the language updates, it seems likely. And looking at the GKE cluster setup wizard, it appears managed Istio might also.

Anthos features listed in GKE cluster setup wizard

Which of these features is going to require a subscription next? We know Binary Authorization already does.

Security features listed in GKE cluster setup wizard

And how much does Anthos even cost? Contact sales. Not a good look for Kurian’s vision of openness and customer choice. As AWS CEO Andy Jassy puts it, no longer does the process of buying technology involve the purchase of heavy proprietary software with multi-year contracts that include annual maintenance fees. Now it’s about choice and ease of use, including letting customers turn things off if they’re not working. But choice also means not bundling all of your differentiating features into a massive contract. List prices for Anthos start at $10,000 per month per 100 virtual CPUs with a minimum one-year commitment. This is just for the software layer. It doesn’t include any of the underlying GCP infrastructure. Again, fine for organizations willing to throw similar sums of money at things like PCF or Outposts, but are plain old GKE users really going to get roped in to this nonsense? Are they going to lose out on value-added features?

Either GCP has a well-thought-out strategy for GKE and Anthos (which, given Google’s history, is frankly unlikely) and is simply tone deaf to how it would be perceived by people already skittish about a former Oracle exec taking the reigns as CEO or this will end in disaster. It’s entirely possible this is all just a misunderstanding and they are, in a misguided fashion, rebranding GKE to Anthos (it’s been renamed once already and GCP has a history of rebranding existing products), but requiring a subscription hidden behind a sales contact form in order to use basic features is spooky.

My hope is that there is some longer-term strategy at play and GCP is not moving to an enterprise-subscription model for what should be GKE features. Best case, Google is just muddying the waters as they’ve done in the past. Worst case, they’re steamrolling their entire platform strategy to make way for enterprise sales. That would be tragic for Google given GKE is still by far and away the best managed Kubernetes service available. So what’s going on with GKE and Anthos?

Follow @tyler_treat