Serverless on GCP

Like many other marketing buzzwords, the concept of “serverless” has taken on a life of its own, which can make it difficult to understand what serverless actually means. What it really means is that the cloud provider fully manages server infrastructure all the way up to the application layer. For example, GCE isn’t serverless because, while Google manages the physical server infrastructure, we still have to deal with patching operating systems, managing load balancers, configuring firewall rules, and so on. Serverless means we merely worry about our application code and business logic and nothing else. This concept extends beyond pure compute though, including things like databases, message queues, stream processing, machine learning, and other types of systems.

There are several benefits to the serverless model. First, it allows us to focus on building products, not managing infrastructure. These operations-related tasks, while important, are not generally things that differentiate a business. It’s just work that has to be done to support the rest of the business. With cloud—and serverless in particular—many of these tasks are becoming commoditized, freeing us up to focus on things that matter to the business.

Another benefit related to the first is that serverless systems provide automatic scaling and fault-tolerance across multiple data centers or, in some cases, even globally. When we leverage GCP’s serverless products, we also leverage Google’s operational expertise and the experience of an army of SREs. That’s a lot of leverage. Few companies are able to match the kind of investment cloud providers like Google or Amazon are able to make in infrastructure and operations, nor should they. If it’s not your core business, leverage economies of scale.

Finally, serverless allows us to pay only for what we use. This is quite a bit different from what traditional IT companies are used to where it’s more common to spend several millions of dollars on a large solution with a contract. It’s also different from what many cloud-based companies are used to where you typically provision some baseline capacity and pay for bursts of additional capacity as needed. With serverless, VMs are eschewed and we pay only for the resources we use to serve the traffic we have. This means no more worrying about over-provisioning or under-provisioning.

GCP’s Compute Options

GCP has a comprehensive set of compute options ranging from minimally managed VMs all the way to highly managed serverless backends. Below is the full spectrum of GCP’s compute services at the time of this writing. I’ll provide a brief overview of each of these services just to get the lay of the land. We’ll start from the highest level of abstraction and work our way down, and then we’ll hone in on the serverless solutions.

GCP compute platforms

Firebase is Google’s managed Backend as a Service (BaaS) platform. This is the highest level of abstraction that GCP offers (short of SaaS like G Suite) and allows you to build mobile and web applications quickly and with minimal server-side code. For example, it can implement things like user authentication and offline data syncing for you. This is often referred to as a “backend as a service” because there is no server code. The trade-off is you have less control over the system, but it can be a great fit for quickly prototyping applications or building a proof of concept with minimal investment. The primary advantage is that you can focus most of your development effort on client-side application code and user experience. Note that some components of Firebase can be used outside of the Firebase platform, such as Cloud Firestore and Firebase Authentication

Cloud Functions is a serverless Functions as a Service (FaaS) offering from GCP. You upload your function code and Cloud Functions handles the runtime of it. Because it’s a sandboxed environment, there are some restrictions to the runtime, but it’s a great choice for building event-driven services and connecting systems together. While you can develop basic user-facing APIs, the operational tooling is not sufficient for complex systems. The benefit is Cloud Functions are highly elastic and have minimal operational overhead since it is a serverless platform. They are an excellent choice for dynamic, event-driven plumbing such as moving data between services or reacting to log events. They work well for basic APIs, but can rapidly become operationally complex for more than a few endpoints.

App Engine is Google’s Platform as a Service (PaaS). Like Cloud Functions, it’s an opinionated but fully managed runtime that lets you upload your application code while handling the operational aspects such as autoscaling and fault-tolerance. App Engine has two modes: Standard, which is the opinionated PaaS runtime, and Flexible, which allows providing a custom runtime using a container—this is colloquially referred to as a Container as a Service (CaaS). For stateless applications with quick instance start-up times, it is often an excellent choice. It offers many of the benefits of Cloud Functions but simplifies operational aspects since larger components are easy to deploy and manage. App Engine allows developers to focus most of their effort on business logic. Standard is a great fit for greenfield applications where server-side processing and logic is required. Flex can be easier for migrating existing workloads because it is less opinionated.

Cloud Run is a new offering in GCP that provides a managed compute platform for stateless containers. Essentially, Google manages the underlying compute infrastructure and all you have to do is provide them an application container. Like App Engine, they handle scaling instances up and down, load balancing, and fault-tolerance. Cloud Run actually has two modes: the Google-managed version, which runs your containers on Google’s internal compute infrastructure known as Borg, and the GKE version, which allows running workloads on your own GKE cluster. This is because Cloud Run is built on an open source Kubernetes platform for serverless workloads called Knative.

Cloud Run and App Engine Flex are similar to each other, but there are some nuanced differences. One key difference is Cloud Run has very fast instance start-up time due to its reliance on the gVisor container runtime. Flex instances, on the other hand, usually take minutes to start because they involve provisioning GCE instances, load balancers, and other GCP-managed infrastructure. Flex is also more feature-rich than Cloud Run, supporting things like traffic splitting, deployment rollbacks, WebSocket connections, and VPC connections.

Kubernetes Engine, or GKE, is Google’s managed Kubernetes service. GKE effectively adds a container orchestration layer on top of GCE, putting it somewhere between IaaS (Infrastructure as a Service) and CaaS. This is typically the lowest level of abstraction most modern applications should require. There is still a lot of operational overhead involved with using a managed Kubernetes service.

Lastly, Compute Engine, or GCE, is Google’s VM offering. GCE VMs are usually run on multi-tenant hosts, but GCP also offers sole-tenant nodes where a physical Compute Engine server is dedicated to hosting a single customer’s VMs. This is the lowest level of infrastructure that GCP offers and the lowest common denominator generally available in the public clouds, usually referred to as IaaS. This means there are a lot of operational responsibilities that come with using it. There are generally few use cases that demand a bare VM.

Choosing a Serverless Option

Now that we have an overview of GCP’s compute services, we can focus in on the serverless options.

GCP serverless compute platforms

GCP currently has four serverless compute options (emphasis on computebecause there are other serverless offerings for things like databases, queues, and so forth, but these are out of scope for this discussion).

  • Cloud Run: serverless containers (CaaS)
  • App Engine: serverless platforms (PaaS)
  • Cloud Functions: serverless functions (FaaS)
  • Firebase: serverless applications (BaaS)

With four different serverless options to choose from, how do we decide which one is right? The first thing to point out is that we don’t necessarily need to choose a single solution. We might end up using a combination of these services when building a system. However, I’ve provided some criteria below on selecting solutions for different types of problems.

Firebase

If you’re looking to quickly prototype an application or focus only on writing code, Firebase can be a good fit. This is especially true if you’re wanting to focus most of your investment and time on the client-side application code and user experience. Likewise, if you want to build a mobile-ready application and don’t want to implement things like user authentication, it’s a good option. 

Firebase is obviously the most restrictive and opinionated solution, but it’s great for rapid prototyping and accelerating development of an MVP. You can also complement it with services like App Engine or Cloud Functions for situations that require server-side compute.

Good Fit Characteristics

  • Mobile-first (or ready) applications
  • Rapidly prototyping applications
  • Applications where most of the logic is (or can be) client-side
  • Using Firebase components on other platforms, such as using Cloud Firestore or Firebase Authentication on App Engine, to minimize investment in non-differentiating work

Bad Fit Characteristics

  • Applications requiring complex server-side logic or architectures
  • Applications which require control over the runtime

Cloud Functions

If you’re looking to react to real-time events, glue systems together, or build a simple API, Cloud Functions are a good choice provided you’re able to use one of the supported runtimes (Node.js, Python, and Go). If the runtime is a limitation, check out Cloud Run.

Good Fit Characteristics

  • Event-driven applications and systems
  • “Glueing” systems together
  • Deploying simple APIs

Bad Fit Characteristics

  • Highly stateful systems
  • Deploying large, complex APIs
  • Systems that require a high level of control or need custom runtimes or binaries

App Engine

If you’re looking to deploy a full application or complex API, App Engine is worth looking at. Standard is good for greenfield applications which are able to fit within the constraints of the runtime. It can scale to zero and deploys take seconds. Flexible is easier for existing applications where you’re unwilling or unable to make changes fitting them into Standard. Deploys to Flex can take minutes, and you must have a minimum of one instance running at all times.

Good Fit Characteristics

  • Stateless applications
  • Rapidly developing CRUD-heavy applications
  • Applications composed of a few services
  • Deploying complex APIs

Bad Fit Characteristics

  • Stateful applications that require lots of in-memory state to meet performance or functional requirements
  • Applications built with large or opinionated frameworks or applications that have slow start-up times (this can be okay with Flex)
  • Systems that require protocols other than HTTP

Cloud Run

If you’re looking to react to real-time events but need custom runtimes or binaries not supported by Cloud Functions, Cloud Run is a good choice. It’s also a good option for building stateless HTTP-based web services. It’s trimmed down compared to App Engine Flex, which means it has fewer features, but it also has faster instance start-up times, can scale to zero, and is billed only by actual request-processing time rather than instance time. 

Good Fit Characteristics

  • Stateless services that are easily containerized
  • Event-driven applications and systems
  • Applications that require custom system and language dependencies

Bad Fit Characteristics

  • Highly stateful systems or systems that require protocols other than HTTP
  • Compliance requirements that demand strict controls over the low-level environment and infrastructure (might be okay with the Knative GKE mode)

Finally, Google also provides a decision tree for choosing a serverless compute platform.

* App Engine standard environment supports Node.js, Python, Java, Go, PHP
* Cloud Function supports Node.js, Python, Go

Summary

Going serverless can provide a lot of efficiencies by freeing up resources and investment to focus on things that are more strategic and differentiating for a business rather than commodity infrastructure. There are trade-offs when using managed services and serverless solutions. We lose some control and visibility. At certain usage levels there can be a premium, so eventually renting VMs might be the more cost-effective solution once you crack that barrier. However, it’s important to consider not just operational costs involved in managing infrastructure, but also opportunity costs. These trade-offs have to be weighed carefully against the benefits they bring to the business.

One thing worth pointing out is that it’s often easier to move down a level of abstraction than up. That is, there’s typically less friction involved in moving from a more opinionated platform to a less opinionated one than vice versa. This is why we usually suggest starting with the highest level of abstraction possible and dropping down if and when needed.

Authenticating Stackdriver Uptime Checks for Identity-Aware Proxy

Google Stackdriver provides a set of tools for monitoring and managing services running in GCP, AWS, or on-prem infrastructure. One feature Stackdriver has is “uptime checks,” which enable you to verify the availability of your service and track response latencies over time from up to six different geographic locations around the world. While Stackdriver uptime checks are not as feature-rich as other similar products such as Pingdom, they are also completely free. For GCP users, this provides a great starting point for quickly setting up health checks and alerting for your applications.

Last week I looked at implementing authentication and authorization for APIs in GCP using Cloud Identity-Aware Proxy (IAP). IAP provides an easy way to implement identity and access management (IAM) for applications and APIs in a centralized place. However, one thing you will bump into when using Stackdriver uptime checks in combination with IAP is authentication. For App Engine in particular, this can be a problem since there is no way to bypass IAP. All traffic, both internal and external to GCP, goes through it. Until Cloud IAM Conditions is released and generally available, there’s no way to—for example—open up a health-check endpoint with IAP.

While uptime checks have support for Basic HTTP authentication, there is no way to script more sophisticated request flows (e.g. to implement the OpenID Connect (OIDC) authentication flow for IAP-protected resources) or implement fine-grained IAM policies (as hinted at above, this is coming with IAP Context-Aware Access and IAM Conditions). So are we relegated to using Nagios or some other more complicated monitoring tool? Not necessarily. In this post, I’ll present a workaround solution for authenticating Stackdriver uptime checks for systems protected by IAP using Google Cloud Functions.

The Solution

The general strategy is to use a Cloud Function which can authenticate with IAP using a service account to proxy uptime checks to the application. Essentially, the proxy takes a request from a client, looks for a header containing a host, forwards the request that host after performing the necessary authentication, and then forwards the response back to the client. The general architecture of this is shown below.

There are some trade-offs with this approach. The benefit is we get to rely on health checks that are fully managed by GCP and free of charge. Since Cloud Functions are also managed by GCP, there’s no operations involved beyond deploying the proxy and setting it up. The first two million invocations per month are free for Cloud Functions. If we have an uptime check running every five minutes from six different locations, that’s approximately 52,560 invocations per month. This means we could run roughly 38 different uptime checks without exceeding the free tier for invocations. In addition to invocations, the free tier offers 400,000 GB-seconds, 200,000 GHz-seconds of compute time and 5GB of Internet egress traffic per month. Using the GCP pricing calculator, we can estimate the cost for our uptime check. It generally won’t come close to exceeding the free tier.

The downside to this approach is the check is no longer validating availability from the perspective of an end user. Because the actual service request is originating from Google’s infrastructure by way of a Cloud Function as opposed to Stackdriver itself, it’s not quite the same as a true end-to-end check. That said, both Cloud Functions and App Engine rely on the same Google Front End (GFE) infrastructure, so as long as both the proxy and App Engine application are located in the same region, this is probably not all that important. Besides, for App Engine at least, the value of the uptime check is really more around performing a full-stack probe of the application and its dependencies than monitoring the health of Google’s own infrastructure. That is one of the goals behind using managed services after all. The bigger downside is that the latency reported by the uptime check no longer accurately represents the application. It can still be useful for monitoring aggregate trends nonetheless.

The Implementation Setup

I’ve built an open-source implementation of the proxy as a Cloud Function in Python called gcp-oidc-proxy. It’s runnable out of the box without any modification. We’ll assume you have an IAP-protected application you want to setup a Stackdriver uptime check for. To deploy the proxy Cloud Function, first clone the repository to your machine, then from there run the following gcloud command:

$ gcloud functions deploy gcp-oidc-proxy \
   --runtime python37 \
   --entry-point handle_request \
   --trigger-http

This will deploy a new Cloud Function called gcp-oidc-proxy to your configured cloud project. It will assume the project’s default service account. Ordinarily, I would suggest creating a separate service account to limit scopes. This can be configured on the Cloud Function with the –service-account flag, which is under gcloud beta functions deploy at the time of this writing. We’ll omit this step for brevity however.

Next, we need to add the “Service Account Actor” IAM role to the Cloud Function’s service account since it will need it to sign JWTs (more on this later). In the GCP console, go to IAM & admin, locate the appropriate service account (in this case, the default service account), and add the respective role.

The Cloud Function’s service account must also be added as a member to the IAP with the “IAP-secured Web App User” role in order to properly authenticate. Navigate to Identity-Aware Proxy in the GCP console, select the resource you wish to add the service account to, then click Add Member.

Find the OAuth2 client ID for the IAP by clicking on the options menu next to the IAP resource and select “Edit OAuth client.” Copy the client ID on the next page and then navigate to the newly deployed gcp-oidc-proxy Cloud Function. We need to configure a few environment variables, so click edit and then expand more at the bottom of the page. We’ll add four environment variables: CLIENT_ID, WHITELIST, AUTH_USERNAME, and AUTH_PASSWORD.

CLIENT_ID contains the OAuth2 client ID we copied for the IAP. WHITELIST contains a comma-separated list of URL paths to make accessible or * for everything (I’m using /ping in my example application), and AUTH_USERNAME and AUTH_PASSWORD setup Basic authentication for the Cloud Function. If these are omitted, authentication is disabled.

Save the changes to redeploy the function with the new environment variables. Next, we’ll setup a Stackdriver uptime check that uses the proxy to call our service. In the GCP console, navigate to Monitoring then Create Check from the Stackdriver UI. Skip any suggestions for creating a new uptime check. For the hostname, use the Cloud Function host. For the path, use /gcp-oidc/proxy/<your-endpoint>. The proxy will use the path to make a request to the protected resource.

Expand Advanced Options to set the Forward-Host to the host protected by IAP. The proxy uses this header to forward requests. Lastly, we’ll set the authentication username and password that we configured on the Cloud Function.

Click “Test” to ensure our configuration works and the check passes.

The Implementation Details

The remainder of this post will walk you through the implementation details of the proxy. The implementation closely resembles what we did to authenticate API consumers using a service account. We use a header called Forward-Host to allow the client to specify the IAP-authenticated host to forward requests to. If the header is not present, we just return a 400 error. We then use this host and the path of the original request to construct the proxy request and retain the HTTP method and headers (with the exception of the Host header, if present, since this can cause problems).

Before sending the request, we perform the authentication process by generating a JWT signed by the service account and exchange it for a Google-signed OIDC token.

We can cache this token and renew it only once it expires. Then we set the Authorization header with the OIDC token and send the request.

We simply forward on the resulting content body, status code, and headers. We strip HTTP/1.1 “hop-by-hop” headers since these are unsupported by WSGI and Python Cloud Functions rely on Flask. We also strip any Content-Encoding header since this can also cause problems.

Because this proxy allows clients to call into endpoints unauthenticated, we also implement a whitelist to expose only certain endpoints. The whitelist is a list of allowed paths passed in from an environment variable. Alternatively, we can whitelist * to allow all paths. Wildcarding could be implemented to make this even more flexible. We also implement a Basic auth decorator which is configured with environment variables since we can setup uptime checks with a username and password in Stackdriver.

The only other code worth looking at in detail is how we setup the service account credentials and IAM Signer. A Cloud Function has a service account attached to it which allows it to assume the roles of that account. Cloud Functions rely on the Google Compute Engine metadata server which stores service account information among other things. However, the metadata server doesn’t expose the service account key used to sign the JWT, so instead we must use the IAM signBlob API to sign JWTs.

Conclusion

It’s not a particularly simple solution, but it gets the job done. The setup of the Cloud Function could definitely be scripted as well. Once IAM Conditions is generally available, it should be possible to expose certain endpoints in a way that is accessible to Stackdriver without the need for the OIDC proxy. That said, it’s not clear if there is a way to implement uptime checks without exposing an endpoint at all since there is currently no way to assign a service account to a check. Ideally, we would be able to assign a service account and use that with IAP Context-Aware Access to allow the uptime check to access protected endpoints.