Security, Maintainability, Velocity: Choose One

There are three competing priorities that companies have as it relates to software development: security, maintainability, and velocity. I’ll elaborate on what I mean by each of these in just a bit. When I originally started thinking about this, I thought of it in the context of the “good, fast, cheap: choose two” project management triangle. But after thinking about it for more than a couple minutes, and as I related it to my own experience and observations at other companies, I realized that in practice it’s much worse. For most organizations building software, it’s more like security, maintainability, velocity: choose one.

The Software Development Triangle

Of course, most organizations are not explicitly making these trade-offs. Instead, the internal preferences and culture of the company reveal them. I believe many organizations, consciously or not, accept this trade-off as an immovable constraint. More risk-averse groups might even welcome it. Though the triangle most often results in a “choose one” sort of compromise, it’s not some innate law. You can, in fact, have all three with a little bit of careful thought and consideration. And while reality is always more nuanced than what this simple triangle suggests, I find looking at the extremes helps to ground the conversation. It emphasizes the natural tension between these different concerns. Bringing that tension to the forefront allows us to be more intentional about how we manage it.

It wasn’t until recently that I distilled down these trade-offs and mapped them into the triangle shown above, but we’ve been helping clients navigate this exact set of competing priorities for over six years at Real Kinetic. We built Konfig as a direct response to this since it was such a common challenge for organizations. We’re excited to offer a solution which is the culmination of years of consulting and which allows organizations to no longer compromise, but first let’s explore the trade-offs I’m talking about.

Security

Companies, especially mid- to large- sized organizations, care a great deal about security (and rightfully so!). That’s not to say startups don’t care about it, but the stakes are just much higher for enterprises. They are terrified of being the next big name in the headlines after a major data breach or ransomware attack. I call this priority security for brevity, but it actually consists of two things which I think are closely aligned: security and governance.

Governance directly supports security in addition to a number of other concerns like reliability, risk management, and compliance. This is sometimes referred to as Governance, Risk, and Compliance or GRC. Enterprises need control over, and visibility into, all of the pieces that go into building and delivering software. This is where things like SDLC, separation of duties, and access management come into play. Startups may play it more fast and loose, but more mature organizations frequently have compliance or regulatory obligations like SOC 2 Type II, PCI DSS, FINRA, FedRAMP, and so forth. Even if they don’t have regulatory constraints, they usually have a reputation that needs to be protected, which typically means more rigid processes and internal controls. This is where things can go sideways for larger organizations as it usually leads to practices like change review boards, enterprise (ivory tower) architecture programs, and SAFe. Enterprises tend to be pretty good at governance, but it comes at a cost.

It should come as no surprise that security and governance are in conflict with speed, but they are often in contention with well-architected and maintainable systems as well. When organizations enforce strong governance and security practices, it can often lead to developers following bad practices. Let me give an example I have seen firsthand at an organization.

A company has been experiencing stability and reliability issues with its software systems. This has caused several high-profile, revenue-impacting outages which have gotten executives’ attention. The response is to implement a series of process improvements to effectively slow down the release of changes to production. This includes a change review board to sign-off on changes going to production and a production gating process which new workloads going to production must go through before they can be released. The hope is that these process changes will reduce defects and improve reliability of systems in production. At this point, we are wittingly trading off velocity.

What actually happened is that developers began batching up more and more changes to get through the change review board which resulted in “big bang” releases. This caused even more stability issues because now large sets of changes were being released which were increasingly complex, difficult to QA, and harder to troubleshoot. Rollbacks became difficult to impossible due to the size and complexity of releases, increasing the impact of defects. Release backlogs quickly grew, prompting developers to move on to more work rather than sit idle, which further compounded the issue and led to context switching. Decreasing the frequency of deployments only exacerbated these problems. Counterintuitively, slowing down actually increased risk.

To avoid the production gating process, developers began adding functionality to existing services which, architecturally speaking, should have gone into new services. Services became bloated grab bags of miscellaneous functionality since it was easier to piggyback features onto workloads already in production than it was to run the gauntlet of getting a new service to production. These processes were directly and unwittingly impacting system architecture and maintainability. In economics, this is called a “negative externality.” We may have security and governance, but we’ve traded off velocity and maintainability. Adding insult to injury, the processes were not even accomplishing the original goal of improving reliability, they were making it worse!

Maintainability

It’s critical that software systems are not just built to purpose, but also built to last. This means they need to be reliable, scalable, and evolvable. They need to be conducive to finding and correcting bugs. They need to support changing requirements such that new features and functionality can be delivered rapidly. They need to be efficient and cost effective. More generally, software needs to be built in a way that maximizes its useful life.

We simply call this priority maintainability. While it covers a lot, it can basically be summarized as: is the system architected and implemented well? Is it following best practices? Is there a lot of tech debt? How much thought and care has been put into design and implementation? Much of this comes down to gut feel, but an experienced engineer can usually intuit whether or not a system is maintainable pretty quickly. A good proxy can often be the change fail rate, mean time to recovery, and the lead time for implementing new features.

Maintainability’s benefits are more of a long tail. A maintainable system is easier to extend and add new features later, easier to identify and fix bugs, and generally experiences fewer defects. However, the cost for that speed is basically frontloaded. It usually means moving slower towards the beginning while reaping the rewards later. Conversely, it’s easy to go fast if you’re just hacking something together without much concern for maintainability, but you will likely pay the cost later. Companies can become crippled by tech debt and unmaintained legacy systems to the point of “bankruptcy” in which they are completely stuck. This usually leads to major refactors or rewrites which have their own set of problems.

Additionally, building systems that are both maintainable and secure can be surprisingly difficult, especially in more dynamic cloud environments. If you’ve ever dealt with IAM, for example, you know exactly what I mean. Scoping identities with the right roles or permissions, securely managing credentials and secrets, configuring resources correctly, ensuring proper data protections are in place, etc. Misconfigurations are frequently the cause of the major security breaches you see in the headlines. The unfortunate reality is security practices and tooling lag in the industry, and security is routinely treated as an afterthought. Often it’s a matter of “we’ll get it working and then we’ll come back later and fix up the security stuff,” but later never happens. Instead, an IAM principal is left with overly broad access or a resource is configured improperly. This becomes 10x worse when you are unfamiliar with the cloud, which is where many of our clients tend to find themselves.

Velocity

The last competing priority is simply speed to production or velocity. This one probably requires the least explanation, but it’s consistently the priority that is sacrificed the most. In fact, many organizations may even view it as the enemy of the first two priorities. They might equate moving fast with being reckless. Nonetheless, companies are feeling the pressure to deliver faster now more than ever, but it’s much more than just shipping quickly. It’s about developing the ability to adapt and respond to changing market conditions fast and fluidly. Big companies are constantly on the lookout for smaller, more nimble players who might disrupt their business. This is in part why more and more of these companies are prioritizing the move to cloud. The data center has long been their moat and castle as it relates to security and governance, however, and the cloud presents a new and serious risk for them in this space. As a result, velocity typically pays the price.

As I mentioned earlier, velocity is commonly in tension with maintainability as well, it’s usually just a matter of whether that premium is frontloaded or backloaded. More often than not, we can choose to move quickly up front but pay a penalty later on or vice versa. Truthfully though, if you’ve followed the DORA State of DevOps Reports, you know that a lot of companies neither frontload nor backload their velocity premium—they are just slow all around. These are usually more legacy-minded IT shops and organizations that treat software development as an IT cost center. These are also usually the groups that bias more towards security and governance, but they’re probably the most susceptible to disruption. “Move fast and break things” is not a phrase you will hear permeating these organizations, yet they all desire to modernize and accelerate. We regularly watch these companies’ teams spend months configuring infrastructure, and what they construct is often complex, fragile, and insecure.

Choose Three

Businesses today are demanding strong security and governance, well-structured and maintainable infrastructure, and faster speed to production. The reality, however, is that these three priorities are competing with each other, and companies often end up with one of the priorities dominating the others. If we can acknowledge these trade-offs, we can work to better understand and address them.

We built Konfig as a solution that tackles this head-on by providing an opinionated configuration of Google Cloud Platform and GitLab. Most organizations start from a position where they must assemble the building blocks in a way that allows them to deliver software effectively, but their own biases result in a solution that skews one way or the other. Konfig instead provides a turnkey experience that minimizes time-to-production, is secure by default, and has governance and best practices built in from the start. Rather than having to choose one of security, maintainability, and velocity, don’t compromise—have all three. In a follow-up post I’ll explain how Konfig addresses concerns like security and governance, infrastructure maintainability, and speed to production in a “by default” way. We’ll see how IAM can be securely managed for us, how we can enforce architecture standards and patterns, and how we can enable developers to ship production workloads quickly by providing autonomy with guardrails and stable infrastructure.

Introducing Konfig: GitLab and Google Cloud preconfigured for startups and enterprises

Real Kinetic helps businesses transform how they build and deliver software in the cloud. This encompasses legacy migrations, app modernization, and greenfield development. We work with companies ranging from startups to Fortune 500s and everything in between. Most recently, we finished helping Panera Bread migrate their e-commerce platform to Google Cloud from on-prem and led their transition to GitLab. In doing this type of work over the years, we’ve noticed a problem organizations consistently hit that causes them to stumble with these cloud transformations. Products like GCP, GitLab, and Terraform are quite flexible and capable, but they are sort of like the piles of Legos below.

These products by nature are mostly unopinionated, which means customers need to put the pieces together in a way that works for their unique situation. This makes it difficult to get started, but it’s also difficult to assemble them in a way that works well for 1 team or 100 teams. Startups require a solution that allows them to focus on product development and accelerate delivery, but ideally adhere to best practices that scale with their growth. Larger organizations require something that enables them to transform how they deliver software and innovate, but they need it to address enterprise concerns like security and governance. Yet, when you’re just getting started, you know the least and are in the worst position to make decisions that will have a potentially long-lasting impact. The outcome is companies attempting a cloud migration or app modernization effort fail to even get off the starting blocks.

It’s easy enough to cobble together something that works, but doing it in a way that is actually enterprise-ready, scalable, and secure is not an insignificant undertaking. In fact, it’s quite literally what we have made a business of helping customers do. What’s worse is that this is undifferentiated work. Companies are spending countless engineering hours building and maintaining their own bespoke “cloud assembly line”—or Internal Developer Platform (IDP)—which are all attempting to address the same types of problems. That engineering time would be better spent on things that actually matter to customers and the business.

This is what prompted us to start thinking about solutions. GitLab and GCP don’t offer strong opinions because they address a broad set of customer needs. This creates a need for an opinionated configuration or distribution of these tools. The solution we arrived at is Konfig. The idea is to provide this distribution through what we call “Platform as Code.” Where Infrastructure as Code (IAC) is about configuring the individual resource-level building blocks, Platform as Code is one level higher. It’s something that can assemble these discrete products in a coherent way—almost as if they were natively integrated. The result is a turnkey experience that minimizes time-to-production in a way that will scale, is secure by default, and has best practices built in from the start. A Linux distro delivers a ready-to-use operating system by providing a preconfigured kernel, system library, and application assembly. In the same way, Konfig delivers a ready-to-use platform for shipping software by providing a preconfigured source control, CI/CD, and cloud provider assembly. Whether it’s legacy migration, modernization, or greenfield, Konfig provides your packaged onramp to GCP and GitLab.

Platform as Code

Central to Konfig is the notion of a Platform. In this context, a Platform is a way to segment or group parts of a business. This might be different product lines, business units, or verticals. How these Platforms are scoped and how many there are is different for every organization and depends on how the business is structured. A small company or startup might consist of a single Platform. A large organization might have dozens or more.

A Platform is then further subdivided into Domains, a concept we borrow from Domain-Driven Design. A Domain is a bounded context which encompasses the business logic, rules, and processes for a particular area or problem space. Simply put, it’s a way to logically group related services and workloads that make up a larger system. For example, a business providing online retail might have an E-commerce Platform with the following Domains: Product Catalog, Customer Management, Order Management, Payment Processing, and Fulfillment. Each of these domains might contain on the order of 5 to 10 services.

This structure provides a convenient and natural way for us to map access management and governance onto our infrastructure and workloads because it is modeled after the organization structure itself. Teams can have ownership or elevated access within their respective Domains. We can also specify which cloud services and APIs are available at the Platform level and further restrict them at the Domain level where necessary. This hierarchy facilitates a powerful way to enforce enterprise standards for a large organization while allowing for a high degree of flexibility and autonomy for a small organization. Basically, it allows for governance when you need it (and autonomy when you don’t). This is particularly valuable for organizations with regulatory or compliance requirements, but it’s equally valuable for companies wanting to enforce a “golden path”—that is, an opinionated and supported way of building something within your organization. Finally, Domains provide clear cost visibility because cloud resources are grouped into Domain projects. This makes it easy to see what “Fulfillment” costs versus “Payment Processing” in our E-commerce Platform, for example.

“Platform as Code” means these abstractions are modeled declaratively in YAML configuration and managed via GitOps. The definitions of Platforms and Domains consist of a small amount of metadata, shown below, but that small amount of metadata ends up doing a lot of heavy lifting in the background.

apiVersion: konfig.realkinetic.com/v1beta1
kind: Platform
metadata:
  name: ecommerce-platform
  namespace: konfig-control-plane
  labels:
    konfig.realkinetic.com/control-plane: konfig-control-plane
spec:
  platformName: Ecommerce Platform
  gitlab:
    parentGroupId: 82224252
  gcp:
    billingAccountId: "123ABC-456DEF-789GHI"
    parentFolderId: "1080778227704"
    defaultEnvs:
      - dev
      - stage
      - prod
    services:
      defaults:
        - cloud-run
        - cloud-sql
        - cloud-storage
        - secret-manager
        - cloud-kms
        - pubsub
        - redis
        - firestore
    api:
      path: /ecommerce

platform.yaml

apiVersion: konfig.realkinetic.com/v1beta1
kind: Domain
metadata:
  name: payment-processing
  namespace: konfig-control-plane
  labels:
    konfig.realkinetic.com/platform: ecommerce-platform
spec:
  domainName: Payment Processing
  gcp:
    services:
      disabled:
        - pubsub
        - redis
        - firestore
    api:
      path: /payment
  groups:
    dev: [payment-devs@example.com]
    maintainer: [payment-maintainers@example.com]
    owner: [gitlab-owners@example.com]

domain.yaml

The Control Plane

Platforms, Domains, and all of the resources contained within them are managed by the Konfig control plane. The control plane consumes these YAML definitions and does whatever is needed in GitLab and GCP to make the “real world” reflect the desired state specified in the configuration.

The control plane manages the structure of groups and projects in GitLab and synchronizes this structure with GCP. This includes a number of other resources behind the scenes as well: configuring OpenID Connect to allow GitLab pipelines to authenticate with GCP, IAM resources like service accounts and role bindings, managing SAML group links to sync user permissions between GCP and GitLab, and enabling service APIs on the cloud projects. The Platform/Domain model allows the control plane to specify fine-grained permissions and scope access to only the things that need it. In fact, there are no credentials exposed to developers at all. It also allows us to manage what cloud services are available to developers and what level of access they have across the different environments. This governance is managed centrally but federated across both GitLab and GCP.

The net result is a configuration- and standards-driven foundation for your cloud development platform that spans your source control, CI/CD, and cloud provider environments. This foundation provides a golden path that makes it easy for developers to build and deliver software while meeting an organization’s internal controls, standards, or regulatory requirements. Now we’re ready to start delivering workloads to our enterprise cloud environment.

Managing Workloads and Infrastructure

The Konfig control plane establishes an enterprise cloud environment in which we could use traditional IAC tools such as Terraform to manage our application infrastructure. However, the control plane is capable of much more than just managing the foundation. It can also manage the workloads that get deployed to this cloud environment. This is because Konfig actually consists of two components: Konfig Platform, which configures and manages our cloud platform comprising GitLab and GCP, and Konfig Workloads, which configures and manages application workloads and their respective infrastructure resources.

Using the Lego analogy, think of Konfig Platform as providing a pre-built factory and Konfig Workloads as providing pre-built assembly lines within the factory. You can use both in combination to get a complete, turnkey experience or just use Konfig Platform and “bring your own assembly line” such as Terraform.

Konfig Workloads provides an IAC alternative to Terraform where resources are managed by the control plane. Similar to how the platform-level components like GitLab and GCP are managed, this works by using an operator that runs in the control plane cluster. This operator runs on a control loop which is constantly comparing the desired state of the system with the current state and performs whatever actions are necessary to reconcile the two. A simple example of this is the thermostat in your house. You set the temperature—the desired state—and the thermostat works to bring the actual room temperature—the current state—closer to the desired state by turning your furnace or air conditioner on and off. This model removes potential for state drift, where the actual state diverges from the configured state, which can be a major headache with tools like Terraform where state is managed with backends.

The Konfig UI provides a visual representation of the state of your system. This is useful for getting a quick understanding of a particular Platform, Domain, or workload versus reading through YAML that could be scattered across multiple files or repos (and which may not even be representative of what’s actually running in your environment). With this UI, we can easily see what resources a workload has configured and can access, the state of these resources (whether they are ready, still provisioning, or in an error state), and how the workload is configured across different environments. We can even use the UI itself to provision new resources like a database or storage bucket that are scoped automatically to the workload. This works by generating a merge request in GitLab with the desired changes, so while we can use the UI to configure resources, everything is still managed declaratively through IAC and GitOps. This is something we call “Visual IAC.”

Your Packaged Onramp to GCP and GitLab

The current cloud landscape offers powerful tools, but assembling them efficiently, securely, and at scale remains a challenge. This “undifferentiated work” consumes valuable engineering resources that could be better spent on core business needs, and it often prevents organizations from even getting off the starting line when beginning their cloud journey. Konfig, built around the principles of Platform as Code and standards-driven development, addresses this very gap. We built it to help our clients move quicker through operationalizing the cloud so that they can focus on delivering business value to their customers. Whether you’re migrating to the cloud, modernizing, or starting from scratch, Konfig provides a preconfigured and opinionated integration of GitLab, GCP, and Infrastructure as Code which gives you:

  • Faster time-to-production: Streamlined setup minimizes infrastructure headaches and allows developers to focus on building and delivering software.
  • Enterprise-grade security: Built-in security best practices and fine-grained access controls ensure your cloud environment remains secure.
  • Governance: Platforms and Domains provide a flexible model that balances enterprise standards with team autonomy.
  • Scalability: Designed to scale with your business, easily accommodating growth without compromising performance or efficiency.
  • Great developer UX: Designed to provide a great user experience for developers shipping applications and services.

Konfig functions like an operating system for your development organization to deliver software to the cloud. It’s an opinionated IDP specializing in cloud migrations and app modernization. This allows you to focus on what truly matters—building innovative software products and delivering exceptional customer experiences.

We’ve been leveraging these patterns and tools for years to help clients ship with confidence, and we’re excited to finally offer a solution that packages them up. Please reach out if you’d like to learn more and see a demo. If you’re undertaking a modernization or cloud migration effort, we want to help make it a success. We’re looking for a few organizations to partner with to develop Konfig into a robust solution.