Provisioning our Kubernetes clusters

We use Kubernetes as the infrastructure for our backend services. If you run Kubernetes in production, then you almost certainly run one or more other Kubernetes clusters to test/stage changes. Tidepool is no different.

As of this writing, we run five Kubernetes clusters in Amazon EKS. How do we provision each cluster so that the results of testing on one cluster can be used to justify making changes to the production cluster? We need a way of provisioning that makes clear both the differences between the clusters and their commonalities. For Kubernetes, what we provision is a set of resources.

Declarative infrastructure with Kubernetes

To deploy services to Kubernetes, one declares the desired state to Kubernetes. Kubernetes then uses a set of controller processes to modify the current state until it matches the desired state.
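A toy sketch may make the controller idea concrete. The Python below performs one reconciliation step for a hypothetical replica controller that keeps a given number of pods named `<app>-<n>` running. The names and data shapes are assumptions for illustration; real Kubernetes controllers watch the API server and act on resources, not on an in-memory set.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterState:
    """Toy stand-in for a cluster's current state: the names of running pods."""
    pods: set = field(default_factory=set)


def reconcile(state: ClusterState, app: str, desired_replicas: int) -> list:
    """Take one step toward `desired_replicas` pods for `app`.

    Returns the actions taken; an empty list means the state already matched.
    """
    actions = []
    current = sorted(p for p in state.pods if p.startswith(f"{app}-"))
    # Too few pods: create new ones, skipping names that are already in use.
    i = 0
    while len(current) < desired_replicas:
        name = f"{app}-{i}"
        if name not in state.pods:
            state.pods.add(name)
            current.append(name)
            actions.append(f"create {name}")
        i += 1
    # Too many pods: delete the surplus (in this toy, the highest-sorted names).
    for name in current[desired_replicas:]:
        state.pods.remove(name)
        actions.append(f"delete {name}")
    return actions
```

Running it a second time with the same desired count does nothing, which is what makes it safe for a controller to reconcile repeatedly.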

Kubernetes provides an API to manipulate a set of resources and a CLI command, kubectl, that uses that API. One can apply a resource that is represented as a YAML file to a Kubernetes cluster using the kubectl apply command. This adds the resource to the cluster or, if a resource of the same name already exists, updates that resource. One may delete a resource using the kubectl delete command.

Alternatively, one may use the Kubernetes API directly to manipulate the set of resources in the Kubernetes cluster. Applications that use the Kubernetes API directly are often described as Kubernetes-native applications.

The need for abstraction

Using either the Kubernetes API or the kubectl command, one can manipulate the resources in a Kubernetes cluster. This presupposes that the resources exist. How are these resources created?

One may craft Kubernetes resources as YAML files using a simple text editor. We call those YAML files “manifests”. This is adequate for simple examples, such as those found in the Kubernetes documentation.

However, to deploy the infrastructure needed to develop, test, and run a software service such as Tidepool, one needs multiple resources and multiple clusters. Crafting manifests by hand is error-prone and time-consuming. Instead, one looks to identify common patterns and create abstractions that can be used to generate manifests automatically.

Manifest generation

Since Kubernetes resources may be represented textually as YAML files, any tool that can generate YAML text can be used to generate manifests. Any programming language that can produce text output can, therefore, be used to generate Kubernetes resources.
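As an illustration, the following Python sketch builds a Deployment manifest as a plain data structure and serializes it as JSON, which kubectl accepts as well as YAML. The function name and its parameters are hypothetical; only the Kubernetes field names (apiVersion, kind, metadata, spec, and so on) follow the Deployment schema.

```python
import json


def deployment_manifest(name: str, image: str, replicas: int = 1,
                        namespace: str = "default") -> str:
    """Return a Kubernetes Deployment manifest as JSON text."""
    labels = {"app": name}
    manifest = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "namespace": namespace, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the labels on the pod template.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }
    return json.dumps(manifest, indent=2)


if __name__ == "__main__":
    print(deployment_manifest("hello", "nginx:1.17", replicas=2))
```

The printed output could be piped straight to `kubectl apply -f -`.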

Every modern programming language has multiple packages for generating text from templates.

The Go template engine is popular. Because it generates generic text, it can produce syntactically invalid YAML.
Jsonnet is a newer language designed specifically for generating JSON data.
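The risk of generic text templating is easy to demonstrate. The sketch below uses Python's `string.Template` as a stand-in for a text template engine and JSON as the output format (YAML has the same problem). A value containing a double quote yields invalid output from the text template, whereas serializing a data structure always yields well-formed output. This illustrates the general point only; it is not how Go templates or Jsonnet work internally.

```python
import json
from string import Template

# A text template knows nothing about the syntax of the text it produces.
TEXT_TEMPLATE = Template('{"metadata": {"annotations": {"note": "$note"}}}')


def render_as_text(note: str) -> str:
    return TEXT_TEMPLATE.substitute(note=note)


def render_as_data(note: str) -> str:
    # The serializer escapes the value, so the output is always valid JSON.
    return json.dumps({"metadata": {"annotations": {"note": note}}})


def is_valid_json(text: str) -> bool:
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


if __name__ == "__main__":
    note = 'say "hi"'
    print(is_valid_json(render_as_text(note)))  # False
    print(is_valid_json(render_as_data(note)))  # True
```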

In addition to generic templating tools, there are also custom tools that are more aware of the target language.

Helm augments the Go template engine with the Sprig library to generate manifests.
kubecfg, ksonnet, box, and KPM all use Jsonnet to generate manifests.
Kustomize applies a sequence of transformations to valid YAML files. Kustomize is built into kubectl.
Pulumi provides libraries for standard programming languages such as TypeScript, Python, and (soon) Go to assist in the generation of Kubernetes resources.
Terraform provides a domain-specific language to express infrastructure, including Kubernetes resources.
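Kustomize's approach of transforming valid documents, rather than templating text, can be sketched in a few lines. The Python below applies two transformations, modeled loosely on Kustomize's `namePrefix` and `commonLabels` fields, to an already-parsed manifest. It is a toy analog under that assumption, not Kustomize's implementation; the real `commonLabels`, for example, also updates selectors and pod templates.

```python
import copy
from typing import Callable

Manifest = dict
Transform = Callable[[Manifest], Manifest]


def name_prefix(prefix: str) -> Transform:
    """Like Kustomize's namePrefix, but only renames metadata.name."""
    def transform(m: Manifest) -> Manifest:
        out = copy.deepcopy(m)
        out["metadata"]["name"] = prefix + out["metadata"]["name"]
        return out
    return transform


def common_labels(labels: dict) -> Transform:
    """Like Kustomize's commonLabels, but only touches metadata.labels."""
    def transform(m: Manifest) -> Manifest:
        out = copy.deepcopy(m)
        out["metadata"].setdefault("labels", {}).update(labels)
        return out
    return transform


def apply_transforms(manifest: Manifest, transforms: list) -> Manifest:
    """Apply each transformation in order; the input manifest is not mutated."""
    for transform in transforms:
        manifest = transform(manifest)
    return manifest
```

Because every step takes and returns structured data, the result is always a well-formed document.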

Change management

If the resources used in a Kubernetes cluster were largely static, then kubectl apply would be sufficient. However, the resources in a Kubernetes cluster change as the needs of the users of the services change. For a software team such as Tidepool that is producing new versions of custom services each day, we need to change the Kubernetes resources just as frequently.

Let us look at the source of changes.

The Development team may create a new version of a service under development.
The QA team may be asked to test a new version of an internally developed service.
The DevOps team may find a new third-party service to deploy or a new version of an existing third-party service.

While the source of these changes is a person, there are corresponding events that we can use to trigger the deployment of new services or updates:

Publication of a new Docker image to a container registry.
Publication of a new Helm chart to a Helm chart repository.
Publication to a Git repository of new parameters for an existing service.

If we use Kubernetes manifests (as opposed to the Kubernetes API), then these events signal to us the need to:

Modify a manifest.
Apply the manifest to (one or more) Kubernetes clusters.

We seek a system that automates the process of modifying manifests and applying them to our Kubernetes clusters.
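The step from event to modified manifest can be sketched as a small dispatcher. In this Python sketch, two of the event types listed above map to changes in a manifest held in memory; the event classes, their fields, and the shapes of the manifests are hypothetical, and applying the result to clusters is left to the caller.

```python
import copy
from dataclasses import dataclass


@dataclass
class ImagePublished:
    repository: str  # hypothetical, e.g. "example/web"
    tag: str


@dataclass
class ChartPublished:
    chart: str
    version: str


def handle(event, manifest: dict) -> dict:
    """Return a copy of `manifest` modified to reflect `event`.

    The caller would then commit or apply the result to one or more clusters.
    """
    out = copy.deepcopy(manifest)
    if isinstance(event, ImagePublished):
        # Deployment-style manifest: retarget containers using this repository.
        for c in out["spec"]["template"]["spec"]["containers"]:
            repo = c["image"].rpartition(":")[0]  # assumes "repo:tag" images
            if repo == event.repository:
                c["image"] = f"{event.repository}:{event.tag}"
    elif isinstance(event, ChartPublished):
        # HelmRelease-style manifest: bump the chart version.
        chart = out["spec"]["chart"]
        if chart.get("name") == event.chart:
            chart["version"] = event.version
    else:
        raise TypeError(f"unsupported event: {event!r}")
    return out
```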

Staging store

Kubernetes stores state, including resources, in etcd. When we apply a resource to a Kubernetes cluster, we are effectively writing that resource to etcd; when we delete a resource, we remove it from etcd. Kubernetes controllers watch for changes to resources and modify the state of the cluster accordingly.

In this way, Kubernetes does via controllers what Ansible, Puppet, Salt, and Terraform do via external processes or ssh commands: it modifies the system state to match the desired state. Each of these predecessors keeps a desired state in a state store.

However, while individual resources in Kubernetes are versioned, the Kubernetes state store as a whole is not.

Reconciliation

Each manifest identifies its resource by a key: its kind, namespace, and name. When a manifest with a given key is applied to a cluster, the existing resource with that key is updated in etcd. If no resource with that key exists, a new resource is created.
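The keyed create-or-update behavior, and the fact that versions are tracked per resource rather than for the store as a whole, can be modeled with a toy in-memory store. The Python sketch below keys resources by (kind, namespace, name) and keeps a separate version counter for each key as a stand-in for a resource's version. It is a simplification under those assumptions, not the API server's logic; for instance, `kubectl apply` merges changes into the existing object rather than replacing it.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Toy desired-state store: one entry and one version counter per key."""
    objects: dict = field(default_factory=dict)
    versions: dict = field(default_factory=dict)

    @staticmethod
    def key(manifest: dict) -> tuple:
        meta = manifest["metadata"]
        return (manifest["kind"], meta.get("namespace", "default"), meta["name"])

    def apply(self, manifest: dict) -> str:
        """Create the resource, or replace the one with the same key."""
        k = self.key(manifest)
        action = "updated" if k in self.objects else "created"
        self.objects[k] = manifest
        self.versions[k] = self.versions.get(k, 0) + 1
        return action

    def delete(self, manifest: dict) -> bool:
        """Remove the resource; its version history goes with it."""
        k = self.key(manifest)
        self.versions.pop(k, None)
        return self.objects.pop(k, None) is not None
```

Nothing in this store records what the whole set of resources looked like before a change, so there is no way to roll the cluster back to an earlier overall state.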

Kubernetes controllers reconcile the difference between the old resources and the new, desired resources. Some changes are simple, such as applying a new label to a resource; these changes are not destructive. Others, such as changing the selector of a Deployment resource, are not allowed. In most cases, Kubernetes will determine a legal migration from the old state to the desired state. We call this reconciliation.
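Some updates are rejected outright. The Python sketch below models the Deployment case mentioned above: an update that changes `spec.selector` is refused, while a label change passes through and is reported as a change. It checks only that one immutable field, as a simplifying assumption; the API server's real validation covers many more.

```python
class ImmutableFieldError(ValueError):
    pass


def validate_deployment_update(old: dict, new: dict) -> list:
    """Return the changed top-level metadata/spec fields of a Deployment,
    or raise if the update touches the immutable selector (toy model)."""
    if old["spec"].get("selector") != new["spec"].get("selector"):
        raise ImmutableFieldError("spec.selector of a Deployment is immutable")
    changed = []
    for section in ("metadata", "spec"):
        old_sec, new_sec = old.get(section, {}), new.get(section, {})
        for k in sorted(set(old_sec) | set(new_sec)):
            if old_sec.get(k) != new_sec.get(k):
                changed.append(f"{section}.{k}")
    return changed
```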

Tools such as Ansible, Salt, Puppet, and Terraform have their own reconciliation algorithms. With Kubernetes, this is almost entirely performed by the controllers.

Tools

We have mentioned several tools used with Kubernetes. We have also identified several dimensions that differentiate them. Let us summarize the features of a few of the most popular tools:

Helm

Helm is one of the earliest and most commonly used tools to aggregate, customize, and apply a set of resources to a Kubernetes cluster.

Helm leverages the notion of a template that can be rendered to create a YAML file that represents a Kubernetes resource. Internally, Helm uses the Go templating engine, which is designed for templating raw text. One provides a set of key/value pairs to Helm via the command line or via a simple YAML file. Helm uses those values to render all templates in a given directory.
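Helm combines a chart's default values with values supplied by the user, and the user-supplied values take precedence. The Python sketch below shows a recursive merge of that kind. It approximates Helm's behavior for nested maps under our assumptions and ignores details such as Helm's handling of null values.

```python
import copy


def merge_values(defaults: dict, overrides: dict) -> dict:
    """Recursively merge `overrides` onto `defaults` without mutating either.

    Nested dicts are merged key by key; any other override value (including a
    list) replaces the default outright.
    """
    result = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_values(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result
```

Merging the chart defaults, then a values file, then command-line values, in that order, gives later sources the final say while keeping every default that was not overridden.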

Helm understands the Kubernetes API and recognizes the standard Kubernetes resource types, so it can apply the generated resources to a Kubernetes cluster.

To delete resources, Helm maintains a record of the resources that it has applied to a Kubernetes cluster. The latest major release of Helm, Helm 3, maintains that state in Kubernetes Secrets.

Terraform

Terraform is one of the earlier declarative infrastructure tools. It provides a domain-specific language for specifying and templating configuration. Terraform stores state either in your local file system or in a remote backend, such as a service provided by HashiCorp. To change infrastructure, you update the configuration and rerun Terraform, which compares the desired configuration with the recorded state and computes the changes to make.
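The heart of that workflow, comparing desired configuration with recorded state to decide what to create, update, and delete, can be sketched as a diff over keyed resources. This Python function is a simplification for illustration, and the resource addresses in any usage are hypothetical; Terraform's actual planner also refreshes state from the real infrastructure and orders changes by dependency.

```python
def plan(state: dict, config: dict) -> dict:
    """Compare recorded state with desired config, both keyed by resource
    address, and return the changes needed to make the state match."""
    return {
        "create": sorted(config.keys() - state.keys()),
        "update": sorted(k for k in config.keys() & state.keys()
                         if config[k] != state[k]),
        "delete": sorted(state.keys() - config.keys()),
    }
```

The delete set can only be computed because the previous state was recorded, which is also why Helm keeps a record of the resources it has applied.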

Pulumi

Pulumi is similar to Terraform. However, instead of providing a domain-specific language, Pulumi provides a set of libraries in your programming language of choice to use to declare the desired infrastructure state.

Kubecfg

Kubecfg is a simple CLI that creates Kubernetes manifests from Jsonnet templates and (optionally) applies those manifests to a given Kubernetes cluster.

Flux

Kubernetes state is stored in etcd. Each element of state is versioned, but the overall state is not.

To address this, Weaveworks had the brilliant idea of leveraging Git, the most popular version control system, as a versioned Kubernetes staging store. Weaveworks built Flux, a controller that runs in Kubernetes and uses Git to version the desired state of the whole cluster. When new commits are made to a configuration repository in Git, Flux applies those changes to the Kubernetes cluster through the Kubernetes API, which stores them in etcd. They call this approach GitOps.

Flux watches the configuration repository for the Kubernetes resources to apply to a cluster. Flux also watches a container registry for the publication of new versions of a Docker image. When it finds a new version, it updates the corresponding manifests in the Git repository.
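Image automation has to decide which newly published tag counts as an update. A common policy is to pick the highest semantic version. The Python sketch below implements that policy for plain `MAJOR.MINOR.PATCH` tags and rewrites a manifest's image reference in place. The function names and the restriction to three-part numeric tags are assumptions; this is not Flux's code.

```python
import re
from typing import Optional

# Plain MAJOR.MINOR.PATCH tags, with an optional leading "v".
SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def latest_semver(tags: list) -> Optional[str]:
    """Return the tag with the highest version, ignoring non-semver tags."""
    parsed = []
    for tag in tags:
        m = SEMVER.match(tag)
        if m:
            # Compare as integers so that 1.10.0 ranks above 1.9.0.
            parsed.append((tuple(int(g) for g in m.groups()), tag))
    return max(parsed)[1] if parsed else None


def update_image(manifest: dict, repository: str, tags: list) -> bool:
    """Point containers that use `repository` at its newest tag.

    Assumes `tags` lists every published tag, including the current one.
    Returns True if the manifest changed.
    """
    newest = latest_semver(tags)
    if newest is None:
        return False
    changed = False
    for c in manifest["spec"]["template"]["spec"]["containers"]:
        repo, _, current = c["image"].rpartition(":")
        if repo == repository and current != newest:
            c["image"] = f"{repository}:{newest}"
            changed = True
    return changed
```

Comparing integer tuples matters here: compared as strings, "1.9.0" would sort after "1.10.0".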

By listening to events, Flux enables a clean split between the CI system, which creates new artifacts, and the CD system, which pulls those artifacts. By introducing Git as a desired-state store, Flux decouples the generation of desired state from the application of that state in a Kubernetes cluster.

Helm Operator

Weaveworks also provides a tool called the Helm Operator that allows one to instantiate Helm charts from within a Kubernetes cluster. The Helm Operator defines a single Kubernetes Custom Resource, the HelmRelease. This resource identifies a Helm chart to instantiate and provides the template values to use.
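For illustration, the Python sketch below builds a HelmRelease manifest as a data structure. The field layout (`apiVersion: helm.fluxcd.io/v1`, `spec.releaseName`, `spec.chart` with `repository`, `name`, and `version`, and `spec.values`) reflects our understanding of the Helm Operator's v1 schema and should be checked against the Helm Operator documentation; the repository URL and values used below are hypothetical.

```python
import json
from typing import Optional


def helm_release(name: str, namespace: str, repository: str, chart: str,
                 version: str, values: Optional[dict] = None) -> dict:
    """Build a HelmRelease asking the Helm Operator to install `chart`."""
    return {
        "apiVersion": "helm.fluxcd.io/v1",
        "kind": "HelmRelease",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "releaseName": name,
            "chart": {"repository": repository, "name": chart, "version": version},
            # Template values, as one would otherwise pass to Helm directly.
            "values": values or {},
        },
    }


if __name__ == "__main__":
    release = helm_release("gateway", "gloo-system",
                           "https://charts.example.com/", "gloo", "1.0.0",
                           {"replicas": 2})
    print(json.dumps(release, indent=2))
```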

The Helm Operator uses Helm to reconcile state changes. Because it uses Helm, it also stores release state in Kubernetes Secrets.

Our solution

We have chosen to embrace GitOps and use Flux and the Helm Operator to keep our Kubernetes clusters up-to-date.

This means that we maintain our staging state in a Git repository. With Flux and the Helm Operator, that repository can hold both raw manifests and HelmRelease manifests, which instantiate Helm charts. We need to generate the manifests to place in that repository.

Most services need to be customized for their environment. This could be as simple as the name of the cluster or the namespace to create a manifest in. Since we run multiple clusters at Tidepool, we naturally want to parameterize our manifests by these values.
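A minimal sketch of that parameterization in Python: one function renders a service's manifest for each cluster from shared defaults plus per-cluster overrides, so the commonalities live in one place and the differences are explicit. The cluster names, parameters, and the `clusters/<cluster>/<namespace>/<service>.json` repository layout are hypothetical, not Tidepool's actual layout.

```python
import json


def render(service: str, cluster: str, params: dict) -> dict:
    """Render a ConfigMap for `service` in `cluster` (hypothetical example)."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": f"{service}-config", "namespace": params["namespace"]},
        "data": {"CLUSTER_NAME": cluster, "LOG_LEVEL": params["log_level"]},
    }


def render_all(service: str, defaults: dict, clusters: dict) -> dict:
    """Map repository paths to manifest text, one file per cluster."""
    files = {}
    for cluster, overrides in sorted(clusters.items()):
        params = {**defaults, **overrides}  # per-cluster values win
        path = f"clusters/{cluster}/{params['namespace']}/{service}.json"
        files[path] = json.dumps(render(service, cluster, params), indent=2)
    return files
```

Diffing two clusters' generated files then shows exactly how those clusters differ.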

However, some services, such as the Sumo Logic log exporter, have cross-cutting concerns. With Sumo Logic, we can identify namespaces or pods to exclude from log collection, so its configuration depends on the other services in the cluster. Others, such as the Gloo API Gateway, are more complex to configure. We need a templating solution.
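The cross-cutting case shows why per-service templates are not enough on their own. The Python sketch below derives a single namespace-exclusion regular expression from the settings of every service in a cluster. The `exclude_logs` flag and the idea of handing the exporter one regex are hypothetical; the exporter's real configuration keys should be taken from its own chart.

```python
import re


def exclusion_regex(services: dict) -> str:
    """Build an anchored regex matching every namespace whose service has
    opted out of log collection (`exclude_logs` is a hypothetical flag)."""
    namespaces = sorted({cfg["namespace"] for cfg in services.values()
                         if cfg.get("exclude_logs")})
    if not namespaces:
        return r"(?!)"  # an empty negative lookahead never matches
    return "^(" + "|".join(re.escape(ns) for ns in namespaces) + ")$"
```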

Upcoming

In our next blog post, we introduce tpctl, the templating tool that we use to configure a Kubernetes cluster.