Designing and implementing tpctl

In our most recent engineering blog post, we identified the need for a tool to generate our Kubernetes manifests. In this post, we will describe the design and implementation of such a tool.

To address the current gap in the Kubernetes CD tooling, we created a simple tool called tpctl, which consists of a bash script and a number of (> 130) jsonnet files. tpctl generates the manifest files and commits them to a Git configuration repository.

We chose jsonnet because it is specifically designed to generate JSON files. However, since jsonnet is a pure, functional language, it must be accompanied by a program that can interact with and modify the external state. We chose bash for that language. (N.B. Python 3 could also serve that role well.)

The input to tpctl is a single values.yaml file that indicates which packages (collection of interrelated templates) to instantiate, which specific values to use when instantiating the jsonnet templates, and common global values that may be used as well.

The file has a simple structure. We follow the Kubernetes custom resource format and store in each file a single custom resource:

apiVersion: tidepool.org/v1alpha1
kind: Config
metadata:
  name: cluster-dev

The first part is a description of our AWS account and the Amazon IAM users are to be afforded system:master privileges on the cluster:

aws:
  accountNumber: 123456789012
  iamUsers:
  - derrick

The second part of the file describes the Kubernetes cluster itself and what Amazon CloudWatch services to enable:

cluster:
  cloudWatch:
    clusterLogging:
      enableTypes:
      - authenticator
      - api
      - controllerManager
      - scheduler
  metadata:
    rootDomain: tidepool.org
    domain: production.tidepool.org
    name: production
    region: us-west-2
    version: 1.14.7
  nodeGroups:
  - desiredCapacity: 3
    instanceType: c5.xlarge
    maxSize: 7
    minSize: 2
    name: ngd
  vpc:
    cidr: 10.44.0.0/16

The third section of the file describes the contents of the different Kubernetes namespaces that we run. Within each namespace object, one may configure aspects of the namespace itself or for the packages to run in that namespace. In this example, we see the declaration of the kube-system namespace along with a set of labels to apply to it:

namespaces:
  kube-system:
    namespace:
      enabled: true
      labels:
        'config.linkerd.io/admission-webhooks': disabled

Each package describes a set of Kubernetes resources to install in that namespace.

Of particular note is the tidepool package that describes a Tidepool environment. Each environment is parameterized by the DNS names that it serves, the S3 buckets it uses to store blob and image data, which images to track for GitOps, which helm chart to use, and other data.

Here is an example that configures the qa1 namespace to run a tidepool service:

namespaces:
  qa1:
    namespace:
      enabled: true
    tidepool:
      chart:
        version: 0.4.9
      dnsNames:
      - qa1.development.tidepool.org
      enabled: true

In addition to the tidepool package, we can install over 45 other packages, including:

cadvisor — Analyzes resource usage and performance characteristics of running containers.
cert-manager — Automatically provisions and manage TLS certificates.
cloudwatch-agent — Agent to collect system-level metrics from Amazon EC2 instances.
cluster autoscaler — Agent that automatically adjusts the size of the Kubernetes cluster.
datadog agent — A collection agent for Datadog hosted infrastructure monitoring platform.
elasticsearch — Distributed, RESTful search, and analytics engine.
elastic-operator — Deploys, secures, and upgrades Elasticsearch clusters.
external-dns — Configures external DNS servers.
flux — GitOps operator.
gloo — API Gateway.
gloo enterprise — API gateway with authentication and rate-limiting (not open source).
grafana — Analytics and monitoring solution for every database.
helm-operator — Operator that declaratively manages Helm chart releases.
jaeger — Distributed tracing.
kube-state-metrics — Add-on agent to generate and expose cluster-level metrics.
linkerd — Ultralight service mesh.
metrics-server — Cluster-wide aggregator of resource usage data.
open census collector — Receives trace spans and metrics emitted by supported services, and can be added to custom services.
pomerium — Identity-aware access proxy.
prometheus — Monitoring system and time series database.
prometheus operator — Creates/configures/manages Prometheus clusters.
reloader — Controller that watches changes in ConfigMap and Secrets and performs rolling upgrades on Pods.
scope — Real-time interactive display of processes, containers, and hosts status.
sumologic agent — Agent to Sumologic hosted logging service.
thanos — Highly available Prometheus setup with long term storage capabilities.
velero — Backup and migrate Kubernetes clusters and persistent volumes.

The format is simple. For each package, we indicate whether it is enabled or not and a set of package-specific configuration values.

Here is a simple example that installs the external-dns package in the default namespace:

namespaces:
  default:
    namespace:
      enabled: true
    external-dns:
      enabled: true

Here is a more complex example that configures the gloo-system namespace with the various HTTP services available in Gloo enterprise for single sign-on access using pomerium:

namespaces:
  gloo-system:
    gloo-ee:
      enabled: true
      sso:
      - serviceName: apiserver-ui
        externalName: apiserver
      - port: 19000
        serviceName: gateway-proxy
        externalName: envoy-admin
      - port: 80
        serviceName: glooe-grafana
        externalName: glooe-monitoring
      - externalName: glooe-metrics
        serviceName: glooe-prometheus-server
        port: 80

Lastly, we have a general section that consists of a description of the location of the GitHub configuration repo, the email address of the maintainer, the sops encryption keys, and the default log level to use for services:

general:
 email: derrick@tidepool.org
 github:
   git: git@github.com:tidepool-org/cluster-qa1
   https: https://github.com/tidepool-org/cluster-qa1
 kubeconfig: $HOME/.kube/config
 logLevel: debug
 sops:
   keys:
     pgp: CDE5317D7CCA7B80294FB32721A60B1450343446
 sso:
   allowed_groups:
   - eng@tidepool.org

From these values, we are able to generate all Kubernetes manifests needed to run the named services and Tidepool environments, except for Kubernetes Secrets and Configmaps, which are provided in separate top-level directories. Secrets are encoded via sops using the keys listed, as discussed in a prior blog post.

To add a new package, we add JSON or jsonnet template files to the Git repo that contains all such templates under a subdirectory named for the package. For each enabled package, the shell script evaluates each template file with the entire values.yaml file encoded as a single JSON object. In this way information about cross-cutting concerns, such as whether Prometheus is installed, is shared.

Tpctl is idempotent. Each time it is run, it regenerates all Kubernetes resources (except Secrets and Configmaps). In the Kubernetes way, the values.yaml declares the intent and tpctl evaluates that declaration to produce Kubernetes declarations.

Tpctl has been extremely valuable in keeping our separate clusters consistent.

Tpctl is under very active development.

Upcoming

In our next blog post, we will discuss telemetry, i.e. how we monitor our Tidepool services.

Designing and implementing tpctl

Upcoming

Stay in the Tidepool Loop

Support our work

Stay in the Tidepool loop