Creating an Autoscaling EKS Cluster using AWS Spot Instances

...that also pulls secrets automatically from AWS Parameter Store or Secrets Manager, automatically creates ALBs and registers DNS entries, and, last but not least, uses IRSA (IAM Roles for Service Accounts) to grant AWS access to pods. This post uses Terraform and Helm, and assumes a working knowledge of AWS, Kubernetes, Terraform, and Helm. Whew. Let's get started.


The VPC

First, you're going to need a VPC to put the EKS cluster and everything you build in. We're going to use the terraform-aws-modules/vpc/aws module to stand up the VPC, because it applies sane, safe defaults and there's no reason to repeat code that has already been written and battle tested. I've chosen to allocate the largest address space a VPC allows (a /16) because it doesn't change the cost and lets the cluster grow as much as we need it to.

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 2"

  name = terraform.workspace
  cidr = "10.0.0.0/16"

  azs = ["us-east-1a", "us-east-1b", "us-east-1c"]
  # "10.0.0.0/18", "10.0.64.0/18", "10.0.128.0/18" are the CIDR blocks dedicated to each AZ
  # Unallocated CIDR blocks are spares.
  # Each subnet is a divided piece of the AZs allocated CIDR block
  private_subnets     = ["10.0.0.0/21", "10.0.64.0/21", "10.0.128.0/21"]
  public_subnets      = ["10.0.8.0/21", "10.0.72.0/21", "10.0.136.0/21"]
  database_subnets    = ["10.0.16.0/21", "10.0.80.0/21", "10.0.144.0/21"]
  elasticache_subnets = ["10.0.24.0/21", "10.0.88.0/21", "10.0.152.0/21"]
  redshift_subnets    = ["10.0.32.0/21", "10.0.96.0/21", "10.0.160.0/21"]
  intra_subnets       = ["10.0.40.0/21", "10.0.104.0/21", "10.0.168.0/21"]

  enable_nat_gateway   = true
  single_nat_gateway   = true
  enable_dns_hostnames = true

  public_subnet_tags = {
    "kubernetes.io/cluster/${terraform.workspace}" = "shared"
    "kubernetes.io/role/elb"                       = "1"
  }

  private_subnet_tags = {
    "kubernetes.io/cluster/${terraform.workspace}" = "shared"
    "kubernetes.io/role/internal-elb"              = "1"
  }

  tags = {
    env                                            = terraform.workspace
    cost_center                                    = "devops"
    "kubernetes.io/cluster/${terraform.workspace}" = "shared"
  }
}

The EKS Cluster

Now that we have our VPC, let's create an EKS cluster inside it, again using a public Terraform module (terraform-aws-modules/eks/aws) to apply sane defaults.

module "eks" {
  source = "terraform-aws-modules/eks/aws"

  cluster_name = terraform.workspace
  vpc_id       = module.vpc.vpc_id
  subnets      = concat(
    module.vpc.private_subnets,
    module.vpc.public_subnets,
    module.vpc.database_subnets,
    module.vpc.elasticache_subnets,
    module.vpc.redshift_subnets,
    module.vpc.intra_subnets
  )

  worker_groups_launch_template = [
    {
      name                    = "eks-spot-${terraform.workspace}"
      override_instance_types = ["t3.medium", "t3.large"]
      spot_instance_pools     = 2 # number of spot pools per AZ; matches the number of instance types above
      asg_max_size            = 5
      kubelet_extra_args      = "--node-labels=kubernetes.io/lifecycle=spot"
      public_ip               = true
      autoscaling_enabled     = true
      protect_from_scale_in   = true
    },
  ]

  map_accounts = []
  map_roles    = []
  map_users    = []

  tags = {
    env         = terraform.workspace
    cost_center = "devops"
  }
}

This will create an EKS cluster that uses t3.medium and t3.large spot instances to populate the node pool, so that if the spot price for one instance type spikes or AWS reclaims a node, the cluster can use the other type to cover the load. In production, I'd recommend using three or more instance types from the c5 or m5 families.
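
To make that concrete, here's a sketch of what a more production-leaning worker group might look like. The instance types and sizes are illustrative, not a recommendation for your particular workload:

  worker_groups_launch_template = [
    {
      name                    = "eks-spot-${terraform.workspace}"
      override_instance_types = ["m5.large", "m5a.large", "c5.large"]
      spot_instance_pools     = 3 # one pool per instance type listed above
      asg_max_size            = 20
      kubelet_extra_args      = "--node-labels=kubernetes.io/lifecycle=spot"
      autoscaling_enabled     = true
      protect_from_scale_in   = true
    },
  ]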

I mentioned above that we would be using IRSA (IAM Roles for Service Accounts) to grant our pods access to AWS resources. In order to do so, we need to set up an OpenID Connect (OIDC) identity provider in IAM that uses our cluster's OIDC issuer URL, so AWS can identify the service accounts making calls to it. This might not make sense now, but it will later. Here's the Terraform we're going to use to set up the OIDC provider:

data "external" "thumbprint" {
  program = ["bash", "${path.module}/helpers/thumbprint.sh", data.aws_region.current.name]
}

resource "aws_iam_openid_connect_provider" "eks" {
  url = module.eks.cluster_oidc_issuer_url

  client_id_list = ["sts.amazonaws.com"]

  thumbprint_list = [data.external.thumbprint.result.thumbprint]
}

You might have noticed that we're using a bash script to fetch the thumbprint of the EKS OIDC server for our AWS region. Unfortunately, this is still the cleanest way to get that fingerprint into Terraform and to pick it up again if it ever changes. Here's the script being run:

#!/bin/bash
# Sourced from https://github.com/terraform-providers/terraform-provider-aws/issues/10104

# Note: "tail -r" is BSD/macOS; on GNU/Linux, substitute "tac".
THUMBPRINT=$(echo | openssl s_client -servername oidc.eks.$1.amazonaws.com -showcerts -connect oidc.eks.$1.amazonaws.com:443 2>&- | tail -r | sed -n '/-----END CERTIFICATE-----/,/-----BEGIN CERTIFICATE-----/p; /-----BEGIN CERTIFICATE-----/q' | tail -r | openssl x509 -fingerprint -noout | sed 's/://g' | awk -F= '{print tolower($2)}')
THUMBPRINT_JSON="{\"thumbprint\": \"${THUMBPRINT}\"}"
echo "$THUMBPRINT_JSON"

This uses openssl to connect to the OIDC server for your region and fingerprint the last certificate in the chain it presents, which is what's required for creating the OpenID Connect provider in AWS.

At this point, you should have a VPC with an EKS cluster that is running using AWS Spot Instances.
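
If you want to sanity check that before moving on, something like the following should show the spot nodes that have joined (assuming the Terraform workspace, and therefore the cluster, is named prod as in the examples below):

aws eks update-kubeconfig --name prod --region us-east-1
kubectl get nodes -l kubernetes.io/lifecycle=spot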


Cluster Autoscaling

When running an EKS cluster, it's very popular to also run the cluster-autoscaler service within that cluster. cluster-autoscaler automatically detects and shuts down underutilized nodes to save cost, and when you have Pending pods it adds nodes to the cluster so that all of your Pending pods can schedule. We're going to use the Helm chart for cluster-autoscaler (stable/cluster-autoscaler). This is pretty easy to set up; here's the values.yaml file used to configure the deployment, followed by the Helm command used to deploy it into the devops namespace of the EKS cluster above.

rbac:
  create: true

cloudProvider: aws
awsRegion: us-east-1

autoDiscovery:
  clusterName: prod
  enabled: true

helm upgrade -i -n devops -f values.yaml cluster-autoscaler stable/cluster-autoscaler

Roles and permissions for these actions are automatically handled by the Terraform EKS module we used above (see the module's autoscaling documentation for details). When using cluster-autoscaler, it is very important to require resource requests and limits on all pods in order to prevent resource starvation in your cluster: without requests, the Kubernetes scheduler and cluster-autoscaler have no accurate picture of how much capacity your workloads need, so they can't make good scheduling and scaling decisions.
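
For example, a Deployment that sets requests and limits on its container might look like this. The names, image, and numbers are placeholders; size them for your own workloads:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-web
  namespace: apps
spec:
  replicas: 2
  selector:
    matchLabels:
      app: api-web
  template:
    metadata:
      labels:
        app: api-web
    spec:
      containers:
        - name: api-web
          image: example/api-web:latest # placeholder image
          resources:
            requests:
              cpu: 250m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi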


ALB Ingress Controller

The ALB Ingress Controller automatically creates ALBs and points them at Kubernetes Ingresses. We're going to use a Helm chart for this as well (incubator/aws-alb-ingress-controller). In order for the alb-ingress-controller service to work, we'll turn to Terraform to create a new IAM role that the service account created by the Helm chart can assume via IRSA. Here's that (lengthy) Terraform:

data "aws_iam_policy_document" "alb_ingress_controller" {
  statement {
    sid    = "AllowACMGets"
    effect = "Allow"
    actions = [
      "acm:DescribeCertificate",
      "acm:ListCertificates",
      "acm:GetCertificate"
    ]
    resources = ["*"]
  }
  statement {
    sid    = "AllowEC2"
    effect = "Allow"
    actions = [
      "ec2:AuthorizeSecurityGroupIngress",
      "ec2:CreateSecurityGroup",
      "ec2:CreateTags",
      "ec2:DeleteTags",
      "ec2:DeleteSecurityGroup",
      "ec2:DescribeAccountAttributes",
      "ec2:DescribeAddresses",
      "ec2:DescribeInstances",
      "ec2:DescribeInstanceStatus",
      "ec2:DescribeInternetGateways",
      "ec2:DescribeNetworkInterfaces",
      "ec2:DescribeSecurityGroups",
      "ec2:DescribeSubnets",
      "ec2:DescribeTags",
      "ec2:DescribeVpcs",
      "ec2:ModifyInstanceAttribute",
      "ec2:ModifyNetworkInterfaceAttribute",
      "ec2:RevokeSecurityGroupIngress"
    ]
    resources = ["*"]
  }
  statement {
    sid    = "AllowELB"
    effect = "Allow"
    actions = [
      "elasticloadbalancing:*",
    ]
    resources = ["*"]
  }
  statement {
    sid    = "AllowIAM"
    effect = "Allow"
    actions = [
      "iam:CreateServiceLinkedRole",
      "iam:GetServerCertificate",
      "iam:ListServerCertificates"
    ]
    resources = ["*"]
  }
  statement {
    sid    = "AllowCognito"
    effect = "Allow"
    actions = [
      "cognito-idp:DescribeUserPoolClient"
    ]
    resources = ["*"]
  }
  statement {
    sid    = "AllowWAF"
    effect = "Allow"
    actions = [
      "waf-regional:GetWebACLForResource",
      "waf-regional:GetWebACL",
      "waf-regional:AssociateWebACL",
      "waf-regional:DisassociateWebACL",
      "waf:GetWebACL"
    ]
    resources = ["*"]
  }
  statement {
    sid    = "AllowTag"
    effect = "Allow"
    actions = [
      "tag:GetResources",
      "tag:TagResources"
    ]
    resources = ["*"]
  }
}

resource "aws_iam_policy" "alb_ingress_controller" {
  name   = "alb-ingress-controller-ps-${terraform.workspace}"
  policy = data.aws_iam_policy_document.alb_ingress_controller.json
}

resource "aws_iam_role" "alb_ingress_controller" {
  name = "alb-ingress-controller-${terraform.workspace}"

  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
          "Federated": "${aws_iam_openid_connect_provider.eks.arn}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${aws_iam_openid_connect_provider.eks.url}:sub": "system:serviceaccount:devops:alb-ingress-controller-aws-alb-ingress-controller"
        }
      }
    }
  ]
}
EOF
}

resource "aws_iam_role_policy_attachment" "alb_ingress_controller" {
  role       = "${aws_iam_role.alb_ingress_controller.name}"
  policy_arn = "${aws_iam_policy.alb_ingress_controller.arn}"
}

Now that we have the IAM role created, let's grab its ARN and put it into our values.yaml file for alb-ingress-controller before we add the service to our cluster using Helm.

clusterName: prod
autoDiscoverAwsRegion: true
autoDiscoverAwsVpcID: true

rbac:
  serviceAccountAnnotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<ACCOUNT_ID>:role/alb-ingress-controller-prod

helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator
helm upgrade -i -n devops -f values.yaml alb-ingress-controller incubator/aws-alb-ingress-controller

We can now create ingresses and annotate them as follows to have this ALB Ingress Controller automatically create ALBs that point to our services.

  annotations:
    kubernetes.io/ingress.class: "alb"
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/tags: env=prod,cost_center=api
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:<ACCOUNT_ID>:certificate/<CERT_ID>
    alb.ingress.kubernetes.io/healthcheck-path: /v1/health

External Secrets

GoDaddy has released a Kubernetes controller and custom resource called kubernetes-external-secrets that can sync secret values from multiple providers, including AWS Secrets Manager, AWS Parameter Store, and HashiCorp Vault; documentation lives in the project's repository. We're going to integrate it with AWS Parameter Store today, and use IRSA to grant it permissions. What is amazing about this setup is that zero secrets need to be put into Terraform or your Helm values.yaml files, and you can use Kubernetes RBAC to prevent users from reading or writing the resulting Kubernetes secrets.
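
To illustrate that last point, RBAC is deny-by-default, so a developer-facing Role that simply never grants the secrets resource keeps the synced credentials out of reach. The names here are hypothetical:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: app-developer
  namespace: apps
rules:
  # Everything developers need day to day; "secrets" is deliberately absent,
  # so subjects bound to this Role cannot read or write Secrets.
  - apiGroups: ["", "apps", "networking.k8s.io"]
    resources: ["pods", "pods/log", "configmaps", "services", "deployments", "replicasets", "ingresses"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]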

This Terraform code creates an IAM policy and role; annotating the external-secrets service account with the role's ARN grants those permissions to any pod that runs under that service account.

data "aws_iam_policy_document" "external_secrets" {
  statement {
    sid    = "AllowParameterStoreGets"
    effect = "Allow"
    actions = [
      "ssm:GetParameter",
    ]
    # Ideally we'd restrict this to something like "/${terraform.workspace}/*",
    # and if you use per-team namespaces the namespace could be added as well
    # to ensure applications can only access the secrets they are supposed to.
    resources = ["*"]
  }
}

resource "aws_iam_policy" "external_secrets" {
  name   = "external-secrets-ps-${terraform.workspace}"
  policy = data.aws_iam_policy_document.external_secrets.json
}

resource "aws_iam_role" "external_secrets" {
  name = "external-secrets-${terraform.workspace}"

  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
          "Federated": "${aws_iam_openid_connect_provider.eks.arn}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${aws_iam_openid_connect_provider.eks.url}:sub": "system:serviceaccount:devops:external-secrets"
        }
      }
    }
  ]
}
EOF
}

resource "aws_iam_role_policy_attachment" "external_secrets_ecr" {
  role       = "${aws_iam_role.external_secrets.name}"
  policy_arn = "${aws_iam_policy.external_secrets.arn}"
}

Now that we have our role created, you'll need your AWS Account ID for the values.yaml file; note that the Account ID is not considered a sensitive value by AWS.

rbac:
  create: true

env:
  AWS_REGION: us-east-1

securityContext:
  fsGroup: 65534

image:
  tag: latest

serviceAccount:
  name: external-secrets
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<ACCOUNT_ID>:role/external-secrets-prod

helm repo add external-secrets https://godaddy.github.io/kubernetes-external-secrets/
helm upgrade -i -n devops -f values.yaml external-secrets external-secrets/kubernetes-external-secrets

Now that external-secrets is running, we can start creating services that use it to fetch credentials from AWS Parameter Store.
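
As a quick illustration (the parameter path, names, and image are hypothetical), an application would declare an ExternalSecret pointing at a Parameter Store path, and then reference the resulting Kubernetes secret, which gets the same name as the ExternalSecret, from its pod spec:

apiVersion: 'kubernetes-client.io/v1'
kind: ExternalSecret
metadata:
  name: api-web
  namespace: apps
secretDescriptor:
  backendType: systemManager
  data:
    - key: /prod/api-web/database_url
      name: database_url
---
apiVersion: v1
kind: Pod
metadata:
  name: api-web-example
  namespace: apps
spec:
  containers:
    - name: api-web
      image: example/api-web:latest # placeholder image
      env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: api-web # the Secret created by the ExternalSecret above
              key: database_url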


External DNS

Now we're going to tie the alb-ingress-controller and external-secrets services together using external-dns, which automatically creates DNS records for the ALBs that get provisioned for our ingresses. We're going to use Cloudflare for DNS, but external-dns works with all major DNS providers that offer an API for managing records.

The first order of business is to put our Cloudflare API Token into AWS Parameter Store at the path /cloudflare/api_token; this is the one manual step, and it keeps the token out of Terraform and our Helm values entirely.
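
The write itself is a single CLI call (shown with a placeholder token value):

aws ssm put-parameter \
  --name /cloudflare/api_token \
  --type SecureString \
  --value "<CLOUDFLARE_API_TOKEN>" \
  --region us-east-1

With the parameter in place, we can use external-secrets to create a Kubernetes secret holding the API token: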

apiVersion: 'kubernetes-client.io/v1'
kind: ExternalSecret
metadata:
  name: external-dns
  namespace: devops
secretDescriptor:
  backendType: systemManager
  data:
    - key: /cloudflare/api_token
      name: cloudflare_api_token

You'll want to apply this YAML using kubectl, after which external-secrets will very quickly create a Kubernetes secret named external-dns containing the API token.
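
Applying and checking the result looks something like this (the manifest file name is simply whatever you saved it as):

kubectl apply -f cloudflare-external-secret.yaml
kubectl -n devops get secret external-dns

Now that the secret exists, we can install the external-dns Helm chart: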

provider: cloudflare

cloudflare:
  secretName: external-dns

helm upgrade -i -n devops -f values.yaml external-dns stable/external-dns

The external-dns service will look at all of your ingresses and automatically register DNS records for every host that is specified. A full example ingress would look like this:

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: "alb"
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/tags: env=prod,cost_center=api
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:<ACCOUNT_ID>:certificate/<CERT_ID>
    alb.ingress.kubernetes.io/healthcheck-path: /v1/health
  labels:
    app: "api-web"
  name: "api-web"
  namespace: "apps"
spec:
  rules:
    - host: api.example.com
      http:
        paths:
        - backend:
            serviceName: api-web
            servicePort: 80
          path: /*

Closing

If you've made it this far, thanks for reading and I hope this was helpful. You should now have an EKS Cluster running in a VPC that uses cluster-autoscaler, alb-ingress-controller, external-secrets, and external-dns to create a self-service Kubernetes cluster for you and/or the engineers in your company.