Working with multiple AWS EKS instances

I’ve recently been working on a project that uses AWS EKS managed Kubernetes Service.

For various reasons too complicated to go into here we’ve ended up with multiple clusters owned by different AWS Accounts so flipping back and forth between them has been a little trickier than normal.

Here are my notes on how to manage the AWS credentials and the kubectl config to access each cluster.

AWS CLI

First task is to authorise the AWS CLI to act as the user in question. We do this by creating a user with the right permissions in the IAM console and then export the Access key ID and Secret access key values usually as a CSV file. We then take these values and add them to the ~/.aws/credentials file.

[dev]
aws_access_key_id = AKXXXXXXXXXXXXXXXXXX
aws_secret_access_key = xyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxy

[test]
aws_access_key_id = AKYYYYYYYYYYYYYYYYYY
aws_secret_access_key = abababababababababababababababababababab

[prod]
aws_access_key_id = AKZZZZZZZZZZZZZZZZZZ
aws_secret_access_key = nmnmnmnmnmnmnmnmnmnmnmnmnmnmnmnmnmnmnmnm

We can pick which set of credential the AWS CLI uses by adding the --profile option to the command line.

$ aws --profile dev sts get-caller-identity
{
    "UserId": "AIXXXXXXXXXXXXXXXXXXX",
    "Account": "111111111111",
    "Arn": "arn:aws:iam::111111111111:user/dev"
}

Instead of using the --profile option you can also set the AWS_PROFILE environment variable. Details of all the ways to switch profiles are in the docs here.

$ export AWS_PROFILE=test
$ aws sts get-caller-identity
{
    "UserId": "AIYYYYYYYYYYYYYYYYYYY",
    "Account": "222222222222",
    "Arn": "arn:aws:iam::222222222222:user/test"
}

Now we can flip easily between different AWS accounts we can export the EKS credential with

$ export AWS_PROFILE=prod
$ aws eks update-kubeconfig --name foo-bar --region us-east-1
Updated context arn:aws:eks:us-east-1:333333333333:cluster/foo-bar in /home/user/.kube/config

The user that created the cluster should also follow these instructions to make sure the new account is added to the cluster’s internal ACL.

Kubectl

If we run the previous command with each profile it will add the connection information for all 3 clusters to the ~/.kube/config file. We can list them with the following command

$ kubectl config get-contexts
CURRENT   NAME                                                  CLUSTER                                               AUTHINFO                                              NAMESPACE
*         arn:aws:eks:us-east-1:111111111111:cluster/foo-bar   arn:aws:eks:us-east-1:111111111111:cluster/foo-bar   arn:aws:eks:us-east-1:111111111111:cluster/foo-bar   
          arn:aws:eks:us-east-1:222222222222:cluster/foo-bar   arn:aws:eks:us-east-1:222222222222:cluster/foo-bar   arn:aws:eks:us-east-1:222222222222:cluster/foo-bar   
          arn:aws:eks:us-east-1:333333333333:cluster/foo-bar   arn:aws:eks:us-east-1:333333333333:cluster/foo-bar   arn:aws:eks:us-east-1:333333333333:cluster/foo-bar 

The star is next to the currently active context, we can change the active context with this command

$ kubectl config set-context arn:aws:eks:us-east-1:222222222222:cluster/foo-bar
Switched to context "arn:aws:eks:us-east-1:222222222222:cluster/foo-bar".

Putting it all together

To automate all this I’ve put together a collection of script that look like this

export AWS_PROFILE=prod
aws eks update-kubeconfig --name foo-bar --region us-east-1
kubectl config set-context arn:aws:eks:us-east-1:222222222222:cluster/foo-bar

I then use the shell source ./setup-prod command (or it’s shortcut . ./setup-prod) , this is instead of adding the shebang to the top and running it as a normal script. This is because when environment variables are set in scripts they go out of scope. Leaving the AWS_PROFILE variable in scope means that the AWS CLI will continue to use the correct account settings when it’s used later while working on this cluster.

Setting up a AWS EC2 Mac

I recently needed to debug some problems running a Kubernetes app on a Mac. The problem is I don’t have a Mac or easy access to one that I can have full control over to poke and prod at things. (I also am not the biggest fan of OSx, but that’s a separate story)

Recently AWS started to offer Mac Mini EC2 instances. These differ a little from most normal EC2 instances as they are an actual dedicated bit of hardware that you have exclusive access to rather than a VM on hardware shared with others.

Because of the fact it’s a dedicated bit of hardware the process for setting one up is a little different.

Starting the Instance

First you probably need to request to have a limit increasing on your account. as the default limit for dedicated hardware looks to be 0. This limit is also per region so you will need to ask for the update in every one you would need. To request the update use the AWS Support Center, user the “Create Case” button and select “Service Limit Increase”. From the drop down select “EC2 Dedicated Hosts”, then the region and you want to request and update to the mac1 instance type and enter the number of concurrent instances you will need. It took a little time for my request to be processed, but I did submit it on Friday afternoon and it was approved on Sunday morning.

Once it has been approved you can create a new “Dedicated Hosts” instance on the EC2 console, with a “Instance Family” of mac1 and a “Instance Type” of mac1.metal. You can pick your availability zone (not all Regions and AZ have all instance type so it might not be possible to allocate a mac in every zone). I also suggest you tick the “Instance auto-placement” box.

Once that is complete you can actually start allocate an EC2 instance on this dedicated host. You get to pick which version of OSx you want to run. Assuming you only have one dedicated host and you ticked the auto-placement box then you shouldn’t need to pick the hardware you want to run the instance on.

The other main things to pick as you walk through the wizard are the amount of disk space (default is 60gb), which security policy you want (be sure to pick one with ssh access) and which SSH key you’ll use to log in.

The instances do take a while to start, but given it’s doing a fresh OSx install the hardware this is probably not a surprise. But once the console says it’s up and both the status checks are passing you’ll be able to ssh into the box.

Enabling a GUI

Once logged in you can do most things from the command line, but I needed to run Docker, and all the instructions I could find online said I needed to download Docker Desktop and install that via the GUI.

I found the following gist which helped.

  • Fist up set a password for the ec2-user
    sudo passwd ec2-user
  • Second enabled the the VNC
% sudo /System/Library/CoreServices/RemoteManagement/ARDAgent.app/Contents/Resources/kickstart \
-activate -configure -access -on \
-configure -allowAccessFor -specifiedUsers \
-configure -users ec2-user \
-configure -restart -agent -privs -all

% sudo /System/Library/CoreServices/RemoteManagement/ARDAgent.app/Contents/Resources/kickstart \
 -configure -access -on -privs -all -users ec2-user

You can then add -L 5900:localhost:5900 to the ssh command that you use to log into the mac. This will port forward the VNC port to localhost.

VNCViewer or Remmina can be used to start a session that gives full access to the Mac’s gui.

Expand the disk

If you have allocated more than the default 60gb then you will need to expand the disk to make full use of it.

% PDISK=$(diskutil list physical external | head -n1 | cut -d" " -f1)
APFSCONT=$(diskutil list physical external | grep "Apple_APFS" | tr -s " " | cut -d" " -f8)
% sudo diskutil repairDisk $PDISK
# Accept the prompt with "y", then paste this command
% sudo diskutil apfs resizeContainer $APFSCONT 0

Add tools

The instance comes with Homebrew pre-setup so you can install nearly anything else you might need.

Shut it down when you are done

Mac EC2 instances really are not cheap ($25.99 per day…) so remember to kill it off when you are done.

Working with multiple EFS file system in EKS

I’ve been building a system recently on AWS EKS and using EFS filesystems as volumes for persistent storage.

I initially only had one container that required any storage, but as I added a second I ran into the issue that there didn’t look to be a way to bind a EFS volume to a specific PersistentVolumeClaim so no way to make sure the same volume was mounted into the same container each time.

A Pod requests a volume by referencing a PersistentVolumeClaim as follows:

apiVersion: v1
kind: Pod
metadata:
  name: efs-app
spec:
  containers:
  - name: app
    image: centos
    command: ["/bin/sh"]
    volumeMounts:
    - name: efs-volume
      mountPath: /data
  volumes:
  - name: efs-volume
    persistentVolumeClaim:
      claimName: efs-claim

The PersistentVolumeClaim would look:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi

You can bind the EFS volume to a PersistentVolume as follows

apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-persistent-volume
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-6eb2fc16

The volumeHandle points to the EFS volume you want to back it.

If there is only one PersistentVolume then there is not a problem as the PersistentVolumeClaim will grab the only one available. But if there are more than one then you can include the volumeName in the PersistentVolumeClaim description to bind the two together.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi
  volumeName: efs-persistent-volume

After a bit of poking around I found this Stack Overflow question which pointed me in the right direction.

Updated AWS Lambda NodeJS Version checker

I got another of those emails from Amazon this morning that told me that the version of the NodeJS runtime I’m using in the Lambda for my Node-RED Alexa Smart Home Skill is going End Of Life.

I’ve previously talked about wanting a way to automate checking what version of NodeJS was in use across all the different AWS Availability Regions. But when I tried my old script it didn’t work.

for r in `aws ec2 describe-regions --output text | cut -f3`; 
do
  echo $r;
  aws --region $r lambda list-functions | jq '.[] | .FunctionName + " - " + .Runtime';
done

This is most likely because the output from awscli tool has changed slightly.

The first change looks to be in the listing of the available regions

$ aws ec2 describe-regions --output text
REGIONS	ec2.eu-north-1.amazonaws.com	opt-in-not-required	eu-north-1
REGIONS	ec2.ap-south-1.amazonaws.com	opt-in-not-required	ap-south-1
REGIONS	ec2.eu-west-3.amazonaws.com	opt-in-not-required	eu-west-3
REGIONS	ec2.eu-west-2.amazonaws.com	opt-in-not-required	eu-west-2
REGIONS	ec2.eu-west-1.amazonaws.com	opt-in-not-required	eu-west-1
REGIONS	ec2.ap-northeast-2.amazonaws.com	opt-in-not-required	ap-northeast-2
REGIONS	ec2.ap-northeast-1.amazonaws.com	opt-in-not-required	ap-northeast-1
REGIONS	ec2.sa-east-1.amazonaws.com	opt-in-not-required	sa-east-1
REGIONS	ec2.ca-central-1.amazonaws.com	opt-in-not-required	ca-central-1
REGIONS	ec2.ap-southeast-1.amazonaws.com	opt-in-not-required	ap-southeast-1
REGIONS	ec2.ap-southeast-2.amazonaws.com	opt-in-not-required	ap-southeast-2
REGIONS	ec2.eu-central-1.amazonaws.com	opt-in-not-required	eu-central-1
REGIONS	ec2.us-east-1.amazonaws.com	opt-in-not-required	us-east-1
REGIONS	ec2.us-east-2.amazonaws.com	opt-in-not-required	us-east-2
REGIONS	ec2.us-west-1.amazonaws.com	opt-in-not-required	us-west-1
REGIONS	ec2.us-west-2.amazonaws.com	opt-in-not-required	us-west-2

This looks to have added something extra to the start of the each line, so I need to change which filed I select with the cut command by changing -f3 to -f4.

The next problem looks to be with the JSON that is output for the list of functions in each region.

$ aws --region $r lambda list-functions
{
    "Functions": [
        {
            "TracingConfig": {
                "Mode": "PassThrough"
            }, 
            "Version": "$LATEST", 
            "CodeSha256": "wUnNlCihqWLXrcA5/5fZ9uN1DLdz1cyVpJV8xalNySs=", 
            "FunctionName": "Node-RED", 
            "VpcConfig": {
                "SubnetIds": [], 
                "VpcId": "", 
                "SecurityGroupIds": []
            }, 
            "MemorySize": 256, 
            "RevisionId": "4f5bdf6e-0019-4b78-a679-12638412177a", 
            "CodeSize": 1080463, 
            "FunctionArn": "arn:aws:lambda:eu-west-1:434836428939:function:Node-RED", 
            "Handler": "index.handler", 
            "Role": "arn:aws:iam::434836428939:role/service-role/home-skill", 
            "Timeout": 10, 
            "LastModified": "2018-05-11T16:20:01.400+0000", 
            "Runtime": "nodejs8.10", 
            "Description": "Provides the basic framework for a skill adapter for a smart home skill."
        }
    ]
}

This time it looks like there is an extra level of array in the output, this can be fixed with a minor change to the jq filter

$aws lambda list-functions | jq '.[] | .[] | .FunctionName + " - " + .Runtime'
"Node-RED - nodejs8.10"

Putting it all back together to get

for r in `aws ec2 describe-regions --output text | cut -f4`;  do
  echo $r;
  aws --region $r lambda list-functions | jq '.[] | .[] | .FunctionName + " - " + .Runtime'; 
done

eu-north-1
ap-south-1
eu-west-3
eu-west-2
eu-west-1
"Node-RED - nodejs8.10"
ap-northeast-2
ap-northeast-1
sa-east-1
ca-central-1
ap-southeast-1
ap-southeast-2
eu-central-1
us-east-1
"Node-RED - nodejs8.10"
us-east-2
us-west-1
us-west-2
"Node-RED - nodejs8.10"

Listing AWS Lambda Runtimes

For the last few weeks I’ve been getting emails from AWS about Node 6.10 going end of life and saying I have deployed Lambda using this level.

The emails don’t list which Lambda or which region they think are at fault which makes tracking down the culprit difficult. I only really have 1 live instance deployed across multiple regions (and 1 test instance on a single region).

AWS Lambda region list

Clicking down the list of regions is time consuming and prone to mistakes.

In the email AWS do provide a command to list which Lambda are running with Node 6.10:

aws lambda list-functions --query="Functions[?Runtime=='nodejs6.10']"

But what they fail to mention is that this only checks your current default region. I can’t find a way to get the aws command line tool to list the Lambda regions, the closest I’ve found is the list of ec2 regions which hopefully match up. Pairing this with the command line JSON search tool jq and a bit of Bash scripting I’ve come up with the following:

for r in `aws ec2 describe-regions --output text | cut -f3`; 
do
  echo $r;
  aws --region $r lambda list-functions | jq '.[] | .FunctionName + " - " + .Runtime';
done

This walks over all the regions as prints out all the function names and the runtime they are using.

eu-north-1
ap-south-1
eu-west-3
eu-west-2
eu-west-1
"Node-RED - nodejs8.10"
"oAuth-test - nodejs8.10"
ap-northeast-2
ap-northeast-1
sa-east-1
ca-central-1
ap-southeast-1
ap-southeast-2
eu-central-1
us-east-1
"Node-RED - nodejs8.10"
us-east-2
us-west-1
us-west-2
"Node-RED - nodejs8.10"

In my case it only lists NodeJS 8.10 so I have no idea why AWS keep sending me these emails. Also since I’m only on the basic level I can’t even raise a technical help desk query to find out.

Anyway I hope this might be useful to others with the same problem.