Contributing to Upstream

Over the last couple of weeks I’ve run into a few little bugs/niggles in Open Source components we are using at FlowForge. As a result I have been opening Pull Requests and Feature Requests against those projects to get them fixed.

mosquitto

First up was a really small change for the Mosquitto MQTT broker when using MQTT over WebSockets. In FlowForge v0.8.0 we added support to use MQTT to communicate between different projects in the same team and also as a way for devices to send state back to the core Forge App. To support this we have bundled the Mosquitto MQTT broker as part of the deployment.

When running on Docker or Kubernetes we use MQTT over WebSockets and the MQTT broker is exposed via the same HTTP reverse proxy as the Node-RED projects. I noticed that in the logs all the connections were coming from the IP address of the HTTP reverse proxy, not the actual clients. This makes sense because, as far as mosquitto is concerned, the proxy is the source of the connection. To work around this the proxy usually adds an HTTP header with the original client’s IP address as follows:

X-Forwarded-For: 192.168.1.100
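For example, if the reverse proxy in front of the broker is Nginx (which proxy is in use, and the backend port, are assumptions here purely for illustration), the WebSocket listener would typically be proxied with something like:

    location / {
        # pass the WebSocket upgrade through to mosquitto
        proxy_pass http://127.0.0.1:1884;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        # add the original client address for the backend to log
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }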

Web applications normally have a flag you can set to tell them to trust the proxy to add the correct value and substitute this IP address in any logging. Mosquitto uses the libwebsockets library to handle the setup of the WebSocket connection, and this library supports exposing this HTTP header when a new connection is created.

I submitted a Pull Request (#2616) to allow mosquitto to make use of this feature, which adds the following code in src/websockets.c:

...
    if (lws_hdr_copy(wsi, ip_addr_buff, sizeof(ip_addr_buff), WSI_TOKEN_X_FORWARDED_FOR) > 0) {
        mosq->address = mosquitto__strdup(ip_addr_buff);
    } else {
        easy_address(lws_get_socket_fd(wsi), mosq);
    }
...

This will use the HTTP header value if it exists, or fall back to the remote socket address if not.

Roger Light reviewed and merged this for me pretty much straight away and then released mosquitto v2.0.15, which was amazing.

mosquitto-go-auth

To secure our mosquitto broker we make use of the mosquitto-go-auth plugin, which allows us to dynamically create users and ACL entries as we add/remove projects or users from the system. To make life easy this project publishes a Docker container with mosquitto, libwebsockets and the plugin all pre-built and set up to work together.

I had earlier run into a problem with the container not always working well when using MQTT over WebSockets connections. This turned out to be down to the libwebsockets instance in the container not being compiled with a required flag.
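For context, building libwebsockets with the relevant option enabled looks roughly like this; the specific flag shown is my recollection of the one mosquitto’s WebSocket support needs, so treat it as an assumption rather than a quote from the PR:

    # build libwebsockets from source with the option mosquitto needs
    git clone https://github.com/warmcat/libwebsockets.git
    cd libwebsockets && mkdir build && cd build
    cmake .. -DLWS_WITH_EXTERNAL_POLL=ON   # the flag I believe was missing
    make && make install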

To fix this I submitted a Pull Request (#241) that:

  • Updated the version of libwebsockets built into the container
  • Added the missing compile time flag when building libwebsockets
  • Bumped the mosquitto version to the just released v2.0.15

Again the project maintainer was really responsive and got the Pull Request turned around and released in a couple of days.

This means the latest version of the plugin container works properly with MQTT over WebSocket connections and will log the correct IP address of the connecting clients.

grafana

As I mentioned earlier I spotted the problem with the client IP addresses while looking at the logs from the MQTT broker of our Kubernetes deployment. To gather the logs we make use of Grafana Loki. Loki gathers the logs from all the different components and these can then be interrogated from the Grafana front end.

To deploy Grafana and Loki I made use of the Helm charts provided by the project. This means that all the required parts can be installed and configured with a single command and a file containing the specific customisations required.
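In practice that single command looks something like the following; the chart, release name and namespace here are my assumptions based on the grafana/helm-charts repository rather than an exact transcript of what I ran:

    helm repo add grafana https://grafana.github.io/helm-charts
    helm repo update
    # install Loki with the bundled Grafana, applying my customisations
    helm upgrade --install loki grafana/loki-stack \
        --namespace monitoring --create-namespace \
        --values loki-values.yaml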

One of the customisations is setting up a Kubernetes Ingress entry for the Grafana frontend to make it accessible outside the cluster. The documentation says that you can pass a list of hostnames using the grafana.ingress.hosts key. If you only set this then most things work, until you start to build dashboards based on the logs, at which point you get some strange behaviour where you are redirected to http://localhost, which doesn’t work. This is because, to get the redirects to work correctly, you also need to pass the hostname in the grafana."grafana.ini".server.domain setting.
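Putting those two together, the relevant part of the customisation file ends up looking something like this (the hostname is just a placeholder for the real one):

    grafana:
      ingress:
        enabled: true
        hosts:
          - grafana.example.com
      "grafana.ini":
        server:
          # needed so redirects point at the right host rather than localhost
          domain: grafana.example.com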

In order to get this to work cleanly you have to know that you need to pass the same hostname in two different places. To try and make this a little simpler for the default basic case I have submitted a PR (#1689) that will take the first entry in the grafana.ingress.hosts list and use it for the value of grafana."grafana.ini".server.domain if there isn’t one provided.

Many thanks to Marcus Noble for helping me with the Go Templating needed for this.

This PR is currently undergoing review and hopefully will be merged soon.

eksctl

And finally, this isn’t a code contribution, but I opened a feature request against WeaveWorks’ eksctl command. This is a tool to create and manage Kubernetes clusters on AWS.

A couple of weeks ago we received a notice from AWS that one of the EC2 machines that makes up the FlowForge Cloud instance was due to be restarted as part of some EC2 planned maintenance.

We had a couple of options as to how to approach this:

  1. Just leave it alone; the nodegroup would automatically replace the node when it was shut down, but this would lead to downtime while the new EC2 instance was spun up.
  2. Add an extra node to the impacted nodegroup and migrate the running workload to the new node once it was fully up and running.

We obviously went with option 2, but this left us with an extra node in the group after the maintenance. eksctl has options to reduce the size of a nodegroup, but it doesn’t come with a way to say which node should be removed, so there was no way to be sure that the (new) node with no pods scheduled would be the one taken away.

I asked a question on AWS’s re:Post forum as to how to remove a specific node and got details of how to do this with the awscli tool. While this works it would be really nice to be able to do it all from eksctl as a single point of contact.
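From memory the suggested approach was roughly the following; the node name and instance id are placeholders, and the exact commands in the re:Post answer may have differed slightly:

    # move the running workload off the node that is going to be removed
    kubectl drain ip-10-0-1-100.eu-west-1.compute.internal --ignore-daemonsets

    # terminate that specific EC2 instance and shrink the nodegroup's
    # Auto Scaling group at the same time
    aws autoscaling terminate-instance-in-auto-scaling-group \
        --instance-id i-0123456789abcdef0 \
        --should-decrement-desired-capacity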

I’ve raised a feature request (#5629) asking if this can be added and it’s currently awaiting review and, hopefully, design and planning.

Building a Kubernetes Test Environment

Over the last couple of weekends I’ve been noodling around with my home lab setup to build a full local environment to test out FlowForge with both the Kubernetes Container and Docker Drivers.

The other reason to put all this together is to help work out the right way to build a proper CI pipeline to build, automatically test and deploy to our staging environment.

Components

NPM Registry

This is somewhere to push the various FlowForge NodeJS modules so they can then be installed while building the container images for the FlowForge App and the Project Container Stacks.

This is a private registry so that I can push pre-release builds without them slipping out into the public domain, but also so I can delete releases and reuse version numbers, which is not allowed on the public NPM registry.

I’m using the Verdaccio registry as I’ve used this in the past to host custom Node-RED nodes (which it will probably end up doing again in this setup as things move forwards). This runs as a Docker container and I use my Nginx instance to reverse proxy for it.

As well as hosting my private builds it can proxy for the public npmjs.org registry, which speeds up local builds.
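As a rough sketch of how this hangs together (the registry hostname is a placeholder for my real one), starting Verdaccio and publishing a pre-release build looks something like:

    # run Verdaccio in a container on its default port
    docker run -d --name verdaccio -p 4873:4873 verdaccio/verdaccio

    # publish a pre-release module to the private registry instead of npmjs.org
    npm publish --registry https://npm.example.com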

Docker Container Registry

This is somewhere to push the Docker containers that represent both the FlowForge app itself and the containers that represent the Project Stacks.

Docker ships a container image that will run a registry.
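A minimal sketch of standing it up, with TLS and the public hostname handled by the Nginx proxy described below:

    # run the stock registry image on its default port
    docker run -d --name registry -p 5000:5000 registry:2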

As well as the registry, I’m also running a second container with a web UI project to help keep track of what I’ve pushed to the registry; it also allows me to delete tags, which is useful when testing.

Again my internet-facing Nginx instance is proxying for both of these (on the same virtual host, since their routes do not clash and it makes CORS easier given the UI is all browser-side JavaScript).

Helm Chart Repository

This isn’t really needed, as you can generate all the required files with the helm command and host the results on any Web server, but this lets me test the whole stack end to end.

I’m using a package called ChartMuseum, which will automatically generate the index.yaml manifest file when charts are uploaded via its simple API.
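A sketch of the workflow, assuming a chart called flowforge and a placeholder hostname (the chart name, version and URL here are illustrative, not the real ones):

    # package the chart and push the .tgz to ChartMuseum's HTTP API
    helm package ./flowforge
    curl --data-binary "@flowforge-0.1.0.tgz" https://charts.example.com/api/charts

    # then consume the repository from any machine
    helm repo add flowforge https://charts.example.com
    helm repo update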

Nginx Proxy

All of the previous components have been stood up as virtual hosts on my public Nginx instance so that they can get HTTPS certificates from LetsEncrypt. This makes things a lot easier because both Docker and Kubernetes basically require the container registry to be secure by default.

While it is possible to add exceptions for specific registries, these days it’s just easier to do it “properly” up front.
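For example, the virtual host in front of the container registry looks roughly like this (hostname, backend port and certificate paths are placeholders for whatever certbot and the registry container actually use):

    server {
        listen 443 ssl;
        server_name registry.example.com;

        ssl_certificate     /etc/letsencrypt/live/registry.example.com/fullchain.pem;
        ssl_certificate_key /etc/letsencrypt/live/registry.example.com/privkey.pem;

        # image layers can be large, so don't cap the upload size
        client_max_body_size 0;

        location / {
            proxy_pass http://127.0.0.1:5000;
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }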

MicroK8s Cluster

And finally I need a Kubernetes cluster to run all this on. In this case I have a 3 node cluster made up of

  • 2 Raspberry Pi 4s with 8GB of RAM each
  • 1 Intel Celeron based mini PC with 8GB of RAM

All 3 of these are running 64-bit Ubuntu 20.04 and MicroK8s. The Intel machine is needed because, at the moment, the de facto standard PostgreSQL Helm Chart only has amd64-based containers, so it won’t run on the Raspberry Pi based nodes.

The cluster uses the NFS Persistent Volume provisioner to store volumes on my local NAS so they are available to all the nodes.
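As a sketch, installing an NFS provisioner via the nfs-subdir-external-provisioner chart (which chart I actually used is an assumption here; the NAS address and export path are placeholders) looks like:

    helm repo add nfs-subdir-external-provisioner \
        https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
    helm install nfs-provisioner \
        nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
        --set nfs.server=192.168.1.10 \
        --set nfs.path=/volume1/k8s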

Usage

I’ll write some more detailed posts about how I’ve configured each of these components and then how I’m using them.

As well as testing the full Helm chart builds, I can also use this to run the FlowForge app and the Kubernetes Container driver locally on my development machine and have it create Projects in the Kubernetes cluster.