Elixir for ARM via GH runners

Photo by Nick Abrams / Unsplash

I am currently evaluating Coolify to deploy and manage apps. I saw that Hetzner also offers ARM machines to run your stuff on. As ARM is less resource intensive and slightly cheaper, I picked one of those machines.

One important step was to set up a custom app, as the main point of the evaluation is to determine whether Coolify could be a successor to Dokku and serve as a staging / testing environment. This is where the journey started. More on Coolify maybe in a different post.

I first chose to try a manual Docker image deployment as there is already a build process for images in place for that project.

The repo was already set up to build an amd64 image, push it to a registry, and pull it down on deploy. So the first step was to authenticate against the registry (which took a bit of fiddling around) and add the name of the respective image. An error was promptly returned: no matching manifest for linux/arm64/v8 in the manifest list entries. Up to now I had only configured the default image build via GitHub Actions, which is amd64.

After a quick search I stumbled across the following addition for the docker/build-push-action action: platforms: linux/amd64,linux/arm64/v8. A quick addition. Problem solved.
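For illustration, a workflow with that addition could look roughly like the sketch below. This is not the project's actual workflow: the workflow name, the registry (ghcr.io), the image name, and the pinned action versions are assumptions. Only the platforms line comes from the change described above.

```yaml
# Minimal sketch of a multi-platform image build on a regular amd64 runner.
# Everything marked "hypothetical" or "assumption" is a placeholder.
name: build-image # hypothetical workflow name

on:
  push:
    branches: [main]

permissions:
  contents: read
  packages: write # needed to push to ghcr.io with GITHUB_TOKEN

jobs:
  build:
    runs-on: ubuntu-latest # amd64 host, so the arm64 variant runs under QEMU
    steps:
      - uses: actions/checkout@v4

      # Registers QEMU so the amd64 runner can execute arm64 binaries
      - uses: docker/setup-qemu-action@v3

      # Buildx is required for multi-platform builds
      - uses: docker/setup-buildx-action@v3

      - uses: docker/login-action@v3
        with:
          registry: ghcr.io # assumption: GitHub Container Registry
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          # The addition from the text: build both variants into one manifest list
          platforms: linux/amd64,linux/arm64/v8
          tags: ghcr.io/example/my-app:latest # hypothetical image name
```

With a setup like this, every RUN step of the arm64 variant is executed through emulation on the amd64 runner.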

Not so fast. As mentioned in the title the app is an Elixir app running on the BEAM. The image build process failed with something like:

[linux/arm64 builder  6/19] RUN mix deps.get --only prod
#50 8.103 Segmentation fault (core dumped)
#50 ERROR: process "/bin/sh -c mix deps.get --only $MIX_ENV" did not complete successfully: exit code: 139

Oh my. Doesn't look too good.

After investigating and looking around I found some reported issues with ARM builds, OTP etc., e.g. https://elixirforum.com/t/apple-silicon-and-cross-platform-docker-fails-minikube/60699 or https://elixirforum.com/t/arm64-dockerfile-failing/57317, which also describe potential solutions to the problem.

The next try was to add the proposed env to the Dockerfile, ENV ERL_FLAGS="+JPperf true", and trigger a rebuild. The segfault was gone, but the build time went from ~5 min to ~25 min. Really a bummer. More reading on this revealed that QEMU is the culprit (for both the segfault and the slowness). I also tried ENV ERL_FLAGS="+JMsingle true" instead, as this is a "subset" of the JPperf flag for OTP 26 (I think). The build time was ~24 min. Not much better.

GitHub recently (some days ago) announced that ARM-based runners for Linux (and Windows) are in beta. GA is planned for the end of the year.
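Once such a runner is available to a repository, the arm64 image could be built natively, without QEMU in the loop. The following is only a sketch under assumptions: the runner label ubuntu-24.04-arm depends on the plan and on what GitHub has rolled out for the repository, and the job name, image tag, and action versions are placeholders.

```yaml
# Minimal sketch: build the arm64 image on a native ARM runner (no emulation).
name: build-image-arm64 # hypothetical workflow name

on:
  push:
    branches: [main]

jobs:
  build-arm64:
    # Assumption: an ARM runner with this label is enabled for the repo/plan
    runs-on: ubuntu-24.04-arm
    steps:
      - uses: actions/checkout@v4

      - uses: docker/setup-buildx-action@v3

      - uses: docker/build-push-action@v6
        with:
          context: .
          push: false # build only; pushing would additionally need a registry login
          platforms: linux/arm64 # matches the host architecture, so no QEMU step
          tags: my-app:arm64 # hypothetical local tag
```

Because the host architecture matches the target, the BEAM runs natively during mix deps.get and the ERL_FLAGS workarounds should not be needed in such a job.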

The next step was to try building the image on Coolify itself, which is not what I want, as this is also where the apps are running. A first try also led to a failed build. After some tweaks and switching to building from the Dockerfile, it worked. The downside is that the webhook sent by GitHub fires directly on merge, so the current "wait for CI" step cannot be performed. But the build finished in a reasonable time without any QEMU-fixing flags involved. Another downside is that all the caching that comes with GitHub Actions needs to be rebuilt, if that is possible at all.

Overall this has nothing to do with Coolify in particular but rather with GitHub Actions, QEMU, and ARM cross-builds. An issue for this is open on GitLab. From skimming over the issue comments, I can't imagine how such a problem would be fixed.