Best way to manage all my services as containers?

valar@lemmy.ca · edit-2 17 hours ago

Jul (they/she)@piefed.blahaj.zone · 13 hours ago

What do you use for repeatable recovery and deployment of systems?

I’ve looked at ArgoCD and FluxCD. ArgoCD was too flaky: when I changed Helm files it would often fail to deploy them, and the UI often didn’t show detailed errors for things like Helm syntax errors, so it was a pain to troubleshoot.

FluxCD was just a real pain to configure in the first place, and I didn’t want to learn Kustomize when I already have Helm charts.

And neither really supported staged deployments or dealt with dependent services well, so I couldn’t get either to deploy infrastructure-level Helm charts like PostgreSQL before the services that depend on them. Technically, with Kubernetes the order of deployment shouldn’t matter, but in reality ArgoCD would deploy the other stuff first and wait for it to come up, and since it never came up because the dependencies weren’t there, it choked a lot.

Just an example of the issues I’ve had. But I really want an easy way to make lots of small changes to charts and deploy them quickly as well as being able to quickly recover the cluster from backups if something catastrophic happens like a fire without having to manually deploy each chart. Just curious how others handle it or if it’s always manual deployment of charts via CLI only.

Daniel Quinn@lemmy.ca · edit-2 3 hours ago

I’ve used FluxCD in the past and have looked into ArgoCD, but honestly, I’ve not seen any big benefit from either. I use k8s both at home and at work, and in both cases we do “imperative” deploys: you run helm install ... either directly or via the CI and stuff is deployed.

So for example at my last job, our GitLab CI just had a section triggered exclusively for merges into master that ran helm install ... for all three environments. We had three values.yaml files, one for each environment, and when we wanted to deploy a new version, the process was:

1. Create a tag for our release version (i.e. 1.2.3) and push it to the repo. This would trigger a build and push the resulting image into the container registry.
2. Push an update to the repo with the new tag set in the appropriate Helm values file. If we wanted to deploy 1.2.3 to development but not yet to staging or production, then the tag: value in each of the environment files would look like this:

k8s/chart/environments/development.yaml: tag: 1.2.3
k8s/chart/environments/staging.yaml: tag: 1.2.2
k8s/chart/environments/production.yaml: tag: 1.2.2

Once that change is pushed, the CI will automatically apply it with helm install ... and make sure that all three environments are what they’re supposed to be.
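
As a rough sketch of what that CI step might look like: the release name myapp, the chart path k8s/chart, and the one-namespace-per-environment layout are assumptions for illustration, not details of the setup described above. It also uses helm upgrade --install rather than plain helm install, because install on its own fails once the release already exists.

```shell
#!/bin/sh
# Hypothetical deploy step. The release name "myapp", the chart path
# "k8s/chart" and one namespace per environment are all assumptions.
HELM="${HELM:-helm}"

deploy_all() {
  for env in development staging production; do
    # "upgrade --install" creates the release if missing, upgrades it otherwise.
    "$HELM" upgrade --install "myapp" k8s/chart \
      --namespace "$env" \
      --values "k8s/chart/environments/$env.yaml" || return 1
  done
}
```

Calling deploy_all from the CI job applies each environment’s values file in turn and stops at the first failure. Setting HELM=echo prints the commands instead of running them, which is handy for a dry run.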

As for dependent services, that should all be in your Helm chart so they’re stood up and torn down together. The specific case you mention about “Service A” being dependent on “Service B” but stood up before “Service B” is ready is a classic problem, but easily solved:

The dependent service (“A” in this case) should have an entrypoint that checks for everything else before starting. Here’s what I’m using right now in a project:

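#!/bin/sh

# Block until the postgres host accepts TCP connections on port 5432.
while ! nc -z postgres 5432; do
  echo "Waiting for postgres..."
  sleep 0.1
done
echo "PostgreSQL started"

# Leave a marker file showing that the dependency check has passed.
touch /tmp/ready

# Replace this shell with the container's real command.
exec "$@"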

I’ve even got some code that checks that all the Django migrations have run first, for the same situation. The Kubernetes philosophy is that any container should be able to die at any time and eventually be brought back up, and that every container needs to be prepared for this. Typically this means that your containers should operate on the basis of “if I can’t work, die, and hope the problem is solved by the time Kubernetes redeploys me”.
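
A minimal sketch of that “if I can’t work, die” approach, as a variant of the entrypoint above that gives up after a bounded number of attempts instead of waiting forever. The wait_for helper and the WAIT_ATTEMPTS and WAIT_INTERVAL variables are hypothetical names made up for this example.

```shell
#!/bin/sh
# wait_for CMD...: retry a check until it succeeds or the attempts run out.
# WAIT_ATTEMPTS and WAIT_INTERVAL are hypothetical knobs for this sketch.
wait_for() {
  attempts="${WAIT_ATTEMPTS:-30}"
  while ! "$@" >/dev/null 2>&1; do
    attempts=$((attempts - 1))
    if [ "$attempts" -le 0 ]; then
      echo "Gave up waiting for: $*" >&2
      return 1
    fi
    sleep "${WAIT_INTERVAL:-1}"
  done
}
```

An entrypoint could then run wait_for nc -z postgres 5432 || exit 1 before the final exec "$@", so the container exits with an error and Kubernetes restarts it later. The same helper could probably wrap a migration check such as python manage.py migrate --check, assuming a Django version recent enough to have that flag.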