Currently, I run Unraid and have all of my services’ setup there as docker containers. While this is nice and easy to setup initially, it has some major downsides:
- It’s fragile. Unraid is prone to bugs/crashes with docker that take down my containers. It’s also not resilient so when things break I have to log in and fiddle.
- It’s mutable. I can’t use any infrastructure-as-code tools like terraform, and configuration sort of just exist in the UI. I can’t really roll back or recover easily.
- It’s single-node. Everything is tied to my one big server that runs the NAS, but I’d rather have the NAS as a separate fairly low-power appliance and then have a separate machine to handle things like VMs and containers.
So I’m looking ahead and thinking about what the next iteration of my homelab will look like. While I like unraid for the storage stuff, I’m a little tired of wrangling it into a container orchestrator and hypervisor, and I think this year I’ll split that job out to a dedicated machine. I’m comfortable with, and in fact prefer, IaC over fancy UIs and so would love to be able to use terraform or Pulumi or something like that. I would prefer something multi-node, as I want to be able to tie multiple machines together. And I want something that is fault-tolerant, as I host services for friends and family that currently require a lot of manual intervention to fix when they go down.
So the question is: how do you all do this? Kubernetes, docker-compose, Hashicorp Nomad? Do you run k3s, Harvester, or what? I’d love to get an idea of what people are doing and why, so I can get some ideas as to what I might do.
In my opinion trying to set up a highly available fault tolerant homelab adds a large amount of unnecessary complexity without an equivalent benefit. It’s good to have redundancy for essential services like DNS, but otherwise I think it’s better to focus on a robust backup and restore process so that if anything goes wrong you can just restore from a backup or start containers on another node.
I configure and deploy all my applications with Ansible roles. It can programmatically create config files, pass secrets, build or start containers, cycle containers automatically after config changes, basically everything you could need.
Sure it would be neat if services could fail over automatically but things only ever tend to break when I’m making changes anyway.
Yeah I guess that’s true, I do think the other part about having configs done programatically is a lot more important anyway. If things go down but all it takes to get it back is to re-run the configs from files then it’s not so bad
More importantly, if you do things programmatically you will still have the information how you did it last time the next time you need to move to a new major version of something which is particularly important in a home setting where you don’t do tasks like that often.
This, I used to have a kubernetes setup but how much redudency can you really have at home. Do you have a generator? Multiple Internet lines?
The fact is most hardware is highly reliable. Having good backups to restore from is all you need and you gain a huge improvement in simplicity which adds reliability in and of itself.