• 4 Posts
  • 62 Comments
Joined 11 months ago
Cake day: July 29th, 2023



  • I started as more “homelab” than “selfhosted” at first - I was just stuffing around playing with things, but then that seemed sort of pointless and I wanted to run real workloads. Then I discovered that was super useful and I loved extracting myself from commercial cloud services (Dropbox etc). The point of this story is that I built most of the infrastructure before I was running services that I (or family) depended on - which is the point where it can become a source of stress rather than fun, and I’m guessing that’s where you’re finding yourself.

    There’s no real way around this (the pressure you’re feeling): if you are running real services, it is going to take some sysadmin work to get to the point where you feel relaxed that you can quickly deal with any problems. There’s lots of good advice elsewhere in this thread about bits and pieces to do this - the exact methods are going to vary according to your needs. Here’s mine (which is not perfect!).

    • I’m running on a single mini PC & a Synology NAS setup for RAID 5
    • I’ve got a nearly identical spare mini PC, and swap over to it for a couple of weeks (originally every month, but stretched out when I’m busy). That tests my ability to recover from that hardware failure.
    • All my local workloads are in LXC containers or VMs on Proxmox with automated snapshots that are my (bulky) backups, but allow for restoration in minutes if needed.
    • The NAS is backed up locally to an external USB drive that’s not usually plugged in, and to a lower-specced similar setup 300km away.
    • All the workloads are dockerised, and I have a standard directory structure and compose approach, so if I need to upgrade something or do some other maintenance on something I don’t often touch, I know where everything is without looking back at the playbook
    • I don’t use a script or Terraform to set those up. I’ve got a Proxmox template with Docker and Tailscale etc installed that I use, so the only bit of unique infrastructure is the docker compose file, which is source controlled on Forgejo
    • Everything’s on UPSs
    • I have a bunch of Ansible playbooks for routine maintenance such as apt updates, also in source control
    • All the VPS workloads are dockerised with the same directory structure, and behind NGINX PM. I’ve gotten super comfortable with one VPS provider, so that’s a weakness. I should try moving them one day. They are mostly static websites, plus one important web app that I have a tested backup strategy for, but not an automated one, so that needs to be addressed.
    • I use a local and an external UptimeKuma for monitoring, enhanced by running a tiny server on every instance that just exposes a disk free and memory free api that can be consumed by Uptime.

    I still have lots of single points of failure - Tailscale, my internet provider, my domain provider etc, but I think I’ve addressed the most common which would be hardware failures at home. My monitoring is also probably sub-par, I’m not really looking at logs unless I’m investigating a problem. Maybe there’s a Netdata or something in my future.

    You’ve mentioned that syncing to a remote server for backups is a step you don’t want to take. If you mean that managing your own remote server is the step you don’t want to take, then your options are a paid backup service like Backblaze, or physically shuffling external USB drives (or extra NASs) back and forth to somewhere - depending on what downtime you can tolerate.




  • I run two local physical servers, one production and one dev (and a third prod2 kept in case of a prod1 failure), and two remote production/backup servers all running Proxmox, and two VPSs. Most apps are dockerised inside LXC containers (on Proxmox) or just docker on Ubuntu (VPSs). Each of the three locations runs a Synology NAS in addition to the server.

    Backups run automatically, and I manually run apt updates on everything each weekend with a single ansible playbook. Every host runs a little golang program that exposes the memory and disk use percent as a JSON endpoint, and I use two instances of Uptime Kuma (one local, and one on fly.io) to monitor all of those with keywords.

    So -

    • Weekly: 10 minutes to run the update playbook, and I usually ssh into the VPSs, have a look at the Fail2Ban stats and reboot them if needed. I also look at each of the Proxmox GUIs to check the backups have been working as expected.
    • Monthly: stop the local prod machine and switch to the prod2 machine (from backups) for a few days. Probably 30 minutes each way, most of it waiting for backups.
    • From time to time (if I hear of a security update), but generally every three months: Look through my container versions and see if I want to update them. They’re on docker compose so the steps are just backup the LXC, docker down, pull, up - probs 5 minutes per container.
    • Yearly: consider whether I need to do operating system upgrades - e.g. to Proxmox 8, or a new Debian or Ubuntu LTS
    • Yearly: visit the remotes and have a proper check/clean up/updates

  • I routinely run my homelab services as a single Docker inside an LXC - they are quicker, and it makes backups and moving them around trivial. However, while you’re learning, a VM (with something conventional like Debian or Ubuntu) is probably advisable - it’s a more common experience so you’ll get more helpful advice when you ask a question like this.



  • For light touch monitoring this is my approach too. I have one instance in my network, and another on fly.io for the VPSs (my most common outage is my home internet). To make it a tiny bit stronger, I wrote a Go endpoint that exposes the disk and memory usage of a server, along with mem_okay and disk_okay keywords, and I have Kuma checking those.

    I even have the two Kuma instances checking each other by making a status page and adding checks for each other’s ‘degraded’ state. I have ntfy set up on both so I get the Kuma change notifications on my iPhone. I love ntfy so much I donate to it.

    For my VPSs, this is probably not enough, so I am considering the more complicated solutions (I’ve started wanting to know about things like an influx of fail2ban bans etc.)




  • Yo dawg, I put most of my services in a Docker container inside their own LXC container. It used to bug me that this seems like a less than optimal use of resources, but I love the management - all the VMs and containers on one pane of glass, super simple snapshots, dead easy to move a service between machines, and simple to instrument the LXC for monitoring.

    I see other people running, and I’m interested in, an even more generic system (maybe Cockpit or something), but I’ve been really happy with this. If OP’s dream is managing all the containers and VMs together, I’d back having a look at Proxmox.


  • This is where I landed on this decision. I run a Synology which just does NAS on spinning rust and I don’t mess with it. Since you know rsync this will all be a painless setup apart from the upfront cost. I’d trust any 2 bay Synology less than 10 years old (I think the last two digits in the model number are the year), then if your budget is tight, grab a couple of 2nd hand disks from different batches (or three if your budget stretches to it).

    I also endorse u/originalucifer’s comment about a real machine. Thin clients like the HP minis or Lenovos are a great step up.




  • I’ve just been down this exact journey, and ended up settling on Kavita. It has all the browse, search and library stuff you’d expect. You can download or read things in the web interface. I’m only using it for epub and PDF books, but its focus is comics and manga so I expect it to shine there.

    I don’t think it does mobi, but since I use Calibre on my laptop to neaten up covers and metadata before I drop books on to the server it’s a simple matter to convert the odd mobi I end up with. Installation (using docker inside an LXC) was simple.

    It’s been a really straightforward, good experience. Highly recommend. I like it better than AudioBookshelf (which I’m already hosting for audio books) which I also tried, but didn’t like as much for inexplicable reasons. I also considered Calibre-Web, but that seemed a bit messy since I guess I’d use Calibre on my laptop to manage my books on a NAS share then serve it headless from the server with Calibre-Web? I might have that completely wrong, I didn’t spend any time looking into it because Kavita was the second thing I tried and it did exactly what I wanted.





  • Your head might be spinning from all the different advice you’re getting - don’t worry, there are a lot of options and lots of folk are jumping in with genuinely good (and well meaning) advice. I guess I’ll add my two cents, but try and explain the ‘why’ of my thinking.

    I’m assuming from your questions you know your way around a computer, can figure things out, but haven’t done much self-hosting. If I’m wrong about that, go ahead and skip this suggestion.

    • Jellyfin good - a common gateway drug to homelabbing, and the only thing you’ll do that non-tech friends will appreciate
    • Proxmox good - it makes the backups simple and provides a path forward for all sorts of things
    • Docker good - you’ve said it increases complexity; this is correct in that you’re adding more layers of stuff, but it reduces your complexity of management by removing a heap of dependency issues. There is a compute and memory overhead involved, but it’s small and the tradeoff is worth it.
    • VM good - yes an LXC is more efficient, but it’s harder to run docker in. Save that for a future project
    • Media data somewhere else good - I run a separate NAS with an SMB share. A NAS in a VM is a compromise, but like all things self hosting, you start out with what you’ve got. I let Jellyfin keep the metadata in the VM that’s hosting my Jellyfin though, since the NAS is over the network. That’s less of a consideration if you are virtualizing your NAS on the same machine, but I’d still do it my way for future proofing.
    • Passthrough magic not yet - this can be a future project. If your metal has Quick Sync, it can be used to reduce the CPU load, but that can wait too.