How old are the disks in your NAS?

blackstrat@lemmy.fwgx.uk · 1 day ago

How old are the disks in your NAS?

TechnicallyColors@lemm.ee · 1 day ago

“Cattle not pets” in this instance means you have a specific plan for the random death of a HDD (which RAIDZ2 basically already handles), and because of that you can work your HDDs until they are completely dead. If your NAS is a “pet” then your strategy is more along the lines of taking extra-good care of your system (e.g. rotating HDDs out when you think they’re getting too old, not putting too much stress on them) and praying that nothing unexpected happens. I’d argue it’s not really “okay” to have pets just because you’re in a homelab, as you don’t really have to put too much effort into changing your setup to be more cynical instead of optimistic, and it can even save you money since you don’t need to worry about keeping things fresh and new.

“In the old way of doing things, we treat our servers like pets, for example Bob the mail server. If Bob goes down, it’s all hands on deck. The CEO can’t get his email and it’s the end of the world. In the new way, servers are numbered, like cattle in a herd. For example, www001 to www100. When one server goes down, it’s taken out back, shot, and replaced on the line.”

~from https://cloudscaling.com/blog/cloud-computing/the-history-of-pets-vs-cattle/

taiidan@slrpnk.net · 6 hours ago

I get that. But I think the quote refers to corporate infrastructure. In the case of a mail server, you would have automated backup servers that kick-in and you would simply pull the rack of the failed mail server.

Replacing drives based on SMART messages (pets) means you can do the replacement on your time and make sure you can do resilvering or whatever on your schedule. I think that is less burdensome than having a drive fail when you’re quite busy and being stressed about having the system is running in a degraded state until you have time to replace the drive.

TechnicallyColors@lemm.ee · edit-2 5 hours ago

I don’t think ‘cattle not pets’ is all that corporate, especially w/r/t death of the author. For me, it’s more about making sure that failure modes have (rehearsed) plans of action, and being cognizant of any manual/unreplicable “hand-feeding” that you’re doing. Random and unexpected hardware death should be part of your system’s lifecycle, and not something to spend time worrying about. This is also basically how ZFS was designed from a core level, with its immense distrust for hardware allowing you to connect whatever junky parts you want and letting ZFS catch drives that are lying/dying. In the original example, uptime seems to be an emphasized tenet, but I don’t think it’s the most important part.

RE replacements on scheduled time, that might be true for RAIDZ1, but IMO a big selling point of RAIDZ2 is that you’re not in a huge rush to get resilvering done. I keep a cold drive around anyway.

SayCyberOnceMore@feddit.uk · 19 hours ago

Yep, numbering’s the key.

When you create NAS01, you know there’s going to be a NAS02 one day

Talaraine@fedia.io · 1 day ago

When one server goes down, it’s taken out back, shot, and replaced on the line.

And then Skynet remembers…