I have a ZFS RAIDZ2 array made of 6x 2TB disks with power-on hours between 40,000 and 70,000. This is used just for data storage of photos and videos, not OS drives. Part of me is a bit concerned at those hours, considering they’re a right old mix of desktop drives and old WD Reds. I keep them on 24/7 so they’re not too stressed in terms of power cycles, but they have in the past been through a few RAID5 rebuilds.
Considering swapping to 2x ‘refurbed’ 12TB enterprise drives and running ZFS RAIDZ1. So even though they’d have a decent amount of hours on them, they’d be better quality drives, and fewer disks means less chance of any one failing (I have good backups).
The next time one of my current drives dies, I don’t feel like staying with my current setup will be worth it, so I may as well change over now before it happens?
Also the 6x disks I have at the moment are really crammed into my case in a hideous way, so from an aesthetic POV (not that I can actually see inside the solid case in a rack in the garage), it’ll be nicer.
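To put rough numbers on the "fewer disks means less chance of any one failing" idea, here is a back-of-the-envelope sketch. The annual failure rates are placeholders I made up for illustration, not measurements, and the maths assumes disks fail independently and that a failed disk is not replaced during the year, so the pool-loss figure is a crude worst case.

```python
from math import comb

def usable_tb(disks: int, size_tb: float, parity: int) -> float:
    """Approximate usable capacity of one RAIDZ vdev, ignoring ZFS overhead."""
    return (disks - parity) * size_tb

def p_at_least(k: int, n: int, p: float) -> float:
    """Probability that at least k of n independent disks fail in a year."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# afr = assumed annual failure rate per disk (hypothetical values).
layouts = {
    "6x 2TB RAIDZ2": dict(disks=6, size_tb=2, parity=2, afr=0.05),
    "2x 12TB RAIDZ1": dict(disks=2, size_tb=12, parity=1, afr=0.02),
}

for name, cfg in layouts.items():
    capacity = usable_tb(cfg["disks"], cfg["size_tb"], cfg["parity"])
    # Chance of having to deal with at least one dead disk.
    any_failure = p_at_least(1, cfg["disks"], cfg["afr"])
    # Chance that more disks fail than the parity level can absorb.
    pool_loss = p_at_least(cfg["parity"] + 1, cfg["disks"], cfg["afr"])
    print(f"{name}: ~{capacity:.0f} TB usable, "
          f"P(any failure)={any_failure:.3f}, P(pool loss)={pool_loss:.5f}")
```

Swap in failure rates you actually trust for your drives; the comparison can flip depending on what you assume for old desktop disks versus refurbished enterprise ones.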
3:2:1 - Cattle not pets - If your data is backed up at multiple sites, the death of one site shouldn’t overwhelm you, and you should have time to recover.
If your primary site drives are getting above their designed lifetime, rotate them out, sure - but they could be used as part of the backup architecture elsewhere (like a live offsite sync location with enough tolerance for 2 disk failures to account for the age).
3 copies of your data; 2 types of media; 1 copy offsite.
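If it helps, the rule is simple enough to write down as a checklist. This is only a sketch: the `Copy` class and the example locations are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Copy:
    name: str
    media: str      # e.g. "hdd", "cloud", "tape"
    offsite: bool

def check_321(copies: list) -> dict:
    """Return which parts of the 3-2-1 rule a set of copies satisfies."""
    return {
        "3 copies": len(copies) >= 3,
        "2 media types": len({c.media for c in copies}) >= 2,
        "1 offsite": any(c.offsite for c in copies),
    }

# Hypothetical setup: the live pool, a local backup disk, and a cloud copy.
copies = [
    Copy("NAS01 pool", "hdd", offsite=False),
    Copy("USB backup disk", "hdd", offsite=False),
    Copy("cloud bucket", "cloud", offsite=True),
]

for rule, ok in check_321(copies).items():
    print(f"{rule}: {'ok' if ok else 'MISSING'}")
```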
I mean if it’s homelab, it’s ok to be pets. Not everything has to be commoditized for the whims of industry.
“Cattle not pets” in this instance means you have a specific plan for the random death of a HDD (which RAIDZ2 basically already handles), and because of that you can work your HDDs until they are completely dead. If your NAS is a “pet” then your strategy is more along the lines of taking extra-good care of your system (e.g. rotating HDDs out when you think they’re getting too old, not putting too much stress on them) and praying that nothing unexpected happens. I’d argue it’s not really “okay” to have pets just because you’re in a homelab, as you don’t really have to put too much effort into changing your setup to be more cynical instead of optimistic, and it can even save you money since you don’t need to worry about keeping things fresh and new.
“In the old way of doing things, we treat our servers like pets, for example Bob the mail server. If Bob goes down, it’s all hands on deck. The CEO can’t get his email and it’s the end of the world. In the new way, servers are numbered, like cattle in a herd. For example, www001 to www100. When one server goes down, it’s taken out back, shot, and replaced on the line.”
~from https://cloudscaling.com/blog/cloud-computing/the-history-of-pets-vs-cattle/
I get that. But I think the quote refers to corporate infrastructure. In the case of a mail server, you would have automated backup servers that kick in and you would simply pull the rack of the failed mail server.
Replacing drives based on SMART messages (pets) means you can do the replacement on your time and make sure you can do resilvering or whatever on your schedule. I think that is less burdensome than having a drive fail when you’re quite busy and being stressed about having the system running in a degraded state until you have time to replace the drive.
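A rough sketch of what "replace based on SMART" could look like as a script. It works on an already-parsed report shaped like the JSON that `smartctl --json -a` prints, as far as I understand that format (`power_on_time.hours` and `ata_smart_attributes.table`), so check the field names against your own output. The hour limit and the set of watched attributes are my own arbitrary choices, not any official guidance.

```python
# SMART attribute IDs commonly treated as early warning signs (my selection).
WATCHED_IDS = {
    5: "reallocated sectors",
    197: "pending sectors",
    198: "offline uncorrectable",
}

def reasons_to_replace(report: dict, max_hours: int = 60_000) -> list:
    """Return human-readable reasons to schedule a swap; empty list means none."""
    reasons = []
    hours = report.get("power_on_time", {}).get("hours", 0)
    if hours > max_hours:
        reasons.append(f"{hours} power-on hours exceeds limit of {max_hours}")
    for attr in report.get("ata_smart_attributes", {}).get("table", []):
        label = WATCHED_IDS.get(attr.get("id"))
        raw = attr.get("raw", {}).get("value", 0)
        # Any non-zero raw count on a watched attribute gets flagged.
        if label and raw > 0:
            reasons.append(f"{label}: raw value {raw}")
    return reasons

# Hand-written sample data, not output from a real drive.
sample = {
    "power_on_time": {"hours": 70_000},
    "ata_smart_attributes": {
        "table": [
            {"id": 5, "name": "Reallocated_Sector_Ct", "raw": {"value": 8}},
            {"id": 197, "name": "Current_Pending_Sector", "raw": {"value": 0}},
        ]
    },
}

for reason in reasons_to_replace(sample):
    print(reason)
```

On a real box you would feed it something like `json.loads` of the smartctl output for each disk and run it from cron, so the swap happens on your schedule rather than the drive's.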
I don’t think ‘cattle not pets’ is all that corporate, especially w/r/t death of the author. For me, it’s more about making sure that failure modes have (rehearsed) plans of action, and being cognizant of any manual/unreplicable “hand-feeding” that you’re doing. Random and unexpected hardware death should be part of your system’s lifecycle, and not something to spend time worrying about. This is also basically how ZFS was designed from a core level, with its immense distrust for hardware allowing you to connect whatever junky parts you want and letting ZFS catch drives that are lying/dying. In the original example, uptime seems to be an emphasized tenet, but I don’t think it’s the most important part.
RE replacements on scheduled time, that might be true for RAIDZ1, but IMO a big selling point of RAIDZ2 is that you’re not in a huge rush to get resilvering done. I keep a cold drive around anyway.
Yep, numbering’s the key.
When you create NAS01, you know there’s going to be a NAS02 one day
And then Skynet remembers…