I get that a lot of folks are correctly pointing out the need to back up data, but isn’t that a little bit of victim blaming? This isn’t a situation where the guy had a 10-year-old drive with all his photos and videos sitting around unbacked up. He had a new drive and it failed. Can we agree that brand new drives aren’t supposed to fail?
No.
Typical failure rates for pretty much all electronics, even mechanical stuff, form a “bathtub curve”: relatively many early failures, very few failures for a long time, and then a final rise in failures tending toward 100%.
That’s why you’re supposed to have a “burn-in” period for everything before you can trust it with any reasonable probability (still make backups), and to watch for it reaching end of life (make sure the backups actually work).
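To picture the shape, here’s a rough sketch (purely illustrative parameters, not measured from any real drives) that models the hazard rate as the sum of two Weibull terms, a falling one for infant mortality and a rising one for wear-out:

    # Illustrative bathtub curve: hazard rate modeled as the sum of two Weibull
    # hazards, one for infant mortality (shape < 1) and one for wear-out
    # (shape > 1). Parameters are made up for demonstration only.
    def weibull_hazard(t, shape, scale):
        """Weibull hazard rate h(t) = (shape/scale) * (t/scale)**(shape - 1)."""
        return (shape / scale) * (t / scale) ** (shape - 1)

    def bathtub_hazard(t_years):
        infant = weibull_hazard(t_years, shape=0.5, scale=2.0)    # falls fast early on
        wear_out = weibull_hazard(t_years, shape=5.0, scale=6.0)  # rises late in life
        return infant + wear_out

    for years in (0.1, 0.5, 1, 2, 3, 4, 5, 6, 7):
        print(f"{years:>4} yr: hazard ~ {bathtub_hazard(years):.3f} failures/yr")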
Indeed. An old EE mentor once told me that most component aging takes place in the first two weeks of operation. If it operates for two weeks, it will probably operate for a long, long time after that. When you’re burning in a piece of gear, it helps the testing process if you put it in a high-temperature environment as well (within reason) to place more stress on the components.
The high temperature part is kind of a trap with SSDs: flash memory is easier to write (less likely to error out) at temperatures above 50C, so if you run a write heavy application at higher temperature, it’s less likely to fail than if it was kept colder.
Properly stress testing an SSD would mean writing to it while cold (below 20C) and checking for read errors while hot (above 60C).
For normal use you’d want the opposite: write hot, read cold.
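If you wanted to script that kind of burn-in pass yourself, a minimal sketch might look like this (write/verify logic only; temperature control has to happen externally, and TEST_PATH is just a placeholder for a scratch file on the drive under test):

    # Minimal write-then-verify burn-in sketch. TEST_PATH is a placeholder for a
    # scratch file on the SSD under test; temperature control (write cold, read
    # hot) has to be handled externally, e.g. in a thermal chamber.
    import os
    import random

    TEST_PATH = "/mnt/ssd_under_test/burnin.bin"  # placeholder path on the SSD
    BLOCK_SIZE = 1024 * 1024                      # 1 MiB blocks
    NUM_BLOCKS = 1024                             # ~1 GiB of test data

    def block_data(index: int) -> bytes:
        """Deterministic pseudo-random block so the verify pass can regenerate it."""
        return random.Random(index).randbytes(BLOCK_SIZE)

    def write_pass() -> None:
        with open(TEST_PATH, "wb") as f:
            for i in range(NUM_BLOCKS):
                f.write(block_data(i))
            f.flush()
            os.fsync(f.fileno())  # make sure the data actually reaches the device

    def verify_pass() -> int:
        errors = 0
        with open(TEST_PATH, "rb") as f:
            for i in range(NUM_BLOCKS):
                if f.read(BLOCK_SIZE) != block_data(i):
                    errors += 1
        return errors

    if __name__ == "__main__":
        write_pass()         # ideally while the drive is cold
        bad = verify_pass()  # ideally while the drive is hot
        print(f"{bad} mismatched blocks out of {NUM_BLOCKS}")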
That’s absolutely true in the physical sense, but in the “commercial”/practical sense, most respectable companies’ QA process would shave off a large part of that first bathtub slope through testing and good quality practices. Not everything off of the assembly line is meant to make it into a boxed up product.
Apparently even respectable companies are finding out that it’s cheaper to skimp on QA and just ship a replacement item when a customer complains. Particularly when it’s small items that aren’t too expensive to ship, but some are doing it even with full blown HDDs.
They should at least try to recover the data. Maybe a data recovery program like SpinRite would just do it: https://www.grc.com/sr/spinrite.htm
Not running RAID, not backing up, and not even trying the simplest recovery approaches is just sloppy and lazy. Do at least one of the three.
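And on the “make sure the backups actually work” part, a bare-bones verification sketch could look like this (SOURCE and BACKUP are placeholder paths; it just hashes both trees and flags anything missing or different):

    # Bare-bones backup verification: hash every file in the source tree and its
    # counterpart in the backup tree, and report anything missing or mismatched.
    # SOURCE and BACKUP are placeholder paths.
    import hashlib
    from pathlib import Path

    SOURCE = Path("/data/photos")        # placeholder: the live copy
    BACKUP = Path("/mnt/backup/photos")  # placeholder: the backup copy

    def file_hash(path: Path) -> str:
        h = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def verify(source: Path, backup: Path) -> None:
        problems = 0
        for src_file in source.rglob("*"):
            if not src_file.is_file():
                continue
            rel = src_file.relative_to(source)
            dst_file = backup / rel
            if not dst_file.is_file():
                print(f"MISSING   {rel}")
                problems += 1
            elif file_hash(src_file) != file_hash(dst_file):
                print(f"MISMATCH  {rel}")
                problems += 1
        print(f"{problems} problem(s) found")

    if __name__ == "__main__":
        verify(SOURCE, BACKUP)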
Like someone else said: expect the biggest risk of failure right when you buy it, then rising failure rates again maybe 5 years out. Refreshing the disk pattern as it gets older can help too.
All of this misses the point. This is the second drive that failed; it was the replacement for an earlier drive that failed.
That’s what the article is all about: a high, unexpected, and unreasonable failure rate.
I had a high failure rate with some Seagate drives in the early 00s. Switched vendors and never had the problem again.
We also do not know how they failed. Are they still image-readable with ddrescue or SpinRite, for example, or are they truly dead? It is not clear if they even tried.
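Before even reaching for ddrescue or SpinRite, a quick read-only scan can tell you whether the drive is image-readable at all. Something like this sketch (DEVICE is a placeholder; actual imaging should still be done with ddrescue so bad regions get mapped and retried):

    # Quick read-only scan to see whether a "dead" drive is still image-readable:
    # read the raw device sequentially and count regions that error out. DEVICE
    # is a placeholder; run it against the right device, ideally as root.
    import os

    DEVICE = "/dev/sdX"   # placeholder for the failing drive
    CHUNK = 1024 * 1024   # read 1 MiB at a time

    def scan(device: str) -> None:
        good = bad = 0
        fd = os.open(device, os.O_RDONLY)
        try:
            offset = 0
            while True:
                try:
                    data = os.pread(fd, CHUNK, offset)
                except OSError:
                    bad += 1
                    offset += CHUNK  # skip past the unreadable region
                    continue
                if not data:         # end of device
                    break
                good += 1
                offset += len(data)
        finally:
            os.close(fd)
        print(f"readable chunks: {good}, unreadable chunks: {bad}")

    if __name__ == "__main__":
        scan(DEVICE)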
Just pay triple! Don’t be a poor!
Such great advice.
You can be mad about it, but what they said is largely true. Not having your data backed up somewhere and expecting everything to be perfectly fine forever is like not having copies of old photos somewhere and expecting them to be perfectly fine forever.
It’s even more egregious here because if OP can afford a 3TB SSD, they should be able to afford a 3+TB HDD as a backup, no problem. The money isn’t the issue for OP, just improper knowledge of how to handle data storage. It isn’t necessarily their fault this happened, since the average person isn’t given this info, but at its core, “pay more money, because you need backups” is the only true answer.