Unfortunately it seems that Hetzner only checks smartctl self-test errors and not the individual attribute values. Yesterday the Raw_Read_Error_Rate value was 78, today it's already 70. Basically you have to wait until the HDD fails; at that point they will replace it within a few minutes. At this rate, roughly 10 points per day, we should have 4-5 days of autonomy.
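For anyone who wants to redo that estimate with their own numbers, here is a rough sketch. It assumes the normalized value keeps dropping at a constant rate, which is a simplification. The function name is made up for illustration, and the threshold of 25 is only a placeholder: the real one is in the THRESH column of `smartctl -A` for the drive in question.

```python
def days_until_threshold(current, threshold, drop_per_day):
    """Days until a normalized SMART value reaches its failure threshold,
    assuming it keeps dropping at a constant rate (a simplification)."""
    if drop_per_day <= 0:
        raise ValueError("drop_per_day must be positive")
    # Clamp at zero so a value already at or below the threshold gives 0 days.
    return max(current - threshold, 0) / drop_per_day


# 70 is today's value from the post and 10 is the rough daily drop;
# the threshold of 25 is a hypothetical placeholder, not a real reading.
print(days_until_threshold(70, 25, 10))  # 4.5
```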
webwit wrote: ↑
I am just talking out of my ass here, since the last time I got deep into drive technology was Amiga floppy disks. If I remember correctly, such a disk was divided into a bunch of tracks, which were divided into a bunch of sectors. Each sector had a checksum. So when you read data from a sector and compared it with the checksum, you knew whether the data was healthy or corrupt. I presume technology hasn't deteriorated and modern HDDs and SSDs also checksum or otherwise validate parts, so in a RAID 1 setup you know which disk has the right data and which has the broken data?
RAID is not a backup system; it's just a way to have some redundancy (or a nice way to add disk space to an array).
Without RAID, yesterday's failure would probably have left us with a dead server. So hurrah for us! But if it worked the way you describe, we wouldn't have corrupted data: the good bits should have been synced from the healthy HDD, yet we had data loss anyway. RAID 1 is fine and dandy, but it doesn't save you from data loss. In fact, since the failure rate of an HDD is around 1.5-3%, having two HDDs roughly doubles our chances of ending up with a broken one. In a sense, having just one new HDD is better than having two old ones... but Hetzner uses hard drives that have been running non-stop for ages, so RAID makes sense even with just two drives.
But if data loss is your concern, backups are the only solution.
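To put numbers on the "two drives double the chances" point, here is a quick back-of-the-envelope sketch. It assumes the two drives fail independently with the same probability p over whatever period the 1.5-3% figure covers. That is an assumption, not something measured on this server, and the function name is just for illustration.

```python
def mirror_failure_odds(p):
    """For two independent drives that each fail with probability p,
    return (chance at least one fails, chance both fail)."""
    at_least_one = 1 - (1 - p) ** 2  # close to 2 * p when p is small
    both = p ** 2                    # the case where the mirror itself is lost
    return at_least_one, both


# The two ends of the 1.5-3% range quoted in the post.
for p in (0.015, 0.03):
    one, both = mirror_failure_odds(p)
    print(f"p={p:.1%}: at least one fails {one:.2%}, both fail {both:.3%}")
```

So the odds of having to swap a drive go up to roughly 3-6%, while the odds of losing both drives at once stay well under 0.1%. That only covers whole-drive failures, though; it says nothing about the kind of silent corruption we ran into.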