Server crash. UPDATE: New server ordered

webwit wrote:I am not, it's just how the hoster set it up.

Maybe get in touch with their support. Since we're on the default config maybe they can do the mdadm magic as well when replacing that hdd!
My experience with Hetzner support has been very good so far.
Wodan
ISO Advocate
19 Jul 2017, 19:16

User avatar
X
Wodan
ISO Advocate
 
Posts: 3238
Joined: 23 Nov 2015, 20:43
Location: ISO-DE
Main keyboard: Intense Rotation!!!
Main mouse: Logitech G903
Favorite switch: ALL OF THEM
DT Pro Member: -
 
Ok, I sent them a support request explaining the issues.
webwit
Wild Duck
19 Jul 2017, 19:41

User avatar
X
webwit
Wild Duck
 
Posts: 10313
Joined: 27 Jan 2011, 23:27
Location: The Netherlands
Main keyboard: HHKB Pro II
Favorite switch: IBM beam spring
DT Pro Member: 0000
 
 
OK, they are quick. What time is convenient? :lol:

Dear Client,

We would like to check the hard drives briefly from our side. Please tell us when we may turn off the server for approx 30-45 minutes in order to perform the test.

Kind regards

xxxxxx
webwit
Wild Duck
19 Jul 2017, 19:48

I told them you are all just silly people so any time is convenient, better sooner than later, but not at night, so I can check after reboot if everything is running well, and they should let me know one hour in advance, so I can announce the downtime.
webwit
Wild Duck
19 Jul 2017, 20:00

Thanks for taking care of this. Really hoping Hetzner doesn't drop the ball!
Wodan
ISO Advocate
19 Jul 2017, 20:06

The server and thus deskthority will be down from 22:25 UTC July 19th (00:25 CEST July 20th, 18:25 EDT July 19th, 15:25 PDT July 19th) for an estimated 30 to 45 minutes, for a health check of our hard drives. This is 2 hours and 15 minutes from now. See you on the other side of the event horizon!

P.S. I just completed another off-site backup, just in case.
webwit
Wild Duck
19 Jul 2017, 21:11

matt3o wrote:and that's where raid on just 2 drives is a little pointless since not always the machine can tell which data is actually bad and which one is good (50-50).

I am just talking out of my ass here, since the last time I got deep into drive technology was Amiga floppy disks. If I remember correctly, such a disk was divided in a bunch of tracks, which were divided in a bunch of sectors. Each sector had a checksum. So when you read data from the sector, and then compared with the checksum, you knew if the data was healthy or corrupt. I presume technology hasn't deteriorated and modern HDD and SSD also checksum or otherwise validate parts, so in a raid 1 setup you know which disk has the right data and which the broken?
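To illustrate the idea in that question: if a known-good checksum for a block of data is stored somewhere, it can tell you which of two mirrored copies is still intact. Below is a minimal sketch using the POSIX `cksum` tool, with two plain files standing in for the two disks. The `pick_good_copy` helper and the file arguments are hypothetical, and this is not how mdadm RAID1 works internally: as far as I know, md RAID1 keeps no such checksums itself and relies on the drive's own per-sector error detection or on a checksumming filesystem.

```shell
#!/bin/sh
# Hypothetical helper: given an expected checksum and two copies of the same
# data, print the path of the first copy that still matches, or "none".
pick_good_copy() {
    expected=$1; copy_a=$2; copy_b=$3
    # cksum prints "<crc> <size>"; keep only the CRC field.
    sum_a=$(cksum < "$copy_a" | awk '{print $1}')
    sum_b=$(cksum < "$copy_b" | awk '{print $1}')
    if [ "$sum_a" = "$expected" ]; then
        echo "$copy_a"
    elif [ "$sum_b" = "$expected" ]; then
        echo "$copy_b"
    else
        echo "none"
    fi
}
# Usage: pick_good_copy "$stored_crc" /path/to/copy_a /path/to/copy_b
```

Without the stored checksum there is nothing to compare against, which is the 50-50 situation matt3o describes.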
webwit
Wild Duck
19 Jul 2017, 22:22

Any update?

And yeah, RAID1 would be pretty pointless if the bad drive couldn't be told apart from the good drive.
Wodan
ISO Advocate
20 Jul 2017, 06:16

Yeah, they took 2 hours 15 minutes instead of 30-45 minutes, and then told me this:
Dear Client

Both hard drives are fine. We have started your server back into your installed system. But note there is currently a rebuild of one device running.

Kind regards
xxxxxxx

However, I just checked smartctl -a again, and the numbers seem significantly worse than yesterday.
Code:
root@server [~]# while true; do smartctl -a /dev/sdb |grep Raw_Read_Error_Rate; sleep 300; done
  1 Raw_Read_Error_Rate     0x000f   070   063   044    Pre-fail  Always       -       12163138
webwit
Wild Duck
20 Jul 2017, 06:43

Aw rats. Did they take note of the SMART readouts?

Maybe we should get our shit together and move to a new server. I've really learned to appreciate AWS lately. Depending on the performance we need it might even be cheaper than a small Hetzner root EX server.
Wodan
ISO Advocate
20 Jul 2017, 06:53

I did send them yesterday's readouts. It's probably best to keep it simple right now and just hop to another Hetzner server. Not the right time for a bigger move.
webwit
Wild Duck
20 Jul 2017, 07:16

Unfortunately it seems that Hetzner only checks smartctl self-test errors and not the individual values. Yesterday the Raw_Read_Error_Rate value was 78, today it's 70 already. Basically you have to wait until the HDD fails; at that point they will change it in a few minutes. At this rate, at 10 points per day, we should have 4-5 days of autonomy.
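A rough check of that estimate, as a sketch only. It assumes the limit is the 044 threshold shown in the smartctl output and that the normalized value keeps dropping linearly, which it may well not do. `days_to_threshold` is a hypothetical helper, not something from smartmontools. Under those assumptions the window comes out at roughly 3 days rather than 4-5, so treat either figure as a ballpark.

```shell
#!/bin/sh
# days_to_threshold CURRENT THRESH DROP_PER_DAY
# Prints the estimated number of days until the normalized value reaches the
# threshold, assuming a constant (linear) decline.
days_to_threshold() {
    awk -v cur="$1" -v thr="$2" -v rate="$3" 'BEGIN {
        if (rate <= 0) { print "inf"; exit }   # not dropping: no estimate
        printf "%.1f\n", (cur - thr) / rate
    }'
}

days_to_threshold 70 44 8    # observed drop: 78 -> 70 in one day
days_to_threshold 70 44 10   # the rounder 10 points per day
```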
webwit wrote:I am just talking out of my ass here, since the last time I got deep into drive technology was Amiga floppy disks. If I remember correctly, such a disk was divided in a bunch of tracks, which were divided in a bunch of sectors. Each sector had a checksum. So when you read data from the sector, and then compared with the checksum, you knew if the data was healthy or corrupt. I presume technology hasn't deteriorated and modern HDD and SSD also checksum or otherwise validate parts, so in a raid 1 setup you know which disk has the right data and which the broken?

RAID is not a backup system, it's just a way to have some redundancy (or a nice way to be able to add disk space to an array).

Without RAID, after yesterday's failure we would probably have a dead server. So hurrah for us! But if it worked the way you are saying we wouldn't have corrupted data: the good bits would have been synced from the healthy HDD, yet we had data loss anyway. RAID1 is fine and dandy, but it doesn't save you from data loss. Actually, since the failure rate of an HDD is around 1.5-3%, having 2 HDDs doubles our chances of a broken HDD. In a sense having just 1 new HDD is better than having 2 old ones... but Hetzner uses hard drives that have been running non-stop for ages, so RAID even with just two drives makes sense.
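The arithmetic behind the "double our chances" remark, as a sketch. With a per-drive failure probability p (the 1.5-3% above, over whatever period that figure covers), two drives nearly double the chance of seeing a failed drive, while the chance of losing both copies at once is p squared. This assumes the two drives fail independently, which is my assumption and may not hold for same-age drives in one chassis. `raid1_odds` is a hypothetical helper.

```shell
#!/bin/sh
# raid1_odds P
# P = assumed probability that a single drive fails in some period.
# Prints "P(at least one of two fails)  P(both fail)", assuming independence.
raid1_odds() {
    awk -v p="$1" 'BEGIN {
        printf "%.4f %.4f\n", 1 - (1 - p) * (1 - p), p * p
    }'
}

raid1_odds 0.03    # prints: 0.0591 0.0009
```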

But if data loss is your concern, backup is the only solution.
matt3o
-[°_°]-
20 Jul 2017, 07:26

User avatar
X
matt3o
-[°_°]-
 
Posts: 8657
Joined: 03 Sep 2012, 15:14
Location: Italy
Main keyboard: WhiteFox
Main mouse: Anywhere MX
Favorite switch: Anything, really
DT Pro Member: 0030
 
Weirdly the raw value is going up but the value is at 72 now.

Code:
root@server [~]# while true; do smartctl -a /dev/sdb |grep Raw_Read_Error_Rate; sleep 300; done
  1 Raw_Read_Error_Rate     0x000f   070   063   044    Pre-fail  Always       -       12163138
  1 Raw_Read_Error_Rate     0x000f   070   063   044    Pre-fail  Always       -       12518172
  1 Raw_Read_Error_Rate     0x000f   071   063   044    Pre-fail  Always       -       12762654
  1 Raw_Read_Error_Rate     0x000f   071   063   044    Pre-fail  Always       -       13082807
  1 Raw_Read_Error_Rate     0x000f   071   063   044    Pre-fail  Always       -       13765149
  1 Raw_Read_Error_Rate     0x000f   071   063   044    Pre-fail  Always       -       14005397
  1 Raw_Read_Error_Rate     0x000f   071   063   044    Pre-fail  Always       -       14182096
  1 Raw_Read_Error_Rate     0x000f   071   063   044    Pre-fail  Always       -       14432541
  1 Raw_Read_Error_Rate     0x000f   072   063   044    Pre-fail  Always       -       14697695
  1 Raw_Read_Error_Rate     0x000f   072   063   044    Pre-fail  Always       -       14840703
webwit
Wild Duck
20 Jul 2017, 07:31

Yeah, the values fluctuate. If you look at the WORST column (063 here), that is the worst value that has ever been registered, while THRESH (044) is the threshold that we should never reach.
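A small sketch for reading those columns out of the grep'd line. In `smartctl -A` output the normalized VALUE, WORST and THRESH are fields 4, 5 and 6 when the leading ID is counted as field 1. `smart_margin` is a hypothetical helper that prints them along with the remaining margin (VALUE minus THRESH).

```shell
#!/bin/sh
# smart_margin ATTRIBUTE_NAME
# Reads `smartctl -A` style lines on stdin and, for the named attribute,
# prints "VALUE WORST THRESH MARGIN" where MARGIN = VALUE - THRESH.
smart_margin() {
    awk -v name="$1" '$2 == name {
        printf "%d %d %d %d\n", $4, $5, $6, $4 - $6
    }'
}
# Usage on a live system (needs root):
#   smartctl -A /dev/sdb | smart_margin Raw_Read_Error_Rate
```

Fed the first line from the log above (070 063 044), it prints `70 63 44 26`.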
matt3o
-[°_°]-
20 Jul 2017, 07:39

This signature intentionally left blank...
I ordered a new server.
webwit
Wild Duck
20 Jul 2017, 08:10

check the hard drives before installing anything
matt3o
-[°_°]-
20 Jul 2017, 08:47

matt3o wrote:check the hard drives before installing anything

Very good point, they re-use servers and we should request a brand new one considering their policy with worn out hdds!
Wodan
ISO Advocate
20 Jul 2017, 09:14

Both sda and sdb on the new server have a fluctuating Raw_Read_Error_Rate, which after a few queries stabilizes at 080.
I'll run some longer tests.
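For reference, a sketch of how longer tests might be run with smartctl's built-in extended self-test. This is an assumption about what "longer tests" could mean, not a record of what was actually run. The function names and the `SMARTCTL` override variable are hypothetical; the `-t long`, `-H` and `-l selftest` options are standard smartctl options.

```shell
#!/bin/sh
# SMARTCTL can be pointed at a stub to preview the commands without
# touching any hardware; by default the real smartctl is used.
SMARTCTL=${SMARTCTL:-smartctl}

# Kick off the drive's extended (long) self-test; it runs in the background
# on the drive itself and can take hours on large disks.
start_long_test() {
    "$SMARTCTL" -t long "$1"
}

# Once the test has finished: overall health verdict plus the self-test log.
show_results() {
    "$SMARTCTL" -H "$1"
    "$SMARTCTL" -l selftest "$1"
}
# Usage (as root): start_long_test /dev/sda   ...later...   show_results /dev/sda
```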
webwit
Wild Duck
20 Jul 2017, 10:05

80 is fine if the raw value is more or less stable
matt3o
-[°_°]-
20 Jul 2017, 10:09

matt3o wrote:80 is fine if the raw value is more or less stable

Great work webwit and matt3o, I promise I'll refrain from posting memes excessively if that helps. :mrgreen:
seebart
Offtopicthority Instigator
20 Jul 2017, 10:38

User avatar
X
seebart
Offtopicthority Instigator
 
Posts: 11544
Joined: 22 Nov 2013, 20:20
Location: Germany
Main keyboard: Rotation
Main mouse: Steelseries Sensei
Favorite switch: IBM capacitive buckling spring
DT Pro Member: 0061
 
So, not SSD time, yet?

If not as the primary, I'd love to see an SSD being used in a write-through cache.
XMIT
[ XMIT ]
20 Jul 2017, 16:22

User avatar
X
XMIT
[ XMIT ]
 
Posts: 3484
Joined: 21 Dec 2014, 15:32
Location: Austin, TX area
Main keyboard: XMIT Hall Effect
Main mouse: CST L-Trac Trackball
Favorite switch: XMIT 60g Tactile Hall Effect
DT Pro Member: 0093
 
This one:
https://www.hetzner.de/dedicated-rootserver/ex41

When you order you can pick options such as extra SSD drive (cheapest one 250 GB 11,90 EUR per month), but the real question is, do we need it? In any case, that's a different discussion, priority is now to get a stable environment asap. I'm planning the move on Saturday or Sunday.
webwit
Wild Duck
20 Jul 2017, 18:21

What do you think about hitting up AWS and getting Elastic Beanstalk hosting for free? I think that's a possibility... then you wouldn't ever have to bother with server hardware.
tobsn
20 Jul 2017, 18:26

X
tobsn
 
Posts: 15
Joined: 21 Aug 2016, 20:50
Location: Poland
DT Pro Member: -
 
If you don't want to go SSD, at least get 15k SAS.
wobbled
20 Jul 2017, 18:27

User avatar
X
wobbled
 
Posts: 939
Joined: 03 Jul 2016, 02:59
Location: UK
Main keyboard: HHKB Hipro
Main mouse: Microsoft IMO 1.1A White
Favorite switch: Capacitive Buckling Spring
DT Pro Member: 0192
 
webwit wrote:This one:
https://www.hetzner.de/dedicated-rootserver/ex41

When you order you can pick options such as extra SSD drive (cheapest one 250 GB 11,90 EUR), but the real question is, do we need it? In any case, that's a different discussion, priority is now to get a stable environment asap. I'm planning the move on Saturday or Sunday.

Unless we are experiencing HDD performance bottlenecks I would prefer a good enterprise hdd over a ssd.
Most HDDs die slowly and give you time to react .. while some SSDs just stop working and there is no way to recover your data.

Maybe get weekly SMART reports from the server for an early watch :)
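A sketch of what such a weekly report could look like. The `smart_summary` helper, the choice of attributes, the script path and the mail setup in the crontab comment are all assumptions for illustration, not an existing setup on the server.

```shell
#!/bin/sh
# smart_summary: filter `smartctl -A` output (on stdin) down to a few
# attributes that are commonly watched; the selection is an assumption.
smart_summary() {
    awk '$2 == "Raw_Read_Error_Rate" ||
         $2 == "Reallocated_Sector_Ct" ||
         $2 == "Current_Pending_Sector" {
        printf "%s value=%d worst=%d thresh=%d raw=%s\n", $2, $4, $5, $6, $10
    }'
}
# In a report script, per drive (needs root):
#   smartctl -A /dev/sda | smart_summary
# Hypothetical weekly crontab entry, Mondays at 06:00:
#   0 6 * * 1  /usr/local/sbin/smart-report.sh | mail -s "SMART report" root
```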
Wodan
ISO Advocate
20 Jul 2017, 22:20

Wodan wrote:
webwit wrote:This one:
https://www.hetzner.de/dedicated-rootserver/ex41

When you order you can pick options such as extra SSD drive (cheapest one 250 GB 11,90 EUR), but the real question is, do we need it? In any case, that's a different discussion, priority is now to get a stable environment asap. I'm planning the move on Saturday or Sunday.

Unless we are experiencing HDD performance bottlenecks I would prefer a good enterprise hdd over a ssd.
Most HDDs die slowly and give you time to react .. while some SSDs just stop working and there is no way to recover your data.

Maybe get weekly SMART reports from the server for an early watch :)

Losing your data to something like SSD failure as opposed to catching and replacing a failing HDD is kind of irrelevant IMO, because SSD failure is much less common than HDD failure, by like an order of magnitude, and I find that early warning measures for HDD failure generally aren't as reliable as one would hope. HDD failure can be just as sudden and unexpected as SSD failure.

Not to mention if you don't have some sort of redundancy/backup you might as well just delete everything manually.

And while I'm here, friendly reminder that raid is not a backup.

Of course, not that this entire conversation matters...because deskthority is actually pretty fast and doesn't even need SSDs at all lol.
Norman_
20 Jul 2017, 22:56

User avatar
X
Norman_
 
Posts: 167
Joined: 31 May 2015, 22:48
Location: New Jersey
Main keyboard: RedScarf II+ (RS78)
Main mouse: Zowie FK2
Favorite switch: Anything Alps
DT Pro Member: -
 