18/11/2017 - The Backup Failure

Posted on Sat 18 November 2017 in Virtualization

Well that's all over now, sort of.

I have a week off work, and it's traditional to spend that time working on home PC projects.

Last holiday I added a 4TB drive and accidentally got GPU passthrough working. Result. Since doing so, I've noticed a few performance issues and have come to quite a worrying realisation.

My brand new super awesome virtualized desktop is sitting on the same hard drive as all my other VMs and, most worryingly, my RDM files. When my desktop boots it's sluggish, and when MKD kicks into full swing everything slows right down. I'm worried that the desktop, hammering the disk all day, will eventually wear out the quite old 1TB DS01 on which everything currently resides. Everything could fuck up. I need to fix this.

I have two 4TB drives to add to MKV32 and a spare SSD I want to experiment with. (Can I RDM a desktop to one?) A little bit of background: I have a failing disk, have just moved a bunch of stuff around at home and don’t want to have to go through another upgrade process for a while. I have just over a terabyte free so can ride for a while, but I really do need to get the remaining threes out and some fours in. I also use DrivePool.

DrivePool is supposed to store files on ordinary NTFS partitions so that if things go wrong, unlike RAID, you can simply hot-plug the disk elsewhere and read it. I have used this function before.

Full disclaimer: where I say DrivePool fucked up, I fucked up. DrivePool and Scanner worked exactly as they should have done. I failed to take action. I failed to follow proper procedure. I fucked up. I've used DrivePool for years and it’s a fantastic product. As is DriveScanner. Buy them.

My DrivePool fucked up. I had a bad disk. It'd been flagging for a while. I'm 99% sure a disk check would have fixed things. As I had two brand new 4TBs in hand, I decided to remove the failing 3TB ASAP and get a 4 in with replication going. Knowing full well I had cold storage backups, I clicked remove. A few hours later, it's 1am now, it failed. Fuck it. Bed time. Woke up, Googled drive removal: I'm doing things correctly. Click remove again. A few hours later it errors. So I reboot. MK8 won't start up again: missing a disk. The failing one. Edit settings: disk missing. PuTTY into MKV32: the disk and its RDM file are nowhere to be seen. Take the drive out, USB to desktop: NTFS intact, all files are OK. Great. 4TB in, replication starts.

As it's going I notice a few things missing. Gaps in sequences. I check: some files are missing, like 3/10 from each folder. I check the pulled drive. They are included in the .parts file. Start to copy back over, the Explorer window flashes, 90% of the data disappears! Disk still showing as nearly full? CHKDSK corrects the fullness issue; however, the drive is, in fact, nearly empty. The failing disk has failed. Fuck.
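Spotting gaps in sequences by eye is error-prone, so here is a minimal Python sketch of how it could be automated. It assumes the files in a folder are numbered and that the first run of digits in each filename is the sequence number; both are assumptions about the naming, and `sequence_gaps` is a hypothetical helper name.

```python
import re


def sequence_gaps(filenames: list[str]) -> list[int]:
    """Return the numbers missing between the lowest and highest number seen.

    Assumption: the first run of digits in each filename is its sequence number.
    """
    seen = set()
    for name in filenames:
        match = re.search(r"\d+", name)
        if match:
            seen.add(int(match.group()))
    if not seen:
        return []  # no numbered files, so nothing to compare
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]
```

Feed it the names from one folder, for example `sequence_gaps([p.name for p in Path(folder).iterdir()])`. It can only see holes in the middle: a sequence that has lost its first or last files will look complete.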

So I'm easily able to copy back from cold storage. I break out the disks. Start to copy over. 3-2-1, drive 1 of 7 fails. Double fuck. A few pulls and tests later. Yup. Gone. This happened to another disk. I lost two of seven disks. TWO of SEVEN BACKUP DISKS.

My storage has failed. My backups have failed (in part). Fuck fuck fuck.

After restoration efforts from cold storage, if you break the backups into three sets, sets 2 and 3 are completely unaffected and all recovered. Set 1, however, is missing somewhere between 250 and 350 files. I can't be assed to count but thankfully, I have a third, last-line, text-based backup I'm able to refer to. Awful but functional.
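A minimal sketch of how that count could be done without doing it by hand, in Python. It assumes the text-based backup is (or can be turned into) a plain list of relative file paths, one per line; that format is an assumption, and `missing_files` is a hypothetical helper name.

```python
from pathlib import Path


def missing_files(listing_text: str, restored_root: Path) -> list[str]:
    """Return paths named in the listing that do not exist under restored_root.

    Assumption: the listing holds one relative file path per line.
    """
    missing = []
    for line in listing_text.splitlines():
        rel = line.strip().replace("\\", "/")  # accept Windows-style separators
        if not rel:
            continue  # skip blank lines
        if not restored_root.joinpath(*rel.split("/")).is_file():
            missing.append(rel)
    return missing
```

Calling `len(missing_files(Path("listing.txt").read_text(), Path("restored")))` would give the exact number, with both paths being placeholders. It only checks that each file exists, not that its contents survived intact.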

I'm able to restore those files so really nothing has been deleted forever. Just painfully annoying that MY FUCKING BACKUP PLAN FUCKING FAILED. COLD FUCKING STORAGE FUCKING FAILED, within a year! Kicker.

I now have to re-address my backups and sort out the performance issues, which could lead to more data loss! I need at least another week...