Gentoo Forums

mdraid5 recovery

eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 7072
Location: almost Mile High in the USA

Posted: Wed Aug 09, 2017 8:40 am    Post subject: mdraid5 recovery

Well, I had an MD RAID5 go from degraded to dead. It was a 4-disk array that dropped to 3-disk degraded, and then another disk failed.

It looks like that disk just developed a bad sector, and it still appears to be somewhat usable rather than completely dead.

So I was able to --force assemble the RAID, but when I tried to add a replacement disk to the array it started to resync, and as soon as the resync hit the bad sector, md kicked the sick disk out and the RAID died with it.
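
For reference, this is roughly what I did (device names here are examples, not my actual ones):

Code:
mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1   # force-assemble from the three remaining members
mdadm --add /dev/md0 /dev/sdd1                                    # add a replacement; the resync dies at the bad sector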

What I wonder: is it possible to stop md from kicking the sick disk, since losing it kills the already-degraded array? What I was thinking at this point is that I could just verify all the files on the array, see which files landed on the bad sectors, and restore only those files from backup.

However, as it stands, the first bad read gets the drive kicked... any suggestions for dealing with this?

I suppose what's lost is lost at this point; I've lost four days of changes since the restored backup, which isn't too bad. At this point I'm just curious what I can do with the failed RAID member. Perhaps the bad sectors hold noncritical or old files that haven't changed in a while, in which case restoring the four-day-old backup would be as if nothing ever happened.

Ideas?
_________________
Intel Core i7 2700K@ 4.1GHz/HD3000 graphics/8GB DDR3/180GB SSD
What am I supposed watching?
Telemin
l33t


Joined: 25 Aug 2005
Posts: 734
Location: Glasgow, UK

Posted: Wed Aug 09, 2017 11:19 am

Yes, you can do this by forcing the reallocation of the bad sector and then telling mdraid to continue. I have done it once before following this approach: http://www.sj-vs.net/forcing-a-hard-disk-to-reallocate-bad-sectors/ It does work, but in my case the disk had too many bad blocks, so in the end I just let the RAID die and restored from backup.

With hindsight, I might have been able to use badblocks' non-destructive mode to generate a list of sectors to reallocate. You could try that if there is more than one bad block on the drive.
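
The gist of that approach, roughly from memory (sector number and device are placeholders):

Code:
hdparm --read-sector 1234567 /dev/sdX                                 # confirm the sector really is unreadable
hdparm --write-sector 1234567 --yes-i-know-what-i-am-doing /dev/sdX   # zero it, forcing the drive to reallocate
badblocks -nsv -b 512 -o bad-sectors.txt /dev/sdX                     # non-destructive scan (device must not be in use) listing further bad sectors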

-Telemin-
_________________
The Geek formerly known as -Freestyling-
When you feel your problem has been solved please add [Solved] to the topic title.
Please adopt an unanswered post
frostschutz
Advocate


Joined: 22 Feb 2005
Posts: 2968
Location: Germany

Posted: Wed Aug 09, 2017 12:48 pm

ddrescue-copy the bad disk to a good one and then cross your fingers

you might have to disable the bad block list the raid itself keeps (--update=force-no-bbl), or md will keep reporting read errors even if the drive is now good
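
roughly like this (device names made up):

Code:
ddrescue -f /dev/sdX /dev/sdY rescue.map      # sdX = failing member, sdY = its replacement; the map records what couldn't be read
mdadm --stop /dev/md0
mdadm --assemble --force --update=force-no-bbl /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdY1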
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 42850
Location: 56N 3W

Posted: Wed Aug 09, 2017 1:50 pm

eccerr0r,

I've done the ddrescue thing after I had two drives drop out of a raid5 15 min apart.

I ddrescued the newest dropout and put the raid back together. I lost one 4k block somewhere in my DVD collection, so I was lucky.
If it had been a piece of a directory or filesystem metadata, the damage would have been much worse.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 7072
Location: almost Mile High in the USA

Posted: Wed Aug 09, 2017 1:52 pm

Well, that's what I was trying to avoid :) I was thinking I could just force-remap the bad sectors, but that would make reads succeed. Then I wouldn't know which sectors are bad when I subsequently test-read the files; they would come back as silently corrupted "good" data instead of reporting "bad sector".

I'm most worried because I have mdraid -> lvm -> ext3, so there are a few layers of block translation involved, and I'd rather have the kernel do the computations for me :)

I would think there should be a kernel option to disable dropping disks when the array would fall below degraded mode? Marking the array read-only when this happens would of course be a must.
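
At least forcing the array read-only by hand is already possible (md0 as an example):

Code:
mdadm --readonly /dev/md0
echo readonly > /sys/block/md0/md/array_state    # same thing via sysfs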
_________________
Intel Core i7 2700K@ 4.1GHz/HD3000 graphics/8GB DDR3/180GB SSD
What am I supposed watching?
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 7072
Location: almost Mile High in the USA

Posted: Thu Aug 10, 2017 4:37 pm

I tried to ddrescue the LVM volume, and that didn't work so well. It hung in D state with no progress for an hour while still reporting 0 seconds since the last successful read.

I tried mounting the volume with md set read-only... that failed because ext4 (I assume ext3 too) wants to replay the journal, since the filesystem was forcibly shut down, and it can't do that on a read-only array.
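
In theory something like this should skip the journal replay entirely (paths are just examples, and I haven't tried it on this array yet):

Code:
mount -o ro,noload /dev/vg0/data /mnt/recovery    # noload = don't replay the ext3/ext4 journal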

Ugh. I wonder if I have to copy all three disks' images onto another disk so I can play with reassembling them... and perhaps overwrite each image's bad sectors with a canary and grep every file for the canary.
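
Something along these lines is what I have in mind (names, paths and the block number are all made up):

Code:
# write a recognizable canary over the unreadable 4k block in the sick disk's image;
# the real block number would come from the ddrescue map file
BAD_BLOCK=123456
yes CANARY_BAD_SECTOR | dd of=disk3.img bs=4096 seek=$BAD_BLOCK count=1 conv=notrunc iflag=fullblock

# assemble a throwaway array from the images via loop devices
losetup /dev/loop1 disk1.img
losetup /dev/loop2 disk2.img
losetup /dev/loop3 disk3.img
mdadm --assemble --force /dev/md9 /dev/loop1 /dev/loop2 /dev/loop3

# bring up LVM on top, mount read-only, then see which files contain the canary
vgchange -ay
mount -o ro,noload /dev/VG/LV /mnt/img
grep -rl CANARY_BAD_SECTOR /mnt/img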

Though disks are getting larger, finding a disk or disks that can hold several 500GB images on them is not trivial for home use.
_________________
Intel Core i7 2700K@ 4.1GHz/HD3000 graphics/8GB DDR3/180GB SSD
What am I supposed watching?
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 7072
Location: almost Mile High in the USA

Posted: Sun Aug 13, 2017 11:11 pm

I ddrescued the sick disk to my new RAID and got exactly two 4KB blocks that it could not read.
It took the better part of a day to read all three disks (about 1.5 TB) and write the images to my replacement RAID over NFS. Incidentally, I copied two disks with ddrescue and the third with regular dd with bs=4k. The dd was started last and finished first (of the two good disks, one was copied with straight dd and the other with ddrescue, for the heck of it). I can't explain that other than bottlenecking somewhere, though everything should have been bottlenecked equally by NFS and the destination RAID.

---

I tried overwriting the bad spots in the ddrescued images with a canary, loop-mounted the images, and force-assembled them...
Grepping through all the files...

I FIND NOTHING!

Aaaaaaauuugh, mdraid dropped a drive because it found a bad sector in free space? :( Alas, there's no way for it to know that.

I suppose once btrfs raid5 works well I'll have to switch to that. At least it should know which sectors are free...

Fortunately the only data lost, as far as I know, is... a couple of emerge --update runs... Everything else was safe on another partition that had been backed up just an hour before the mishap.

Moral of the story: If you value your data, do not run in degraded mode even if you have backups!
_________________
Intel Core i7 2700K@ 4.1GHz/HD3000 graphics/8GB DDR3/180GB SSD
What am I supposed watching?