Gentoo Forums
RAID array broken, can't boot
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 44153
Location: 56N 3W

Posted: Tue Feb 12, 2019 9:03 pm

ExecutorElassus,

You can migrate to raid1 later.
If there is a risk of not fitting everything into 250G of one SSD, the second one can be used for another 250G of space.

If you want to make it easy to migrate to raid1 later, set up the raid1 sets on the SSD as degraded now.
You can add the other drive later, if it's not got carrier partitions on it.
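
For reference, a minimal sketch of what "degraded now, second drive later" could look like with mdadm. The array name /dev/md/ssd_root and the partitions /dev/sde1 and /dev/sdf1 are hypothetical, not taken from this thread. The function only prints the commands so they can be reviewed before anything is run as root.

```shell
#!/bin/sh
# Print (not run) the commands for a degraded raid1 and its later completion.
# /dev/md/ssd_root, /dev/sde1 and /dev/sdf1 are hypothetical names.
degraded_raid1_cmds() {
    md=$1; first=$2; second=$3
    # 'missing' leaves the second raid1 slot empty, so the array starts degraded.
    echo "mdadm --create $md --level=1 --raid-devices=2 $first missing"
    # Later, adding the second partition lets mdadm sync the mirror onto it.
    echo "mdadm $md --add $second"
}

degraded_raid1_cmds /dev/md/ssd_root /dev/sde1 /dev/sdf1
```

The second command is only wanted once the other SSD no longer holds carrier partitions.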

If you use both SSDs in the raid1 now, what space will you recover the nine carrier partitions to?
The 2TB drive is the wrong answer. sdb may fail totally at any time; then the 2TB drive, or at least sdd4, becomes essential to your data recovery.

Go for the SSD raid1 now, if you will still have space to recover the carrier partitions.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
ExecutorElassus
Veteran


Joined: 11 Mar 2004
Posts: 1192
Location: Stuttgart, Germany

Posted: Tue Feb 12, 2019 9:30 pm

Hi Neddy,

OK, I'll just set it up as a degraded array for now. I wish I could find the old thread where somebody told me how to go through using rsync to copy over each of the relevant subdirectories to the new locations, but I guess it won't matter as much if I'm partitioning the SSD with LVMs to match.

One other wrinkle, when we get to it: Neddy, do you remember that guide you wrote years ago about setting up an initrd to pre-mount /var and /usr to boot? I still use that, which means I'm going to need to re-do the initrd to use the new mountpoints on the SSD before I can boot the system itself. When I get that far, that is.

I could use the other SSD as a midway storage medium for the carrier partitions, but I have no hope of doing so with the last (it's over 1TB in size). That one I'll have to check some other way.

So once I have the SSD partitioned, I'll let you know (it'll have to be tomorrow) and ask for help setting up the filesystems and copying them over from sd[ab]4.

Thanks for the help,

EE
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 44153
Location: 56N 3W

Posted: Tue Feb 12, 2019 9:37 pm

ExecutorElassus,

If that's the initrd guide I posted on the wiki somewhere, after it mounts root, it reads the real /etc/fstab to find out where /var and /usr are.
That means it will not need to be changed because it will read the /etc/fstab from the SSD, which you will have updated.
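
To illustrate the idea (this is not the code from the wiki guide, just a rough sketch), looking up where /usr and /var live in an fstab can be done like this. The helper name fstab_device and the device names in the sample file are made up for the example.

```shell
#!/bin/sh
# Print the device that an fstab file lists for a given mount point.
# Comment lines are skipped; blank lines never match.
fstab_device() {
    fstab=$1; mountpoint=$2
    awk -v mp="$mountpoint" '$1 !~ /^#/ && $2 == mp { print $1; exit }' "$fstab"
}

# Example against a throwaway fstab; the device names are made up.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
# <fs>              <mountpoint> <type> <opts>   <dump> <pass>
/dev/ssd_vg/root    /            ext4   noatime  0      1
/dev/ssd_vg/usr     /usr         ext4   noatime  0      2
/dev/ssd_vg/var     /var         ext4   noatime  0      2
EOF
fstab_device "$tmp" /usr
fstab_device "$tmp" /var
rm -f "$tmp"
```

Because the lookup happens at boot against whatever /etc/fstab is on the mounted root, updating that file on the SSD is the only change needed. Entries written as UUID= or LABEL= would need an extra resolution step that this sketch leaves out.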

I won't be around until about 7:00 PM Wednesday.
_________________
Regards,

NeddySeagoon

Hu
Moderator


Joined: 06 Mar 2007
Posts: 14364

Posted: Wed Feb 13, 2019 2:45 am

I cannot help with RAID questions. I defer to Neddy for that.

As for rsync, -a is short for -rlptgoD, so the -tr shown is unnecessary, but harmless. For this purpose, I would add in -A -X. See man rsync for what all these flags do. Depending on the size of the files, I might use --inplace if I expected to need to interrupt the rsync (or have it interrupted for me) and restart it later. You can use -n to make rsync print what it would do, without doing any work. Once you are satisfied that your invocation will do what you want, remove -n.
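
Put together, an invocation along those lines might look like the sketch below. The paths /mnt/old/usr/ and /mnt/ssd/usr/ are hypothetical, and -v is an addition here so that the dry run actually lists the files it would touch. The function only prints the command line rather than running it.

```shell
#!/bin/sh
# Build the rsync command line described above and print it.
# The first argument selects a dry run ("dry") or a real run (anything else).
# /mnt/old/usr/ and /mnt/ssd/usr/ are hypothetical paths.
rsync_cmd() {
    mode=$1; src=$2; dst=$3
    # -a archive mode (same as -rlptgoD), -A ACLs, -X extended attributes;
    # --inplace writes into the destination file directly, which helps
    # when a transfer of large files gets interrupted and restarted.
    opts="-aAX --inplace"
    if [ "$mode" = dry ]; then
        # -n reports what would be done without copying; -v lists the files.
        opts="$opts -n -v"
    fi
    echo "rsync $opts $src $dst"
}

rsync_cmd dry  /mnt/old/usr/ /mnt/ssd/usr/
rsync_cmd real /mnt/old/usr/ /mnt/ssd/usr/
```

The trailing slash on the source means "copy the contents of this directory", so the files land directly in the destination instead of in a usr/ subdirectory of it.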

If you have specific questions for me, please ask and I will help as best I can. From what I can tell, Neddy's advice is thorough and correct, so I am standing back until called. I think your last remark to me was asking about guidance for rsync. What more do you want to know there?
ExecutorElassus
Veteran


Joined: 11 Mar 2004
Posts: 1192
Location: Stuttgart, Germany

Posted: Wed Feb 13, 2019 8:16 am

Hi Neddy, Hu,

Do I need a special module to work on an SSD? Because when I tried to run fdisk on /dev/sde, I got an I/O error.

Thanks for the help,

EE
UPDATE: Help! I rebooted, and it immediately assembled md127 out of sd[dca]4 and started rebuilding onto sdd4! How do I stop this? How do I preserve the data I spent days trying to recover from sdb4?
ExecutorElassus
Veteran


Joined: 11 Mar 2004
Posts: 1192
Location: Stuttgart, Germany

Posted: Wed Feb 13, 2019 12:33 pm

UPDATE: Well, "§%&/!!. I was too scared to stop the raid rebuild midway, so I let it complete. Since all the VGs mounted OK on the liveCD, I shut down, unplugged the bad drive and removed the CD, and booted.

Everything booted OK, fsck ran on a couple partitions and fixed a couple missing inodes, and I got to a prompt. I started X. Now I'm forcing fsck to check each of the carrier partitions.

There's a very good chance that the first rebuild, a week ago when I accidentally pulled sdc instead of sdb, synced everything correctly. As you noted, the bad blocks only start once I'm onto the carrier partitions, and those weren't mounted when I booted without sdc. So they wouldn't have any data written to them that would corrupt a rebuild later.

But as I said: I'm running fsck on all the carrier partitions just in case.

Good heavens, that was a stressful two hours. I was afraid I'd børked all my data, and knew that if the rebuild went wrong I'd lose the last week of painstaking ddrescue work and have to start over.

But I'm back on my desktop, and everything works so far.

I'll report back with updates.

Stay tuned,

EE
ExecutorElassus
Veteran


Joined: 11 Mar 2004
Posts: 1192
Location: Stuttgart, Germany

Posted: Wed Feb 13, 2019 1:53 pm

Update: well, it booted, and all the carrier partitions checked out. I ran 'emerge --sync' to re-populate the portage tree (it still had the permissions problems it was exhibiting that precipitated this whole mess), and so far I haven't been able to find any bad files or corrupted data (but, like I said, there are thousands of files in hundreds of directories, so it's going to take a while to check everything).

In any case, the main OS is working. I'm going to update the kernel and let 'emerge -uD world' run later, but for now I think I'm going to leave it in place and hope that whatever bad blocks there were on sdb never made it onto sdc or sda, and that the rebuild thus went through OK.

I'll post again in a couple days, but I'm cautiously optimistic.

Stay tuned,

EE
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 44153
Location: 56N 3W

Posted: Wed Feb 13, 2019 7:01 pm

ExecutorElassus,

Not mounted means that there were no writes via the filesystem.
The raid can do housekeeping writes anywhere, anytime, e.g. resyncing.

fsck is a very bad idea if you let it change anything. It's harmless if you only use it to see whether the filesystem really is self consistent.
Like I said, in the face of missing metadata, it guesses and it doesn't always guess correctly.
Letting fsck 'fix' a filesystem can make a bad situation worse.
Did fsck do anything?
Look in /lost+found at the top level of each filesystem.
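
A check that cannot change anything might be sketched like this, assuming ext2/3/4 filesystems (the thread does not state the filesystem type) and hypothetical device names. The function only prints the commands.

```shell
#!/bin/sh
# Print a read-only check command for each filesystem named on the command line.
# The device names used below are hypothetical.
readonly_check_cmds() {
    for dev in "$@"; do
        # -n opens the filesystem read-only and answers "no" to every prompt;
        # -f forces a full check even if the filesystem is marked clean.
        echo "e2fsck -n -f $dev"
    done
}

readonly_check_cmds /dev/vg/usr /dev/vg/var
```

The result is most meaningful when the filesystem is not mounted at the time of the check.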

We know that sd[dca]4 has holes in and sdd4 is not correct but it rebuilt sdd4 from the other drives.
You now have a self consistent raid set based on whatever was on sd[ac]4, which we didn't ever test.
You still have sdb4 but I suspect that it's no longer useful.

You just have to sift through your data now.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
ExecutorElassus
Veteran


Joined: 11 Mar 2004
Posts: 1192
Location: Stuttgart, Germany

Posted: Thu Feb 14, 2019 5:40 am

Hi Neddy,

fsck didn't change anything on any of the partitions it checked, except (I think) one inode on /usr with zero dtime. /usr/lost+found has *thousands* of files, but nothing newer than 2012. /var/lost+found/, /opt/lost+found/, and /lost+found/ are all empty. Is it safe to delete the contents of /usr/lost+found/?
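
For anyone wanting to make the same check, a small sketch that counts the entries in a lost+found directory and names the most recently modified one. The helper name lost_found_summary is made up for this example, and reading lost+found normally needs root.

```shell
#!/bin/sh
# Summarise a lost+found directory: number of entries and the newest one.
lost_found_summary() {
    dir=$1
    # Count everything below the directory, not the directory itself.
    count=$(find "$dir" -mindepth 1 | wc -l | tr -d ' ')
    # ls -t sorts by modification time, newest first.
    newest=$(ls -t "$dir" | head -n 1)
    echo "$dir: $count entries"
    echo "newest: ${newest:-none}"
}
```

Usage would be something like `lost_found_summary /usr/lost+found`, followed by `ls -l` on the newest entry to see its date.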

I know this isn't the best result, and I'm frustrated that I lost the ddrescue'd sdd4. I'm checking through files now, but so far everything looks all right. Of course, as always, I have no way to know until I happen across a file or directory that's actually broken (this happened the last time I went through this on an emerge, when it turned out one whole directory of a particular package's .so files had its contents all turned to directories, and emerge renamed all its .so files to .so.backup, which I then had to clean up).

But I'm afraid at this point that if I try to ddrescue sdb4 again I'm going to get even less data than I managed the last time.

Sigh. I guess I'm just going to have to take whatever losses I have, hope that they're few and can be rebuilt, and remember next time this happens to use 'mdadm --replace' before I start pulling drives out.
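
As a note for that next time, the replace-in-place sequence might look roughly like this. The array and partition names are only examples, --replace and --with need a reasonably recent mdadm and kernel, and the function just prints the commands for review.

```shell
#!/bin/sh
# Print (not run) the commands to swap out a failing member while it is
# still in the array, so redundancy is kept during the copy.
# /dev/md127, /dev/sdb4 and /dev/sdd4 are example names only.
replace_member_cmds() {
    md=$1; failing=$2; spare=$3
    # Add the new partition to the array as a spare.
    echo "mdadm $md --add $spare"
    # Mark the failing member for replacement and name the spare to use.
    echo "mdadm $md --replace $failing --with $spare"
    # Watch the progress of the copy.
    echo "cat /proc/mdstat"
}

replace_member_cmds /dev/md127 /dev/sdb4 /dev/sdd4
```

The old member stays active until the copy onto the new one finishes, which is the point of doing it this way instead of pulling the drive first.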

I'll report back if I come across any damage, but for now I think the rest of this is on me to fix.

Thank you, as always, for helping me get my machine back up and running. You've always been the one to walk me through my various crises, and I'm really thankful the community has someone like you to help. If I'm ever in your part of the world, I owe you a few drinks.

Cheers,

EE
All times are GMT
Page 4 of 4