Gentoo Forums
btrfs: replacing failing device
dufeu
l33t


Joined: 30 Aug 2002
Posts: 901
Location: US-FL-EST

Posted: Sun Jan 23, 2011 3:30 am    Post subject: btrfs: replacing failing device

Is there anyone who's actually gone through the experience of replacing a failing device from a btrfs set? I do have the btrfs wiki instructions, but I'm not sure I understand what I'm reading.

I'm asking this question here and now because I feel I'll only have one real shot at getting this right and I'd rather not mess it up. BTW, all the data on this system is either duplicated on another system or archived onto DVDs. My fallback if things do go pear-shaped is simply to re-initialise this system and reload it from my primary workstation or archival DVDs as appropriate.

This is what my btrfs setup looks like:
Code:
# btrfs fi show
failed to read /dev/sdc
failed to read /dev/sr0
failed to read /dev/pktcdvd/pktcdvd0
failed to read /dev/pktcdvd/sr0
Label: 'PUBLIC'  uuid: b71c7140-e845-4891-a8c9-98599be7d29c
        Total devices 5 FS bytes used 2.77TB
        devid    5 size 233.76GB used 233.63GB path /dev/sdb
        devid    4 size 931.51GB used 841.13GB path /dev/sdg
        devid    3 size 931.51GB used 841.38GB path /dev/sdf
        devid    1 size 465.76GB used 465.39GB path /dev/sdd
        devid    2 size 465.76GB used 465.50GB path /dev/sde

Btrfs v0.19-35-g1b444cd-dirty
The failing device is /dev/sde. I'm currently backing up a 1 TB drive in another computer so that I can pull that drive and use it to replace the failing one.
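
The device listing above also suggests why the replacement needs to be added before the failing drive is removed: whatever is allocated on /dev/sde has to be moved somewhere else. A rough sketch of that arithmetic follows. It assumes every size in the listing is in GB, and `spare_vs_needed` is only an illustrative helper name, not a btrfs tool.

```shell
#!/bin/sh
# Rough check: unallocated space on the remaining devices versus the space
# currently allocated on the device to be removed.
# Assumption: all sizes in the `btrfs fi show` device lines use the same unit (GB).
spare_vs_needed() {
    # $1 = path of the device to be removed; device lines are read from stdin
    awk -v bad="$1" '
        $1 == "devid" {
            size = $4 + 0      # "233.76GB" + 0 converts to 233.76
            used = $6 + 0
            if ($8 == bad) needed = used
            else spare += size - used
        }
        END { printf "spare=%.2f needed=%.2f\n", spare, needed }
    '
}

spare_vs_needed /dev/sde <<'EOF'
        devid    5 size 233.76GB used 233.63GB path /dev/sdb
        devid    4 size 931.51GB used 841.13GB path /dev/sdg
        devid    3 size 931.51GB used 841.38GB path /dev/sdf
        devid    1 size 465.76GB used 465.39GB path /dev/sdd
        devid    2 size 465.76GB used 465.50GB path /dev/sde
EOF
```

With the figures from the listing this prints `spare=181.01 needed=465.50`, so the four healthy drives have far less unallocated room than /dev/sde currently holds.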

The relevant snip of /etc/fstab looks like this:
Code:
/dev/sda1                       /boot           ext2    noauto,noatime          1 2
/dev/sda2                       none            swap    sw,pri=1                0 0
/dev/sda3                       /               ext4    noatime                 0 1
/dev/disk/by-label/PUBLIC       /public         btrfs   noauto,noatime          0 1

As I interpret what I've read, I need to first add the replacement device, then delete the failing device, and then clean up anything that needs cleaning up. In terms of commands, I believe I need to run:
Code:
# btrfs device add /dev/sdh /public
# btrfs device delete /dev/sde /public
# btrfs filesystem balance /public
with '/dev/sdh' being the temporary device path of the replacement drive.
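
The same sequence can be rehearsed as a dry run before touching the real filesystem. This is only a sketch: `replace_plan` is a made-up helper that prints the commands rather than running them, and the device names are the ones used in this post. As far as I know, `btrfs device delete` itself moves the data off the device being removed, so the final balance is probably optional and mainly evens out the layout afterwards.

```shell
#!/bin/sh
# Dry run only: prints the commands instead of executing them.
# replace_plan is a hypothetical helper, not part of btrfs-progs.
replace_plan() {
    old=$1 new=$2 mnt=$3
    echo "btrfs device add $new $mnt"        # grow the filesystem first
    echo "btrfs device delete $old $mnt"     # migrates data off $old, then drops it
    echo "btrfs filesystem balance $mnt"     # optional: redistribute across all devices
}

replace_plan /dev/sde /dev/sdh /public
```

Each printed line can be checked by eye and then run by hand, one at a time.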

Part of what's confusing is the definition of <path> in the current documentation. Sometimes it seems to mean 'device' path and sometimes it seems to mean 'mountpoint' path. For the above commands, I believe it means 'mountpoint'. An example of a possible 'device' path could be '/dev/disk/by-label/PUBLIC'.
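
One way to tell the two kinds of path apart before handing them to btrfs is sketched below. It assumes the `mountpoint` utility is installed, and `classify_path` is just an illustrative name. Note that `[ -b ]` follows symlinks, so a by-label link should be reported as a device.

```shell
#!/bin/sh
# classify_path is a hypothetical helper for illustration only.
classify_path() {
    if [ -b "$1" ]; then
        echo device        # block device node (or a symlink to one)
    elif mountpoint -q "$1" 2>/dev/null; then
        echo mountpoint    # directory with a filesystem mounted on it
    else
        echo other         # neither: missing path, plain directory, regular file...
    fi
}

for p in /dev/sdh /dev/disk/by-label/PUBLIC /public; do
    printf '%s: %s\n' "$p" "$(classify_path "$p")"
done
```

On the system described here, /public would only show up as a mountpoint while the filesystem is actually mounted, since fstab marks it noauto.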

After I've deleted the failing hard drive, cleanup would consist of physically moving the new drive to the old drive's slot and making whatever adjustments are needed (if any) to '/etc/fstab' etc. In theory, there should be no adjustments needed.

If the device has failed altogether (i.e. the superblock isn't readable), then I believe the commands would look like this:
Code:
# mount -o degraded /dev/disk/by-label/PUBLIC /public
# btrfs device delete /dev/sde /public
# btrfs device add /dev/sdh /public
# btrfs filesystem balance /public

Does my above interpretation seem correct?

Finally, for those who are curious, these are snips from the relevant dmesg and syslog output:
Code:
[    9.935894] device label PUBLIC devid 5 transid 299901 /dev/sdb
[   10.016415] device label PUBLIC devid 2 transid 299901 /dev/sde
[   10.062778] btrfs bad tree block start 3242998435840 3244877484032
[   10.085405] btrfs bad tree block start 3243054534656 3244933582848
[   10.098738] btrfs bad tree block start 3242998562816 3244877611008
[   10.099089] btrfs bad tree block start 3242998587392 3244877635584
[   10.099435] btrfs bad tree block start 3242998595584 3244877643776
[   10.110150] btrfs bad tree block start 3243091718144 3244970766336
[   10.185609] btrfs bad tree block start 3243089805312 3244968853504
[   10.341323] btrfs bad tree block start 3243034783744 3244913831936
[   10.407348] btrfs bad tree block start 3243132051456 3245011099648
[   10.407353] btrfs bad tree block start 3243132055552 3245011103744
[   11.016176] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x2400000 action 0x0
[   11.016182] ata4.00: BMDMA2 stat 0x6d0009
[   11.016187] ata4: SError: { Handshk UnrecFIS }
[   11.016193] ata4.00: failed command: READ DMA
[   11.016203] ata4.00: cmd c8/00:08:e8:ca:19/00:00:00:00:00/e0 tag 0 dma 4096 in
[   11.016204]          res 51/04:00:ef:ca:19/00:00:00:00:00/f0 Emask 0x1 (device error)
[   11.016208] ata4.00: status: { DRDY ERR }
[   11.016211] ata4.00: error: { ABRT }
[   11.033674] ata4.00: configured for UDMA/100
[   11.033689] ata4: EH complete
[   15.080131] btree_readpage_end_io_hook: 181 callbacks suppressed
[   15.080136] btrfs bad tree block start 3243091783680 3244970831872
[   15.080513] btrfs bad tree block start 3243065462784 3244944510976
[   15.080880] btrfs bad tree block start 3243067731968 3244946780160
[   15.552459] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x2400000 action 0x0
[   15.552465] ata4.00: BMDMA2 stat 0x6d0009
[   15.552471] ata4: SError: { Handshk UnrecFIS }
[   15.552476] ata4.00: failed command: READ DMA EXT
[   15.552485] ata4.00: cmd 25/00:08:a8:58:2e/00:00:3a:00:00/e0 tag 0 dma 4096 in
[   15.552487]          res 51/04:00:af:58:2e/00:00:3a:00:00/f0 Emask 0x1 (device error)
[   15.552491] ata4.00: status: { DRDY ERR }
[   15.552493] ata4.00: error: { ABRT }
[   15.570336] ata4.00: configured for UDMA/100
[   15.570353] ata4: EH complete

Code:
Jan 20 15:56:02 slizard kernel: [713460.227912] sd 4:0:0:0: [sde] Unhandled error code
Jan 20 15:56:02 slizard kernel: [713460.227915] sd 4:0:0:0: [sde]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jan 20 15:56:02 slizard kernel: [713460.227919] sd 4:0:0:0: [sde] CDB: Write(10): 2a 00 1f 5d 1e 00 00 00 80 00
Jan 20 15:56:02 slizard kernel: [713460.227928] end_request: I/O error, dev sde, sector 526196224
Jan 20 15:56:02 slizard kernel: [713460.229730] sd 4:0:0:0: [sde] Unhandled error code
Jan 20 15:56:02 slizard kernel: [713460.229734] sd 4:0:0:0: [sde]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jan 20 15:56:02 slizard kernel: [713460.229738] sd 4:0:0:0: [sde] CDB: Write(10): 2a 00 1f 5d 1e 80 00 00 80 00
Jan 20 15:56:02 slizard kernel: [713460.229746] end_request: I/O error, dev sde, sector 526196352
Jan 20 15:56:02 slizard kernel: [713460.230915] sd 4:0:0:0: [sde] Unhandled error code
Jan 20 15:56:02 slizard kernel: [713460.230918] sd 4:0:0:0: [sde]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jan 20 15:56:02 slizard kernel: [713460.230922] sd 4:0:0:0: [sde] CDB: Write(10): 2a 00 1f 5d 1f 00 00 00 80 00
Jan 20 15:56:02 slizard kernel: [713460.230930] end_request: I/O error, dev sde, sector 526196480

_________________
People whom think M$ is mediocre, don't know the half of it.
gerdesj
l33t


Joined: 29 Sep 2005
Posts: 621
Location: Yeovil, Somerset, UK

Posted: Tue Jan 25, 2011 1:27 am    Post subject: Re: btrfs: replacing failing device

What RAID level are you using?

I will hazard a guess: it's RAID 0 and it has a failed disc.

Code:
 
Label: 'PUBLIC'  uuid: b71c7140-e845-4891-a8c9-98599be7d29c
        Total devices 5 FS bytes used 2.77TB
        devid    5 size 233.76GB used 233.63GB path /dev/sdb
        devid    4 size 931.51GB used 841.13GB path /dev/sdg
        devid    3 size 931.51GB used 841.38GB path /dev/sdf
        devid    1 size 465.76GB used 465.39GB path /dev/sdd
        devid    2 size 465.76GB used 465.50GB path /dev/sde


You can write off all data on there. Glad to hear that you have a backup.

I don't know exactly what btrfs needs, but from experience with countless RAID controllers: take the failed disk offline, install the new one and then rebuild the array. Then put the data back.

You used RAID0 there on five discs; please experiment and research it properly. There is nothing wrong with doing that if you have a good backup regime. You'll get fantastic performance across five spindles.

If redundancy is more important than raw space, notice that you have two pairs that would make good RAID1s.

Cheers
Jon
Letharion
Veteran


Joined: 13 Jun 2005
Posts: 1320
Location: Sweden

Posted: Sat Apr 02, 2011 10:59 am

I have the exact same question: a failed drive, in this case in a raid1, that I want to replace.
The way I read the wiki, I should do:
Code:
mount -o degraded /dev/sdb /mnt/gentoo
btrfs device delete missing /mnt/gentoo

Mounting works well, and I can see all the data, but the delete returns: ERROR: error removing the device 'missing'
I tried the same with /dev/sdc, the old device's ID, but that gives the same error.
Does anyone know why that is?
paddlaren
Tux's lil' helper


Joined: 23 Nov 2005
Posts: 97
Location: Hörby, Sweden

Posted: Fri Jun 22, 2012 10:49 am

Hi!

From https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices:
Quote:
In case of raidXX layout, you cannot go below the minimum number of the device required. So before removing a device (even the missing one) you may need to add a new one. For example if you have a raid1 layout with two device, and a device fails, you must:


This does not always help though :( In my test set-up I have 4 disks in RAID10 that are close to full. I can mount the filesystem but am still not able to repair it.
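
For reference, the order the quoted passage points to (add a new device before removing the missing one) can be written out as a dry run. This is a sketch under assumptions: `degraded_plan` is a made-up helper that only prints commands, /dev/sdb and /mnt/gentoo are taken from the earlier raid1 post, and /dev/sdX is a placeholder for whatever the new drive is called.

```shell
#!/bin/sh
# Dry run only: prints the commands instead of executing them.
# degraded_plan is a hypothetical helper, not part of btrfs-progs.
degraded_plan() {
    dev=$1 new=$2 mnt=$3
    echo "mount -o degraded $dev $mnt"         # mount with a device absent
    echo "btrfs device add $new $mnt"          # restore the minimum device count
    echo "btrfs device delete missing $mnt"    # then drop the absent device
}

degraded_plan /dev/sdb /dev/sdX /mnt/gentoo
```

As noted above, this order is not guaranteed to succeed on a nearly full filesystem.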

BR
Erik