Gentoo Forums
trying to recover btrfs pool
dufeu
l33t


Joined: 30 Aug 2002
Posts: 896
Location: US-FL-EST

Posted: Wed Apr 06, 2016 2:03 pm    Post subject: trying to recover btrfs pool

I can't mount my main btrfs pool:
Code:
# mount -t btrfs -o ro,recovery,nospace_cache,clear_cache /dev/sdb /PublicA
mount: wrong fs type, bad option, bad superblock on /dev/sdb,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so

The relevant system log messages {from dmesg}, from the automount at boot time and from the manual recovery mount attempt above, are:
Code:
[   20.295632] BTRFS: device label PhoenixRootSSD devid 1 transid 300544 /dev/sda5
[   20.300144] BTRFS info (device sda5): disk space caching is enabled
[   20.300148] BTRFS: has skinny extents
[   20.321855] BTRFS: detected SSD devices, enabling SSD mode
[   20.998071] BTRFS: device label FSgyroA devid 9 transid 625039 /dev/sdm
[   20.999984] BTRFS: device label FSgyroA devid 10 transid 625039 /dev/sdn
[   21.004127] BTRFS: device label FSgyroA devid 11 transid 625039 /dev/sds
[   21.011808] BTRFS: device label FSgyroA devid 12 transid 625039 /dev/sdu
[   21.109647] BTRFS: device label FSgyroA devid 6 transid 625065 /dev/sdf
[   21.130846] BTRFS: device label FSgyroA devid 5 transid 625065 /dev/sde
[   21.131920] BTRFS: device label FSgyroA devid 3 transid 625065 /dev/sdc
[   21.133196] BTRFS: device label FSgyroA devid 17 transid 625065 /dev/sdac
[   21.152346] BTRFS: device label FSgyroA devid 19 transid 625065 /dev/sdx
[   21.158732] BTRFS: device label FSgyroA devid 15 transid 625065 /dev/sdz
[   21.168634] BTRFS: device label FSgyroA devid 20 transid 625065 /dev/sdad
[   21.172592] BTRFS: device label FSgyroA devid 1 transid 625065 /dev/sdb
[   21.173639] BTRFS: device label FSgyroA devid 18 transid 625065 /dev/sdaf
[   21.178384] BTRFS: device label FSgyroA devid 2 transid 625065 /dev/sdd
[   21.212464] BTRFS: device label FSgyroA devid 16 transid 625065 /dev/sdy
[   21.290614] BTRFS: device label FSgyroA devid 7 transid 625065 /dev/sdi
[   21.309370] BTRFS: device label FSgyroA devid 8 transid 625065 /dev/sdj
[   21.372684] BTRFS: device label FSgyroA devid 4 transid 625065 /dev/sdh
[   21.443467] BTRFS: device label FSgyroA devid 14 transid 625065 /dev/sdag
[   21.495110] BTRFS: device fsid a7e2e4f6-e324-4cf4-8b76-33bb7dedf5d1 devid 1 transid 14 /dev/sdab1
[   21.652071] BTRFS: device label PhoenixRoot devid 1 transid 593561 /dev/sdg5
[   29.881428] BTRFS info (device sda5): enabling auto defrag
[   29.881436] BTRFS info (device sda5): disk space caching is enabled
[   30.063829] BTRFS info (device sdg5): enabling auto defrag
[   30.063837] BTRFS info (device sdg5): disk space caching is enabled
[   30.063838] BTRFS: has skinny extents
[  340.714491] BTRFS info (device sdag): disk space caching is enabled
[  340.714496] BTRFS: has skinny extents
[  341.010175] BTRFS: failed to read chunk tree on sdag
[  341.030490] BTRFS: open_ctree failed
[  341.056664] BTRFS info (device sdag): disk space caching is enabled
[  341.056668] BTRFS: has skinny extents
[  341.070958] BTRFS: failed to read chunk tree on sdag
[  341.090538] BTRFS: open_ctree failed
[  341.176337] BTRFS info (device sdag): disk space caching is enabled
[  341.176340] BTRFS: has skinny extents
[  341.181257] BTRFS: failed to read chunk tree on sdag
[  341.193838] BTRFS: open_ctree failed
[  341.301907] BTRFS info (device sdag): disk space caching is enabled
[  341.301911] BTRFS: has skinny extents
[  341.302754] BTRFS: failed to read chunk tree on sdag
[  341.313773] BTRFS: open_ctree failed
[  341.681433] BTRFS info (device sdag): disk space caching is enabled
[  341.681437] BTRFS: has skinny extents
[  341.682436] BTRFS: failed to read chunk tree on sdag
[  341.700410] BTRFS: open_ctree failed
[  342.535884] BTRFS info (device sdag): disk space caching is enabled
[  342.535887] BTRFS: has skinny extents
[  342.536531] BTRFS: failed to read chunk tree on sdag
[  342.550450] BTRFS: open_ctree failed
[  342.562704] BTRFS info (device sdag): disk space caching is enabled
[  342.562708] BTRFS: has skinny extents
[  342.564068] BTRFS: failed to read chunk tree on sdag
[  342.594017] BTRFS: open_ctree failed
[  343.059777] BTRFS info (device sdag): disk space caching is enabled
[  343.059782] BTRFS: has skinny extents
[  343.061271] BTRFS: failed to read chunk tree on sdag
[  343.083753] BTRFS: open_ctree failed
[  343.501960] BTRFS info (device sdag): disk space caching is enabled
[  343.501963] BTRFS: has skinny extents
[  343.506562] BTRFS: failed to read chunk tree on sdag
[  343.520391] BTRFS: open_ctree failed
[  344.010038] BTRFS info (device sdag): disk space caching is enabled
[  344.010042] BTRFS: has skinny extents
[  344.014591] BTRFS: failed to read chunk tree on sdag
[  344.037124] BTRFS: open_ctree failed
[  344.249147] BTRFS info (device sdag): disk space caching is enabled
[  344.249152] BTRFS: has skinny extents
[  344.270668] BTRFS: failed to read chunk tree on sdag
[  344.283740] BTRFS: open_ctree failed

[  570.894920] BTRFS info (device sdag): enabling auto recovery
[  570.894926] BTRFS info (device sdag): disabling disk space caching
[  570.894929] BTRFS info (device sdag): force clearing of disk cache
[  570.894931] BTRFS: has skinny extents
[  570.896272] BTRFS: failed to read chunk tree on sdag
[  570.907534] BTRFS: open_ctree failed

When I ask btrfs to do a filesystem show, I get this:
Code:
# btrfs fi show --si
Label: 'PhoenixRootSSD'  uuid: ed1790a7-87e6-466c-a68c-e375303fd99f
        Total devices 1 FS bytes used 91.81GB
        devid    1 size 214.79GB used 118.12GB path /dev/sda5

Label: 'PhoenixRoot'  uuid: 7ba4f981-c2ff-4a70-96a6-4c4b25f96e96
        Total devices 1 FS bytes used 2.20TB
        devid    1 size 2.98TB used 2.61TB path /dev/sdg5

Label: none  uuid: a7e2e4f6-e324-4cf4-8b76-33bb7dedf5d1
        Total devices 1 FS bytes used 393.22kB
        devid    1 size 3.00TB used 2.17GB path /dev/sdab1

checksum verify failed on 120890386268160 found D7319043 wanted 33D22DF5
checksum verify failed on 120890386268160 found 50ECAB17 wanted 2D8EEBCA
checksum verify failed on 120890386268160 found D7319043 wanted 33D22DF5
bytenr mismatch, want=120890386268160, have=65536
Label: 'FSgyroA'  uuid: 4dae41b0-a459-4c20-a09d-0aca9563b9ad
        Total devices 20 FS bytes used 58.44TB
        devid    1 size 4.00TB used 3.80TB path /dev/sdb
        devid    2 size 4.00TB used 3.80TB path /dev/sdd
        devid    3 size 4.00TB used 3.80TB path /dev/sdc
        devid    4 size 3.00TB used 3.00TB path /dev/sdh
        devid    5 size 5.00TB used 4.94TB path /dev/sde
        devid    6 size 5.00TB used 4.95TB path /dev/sdf
        devid    7 size 5.00TB used 4.94TB path /dev/sdi
        devid    8 size 5.00TB used 4.94TB path /dev/sdj
        devid    9 size 6.00TB used 5.70TB path /dev/sdm
        devid   10 size 6.00TB used 5.70TB path /dev/sdn
        devid   11 size 6.00TB used 5.70TB path /dev/sds
        devid   12 size 6.00TB used 5.70TB path /dev/sdu
        devid   14 size 3.00TB used 3.00TB path /dev/sdag
        devid   15 size 3.00TB used 3.00TB path /dev/sdz
        devid   16 size 3.00TB used 3.00TB path /dev/sdy
        devid   17 size 3.00TB used 3.00TB path /dev/sdac
        devid   18 size 3.00TB used 3.00TB path /dev/sdaf
        devid   19 size 3.00TB used 3.00TB path /dev/sdx
        devid   20 size 3.00TB used 3.00TB path /dev/sdad
        *** Some devices missing

All my btrfs devices are found and the correct number of devices is displayed for each btrfs pool. Using 'blkid', I can verify physical access at the system level for all 20 devices in the pool labeled "FSgyroA":
Code:
# blkid | sort | grep FSgyroA
/dev/sdac: LABEL="FSgyroA" UUID="4dae41b0-a459-4c20-a09d-0aca9563b9ad" UUID_SUB="be88762e-f4fd-4bec-a7ee-f27d78f162c0" TYPE="btrfs"
/dev/sdad: LABEL="FSgyroA" UUID="4dae41b0-a459-4c20-a09d-0aca9563b9ad" UUID_SUB="4c10a330-bfc7-4c4b-9119-b410a686b712" TYPE="btrfs"
/dev/sdaf: LABEL="FSgyroA" UUID="4dae41b0-a459-4c20-a09d-0aca9563b9ad" UUID_SUB="662b24ce-5aad-4c16-9029-44b335712496" TYPE="btrfs"
/dev/sdag: LABEL="FSgyroA" UUID="4dae41b0-a459-4c20-a09d-0aca9563b9ad" UUID_SUB="2b9aa34f-710c-46c4-8541-5895afdf4da3" TYPE="btrfs"
/dev/sdb: LABEL="FSgyroA" UUID="4dae41b0-a459-4c20-a09d-0aca9563b9ad" UUID_SUB="5a067076-bec5-41fc-b8b5-7e119563a9d4" TYPE="btrfs"
/dev/sdc: LABEL="FSgyroA" UUID="4dae41b0-a459-4c20-a09d-0aca9563b9ad" UUID_SUB="55e72f11-edf0-4cdb-a745-ac325c347747" TYPE="btrfs"
/dev/sdd: LABEL="FSgyroA" UUID="4dae41b0-a459-4c20-a09d-0aca9563b9ad" UUID_SUB="78eae4ce-010b-434b-b2b0-08d0ff8ed24e" TYPE="btrfs"
/dev/sde: LABEL="FSgyroA" UUID="4dae41b0-a459-4c20-a09d-0aca9563b9ad" UUID_SUB="dace92ea-2e68-4b55-8037-be5e18b3cf0f" TYPE="btrfs"
/dev/sdf: LABEL="FSgyroA" UUID="4dae41b0-a459-4c20-a09d-0aca9563b9ad" UUID_SUB="3362a38e-fc7b-4673-bda5-db1807a7abeb" TYPE="btrfs"
/dev/sdh: LABEL="FSgyroA" UUID="4dae41b0-a459-4c20-a09d-0aca9563b9ad" UUID_SUB="bdbd2806-9427-4cbc-86b2-9c700714ffcf" TYPE="btrfs"
/dev/sdi: LABEL="FSgyroA" UUID="4dae41b0-a459-4c20-a09d-0aca9563b9ad" UUID_SUB="3ae03c08-816d-4293-b2d3-0aa78bb3ba96" TYPE="btrfs"
/dev/sdj: LABEL="FSgyroA" UUID="4dae41b0-a459-4c20-a09d-0aca9563b9ad" UUID_SUB="8f47d63e-db18-4ad0-a332-b0244963d4ad" TYPE="btrfs"
/dev/sdm: LABEL="FSgyroA" UUID="4dae41b0-a459-4c20-a09d-0aca9563b9ad" UUID_SUB="5024ddc8-87b8-44d3-ba2c-e006ecd7c352" TYPE="btrfs"
/dev/sdn: LABEL="FSgyroA" UUID="4dae41b0-a459-4c20-a09d-0aca9563b9ad" UUID_SUB="a5ac4e80-bda5-4fcf-899e-3cedea601b65" TYPE="btrfs"
/dev/sds: LABEL="FSgyroA" UUID="4dae41b0-a459-4c20-a09d-0aca9563b9ad" UUID_SUB="93053d50-6b11-4629-8381-1433ce9fd9ae" TYPE="btrfs"
/dev/sdu: LABEL="FSgyroA" UUID="4dae41b0-a459-4c20-a09d-0aca9563b9ad" UUID_SUB="5e77969f-0e7a-404d-97fb-2bc0f5125e5f" TYPE="btrfs"
/dev/sdx: LABEL="FSgyroA" UUID="4dae41b0-a459-4c20-a09d-0aca9563b9ad" UUID_SUB="8b5d618f-37f2-4e89-a37c-f1d8e85795e3" TYPE="btrfs"
/dev/sdy: LABEL="FSgyroA" UUID="4dae41b0-a459-4c20-a09d-0aca9563b9ad" UUID_SUB="9cd887e7-8825-435f-9988-e59b7488f235" TYPE="btrfs"
/dev/sdz: LABEL="FSgyroA" UUID="4dae41b0-a459-4c20-a09d-0aca9563b9ad" UUID_SUB="2e2a515c-d8c2-492e-b0e9-89cdc5ff9b0d" TYPE="btrfs"

I can also use 'smartctl' to verify access to each device. SMART reports the health status of all 20 devices as "Passed".

When I try to do a dry-run of 'btrfs restore', it also fails:
Code:
# btrfs restore -D -i -v /dev/sdb /dev/null
checksum verify failed on 120890386268160 found D7319043 wanted 33D22DF5
checksum verify failed on 120890386268160 found 50ECAB17 wanted 2D8EEBCA
checksum verify failed on 120890386268160 found D7319043 wanted 33D22DF5
bytenr mismatch, want=120890386268160, have=65536
This is a dry-run, no files are going to be restored
parent transid verify failed on 120874721263616 wanted 625047 found 625039
parent transid verify failed on 120874721263616 wanted 625047 found 625039
checksum verify failed on 120874721263616 found 6FE4916B wanted 824E1F4D
checksum verify failed on 120874721263616 found 6FE4916B wanted 824E1F4D
bytenr mismatch, want=120874721263616, have=45608042283264
Error searching -5

Based on my reading of the 'btrfs-check' and 'btrfs-rescue' man pages, I can either do:
Code:
btrfs rescue chunk-recover -v /dev/sdag

or do:
Code:
btrfs check --init-csum-tree --init-extent-tree /dev/sdb
btrfs check --repair /dev/sdb

To the best of my knowledge, the situation did not arise from hard disk failure. I believe the sequence of events is:
  1. One or possibly more of my external devices had the USB 3.0 communications link fail. I recall seeing the message which is generated when a USB-based storage device is newly connected.
  2. I was near the end of a 'btrfs balance' run which included converting the pool from RAID5 to RAID6. There were approximately 1000 chunks {out of 22K+ chunks} left to go.
  3. I was also participating in several torrents {this means my btrfs pool was active}.

From the output of 'dmesg', the section:
Code:
[   20.998071] BTRFS: device label FSgyroA devid 9 transid 625039 /dev/sdm
[   20.999984] BTRFS: device label FSgyroA devid 10 transid 625039 /dev/sdn
[   21.004127] BTRFS: device label FSgyroA devid 11 transid 625039 /dev/sds
[   21.011808] BTRFS: device label FSgyroA devid 12 transid 625039 /dev/sdu

bothers me because the transid value of these four devices doesn't match the other 16 devices in the pool {should be 625065}. In theory, I believe these should all have the same transid value. These four devices are all on a single USB 3.0 port, and this is the link I believe went down and came back up. It is an external four-bay enclosure holding four 6 TB drives.
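
The transid comparison can be done mechanically rather than by eye. What follows is only a minimal sketch, assuming the kernel's scan lines keep the "device label ... devid ... transid ... /dev/..." layout shown in the dmesg excerpt and that the label contains no spaces. The function name transid_groups is hypothetical and not part of btrfs-progs. It only reads text and never touches the drives.

```shell
# Sketch: group btrfs member devices by the transid logged at device scan.
# Reads dmesg-style text on stdin; $1 is the filesystem label to match.
# transid_groups is a hypothetical helper name, not part of btrfs-progs.
transid_groups() {
    awk -v label="$1" '
        # Matches scan lines of the form:
        #   BTRFS: device label FSgyroA devid 9 transid 625039 /dev/sdm
        /BTRFS: device label / {
            l = ""; t = ""
            for (i = 1; i < NF; i++) {
                if ($i == "label")   l = $(i + 1)
                if ($i == "transid") t = $(i + 1)
            }
            # The device path is the last field on the line.
            if (l == label && t != "") {
                if (t in devs) { devs[t] = devs[t] " " $NF }
                else           { devs[t] = $NF }
            }
        }
        END { for (t in devs) print t ": " devs[t] }
    ' | sort -n
}
```

It would be run as `dmesg | transid_groups FSgyroA`. Each output line is one transid followed by the devices that reported it, so a group that lags behind the rest, such as the four devices at 625039 here, stands out immediately.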

From the 'btrfs fi show' command, the message "*** Some devices missing" is frustrating because it doesn't tell me which devices it thinks are missing.
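
Although 'btrfs fi show' does not name the missing device, its output can be checked for gaps. In the listing above, the FSgyroA entry reports "Total devices 20" but prints 19 devid lines, and devid 13 does not appear among them. The sketch below automates that comparison. It assumes devids were assigned 1..N with no gaps left by earlier device removals, so an unlisted devid is only a hint and not proof. The function name missing_devids is hypothetical and not part of btrfs-progs.

```shell
# Sketch: compare "Total devices" with the devid lines actually listed for one
# filesystem in `btrfs fi show` output (read from stdin; $1 is the label).
# missing_devids is a hypothetical helper name, not part of btrfs-progs.
missing_devids() {
    awk -v label="$1" '
        BEGIN { want = "\047" label "\047" }   # label as printed, in single quotes
        $1 == "Label:" { in_fs = ($2 == want) }
        in_fs && $1 == "Total" && $2 == "devices" { total = $3 }
        in_fs && $1 == "devid" { seen[$2] = 1; listed++ }
        END {
            printf "total=%d listed=%d\n", total, listed
            # Assumption: devids run 1..total with no gaps from earlier removals.
            if (listed < total)
                for (id = 1; id <= total; id++)
                    if (!(id in seen)) print "unlisted devid: " id
        }
    '
}
```

It would be run as `btrfs fi show --si 2>/dev/null | missing_devids FSgyroA`. Matching the unlisted devid to a physical drive still has to be done separately, for example by comparing the device paths that are listed against the drives known to be in the pool.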

The messages in the system log suggest to me that the device "/dev/sdag" has a corrupted 'chunk tree' and that this is the device considered "missing". This assumes that each device has its own 'chunk-tree'. Since I know nothing about the internals of btrfs, I suppose there could be {effectively} a single 'chunk-tree' for the entire pool.

I understand I'll be losing some data. Of course, I'd like to recover as much as possible. There are two possible approaches:
  1. Somehow fix things so I can mount the pool 'in place'. I don't mind rolling back {if possible} the other 16 devices so that all devices are at the same transid. I can recreate any corrupt/missing files up to several weeks back. This might include fixing the chunk-tree, re-creating any other trees or other repairs.
  2. Somehow fix things so that I can perform 'btrfs restore', which will copy all recoverable files to a new storage location. This could mean fixing the chunk-tree as well, or possibly disconnecting the apparent problem device /dev/sdag and mounting the pool as read-only,degraded.


I'm reluctant to do anything further which can change the data on these drives until I have a better understanding of what the possible options can do and how they are expected to work.

I'd appreciate any help and education I can get in making some progress.
_________________
People whom think M$ is mediocre, don't know the half of it.
Syl20
Guru


Joined: 04 Aug 2005
Posts: 564
Location: France

Posted: Thu Apr 07, 2016 1:08 pm

Sorry for rubbing salt in the wound, but losing important data, or coming close to losing it (I sincerely hope you'll recover everything that has some value for you), is an excellent way to never forget to make backups. And backups of backups.