Gentoo Forums
I/O errors
Adel Ahmed
Veteran
Veteran


Joined: 21 Sep 2012
Posts: 1158

PostPosted: Thu Dec 31, 2015 1:18 pm    Post subject: I/O errors Reply with quote

After a power outage I get the following:
Dec 31 15:10:52 pc.home kernel: sd 4:0:0:0: [sdd] tag#24 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Dec 31 15:10:52 pc.home kernel: sd 4:0:0:0: [sdd] tag#24 Sense Key : 0x3 [current] [descriptor]
Dec 31 15:10:52 pc.home kernel: sd 4:0:0:0: [sdd] tag#24 ASC=0x11 ASCQ=0x4
Dec 31 15:10:52 pc.home kernel: sd 4:0:0:0: [sdd] tag#24 CDB: opcode=0x28 28 00 00 00 00 80 00 00 08 00
Dec 31 15:10:52 pc.home kernel: blk_update_request: I/O error, dev sdd, sector 128
Dec 31 15:10:52 pc.home kernel: Buffer I/O error on dev sdd, logical block 16, async page read
Dec 31 15:10:52 pc.home kernel: ata5: EH complete


I want to make sure this is a hardware error and that nothing more can be done before I remove that disk from my RAID configuration (btrfs raid5).

has anyone recovered from this error before?

when I try to mount I get:
mount: wrong fs type, bad option, bad superblock on /dev/sdc,
missing codepage or helper program, or other error

In some cases useful info is found in syslog - try
dmesg | tail or so.


pc / # blkid /dev/sdc
/dev/sdc: LABEL="raid" UUID="906ed4e3-52c5-4eb7-bd06-4810c0b84902" UUID_SUB="f6058c50-230b-46f1-8afb-c13a05bd5089" TYPE="btrfs"

pc / # blkid /dev/sdd
pc / #

pc / # blkid /dev/sde
/dev/sde: LABEL="raid" UUID="906ed4e3-52c5-4eb7-bd06-4810c0b84902" UUID_SUB="0c835328-dbcd-488e-b524-337b3cbfc6ce" TYPE="btrfs"
szatox
Veteran
Veteran


Joined: 27 Aug 2013
Posts: 1746

PostPosted: Thu Dec 31, 2015 6:58 pm    Post subject: Reply with quote

Quote:
after a power outage

Perhaps it's just filesystem that got corrupted? Happened a few times... even with "mature" and "journaled" stuff like ext3. And when it happens it can report IO errors and reject mount attempts just like in your case.
Now, I don't know btrfs itself, but there should be fsck for it, probably supporting -p option (automagically repair errors that do not require manual intervention), and running it is typically enough to let you recover from FS corruption.
NeddySeagoon
Administrator
Administrator


Joined: 05 Jul 2003
Posts: 43184
Location: 56N 3W

PostPosted: Thu Dec 31, 2015 7:28 pm    Post subject: Reply with quote

Adel Ahmed,

It can be poor quality SATA data cables.
Code:
Dec 31 15:10:52 pc.home kernel: blk_update_request: I/O error, dev sdd, sector 128


That's right at the start of the drive. That's very odd. It would be in the primary GPT partition table if you use GPT, and then it's only read once at startup.
The HDD should have relocated the data there, but it seems it can no longer read its own writing.
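
To see exactly which part of the disk the failing request touched, the CDB from the kernel log can be decoded by hand. The sketch below is only an illustration and assumes 512-byte logical sectors; `decode_read10` is a made-up helper name, not part of any tool mentioned in this thread.

```python
def decode_read10(cdb_hex: str) -> dict:
    """Decode a SCSI READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")    # bytes 2-5: logical block address
    count = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return {"lba": lba, "blocks": count}


SECTOR_SIZE = 512  # assumption: 512-byte logical sectors

# CDB copied from the log line "CDB: opcode=0x28 28 00 00 00 00 80 00 00 08 00"
req = decode_read10("28 00 00 00 00 80 00 00 08 00")
byte_offset = req["lba"] * SECTOR_SIZE

print(req)                               # LBA 128, 8 sectors (4096 bytes)
print(byte_offset, byte_offset // 4096)  # 65536 bytes in, i.e. 4 KiB block 16
```

That matches the "sector 128" and "logical block 16" lines in the log. Byte offset 65536 (64 KiB) is also where btrfs keeps its primary superblock, so on a disk that carries btrfs directly, with no partition table, an unreadable block there would be consistent with blkid printing nothing for /dev/sdd.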

Get the SMART data (smartmontools) with smartctl -a /dev... and post it here.

You should be able to provoke the error again by rereading that block ... or it will work and maybe get relocated.
If the drive is under warranty, don't 'footer'. That log fragment will justify a warranty replacement.
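
Rereading the block, as suggested above, could be done with a few lines of Python. This is a minimal sketch, not a diagnostic tool: `reread` is a hypothetical helper, the sector number and count are taken from the log in the first post, and 512-byte sectors are assumed. It reads through the page cache, so a read that succeeded once may be answered from memory on later attempts.

```python
import errno
import os


def reread(path: str, sector: int, count: int, sector_size: int = 512) -> bool:
    """Return True if the sectors read back in full, False on EIO or a short read."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # pread reads `count * sector_size` bytes starting at the given byte offset
        data = os.pread(fd, count * sector_size, sector * sector_size)
        return len(data) == count * sector_size
    except OSError as exc:
        if exc.errno == errno.EIO:  # the kernel reports a media error as EIO
            return False
        raise
    finally:
        os.close(fd)
```

Run as root, `reread("/dev/sdd", 128, 8)` would repeat the 4096-byte read at sector 128 that the kernel log complains about. A False result should come with a fresh "I/O error, dev sdd, sector 128" entry in dmesg.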
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
Adel Ahmed
Veteran
Veteran


Joined: 21 Sep 2012
Posts: 1158

PostPosted: Thu Dec 31, 2015 7:51 pm    Post subject: Reply with quote

changed the sata cable, no dice

here's the journalctl entry while I tried to run fsck on /dev/sdc
Dec 31 21:42:28 pc.home rpc.mountd[308]: authenticated unmount request from 192.168.1.4:762 for /media/raid (/media/raid)
Dec 31 21:42:29 pc.home rpc.mountd[308]: authenticated mount request from 192.168.1.4:806 for /media/raid (/media/raid)
Dec 31 21:42:29 pc.home kernel: ata5.00: exception Emask 0x0 SAct 0x100 SErr 0x0 action 0x0
Dec 31 21:42:29 pc.home kernel: ata5.00: irq_stat 0x40000008
Dec 31 21:42:29 pc.home kernel: ata5.00: cmd 60/08:40:80:00:00/00:00:00:00:00/40 tag 8 ncq 4096 in
res 41/40:00:80:00:00/00:00:00:00:00/00 Emask 0x409 (media error) <F>
Dec 31 21:42:29 pc.home kernel: ata5.00: configured for UDMA/133
Dec 31 21:42:29 pc.home kernel: sd 4:0:0:0: [sdd] tag#8 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Dec 31 21:42:29 pc.home kernel: sd 4:0:0:0: [sdd] tag#8 Sense Key : 0x3 [current] [descriptor]
Dec 31 21:42:29 pc.home kernel: sd 4:0:0:0: [sdd] tag#8 ASC=0x11 ASCQ=0x4
Dec 31 21:42:29 pc.home kernel: sd 4:0:0:0: [sdd] tag#8 CDB: opcode=0x28 28 00 00 00 00 80 00 00 08 00
Dec 31 21:42:29 pc.home kernel: blk_update_request: I/O error, dev sdd, sector 128
Dec 31 21:42:29 pc.home kernel: Buffer I/O error on dev sdd, logical block 16, async page read
Dec 31 21:42:29 pc.home kernel: ata5: EH complete
Dec 31 21:42:32 pc.home rpc.mountd[308]: authenticated unmount request from 192.168.1.4:778 for /media/raid (/media/raid)
Dec 31 21:42:33 pc.home kernel: ata5.00: exception Emask 0x0 SAct 0x100000 SErr 0x0 action 0x0
Dec 31 21:42:33 pc.home kernel: ata5.00: irq_stat 0x40000008
Dec 31 21:42:33 pc.home kernel: ata5.00: cmd 60/08:a0:80:00:00/00:00:00:00:00/40 tag 20 ncq 4096 in
res 41/40:00:80:00:00/00:00:00:00:00/00 Emask 0x409 (media error) <F>
Dec 31 21:42:33 pc.home kernel: ata5.00: configured for UDMA/133
Dec 31 21:42:33 pc.home kernel: sd 4:0:0:0: [sdd] tag#20 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Dec 31 21:42:33 pc.home kernel: sd 4:0:0:0: [sdd] tag#20 Sense Key : 0x3 [current] [descriptor]
Dec 31 21:42:33 pc.home kernel: sd 4:0:0:0: [sdd] tag#20 ASC=0x11 ASCQ=0x4
Dec 31 21:42:33 pc.home kernel: sd 4:0:0:0: [sdd] tag#20 CDB: opcode=0x28 28 00 00 00 00 80 00 00 08 00
Dec 31 21:42:33 pc.home kernel: blk_update_request: I/O error, dev sdd, sector 128
Dec 31 21:42:33 pc.home kernel: Buffer I/O error on dev sdd, logical block 16, async page read
Dec 31 21:42:33 pc.home kernel: ata5: EH complete
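
The "res" line in the ata5.00 messages can be unpacked as well. The field order used below (status/error:nsect:lbal:lbam:lbah/five high-order bytes/device) is my reading of how libata prints the result taskfile, so treat it as an assumption; `decode_res` and the two bit tables are illustrative names only.

```python
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}


def flags(value: int, table: dict) -> list:
    """List the names of the bits that are set in value."""
    return [name for bit, name in table.items() if value & bit]


def decode_res(res: str) -> dict:
    """Decode a libata 'res' taskfile string such as 41/40:00:80:00:00/00:00:00:00:00/00."""
    status_hex, low, high, _device = res.split("/")
    error, _nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    _hfeat, _hnsect, hlbal, hlbam, hlbah = (int(x, 16) for x in high.split(":"))
    # 48-bit LBA: low three bytes first, then the three high-order bytes
    lba = lbal | lbam << 8 | lbah << 16 | hlbal << 24 | hlbam << 32 | hlbah << 40
    return {
        "status": flags(int(status_hex, 16), STATUS_BITS),
        "error": flags(error, ERROR_BITS),
        "lba": lba,
    }


result = decode_res("41/40:00:80:00:00/00:00:00:00:00/00")
print(result)
```

For the line in the log this gives status DRDY+ERR, error UNC (uncorrectable data) and LBA 128, which agrees with the "(media error)" tag and the sector number in the blk_update_request line.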


pc ~ # btrfsck /dev/sdc
warning, device 2 is missing
checksum verify failed on 21037056 found BB7411C5 wanted 5F97AC73
bytenr mismatch, want=21037056, have=65536
Couldn't read chunk tree
Couldn't open file system


pc ~ # btrfsck /dev/sdd
No valid Btrfs found on /dev/sdd
Couldn't open file system

and the same info in journalctl

smartctl:
http://pastebin.com/xQjNZSyR

I'd much rather it be the hardware that is lost. I'm using raid 5, and I will simply cough up the money to buy a new drive to keep my data protected. I'm definitely still under warranty, and I'll start looking for that warranty.

Of course, I would like to be sure the drive is damaged before I go through the ordeal of getting a refund

I really appreciate your assistance
NeddySeagoon
Administrator
Administrator


Joined: 05 Jul 2003
Posts: 43184
Location: 56N 3W

PostPosted: Thu Dec 31, 2015 8:03 pm    Post subject: Reply with quote

Adel Ahmed,

Code:
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1


The drive has one sector that it would like to relocate but can't, because it can't read it.
That's one sector that it knows about. There may be more.
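
For anyone who wants to watch that counter over time, the attribute row can be picked apart with a short script. This is a rough sketch that assumes the ten-column table layout shown above; `parse_smart_row` is a hypothetical helper, and some attributes (temperature, for example) carry extra text after the raw value that this ignores.

```python
def parse_smart_row(line: str) -> dict:
    """Split one row of the smartctl attribute table into named fields."""
    fields = line.split()
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "value": int(fields[3]),   # normalised value
        "worst": int(fields[4]),
        "thresh": int(fields[5]),
        "raw": int(fields[9]),     # first token of RAW_VALUE
    }


row = parse_smart_row(
    "197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1"
)
print(row["name"], row["raw"])
```

The normalised value (200) is still far above the threshold (0), which is why the attribute is not flagged as failed. The raw count of 1 is the number to watch.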

You won't have any problems getting a warranty replacement and WD will even put one in the post before you need to return yours. Put the smart log in the RMA form when you fill it in.

I didn't check your warranty status as I would need your region.
Adel Ahmed
Veteran
Veteran


Joined: 21 Sep 2012
Posts: 1158

PostPosted: Thu Dec 31, 2015 8:16 pm    Post subject: Reply with quote

Thanks for the link. I live in Egypt, and I'll get to replacing the hard drive as soon as I get things fixed.
I'm going to buy a new 1 TB hard disk tomorrow to get the raid back in place, but for now, how do I mount the degraded raid?
NeddySeagoon
Administrator
Administrator


Joined: 05 Jul 2003
Posts: 43184
Location: 56N 3W

PostPosted: Thu Dec 31, 2015 8:24 pm    Post subject: Reply with quote

Adel Ahmed,

Is that btrfs on top of mdadm raid or is btrfs doing the raid too?
Knowing that you are in Egypt, your warranty expires on 14 May 2017.

I had 2 WD greens in a raid5 go down within 15 minutes of one another. They were in warranty too.

Your raid may be assembled but not running. Look in /proc/mdstat.

Keep your smartctl log. If it's a raid set with mdadm you can add the faulty drive back to the raid. It will then get resynced, which will rewrite all the data on it.
This may force the bad block to be relocated.
Adel Ahmed
Veteran
Veteran


Joined: 21 Sep 2012
Posts: 1158

PostPosted: Thu Dec 31, 2015 8:56 pm    Post subject: Reply with quote

btrfs is doing the raid as well
I'm not eager to use the data now I just want to make sure the data is safe :)

sorry to hear about those 2 hard disks :( that must've been a terrible incident

I'm unplugging the other disks from the motherboard just in case.
NeddySeagoon
Administrator
Administrator


Joined: 05 Jul 2003
Posts: 43184
Location: 56N 3W

PostPosted: Thu Dec 31, 2015 9:04 pm    Post subject: Reply with quote

Adel Ahmed,

I don't know how to bring up a btrfs raid in degraded mode.

I actually only lost a single block and that was in the middle of a DVD rip somewhere.

I had 3 out of 5 good disks and the one that had been kicked out most recently.
ddrescue got back all but one block of that disk. I could have done without the learning experience.
I really wasn't looking forward to ripping 1500+ DVDs again.
Adel Ahmed
Veteran
Veteran


Joined: 21 Sep 2012
Posts: 1158

PostPosted: Thu Dec 31, 2015 9:23 pm    Post subject: Reply with quote

well thank god for that, I hope I don't have to go through that EVER
I'll wait till I buy my hard disk and replace the damaged one
Ant P.
Watchman
Watchman


Joined: 18 Apr 2009
Posts: 5761

PostPosted: Fri Jan 01, 2016 11:59 pm    Post subject: Reply with quote

The good news is, it's straightforward to yank failing disks from a btrfs RAID:
https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Replacing_failed_devices

Adding a replacement right away is optional if you've got enough capacity in the remaining good devices. The device-delete/balance command pair will do the right thing.

There's also a btrfs-replace subcommand, but I can't find any good examples for how to use that.
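
As a rough illustration of the two routes mentioned above, here is a dry-run sketch that only prints the commands and runs nothing. The device names are assumptions: /dev/sdc stands for a surviving member, /dev/sdf is a made-up name for the new disk, /media/raid is the mount point from the earlier journal output, and devid 2 comes from the "device 2 is missing" warning. `replacement_plan` is a hypothetical helper. Check the btrfs documentation for your kernel before running any of this on a raid5 array.

```python
import shlex


def replacement_plan(good_dev, new_dev, mountpoint, missing_devid=None):
    """Build the command lines for replacing a missing btrfs device (dry run only)."""
    # Step 1 in both routes: mount the surviving devices in degraded mode
    plan = [["mount", "-o", "degraded", good_dev, mountpoint]]
    if missing_devid is None:
        # Route A: add the new disk, then drop the missing one
        plan.append(["btrfs", "device", "add", new_dev, mountpoint])
        plan.append(["btrfs", "device", "delete", "missing", mountpoint])
    else:
        # Route B: btrfs replace, addressing the missing disk by its devid
        plan.append(["btrfs", "replace", "start", "-B",
                     str(missing_devid), new_dev, mountpoint])
    return plan


for argv in replacement_plan("/dev/sdc", "/dev/sdf", "/media/raid"):
    print(shlex.join(argv))

for argv in replacement_plan("/dev/sdc", "/dev/sdf", "/media/raid", missing_devid=2):
    print(shlex.join(argv))
```

Deleting the missing device makes btrfs rebuild its data onto the remaining disks, so the add must come first when the array would otherwise drop below the minimum device count for its profile.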
Adel Ahmed
Veteran
Veteran


Joined: 21 Sep 2012
Posts: 1158

PostPosted: Sat Jan 02, 2016 12:49 pm    Post subject: Reply with quote

I've added the new device, removed the old one, and the rebalance completed successfully.
The filesystem mounted and the data was present.

one reboot later:
pc ~ # mount -a
mount: wrong fs type, bad option, bad superblock on /dev/sdb,
missing codepage or helper program, or other error

In some cases useful info is found in syslog - try
dmesg | tail or so.

[ 130.644798] BTRFS info (device sdb): enabling auto defrag
[ 130.644806] BTRFS info (device sdb): disk space caching is enabled
[ 130.658952] verify_parent_transid: 16 callbacks suppressed
[ 130.658960] BTRFS (device sdb): parent transid verify failed on 1699771072512 wanted 270891 found 270069
[ 130.668081] BTRFS (device sdb): parent transid verify failed on 1699771088896 wanted 270891 found 270069
[ 130.742669] BTRFS: bdev /dev/sdc errs: wr 274243, rd 0, flush 271639, corrupt 0, gen 0
[ 130.813959] BTRFS (device sdb): parent transid verify failed on 1699621765120 wanted 271706 found 270443
[ 130.888539] BTRFS (device sdb): parent transid verify failed on 1698604236800 wanted 271748 found 270340
[ 130.929401] BTRFS (device sdb): parent transid verify failed on 1698810986496 wanted 271756 found 270353
[ 130.953951] BTRFS (device sdb): parent transid verify failed on 1698069135360 wanted 271732 found 270326
[ 130.980951] BTRFS (device sdb): parent transid verify failed on 1698078883840 wanted 271731 found 270329
[ 131.004352] BTRFS (device sdb): parent transid verify failed on 1698083651584 wanted 271732 found 270329
[ 131.051876] BTRFS (device sdb): parent transid verify failed on 1698882453504 wanted 271763 found 270358
[ 131.240464] BTRFS (device sdb): parent transid verify failed on 1698776694784 wanted 271757 found 270353
[ 135.180623] ------------[ cut here ]------------
[ 135.180642] WARNING: CPU: 1 PID: 1522 at fs/btrfs/super.c:260 __btrfs_abort_transaction+0x46/0x110()
[ 135.180645] BTRFS: Transaction aborted (error -5)
[ 135.180647] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 tun bridge stp llc ipt_REJECT nf_reject_ipv4 xt_conntrack iptable_filter via_rhine r8169 ohci_pci ohci_hcd
[ 135.180719] CPU: 1 PID: 1522 Comm: mount Tainted: G W 4.0.5-gentoo #1
[ 135.180723] Hardware name: Gigabyte Technology Co., Ltd. GA-790XT-USB3/GA-790XT-USB3, BIOS F4 05/13/2010
[ 135.180727] 0000000000000000 ffffffff81758719 ffffffff8158b2dd ffff88028daf7ab8
[ 135.180735] ffffffff810817ac ffff8800cb4a7160 ffff8800c0507000 00000000fffffffb
[ 135.180746] ffffffff8164b330 0000000000000ae6 ffffffff81081825 ffffffff81750510
[ 135.180753] Call Trace:
[ 135.180766] [<ffffffff8158b2dd>] ? dump_stack+0x4a/0x74
[ 135.180777] [<ffffffff810817ac>] ? warn_slowpath_common+0x7c/0xb0
[ 135.180786] [<ffffffff81081825>] ? warn_slowpath_fmt+0x45/0x50
[ 135.180793] [<ffffffff8124d796>] ? __btrfs_abort_transaction+0x46/0x110
[ 135.180800] [<ffffffff8126a479>] ? btrfs_run_delayed_refs.part.66+0x129/0x280
[ 135.180810] [<ffffffff8127a1ab>] ? btrfs_commit_transaction+0x3b/0x9f0
[ 135.180817] [<ffffffff810a48f7>] ? preempt_count_add+0x47/0xa0
[ 135.180824] [<ffffffff81590c31>] ? _raw_spin_unlock+0x11/0x30
[ 135.180830] [<ffffffff8129b031>] ? release_extent_buffer+0x21/0xc0
[ 135.180837] [<ffffffff812ba032>] ? btrfs_recover_log_trees+0x392/0x450
[ 135.180843] [<ffffffff812717a0>] ? free_root_pointers+0x60/0x60
[ 135.180849] [<ffffffff812b7710>] ? replay_one_extent+0x6c0/0x6c0
[ 135.180858] [<ffffffff81277ca8>] ? open_ctree+0x17a8/0x20d0
[ 135.180866] [<ffffffff8124f08e>] ? btrfs_mount+0x60e/0x880
[ 135.180873] [<ffffffff81133b9b>] ? pcpu_alloc+0x35b/0x680
[ 135.180881] [<ffffffff8115a56c>] ? mount_fs+0xc/0x90
[ 135.180890] [<ffffffff8117325d>] ? vfs_kern_mount+0x5d/0x110
[ 135.180897] [<ffffffff81175ef3>] ? do_mount+0x1b3/0xab0
[ 135.180903] [<ffffffff81121082>] ? __get_free_pages+0x12/0x50
[ 135.180911] [<ffffffff81176af3>] ? SyS_mount+0x83/0xd0
[ 135.180920] [<ffffffff81591532>] ? system_call_fastpath+0x12/0x17
[ 135.180924] ---[ end trace f7322caa403bc2aa ]---
[ 135.180930] BTRFS: error (device sdb) in btrfs_run_delayed_refs:2790: errno=-5 IO failure
[ 135.183392] BTRFS: error (device sdb) in open_ctree:2898: errno=-5 IO failure (Failed to recover log tree)
[ 135.747992] verify_parent_transid: 132 callbacks suppressed
[ 135.748001] BTRFS (device sdb): parent transid verify failed on 1698203516928 wanted 271612 found 270330
[ 135.775232] BTRFS (device sdb): parent transid verify failed on 1699047669760 wanted 271132 found 270376
[ 135.798876] BTRFS (device sdb): parent transid verify failed on 1698344681472 wanted 271222 found 270331
[ 135.872745] BTRFS (device sdb): parent transid verify failed on 1698960424960 wanted 270745 found 270367
[ 135.888021] BTRFS (device sdb): parent transid verify failed on 1698530361344 wanted 271739 found 270331
[ 135.909895] BTRFS (device sdb): parent transid verify failed on 1698501984256 wanted 271740 found 270336
[ 135.924789] BTRFS (device sdb): parent transid verify failed on 1698543632384 wanted 271486 found 270079
[ 135.925251] BTRFS (device sdb): parent transid verify failed on 1698502017024 wanted 271740 found 270332
[ 135.932269] BTRFS (device sdb): parent transid verify failed on 1698512224256 wanted 271617 found 270337
[ 135.955203] BTRFS (device sdb): parent transid verify failed on 1698502066176 wanted 271613 found 270332
[ 136.299296] BTRFS: open_ctree failed
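
To get an overview of a log like this, the transid messages and the per-device error counters can be pulled out with two regular expressions. A small sketch, using three lines copied from the output above as sample input; the function names are made up for illustration.

```python
import re

TRANSID = re.compile(r"parent transid verify failed on (\d+) wanted (\d+) found (\d+)")
DEV_ERRS = re.compile(
    r"bdev (\S+) errs: wr (\d+), rd (\d+), flush (\d+), corrupt (\d+), gen (\d+)"
)


def transid_gaps(log: str) -> list:
    """Return (bytenr, wanted - found) for every transid failure in the log."""
    return [(int(b), int(w) - int(f)) for b, w, f in TRANSID.findall(log)]


def device_errors(log: str) -> dict:
    """Return the error counters btrfs reports for each device."""
    keys = ("wr", "rd", "flush", "corrupt", "gen")
    return {m[0]: dict(zip(keys, map(int, m[1:]))) for m in DEV_ERRS.findall(log)}


sample = """\
[ 130.658960] BTRFS (device sdb): parent transid verify failed on 1699771072512 wanted 270891 found 270069
[ 130.742669] BTRFS: bdev /dev/sdc errs: wr 274243, rd 0, flush 271639, corrupt 0, gen 0
[ 130.813959] BTRFS (device sdb): parent transid verify failed on 1699621765120 wanted 271706 found 270443
"""

print(transid_gaps(sample))
print(device_errors(sample))
```

In every failure the generation found on disk is older than the one the parent block expects, here by several hundred to over a thousand transactions. Together with the large write and flush error counts recorded against /dev/sdc, that would be consistent with writes to that device not reaching the disk for some time, although the log alone does not prove it.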