Gentoo Forums
csum failed ino
Adel Ahmed
Veteran


Joined: 21 Sep 2012
Posts: 1158

Posted: Mon Feb 15, 2016 4:43 pm    Post subject: csum failed ino

I keep getting these messages again. This is a bad omen: whenever I see them, filesystem corruption is just around the corner.

[ 5478.403071] BTRFS warning (device sdb): csum failed ino 260 off 63901696 csum 2566472073 expected csum 4261454497
[ 5478.403345] BTRFS warning (device sdb): csum failed ino 260 off 63901696 csum 2566472073 expected csum 4261454497
[ 5478.403808] BTRFS warning (device sdb): csum failed ino 260 off 63901696 csum 2566472073 expected csum 4261454497
[ 5478.404050] BTRFS warning (device sdb): csum failed ino 260 off 63901696 csum 2566472073 expected csum 4261454497
[ 5478.405374] BTRFS warning (device sdb): csum failed ino 260 off 63901696 csum 2566472073 expected csum 4261454497
[ 5478.405616] BTRFS warning (device sdb): csum failed ino 260 off 63901696 csum 2566472073 expected csum 4261454497
[ 5478.405825] BTRFS warning (device sdb): csum failed ino 260 off 63901696 csum 2566472073 expected csum 4261454497
[ 5478.406056] BTRFS warning (device sdb): csum failed ino 260 off 63901696 csum 2566472073 expected csum 4261454497
[ 5478.407265] BTRFS warning (device sdb): csum failed ino 260 off 63901696 csum 2566472073 expected csum 4261454497
[ 5478.407501] BTRFS warning (device sdb): csum failed ino 260 off 63901696 csum 2566472073 expected csum 4261454497


Any idea how to deal with these errors?

This is a RAID 5 btrfs filesystem made up of four 1 TB disks.



Just as I had expected, corruption:
whenever I start up my newly created RHEL 7 VM, I get a kernel panic: unable to mount root fs on unknown-block.
This is with a snapshot I had reverted to plenty of times today. Before the panic comes up I get:

failure reading sector 0x1e508 from 'hd0'

thanks
ct85711
Veteran


Joined: 27 Sep 2005
Posts: 1696

Posted: Mon Feb 15, 2016 5:46 pm

Well, considering it's all at the same offset, I'd start by checking your drives for bad sectors. I'd suggest running smartctl on your drives and seeing what it reports for Reallocated_Sector_Ct and also Current_Pending_Sector, IIRC (I may be missing some others, but other people should be able to correct me). Depending on the results, I'd consider getting your backup ready for the worst case, and new drive(s) ready to replace the failing drive(s). With one failing drive your data is still relatively safe, but you can't lose another until the RAID is fully back up. Rebuilding the RAID to replace the failed drive can potentially cause other drives to start failing, so the risk is there.
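
The point about every warning sharing one offset can be checked mechanically on a longer log. Below is a minimal Python sketch, assuming the warning format quoted in the first post; the function name and the sample text are only for illustration. It counts how often each (device, inode, offset, found csum, expected csum) combination appears.

```python
import re
from collections import Counter

# Matches the btrfs warning format quoted in the first post.
CSUM_RE = re.compile(
    r"BTRFS warning \(device (?P<dev>\w+)\): csum failed ino (?P<ino>\d+) "
    r"off (?P<off>\d+) csum (?P<got>\d+) expected csum (?P<want>\d+)"
)

def summarize_csum_failures(log_text):
    """Count csum failures per (device, inode, offset, found, expected)."""
    counts = Counter()
    for line in log_text.splitlines():
        m = CSUM_RE.search(line)
        if m:
            key = (m["dev"], int(m["ino"]), int(m["off"]),
                   int(m["got"]), int(m["want"]))
            counts[key] += 1
    return counts

# Two of the lines from the first post, used as sample input.
sample = """\
[ 5478.403071] BTRFS warning (device sdb): csum failed ino 260 off 63901696 csum 2566472073 expected csum 4261454497
[ 5478.403345] BTRFS warning (device sdb): csum failed ino 260 off 63901696 csum 2566472073 expected csum 4261454497
"""

summary = summarize_csum_failures(sample)
for key, count in summary.items():
    print(key, count)
```

A single key with a high count suggests one damaged block being re-read over and over rather than damage spread across the file. If I remember the tool correctly, `btrfs inspect-internal inode-resolve 260 <mountpoint>` should map the inode number back to a path within the subvolume.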
davidm
Guru


Joined: 26 Apr 2009
Posts: 557
Location: US

Posted: Mon Feb 15, 2016 6:43 pm

You could run a btrfs scrub on the array. With RAID 5 it should have parity data from which it can reconstruct the bad blocks and correct the data errors present. You might also run a SMART extended self-test as well as a non-destructive probe using badblocks (but make sure you use the non-destructive options).

Of course, as said previously, you should have backups. But that goes without saying, because you should always have backups of anything you cannot afford to lose. RAID isn't a backup, and especially not btrfs RAID, as it isn't entirely stable and well tested. Even more so for the RAID 5/6 implementations.
Adel Ahmed
Veteran


Joined: 21 Sep 2012
Posts: 1158

Posted: Mon Feb 15, 2016 6:48 pm

all 0s:
http://pastebin.com/sw4ewkhZ

I see no I/O errors so far.
I do see this in dmesg, though:
[ 5327.890593] ata3.00: exception Emask 0x50 SAct 0x10000 SErr 0x90a00 action 0xe frozen
[ 5327.890596] ata3.00: irq_stat 0x01400000, PHY RDY changed
[ 5327.890601] ata3.00: cmd 60/80:80:00:d3:31/01:00:01:00:00/40 tag 16 ncq 196608 in
res 40/00:80:00:d3:31/00:00:01:00:00/40 Emask 0x50 (ATA bus error)
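
The `SErr 0x90a00` value in that excerpt is a bit mask, and decoding it gives a hint about what kind of bus error this was. Here is a small Python sketch; the bit positions and short names are the SERR_* flags as I recall them from the kernel's libata code, so treat the table as an assumption and check it against your kernel source.

```python
# SError bit positions and the short names libata prints for them.
# Assumption: recalled from the kernel's libata SERR_* definitions.
SERR_BITS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}

def decode_serr(value):
    """Return the names of all known SError bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SERR_BITS.items()) if value & (1 << bit)]

flags = decode_serr(0x90A00)
print(flags)  # ['Persist', 'HostInt', 'PHYRdyChg', '10B8B']
```

PHYRdyChg and 10B8B are link-level events (the PHY dropping out and a decode error on the wire). That would fit the "PHY RDY changed" and "ATA bus error" text in the log and usually makes a cable, connector, port, or power problem worth ruling out before blaming the platters, though it does not prove it.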
Adel Ahmed
Veteran


Joined: 21 Sep 2012
Posts: 1158

Posted: Tue Feb 16, 2016 12:39 pm

I tried a scrub on the filesystem,
and dmesg just went crazy:
http://pastebin.com/XAxnYpVQ

ended up with:
ERROR: There are uncorrectable errors.

pc ~ # btrfs scrub status /media/raid/
scrub status for 4be16663-041d-4aa8-8557-e272e0d534af
scrub started at Tue Feb 16 14:22:06 2016 and finished after 268 seconds
total bytes scrubbed: 5.54GiB with 432 errors
error details: read=16 csum=416
corrected errors: 384, uncorrectable errors: 48, unverified errors: 0
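
The counters in this summary are internally consistent, which is worth checking before reading too much into them: 16 read + 416 csum = 432 errors, and 384 corrected + 48 uncorrectable + 0 unverified = 432. A minimal Python sketch of that cross-check, assuming the `btrfs scrub status` text layout pasted above (newer btrfs-progs versions print a different layout, so the patterns would need adjusting):

```python
import re

def parse_scrub_status(text):
    """Extract the error counters from the scrub status layout shown above."""
    patterns = {
        "total": r"with (\d+) errors",
        "read": r"\bread=(\d+)",
        "csum": r"\bcsum=(\d+)",
        "corrected": r"\bcorrected errors: (\d+)",
        "uncorrectable": r"\buncorrectable errors: (\d+)",
        "unverified": r"\bunverified errors: (\d+)",
    }
    counters = {}
    for name, pattern in patterns.items():
        m = re.search(pattern, text)
        if m:
            counters[name] = int(m.group(1))
    return counters

# The status output from the post, used as sample input.
status = """\
total bytes scrubbed: 5.54GiB with 432 errors
error details: read=16 csum=416
corrected errors: 384, uncorrectable errors: 48, unverified errors: 0
"""

c = parse_scrub_status(status)
by_kind = c["read"] + c["csum"]
by_outcome = c["corrected"] + c["uncorrectable"] + c["unverified"]
print(c["total"], by_kind, by_outcome)
```

The 48 uncorrectable errors are the ones that matter: for those blocks the scrub could not rebuild good data, so the affected files would have to come from a backup.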


I definitely need to take a look at the ATA bus errors.
Any idea what they might be?
All errors are on disk /dev/sdc, which is good; at least I'm narrowing things down:

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   118   100   006    Pre-fail  Always       -       186957558
  3 Spin_Up_Time            0x0003   098   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   088   088   020    Old_age   Always       -       12377
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       7451
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       892
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   088   088   020    Old_age   Always       -       12337
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   053   053   045    Old_age   Always       -       47 (Min/Max 47/47)
194 Temperature_Celsius     0x0022   047   047   000    Old_age   Always       -       47 (0 17 0 0 0)
195 Hardware_ECC_Recovered  0x001a   038   031   000    Old_age   Always       -       186957558
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       119 (88 230 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       1923886593
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       54783455
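
For reading a table like this across several drives, a short script can pull out the attributes mentioned earlier in the thread. This is a rough Python sketch, assuming the ten-column `smartctl -A` row layout shown above; the watch list is my own choice for illustration and not an official set, and only a few rows are copied in as sample input.

```python
# Attributes whose non-zero raw value usually deserves a closer look.
# Assumption: this watch list is illustrative, not an official one.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "Reported_Uncorrect",
    "UDMA_CRC_Error_Count",
)

def parse_smart_rows(text):
    """Parse attribute rows into {name: {value, worst, thresh, raw}}."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        # Columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW...
        if len(parts) >= 10 and parts[0].isdigit():
            raw = parts[9]
            rows[parts[1]] = {
                "value": int(parts[3]),
                "worst": int(parts[4]),
                "thresh": int(parts[5]),
                # Keep vendor-specific raw formats as text instead of failing.
                "raw": int(raw) if raw.isdigit() else raw,
            }
    return rows

# A few rows from the table above, used as sample input.
sample = """\
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
9 Power_On_Hours 0x0032 099 099 000 Old_age Always - 892
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
"""

rows = parse_smart_rows(sample)
suspicious = [name for name in WATCHED if rows.get(name, {}).get("raw", 0) != 0]
below_threshold = [n for n, r in rows.items() if r["value"] <= r["thresh"]]
print(suspicious, below_threshold)
```

For the rows sampled here both lists come out empty, which matches the "all 0s" reading of the table: the drive itself reports no reallocated, pending, or uncorrectable sectors. The large raw values for Raw_Read_Error_Rate and Hardware_ECC_Recovered are vendor-encoded on some drives, so on their own they are not necessarily a failure sign.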
All times are GMT