Gentoo Forums :: Kernel & Hardware

EXT4/Software RAID Issues
matt2kjones
Tux's lil' helper


Joined: 03 Mar 2004
Posts: 87

PostPosted: Thu Jun 04, 2015 11:36 am    Post subject: EXT4/Software RAID Issues

Hello,

I have two file servers. One has a 24TB RAID 10 array used by our business's scanner operators to scan large files to a network drive; that system works fine and is very fast. The other is a backup server, which uses lftp in mirror mode to copy the changes from the live server to the backup server every night.

The second server was built recently with a brand new CPU, server motherboard, RAM and two LSI 2-port SAS controllers, so that I can connect 16 drives to the backplane. It is using sixteen 2TB disks I had lying around, some of which may or may not be faulty, set up in software RAID 6. I also bought another four 2TB hard drives as spares, which currently aren't installed in the machine.

I am getting errors showing up in my logs; here is the output from dmesg:

Code:

[1019097.763009] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 364262149 (offset 16777216 size 4722688 starting block 2908071936)
[1019097.763010] buffer_io_error: 3062 callbacks suppressed
[1019097.763011] Buffer I/O error on device md4, logical block 2908071936
[1019097.763013] Buffer I/O error on device md4, logical block 2908071937
[1019097.763014] Buffer I/O error on device md4, logical block 2908071938
[1019097.763015] Buffer I/O error on device md4, logical block 2908071939
[1019097.763016] Buffer I/O error on device md4, logical block 2908071940
[1019097.763016] Buffer I/O error on device md4, logical block 2908071941
[1019097.763017] Buffer I/O error on device md4, logical block 2908071942
[1019097.763018] Buffer I/O error on device md4, logical block 2908071943
[1019097.763019] Buffer I/O error on device md4, logical block 2908071944
[1019097.763020] Buffer I/O error on device md4, logical block 2908071945
[1019097.763061] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 364262149 (offset 16777216 size 4722688 starting block 2908072064)
[1019097.767236] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 364262149 (offset 16777216 size 8388608 starting block 2908072192)
[1019097.767280] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 364262149 (offset 16777216 size 8388608 starting block 2908072320)
[1019097.767323] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 364262149 (offset 16777216 size 8388608 starting block 2908072448)
[1019097.767367] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 364262149 (offset 16777216 size 8388608 starting block 2908072576)
[1019097.767410] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 364262149 (offset 16777216 size 8388608 starting block 2908072704)
[1019097.767452] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 364262149 (offset 16777216 size 8388608 starting block 2908072832)
[1019097.835346] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 364262149 (offset 16777216 size 8388608 starting block 2908073472)
[1019097.835393] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 364262149 (offset 16777216 size 8388608 starting block 2908073600)
[1055480.219360] EXT4-fs warning: 6 callbacks suppressed
[1055480.219364] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 159895151 (offset 41943040 size 8388608 starting block 2906665856)
[1055480.219365] buffer_io_error: 2038 callbacks suppressed
[1055480.219366] Buffer I/O error on device md4, logical block 2906665856
[1055480.219367] Buffer I/O error on device md4, logical block 2906665857
[1055480.219368] Buffer I/O error on device md4, logical block 2906665858
[1055480.219369] Buffer I/O error on device md4, logical block 2906665859
[1055480.219370] Buffer I/O error on device md4, logical block 2906665860
[1055480.219371] Buffer I/O error on device md4, logical block 2906665861
[1055480.219372] Buffer I/O error on device md4, logical block 2906665862
[1055480.219373] Buffer I/O error on device md4, logical block 2906665863
[1055480.219374] Buffer I/O error on device md4, logical block 2906665864
[1055480.219374] Buffer I/O error on device md4, logical block 2906665865
[1055480.219416] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 159895151 (offset 41943040 size 8388608 starting block 2906665984)
[1055480.219460] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 159895151 (offset 41943040 size 8388608 starting block 2906666112)
[1055480.219506] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 159895151 (offset 41943040 size 8388608 starting block 2906666240)
[1055480.219551] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 159895151 (offset 41943040 size 8388608 starting block 2906666368)
[1055480.219595] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 159895151 (offset 41943040 size 8388608 starting block 2906664960)
[1055480.219641] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 159895151 (offset 41943040 size 8388608 starting block 2906665088)
[1055480.219686] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 159895151 (offset 41943040 size 8388608 starting block 2906665216)
[1055480.219731] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 159895151 (offset 41943040 size 8388608 starting block 2906665344)
[1055480.219777] EXT4-fs warning (device md4): ext4_end_bio:317: I/O error -5 writing to inode 159895151 (offset 41943040 size 8388608 starting block 2906665472)


However, /proc/mdstat reports that none of the disks are failing:

Code:

Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] [linear] [multipath]
md2 : active raid1 sdk2[0] sdl2[1]
      16760832 blocks super 1.2 [2/2] [UU]

md4 : active raid6 sdc1[0] sdp1[13] sdo1[12] sdn1[11] sdm1[10] sdj1[9] sdb1[8] sdg1[15] sdi1[6] sdh1[5] sda1[14] sdf1[3] sde1[2] sdd1[1]
      23440588800 blocks super 1.2 level 6, 512k chunk, algorithm 2 [14/14] [UUUUUUUUUUUUUU]
      bitmap: 7/15 pages [28KB], 65536KB chunk

md1 : active raid1 sdk1[0] sdl1[1]
      1048512 blocks [2/2] [UU]

md3 : active raid1 sdk3[0] sdl3[1]
      1935556672 blocks super 1.2 [2/2] [UU]
      bitmap: 4/15 pages [16KB], 65536KB chunk

unused devices: <none>


Surely if EXT4 is reporting I/O errors, there must be I/O errors happening on individual drives within the array, and those drives should be flagged as bad? I want the bad drives to fail so that I can identify them and replace them with the good ones I have.
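One rough way I know to check whether any individual disk has logged errors (the pattern just matches my drive letters) is grepping the kernel log directly:

Code:

dmesg | grep -E 'ata[0-9]+|sd[a-p]' | grep -iE 'error|fail|timeout'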

Any Ideas?
_________________
OSST - Formerly: The Linux Mirror Project
OSST - Open Source Software Downloads - Torrents for over 80 Distributions
frostschutz
Advocate


Joined: 22 Feb 2005
Posts: 2970
Location: Germany

PostPosted: Thu Jun 04, 2015 11:58 am

If you suspect broken disks, see if they pass smartctl -t long, and replace the ones that show read failures. But it's odd that there are no errors for any specific /dev/sdX in the dmesg output.
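Something along these lines (untested sketch; adjust the device globs to your actual array members, since sdk/sdl look like your system disks):

Code:

# kick off long self-tests on all array members
for d in /dev/sd[a-j] /dev/sd[m-p]; do smartctl -t long $d; done
# a long test on 2TB drives takes several hours; afterwards check
# the self-test logs and look for "read failure" entries
for d in /dev/sd[a-j] /dev/sd[m-p]; do echo "== $d =="; smartctl -l selftest $d; done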

Are you using RAID controllers or plain HBAs? RAID controllers often have their own caching behaviour and timeout quirks...
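Related to timeouts: if those are desktop-class drives, it's worth comparing their SCT error recovery setting against the kernel's SCSI timeout, since a mismatch can cause odd behaviour in md arrays. A sketch (sda is just an example):

Code:

# does the drive support SCT ERC (a.k.a. TLER)?
smartctl -l scterc /dev/sda
# if supported, cap recovery at 7 seconds so the drive reports
# failures before the kernel's default 30s SCSI timeout expires
smartctl -l scterc,70,70 /dev/sda
cat /sys/block/sda/device/timeout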

Can you show mdadm --detail /dev/md4?
matt2kjones
Tux's lil' helper


Joined: 03 Mar 2004
Posts: 87

PostPosted: Thu Jun 04, 2015 12:39 pm

Hello,

Thanks for the reply.

I'm using two SAS HBA cards, 2 ports each, 4 ports total. On each port I'm using a breakout cable (1 SAS -> 4 SATA), with the 16 SATA connectors plugged into a direct-attach backplane.
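In case it's useful, here is a rough way to map each disk back to a controller port via the by-path symlinks (exact path names depend on the HBAs):

Code:

ls -l /dev/disk/by-path/ | grep -v part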

The output from mdadm is below:

Code:
mdadm --detail /dev/md4
/dev/md4:
        Version : 1.2
  Creation Time : Thu May 21 09:36:16 2015
     Raid Level : raid6
     Array Size : 23440588800 (22354.69 GiB 24003.16 GB)
  Used Dev Size : 1953382400 (1862.89 GiB 2000.26 GB)
   Raid Devices : 14
  Total Devices : 14
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Thu Jun  4 21:32:33 2015
          State : active
 Active Devices : 14
Working Devices : 14
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : livecd:4
           UUID : 4b40ac8c:4f7ea8a7:722cbf0a:97537a64
         Events : 4044

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       1       8       49        1      active sync   /dev/sdd1
       2       8       65        2      active sync   /dev/sde1
       3       8       81        3      active sync   /dev/sdf1
      14       8        1        4      active sync   /dev/sda1
       5       8      113        5      active sync   /dev/sdh1
       6       8      129        6      active sync   /dev/sdi1
      15       8       97        7      active sync   /dev/sdg1
       8       8       17        8      active sync   /dev/sdb1
       9       8      145        9      active sync   /dev/sdj1
      10       8      193       10      active sync   /dev/sdm1
      11       8      209       11      active sync   /dev/sdn1
      12       8      225       12      active sync   /dev/sdo1
      13       8      241       13      active sync   /dev/sdp1


Thanks
_________________
OSST - Formerly: The Linux Mirror Project
OSST - Open Source Software Downloads - Torrents for over 80 Distributions
frostschutz
Advocate


Joined: 22 Feb 2005
Posts: 2970
Location: Germany

PostPosted: Thu Jun 04, 2015 6:35 pm

That all looks fine to me. I'm not sure why there would be a "Buffer I/O error on device md*" but not on any of its members. Maybe it's a question for the linux-raid mailing list.
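One thing worth checking is md's own per-member error counters; if they are all zero, that backs up the idea that md considers the members healthy. A sketch, assuming the sysfs layout of a recent kernel:

Code:

# corrected-read-error count that md keeps for each member
grep . /sys/block/md4/md/dev-*/errors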

Which kernel version are you using? There was some noise about RAID and ext4 in the 3.19.8-4.0.4 kernels, but that was something regarding RAID0 and SSDs.

Just for kicks you could try disabling NCQ (libata.force=noncq), which can be the source of various issues...
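For example (the boot parameter is global; the runtime knob is per disk, and sda is just an example):

Code:

# at boot: append libata.force=noncq to the kernel command line
# at runtime: effectively disable NCQ by dropping the queue depth to 1
echo 1 > /sys/block/sda/device/queue_depth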

Are you using any other optimizations, such as a custom stripe cache size?
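i.e. anything like this (a sketch; 256 is the usual default for raid5/6):

Code:

cat /sys/block/md4/md/stripe_cache_size
echo 4096 > /sys/block/md4/md/stripe_cache_size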