Gentoo Forums
Excessively slow I/O on system with RAID 6
XelKarin
n00b


Joined: 29 Dec 2003
Posts: 55

PostPosted: Mon Sep 16, 2019 9:32 pm    Post subject: Excessively slow I/O on system with RAID 6 Reply with quote

I have a RAID 6 array built from six WD1002FAEX 7200 RPM drives on a PERC 6/i RAID controller. I/O performance is incredibly poor: it was averaging about 100 KB/s read/write as reported by iostat, and seems to have dropped to about 50 KB/s since I upgraded the kernel to 4.19.72 this weekend. All CPUs tend to show between 2.0 and 20.0 percent I/O wait according to top.

I've been running performance benchmarks using sysbench fileio. During the peak usage period it takes over 5 minutes to prepare 256 MB of test files on the server exhibiting the problem, and it's not much faster when the system is not under load.

Code:
268435456 bytes written in 374.18 seconds (0.68 MiB/sec)


Preparing the tests on another system with the same 7200RPM drive (although not configured as RAID) is much faster.

Code:
268435456 bytes written in 11.73 seconds (21.83 MiB/sec)
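As a cross-check on both figures, the MiB/sec numbers follow directly from bytes divided by seconds; this is pure arithmetic, nothing hardware-specific (rounding may differ by a hundredth from sysbench's own output):

```shell
# Cross-check the reported throughput: bytes / MiB / seconds elapsed.
awk 'BEGIN { printf "%.2f MiB/sec\n", 268435456 / 1048576 / 374.18 }'  # RAID 6 box
awk 'BEGIN { printf "%.2f MiB/sec\n", 268435456 / 1048576 / 11.73  }'  # single disk
```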


The full benchmark for the RAID 6 machine looks as follows:

Code:

File operations:
    reads/s:                      83.52
    writes/s:                     55.68
    fsyncs/s:                     178.28

Throughput:
    read, MiB/s:                  1.30
    written, MiB/s:               0.87

General statistics:
    total time:                          150.0511s
    total number of events:              47511

Latency (ms):
         min:                                    0.00
         avg:                                    3.16
         max:                                  287.96
         95th percentile:                        0.09
         sum:                               150002.74

Threads fairness:
    events (avg/stddev):           47511.0000/0.00
    execution time (avg/stddev):   150.0027/0.00
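One internal consistency check on these numbers (constants copied from the output above): average latency times event count should land near the reported latency sum, which confirms the run is latency-bound rather than throughput-bound:

```shell
# 47511 events at 3.16 ms average versus the reported 150002.74 ms latency sum.
awk 'BEGIN { printf "%.1f s vs %.1f s\n", 47511 * 3.16 / 1000, 150002.74 / 1000 }'
```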


Would RAID 6 really cause so much of a performance hit, or is it possible there may be something else affecting I/O performance? I don't see any hardware errors reported by the kernel. The RAID controller also doesn't report any disk errors. I'm thinking of converting the system to RAID 10, but if there are other issues at play that may not be much of an improvement.
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 7414
Location: almost Mile High in the USA

PostPosted: Mon Sep 16, 2019 11:28 pm    Post subject: Reply with quote

If that is a modern 1 TB disk, 21 MB/sec is miserable and you should look into hardware issues. You should be getting at least 80 MB/sec on sequential, no-seek transfers. I have even older 120 GB PATA disks from which I could get 60 MB/sec sequential reads, and only slightly lower TTFS reads and writes. Many of my TB-class disks (0.5 TB and higher) handle at least 75 MB/sec sequential reads, if not much more.

In any case, MD RAID6 is highly dependent on tuning and on how well your hardware works together to get the best transfer rates. Allocating more RAM for the read buffers and the stripe cache helps.

(I tried MD RAID6 on a set of five dissimilar 500 GB disks. It too was pretty miserable in performance, but it was still getting several MB/sec, not fractional. However, I do know I have some flaky disks in that set, so I couldn't get really reliable data from it.)
_________________
Intel Core i7 2700K@ 4.1GHz/HD3000 graphics/8GB DDR3/180GB SSD
What am I supposed watching?
XelKarin
n00b


Joined: 29 Dec 2003
Posts: 55

PostPosted: Tue Sep 17, 2019 9:33 pm    Post subject: Reply with quote

Actually, the single disk is an older model than I thought, but it still runs at 7200RPM.

I tried to determine whether there were hardware issues I'd missed on the RAID server, but couldn't find anything obvious. I ran smartctl on the individual disks and none of them reported errors. I did notice this, though:

Code:
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 1.5 Gb/s)


It doesn't seem like the SATA controller is operating at its full bandwidth. I'll have to head to the colocation facility and check the BIOS settings for that. But I'm also not certain that would cause the drastically slow performance I'm seeing.
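For reference, the negotiated rate can be pulled out of `smartctl -i` output; this sketch parses the line quoted above (the saved string stands in for a live `smartctl -i /dev/sdX` query):

```shell
# Extract the negotiated link rate from a saved smartctl -i line.
# In practice you would pipe `smartctl -i /dev/sdX` in instead of echoing a string.
line='SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 1.5 Gb/s)'
echo "$line" | sed -n 's/.*(current: \(.*\))/\1/p'
```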
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 7414
Location: almost Mile High in the USA

PostPosted: Tue Sep 17, 2019 11:59 pm    Post subject: Reply with quote

SATA should pretty much always be faster than PATA, and if I have a 60 MB/sec PATA drive, SATA should be able to beat that. Check cabling and drivers -- unless the disk is actually failing (which I've seen plenty of, despite SMART not directly saying so), I've yet to see a 3.5" SATA drive do less than about 70 MB/sec sequential reads (hdparm -t). 2.5" SATA drives are somewhat slower, but not by that much.

My 2 TB 3.5" drives manage at least 110 MB/sec sequential reads IIRC (connected at SATA 3 Gb/sec). Even at SATA 1.5 Gb/sec you should be hitting 60 MB/sec easily, as it was meant to be a competitor to PATA133.
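The 60 MB/sec claim is consistent with the math: even a link negotiated down to SATA 1.5 Gb/s carries roughly 150 MB/s of payload after 8b/10b encoding overhead, so the link rate alone can't explain sub-1 MiB/s writes:

```shell
# SATA 1.5 Gb/s line rate; 8b/10b encoding means 80% of the bits are payload.
awk 'BEGIN { printf "%d MB/s\n", 1.5e9 * 0.8 / 8 / 1e6 }'
```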
XelKarin
n00b


Joined: 29 Dec 2003
Posts: 55

PostPosted: Wed Sep 18, 2019 9:08 pm    Post subject: Reply with quote

Alright, I discovered that the perccli tool works with my hardware and is much better than megacli, which is what I was using before. All the disks report themselves as being in optimal condition, but dumping the RAID controller's logs shows that the controller is constantly having to perform error correction for physical disk 5. Hopefully replacing that will fix the issue.

Thanks, eccerr0r, for clearing some things up on what can be expected performance-wise.
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 44945
Location: 56N 3W

PostPosted: Wed Sep 18, 2019 9:30 pm    Post subject: Reply with quote

XelKarin,

It might just be a poor quality data cable.
If the SMART data looks good line by line, not just the pass/fail result, it's probably not the drive.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
XelKarin
n00b


Joined: 29 Dec 2003
Posts: 55

PostPosted: Mon Sep 23, 2019 12:06 am    Post subject: Reply with quote

NeddySeagoon,

I didn't even have to replace the data cable. Just jiggling the disk bay seems to have done the trick. Apparently it was just a bad connection.
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 44945
Location: 56N 3W

PostPosted: Mon Sep 23, 2019 8:36 am    Post subject: Reply with quote

XelKarin,

Maybe the cable is faulty, maybe the connector was not properly mated.
Keep an eye on it; it will probably recur. Bad connections tend to go bad again about every six months or so.
toralf
Developer


Joined: 01 Feb 2004
Posts: 3750
Location: Hamburg

PostPosted: Mon Sep 23, 2019 11:41 am    Post subject: Reply with quote

XelKarin wrote:
Just jiggling the disk bay seems to have done the trick
Which raises the question: what's the current read/write speed? ;-)
XelKarin
n00b


Joined: 29 Dec 2003
Posts: 55

PostPosted: Tue Sep 24, 2019 9:25 pm    Post subject: Reply with quote

Well, I jumped to conclusions in my last post. I now also consider a bad connection or a bad SATA cable less likely.

I/O performance degraded again after a day of system activity. It remained degraded after system activity decreased. Rebooting the system returns sequential write I/O to around 230MiB/s. By the time the system reaches peak load during the afternoon, sequential write I/O is around 0.6MiB/s. It remains at 0.6MiB/s even when the system returns to idle, until the next reboot.
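A repeatable way to snapshot that degradation over the day is a fixed-size dd write probe (a sketch; the path and size are arbitrary, and on a real data disk you would add `oflag=direct` where the filesystem supports it so cached writes don't inflate the number):

```shell
# Write 64 MiB and fsync before reporting, so the figure reflects the disk
# rather than just the page cache. /tmp/ddtest is an arbitrary scratch path.
dd if=/dev/zero of=/tmp/ddtest bs=1M count=64 conv=fsync 2>&1 | tail -n 1
rm -f /tmp/ddtest
```

Running this at idle, at peak, and after peak would show exactly when throughput collapses from ~230 MiB/s to ~0.6 MiB/s.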

The error correction warnings are gone from the RAID controller logs, but I'm also seeing the following at regular 10-minute intervals.

Code:
Event Description: Unexpected sense: PD 00(e0x20/s0) Path 1221000000000000, CDB: 4d 00 4d 00 00 00 00 00 20 00, Sense: 5/24/00
Event Description: Unexpected sense: PD 01(e0x20/s1) Path 1221000001000000, CDB: 4d 00 4d 00 00 00 00 00 20 00, Sense: 5/24/00
Event Description: Unexpected sense: PD 02(e0x20/s2) Path 1221000002000000, CDB: 4d 00 4d 00 00 00 00 00 20 00, Sense: 5/24/00
Event Description: Unexpected sense: PD 03(e0x20/s3) Path 1221000003000000, CDB: 4d 00 4d 00 00 00 00 00 20 00, Sense: 5/24/00
Event Description: Unexpected sense: PD 04(e0x20/s4) Path 1221000004000000, CDB: 4d 00 4d 00 00 00 00 00 20 00, Sense: 5/24/00
Event Description: Unexpected sense: PD 05(e0x20/s5) Path 1221000005000000, CDB: 4d 00 4d 00 00 00 00 00 20 00, Sense: 5/24/00


These are 'Invalid field in CDB' errors, which as far as I can gather means the RAID controller is sending commands to the drives that they don't recognize.
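Decoding the sense triplet supports that reading. This sketch covers only the triplet seen here; the mapping values are from the SCSI Primary Commands spec, and the interpretation of the 10-minute cadence is my own guess:

```shell
# Decode "Sense: 5/24/00" (sense key / ASC / ASCQ). Only this triplet is mapped.
decode_sense() {
    case "$1" in
        5/24/00) echo 'ILLEGAL REQUEST: INVALID FIELD IN CDB' ;;
        *)       echo "unknown sense: $1" ;;
    esac
}
decode_sense '5/24/00'
# The first CDB byte in those log lines, 0x4d, is the LOG SENSE opcode, so the
# rejected commands look like the controller's periodic log polls, not real I/O.
```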

I found a spec sheet for the drives I'm using. It's really more of an advertising flyer, but in the small print it claims the drives are rated for RAID 0/1 and haven't been tested for use in enterprise array configurations. I haven't found this mentioned on any HTML/text spec sheets or on specs provided by retail sites.
http://products.wdc.com/library/SpecSheet/ENG/2879-701276.pdf

I think I'll try reconfiguring the system as RAID 10 and see what happens, but may have to purchase new hardware.
XelKarin
n00b


Joined: 29 Dec 2003
Posts: 55

PostPosted: Sat Oct 26, 2019 9:40 pm    Post subject: Reply with quote

I built a new server and had the same issue with both RAID-6 and RAID-5 configurations. However, booting a kernel built from the same configuration as the one used on the x86 minimal install CD seems to have done the trick. Surprisingly, a stock genkernel configuration (/usr/share/genkernel/arch/x86/generated-config) does not. So the issue appears to be a kernel configuration option that's only included on the install CDs, though I'm not sure which one it would be.
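One way to hunt for the differing option is to diff the two configs directly; a kernel built with CONFIG_IKCONFIG_PROC (as the install CD kernels typically are) exposes its config at /proc/config.gz. A self-contained sketch with fabricated fragments (CONFIG_FOO is a placeholder, not the real culprit):

```shell
# In practice: diff <(zcat /proc/config.gz | sort) <(sort /path/to/generated-config)
# Illustration using two tiny fabricated config fragments:
printf 'CONFIG_BAR=y\nCONFIG_FOO=y\n'            > /tmp/cfg_livecd
printf 'CONFIG_BAR=y\n# CONFIG_FOO is not set\n' > /tmp/cfg_genkernel
diff /tmp/cfg_genkernel /tmp/cfg_livecd || true
rm -f /tmp/cfg_livecd /tmp/cfg_genkernel
```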
XelKarin
n00b


Joined: 29 Dec 2003
Posts: 55

PostPosted: Mon Oct 28, 2019 8:26 pm    Post subject: Reply with quote

It turns out the issue was running Linux in 32-bit mode with PAE on a system with a large amount of memory. I was able to fix it by enabling the VMSPLIT_2G kernel configuration option and limiting addressable memory to 16 GB with mem=0x4000M.

Here's a reference to where I found the solution: https://flaterco.com/kb/PAE_slowdown.html
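For anyone following along, the numbers check out: mem=0x4000M caps addressable memory at 0x4000 = 16384 MiB = 16 GiB, and VMSPLIT_2G changes the 32-bit address-space split from 3G user / 1G kernel to 2G/2G, giving the kernel more low memory for bookkeeping on large-RAM PAE systems:

```shell
# 0x4000 MiB expressed in decimal and in GiB (shell printf accepts hex constants).
printf '%d MiB\n' 0x4000
awk 'BEGIN { printf "%d GiB\n", 16384 / 1024 }'
```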
Hu
Moderator


Joined: 06 Mar 2007
Posts: 14971

PostPosted: Tue Oct 29, 2019 1:58 am    Post subject: Reply with quote

Is there a reason you are using 32-bit in PAE mode instead of using a native 64-bit kernel? Generally, when you have so much memory that PAE is a good idea, you're better off switching to 64-bit if your CPU supports it.
XelKarin
n00b


Joined: 29 Dec 2003
Posts: 55

PostPosted: Tue Oct 29, 2019 3:57 am    Post subject: Reply with quote

Legacy DOS software that requires DOSEMU and VM86. It's very slow and unstable when run under 64-bit.