Gentoo Forums :: Kernel & Hardware

NVMe SSD has Become Very Slow
jagdpanther
Guru

Joined: 22 Nov 2003
Posts: 514

Posted: Wed Feb 13, 2019 7:30 pm    Post subject: NVMe SSD has Become Very Slow

My 1TB NVMe M.2 SSD has become VERY slow over the last month or two, and I would really like to restore the original performance. This is a request for debugging suggestions and advice. Below I list some system information and benchmarks (sysbench random read/write tests). At the bottom, for comparison, I include the same benchmark run on a properly working Dell Gentoo system with an NVMe SSD.

This Gentoo system was built last fall and is based on a SuperMicro C9X299-PG300 motherboard with three drives:
System drive: Samsung 970 PRO NVMe M.2
Home and Data: Crucial MX500 SATA2 SSD
Backup: Seagate BarraCuda Pro SATA2 HDD
Current Kernel: 4.20.7-gentoo

All partitions on both the NVMe M.2 SSD and the SATA2 SSD are ext4, /sbin/fstrim runs on them daily, and each drive was originally formatted with parted. 10 GB on each SSD was left unformatted.

There are no obvious errors that I see in /var/log/messages.

I have smartd run a short test each week on all drives and no errors are reported.
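For reference, the weekly results can also be pulled up by hand with smartctl (device names here are examples; adjust for your drives):
Code:
# NVMe health summary, error log, and self-test results (needs smartmontools with NVMe support)
smartctl -a /dev/nvme0
# SATA SSD / HDD SMART attributes and self-test log
smartctl -a /dev/sda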

Benchmark results:

The SuperMicro MB, on a partition of the SLOW Samsung 970 PRO NVMe M.2:
Code:
sysbench fileio --file-total-size=128G prepare
sysbench fileio --file-total-size=128G --file-test-mode=rndrw --time=120 --max-requests=0 run
sysbench 1.0.15 (using system LuaJIT 2.0.5)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Extra file open flags: (none)
128 files, 1GiB each
128GiB total file size
Block size 16KiB
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Initializing worker threads...

Threads started!


File operations:
    reads/s:                      93.14
    writes/s:                     62.09
    fsyncs/s:                     199.69

Throughput:
    read, MiB/s:                  1.46
    written, MiB/s:               0.97

General statistics:
    total time:                          120.4662s
    total number of events:              42629

Latency (ms):
         min:                                    0.00
         avg:                                    2.81
         max:                                  232.93
         95th percentile:                        8.28
         sum:                               119588.52

Threads fairness:
    events (avg/stddev):           42629.0000/0.00
    execution time (avg/stddev):   119.5885/0.00


SuperMicro MB, Crucial SATA2 SSD partition:
Code:

sysbench fileio --file-total-size=128G prepare

sysbench fileio --file-total-size=128G --file-test-mode=rndrw --time=120 --max-requests=0 run
sysbench 1.0.15 (using system LuaJIT 2.0.5)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Extra file open flags: (none)
128 files, 1GiB each
128GiB total file size
Block size 16KiB
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Initializing worker threads...

Threads started!


File operations:
    reads/s:                      171.96
    writes/s:                     114.64
    fsyncs/s:                     367.84

Throughput:
    read, MiB/s:                  2.69
    written, MiB/s:               1.79

General statistics:
    total time:                          120.0254s
    total number of events:              78423

Latency (ms):
         min:                                    0.01
         avg:                                    1.52
         max:                                   76.46
         95th percentile:                        4.41
         sum:                               119243.66

Threads fairness:
    events (avg/stddev):           78423.0000/0.00
    execution time (avg/stddev):   119.2437/0.00


The random read throughput of the SATA2 SSD is 1.8 times that of the NVMe SSD on the same system.

Same test on the SuperMicro MB spinning HDD:
Code:
sysbench fileio --file-total-size=128G prepare

 sysbench fileio --file-total-size=128G --file-test-mode=rndrw --time=120 --max-requests=0 run
sysbench 1.0.15 (using system LuaJIT 2.0.5)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Extra file open flags: (none)
128 files, 1GiB each
128GiB total file size
Block size 16KiB
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Initializing worker threads...

Threads started!


File operations:
    reads/s:                      81.04
    writes/s:                     54.02
    fsyncs/s:                     173.62

Throughput:
    read, MiB/s:                  1.27
    written, MiB/s:               0.84

General statistics:
    total time:                          120.1649s
    total number of events:              36966

Latency (ms):
         min:                                    0.01
         avg:                                    3.24
         max:                                  242.17
         95th percentile:                       12.98
         sum:                               119633.22

Threads fairness:
    events (avg/stddev):           36966.0000/0.00
    execution time (avg/stddev):   119.6332/0.00


The read and write throughput is only slightly better on the NVMe SSD compared to the spinning HDD.


Finally, for comparison, here is an NVMe SSD benchmark from my second Gentoo (Dell-based) system, which is working well:

Code:
sysbench fileio --file-total-size=128G prepare

sysbench fileio --file-total-size=128G --file-test-mode=rndrw --time=120 --max-requests=0 run
sysbench 1.0.15 (using system LuaJIT 2.0.5)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Extra file open flags: (none)
128 files, 1GiB each
128GiB total file size
Block size 16KiB
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Initializing worker threads...

Threads started!


File operations:
    reads/s:                      1684.65
    writes/s:                     1123.10
    fsyncs/s:                     3594.39

Throughput:
    read, MiB/s:                  26.32
    written, MiB/s:               17.55

General statistics:
    total time:                          120.0210s
    total number of events:              768288

Latency (ms):
         min:                                    0.00
         avg:                                    0.16
         max:                                    9.61
         95th percentile:                        0.59
         sum:                               119533.68

Threads fairness:
    events (avg/stddev):           768288.0000/0.00
    execution time (avg/stddev):   119.5337/0.00


The random read throughput is roughly 18 times higher (26.32 vs. 1.46 MiB/s) on the properly working Dell NVMe drive than on the SuperMicro with the Samsung 970 PRO.

On both the slow SuperMicro and the 'fast' Dell, 'cat /sys/block/nvme0n1/queue/scheduler' gives: [mq-deadline] kyber none
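As a quick experiment (not a known fix), the scheduler can be switched at runtime to rule it out; a minimal sketch:
Code:
# temporarily select 'none' for the NVMe queue (reverts on reboot), then re-run the benchmark
echo none > /sys/block/nvme0n1/queue/scheduler
cat /sys/block/nvme0n1/queue/scheduler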

Any suggestions on speeding the Samsung 970 Pro back up would be appreciated.
mike155
Veteran

Joined: 17 Sep 2010
Posts: 1302
Location: Frankfurt, Germany

Posted: Thu Feb 14, 2019 4:50 pm

I know that write performance degrades on SSDs if you don't trim them. I've never heard of any effects that degrade read performance.

1) Are you sure that your fstrim cron job really works? Run fstrim manually for all partitions on your SSDs and see what happens.

2) You say that read performance degraded, but you performed a combined read/write test. It could well be that only write performance degraded and that you measured degraded read performance only because the controller was still busy with previous write requests. Don't run combined read and write tests. Please run a read-only test and tell us the result.
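Something like this, for example (mount points are placeholders; use your own):
Code:
# trim each SSD-backed mount point by hand; -v reports how much was trimmed
/sbin/fstrim -v /
/sbin/fstrim -v /data0
# then a pure random-read run against the same 128G file set
sysbench fileio --file-total-size=128G prepare
sysbench fileio --file-total-size=128G --file-test-mode=rndrd --time=120 --max-requests=0 run
sysbench fileio --file-total-size=128G cleanup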
Anon-E-moose
Advocate

Joined: 23 May 2008
Posts: 3956
Location: Dallas area

Posted: Thu Feb 14, 2019 5:10 pm

Haven't heard of any problems with the 970s, but I vaguely remember that the 840 or 850 series (don't remember which) did have read slowdown problems. IIRC it was fixed with a firmware update.

You might check with Samsung support/forums to see if others are having problems with that model.
_________________
Asus m5a99fx, FX 8320 - nouveau, oss4, rx550 for qemu passthrough
Acer laptop E5-575, i3-7100u - i965, alsa
---both---
5.0.13 zen kernel, profile 17.0 (no-pie) amd64-no-multilib
gcc 8.2.0, eudev, openrc, openbox, palemoon
jagdpanther
Guru

Joined: 22 Nov 2003
Posts: 514

Posted: Thu Feb 14, 2019 9:57 pm

mike155:

Thanks for the reply.

Quote:
1) Are you sure that your fstrim cron job really works?


I kick off fstrim via cron with the following script. (The data1 line is for the SATA2 SSD; the rest are for the slow NVMe SSD.)
Code:

#! /bin/bash
#  excluded notes and version number
/bin/date '+%F_%R'        # timestamp for the log
/sbin/fstrim -v /         # NVMe M.2 SSD
/sbin/fstrim -v /boot     # NVMe M.2 SSD
/sbin/fstrim -v /data0    # NVMe M.2 SSD
/sbin/fstrim -v /data1    # SATA2 SSD


Last night's run looks like:

Code:
2019-02-14_00:10
/: 8 GiB (8566865920 bytes) trimmed
/boot: 0 B (0 bytes) trimmed
/data0: 0 B (0 bytes) trimmed
/data1: 2.6 GiB (2785058816 bytes) trimmed


After a restart or power cycle, the next fstrim run shows the amount trimmed being close to the free space on each partition.
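To double-check that the drives advertise discard support at all, lsblk can show it (device names are examples; non-zero DISC-GRAN and DISC-MAX mean the device accepts TRIM):
Code:
lsblk --discard /dev/nvme0n1 /dev/sda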


Quote:
Please run a read-only test and tell us the result.


Will do. I'll add it to this thread tomorrow.
Ant P.
Watchman

Joined: 18 Apr 2009
Posts: 5762

Posted: Thu Feb 14, 2019 11:09 pm

I may be splitting hairs here, but fstrim on /boot seems a bit pointless, maybe even counter-productive. It's usually a tiny partition, so the same LBAs are likely to get overwritten regardless.
jagdpanther
Guru

Joined: 22 Nov 2003
Posts: 514

Posted: Fri Feb 15, 2019 3:02 am

Yes, /boot is tiny (120M) and mounted read-only. (It's only remounted rw when a new kernel comes out.) I guess I could omit it from the fstrim job.
jagdpanther
Guru

Joined: 22 Nov 2003
Posts: 514

Posted: Fri Feb 15, 2019 3:53 am

Quote:
2) You say that read performance degraded, but you performed a combined read/write test. It could well be that only write performance degraded and that you measured degraded read performance only because the controller was still busy with previous write requests. Don't run combined read and write tests. Please run a read-only test and tell us the result.


Here are the same three tests from above on the SuperMicro system: the first on the NVMe M.2 SSD, the second on the SATA2 SSD, and the third on the spinning HDD. This time I am running random read-only tests. The performance is much better than the random r/w on the NVMe, but it still pales compared to the random r/w test on the second (Dell) system. I won't have access to that second system until next week to run the same random read-only benchmark.

Random read-only benchmark from the NVMe M.2 SSD (Samsung 970 PRO):

Code:
sysbench fileio --file-total-size=128G prepare

sysbench fileio --file-total-size=128G --time=120 --max-requests=0 --file-test-mode=rndrd run
sysbench 1.0.15 (using system LuaJIT 2.0.5)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Extra file open flags: (none)
128 files, 1GiB each
128GiB total file size
Block size 16KiB
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random read test
Initializing worker threads...

Threads started!


File operations:
    reads/s:                      644.97
    writes/s:                     0.00
    fsyncs/s:                     0.00

Throughput:
    read, MiB/s:                  10.08
    written, MiB/s:               0.00

General statistics:
    total time:                          120.0028s
    total number of events:              77401

Latency (ms):
         min:                                    0.00
         avg:                                    1.54
         max:                                  251.64
         95th percentile:                        3.55
         sum:                               119217.50

Threads fairness:
    events (avg/stddev):           77401.0000/0.00
    execution time (avg/stddev):   119.2175/0.00


Much better 'read' performance than the 'read' portion of the random read-write test.

Random read-only test from the SATA2 SSD (Crucial MX500):

Code:
sysbench fileio --file-total-size=128G prepare

sysbench fileio --file-total-size=128G --time=120 --max-requests=0 --file-test-mode=rndrd run
sysbench 1.0.15 (using system LuaJIT 2.0.5)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Extra file open flags: (none)
128 files, 1GiB each
128GiB total file size
Block size 16KiB
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random read test
Initializing worker threads...

Threads started!


File operations:
    reads/s:                      478.29
    writes/s:                     0.00
    fsyncs/s:                     0.00

Throughput:
    read, MiB/s:                  7.47
    written, MiB/s:               0.00

General statistics:
    total time:                          120.0029s
    total number of events:              57398

Latency (ms):
         min:                                    0.00
         avg:                                    2.08
         max:                                  144.29
         95th percentile:                        3.82
         sum:                               119438.46

Threads fairness:
    events (avg/stddev):           57398.0000/0.00
    execution time (avg/stddev):   119.4385/0.00


Finally, the read-only test from the spinning hard drive (Seagate BarraCuda Pro SATA2 HDD):

Code:
sysbench fileio --file-total-size=128G prepare

sysbench fileio --file-total-size=128G --time=120 --max-requests=0 --file-test-mode=rndrd run
sysbench 1.0.15 (using system LuaJIT 2.0.5)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Extra file open flags: (none)
128 files, 1GiB each
128GiB total file size
Block size 16KiB
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random read test
Initializing worker threads...

Threads started!


File operations:
    reads/s:                      93.43
    writes/s:                     0.00
    fsyncs/s:                     0.00

Throughput:
    read, MiB/s:                  1.46
    written, MiB/s:               0.00

General statistics:
    total time:                          120.0070s
    total number of events:              11213

Latency (ms):
         min:                                    0.01
         avg:                                   10.69
         max:                                  717.62
         95th percentile:                       14.21
         sum:                               119890.74

Threads fairness:
    events (avg/stddev):           11213.0000/0.00
    execution time (avg/stddev):   119.8907/0.00