Gentoo Forums
How come EXT4 slows my ssd so much?
Gentoo Forums Forum Index » Kernel & Hardware
RayDude
Veteran


Joined: 29 May 2004
Posts: 1558
Location: San Jose, CA

PostPosted: Thu May 30, 2019 4:44 pm    Post subject: How come EXT4 slows my ssd so much?

Code:
server /mnt/backup/root # hdparm -tT /dev/nvme0n1

/dev/nvme0n1:
 Timing cached reads:   22260 MB in  2.00 seconds = 11144.34 MB/sec
 Timing buffered disk reads: 8146 MB in  3.00 seconds = 2715.05 MB/sec
server /mnt/backup/root # hdparm -tT /dev/nvme0n1p4

/dev/nvme0n1p4:
 Timing cached reads:   20114 MB in  2.00 seconds = 10068.90 MB/sec
 Timing buffered disk reads: 3356 MB in  3.00 seconds = 1118.66 MB/sec


This bugs me. I mean, I really don't notice the performance difference, but it seems wrong for ext4 to create such incredible overhead.

Is this normal? Is this expected?
_________________
Some day there will only be free software.
Ant P.
Watchman


Joined: 18 Apr 2009
Posts: 6351

PostPosted: Thu May 30, 2019 4:46 pm

Is the partition correctly aligned?
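(A minimal sketch of such a check, assuming the usual 1 MiB alignment target and 512-byte logical sectors; the sysfs path in the comment is where the kernel exposes a partition's start sector:)

```shell
# is_aligned START_SECTOR: succeeds if the partition starts on a
# 1 MiB boundary, i.e. a multiple of 2048 512-byte sectors.
is_aligned() {
    [ $(( $1 % 2048 )) -eq 0 ]
}

# The start sector of a partition can be read from sysfs, e.g.:
#   start=$(cat /sys/block/nvme0n1/nvme0n1p4/start)
is_aligned 17192960 && echo "aligned"   # prints "aligned"
```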
tmcca
Tux's lil' helper


Joined: 24 May 2019
Posts: 78

PostPosted: Thu May 30, 2019 5:30 pm

I was going to say the same thing: make sure it is aligned. Also, use fstrim instead of discard on root. You can use discard on boot; I think that is the correct approach.

How did you partition the drive? Did you use parted?
mike155
Veteran


Joined: 17 Sep 2010
Posts: 1990
Location: Frankfurt, Germany

PostPosted: Thu May 30, 2019 5:33 pm

Is ext4's lazy inode table zeroing still running? See: 'man mkfs.ext4', option 'lazy_itable_init'.
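One way to check is to look for the kernel worker; this is a sketch that assumes the worker appears as a thread named "ext4lazyinit" while the zeroing runs:

```shell
# While ext4 zeroes inode tables in the background, a kernel thread
# named "ext4lazyinit" is running; once it exits, the zeroing is done.
if ps -e -o comm= | grep -q ext4lazyinit; then
    echo "lazy inode-table zeroing still in progress"
else
    echo "no ext4lazyinit thread running"
fi
```

(Creating the filesystem with mkfs.ext4 -E lazy_itable_init=0 does the zeroing up front instead, per the man page above.)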
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 44945
Location: 56N 3W

PostPosted: Thu May 30, 2019 6:23 pm

RayDude,

Code:
# hdparm -tT /dev/nvme0n1
does raw sequential reads from the block device.
The contents of the blocks read are ignored. That is, the read speed returned by
Code:
hdparm -tT
does not depend on the filesystem, if any.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
RayDude
Veteran


Joined: 29 May 2004
Posts: 1558
Location: San Jose, CA

PostPosted: Thu May 30, 2019 8:30 pm

Thanks for the quick replies.

I used gparted to partition the disk, so the alignment should be correct.

I'll put fstrim on root and see if that makes a difference.

I'll check the lazy itable feature as well.
_________________
Some day there will only be free software.
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 44945
Location: 56N 3W

PostPosted: Thu May 30, 2019 8:51 pm

RayDude,

fstrim is about erasing used but free space in good time before you want to reuse it.
It will make no difference to the read speed.

Boot from a liveCD and rerun the tests when you are sure the partitions are not in use.
Don't even mount them.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
RayDude
Veteran


Joined: 29 May 2004
Posts: 1558
Location: San Jose, CA

PostPosted: Thu May 30, 2019 9:42 pm

NeddySeagoon wrote:
RayDude,

fstrim is about erasing used but free space in good time before you want to reuse it.
It will make no difference to the read speed.

Boot from a liveCD and rerun the tests when you are sure the partitions are not in use.
Don't even mount them.


Thanks Neddy, I'll try that.
_________________
Some day there will only be free software.
Naib
Watchman


Joined: 21 May 2004
Posts: 5785
Location: Removed by Neddy

PostPosted: Fri May 31, 2019 1:13 pm

What is the IO scheduler being used?
_________________
The best argument against democracy is a five-minute conversation with the average voter
Great Britain is a republic, with a hereditary president, while the United States is a monarchy with an elective king
Zucca
Veteran


Joined: 14 Jun 2007
Posts: 1710
Location: KUUSANKOSKI, Finland

PostPosted: Fri May 31, 2019 4:16 pm

If you want to test filesystem performance, use some other tool, such as fio.

As Neddy said, hdparm "skips" the filesystem: with hdparm you are testing disk performance, or (apparently) partition performance. As to why the partition reads are that much slower on an SSD, I have no clue. It would make sense if it were an HDD you were testing...

Maybe it's about the IO scheduler, as Naib was questioning.

I want to see how this ends up...
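For the record, a fio job roughly comparable to hdparm's sequential-read test might look like this (a sketch: the device name is taken from the posts above, and the readonly line is a safety check so fio refuses to write to the device):

```shell
# Write out a fio job file for a direct-I/O sequential read test.
# Run it afterwards (as root) with:  fio seqread.fio
cat > seqread.fio <<'EOF'
[seqread]
filename=/dev/nvme0n1p4
rw=read
bs=1M
size=4G
direct=1
readonly
ioengine=libaio
EOF
```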
_________________
..: Zucca :..

Code:
ERROR: '--failure' is not an option. Aborting...
Naib
Watchman


Joined: 21 May 2004
Posts: 5785
Location: Removed by Neddy

PostPosted: Fri May 31, 2019 4:20 pm

Also note that hdparm expects PATA/SATA-type devices; NVMe is not that, so it might mis-report. nvme-tools provides a means to do block reads.
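As a cross-check that avoids hdparm entirely, a raw sequential read can be timed with plain dd (a sketch; the device name is an assumption from the earlier posts, and O_DIRECT keeps the page cache out of the measurement):

```shell
# read_test DEVICE [MIB]: read MIB mebibytes sequentially from DEVICE
# with the page cache bypassed (O_DIRECT). dd reports the elapsed
# time and throughput on its final status line.
read_test() {
    dev=$1
    mib=${2:-2048}
    dd if="$dev" of=/dev/null bs=1M count="$mib" iflag=direct 2>&1 | tail -n 1
}

# Example (run as root):
#   read_test /dev/nvme0n1p4
```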
_________________
The best argument against democracy is a five-minute conversation with the average voter
Great Britain is a republic, with a hereditary president, while the United States is a monarchy with an elective king
Pearlseattle
Apprentice


Joined: 04 Oct 2007
Posts: 162
Location: Switzerland

PostPosted: Fri May 31, 2019 10:44 pm    Post subject: Re: How come EXT4 slows my ssd so much?

RayDude wrote:
Code:
server /mnt/backup/root # hdparm -tT /dev/nvme0n1

/dev/nvme0n1:
 Timing cached reads:   22260 MB in  2.00 seconds = 11144.34 MB/sec
 Timing buffered disk reads: 8146 MB in  3.00 seconds = 2715.05 MB/sec
server /mnt/backup/root # hdparm -tT /dev/nvme0n1p4

/dev/nvme0n1p4:
 Timing cached reads:   20114 MB in  2.00 seconds = 10068.90 MB/sec
 Timing buffered disk reads: 3356 MB in  3.00 seconds = 1118.66 MB/sec


This bugs me. I mean, I really don't notice the performance difference, but it seems wrong for ext4 to create such incredible overhead.

Is this normal? Is this expected?

I thought that the tests done by hdparm did not involve the specific filesystem used on the partition at all?
Hu
Moderator


Joined: 06 Mar 2007
Posts: 14971

PostPosted: Sat Jun 01, 2019 12:39 am

That is what NeddySeagoon and Zucca both said, yes. The hdparm tests should be usable even on a device with no filesystem at all.

RayDude: please post the actual alignment so we can review whether the alignment is correct. The smartctl -a output could also be interesting. Hide any identifying data (such as serial numbers). We only need general model information.
RayDude
Veteran


Joined: 29 May 2004
Posts: 1558
Location: San Jose, CA

PostPosted: Sat Jun 01, 2019 12:42 am

Update: I ran hdparm from a system-restore boot flash on an unmounted /dev/nvme0n1p4 and got the same results.

Thanks for telling me about fio, I'll try it.

I just checked and my kernel is configured for no IO Scheduler. How is that possible?

There are three choices: MQ deadline, Kyber, and BFQ. Which should I select?

What does it use if none is selected? I seriously wonder how I did this...

Update: "none" is apparently good for NVMe: https://wiki.ubuntu.com/Kernel/Reference/IOSchedulers

Edit: since I'm using a raid6 array, it looks like I should use deadline...
_________________
Some day there will only be free software.
mike155
Veteran


Joined: 17 Sep 2010
Posts: 1990
Location: Frankfurt, Germany

PostPosted: Sat Jun 01, 2019 12:49 am

RayDude wrote:
I just checked and my kernel is configured for no IO Scheduler. How is that possible?

"none" (aka "noop") is the correct scheduler to use for NVMe disks.

See: https://stackoverflow.com/questions/27664334/selecting-the-right-linux-i-o-scheduler-for-a-host-equipped-with-nvme-ssd
Pearlseattle
Apprentice


Joined: 04 Oct 2007
Posts: 162
Location: Switzerland

PostPosted: Sat Jun 01, 2019 9:36 pm

Quote:
Edit: since I'm using a raid6 array, it looks like I should use deadline...

What do you mean, RayDude? I think that you previously posted tests done directly against an NVMe device and not against a RAID...
RayDude
Veteran


Joined: 29 May 2004
Posts: 1558
Location: San Jose, CA

PostPosted: Sun Jun 02, 2019 5:03 pm

Pearlseattle wrote:
Quote:
Edit: since I'm using a raid6 array, it looks like I should use deadline...

What do you mean, RayDude? I think that you previously posted tests done directly against an NVMe device and not against a RAID...


The system boots off an NVMe, but has a RAID6 array. To optimize the kernel for both the NVMe and the RAID6 array, it's best for me to use the deadline I/O scheduler. deadline doesn't slow the NVMe much, but it improves the performance of the HD array.
_________________
Some day there will only be free software.
RayDude
Veteran


Joined: 29 May 2004
Posts: 1558
Location: San Jose, CA

PostPosted: Sun Jun 02, 2019 5:09 pm

Here's the partition table, according to parted:

Code:
server ~ # parted /dev/nvme0n1
GNU Parted 3.2
Using /dev/nvme0n1
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p                                                               
Model: Unknown (unknown)
Disk /dev/nvme0n1: 1000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system     Name      Flags
 1      1049kB  3146kB  2097kB                  BIOSBOOT  bios_grub
 2      3146kB  213MB   210MB   fat16           EFI       msftdata
 3      213MB   8803MB  8590MB  linux-swap(v1)  SWAP
 4      8803MB  1000GB  991GB   ext4            SERVER


Here's smartctl -a:

Code:
server ~ # smartctl -a /dev/nvme0n1
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-5.1.5-gentoo] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       CT1000P1SSD8
Serial Number:                      XXXXXXXXXXX
Firmware Version:                   P3CR010
PCI Vendor/Subsystem ID:            0xc0a9
IEEE OUI Identifier:                0x000000
Controller ID:                      1
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,000,204,886,016 [1.00 TB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Sun Jun  2 10:06:43 2019 PDT
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0016):   Format Frmw_DL Self_Test
Optional NVM Commands (0x005e):     Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Maximum Data Transfer Size:         32 Pages
Warning  Comp. Temp. Threshold:     70 Celsius
Critical Comp. Temp. Threshold:     80 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     9.00W       -        -    0  0  0  0        5       5
 1 +     4.60W       -        -    1  1  1  1       30      30
 2 +     3.80W       -        -    2  2  2  2       30      30
 3 -   0.0500W       -        -    3  3  3  3     1000    1000
 4 -   0.0040W       -        -    4  4  4  4     6000    8000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        40 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    2,925,784 [1.49 TB]
Data Units Written:                 3,735,578 [1.91 TB]
Host Read Commands:                 16,841,519
Host Write Commands:                25,212,969
Controller Busy Time:               844
Power Cycles:                       12
Power On Hours:                     198
Unsafe Shutdowns:                   2
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               41 Celsius
Temperature Sensor 2:               39 Celsius
Temperature Sensor 5:               59 Celsius

Error Information (NVMe Log 0x01, max 256 entries)
No Errors Logged


Thanks for your help, everyone!
_________________
Some day there will only be free software.
molletts
n00b


Joined: 16 Feb 2013
Posts: 47

PostPosted: Mon Jun 03, 2019 12:58 pm

RayDude wrote:
The system boots off an NVMe, but has a RAID6 array. To optimize the kernel for both the NVMe and the RAID6 array, it's best for me to use the deadline I/O scheduler. deadline doesn't slow the NVMe much, but it improves the performance of the HD array.

You can use different schedulers on different devices if you like.

Put a line like this into /etc/udev/rules.d/10-ioscheduler.rules:
Code:
ACTION=="add|change", KERNEL=="nvme*", ATTR{queue/scheduler}="none"

and the system should automatically use the "none" (noop) scheduler for all NVMe devices and whatever you select as the default scheduler (e.g. deadline) for all other devices.

You can check which is being used for each device with something like:
Code:
cat /sys/block/nvme0n1/queue/scheduler

substituting the device name as appropriate. It will show a list of available schedulers with the selected one bracketed.

(If you want to try out different schedulers, you can also echo the name of a scheduler that is available in your kernel to the file to change it on the fly.)

Hope this helps,
Stephen
Anon-E-moose
Advocate


Joined: 23 May 2008
Posts: 4402
Location: Dallas area

PostPosted: Mon Jun 03, 2019 3:59 pm

hdparm works on devices, not partitions, and (I don't think) arrays.

If you want filesystem performance, then something like iozone would be more what you need.

Edit to add: not sure why there's a performance difference in your first post; it should make no difference whether you point at the whole device or a partition of it, since either way it still uses the whole device, because it talks to the controller (if I'm not mistaken).

https://ssd.userbenchmark.com/SpeedTest/607339/CT1000P1SSD8

If running in an NVMe/PCIe Gen3 x4 slot, the device is supposed to hit ~2000 MB/s for reads and ~1700 MB/s for writes.
If it's not a Gen3 slot, it will be slower, especially if that slot is shared with other cards, which is common on many motherboards.
_________________
Asus m5a99fx, FX 8320 - nouveau, oss4, rx550 for qemu passthrough
Acer laptop E5-575, i3-7100u - i965, alsa
---both---
5.0.13 zen kernel, profile 17.1 (no-pie & modified) amd64-no-multilib
gcc 8.2.0, eudev, openrc, openbox, palemoon
Hu
Moderator


Joined: 06 Mar 2007
Posts: 14971

PostPosted: Tue Jun 04, 2019 1:03 am

Please post the partition table without rounding. sgdisk --print can do this. The parted output is not clear whether the partitions are aligned to any of the commonly important boundaries.
RayDude
Veteran


Joined: 29 May 2004
Posts: 1558
Location: San Jose, CA

PostPosted: Sat Jun 08, 2019 3:51 pm

Hu wrote:
Please post the partition table without rounding. sgdisk --print can do this. The parted output is not clear whether the partitions are aligned to any of the commonly important boundaries.


Update: found it:

Code:
server ~ # sgdisk --print /dev/nvme0n1
Disk /dev/nvme0n1: 1953525168 sectors, 931.5 GiB
Model: CT1000P1SSD8                           
Sector size (logical/physical): 512/512 bytes
Disk identifier (GUID): 1A547616-F8A0-485F-B15F-B6723E76FF7C
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 34, last usable sector is 1953525134
Partitions will be aligned on 2048-sector boundaries
Total free space is 3437 sectors (1.7 MiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048            6143   2.0 MiB     EF02  BIOSBOOT
   2            6144          415743   200.0 MiB   0700  EFI
   3          415744        17192959   8.0 GiB     8200  SWAP
   4        17192960      1953523711   923.3 GiB   8300  SERVER




I can't find sgdisk...

How about this output from fdisk:

Code:
server ~ # fdisk /dev/nvme0n1

Welcome to fdisk (util-linux 2.33.2).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.


Command (m for help): p
Disk /dev/nvme0n1: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: CT1000P1SSD8                           
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 1A547616-F8A0-485F-B15F-B6723E76FF7C

Device            Start        End    Sectors   Size Type
/dev/nvme0n1p1     2048       6143       4096     2M BIOS boot
/dev/nvme0n1p2     6144     415743     409600   200M Microsoft basic data
/dev/nvme0n1p3   415744   17192959   16777216     8G Linux swap
/dev/nvme0n1p4 17192960 1953523711 1936330752 923.3G Linux filesystem
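A quick sanity loop over the start sectors above, checking each against a 2048-sector (1 MiB) boundary; a remainder of 0 on every line means the partition is aligned:

```shell
# Start sectors taken from the fdisk output; each is reduced
# modulo 2048 sectors (1 MiB at 512 bytes per sector).
for start in 2048 6144 415744 17192960; do
    printf '%-10s %% 2048 = %s\n' "$start" "$((start % 2048))"
done
```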

_________________
Some day there will only be free software.