Gentoo Forums
BTRFS: The SSD killer

Shining Arcanine
Veteran

Joined: 24 Sep 2009
Posts: 1110

Posted: Sat Jul 10, 2010 11:41 pm

DestroyFX wrote:
For SSD, you must:

  • Use NOOP scheduler
  • align partition with HDD blocks and use the same size of sectors if possible
  • use noatime, compress, ssd_spread and nodiratime mount options


You do not need to use the noop I/O scheduler. Using it can actually be harmful to both performance and your SSD's longevity, because then the I/O scheduler is simply passing commands to the disk in arrival order, when it could be doing more intelligent things, like consolidating writes to the same block into a single write and sequentializing reads. Things might feel faster with noop, but the moment you put your system under stress (e.g. when you emerge updates), it will definitely perform slower.

None of the other things are strictly necessary either. While some, like partition alignment, are generally a good idea, some of the mount options you suggest are not. There is nothing to suggest that ssd_spread is better than ssd in general. noatime implies nodiratime, so there is no point in specifying nodiratime. compress is probably a good idea, but it depends on your CPU's processing power. noatime is usually a good idea for SSDs, but you are not forced to use it.
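Purely as an illustrative sketch (device, mount point and the exact option set are placeholders, not a recommendation), an /etc/fstab line following that advice might look like:

Code:
# hypothetical btrfs root on an SSD; noatime already implies nodiratime, so it is not listed
/dev/sda2   /   btrfs   noatime,ssd,compress   0 0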
dmpogo
Advocate

Joined: 02 Sep 2004
Posts: 2511
Location: Canada

Posted: Sun Jul 11, 2010 1:03 am

Shining Arcanine wrote:
DestroyFX wrote:
For SSD, you must:

  • Use NOOP scheduler
  • align partition with HDD blocks and use the same size of sectors if possible
  • use noatime, compress, ssd_spread and nodiratime mount options


You do not need to use the noop I/O scheduler. Using it can actually be harmful to both performance and your SSD's longevity, because then the I/O scheduler is simply passing commands to the disk in arrival order, when it could be doing more intelligent things, like consolidating writes to the same block into a single write and sequentializing reads.


As I understand it, noop actually merges writes as well, if they come back to back.

The logic in more complicated schedulers of reordering multi-sector writes so that they land next to each other is practically pure overhead on SSDs.
Shining Arcanine
Veteran

Joined: 24 Sep 2009
Posts: 1110

Posted: Sun Jul 11, 2010 6:38 am

dmpogo wrote:
Shining Arcanine wrote:
DestroyFX wrote:
For SSD, you must:

  • Use NOOP scheduler
  • align partition with HDD blocks and use the same size of sectors if possible
  • use noatime, compress, ssd_spread and nodiratime mount options


You do not need to use the noop I/O scheduler. Using it can actually be harmful to both performance and your SSD's longevity, because then the I/O scheduler is simply passing commands to the disk in arrival order, when it could be doing more intelligent things, like consolidating writes to the same block into a single write and sequentializing reads.


As I understand it, noop actually merges writes as well, if they come back to back.

The logic in more complicated schedulers of reordering multi-sector writes so that they land next to each other is practically pure overhead on SSDs.


Then why does CFQ outperform the noop scheduler at compiling the Linux kernel when -j3 is passed to make?

http://www.alphatek.info/2009/02/02/io-scheduler-and-ssd-part-2/

As far as I know, the no-op scheduler is called no-op because it does no operations on requests made to the disk. That is why the CFQ scheduler outperforms it when doing kernel compilations on a multicore system. With a single core, context switching will prevent multiple threads from making requests at once, and an SSD can service them all quickly, but with multiple cores, many requests to different areas are being made and the SSD simply cannot keep up with them. That is why the CFQ scheduler wins in multithreaded scenarios.

Keep in mind that when we talk about SSDs, we are typically talking about NAND-based SSDs, which are little more than glorified mechanical hard drives in their performance characteristics. You need to switch to either phase-change-based SSDs or DRAM-based SSDs before you see a real benefit from the no-op scheduler in multicore systems. Performance is inversely proportional to latency, and in an ideal scenario, an SLC NAND-based SSD can respond in about 100 microseconds. That might seem impressive compared to the 10 milliseconds it takes a mechanical hard drive, but DRAM can respond in less than 5 nanoseconds, and CPU caches and registers take less than a nanosecond. 100 microseconds is still an eternity as far as your CPU is concerned, and that is only in the ideal case. In real-world conditions, you can observe latencies approaching those of hard drives. Why? Simple: it takes 2 milliseconds to erase an erase block, and if you need to erase multiple blocks, you will be waiting an awfully long time.

Anyway, multicore systems with SSDs still need the CFQ scheduler. The only exception is on single core systems, which benefit from no-op.
devsk
Advocate

Joined: 24 Oct 2003
Posts: 2864
Location: Bay Area, CA

Posted: Sun Jul 11, 2010 4:53 pm

DigitalCorpus wrote:
I have to ask since this was never mentioned, but did you use the ssd mount option for BTRFS?
Yes, I always used ssd,discard,noatime options on the BTRFS FSs.
devsk
Advocate

Joined: 24 Oct 2003
Posts: 2864
Location: Bay Area, CA

Posted: Sun Jul 11, 2010 5:38 pm

I have actually been using deadline because, in my comparisons, deadline worked best with SSDs. Things may have changed in the last year with respect to CFQ, but I haven't tested it on an SSD of late.
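For anyone who wants to try it, the active scheduler can be checked and switched per device at runtime through sysfs (sdX below is a placeholder for the SSD's device name, and the change does not persist across reboots):

Code:
# list the available schedulers; the active one is shown in brackets
cat /sys/block/sdX/queue/scheduler
# switch this device to the deadline elevator
echo deadline > /sys/block/sdX/queue/scheduler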

My own assessment of noop was that it would be worse than deadline for SSDs, because I/Os reach the disk as they arrive. Random small I/Os (which is what a typical mostly-idle system issues, and what the FS itself does when writing back the journal) that are separated by a few milliseconds will not be merged, and will lead to writing and erasing (in case an erase is needed for that write) more NAND blocks than necessary.

Other parameters which are interesting for SSDs have to do with when pdflush is triggered to do the I/O (dirty_writeback_centisecs, dirty_expire_centisecs, dirty_background_ratio/bytes), when the process itself takes on the flushing task (dirty_ratio/bytes), and the commit interval for the FS. Tuning these "higher" is good for SSDs. You run the risk of losing data, but it doesn't affect FS consistency as long as the disk honors a sync flush.
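Purely as an illustration (the numbers are placeholder values, not recommendations), that kind of tuning could go into /etc/sysctl.conf, plus a commit= mount option where the filesystem supports one (ext3/ext4 do):

Code:
# keep dirty pages in RAM longer before writeback kicks in (example values only)
vm.dirty_writeback_centisecs = 1500
vm.dirty_expire_centisecs = 3000
vm.dirty_background_ratio = 10
vm.dirty_ratio = 40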
dmpogo
Advocate

Joined: 02 Sep 2004
Posts: 2511
Location: Canada

Posted: Sun Jul 11, 2010 9:58 pm

Shining Arcanine wrote:

Then why does CFQ outperform the noop scheduler at compiling the Linux kernel when -j3 is passed to make?

http://www.alphatek.info/2009/02/02/io-scheduler-and-ssd-part-2/

As far as I know, the no-op scheduler is called no-op because it does no operations on requests made to the disk. That is why the CFQ scheduler outperforms it when doing kernel compilations on a multicore system. With a single core, context switching will prevent multiple threads from making requests at once, and an SSD can service them all quickly, but with multiple cores, many requests to different areas are being made and the SSD simply cannot keep up with them. That is why the CFQ scheduler wins in multithreaded scenarios.

Keep in mind that when we talk about SSDs, we are typically talking about NAND-based SSDs, which are little more than glorified mechanical hard drives in their performance characteristics. You need to switch to either phase-change-based SSDs or DRAM-based SSDs before you see a real benefit from the no-op scheduler in multicore systems. Performance is inversely proportional to latency, and in an ideal scenario, an SLC NAND-based SSD can respond in about 100 microseconds. That might seem impressive compared to the 10 milliseconds it takes a mechanical hard drive, but DRAM can respond in less than 5 nanoseconds, and CPU caches and registers take less than a nanosecond. 100 microseconds is still an eternity as far as your CPU is concerned, and that is only in the ideal case. In real-world conditions, you can observe latencies approaching those of hard drives. Why? Simple: it takes 2 milliseconds to erase an erase block, and if you need to erase multiple blocks, you will be waiting an awfully long time.

Anyway, multicore systems with SSDs still need the CFQ scheduler. The only exception is on single core systems, which benefit from no-op.


I did not do a personal assessment of noop versus CFQ; I actually use deadline with my SSDs.
I found the following description of the schedulers to be rather concise:

http://linuxkernel2.atw.hu/ch13lev1sec5.html


However, there are two sides to the schedulers:

1) How I/O is managed relative to the properties of the disk/disk controller
2) How I/O is managed between different simultaneous processes

Regarding hardware, SSDs are very different from mechanical drives. If we speak about write operations, the main differences (positive and negative) are:
a) No seek cost (plus)
b) High cost of erase, with a minimal erase block of 512k (minus)
c) Parallel writes to all NAND chips simultaneously (plus)
d*) Sophisticated firmware operations, to exploit (a) and (c), compensate for (b), and spread writes for uniform wear.

Because of that, write performance very much depends on the quality of the firmware. Good firmware does not erase on rewrite if there is empty space (which leads to performance degradation as the disk fills up, though there is talk of some firmware erasing 'in spare time'), reorders writes if necessary, spreads writes over different NAND chips to exploit parallelization, etc.

Some of that may match what the kernel scheduler is doing, but the general opinion seems to be that it is better to get the scheduler out of the way and let the firmware do its job. However, if the firmware is poor (as on JMicron chips), I don't know.

d*) Actually, modern SATA drives with NCQ also reorder write requests in the controller, so people advocate the deadline scheduler for such drives as well.

2) Distributing I/O requests among multiple processes that are using I/O simultaneously is a different aspect. Noop is the most trivial scheduler, and quite possibly (even probably) should be beaten by more sophisticated ones.

Especially "make -j3" is a peculiar example. This is not just three independent threads that read and write, but, as in parallelized code, several threads that interdepend on each other.
One process may need output of the other, and there are steps that can be performed only when all previous are accomplished. So there is a danger to have performance 'serialized' when
one has to wait untill I/O in several threads completes. Much depends on how things are cached as well, I guess.

So I'm not surprised that CFQ gives an advantage in such a scenario, since it was designed for 'fair queueing'. On the other hand, 'make -j3' may not be a characteristic load, with its high level of interdependency between the I/O of multiple processes. It would be interesting to see if CFQ shows any advantage with several completely independent threads writing at the same time, like compiling three kernels at the same time with plain serial make, as sketched below.
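A rough way to set up that test (the directory names are just placeholders for three separate kernel trees) might be:

Code:
# hypothetical comparison load: three independent serial builds running concurrently
for tree in linux-a linux-b linux-c; do
    ( cd "$tree" && make ) &
done
wait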
Shining Arcanine
Veteran

Joined: 24 Sep 2009
Posts: 1110

Posted: Mon Jul 12, 2010 6:24 am

There actually is a seek time for NAND SSDs, because the seek time is the latency before you start receiving data. Those latencies are typically on the order of 100 microseconds, but the erase-block penalty is on the order of milliseconds. When that is taken into consideration, seek times for NAND SSDs can approach those of mechanical hard disks, despite the two operating on completely different mechanisms. People tend to think of hard drives as having a constant seek time and make a distinction between them and NAND SSDs, but that is not true, because hard drives have a variable seek time that depends on the cache and on the relative positions of the data and the drive head on the platter. If the data is in cache, the seek time will be incredibly low; if the data is on the disk and the head is approaching it as the request is made, the seek time will be higher than it would have been had the data been in cache, but still less than the figure quoted for the drive.

I own multiple NAND-flash SSDs and the wow effect has worn off. As I said in an earlier post, they are nothing but glorified hard drives in their performance characteristics. Depending on the model, they could be as much as 100 times faster than regular hard drives in random operations, but that is still slow when put into perspective: DRAM is still 1,000 times faster. The 100-times figure is actually an upper bound on an outlier, as the majority of models are no more than 10 times faster. The big exceptions are Intel's SSDs (in random writes only, where they are 30 times faster) and SSDs based on the controllers released this year. Despite these improvements, SSDs maintain a large, mechanical-hard-drive-like gap between sequential performance and random performance. For that reason, I consider them to be nothing more than glorified hard drives, because a proper "instant seek" storage technology should have no such gap.

By the way, anyone who thinks that NAND-flash-based SSDs are fast will be blown away by phase-change-based SSDs, which have similar performance to DRAM:

http://en.wikipedia.org/wiki/Phase-change_memory
dmpogo
Advocate

Joined: 02 Sep 2004
Posts: 2511
Location: Canada

Posted: Mon Jul 12, 2010 6:46 am

Shining Arcanine wrote:


I own multiple NAND-flash SSDs and the wow effect has worn off. As I said in an earlier post, they are nothing but glorified hard drives in their performance characteristics. Depending on the model, they could be as much as 100 times faster than regular hard drives in random operations, but that is still slow when put into perspective: DRAM is still 1,000 times faster. The 100-times figure is actually an upper bound on an outlier, as the majority of models are no more than 10 times faster. The big exceptions are Intel's SSDs (in random writes only, where they are 30 times faster) and SSDs based on the controllers released this year. Despite these improvements, SSDs maintain a large, mechanical-hard-drive-like gap between sequential performance and random performance. For that reason, I consider them to be nothing more than glorified hard drives, because a proper "instant seek" storage technology should have no such gap.

By the way, anyone that thinks that NAND flash based SSDs are fast will be blown away by phase change based SSDs, which have similar performance to DRAM:

http://en.wikipedia.org/wiki/Phase-change_memory


Well, I have the Intel ones :) You are right, SSDs are not that much faster than mechanical ones, although in reads they are pretty good :). But the logic of access is quite different, and we are talking about schedulers after all.
Shining Arcanine
Veteran

Joined: 24 Sep 2009
Posts: 1110

Posted: Mon Jul 12, 2010 3:20 pm

dmpogo wrote:
Shining Arcanine wrote:


I own multiple NAND-flash SSDs and the wow effect has worn off. As I said in an earlier post, they are nothing but glorified hard drives in their performance characteristics. Depending on the model, they could be as much as 100 times faster than regular hard drives in random operations, but that is still slow when put into perspective: DRAM is still 1,000 times faster. The 100-times figure is actually an upper bound on an outlier, as the majority of models are no more than 10 times faster. The big exceptions are Intel's SSDs (in random writes only, where they are 30 times faster) and SSDs based on the controllers released this year. Despite these improvements, SSDs maintain a large, mechanical-hard-drive-like gap between sequential performance and random performance. For that reason, I consider them to be nothing more than glorified hard drives, because a proper "instant seek" storage technology should have no such gap.

By the way, anyone that thinks that NAND flash based SSDs are fast will be blown away by phase change based SSDs, which have similar performance to DRAM:

http://en.wikipedia.org/wiki/Phase-change_memory


Well, I have the Intel ones :) You are right, SSDs are not that much faster than mechanical ones, although in reads they are pretty good :). But the logic of access is quite different, and we are talking about schedulers after all.


I have an 80 GB Intel X25-M G2 SSD in my desktop. It does something like 260 MBps sequential read, 90 MBps sequential write, 17 MBps 4 KB random read and 60 MBps 4 KB random write. The big gap between random and sequential access is very hard-drive-like, so the same algorithms should still apply.
mbar
Veteran

Joined: 19 Jan 2005
Posts: 1979
Location: Poland

Posted: Wed Jul 14, 2010 12:57 pm

Where is that patch? I cannot locate it.