Gentoo Forums
any lessfs users out there?

Moriah
Advocate


Joined: 27 Mar 2004
Posts: 2118
Location: Kentucky

Posted: Mon Apr 12, 2010 1:09 pm

If you post comparisons, please remember that dedup only buys you an advantage if there is actually duplicate data being written to the filesystem. A week's worth of backups of several different but similar machines would be very interesting to see. :)
_________________
The MyWord KJV Bible tool is at http://www.elilabs.com/~myword

Foghorn Leghorn is a Warner Bros. cartoon character.
devsk
Advocate


Joined: 24 Oct 2003
Posts: 2870
Location: Bay Area, CA

Posted: Mon Apr 12, 2010 9:03 pm

OK, I have a brief update: lessfs is way behind zfs-fuse in terms of performance. I finished the backup in 12 hours instead of 22. And because I used gzip compression in ZFS (which should have slowed it down compared to the LZO compression in lessfs), the backup came out at about 510GB instead of 550GB.

I think I am liking zfs-fuse a lot. One thing that was really amazing was that it could use my old (but quite performant) USB drive as a cache for metadata (which generates a lot of random reads). This was the reason for the speedup: dedup requires a lot of random IO, and random IO on platter drives is an order of magnitude slower than on flash (yeah, even the USB variety, if you have one of the extreme versions like the Patriot Xporter).
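
If anyone wants to try the same thing, adding a flash device as a read cache is a one-liner (the pool and device names here are just examples -- check dmesg for yours):

Code:
# attach a flash device to the pool as an L2ARC read cache
zpool add mypool cache /dev/sdc1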

Another thing I noticed was that ZFS runs multi-threaded (in fact it was running 185 threads at a time.. 8O ) and was compressing multiple blocks at a time, so it could use more cores on my box than lessfs could. That is why gzip caused no slowdown: the parallelism absorbed the speed loss from gzip's heavier compression relative to LZO.

And this is with the added overhead of FUSE. As a native in-kernel FS, the performance could be on par with btrfs.

Now, I am dreaming about putting ZFS-FUSE on my main data...:-)
Moriah
Advocate


Joined: 27 Mar 2004
Posts: 2118
Location: Kentucky

Posted: Mon Apr 12, 2010 10:00 pm

OK, so what is the ZFS dedup algorithm like? Is it basically the same as lessfs -- hash the data block, look up the hash in an index to see if you already have it stored, write it if you don't, and finally keep a pointer to the block (hash slot, or actual data block?) for the sequence of blocks that make up the file?

185 threads seems like way too many unless you are running a many-socket server motherboard. Seems like it ought to have some way to regulate thread spawning according to the number of *AVAILABLE* cores...

Remember, each thread has a context-switching cost if there are more ready-to-run threads than cores to run them on. :?

And besides backup, how does zfs perform as storage for a VM's virtual disk?

Also, has anybody tried dedup on log files? They tend to be highly redundant, but you might need a logger that periodically re-aligns the start of a line with the start of a block to maximize this: whenever a single log line would span two blocks, make it start at the beginning of the new block instead. Compression would essentially make this a no-cost optimization, disk-space wise.
_________________
The MyWord KJV Bible tool is at http://www.elilabs.com/~myword

Foghorn Leghorn is a Warner Bros. cartoon character.
devsk
Advocate


Joined: 24 Oct 2003
Posts: 2870
Location: Bay Area, CA

Posted: Mon Apr 12, 2010 10:14 pm

Moriah wrote:
OK, so what is the ZFS dedup algorithm like? Is it basically the same as lessfs -- hash the data block, look up the hash in an index to see if you already have it stored, write it if you don't, and finally keep a pointer to the block (hash slot, or actual data block?) for the sequence of blocks that make up the file?

185 threads seems like way too many unless you are running a many-socket server motherboard. Seems like it ought to have some way to regulate thread spawning according to the number of *AVAILABLE* cores...

Remember, each thread has a context-switching cost if there are more ready-to-run threads than cores to run them on. :?

And besides backup, how does zfs perform as storage for a VM's virtual disk?

Also, has anybody tried dedup on log files? They tend to be highly redundant, but you might need a logger that periodically re-aligns the start of a line with the start of a block to maximize this: whenever a single log line would span two blocks, make it start at the beginning of the new block instead. Compression would essentially make this a no-cost optimization, disk-space wise.
It's the same technique. Instead of a 192-bit hash, it uses a 256-bit hash, so it's a little stronger.

Not all the threads are active at a time. But yeah, 185 (currently running 183!) is a little too much! Then again, the Solaris/ZFS folks love threads.
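
To put the technique in concrete terms, the write path is roughly this (a crude shell sketch of the idea, not the actual ZFS code; all file and directory names here are made up):

Code:
# split a file into 128K blocks and store each distinct block only once,
# keyed by its hash; the file itself becomes an ordered list of hashes
mkdir -p blocks
split -b 131072 somefile chunk.
for c in chunk.*; do
    h=$(sha256sum "$c" | cut -d' ' -f1)
    [ -e "blocks/$h" ] || cp "$c" "blocks/$h"   # write only if unseen
    echo "$h" >> somefile.recipe                # the per-file block pointer list
    rm "$c"
done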
Moriah
Advocate


Joined: 27 Mar 2004
Posts: 2118
Location: Kentucky

Posted: Tue Apr 13, 2010 1:21 am

Quote:
It's the same technique. Instead of a 192-bit hash, it uses a 256-bit hash, so it's a little stronger.


SHA-256, hopefully? That would be a bit stronger and better studied, but slower.

Quote:
the Solaris/ZFS folks love threads.


Threads are fine, but each one needs stack space, and the context on each stack is potentially large. That can hog a lot of RAM, and if it gets swapped out, you thrash on every context switch. :(
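
Just to put numbers on it, with the typical 8 MB default stack on Linux:

Code:
$ ulimit -s
8192

That's 185 threads x 8 MB = about 1.4 GB of address space reserved for stacks alone.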
_________________
The MyWord KJV Bible tool is at http://www.elilabs.com/~myword

Foghorn Leghorn is a Warner Bros. cartoon character.
Moriah
Advocate


Joined: 27 Mar 2004
Posts: 2118
Location: Kentucky

Posted: Thu Apr 15, 2010 8:30 pm

Of course, the pragmatic answer is to just throw more cores and more RAM at it. :?

My backup server is a 2.67 GHz quad-core with 8 GB RAM and five 1.5 TB SATA drives: a boot/work drive, three drives in a software RAID-1 3-way mirror, and a spare drive/slot. Using file-level dedup, I back up 18 machines to it every night for a month, then I pull a mirror drive for an off-line archive and shove in a new one to catch up and take its place. That's about $100.00 US per month to have every night for every machine backed up and safe offline for posterity.
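
For the curious, file-level dedup with hard links boils down to something like this (a simplified sketch, not my actual script; sha256sum is just an example hash):

Code:
# hash each incoming file; if its content is already in the dedup area,
# replace the file with a hard link to the stored copy, else store it
mkdir -p dedup
for f in nightly/*; do
    h=$(sha256sum "$f" | cut -d' ' -f1)
    if [ -e "dedup/$h" ]; then
        ln -f "dedup/$h" "$f"    # duplicate: link to the existing copy
    else
        ln "$f" "dedup/$h"       # new content: record it in the dedup area
    fi
done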

The most painful part is the monthly purge, which takes *DAYS* to complete. :evil:

After the monthly archive drive is pulled, the previous month needs to be deleted. You don't want to delete everything and start over, since the baseline is already in the dedup area, but you do want to purge all the nightly filesystem backups for the previous month. This amounts to tens or even hundreds of millions of directory entries, all of which are actually hard links into the dedup area. Each time a hard link is removed, the filesystem checks whether it was the *LAST* reference to that file, and if so, all the data associated with that file needs to be deleted as well. All this activity results in an enormous number of seeks and head motion on the drive, and hence a lot of time. I wish there were a better way... :(
_________________
The MyWord KJV Bible tool is at http://www.elilabs.com/~myword

Foghorn Leghorn is a Warner Bros. cartoon character.
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 7267
Location: almost Mile High in the USA

Posted: Fri Sep 03, 2010 5:32 am

latecomer to lessfs, deliberately necroing this thread...

Trying it on an Athlon XP 2200+ ... this is really slow. With qlz compression it is pretty much unusable due to the severe speed penalty: it's so slow that any concurrent interactive access to the disk becomes extremely sluggish. I think I'll have to stick with deflate compression or something...

Anyone try this on an Atom / VIA Centaur / ARM / other low-power CPU (SheevaPlug, Geode, NSLU2)? Sigh... it seems there's not much choice but to use some powerful CPU for it...
_________________
Intel Core i7 2700K@ 4.1GHz/HD3000 graphics/8GB DDR3/180GB SSD
What am I supposed to be watching?
vmk
n00b


Joined: 25 May 2004
Posts: 31

Posted: Fri Sep 03, 2010 10:39 am

I tried lessfs on a dual Xeon 3GHz with 16GB RAM and a 6-disk hardware RAID-5. On the first try, the performance was ugly: writing the first files was very quick, but then performance degraded badly. On my next attempt (after manually tuning the database with the extra script hidden somewhere in the comments on the homepage), lessfs consumed 128GB of virtual memory. After writing more than 100GB and more than 100,000 files, performance was again terribly slow.
We switched to zfs-fuse. At first I thought ZFS would be very complex to set up, but it is much easier than lessfs: just start the daemon, do a 'zpool create myzfs sdb1', and use it. The performance is generally good (with dedup and compression), and you can change the parameters for caching and compression at runtime.
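
For reference, the whole setup is roughly this (a sketch from memory; our exact tuning differed):

Code:
zfs-fuse                        # start the daemon
zpool create myzfs sdb1         # create the pool on the device
zfs set compression=on myzfs    # changeable at runtime
zfs set dedup=on myzfs          # likewise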
_________________
"Security is like an onion - the more you dig in the more you want to cry"
Moriah
Advocate


Joined: 27 Mar 2004
Posts: 2118
Location: Kentucky

Posted: Fri Sep 03, 2010 3:06 pm

With the rising popularity and falling cost of flash SSDs, and the increasing complexity of their controllers, I am becoming interested in the possibilities of implementing a dedup device instead of a dedup filesystem. The dedup would be transparent to the host OS and filesystem, as it would all be taken care of in the drive controller. This has a lot of advantages for flash-based drives. The newer controllers are already doing hardware compression; they could do the cryptographic hashing in hardware as well, in parallel with the wear leveling and the actual access. Since dedup reduces the amount of writing to the drive, it would increase the lifetime of flash devices. And with the compression ratios typically achieved by dedup, you could make a 5 TB drive with the same amount of flash as a 250 GB drive uses without dedup.

Now, if I can just get a consulting gig with an outfit that does flash drive controllers... :twisted:
_________________
The MyWord KJV Bible tool is at http://www.elilabs.com/~myword

Foghorn Leghorn is a Warner Bros. cartoon character.
jbouzan
Tux's lil' helper


Joined: 23 Nov 2007
Posts: 138

Posted: Fri Sep 03, 2010 6:52 pm

Interesting thread. Sorry if I've just forgotten something said earlier in it, but has anyone tried using lessfs for personal backups? I'm worried about the aging of my computers and want to have some full system backups, instead of just my crucial-file backups on Dropbox, but I don't have much space available on external hard drives. My disks are maybe 50% bigger than the filesystem on the computer, and I want to know if that will let me store more than one or two previous versions of the files.
Zucca
Veteran


Joined: 14 Jun 2007
Posts: 1579
Location: KUUSANKOSKI, Finland

Posted: Sun Sep 05, 2010 9:46 am

Interesting thread. :)

I have all my backups on XFS, and backing up small files is slow.
I wonder if this lessfs would be a better choice...
But a filesystem is one of those things I won't use if it's marked as testing/unstable.
The compression thing is also really interesting; I just hope there will be zlib/LZMA2 support too.
_________________
..: Zucca :..

Code:
ERROR: '--failure' is not an option. Aborting...
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 7267
Location: almost Mile High in the USA

Posted: Sun Sep 05, 2010 1:53 pm

Lessfs is _SLOW_.
I really mean _SLOW_ -- then again, I'm using an Athlon XP 2200+ ... not fast at all by today's standards. It used to be fairly fast...

I was hoping the dedup/compression savings would make up for the speed, but I'm not sure it's worth it given how slow it is...
_________________
Intel Core i7 2700K@ 4.1GHz/HD3000 graphics/8GB DDR3/180GB SSD
What am I supposed to be watching?
devsk
Advocate


Joined: 24 Oct 2003
Posts: 2870
Location: Bay Area, CA

Posted: Sun Sep 05, 2010 5:10 pm

eccerr0r wrote:
Lessfs is _SLOW_.
I really mean _SLOW_ -- then again, I'm using an Athlon XP 2200+ ... not fast at all by today's standards. It used to be fairly fast...

I was hoping the dedup/compression savings would make up for the speed, but I'm not sure it's worth it given how slow it is...
Yeah, I second this. I tried lessfs for a while, and it became slow, slower and slowest with time as I kept adding files to it. It doesn't scale well.

Then, I moved on to zfs-fuse. That's the one you want! zfs-fuse has no scaling issues.
Zucca
Veteran


Joined: 14 Jun 2007
Posts: 1579
Location: KUUSANKOSKI, Finland

Posted: Sun Sep 05, 2010 6:09 pm

Sooo... I'll be waiting for btrfs to become stable. :)

Or is there any other filesystem with a high compression ratio?
_________________
..: Zucca :..

Code:
ERROR: '--failure' is not an option. Aborting...
devsk
Advocate


Joined: 24 Oct 2003
Posts: 2870
Location: Bay Area, CA

Posted: Sun Sep 05, 2010 9:31 pm

Zucca wrote:
Sooo... I'll be waiting for btrfs to become stable. :)

Or is there any other filesystem with a high compression ratio?
You can't get better than zfs-fuse on Linux at this time. And when we have native ZFS in the near future, it will be a no-brainer.
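
And you can check exactly what you are getting back (assuming a pool called mypool):

Code:
zfs get compressratio mypool    # compression savings
zpool list mypool               # the DEDUP column shows the dedup ratio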