Gentoo Forums
[Proof of Concept] SLTag - a simplistic tagging utility
Gentoo Forums Forum Index -> Unsupported Software

Dr.Willy
Guru


Joined: 15 Jul 2007
Posts: 491
Location: NRW, Germany

Posted: Mon Nov 14, 2011 5:57 pm    Post subject: [Proof of Concept] SLTag - a simplistic tagging utility

SLTag: https://github.com/drwilly/sltag

README wrote:
SLTag is a utility to assign arbitrary tags to local files.
Unlike other tools I've found, SLTag is extremely simplistic. It has no dependencies other than Python itself.
Instead of using a database, SLTag stores tags in .sltag repositories.
A repository contains a folder for each tag which in turn contains symlinks pointing to the tagged files.
Code:
dir/
|-- .sltag/
|   |-- tag1/
|   |   |-- 05168 -> ../../file1
|   |   `-- 24545 -> ../../file2
|   `-- tag2/
|       |-- 24545 -> ../../file2
|       `-- 58434 -> ../../file3
|-- file1
|-- file2
`-- file3

When a file is tagged, the file's inode number is used as the symlink name so that broken symlinks caused by renaming a tagged file can be fixed.
In this example, file1 has been tagged 'tag1', file2 'tag1' and 'tag2', and file3 'tag2'.
So querying tags is effectively just listing directory contents. As such, programs can, to a limited extent, use .sltag repositories without any kind of client.
Only querying for a combination of tags requires a bit of logic, which is done by SLTag.
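A minimal Python sketch of the layout above, assuming each link is named with the plain decimal inode number and points at a relative path like ../../file1 (SLTag's actual code may differ). Tagging is a single symlink creation, and an AND query such as `sltag list foo bar` reduces to intersecting the resolved targets of each tag directory. `tag_file` and `files_with_tags` are hypothetical names.

```python
import os
from pathlib import Path

REPO = ".sltag"  # repository directory name used in the layout above


def tag_file(root: Path, tag: str, file: Path) -> Path:
    """Tag `file` by creating <root>/.sltag/<tag>/<inode> -> relative path to file."""
    tag_dir = Path(root) / REPO / tag
    tag_dir.mkdir(parents=True, exist_ok=True)
    # The inode number names the link, so a renamed file can be found again later.
    link = tag_dir / str(os.stat(file).st_ino)
    if not link.is_symlink():
        # Relative target (e.g. ../../file1) keeps the repo valid if the tree moves.
        link.symlink_to(os.path.relpath(Path(file).resolve(), tag_dir.resolve()))
    return link


def files_with_tags(root: Path, *tags: str) -> set:
    """Return the files that carry *all* of `tags` (an AND query)."""
    result = None
    for tag in tags:
        tag_dir = Path(root) / REPO / tag
        # Listing a tag directory is the whole "query" for a single tag.
        if tag_dir.is_dir():
            members = {(tag_dir / name).resolve() for name in os.listdir(tag_dir)}
        else:
            members = set()
        result = members if result is None else result & members
    return result if result is not None else set()
```

For the tree shown above, `files_with_tags(Path("dir"), "tag1", "tag2")` would return only file2's path. Intersecting resolved targets rather than link names avoids relying on inode numbers being unique across mount points.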


I wrote SLTag with my music collection in mind: I wanted a way to quickly access songs with certain moods, found something like 'Genre' unsatisfying, and wanted it to work with all players.
Right now a player has to either support reading symlinks or read file paths from stdin; however, generating playlists from SLTag's output should be trivial.
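A sketch of that playlist step, assuming `sltag list` prints one file path per line (the output format isn't spelled out in this post). A plain M3U playlist is simply one path per line, so the hypothetical helper `write_m3u` mostly just normalises paths:

```python
from pathlib import Path
from typing import Iterable


def write_m3u(paths: Iterable[str], out_file: str) -> int:
    """Write a plain M3U playlist (one absolute path per line); return the entry count."""
    count = 0
    with open(out_file, "w", encoding="utf-8") as f:
        for line in paths:
            path = line.rstrip("\r\n")  # keep other whitespace, it may be part of a name
            if not path:                # skip blank lines
                continue
            f.write(str(Path(path).resolve()) + "\n")
            count += 1
    return count
```

For example, a small wrapper could call `write_m3u(sys.stdin, "happy.m3u8")` with `sltag list happy upbeat` piped into it; the .m3u8 extension is the usual marker for a UTF-8 M3U file.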

Documentation is pretty sparse right now (actually you just read most of it), but SLTag's current commands are:
init
- creates a new repo
tag-files tag file [file…]
- adds 'tag' to one or more files
untag-files tag file [file…]
- removes 'tag' from one or more files
add-tag file tag [tag…]
- yadayada
remove-tag file tag [tag…]
- …
list [tag…]
- without args, it lists all existing tags; with args, it prints the files that have all of the supplied tags (e.g. 'sltag list foo bar' lists all files that have the tags foo AND bar)
orphans
- lists broken symlinks or mismatching inodes
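A sketch of what an `orphans` check could look like, under the same assumptions as the layout in the README (decimal inode link names, one .sltag directory at the root). A link is reported as broken when its target is gone, and as an inode mismatch when its path now holds a different file, e.g. after an editor saved by writing a new file over the old one. `find_orphans` is a hypothetical name.

```python
import os
from pathlib import Path


def find_orphans(root):
    """Yield (link, reason) for every problematic symlink in <root>/.sltag."""
    repo = Path(root) / ".sltag"
    for tag_dir in sorted(p for p in repo.iterdir() if p.is_dir()):
        for link in sorted(tag_dir.iterdir()):
            if not link.is_symlink():
                continue  # not something the tagger would have created
            try:
                ino = os.stat(link).st_ino  # os.stat follows the symlink
            except OSError:
                yield link, "broken"
                continue
            if str(ino) != link.name:
                yield link, "inode mismatch"
```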

I'd be happy for all kinds of feedback, regarding the current implementation or the idea in general. :)


Last edited by Dr.Willy on Tue Nov 15, 2011 2:46 pm; edited 1 time in total
sirlark
Guru


Joined: 25 Oct 2004
Posts: 305
Location: Cape Town, South Africa

Posted: Mon Nov 14, 2011 10:06 pm

Not sure if you're looking for feedback, but I've been considering the tagging problem for some time. I thought of using POSIX extended attributes. The thing is, you really want to make tagging as transparent as possible to user applications, and the way I thought of doing this was extending glob syntax. Here's a brief summary of my thoughts I jotted down in a text file a while ago, which has since sadly languished on my hard drive.

Code:
Extend POSIX user space utilities to facilitate easy specification, querying,
and manipulation of single files or collections of files by arbitrary
categories, aka tags.

--- Actual Storage ---

A file may have zero or more tags assigned to it. The tags assigned to a file
are stored as a comma separated list (representing a set, i.e. no duplicates)
with no intervening whitespace in the POSIX extended attribute user.tags. The
list is ordered according to the applicable locale to facilitate faster
searching of the tag list.

A file without this attribute set is defined as having an empty tag list, and
may also be known as a file 'without tags', 'with no tags', or an 'untagged file'.


--- Modifications to globbing syntax ---

Standard globbing syntax should be extended as follows
 - a tag specification (tagspec) is introduced at the end of any valid standard
   glob with the special character '\:', recognised as different from the
   un-escaped ':'
 - Following the '\:', a tag spec is composed of a boolean expression of tags
   containing no whitespace

<tag spec>  ::= <tag>
              | ~ <tag spec>
              | <tag spec> + <tag spec>
              | <tag spec> , <tag spec>
              | ( <tag spec> ) 
<tag>       ::= <character>
              | <character><tag>
<character> ::= [A-Z_a-z0-9]    (plus international glyphs such as Ä)

'+' indicates logical OR
',' indicates logical AND
'~' indicates logical NOT

A file is defined as matching a glob with a <tag spec> if it matches the glob
and its tag list satisfies the boolean expression given by the tag spec, where
the most basic logical unit of such an expression (i.e. a single tag) is
interpreted as: the tag exists within the file's tag list. Globs specified
without a <tag spec> function as they do presently, and match files with
non-empty tag lists as normal too.

--- A convenient userspace utility to inspect and manipulate tags ---

The command will be named 'tag' and have the following invocation syntax

tag [<-d tag[,tag,...]>|<-a tag[,tag,...]>|-u] <file list>

Invoked with '-a', tag causes each file in <file list> (which may also be
produced using a shell glob, and hence specify files matching a certain tag
spec) to have its tag list replaced by the union of its current tag list with
the tag list specified on the command line.

Invoked with '-d', tag causes the tag list of each file in <file list> to be
replaced by the set difference between the file's current tag list and the tag
list provided on the command line.

Invoked with '-u', tag outputs a single locale ordered tag list that is the
union of the tag lists of all the files in <file list>.

Invoked without any options (-a, -d, or -u), tag outputs a line for each file in
<file list> containing the file name, followed by a ':', followed by the comma
separated tag list of that file, with no whitespace.
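The tag spec grammar above leaves operator precedence open. Here is a recursive-descent evaluator sketch that assumes '~' binds tightest, then ',' (AND), then '+' (OR), the usual NOT/AND/OR convention, and that '~' applies to the single term after it. Tag characters are approximated with str.isalnum() plus '_', which also admits letters like Ä. `matches` and `split_tagged_glob` are hypothetical names; a shell would hand the glob half to its normal globbing and evaluate the spec half against each candidate's user.tags list.

```python
from typing import Optional, Set, Tuple


def split_tagged_glob(arg: str) -> Tuple[str, Optional[str]]:
    """Split 'glob\\:spec' into (glob, spec); spec is None if no '\\:' is present."""
    glob_part, sep, spec = arg.partition("\\:")
    return glob_part, (spec if sep else None)


def matches(spec: str, tags: Set[str]) -> bool:
    """Evaluate a tag spec such as 'rock,~live' against one file's tag set."""
    pos = 0

    def peek() -> str:
        return spec[pos] if pos < len(spec) else ""

    def expect(ch: str) -> None:
        nonlocal pos
        if peek() != ch:
            raise ValueError(f"expected {ch!r} at position {pos} in {spec!r}")
        pos += 1

    def parse_or() -> bool:           # spec  ::= and ('+' and)*
        value = parse_and()
        while peek() == "+":
            expect("+")
            rhs = parse_and()         # parse before combining so 'or' can't skip input
            value = value or rhs
        return value

    def parse_and() -> bool:          # and   ::= unary (',' unary)*
        value = parse_unary()
        while peek() == ",":
            expect(",")
            rhs = parse_unary()
            value = value and rhs
        return value

    def parse_unary() -> bool:        # unary ::= '~' unary | '(' spec ')' | tag
        nonlocal pos
        if peek() == "~":
            expect("~")
            return not parse_unary()
        if peek() == "(":
            expect("(")
            value = parse_or()
            expect(")")
            return value
        start = pos
        while peek() and (peek().isalnum() or peek() == "_"):
            pos += 1
        if start == pos:
            raise ValueError(f"expected a tag at position {pos} in {spec!r}")
        return spec[start:pos] in tags  # a single tag: is it in the file's list?

    result = parse_or()
    if pos != len(spec):
        raise ValueError(f"unexpected {peek()!r} at position {pos} in {spec!r}")
    return result
```

For example, `split_tagged_glob("*.ogg\\:rock,~live")` gives `("*.ogg", "rock,~live")`, and `matches("rock,~live", {"rock"})` is True.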

_________________
Adopt an unanswered post today
Dr.Willy
Guru


Joined: 15 Jul 2007
Posts: 491
Location: NRW, Germany

Posted: Tue Nov 15, 2011 10:28 am

sirlark wrote:
Not sure if you're looking for feedback
Yes, I am :)

sirlark wrote:
I thought of using POSIX extended attributes. The thing is, you really want to make tagging as transparent as possible to user applications, and the way I thought of doing this was extending glob syntax. Here's a brief summary of my thoughts I jotted down in a text file a while ago, which has since sadly languished on my hard drive.
Hm. The problem with extattrs is that the file->tags lookup is really fast, but the tags->files lookup requires scanning all files that potentially have the tag. Using extattrs would require some sort of lookup cache to get satisfying response times.
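To make the tags->files cost concrete, a sketch of the user.tags storage format from the proposal, using Python's os.getxattr/os.setxattr (available on Linux only). The UTF-8 encoding of the attribute value is an assumption, and `read_tags`, `add_tags` and `files_with_tag` are hypothetical helpers. Reading one file's tags is a single attribute read, while `files_with_tag` has to visit every entry and read its attribute, which is exactly the scan described above.

```python
import locale
import os

ATTR = "user.tags"  # attribute name from the proposal


def parse_tags(raw: bytes) -> set:
    """b'jazz,rock' -> {'jazz', 'rock'}; empty items are ignored."""
    return {t for t in raw.decode("utf-8").split(",") if t}


def format_tags(tags) -> bytes:
    """Sorted, comma separated, no whitespace; strxfrm follows the LC_COLLATE locale."""
    return ",".join(sorted(set(tags), key=locale.strxfrm)).encode("utf-8")


def read_tags(path) -> set:
    """files->tags: a single attribute read. A missing attribute means 'untagged'."""
    try:
        return parse_tags(os.getxattr(path, ATTR))
    except OSError:  # e.g. attribute not set, or filesystem without user xattrs
        return set()


def add_tags(path, new_tags) -> None:
    """Like 'tag -a': store the union of the current and the new tags."""
    os.setxattr(path, ATTR, format_tags(read_tags(path) | set(new_tags)))


def files_with_tag(directory, tag) -> list:
    """tags->files: every regular file in the directory must be checked."""
    hits = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False) and tag in read_tags(entry.path):
                hits.append(entry.path)
    return sorted(hits)
```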
sirlark
Guru


Joined: 25 Oct 2004
Posts: 305
Location: Cape Town, South Africa

Posted: Tue Nov 15, 2011 1:10 pm

Quote:
Hm. The problem with extattrs is that the file->tags lookup is really fast, but the tags->files lookup requires scanning all files that potentially have the tag. Using extattrs would require some sort of lookup cache to get satisfying response times.


Good point. Maybe it would be worth storing the tags->files mapping in the extattrs of the containing directories. Alternatively, an approach similar to yours (hidden tag-name dirs with symlinks) might work. Problem is, neither of these approaches work for getting all files matching tags in subdirectories recursively. This would require some form of centralized database system. But that would negate the best feature of extattrs, which is that the tagging metadata moves with the file across renames and moves to different filesystems (assuming the destination fs has extattrs enabled).

I wonder, though, how slow it would be to query all the tags in a directory. ls needs to stat each file, doesn't it? And the extattr query is similar to a stat, in that the files themselves don't need to be opened. I might do some timing tests this weekend...
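A rough harness for such a timing test, as a sketch only: it times one os.stat() pass and one os.getxattr() pass over the regular files in a directory (Linux only; attribute name taken from the proposal). `time_directory_scan` is hypothetical, and since the first pass warms the kernel's inode cache for the second, the numbers are only indicative; swapping the order or repeating runs helps.

```python
import os
import time


def time_directory_scan(directory, attr="user.tags"):
    """Return the file count and the seconds spent on a stat pass vs. an xattr pass."""
    with os.scandir(directory) as entries:
        paths = [e.path for e in entries if e.is_file(follow_symlinks=False)]

    start = time.perf_counter()
    for p in paths:
        os.stat(p)                  # roughly what 'ls -l' does per file
    stat_done = time.perf_counter()
    for p in paths:
        try:
            os.getxattr(p, attr)    # the file itself is not opened here either
        except OSError:             # attribute not set / xattrs unsupported
            pass
    xattr_done = time.perf_counter()

    return {"files": len(paths), "stat": stat_done - start, "getxattr": xattr_done - stat_done}
```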
Dr.Willy
Guru


Joined: 15 Jul 2007
Posts: 491
Location: NRW, Germany

Posted: Tue Nov 15, 2011 3:07 pm

sirlark wrote:
Problem is, neither of these approaches work for getting all files matching tags in subdirectories recursively.

Uh - sure, it works just fine.
sirlark
Guru


Joined: 25 Oct 2004
Posts: 305
Location: Cape Town, South Africa

Posted: Tue Nov 15, 2011 3:36 pm

Sorry, poorly phrased. What I meant is that finding all files in the current directory or lower requires recursion for both the mentioned methods, whereas storing a centralized database of tag:pathname links means the entire file system can be queried quickly and efficiently, without recursion.
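For comparison, a sketch of such a centralized tag:pathname database using Python's built-in sqlite3 module; the schema and function names are hypothetical. A subtree query becomes one SQL statement over a single tag's rows instead of a directory walk. The trade-off raised earlier remains: a rename done outside the tool leaves a stale row behind.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the database; the primary key also indexes lookups by tag."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS tags ("
        " tag TEXT NOT NULL, path TEXT NOT NULL, PRIMARY KEY (tag, path))"
    )
    return db


def add_tag(db, tag, path):
    """Record one tag:pathname link; duplicates are ignored."""
    with db:  # commits on success
        db.execute("INSERT OR IGNORE INTO tags (tag, path) VALUES (?, ?)", (tag, path))


def files_under(db, tag, directory):
    """Every file with `tag` at or below `directory` (absolute paths assumed)."""
    prefix = directory.rstrip("/") + "/"
    rows = db.execute(
        "SELECT path FROM tags WHERE tag = ? AND substr(path, 1, ?) = ? ORDER BY path",
        (tag, len(prefix), prefix),
    )
    return [row[0] for row in rows]
```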
Dr.Willy
Guru


Joined: 15 Jul 2007
Posts: 491
Location: NRW, Germany

Posted: Tue Nov 15, 2011 4:29 pm

Why would a symlink database require recursion? If a file has been tagged, there's a symlink to it in the tag's directory.
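A sketch of that point under the README's layout (one .sltag at the repository root, relative symlinks): files with a tag below some subdirectory can be found by reading only that tag's directory and filtering link targets by path, without walking the subdirectory. `tagged_under` is a hypothetical helper; Path.is_relative_to needs Python 3.9 or later.

```python
from pathlib import Path


def tagged_under(root, tag, subdir):
    """Files tagged `tag` located at or below `subdir`, read from <root>/.sltag/<tag>."""
    base = Path(subdir).resolve()
    tag_dir = Path(root) / ".sltag" / tag
    if not tag_dir.is_dir():
        return []
    hits = []
    for link in tag_dir.iterdir():   # one flat directory, whatever the tree depth
        target = link.resolve()      # follows ../../sub/file style links
        if target.is_relative_to(base):
            hits.append(target)
    return sorted(hits)
```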
sirlark
Guru


Joined: 25 Oct 2004
Posts: 305
Location: Cape Town, South Africa

Posted: Wed Nov 16, 2011 8:25 am

I'm working on the assumption that the symlink tag repository only covers files in the current directory. Even if it doesn't, tagging a file in a subdirectory requires updating the repos in all the directories above it, unless there's a single system-wide repo. The problem is that with a system-wide repo, tags don't move with files. At least with per-directory repos, they'd move with directories.
Dr.Willy
Guru


Joined: 15 Jul 2007
Posts: 491
Location: NRW, Germany

Posted: Wed Nov 16, 2011 4:25 pm

I still don't understand what kind of problem you see there, and frankly I don't think there is one…
You init a repository in PWD, and everything in PWD or one of its subdirs will be stored in that repository. Kind of like git.
Moving a file requires an update of the symlinks.
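A sketch of that update, relying on the inode-named links from the README: broken links are re-pointed at whatever file now carries the matching inode number. It assumes decimal inode link names and that the file was renamed or moved within the same filesystem (a copy or cross-filesystem move produces a new inode, so those cannot be matched). `repair_links` is hypothetical; the tree walk only happens when something is actually broken.

```python
import os
from pathlib import Path


def repair_links(root):
    """Re-point broken links in <root>/.sltag at the file whose inode matches the
    link's name. Returns the list of links that were fixed."""
    root = Path(root).resolve()
    repo = root / ".sltag"
    broken = [
        link
        for tag_dir in repo.iterdir() if tag_dir.is_dir()
        for link in tag_dir.iterdir()
        if link.is_symlink() and not link.exists()  # exists() follows the link
    ]
    if not broken:
        return []

    # One walk over the tree (skipping the repository) maps inode -> current path.
    by_inode = {}
    for dirpath, dirnames, filenames in os.walk(root):
        if Path(dirpath) == root and ".sltag" in dirnames:
            dirnames.remove(".sltag")
        for name in filenames:
            path = Path(dirpath, name)
            by_inode[str(os.lstat(path).st_ino)] = path

    fixed = []
    for link in broken:
        new_target = by_inode.get(link.name)
        if new_target is not None:
            link.unlink()
            link.symlink_to(os.path.relpath(new_target, link.parent))
            fixed.append(link)
    return fixed
```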
sirlark
Guru


Joined: 25 Oct 2004
Posts: 305
Location: Cape Town, South Africa

Posted: Thu Nov 17, 2011 9:31 am

Sorry, you're probably right. I was thinking that no matter what, you're always going to have to recurse somewhere, which is a performance hit. You were saying that tags->files using extattrs would be too slow, and my argument was that if you want to look up both ways (files->tags, tags->files), there will always be a performance hit in one direction, which direction depending on the implementation. Given this, I figured the benefits of using extattrs (tags moving with files automatically) might outweigh the somewhat mitigated performance penalty. However, the performance penalty can be on either read or write: with extattrs it's on read, whereas with symlink repos it's on write. In retrospect, a penalty on write is much more acceptable than one on read, considering how much more often reads happen than writes. I just didn't think it through. So yes, there is a performance penalty for maintaining the symlink repo, especially if it covers all files in subdirectories as well, but that penalty comes into play far less often than having to read all tags of all files in a directory to list files by tag.

Also, you might want to check out https://code.google.com/p/htaggingolfs/

All times are GMT