Gentoo Forums
source archives cleanup for distfiles - yes, another one
bunsen
Tux's lil' helper


Joined: 10 Aug 2003
Posts: 105

Posted: Thu Nov 24, 2005 8:30 am    Post subject: source archives cleanup for distfiles - yes, another one

This is my concoction for cleaning up source archives from /usr/portage/distfiles. Since writing it I've discovered there's a lot of discussion in the forum and several scripts with similar purposes. What I've noticed is that LAN setups such as my own are not as well catered for, so I'm offering my method and notes. I don't expect it to suit everyone, nor do I want to go into a lengthy dialogue about it. Hopefully it will be good for some of you.

On my LAN there are several nodes running Gentoo and they all use a common /usr/portage tree through NFS, served by one of the nodes. That conserves my disc space and external network traffic. From my desktop computer I use ssh to get interactive shells on the others.

In cleaning out distfiles I have one objective: delete only the source archives that are not required by any currently installed package on any of the Gentoo nodes. I leave CVS sources alone as I don't really understand what ought to be deleted, probably because I don't generally use them.

All the commands are to be found on a minimal system, so you should already have them: grep, sed, gawk, join, cat, find, sort and emerge.
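
A quick way to confirm that claim on a given node is sketched below. The helper name check_tools is made up for this illustration and is not part of portage; it only reports which of the named commands are missing from PATH.

```shell
# check_tools is a hypothetical helper: report any named command that
# cannot be found on PATH, and return non-zero if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# The list from the post; uncomment to run the check on a Gentoo node.
# check_tools grep sed gawk join cat find sort emerge
```

If gawk is absent, the awk one-liners in this thread are simple enough that any POSIX awk should handle them, though that is an assumption rather than something tested here.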

I've made improvements since posting this. For clarity I would prefer to update this post, but part of it has been referenced elsewhere, at least by me, in relation to another problem, so I'm putting the new and improved version a couple of posts later. The original method seems to have missed a few sources and deleted them unnecessarily.



  • Step 1. On each node, try to update all packages first with either

    Code:
    emerge -up world
    or
    Code:
    emerge -upD world


  • Step 2. On each node, check that the ebuilds of all installed packages are still in portage. This is, for me, the curious part. In trying to identify the needed source archives, I found several installed packages that were not updated by emerge -upD world or emerge -upD system. Their ebuilds had been removed from portage. In all cases they were updated by just re-emerging the specific packages.
    Run this on each node:

    Code:
    ACCEPT_KEYWORDS="~x86" /usr/bin/emerge -p  2>&1 `find /var/db/pkg/* -mindepth 1 -maxdepth 1 |  sed 's/\/var\/db\/pkg\///' | gawk '{print "="$1}'`


    If any installed packages with deleted ebuilds exist, this string of commands will report the offending package and version, one at a time. Perhaps it's a flaw in portage. Maybe someone can enlighten me.

    You'll need to repeat this step, re-emerging each reported package, until all the overlooked packages are up to date. When the familiar long list of ebuilds is produced, you're ready for Step 3.


  • Step 3. From each node, generate a list of required source archives and write it to /usr/portage/distfiles

    Code:
    ACCEPT_KEYWORDS="~x86" /usr/bin/emerge -pf 2>&1 `find /var/db/pkg/* -mindepth 1 -maxdepth 1 |  sed 's/\/var\/db\/pkg\///' | gawk '{print "="$1""}'` | gawk '{print $1}' | grep : | sed 's/^[a-zA-Z0-9:\/.\-]*\///' > /usr/portage/distfiles/sourcelist_for_host_$HOSTNAME


    This is the command string that exposed the need for Step 2.
    For every participating node, there'll now be a file called sourcelist_for_host_$HOSTNAME in /usr/portage/distfiles.

  • Step 4. Consolidate the lists. This need only be done on one of the nodes, not necessarily the one running the NFS server. The two command strings concatenate the lists and then remove duplicate entries.

    Code:
    cat /usr/portage/distfiles/sourcelist_for_host_* > /usr/portage/distfiles/sourcelist_for_all_hosts
    sort -u -o /usr/portage/distfiles/sourcelist_required /usr/portage/distfiles/sourcelist_for_all_hosts



  • Step 5. Make a file listing all source archives currently in distfiles.

    Code:
    ls /usr/portage/distfiles | grep -v 'cvs-src' > /usr/portage/distfiles/sourcelist_available

    This step could probably be merged with the next by using a different command option, however this works well enough.


  • Step 6. List the files in distfiles that don't seem to be needed.


    Code:
    join -v 1 /usr/portage/distfiles/sourcelist_available /usr/portage/distfiles/sourcelist_required | less


    Notice that the list files generated during this procedure will be deleted too. That's good since I'm trying to get rid of unwanted stuff.

  • Step 7. If you're happy with the list, let rm at it.
    Code:
    rm `join -v 1 /usr/portage/distfiles/sourcelist_available /usr/portage/distfiles/sourcelist_required | gawk '{print "/usr/portage/distfiles/" $1}' `


    The command is basically the same as for Step 6. Apart from the obvious, the /usr/portage/distfiles path is prepended to every listed filename.

    By now, the clean up is theoretically done, but now is another opportunity to check for unexpected deletions.

  • Step 8. On all nodes, try fetching all source archives for all installed packages:

    Code:
    emerge -ef world


    I run this on one node at a time because of an unresolved file locking matter with NFS here. Hopefully nothing will be fetched. I've used this on two separate LANs such as described above. Some files were fetched afterward, but I cannot be certain that I hadn't manually deleted them earlier.

  • Step 9. If you really want to be sure, run through it all again. This time it should all go through in a single pass and there should be no sources deleted or fetched.
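
The consolidate-and-compare part of the procedure (Steps 4 to 6) can be rehearsed in scratch directories before touching the real distfiles. The sketch below is only an illustration: the host names alpha and beta and the archive names are invented, and the list files are kept in a separate directory (unlike the post, which writes them into distfiles) so that the result is predictable.

```shell
dist=$(mktemp -d)    # stands in for /usr/portage/distfiles
lists=$(mktemp -d)   # list files kept apart so the demo output is deterministic
touch "$dist/foo-1.0.tar.gz" "$dist/foo-1.1.tar.gz" "$dist/bar-2.3.tar.bz2"

# Pretend two hosts have already reported what they need (Step 3).
printf '%s\n' foo-1.1.tar.gz bar-2.3.tar.bz2 > "$lists/sourcelist_for_host_alpha"
printf '%s\n' foo-1.1.tar.gz > "$lists/sourcelist_for_host_beta"

# Step 4: concatenate the per-host lists and drop duplicates.
cat "$lists"/sourcelist_for_host_* | LC_ALL=C sort -u > "$lists/sourcelist_required"

# Step 5: list what is on disc, sorted the same way as the required list.
ls "$dist" | LC_ALL=C sort > "$lists/sourcelist_available"

# Step 6: print lines of the first file that have no partner in the second.
surplus=$(LC_ALL=C join -v 1 "$lists/sourcelist_available" "$lists/sourcelist_required")
echo "$surplus"

rm -r "$dist" "$lists"
```

Only foo-1.0.tar.gz is reported, since neither pretend host asked for it. Forcing LC_ALL=C on both sort and join is a precaution added here so that the two tools agree on ordering.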



The whole thing could be put into a script, but being a partially interactive process, I've chosen not to just yet and it's good to watch what's happening. So, instead of a script, I keep it all in a text file and copy command strings to my shell.
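
A middle ground between pasting by hand and a fully automatic script might look like the sketch below. The helper confirm_run is hypothetical: it shows a command, asks for confirmation, and runs it only when the answer is y. It handles simple commands only; a pipeline would have to be wrapped, for example in sh -c.

```shell
# confirm_run is a hypothetical helper: print the command, ask, then run it
# only if the answer read from standard input is "y" or "Y".
confirm_run() {
    printf 'About to run: %s\n' "$*"
    printf 'Proceed? [y/N] '
    read -r answer
    case "$answer" in
        y|Y) "$@" ;;
        *)   echo "skipped"; return 1 ;;
    esac
}

# Example use on a node (left commented out so nothing runs by accident):
# confirm_run emerge -ef world
```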

It might be nice to use an exclusion file for the occasional exceptions.
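
One possible shape for such an exclusion file is sketched here, assuming a file with one archive name per line. The file name sourcelist_exclude and the archive names are made up for the example; the candidate list stands in for the output of Step 6.

```shell
work=$(mktemp -d)

# Candidates for deletion, as Step 6 would list them (invented names).
printf '%s\n' foo-1.0.tar.gz keepme-0.9.tar.gz old-3.2.tar.bz2 > "$work/surplus"

# Hypothetical exclusion file: archives that must never be deleted.
printf '%s\n' keepme-0.9.tar.gz > "$work/sourcelist_exclude"

# -F: fixed strings, -x: match whole lines, -v: invert the match,
# -f: read the patterns from the exclusion file.
to_delete=$(grep -v -x -F -f "$work/sourcelist_exclude" "$work/surplus")
echo "$to_delete"

rm -r "$work"
```

Matching whole lines as fixed strings avoids surprises from the dots in archive names, which grep would otherwise treat as regular expression wildcards.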

Lastly, I'd be interested in any comments from the portage developers regarding Step 2.


Last edited by bunsen on Fri Dec 16, 2005 6:20 am; edited 2 times in total
hollowsoul
n00b


Joined: 29 Feb 2004
Posts: 31

Posted: Sat Dec 03, 2005 5:52 pm

Can't seem to get the above to work for me :/
No output to file for Step 3 (copied and pasted into gnome-terminal), no luck :/
I'm only getting the following in the file:

emerge:

Last question: I was wondering if this keeps source files
that are not needed by any machine, but are currently installed?
bunsen
Tux's lil' helper


Joined: 10 Aug 2003
Posts: 105

Posted: Sun Dec 04, 2005 2:36 am

If Step 2 produces an error, such as portage couldn't find a particular ebuild, then the error you report from step 3 is expected.

Step 2 must complete cleanly, i.e. there should be a long list of installed packages that would be re-built, e.g.

    [ebuild R ] x11-terms/aterm-0.4.2-r11
    [ebuild R ] x11-terms/xterm-204
    [ebuild R ] x11-themes/ethemes-0.16.7
    [ebuild R ] x11-themes/etheme-BrushedMetal-Tigert-0.16.7.1
    [ebuild R ] x11-themes/etheme-ShinyMetal-0.16.7.1
    [ebuild R ] x11-themes/etheme-Ganymede-0.16.7.1
    [ebuild R ] x11-themes/etheme-BlueSteel-0.16.7.1
    [ebuild R ] x11-wm/enlightenment-0.16.7.2


If instead Step 2 reports the non-existence of an ebuild for a specific version of an installed package, then Step 3 WILL produce an almost empty file such as you described. Hence the last instruction of Step 2, which should direct you to Step 3, not Step 1; I'm about to correct that.
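
A small guard along these lines could catch that situation before the lists are consolidated. It is only a sketch: sourcelist_ok is a made-up helper name, and it assumes the failure always shows up the way it did above, as an empty file or a line reading just "emerge:".

```shell
# sourcelist_ok is a hypothetical helper: reject a per-host source list that
# is empty or that contains the bare "emerge:" line left by a failed Step 3.
sourcelist_ok() {
    list=$1
    if [ ! -s "$list" ]; then
        echo "$list is missing or empty" >&2
        return 1
    fi
    if grep -q -x 'emerge:' "$list"; then
        echo "$list contains an emerge error marker; repeat Step 2" >&2
        return 1
    fi
    return 0
}

# Example use on a node:
# sourcelist_ok "/usr/portage/distfiles/sourcelist_for_host_$HOSTNAME"
```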
bunsen
Tux's lil' helper


Joined: 10 Aug 2003
Posts: 105

Posted: Fri Dec 16, 2005 7:09 am

Improved method.
The original method worked reasonably well, but had a problem with some sources that had to be fetched manually. This improved method is similar but more thorough, still limited to stable and testing rated ebuilds, and still for the network configuration originally described.

The sole purpose of this procedure is to maintain a collection of source archives no more and no less than is required for any installed ebuild on any participating node.

I do this all as the root user. It now needs slocate.


  • Step 0. Make sure the database for slocate is up to date, for example by running updatedb as root.

  • Step 1. Run this on every participating node.
    Code:
    find /var/db/pkg/* -mindepth 2 -maxdepth 2 -name '*.ebuild' | sed 's/\// /g' | gawk '{print "/usr/bin/slocate " $6 " | grep /usr/portage | grep " $4}' > /usr/portage/distfiles/ebuildlist_for_$HOSTNAME

    This finds ebuilds in /var/db/pkg and writes a batch of slocate commands to find the corresponding ebuilds in /usr/portage. It seems necessary to run the ebuilds from there rather than /var/db/pkg. Every participating node produces a named file in the common /usr/portage/distfiles directory.


  • Step 2. From one node (probably the NFS host that exports /usr/portage, but that depends on your NFS configuration) run:
    Code:
    cat /usr/portage/distfiles/ebuildlist_for_* | sort -u -o /usr/portage/distfiles/ebuildlist_all_hosts

    to consolidate the lists.

  • Step 3. Use the consolidated ebuild search file:
    Code:
    /bin/bash /usr/portage/distfiles/ebuildlist_all_hosts | gawk '{print "/usr/bin/ebuild " $1 " fetch | grep src_uri"}' > /usr/portage/distfiles/ebuildfetchrun

    This runs the batch of commands in ebuildlist_all_hosts and in turn writes a new batch of commands that will obtain the names of the required source files.

  • Step 4. Now execute the result of the preceding step and filter it again to produce a single list of required sources.
    Code:
    /bin/bash /usr/portage/distfiles/ebuildfetchrun | gawk '{print $5}' > /usr/portage/distfiles/sourcelist_required

    This step takes a few minutes, as it runs ebuild many times over. Note that it will try to fetch any files that ought to be here if they aren't already.

  • Step 5. Make a list of the sources already on disc.
    Code:
    ls /usr/portage/distfiles | grep -v 'cvs-src' > /usr/portage/distfiles/sourcelist_available

    This list is to be compared with the list of what's required.


  • Step 6. Sort the two lists and eliminate possible duplicates. The comparison needs this step because join expects both of its input files to be sorted on the join field.

    Code:
    sort -u -o /usr/portage/distfiles/sourcelist_required_sorted /usr/portage/distfiles/sourcelist_required
    sort -u -o /usr/portage/distfiles/sourcelist_available_sorted /usr/portage/distfiles/sourcelist_available



  • Step 7. List the un-paired sources, i.e. the files in /usr/portage/distfiles that do not appear in the consolidated list of required sources.

    Code:
    join -v 1 /usr/portage/distfiles/sourcelist_available_sorted /usr/portage/distfiles/sourcelist_required_sorted | less

    You are advised to check the list of files this step produces. The next step will delete them, so this is a checkpoint. The files produced during this procedure will be deleted too.

  • Step 8. Delete the surplus:
    Code:
    rm `join -v 1 /usr/portage/distfiles/sourcelist_available_sorted /usr/portage/distfiles/sourcelist_required_sorted | gawk '{print "/usr/portage/distfiles/" $1}' `
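
The least obvious command in this improved method is probably the one in Step 1. As a rough illustration of what its sed and awk stages do, the sketch below feeds a single path through them. The path is built from a package named earlier in the thread, and plain awk is used in place of gawk on the assumption that the two behave the same for this one-liner.

```shell
# Example path as it would appear under /var/db/pkg.
path=/var/db/pkg/x11-terms/xterm-204/xterm-204.ebuild

# Replacing every "/" with a space leaves six whitespace-separated fields:
#   $1=var  $2=db  $3=pkg  $4=category  $5=package-version  $6=ebuild file
# The awk stage then writes one slocate command line per installed ebuild.
cmd=$(echo "$path" | sed 's/\// /g' |
      awk '{print "/usr/bin/slocate " $6 " | grep /usr/portage | grep " $4}')
echo "$cmd"
# prints: /usr/bin/slocate xterm-204.ebuild | grep /usr/portage | grep x11-terms
```

Each generated line is itself a command, which is why Step 3 hands the consolidated file to /bin/bash.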


I know there are a few things that could make this run faster and without so many intermediate files. Files can be good though: they provide a record and permit faster debugging (especially given the time taken for Step 4 to run).
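
As one example of trimming the intermediate files, Steps 5 to 7 could be folded into a single pipeline, sketched below in scratch directories with invented archive names. It swaps join for comm, which is a substitution made here and not what the post uses: comm compares whole lines, and it accepts "-" to read one of its two inputs from a pipe.

```shell
dist=$(mktemp -d)    # stands in for /usr/portage/distfiles
lists=$(mktemp -d)   # holds the one list that is still written to disc
touch "$dist/foo-1.0.tar.gz" "$dist/foo-1.1.tar.gz"
mkdir "$dist/cvs-src"
printf '%s\n' foo-1.1.tar.gz foo-1.1.tar.gz > "$lists/sourcelist_required"

# Sort the required list in place; comm, like join, needs sorted input.
LC_ALL=C sort -u -o "$lists/sourcelist_required" "$lists/sourcelist_required"

# The available list never touches the disc: it is piped straight into comm.
# -23 suppresses lines unique to the second file and lines common to both,
# leaving only names that are on disc but not required.
surplus=$(ls "$dist" | grep -v 'cvs-src' | LC_ALL=C sort |
          LC_ALL=C comm -23 - "$lists/sourcelist_required")
echo "$surplus"

rm -r "$dist" "$lists"
```

The required list is still kept as a file, since it is the expensive one to regenerate.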
All times are GMT