Gentoo Forums
HOWTO: Central Gentoo Mirror for your Internal Network

freshy98
Apprentice


Joined: 11 Jul 2002
Posts: 274
Location: The Netherlands

Posted: Thu Apr 29, 2004 11:29 am

I could not get an index of the symlink, but adding
Code:
<Directory /var/www/localhost/htdocs/distfiles>
    Options Indexes FollowSymLinks MultiViews
    AllowOverride All
    <IfModule mod_access.c>
      Order allow,deny
      Allow from all
    </IfModule>
</Directory>

helped me reach it via lynx from a client machine.

I changed the /// to //. I had noticed it but thought it was OK. I also corrected my own mistake :-)

Gonna try to see if it works now.
_________________
Mac Pro single quad 2.8GHz, 6GB RAM, 8800GT. MacBook. Plus way too many SUN/Cobalt/SGI and a lonely Alpha.
freshy98
Apprentice


Joined: 11 Jul 2002
Posts: 274
Location: The Netherlands

Posted: Thu Apr 29, 2004 12:00 pm

It doesn't work yet.
I even added /dist to the commonapache2.conf file using
Code:
<Directory /dist>
    Options -Indexes FollowSymLinks MultiViews
    AllowOverride All
    <IfModule mod_access.c>
      Order allow,deny
      Allow from all
    </IfModule>
</Directory>

so that Apache can reach it.

Or should I only set permissions on the directory and file?
I looked at the chown and chmod man pages, but I can't work out how to use them.

To me it seems that the .htaccess file can be reached, but that the Perl script can't be executed because of file permissions.
linkfromthepast
n00b


Joined: 18 Mar 2004
Posts: 23

Posted: Thu Apr 29, 2004 3:04 pm

Be sure you have added a ScriptAlias directive to the Apache config.

ScriptAlias /dist /var/www/localhost/htdocs/

This is from my config. I have mapped /var/www/localhost/htdocs/ to http://host/dist and my dist.pl is in /var/www/localhost/htdocs/.

It can get a bit confusing with all the path manipulation and settings.

1.) Make sure your script is set up properly (run it from the command line to verify it works)
2.) Make sure your directory permissions are correct
3.) Add the ScriptAlias to Apache's commonapache.conf
4.) Move .htaccess and dist.pl to the desired directories
5.) Test the entire setup with a web browser
6.) Change make.conf on the client machines

To ensure the script is set up correctly, definitely try it on the command line first before trying to use it through a browser.
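A sketch of that command-line check, assuming dist.pl is an ordinary CGI script that reads its request from the QUERY_STRING environment variable, as CGI scripts under Apache normally do. The run_cgi helper and the query in the commented example are made up for illustration, since the script's real parameters are not shown here:

```shell
#!/bin/bash
# run_cgi SCRIPT QUERY
# Runs a CGI script roughly the way a web server would for a simple GET
# request, so its headers and output can be checked on the console first.
run_cgi() {
    local script=$1 query=$2
    if [ ! -x "$script" ]; then
        echo "$script is not executable (try chmod +x)" >&2
        return 1
    fi
    GATEWAY_INTERFACE=CGI/1.1 REQUEST_METHOD=GET QUERY_STRING="$query" \
        "$script"
}

# Hypothetical example; adjust the path and query to your own setup:
# run_cgi /var/www/localhost/htdocs/dist.pl 'file=example-1.0.tar.gz'
```

If this already fails on the console, the problem is in the script or its permissions rather than in the Apache configuration.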

Perhaps I should write a setup guide :)

As for what else the script could be used for, there is:
1.) File download/serving
2.) Dynamic file creation/compilation
3.) Another one but I can't think of it right now :)

All of this through a web browser and by only changing the wget command.
freshy98
Apprentice


Joined: 11 Jul 2002
Posts: 274
Location: The Netherlands

Posted: Thu Apr 29, 2004 3:43 pm

I have the ScriptAlias :-)

All the paths and such are indeed confusing, but I will try again.
I can now access my distfiles symlink in ../htdocs, so that problem is solved. I just don't know whether I can write there, though.

Should this "Order allow,deny" be read as read/write allow, deny? If so, then I know how to make it writable :-)

If I understand the mapping part right (the third sentence in your last reply), then I should do
Code:
ScriptAlias /usr/portage/distfiles /var/www/localhost/htdocs/

in order to make a sort of symlink to /usr/portage/distfiles in ../htdocs?
Right now I have an ordinary symlink, made with ln -sf, in ../htdocs.

I think I am now starting to understand the /dist part a bit better.

And a guide would indeed be great, including all the permissions that need to be set!
linkfromthepast
n00b


Joined: 18 Mar 2004
Posts: 23

Posted: Thu Apr 29, 2004 3:55 pm

As my distfiles are already inside the web server's document path, I never had to change permissions or create symlinks, since the files don't need to be stored anywhere else. If you would like a machine to be both server and client, set the Gentoo mirror in its make.conf to http://localhost/distfiles and you don't need any symlinks at all.
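If the make.conf setting meant here is GENTOO_MIRRORS, the combined server/client case might look like the sketch below. As far as I know, Portage appends /distfiles/<filename> to each GENTOO_MIRRORS entry by itself, so files served at http://localhost/distfiles/ are reached with an entry of just http://localhost; the public mirror listed second is only an assumed fallback:

```shell
# /etc/make.conf fragment (sketch). Portage tries the mirrors in order and
# requests <mirror>/distfiles/<filename> from each one.
GENTOO_MIRRORS="http://localhost http://distfiles.gentoo.org"
```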

I'll see about writing a setup guide, but as I said, as far as permissions are concerned I didn't have to change anything besides making the script executable (+x).
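A minimal sketch of that permission step, assuming dist.pl and .htaccess sit in a directory Apache can already read. The fix_cgi_perms name is made up; the chown mentioned in the comment only matters if the files are not world-readable, and the "apache" user name is an assumption about the install:

```shell
#!/bin/bash
# fix_cgi_perms SCRIPT [HTACCESS]
# The CGI script must be executable; .htaccess only needs to be readable.
fix_cgi_perms() {
    local script=$1 htaccess=$2
    chmod 755 "$script" || return 1          # rwxr-xr-x
    if [ -n "$htaccess" ]; then
        chmod 644 "$htaccess" || return 1    # rw-r--r--
    fi
}

# Ownership can usually stay root's when the modes above are used. If it must
# change, run something like "chown apache:apache FILE" as root (the user and
# group names depend on your Apache install).
# fix_cgi_perms /var/www/localhost/htdocs/dist.pl /var/www/localhost/htdocs/.htaccess
```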
freshy98
Apprentice


Joined: 11 Jul 2002
Posts: 274
Location: The Netherlands

Posted: Thu Apr 29, 2004 7:23 pm

I understand about the symlink now. I had made a real symlink using ln -sf. I'm going to remove that one and use the Apache way of making a symlink instead.

I was going to do that tonight, but things came up. I'll do it tomorrow.
freshy98
Apprentice


Joined: 11 Jul 2002
Posts: 274
Location: The Netherlands

Posted: Sat May 01, 2004 12:18 pm

I am really getting pissed off by now.
I've tried just about everything and I still cannot access http://ip-address/distfiles/, even with an index.html in it. With an index.html there I get an error 500; without it, just a 403.

I don't get it! Setting it up the same way as the Apache manual does not work either: /var/www/localhost/htdocs/manual is a symlink to /usr/share/doc/apache-2.0.49/manual.
Its directories and files are owned by root, the same as /usr/portage/distfiles/index.html, but I still cannot access it.

Getting really sick of it!

[edit]Removing the ScriptAlias allowed me to open the index.html properly. I noticed that the manual has no ScriptAlias either ;-). After removing the index.html I could no longer access /distfiles because of a 403, which is normal I guess, since there is no index.html anymore.[/edit]

[edit2]I can now also download a file after entering its filename after /distfiles/. So if everything is OK, I should be able to write there too.[/edit2]
linkfromthepast
n00b


Joined: 18 Mar 2004
Posts: 23

Posted: Wed May 05, 2004 4:01 pm

Any progress?
freshy98
Apprentice


Joined: 11 Jul 2002
Posts: 274
Location: The Netherlands

Posted: Wed May 05, 2004 4:12 pm

Not really. The clients still lock up (the console does) after I do an emerge -f, for example.
I tried lots of things, but no success; I don't know why.
I'm kind of out of ideas right now.
linkfromthepast
n00b


Joined: 18 Mar 2004
Posts: 23

Posted: Wed May 05, 2004 4:31 pm

Have you tried downloading the file with a web browser?
freshy98
Apprentice


Joined: 11 Jul 2002
Posts: 274
Location: The Netherlands

Posted: Wed May 05, 2004 4:40 pm

Yeah, that was no problem.
linkfromthepast
n00b


Joined: 18 Mar 2004
Posts: 23

Posted: Thu May 06, 2004 2:25 pm

So when you try to download a file with a web browser, the mirror downloads the file and then redirects the web browser so you can download it? It might be a problem with your version of wget; try updating it.
freshy98
Apprentice


Joined: 11 Jul 2002
Posts: 274
Location: The Netherlands

Posted: Fri May 07, 2004 2:34 pm

I haven't tried that one yet. I only tried to download a file that was already there. I'm going to try it tonight.
Moriah
Advocate


Joined: 27 Mar 2004
Posts: 2118
Location: Kentucky

Posted: Sun May 16, 2004 2:16 am    Post subject: need configuration control over my local mirror/cache

All this caching, mirroring, and proxying is great, but it overlooks one critical problem: local configuration control.

Recently, in April 2004, the portage tree on the public mirrors suffered a bad case of bit-rot. It became impossible to do emerge operations, either sync or packages, using the public mirrors. If these nice automatically updating proxies/mirrors/caches blindly went out and grabbed the latest versions of anything referenced by the client machines that they serve, then the tree sickness would be propagated, and nothing would build on the local LAN that the local sysadmin is personally responsible for. This is unacceptable!

I would like to see a way to manually invoke an emerge sync on a portage tree that was in an archive, and be able to easily revert back to the pre-emerge-sync if need be. Ditto for emerging packages, etc.

With LVM comes the ability to make a snapshot of a volume. Could this be combined with these other methods, such as NFS, to protect the original tree from disasters?

Another idea also comes to mind. I have been using rsync since November 2003 to perform automatic nightly backups of all my machines over the network to a backup server with a big RAID storage system. The idea comes from the O'Reilly book "Linux Server Hacks", p. 78. It uses rsync to do a nightly incremental backup to a special directory where backups are mirrored from the original client machine. Once the rsync operation has brought the mirror up to date, the entire mirror tree is snapshotted with cp -al $mirror $temp. This makes a hard-linked copy of the mirror tree. When this copy is finished, it is moved into position by a mv $temp $visible so that it becomes visible on a directory tree that is NFS-exported read-only to the clients. Each client can then look back in time for a fair number of days to view the state of its entire filesystem as it existed on that day at the time of the rsync. The permissions of the rsynced and hard-link-copied files are preserved, so a user cannot see anything he would not have been able to see anyway.
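The rsync plus hard-link snapshot step described above fits in a few lines of shell. A minimal sketch under stated assumptions: GNU cp (for -l), the mirror and the snapshots on one filesystem, and a made-up function name and paths:

```shell
#!/bin/bash
# snapshot_mirror MIRROR SNAPDIR NAME
# Makes SNAPDIR/NAME a hard-linked copy of MIRROR (the "cp -al" trick): files
# that did not change between syncs share one inode, so each snapshot only
# costs the space of what changed. Hard links require a single filesystem.
snapshot_mirror() {
    local mirror=$1 snapdir=$2 name=$3
    local tmp="$snapdir/.tmp-$name"
    if [ -e "$snapdir/$name" ]; then
        echo "snapshot $name already exists" >&2
        return 1
    fi
    mkdir -p "$snapdir" || return 1
    rm -rf "$tmp"                          # leftover from an interrupted run
    cp -al "$mirror" "$tmp" || return 1    # GNU cp: -a archive mode, -l hard links
    mv "$tmp" "$snapdir/$name"             # appears only once the copy is complete
}

# Example with placeholder paths:
# rsync -a --delete rsync://rsync.gentoo.org/gentoo-portage/ /srv/mirror/portage/
# snapshot_mirror /srv/mirror/portage /srv/snapshots "$(date +%Y%m%d)"
```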

Although I am not yet using LVM and snapshots to freeze the client's filesystem prior to the rsync operation, I plan to implement that as soon as I get all the local boxes running Gentoo with LVM. It would be nice if the 2.6.5 kernel with LVM-2 supported the snapshot operation, but alas, last I heard, that was not yet working, so I have standardized on the 2.4.25 kernel and LVM-1 instead. I am being forced to use the 2.6.5 kernel for the IPsec tunneling server, so that I can support NAT traversal, and likewise for a client to test it with. Those two machines will just have to suffer without a snapshot to freeze them during rsync backups. The rsync occurs in the middle of the night, but that does not always mean the machines will be idle then.

It seems to me that a similar strategy could be used to manually emerge sync against a mirror directory, then do the hard-link copy snapshot thing to make the copy visible to the rest of the clients. This would have the advantage that the new view of the portage and distribution trees would not be released to the clients on the LAN until after it had passed muster by whatever quality control and configuration management procedures your organization requires.

The problem of automatically fetching a bad copy of something and breaking the portage tree for everybody is now solved by not releasing the updated version until after it has passed local testing. The disadvantage of not being able to automatically fetch a missing file becomes an advantage, since you are trying to manage and control what is available to the clients, so you know that they all conform to local policy requirements. Remember that the previous known-good, even if slightly out-of-date, version of the tree is still available to all your LAN clients, so you are not crippling them during the updating and testing process.
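One way to get this controlled release and easy revert, sketched under assumptions: clients are served from a fixed path such as SNAPDIR/current (a name chosen only for this example), and whatever serves that path follows the symlink. An NFS export is probably resolved when it is made, so it may need a refresh with exportfs -r after a switch. Releasing a tested snapshot, or going back to an older one, is then a single rename:

```shell
#!/bin/bash
# release_snapshot SNAPDIR NAME
# Points SNAPDIR/current at the vetted snapshot SNAPDIR/NAME. Reverting is
# just releasing an older NAME again; no snapshot is modified or deleted.
release_snapshot() {
    local snapdir=$1 name=$2
    if [ ! -d "$snapdir/$name" ]; then
        echo "no snapshot $name in $snapdir" >&2
        return 1
    fi
    rm -f "$snapdir/.current.new"
    ln -s "$name" "$snapdir/.current.new" || return 1   # relative link target
    # GNU mv -T renames over the old link instead of moving into its target,
    # so clients never see a moment without a "current".
    mv -T "$snapdir/.current.new" "$snapdir/current"
}
```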

The goal is to make an NFS-retrievable version of everything free from race conditions, while at the same time exercising some control over the mirroring process, so that you can go back to an earlier version of the mirrored trees if you have to. Of course, you could also serve it via HTTP or FTP or whatever you like, as well or instead. You could even rsync a copy of it to a local LAN client's own disk, and then go out to a public mirror to fetch something for testing prior to including it in the next configuration-controlled mirror of everything.

Remember: the hard-link copy operation makes a snapshot using a file-sharing approach. You do not have more than a single copy of any given file, provided that the file did not actually change from one rsync of the public mirrors to the next.

Has anybody tried anything like this?

If so, what techniques did you use to make sure that all needed packages were indeed captured in the mirror before the snapshot hard-link copy operation was performed?

Also, since I have not yet set up any kind of local mirror and have no idea how much filesystem space I should allow for it: how much disk space is prudent for a mirror, and how much is the "churn"? What percentage of the tree changes from day to day, or week to week? I need to know this as well, since I plan to keep historical archives of the portage and distribution trees just like I am now doing for my LAN client machine backups.
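Once two hard-linked snapshots exist, the links themselves give a rough answer to the churn question: a file that is still the same inode in both snapshots did not change. A sketch, assuming GNU find (for -printf) and file names without spaces; the churn function is hypothetical:

```shell
#!/bin/bash
# churn OLD NEW
# Prints "changed total": the number of regular files in snapshot NEW that
# are not the same inode as the same path in snapshot OLD, and the number of
# files in NEW. This relies on rsync's default of writing a changed file to a
# temporary name and renaming it, which gives it a new inode (--inplace would
# defeat this and also alter the older snapshots).
churn() {
    local old=$1 new=$2
    { find "$old" -type f -printf 'O %i %P\n'
      find "$new" -type f -printf 'N %i %P\n'; } |
        awk '$1 == "O" { ino[$3] = $2; next }
             { total++; if (!($3 in ino) || ino[$3] != $2) changed++ }
             END { printf "%d %d\n", changed, total }'
}
```

For disk space, GNU du counts each hard-linked inode only once per run, so `du -sh OLD NEW` shows roughly how much extra space NEW takes on top of OLD.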
viperlin
Veteran


Joined: 15 Apr 2003
Posts: 1317
Location: UK

Posted: Sun May 16, 2004 3:55 pm

Moriah: will you be publishing that post in book form?
Moriah
Advocate


Joined: 27 Mar 2004
Posts: 2118
Location: Kentucky

Posted: Sun May 16, 2004 5:53 pm    Post subject: book form...

No, but if I ever implement it and get it working, I might publish it as an emergable package. :)
Satori80
Tux's lil' helper


Joined: 24 Feb 2004
Posts: 137

Posted: Sun Jun 06, 2004 6:39 pm

GurliGebis wrote:
Well, my installation works like this:

I have the server serving /usr/portage/distfiles over NFS, so the clients mount it as their /usr/portage/distfiles. The clients fetch the distfiles from the webservers as normal, but since they all have /usr/portage/distfiles mounted from the server, each file only needs to be downloaded once.


I do this as well, and I rather enjoy this setup as the distfile host has way more disk space than it needs -- whereas my other machines always seem to have too little.

Quote:
The server also runs the rsync daemon, so the clients can rsync against it, and thereby save bandwidth.
The server rsyncs once a day.


I want to do this, but I'm not sure what's to prevent rsync from mirroring the entire distfile tree (I'm assuming yours doesn't)?

Basically, I think you are running the setup I'd like to have, but I don't understand how to set it up so it only grabs the distfiles I need as I emerge them.

Another point I wonder about is if I were to set it up per this howto's instructions (on top of the local portage tree in /usr/portage, rather than the default /opt/gentoo-rsync/portage) how does the hosting machine update its portage cache after an emerge sync?

I'd like to better understand these issues before I go and try it out. I've put enough time into moving these systems from other distros to gentoo, and I'd hate to hose one of them now. ;)
GurliGebis
Retired Dev


Joined: 08 Aug 2002
Posts: 509

Posted: Sun Jun 06, 2004 6:48 pm

Here is my /etc/rsync/rsyncd.conf :

Code:
#uid = nobody
#gid = nobody
use chroot = no
max connections = 10
pid file = /var/run/rsyncd.pid
motd file = /etc/rsync/rsyncd.motd
transfer logging = yes
log format = %t %a %m %f %b
syslog facility = local3
timeout = 300

#[gentoo-x86-portage]
#this entry is for compatibility
#path = /opt/gentoo-rsync/portage
#comment = Gentoo Linux Portage tree

[gentoo-portage]
#modern versions of portage use this entry
path = /usr/portage
comment = Gentoo Linux Portage tree mirror
exclude = distfiles


As you can see, distfiles is excluded :)
All you have to do is emerge gentoo-rsync-mirror, edit your /etc/rsync/rsyncd.conf and /etc/rsync/rsyncd.motd.
Then add rsyncd to your default runlevel, and start it.

happy emerge sync'ing :)

EDIT: After doing this, change SYNC in make.conf on the clients to: rsync://server_ip/gentoo-portage
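On the client side this amounts to one make.conf line. A sketch with server_ip as a placeholder; the variable Portage reads for its rsync source is SYNC:

```shell
# /etc/make.conf on each client (sketch; replace server_ip with the server's address)
SYNC="rsync://server_ip/gentoo-portage"

# Quick check from a client: "rsync rsync://server_ip/" should list the
# gentoo-portage module along with its comment from rsyncd.conf.
```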
_________________
Queen Rocks.
flybynite
l33t


Joined: 06 Dec 2002
Posts: 620

Posted: Mon Jun 07, 2004 3:48 am

I must say this is a great thread and is what got me started thinking along these lines, but...


I've created a complete system that includes both a local rsync server and a distfile proxy cache that is really simple and secure. I've had nothing but good reports about the speed and ease of setup.

I have complete ebuilds for both setups so the install is almost effortless.

The local rsync server:
https://forums.gentoo.org/viewtopic.php?t=180336

The distfile cache for gentoo:
https://forums.gentoo.org/viewtopic.php?t=173226

I've tried all the rest, and these two packages are by far the best system for users with a LAN!!!!! I even have full confidence these will work fine for large setups such as a university!!!

They are designed with speed and security in mind - Try them!!!
Parasietje
Apprentice


Joined: 25 Jan 2004
Posts: 194

Posted: Tue Nov 02, 2004 6:54 pm

I used the following solution:

For the portage tree, I have an rsyncd running on my server. I emerge sync every 2 days using a cronjob on the server. All clients pull their portage tree from the server's rsyncd.

For the distfiles, I use an NFS share. A problem only occurs when two clients attempt to download the same file at the same time, and that chance is small indeed.

However, to avoid this, you could add a special rule to your caching proxy to treat requests for Gentoo packages differently. It could cache them all in a different directory and never delete them. Run a cleanup script every now and then on this directory (e.g. when portage-2.0.47.tar.bz2 and portage-2.0.48.tar.bz2 both exist, delete the old one).

This is, IMHO, the best option: server rsyncd and a caching proxy for the distfiles.
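A rough sketch of such a cleanup pass, assuming distfile names follow the usual name-version.tar.gz / .tar.bz2 pattern and contain no spaces. It only prints candidates: an older version may still be needed by an installed or pinned package, so review the list before deleting. The function name is hypothetical, and the version ordering relies on GNU sort's -V:

```shell
#!/bin/bash
# stale_distfiles DIR
# Prints files in DIR that have a newer version of the same package next to
# them, e.g. portage-2.0.47.tar.bz2 when portage-2.0.48.tar.bz2 exists.
stale_distfiles() {
    local dir=$1 f base name
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        base=${f##*/}
        # Strip "-<version>.<archive suffix>" to get the package name;
        # files that do not match the pattern are skipped.
        name=$(printf '%s\n' "$base" |
            sed -nE 's/^(.+)-[0-9][^-]*\.(tar\.gz|tar\.bz2|tgz|zip)$/\1/p')
        [ -n "$name" ] && printf '%s %s\n' "$name" "$base"
    done |
        # Group by package name, oldest version first.
        sort -t ' ' -k1,1 -k2,2V |
        # Any entry followed by one with the same name is an older version.
        awk '$1 == prev_name { print prev_file } { prev_name = $1; prev_file = $2 }'
}
```

Once the list looks right, something like `cd /path/to/cache && stale_distfiles . | xargs rm -f` would remove the files.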
jkroon
Tux's lil' helper


Joined: 15 Oct 2003
Posts: 110
Location: South Africa

Posted: Wed Nov 03, 2004 9:54 pm

Right, I've seen a lot of solutions now, none of them security aware.

My scenario: Varsity setup where I pay through my neck for every single byte download (for those familiar with the South African rand, I'm paying R2/MB - approx $0.33 US).

The problem: all solutions presented so far allow arbitrary users to download *any* file via HTTP. As such I have two basic requirements:

1) Files need to be restricted to actual portage distfiles.
2) Users need to be authenticated

For number 2 the Apache solution can be useful. However, I would like any user to be able to download files that have already been downloaded, and only the administrators to be able to download anything.

For this reason I have concocted torpage (http://www.kroon.co.za/torpage.php). Whilst it lacks streaming while downloading, it does meet all my other requirements. I use it in conjunction with vsftpd at work (serving over 400 workstations, probably closer to 500).

Current features are basically as described above:

1) Restricts downloadable files to those referenced in the portage tree.
2) Optionally require user authentication.
3) Validate and check file integrity before returning success to client.
4) Can be easily modified to support SSL using ucspi-ssl instead of ucspi-tcp
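Point 1 can be approximated by checking a requested name against the digest files in the tree. This is a sketch, not torpage's actual code: it assumes the "MD5 <hash> <file> <size>" line layout of the files/digest-* files and a tree at /usr/portage, and the function name is made up:

```shell
#!/bin/bash
# is_known_distfile NAME [PORTDIR]
# Succeeds only if NAME appears as the file field of some digest line.
is_known_distfile() {
    local name=$1 portdir=${2:-/usr/portage}
    # Refuse anything that could escape the distfiles directory.
    case $name in
        ''|*/*|.|..) return 1 ;;
    esac
    find "$portdir" -path '*/files/digest-*' -type f -exec cat {} + |
        awk -v f="$name" '$3 == f { found = 1; exit } END { exit !found }'
}
```

Scanning the whole tree for every request is slow; caching the digest contents in one file, as R-rud's pkglist does later in this thread, would be the obvious improvement.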

Features I would like to add:

1) progress information from the underlying wget process.
2) acl to certain sections (ie, to disallow for downloading games ...)
3) restrictions based on the upstream server (this is low priority as it'll most likely require help from portage which is not there yet).
4) A need to pass USE flags along with request - should not be too difficult (the only package I've seen yet for which this is an issue is xorg-x11).
5) Restrictions based on the size of missing distfiles (again the damn bandwidth issue).
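For point 5, the same digest lines already record each file's size, so a limit can be enforced before anything is fetched. A sketch under the same assumptions as before (digest layout "MD5 <hash> <file> <size>", made-up function names, limit given in bytes):

```shell
#!/bin/bash
# distfile_size NAME [PORTDIR]: print the size recorded for NAME, if any.
distfile_size() {
    local name=$1 portdir=${2:-/usr/portage}
    find "$portdir" -path '*/files/digest-*' -type f -exec cat {} + |
        awk -v f="$name" '$3 == f { print $4; exit }'
}

# fetch_allowed NAME LIMIT_BYTES [PORTDIR]
# Succeeds only for files listed in the digests and no larger than the limit;
# unknown files are refused as well.
fetch_allowed() {
    local size
    size=$(distfile_size "$1" "$3")
    [ -n "$size" ] && [ "$size" -le "$2" ]
}
```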

At the university I'm running this on a server which has access to the outside, and it runs beautifully. It serves downstream clients using FTP. It also provides rsync to clients as described in the first post of this thread.

At home I have a single machine that exports /usr/portage *and* /usr/local/portage ro via nfs, then makes use of torpage to fetch files. Unfortunately portage-2.0.51 has broken this by first checking whether /usr/portage/distfiles is ro before attempting to download ... (any fixes?).


And to TheQuickBrownFox: ah, labs. Yes, the configs are a *huge* problem. I've had some people develop a distributed portage-like application for me. They have completed it now and it functions on top of portage (or something to that effect; I'm not clued in on all the details yet). Hopefully it will solve the labs problems once and for all.
_________________
There are 10 kinds of people in the world,
those who understand binary and those who don't
bschnzl
n00b


Joined: 13 Mar 2005
Posts: 67

Posted: Mon Mar 21, 2005 4:19 am    Post subject: R-rud - Rsync reactive unattended downloader

Hi Folks...

First I would like to pay tribute to Grimthorn. That is why I am posting this here, after 4 plus months of inactivity. SALUTE!

Introducing R-rud – a proof-of-concept reactive, unattended package retriever for a central Gentoo update server.

It accepts a filename as an argument. It will then parse this file for rsync attempts, and return the package name, or an error. Then it will look for the completion of the rsync transaction. If it finds it, it will tell you; if it doesn't, it will tell you that too. Hopefully this will lead to reactive unattended retrieval of missing files on the Central Gentoo Server. The downloads are not yet implemented. I request feedback on the success of identifying which requests require downloading.

It will build a list of ebuild files from the “digest” lists in each file under /usr/portage. It will put this in /usr/portage/pkglist. It will do this only if a regular text file is not at /usr/portage/pkglist. This file is currently 2792 k on my system with an md5sum of 42fe85e0427518a176dfcc91f69b47e9. I believe it should be the same on every gentoo system, but I don't know, so I don't check. It will change tomorrow, but change is a general feature of the portage system. Assuring the pkglist file shouldn't be hard.

I am posting this for testing. I also hope it answers jkroon's post. Here's how:

First, programs are not security aware, they are well written. As my code is “proof of concept” it is not generally well written. I am no PERL guru. Feedback is appreciated.

To restrict files to portage, run rsync as nobody:nobody, and only give nobody access to the portage files. You might also do something like this (assume your network uses a 10.0.4.0/23 address space):
Code:
 iptables -A INPUT -s 10.0.4.0/23 -p tcp --dport 873 -m state --state NEW -j ACCEPT

This also assumes the policy on your INPUT chain is “DROP”, and you handle related and established state elsewhere.

IMHO Authenticating users in this case is overkill. These files are available on the open internet, via http. If they want them, they can use their browser. This is certainly up for discussion.

By using only rsync for gentoo updates, you are avoiding public protocols like http. You shouldn't have to be a webmaster to run a gentoo update server. Rsync is simple. Simple protocols are easier to track, and thus, secure. While I am in this vein, using NFS for this is just plain wrong. The protocol is unwieldy, it begs for central user management, and the write issues are unmanageable. Web-caching techniques require a) an additional server of some sort, and b) http.

jkroon's feature requests would not be difficult to elegantly implement.

1) This is meant to be a background process to emerge(1). emerge already has generally apparent progress indications.
2) Games and the like can be protected by filesystem ACLs.
3) Given that portage mirrors sync every thirty minutes, upstream host restrictions should not be needed. Regardless, this will reduce the monitoring to one host. Are you watching your logs?
4) USE flags are managed on the local host. If you want to distribute them, make an ebuild with only make.conf, and post it on the appropriate server instance. This is outside the scope of the update server. It is a general admin task. Again, easier said than done, but the path is there.
5) I believe rsync can handle file size restrictions. If it can't, it shouldn't be too hard to add it to this.

But alas, pontificating over proof of concept code is spending before the goose lays the golden egg. Hopefully I will have time to see this to completion.

Good then... let's move on. Here's da code. It is meant to run as root.

Code:

#!/usr/bin/perl

#
#  This was written by Bill Scherr IV
#  It is released under the GNU Public Licence
#  as found at http://www.gnu.org/licenses/gpl.txt
#  With all its warnings and benefits...
#
#  PERL code to return package name from
#  log entry in rsync logs.
#
#  1) Takes an argument of a rsync log file
#  2) obtains the filename from that line
#  3) guesses and returns the package name or an error
#  4) Determines if file exists
#  5) Obtains missing files
#
#
#  This should be cron'd to restart after every sync. 
#  It should also tail the log file, but this is for
#  proof of concept.

# require File::Temp;
use IO::File;
use File::Temp qw(tmpnam);    # import tmpnam() explicitly

#  Accept a filename from the command line (1)
$target = shift;

$pkgfile = metafind();

#  Initialize a counter (although I don't think I'll need it)
$counter = 0;

print "looking thru $target\n";
print "referring to $pkgfile\n\n";

#  Make sure we are only looking through text
if ( -T $target ) {
       
        #  Initialize Log Line and Success Checker
        my $inline = "";
        $success = "";

        #  Open our checked file!
        $LOGLINE = IO::File->new("< $target") || die "Unable to open $target: $!\n";
        while (defined($inline = $LOGLINE->getline())) {
               
                #  Cut the stuff we don't need to look at
                next if $inline !~ m/rsyncd/;
               
                #  A sign of a successful update; check further
                if ( $found ne "" ) {
                        $getit = success($inline,$success,$found) if ( $inline =~ m/distfiles / );
                        dostuff($found,$pkgpath) if ( $getit ne "gotit" );
               
                #  The Buck Stops Here!
                        $success = "";
                        $found = "";
                        $pkgpath = "";
                        $getit = "";
                }
                       
                # Grab package for success() checker
                $success = $inline;
               
                # print "$success $inline\n" if $inline =~ m/distfiles/;
                #  More cutting
                next if $inline !~ m/distfiles\//;

                #  Obtain a filename...
                chomp($inline);
               
                # remove the initial cruft...
                $inline =~ s/\S+\s+\d+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+distfiles\///;
                # remove the ending junk...             
                $inline =~ s/from\s+\S+\s+\S+\d+\)//;

                #  Initialize Result holder
                $pkgpath = "";

                # Obtain and output package name (2)
                $PKGNAMES = IO::File->new("< $pkgfile") || die "Could not open package list for reading: $!\n";
                while (defined($herestuff = $PKGNAMES->getline())) {
                       
                        if ( $herestuff =~ m/^\/usr/) {
                                $pkgpath = $herestuff;
                                next;
                        } elsif ( $herestuff =~ m/$inline/ ) {
                                $pkgpath =~ s/\/usr\/portage\///;
                                $pkgpath =~ s/\/files\/digest\S+//;
                                $herestuff =~ s/\S+\s+\S+\s//;
                                $herestuff =~ s/\s+\d+//;
                                $found = $herestuff;
                                print "-------\nfound $found \nfrom $pkgpath \n";
                                last;
                        } else {
                                # clean up for the next run!
                                $herestuff = "";
                                next;
                        }
                }
                if ( $found eq "" ) {
                        print "-------\nDid not find $inline\n";
                }
                # clean up for the next run!
                $inline = "";
        }
}

sub metafind
{
        # On my system, this generated a 3773K file...
        #
        # This will build a list of all the ebuild files
        # on the Gentoo System...
        #
        print "Building package list!\n\n";
        $test2 = "/usr/portage/pkglist";
       
        unless ( -T $test2 ) {
                $templist =  tmpnam();
               
                unlink $test2;
                my $CMD = "find /usr/portage/ -name digest\\* -exec ls {} >> $templist \\; -exec cat {} >> $templist \\;";
                system($CMD);
                open(OUTLIST, "> $test2") || die "Could not open package list for writing: $!\n";
                open(NEWLIST, "< $templist") || die "Could not open new package list for reading: $!\n";
                        while (<NEWLIST>) {
                                print OUTLIST $_;
                        }
                close(NEWLIST);
                close(OUTLIST);
                unlink $templist;
        }
        return $test2;
}

sub success
{
        # Determine the success of an rsync operation from a log entry
        my ($logdret,$logdreq,$pkgreq) = @_;
        my $BINGO = "";
       
        # get daemon[pid] from log entry
        $logdret =~ s/\S+\s+\d+\s+\S+\s+\S+\s//;
        $logdret =~ s/\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+//;
        $logdreq =~ s/\S+\s+\d+\s+\S+\s+\S+\s//;
        $logdreq =~ s/\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+//;
       
        #  print "$logdret\n$logdreq\n\n";
        if ( $logdret ne $logdreq ) {
                $BINGO = "donthaveit";
        } else {
               
                $logdret =~ s/\[\d+\]\://;
                # print "$logdret\n";
                if ( $logdret ne "rsyncd" ) {
                        $BINGO = "gotit";
                } else {
                        $BINGO = "donthaveit";
                }
       
        }
        print "You have transferred the $pkgreq package successfully\n\n" if ( $BINGO eq "gotit" );
       
        #       print "$BINGO\n";
        return $BINGO;
        #
}

sub dostuff
{
        # Do stuff
       
        my ($pkgname,$category) = @_;
        # $pkgpath =~ s/\S+\///;
        # my $emchek = "emerge -p $pkgpath";
        # my $emfind .= system($emchek);
        # $emfind =~ s/\S+\s//;
        print "We should download $category-$pkgname now:\n";
}


And here is a shell script that removes the package list, syncs, and rebuilds the package list.

Code:

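#!/bin/sh
#
#  Shell script to do stuff associated with emerge on a central portage server
#  Bill Scherr IV
#
#  released under the GNU General Public License
#  as found at http://www.gnu.org/licenses/gpl.txt
#  With all its warnings and benefits...
#
#  Let's go...

# Include /bin and the sbin directories, not just /usr/bin,
# so emerge and its helpers can find what they need
PATH=/bin:/usr/bin:/sbin:/usr/sbin
export PATH

EMERGE_BIN=`which emerge`
PKG_LIST="/usr/portage/pkglist"

# Quote the variable: if which finds nothing, an unquoted
# [ ! -x $EMERGE_BIN ] becomes [ ! -x ], which is false, so the check never fires
if [ -z "$EMERGE_BIN" ] || [ ! -x "$EMERGE_BIN" ]
then
        echo "can't find emerge, stopping"
        exit 1
fi

# rm -f is silent if the list does not exist yet
rm -f "$PKG_LIST"

# Stop if the sync fails (older Portage versions spell this "emerge sync")
"$EMERGE_BIN" --sync || exit 1

# One redirection for the whole find: ls prints each digest path,
# cat prints its contents
/usr/bin/find /usr/portage/ -name 'digest*' -exec ls {} \; -exec cat {} \; > "$PKG_LIST"

exit 0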


Obviously, this still requires some work. I believe the concept is ready for testing.

Enjoy!
_________________
Persistance Pays
jkroon
Tux's lil' helper


Joined: 15 Oct 2003
Posts: 110
Location: South Africa

Posted: Mon Mar 21, 2005 9:54 am    Post subject: Re: R-rud - Rsync reactive unattended downloader

bschnzl wrote:
First, programs are not security aware, they are well written. As my code is “proof of concept” it is not generally well written. I am no PERL guru. Feedback is appreciated.


Aye, this is true, to a certain extent. Any security-aware application should be well written, but not every well-written program is security-aware.

Quote:
To restrict files to portage, run rsync as nobody:nobody, and only give nobody access to the portage files. You might also do something like this (assume your network uses a 10.0.4.0/23 address space):
Code:
 iptables -A INPUT -s 10.0.4.0/23 -p tcp --dport 873 -m state --state NEW -j ACCEPT

This also assumes the policy on your INPUT chain is “DROP”, and you handle related and established state elsewhere.
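The "run rsync as nobody:nobody" part can also be done in the daemon itself rather than only at the firewall. A minimal rsyncd.conf sketch follows; the module name, path and connection limit are assumptions, while `uid`, `gid`, `use chroot`, `max connections`, `transfer logging`, `read only` and `hosts allow` are standard rsyncd.conf parameters. It only restricts the daemon; the file permissions still have to let nobody read the tree.

```
# /etc/rsyncd.conf (sketch; module name and path are assumptions)
uid = nobody
gid = nobody
use chroot = yes
max connections = 50
# log each transferred file, which a log-reading tool needs
transfer logging = yes

[gentoo-portage]
    path = /usr/portage
    comment = Gentoo Portage tree
    read only = yes
    hosts allow = 10.0.4.0/23
```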


Ok, this doesn't help when you have no clue who has which IP addresses and are sitting on a hostile network where restricting to specific subnets doesn't help (btw, I don't feel like selectively opening portage for 50 or so IPs on a class B network :)).

Quote:
IMHO Authenticating users in this case is overkill. These files are available on the open internet, via http. If they want them, they can use their browser. This is certainly up for discussion.


I truly wish that were the case. Unfortunately we pay for bandwidth (a lot) and we cannot afford to let anyone download on our account. By authenticating we can keep track of who downloads what and who initiates which downloads, and in general do some basic accounting. Also, at the university where I've deployed this, a user cannot simply point his browser at the file and get it, since all students are restricted to 100MB/year, or pay R2.00 per additional MB. That is a lot of money considering UK users pay about 30 pounds (approx. R330) per month for unlimited ADSL access (iirc), compared to the roughly R1000 (about 91 pounds) we pay per month for an ADSL line capped at 3GB.

Quote:
By using only rsync for gentoo updates, you are avoiding public protocols like http. You shouldn't have to be a webmaster to run a gentoo update server. Rsync is simple. Simple protocols are easier to track, and thus, secure. While I am in this vein, using NFS for this is just plain wrong. The protocol is unwieldy, it begs for central user management, and the write issues are unmanageable.


Ah, but not all distfiles are available via rsync. And some are only available over an international link from SA, which means that at home I'll get next to zero download rates (we have a 3GB per month cap on our ADSL, which only goes up to 512kbps anyway, after which all traffic, especially international, gets shaped like you wouldn't believe). There are, however, huge local ftp mirrors which still provide reasonable download rates (depending on the mirror, anything from 3 or 4 KB/s right up to about 30KB/s). These mirrors don't provide rsync, however, or at least I haven't managed to find one recently that does.

About the NFS: it's simply used to avoid having to emerge sync every machine separately, and it's exported read-only. What is the biggest network you have worked on? I'm not going to sync 400 machines separately. No way.
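For anyone wanting to copy the read-only NFS setup, it could be as small as the lines below. The server name, subnet and mount point are placeholders; `ro`, `sync` and `no_subtree_check` are standard Linux NFS export options, and `exportfs -ra` re-reads /etc/exports after editing. Because the export is read-only, each client needs DISTDIR pointed somewhere writable (or at the download server), and the tree is synced only once, on the server.

```
# /etc/exports on the server (subnet is a placeholder)
/usr/portage    10.0.0.0/16(ro,sync,no_subtree_check)

# /etc/fstab on each client (server name is a placeholder)
portage-server:/usr/portage    /usr/portage    nfs    ro    0 0
```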

Quote:
1)This is meant to be a background process to emerge(1). emerge already has generally apparent progress indications.


As is torpage: it hooks into FETCHCOMMAND and RESUMECOMMAND. The problem, however, is streaming that progress indicator back. I have an idea for doing this, but I need to make sure that it'll always "do the right thing (tm)" without killing the network.

Quote:
2)Games and the like can be protected by filesystem acls.


How? I want to protect the download, and since the download on the server will always happen as the same user, which needs write access to /usr/portage/distfiles, filesystem ACLs cannot do what I want, or at least not in any way that I know of. Not unless you predictively create immutable empty files in /usr/portage/distfiles with the names of all the games' distfiles, and that, imho, is a very ugly hack.

Quote:
3)Given that portage mirrors sync every thirty minutes, upstream host restrictions should not be needed. Regardless, this will reduce the monitoring to one host. Are you watching your logs?


I'm not in charge of our local rsync mirror, which, btw, only syncs once a day since it isn't an official rsync mirror. And actually, restricting based on the upstream ftp mirror does make sense when you're in an environment where bandwidth (especially international) is at a premium. I can't remember exactly what I wanted to do with this, but it can be useful.

Quote:
4)USE flags are managed on the local host. If you want to distribute them, make an ebuild with only make.conf, and post it on the appropriate server instance. This is outside the scope of the update server. It is a general admin task. Again, easier said than done, but the path is there.


Not the way torpage is implemented, where I tell the server that I'm looking for a specific package: it then initiates an emerge -f --nodeps =category/package-version, waits for it to terminate with success or failure and then reports back to the client. The new version, I suspect, will actually initiate the wget itself, merely using emerge -pf --nodeps to get a list of alternate mirrors (optional). It will use /usr/portage/mirrors to get a base list of mirrors. Additionally, each portage request will only fetch one file. I want a central torpage/distfiles server that can download for any client system, regardless of the client's USE flags. That was the original problem and why I wanted to send the USE flags from the client to the server: so that the server could calculate which distfiles to download. I suspect you understood that I wanted to force a set of USE flags down on the clients. Nope, I wanted to make the server flexible enough to handle all possible clients.
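A minimal sketch of that server side, assuming each request arrives as a single `=category/package-version` atom. The function name `fetch_atom` and the `EMERGE` override (handy for a dry run) are hypothetical; only `emerge -f --nodeps` comes from the description above. Checking the atom's shape first keeps a client from slipping an emerge option or a set name through to the server's emerge.

```sh
# Hypothetical server-side helper: fetch the distfiles for exactly one
# versioned atom, without pulling in its dependencies.
fetch_atom() {
    atom=$1
    # Accept only "=category/package-version"; anything else (options such
    # as --sync, unversioned names, sets) is refused with status 2.
    if ! printf '%s\n' "$atom" |
        grep -Eq '^=[A-Za-z0-9+_.-]+/[A-Za-z0-9+_.-]+-[0-9][A-Za-z0-9._-]*$'
    then
        echo "refusing request: $atom" >&2
        return 2
    fi
    # EMERGE can be set to "echo" for a dry run; it defaults to emerge.
    ${EMERGE:-emerge} -f --nodeps "$atom"
}
```

The exit status of `fetch_atom =net-misc/rsync-2.6.3` is emerge's own, so success or failure can be reported straight back to the client.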

Quote:
5)I believe rsync can handle file size restrictions. If it can't, it shouldn't be too hard to add it to this.


Again the new version should be able to do this as I'm now looking at the digest files and only fetching one file at a time.
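Since each digest line already carries the file size, a size cap could be checked against the package list before any download starts. A sketch, assuming the pkglist layout built by R-rud's metafind() and shell script (a digest path line followed by `MD5 <hash> <file> <size>` lines); the function names and the byte limit are hypothetical.

```sh
# Hypothetical helpers working on the package list (pkglist) format:
#   /usr/portage/<category>/<package>/files/digest-<package>-<version>
#   MD5 <md5sum> <distfile> <size-in-bytes>

# Print the recorded size of distfile $2 from package list $1 (empty if absent).
distfile_size() {
    awk -v want="$2" '$1 == "MD5" && $3 == want { print $4; exit }' "$1"
}

# Succeed only if distfile $2 is listed in $1 and is at most $3 bytes.
size_ok() {
    size=$(distfile_size "$1" "$2")
    [ -n "$size" ] && [ "$size" -le "$3" ]
}
```

For example, `size_ok /usr/portage/pkglist foo-1.0.tar.gz 52428800 || echo "too big or unknown"` would refuse anything over 50MB or missing from the tree.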

Jaco
_________________
There are 10 kinds of people in the world,
those who understand binary and who don't
bschnzl
n00b


Joined: 13 Mar 2005
Posts: 67

Posted: Tue Mar 22, 2005 11:06 am    Post subject: Re: R-rud - Rsync reactive unattended downloader

Hi all...

My goal in posting R-rud was to get some help in testing, and maybe direction. The concept is to limit communications of the internal clients to the rsync server, as Grimthorn suggested. The client downloads are controlled by normal emerge configuration settings. The server would use emerge -f. Any protocol could be used to pull the files to the local mirror. Hopefully, this will be a drop-in addition to an already running, automatically syncing server, as specified in Grimthorn's HOWTO. The learning curve would be easy.

The file specified on the command line should be the rsync log, and it should match the rsyncd configuration. Its location is configurable (or not) and affected by syslog.conf; for now, it is specified at run time. Non-distfiles requests are filtered out. As the distfiles are available from a specified rsync branch, as recommended in Grimthorn's HOWTO, the errors are limited to old packages. Those pointers should get updated by a later sync.
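The filtering step can be approximated with standard tools. This sketch assumes rsyncd logs each request as `rsync on <module>/<path> from <host> (<ip>)`, which is the form I would expect from the daemon, but check it against your own log before relying on it; the function name and arguments are hypothetical. Whole-tree syncs and other non-distfiles requests are dropped, and each requested distfile that is not present locally is printed once.

```sh
# Hypothetical R-rud-style report: distfiles that clients asked the local
# mirror for but that are not in the local distfiles directory.
#   $1 = rsyncd log (syslog or rsyncd's own log file)
#   $2 = local distfiles directory, e.g. /usr/portage/distfiles
missing_distfiles() {
    # Keep only single-file distfiles requests; "module/distfiles/ from"
    # (a directory listing) and tree syncs do not match and are dropped.
    sed -n 's|.* rsync on [^ ]*/distfiles/\([^/ ][^/ ]*\) from .*|\1|p' "$1" |
    sort -u |
    while read -r f; do
        [ -e "$2/$f" ] || echo "$f"
    done
}
```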

R-rud and the package list file could also be useful in equery, or emerge for identifying which file belongs to which package. I have looked through the reference list file, and noted that some files appear more than once. When they do, however, they usually have the same root package name. The code grabs this common package name, and sends it to the downloader. As you can see, the downloader, dostuff(), has some commented code that invokes emerge, with the local system parameters in place. Right now this code merely reports, it does not download. Other than generating a file list, no system changes will be made by running this code. If a similar file already exists, someone please clue me in.
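The lookup described here (which digest, and so which package, mentions a given distfile) can be sketched with awk over the same package list. Like R-rud, it stops at the first digest that mentions the file. The function name is hypothetical, and the path handling assumes the tree lives in /usr/portage.

```sh
# Hypothetical lookup: which category/package references distfile $2 in the
# package list $1 (digest path lines followed by "MD5 <hash> <file> <size>").
pkg_for_distfile() {
    awk -v want="$2" '
        /^\// { path = $0; next }          # remember the current digest path
        $3 == want {                       # hash line naming the wanted file
            sub(/^\/usr\/portage\//, "", path)
            sub(/\/files\/digest-.*$/, "", path)
            print path                     # e.g. net-misc/rsync
            exit
        }' "$1"
}
```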

Before I get in any more trouble, some other assumptions should be enumerated. First, judging from the mirror hosts I connect to, a single rsync server should be able to handle tens of thousands of internal updating clients. rsync is very lightweight, and appropriate for files that are distributed over the open internet. This server would not participate in any central user management. It would probably live on a semi-trusted screened subnet, as it connects to the internet automatically. A normal user would log in and become root to deal with maintenance issues. This would serve to reduce the target profile of this box. Keep it simple where you can!

Second, bandwidth between the updating clients and this server would be provided by the owning organization (i.e. an ethernet connection). I thought that was understood in deploying a local mirror. rsync logging contains size and ip address info, if accounting is an issue. Besides, rsync has some of the lowest overhead seen on any service.
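For accounting, something as small as this could total the bytes sent to each client. It assumes `transfer logging = yes` and rsyncd's usual default transfer log format (`%o %h [%a] %m (%u) %f %l`, i.e. operation, host, [address], module, (user), file, length), which is worth confirming against your rsyncd.conf; the function name is hypothetical.

```sh
# Hypothetical accounting report: total bytes sent per client address.
# Expects transfer-log lines such as
#   ... rsyncd[1234]: send client1 [10.0.4.5] gentoo-portage () distfiles/foo.tar.gz 12345
bytes_per_client() {
    awk '
        / send / {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^\[[0-9.]+\]$/) break    # the "[address]" field
            if (i <= NF) {
                ip = substr($i, 2, length($i) - 2)
                total[ip] += $NF                   # last field is the length
            }
        }
        END { for (ip in total) printf "%s %.0f\n", ip, total[ip] }' "$1" |
    sort
}
```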

Third, syncs are cron'd. I am using fcron, which uses root's crontab as the system crontab. Regular users are not given root! RPC is vulnerable regardless of how the filesystems are exported. It should not extend to systems that will connect to the internet without direct user initiation (yes, this includes SMB and NCP too). Configurations and network equipment can protect RPC from IP ranges, but what of boxes that connect using those RPC services? The real issue in dumping RPC is keeping all those distfiles on each local box. Are they really needed there? Do we fix RPC, or buy bigger drives? Which has a better chance of success, Right Now?
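For reference, a nightly entry in root's crontab running the sync and package-list script posted earlier in the thread might look like the line below. The script path, log path and schedule are placeholders; the five-field syntax is the standard one, which, as far as I know, fcron's fcrontab also accepts.

```
# root's crontab: sync and rebuild the package list every night at 03:30
# (script and log paths are placeholders)
30 3 * * *    /usr/local/sbin/portage-sync.sh > /var/log/portage-sync.log 2>&1
```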

My understanding is that portage package sets are easily expanded. Thus, the possibility exists for insertion of individual files on the updating clients from the portage server. This would provide functionality on the order of ZenWorks or SMS. I have not tested this, but I don't see it being feasible on web-cache solutions.

Rsync was not my first choice. Of the choices offered by the community, rsync is the simplest. If someone really wants to do bad stuff, he a) probably won't be doing it from his own machine, nullifying the money barrier, and b) can find the stuff we are serving here on the network elsewhere. Besides, it's not like nobody (the user that rsyncd should run as) is a member of the portage group. Our systems are complex enough. Ya gotta keep it simple where you can.

Of course, currently, all R-rud does is issue a report on a logfile. I was hoping folks would run it and let me know if it missed any files, downloaded any files that were already there, or choked on problems. All I can do at this point is add you to the comments / README file!

Thanks for your help...
_________________
Persistance Pays
yetano
n00b


Joined: 02 Mar 2005
Posts: 2

Posted: Tue Mar 22, 2005 8:25 pm

jkroon wrote:
Current features are basically as described above:

1) Restricts downloadable files to those referenced in the portage tree.
...

Features I would like to add:

...
2) acl to certain sections (ie, to disallow for downloading games ...)
...

What about RSYNC_EXCLUDEFROM (see make.conf(5))? This way the section isn't in the local portage tree at all, thus no file referenced there can be downloaded.
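A sketch of that suggestion: point RSYNC_EXCLUDEFROM in make.conf (/etc/make.conf on systems of this era, /etc/portage/make.conf today) at a file of rsync exclude patterns. The file name and the pattern below are examples only. Note that rsync does not delete excluded files, so category directories already present in the tree have to be removed once by hand.

```
# make.conf
RSYNC_EXCLUDEFROM="/etc/portage/rsync_excludes"

# /etc/portage/rsync_excludes (one rsync pattern per line)
# A leading "/" anchors the pattern at the top of the tree,
# so this drops every games-* category directory.
/games-*/
```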
All times are GMT
Page 4 of 6