Gentoo Forums
run multiple diskless clients from one master installation
mortonP
n00b
n00b


Joined: 22 Dec 2015
Posts: 34

Posted: Mon Dec 30, 2019 7:57 pm    Post subject: run multiple diskless clients from one master installation

TL;DR: I have a master Gentoo installation, exported via NFS to diskless clients. How do I replicate it efficiently for multiple clients?

Long version: After countless tries and reboots I finally have a master Gentoo installation on a central server, and I can boot one diskless client from it. The available docs do help (https://wiki.gentoo.org/wiki/Nfs-utils , https://wiki.gentoo.org/wiki/Diskless_nodes , ...), but it took multiple sources, many reboots and some guessing to get all the pieces working together.

So now I have on the master node:

Code:
$ cat /etc/exports
/diskless             *(insecure,rw,async,no_subtree_check,no_root_squash,no_all_squash,crossmnt,fsid=0)
/diskless/pc1         *(insecure,rw,async,no_subtree_check,no_root_squash,no_all_squash)


...and on the client as kernel params and fstab:
Code:
BOOT_IMAGE=/boot/kernel-5.4.6 ip=dhcp root=/dev/nfs nfsroot=192.168.1.1:/diskless/pc1

$ cat /etc/fstab
192.168.1.1:/pc1              /         nfs4             rw                  0 0


The installation on the master that is exported via NFS is a full filesystem. I chroot into it for package updates; the clients are only meant for users.
Now I want to run not only one, but multiple clients from this master install and I'm not sure how to scale this properly.

So let's go through it, what does one client need?:

/dev, /proc and /sys appear to be client-local already, as the NFS logs on the master say:
Code:
rpc.mountd[15714]: Cannot export /diskless/pc1/dev, possibly unsupported filesystem or fsid= required
rpc.mountd[15714]: Cannot export /diskless/pc1/proc, possibly unsupported filesystem or fsid= required
rpc.mountd[15714]: Cannot export /diskless/pc1/sys, possibly unsupported filesystem or fsid= required


/boot - not needed on clients? Bootloader, kernel and initramfs are diskless
/bin - mount ro from master, no need for rw
/etc - mount ro from master, but maybe patching/override of certain files that are client instance specific is needed?
/home - mount rw from master
/lib & /lib64 - mount ro from master, no need for rw
/opt - currently empty, but I guess also ro
/root - not needed on clients?
/run - appears to be also an in-memory tmpfs
/tmp - should be fixed to be client instance specific, currently a global mount
/sbin - mount ro from master, no need for rw
/usr - mount ro from master, no need for rw
/var - seems to be the most complicated. Some subdirectories ro, some rw and client-specific (/var/log)?

So the directories on the master should probably be something like /diskless/$IP/..., with some directories inside replicated via bind mounts and some directories unique per client. Does that make sense?

Does anyone run such a setup and have some experience with scaling it up without massive replication or bind mounts on the master?
Any pointers to good docs for such a setup are appreciated, for example a demo configuration for the directories above. The Diskless_nodes Gentoo docs don't answer everything.

OpenRC-based installation, no systemd.
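
The /diskless/$IP/... layout proposed above would need one export line per client. A minimal sketch of generating those lines, reusing the export options from the existing /etc/exports; the one-directory-per-client layout and the client names (pc1, pc2, ...) are assumptions, not a tested setup:

```shell
#!/bin/sh
# Sketch: print /etc/exports lines for several diskless clients.
# ASSUMPTION: one directory per client under <base>/<name>; names are hypothetical.
OPTS='insecure,rw,async,no_subtree_check,no_root_squash,no_all_squash'

gen_exports() {
    base="$1"; shift
    # NFSv4 pseudo-root, same options as the existing first export line
    printf '%s\t*(%s,crossmnt,fsid=0)\n' "$base" "$OPTS"
    # one export per client directory
    for client in "$@"; do
        printf '%s/%s\t*(%s)\n' "$base" "$client" "$OPTS"
    done
}

# Example: gen_exports /diskless pc1 pc2 pc3
```

The output is only printed, so it can be reviewed before it is written to /etc/exports and reloaded with exportfs -ra.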
NeddySeagoon
Administrator
Administrator


Joined: 05 Jul 2003
Posts: 45383
Location: 56N 3W

Posted: Mon Dec 30, 2019 8:18 pm

mortonP,

It's been a while.

Root (/) needs to be separate for each diskless node.
The machine's identity is there. With /etc/mtab a symlink into /proc, root can be read-only, but the root user will have no read/write home dir.
You will need /root too, as it's the root user's home dir.

As you say, /usr can be shared ro.
/var needs to be writable and per client unless you log to /dev/null; that's not a good idea while you are debugging.
You could put /var/log into tmpfs on the clients, but logs would vanish on power off.

/tmp can be, and usually is, tmpfs.

/sbin, /root, /etc and /opt are all free with root.
/opt can be shared ro if you want to, but if it's empty ...
Take care with /lib. It can fill up with junk.
Prune linux-firmware. It's over 512 MB on its own if you have it all. Modules for old kernels accumulate in /lib/modules too.
Keep root small.
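
To see which module directories have piled up, something like the following could be run against the client root on the server. It is only a sketch: the root path and the kernel release to keep are arguments you supply, and it lists candidates without deleting anything.

```shell
#!/bin/sh
# List kernel module directories that do not belong to the kernel you keep.
# ASSUMPTION: $1 is the client root on the server, $2 is the kernel release
# to keep (e.g. what `uname -r` prints on a client). Nothing is deleted.
stale_modules() {
    root="$1"; keep="$2"
    for d in "$root"/lib/modules/*/; do
        [ -d "$d" ] || continue          # glob matched nothing
        ver=$(basename "$d")
        [ "$ver" = "$keep" ] || printf '%s\n' "${d%/}"
    done
}

# Example: stale_modules /diskless/pc1 "$(uname -r)"
```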
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
szatox
Veteran
Veteran


Joined: 27 Aug 2013
Posts: 1872

Posted: Mon Dec 30, 2019 9:08 pm

If you have a fast network and drives, you can keep using a read-only NFS all day long. If you have some spare RAM, you can boot over NFS, copy your rootfs to the diskless machine's RAM and then detach from NFS, freeing those resources for other tasks (or for booting more nodes).
Either way, using a compressed FS helps. Really, it helps a lot, be it by reducing "wasted" RAM or by saving 80% of the IO required to maintain a single diskless client. This is the low-hanging fruit that gives you a lot of bang for your buck, so just mksquashfs it as much as you can. (Note: squash it _once_ as much as you can, not squash it as many times as you can :lol: )

Network boot typically depends on DHCP anyway, so you can use it to assign IPs and hostnames to the diskless machines.
Going RAM-only, keeping root small helps a lot. You don't need the portage tree, distfiles, packages, gcc etc. in a read-only image anyway; you just create a new image and reboot instead of upgrading the running image.
Adding an overlayfs to the mix allows you to turn a compressed, read-only image into a compressed FS that can take a few non-compressed and non-persistent writes. I'm not sure whether or not it's possible to host the upper layer of overlayfs on NFS; it doesn't work with cephfs due to insufficient locking. Still, if you do need persistent logs, you would probably be better off with a centralized log collecting service (fed with logstash, rsyslog or something like that).
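
A rough sketch of the "squash it once, leave out the build-only data" step. The function only prints the mksquashfs command so it can be checked first; the excluded paths are assumptions about where the portage tree and distfiles live and will differ between installs.

```shell
#!/bin/sh
# Print (not run) a mksquashfs command for a client root image.
# ASSUMPTIONS: the excluded paths are examples of build-only data; adjust
# them to wherever your portage tree and distfiles actually live.
squash_cmd() {
    src="$1"; img="$2"
    # -comp xz: xz compression; -noappend: overwrite instead of appending;
    # -e: exclude list, relative to the source directory
    printf 'mksquashfs %s %s -comp xz -noappend -e' "$src" "$img"
    for skip in usr/portage var/cache/distfiles var/tmp/portage; do
        printf ' %s' "$skip"
    done
    printf '\n'
}

# Example: squash_cmd /diskless/master /diskless/images/root.sqfs
```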
mortonP
n00b
n00b


Joined: 22 Dec 2015
Posts: 34

Posted: Tue Dec 31, 2019 11:30 pm

Thank you for the suggestions.
Been playing and investigating a little bit more.

The initial motivation to set up everything diskless was to keep multiple clients up to date easily. I had no previous experience with NFS, and I learned a lot about NFS and the boot process in this effort. Now I'm not so sure any more. One client is easy; keeping multiple client roots synchronized on the master is harder, especially for updates: if all clients run from one master chroot, they all have to update at the same time. Having all files on NFS is also not that fast. It's kinda ok, but....

The squashfs solution looks interesting. Root is about 2GB in size, so with compression maybe about 1GB. Downloading a ~1GB root over a 1G network is ok, a few seconds. What I have not yet understood: the initramfs would access such a squashfs image from /dev/nfs/$IP/squashfs.img, and in the initramfs this would then be mounted as root with everything else layered on top. But how? The initramfs would need to keep an in-memory copy "somewhere below" the root then? I'm confused how this is possible (with a squashfs from e.g. a CD or USB boot stick the image is always accessible from /dev/sdX?)

Thinking further, I guess the tradeoffs still suggest having a small local disk for root: upon boot the initramfs checks that /dev/nfs/$IP/timestamp is equal to /dev/disk/by-label/root/timestamp. If not, wipe the local root filesystem, replace it with a fresh copy from /dev/nfs/rootfs.tar.xz and reboot. If yes, the local root is up to date: switch to this root and only mount /home, /root and /var from NFS. Bonus points: this also solves the update problem, because clients keep local root copies and the master can update at any time.
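
The timestamp check described above could be sketched like this. It assumes both sides carry a small stamp file; the file locations are placeholders passed in as arguments, not real paths.

```shell
#!/bin/sh
# Decide whether the local root copy matches the master.
# ASSUMPTION: both sides carry a small file holding a build stamp; the
# file locations are hypothetical and passed in as arguments.
root_is_current() {
    local_stamp="$1"; master_stamp="$2"
    # a missing stamp on either side counts as "not current"
    [ -r "$local_stamp" ] && [ -r "$master_stamp" ] || return 1
    cmp -s "$local_stamp" "$master_stamp"
}

# Usage inside an init script (paths are placeholders):
#   if root_is_current /mnt/local/timestamp /mnt/nfs/timestamp; then
#       echo "local root is up to date"
#   else
#       echo "refresh needed"   # wipe, unpack rootfs.tar.xz, reboot
#   fi
```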

Surely someone has thought of something like this already? Where to insert such a custom update script in Gentoo's initramfs?

Happy new year! :-)
szatox
Veteran
Veteran


Joined: 27 Aug 2013
Posts: 1872

Posted: Wed Jan 01, 2020 11:19 am

mortonP, I have created my own initramfs to suit my needs. It's actually a pretty good exercise of its own.*
Now, with NFS-shared squashfs you can use genkernel. It does support liveCD mode and allows for pxe boot. A few years ago it just worked.
You can do a rolling update by creating a new sqfs and updating the bootloader's configuration (usually pxelinux).

The other way, which detaches from NFS, copies all data into the initramfs, creates the storage stack with overlayfs (lowerdir with the mounted sqfs, upperdir for changes, workdir for overlayfs's inner workings, and the mountpoint), mounts /dev, /proc and /sys on top of the overlayfs mountpoint, and executes switch_root into the new mountpoint, effectively losing any mapping to the original root (you can think of it as an intentional memory leak).
After playing with this setup for a while I think pivot_root would be a better option than switch_root, but since it was just a toy for me and I was busy with other matters, I've never patched that init script inside my custom initramfs.


* it's EASY. Start with just busybox in a cpio archive and /init file containing just one command: /bin/sh. Set things up manually, and write down your commands, so you can put them in a script you will use with a second version.
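
The starting point described in the footnote might look roughly like this. It is a sketch under assumptions: a statically linked busybox binary is passed in as the second argument, and the tree location and archive name are placeholders.

```shell
#!/bin/sh
# Build the skeleton of a minimal initramfs tree.
# ASSUMPTION: $2 is a statically linked busybox binary; the tree location
# and the archive name below are placeholders.
make_tree() {
    tree="$1"; busybox="$2"
    mkdir -p "$tree/bin" "$tree/dev" "$tree/proc" "$tree/sys" "$tree/mnt"
    cp "$busybox" "$tree/bin/busybox"
    ln -sf busybox "$tree/bin/sh"
    # /init is the first thing the kernel runs; here it only opens a shell
    printf '#!/bin/sh\nexec /bin/sh\n' > "$tree/init"
    chmod +x "$tree/init"
}

# Pack the tree in the "newc" cpio format used for initramfs archives:
#   make_tree ./initramfs /bin/busybox
#   (cd ./initramfs && find . | cpio -o -H newc | gzip -9) > initramfs.cpio.gz
```

If the shell comes up without a console, the tree may also need a /dev/console node (mknod -m 600 dev/console c 5 1, run as root inside the tree) before packing.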
mortonP
n00b
n00b


Joined: 22 Dec 2015
Posts: 34

Posted: Wed Jan 01, 2020 11:25 pm

I think an initramfs is actually not needed. My argument is like this:
In a boring, normal installation with a local harddisk with ext4 partition you pass root=/dev/foo to the kernel and the kernel starts init from this partition and everything is fine (last time I tried, some time ago). This requires of course that hardware and fs drivers are compiled into the kernel. You do need an initramfs for the fancy stuff, encrypted partitions, LVM, raid, boot splash image, etc.
A single kernel image would also make EFI stub or PXE booting simpler.

I have already compiled all required drivers into the kernel image (I forgot to integrate the kernel modules and the system still came up to the login prompt...). There is only one reason why I need an initramfs, the Realtek network driver needs a firmware blob. So I would only need an initramfs with /lib/firmware, attached to the kernel image, and this image would boot directly from the partition root=/dev/nfs, just like a normal system. And none of the other scripts in the initramfs are actually needed?

Ok, the system boots, looks into /etc/fstab and mounts / rw from the path specified there. I verified this, patching "ro" for / in fstab makes the startup scripts complain because, well, everything is read-only, but it does come up.
So /etc/init.d/root needs to be improved/replaced: "Am I booted diskless?" If yes, assemble / differently.

Either I reverse engineer genkernel, or do you have example code for how to assemble a layered filesystem stack?
Ionen
l33t
l33t


Joined: 06 Dec 2018
Posts: 691

Posted: Wed Jan 01, 2020 11:46 pm

mortonP wrote:
There is only one reason why I need an initramfs, the Realtek network driver needs a firmware blob.
You can build firmware files into the kernel at build time. I have this myself for example:
Code:
# Firmware loader
#
CONFIG_FW_LOADER=y
CONFIG_EXTRA_FIRMWARE="rtl_nic/rtl8168h-2.fw intel/ibt-17-16-1.sfi intel/ibt-17-16-1.ddc"
CONFIG_EXTRA_FIRMWARE_DIR="/lib/firmware"
If it's something that's gonna be loaded every boot anyway, I feel there's no major reason not to on a non-generic kernel. Microcode can be loaded in a similar fashion if wanted.
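
Before rebuilding, it can help to check that every blob named in the config actually exists in the firmware directory, since a missing file fails the kernel build. A small sketch of such a check; the .config path is whatever you pass in, and the function name is made up for this example:

```shell
#!/bin/sh
# Check that every blob named in CONFIG_EXTRA_FIRMWARE exists on disk.
# ASSUMPTION: $1 is a kernel .config file containing both options.
check_extra_firmware() {
    config="$1"
    # pull the quoted values out of the two config lines
    dir=$(sed -n 's/^CONFIG_EXTRA_FIRMWARE_DIR="\(.*\)"$/\1/p' "$config")
    list=$(sed -n 's/^CONFIG_EXTRA_FIRMWARE="\(.*\)"$/\1/p' "$config")
    missing=0
    for fw in $list; do
        if [ ! -f "$dir/$fw" ]; then
            echo "missing: $dir/$fw"
            missing=1
        fi
    done
    return "$missing"
}

# Example: check_extra_firmware /usr/src/linux/.config
```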
mortonP
n00b
n00b


Joined: 22 Dec 2015
Posts: 34

Posted: Thu Jan 02, 2020 11:20 am

I embedded the firmware in the kernel - thank you for the howto - and as expected, the system boots via nfs without an initramfs! and much faster! :-)
https://i.imgur.com/lJotKFb.png

So the only open problem to solve for multiple clients is to mount a rw tmpfs on top of the ro / provided via NFS, and /home rw separately (just a fstab entry?)

Setting / to ro in fstab results in: https://i.imgur.com/4gUHlbN.png
The USB stick used for boot is no longer detected :-( ... is the boot too fast to let it settle, or is it caused by the ro fs?
But anything until "remounting root fs read/write" "remounting filesystems" seems to be working.

So my idea of solely patching /etc/init.d/root still looks to be the correct approach...?
mortonP
n00b
n00b


Joined: 22 Dec 2015
Posts: 34

Posted: Thu Jan 02, 2020 1:29 pm

Putting the rw mounts in fstab seems to work on the first try:
Code:
192.168.1.1:/master             /         nfs4             ro                  0 0
192.168.1.1:/master/home        /home     nfs4             rw                  0 0
192.168.1.1:/master/root        /root     nfs4             rw                  0 0
192.168.1.1:/master/var         /var      nfs4             rw                  0 0


This needs "netmount" to move from runlevel "default" to "boot".
But "netmount" depends on "nfsclient".
"nfsclient" depends on [...] network available, and that triggers dhcpcd.
But dhcpcd complains because /var/lib/dhcpcd/* at that point in time is ro because netmount is not done, cannot write the lease info.
And a proper dhcp client running is needed, the IP leased by the kernel expires and needs to be renewed after some time.

So we have a chicken-and-egg problem here;
just writing it in fstab would have been too easy...
szatox
Veteran
Veteran


Joined: 27 Aug 2013
Posts: 1872

Posted: Thu Jan 02, 2020 7:10 pm

Well.... Actually I do have some code. It's old, somewhat ugly, could probably use some cleanup, but it should still work.

Things needed to get a bootable initramfs with emerge:
https://forums.gentoo.org/viewtopic-p-7931972.html#7931972
/init inside initramfs
Code:

# cat initramfs/init
#! /bin/sh

parse_params () {
        for OPT in  "${@}"
        do
                case $OPT in
                *.*=* ) continue ;;
                * )
                        VNAME="${OPT%%=*}"
                        VVAL="${OPT#*=}"
                        echo result: "${VNAME}" "${VVAL}"
                        export "${VNAME}"="${VVAL}"
                ;;
                esac
        done
}


get_nics (){
        ip -o a | awk '{print $2}' | egrep -v ':|^lo[[:digit:]]*$' |   while read line
        do
                echo "${line}"
        done
}


main () {
NEW_ROOT="/mnt/newroot"
NEW_OVERLAY="/mnt/overlays/root"
OLD_OVERLAY="/mnt/overlay"


        mount -t proc proc /proc
        mount -t devtmpfs devtmpfs /dev/
        mount -t sysfs sysfs /sys
        echo "mounted kernel interfaces"
        mount
        echo "parse cmd line"
        parse_params $(cat /proc/cmdline)
        if [ -n "$ip" ]
        then    echo configuring network ; sleep 5
                case $ip in
                        dhcp ) for NIC in $(get_nics)
                                do echo requesting IP for $NIC
                                        echo "### trying to get IP address for $NIC"
                                        udhcpc -q -i $NIC #; sleep 2
                                        ip a show dev $NIC
                                done ;;
                        * ) echo static IP unimplemented ;;

                esac
        fi

        echo creating mountpoints in /mnt
        for DIR in cdrom livecd overlay newroot
        do mkdir -p "/mnt/${DIR}"
        done
        mount -t tmpfs -o size=20% tmpfs /mnt/overlay
        for dir in upper work cdrom livecd newroot
        do      mkdir -p "${OLD_OVERLAY}/${dir}" && echo "created: ${dir}"
        done
        if [ -n "${nfsroot}" ] && [ -n "${loop}" ]
        then    echo mount nfsroot
                mkdir -p /mnt/nfs/
                mount -t nfs -o nolock "${nfsroot}" "/mnt/nfs" && echo nfs ok
                cp "/mnt/nfs/${loop}" "${OLD_OVERLAY}/cdrom" && echo copy OK
                umount /mnt/nfs && echo umount nfs ok
                mount -o loop "${OLD_OVERLAY}/cdrom/${loop}" "${OLD_OVERLAY}/livecd" && echo loop ok
                mount -t overlay overlay -o "lowerdir=${OLD_OVERLAY}/livecd,upperdir=${OLD_OVERLAY}/upper,workdir=${OLD_OVERLAY}/work" "${NEW_ROOT}" && echo overlay ok
        fi
        mkdir -p "${NEW_ROOT}/${NEW_OVERLAY}"
        mount -o move "${OLD_OVERLAY}" "${NEW_ROOT}/${NEW_OVERLAY}"

        # POSIX test; [[ ]] is not guaranteed to exist in /bin/sh
        if [ "$debug" = true ]
        then /bin/sh
        fi
        exec switch_root "${NEW_ROOT}" /sbin/init

}

main "$@"

A trick that pulls your hostname out of thin air (well, one way to do that, could also be done with dhcp options instead)
Code:
# cat livepxe.sqfs.mpt/etc/conf.d/hostname
hostname="livepxe-$( ip -o a | sed -n -e '/inet[[:blank:]]127/ ! { s/^.*inet[[:blank:]]*\([[:digit:]]\{1,3\}\.\)\{3\}\([[:digit:]]*\).*$/\2/p }' | head -n1)"
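
The sed expression above takes the last octet of the first non-loopback IPv4 address. An arguably more readable variant of the same idea, as a sketch: the function name is made up, and it would have to be defined in the same file before it is used.

```shell
#!/bin/sh
# Print the last octet of the first non-loopback IPv4 address.
# Reads `ip -o a` style lines on stdin.
# ASSUMPTION: field 3 is the address family and field 4 is addr/prefix,
# as in the one-line output format of iproute2.
last_octet() {
    awk '$3 == "inet" && $4 !~ /^127\./ {
            split($4, cidr, "/")           # 10.0.0.23/24 -> 10.0.0.23
            n = split(cidr[1], oct, ".")
            print oct[n]
            exit
        }'
}

# Usage: hostname="livepxe-$(ip -o a | last_octet)"
```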


And boot options
Code:
ip=dhcp root=/dev/ram0 cdroot=1 real_root=/dev/nfs nfsroot=10.0.0.1:/exports/pxelive/ initrd=gentoo-amd64/initramfs loop=livepxe.sqfs looptype=squashfs net.ifnames=0


As mentioned earlier, pivot_root would probably be superior to switch_root, but I was too lazy to upgrade that part.
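
One note on parse_params in the init script above: it exports every key=value pair from the kernel command line. If only a few keys are needed (ip, nfsroot, loop, debug), a lookup helper is an alternative. This is a sketch, not a replacement that has been tested in that initramfs, and it assumes values contain no spaces.

```shell
#!/bin/sh
# Look up a single key=value pair from a kernel command line string.
# ASSUMPTION: values contain no spaces; the command line is passed as $2
# so the function can be tried outside an initramfs.
cmdline_get() {
    key="$1"
    for opt in $2; do
        case "$opt" in
            "$key"=*) printf '%s\n' "${opt#*=}"; return 0 ;;
        esac
    done
    return 1    # key not present
}

# In the initramfs:
#   nfsroot=$(cmdline_get nfsroot "$(cat /proc/cmdline)")
```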
mortonP
n00b
n00b


Joined: 22 Dec 2015
Posts: 34

Posted: Fri Jan 03, 2020 11:16 pm

I think I did it :-)

Self-contained kernel image, with embedded boot options:

Code:
init=/init.diskless ip=dhcp root=/dev/nfs nfsroot=192.168.1.1:/master,tcp,vers=4.2


On the server, a chroot install in /diskless/master exported via NFS:

Code:
# /etc/exports: NFS file systems being exported.  See exports(5).
/diskless               *(insecure,rw,async,no_subtree_check,no_root_squash,no_all_squash,crossmnt,fsid=0)
/diskless/master        *(insecure,rw,async,no_subtree_check,no_root_squash,no_all_squash)
/diskless/master/home   *(insecure,rw,async,no_subtree_check,no_root_squash,no_all_squash)
/diskless/master/root   *(insecure,rw,async,no_subtree_check,no_root_squash,no_all_squash)


/etc/fstab for clients:

Code:

192.168.1.1:/master/home        /home     nfs4             rw                  0 0
192.168.1.1:/master/root        /root     nfs4             rw                  0 0


resulting after successful client boot in:
Code:
Filesystem                 Size  Used Avail Use% Mounted on
overlay                    1,6G  1,1M  1,6G   1% /
devtmpfs                    10M     0   10M   0% /dev
shm                        3,9G     0  3,9G   0% /dev/shm
192.168.1.1:/master/home   834G   39G  796G   5% /home
192.168.1.1:/master/root   834G   39G  796G   5% /root
tmpfs                      784M  460K  784M   1% /run


these should be dropped at end of boot with umount -l /mnt/old_root/
Code:
192.168.1.1:/master        834G   39G  796G   5% /mnt/old_root
tmpfs                      1,6G  1,1M  1,6G   1% /mnt/old_root/mnt


and the custom /init.diskless that does the magic is:

Code:
#!/bin/sh

echo "=== diskless init"
mount -t proc proc /proc
# mount -t devtmpfs devtmpfs /dev/  # already mounted?
mount -t sysfs sysfs /sys

HOSTNAME=$(/bin/hostname)
HOSTIP=$(/bin/hostname -i)
echo "= ${HOSTNAME} at ${HOSTIP}"

echo "= prepare mountpoints..."
mount -t tmpfs -o size=20% tmpfs /mnt
mkdir -p /mnt/upper
mkdir -p /mnt/work
mkdir -p /mnt/newroot

echo "= merge layers..."  # Note: NFS lowerdir fs must be "noacl" mounted at host for NFSv4!
mount -t overlay overlay -o lowerdir=/,upperdir=/mnt/upper,workdir=/mnt/work /mnt/newroot

echo "= switch to new root..."
umount /proc /sys
mount --move /dev /mnt/newroot/dev   # active, so move it

mkdir -p /mnt/newroot/mnt/old_root
cd /mnt/newroot
/sbin/pivot_root . /mnt/newroot/mnt/old_root

echo "=== done!"
exec /sbin/init
echo "BUG! init failed!"

# TODO: umount -l /mnt/old_root at the end of boot (requires NFS up)


So only one kernel image file that can be booted in any way (Grub, EFI Stub or PXE).
No initramfs.
Connects to one chroot install on NFS server.
Custom init.diskless overlays whole / with tmpfs for rw first, then normal system boots with OpenRC.
/home and /root mounted rw via fstab.
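
For the open TODO (dropping /mnt/old_root at the end of boot), one option might be a script in /etc/local.d, which OpenRC's "local" service runs late in the boot. A sketch only: the file name drop-old-root.start is made up, and it is untested on this setup.

```shell
#!/bin/sh
# Detach the old NFS root once the system is up.
# ASSUMPTION: installed as /etc/local.d/drop-old-root.start (hypothetical
# file name) and made executable.
is_mounted() {
    # $1 = mountpoint, $2 = mount table (defaults to /proc/mounts)
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

drop_old_root() {
    # lazy unmount: detaches now, cleans up when no longer busy
    if is_mounted "$1" "$2"; then
        umount -l "$1"
    fi
}

# drop_old_root /mnt/old_root
```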

For updates, either do them in chroot on server,
or by starting normally with /sbin/init and skipping the new /init.diskless, then one client comes up with rw /.

Hmmm... now that my chroot is rather messy from all the testing,
I should start from a fresh stage3 chroot and record every modification step by step....

THANK YOU all for the help and suggestions!
Gentoo and its community is awesome! :-)
mortonP
n00b
n00b


Joined: 22 Dec 2015
Posts: 34

Posted: Sat Jan 04, 2020 1:56 pm

"umask ignored on NFSv4.2 mounts" - https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/1779736

This happens because ACLs are on by default in v4.2,
but overlay needs "noacl",
so yes, vers=4.1 on all mounts is the workaround.
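
To confirm which version each mount actually ended up with, the "vers=" option in /proc/mounts can be checked on a running client. A small sketch (the function name is made up):

```shell
#!/bin/sh
# Report NFS mounts whose version differs from the wanted one.
# ASSUMPTION: input is in /proc/mounts format, where NFS entries list the
# version in use as a "vers=" mount option.
nfs_vers_mismatch() {
    want="$1"
    awk -v want="$want" '$3 ~ /^nfs4?$/ {
            n = split($4, o, ",")
            v = "unknown"
            for (i = 1; i <= n; i++)
                if (o[i] ~ /^vers=/) v = substr(o[i], 6)
            if (v != want) print $2 " uses vers=" v
        }' "${2:-/proc/mounts}"
}

# Example: nfs_vers_mismatch 4.1
```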