Saturday, 26 May 2012

LXC - Linux Container

LXC

Last time I checked, LXC (Linux Containers) did not work well for me, though I probably had not looked deeply enough. The current LXC documentation is scarce, and I think that is one of the reasons its adoption is slow.

Now I am replacing vservers with LXC and it looks good. It provides most of what vserver provided without the need to patch the kernel, and on top of that, layer 2 network separation is nice to have.

I noted some important things.

To avoid surprises when starting the guest, make sure the config file drops enough privileges (capabilities).

Also, the sys mount in the fstab needs to be read-only.

Step by step to build a working CT

First get the lxc tools source and build it. I had many problems with the distro-shipped packages, so to be sure things work use version 0.7.5, which I have tested.

I have binaries built for x86_64 and i686 ready; if someone needs them just let me know. This build uses the prefix /lxc for all files. The lxc binaries are in /lxc/bin/.

We will store the root directory of each CT in /lxc/guest/CT_NAME and its fstab file in /lxc/etc/CT_NAME.fstab.

We will use the bridge vmbr0 for the CT network. If you want the CT to use a subnet, bridge the corresponding interface to vmbr0. To make this persist across reboots, consult the networking documentation for the distro you are running. Here is an example for an internal subnet 5.5.5.0/24:

export PATH=/lxc/bin:$PATH
brctl addbr vmbr0
ifconfig vmbr0 5.5.5.1/24

# Mount the cgroup filesystem if it is not mounted yet. It can end up mounted at two different mount points, so make sure there is only one.

mount | grep cgroup
if [ ! "$?" == '0' ]; then
grep 'none /cgroup cgroup defaults 0 0' /etc/fstab >/dev/null 2>&1
if [ ! "$?" == '0' ]; then echo "none /cgroup cgroup defaults 0 0" >> /etc/fstab; fi
mkdir -p /cgroup
mount /cgroup
fi
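
The "only one mount point" check can be done by reading the mounts table rather than by eye. This is a minimal sketch, assuming the single-hierarchy cgroup layout used in this post; the function names are mine and are not part of the lxc tools.

```shell
# Count the mount points whose filesystem type is cgroup.
# $1: a mounts table in /proc/mounts format (defaults to /proc/mounts).
count_cgroup_mounts() {
    awk '$3 == "cgroup" { n++ } END { print n + 0 }' "${1:-/proc/mounts}"
}

# List those mount points, one per line.
list_cgroup_mounts() {
    awk '$3 == "cgroup" { print $2 }' "${1:-/proc/mounts}"
}
```

If count_cgroup_mounts prints more than 1, list_cgroup_mounts shows which mount points to look at before unmounting the extra one.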


function genmac() {
local hexchars="0123456789ABCDEF"
local end=$( for i in {1..6} ; do echo -n ${hexchars:$(( $RANDOM % 16 )):1} ; done | sed -e 's/\(..\)/:\1/g' )
echo 00:60:2F$end
}
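
Since the generated address ends up in both the CT config and the guest network files, it can be worth checking it before use. The helpers below are a sketch of my own (not part of lxc): one checks the six-octet format, the other that the address is unicast, which the fixed 00:60:2F prefix in genmac already guarantees.

```shell
# Succeeds if $1 looks like six colon-separated hex octets.
is_valid_mac() {
    printf '%s\n' "$1" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'
}

# Succeeds if $1 is a valid MAC whose first octet has the multicast bit clear.
is_unicast_mac() {
    is_valid_mac "$1" && [ $(( 0x${1%%:*} & 1 )) -eq 0 ]
}
```

For example, is_unicast_mac "$MACADDR" || echo "bad MAC: $MACADDR" can be run right after MACADDR is set.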


CONTAINER=test-lxc
IPADDR=5.5.5.2
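PREFIX=24
NETMASK=255.255.255.0 # same network size as PREFIX; used by the Debian guest network config
GATEWAY=5.5.5.1 # assumed here to be the vmbr0 address set above; used by the guest network config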
BRIDGE=vmbr0
MACADDR=`genmac`
VETHPAIR=some_short_name # placeholder: replace with a name of at most 11 chars. The host side interface is veth$VETHPAIR (see it with ifconfig on the host) and interface names are limited to 15 chars
MEM=512M
MEMSWAP=600M
ROOT=/lxc/guest/${CONTAINER}
FSTAB=/lxc/etc/${CONTAINER}.fstab
CONF=/lxc/etc/${CONTAINER}.conf
ARCH=i686  # needed only if you want to run i686 CT inside x86_64 host
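
Two of these values are easy to get wrong: the dotted netmask that matches PREFIX, and the length of the veth name. The following is a small sketch with helper names of my own choosing; it assumes IPv4 and the usual 15 character limit on Linux interface names.

```shell
# Convert a prefix length (0-32) to a dotted-quad netmask.
# The body runs in a subshell so its variables do not leak.
prefix_to_netmask() (
    prefix=$1
    mask=""
    for i in 1 2 3 4; do
        if [ "$prefix" -ge 8 ]; then
            octet=255
            prefix=$((prefix - 8))
        else
            # Remaining bits set from the left of this octet.
            octet=$((256 - (1 << (8 - prefix))))
            prefix=0
        fi
        mask="${mask:+$mask.}$octet"
    done
    echo "$mask"
)

# Succeeds if "veth<name>" fits in 15 characters.
veth_name_ok() (
    ifname="veth$1"
    [ "${#ifname}" -le 15 ]
)
```

With these, NETMASK=$(prefix_to_netmask $PREFIX) keeps the two settings in step, and veth_name_ok "$VETHPAIR" || echo "veth name too long" catches an over-long name before lxc-start does.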



mkdir -p $ROOT /lxc/etc


cat <<EOF > $FSTAB
none $ROOT/dev/pts    devpts defaults 0 0
none $ROOT/proc    proc    nodev,noexec,nosuid 0 0
none $ROOT/sys    sysfs    defaults,ro 0 0
none $ROOT/dev/shm tmpfs defaults 0 0
EOF


cat <<EOF > $CONF
#
# LXC container configuration file
#
# container name
lxc.utsname = $CONTAINER
#
# how many tty consoles to create
lxc.tty = 4
#
# full path to the container's root filesystem
lxc.rootfs = $ROOT
#
# full path to the container.fstab config file
lxc.mount = $FSTAB
#
# create one network interface
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = $BRIDGE
lxc.network.hwaddr = $MACADDR
lxc.network.ipv4 = $IPADDR/$PREFIX
lxc.network.veth.pair   = veth$VETHPAIR
#
# which cpus can this container use?
# run: /lxc/bin/lxc-cgroup -n container cpuset.cpus
# to display the current value
# lxc.cgroup.cpuset.cpus = 0
#
# which devices can this container access?
# deny to all by default
lxc.cgroup.devices.deny = a
# Allow any mknod (but not using the node)
lxc.cgroup.devices.allow = c *:* m
lxc.cgroup.devices.allow = b *:* m
# allow /dev/null and zero
lxc.cgroup.devices.allow = c 1:3 rwm
lxc.cgroup.devices.allow = c 1:5 rwm
# allow consoles
lxc.cgroup.devices.allow = c 5:1 rwm
lxc.cgroup.devices.allow = c 5:0 rwm
lxc.cgroup.devices.allow = c 4:0 rwm
lxc.cgroup.devices.allow = c 4:1 rwm
# allow /dev/{,u}random
lxc.cgroup.devices.allow = c 1:9 rwm
lxc.cgroup.devices.allow = c 1:8 rwm
# allow /dev/pts/* - pts namespaces are "coming soon"
lxc.cgroup.devices.allow = c 136:* rwm
lxc.cgroup.devices.allow = c 5:2 rwm
# allow rtc
lxc.cgroup.devices.allow = c 254:0 rwm
#full
lxc.cgroup.devices.allow = c 1:7 rwm
#hpet
lxc.cgroup.devices.allow = c 10:228 rwm
#kvm
#lxc.cgroup.devices.allow = c 10:232 rwm
#ppp
lxc.cgroup.devices.allow = c 108:0 rwm
#fuse
#lxc.cgroup.devices.allow = c 10:229 rwm
# loop7
#lxc.cgroup.devices.allow = b 7:7 rwm
#tun
lxc.cgroup.devices.allow = c 10:200 rwm
# Drop cap - If mount is needed then comment out the line with sys_admin
lxc.cap.drop = audit_control
lxc.cap.drop = audit_write
lxc.cap.drop = mac_admin
lxc.cap.drop = mac_override
lxc.cap.drop = mknod
#lxc.cap.drop = net_raw
lxc.cap.drop = setfcap
lxc.cap.drop = setpcap
lxc.cap.drop = sys_admin
lxc.cap.drop = sys_boot
lxc.cap.drop = sys_module
lxc.cap.drop = sys_nice
lxc.cap.drop = sys_pacct
lxc.cap.drop = sys_rawio
lxc.cap.drop = sys_resource
lxc.cap.drop = sys_time
# need for getty login
#lxc.cap.drop = sys_tty_config
lxc.cap.drop = fsetid ipc_lock ipc_owner lease linux_immutable sys_ptrace
# end drop cap


EOF


if [ ! -z "$MEM" ]; then
    echo "lxc.cgroup.memory.limit_in_bytes = $MEM" >> $CONF
fi
if [ ! -z "$MEMSWAP" ]; then
        echo "lxc.cgroup.memory.memsw.limit_in_bytes = $MEMSWAP" >> $CONF
fi
if [ ! -z "$ARCH" ]; then
    echo "lxc.arch = $ARCH" >> $CONF
fi
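
Before creating the CT it is cheap to verify the two points noted at the top: the sys mount is read-only and the expected capabilities are dropped. This is a rough sketch; the function names are mine, and which capabilities you insist on is your own choice.

```shell
# Succeeds if every sysfs line in the fstab file ($1) has the "ro" option
# (and at least one sysfs line exists).
sysfs_is_readonly() {
    awk '$3 == "sysfs" {
             seen = 1
             n = split($4, opt, ",")
             ro = 0
             for (i = 1; i <= n; i++) if (opt[i] == "ro") ro = 1
             if (!ro) bad = 1
         }
         END { exit ((seen && !bad) ? 0 : 1) }' "$1"
}

# Print each capability (arguments after $1) that the config file ($1)
# does not drop. Commented-out lxc.cap.drop lines do not count.
missing_cap_drops() {
    conf_file=$1
    shift
    for cap in "$@"; do
        awk -v cap="$cap" '
            /^[ \t]*lxc\.cap\.drop[ \t]*=/ {
                sub(/^[^=]*=/, "")
                for (i = 1; i <= NF; i++) if ($i == cap) found = 1
            }
            END { exit (found ? 0 : 1) }' "$conf_file" || echo "$cap"
    done
}
```

For example, sysfs_is_readonly "$FSTAB" || echo "sys is not read-only", and missing_cap_drops "$CONF" sys_module sys_rawio sys_time prints nothing when all three are dropped.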

Now we have the CT conf. We create the CT

lxc-create -n $CONTAINER -f $CONF
# There should not be any errors; if there are, fix them first. If /lxc/var/lib/lxc does not exist, run mkdir -p /lxc/var/lib/lxc and re-run the lxc-create command.

It is time to populate the CT root dir with a root file system. I usually use rsync -avh --numeric-ids from the root dir of an existing vserver instance. You can use other methods, such as debootstrap for Debian/Ubuntu-based guests or any OpenVZ template tarball. Once the root file system is in $ROOT, we need to make some adjustments for the CT to boot.
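
Whichever method you use, a quick look at the result saves a failed first boot. The check below is only a sketch: the list of entries is my assumption of what a bootable tree minimally needs, and the function name is made up for this post.

```shell
# Print the entries a bootable root tree is expected to have but lacks.
# $1: root directory of the container.
missing_rootfs_entries() {
    for entry in bin/sh sbin/init etc dev proc sys; do
        # Accept dangling symlinks too: absolute links only resolve inside the guest.
        [ -e "$1/$entry" ] || [ -L "$1/$entry" ] || echo "$entry"
    done
}
```

Running missing_rootfs_entries "$ROOT" should print nothing; the fstab written earlier needs $ROOT/proc, $ROOT/sys and $ROOT/dev to exist as mount points.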

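vim $ROOT/root/lxc-setup.sh
Paste the content below. The text uses \$ and \` where something must be evaluated inside the guest, and plain $MACADDR, $IPADDR, $PREFIX, $NETMASK, $GATEWAY and $CONTAINER where the value comes from the host variables set earlier. If you paste it by hand instead of generating it from a here-document on the host, fill in the host values and remove those backslashes.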

#!/bin/bash
# bash is needed: the script uses == inside [ ] and brace expansion


# lxc adjustment commands. Run it once when the container is created


rm -f /etc/mtab
ln -sf /proc/mounts /etc/mtab


cat <<EOF > /etc/fstab
# Unconfigured fstab
EOF


chmod -x /sbin/*udev*


/bin/sed --in-place -e "s/^session.*required.*pam_loginuid.so/# session\trequired\tpam_loginuid.so/g" /etc/pam.d/* >/dev/null 2>&1


/bin/sed --in-place -e "s/^session.*\[success=1 default=ignore\] pam_succeed_if.so service in crond quiet use_uid/# session     [success=1 default=ignore] pam_succeed_if.so service in crond quiet use_uid/g" /etc/pam.d/password-auth >/dev/null 2>&1


/sbin/MAKEDEV -d /dev -x {p,t}ty{a,p}{0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f} console core full kmem kmsg mem null port ptmx random urandom zero ram0
/sbin/MAKEDEV -d /etc/udev/devices -x {p,t}ty{a,p}{0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f} console core full kmem kmsg mem null port ptmx random urandom zero ram0


chmod 1777 /tmp /var/tmp



if [ -f "/etc/redhat-release" ]; then


chmod -x /sbin/start_udev


for srv in  NetworkManager nfs ipmi mdmpd nfslock multipathd wpa_supplicant dund netplugd pand ip6tables  iptables kudzu readahead_later lvm2-monitor readahead_early  cpuspeed  network  irqbalance  mdmonitor  bluetooth  acpid  hidd  lm_sensors rawdevices  ntpd gpm haldaemon  firstboot  smartd  pcscd  auditd autofs portmap rpcgssd  rpcidmapd  rpcsvcgssd iscsi iscsid  irqbalance ipmievd ipmi  irda  heartbeat microcode_ctl netfs; do chkconfig \$srv off >/dev/null 2>&1 ; done


for srv in sshd network; do chkconfig \$srv on; done
echo "NETWORKING=yes" > /etc/sysconfig/network


cat <<ENDF > /etc/sysconfig/network-scripts/ifcfg-eth0
HWADDR=$MACADDR
IPADDR=$IPADDR
PREFIX=$PREFIX
GATEWAY=$GATEWAY
DEVICE=eth0
ONBOOT=yes


ENDF


VER=\`cat /etc/redhat-release | perl -ne '/release ([\d]+)/ && print \$1'\`
case "\$VER" in
'6')
cp /etc/init/tty1.conf /etc/init/tty1.conf.old



cat <<END > /etc/init/tty1.conf
start on stopped rc RUNLEVEL=[2345]
stop on runlevel [!2345]
respawn
exec /sbin/mingetty tty1


END


;;
'5'|'4')
grep 'tty1' /etc/inittab
if [ ! "\$?" == '0' ]; then
echo "1:2345:respawn:/sbin/mingetty tty1" >> /etc/inittab
fi


;;
*)
echo "Unknown redhat version '\$VER'"
;;


esac


elif [ -f "/etc/debian_version" ]; then
# Debian based
# ifupdown needed to configure network interface
apt-get -q  install ifupdown


cat <<ENDF > /etc/network/interfaces
auto lo
iface lo inet loopback


auto eth0
iface eth0 inet static
    address $IPADDR
    netmask $NETMASK
    gateway $GATEWAY


ENDF
for srv in networking ssh; do
    update-rc.d \$srv defaults
done


for srv in acpid acpi-support alsa-mixer-save alsasound apparmor bluetooth brltty console-setup cryptdisks cryptdisks-early cryptdisks-enable cryptdisks-udev lm-sensors pcmciautils smartmontools udev udev-finish udevmonitor udevtrigger umountfs umountroot; do
    update-rc.d -f \$srv remove
done
# just to be sure
echo "service networking restart" >> /etc/rc.local


if [ -d "/etc/init" ]; then
# upstart. (ubuntu guest)
# For now it is best to start console and ssh by default, add default gw.
# After logging in, adjust which services are started manually

cp -a /etc/init /etc/init.orig


cat <<ENDF > /etc/init/tty1.conf
start on startup
stop on runlevel [!2345]
respawn
exec /sbin/getty -8 38400 tty1


ENDF


perl  -i.bak -pe 's/^start on.*$/start on startup/ ; s/^[\s]*and stopped.*$/stop on runlevel [!2345]/ ; s/^exec ifup -a/exec route add default gw $GATEWAY/' /etc/init/networking.conf
mkdir -p /var/run/network


perl -i.bak -pe 's/^start on filesystem.*$/start on startup/' /etc/init/ssh.conf
perl -i.bak -pe 's/^start on filesystem.*$/start on startup/' /etc/init/rc-sysinit.conf


cat <<EOF > /etc/environment
LANG="en_US.UTF-8"
LANGUAGE="en_US.UTF-8"
LC_ALL="en_US.UTF-8"
LC_CTYPE="C"
EOF


echo $CONTAINER > /etc/hostname


else
grep 'tty1' /etc/inittab
if [ ! "\$?" == '0' ]; then echo "1:2345:respawn:/sbin/getty 38400 tty1" >> /etc/inittab ; fi
fi


else
:
fi



# END of paste text

chmod +x $ROOT/root/lxc-setup.sh
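
The setup script above mixes plain $IPADDR with escaped \$srv. That is how an unquoted here-document behaves: plain variables are filled in by the host shell when the file is written, while \$ survives as a literal $ for the guest to evaluate later. A minimal sketch of that behaviour follows; render_fragment is a made-up name for this demonstration only.

```shell
# Render a two-line fragment through an unquoted here-document.
# $addr is expanded now by the host shell; \$srv is left for the guest.
render_fragment() {
    addr=$1
    cat <<FRAGMENT
IPADDR=$addr
for srv in sshd network; do chkconfig \$srv on; done
FRAGMENT
}
```

Calling render_fragment 5.5.5.2 prints IPADDR=5.5.5.2 on the first line and the for loop with an unescaped $srv on the second, which is what the guest needs to see. When wrapping a whole script this way, the outer delimiter must differ from every delimiter used inside it (EOF, ENDF and END in this script).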

Now chroot into $ROOT and run the setup script:

chroot ${ROOT}/ /root/lxc-setup.sh

All done! Now start it:

lxc-start -n $CONTAINER

If there are no errors you can access the console. Open another ssh session to the host and run:

lxc-console -n $CONTAINER

To start the CT in daemon mode, first stop it and then start it with the -d option:

lxc-stop -n $CONTAINER
lxc-start -n $CONTAINER -d

To remove the CT, just run:

lxc-destroy -n $CONTAINER
rm -rf  $ROOT /lxc/etc/${CONTAINER}.* 

As lxc does not support disk quotas yet, it is best to use a dedicated LV (logical volume) mounted at $ROOT for the root file system.
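
As a sketch of what that looks like, the function below only prints the commands for review and does not run them. The volume group name, the size and the choice of ext4 are assumptions for illustration; lv_plan itself is a made-up helper.

```shell
# Print (do not run) the commands that give a CT its own logical volume.
# $1: volume group  $2: LV name  $3: size (lvcreate -L syntax)  $4: mount point
lv_plan() {
    echo "lvcreate -L $3 -n $2 $1"
    echo "mkfs.ext4 /dev/$1/$2"
    echo "mkdir -p $4"
    echo "mount /dev/$1/$2 $4"
}
```

For example, lv_plan vg0 "$CONTAINER" 10G "$ROOT" shows the four steps; do them before populating $ROOT, so the root file system lands on the new volume and its size acts as the quota.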

Saturday, 15 December 2007

Just trying to create this blog on a not so good day!
:-(