LXC
The last time I tried LXC (Linux Containers) it did not work well, though I probably had not looked deeply enough. The current LXC documentation is scarce, and I think that is one of the reasons its adoption is slow.
Now I am replacing vservers with LXC and it looks good. It provides most of what vserver provided without the need to patch the kernel and, on top of that, layer 2 network separation is nice to have.
I noted some important things.
To avoid surprises when starting the guest, make sure the config file drops enough privileges.
Also, the sys mount in the fstab needs to be read-only.
Step by step to build a working CT
First get the lxc tools source and build it. I had many problems with the distro-shipped packages, so to be sure things work use version 0.7.5, which I have tested.
I have binaries built for x86_64 and i686 ready; if someone needs them just let me know. This build uses the prefix /lxc for all files, so the lxc binaries are in /lxc/bin/.
We will store the root directory of each CT in /lxc/guest/CT_NAME and its fstab file in /lxc/etc/CT_NAME.fstab.
We will use the bridge vmbr0 for the CT network. If you want the CT to use a subnet, bridge the corresponding interface to vmbr0. To make this persist across reboots, consult the networking documentation for the distro you are running. Here is an example with the internal subnet 5.5.5.0/24:
export PATH=/lxc/bin:$PATH
brctl addbr vmbr0
ifconfig vmbr0 5.5.5.1/24
# Mount the cgroup filesystem if it is not mounted yet. It can end up mounted at two different mount points, so make sure there is only one.
mount | grep cgroup
if [ ! "$?" == '0' ]; then
grep 'none /cgroup cgroup defaults 0 0' /etc/fstab >/dev/null 2>&1
if [ ! "$?" == '0' ]; then echo "none /cgroup cgroup defaults 0 0" >> /etc/fstab; fi
mkdir -p /cgroup
mount /cgroup
fi
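The warning about the cgroup filesystem being mounted twice can be checked with a small helper. This is only a sketch: the function names count_cgroup_mounts and list_cgroup_mounts are made up for illustration, and they simply read a mounts table in the /proc/mounts format.

```shell
# Hypothetical helpers (not part of lxc): inspect a mounts table for cgroup mounts.
# With no argument they read /proc/mounts; any file in the same format works.
count_cgroup_mounts() {
    local table="${1:-/proc/mounts}"
    # Field 3 of each line is the filesystem type.
    awk '$3 == "cgroup" { n++ } END { print n + 0 }' "$table"
}

# Print the mount points so that duplicates are easy to spot.
list_cgroup_mounts() {
    local table="${1:-/proc/mounts}"
    awk '$3 == "cgroup" { print $2 }' "$table"
}
```

On the host, running list_cgroup_mounts should print a single line (/cgroup with the setup above). Hosts that mount each cgroup controller separately will legitimately show several entries, so treat the count as a hint rather than a hard rule.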
function genmac() {
local hexchars="0123456789ABCDEF"
local end=$( for i in {1..6} ; do echo -n ${hexchars:$(( $RANDOM % 16 )):1} ; done | sed -e 's/\(..\)/:\1/g' )
echo 00:60:2F$end
}
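Since the generated address ends up in lxc.network.hwaddr, it can be worth checking its shape before writing the config. A minimal sketch, assuming a helper named is_valid_mac (a name invented here) that only verifies the six-pairs-of-hex-digits format:

```shell
# Hypothetical helper: succeed only if $1 looks like aa:bb:cc:dd:ee:ff.
is_valid_mac() {
    printf '%s\n' "$1" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'
}

# Possible use together with genmac from above:
#   MACADDR=$(genmac)
#   is_valid_mac "$MACADDR" || echo "bad MAC: $MACADDR" >&2
```

It does not check whether the address is already in use on the bridge; that still has to be verified by hand.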
CONTAINER=test-lxc
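IPADDR=5.5.5.2
PREFIX=24
NETMASK=255.255.255.0 # same network as PREFIX; the Debian part of the setup script uses it
GATEWAY=5.5.5.1 # the setup script uses it; assumed here to be the vmbr0 address set above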
BRIDGE=vmbr0
MACADDR=`genmac`
VETHPAIR=testlxc # placeholder; the host-side interface will be veth$VETHPAIR (run ifconfig on the host to see it). Linux interface names are limited to 15 characters, so keep this at 11 or fewer.
MEM=512M
MEMSWAP=600M
ROOT=/lxc/guest/${CONTAINER}
FSTAB=/lxc/etc/${CONTAINER}.fstab
CONF=/lxc/etc/${CONTAINER}.conf
ARCH=i686 # needed only if you want to run i686 CT inside x86_64 host
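The setup script further down uses a prefix length for Red Hat guests and a dotted netmask for Debian guests. To keep the two consistent, the netmask could be derived from the prefix. This is a sketch only; prefix2netmask is a name made up for this example.

```shell
# Hypothetical helper: convert an IPv4 prefix length (0-32) to a dotted netmask.
prefix2netmask() {
    local prefix="$1" mask="" bits value octet
    for octet in 1 2 3 4; do
        # Take up to 8 bits of the prefix for this octet.
        if [ "$prefix" -ge 8 ]; then
            bits=8
        elif [ "$prefix" -gt 0 ]; then
            bits=$prefix
        else
            bits=0
        fi
        prefix=$((prefix - bits))
        # An octet with its top $bits bits set is 256 - 2^(8 - bits).
        value=$((256 - (1 << (8 - bits))))
        mask="${mask}${mask:+.}${value}"
    done
    echo "$mask"
}

# Possible use with the variables above:
#   NETMASK=$(prefix2netmask "$PREFIX")
```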
mkdir -p $ROOT /lxc/etc
cat <<EOF > $FSTAB
none $ROOT/dev/pts devpts defaults 0 0
none $ROOT/proc proc nodev,noexec,nosuid 0 0
none $ROOT/sys sysfs defaults,ro 0 0
none $ROOT/dev/shm tmpfs defaults 0 0
EOF
cat <<EOF > $CONF
#
# LXC container configuration file
#
# container name
lxc.utsname = $CONTAINER
#
# how many tty consoles to create
lxc.tty = 4
#
# full path to the container's root filesystem
lxc.rootfs = $ROOT
#
# full path to the container.fstab config file
lxc.mount = $FSTAB
#
# create one network interface
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = $BRIDGE
lxc.network.hwaddr = $MACADDR
lxc.network.ipv4 = $IPADDR/$PREFIX
lxc.network.veth.pair = veth$VETHPAIR
#
# which cpus can this container use?
# run: /lxc/bin/lxc-cgroup -n container cpuset.cpus
# to display the current value
# lxc.cgroup.cpuset.cpus = 0
#
# which devices can this container access?
# deny to all by default
lxc.cgroup.devices.deny = a
# Allow any mknod (but not using the node)
lxc.cgroup.devices.allow = c *:* m
lxc.cgroup.devices.allow = b *:* m
# allow /dev/null and zero
lxc.cgroup.devices.allow = c 1:3 rwm
lxc.cgroup.devices.allow = c 1:5 rwm
# allow consoles
lxc.cgroup.devices.allow = c 5:1 rwm
lxc.cgroup.devices.allow = c 5:0 rwm
lxc.cgroup.devices.allow = c 4:0 rwm
lxc.cgroup.devices.allow = c 4:1 rwm
# allow /dev/{,u}random
lxc.cgroup.devices.allow = c 1:9 rwm
lxc.cgroup.devices.allow = c 1:8 rwm
# allow /dev/pts/* - pts namespaces are "coming soon"
lxc.cgroup.devices.allow = c 136:* rwm
lxc.cgroup.devices.allow = c 5:2 rwm
# allow rtc
lxc.cgroup.devices.allow = c 254:0 rwm
#full
lxc.cgroup.devices.allow = c 1:7 rwm
#hpet
lxc.cgroup.devices.allow = c 10:228 rwm
#kvm
#lxc.cgroup.devices.allow = c 10:232 rwm
#ppp
lxc.cgroup.devices.allow = c 108:0 rwm
#fuse
#lxc.cgroup.devices.allow = c 10:229 rwm
# loop7
#lxc.cgroup.devices.allow = b 7:7 rwm
#tun
lxc.cgroup.devices.allow = c 10:200 rwm
# Drop cap - If mount is needed then comment out the line with sys_admin
lxc.cap.drop = audit_control
lxc.cap.drop = audit_write
lxc.cap.drop = mac_admin
lxc.cap.drop = mac_override
lxc.cap.drop = mknod
#lxc.cap.drop = net_raw
lxc.cap.drop = setfcap
lxc.cap.drop = setpcap
lxc.cap.drop = sys_admin
lxc.cap.drop = sys_boot
lxc.cap.drop = sys_module
lxc.cap.drop = sys_nice
lxc.cap.drop = sys_pacct
lxc.cap.drop = sys_rawio
lxc.cap.drop = sys_resource
lxc.cap.drop = sys_time
# need for getty login
#lxc.cap.drop = sys_tty_config
lxc.cap.drop = fsetid ipc_lock ipc_owner lease linux_immutable sys_ptrace
# end drop cap
EOF
if [ ! -z "$MEM" ]; then
echo "lxc.cgroup.memory.limit_in_bytes = $MEM" >> $CONF
fi
if [ ! -z "$MEMSWAP" ]; then
echo "lxc.cgroup.memory.memsw.limit_in_bytes = $MEMSWAP" >> $CONF
fi
if [ ! -z "$ARCH" ]; then
echo "lxc.arch = $ARCH" >> $CONF
fi
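Before creating the CT it can help to confirm that the generated config really contains the keys the rest of this guide relies on. The helpers below are a sketch; conf_get and conf_require are names invented here, and they only understand simple "key = value" lines.

```shell
# Hypothetical helper: print every value assigned to a key in an lxc config file.
conf_get() {
    local key="$1" file="$2"
    awk -v k="$key" '
        /^[ \t]*#/ { next }            # skip comment lines
        index($0, "=") == 0 { next }   # skip lines without an assignment
        {
            eq = index($0, "=")
            name = substr($0, 1, eq - 1)
            value = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (name == k) print value
        }' "$file"
}

# Hypothetical helper: fail if any of the listed keys has no value in the file.
conf_require() {
    local file="$1" key missing=0
    shift
    for key in "$@"; do
        if [ -z "$(conf_get "$key" "$file")" ]; then
            echo "missing: $key" >&2
            missing=1
        fi
    done
    return $missing
}

# Possible use with the file written above:
#   conf_require "$CONF" lxc.utsname lxc.rootfs lxc.mount lxc.network.link
```

Keys that appear several times, such as lxc.cap.drop, are printed one value per line, which makes it easy to review which capabilities are dropped.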
Now that we have the CT config, create the CT:
lxc-create -n $CONTAINER -f $CONF
# There should not be any errors; if there are, fix them! If /lxc/var/lib/lxc does not exist, run mkdir -p /lxc/var/lib/lxc and re-run the lxc-create command.
Now it is time to populate the CT root directory with a root file system. I usually use rsync -avh --numeric-ids from the root directory of an existing vserver instance; you can also use debootstrap for Debian/Ubuntu-based guests, or any OpenVZ template tarball. Once $ROOT is populated, we need to make some adjustments so the CT can boot.
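vim $ROOT/root/lxc-setup.sh
Paste the content below. The script is written in template form: before saving, replace $MACADDR, $IPADDR, $PREFIX, $NETMASK, $GATEWAY and $CONTAINER with the real values for this CT, and remove the backslash in front of every escaped \$ and \` (the escapes are only needed when the file is generated from an unquoted shell heredoc instead of being pasted).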
#!/bin/bash
# lxc adjustment commands. Run it once when the container is created
rm -f /etc/mtab
ln -sf /proc/mounts /etc/mtab
cat <<EOF > /etc/fstab
# Unconfigured fstab
EOF
chmod -x /sbin/*udev*
/bin/sed --in-place -e "s/^session.*required.*pam_loginuid.so/# session\trequired\tpam_loginuid.so/g" /etc/pam.d/* >/dev/null 2>&1
/bin/sed --in-place -e "s/^session.*\[success=1 default=ignore\] pam_succeed_if.so service in crond quiet use_uid/# session [success=1 default=ignore] pam_succeed_if.so service in crond quiet use_uid/g" /etc/pam.d/password-auth >/dev/null 2>&1
/sbin/MAKEDEV -d /dev -x {p,t}ty{a,p}{0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f} console core full kmem kmsg mem null port ptmx random urandom zero ram0
/sbin/MAKEDEV -d /etc/udev/devices -x {p,t}ty{a,p}{0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f} console core full kmem kmsg mem null port ptmx random urandom zero ram0
chmod 1777 /tmp /var/tmp
if [ -f "/etc/redhat-release" ]; then
chmod -x /sbin/start_udev
for srv in NetworkManager nfs ipmi mdmpd nfslock multipathd wpa_supplicant dund netplugd pand ip6tables iptables kudzu readahead_later lvm2-monitor readahead_early cpuspeed network irqbalance mdmonitor bluetooth acpid hidd lm_sensors rawdevices ntpd gpm haldaemon firstboot smartd pcscd auditd autofs portmap rpcgssd rpcidmapd rpcsvcgssd iscsi iscsid irqbalance ipmievd ipmi irda heartbeat microcode_ctl netfs; do chkconfig \$srv off >/dev/null 2>&1 ; done
for srv in sshd network; do chkconfig \$srv on; done
echo "NETWORKING=yes" > /etc/sysconfig/network
cat <<ENDF > /etc/sysconfig/network-scripts/ifcfg-eth0
HWADDR=$MACADDR
IPADDR=$IPADDR
PREFIX=$PREFIX
GATEWAY=$GATEWAY
DEVICE=eth0
ONBOOT=yes
ENDF
VER=\`cat /etc/redhat-release | perl -ne '/release ([\d]+)/ && print \$1'\`
case "\$VER" in
'6')
cp /etc/init/tty1.conf /etc/init/tty1.conf.old
cat <<END > /etc/init/tty1.conf
start on stopped rc RUNLEVEL=[2345]
stop on runlevel [!2345]
respawn
exec /sbin/mingetty tty1
END
;;
'5'|'4')
grep 'tty1' /etc/inittab
if [ ! "\$?" == '0' ]; then
echo "1:2345:respawn:/sbin/mingetty tty1" >> /etc/inittab
fi
;;
*)
echo "Unknown redhat version '\$VER'"
;;
esac
elif [ -f "/etc/debian_version" ]; then
# Debian based
# ifupdown needed to configure network interface
apt-get -q install ifupdown
cat <<ENDF > /etc/network/interfaces
auto lo
iface lo inet loopback
auto eth0
iface eth0 inet static
address $IPADDR
netmask $NETMASK
gateway $GATEWAY
ENDF
for srv in networking ssh; do
update-rc.d \$srv defaults
done
for srv in acpid acpi-support alsa-mixer-save alsasound apparmor bluetooth brltty console-setup cryptdisks cryptdisks-early cryptdisks-enable cryptdisks-udev lm-sensors pcmciautils smartmontools udev udev-finish udevmonitor udevtrigger umountfs umountroot; do
update-rc.d -f \$srv remove
done
# just to be sure
echo "service networking restart" >> /etc/rc.local
if [ -d "/etc/init" ]; then
# upstart. (ubuntu guest)
# For now it is best to start console and ssh by default, add default gw.
# After logging in, adjust manually which services should be started
cp -a /etc/init /etc/init.orig
cat <<ENDF > /etc/init/tty1.conf
start on startup
stop on runlevel [!2345]
respawn
exec /sbin/getty -8 38400 tty1
ENDF
perl -i.bak -pe 's/^start on.*$/start on startup/ ; s/^[\s]*and stopped.*$/stop on runlevel [!2345]/ ; s/^exec ifup -a/exec route add default gw $GATEWAY/' /etc/init/networking.conf
mkdir -p /var/run/network
perl -i.bak -pe 's/^start on filesystem.*$/start on startup/' /etc/init/ssh.conf
perl -i.bak -pe 's/^start on filesystem.*$/start on startup/' /etc/init/rc-sysinit.conf
cat <<EOF > /etc/environment
LANG="en_US.UTF-8"
LANGUAGE="en_US.UTF-8"
LC_ALL="en_US.UTF-8"
LC_CTYPE="C"
EOF
echo $CONTAINER > /etc/hostname
else
grep 'tty1' /etc/inittab
if [ ! "\$?" == '0' ]; then echo "1:2345:respawn:/sbin/getty 38400 tty1" >> /etc/inittab ; fi
fi
else
:
fi
# END of paste text
chmod +x $ROOT/root/lxc-setup.sh
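The setup script mixes plain variables such as $IPADDR with escaped ones such as \$srv. That pattern fits generating the file from the host shell with an unquoted heredoc instead of pasting it in an editor: plain variables are expanded on the host while the file is written, and escaped ones survive for the guest shell. Here is a minimal sketch of the idea with a two-line script. The EOS delimiter and the temporary output file are choices made for this example; the real target would be $ROOT/root/lxc-setup.sh, and the outer delimiter must differ from the EOF, ENDF and END markers used inside the script.

```shell
IPADDR=5.5.5.2      # host-side value, expanded while the file is written
OUT=$(mktemp)       # stand-in for $ROOT/root/lxc-setup.sh in this example

cat <<EOS > "$OUT"
#!/bin/sh
# The address below was filled in by the host shell.
echo "address $IPADDR"
# The loop variable was escaped, so it is still a variable at run time.
for srv in sshd network; do echo "enable \$srv"; done
EOS

# Show the generated file: the address is literal, the loop variable is not.
cat "$OUT"
```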
Now chroot into $ROOT and run the tune-up script:
chroot ${ROOT}/ /root/lxc-setup.sh
All done! Now start it:
lxc-start -n $CONTAINER
If there are no errors you can access the console. Open another ssh session to the host and run:
lxc-console -n $CONTAINER
To start the CT in daemon mode, first stop it and then start it with the -d option:
lxc-stop -n $CONTAINER
lxc-start -n $CONTAINER -d
To remove the CT, just run:
lxc-destroy -n $CONTAINER
rm -rf $ROOT /lxc/etc/${CONTAINER}.*
As lxc does not support disk quotas yet, it is best to use a logical volume (LV) for the root file system, mounted at $ROOT.
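A logical volume caps the CT's disk usage at the size of the LV. The sketch below only prints the commands instead of running them, so it is safe to try. The volume group name vg0, the 5G size and the ext4 file system are assumptions for illustration, and lv_rootfs_plan is a made-up name. The LV would have to be created and mounted before the root file system is populated.

```shell
# Hypothetical helper: print (not run) the commands for an LV-backed CT root.
lv_rootfs_plan() {
    local vg="$1" name="$2" size="$3" root="$4"
    echo "lvcreate -L $size -n $name $vg"
    echo "mkfs.ext4 /dev/$vg/$name"
    echo "mkdir -p $root"
    echo "mount /dev/$vg/$name $root"
}

# Example values; vg0 and 5G are assumptions, adjust them to the host.
lv_rootfs_plan vg0 test-lxc 5G /lxc/guest/test-lxc
```

Review the printed commands and run them as root once they match the host's volume group layout.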