Linux Containers
From ArchWiki
Current state of this HowTo
Delerious010 21:35, 1 December 2009 (EST)
- Currently just a rough draft. I think I'll need to restructure this a bit, and I've also noticed I've become a bit too verbose. I'll be along shortly to complete this as well as clean it up.
Introduction
Synopsis
Linux Containers (LXC) are an operating system-level virtualization method for running multiple isolated server installs (containers) on a single control host. LXC does not provide a virtual machine, but rather provides a virtual environment that has its own process and network space. It is similar to a chroot, but offers much more isolation.
About this HowTo
This document is intended as an overview of setting up and deploying containers, not as an in-depth, step-by-step guide. A certain amount of prerequisite knowledge and skill is assumed (running commands as root, kernel configuration, mounting filesystems, shell scripting, chroot-type environments, networking setup, etc.).
Much of this was taken verbatim from Dwight Schauer, Tuxe and Ulhume. It has been copied here both to enable the community to share its collective wisdom and to expand on a few points.
Less verbose tutorial
Delerious010 21:43, 1 December 2009 (EST) I've come to realize I've added a lot of text to this HowTo. If you'd like something more streamlined, please head on over to http://lxc.teegra.net/ for Dwight's excellent guide.
Kernel configuration
Through the GUI
General Setup
- [*] Group CPU scheduler
  - [*] Group scheduling for SCHED_OTHER
  - Basis for grouping tasks (Control groups)
    - [*] Control groups
- [*] Control Group support
  - [*] Namespace cgroup subsystem
  - [*] Freezer cgroup subsystem
  - [*] Device controller for cgroups
  - [*] Cpuset support
    - [*] Include legacy /proc/ /cpuset file
  - [*] Simple CPU accounting cgroup subsystem
  - [*] Resource counters
  - [*] Memory Resource Controller for Control Groups
    - [*] Memory Resource Controller Swap Extension(EXPERIMENTAL)
- [*] Namespace support
  - [*] UTS namespace
  - [*] IPC namespace
  - [*] User namespace (EXPERIMENTAL)
  - [*] PID Namespaces (EXPERIMENTAL)
  - [*] Network namespace
Networking support
- Networking options
  - [*] QoS and/or fair queueing
    - [*] Control Group Classifier
Device drivers
- Character devices
  - [*] Unix98 pty support
    - [*] Support multiple instances of devpts
Security options
- [*] File POSIX Capabilities
Through the .config
CONFIG_GROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_RT_GROUP_SCHED=y
CONFIG_CGROUP_SCHED=y
CONFIG_CGROUPS=y
CONFIG_CGROUP_NS=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CPUSETS=y
CONFIG_PROC_PID_CPUSET=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_RESOURCE_COUNTERS=y
CONFIG_CGROUP_MEM_RES_CTLR=y
CONFIG_CGROUP_MEM_RES_CTLR_SWAP=y
CONFIG_MM_OWNER=y
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
CONFIG_IPC_NS=y
CONFIG_USER_NS=y
CONFIG_PID_NS=y
CONFIG_NET_NS=y
CONFIG_NET_CLS_CGROUP=y
CONFIG_SECURITY_FILE_CAPABILITIES=y
CONFIG_DEVPTS_MULTIPLE_INSTANCES=y
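If you'd like to double check an existing kernel configuration against the list above, a small bash function along these lines can do it. This is only a sketch: the names check_lxc_kconfig and LXC_KCONFIG_OPTS are made up for this example, and several of these symbols were renamed or dropped in later kernels, so on a newer kernel a "missing" option is not necessarily a problem.

```shell
#!/bin/bash
# Hypothetical helper: report which of the LXC-related options listed
# above are not set to "y" in a kernel .config file.
LXC_KCONFIG_OPTS=(
  GROUP_SCHED FAIR_GROUP_SCHED RT_GROUP_SCHED CGROUP_SCHED CGROUPS
  CGROUP_NS CGROUP_FREEZER CGROUP_DEVICE CPUSETS PROC_PID_CPUSET
  CGROUP_CPUACCT RESOURCE_COUNTERS CGROUP_MEM_RES_CTLR
  CGROUP_MEM_RES_CTLR_SWAP MM_OWNER NAMESPACES UTS_NS IPC_NS USER_NS
  PID_NS NET_NS NET_CLS_CGROUP SECURITY_FILE_CAPABILITIES
  DEVPTS_MULTIPLE_INSTANCES
)

# Usage: check_lxc_kconfig CONFIG_FILE
# Prints one line per missing option; returns 1 if any option is missing.
check_lxc_kconfig() {
  local config=$1 opt missing=0
  for opt in "${LXC_KCONFIG_OPTS[@]}"; do
    # -x: whole-line match, -F: fixed string, -q: no output
    if ! grep -qxF "CONFIG_${opt}=y" "$config"; then
      echo "CONFIG_${opt} is not set to y"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: check_lxc_kconfig /usr/src/linux/.config (adjust the path to your kernel tree). If your running kernel exposes its configuration in /proc/config.gz (CONFIG_IKCONFIG_PROC), you can extract it first with zcat /proc/config.gz > /tmp/running.config and check that file instead.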
Testing capabilities
Once the lxc package is installed, running lxc-checkconfig will report whether each of the kernel features LXC relies on (namespaces, control groups, etc.) is enabled in your kernel.
Host configuration
Control group filesystem
LXC depends on the control group filesystem being mounted. At present there is no standard location for it, so you're free to mount it wherever you see fit.
Mounting manually
mkdir /cgroup
mount -t cgroup none /cgroup
In /etc/fstab
none /cgroup cgroup defaults 0 0
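Before mounting, you may want to check whether the control group filesystem is already mounted. The following sketch reads /proc/mounts; the function names are hypothetical and /cgroup is simply the mount point used above.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a filesystem of type "cgroup" is mounted
# at MOUNTPOINT according to a mount table (default: /proc/mounts, whose
# fields are device, mount point, type, options, dump, pass).
cgroup_is_mounted() {
  local mountpoint=$1 table=${2:-/proc/mounts}
  local dev mnt fstype rest
  while read -r dev mnt fstype rest; do
    if [ "$mnt" = "$mountpoint" ] && [ "$fstype" = "cgroup" ]; then
      return 0
    fi
  done < "$table"
  return 1
}

# Mount the cgroup filesystem at MOUNTPOINT (default /cgroup) unless it is
# already mounted there. Needs root.
ensure_cgroup_mounted() {
  local mountpoint=${1:-/cgroup}
  cgroup_is_mounted "$mountpoint" && return 0
  mkdir -p "$mountpoint" && mount -t cgroup none "$mountpoint"
}
```

Run ensure_cgroup_mounted /cgroup as root, for instance from a boot script, if you prefer not to rely on /etc/fstab.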
Userspace tools
Both lxc and lxc-git can be found in AUR. In the example below, yaourt will be used to download and compile the package for us.
yaourt -S lxc-git
Bridge device setup
/etc/conf.d/bridges
bridge_br0="eth0"
config_br0="brctl setfd br0 0"
BRIDGE_INTERFACES=(br0)
/etc/rc.conf
MODULES=(... bridge ...)  # YMMV, but this was not required
eth0="eth0 0.0.0.0 up"    # I had to do the 0.0.0.0, "eth0 up" was not sufficient.
br0="dhcp"                # or however you set your address
INTERFACES=(eth0 br0)
Bridge forward delay
For br0 to get a DHCP lease quickly (and for container network devices to become available quickly), the forward delay of the bridge device must be set to zero.
brctl setfd br0 0
Patch for /etc/rc.d/network
This patch is required to use the config_br0 statement mentioned above, as of initscripts 2009.08-1.
--- network.0	2009-10-13 13:05:40.924603683 -0500
+++ network	2009-10-13 13:18:59.534523717 -0500
@@ -172,6 +172,15 @@
 				/usr/sbin/brctl addif $br $brif || error=1
 			fi
 		done
+		eval brconfig="\$config_${br}"
+		if [ -n "${brconfig}" ]; then
+			if ${brconfig}; then
+				true
+			else
+				echo config_${br}=\"${brconfig}\" \<-- invalid configuration statement
+				error=1
+			fi
+		fi
 	fi
 	done
 }
See also: FS#16625
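To illustrate what the patch does, here is the same logic pulled out into a standalone function. apply_bridge_configs is a hypothetical name; the patch itself inlines this code in the bridge loop of /etc/rc.d/network. For each bridge it looks up the variable config_<bridge> (e.g. config_br0) and runs its contents as a command, flagging an error if that command fails.

```shell
#!/bin/bash
# Standalone sketch of the patched logic: for each bridge name given,
# run the command stored in config_<bridge>, if that variable is set.
apply_bridge_configs() {
  local br brconfig error=0
  for br in "$@"; do
    # Indirect lookup, as in the patch: brconfig=$config_br0 for br0
    eval brconfig="\$config_${br}"
    if [ -n "${brconfig}" ]; then
      # Unquoted on purpose so the string splits into command + arguments
      if ! ${brconfig}; then
        echo "config_${br}=\"${brconfig}\" <-- invalid configuration statement"
        error=1
      fi
    fi
  done
  return "$error"
}
```

For example, with config_br0="brctl setfd br0 0" set as in /etc/conf.d/bridges, apply_bridge_configs br0 runs brctl setfd br0 0. Bridges without a config_ variable are skipped.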
Container setup
There are various ways to do this.
Creating the filesystem
Bootstrap
Bootstrap an install (mkarchroot, debootstrap, rinse, Install From Existing Linux). You can also just copy or use an existing installation’s complete root filesystem.
Download existing
You can download a base install tarball. OpenVZ templates work just fine.
Using the lxc tools
/usr/bin/lxc-debian {create|destroy|purge|help}
/usr/bin/lxc-fedora {create|destroy|purge|help}
Creating the device nodes
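Since udev does not work within the container, you'll want to make sure that a minimum set of device nodes is created for it. This may be done with the following script, run from the container's root filesystem directory (it uses the current directory as the root) :
#!/bin/bash
# Run from the container's root filesystem directory
ROOT=$(pwd)
DEV="${ROOT}/dev"

# Keep any existing dev directory around as dev.old
[ -e "${DEV}" ] && mv "${DEV}" "${DEV}.old"
mkdir -p "${DEV}"

# Memory devices (major 1)
mknod -m 666 "${DEV}/null" c 1 3
mknod -m 666 "${DEV}/zero" c 1 5
mknod -m 666 "${DEV}/random" c 1 8
mknod -m 666 "${DEV}/urandom" c 1 9
mknod -m 666 "${DEV}/full" c 1 7

# Mount points for devpts and shared memory
mkdir -m 755 "${DEV}/pts"
mkdir -m 1777 "${DEV}/shm"

# Terminals and console
mknod -m 666 "${DEV}/tty" c 5 0
mknod -m 600 "${DEV}/console" c 5 1
mknod -m 666 "${DEV}/tty0" c 4 0
mknod -m 666 "${DEV}/ptmx" c 5 2

# FIFO used by init
mknod -m 600 "${DEV}/initctl" p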
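Once the nodes exist, you can sanity check their major and minor numbers. This is a sketch with hypothetical function names; it relies on GNU stat, whose %t and %T format sequences print the major and minor numbers in hexadecimal.

```shell
#!/bin/bash
# Hypothetical helper: succeed if PATH is a character device with the
# given major and minor numbers.
check_char_node() {
  local path=$1 major=$2 minor=$3 got_major got_minor
  if [ ! -c "$path" ]; then
    echo "$path: missing or not a character device"
    return 1
  fi
  # stat prints major/minor in hex; 16#... converts them to decimal
  got_major=$((16#$(stat -c %t "$path")))
  got_minor=$((16#$(stat -c %T "$path")))
  if [ "$got_major" -ne "$major" ] || [ "$got_minor" -ne "$minor" ]; then
    echo "$path: expected $major:$minor, found $got_major:$got_minor"
    return 1
  fi
}

# Check the character devices created by the script above, given the
# container's root filesystem directory (default: current directory).
check_container_devs() {
  local dev=${1:-.}/dev rc=0 entry name major minor
  for entry in "null 1 3" "zero 1 5" "random 1 8" "urandom 1 9" \
               "full 1 7" "tty 5 0" "console 5 1" "tty0 4 0" "ptmx 5 2"; do
    read -r name major minor <<< "$entry"
    check_char_node "$dev/$name" "$major" "$minor" || rc=1
  done
  return "$rc"
}
```

Run check_container_devs /path/to/rootfs (or with no argument from inside the rootfs directory); each mismatch is printed and the function returns non-zero.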
Container configuration
Configuration file
The main configuration file describes how a container is initially created. Though it may be located anywhere, /etc/lxc is probably a good place.
Basic settings
lxc.utsname = $CONTAINER_NAME
lxc.mount = $CONTAINER_FSTAB
lxc.rootfs = $CONTAINER_ROOTFS
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = br0
lxc.network.hwaddr = $CONTAINER_MACADDR
lxc.network.ipv4 = $CONTAINER_IPADDR
lxc.network.name = $CONTAINER_DEVICENAME
Basic settings explained
lxc.utsname : This will be the name of the cgroup for the container. Once the container is started, you should be able to see a new folder named /cgroup/$CONTAINER_NAME.
This is also the value returned by hostname from within the container. Unless you've removed its access to do so, the container may overwrite it with its init scripts.
lxc.mount : This points to an fstab-formatted file listing the mount points used when lxc-start is called. This file is explained further in the Configuring fstab section below.
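If you manage several containers, it can be handy to generate this file from shell variables. The function below is a hypothetical sketch that uses the same placeholder names as the example above and refuses to write anything if one of them is unset.

```shell
#!/bin/bash
# Hypothetical helper: print the basic settings above, filled in from
# the CONTAINER_* shell variables.
write_lxc_config() {
  local var
  for var in CONTAINER_NAME CONTAINER_FSTAB CONTAINER_ROOTFS \
             CONTAINER_MACADDR CONTAINER_IPADDR CONTAINER_DEVICENAME; do
    # ${!var} is bash indirect expansion: the value of the variable named $var
    if [ -z "${!var}" ]; then
      echo "write_lxc_config: $var is not set" >&2
      return 1
    fi
  done
  cat <<EOF
lxc.utsname = $CONTAINER_NAME
lxc.mount = $CONTAINER_FSTAB
lxc.rootfs = $CONTAINER_ROOTFS
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = br0
lxc.network.hwaddr = $CONTAINER_MACADDR
lxc.network.ipv4 = $CONTAINER_IPADDR
lxc.network.name = $CONTAINER_DEVICENAME
EOF
}
```

For example, set CONTAINER_NAME=web1, CONTAINER_ROOTFS=/srv/lxc/web1 and so on (made-up values), then run write_lxc_config > /etc/lxc/web1.conf. The optional terminal and device access settings described below can be appended to the same file.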
Terminal settings
The following configuration parameters are optional. You may add them to your main configuration file if you wish to log in via lxc-console or through a terminal (e.g. Ctrl+Alt+F1).
lxc.tty = 1
lxc.pseudo = 1024
Terminal settings explained
lxc.tty : The number of /dev/ttyN devices that are accessible from within the container.
lxc.pseudo : The maximum number of pseudo terminals that may be created in /dev/pts. Delerious010 18:57, 3 December 2009 (EST) Currently, assuming the kernel was compiled with CONFIG_DEVPTS_MULTIPLE_INSTANCES, this tells lxc-start to mount the devpts filesystem with the newinstance flag.
Host device access settings
lxc.cgroup.devices.deny = a # Deny all access to devices
lxc.cgroup.devices.allow = c 1:3 rwm # /dev/null
lxc.cgroup.devices.allow = c 1:5 rwm # /dev/zero
lxc.cgroup.devices.allow = c 5:1 rwm # /dev/console
lxc.cgroup.devices.allow = c 5:0 rwm # /dev/tty
lxc.cgroup.devices.allow = c 4:0 rwm # /dev/tty0
lxc.cgroup.devices.allow = c 1:9 rwm # /dev/urandom
lxc.cgroup.devices.allow = c 1:8 rwm # /dev/random
lxc.cgroup.devices.allow = c 136:* rwm # /dev/pts/*
lxc.cgroup.devices.allow = c 5:2 rwm # /dev/pts/ptmx
# Unknown device, possibly /dev/bsg/0:0:0:0?
lxc.cgroup.devices.allow = c 254:0 rwm
Host device access settings explained
lxc.cgroup.devices.deny : By setting this to a, we're stating that the container has access to no devices unless they are explicitly allowed within the configuration file.
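The device list can also be generated rather than maintained by hand. In this sketch (DEVICE_RULES and write_device_rules are hypothetical names) each entry is "type major:minor path"; the list mirrors the allow statements above except for the unexplained c 254:0 rule. Comments are written on their own lines instead of after each value; since lxc-create strips comments anyway, they only help while editing the original file.

```shell
#!/bin/bash
# Hypothetical list of allowed devices: "type major:minor path"
DEVICE_RULES=(
  "c 1:3 /dev/null"
  "c 1:5 /dev/zero"
  "c 5:1 /dev/console"
  "c 5:0 /dev/tty"
  "c 4:0 /dev/tty0"
  "c 1:9 /dev/urandom"
  "c 1:8 /dev/random"
  "c 136:* /dev/pts/*"
  "c 5:2 /dev/pts/ptmx"
)

# Print a deny-all rule followed by one allow rule per DEVICE_RULES entry.
write_device_rules() {
  local rule type nums path
  echo "# Deny all access to devices, then allow only the ones listed"
  echo "lxc.cgroup.devices.deny = a"
  for rule in "${DEVICE_RULES[@]}"; do
    read -r type nums path <<< "$rule"
    echo "# $path"
    echo "lxc.cgroup.devices.allow = $type $nums rwm"
  done
}
```

Append the output to your configuration file with write_device_rules >> /etc/lxc/web1.conf (hypothetical path), or paste it in by hand.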
Configuration file notes
At runtime /dev/ttyX devices are recreated
If you've enabled multiple devpts instances in your kernel, lxc-start will recreate as many /dev/ttyX devices as lxc.tty specifies when it is executed.
This means that the first lxc.tty ttys will be pseudo ttys. If you plan to access the container via a "real" terminal (Ctrl+Alt+FX), make sure that the terminal number X is greater than lxc.tty.
To tell whether a tty has been re-created, log into the container via either lxc-console or ssh and run ls -Al on the tty device. Devices with a major number of 4 are "real" tty devices, whereas a major number of 136 indicates a pts.
Be aware that this is only visible from within the container itself and not from the host.
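The check described above can be scripted. This hypothetical helper, meant to be run inside the container, prints whether each given node has major number 4 ("real" tty) or 136 (pts); it uses GNU stat, whose %t format prints the major number in hexadecimal.

```shell
#!/bin/bash
# Hypothetical helper: classify tty nodes by major number.
classify_tty() {
  local path major
  for path in "$@"; do
    if [ ! -c "$path" ]; then
      echo "$path: not a character device"
      continue
    fi
    # Convert stat's hexadecimal major number to decimal
    major=$((16#$(stat -c %t "$path")))
    case $major in
      4)   echo "$path: real tty (major 4)" ;;
      136) echo "$path: pts (major 136)" ;;
      *)   echo "$path: other (major $major)" ;;
    esac
  done
}
```

For example, inside the container: classify_tty /dev/tty[0-9]*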
Containers have access to host's TTY nodes
If you do not properly restrict the container's access to the /dev/tty nodes, the container may have access to the host's.
Since, as mentioned above, lxc-start only recreates lxc.tty /dev/tty devices, any tty nodes present in the container with a minor number greater than lxc.tty will refer to the host's own terminals.
To access the container from a host TTY
- On the host, verify no getty is started for that tty by checking /etc/inittab.
- In the container, start a getty for that tty.
To prevent access to the host TTY
Please have a look at the configuration statements found in host device access settings.
Via lxc.cgroup.devices.deny = a we're preventing access to all host-level devices. Any allow statement for major number 4 then re-opens access to a host terminal; for instance, lxc.cgroup.devices.allow = c 4:1 rwm would allow access to the host's /dev/tty1. In the above example, simply removing all allow statements for major number 4 and minor > 1 should be sufficient.
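To spot such statements in an existing configuration file, a sketch like the following can help. find_host_tty_rules is a hypothetical name; it prints every allow statement for character major 4 whose minor number is greater than a threshold (1 by default, following the advice above), as well as any 4:* wildcard. It only understands the lxc.cgroup.devices.allow = c MAJOR:MINOR form used in this article.

```shell
#!/bin/bash
# Hypothetical helper: list allow rules granting access to host virtual
# consoles (char major 4) above a given minor number, or to all of them.
find_host_tty_rules() {
  local config=$1 max_minor=${2:-1} line minor
  local re='^[[:space:]]*lxc\.cgroup\.devices\.allow[[:space:]]*=[[:space:]]*c[[:space:]]+4:([0-9]+|\*)'
  while IFS= read -r line; do
    if [[ $line =~ $re ]]; then
      minor=${BASH_REMATCH[1]}
      # A "*" minor covers every host tty, so always report it
      if [ "$minor" = "*" ] || [ "$minor" -gt "$max_minor" ]; then
        echo "$line"
      fi
    fi
  done < "$config"
}
```

Usage: find_host_tty_rules /etc/lxc/web1.conf (hypothetical path). No output means no such statement was found.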
To test this access
The output of the ls command below shows both the major and minor device numbers. They are located after the user and group, as in : 4, 2
- Set lxc.tty to 1
- Make sure that the container has /dev/tty1 and /dev/tty2
- lxc-start the container
- lxc-console into the container
- ls -Al /dev/tty2
  crw------- 1 root root 4, 2 Dec 2 00:20 /dev/tty2
- echo "test output" > /dev/tty2
- Ctrl+Alt+F2 to view the host's second terminal
- You should see "test output" printed on the screen
Configuration troubleshooting
console access denied: Permission denied
If, when executing lxc-console, you receive the error lxc-console: console access denied: Permission denied you've most likely either omitted lxc.tty or set it to 0.
lxc-console does not provide a login prompt
Though you're reaching a tty on the container, it most likely is not running a getty. You'll want to double check that you have a getty defined in the container's /etc/inittab for the specific tty.
Configuring fstab
none $CONTAINER_ROOTFS/dev/pts devpts defaults 0 0
none $CONTAINER_ROOTFS/proc proc defaults 0 0
none $CONTAINER_ROOTFS/sys sysfs defaults 0 0
none $CONTAINER_ROOTFS/dev/shm tmpfs defaults 0 0
This fstab is used by lxc-start when mounting the container. As such, you can define any mount that would be possible on the host such as bind mounting to the host's own filesystem. However, please be aware of any and all security implications that this may have.
Warning : You certainly do not want to bind mount the host's /dev to the container as this would allow it to, amongst other things, reboot the host.
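The fstab above can be generated for any root filesystem path. This is a sketch with hypothetical function names; it also creates the mount point directories inside the root filesystem, since mount(8) needs its target directories to exist.

```shell
#!/bin/bash
# Hypothetical helper: print the container fstab above for ROOTFS.
write_container_fstab() {
  local rootfs=$1 entry dir fstype
  if [ -z "$rootfs" ]; then
    echo "usage: write_container_fstab ROOTFS" >&2
    return 1
  fi
  # "directory-under-rootfs filesystem-type" pairs from the example above
  for entry in "dev/pts devpts" "proc proc" "sys sysfs" "dev/shm tmpfs"; do
    read -r dir fstype <<< "$entry"
    printf 'none %s/%s %s defaults 0 0\n' "$rootfs" "$dir" "$fstype"
  done
}

# Create the mount point directories inside the root filesystem.
make_container_mountpoints() {
  local rootfs=$1
  [ -n "$rootfs" ] || return 1
  mkdir -p "$rootfs/dev/pts" "$rootfs/proc" "$rootfs/sys" "$rootfs/dev/shm"
}
```

For example: write_container_fstab /srv/lxc/web1 > /etc/lxc/web1.fstab, then make_container_mountpoints /srv/lxc/web1 (both paths are examples).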
Container Creation and Destruction
Creation
lxc-create -f $CONTAINER_CONFIGPATH -n $CONTAINER_NAME
lxc-create will create /var/lib/lxc/$CONTAINER_NAME with a new copy of the container configuration file found in $CONTAINER_CONFIGPATH.
As such, if you need to make modifications to the container's configuration file, it's advisable to modify only the original file and then perform lxc-destroy and lxc-create operations afterwards. No data will be lost by doing this.
Note : When copying the file over, lxc-create will strip all comments from the file.
Note : As of lxc-git from at least 2009-12-01, lxc-create no longer splits the config file into multiple files and folders, so the configuration file is the only thing to worry about.
Destruction
lxc-destroy -n $CONTAINER_NAME
This will delete /var/lib/lxc/$CONTAINER_NAME which only contains configuration files. No data will be lost.
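Since configuration changes go through an lxc-destroy / lxc-create cycle, you may want a small wrapper. This is a hypothetical sketch built only from the two commands shown above; per the notes above, only the copy in /var/lib/lxc/$CONTAINER_NAME is replaced and the container's root filesystem is untouched.

```shell
#!/bin/bash
# Hypothetical wrapper: re-create a container after its original
# configuration file has been edited.
recreate_container() {
  local name=$1 config=$2
  if [ -z "$name" ] || [ ! -r "$config" ]; then
    echo "usage: recreate_container NAME CONFIG_FILE (file must be readable)" >&2
    return 1
  fi
  # A failure here (e.g. the container was never created) is not fatal
  lxc-destroy -n "$name"
  lxc-create -f "$config" -n "$name"
}
```

For example: recreate_container web1 /etc/lxc/web1.conf (hypothetical name and path).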
Readying the host for virtualization
/etc/inittab
- Comment out any gettys that are not required
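Editing inittab by hand works fine, but the change can also be scripted. This hypothetical helper comments out every getty line except those for the ttys you list; it assumes the common id:runlevels:action:process layout with the tty name (e.g. tty1) appearing as its own word in the getty command. Keep a getty on each tty you intend to reach with lxc-console.

```shell
#!/bin/bash
# Hypothetical helper: comment out getty entries in an inittab file,
# except those for the ttys given as extra arguments (e.g. tty1).
comment_out_gettys() {
  local inittab=$1; shift
  local keep=" $* " tmp
  tmp=$(mktemp) || return 1
  awk -v keep="$keep" '
    # Only uncommented lines that mention getty
    /^[^#]/ && /getty/ {
      tty = ""
      n = split($0, words, "[ \t]+")
      for (i = 1; i <= n; i++)
        if (words[i] ~ /^tty[0-9]+$/) tty = words[i]
      # Comment out unless this tty is in the keep list
      if (tty == "" || index(keep, " " tty " ") == 0) { print "#" $0; next }
    }
    { print }
  ' "$inittab" > "$tmp" && cat "$tmp" > "$inittab"
  rm -f "$tmp"
}
```

For example, with lxc.tty = 1: comment_out_gettys /path/to/rootfs/etc/inittab tty1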
/etc/rc.sysinit replacement
Since we're running in a virtual environment, a number of steps undertaken by rc.sysinit are superfluous and may even flat out fail or stall. As such, until the initscripts are made virtualization aware, this'll take some hack and slash.
For now, simply replace the file :
#!/bin/bash
# Whatever is needed to clean out old daemon/service pids from your container
find /var/run -name '*pid' -type f -delete
rm -f /var/lock/subsys/*

# Configure network settings
## You can either use dhcp here, manually configure your
## interfaces or try to get the rc.d/network script working.
## There have been reports that network failed in this
## environment.
route add default gw 192.168.10.1
echo "search your-domain" > /etc/resolv.conf
echo "nameserver 192.168.10.1" >> /etc/resolv.conf

# Initially we don't have any container originated mounts
rm -f /etc/mtab
touch /etc/mtab
/etc/rc.conf cleanup
You may want to remove any and all hardware related daemons from the DAEMONS line. Furthermore, depending on your situation, you may also want to remove the network daemon.
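Trimming the DAEMONS line can be scripted as well. The function below is a hypothetical sketch: it assumes the whole DAEMONS=( ... ) array sits on one line, and it matches names with or without the @ (background) and ! (disabled) prefixes used in rc.conf.

```shell
#!/bin/bash
# Hypothetical helper: remove the named daemons from the DAEMONS=(...)
# line of an rc.conf file, in place.
remove_daemons() {
  local rcconf=$1; shift
  local re='^DAEMONS=\(([^)]*)\)(.*)$'
  local tmp line d r skip rest
  local -a words kept
  tmp=$(mktemp) || return 1
  while IFS= read -r line || [ -n "$line" ]; do
    if [[ $line =~ $re ]]; then
      rest=${BASH_REMATCH[2]}
      read -ra words <<< "${BASH_REMATCH[1]}"
      kept=()
      for d in "${words[@]}"; do
        skip=0
        for r in "$@"; do
          # Compare without the @ (background) or ! (disabled) prefix
          if [ "${d#[@!]}" = "$r" ]; then skip=1; fi
        done
        if [ "$skip" -eq 0 ]; then kept+=("$d"); fi
      done
      line="DAEMONS=(${kept[*]})${rest}"
    fi
    printf '%s\n' "$line"
  done < "$rcconf" > "$tmp" && cat "$tmp" > "$rcconf"
  rm -f "$tmp"
}
```

For example: remove_daemons /path/to/rootfs/etc/rc.conf hal network (the daemon names are only examples).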