Virtualization (LXC Containers)

From CSCWiki

Latest revision as of 23:23, 28 October 2021

UPDATE: as of Debian buster and later, many systemd services will break horribly in LXC due to namespacing issues. I suggest using either Podman or systemd-nspawn instead.

As of Fall 2009, we use Linux containers to maintain virtual machines, most notably caffeine, which is hosted on glomag. The various commands to manipulate Linux containers are prefixed with "lxc-"; see their individual manpages for usage.

Management Quick Guide

To manage containers, use the lxc-* tools, which require root privilege. Some examples (replace caffeine with the appropriate container name):

# check if caffeine is running
lxc-info -n caffeine

# start caffeine in the background
lxc-start -d -n caffeine

# stop caffeine gracefully
lxc-halt -n caffeine

# stop caffeine forcefully
lxc-stop -n caffeine

# launch a TTY console for the container
lxc-console -n caffeine

To install Linux container support on a recent Debian (squeeze or newer) system:

  • Install the lxc and bridge-utils packages.
  • Create a bridged network interface. This can be configured in /etc/network/interfaces as though it were a normal Ethernet device, with the additional bridge_ports parameter. The bridge is usually called br0 (it can also be created manually with brctl). LXC will create a virtual Ethernet device and add it to the bridge when each container starts.
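As a rough sketch (interface names follow the conventions on this page; brctl comes with bridge-utils), the manual equivalent of the interfaces-file configuration would be:

```shell
# Create the bridge, enslave the physical NIC, and bring the bridge up.
# Unlike the /etc/network/interfaces approach, this does not persist
# across reboots.
brctl addbr br0
brctl addif br0 eth0
ip link set br0 up
```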

To start caffeine, run the following command as root on glomag:

lxc-start -d -n caffeine

Containers are stored on the host filesystem in /var/lib/lxc (root filesystems are symlinked to the appropriate directory on /vm).

ehashman's Guide to LXC on Debian

Configuring the host machine

First, install all required packages:

# apt-get install lxc bridge-utils

Setting up ethernet bridging

Next, create an ethernet bridge for the container. Edit /etc/network/interfaces:

# The primary network interface
#auto eth0
#iface eth0 inet static
#       address 129.97.134.200
#       netmask 255.255.255.0
#       gateway 129.97.134.1

# Bridge ethernet for containers
auto br0
iface br0 inet static
    bridge_ports eth0
    address 129.97.134.200
    netmask 255.255.255.0
    gateway 129.97.134.1
    dns-nameservers 129.97.2.1 129.97.2.2
    dns-search wics.uwaterloo.ca uwaterloo.ca

Cross your fingers and restart networking for your configuration to take effect!

# ifdown br0 && ifup br0
// press Enter a few times to see whether you lost connectivity and have to make a machine room trip

Note: do not use

# service networking restart

The init scripts are broken, and using them will likely result in a machine room trip (or an IPMI power cycle).

Setting up storage

Last, allocate some space in your volume group to put the container root on:

// Find the correct volume group to put the container on
# vgdisplay

// Create the volume in the appropriate volume group
# lvcreate -L 20G -n container vg0

// Find it in the dev mapper
# ls /dev/mapper/

// Create a filesystem on it
# mkfs.ext4 /dev/mapper/vg0-container

// Add a mount point
# mkdir /vm/container 

Finally, add it to /etc/fstab:

/dev/mapper/vg0-container /vm/container        ext4    defaults        0       2

Test the entry with mount:

# mount /vm/container

Now you're done!

Creating a new container

Create a new container using lxc-create:

// Create new container "container" with root fs located at /vm/container
# lxc-create --dir=/vm/container -n container --template download

This will prompt you for distribution, release, and architecture. (Architecture must match host machine.)
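If you would rather skip the prompts, the download template also accepts the distribution, release, and architecture after a -- separator (the same flags appear in the unprivileged-container guide further down; the release name below is only an example — pick one that is still supported):

```shell
// Non-interactive equivalent of the interactive prompts
# lxc-create --dir=/vm/container -n container --template download -- -d debian -r buster -a amd64
```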

Take this time to review its config in /var/lib/lxc/container/config, and tell it to auto-start if you like:

# Auto-start the container on boot
lxc.start.auto = 1

You'll also want to set up networking (if applicable):

# Networking
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = br0
lxc.network.name = eth0
lxc.network.hwaddr = de:ad:be:ef:ba:be  # or something sensible

Now,

// List containers, -f for fancy
# lxc-ls -f

to ensure that your container has been successfully created; it should be listed. You can also list its root directory if you like. To start it in the background and obtain a root shell, do

// Start and attach a root shell
# lxc-start -d -n container
# lxc-attach -n container

Migrating a container between hosts

Start by shutting the container down:

root@container:~# halt

Then make a tarball of the container's filesystem:

# tar --numeric-owner -czvf container.tar.gz /vm/container
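A throwaway demonstration (on scratch files, not a real container) of what --numeric-owner buys you: the archive records raw UID/GID numbers rather than user/group names, so ownership survives even if the destination host maps those names to different IDs.

```shell
# Build a tiny archive in a temp dir and list it verbosely.
demo=$(mktemp -d)
echo hello > "$demo/file"
tar --numeric-owner -czf "$demo/container.tar.gz" -C "$demo" file
# With --numeric-owner, the verbose listing shows the owner as numbers
# (e.g. "1000/1000") instead of names (e.g. "alice/alice").
listing=$(tar --numeric-owner -tvzf "$demo/container.tar.gz")
echo "$listing"
rm -r "$demo"
```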

Copy it to its target destination, along with the configs:

$ scp container.tar.gz new-host:
$ scp -r /var/lib/lxc/container/ new-host:/var/lib/lxc/

Now carefully extract it. If you haven't already, provision storage and ethernet per the container creation section.

Yes, we really do want to stick it directly into /:

# tar --numeric-owner -xzvf container.tar.gz -C /

If you are migrating from an old version of LXC onto a newer one (e.g. migrating onto xylitol), update the config:

# lxc-update-config -c /vm/container/config

This will also create a config.backup; you should inspect the new config file to make sure the migration was successful.

Verify the container's existence:

# lxc-ls -f
NAME       STATE    IPV4  IPV6  AUTOSTART  
-----------------------------------------
container  STOPPED  -     -     YES   

Now just start it on up:

# lxc-start -d -n container

And test by trying an ssh in!

merenber's guide to unprivileged LXC containers

Prerequisite reading: https://wiki.debian.org/LXC#Privileged_Vs._Unprivileged_Containers

With unprivileged containers, UIDs and GIDs in the container map to a different set of UIDs/GIDs on the host. This is very important if you wish to use nested virtualization (i.e. container inside a container), because it is dangerous to use nested virtualization in a privileged container. The following is a guide to setting up unprivileged containers with cgroup delegation, i.e. processes inside the container can create new cgroups. This is useful if, for example, you wish to run Docker in an LXC container. If you do not need cgroup delegation, just ignore the cgroup-specific steps.

First, we need to enable unprivileged user namespaces, which are disabled by default on Debian. Add the following line to /etc/sysctl.conf:

kernel.unprivileged_userns_clone = 1

Then run:

sysctl -p

Now we are going to create a dummy user under which the unprivileged containers will run. Note that it is possible to create unprivileged containers as root; however, I wasn't able to get it to work: lxc-start hit some kind of permissions error when it tried to mount the rootfs. If you do find a way, please add the instructions here.

useradd -s /bin/bash -m lxcuser0

Make sure that the newly created user has subuid and subgid entries:

cat /etc/subuid
cat /etc/subgid

For example, /etc/subuid could look like the following:

lxcuser0:100000:65536

This means that UIDs 0-65535 in the container will be mapped to UIDs 100000-165535 on the host.
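The arithmetic is just an offset into the range from /etc/subuid: host UID = range start + container UID, for container UIDs below the range length. A hypothetical helper (map_uid is not a real tool, purely illustrative):

```shell
# Map a container UID to the host UID it appears as, given a subuid
# range (start, length). Returns nonzero if the UID is outside the range.
map_uid() {
  start=$1; length=$2; container_uid=$3
  if [ "$container_uid" -lt "$length" ]; then
    echo $((start + container_uid))
  else
    return 1
  fi
}

map_uid 100000 65536 0      # container root  -> host UID 100000
map_uid 100000 65536 33     # e.g. www-data   -> host UID 100033
map_uid 100000 65536 65535  # last mapped UID -> host UID 165535
```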

Next, make sure that the user is allowed to create new veth interfaces:

echo 'lxcuser0 veth br0 10' >> /etc/lxc/lxc-usernet

Create a new logical LVM volume, as shown in the guide above (replace 'gitlabrunner' by the container name):

lvcreate -L 10G -n gitlabrunner xylitol-raidten  
mkfs.ext4 /dev/mapper/xylitol--raidten-gitlabrunner
mkdir /vm/gitlabrunner

Add the following to /etc/fstab:

/dev/mapper/xylitol--raidten-gitlabrunner /vm/gitlabrunner ext4 defaults 0 2

Next, we need the volume to be mounted as the UID and GID which will be root inside the container (here, 100000). We will use debugfs to do this:

debugfs -w -R 'set_inode_field . uid 100000' /dev/mapper/xylitol--raidten-gitlabrunner
debugfs -w -R 'set_inode_field . gid 100000' /dev/mapper/xylitol--raidten-gitlabrunner

See this post if you're interested in knowing what these commands are doing.

Now we're ready to mount the volume:

mount /vm/gitlabrunner

Use ls to make sure that the volume was indeed mounted as the subuid root, not the real root:

ls -ld /vm/gitlabrunner

Now switch to the dummy user and copy the default LXC conf file:

su - lxcuser0
cp /etc/lxc/default.conf .

Add the following lines to your copy of default.conf (replace the values with whatever you found in /etc/subuid and /etc/subgid, respectively):

lxc.idmap = u 0 100000 65536
lxc.idmap = g 0 100000 65536

This is necessary to create the rootfs with the correct file ownerships.

Now, as the dummy user, open a tmux or screen session. This is necessary to avoid some weird TTY permission error.

Inside the tmux session, run the following:

lxc-create -f default.conf -t download -n gitlabrunner --dir=/vm/gitlabrunner -- -d debian -r buster -a amd64

Now exit from the tmux session, and open ~/.local/share/lxc/gitlabrunner/config. Add the following lines to it:

lxc.include = /usr/share/lxc/config/nesting.conf
lxc.mount.auto = proc:mixed sys:ro cgroup:mixed
lxc.apparmor.profile = unconfined
lxc.start.auto = 1

Now switch back to the root user and install the cgroup tools:

apt install cgroup-tools

The idea is to create a new cgroup under which the unprivileged container will run, so that Docker (inside the container) can create new cgroups as necessary. Note: this method uses cgroups v1, which are going away soon. I tried to use cgroups v2 but I kept on running into some cgroup permissions error. If you figure out a way to use cgroups v2, please update the instructions here.

Paste the following into /root/bin/lxc-unprivileged-autostart.sh:

#!/bin/bash

CGROUP_OWNER=lxcuser0
CGROUP_NAME=lxcgroup0
CGROUP_CONTROLLERS=rdma,cpuset,memory,perf_event,devices,pids,blkio,freezer,net_cls,net_prio,cpu,cpuacct
NUM_CPUS=2

cgcreate \
  -t $CGROUP_OWNER:$CGROUP_OWNER \
  -a $CGROUP_OWNER:$CGROUP_OWNER \
  -g "$CGROUP_CONTROLLERS:$CGROUP_NAME"
# cpuset controller needs to be initialized
echo 0 > /sys/fs/cgroup/cpuset/$CGROUP_NAME/cpuset.mems
echo 0-$(( $NUM_CPUS - 1 )) > /sys/fs/cgroup/cpuset/$CGROUP_NAME/cpuset.cpus
su -c "cgexec -g '$CGROUP_CONTROLLERS:$CGROUP_NAME' lxc-autostart" $CGROUP_OWNER

Change NUM_CPUS to however many CPU cores you wish to be available inside the container. You may have noticed that hugetlb is missing from the list of controllers; this is because the hugetlb controller is not mounted on xylitol as of this writing. Feel free to add it to the list if this is no longer the case.
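To check which controllers your host actually has (for instance, whether hugetlb is available), you can inspect the kernel's view directly:

```shell
# Each row of /proc/cgroups lists a subsystem, its hierarchy ID, the
# number of cgroups in it, and whether it is enabled; mounted cgroup v1
# hierarchies also show up as directories under /sys/fs/cgroup.
cat /proc/cgroups
ls /sys/fs/cgroup
```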

Make the script executable:

chmod +x /root/bin/lxc-unprivileged-autostart.sh

Now paste the following into /etc/systemd/system/lxc-unprivileged-autostart.service:

[Unit]
Description=Autostart unprivileged LXC Containers
Requires=lxc.service
After=lxc.service

[Service]
Type=oneshot
ExecStart=/root/bin/lxc-unprivileged-autostart.sh
RemainAfterExit=true

[Install]
WantedBy=multi-user.target

Then run:

systemctl daemon-reload
systemctl enable lxc-unprivileged-autostart.service
systemctl start lxc-unprivileged-autostart.service

Now switch to the dummy user and make sure that the container is running:

su - lxcuser0
lxc-ls -f

Run Docker in the unprivileged container

We are going to use the fuse-overlayfs storage driver because unprivileged users cannot mount overlay directories on Debian. It is not as fast as the overlay driver, but it is much better than vfs. Make sure to read this. Next, insert the fuse module and make sure it loads automatically at boot time:

modprobe fuse
echo fuse >> /etc/modules

Stop the container, and add the following line to /home/lxcuser0/.local/share/lxc/gitlabrunner/config:

lxc.mount.entry = /dev/fuse dev/fuse none bind,create=file 0 0

(source: here) Then, start the container (using systemctl, as shown above), attach to it, and install fuse-overlayfs. Next, install Docker. Insert the following into /etc/docker/daemon.json:

{
    "storage-driver": "fuse-overlayfs"
}

Restart docker (systemctl restart docker).
Finally, run docker run --rm hello-world to make sure that everything is working correctly.