Virtualization (LXC Containers)
Latest revision as of 23:23, 28 October 2021
UPDATE: as of Debian buster and later, many systemd services will break horribly in LXC due to namespacing issues. I suggest using either Podman or systemd-nspawn instead.
As of Fall 2009, we use Linux containers to maintain virtual machines, most notably caffeine, which is hosted on glomag. The various commands to manipulate Linux containers are prefixed with "lxc-"; see their individual manpages for usage.
Management Quick Guide
To manage containers, use the lxc-* tools, which require root privilege. Some examples (replace caffeine with the appropriate container name):
# check if caffeine is running
lxc-info -n caffeine
# start caffeine in the background
lxc-start -d -n caffeine
# stop caffeine gracefully
lxc-halt -n caffeine
# stop caffeine forcefully
lxc-stop -n caffeine
# launch a TTY console for the container
lxc-console -n caffeine
To install Linux container support on a recent Debian (squeeze or newer) system:
- Install the lxc and bridge-utils packages.
- Create a bridged network interface, usually called br0. It can be configured in /etc/network/interfaces as though it were a normal Ethernet device, with the additional bridge_ports parameter, or created manually with brctl. LXC will create a virtual Ethernet device and add it to the bridge when each container starts.
To start caffeine, run the following command as root on glomag:
lxc-start -d -n caffeine
Containers are stored on the host filesystem in /var/lib/lxc (root filesystems are symlinked to the appropriate directory on /vm).
ehashman's Guide to LXC on Debian
Configuring the host machine
First, install all required packages:
# apt-get install lxc bridge-utils
Setting up ethernet bridging
Next, create an ethernet bridge for the container. Edit /etc/network/interfaces:
# The primary network interface
#auto eth0
#iface eth0 inet static
#    address 129.97.134.200
#    netmask 255.255.255.0
#    gateway 129.97.134.1

# Bridge ethernet for containers
auto br0
iface br0 inet static
    bridge_ports eth0
    address 129.97.134.200
    netmask 255.255.255.0
    gateway 129.97.134.1
    dns-nameservers 129.97.2.1 129.97.2.2
    dns-search wics.uwaterloo.ca uwaterloo.ca
Cross your fingers and restart networking for your configuration to take effect!
// Afterwards, press Enter to see whether you lost connectivity and have to make a machine room trip
# ifdown br0 && ifup br0
Note: !!! Do not use !!!
# service networking restart
The init scripts are broken and this likely will result in a machine room trip (or IPMI power cycle).
Setting up storage
Next, allocate some space in your volume group to put the container root on:
// Find the correct volume group to put the container on
# vgdisplay
// Create the volume in the appropriate volume group
# lvcreate -L 20G -n container vg0
// Find it in the dev mapper
# ls /dev/mapper/
// Create a filesystem on it
# mkfs.ext4 /dev/mapper/vg0-container
// Add a mount point
# mkdir /vm/container
Last, add it to /etc/fstab:
/dev/mapper/vg0-container /vm/container ext4 defaults 0 2
Test the entry with mount:
# mount /vm/container
Now you're done!
Creating a new container
Create a new container using lxc-create:
// Create new container "container" with root fs located at /vm/container
# lxc-create --dir=/vm/container -n container --template download
This will prompt you for distribution, release, and architecture. (Architecture must match host machine.)
Take this time to review its config in /var/lib/lxc/container/config, and tell it to auto-start if you like:
# Auto-start the container on boot
lxc.start.auto = 1
You'll also want to set up networking (if applicable):
# Networking
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = br0
lxc.network.name = eth0
lxc.network.hwaddr = de:ad:be:ef:ba:be # or something sensible
Now list the containers to ensure that yours has been successfully created; it should appear in the output:
// List containers, -f for fancy
# lxc-ls -f
You can also list its root directory if you like. To start it in the background and obtain a root shell, do
// Start and attach a root shell
# lxc-start -d -n container
# lxc-attach -n container
Migrating a container between hosts
Start by shutting the container down:
root@container:~# halt
Then make a tarball of the container's filesystem:
# tar --numeric-owner -czvf container.tar.gz /vm/container
Copy it to its target destination, along with the configs:
$ scp container.tar.gz new-host:
$ scp -r /var/lib/lxc/container/ new-host:/var/lib/lxc/
Now carefully extract it. If you haven't already, provision storage and ethernet per the container creation section.
Yes, we really do want to stick it directly into /:
# tar --numeric-owner -xzvf container.tar.gz -C /
If you are migrating from an old version of LXC onto a newer one (e.g. migrating onto xylitol), update the config:
# lxc-update-config -c /vm/container/config
This will also create a config.backup; you should inspect the new config file to make sure the migration was successful.
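For context, one change relevant to the configs in this guide is that LXC 2.1 renamed the legacy lxc.network.* keys to indexed lxc.net.<n>.* keys. The following sketch illustrates that renaming for the five network keys used earlier on this page only. The function name is made up for this example, and it is not a substitute for lxc-update-config, which handles many more keys.

```shell
# rename_network_keys: read an LXC config on stdin and print it with the five
# legacy network keys used in this guide renamed to their lxc.net.0.* form.
# Hypothetical helper for illustration only; use lxc-update-config for real
# migrations.
rename_network_keys() {
    sed -E 's/^([[:space:]]*)lxc\.network\.(type|flags|link|name|hwaddr)([[:space:]]*=)/\1lxc.net.0.\2\3/'
}

# Example (prints the converted config without modifying the file):
#   rename_network_keys < /var/lib/lxc/container/config
```

Comparing this output with the file written by lxc-update-config is one way to spot-check the network section of the migrated config.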
Verify the container's existence:
# lxc-ls -f
NAME       STATE    IPV4  IPV6  AUTOSTART
-----------------------------------------
container  STOPPED  -     -     YES
Now just start it on up:
# lxc-start -d -n container
And test by trying to ssh in!
merenber's guide to unprivileged LXC containers
Prerequisite reading: https://wiki.debian.org/LXC#Privileged_Vs._Unprivileged_Containers
With unprivileged containers, UIDs and GIDs in the container map to a different set of UIDs/GIDs on the host. This is very important if you wish to use nested virtualization (i.e. container inside a container), because it is dangerous to use nested virtualization in a privileged container (see https://github.com/lxc/lxd/issues/7085). The following is a guide to setting up unprivileged containers with cgroup delegation, i.e. processes inside the container can create new cgroups. This is useful if, for example, you wish to run Docker in an LXC container. If you do not need cgroup delegation, just ignore the cgroup-specific steps.
First, we need to enable unprivileged user namespaces, which are disabled by default on Debian. Add the following line to /etc/sysctl.conf:
kernel.unprivileged_userns_clone = 1
Then run:
sysctl -p
Now we are going to create a dummy user under which the unprivileged containers will run. Note that it is possible to create unprivileged containers as root (see https://linuxcontainers.org/lxc/getting-started/#creating-unprivileged-containers-as-root); however, I wasn't able to get it to work: lxc-start hit some kind of permissions error when it tried to mount the rootfs. If you do find a way, please add the instructions here.
useradd -s /bin/bash -m lxcuser0
Make sure that the newly created user has subuid and subgid entries:
cat /etc/subuid
cat /etc/subgid
For example, /etc/subuid could look like the following:
lxcuser0:100000:65536
This means that UIDs 0-65535 in the container will be mapped to UIDs 100000-165535 on the host.
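The mapping arithmetic can be sketched as a small shell function. This is only an illustration: the function name is made up for this example, and it assumes a single subuid entry in the usual name:start:count form.

```shell
# map_container_uid ENTRY CONTAINER_UID
# Given one /etc/subuid-style entry ("name:start:count"), print the host UID
# that CONTAINER_UID maps to, or fail if it is outside the mapped range.
# Hypothetical helper, for illustration only.
map_container_uid() {
    local entry="$1" cuid="$2" rest start count
    rest=${entry#*:}      # drop the user name   -> "100000:65536"
    start=${rest%%:*}     # first host UID       -> "100000"
    count=${rest#*:}      # number of UIDs       -> "65536"
    if [ "$cuid" -lt 0 ] || [ "$cuid" -ge "$count" ]; then
        echo "UID $cuid is outside the mapped range 0-$((count - 1))" >&2
        return 1
    fi
    echo $((start + cuid))
}

map_container_uid 'lxcuser0:100000:65536' 0       # container root
map_container_uid 'lxcuser0:100000:65536' 65535   # last mapped UID
```

The two calls print 100000 and 165535, matching the range described above. The same arithmetic applies to GIDs and /etc/subgid.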
Next, make sure that the user is allowed to create new veth interfaces:
echo 'lxcuser0 veth br0 10' >> /etc/lxc/lxc-usernet
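Each line of lxc-usernet has four fields: the user, the interface type, the bridge, and the maximum number of interfaces of that type the user may create. Running the echo command twice would leave a duplicate line, so a slightly more careful variant is sketched below. The function name is made up for this example, and the file path is a parameter so that it can be tried on a scratch file first.

```shell
# allow_veth FILE USER BRIDGE COUNT
# Append "USER veth BRIDGE COUNT" to FILE unless that exact line already exists.
# Hypothetical helper, for illustration only.
allow_veth() {
    local file="$1" entry="$2 veth $3 $4"
    # -x: match whole lines only, -F: treat the entry as a fixed string
    grep -qxF "$entry" "$file" 2>/dev/null || echo "$entry" >> "$file"
}

# Example (as root):
#   allow_veth /etc/lxc/lxc-usernet lxcuser0 br0 10
```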
Create a new LVM logical volume, as shown in the guide above (replace 'gitlabrunner' with the container name):
lvcreate -L 10G -n gitlabrunner xylitol-raidten
mkfs.ext4 /dev/mapper/xylitol--raidten-gitlabrunner
mkdir /vm/gitlabrunner
Add the following to /etc/fstab:
/dev/mapper/xylitol--raidten-gitlabrunner /vm/gitlabrunner ext4 defaults 0 2
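Next, the root directory of the new filesystem needs to be owned by the UID and GID which will be root inside the container (here, 100000), so that the volume is mounted with that ownership. We will use debugfs to set this directly on the unmounted device:
debugfs -w -R 'set_inode_field . uid 100000' /dev/mapper/xylitol--raidten-gitlabrunner
debugfs -w -R 'set_inode_field . gid 100000' /dev/mapper/xylitol--raidten-gitlabrunner
See this post if you're interested in knowing what these commands are doing: https://unix.stackexchange.com/questions/586874/mounting-as-user-a-loop-still-assigns-root-ownership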
Now we're ready to mount the volume:
mount /vm/gitlabrunner
Use ls to make sure that the volume was indeed mounted as the subuid root, not the real root:
ls -ld /vm/gitlabrunner
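If you would rather have a scriptable check than read the ls output by eye, something along these lines should work. It is a sketch that assumes GNU stat (which is what Debian ships); the function name is made up for this example.

```shell
# owned_by PATH UID GID
# Succeed if PATH is owned by exactly UID:GID (numeric IDs).
# Relies on GNU stat's -c format option. Hypothetical helper, for illustration.
owned_by() {
    [ "$(stat -c '%u:%g' "$1")" = "$2:$3" ]
}

# Example:
#   owned_by /vm/gitlabrunner 100000 100000 && echo "ownership OK"
```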
Now switch to the dummy user and copy the default LXC conf file:
su - lxcuser0
cp /etc/lxc/default.conf .
Add the following lines to your copy of default.conf (replace the values with whatever you found in /etc/subuid and /etc/subgid, respectively):
lxc.idmap = u 0 100000 65536
lxc.idmap = g 0 100000 65536
This is necessary to create the rootfs with the correct file ownerships.
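To avoid copying the numbers by hand, the two lxc.idmap lines can be generated from the subuid and subgid files. The sketch below assumes the user has exactly one entry in each file (it uses the first match). The function name is made up for this example.

```shell
# idmap_lines USER [SUBUID_FILE] [SUBGID_FILE]
# Print the lxc.idmap lines for USER, based on the first matching entry in
# each file. Hypothetical helper, for illustration only.
idmap_lines() {
    local user="$1" subuid_file="${2:-/etc/subuid}" subgid_file="${3:-/etc/subgid}"
    awk -F: -v u="$user" '$1 == u { print "lxc.idmap = u 0 " $2 " " $3; exit }' "$subuid_file"
    awk -F: -v u="$user" '$1 == u { print "lxc.idmap = g 0 " $2 " " $3; exit }' "$subgid_file"
}

# Example (run in the directory holding your copy of default.conf):
#   idmap_lines lxcuser0 >> default.conf
```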
Now, as the dummy user, open a tmux or screen session. This is necessary to avoid a TTY permission error (see https://github.com/lxc/lxc/issues/3163).
Inside the tmux session, run the following:
lxc-create -f default.conf -t download -n gitlabrunner --dir=/vm/gitlabrunner -- -d debian -r buster -a amd64
Now exit from the tmux session, and open ~/.local/share/lxc/gitlabrunner/config. Add the following lines to it:
lxc.include = /usr/share/lxc/config/nesting.conf
lxc.mount.auto = proc:mixed sys:ro cgroup:mixed
lxc.apparmor.profile = unconfined
lxc.start.auto = 1
Now switch back to the root user and install the cgroup tools:
apt install cgroup-tools
The idea is to create a new cgroup under which the unprivileged container will run, so that Docker (inside the container) can create new cgroups as necessary. Note: this method uses cgroups v1, which are going away soon. I tried to use cgroups v2 but I kept on running into some cgroup permissions error. If you figure out a way to use cgroups v2, please update the instructions here.
Paste the following into /root/bin/lxc-unprivileged-autostart.sh:
#!/bin/bash
CGROUP_OWNER=lxcuser0
CGROUP_NAME=lxcgroup0
CGROUP_CONTROLLERS=rdma,cpuset,memory,perf_event,devices,pids,blkio,freezer,net_cls,net_prio,cpu,cpuacct
NUM_CPUS=2
cgcreate \
    -t $CGROUP_OWNER:$CGROUP_OWNER \
    -a $CGROUP_OWNER:$CGROUP_OWNER \
    -g "$CGROUP_CONTROLLERS:$CGROUP_NAME"
# cpuset controller needs to be initialized
echo 0 > /sys/fs/cgroup/cpuset/$CGROUP_NAME/cpuset.mems
echo 0-$(( $NUM_CPUS - 1 )) > /sys/fs/cgroup/cpuset/$CGROUP_NAME/cpuset.cpus
su -c "cgexec -g '$CGROUP_CONTROLLERS:$CGROUP_NAME' lxc-autostart" $CGROUP_OWNER
Change NUM_CPUS to however many CPU cores you wish to be available inside the container. You may have noticed that hugetlb is missing from the list of controllers; this is because the hugetlb controller is not mounted on xylitol as of this writing. Feel free to add it to the list if this is no longer the case.
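One way to find out which controllers are mounted on a given host is to check for the per-controller directories under /sys/fs/cgroup. The sketch below assumes the usual cgroups v1 layout, where each mounted controller (or a symlink to it) appears there. The function name is made up for this example.

```shell
# mounted_controllers LIST [ROOT]
# Print the comma-separated subset of LIST (itself comma-separated) whose
# cgroup v1 hierarchy exists as a directory under ROOT.
# Hypothetical helper, for illustration only.
mounted_controllers() {
    local list="$1" root="${2:-/sys/fs/cgroup}" result="" old_ifs="$IFS" c
    IFS=,
    for c in $list; do
        if [ -d "$root/$c" ]; then
            result="${result:+$result,}$c"
        fi
    done
    IFS=$old_ifs
    echo "$result"
}

# Example:
#   mounted_controllers cpuset,memory,hugetlb
```

If hugetlb shows up in the output on your host, it can be added to CGROUP_CONTROLLERS in the script.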
Make the script executable:
chmod +x /root/bin/lxc-unprivileged-autostart.sh
Now paste the following into /etc/systemd/system/lxc-unprivileged-autostart.service:
[Unit]
Description=Autostart unprivileged LXC Containers
Requires=lxc.service
After=lxc.service

[Service]
Type=oneshot
ExecStart=/root/bin/lxc-unprivileged-autostart.sh
RemainAfterExit=true

[Install]
WantedBy=multi-user.target
Then run:
systemctl daemon-reload
systemctl enable lxc-unprivileged-autostart.service
systemctl start lxc-unprivileged-autostart.service
Now switch to the dummy user and make sure that the container is running:
su - lxcuser0
lxc-ls -f
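For use in scripts, the state can also be extracted from lxc-info. The sketch below assumes that lxc-info -s prints a line of the form "State: RUNNING", which is what it does on the versions I have seen; check the output on your host before relying on it. The function name is made up for this example.

```shell
# container_state: read `lxc-info -n NAME -s` output on stdin and print the
# state word (e.g. RUNNING or STOPPED).
# Assumes a "State: <STATE>" line; hypothetical helper, for illustration only.
container_state() {
    awk '$1 == "State:" { print $2 }'
}

# Example (as lxcuser0):
#   [ "$(lxc-info -n gitlabrunner -s | container_state)" = RUNNING ] \
#       || echo "gitlabrunner is not running" >&2
```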
Run Docker in the unprivileged container
We are going to use the fuse-overlayfs storage driver because unprivileged users cannot mount overlay directories on Debian. It is not as fast as the overlay driver, but it is much better than vfs. Make sure to read the Docker rootless mode documentation (https://docs.docker.com/engine/security/rootless/). Next, insert the fuse module and make sure it loads automatically at boot time:
modprobe fuse
echo fuse >> /etc/modules
Stop the container, and add the following line to /home/lxcuser0/.local/share/lxc/gitlabrunner/config:
lxc.mount.entry = /dev/fuse dev/fuse none bind,create=file 0 0
(source: https://discuss.linuxcontainers.org/t/security-of-fuse-overlayfs-with-lxc-unprivileged/10145)
Then, start the container (using systemctl, as shown above), attach to it, and install fuse-overlayfs.
Next, install Docker (see https://docs.docker.com/engine/install/debian/).
Insert the following into /etc/docker/daemon.json:
{
    "storage-driver": "fuse-overlayfs"
}
Restart docker (systemctl restart docker).
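To double-check the setting, the configured driver can be read back from daemon.json. The sketch below is a plain text match rather than a real JSON parser, so it assumes the key and value sit on one line as in the example above. The function name is made up for this example.

```shell
# configured_storage_driver [FILE]
# Print the value of "storage-driver" found in a Docker daemon.json file.
# Simple text match, not a JSON parser; hypothetical helper, for illustration.
configured_storage_driver() {
    local file="${1:-/etc/docker/daemon.json}"
    sed -n 's/.*"storage-driver"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$file"
}

# Once Docker has been restarted, the driver actually in use can be compared with:
#   docker info --format '{{.Driver}}'
```

If the two values differ, Docker most likely failed to load the configured driver, for example because /dev/fuse is missing inside the container.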
Finally, run docker run --rm hello-world to make sure that everything is working correctly.