Mirror
The Computer Science Club runs a public mirror (mirror.csclub.uwaterloo.ca) on potassium-benzoate.
We are listed on the ResNet "don't count" list, so downloading from our mirror will not count against one's ResNet quota.
Software Mirrored
A list of current archives (and their respective disk usage) is listed on our mirror's homepage at mirror.csclub.uwaterloo.ca.
Mirroring Requests
Requests to mirror a particular distribution or archive should be made to syscom@csclub.uwaterloo.ca.
Implementation Details
Syncing
Storage
All of our projects are stored on one of two zfs zpools. There are 8 drives per array, configured as raidz2, and there is an additional drive that can be swapped in (in the event of a disk failure).
/mirror/root/.cscmirror1
/mirror/root/.cscmirror2
Each project is given a filesystem under one of the two pools. Symlinks are created /mirror/root
to point to the correct pool and file system.
Merlin
The synchronization process is run by a Python script called "merlin", written by a2brenna. The script is stored in ~mirror/merlin
.
The list of repositories and their configuration (synch frequency, location, etc.) is configured in merlin.py
.
To view the sync status, execute ~mirror/merlin/arthur.py status
. To force the sync of a project, execute ~mirror/merlin/arthur.py sync:PROJECT_NAME
.
Push Sync
Some projects support push syncing via SSH.
We are running a special SSHD instance on mirror.csclub.uwaterloo.ca:22. This instance has been locked down, with the following settings:
- Only SSH key authentication
- Only users of the
push
group (exceptmirror
) are allowed to connect - X11 Forwarding, TCP Forwarding, Agent Forwarding, User RC and TTY are disabled
- Users are chrooted to
/mirror/merlin
Most projects will connect using the push
user. The SSH authorized keys file is located at /home/push/.ssh/authorized_keys
. An example entry is:
restrict,no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty,command="arthur sync:ubuntu >/dev/null 2>/dev/null </dev/null &",from="XXX.XXX.XXX.XXX" ssh-rsa ...
Sync Scripts
Our collection of synchronization scripts are located in ~mirror/bin
. They currently include:
csc-sync-apache
csc-sync-debian
csc-sync-debian-cd
csc-sync-gentoo
csc-sync-ssh
csc-sync-standard
Most of these scripts take the following parameters:
local_dir rsync_host rsync_dir
HTTP(s)
We use nginx as our webserver.
Index
An index of the archives we mirror is available at mirror.csclub.uwaterloo.ca.
As of Winter 2010, it is now generated by a Python script in ~mirror/mirror-index
.
~mirror/mirror-index/make-index
is scheduled in /etc/cron.d/csc-mirror
to be run at 5:40am on the 14th and 28th of each month. The script can be run manually when needed (for example, when the archive list is updated) by running:
sudo -u mirror /home/mirror/mirror-index/make-index.py
This causes an instance of du
which computes the size of each directory. This list is then sorted alphabetically by directory name and returned to the Python script. If any errors occur during this process, the script conservatively chooses to exit rather than risk generating an index file that is incorrect.
make-index.py
is configured by means of a YAML file, config.yaml
, in the same directory. Its format is as follows:
docroot: /mirror/root duflags: --human-readable --max-depth=1 output: /mirror/root/index.html exclude: - include - lost+found - pub # (...) directories: apache: site: apache.org url: http://www.apache.org/ archlinux: site: archlinux.org url: http://www.archlinux.org/ # (...)
The docroot is the directory which is to be scanned; this will probably always be the mirror root from which Apache serves. duflags
specifies the flags to be passed to du
. This is here so that it's easy to find and alter. For instance, we could change --human-readable
to --si
if we ever decided that, like hard disk manufacturers, we want sizes to appear larger than they are. output
defines the file to which the generated index will be written.
exclude
specifies the list of directories which will not be included in the generated index page (since, by default, all folders are included in the generated index page).
Finally, directories
specifies the list of directories to be listed. The format is fairly straightforward: simply name the directory and provide a site (the display name in the "Project Site" column) and URL. One caveat here is that YAML does not allow tabs for whitespace. Indent with two spaces to remain consistent with the existing file format, please. Also note that the directory name is case-sensitive, as is always the case on Unix.
Finally, the HTML index file is generated from index.mako
, a Mako template (which is mostly HTML anyhow). If you really can't figure out how it works, look up the Mako documentation.
FTP
We use proftpd (standalone daemon) as our FTP server.
To increase performance, we disable DNS lookups in proftpd.conf
:
UseReverseDNS off IdentLookups off
We also limit the amount of CPU/memory resources used (e.g. to minimize Globbing resources):
RLimitCPU session 10 RLimitMemory session 4096K
We allow a maximum of 500 concurrent FTP sessions:
MaxInstances 500 MaxClients 500
The contents of /mirror/root/include/motd.msg
are displayed when a user connects.
rsync
We use rsyncd
(standalone daemon).
We disable compression and checksumming in rsyncd.conf
:
dont compress = * refuse options = c delete
The contents of /mirror/root/include/motd.msg
are displayed when a user connects.