Difference between revisions of "Mirror"

From CSCWiki
(updated crontab dump)
(add temporary update message for merlin-go)
 
(61 intermediate revisions by 6 users not shown)
We currently run a public mirror ([http://mirror.csclub.uwaterloo.ca/ mirror.csclub.uwaterloo.ca]) on [[Machine_List#sodium-benzoate|sodium-benzoate]]. We are listed on the ResNet [http://noc.uwaterloo.ca/cn/Stats/resReport "don't count" list] so downloading from our mirror will not count against one's ResNet quota. Requests to mirror a particular distribution or archive should be made to syscom@csclub.uwaterloo.ca. We also have a [http://cacti.csclub.uwaterloo.ca/graph_image.php?action=view&local_graph_id=1560&rra_id=1&graph_height=120&graph_width=440 bandwidth graph] you can look at.
+
The [https://csclub.uwaterloo.ca Computer Science Club] runs a public mirror ([http://mirror.csclub.uwaterloo.ca mirror.csclub.uwaterloo.ca]) on [[Machine_List#potassium-benzoate|potassium-benzoate]].
  
== Archives Mirrored ==
+
''We are listed on the ResNet "don't count" list, so downloading from our mirror will not count against one's ResNet quota.''
  
'''Total Size:''' 4.4 TiB
+
== Software Mirrored ==
  
{| class="wikitable"
+
A list of current archives (and their respective disk usage) is available on our mirror's homepage at [http://mirror.csclub.uwaterloo.ca mirror.csclub.uwaterloo.ca].
!style="width: 12em"| Archive ||style="width: 31em"| Upstream/Listing ||style="width: 5em"| Size || Notes
 
|-
 
| Apache || rsync://rsync.us.apache.org/apache-dist/<br/>http://www.apache.org/mirrors/ || 23 GiB || -
 
|-
 
| Arch Linux || rsync://mirror.rit.edu/archlinux<br/>http://www.archlinux.org/download/<br/>http://wiki.archlinux.org/index.php/Mirrors#Mirror_List || 75 GiB || -
 
|-
 
| Blastwave || rsync://www.ibiblio.org/sun-packages/csw/<br/>http://www.blastwave.org/mirrors.php || 11 GiB || -
 
|-
 
| CentOS || rsync://us-msync.centos.org/CentOS/<br/>http://mirror-status.centos.org/ || 131 GiB || -
 
|-
 
| CPAN || rsync://rsync.nic.funet.fi/CPAN/<br/>http://mirror.cpan.org/ || 6.8 GiB || -
 
|-
 
| CRAN || rsync://cran.r-project.org/CRAN/ || 41 GiB || '''We should get added as an official mirror.'''
 
|-
 
| CTAN || rsync://carroll.aset.psu.edu/ctan/<br/>http://www.dante.de/cgi-bin/ctan/list.cgi<br/>http://www.dante.de/mirmon/ || 16 GiB || -
 
|-
 
| Cygwin || rsync://cygwin.com/cygwin-ftp/<br/>http://www.cygwin.com/mirrors.html || 11 GiB || -
 
|-
 
| Damn Small Linux || rsync://ftp.belnet.be/packages/damnsmalllinux/ || 18 GiB || '''Sent a request to be added.''' Maybe contact the dude listed here: http://www.damnsmalllinux.org/donate.html.
 
|-
 
| Debian || rsync://debian.mirror.rafal.ca/debian/<br/>http://www.debian.org/mirror/list<br/>http://www.de.debian.org/dmc/today/ || 446 GiB || Requested to be added to the ftp.ca.debian.org rotation; will be added if the need arises.
 
|-
 
| Debian-backports || rsync://www.backports.org/backports.org/<br/>http://www.backports.org/debian/README.mirrors.html || 30 GiB || Submitted a request to be added to the mirror list long ago with no response.
 
|-
 
| Debian-cd || rsync://cdimage.debian.org/debian-cd/ || 70 GiB || Mirror only the first CD and DVD, and all small CDs (netinst, business-card, etc.).
 
|-
 
| Debian-multimedia || rsync://www.debian-multimedia.org/debian/<br/>http://www.debian-multimedia.org/debian-m.php || 5.3 GiB || -
 
|-
 
| Debian-ports || rsync://ftp.debian-ports.org/debian/ || 124 GiB || -
 
|-
 
| Debian-security || rsync://security.debian.org/debian-security/ || 36 GiB || Debian does not currently list debian-security mirrors, and encourages users to use security.debian.org exclusively.
 
|-
 
| Debian-unofficial || rsync://ftp.debian-unofficial.org/debian/<br/>http://www.debian-unofficial.org/mirrors.html || 970 MiB || -
 
|-
 
| Debian-volatile || rsync://volatile.debian.org/debian-volatile/<br/>http://www.debian.org/volatile/volatile-mirrors || 3.3 GiB || -
 
|-
 
| Eclipse || rsync://download.eclipse.org/eclipseMirror<br/>http://www.eclipse.org/downloads/download.php?file=/ || 226 GiB || -
 
|-
 
| Emdebian || rsync://www.emdebian.org/emdebian/ || 3.7 GiB || Emdebian does not currently list mirrors.
 
|-
 
| Fedora || http://mirrors.fedoraproject.org/publiclist<br/>https://admin.fedoraproject.org/mirrormanager/site/647<br/>rsync://ftp.muug.mb.ca/pub/fedora/linux/core/ || 548 GiB || -
 
|-
 
| FreeBSD || rsync://ftp1.ca.freebsd.org/ || 673 GiB || '''We should get added as an official mirror.'''
 
|-
 
| Gentoo (portage) || rsync://rsync1.us.gentoo.org/gentoo-portage/ || 581 MiB || We are rsync4.ca.gentoo.org.
 
|-
 
| Gentoo (sources) || rsync://masterdistfiles.gentoo.org/gentoo/<br/>http://www.gentoo.org/main/en/mirrors.xml<br/>http://mirrorstats.gentoo.org/ || 159 GiB || See ~sysadmin/passwords/gentoo for rsync password.
 
|-
 
| GNOME || rsync://ftp.gnome.org/gnome/ || 86 GiB || -
 
|-
 
| GNU || rsync://ftp.ibiblio.org/pub/gnu/ftp/gnu/<br/>http://www.gnu.org/order/ftp.html || 20 GiB || -
 
|-
 
| KDE || rsync://master.kde.org/kdeftp/<br/>http://www.kde.org/mirrors/ || 69 GiB || -
 
|-
 
| kernel.org || rsync://kernel.org/pub/linux/<br/>rsync://kernel.org/pub/software/<br/>http://kernel.org/mirrors/countries/html/CA.html || 130 GiB || -
 
|-
 
| Linux Mint (releases) || rsync://ftp.heanet.ie/pub/linuxmint.com/<br/>http://www.linuxmint.com/mirrors.php || 23 GiB || -
 
|-
 
| Linux Mint (packages) || rsync://packages.linuxmint.com/packages/<br/>http://www.linuxmint.com/mirrors.php || 4.1 GiB || -
 
|-
 
| mozdev || rsync://rsync.mozdev.org/mozdev/<br/>http://mirrors.mozdev.org/index.html || 6.0 GiB || Currently in the North American rotation, but could request to be added to the global rotation.
 
|-
 
| mozilla.org || rsync://releases-rsync.mozilla.org/mozilla-releases/<br/>http://www.mozilla.org/mirrors.html || 105 GiB || -
 
|-
 
| MySQL || rsync://mysql.mirrors.pair.com/mysql/<br/>http://dev.mysql.com/downloads/mirrors.html || 148 GiB || -
 
|-
 
| non-GNU || rsync://dl.sv.gnu.org/releases/<br/>http://dl.sv.gnu.org/releases/00_MIRRORS.html<br/>http://download.savannah.gnu.org/mirmon/ || 15 GiB || -
 
|-
 
| Openoffice (extended set) || rsync://rsync.services.openoffice.org/openoffice-extended/<br/>http://distribution.openoffice.org/mirrors/#mirrors<br/>http://www.ooodev.org/mirmon/ || 102 GiB || -
 
|-
 
| OpenSUSE (opensuse-full) || rsync://stage.opensuse.org/opensuse-full/opensuse/<br/>http://mirrors.opensuse.org/list/all.html || 161 GiB || -
 
|-
 
| Slackware || - || 139 GiB || '''We should get added as an official mirror.'''
 
|-
 
| Ubuntu || rsync://archives.ubuntu.com/ubuntu/<br/>https://launchpad.net/ubuntu/+archivemirrors || 286 GiB || '''We used to be the official Canadian mirror (''i.e.'', ca.archive.ubuntu.com); when we get more bandwidth get us added back.'''
 
|-
 
| Ubuntu-ports || rsync://ports.ubuntu.com/ubuntu-ports/ || 394 GiB || -
 
|-
 
| Ubuntu-ports-releases || rsync://cdimage.ubuntu.com/cdimage/ubuntu-ports/releases/ || 36 GiB || -
 
|-
 
| Ubuntu-releases || rsync://releases.ubuntu.com/releases/<br/>https://launchpad.net/ubuntu/+cdmirrors<br/>http://www.ubuntu.com/getubuntu/download<br/>http://www.kubuntu.org/download.php<br/>http://www.edubuntu.org/Download || 39 GiB || We are the official Canadian mirror (''i.e.'', ca.archive.ubuntu.com); ubuntu-releases includes Ubuntu, Kubuntu, and Edubuntu.
 
|-
 
| xorg.freedesktop.org || rsync://xorg.freedesktop.org/xorg-archive/<br/>http://www.x.org/wiki/Releases/Download || 5.3 GiB || -
 
|-
 
| Xubuntu-releases || rsync://cdimage.ubuntu.com/cdimage/xubuntu/releases/<br/>http://www.xubuntu.org/get || 18 GiB || -
 
|-
 
| wine-budgetdedicated || http://wine.budgetdedicated.com/apt/ || 151 MiB || -
 
|}
 
  
== Proposed Archives to Mirror ==
+
=== Mirroring Requests ===
  
* Mandriva (774 GiB)
+
Requests to mirror a particular distribution or archive should be made to [mailto:syscom@csclub.uwaterloo.ca syscom@csclub.uwaterloo.ca].
* OpenBSD (209 GiB)
 
* NetBSD (340 GiB)
 
* PCLinuxOS
 
  
 
== Implementation Details ==
 
  
The mirroring is done by one of two scripts. Both are based on [http://www.debian.org/mirror/anonftpsync anonftpsync]. Various cronjobs are scheduled to call one of these scripts.
+
=== Syncing ===
  
=== csc-sync-debian ===
+
==== Storage ====
  
This is used to sync debian-style repositories. Its usage is:
+
All of our projects are stored on one of three zfs zpools. There are 8 drives per array (7 on cscmirror3), configured as raidz2, plus an additional drive that can be swapped in if a disk fails.
csc-sync-debian local_dir rsync_host rsync_dir [trace_host [trace_dir]]
 
  
If trace_host is specified, then $rsync_dir/project/trace/$trace_host is checked to see if it has changed. If it has, a normal debian-style (two-pass) rsync is done.
+
* <code>/mirror/root/.cscmirror1</code>
 +
* <code>/mirror/root/.cscmirror2</code>
 +
* <code>/mirror/root/.cscmirror3</code>
  
=== csc-sync-standard ===
+
Each project is given a filesystem under one of the three pools. Symlinks are created in <code>/mirror/root</code> to point to the correct pool and filesystem.
  
This is used to sync a tree in a general way. Like anonftpsync, it supports locking and logging. Its usage is:
+
==== Merlin ====
  
csc-sync-standard local_dir rsync_host rsync_dir
+
<nowiki>**</nowiki>'''UPDATE'''**: merlin.py and the sync scripts are currently being merged into merlin-go, which can be found at <code>/home/mirror-go/merlin</code>. The current status can be checked using <code>systemctl status merlin-go.service</code> or by going to <code>/home/mirror-go/merlin/cmd/arthur</code> and running <code>./arthur status</code>. To force a sync of a project, execute <code>./arthur sync:PROJECT_NAME</code>.
  
=== Crontab ===
 
  
All cronjobs are listed in mirror's crontab. If csc-sync-debian is used, the cronjob is typically run bi-hourly. When using csc-sync-standard, the frequency of the cronjob is typically 12 hours. The crontab currently looks like this:
+
The synchronization process is run by a Python script called &quot;merlin&quot;, written by a2brenna. The script is stored in <code>~mirror/merlin</code>.
  
# m  h    dom mon dow command
+
The list of repositories and their configuration (sync frequency, location, etc.) is defined in <code>merlin.py</code>.
 
# make torrents
 
  */10 *  *  *  *    /home/mirror/bin/make-torrents > /dev/null 2> /dev/null
 
 
#
 
# bi-hourly
 
#
 
  5  */2  *  *  *  ~/bin/csc-sync-debian debian debian.mirror.rafal.ca ftp-master.debian.org
 
  5  */2  *  *  *  ~/bin/csc-sync-debian debian-multimedia www.debian-multimedia.org debian marillat.net
 
  5  */2  *  *  *  ~/bin/csc-sync-debian debian-backports www.backports.org backports.org www.backports.org
 
  5  */2  *  *  *  ~/bin/csc-sync-debian debian-volatile volatile-master.debian.org debian-volatile volatile-master.debian.org
 
  5  */2  *  *  *  ~/bin/csc-sync-debian debian-security security.debian.org debian-security security-master.debian.org
 
  5  */2  *  *  *  ~/bin/csc-sync-debian debian-unofficial ftp.debian-unofficial.org debian ftp-master.debian-unofficial.org
 
  5  */2  *  *  *  ~/bin/csc-sync-debian ubuntu archive.ubuntu.com ubuntu drescher.canonical.com
 
  5  */2  *  *  *  ~/bin/csc-sync-standard ubuntu-releases rsync.releases.ubuntu.com releases
 
  5  */2  *  *  *  ~/bin/csc-sync-standard xubuntu-releases cdimage.ubuntu.com cdimage/xubuntu/releases/
 
  5  */2  *  *  *  ~/bin/csc-sync-debian ubuntu-ports ports.ubuntu.com ubuntu-ports drescher.canonical.com
 
  5  */2  *  *  *  ~/bin/csc-sync-debian linuxmint-packages packages.linuxmint.com packages
 
 
#
 
# daily
 
#
 
  5  3,15  *  *  *  ~/bin/csc-sync-debian emdebian www.emdebian.org debian
 
  5  3,15  *  *  *  ~/bin/csc-sync-standard CPAN rsync.nic.funet.fi CPAN
 
  5  3,15  *  *  *  ~/bin/csc-sync-standard CRAN cran.r-project.org CRAN
 
  5  3,15  *  *  *  ~/bin/csc-sync-standard CTAN carroll.aset.psu.edu ctan
 
  5  3,15  *  *  *  ~/bin/csc-sync-standard openoffice rsync.services.openoffice.org openoffice-extended
 
#  5  3,15  *  *  *  ~/bin/csc-sync-standard fedora/epel fedora-archives.ibiblio.org fedora-epel && ~/bin/report_mirror >/dev/null
 
  5  4,16  *  *  *  ~/bin/csc-sync-standard cygwin cygwin.com cygwin-ftp
 
  5  4,16  *  *  *  ~/bin/csc-sync-standard gnu ftp.ibiblio.org pub/gnu/ftp/gnu/
 
  5  4,16  *  *  *  ~/bin/csc-sync-standard nongnu dl.sv.gnu.org releases --ignore-errors
 
  5  4,16  *  *  *  ~/bin/csc-sync-standard kernel.org/linux kernel.org all/linux/
 
  5  4,16  *  *  *  ~/bin/csc-sync-standard kernel.org/software kernel.org all/software/
 
  5  4,16  *  *  *  ~/bin/csc-sync-standard apache rsync.us.apache.org apache-dist
 
  5  4,16  *  *  *  ~/bin/csc-sync-standard eclipse download.eclipse.org eclipseMirror
 
  5  5,17  *  *  *  ~/bin/csc-sync-standard mysql mysql.he.net mysql
 
  5  5,17  *  *  *  ~/bin/csc-sync-standard kde master.kde.org kdeftp
 
  5  5,17  *  *  *  ~/bin/csc-sync-standard mozdev rsync.mozdev.org mozdev
 
  5  5,17  *  *  *  ~/bin/csc-sync-standard blastwave master.rsync.blastwave.org blastwave
 
  5  5,17  *  *  *  ~/bin/csc-sync-standard archlinux mirror.rit.edu archlinux
 
  5  5,17  *  *  *  ~/bin/csc-sync-standard debian-ports ftp.debian-ports.org debian --ignore-errors
 
  5  5,17  *  *  *  ~/bin/csc-sync-standard slackware slackware.cs.utah.edu slackware
 
  5  5,17  *  *  *  ~/bin/csc-sync-debian-cd
 
  5  6,18  *  *  *  ~/bin/csc-sync-standard x.org xorg.freedesktop.org xorg-archive
 
  5  6,18  *  *  *  ~/bin/csc-sync-standard gnome ftp.gnome.org gnome
 
  5  6,18  *  *  *  ~/bin/csc-sync-standard centos us-msync.centos.org CentOS
 
  5  6,18  *  *  *  ~/bin/csc-sync-standard opensuse stage.opensuse.org opensuse-full/opensuse/ #"--exclude distribution/.timestamp_invisible"
 
  5  6,18  *  *  *  ~/bin/csc-sync-standard damnsmalllinux ftp.heanet.ie mirrors/damnsmalllinux.org/
 
  5  7,19  *  *  *  ~/bin/csc-sync-standard FreeBSD ftp1.ca.freebsd.org freebsd
 
#  5  7,19  *  *  *  ~/bin/csc-sync-standard fedora/linux fedora-archives.ibiblio.org fedora-enchilada/linux/ --ignore-errors && ~/bin/report_mirror >/dev/null
 
  5  7,19  *  *  *  ~/bin/csc-sync-standard linuxmint ftp.heanet.ie pub/linuxmint.com/
 
  5  7,19  *  *  *  ~/bin/csc-sync-standard ubuntu-ports-releases cdimage.ubuntu.com cdimage/ports/releases/
 
 
#
 
# other
 
#
 
  29 */4  *  *  *  RSYNC_USER=gentoo RSYNC_PASSWORD=******** ~/bin/csc-sync-standard gentoo-distfiles masterdistfiles.gentoo.org gentoo
 
  15,45 *  *  *  *  ~/bin/csc-sync-standard gentoo-portage rsync1.us.gentoo.org gentoo-portage
 
  5,35 *  *  *  *  ~/bin/csc-sync-standard mozilla.org releases-rsync.mozilla.org mozilla-releases
 
  
=== HTTP ===
+
To view the sync status, execute <code>~mirror/merlin/arthur.py status</code>. To force the sync of a project, execute <code>~mirror/merlin/arthur.py sync:PROJECT_NAME</code>.
  
We use Apache as our web server. Here's a snippet of the worker configuration:
+
===== Push Sync =====
  
<IfModule mpm_worker_module>
+
Some projects support push syncing via SSH.
    ServerLimit          32
 
    ThreadLimit          64
 
    StartServers          2
 
    MaxClients        2048
 
    MinSpareThreads      16
 
    MaxSpareThreads      48
 
    ThreadsPerChild      64
 
    MaxRequestsPerChild  0
 
</IfModule>
 
  
We use the bwbar application to display current bandwidth in the footer of mirror pages.
+
We are running a special SSHD instance on mirror.csclub.uwaterloo.ca:22. This instance has been locked down, with the following settings:
  
We use mod_bw to ensure every connection is at least 100 KiB/s.
+
* Only SSH key authentication
 +
* Only users of the <code>push</code> group (except <code>mirror</code>) are allowed to connect
 +
* X11 Forwarding, TCP Forwarding, Agent Forwarding, User RC and TTY are disabled
 +
* Users are chrooted to <code>/mirror/merlin</code>
 +
 
 +
Most projects will connect using the <code>push</code> user. The SSH authorized keys file is located at <code>/home/push/.ssh/authorized_keys</code>. An example entry is:
 +
 
 +
<pre>
 +
restrict,no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty,command="arthur sync:ubuntu >/dev/null 2>/dev/null </dev/null &",from="XXX.XXX.XXX.XXX" ssh-rsa ...
 +
</pre>
 +
 
 +
==== Sync Scripts ====
 +
 
 +
Our synchronization scripts are located in <code>~mirror/bin</code>. They currently include:
 +
 
 +
* <code>csc-sync-apache</code>
 +
* <code>csc-sync-debian</code>
 +
* <code>csc-sync-debian-cd</code>
 +
* <code>csc-sync-gentoo</code>
 +
* <code>csc-sync-ssh</code>
 +
* <code>csc-sync-standard</code>
 +
 
 +
Most of these scripts take the following parameters:
 +
 
 +
<code>local_dir rsync_host rsync_dir</code>
 +
 
 +
=== HTTP(s) ===
 +
 
 +
We use [https://nginx.org nginx] as our web server.
 +
 
 +
==== Index ====
 +
 
 +
An index of the archives we mirror is available at [http://mirror.csclub.uwaterloo.ca mirror.csclub.uwaterloo.ca].
 +
 
 +
Since Winter 2010, the index has been generated by a Python script in <code>~mirror/mirror-index</code>.
 +
 
 +
<code>~mirror/mirror-index/make-index</code> is scheduled in <code>/etc/cron.d/csc-mirror</code> to be run hourly. The script can be run manually when needed (for example, when the archive list is updated) by running:
 +
 
 +
<code>sudo -u mirror /home/mirror/mirror-index/make-index.py</code>
 +
 
 +
The script iterates over all folders in <code>/mirror/root</code> and identifies the size of each project using `zfs get -H -o value used $dataset`, where $dataset is derived from the project's symlink in <code>/mirror/root</code>. The sizes of all folders are summed to calculate the total size (the total includes hidden projects).
 +
 
 +
<code>make-index.py</code> is configured by means of a [https://yaml.org YAML] file, <code>config.yaml</code>, in the same directory. Its format is as follows:
 +
 
 +
<pre class="yaml">docroot: /mirror/root
 +
output: /mirror/root/index.html
 +
 
 +
exclude:
 +
  - include
 +
  - lost+found
 +
  - pub
 +
# (...)
 +
 
 +
directories:
 +
  apache:
 +
    site: apache.org
 +
    url: http://www.apache.org/
 +
 
 +
  archlinux:
 +
    site: archlinux.org
 +
    url: http://www.archlinux.org/
 +
 
 +
# (...)</pre>
 +
The docroot is the directory to be scanned; this will probably always be the mirror root served by the web server. It is kept in the configuration so that it's easy to find and alter. For instance, we could change <code>--human-readable</code> to <code>--si</code> if we ever decided that, like hard disk manufacturers, we want sizes to appear larger than they are. <code>output</code> defines the file to which the generated index will be written.
 +
 
 +
<code>exclude</code> specifies the list of directories which will not be included in the generated index page (since, by default, all folders are included in the generated index page).
 +
 
 +
<code>directories</code> specifies per-directory metadata. All directories are listed by default, whether or not they appear in this list; only those under <code>exclude</code> are ignored. The format is straightforward: name the directory and provide a site (the display name in the &quot;Project Site&quot; column) and a URL. Note that YAML does not allow tabs for indentation; please indent with two spaces to remain consistent with the existing file format. Also note that directory names are case-sensitive, as is always the case on Unix.
 +
 
 +
Finally, the HTML index file is generated from <code>index.mako</code>, a Mako template (which is mostly HTML anyhow). If you really can't figure out how it works, look up the Mako documentation.
  
 
=== FTP ===
 
  
We use proftpd (standalone daemon) as our ftp server. To increase performance we disable DNS lookups in proftpd.conf:
+
We use [http://www.proftpd.org/ proftpd] (standalone daemon) as our FTP server.
 +
 
 +
To increase performance, we disable DNS lookups in <code>proftpd.conf</code>:
 +
 
 +
<pre>UseReverseDNS          off
 +
IdentLookups            off</pre>
 +
We also limit the CPU and memory resources used per session (e.g. to minimize the resources consumed by [https://en.wikipedia.org/wiki/Globbing globbing]):
 +
 
 +
<pre>RLimitCPU              session 10
 +
RLimitMemory            session 4096K</pre>
 +
We allow a maximum of 500 concurrent FTP sessions:
 +
 
 +
<pre>MaxInstances            500
 +
MaxClients              500</pre>
 +
The contents of <code>/mirror/root/include/motd.msg</code> are displayed when a user connects.
 +
 
 +
=== rsync ===
 +
 
 +
We use <code>rsyncd</code> (standalone daemon).
 +
 
 +
We disable compression and refuse the checksum (<code>-c</code>) and <code>--delete</code> options in <code>rsyncd.conf</code>:
 +
 
 +
<pre>dont compress = *
 +
refuse options = c delete</pre>
 +
The contents of <code>/mirror/root/include/motd.msg</code> are displayed when a user connects.
 +
 
 +
== Mirror Administration ==
 +
 
 +
=== Adding a new project ===
 +
 
 +
# Find the instructions for mirroring the project. Ideally, try to sync directly from the project’s source repository.
 +
#* Note that some projects provide their own sync scripts; however, we generally won’t use them and will use our custom ones instead.
 +
# Create a zfs filesystem to store the project in:
 +
#* Find the pool with the least current disk usage
 +
#* <code>zfs create cscmirror{1,2,3}/$PROJECT_NAME</code>
 +
# Change the folder ownership
 +
#* <code>chown mirror:mirror /mirror/root/.cscmirror{1,2,3}/$PROJECT_NAME</code>
 +
# Create the symlink in <code>/mirror/root</code>
 +
#* <code>ln -s .cscmirror{1,2,3}/$PROJECT_NAME $PROJECT_NAME</code> ('''NOTE''': The symlink must be relative to the <code>/mirror/root</code> directory. If it isn’t, the symlinks will not work when chrooted)
 +
# Repeat the above steps on mirror-dc (connect by running <code>sudo ssh mirror-dc</code> from potassium-benzoate).
 +
# Configure the project in merlin (<code>~mirror/merlin/merlin.py</code>)
 +
#* Select the appropriate sync script (typically <code>csc-sync-standard</code>) and supply the appropriate parameters
 +
# Restart merlin: <code>systemctl restart merlin</code>
 +
# Configure the project in zfssync.yml (<code>~mirror/merlin/zfssync.yml</code>)
 +
#* This will kick off the initial sync
 +
#* Check <code>~mirror/merlin/logs/$PROJECT_NAME</code> for errors, <code>~mirror/merlin/logs/transfer.log</code> for transfer progress
 +
# Update the mirror index configuration (<code>~mirror/mirror-index/config.yaml</code>)
 +
# Add the project to rsync (<code>/etc/rsyncd.conf</code>)
 +
#* Restart rsync with <code>systemctl restart rsync</code>
 +
 
 +
If push mirroring is available/required, see [[#Push_Sync|Push Sync]].
 +
 
 +
=== Secondary Mirror ===
  
UseReverseDNS          off
+
The School of Computer Science's CSCF has provided us with a secondary mirror machine located in DC. This will limit the downtime of mirror.csclub in the event of an outage affecting the MC machine room.
IdentLookups            off
 
  
We also limit the amount of CPU/memory resources used (e.g. to minimize [http://en.wikipedia.org/wiki/Globbing Globbing] resources):
+
==== Keepalived ====
  
RLimitCPU              session 10
+
Mirror's IP addresses (129.97.134.71 and 2620:101:f000:4901:c5c::f:1055) have been configured as VRRP addresses on both machines. Keepalived monitors the nodes and selects the active one.
RLimitMemory            session 4096K
 
  
We allow a maximum of 200 concurrent ftp sessions:
+
Potassium-benzoate has higher priority and will typically be the active node. A node's priority is reduced when nginx, proftpd or rsync is not running. Potassium-benzoate starts with a priority of 100 and mirror-dc starts with a priority of 90 (the higher priority wins).
  
MaxInstances            500
+
When nginx is unavailable (checked with curl), the priority is reduced by 20. When proftpd is unavailable (checked with curl), the priority is reduced by 5. When rsync is unavailable (checked with rsync), the priority is reduced by 15.
MaxClients              500
 
  
=== Rsync ===
+
The Systems Committee should receive an email when the nodes swap roles.
  
We use rsyncd (standalone daemon). We disable compression and checksumming in rsyncd.conf:
+
==== Project synchronization ====
  
dont compress = *
+
Only potassium-benzoate is configured with merlin. mirror-dc has the software components installed, but they are probably not up to date or configured to run correctly.
refuse options = c delete
 
  
For ftp and rsync, the contents of /mirror/root/include/motd.msg are displayed when users connect.
+
When a project sync is complete, merlin will kick off a custom script to sync the zfs dataset to the other node. These scripts live in /usr/local/bin and in ~mirror/merlin.

Latest revision as of 21:37, 4 August 2022

The Computer Science Club runs a public mirror (mirror.csclub.uwaterloo.ca) on potassium-benzoate.

We are listed on the ResNet "don't count" list, so downloading from our mirror will not count against one's ResNet quota.

Software Mirrored

A list of current archives (and their respective disk usage) is available on our mirror's homepage at mirror.csclub.uwaterloo.ca.

Mirroring Requests

Requests to mirror a particular distribution or archive should be made to syscom@csclub.uwaterloo.ca.

Implementation Details

Syncing

Storage

All of our projects are stored on one of three zfs zpools. There are 8 drives per array (7 on cscmirror3), configured as raidz2, plus an additional drive that can be swapped in if a disk fails.

  • /mirror/root/.cscmirror1
  • /mirror/root/.cscmirror2
  • /mirror/root/.cscmirror3

Each project is given a filesystem under one of the three pools. Symlinks are created in /mirror/root to point to the correct pool and filesystem.
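To illustrate the layout above, the following Python sketch maps a project's symlink in /mirror/root to its ZFS dataset name. It assumes each link is relative and has the form .cscmirrorN/PROJECT, and that pool cscmirrorN is mounted at /mirror/root/.cscmirrorN. Both assumptions are consistent with the zfs create and ln -s steps under "Adding a new project". The function names are hypothetical.

```python
import os
import posixpath

MIRROR_ROOT = "/mirror/root"


def dataset_for_link_target(target: str) -> str:
    """Turn a symlink target like '.cscmirror1/debian' into 'cscmirror1/debian'."""
    # Absolute links break inside the chroot, so reject them outright.
    if posixpath.isabs(target):
        raise ValueError(f"symlink target must be relative: {target!r}")
    parts = posixpath.normpath(target).split("/")
    if len(parts) != 2 or not parts[0].startswith(".cscmirror"):
        raise ValueError(f"unexpected symlink target: {target!r}")
    pool_dir, project = parts
    # Drop the leading dot of the mount directory to get the pool name.
    return f"{pool_dir[1:]}/{project}"


def dataset_for_project(project: str, root: str = MIRROR_ROOT) -> str:
    """Read the project's symlink under `root` and return its dataset name."""
    return dataset_for_link_target(os.readlink(os.path.join(root, project)))
```

For example, a link /mirror/root/debian pointing to .cscmirror1/debian resolves to the dataset cscmirror1/debian.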

Merlin

**UPDATE**: merlin.py and the sync scripts are currently being merged into merlin-go, which can be found at /home/mirror-go/merlin. The current status can be checked using systemctl status merlin-go.service or by going to /home/mirror-go/merlin/cmd/arthur and running ./arthur status. To force a sync of a project, execute ./arthur sync:PROJECT_NAME.


The synchronization process is run by a Python script called "merlin", written by a2brenna. The script is stored in ~mirror/merlin.

The list of repositories and their configuration (sync frequency, location, etc.) is defined in merlin.py.

To view the sync status, execute ~mirror/merlin/arthur.py status. To force the sync of a project, execute ~mirror/merlin/arthur.py sync:PROJECT_NAME.
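This page does not document the internal structure of merlin.py. The sketch below is purely a hypothetical illustration of the per-repository configuration described above (sync script, parameters, frequency). It models a repository entry and decides which repositories are due for a sync. The Repo class, its fields, and the due_repos and command_for helpers are invented for this example.

```python
import time
from dataclasses import dataclass


@dataclass
class Repo:
    # Hypothetical fields; merlin.py may organize this differently.
    name: str
    script: str           # e.g. "csc-sync-standard"
    args: tuple           # e.g. (local_dir, rsync_host, rsync_dir)
    interval: int         # seconds between syncs
    last_sync: float = 0.0


def due_repos(repos, now=None):
    """Names of repositories whose interval has elapsed since their last sync."""
    now = time.time() if now is None else now
    return [r.name for r in repos if now - r.last_sync >= r.interval]


def command_for(repo, bin_dir="/home/mirror/bin"):
    """argv for running the repository's sync script."""
    return [f"{bin_dir}/{repo.script}", *repo.args]
```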

Push Sync

Some projects support push syncing via SSH.

We are running a special SSHD instance on mirror.csclub.uwaterloo.ca:22. This instance has been locked down, with the following settings:

  • Only SSH key authentication
  • Only users of the push group (except mirror) are allowed to connect
  • X11 Forwarding, TCP Forwarding, Agent Forwarding, User RC and TTY are disabled
  • Users are chrooted to /mirror/merlin

Most projects will connect using the push user. The SSH authorized keys file is located at /home/push/.ssh/authorized_keys. An example entry is:

restrict,no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty,command="arthur sync:ubuntu >/dev/null 2>/dev/null </dev/null &",from="XXX.XXX.XXX.XXX" ssh-rsa ...
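The example entry above follows a fixed pattern, so it can be generated rather than edited by hand. The Python sketch below reproduces that entry's option list for a given project, source address, and public key; the helper name is hypothetical. Restricting from= to a single IP address is a simplification, since OpenSSH also accepts host patterns there.

```python
import ipaddress
import re


def push_key_entry(project: str, source_ip: str, public_key: str) -> str:
    """Build a locked-down authorized_keys line that can only trigger one sync."""
    # Keep the project name to a safe character set, since it ends up in a command.
    if not re.fullmatch(r"[A-Za-z0-9._-]+", project):
        raise ValueError(f"unsafe project name: {project!r}")
    ipaddress.ip_address(source_ip)  # raises ValueError if not a single IP address
    options = [
        "restrict",
        "no-port-forwarding",
        "no-X11-forwarding",
        "no-agent-forwarding",
        "no-pty",
        f'command="arthur sync:{project} >/dev/null 2>/dev/null </dev/null &"',
        f'from="{source_ip}"',
    ]
    return ",".join(options) + " " + public_key
```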

Sync Scripts

Our synchronization scripts are located in ~mirror/bin. They currently include:

  • csc-sync-apache
  • csc-sync-debian
  • csc-sync-debian-cd
  • csc-sync-gentoo
  • csc-sync-ssh
  • csc-sync-standard

Most of these scripts take the following parameters:

local_dir rsync_host rsync_dir
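This page does not say which rsync options the scripts pass. As a hedged sketch only, the Python function below shows how the three common parameters might map onto an rsync invocation. The option set (-a, -H, --delete-after, --delay-updates, --safe-links, --timeout) is an assumption chosen for a typical mirror pull, not a copy of csc-sync-standard. The function name is hypothetical.

```python
def build_rsync_argv(local_dir, rsync_host, rsync_dir, root="/mirror/root", timeout=3600):
    """Translate `local_dir rsync_host rsync_dir` into an rsync command line."""
    # Trailing slashes on both sides copy the contents of the remote tree into local_dir.
    source = f"rsync://{rsync_host}/{rsync_dir.strip('/')}/"
    dest = f"{root}/{local_dir.strip('/')}/"
    return [
        "rsync",
        "-aH",                 # archive mode, preserve hard links
        "--delete-after",      # remove stale files only after the transfer completes
        "--delay-updates",     # put updated files in place at the end of the run
        "--safe-links",        # ignore symlinks pointing outside the tree
        f"--timeout={timeout}",
        source,
        dest,
    ]
```

For example, build_rsync_argv("gnu", "ftp.ibiblio.org", "pub/gnu/ftp/gnu/") pulls rsync://ftp.ibiblio.org/pub/gnu/ftp/gnu/ into /mirror/root/gnu/.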

HTTP(s)

We use nginx as our web server.

Index

An index of the archives we mirror is available at mirror.csclub.uwaterloo.ca.

Since Winter 2010, the index has been generated by a Python script in ~mirror/mirror-index.

~mirror/mirror-index/make-index is scheduled in /etc/cron.d/csc-mirror to be run hourly. The script can be run manually when needed (for example, when the archive list is updated) by running:

sudo -u mirror /home/mirror/mirror-index/make-index.py

The script iterates over all folders in /mirror/root and identifies the size of each project using `zfs get -H -o value used $dataset`, where $dataset is derived from the project's symlink in /mirror/root. The sizes of all folders are summed to calculate the total size (the total includes hidden projects).
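The size calculation above can be sketched in Python as follows. One assumption differs from the quoted command: the sketch adds -p so that zfs get prints exact byte counts that can be summed. The quoted command prints human-readable values instead, and this page does not show how make-index.py itself converts and totals sizes. The run parameter exists only so the functions can be exercised without ZFS.

```python
import subprocess


def dataset_used_bytes(dataset, run=subprocess.run):
    """Return the `used` property of a ZFS dataset in bytes."""
    # -H: no header, -p: exact (parsable) numbers, -o value: print only the value
    result = run(
        ["zfs", "get", "-H", "-p", "-o", "value", "used", dataset],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


def total_used_bytes(datasets, run=subprocess.run):
    """Sum the usage of every dataset passed in."""
    return sum(dataset_used_bytes(d, run) for d in datasets)


def human_size(n):
    """Format a byte count with binary (IEC) units, e.g. '4.0 GiB'."""
    size = float(n)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
        size /= 1024
```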

make-index.py is configured by means of a YAML file, config.yaml, in the same directory. Its format is as follows:

docroot: /mirror/root
output: /mirror/root/index.html

exclude:
  - include
  - lost+found
  - pub
# (...)

directories:
  apache:
    site: apache.org
    url: http://www.apache.org/

  archlinux:
    site: archlinux.org
    url: http://www.archlinux.org/

# (...)

The docroot is the directory to be scanned; this will probably always be the mirror root served by the web server. It is kept in the configuration so that it's easy to find and alter. For instance, we could change --human-readable to --si if we ever decided that, like hard disk manufacturers, we want sizes to appear larger than they are. output defines the file to which the generated index will be written.

exclude specifies the list of directories which will not be included in the generated index page (since, by default, all folders are included in the generated index page).

directories specifies per-directory metadata. All directories are listed by default, whether or not they appear in this list; only those under exclude are ignored. The format is straightforward: name the directory and provide a site (the display name in the "Project Site" column) and a URL. Note that YAML does not allow tabs for indentation; please indent with two spaces to remain consistent with the existing file format. Also note that directory names are case-sensitive, as is always the case on Unix.
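The listing rules above can be summarized by the following Python sketch: everything is listed unless excluded, and metadata is looked up by exact directory name. The sketch takes the configuration as an already-parsed dictionary (for example, from a YAML loader), so it needs only the standard library. Skipping dot-directories and sorting case-insensitively are assumptions, not documented features of make-index.py. The function name is hypothetical.

```python
def index_entries(dir_names, config):
    """Return (name, site, url) rows for the index, following config.yaml rules."""
    exclude = set(config.get("exclude") or [])
    meta = config.get("directories") or {}
    rows = []
    # Assumption: dot-directories (e.g. the .cscmirrorN pool mounts) are skipped,
    # and rows are sorted case-insensitively.
    for name in sorted(dir_names, key=str.lower):
        if name.startswith(".") or name in exclude:
            continue
        info = meta.get(name) or {}   # exact, case-sensitive key lookup
        rows.append((name, info.get("site", ""), info.get("url", "")))
    return rows
```

Because the lookup is case-sensitive, a directory named Apache would not pick up the metadata configured for apache.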

Finally, the HTML index file is generated from index.mako, a Mako template (which is mostly HTML anyhow). If you really can't figure out how it works, look up the Mako documentation.

FTP

We use proftpd (standalone daemon) as our FTP server.

To increase performance, we disable DNS lookups in proftpd.conf:

UseReverseDNS           off
IdentLookups            off

We also limit the CPU and memory resources used per session (e.g. to minimize the resources consumed by globbing):

RLimitCPU               session 10
RLimitMemory            session 4096K

We allow a maximum of 500 concurrent FTP sessions:

MaxInstances            500
MaxClients              500

The contents of /mirror/root/include/motd.msg are displayed when a user connects.

rsync

We use rsyncd (standalone daemon).

We disable compression and refuse the checksum (-c) and --delete options in rsyncd.conf:

dont compress = *
refuse options = c delete

The contents of /mirror/root/include/motd.msg are displayed when a user connects.

Mirror Administration

Adding a new project

  1. Find the instructions for mirroring the project. Ideally, try to sync directly from the project’s source repository.
    • Note that some projects provide their own sync scripts; however, we generally won’t use them and will use our custom ones instead.
  2. Create a zfs filesystem to store the project in:
    • Find the pool with the least current disk usage
    • zfs create cscmirror{1,2,3}/$PROJECT_NAME
  3. Change the folder ownership
    • chown mirror:mirror /mirror/root/.cscmirror{1,2,3}/$PROJECT_NAME
  4. Create the symlink in /mirror/root
    • ln -s .cscmirror{1,2,3}/$PROJECT_NAME $PROJECT_NAME (NOTE: The symlink must be relative to the /mirror/root directory. If it isn’t, the symlinks will not work when chrooted)
  5. Repeat the above steps on mirror-dc (connect by running sudo ssh mirror-dc from potassium-benzoate).
  6. Configure the project in merlin (~mirror/merlin/merlin.py)
    • Select the appropriate sync script (typically csc-sync-standard) and supply the appropriate parameters
  7. Restart merlin: systemctl restart merlin
  8. Configure the project in zfssync.yml (~mirror/merlin/zfssync.yml)
    • This will kick off the initial sync
    • Check ~mirror/merlin/logs/$PROJECT_NAME for errors, ~mirror/merlin/logs/transfer.log for transfer progress
  9. Update the mirror index configuration (~mirror/mirror-index/config.yaml)
  10. Add the project to rsync (/etc/rsyncd.conf)
    • Restart rsync with systemctl restart rsync
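Steps 2 to 4 are mechanical enough to generate. The Python sketch below builds the commands so they can be reviewed before being run by hand. The helper names are hypothetical, and the per-pool usage figures would have to come from elsewhere (for example, zfs list -H -p -o name,used cscmirror1 cscmirror2 cscmirror3).

```python
import re

POOLS = ("cscmirror1", "cscmirror2", "cscmirror3")


def least_used_pool(used_bytes_by_pool):
    """Step 2: pick the pool with the least current disk usage."""
    return min(used_bytes_by_pool, key=used_bytes_by_pool.get)


def new_project_commands(project, pool):
    """Steps 2-4: commands to create, chown and link a new project's filesystem."""
    if pool not in POOLS:
        raise ValueError(f"unknown pool: {pool!r}")
    if not re.fullmatch(r"[A-Za-z0-9][A-Za-z0-9._+-]*", project):
        raise ValueError(f"unsafe project name: {project!r}")
    return [
        f"zfs create {pool}/{project}",
        f"chown mirror:mirror /mirror/root/.{pool}/{project}",
        # The link target must stay relative to /mirror/root so it resolves in the chroot.
        f"cd /mirror/root && ln -s .{pool}/{project} {project}",
    ]
```

The same commands still have to be repeated on mirror-dc (step 5).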

If push mirroring is available/required, see Push Sync.

Secondary Mirror

The School of Computer Science's CSCF has provided us with a secondary mirror machine located in DC. This will limit the downtime of mirror.csclub in the event of an outage affecting the MC machine room.

Keepalived

Mirror's IP addresses (129.97.134.71 and 2620:101:f000:4901:c5c::f:1055) have been configured as VRRP addresses on both machines. Keepalived monitors the nodes and selects the active one.

Potassium-benzoate has higher priority and will typically be the active node. A node's priority is reduced when nginx, proftpd or rsync is not running. Potassium-benzoate starts with a priority of 100 and mirror-dc starts with a priority of 90 (the higher priority wins).

When nginx is unavailable (checked with curl), the priority is reduced by 20. When proftpd is unavailable (checked with curl), the priority is reduced by 5. When rsync is unavailable (checked with rsync), the priority is reduced by 15.
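As a worked example of this arithmetic, suppose only proftpd fails on potassium-benzoate. Its priority drops to 100 - 5 = 95, which still beats mirror-dc's 90, so no failover happens. An rsync failure (100 - 15 = 85) or an nginx failure (100 - 20 = 80), however, does hand the addresses to mirror-dc. The Python sketch below models only this scoring, not the keepalived configuration itself. Real VRRP resolves equal priorities by comparing the nodes' IP addresses, whereas the sketch simply picks the first node listed.

```python
BASE_PRIORITY = {"potassium-benzoate": 100, "mirror-dc": 90}
PENALTY = {"nginx": 20, "proftpd": 5, "rsync": 15}


def effective_priority(node, failed_services):
    """Base priority minus the penalty for each distinct failed service."""
    return BASE_PRIORITY[node] - sum(PENALTY[s] for s in set(failed_services))


def active_node(failures):
    """failures maps node name -> failed services; the highest priority wins."""
    # max() returns the first of equal candidates, i.e. potassium-benzoate on a tie.
    return max(BASE_PRIORITY, key=lambda n: effective_priority(n, failures.get(n, ())))
```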

The Systems Committee should receive an email when the nodes swap roles.

Project synchronization

Only potassium-benzoate is configured with merlin. mirror-dc has the software components installed, but they are probably not up to date or configured to run correctly.

When a project sync is complete, merlin will kick off a custom script to sync the zfs dataset to the other node. These scripts live in /usr/local/bin and in ~mirror/merlin.
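This page does not show what the replication scripts actually run. A common way to copy a ZFS dataset to another host is an incremental zfs send piped over SSH into zfs receive. The Python sketch below only builds such commands. The snapshot naming, the mirror-dc host alias, and the helper name are all assumptions, not a description of the scripts in /usr/local/bin or ~mirror/merlin.

```python
def replication_commands(dataset, new_snap, prev_snap=None, remote="mirror-dc"):
    """Shell commands to snapshot `dataset` and replicate it to `remote`."""
    snapshot = f"{dataset}@{new_snap}"
    if prev_snap:
        # Incremental stream: only the changes since the previous snapshot.
        send = f"zfs send -i @{prev_snap} {snapshot}"
    else:
        # First run: send the full snapshot.
        send = f"zfs send {snapshot}"
    # -F rolls the receiving dataset back to its latest snapshot before applying.
    return [f"zfs snapshot {snapshot}", f"{send} | ssh {remote} zfs receive -F {dataset}"]
```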