Mirror
Latest revision as of 18:39, 1 July 2023
The Computer Science Club runs a public mirror (mirror.csclub.uwaterloo.ca) on potassium-benzoate.
We are listed on the ResNet "don't count" list, so downloading from our mirror will not count against one's ResNet quota.
Software Mirrored
The current archives (and their respective disk usage) are listed on our mirror's homepage at mirror.csclub.uwaterloo.ca.
Mirroring Requests
Requests to mirror a particular distribution or archive should be made to syscom@csclub.uwaterloo.ca.
Implementation Details
Syncing
Storage
All of our projects are stored on an 8x18TB disk raidz2 array (cscmirror0), with an additional drive acting as a hot spare. The pool is located at:
- /mirror/root/.cscmirror0
Each project is given its own filesystem in the pool. Symlinks are created in /mirror/root to point to the correct pool and filesystem.
Merlin
Project synchronization is done by "merlin" which is a Go rewrite of the Python script "merlin" originally written by a2brenna.
The program is stored in ~mirror/merlin and is managed by the systemd unit merlin-go.service.
The config file merlin-config.ini contains the list of repositories along with their configurations.
To view the sync status, execute ~mirror/merlin/cmd/arthur/arthur status. To force the sync of a project, execute ~mirror/merlin/cmd/arthur/arthur sync:PROJECT_NAME.
Remark: For syncing Debian repositories, we were requested (Debian bug #1020998) to use ftpsync, which has its configs in ~mirror/ftpsync.
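If you need to call arthur from another script, a small wrapper keeps the two documented invocations in one place. This is a minimal sketch and is not part of Merlin. The helper names are hypothetical, and the wrapper relies only on the `status` and `sync:PROJECT_NAME` forms shown above.

```python
import os
import subprocess
from typing import List, Optional

# Path documented above; expanduser turns "~mirror" into the mirror user's home.
ARTHUR = os.path.expanduser("~mirror/merlin/cmd/arthur/arthur")


def arthur_argv(action: str, project: Optional[str] = None) -> List[str]:
    """Build the argument list for an arthur call (hypothetical helper)."""
    if action == "status":
        return [ARTHUR, "status"]
    if action == "sync":
        if not project:
            raise ValueError("sync needs a project name")
        # arthur takes the project joined to the verb, e.g. "sync:ubuntu"
        return [ARTHUR, "sync:" + project]
    raise ValueError("unknown action: " + action)


def run_arthur(action: str, project: Optional[str] = None) -> str:
    """Run arthur without a shell and return its output (raises on failure)."""
    done = subprocess.run(arthur_argv(action, project),
                          capture_output=True, text=True, check=True)
    return done.stdout
```

For example, `run_arthur("sync", "ubuntu")` is equivalent to typing `arthur sync:ubuntu`. Passing a list instead of a shell string avoids any quoting issues with project names.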
Push Sync
Some projects support push syncing via SSH.
We are running a special SSHD instance on mirror.csclub.uwaterloo.ca:22. This instance has been locked down, with the following settings:
- Only SSH key authentication
- Only users of the push group (except mirror) are allowed to connect
- X11 Forwarding, TCP Forwarding, Agent Forwarding, User RC and TTY are disabled
- Users are chrooted to /mirror/merlin
Most projects will connect using the push user. The SSH authorized keys file is located at /home/push/.ssh/authorized_keys. An example entry is:
restrict,no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty,command="arthur sync:ubuntu >/dev/null 2>/dev/null </dev/null &",from="XXX.XXX.XXX.XXX" ssh-rsa ...
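When onboarding a new push-sync project, you can generate an entry like the one above instead of typing it by hand. The helper below is a hypothetical sketch that reproduces the example's option list for a given project, source IP and public key. The IP and key in the usage line are placeholders. In current OpenSSH, `restrict` already implies the other `no-*` options, so they are kept here only to match the existing entry.

```python
import re


def push_key_entry(project: str, source_ip: str, public_key: str) -> str:
    """Build an authorized_keys line for the push user (hypothetical helper).

    The forced command only triggers a sync of `project`, and `from=` limits
    which address may use the key.
    """
    # Reject names that could break out of the quoted command="..." option.
    if not re.fullmatch(r"[A-Za-z0-9._-]+", project):
        raise ValueError("unsafe project name: %r" % project)
    # Background the sync and detach all streams so the SSH session returns at once.
    command = "arthur sync:%s >/dev/null 2>/dev/null </dev/null &" % project
    options = [
        "restrict",
        "no-port-forwarding",
        "no-X11-forwarding",
        "no-agent-forwarding",
        "no-pty",
        'command="%s"' % command,
        'from="%s"' % source_ip,
    ]
    return ",".join(options) + " " + public_key


# Placeholder address (TEST-NET-1) and key.
print(push_key_entry("ubuntu", "192.0.2.10", "ssh-ed25519 AAAA... push@example"))
```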
Sync Scripts
Our synchronization scripts are located in ~mirror/bin. They currently include:
- csc-sync-apache
- csc-sync-debian
- csc-sync-debian-cd
- csc-sync-gentoo
- csc-sync-ssh
- csc-sync-standard
Most of these scripts take the following parameters:
local_dir rsync_host rsync_dir
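This page does not describe the scripts' internals. To illustrate what the three common parameters mean, a script of this shape might turn them into an rsync pull from rsync://rsync_host/rsync_dir/ into /mirror/root/local_dir/. The flag set below is an assumption (a typical mirroring choice) and is not necessarily what csc-sync-standard runs.

```python
import posixpath
from typing import List

MIRROR_ROOT = "/mirror/root"  # project symlinks live here (see Storage)


def rsync_pull_argv(local_dir: str, rsync_host: str, rsync_dir: str) -> List[str]:
    """Map (local_dir, rsync_host, rsync_dir) to an rsync command (hypothetical)."""
    # Trailing slash on the source: copy the directory's contents, not the directory.
    source = "rsync://%s/%s/" % (rsync_host, rsync_dir.strip("/"))
    dest = posixpath.join(MIRROR_ROOT, local_dir) + "/"
    return [
        "rsync",
        "-rlptH",           # recurse; keep symlinks, permissions, times, hard links
        "--safe-links",     # skip symlinks that point outside the transferred tree
        "--delay-updates",  # move updated files into place together at the end
        "--delete-after",   # remove files deleted upstream once the transfer is done
        source,
        dest,
    ]
```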
HTTP(s)
We use nginx as our webserver.
Index
An index of the archives we mirror is available at mirror.csclub.uwaterloo.ca.
As of Spring 2023, it is generated by Hugo.
~mirror/mirror-index/deploy.sh is scheduled in /etc/cron.d/csc-mirror to run every minute.
The script first runs synctask2project, which pulls project synchronization status from Merlin (through Merlin's socket), combines sub-projects (for example, racket is a combination of two Merlin tasks, plt-bundles and racket-installers), and reads the size of each project using zfs list -Hp. This Python script then writes a JSON file to data/sync.json. Hugo then reads the JSON file and generates the HTML table from it. The table is also generated separately into public/project_table/index.html, which htmx (the JS library used on the index page) loads to live-reload the sync status. Finally, the output generated by Hugo is copied to the mirror root to be served by nginx.
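The size-reading and combining steps can be sketched as follows. The dataset layout (one dataset per project under the pool) comes from the Storage section. The per-task status shape (`ok`, `last_sync`) and the merge rule are assumptions for illustration, not synctask2project's actual logic. Under that rule, a project is healthy only if all of its tasks are, and its last sync is the oldest one.

```python
import json
from typing import Dict, List


def parse_zfs_sizes(output: str) -> Dict[str, int]:
    """Map project name -> bytes used, from `zfs list -Hp` output.

    With -H the default columns (NAME, USED, AVAIL, REFER, MOUNTPOINT) are
    tab-separated; with -p sizes are exact byte counts.
    """
    sizes = {}
    for line in output.splitlines():
        fields = line.split("\t")
        if len(fields) < 2 or "/" not in fields[0]:
            continue  # skip blank lines and the pool's root dataset
        sizes[fields[0].rsplit("/", 1)[1]] = int(fields[1])
    return sizes


def combine_tasks(projects: Dict[str, List[str]],
                  task_status: Dict[str, dict],
                  sizes: Dict[str, int]) -> List[dict]:
    """Fold the status of each project's Merlin tasks into one row per project."""
    rows = []
    for name, tasks in sorted(projects.items()):
        statuses = [task_status[t] for t in tasks]
        rows.append({
            "name": name,
            "ok": all(s["ok"] for s in statuses),
            "last_sync": min(s["last_sync"] for s in statuses),
            "size": sizes.get(name, 0),
        })
    return rows


def write_sync_json(rows: List[dict], path: str = "data/sync.json") -> None:
    with open(path, "w") as f:
        json.dump(rows, f, indent=2)
```

With racket mapped to ["plt-bundles", "racket-installers"], a failure in either task marks the whole racket row as failing.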
Project information is located at synctask2project/config.toml (NOT the config.toml in the root folder! That's the config for Hugo). Its format is as follows:
merlin_sock = "/path/to/merlin/socket"
zfs_pools = ["mirror_zfs_pool1", "mirror_zfs_pool2"]

[project_name]
# This is supposed to be the short version shown on the website
# Mandatory field
site = "project.site"
# The full URL
# Mandatory field
url = "https://full.project.site"
# We are the upstream or archived project. Don't show sync error or last sync time
# Optional. Default: false
upstream = true
# If this project contains multiple merlin sync tasks, list them here
# Optional. Default: project_name
merlin-tasks = ["task1", "task2"]

# define more projects below...
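For reference, the documented defaults can be applied to a parsed config as shown below. This is a minimal sketch, not synctask2project's actual loader. It treats every table as a project, enforces the two mandatory fields, and fills in the `upstream` and `merlin-tasks` defaults. Parsing uses `tomllib`, which is part of the standard library from Python 3.11.

```python
from typing import Any, Dict


def resolve_projects(cfg: Dict[str, Any]) -> Dict[str, dict]:
    """Apply the documented defaults to every [project] table of config.toml."""
    projects = {}
    for name, table in cfg.items():
        if not isinstance(table, dict):
            continue  # top-level settings such as merlin_sock and zfs_pools
        for field in ("site", "url"):  # mandatory fields
            if field not in table:
                raise KeyError("[%s] is missing %r" % (name, field))
        projects[name] = {
            "site": table["site"],
            "url": table["url"],
            "upstream": bool(table.get("upstream", False)),
            # without merlin-tasks, the project maps to the task of the same name
            "merlin_tasks": list(table.get("merlin-tasks", [name])),
        }
    return projects


def load_config(path: str) -> Dict[str, Any]:
    """Parse config.toml (tomllib requires Python 3.11+ and a binary file)."""
    import tomllib
    with open(path, "rb") as f:
        return tomllib.load(f)
```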
The mirror-index also supports news. When adding new projects or making modifications, create a Markdown file in mirror-index/content/news/ to tell users what changed. Hugo should pick it up automatically on the next generation.
On first setup, run setup.sh. During development (for example, when changing the Sass or static files), run build.sh to build the assets.
FTP
UPDATE: We now use vsftpd instead. See /etc/vsftpd.conf for details. The official documentation is the vsftpd.conf(5) manual page: https://manpages.debian.org/stable/vsftpd/vsftpd.conf.5.en.html
We use proftpd (standalone daemon) as our FTP server.
To increase performance, we disable DNS lookups in proftpd.conf:
UseReverseDNS off
IdentLookups off
We also limit the CPU and memory resources each session can use (e.g. to limit the resources spent on globbing):
RLimitCPU session 10
RLimitMemory session 4096K
We allow a maximum of 500 concurrent FTP sessions:
MaxInstances 500
MaxClients 500
The contents of /mirror/root/include/motd.msg are displayed when a user connects.
rsync
We use rsyncd (standalone daemon).
In rsyncd.conf, we disable compression and checksumming and refuse the delete options:
dont compress = *
refuse options = c delete
The contents of /mirror/root/include/motd.msg are displayed when a user connects.
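When adding a project to /etc/rsyncd.conf (see Adding a new project), each project gets its own module. The generator below is a hypothetical sketch. The module name, the path under /mirror/root and the choice of options are assumptions, although `path`, `comment` and `read only` are standard rsyncd.conf module parameters. If the `dont compress` and `refuse options` settings above live in the global section, they already apply to every module as defaults.

```python
def rsyncd_module(name: str, comment: str = "") -> str:
    """Return an rsyncd.conf module stanza for a mirrored project (hypothetical layout)."""
    lines = [
        "[%s]" % name,
        "    path = /mirror/root/%s" % name,
        "    read only = yes",
    ]
    if comment:
        lines.append("    comment = %s" % comment)
    return "\n".join(lines) + "\n"


print(rsyncd_module("example", "Example project mirror"))
```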
Mirror Administration
Making changes
Everything in ~mirror is managed by git (it is a monorepo containing all sub-projects, such as Merlin and mirror-index). To make changes, switch to the mirror user and commit with --author "FirstName LastName <email@csc>" to record who made the change. Then run git push to push the changes. The remote uses the HTTPS URL, so just enter your CSC credentials when prompted.
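Passing the author as a separate argument avoids the quoting around "Name <email>", which is easy to get wrong in a shell. Below is a minimal sketch with a placeholder name and email. It assumes your changes are already staged with git add.

```python
import subprocess
from typing import List


def commit_argv(message: str, name: str, email: str) -> List[str]:
    """Build a `git commit` call that records who made the change."""
    return ["git", "commit", "-m", message, "--author", "%s <%s>" % (name, email)]


def commit_and_push(repo: str, message: str, name: str, email: str) -> None:
    # Run from the ~mirror checkout as the mirror user; commits staged changes only.
    subprocess.run(commit_argv(message, name, email), cwd=repo, check=True)
    # The remote uses HTTPS, so git prompts for CSC credentials here.
    subprocess.run(["git", "push"], cwd=repo, check=True)
```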
Adding a new project
1. Find the instructions for mirroring the project. Ideally, sync directly from the project's source repository.
   - Some projects provide their own sync scripts, but we generally don't use them; we use our custom scripts instead.
2. Create a ZFS filesystem to store the project in:
   - zfs create cscmirror0/$PROJECT_NAME
3. Change the folder ownership:
   - chown mirror:mirror /mirror/root/.cscmirror0/$PROJECT_NAME
4. Create the symlink in /mirror/root (run this from inside /mirror/root):
   - ln -s .cscmirror0/$PROJECT_NAME $PROJECT_NAME
   (NOTE: The symlink must be relative to the /mirror/root directory. If it isn't, the symlinks will not work when chrooted.)
5. Repeat the above steps on mirror-phys (run sudo ssh mirror-dc on potassium-benzoate to reach it). [NOTE: This machine is currently unavailable.]
6. Configure the project in merlin (~mirror/merlin/merlin-config.ini).
   - Select the appropriate sync script (typically csc-sync-standard) and supply the appropriate parameters.
7. Restart merlin: systemctl restart merlin-go
   - This will kick off the initial sync.
   - Check ~mirror/merlin/log/$PROJECT_NAME for errors and ~mirror/merlin/log-$PROTOCOL/$PROJECT_NAME-*.log for transfer progress.
8. Configure the project in zfssync.yml (~mirror/merlin/zfssync.yml). [NOTE: The backup machine is currently unavailable, so this step is not currently needed.]
9. Update the mirror index configuration (~mirror/mirror-index-ng/synctask2project/config.toml).
10. Add the project to rsync (/etc/rsyncd.conf).
   - Restart rsync with systemctl restart rsync.
If push mirroring is available/required, see Push Sync.
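The ZFS, ownership and symlink steps are mechanical, so it can help to print them for review before running them. This dry-run sketch builds the exact commands from those steps for a given project name. The helper is hypothetical and is not an existing script in ~mirror/bin. Because ln -s stores the target exactly as written, passing the full link path produces the same relative link as running ln -s from inside /mirror/root.

```python
import shlex
from typing import List

POOL = "cscmirror0"
MIRROR_ROOT = "/mirror/root"


def new_project_commands(project: str) -> List[List[str]]:
    """Return the zfs/chown/ln steps as argument lists for review (hypothetical dry-run helper)."""
    if not project or "/" in project or project.startswith("."):
        raise ValueError("bad project name: %r" % project)
    return [
        ["zfs", "create", "%s/%s" % (POOL, project)],
        ["chown", "mirror:mirror", "%s/.%s/%s" % (MIRROR_ROOT, POOL, project)],
        # The target stays relative (".cscmirror0/NAME"), so the link resolves
        # inside /mirror/root even for chrooted daemons.
        ["ln", "-s", ".%s/%s" % (POOL, project), "%s/%s" % (MIRROR_ROOT, project)],
    ]


for argv in new_project_commands("example"):
    print(shlex.join(argv))
```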
Rename project
1. Change the project name (title) and local_dir in merlin-config.ini.
2. Change the ZFS dataset name:
   - zfs rename cscmirror0/OLD_NAME cscmirror0/NEW_NAME
3. Reload the merlin config:
   - systemctl reload merlin-go.service
4. Remove the old symlink and create a new symlink in the mirror root:
   - rm OLD_DIR
   - ln -s .cscmirror0/NEW_DIR NEW_DIR
5. Add a symlink for the old name (in /mirror/root) so that existing users aren't broken by the change:
   - ln -s NEW_DIR OLD_DIR
6. Update the rsync daemon:
   - Edit /etc/rsyncd.conf, adding a new entry for the new name (keep the old name too). Restart with systemctl restart rsync.
7. Modify the index page generator config:
   - ~mirror/mirror-index-ng/synctask2project/config.toml
8. Update any mirror registrations with the project to ensure the new URLs are used.
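You can rehearse the symlink steps of a rename in a scratch directory to check that the old name still resolves afterwards. This sketch stands in for zfs rename with a plain directory rename, then replays the rm and ln -s steps. The directory names are placeholders.

```python
import os
import tempfile
from typing import Tuple


def relink_after_rename(root: str, pool_dir: str, old: str, new: str) -> None:
    """Replay the rm/ln -s steps inside `root`, a stand-in for /mirror/root."""
    old_link = os.path.join(root, old)
    os.remove(old_link)                                                # rm OLD_DIR
    os.symlink(os.path.join(pool_dir, new), os.path.join(root, new))  # ln -s .cscmirror0/NEW_DIR NEW_DIR
    os.symlink(new, old_link)                                          # ln -s NEW_DIR OLD_DIR


def rehearse_rename() -> Tuple[bool, str]:
    """Return (does the old name still reach the dataset?, old link's target)."""
    with tempfile.TemporaryDirectory() as root:
        pool = ".cscmirror0"
        os.makedirs(os.path.join(root, pool, "oldname"))
        os.symlink(os.path.join(pool, "oldname"), os.path.join(root, "oldname"))
        # Stand-in for `zfs rename cscmirror0/oldname cscmirror0/newname`
        os.rename(os.path.join(root, pool, "oldname"), os.path.join(root, pool, "newname"))
        relink_after_rename(root, pool, "oldname", "newname")
        dataset = os.path.realpath(os.path.join(root, pool, "newname"))
        still_works = os.path.realpath(os.path.join(root, "oldname")) == dataset
        return still_works, os.readlink(os.path.join(root, "oldname"))


print(rehearse_rename())  # (True, 'newname')
```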
Secondary Mirror
The School of Computer Science's CSCF has provided us with a secondary mirror machine located in DC. This will limit the downtime of mirror.csclub in the event of an outage affecting the MC machine room.
As of June 2023, the CSCF mirror is down. CSCF is planning to bring it back with new hardware, but there is no ETA.
Keepalived
Mirror's IP addresses (129.97.134.71 and 2620:101:f000:4901:c5c::f:1055) have been configured as VRRP addresses on both machines. Keepalived monitors the nodes and selects the active one.
Potassium-benzoate has the higher priority and will typically be the active node. A node's priority is reduced when nginx, proftpd or rsync is not running. Potassium-benzoate starts with a priority of 100 and mirror-dc starts with a priority of 90 (the higher priority wins).
When nginx is unavailable (checked with curl), the priority is reduced by 20. When proftpd is unavailable (checked with curl), the priority is reduced by 5. When rsync is unavailable (checked with rsync), the priority is reduced by 15.
The Systems Committee should receive an email when the nodes swap positions.
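The failover rule reduces to simple arithmetic. This sketch uses the base priorities and penalties stated above to show which node ends up active. The node and service names are those on this page, and the function names are hypothetical. For example, if nginx fails on potassium-benzoate, its priority drops to 100 - 20 = 80, below mirror-dc's 90, so mirror-dc takes over. A proftpd failure alone (100 - 5 = 95) does not trigger a switch. The sketch does not model ties, which VRRP breaks by comparing IP addresses.

```python
from typing import Dict, Set

# Starting priorities (higher wins) and per-check penalties, from the text above.
BASE_PRIORITY = {"potassium-benzoate": 100, "mirror-dc": 90}
PENALTY = {"nginx": 20, "proftpd": 5, "rsync": 15}


def effective_priority(node: str, failed: Set[str]) -> int:
    """Priority of `node` after subtracting the penalty of each failed check."""
    return BASE_PRIORITY[node] - sum(PENALTY[svc] for svc in failed)


def active_node(failed_by_node: Dict[str, Set[str]]) -> str:
    """Return the node with the highest effective priority (ties not modelled)."""
    return max(BASE_PRIORITY,
               key=lambda node: effective_priority(node, failed_by_node.get(node, set())))


print(active_node({"potassium-benzoate": {"nginx"}}))    # mirror-dc (80 < 90)
print(active_node({"potassium-benzoate": {"proftpd"}}))  # potassium-benzoate (95 > 90)
```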
Project synchronization
Only potassium-benzoate is configured with merlin. mirror-dc has the software components, but they are probably neither up to date nor configured to run correctly.
When a project sync is complete, merlin will kick off a custom script to sync the zfs dataset to the other node. These scripts live in /usr/local/bin and in ~mirror/merlin.
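The custom dataset-sync scripts are not reproduced here. As a generic illustration of the technique, an incremental ZFS replication step usually takes a new snapshot and streams only the changes since the previous one to the other node. In this sketch, the snapshot names and the remote host are hypothetical. The function only builds the command string and does not run it.

```python
import shlex
from typing import Optional


def dataset_sync_pipeline(dataset: str, new_snap: str, prev_snap: Optional[str],
                          remote: str) -> str:
    """Return a shell pipeline that replicates `dataset` to `remote` (hypothetical).

    Sends the full snapshot the first time, then only the changes since
    `prev_snap` on later runs.
    """
    send = ["zfs", "send"]
    if prev_snap:
        send += ["-i", "%s@%s" % (dataset, prev_snap)]
    send.append("%s@%s" % (dataset, new_snap))
    # -F rolls the receiving dataset back to its latest snapshot before receiving,
    # discarding stray changes on the standby node.
    recv = ["ssh", remote, shlex.join(["zfs", "receive", "-F", dataset])]
    # In sh, "|" binds tighter than "&&": snapshot first, then send | receive.
    return (shlex.join(["zfs", "snapshot", "%s@%s" % (dataset, new_snap)])
            + " && " + shlex.join(send) + " | " + shlex.join(recv))
```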