Mirror
Revision as of 22:34, 28 February 2010
We currently run a public mirror (mirror.csclub.uwaterloo.ca) on sodium-benzoate. We are listed on the ResNet "don't count" list, so downloads from our mirror do not count against one's ResNet quota. Requests to mirror a particular distribution or archive should be made to syscom@csclub.uwaterloo.ca. We also have a bandwidth graph you can look at.
Archives Mirrored
Total Size: 4.5 TiB
Archive | Upstream/Listing | Size | Notes |
---|---|---|---|
Apache | rsync://rsync.us.apache.org/apache-dist/ http://www.apache.org/mirrors/ | 24 GiB | - |
Arch Linux | rsync://mirror.rit.edu/archlinux http://www.archlinux.org/download/ http://wiki.archlinux.org/index.php/Mirrors#Mirror_List | 40 GiB | - |
Blastwave | rsync://master.rsync.blastwave.org/blastwave/ http://www.blastwave.org/mirrors.php | 12 GiB | - |
CentOS | rsync://us-msync.centos.org/CentOS/ http://mirror-status.centos.org/ | 118 GiB | - |
CPAN | rsync://rsync.nic.funet.fi/CPAN/ http://mirror.cpan.org/ | 7.3 GiB | - |
CRAN | rsync://cran.r-project.org/CRAN/ | 44 GiB | We should get added as an official mirror. |
CTAN | rsync://carroll.aset.psu.edu/ctan/ http://www.dante.de/cgi-bin/ctan/list.cgi http://www.dante.de/mirmon/ | 17 GiB | - |
Cygwin | rsync://cygwin.com/cygwin-ftp/ http://www.cygwin.com/mirrors.html | 8.3 GiB | - |
Damn Small Linux | rsync://ftp.heanet.ie/mirrors/damnsmalllinux.org/ | 18 GiB | Sent a request to be added. Maybe contact the person listed at http://www.damnsmalllinux.org/donate.html. |
Debian | rsync://ftp3.nrc.ca/debian/ http://www.debian.org/mirror/list http://www.de.debian.org/dmc/today/ | 489 GiB | Requested to be added to the ftp.ca.debian.org rotation; will be added if the need arises. |
Debian-backports | rsync://www.backports.org/backports.org/ http://www.backports.org/debian/README.mirrors.html | 35 GiB | Submitted a request to be added to the mirror list long ago with no response. |
Debian-cd | rsync://cdimage.debian.org/debian-cd/ | 70 GiB | Mirror only the first CD and DVD, and all small CDs (netinst, business-card, etc.). |
Debian-multimedia | rsync://www.debian-multimedia.org/debian/ http://www.debian-multimedia.org/debian-m.php | 6.2 GiB | - |
Debian-ports | rsync://ftp.debian-ports.org/debian/ | 44 GiB | - |
Debian-security | rsync://security.debian.org/debian-security/ | 44 GiB | Debian does not currently list debian-security mirrors, and encourages users to use security.debian.org exclusively. |
Debian-unofficial | rsync://debian-maintainers.org/unofficial/ http://www.debian-unofficial.org/mirrors.html | 465 MiB | - |
Debian-volatile | rsync://volatile-master.debian.org/debian-volatile/ http://www.debian.org/volatile/volatile-mirrors | 3.4 GiB | - |
Eclipse | rsync://download.eclipse.org/eclipseMirror http://www.eclipse.org/downloads/download.php?file=/ | 295 GiB | - |
Emdebian | rsync://www.emdebian.org/debian/ | 3.7 GiB | Emdebian does not currently list mirrors. |
FreeBSD | rsync://ftp1.ca.freebsd.org/ | 1.2 TiB | We should get added as an official mirror. |
Gentoo (portage) | rsync://rsync1.us.gentoo.org/gentoo-portage/ | 594 MiB | We are rsync4.ca.gentoo.org. |
Gentoo (sources) | rsync://masterdistfiles.gentoo.org/gentoo/ http://www.gentoo.org/main/en/mirrors.xml http://mirrorstats.gentoo.org/ | 169 GiB | See ~sysadmin/passwords/gentoo for the rsync password. |
GNOME | rsync://ftp.gnome.org/gnome/ | 91 GiB | - |
GNU | rsync://ftp.ibiblio.org/pub/gnu/ftp/gnu/ http://www.gnu.org/order/ftp.html | 21 GiB | - |
KDE | rsync://master.kde.org/kdeftp/ http://www.kde.org/mirrors/ | 77 GiB | - |
kernel.org | rsync://kernel.org/pub/linux/ rsync://kernel.org/pub/software/ http://kernel.org/mirrors/countries/html/CA.html | 136 GiB | - |
Linux Mint (releases) | rsync://ftp.heanet.ie/pub/linuxmint.com/ http://www.linuxmint.com/mirrors.php | 31 GiB | - |
Linux Mint (packages) | rsync://packages.linuxmint.com/packages/ http://www.linuxmint.com/mirrors.php | 4.4 GiB | - |
mozdev | rsync://rsync.mozdev.org/mozdev/ http://mirrors.mozdev.org/index.html | 6.2 GiB | Currently in the North American rotation, but could request to be added to the global rotation. |
mozilla.org | rsync://releases-rsync.mozilla.org/mozilla-releases/ http://www.mozilla.org/mirrors.html | 122 GiB | - |
MySQL | rsync://mysql.he.net/mysql/ http://dev.mysql.com/downloads/mirrors.html | 167 GiB | - |
non-GNU | rsync://dl.sv.gnu.org/releases/ http://dl.sv.gnu.org/releases/00_MIRRORS.html http://download.savannah.gnu.org/mirmon/ | 16 GiB | - |
Openoffice (extended set) | rsync://rsync.services.openoffice.org/openoffice-extended/ http://distribution.openoffice.org/mirrors/#mirrors http://www.ooodev.org/mirmon/ | 103 GiB | - |
OpenSUSE (opensuse-full) | rsync://stage.opensuse.org/opensuse-full/opensuse/ http://mirrors.opensuse.org/list/all.html | 179 GiB | - |
Slackware | rsync://slackware.cs.utah.edu/slackware/ | 139 GiB | We should get added as an official mirror. |
Ubuntu | rsync://archives.ubuntu.com/ubuntu/ https://launchpad.net/ubuntu/+archivemirrors | 319 GiB | We used to be the official Canadian mirror (i.e., ca.archive.ubuntu.com); when we get more bandwidth, get us added back. |
Ubuntu-ports | rsync://ports.ubuntu.com/ubuntu-ports/ | 442 GiB | - |
Ubuntu-ports-releases | rsync://cdimage.ubuntu.com/cdimage/ubuntu-ports/releases/ | 36 GiB | - |
Ubuntu-releases | rsync://rsync.releases.ubuntu.com/releases/ https://launchpad.net/ubuntu/+cdmirrors http://www.ubuntu.com/getubuntu/download http://www.kubuntu.org/download.php http://www.edubuntu.org/Download | 39 GiB | We are the official Canadian mirror (i.e., ca.releases.ubuntu.com); ubuntu-releases includes Ubuntu, Kubuntu, and Edubuntu. |
xorg.freedesktop.org | rsync://xorg.freedesktop.org/xorg-archive/ http://www.x.org/wiki/Releases/Download | 5.5 GiB | - |
Xubuntu-releases | rsync://cdimage.ubuntu.com/cdimage/xubuntu/releases/ http://www.xubuntu.org/get | 21 GiB | - |
Proposed Archives to Mirror
- Fedora
- Mandriva
- OpenBSD
- NetBSD
- PCLinuxOS
- RubyForge
- SourceForge
- MacPorts
- PLT Scheme (they don't ask for mirrors, but they currently offer downloads from half a dozen or so sites)
- VLC
Implementation Details
Mirroring is done by one of three scripts: ftpsync, csc-sync-debian, and csc-sync-standard. The latter two are based on anonftpsync. merlin is used to invoke these scripts.
ftpsync
ftpsync is the official Debian mirror synchronization tool and is used to rsync the Debian repository. It is located in ~mirror/debian. Invoking it takes a few steps (this is more or less how merlin invokes it):
export BASEDIR=/home/mirror/debian
cd $BASEDIR
./bin/ftpsync sync:stage1
./bin/ftpsync sync:stage2
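As a rough illustration, the same two-stage invocation could be driven from Python. This is a sketch, not merlin's code. The function names are hypothetical; only the BASEDIR path and the sync:stage1/sync:stage2 arguments come from this page.

```python
import os
import subprocess

# Path from this page; the helper names below are hypothetical.
FTPSYNC_BASEDIR = "/home/mirror/debian"


def ftpsync_commands():
    """The two ftpsync stages, in the order they must run."""
    return [
        ["./bin/ftpsync", "sync:stage1"],
        ["./bin/ftpsync", "sync:stage2"],
    ]


def run_ftpsync(basedir=FTPSYNC_BASEDIR):
    """Run both stages from basedir with BASEDIR exported.

    check=True raises CalledProcessError when a stage fails, so stage2
    is never started after a failed stage1.
    """
    env = dict(os.environ, BASEDIR=basedir)
    for argv in ftpsync_commands():
        # cwd=basedir plays the role of `cd $BASEDIR`; on POSIX the relative
        # ./bin/ftpsync path is resolved against it.
        subprocess.run(argv, cwd=basedir, env=env, check=True)
```

Stopping on the first failure matches the ordering above. Stage 2 is only meaningful once stage 1 has completed.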
csc-sync-debian
This is used to sync Debian-style repositories. Its usage is:
csc-sync-debian local_dir rsync_host rsync_dir [trace_host [trace_dir]]
If trace_host is specified, then $rsync_dir/project/trace/$trace_host is checked to see if it has changed. If it has, a normal debian-style (two-pass) rsync is done.
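The trace check and the two passes can be sketched as follows. The trace path follows the description above. The rsync flags, the `host::dir` source syntax, and the list of index files held back in the first pass are assumptions based on the usual Debian mirroring convention: fetch new packages first, then the indices, and only then delete stale files. None of these details are taken from csc-sync-debian itself.

```python
# Index files held back until the second pass. This list is an assumption
# based on common Debian mirroring practice, not csc-sync-debian's own list.
INDEX_PATTERNS = ["Packages*", "Sources*", "Release*", "ls-lR*"]


def trace_path(rsync_dir, trace_host):
    """Upstream trace file checked before syncing."""
    return f"{rsync_dir}/project/trace/{trace_host}"


def trace_changed(previous, current):
    """True when the trace differs from the saved copy (or none is saved)."""
    return previous is None or previous != current


def two_pass_commands(local_dir, rsync_host, rsync_dir):
    """argv lists for a Debian-style two-pass rsync (flags are assumptions)."""
    source = f"{rsync_host}::{rsync_dir}/"  # rsync daemon syntax, assumed
    excludes = [f"--exclude={pattern}" for pattern in INDEX_PATTERNS]
    # Pass 1: fetch new package files, but keep the old indices and delete
    # nothing, so clients still see a consistent archive.
    first = ["rsync", "-rlptH", *excludes, source, local_dir]
    # Pass 2: fetch the new indices and remove files nothing references.
    second = ["rsync", "-rlptH", "--delete-after", source, local_dir]
    return [first, second]
```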
csc-sync-standard
This is used to sync a tree in a general way. Like anonftpsync, it supports locking and logging. Its usage is:
csc-sync-standard local_dir rsync_host rsync_dir
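Below is a minimal sketch of the lock-then-sync pattern. It assumes an flock(2)-based lock file; the locking mechanism, log messages, and rsync flags are all assumptions, and the helper names are hypothetical. The lock is non-blocking, so a second run for the same tree fails fast instead of queueing behind the first.

```python
import errno
import fcntl
import logging
import subprocess

log = logging.getLogger("csc-sync-standard")


class AlreadyLocked(Exception):
    """Another sync of the same tree currently holds the lock."""


def acquire_lock(lock_path):
    """Take an exclusive, non-blocking flock; keep the returned file open."""
    handle = open(lock_path, "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        handle.close()
        if exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK, errno.EACCES):
            raise AlreadyLocked(lock_path) from exc
        raise
    return handle


def standard_sync_argv(local_dir, rsync_host, rsync_dir):
    """Single-pass rsync command (flags are assumptions)."""
    return ["rsync", "-rlptH", "--delete-after", f"{rsync_host}::{rsync_dir}/", local_dir]


def standard_sync(local_dir, rsync_host, rsync_dir, lock_path):
    """Sync one tree under the lock and log the outcome."""
    lock = acquire_lock(lock_path)
    try:
        argv = standard_sync_argv(local_dir, rsync_host, rsync_dir)
        log.info("running: %s", " ".join(argv))
        status = subprocess.run(argv).returncode
        log.info("rsync exited with status %d", status)
        return status
    finally:
        lock.close()  # closing the file descriptor releases the flock
```

The `host::module` daemon syntax assumes the upstreams are plain rsync daemons, which is consistent with the rsync:// URLs in the table above.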
merlin
The synchronization process is driven by a Python script called "merlin", ostensibly written by mspang and stored in ~mirror/merlin. The repository list, sync times, etc. are maintained in merlin.py.
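merlin.py's actual data structures are not described here. The sketch below only illustrates the general idea of a repository list paired with per-archive sync intervals. The Repo class, the intervals, and the local paths are hypothetical; the upstream hosts and directories come from the table above.

```python
from dataclasses import dataclass


@dataclass
class Repo:
    """One mirrored archive; every field here is hypothetical."""
    name: str
    argv: list     # command that syncs this archive
    interval: int  # seconds between syncs


# Hypothetical entries; merlin.py's real list and timings will differ.
REPOS = [
    Repo("debian-multimedia",
         ["csc-sync-debian", "/mirror/root/debian-multimedia",
          "www.debian-multimedia.org", "debian"],
         4 * 3600),
    Repo("apache",
         ["csc-sync-standard", "/mirror/root/apache",
          "rsync.us.apache.org", "apache-dist"],
         6 * 3600),
]


def due_repos(repos, last_run, now):
    """Return repos whose interval has elapsed since their last sync.

    last_run maps repo name -> timestamp; names never synced are always due.
    """
    return [r for r in repos
            if now - last_run.get(r.name, float("-inf")) >= r.interval]
```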
HTTP
We use Apache as our web server. Here's a snippet of the worker configuration:
<IfModule mpm_worker_module>
    ServerLimit 64
    ThreadLimit 64
    StartServers 2
    MaxClients 4096
    MinSpareThreads 16
    MaxSpareThreads 48
    ThreadsPerChild 64
    MaxRequestsPerChild 0
</IfModule>
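Several of these numbers are tied together. In the worker MPM, MaxClients may not exceed ServerLimit × ThreadsPerChild. Here that is 64 × 64 = 4096, exactly the configured MaxClients. ThreadsPerChild may not exceed ThreadLimit.

StartServers 2 gives 2 × 64 = 128 threads at startup. MaxRequestsPerChild 0 means child processes are never recycled.

The helper below is a hypothetical sanity check for these two limits. It is not part of Apache.

```python
def parse_directives(block):
    """Map directive name -> int for simple 'Name value' lines."""
    values = {}
    for line in block.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            values[parts[0]] = int(parts[1])
    return values


def check_worker(cfg):
    """Return the worker-MPM limits this config violates (empty if none)."""
    problems = []
    if cfg["ThreadsPerChild"] > cfg["ThreadLimit"]:
        problems.append("ThreadsPerChild exceeds ThreadLimit")
    capacity = cfg["ServerLimit"] * cfg["ThreadsPerChild"]
    if cfg["MaxClients"] > capacity:
        problems.append(f"MaxClients exceeds ServerLimit x ThreadsPerChild ({capacity})")
    return problems


WORKER_CONFIG = """
ServerLimit 64
ThreadLimit 64
StartServers 2
MaxClients 4096
MinSpareThreads 16
MaxSpareThreads 48
ThreadsPerChild 64
MaxRequestsPerChild 0
"""
```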
We use the bwbar application to display current bandwidth in the footer of mirror pages.
Index
An index of the archives we mirror is available at http://mirror.csclub.uwaterloo.ca/. As of Winter 2010, it is generated by a Python script in ~mirror/mirror-index, described below.
~mirror/mirror-index/make-index.py is scheduled in mirror's crontab to be run at 5:40 AM on the 14th and 28th of each month. The script can be run manually when needed (for example, when an archive is removed) as follows:
sudo -u mirror /home/mirror/mirror-index/make-index.py
This runs du to compute the size of each directory. The resulting list is sorted alphabetically by directory name and returned to the Python script. If any error occurs during this process, the script conservatively exits rather than risk generating an incorrect index file.
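Here is a sketch of this step, assuming du prints its usual SIZE<TAB>PATH lines. The function names are hypothetical. Sorting is done in Python here rather than in a separate sort step, and make-index.py may well do it differently.

```python
import os
import subprocess
import sys


def parse_du(output, docroot):
    """Turn 'SIZE<TAB>PATH' lines from du into (name, size) pairs sorted by name.

    The line for docroot itself (du's total for the whole tree) is skipped.
    """
    root = os.path.normpath(docroot)
    entries = []
    for line in output.splitlines():
        if not line.strip():
            continue
        size, path = line.split("\t", 1)
        path = os.path.normpath(path)
        if path == root:
            continue
        entries.append((os.path.relpath(path, root), size))
    return sorted(entries)


def directory_sizes(docroot, duflags):
    """Run du over docroot; exit rather than build an index from bad data."""
    try:
        result = subprocess.run(
            ["du", *duflags.split(), docroot],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError) as exc:
        sys.exit(f"du failed, not regenerating the index: {exc}")
    return parse_du(result.stdout, docroot)
```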
make-index.py is configured by means of a YAML file, config.yaml, in the same directory. Its format is as follows:
docroot: /mirror/root
duflags: --human-readable --max-depth=1
output: /mirror/root/index.html
directories:
  apache:
    site: apache.org
    url: http://www.apache.org/
  archlinux:
    site: archlinux.org
    url: http://www.archlinux.org/
  # (...)
The docroot is the directory to be scanned; this will probably always be the mirror root from which Apache serves. duflags specifies the flags passed to du; it lives in the config file so that it is easy to find and alter. For instance, we could change --human-readable to --si (powers of 1000 instead of 1024) if we ever decided that, like hard disk manufacturers, we want sizes to appear larger than they are. output defines the file to which the generated index will be written.
Finally, directories specifies the list of directories to be listed. Only directories listed here will be shown, so if you add a new archive and it doesn't appear, that's why. The format is fairly straightforward: name the directory and provide a site (the display name in the "Project Site" column) and a URL.
One caveat here is that YAML does not allow tabs for indentation. Please indent with two spaces to remain consistent with the existing file format. Also note that directory names are case-sensitive, as is usual on Unix.
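The filtering rule and the two caveats can be sketched together. The dictionary is what yaml.safe_load should return for the sample config above; it is written inline so the sketch needs no third-party packages. The helper names are hypothetical, and this is not make-index.py's actual code.

```python
# Expected result of loading the sample config.yaml (inlined for this sketch).
CONFIG = {
    "docroot": "/mirror/root",
    "duflags": "--human-readable --max-depth=1",
    "output": "/mirror/root/index.html",
    "directories": {
        "apache": {"site": "apache.org", "url": "http://www.apache.org/"},
        "archlinux": {"site": "archlinux.org", "url": "http://www.archlinux.org/"},
    },
}


def index_rows(sizes, directories):
    """Keep only (name, size) pairs whose name is listed, adding site and URL.

    Unlisted names are dropped, which is why a new archive stays hidden
    until it is added to config.yaml. Lookups are exact, so case matters.
    """
    rows = []
    for name, size in sizes:
        entry = directories.get(name)
        if entry is None:
            continue
        rows.append({"name": name, "size": size,
                     "site": entry["site"], "url": entry["url"]})
    return rows


def tab_indented_lines(text):
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            bad.append(number)
    return bad


def unmatched_directories(listed, on_disk):
    """Listed names with no exact-case match among the directories on disk."""
    present = set(on_disk)
    return sorted(name for name in listed if name not in present)
```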
The HTML index file itself is generated from index.mako, a Mako template (which is mostly HTML anyhow). If you really can't figure out how it works, consult the Mako documentation.
FTP
We use proftpd (standalone daemon) as our FTP server. To improve performance, we disable reverse DNS and ident lookups in proftpd.conf:
UseReverseDNS off
IdentLookups off
We also limit the CPU and memory resources each session may use (e.g., to limit the cost of globbing):
RLimitCPU session 10
RLimitMemory session 4096K
We limit the number of concurrent FTP sessions:
MaxInstances 500
MaxClients 500
rsync
We use rsyncd (standalone daemon). In rsyncd.conf, we disable compression and refuse the checksum (-c) and delete options:
dont compress = *
refuse options = c delete
For ftp and rsync, the contents of /mirror/root/include/motd.msg are displayed when users connect.