Mirror
Revision as of 15:56, 19 June 2023
The Computer Science Club runs a public mirror (mirror.csclub.uwaterloo.ca) on potassium-benzoate.
We are listed on the ResNet "don't count" list, so downloading from our mirror will not count against one's ResNet quota.
Software Mirrored
A list of currently mirrored archives (and their respective disk usage) is available on our mirror's homepage at mirror.csclub.uwaterloo.ca.
Mirroring Requests
Requests to mirror a particular distribution or archive should be made to syscom@csclub.uwaterloo.ca.
Implementation Details
Syncing
Storage
All of our projects are stored on an 8x18 TB raidz2 array (cscmirror0), with an additional drive acting as a hot spare. Project filesystems live under /mirror/root/.cscmirror0.
Each project is given its own filesystem in the pool. Symlinks are created in /mirror/root to point to the correct pool and filesystem.
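The mapping from a name in /mirror/root to its ZFS dataset can be sketched in Python as below. This is a minimal illustration, assuming every project symlink has the relative form .POOL/PROJECT described above (for example .cscmirror0/ubuntu); the helper names dataset_from_target and dataset_for are hypothetical and not part of our tooling.

```python
import os
import tempfile

MIRROR_ROOT = "/mirror/root"


def dataset_from_target(target: str) -> str:
    """Turn a relative symlink target such as '.cscmirror0/ubuntu' into
    the ZFS dataset name 'cscmirror0/ubuntu'.

    Assumption: the first path component is the pool's hidden directory,
    i.e. the pool name prefixed with a dot.
    """
    pool_dir, _, fs = target.partition("/")
    if len(pool_dir) < 2 or not pool_dir.startswith(".") or not fs:
        raise ValueError(f"unexpected symlink target: {target!r}")
    return f"{pool_dir[1:]}/{fs}"


def dataset_for(project: str, root: str = MIRROR_ROOT) -> str:
    """Read the project's symlink in the mirror root and map it to a dataset."""
    return dataset_from_target(os.readlink(os.path.join(root, project)))


if __name__ == "__main__":
    # Demonstrate against a throwaway directory instead of the real mirror root.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, ".cscmirror0", "ubuntu"))
        os.symlink(".cscmirror0/ubuntu", os.path.join(root, "ubuntu"))
        print(dataset_for("ubuntu", root))  # cscmirror0/ubuntu
```

Keeping the string parsing separate from os.readlink makes the mapping testable without touching the real mirror root.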
Merlin
Project synchronization is done by "merlin", a Go rewrite of the Python script "merlin" originally written by a2brenna. The program is stored in ~mirror/merlin and is managed by the systemd unit merlin-go.service.
The config file merlin-config.ini contains the list of repositories along with their configurations.
To view the sync status, execute ~mirror/merlin/cmd/arthur/arthur status. To force the sync of a project, execute ~mirror/merlin/cmd/arthur/arthur sync:PROJECT_NAME.
Remark: For syncing Debian repositories, we were asked to use ftpsync, which has its configs in ~mirror/ftpsync.
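The two arthur commands above can be wrapped in a small helper, for example when scripting several forced syncs. This is a hypothetical sketch: it assumes ~mirror resolves to /home/mirror (as the make-index.py command in the Index section suggests) and only builds the status and sync:PROJECT_NAME invocations documented here; it defaults to a dry run that returns the command line instead of executing it.

```python
import subprocess
from typing import List, Optional

# Assumption: ~mirror resolves to /home/mirror.
ARTHUR = "/home/mirror/merlin/cmd/arthur/arthur"


def arthur_cmd(action: str, project: Optional[str] = None) -> List[str]:
    """Build the argv for the two arthur invocations documented above."""
    if action == "status":
        return [ARTHUR, "status"]
    if action == "sync":
        if not project:
            raise ValueError("sync requires a project name")
        return [ARTHUR, f"sync:{project}"]
    raise ValueError(f"unsupported action: {action!r}")


def run_arthur(action: str, project: Optional[str] = None, dry_run: bool = True) -> str:
    """Return the command line when dry_run is set; otherwise run it and return its stdout."""
    cmd = arthur_cmd(action, project)
    if dry_run:
        return " ".join(cmd)
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


if __name__ == "__main__":
    print(run_arthur("sync", "ubuntu"))
```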
Push Sync
Some projects support push syncing via SSH.
We are running a special SSHD instance on mirror.csclub.uwaterloo.ca:22. This instance has been locked down with the following settings:
- Only SSH key authentication is allowed
- Only users of the push group (except mirror) are allowed to connect
- X11 forwarding, TCP forwarding, agent forwarding, user RC and TTY are disabled
- Users are chrooted to /mirror/merlin
Most projects will connect using the push user. The SSH authorized keys file is located at /home/push/.ssh/authorized_keys. An example entry is:
restrict,no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty,command="arthur sync:ubuntu >/dev/null 2>/dev/null </dev/null &",from="XXX.XXX.XXX.XXX" ssh-rsa ...
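When onboarding a new push-sync upstream, an entry of the shape above could be generated rather than hand-edited. The sketch below copies the option list from the example entry; the function name push_key_entry, the project-name whitelist and the use of ipaddress to accept only a single literal IP are assumptions of this sketch (OpenSSH's from= option also accepts host patterns, which it deliberately rejects). The public key in the demo is a placeholder.

```python
import ipaddress
import re

# Options copied from the example authorized_keys entry above.
FIXED_OPTIONS = [
    "restrict",
    "no-port-forwarding",
    "no-X11-forwarding",
    "no-agent-forwarding",
    "no-pty",
]


def push_key_entry(project: str, source_ip: str, public_key: str) -> str:
    """Build one authorized_keys line that only lets the key trigger
    'arthur sync:PROJECT' from a single source address."""
    # Reject names that could break out of the quoted command="..." option.
    if not re.fullmatch(r"[A-Za-z0-9._-]+", project):
        raise ValueError(f"invalid project name: {project!r}")
    ip = ipaddress.ip_address(source_ip)  # raises ValueError on bad input
    command = f"arthur sync:{project} >/dev/null 2>/dev/null </dev/null &"
    options = FIXED_OPTIONS + [f'command="{command}"', f'from="{ip}"']
    return ",".join(options) + " " + public_key.strip()


if __name__ == "__main__":
    print(push_key_entry("ubuntu", "192.0.2.10", "ssh-ed25519 AAAA... upstream"))
```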
Sync Scripts
Our collection of synchronization scripts is located in ~mirror/bin. They currently include:
- csc-sync-apache
- csc-sync-debian
- csc-sync-debian-cd
- csc-sync-gentoo
- csc-sync-ssh
- csc-sync-standard
Most of these scripts take the following parameters:
local_dir rsync_host rsync_dir
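The positional convention above (local_dir, then rsync_host, then rsync_dir) can be captured in a small argv builder. This is a hypothetical sketch: it assumes ~mirror is /home/mirror and that the chosen script follows the common three-argument signature, which the text says holds for most but not necessarily all scripts. The host and directory values in the demo are placeholders.

```python
import shlex
from typing import List

# Assumption: ~mirror resolves to /home/mirror.
SYNC_BIN = "/home/mirror/bin"
KNOWN_SCRIPTS = {
    "csc-sync-apache",
    "csc-sync-debian",
    "csc-sync-debian-cd",
    "csc-sync-gentoo",
    "csc-sync-ssh",
    "csc-sync-standard",
}


def sync_invocation(script: str, local_dir: str, rsync_host: str, rsync_dir: str) -> List[str]:
    """Build argv for a sync script that takes: local_dir rsync_host rsync_dir."""
    if script not in KNOWN_SCRIPTS:
        raise ValueError(f"unknown sync script: {script!r}")
    args = {"local_dir": local_dir, "rsync_host": rsync_host, "rsync_dir": rsync_dir}
    for name, value in args.items():
        if not value:
            raise ValueError(f"{name} must not be empty")
    return [f"{SYNC_BIN}/{script}", local_dir, rsync_host, rsync_dir]


if __name__ == "__main__":
    argv = sync_invocation("csc-sync-standard", "example", "rsync.example.org", "example")
    print(shlex.join(argv))
```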
HTTP(s)
We use nginx as our webserver.
Index
An index of the archives we mirror is available at mirror.csclub.uwaterloo.ca.
As of Winter 2010, it is generated by a Python script in ~mirror/mirror-index.
~mirror/mirror-index/make-index is scheduled in /etc/cron.d/csc-mirror to run hourly. The script can be run manually when needed (for example, when the archive list is updated) by running:
sudo -u mirror /home/mirror/mirror-index/make-index.py
The script iterates over all folders in /mirror/root and identifies the size of each project using `zfs get -H -o value used $dataset`, where $dataset is derived from the project's symlink in /mirror/root. The sizes of all folders are added together to calculate the total size (the total includes hidden projects).
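The size calculation described above can be sketched as follows. Unlike the command quoted above, this version passes -p so that zfs prints exact byte counts instead of human-readable values such as 1.2T, which makes summing trivial; formatting back to human-readable units is done at the end. The function names are hypothetical and make-index.py may be structured differently; the used parameter lets a fake lookup stand in for the zfs CLI.

```python
import subprocess
from typing import Callable, Iterable


def zfs_used_bytes(dataset: str) -> int:
    """Return a dataset's 'used' property in bytes (needs the zfs CLI).

    -p asks zfs for exact numbers instead of values like 1.2T.
    """
    out = subprocess.run(
        ["zfs", "get", "-H", "-p", "-o", "value", "used", dataset],
        check=True, capture_output=True, text=True,
    ).stdout
    return int(out.strip())


def total_size(datasets: Iterable[str], used: Callable[[str], int] = zfs_used_bytes) -> int:
    """Sum the space used by every dataset, hidden projects included."""
    return sum(used(ds) for ds in datasets)


def human(n: float) -> str:
    """Format a byte count with binary (1024-based) units."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if n < 1024 or unit == "TiB":
            return f"{n:.1f} {unit}"
        n /= 1024
    raise AssertionError("unreachable")
```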
make-index.py is configured by means of a YAML file, config.yaml, in the same directory. Its format is as follows:
docroot: /mirror/root
output: /mirror/root/index.html
exclude:
  - include
  - lost+found
  - pub
  # (...)
directories:
  apache:
    site: apache.org
    url: http://www.apache.org/
  archlinux:
    site: archlinux.org
    url: http://www.archlinux.org/
  # (...)
The docroot is the directory to be scanned; this will probably always be the mirror root from which the web server serves files. This is here so that it's easy to find and alter. For instance, we could change --human-readable to --si if we ever decided that, like hard disk manufacturers, we want sizes to appear larger than they are.
output defines the file to which the generated index will be written.
exclude specifies the list of directories which will not be included in the generated index page (by default, all folders are included).
directories specifies information about individual directories. All directories are listed by default, whether or not they appear in this list; only those under exclude are ignored. The format is fairly straightforward: simply name the directory and provide a site (the display name in the "Project Site" column) and a URL. One caveat is that YAML does not allow tabs for indentation. Please indent with two spaces to remain consistent with the existing file format. Also note that directory names are case-sensitive, as is always the case on Unix.
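The exclude and directories rules above can be expressed as a small filter over the folder list. This is a sketch, not the code in make-index.py: it takes config.yaml already parsed into a plain dict (parsing is omitted because the usual YAML library is not in the Python standard library), and the alphabetical ordering via sorted() and the empty site/url for unlisted directories are assumptions.

```python
from typing import Any, Dict, List


def index_rows(config: Dict[str, Any], folders: List[str]) -> List[Dict[str, str]]:
    """Apply the exclude / directories rules to a list of folder names.

    config is the parsed config.yaml as a plain dict.
    """
    excluded = set(config.get("exclude") or [])
    info = config.get("directories") or {}
    rows = []
    for name in sorted(folders):
        if name in excluded:  # exact, case-sensitive match
            continue
        meta = info.get(name) or {}  # unlisted directories are still shown
        rows.append({"dir": name, "site": meta.get("site", ""), "url": meta.get("url", "")})
    return rows


if __name__ == "__main__":
    config = {
        "docroot": "/mirror/root",
        "exclude": ["include", "lost+found", "pub"],
        "directories": {
            "apache": {"site": "apache.org", "url": "http://www.apache.org/"},
        },
    }
    for row in index_rows(config, ["apache", "include", "gentoo"]):
        print(row)
```

Because matching is exact, an exclude entry of Pub would not hide a folder named pub, mirroring the case-sensitivity note above.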
Finally, the HTML index file is generated from index.mako, a Mako template (which is mostly HTML anyhow). If you really can't figure out how it works, look up the Mako documentation.
FTP
UPDATE: We now use vsftpd instead. See /etc/vsftpd.conf for details. Official documentation can be found here.
We use proftpd (standalone daemon) as our FTP server.
To increase performance, we disable reverse DNS and ident lookups in proftpd.conf:
UseReverseDNS off
IdentLookups off
We also limit the CPU and memory resources available to each session (e.g., to limit the resources spent on globbing):
RLimitCPU session 10
RLimitMemory session 4096K
We allow a maximum of 500 concurrent FTP sessions:
MaxInstances 500
MaxClients 500
The contents of /mirror/root/include/motd.msg are displayed when a user connects.
rsync
We use rsyncd (standalone daemon).
We disable compression and refuse the checksum (c) and delete options in rsyncd.conf:
dont compress = *
refuse options = c delete
The contents of /mirror/root/include/motd.msg are displayed when a user connects.
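Each project also needs a module in /etc/rsyncd.conf. The layout of our actual modules is not documented on this page, so the stanza below is hypothetical: it only uses the standard rsyncd.conf parameters path, comment and read only, and assumes the module path is the project's symlink under /mirror/root.

```python
def rsyncd_module(project: str, comment: str = "", root: str = "/mirror/root") -> str:
    """Render a minimal read-only rsyncd.conf module section for a project."""
    if not project or "/" in project or project.startswith(".") or "]" in project:
        raise ValueError(f"invalid module name: {project!r}")
    lines = [f"[{project}]", f"    path = {root}/{project}"]
    if comment:
        lines.append(f"    comment = {comment}")
    lines.append("    read only = yes")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(rsyncd_module("example", "Example project"), end="")
```

Compare the output against neighbouring modules in the live file before pasting it in.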
Mirror Administration
Adding a new project
- Find the instructions for mirroring the project. Ideally, try to sync directly from the project's source repository.
  - Note that some projects provide sync scripts; however, we generally won't use them. We will instead use our custom ones.
- Create a ZFS filesystem to store the project in:
  zfs create cscmirror0/$PROJECT_NAME
- Change the folder ownership:
  chown mirror:mirror /mirror/root/.cscmirror0/$PROJECT_NAME
- Create the symlink in /mirror/root:
  ln -s .cscmirror0/$PROJECT_NAME $PROJECT_NAME
  (NOTE: The symlink must be relative to the /mirror/root directory. If it isn't, the symlinks will not work when chrooted.)
- Repeat the above steps on mirror-phys (sudo ssh mirror-dc on potassium-benzoate). [NOTE: This machine is currently unavailable]
- Configure the project in merlin (~mirror/merlin/merlin.py):
  - Select the appropriate sync script (typically csc-sync-standard) and supply the appropriate parameters.
- Restart merlin:
  systemctl restart merlin
  - This will kick off the initial sync.
  - Check ~mirror/merlin/logs/$PROJECT_NAME for errors and ~mirror/merlin/logs/transfer.log for transfer progress.
- Configure the project in zfssync.yml (~mirror/merlin/zfssync.yml).
- Update the mirror index configuration (~mirror/mirror-index-ng/synctask2project/config.toml).
- Add the project to rsync (/etc/rsyncd.conf):
  - Restart rsync with:
    systemctl restart rsync
If push mirroring is available/required, see Push Sync.
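The storage steps of the procedure above (create the dataset, fix ownership, add the relative symlink) can be generated as a reviewable dry run. This is a hypothetical helper that only prints commands; it covers neither the merlin, zfssync.yml, index and rsync steps nor the repeat on the secondary machine. Passing the full link path to ln while keeping the target relative gives the same link as running the documented command from inside /mirror/root.

```python
import shlex
from typing import List

POOL = "cscmirror0"
MIRROR_ROOT = "/mirror/root"


def new_project_commands(project: str) -> List[List[str]]:
    """Return the storage steps for a new project as argv lists (dry run)."""
    if not project or "/" in project or project.startswith("."):
        raise ValueError(f"invalid project name: {project!r}")
    return [
        ["zfs", "create", f"{POOL}/{project}"],
        ["chown", "mirror:mirror", f"{MIRROR_ROOT}/.{POOL}/{project}"],
        # The target is stored relative to the link's own directory
        # (/mirror/root), so it keeps resolving inside a chroot; an
        # absolute target would not.
        ["ln", "-s", f".{POOL}/{project}", f"{MIRROR_ROOT}/{project}"],
    ]


if __name__ == "__main__":
    for argv in new_project_commands("example"):
        print(shlex.join(argv))
```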
Rename project
- Change the project name (title) and local_dir in merlin-config.ini.
- Change the ZFS dataset name:
  zfs rename cscmirror0/OLD_NAME cscmirror0/NEW_NAME
- Reload the merlin config:
  systemctl reload merlin-go.service
- Remove the old symlink and create a new symlink in the mirror root:
  rm OLD_DIR
  ln -s .cscmirror0/NEW_DIR NEW_DIR
- Add a symlink for the old name (in /mirror/root) so that existing users won't be broken by the change:
  ln -s NEW_DIR OLD_DIR
- Modify the index page generator config:
  - At ~mirror/mirror-index-ng/synctask2project/config.toml
- Update any mirror registrations with the project to ensure the new URLs are used.
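The symlink part of a rename can be rehearsed in a scratch directory before touching /mirror/root. The sketch below is hypothetical: rename_links mirrors the rm and the two ln -s commands above, and demo() simulates the state right after zfs rename (data under the new name, old link dangling), then checks that the old name resolves to the new dataset directory.

```python
import os
import tempfile
from typing import Tuple


def rename_links(root: str, old: str, new: str, pool: str = "cscmirror0") -> None:
    """Swap the mirror-root symlinks after 'zfs rename' has moved the dataset."""
    old_link = os.path.join(root, old)
    if not os.path.islink(old_link):
        raise ValueError(f"{old_link} is not a symlink; refusing to remove it")
    os.remove(old_link)                                    # rm OLD_DIR
    os.symlink(f".{pool}/{new}", os.path.join(root, new))  # ln -s .cscmirror0/NEW_DIR NEW_DIR
    os.symlink(new, old_link)                              # ln -s NEW_DIR OLD_DIR


def demo() -> Tuple[str, bool]:
    """Simulate the rename in a temporary directory and check the result."""
    with tempfile.TemporaryDirectory() as root:
        # State right after 'zfs rename': data lives under the new name
        # and the old symlink dangles.
        os.makedirs(os.path.join(root, ".cscmirror0", "newname"))
        os.symlink(".cscmirror0/oldname", os.path.join(root, "oldname"))
        rename_links(root, "oldname", "newname")
        old_target = os.readlink(os.path.join(root, "oldname"))
        same = os.path.realpath(os.path.join(root, "oldname")) == os.path.realpath(
            os.path.join(root, ".cscmirror0", "newname")
        )
        return old_target, same


if __name__ == "__main__":
    print(demo())  # ('newname', True)
```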
Secondary Mirror
The School of Computer Science's CSCF has provided us with a secondary mirror machine located in DC. This limits the downtime of mirror.csclub in the event of an outage affecting the MC machine room.
As of June 2023, the CSCF mirror is down. CSCF is planning to bring it back with new hardware, but there is no ETA.
Keepalived
Mirror's IP addresses (129.97.134.71 and 2620:101:f000:4901:c5c::f:1055) have been configured as VRRP addresses on both machines. Keepalived monitors the services and selects the active node.
Potassium-benzoate has the higher priority and will typically be the active node. A node's priority is reduced when nginx, proftpd or rsync is not running. Potassium-benzoate starts with a priority of 100 and mirror-dc starts with a priority of 90 (the higher priority wins).
When nginx is unavailable (checked with curl), the priority is reduced by 20. When proftpd is unavailable (checked with curl), the priority is reduced by 5. When rsync is unavailable (checked with rsync), the priority is reduced by 15.
The Systems Committee should receive an email when the nodes swap positions.
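The failover arithmetic above can be checked with a few lines of Python. This is only a model of the numbers on this page, not keepalived configuration syntax; it ignores VRRP details such as tie-breaking, preemption and advertisement timing.

```python
from typing import Dict, Set

BASE_PRIORITY = {"potassium-benzoate": 100, "mirror-dc": 90}
PENALTY = {"nginx": 20, "proftpd": 5, "rsync": 15}


def effective_priority(node: str, down: Set[str]) -> int:
    """Base priority minus the penalty for each service that failed its check."""
    return BASE_PRIORITY[node] - sum(PENALTY[svc] for svc in down)


def active_node(down_by_node: Dict[str, Set[str]]) -> str:
    """Pick the node with the highest effective priority."""
    return max(
        BASE_PRIORITY,
        key=lambda node: effective_priority(node, down_by_node.get(node, set())),
    )


if __name__ == "__main__":
    print(active_node({}))                                   # potassium-benzoate
    print(active_node({"potassium-benzoate": {"nginx"}}))    # mirror-dc (80 < 90)
    print(active_node({"potassium-benzoate": {"proftpd"}}))  # potassium-benzoate (95 > 90)
```

In other words, an nginx or rsync failure alone on potassium-benzoate is enough to hand over to a healthy mirror-dc, while an FTP failure alone is not.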
Project synchronization
Only potassium-benzoate is configured with merlin. mirror-dc has the software components, but they are probably not up to date, nor configured to run correctly.
When a project sync is complete, merlin will kick off a custom script to sync the zfs dataset to the other node. These scripts live in /usr/local/bin and in ~mirror/merlin.
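This page does not describe how those scripts transfer the dataset. One common approach is an incremental zfs send piped over SSH into zfs receive; the sketch below only builds such commands as strings and should not be read as what our scripts actually run. The snapshot names, the target host and both function names are hypothetical.

```python
import shlex
from typing import Optional


def snapshot_cmd(dataset: str, snap: str) -> str:
    """Build the command that takes a new snapshot before sending."""
    return shlex.join(["zfs", "snapshot", f"{dataset}@{snap}"])


def replication_pipeline(dataset: str, new_snap: str, target_host: str,
                         prev_snap: Optional[str] = None) -> str:
    """Build a 'zfs send | ssh HOST zfs receive' pipeline.

    With prev_snap, only the changes since that snapshot are sent (-i);
    without it, the full snapshot is sent. receive -F rolls the target
    back to its most recent snapshot before receiving.
    """
    send = ["zfs", "send"]
    if prev_snap:
        send += ["-i", f"{dataset}@{prev_snap}"]
    send.append(f"{dataset}@{new_snap}")
    recv = ["ssh", target_host, "zfs", "receive", "-F", dataset]
    return f"{shlex.join(send)} | {shlex.join(recv)}"


if __name__ == "__main__":
    print(snapshot_cmd("cscmirror0/ubuntu", "s2"))
    print(replication_pipeline("cscmirror0/ubuntu", "s2", "mirror-dc", "s1"))
```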