Sysadmin Guide

From CSCWiki

Revision as of 00:22, 18 July 2021

The system administrator chairs the Systems Committee, and is responsible for keeping all of our computers in working order. The CSC computing environment is good, but far from perfect, and the sysadmin should look for ways to improve it. We don't have a strict "if it works, don't touch it" policy, and encourage people to try new things to see if they work better. Because of this, we don't have "5 nines" uptime or anything close, but we do have a modern computing environment that is constantly improving. Our systems should be, and often are, better at the end of term than at the beginning.

Early in the term, the sysadmin should consider what hardware upgrades we would like to have, and send proposals to the treasurer to add to the budget. A bit later in the term, the same should be done for MEF proposals.

The sysadmin should also make sure requests by our users (to systems-committee@csclub) are answered, and make recommendations to the Executive Council to add new systems committee members or reevaluate old ones.

Power Outages

Occasionally MC will undergo planned power outages. These usually last from the morning until the evening. a2brenna or someone from IST will hopefully give us advance notice. When this happens, you should:

Pre-Outage

  • Send an email to csc-general announcing the outage (example here)
  • Create an announcement on our main website announcing the outage
  • Announce the outage in the #csc IRC channel and update the channel topic to show outage information
  • Schedule the shutdown the night before the outage using the shutdown command on all of our MC machines, e.g.
    sudo shutdown 06:00 "CSC systems will be unavailable for a power outage 7am -> 5pm. This machine will shut down at 6:00AM EDT."
    
  • If the real machines hosting the web server (phosphoric-acid) and mirror (potassium-benzoate) cannot be kept up during the outage, set up a backup web server in an LXC container on a machine which is not located in MC (currently there is a container named dr-website on biloba). After the MC machines have shut down, assign the IP addresses of csclub.uwaterloo.ca and mirror.csclub.uwaterloo.ca to the backup container.
    TODO: Consider using keepalived to automate this process.
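
The shutdown scheduling step above could be scripted roughly as follows. This is a minimal sketch, not the club's actual tooling: the host names are placeholders, and it assumes you have SSH access and sudo rights on each machine. By default it only prints the commands it would run; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch: schedule the pre-outage shutdown on several machines over SSH.
# The host names below are placeholders, not the real MC machine list.
MC_HOSTS="${MC_HOSTS:-mc-host-1 mc-host-2}"
SHUTDOWN_TIME="${SHUTDOWN_TIME:-06:00}"
# The message must not contain single quotes, since it is wrapped in them below.
MESSAGE="CSC systems will be unavailable for a power outage 7am -> 5pm. This machine will shut down at 6:00AM EDT."

# Print (DRY_RUN=1, the default) or run the shutdown command for one host.
schedule_shutdown() {
    remote_cmd="sudo shutdown $SHUTDOWN_TIME '$MESSAGE'"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "ssh -t $1 $remote_cmd"
    else
        # -t allocates a terminal so that sudo can prompt for a password.
        ssh -t "$1" "$remote_cmd"
    fi
}

for host in $MC_HOSTS; do
    schedule_shutdown "$host"
done
```

Reviewing the dry-run output before setting DRY_RUN=0 is a cheap way to catch a wrong time or host name the night before the outage.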

Post-Outage

  • Log back into each MC machine and make sure that /users was mounted correctly. If not, check /etc/network/interfaces to get the name of the VLAN device, and use ip addr to see if the interface is up. If it is not up, try to use ifup; if that doesn't work, manually bring up the device and assign it the appropriate IP addresses using iproute2:
    # check /etc/network/interfaces for the interface name and IP
    ip link add name ens3.530 link ens3 type vlan id 530
    ip addr add dev ens3.530 172.19.168.49/27
    ip addr add dev ens3.530 fd74:6b6a:8eca:4903:c5c::49/64
    ip link set dev ens3.530 up
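
The first check in this step can be done with a small helper like the one below. It is only a sketch: it assumes the mountpoint utility from util-linux is installed, and the hint it prints simply restates the procedure above.

```shell
#!/bin/sh
# Sketch: report whether a directory is currently a mount point.

check_mount() {
    # mountpoint -q exits 0 only if the path is a mount point.
    if mountpoint -q "$1" 2>/dev/null; then
        echo "$1 is mounted"
    else
        echo "$1 is NOT mounted"
        return 1
    fi
}

check_mount /users || echo "Check the VLAN interface with: ip addr"
```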
    

Let's Encrypt certificates

Make sure to read SSL first.

We handle LE certs for members and clubs who host their websites on our servers. The certs should be renewed automatically; if they are not, then something is very wrong. There are plans underway to migrate from certbot to dehydrated, since the apt version of certbot appears to be broken.

If you get an email from LE warning you that a cert is about to expire, log in to caffeine and check /var/log/letsencrypt/letsencrypt.log. There should usually be some clue as to what went wrong. Often, a club or member has decided that they no longer want to host their website on our servers, in which case the cert can safely be removed via certbot delete --cert-name CERT_NAME. Make sure to also delete the corresponding Apache config files. Sometimes a subset of a member's domains has become invalid, in which case those domains must be removed from the cert. One way to do this is via certbot certonly --webroot -d domain1.com -d domain2.com .... List only the domains which are still valid with the -d flags; omitted domains will be removed. Make sure to update the corresponding Apache config files.
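
To see which certs are close to expiry without waiting for a warning email, something like the following sketch may help. It assumes certbot's default directory layout under /etc/letsencrypt/live and that openssl is installed; the 30-day threshold is an arbitrary choice for illustration.

```shell
#!/bin/sh
# Sketch: list certificates that expire within a given number of days.

# Succeed if the PEM certificate in $1 expires within $2 days.
# Note: an unreadable or invalid file is also reported as expiring.
expires_within() {
    # -checkend exits 0 when the certificate is still valid after N seconds.
    ! openssl x509 -checkend "$(( $2 * 86400 ))" -noout -in "$1" >/dev/null 2>&1
}

# Reading this directory normally requires root.
for cert in /etc/letsencrypt/live/*/cert.pem; do
    [ -e "$cert" ] || continue
    if expires_within "$cert" 30; then
        echo "expires within 30 days: $cert"
    fi
done
```

The directory name in each reported path is the cert name that certbot delete --cert-name expects.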

uwaterloo.ca subdomains

Make sure to read Web Hosting first.

If a member or club requests a uwaterloo.ca subdomain, first make sure that their website is being hosted on our servers. Then, forward the email to hostmaster (at) uwaterloo.ca, and ask them to make the domain a CNAME for caffeine.csclub.uwaterloo.ca. You will also need to add a VirtualHost entry in /etc/apache2 on caffeine, redirecting requests to /users/club_name/www.
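
Once hostmaster has replied, you can confirm that the CNAME is in place before touching Apache. The script below is a rough sketch which assumes dig is installed; the domain in the usage example is hypothetical.

```shell
#!/bin/sh
# Sketch: check that each domain given as an argument is a CNAME for caffeine.
EXPECTED_TARGET="caffeine.csclub.uwaterloo.ca"

# Succeed if a DNS answer names the expected host (case and trailing dot ignored).
matches_target() {
    answer=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
    [ "${answer%.}" = "$EXPECTED_TARGET" ]
}

# Look up the CNAME record of a domain and compare it to the expected target.
check_cname() {
    matches_target "$(dig +short "$1" CNAME)"
}

for domain in "$@"; do
    if check_cname "$domain"; then
        echo "$domain: OK"
    else
        echo "$domain: not a CNAME for $EXPECTED_TARGET"
    fi
done
```

For example, saved as check-cname.sh, it could be run as sh check-cname.sh someclub.uwaterloo.ca.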

Make sure to create a new Let's Encrypt certificate for the domain.

Mailing list subscriptions

At the very least, you need to be subscribed to the syscom and exec mailing lists. You may also wish to subscribe to the following:

  • git: Get alerts on some git commits (mainly Apache configs)
  • packages: Get alerts when packages are added to our Debian repository
  • ceo: Get alerts on new user accounts