Sysadmin Guide: Difference between revisions
Revision as of 17:44, 20 March 2021
The system administrator chairs the Systems Committee, and is responsible for keeping all of our computers in working order. The CSC computing environment is good, but not nearly perfect, and the sysadmin should look for ways to improve it. We don't have a strict "if it works, don't touch it" policy, and encourage people to try new things to see if they work better. Because of this, we don't have "5 nines" uptime or anything close, but do have a modern computing environment that is constantly improving. Our systems should be, and often are, better at the end of term than the beginning.
Early in the term, the sysadmin should consider what hardware upgrades we would like to have, and send proposals to the treasurer to add to the budget. A bit later, this happens again with MEF proposals.
The sysadmin should also make sure requests by our users (to systems-committee@csclub) are answered, and make recommendations to the Executive Council to add new systems committee members or reevaluate old ones.
Power Outages
Occasionally MC will undergo planned power outages. These usually last from the morning until the evening. a2brenna or someone from IST will hopefully give us advance notice. When this happens, you should:
Pre-Outage
- Send an email to csc-general announcing the outage (example here)
- Create an announcement on our main website announcing the outage
- Announce the outage in the #csc IRC channel and update the channel topic to show outage information
- Schedule the shutdown the night before the outage using the shutdown command on all of our MC machines, e.g. sudo shutdown 06:00 "CSC systems will be unavailable for a power outage 7am -> 5pm. This machine will shutdown at 6:00AM EDT."
- If the real machines hosting the web server (phosphoric-acid) and mirror (potassium-benzoate) cannot be kept up during the outage, set up a backup web server in an LXC container on a machine which is not located in MC (currently there is a container named dr-website on biloba). After the MC machines have shut down, assign the IP addresses of csclub.uwaterloo.ca and mirror.csclub.uwaterloo.ca to the backup container.
  TODO: Consider using keepalived to automate this process.
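The last two pre-outage steps can be sketched as a small dry-run helper. This is only an illustration, not an established CSC script: the function names, the host list, the interface name and the addresses in the usage note are placeholders made up for the example. The helper prints the commands so they can be reviewed before anyone runs them.

```shell
#!/bin/sh
# Dry-run sketch: these functions only PRINT commands, they do not run them.
# All hostnames, devices and addresses passed in are assumptions of the caller.

# shutdown_cmd HOST TIME MESSAGE
# Prints an ssh command that schedules a shutdown on HOST at TIME (HH:MM).
# The remote command is wrapped in single quotes so that characters such as
# '>' in MESSAGE are not treated as redirections by the remote shell.
# MESSAGE must not itself contain single or double quotes.
shutdown_cmd() {
    printf "ssh %s 'sudo shutdown %s \"%s\"'\n" "$1" "$2" "$3"
}

# takeover_cmd ADDR/PREFIX DEV
# Prints the iproute2 command that adds a service address to the backup
# container's network device.
takeover_cmd() {
    printf 'sudo ip addr add %s dev %s\n' "$1" "$2"
}
```

For example, looping `shutdown_cmd "$h" 06:00 "Power outage 7am -> 5pm"` over a list of MC hostnames prints one ssh line per machine, and `takeover_cmd 192.0.2.10/24 eth0` prints the address assignment to run inside the backup container (192.0.2.10/24 and eth0 are documentation placeholders, not the real values).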
Post-Outage
- Log back into each MC machine and make sure that /users was mounted correctly. If not, check /etc/network/interfaces to get the name of the VLAN device, and use ip addr to see if the interface is up. If it is not up, try to use ifup; if that doesn't work, manually bring up the device and assign it the appropriate IP addresses using iproute2.
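The post-outage check can be sketched as follows. This is a minimal illustration under stated assumptions: `is_mounted` and `recovery_hints` are hypothetical helper names, and the device name and address in the usage note are placeholders (the real ones come from /etc/network/interfaces). The mount check reads /proc/mounts by default, so it assumes a Linux host.

```shell
#!/bin/sh
# is_mounted PATH [MOUNTS_FILE]
# Returns 0 if PATH appears as a mount point in MOUNTS_FILE
# (default: /proc/mounts), non-zero otherwise.
is_mounted() {
    awk -v target="$1" '$2 == target { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# recovery_hints DEV ADDR/PREFIX
# Prints, in order, the commands described above for bringing the VLAN
# device back. Nothing is executed; DEV and ADDR/PREFIX are supplied by
# the caller.
recovery_hints() {
    printf 'ip addr show dev %s\n' "$1"
    printf 'sudo ifup %s\n' "$1"
    printf 'sudo ip link set dev %s up\n' "$1"
    printf 'sudo ip addr add %s dev %s\n' "$2" "$1"
}
```

A possible use is `is_mounted /users || recovery_hints vlan0 192.0.2.20/24`, which prints the recovery commands only when /users is missing (vlan0 and 192.0.2.20/24 are made-up values).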
Let's Encrypt certificates
Make sure to read SSL first.
We handle LE certs for members and clubs who host their websites on our servers. The certs should be renewed automatically; if they are not, then something is very wrong. There are plans underway to migrate from certbot to dehydrate since the apt version of certbot appears to be broken.
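One way to notice a failed renewal early is to check how close a certificate is to expiry. The sketch below is an assumption-laden example, not part of the existing setup: `cert_expires_within` is a hypothetical helper, and the certificate path in the usage note follows certbot's usual layout, which may not match how our servers are configured. It relies on `openssl x509 -checkend`.

```shell
#!/bin/sh
# cert_expires_within FILE SECONDS
# Returns 0 if the certificate in FILE expires within SECONDS seconds,
# 1 if it is still valid after that window, and 2 if FILE is unreadable.
cert_expires_within() {
    [ -r "$1" ] || return 2
    # `openssl x509 -checkend N` exits 0 when the certificate will still
    # be valid N seconds from now, so the result is inverted here.
    if openssl x509 -checkend "$2" -noout -in "$1" >/dev/null 2>&1; then
        return 1
    fi
    return 0
}
```

For instance, `cert_expires_within /etc/letsencrypt/live/example.org/cert.pem 1209600 && echo "renewal overdue"` would flag a certificate with less than two weeks left (example.org is a placeholder domain).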
uwaterloo.ca subdomains
Make sure to read Web Hosting first.
If a member or club requests a uwaterloo.ca subdomain, first make sure that their website is being hosted on our servers. Then, forward the email to hostmaster (at) uwaterloo.ca, and make their domain a CNAME for caffeine.csclub.uwaterloo.ca.
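Once the record exists, it may be worth confirming that the subdomain really points at caffeine. The following is a small sketch of such a check; `cname_matches` is a hypothetical helper and the club domain in the usage note is invented for illustration.

```shell
#!/bin/sh
# Target that member and club subdomains are expected to point at.
EXPECTED_TARGET="caffeine.csclub.uwaterloo.ca"

# cname_matches TARGET
# Returns 0 if TARGET, with or without the trailing dot that DNS tools
# usually print, equals the expected CNAME target.
cname_matches() {
    [ "${1%.}" = "$EXPECTED_TARGET" ]
}
```

With dig installed, this could be used as `cname_matches "$(dig +short exampleclub.uwaterloo.ca CNAME)" && echo "CNAME ok"`, where exampleclub.uwaterloo.ca stands in for the requested subdomain.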