CSCWiki - User contributions [en]

LDAP

2025-01-17T01:36:01Z

Merenber: /* Changing a user's username */

We use [http://www.openldap.org/ OpenLDAP] for directory services. Our primary LDAP server is [[Machine_List#auth1|auth1]] and our secondary LDAP server is [[Machine_List#auth2|auth2]].

=== ehashman's Guide to Setting up OpenLDAP on Debian ===

Welcome to my nightmare.

==== What is LDAP? ====

<blockquote>'''LDAP:''' Lightweight Directory Access Protocol

An open, vendor-neutral, industry standard application protocol for accessing and maintaining distributed directory information services over an Internet Protocol (IP) network. — [https://en.wikipedia.org/wiki/Lightweight_Directory_Access_Protocol Wikipedia: LDAP]
</blockquote>
In this case, "directory" refers to the user directory, like on an old-school Rolodex. Many groups use LDAP to maintain their user directory, including the University (the "WatIAM" identity management system), the Computer Science Club, and even the UW Amateur Radio Club.

This is a guide documenting how to set up LDAP on a Debian Linux system.

==== First steps ====

<ul>
<li>Ensure that openldap is installed on the machine:
<pre># apt-get install slapd ldap-utils</pre></li>
<li>Debian will do a lot of magic and set up a skeleton LDAP server and get it running. We need to configure that further.</li>
<li>Let's set up logging before we forget. Create the following files in <code>/var/log</code>:
<pre># mkdir /var/log/ldap
# touch /var/log/ldap.log</pre></li>
<li>Set ownership correctly:
<pre># chown openldap:openldap /var/log/ldap</pre></li>
<li>Set up rsyslog to dump the LDAP logs into <code>/var/log/ldap.log</code> by adding the following lines:
<pre># vim /etc/rsyslog.conf
...
# Grab ldap logs, don't duplicate in syslog
local4.* /var/log/ldap.log</pre></li>
<li>Set up log rotation for these by creating the file [https://git.uwaterloo.ca/wics/documentation/blob/master/ldap/logrotate.d.ldap <code>/etc/logrotate.d/ldap</code>] with the following contents:
<pre>/var/log/ldap/*log {
weekly
missingok
rotate 1000
compress
delaycompress
notifempty
create 0640 openldap adm
postrotate
if [ -f /var/run/slapd/slapd.pid ]; then
/etc/init.d/slapd restart >/dev/null 2>&1
fi
endscript
}

/var/log/ldap.log {
weekly
missingok
rotate 24
compress
delaycompress
notifempty
}</pre></li>
<li>As of OpenLDAP 2.4, it doesn't actually create a config file for us. Apparently, this is a "feature": LDAP maintainers think we should want to set this up via dynamic queries. We don't, so the first thing we need is our [https://git.uwaterloo.ca/wics/documentation/blob/master/ldap/slapd.conf <code>slapd.conf</code>] file.</li></ul>

===== Building <code>slapd.conf</code> from scratch =====

<ul>
<li>Get a copy to work with:
<pre># scp uid@auth1.csclub.uwaterloo.ca:/etc/ldap/slapd.conf /etc/ldap/ ## you need CSC root for this</pre></li>
<li>You'll want to comment out the TLS lines, and anything referring to Kerberos and access for now. You'll also want to comment out lines specifically referring to syscom and office staff.</li>
<li>Make sure you remove the reference to <code>nonMemberTerm</code> as an index, as we're going to remove this field.</li>
<li>You'll also need to generate a root password for the LDAP to bootstrap auth, like so:
<pre># slappasswd
New password:
Re-enter new password:
{SSHA}longhash</pre></li>
<li>Add this line below <code>rootdn</code> in the <code>slapd.conf</code>:
<pre>rootpw {SSHA}longhash</pre></li>
<li>Now we want to edit all instances of "csclub" to be "wics" instead, e.g.:
<pre>suffix "dc=wics,dc=uwaterloo,dc=ca"
rootdn "cn=root,dc=wics,dc=uwaterloo,dc=ca"</pre></li>
<li>Next, we need to grab all the relevant schemas:
<pre>scp -r uid@auth1.csclub.uwaterloo.ca:/etc/ldap/schema/ /tmp/schemas</pre></li>
<li>Use the include directives to help you find the ones you need. I noticed we were missing <code>sudo.schema</code>, <code>csc.schema</code>, and <code>rfc2307bis.schema</code>.</li>
<li>Open up the [https://git.uwaterloo.ca/wics/documentation/blob/master/ldap/csc.schema <code>csc.schema</code>] for editing; we're not using it verbatim. Remove the attributes <code>studentid</code> and <code>nonMemberTerm</code> and the objectclass <code>club</code>. Also make sure you change the OID so we don't clash with the CSC. Because we didn't want to go through the process of requesting a [http://pen.iana.org/pen/PenApplication.page PEN number], we chose arbitrarily to use 26338, which belongs to IWICS Inc.</li>
<li>We also need to can the auto-generated config files, so do that:
<pre># rm -rf /etc/openldap/slapd.d/*</pre></li>
<li>Also nuke the auto-generated database:
<pre># rm /var/lib/ldap/__db.*</pre></li>
<li>Configure the database:
<pre># cp /usr/share/slapd/DB_CONFIG /var/lib/ldap/
# chown openldap:openldap /var/lib/ldap/DB_CONFIG </pre></li>
<li>Now we can generate the new configuration files:
<pre># slaptest -f /etc/ldap/slapd.conf -F /etc/ldap/slapd.d/</pre></li>
<li>And ensure that the permissions are all set correctly, lest this break something:
<pre># chown -R openldap:openldap /etc/ldap/slapd.d</pre></li>
<li>If at this point you get a nasty error, such as
<pre>5657d4db hdb_db_open: database "dc=wics,dc=uwaterloo,dc=ca": db_open(/var/lib/ldap/id2entry.bdb) failed: No such file or directory (2).
5657d4db backend_startup_one (type=hdb, suffix="dc=wics,dc=uwaterloo,dc=ca"): bi_db_open failed! (2)
slap_startup failed (test would succeed using the -u switch)</pre>
Just try restarting slapd, and see if that fixes the problem:
<pre># service slapd stop
# service slapd start</pre></li>
<li>Congratulations! Your LDAP service is now configured and running.</li></ul>

==== Getting TLS Up and Running ====

<ul>
<li>Now that we have our LDAP service, we'll want to be able to serve encrypted traffic. This is especially important for any remote access, since binding to LDAP (i.e. sending it a password for auth) occurs over plaintext, and we don't want to leak our admin password.</li>
<li>Our first step is to copy our SSL certificates into the correct places. Public ones go into <code>/etc/ssl/certs/</code> and private ones go into <code>/etc/ssl/private/</code>.</li>
<li>Since the LDAP daemon needs to be able to read our private cert, we need to grant LDAP access to the private folder:
<pre># chgrp openldap /etc/ssl/private
# chmod g+x /etc/ssl/private</pre></li>
<li>Next, uncomment the TLS-related settings in <code>slapd.conf</code>. These are <code>TLSCertificateFile</code> (the public cert), <code>TLSCertificateKeyFile</code> (the private key), <code>TLSCACertificateFile</code> (the intermediate CA cert), and <code>TLSVerifyClient</code> (set to "allow").
<pre># enable TLS connections
TLSCertificateFile /etc/ssl/certs/wics-wildcard.crt
TLSCertificateKeyFile /etc/ssl/private/wics-wildcard.key

# enable TLS client authentication
TLSCACertificateFile /etc/ssl/certs/GlobalSign_Intermediate_Root_SHA256_G2.pem
TLSVerifyClient allow</pre></li>
<li>Update all your LDAP settings:
<pre># rm -rf /etc/openldap/slapd.d/*
# slaptest -f /etc/ldap/slapd.conf -F /etc/ldap/slapd.d/
# chown -R openldap:openldap /etc/ldap/slapd.d</pre></li>
<li>And last, ensure that LDAP will actually serve <code>ldaps://</code> by modifying the init script variables in <code>/etc/default/</code>:
<pre># vim /etc/default/slapd
...
SLAPD_SERVICES="ldap:/// ldapi:/// ldaps:///"
...</pre></li>
<li>Now you can restart the LDAP server:
<pre># service slapd restart</pre></li>
<li>And assuming this is successful, test to ensure LDAP is serving on port 636 for <code>ldaps://</code>:
<pre># netstat -ntaup
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:389 0.0.0.0:* LISTEN 22847/slapd
tcp 0 0 0.0.0.0:636 0.0.0.0:* LISTEN 22847/slapd </pre></li></ul>

==== Populating the Database ====

Now you'll need to start adding objects to the database. While we'll want to mostly do this programmatically, there are a few entries we'll need to bootstrap.

===== Root Entries =====

<ul>
<li>Start by creating a file [https://git.uwaterloo.ca/wics/documentation/blob/master/ldap/tree.ldif <code>tree.ldif</code>] to create a few necessary "roots" in our LDAP tree, with the contents:
<pre>dn: dc=wics,dc=uwaterloo,dc=ca
objectClass: dcObject
objectClass: organization
o: Women in Computer Science
dc: wics

dn: ou=People,dc=wics,dc=uwaterloo,dc=ca
objectClass: organizationalUnit
ou: People

dn: ou=Group,dc=wics,dc=uwaterloo,dc=ca
objectClass: organizationalUnit
ou: Group</pre></li>
<li>Now attempt an LDAP add, using the password you set earlier:
<pre># ldapadd -cxWD cn=root,dc=wics,dc=uwaterloo,dc=ca -f tree.ldif
Enter LDAP Password:
adding new entry "dc=wics,dc=uwaterloo,dc=ca"

adding new entry "ou=People,dc=wics,dc=uwaterloo,dc=ca"

adding new entry "ou=Group,dc=wics,dc=uwaterloo,dc=ca"</pre></li>
<li>Test that everything turned out okay, by performing a query of the entire database:
<pre># ldapsearch -x -h localhost
# extended LDIF
#
# LDAPv3
# base <dc=wics,dc=uwaterloo,dc=ca> (default) with scope subtree
# filter: (objectclass=*)
# requesting: ALL
#

# wics.uwaterloo.ca
dn: dc=wics,dc=uwaterloo,dc=ca
objectClass: dcObject
objectClass: organization
o: Women in Computer Science
dc: wics

# People, wics.uwaterloo.ca
dn: ou=People,dc=wics,dc=uwaterloo,dc=ca
objectClass: organizationalUnit
ou: People

# Group, wics.uwaterloo.ca
dn: ou=Group,dc=wics,dc=uwaterloo,dc=ca
objectClass: organizationalUnit
ou: Group

# search result
search: 2
result: 0 Success

# numResponses: 4
# numEntries: 3</pre></li></ul>

===== Users and Groups =====

<ul>
<li>Next, add users to track the current GID and UID. This will save us from querying the entire database every time we make a new user or group. Create this file, [https://git.uwaterloo.ca/wics/documentation/blob/master/ldap/nextxid.ldif <code>nextxid.ldif</code>]:
<pre>dn: uid=nextuid,ou=People,dc=wics,dc=uwaterloo,dc=ca
cn: nextuid
objectClass: account
objectClass: posixAccount
objectClass: top
uidNumber: 20000
gidNumber: 20000
homeDirectory: /dev/null

dn: cn=nextgid,ou=Group,dc=wics,dc=uwaterloo,dc=ca
objectClass: group
objectClass: posixGroup
objectClass: top
gidNumber: 10000</pre>
You'll see here that our first GID is 10000 and our first UID is 20000.</li>
<li>Now add them, like you did with the roots of the tree:
<pre># ldapadd -cxWD cn=root,dc=wics,dc=uwaterloo,dc=ca -f nextxid.ldif
Enter LDAP Password:
adding new entry "uid=nextuid,ou=People,dc=wics,dc=uwaterloo,dc=ca"

adding new entry "cn=nextgid,ou=Group,dc=wics,dc=uwaterloo,dc=ca"</pre></li></ul>

===== Special <code>sudo</code> Entries =====

<ul>
<li>We also need to add a sudoers OU with a defaults object for default sudo settings. We also need entries for syscom, such that members of the syscom group can use sudo on all hosts, and for termcom, whose members can use sudo on only the office terminals. Call this one [https://git.uwaterloo.ca/wics/documentation/blob/master/ldap/sudoers.ldif <code>sudoers.ldif</code>]:
<pre>dn: ou=SUDOers,dc=wics,dc=uwaterloo,dc=ca
objectClass: organizationalUnit
ou: SUDOers

dn: cn=defaults,ou=SUDOers,dc=wics,dc=uwaterloo,dc=ca
objectClass: top
objectClass: sudoRole
cn: defaults
sudoOption: !lecture
sudoOption: env_reset
sudoOption: listpw=never
sudoOption: mailto="wics-sys@lists.uwaterloo.ca"
sudoOption: shell_noargs

dn: cn=%syscom,ou=SUDOers,dc=wics,dc=uwaterloo,dc=ca
objectClass: top
objectClass: sudoRole
cn: %syscom
sudoUser: %syscom
sudoHost: ALL
sudoCommand: ALL
sudoRunAsUser: ALL

dn: cn=%termcom,ou=SUDOers,dc=wics,dc=uwaterloo,dc=ca
objectClass: top
objectClass: sudoRole
cn: %termcom
sudoUser: %termcom
sudoHost: honk
sudoHost: hiss
sudoHost: gosling
sudoCommand: ALL
sudoRunAsUser: ALL</pre></li>
<li>Now add them:
<pre># ldapadd -cxWD cn=root,dc=wics,dc=uwaterloo,dc=ca -f sudoers.ldif
Enter LDAP Password:
adding new entry "ou=SUDOers,dc=wics,dc=uwaterloo,dc=ca"

adding new entry "cn=defaults,ou=SUDOers,dc=wics,dc=uwaterloo,dc=ca"

adding new entry "cn=%syscom,ou=SUDOers,dc=wics,dc=uwaterloo,dc=ca"

adding new entry "cn=%termcom,ou=SUDOers,dc=wics,dc=uwaterloo,dc=ca"</pre></li>
<li>Last, add some special local groups via [https://git.uwaterloo.ca/wics/documentation/blob/master/ldap/local-groups.ldif <code>local-groups.ldif</code>]:
<pre># ldapadd -cxWD cn=root,dc=wics,dc=uwaterloo,dc=ca -f local-groups.ldif</pre>
The local groups are special because they usually are present on all systems, but we want to be able to add users to them at the LDAP level. For instance, the audio group controls access to sound equipment, and the adm group controls log read access.</li>
<li>That's all the entries we have to add manually! Now we can use software for the rest. See [[weo|<code>ceo</code>]] for more details.</li></ul>

=== Querying LDAP ===

There are many tools available for issuing LDAP queries. Queries should be issued to <tt>ldap1.csclub.uwaterloo.ca</tt>. The search base you almost certainly want is <tt>dc=csclub,dc=uwaterloo,dc=ca</tt>. Read access is available without authentication; [[Kerberos]] is used to authenticate commands which require it.

Example:

ldapsearch -x -h ldap1.csclub.uwaterloo.ca -b dc=csclub,dc=uwaterloo,dc=ca uid=ctdalek

The <tt>-x</tt> option causes <tt>ldapsearch</tt> to switch to simple authentication rather than trying to authenticate via SASL (which will fail if you do not have a Kerberos ticket).

The University LDAP server (uwldap.uwaterloo.ca) can also be queried like this. Again, use "simple authentication" as read access is available (from on campus) without authentication. SASL authentication will fail without additional parameters.

Example:

ldapsearch -x -h uwldap.uwaterloo.ca -b dc=uwaterloo,dc=ca "cn=Prabhakar Ragde"

=== Replication ===

While <tt>ldap1.csclub.uwaterloo.ca</tt> ([[Machine_List#auth1|auth1]]) is the LDAP master, an up-to-date replica is available on <tt>ldap2.csclub.uwaterloo.ca</tt> ([[Machine_List#auth2|auth2]]).

In order to replicate changes from the master, the slave maintains an authenticated connection to the master which provides it with full read access to all changes.

Specifically, <tt>/etc/systemd/system/k5start-slapd.service</tt> maintains an active Kerberos ticket for <tt>ldap/auth2.csclub.uwaterloo.ca@CSCLUB.UWATERLOO.CA</tt> in <tt>/var/run/slapd/krb5cc</tt>. This is then used to authenticate the slave to the server, who maps this principal to <tt>cn=ldap-slave,dc=csclub,dc=uwaterloo,dc=ca</tt>, which in turn has full read privileges.

In the event of master failure, all hosts should fail LDAP reads seamlessly over to the slave.

[[Category:Software]]

=== Modifying LDAP entry ===

Editing entries can be easily done with <code>ldapvi</code>. First search for the entry using <code>ldapsearch</code> like above, and change <code>ldapsearch -x</code> to <code>ldapvi -Y GSSAPI</code> to make your edits.

Note that if your <tt>EDITOR</tt> enviroment is set to something not avaliable it will give out errors like

error (misc.c line 180): No such file or directory
editor died
error (ldapvi.c line 83): No such file or directory

This can be fixed by something like

EDITOR=vi ldapvi ******

==== Changing a user's username ====

Only a member of the Systems Committee can change a user's username. '''At all times, a user's username must match the user's username in WatIAM.'''

All changes to an account MUST be done in person so that identity can be confirmed. If a member cannot attend in person, then an alternate method of identity verification may be chosen by the Systems Administrator.

# Edit entries in LDAP (<code>ldapvi -Y GSSAPI</code>)
#* Find and replace the user's old username with the new one (<code>%s/$OLD/$NEW/g</code>)
# Change the user's Kerberos principal (on auth1, <code>renprinc $OLD $NEW</code>)
# Move the user's home directory (on phosphoric-acid, <code>mv /users/$OLD /users/$NEW</code>)
# Modify the user's ~/.forward file if their old username is in it.
# Change the user's csc-general (and csc-industry, if subscribed) email address for <code>$OLD@csclub.uwaterloo.ca</code> to <code>$NEW@csclub.uwaterloo.ca</code>
#* https://mailman.csclub.uwaterloo.ca/admin/csc-general
# If the user has vhosts on caffeine, update them to point to their new username

If the user's account has been around for a while, and they request it, forward email from their old username to their new one.

# Edit <code>/etc/aliases</code> on mail. <code>$OLD: $NEW</code>
# Run <code>newaliases</code>

SSL

2024-11-07T13:59:36Z

Merenber: /* GlobalSign */

== GlobalSign ==

The CSC currently has an SSL Certificate from GlobalSign for *.csclub.uwaterloo.ca provided at no cost to us through IST. GlobalSign likes to take a long time to respond to certificate signing requests (CSR) for wildcard certs, so our CSR really needs to be handed off to IST at least 2 weeks in advance. You can do it sooner – the certificate expiry date will be the old expiry date + 1 year (+ a bonus ) Having an invalid cert for any length of time leads to terrible breakage, followed by terrible workarounds and prolonged problems.

When the certificate is due to expire in a month or two, syscom should (but apparently doesn't always) get an email notification. This will include a renewal link. Otherwise, use the [https://uwaterloo.ca/information-systems-technology/about/organizational-structure/information-security-services/certificate-authority/globalsign-signed-x5093-certificates/self-service-globalsign-ssl-certificates IST-CA self service system]. Please keep a copy of the key, CSR and (once issued) certificate in <tt>/users/sysadmin/certs</tt>. The OpenSSL examples linked there are good to generate a 2048-bit RSA key and a corresponding CSR. It's probably a good idea to change the private key (as it's not that much effort anyways). Just sure your CSR is for <tt>*.csclub.uwaterloo.ca</tt>.

At the self-service portal, these options worked in 2013. If you need IST assistance, [mailto:ist-ca@uwaterloo.ca ist-ca@uwaterloo.ca] is the email address you should contact.
Products: OrganizationSSL
SSL Certificate Type: Wildcard SSL Certificate
Validity Period: 1 year
Are you switching from a Competitor? No, I am not switching
Are you renewing this Certificate? Yes (paste current certificate)
30-day bonus: Yes (why not?)
Add specific Subject Alternative Names (SANs): No (*.csclub.uwaterloo.ca automatically adds csclub.uwaterloo.ca as a SAN)
Enter Certificate Signing Request (CSR): Yes (paste CSR)
Contact Information:
First Name: Computer Science Club
Last Name: Systems Committee
Telephone: +1 519 888 4567 x33870
Email Address: syscom@csclub.uwaterloo.ca

=== Helpful links ===
* [https://support.globalsign.com/ssl/ssl-certificates-installation/generate-csr-openssl How to generate a new CSR and private key]
* [https://uwaterloo.atlassian.net/wiki/spaces/ISTKB/pages/262013183/How+to+obtain+a+new+GlobalSign+certificate+or+renew+an+existing+one How to obtain a new GlobalSign certificate or renew an existing one]
* [https://system.globalsign.com/bm/public/certificate/poporder.do?domain=PAR12271n5w6s27pvg8d92v4150t GlobalSign UWaterloo self-service page]
* [https://support.globalsign.com/ca-certificates/intermediate-certificates/organizationssl-intermediate-certificates GlobalSign intermediate certificate] (needed to create a certificate chain; see below)

=== OpenSSL cheat sheet ===
<ul>
<li>
Generate a new CSR and private key (do this in a new directory):
<pre>
openssl req -out csclub.uwaterloo.ca.csr -new -newkey rsa:2048 -keyout csclub.uwaterloo.ca.key -nodes
</pre>
Enter the following information at the prompts:
<pre>
Country Name (2 letter code) [AU]:CA
State or Province Name (full name) [Some-State]:Ontario
Locality Name (eg, city) []:Waterloo
Organization Name (eg, company) [Internet Widgits Pty Ltd]:University of Waterloo
Organizational Unit Name (eg, section) []:Computer Science Club
Common Name (e.g. server FQDN or YOUR name) []:*.csclub.uwaterloo.ca
Email Address []:systems-committee@csclub.uwaterloo.ca

Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:
An optional company name []:
</pre>
</li>
<li>
View the information inside a CSR:
<pre>
openssl req -noout -text -in csclub.uwaterloo.ca.csr
</pre>
</li>
<li>
View the information inside a private key:
<pre>
openssl pkey -noout -text -in csclub.uwaterloo.ca.key
</pre>
</li>
<li>
View information inside a certificate:
<pre>
openssl x509 -noout -text -in csclub.uwaterloo.ca.crt
</pre>
</li>
</ul>

=== csclub.cloud ===
Once a year, someone from IST will ask us to create a temporary TXT record for csclub.cloud to prove to GlobalSign that we own it. This must be created at the root of the domain. Since this zone is managed dynamically (via the acme.sh script on biloba, see below), we need to freeze the domain and update /var/lib/bind/db.csclub.cloud directly.

Once you're in the correct server (not Biloba). Here are the steps:
<ol>
<li>Run <code>rndc freeze csclub.cloud</code>.</li>
<li>
Open /var/lib/bind/db.csclub.cloud and add a new TXT record. It'll look something like
<pre>
TXT "_globalsign-domain-verification=blablabla"
</pre>
</li>
<li>
In the same file, make sure to also update the SOA serial number. It should generally be YYYYMMDDNN where NN is a monotonically increasing counter (YYYYMMDD is the current date).
</li>
<li>Run <code>rndc reload</code>.</li>
<li>
Run a DNS query to make sure you can see the TXT record:
<pre>
dig -t txt @dns1 csclub.cloud
dig -t txt @dns2 csclub.cloud
</pre>
</li>
<li>Email back the person from IST and let them know that we created the TXT record.</li>
<li>
Once the certificate has been renewed, delete the TXT record, update the SOA serial number, and run <code>rndc reload</code>.
</li>
<li>Run <code>rndc thaw csclub.cloud</code>.</li>
</ol>

== Certificate Files ==
Let's say you obtain a new certificate for *.csclub.uwaterloo.ca. Here are the files which should be stored in the certs folder:
<ul>
<li>csclub.uwaterloo.ca.key: private key created by openssl</li>
<li>csclub.uwaterloo.ca.csr: certificate signing request created by openssl</li>
<li>order: order number from GlobalSign</li>
<li>csclub.uwaterloo.ca.crt: certificate created by GlobalSign</li>
<li>globalsign-intermediate.crt: intermediate certificate from GlobalSign, obtainable from [https://support.globalsign.com/ca-certificates/intermediate-certificates/organizationssl-intermediate-certificates here]. As of this writing, we use the "OrganizationSSL SHA-256 R3 Intermediate Certificate". Just click the "View in Base64" button and copy the contents.
<ul>
<li>There is an alternative way to get the intermediate certificate: if you run <code>openssl x509 -noout -text -in csclub.uwaterloo.ca.crt</code>, under X509v3 extensions > Authority Information Access, there should be a field called "CA Issuers" which has a URL which looks like http://secure.globalsign.com/cacert/gsrsaovsslca2018.crt. You can download that file and convert it to PEM:
<pre>
wget https://secure.globalsign.com/cacert/gsrsaovsslca2018.crt
openssl x509 -inform der -in gsrsaovsslca2018.crt -out globalsign-intermediate.crt
rm gsrsaovsslca2018.crt
</pre>
</li>
</ul>
</li>
</ul><ul>
<li>csclub.uwaterloo.ca.chain: create this with the following command:
<pre>
cat csclub.uwaterloo.ca.crt globalsign-intermediate.crt > csclub.uwaterloo.ca.chain
</pre>
</li>
<li>csclub.uwaterloo.ca.pem: create this with the following command:
<pre>
cat csclub.uwaterloo.ca.key csclub.uwaterloo.ca.chain > csclub.uwaterloo.ca.pem
chmod 600 csclub.uwaterloo.ca.pem
</pre>
</li>
</ul>

== Certificate Locations ==

Keep a copy of newly generated certificates in /users/sysadmin/certs.

A list of places you'll need to put the new certificate to keep our services running. Private key (if applicable) should be kept next to the certificate with the extension .key.

* caffeine:/etc/ssl/private/csclub-wildcard.crt (for Apache)
* coffee:/etc/ssl/private/csclub.uwaterloo.ca (for PostgreSQL and MariaDB)
* <s>mail:/etc/ssl/private/csclub-wildcard.crt (for Apache, Postfix and Dovecot)</s> (UPDATE: we use certbot now for these)
* mailman:/etc/ssl/private/csclub-wildcard-chain.crt (for Apache)
* rt:/etc/ssl/private/csclub-wildcard.crt (for Apache)
* potassium-benzoate:/etc/ssl/private/csclub-wildcard.crt (for nginx)
* phosphoric-acid:/etc/ssl/private/csclub-wildcard-chain.crt (for ceod)
* auth1:/etc/ssl/private/csclub-wildcard.crt (for slapd, make sure to <code>sudo service slapd restart</code>)
* auth2:/etc/ssl/private/csclub-wildcard.crt (for slapd, make sure to <code>sudo service slapd restart</code>)
* mattermost:/etc/ssl/private/csclub-wildcard.crt (for nginx)
* load-balancer-0(1|2):/etc/ssl/private/csclub.uwaterloo.ca (for haproxy) [temporarily down 2020]
* chat:/etc/ssl/private/csclub-wildcard-chain.crt (for nginx)
* prometheus:/etc/ssl/private/csclub-wildcard-chain.crt (for Apache)
* bigbluebutton:/etc/nginx/ssl/csclub-wildcard-chain.crt (podman container on xylitol)
* icy:/etc/ssl/private/csclub-wildcard.pem (for Icecast)
* chamomile:/etc/ssl/private/cloud.csclub.uwaterloo.ca.chain.crt, /etc/ssl/private/csclub.cloud.chain, /etc/ssl/private/csclub.uwaterloo.ca.chain (for nginx)
* biloba:/etc/ssl/private/cloud.csclub.uwaterloo.ca.chain.crt, /etc/ssl/private/csclub.cloud.chain, /etc/ssl/private/csclub.uwaterloo.ca.chain (for nginx)
* nextcloud (nspawn container inside guayusa): /etc/ssl/private/csclub.uwaterloo.ca.chain (for nginx)
* citric-acid (runs vaultwarden): /etc/ssl/private/csclub.uwaterloo.ca.{chain,key} (for nginx)

Some services (e.g. Dovecot, Postfix) prefer to have the certificate chain in one file. Concatenate the appropriate intermediate root to the end of the certificate and store this as csclub-wildcard-chain.crt.

=== More certificate locations ===
We have some SSL certificates which are not used by web servers, but still need to be renewed eventually.

==== Prometheus node exporter ====
All of our Prometheus node exporters are using mTLS via stunnel (every bare-metal host, as well as caffeine, coffee and mail, is running this exporter). The certificates (both client and server) are set to expire in September 2031; before then, create new keypairs in /opt/prometheus/tls, and deploy the new server.crt, node.crt and node.key to /etc/stunnel/tls on all machines. Restart prometheus and all of the node exporters.

==== ADFS ====
See [[ADFS]]. When the university's IdP certificate expires (October 2025), we can just download a new one and restart Apache; when our own certificate expires (July 2031), we need to submit a new form to IST (please do this before the cert expires).

==== Keycloak ====
See [[Keycloak]]. When the saml-passthrough certificate expires (January 2032), you need to create a new keypair in /srv/saml-passthrough on caffeine, and upload the new certificate into the Keycloak UI (IdP settings). When the Keycloak SP certificate expires (December 2031), make sure to create a new keypair and upload it to the Keycloak UI (Realm Settings).

== letsencrypt ==

We support letsencrypt for our virtual hosts with custom domains. We use the <tt>cerbot</tt> from debian repositories with a configuration file at <tt>/etc/letsencrypt/cli.ini</tt>, and a systemd timer to handle renewals.

The setup for a new domain is:

# Become <tt>certbot</tt> on caffine with <tt>sudo -u certbot bash</tt> or similar.
# Run <tt>certbot certonly -c /etc/letsencrypt/cli.ini -d DOMAIN --logs-dir /tmp</tt>. The logs-dir isn't important and is only needed for troubleshooting.
# Set up the Apache site configuration using the example below. (apache config is in /etc/apache2) Note the permanent redirect to https.
# Make sure to commit your changes when you're done.
# Reloading apache config is <tt>sudo systemctl reload apache2</tt>.

<VirtualHost *:80>
ServerName example.com
ServerAlias *.example.com
ServerAdmin example@csclub.uwaterloo.ca

#DocumentRoot /users/example/www/
Redirect permanent / https://example.com/

ErrorLog /var/log/apache2/example-error.log
CustomLog /var/log/apache2/example-access.log combined
</VirtualHost>

<VirtualHost csclub:443>
SSLEngine on
SSLCertificateFile /etc/letsencrypt/live/example.com/fullchain.pem
SSLCertificateKeyFile /etc/letsencrypt/live/example.com/privkey.pem
SSLStrictSNIVHostCheck on

ServerName example.com
ServerAlias *.example.com
ServerAdmin example@csclub.uwaterloo.ca

DocumentRoot /users/example/www

ErrorLog /var/log/apache2/example-error.log
CustomLog /var/log/apache2/example-access.log combined
</VirtualHost>

== acme.sh ==
We are using [https://github.com/acmesh-official/acme.sh acme.sh] for provisioning SSL certificates for some of our *.csclub.cloud domains. It is currently set up under /root/.acme.sh on biloba.

NOTE: acme.sh has a cron job which automatically renews certificates before they expire and reloads NGINX, so you do not have to do anything after issuing and installing a certificate (i.e. "set-and-forget").

=== How to add a new SSL cert for a custom domain on CSC cloud ===
Note: you do not need to acquire a new cert if the requested domain is directly on csclub.cloud, e.g. app1.csclub.cloud. We can re-use our wildcard cert on csclub.cloud for that. However, if a user requests a multi-level domain on csclub.cloud, or a domain hosted on an external registrar, then you will need to create a new cert.

Let's say user <code>ctdalek</code> wants <code>mydomain.com</code> to point to a VM on CSC cloud.
 
TLDR:
<pre>
# Obtain the cert.
# If a subdomain was also requested, pass the -d option multiple times, e.g.
# `-d mydomain.com -d sub.mydomain.com`. Make sure the "main" domain is specified first.
acme.sh --issue -d mydomain.com -w /var/www

# Install the cert.
# If a subdomain was also requested, only specify the "main" domain.
acme.sh --install-cert -d mydomain.com \
--key-file /etc/nginx/ceod/member-ssl/mydomain.com.key \
--fullchain-file /etc/nginx/ceod/member-ssl/mydomain.com.chain \
--reloadcmd "/root/bin/reload-nginx.sh"

# Create a vhost file.
# Look at the other files in the same directory for inspiration.
# Make sure the file starts with the username and an underscore, e.g. "ctdalek_",
# because this is how ceod keeps track of the vhosts.
# Make sure to set the custom domain name(s) and paths to the SSL key/cert.
vim /etc/nginx/ceod/member-vhosts/ctdalek_mydomain.com

# Finally, reload NGINX on both biloba and chamomile. The /etc/nginx/ceod directory
# is shared between them.
/root/bin/reload-nginx.sh
</pre>

=== Installation ===
<pre>
cd /opt
git clone --depth 1 https://github.com/acmesh-official/acme.sh
cd acme.sh
./acme.sh --install -m syscom@csclub.uwaterloo.ca
. "/root/.acme.sh/acme.sh.env"
</pre>
Important: If invoking acme.sh from another program, it needs the environment variables set in acme.sh.env. Currently, that is just
<pre>
LE_WORKING_DIR="/root/.acme.sh"
</pre>

For testing purposes, make sure to use the Let's Encrypt test server:
<pre>
acme.sh --set-default-ca --server letsencrypt_test
</pre>

=== NGINX setup ===
<pre>
mkdir -p /var/www/.well-known/acme-challenge
</pre>
Add the following snippet to your default NGINX file (e.g. /etc/nginx/sites-enabled/default):
<pre>
# For Let's Encrypt
location /.well-known/acme-challenge/ {
alias /var/www/.well-known/acme-challenge/;
}
</pre>

Now assuming that biloba has the IP address for *.csclub.cloud, you can test that everything is working:
<pre>
acme.sh --issue -d app.merenber.csclub.cloud -w /var/www
</pre>
To install a certificate after it's been issued:
<pre>
acme.sh --install-cert -d app.merenber.csclub.cloud \
--key-file /etc/nginx/ceod/member-ssl/app.merenber.csclub.cloud.key \
--fullchain-file /etc/nginx/ceod/member-ssl/app.merenber.csclub.cloud.chain \
--reloadcmd "/root/bin/reload-nginx.sh"
</pre>
At this point, you should add your NGINX vhost file which uses that SSL certificate.
 
To remove a certificate:
<pre>
acme.sh --remove -d app.merenber.csclub.cloud
rm -r /root/.acme.sh/app.merenber.csclub.cloud
rm /etc/nginx/ceod/member-ssl/app.merenber.csclub.cloud.chain
rm /etc/nginx/ceod/member-ssl/app.merenber.csclub.cloud.key
</pre>
Don't forget to remove the NGINX vhost file too.

Once you think you're ready, use a real ACME provider, e.g.
<pre>
acme.sh --set-default-ca --server letsencrypt
</pre>

Since we have a [https://zerossl.com ZeroSSL] account, and ZeroSSL has no rate limit, we are going to use that instead:
<pre>
acme.sh --register-account --server zerossl \
--eab-kid xxxxxxxxxxxx \
--eab-hmac-key xxxxxxxxx
acme.sh --set-default-ca --server zerossl
</pre>

=== DNS challenge ===
To obtain a wildcard certificate (e.g. *.k8s.csclub.cloud), you will need to perform the DNS-01 challenge. We are going to use nsupdate to interact with our BIND9 server on dns1.

On dns1, run:
<pre>
tsig-keygen csc-cloud
</pre>
Paste the output into the appropriate section in /etc/bind/named.conf.local. Also paste it into a file somewhere on biloba, e.g. /etc/csc/csc-cloud-tsig.key.

Add the following to the csclub.cloud zone block:
<pre>
allow-update {
!{
!127.0.0.1;
!::1;
!129.97.134.0/24;
!2620:101:f000:4901::/64;
any;
};
key csc-cloud;
};
</pre>
(We're basically trying to restrict updates to the given IP ranges. See https://serverfault.com/a/417229.)

The 'bind' user can't write to files under /etc/bind, so we're going to move our zone file to /var/lib/bind instead.
Comment out 'file "/etc/bind/db.csclub.cloud";' from named.conf.local and add this line below it:
<pre>
file "/var/lib/bind/db.csclub.cloud";
</pre>
Then run:
<pre>
cp /etc/bind/db.csclub.cloud /var/lib/bind/db.csclub.cloud
chown bind:bind /var/lib/bind/db.csclub.cloud
rndc reload
</pre>

On biloba, check that everything's working:
<pre>
nsupdate -k /etc/csc/csc-cloud-tsig.key -v <<EOF
update add test.csclub.cloud 300 A 0.0.0.0
send
EOF
</pre>
Use a tool such as <code>dig</code> to make sure that the update was successful.
If it worked, you can delete the record:
<pre>
nsupdate -k /etc/csc/csc-cloud-tsig.key -v <<EOF
delete test.csclub.cloud
send
EOF
</pre>
Now we are ready to actually perform the challenge with acme.sh:
<pre>
export NSUPDATE_SERVER="dns1.csclub.uwaterloo.ca"
export NSUPDATE_KEY="/etc/csc/csc-cloud-tsig.key"
acme.sh --issue --dns dns_nsupdate -d 'k8s.csclub.cloud' -d '*.k8s.csclub.cloud'
</pre>
(If something goes wrong, use the <code>--debug</code> flag.)

If all went well, just install the certificate as usual:
<pre>
acme.sh --install-cert -d k8s.csclub.cloud \
--key-file /etc/nginx/ceod/syscom-ssl/k8s.csclub.cloud.key \
--fullchain-file /etc/nginx/ceod/syscom-ssl/k8s.csclub.cloud.chain \
--reloadcmd 'systemctl reload nginx'
</pre>

CloudStack

2024-09-19T01:34:40Z

Merenber:

We are using [https://cloudstack.apache.org/ Apache CloudStack] to provide VMs-as-a-service to members. Our user documentation is here: https://docs.cloud.csclub.uwaterloo.ca

Prerequisite reading:

* [[Ceph]]
* [[Cloud Networking]]

Official CloudStack documentation: http://docs.cloudstack.apache.org/en/4.16.0.0/

== Rebooting machines ==
I'm going to start with this first because this is what future sysadmins are most interested in. If you reboot one of the CloudStack guest machines (as of this writing: biloba, ginkgo and chamomile), then I suggest you perform a live migration of all of the VMs on that host to the other other machines (see [[#Sequential reboot]]).

If this is not possible (e.g. there is not enough capacity on the other machines), then CloudStack will most likely shut down the VMs automatically. You are responsible for restarting them manually after the reboot. You will also need to manually restart any Kubernetes clusters.

Note: if the cloudstack-agent.service is having trouble reconnecting to the management servers after a reboot, just do a systemctl restart and cross your fingers.

=== Sequential reboot ===
If it is possible to reboot the machines one at a time (e.g. for a software upgrade), then it is possible to avoid having any downtime. Login to the web UI as admin, go to Infrastructure > Hosts, hover above the three-dots button for a particular host, then press the "Enable Maintenance Mode" button.
[[File:Cloudstack-enable-maintenance-mode-button.png|1000px]]
 
Wait for the VMs to be migrated to the other machines (press the Refresh button to update the table). If you see an error which says "ErrorInPrepareForMaintenance", just wait it out. If more than 20 minutes have passed and there is still no progress, take the host out of maintenance mode, and put it back into maintenance mode. If this still does not work, restart the management server.

When a host is in maintenance mode, it should look like this:
[[File:Cloudstack-host-in-maintenance-mode.png|1000px]]
 
Once all VMs have been migrated, do whatever you need to do on the physical host; once it is back up, take it back out of maintenance mode from the web UI. Repeat for any other hosts which need to be taken offline.

== Unexpected reboot ==
Sometimes a network interface fails on a machine after the switches in MC are rebooted (looking at you, riboflavin). Or a machine randomly goes offline in the middle of the night (looking at you, ginkgo). Point is, sometimes a machine needs to rebooted, or is forcefully rebooted, without preparation. Unfortunately, CloudStack is unable to recover gracefully from an unexpected reboot. This means that manual intervention is required to get the VMs back into a working state.

Once the machine has come back online, perform the following:
<ol>
<li>All of the VMs which were on that machine will eventually transition to the Stopped state. Wait for this to happen first (from the web UI).</li>
<li>Go to Infrastructure -> Management servers and make sure that both biloba and chamomile are present and running. If not, you may need to restart the management server on the machine (<code>systemctl restart cloudstack-management</code>). Watch the journald logs for any error messages.</li>
<li>Go to Infrastructure -> Hosts and make sure that all three hosts (biloba, chamomile and ginkgo) are present and running. If not, you may need to restart the agent on the machine (<code>systemctl restart cloudstack-agent</code>). Watch the journald logs for any error messages.</li>
<li>If you restart cloudstack-agent, restart virtlogd as well, just for good measure. Watch the journald logs for any error messages.</li>
<li>Restart ONE of the stopped VMs and make sure that it transitions to the Started state. If more than 20 minutes pass and it still hasn't started, restart the management servers and try again.</li>
<li>Restart the rest of the stopped VMs.</li>
</ol>

== Administration ==
To login with the admin account, use the following credentials in the web UI
<ul>
<li>Username: admin</li>
<li>Password: stored in the usual place</li>
<li>Domain: leave this empty</li>
</ul>

There is another admin account for the Members domain. This is necessary to create projects in the Members domain which regular members can access. Note that his account has fewer privileges than the root admin account above (it has the DomainAdmin role instead of the RootAdmin role).
<ul>
<li>Username: membersadmin</li>
<li>Password: stored in the usual place</li>
<li>Domain: Members
</ul>

Note that there are two management servers, one on each of biloba and chamomile (chamomile is a hot standby for biloba). If you restart one of them, you should restart the other as well.

=== CLI ===
CloudStack has a CLI called [https://github.com/apache/cloudstack-cloudmonkey cloudmonkey] which is already set up on biloba. Just run <code>cmk</code> as root to start it up.

Cloudmonkey is basically a shell for the API (https://cloudstack.apache.org/api/apidocs-4.16/). For example, to list all domains:
<pre>
listDomains details=min
</pre>
Run <code>somecommand -h</code> to see all parameters for a particular command (or browse the API documentation).
See https://github.com/apache/cloudstack-cloudmonkey for more details.

== Building packages ==
While CloudStack does provide .deb packages for Ubuntu, unfortunately these don't work on Debian (the 'qemu-kvm' dependency is a virtual package on Debian, but not on Ubuntu). So we're going to build our own packages instead.

We're going to perform the build in a Podman container to avoid polluting the host machine with unnecessary packages. There's a container called cloudstack-build on biloba which you can re-use. If you create a new container, make sure to use the same Podman image as the release for which you're building (e.g. 'debian:bullseye').

The instructions below are adapted from http://docs.cloudstack.apache.org/en/latest/installguide/building_from_source.html

Inside the container, install the dependencies:
<pre>
apt install maven openjdk-11-jdk libws-commons-util-java libcommons-codec-java libcommons-httpclient-java liblog4j1.2-java genisoimage devscripts debhelper python3-setuptools
</pre>
Install Node.js 12 as well (Debian bullseye's version happens to be 12):
<pre>
apt install nodejs npm
</pre>
Build the node-sass module (see [https://github.com/sass/node-sass/issues/1579 this issue] to see why this is necessary):
<pre>
cd ui && npm install && npm rebuild node-sass && cd ..
</pre>
The python3-mysql.connector package is not available in bullseye, so we're going to download and install it from the sid release:
<pre>
curl -LOJ http://ftp.ca.debian.org/debian/pool/main/m/mysql-connector-python/python3-mysql.connector_8.0.15-2_all.deb
apt install ./python3-mysql.connector_8.0.15-2_all.deb
</pre>
Download the CloudStack source code:
<pre>
curl -LOJ http://mirror.csclub.uwaterloo.ca/apache/cloudstack/releases/4.16.0.0/apache-cloudstack-4.16.0.0-src.tar.bz2
tar -jxvf apache-cloudstack-4.16.0.0-src.tar.bz2
cd apache-cloudstack-4.16.0.0-src
</pre>
Download the Maven dependencies:
<pre>
mvn -P deps
</pre>
Now open debian/control and perform the following changes:

* Replace 'qemu-kvm (>=2.5)' with 'qemu-system-x86 (>= 1:5.2)' in the dependencies of cloudstack-agent
* Remove dh-systemd as a build dependency of cloudstack (it's included in debhelper)

Now open debian/rules and add the following flags to the <code>mvn</code> command:
<pre>
-Dmaven.test.skip=true -Dclean.skip=true -Dcheckstyle.skip
</pre>

Now open debian/changelog and change 'unstable' to 'bullseye'.

As of this writing, there is a [https://gitlab.com/libvirt/libvirt/-/issues/161 bug in libvirt] which prevents VMs with more than 4GB of RAM from being created on hosts with cgroups2. Until that issue is fixed, we're going to need to modify the source code. Since we're already building a custom CloudStack package, it's easier to patch CloudStack than to patch libvirt, so paste something like the following into debian/patches/fix-cgroups2-cpu-weight.patch:
<pre>
Description: Workaround for libvirt trying to write a value to the cgroups v2
cpu.weight controller which is greater than the maximum (10000). The
libvirt developers are currently discussing a solution.
Forwarded: not-needed
Origin: upstream, https://gitlab.com/libvirt/libvirt/-/issues/161
Author: Max Erenberg <merenber@csclub.uwaterloo.ca>
Last-Update: 2021-12-03
Index: apache-cloudstack-4.16.0.0-src/plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/LibvirtVMDef.java
===================================================================
--- apache-cloudstack-4.16.0.0-src.orig/plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/LibvirtVMDef.java
+++ apache-cloudstack-4.16.0.0-src/plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/LibvirtVMDef.java
@@ -1483,6 +1483,10 @@ public class LibvirtVMDef {
static final int MAX_PERIOD = 1000000;

public void setShares(int shares) {
+ // Clamp the value to the cgroups v2 cpu.weight maximum until
+ // upstream libvirt gets fixed:
+ // https://gitlab.com/libvirt/libvirt/-/issues/161
+ shares = Math.min(shares, 10000);
_shares = shares;
}
</pre>
I think you have to manually modify that LibvirtVMDef.java file to incorporate those changes (I could be wrong on this, but that's how I did it).

Then paste the following into debian/patches/00list:
<pre>
fix-cgroup2-cpu-weight
</pre>

Finally, import your GPG key into the container (make sure to delete it afterwards!), and build the packages:
<pre>
debuild -k<YOUR_GPG_KEY_ID>
</pre>

There should already be a .dupload.conf in the /root directory in the cloudstack-build container; if you need need another copy, ask a syscom member. Open /root/.ssh/config and change the User parameter to your username. Finally, go to /root and upload the packages to potassium-benzoate (replace the version number):
<pre>
dupload cloudstack_4.16.0.0+1_amd64.changes
</pre>

== Incompatibility with Debian 12 packages ==
After upgrading ginkgo to bookworm, we discovered that libvirt 8+ was incompatible with CloudStack 4.16.0.0. See https://www.shapeblue.com/advisory-on-libvirt-8-compatibility-issues-with-cloudstack/ for details. So we built new packages from the 4.16.1.0 branch of ShapeBlue's GitHub repository. For some reason the cloudstack-management process failed with some errors from SLF4J, so we needed to download some JARs:

<pre>
wget -O /usr/share/cloudstack-management/lib/log4j-1.2.17.jar https://repo1.maven.org/maven2/log4j/log4j/1.2.17/log4j-1.2.17.jar
wget -O /usr/share/cloudstack-management/lib/slf4j-log4j12-1.6.6.jar https://repo1.maven.org/maven2/org/slf4j/slf4j-log4j12/1.6.6/slf4j-log4j12-1.6.6.jar
</pre>

See https://stackoverflow.com/a/70528383 for details.

We also encountered some kind of Java 11 -> 17 incompatibility issue, so following parameters were added to the JAVA_OPTS variable in /etc/default/cloudstack-management:

<pre>
--add-opens java.base/java.lang=ALL-UNNAMED
</pre>

See https://stackoverflow.com/a/41265267 for details. Note that this file is NOT a shell script so you cannot use variable interpolation. You must modify the value of JAVA_OPTS directly.

== Database setup ==
We are using master-master replication between two MariaDB instances on biloba and chamomile. See [https://mariadb.com/kb/en/setting-up-replication/ here] and [https://tunnelix.com/simple-master-master-replication-on-mariadb/ here] for instructions on how to set this up.

To avoid split-brain syndrome, mariadb.cloud.csclub.uwaterloo.ca points to a virtual IP shared by biloba and chamomile via keepalived. This means that only one host is actually handling requests at any moment; the other is a hot standby.

Also add the following parameters to /etc/mysql/my.cnf on the hosts running MariaDB:
<pre>
[mysqld]
innodb_rollback_on_timeout=1
innodb_lock_wait_timeout=600
max_connections=350
log-bin=mysql-bin
binlog-format = 'ROW'
</pre>
Also comment out (or remove) the following line in /etc/mysql/mariadb.conf.d/50-server.cnf:
<pre>
bind-address = 127.0.0.1
</pre>
Now restart MariaDB.

== Management server setup ==
Install the management server from our Debian repository:
<pre>
apt install cloudstack-management
</pre>

Run the database scripts:
<pre>
cloudstack-setup-databases cloud:password@localhost --deploy-as=root
</pre>
(Replace 'password' by a strong password.)

Open /etc/cloudstack/management/db.properties and replace all instances of 'localhost' by 'mariadb.cloud.csclub.uwaterloo.ca'.

Open /etc/cloudstack/management/server.properties and set 'bind-interface' to 127.0.0.1 (CloudStack is being reverse proxied behind NGINX).

Run some more scripts:
<pre>
cloudstack-setup-management
</pre>

Mount the cloudstack-secondary CephFS volume at /mnt/cloudstack-secondary:
<pre>
mkdir /mnt/cloudstack-secondary
mount -t nfs4 -o port=2049 ceph-nfs.cloud.csclub.uwaterloo.ca:/cloudstack-secondary /mnt/cloudstack-secondary
</pre>
Now download the management VM template:
<pre>
/usr/share/cloudstack-common/scripts/storage/secondary/cloud-install-sys-tmplt -m /mnt/cloudstack-secondary/ -u https://download.cloudstack.org/systemvm/4.16/systemvmtemplate-4.16.0-kvm.qcow2.bz2 -h kvm -F
</pre>

The management server will run on port 8080 by default, so reverse proxy it from NGINX:
<pre>
location / {
proxy_pass http://localhost:8080;
}
</pre>

== Compute node setup ==
Install packages:
<pre>
apt install cloudstack-agent libvirt-daemon-driver-storage-rbd qemu-block-extra
</pre>
Create a new user for CloudStack:
<pre>
useradd -s /bin/bash -d /nonexistent -M cloudstack
# set the password
passwd cloudstack
</pre>
Add the following to /etc/sudoers:
<pre>
cloudstack ALL=(ALL) NOPASSWD:ALL
Defaults:cloudstack !requiretty
</pre>
(There is a way to restrict this, but I was never able to get it to work.)

=== Network setup ===
The /etc/network/interfaces file should look something like this (taking ginkgo as an example):
<pre>
auto enp3s0f0
iface enp3s0f0 inet manual

auto ens1f0np0
iface ens1f0np0 inet manual

# csc-cloud management
auto enp3s0f0.529
iface enp3s0f0.529 inet manual

auto br529
iface br529 inet static
bridge_ports enp3s0f0.529
address 172.19.168.22/27
iface br529 inet6 static
bridge_ports enp3s0f0.529
address fd74:6b6a:8eca:4902::22/64

# csc-cloud provider
auto ens1f0np0.425
iface ens1f0np0.425 inet manual

auto br425
iface br425 inet manual
bridge_ports ens1f0np0.425

# csc server network
auto ens1f0np0.134
iface ens1f0np0.134 inet manual

auto br134
iface br134 inet static
bridge_ports ens1f0np0.134
address 129.97.134.148/24
gateway 129.97.134.1
iface br134 inet6 static
bridge_ports ens1f0np0.134
address 2620:101:f000:4901:c5c::148/64
gateway 2620:101:f000:4901::1
</pre>
Add/modify the following lines to /etc/cloudstack/agent.properties:
<pre>
private.network.device=br529
guest.network.device=br425
public.network.device=br425
host=172.19.168.23,172.19.168.24@static
</pre>

=== libvirtd setup ===
Add/modify the following lines in /etc/libvirt/libvirtd.conf:
<pre>
listen_tls = 0
listen_tcp = 1
tcp_port = "16509"
auth_tcp = "none"
mdns_adv = 0
</pre>

Uncomment the following line in /etc/default/libvirtd:
<pre>
LIBVIRTD_ARGS="--listen"
</pre>

Make sure the following lines are present in /etc/libvirt/qemu.conf:
<pre>
security_driver="none"
user="root"
group="root"
</pre>

Now run:
<pre>
systemctl mask libvirtd.socket
systemctl mask libvirtd-ro.socket
systemctl mask libvirtd-admin.socket
systemctl restart libvirtd
</pre>

== Management server setup (cont'd) ==
Now start the cloudstack-management systemd service and visit the web UI (https://cloud.csclub.uwaterloo.ca). The login credentials are 'admin' for both the username and password. Start the setup walkthrough (you will be prompted to change the password). Make sure to choose Basic Networking.

The walkthrough is almost certainly going to fail (at least, it did for me). Don't panic when this happens; just abort the walkthrough, and set up everything else manually. Once primary and secondary storage have been setup, and at least one host has been added, enable the Pod, Cluster and Zone (there should only be one of each).

=== Primary Storage ===
* Type: RBD
* IP address: ceph-mon.cloud.csclub.uwaterloo.ca
* Scope: zone
* Get the credentials which you created in [[Ceph#CloudStack_Primary_Storage]]

=== Secondary Storage ===
* Type: NFS
* Host: ceph-nfs.cloud.csclub.uwaterloo.ca:2049
* Path: /cloudstack-secondary

=== Global settings ===
Some global settings which you'll need to set from the web UI:

* ca.plugin.root.auth.strictness: false (this always caused issues for me, so I just disabled it)
* host: 172.19.168.23,172.19.168.24 (the VLAN 529 addresses of biloba and chamomile)

=== Adding a host ===
This is an extremely painful process which I am almost certainly doing wrong. It usually takes me 7-8 attempts to add a single host (that's not an exaggeration). This is what it looks like:

* Stop cloudstack-agent service
* Configure /etc/cloudstack-agent/agent.properties
* Add a host from the CloudStack UI
* Start cloudstack-agent.service

The reason why this takes several attempts is because cloudstack-agent actually overwrites your agent.properties file. If/when you notice that this happens, restart the whole process again.

=== Accessing the System VMs ===
If you need to SSH into one of the System VMs, get its link-local address from the web UI, and run e.g.
<pre>
ssh -i /var/lib/cloudstack/management/.ssh/id_rsa -p 3922 root@169.254.232.179
</pre>

=== Some more global settings ===
<pre>
allow.user.expunge.recover.vm = true
allow.user.view.destroyed.vm = true
expunge.delay = 1
expunge.interval = 1
network.securitygroups.defaultadding = false
allow.public.user.templates = false
vm.network.throttling.rate = 0
network.throttling.rate = 0
cpu.overprovisioning.factor = 4.0
allow.user.create.projects = false
max.project.cpus = 8
max.project.memory = 8192
max.project.primary.storage = 40
max.projet.secondary.storage = 20
max.account.cpus = 8
max.account.memory = 8192
max.account.primary.storage = 40
max.account.secondary.storage = 20
</pre>

NOTE: the <code>cpu.overprovisioning.factor</code> setting also needs to be set for existing clusters. Go to Infrastructure -> Clusters -> Cluster1 -> Settings and set it accordingly.

=== Firewall ===
Since we disabled certificate validation from the clients, we're going to use some iptables-fu on all of the CloudStack hosts (to make our lives easier, we're going to use the same rules on the management and agent servers):
<pre>
iptables -N CLOUDSTACK-SERVICES
iptables -A INPUT -j CLOUDSTACK-SERVICES
iptables -A CLOUDSTACK-SERVICES -i lo -j RETURN
iptables -A CLOUDSTACK-SERVICES -s 172.19.168.0/27 -j RETURN
iptables -A CLOUDSTACK-SERVICES -p tcp -m multiport --dports 16509,16514,45335,41047,8250 -j REJECT
iptables-save > /etc/iptables/rules.v4

ip6tables -N CLOUDSTACK-SERVICES
ip6tables -A INPUT -j CLOUDSTACK-SERVICES
ip6tables -A CLOUDSTACK-SERVICES -i lo -j RETURN
ip6tables -A CLOUDSTACK-SERVICES -s fd74:6b6a:8eca:4902::/64 -j RETURN
ip6tables -A CLOUDSTACK-SERVICES -p tcp -m multiport --dports 16509,16514,45335,41047,8250 -j REJECT
ip6tables-save > /etc/iptables/rules.v6
</pre>

=== LDAP authentication ===
Go to Global Settings in the UI, type 'ldap' in the search bar, and configure the parameters as needed. Make sure the mail attribute is set to 'mailLocalAddress'.

Create a new domain called 'Members'. Then go to 'LDAP Configuration', click the 'Configure LDAP +' button, and add a new LDAP config linked to the domain you just created.

[[ceo]] handles the creation of CloudStack accounts, so create an API key + secret token and add it to /etc/csc/ceod.ini on biloba.

=== Templates ===
This deserves an entire page of its own - see [[CloudStack Templates]].

=== Kubernetes ===
This deserves an entire page of its own - see [[Kubernetes]].

== Upgrading CloudStack ==
Please be extremely careful if you decide to upgrade CloudStack. The last time I tried to perform an upgrade (from 4.15 to 4.16), the agents refused to connect to the management servers (or maybe it was the other way around?), and I ended up having to wipe the entire CloudStack installation clean and start again from scratch. Therefore it is fair to say that nobody has ever managed to successfully upgrade CloudStack on our machines. Do this at your own risk.

If you decide to perform an upgrade, then at the very least, you will need to backup the MariaDB databases ('cloud' and 'cloud_usage'), as well as the /etc/cloudstack and /var/lib/cloudstack folders on each of biloba, chamomile and ginkgo. Also, good luck.

Systemd

2024-09-13T01:32:24Z

Merenber: /* Services */

This page contains some tips and tricks for writing systemd units on CSC machines.

== Services ==
<ul>
<li>If your service should not be restarted by systemd automatically (e.g. because it has its own retry mechanism), set <code>Restart=no</code></li>
<li>If your service should be restarted by systemd automatically, make sure you set <code>RestartSec</code> to a reasonable value so that it does not restart too quickly</li>
<li>If your service does not need to keep any persistent state on disk, consider using <code>DynamicUser=yes</code></li>
<li>If you are running your service as root just so you can read a secret from a file, consider using <code>DynamicUser=yes</code> with <code>LoadCredential</code></li>
<li>Consider using ProtectSystem, ProtectHome, etc. See https://manpages.debian.org/stable/systemd/systemd.exec.5.en.html for details.</li>
<li>If your service needs to accept network connections (i.e. is a server), use <code>After=network.target</code></li>
<li>If your service needs to create network connections (i.e. is a client), use <code>After=network-online.target</code></li>
<li>If your service needs to lookup LDAP users, use <code>After=nslcd.service sssd.service</code></li>
<li>if your service needs to access a folder on a networked filesystem, use <code>RequiresMountsFor</code></li>
</ul>

== Timers ==
Unlike cron, systemd timers do not send email alerts if the job fails. However, you can create your own alerts using <code>OnFailure=</code>. Paste the following into /usr/local/bin/csc-systemd-email and make it executable:
<pre>
#!/bin/bash

# Adapted from https://wiki.archlinux.org/title/systemd/Timers#MAILTO

set -e

if [[ $# -ne 2 ]]; then
echo "Usage: $0 <address> <unit>" >&2
exit 1
fi

FROM="Systemd <root@$HOSTNAME>"
TO="$1"
if ! [[ $TO =~ @ ]]; then
TO="$TO@csclub.uwaterloo.ca"
fi
SUBJECT="Systemd <root@$HOSTNAME> Unit '$2' failed"
MESSAGE="$(systemctl status --full "$2" || true)"

# Don't use the Postfix sendmail. It creates a new spool file and also
# forks to the background, which we don't want.
if [[ -x /usr/sbin/ssmtp ]]; then
/usr/sbin/ssmtp -t <<EOF
To: $TO
From: $FROM
Subject: $SUBJECT
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=UTF-8

$MESSAGE
EOF
elif [[ -x /usr/bin/mutt ]]; then
EMAIL="$FROM" /usr/bin/mutt -F /dev/null -e "set copy=no" -s "$SUBJECT" -- "$TO" <<< "$MESSAGE"
else
echo "Could not find program to email" >&2
exit 1
fi
</pre>

Next, paste the following into /etc/systemd/system/csc-email-on-failure@.service:
<pre>
[Unit]
Description=Send email alert when %i fails

[Service]
Type=oneshot
ExecStart=/usr/local/bin/csc-systemd-email root@csclub.uwaterloo.ca %i
# Do not use DynamicUser=true until this issue gets fixed:
# https://github.com/systemd/systemd/issues/22737
User=nobody
# Need to be in the adm group to read journald logs
Group=adm
</pre>

Then run <code>systemctl daemon-reload</code>. Now, all you need to do is add the following line to the <code>[Unit]</code> of any service for which you would like to receive email alerts:
<pre>
OnFailure=csc-email-on-failure@%n.service
</pre>

IMPORTANT: make sure you have the following setting in /etc/ssmtp/ssmtp.conf:
<pre>
FromLineOverride=NO
</pre>
Otherwise, Mailman 3 will reject the message because the Envelope From does not have a FQDN.

Systemd

2024-09-13T01:31:53Z

Merenber: /* Restart policy */

This page contains some tips and tricks for writing systemd units on CSC machines.

== Services ==
<ul>
<li>If your service should not be restarted by systemd automatically (e.g. because it has its own retry mechanism), set <code>Restart=no</code></li>
<li>If your service should be restarted by systemd automatically, make sure you set <code>RestartSec</code> to a reasonable value so that it does not restart too quickly</li>
<li>If your service does not need to keep any persistent state on disk, consider using <code>DynamicUser=yes</code> with <code>LoadCredential</code></li>
<li>If you are running your service as root just so you can read a secret from a file, consider using <code>DynamicUser=yes</code> with </li>
<li>Consider using ProtectSystem, ProtectHome, etc. See https://manpages.debian.org/stable/systemd/systemd.exec.5.en.html for details.</li>
<li>If your service needs to accept network connections (i.e. is a server), use <code>After=network.target</code></li>
<li>If your service needs to create network connections (i.e. is a client), use <code>After=network-online.target</code></li>
<li>If your service needs to lookup LDAP users, use <code>After=nslcd.service sssd.service</code></li>
<li>if your service needs to access a folder on a networked filesystem, use <code>RequiresMountsFor</code></li>
</ul>

== Timers ==
Unlike cron, systemd timers do not send email alerts if the job fails. However, you can create your own alerts using <code>OnFailure=</code>. Paste the following into /usr/local/bin/csc-systemd-email and make it executable:
<pre>
#!/bin/bash

# Adapted from https://wiki.archlinux.org/title/systemd/Timers#MAILTO

set -e

if [[ $# -ne 2 ]]; then
echo "Usage: $0 <address> <unit>" >&2
exit 1
fi

FROM="Systemd <root@$HOSTNAME>"
TO="$1"
if ! [[ $TO =~ @ ]]; then
TO="$TO@csclub.uwaterloo.ca"
fi
SUBJECT="Systemd <root@$HOSTNAME> Unit '$2' failed"
MESSAGE="$(systemctl status --full "$2" || true)"

# Don't use the Postfix sendmail. It creates a new spool file and also
# forks to the background, which we don't want.
if [[ -x /usr/sbin/ssmtp ]]; then
/usr/sbin/ssmtp -t <<EOF
To: $TO
From: $FROM
Subject: $SUBJECT
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=UTF-8

$MESSAGE
EOF
elif [[ -x /usr/bin/mutt ]]; then
EMAIL="$FROM" /usr/bin/mutt -F /dev/null -e "set copy=no" -s "$SUBJECT" -- "$TO" <<< "$MESSAGE"
else
echo "Could not find program to email" >&2
exit 1
fi
</pre>

Next, paste the following into /etc/systemd/system/csc-email-on-failure@.service:
<pre>
[Unit]
Description=Send email alert when %i fails

[Service]
Type=oneshot
ExecStart=/usr/local/bin/csc-systemd-email root@csclub.uwaterloo.ca %i
# Do not use DynamicUser=true until this issue gets fixed:
# https://github.com/systemd/systemd/issues/22737
User=nobody
# Need to be in the adm group to read journald logs
Group=adm
</pre>

Then run <code>systemctl daemon-reload</code>. Now, all you need to do is add the following line to the <code>[Unit]</code> of any service for which you would like to receive email alerts:
<pre>
OnFailure=csc-email-on-failure@%n.service
</pre>

IMPORTANT: make sure you have the following setting in /etc/ssmtp/ssmtp.conf:
<pre>
FromLineOverride=NO
</pre>
Otherwise, Mailman 3 will reject the message because the Envelope From does not have a FQDN.

Systemd

2024-09-13T01:31:36Z

Merenber: /* Services */

This page contains some tips and tricks for writing systemd units on CSC machines.

== Services ==
=== Restart policy ===
<ul>
<li>If your service should not be restarted by systemd automatically (e.g. because it has its own retry mechanism), set <code>Restart=no</code></li>
<li>If your service should be restarted by systemd automatically, make sure you set <code>RestartSec</code> to a reasonable value so that it does not restart too quickly</li>
<li>If your service does not need to keep any persistent state on disk, consider using <code>DynamicUser=yes</code> with <code>LoadCredential</code></li>
<li>If you are running your service as root just so you can read a secret from a file, consider using <code>DynamicUser=yes</code> with </li>
<li>Consider using ProtectSystem, ProtectHome, etc. See https://manpages.debian.org/stable/systemd/systemd.exec.5.en.html for details.</li>
<li>If your service needs to accept network connections (i.e. is a server), use <code>After=network.target</code></li>
<li>If your service needs to create network connections (i.e. is a client), use <code>After=network-online.target</code></li>
<li>If your service needs to lookup LDAP users, use <code>After=nslcd.service sssd.service</code></li>
<li>if your service needs to access a folder on a networked filesystem, use <code>RequiresMountsFor</code></li>
</ul>

== Timers ==
Unlike cron, systemd timers do not send email alerts if the job fails. However, you can create your own alerts using <code>OnFailure=</code>. Paste the following into /usr/local/bin/csc-systemd-email and make it executable:
<pre>
#!/bin/bash

# Adapted from https://wiki.archlinux.org/title/systemd/Timers#MAILTO

set -e

if [[ $# -ne 2 ]]; then
echo "Usage: $0 <address> <unit>" >&2
exit 1
fi

FROM="Systemd <root@$HOSTNAME>"
TO="$1"
if ! [[ $TO =~ @ ]]; then
TO="$TO@csclub.uwaterloo.ca"
fi
SUBJECT="Systemd <root@$HOSTNAME> Unit '$2' failed"
MESSAGE="$(systemctl status --full "$2" || true)"

# Don't use the Postfix sendmail. It creates a new spool file and also
# forks to the background, which we don't want.
if [[ -x /usr/sbin/ssmtp ]]; then
/usr/sbin/ssmtp -t <<EOF
To: $TO
From: $FROM
Subject: $SUBJECT
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=UTF-8

$MESSAGE
EOF
elif [[ -x /usr/bin/mutt ]]; then
EMAIL="$FROM" /usr/bin/mutt -F /dev/null -e "set copy=no" -s "$SUBJECT" -- "$TO" <<< "$MESSAGE"
else
echo "Could not find program to email" >&2
exit 1
fi
</pre>

Next, paste the following into /etc/systemd/system/csc-email-on-failure@.service:
<pre>
[Unit]
Description=Send email alert when %i fails

[Service]
Type=oneshot
ExecStart=/usr/local/bin/csc-systemd-email root@csclub.uwaterloo.ca %i
# Do not use DynamicUser=true until this issue gets fixed:
# https://github.com/systemd/systemd/issues/22737
User=nobody
# Need to be in the adm group to read journald logs
Group=adm
</pre>

Then run <code>systemctl daemon-reload</code>. Now, all you need to do is add the following line to the <code>[Unit]</code> of any service for which you would like to receive email alerts:
<pre>
OnFailure=csc-email-on-failure@%n.service
</pre>

IMPORTANT: make sure you have the following setting in /etc/ssmtp/ssmtp.conf:
<pre>
FromLineOverride=NO
</pre>
Otherwise, Mailman 3 will reject the message because the Envelope From does not have a FQDN.

Main Page

2024-09-13T01:15:41Z

Merenber: /* Software Infrastructure */

This is the Wiki of the [[Computer Science Club]]. Feel free to start adding pages and information.

[[Special:AllPages]]

== Member/Club Rep Documentation ==
To access our Linux machines, see [[How to SSH]] and select one of the general-use machines from [[Machine List#General-Use Servers]].

To host a website, see [[Web Hosting]]. If you are trying to host websites for clubs, see [[Club Hosting]].

To use our VPS services (similar to Linode and Amazon EC2), see [https://docs.cloud.csclub.uwaterloo.ca/ CSC Cloud Documentation]. Note that you'll need to activate your account on one of CSC's machines before using the management panel.

To view instruction on playing music at the office, see [[Music]].

To use our Nextcloud instance (similar to Google Drive and Dropbox), go to [https://files.csclub.uwaterloo.ca CSC Files].

=== Guides ===
<div style="-webkit-column-count:3; -moz-column-count:3; column-count:3;">
* [[New Member Guide]]
* [[Club Hosting]]
* [[Web Hosting]]
* [[Git Hosting]]
* [[How to IRC]]
* [[How to SSH]]
* [[MySQL]]
* [[PostgreSQL]]
* [https://docs.cloud.csclub.uwaterloo.ca/ CSC Cloud Documentation]
</div>

=== News and Events ===
<div style="-webkit-column-count:3; -moz-column-count:3; column-count:3;">
* [[Meetings]]
* [[Talks]]
* [[Projects]]
</div>

== Committees Documentation ==
=== Club Operation ===
<div style="-webkit-column-count:3; -moz-column-count:3; column-count:3;">
* [[Budget Guide]]
* [[ceo]]
* [[Exec Manual]]
* [[MEF Guide]]
* [[Office Policies]]
* [[Office Staff]]
* [[Sysadmin Guide]]
* [[How to (Extra) Ban Someone]]
* [[SCS Guide]]
* [[Kerberos |Password Reset]]
* [[Keys and Fobs]]

* [[Talks Guide]]
</div>

=== Hardware Infrastructure (the bare metals) ===
<div style="-webkit-column-count:3; -moz-column-count:3; column-count:3;">
* [[Disk Drive RMA Process]]
* [[Machine List]]
* [[IPMI101]]
* [[New NetApp]]
* [[Switches]]
</div>

=== Software Infrastructure ===
<div style="-webkit-column-count:3; -moz-column-count:3; column-count:3;">
* [[ADFS]]
* [[Backups]]
* [[DNS]]
* [[Debian Repository]]
* [[Firewall]]
* [[Kerberos]]
* [[Keycloak]]
* [[KVM]]
* [[LDAP]]
* [[Network]]
* [[New CSC Machine]]
* [[Observability]]
* [[OID Assignment]]
* [[Podman]]
* [[Scratch]]
* [[SNMP]]
* [[SSL]]
* [[Syscom Todo]]
* [[Systemd]]
* [[Systemd-nspawn]]
* [[Two-Factor Authentication]]
* [[UID/GID Assignment]]
</div>

=== Services ===
<div style="-webkit-column-count:3; -moz-column-count:3; column-count:3;">
* [[Application List]]
* [[BigBlueButton]]
* [[Mail]]
* [[Mailing Lists]]
* [[Mirror]]
* [[Music]]
* [[Nextcloud]]
* [[Printing]]
* [[Pulseaudio]]
* [[Webmail]]
</div>

=== CSC Cloud ===
<div style="-webkit-column-count:3; -moz-column-count:3; column-count:3;">
* [[Ceph]]
* [[Cloud Networking]]
* [[CloudStack]]
* [[CloudStack Templates]]
* [[Kubernetes]]
</div>

== Miscellaneous ==
<div style="-webkit-column-count:3; -moz-column-count:3; column-count:3;">
* [[Acronyms]]
* [[Budget]]
* [[Executive]]
* [[Past Executive]]
* [[History]]
</div>

== Historical ==
<div style="-webkit-column-count:3; -moz-column-count:3; column-count:3;">
* [[Robot Arm]]
* [[Webcams]]
* [[Website]]
* [[Digital Cutter]]
* [[Electronics]]
* [[NetApp]]
* [[Frosh]]
* [[Virtualization (LXC Containers)]]
* [[Serial Connections]]
* [[Library]]
* [[MEF Proposals]]
* [[Proposed Constitution Changes]]
* [[NFS/Kerberos]]
* [[Hardware]]
* [[Imapd Guide]]
__NOTOC__

Systemd

2024-09-13T01:15:10Z

Merenber: Add instructions for systemd timer email alerts

This page contains some tips and tricks for writing systemd units on CSC machines.

== Services ==
TODO

== Timers ==
Unlike cron, systemd timers do not send email alerts if the job fails. However, you can create your own alerts using <code>OnFailure=</code>. Paste the following into /usr/local/bin/csc-systemd-email and make it executable:
<pre>
#!/bin/bash

# Adapted from https://wiki.archlinux.org/title/systemd/Timers#MAILTO

set -e

if [[ $# -ne 2 ]]; then
echo "Usage: $0 <address> <unit>" >&2
exit 1
fi

FROM="Systemd <root@$HOSTNAME>"
TO="$1"
if ! [[ $TO =~ @ ]]; then
TO="$TO@csclub.uwaterloo.ca"
fi
SUBJECT="Systemd <root@$HOSTNAME> Unit '$2' failed"
MESSAGE="$(systemctl status --full "$2" || true)"

# Don't use the Postfix sendmail. It creates a new spool file and also
# forks to the background, which we don't want.
if [[ -x /usr/sbin/ssmtp ]]; then
/usr/sbin/ssmtp -t <<EOF
To: $TO
From: $FROM
Subject: $SUBJECT
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=UTF-8

$MESSAGE
EOF
elif [[ -x /usr/bin/mutt ]]; then
EMAIL="$FROM" /usr/bin/mutt -F /dev/null -e "set copy=no" -s "$SUBJECT" -- "$TO" <<< "$MESSAGE"
else
echo "Could not find program to email" >&2
exit 1
fi
</pre>

Next, paste the following into /etc/systemd/system/csc-email-on-failure@.service:
<pre>
[Unit]
Description=Send email alert when %i fails

[Service]
Type=oneshot
ExecStart=/usr/local/bin/csc-systemd-email root@csclub.uwaterloo.ca %i
# Do not use DynamicUser=true until this issue gets fixed:
# https://github.com/systemd/systemd/issues/22737
User=nobody
# Need to be in the adm group to read journald logs
Group=adm
</pre>

Then run <code>systemctl daemon-reload</code>. Now, all you need to do is add the following line to the <code>[Unit]</code> of any service for which you would like to receive email alerts:
<pre>
OnFailure=csc-email-on-failure@%n.service
</pre>

IMPORTANT: make sure you have the following setting in /etc/ssmtp/ssmtp.conf:
<pre>
FromLineOverride=NO
</pre>
Otherwise, Mailman 3 will reject the message because the Envelope From does not have a FQDN.

SSL

2024-09-11T13:01:13Z

Merenber: /* How to add a new SSL cert for a custom domain on CSC cloud */

== GlobalSign ==

The CSC currently has an SSL Certificate from GlobalSign for *.csclub.uwaterloo.ca provided at no cost to us through IST. GlobalSign likes to take a long time to respond to certificate signing requests (CSR) for wildcard certs, so our CSR really needs to be handed off to IST at least 2 weeks in advance. You can do it sooner – the certificate expiry date will be the old expiry date + 1 year (+ a bonus ) Having an invalid cert for any length of time leads to terrible breakage, followed by terrible workarounds and prolonged problems.

When the certificate is due to expire in a month or two, syscom should (but apparently doesn't always) get an email notification. This will include a renewal link. Otherwise, use the [https://uwaterloo.ca/information-systems-technology/about/organizational-structure/information-security-services/certificate-authority/globalsign-signed-x5093-certificates/self-service-globalsign-ssl-certificates IST-CA self service system]. Please keep a copy of the key, CSR and (once issued) certificate in <tt>/home/sysadmin/certs</tt>. The OpenSSL examples linked there are good to generate a 2048-bit RSA key and a corresponding CSR. It's probably a good idea to change the private key (as it's not that much effort anyways). Just sure your CSR is for <tt>*.csclub.uwaterloo.ca</tt>.

At the self-service portal, these options worked in 2013. If you need IST assistance, [mailto:ist-ca@uwaterloo.ca ist-ca@uwaterloo.ca] is the email address you should contact.
Products: OrganizationSSL
SSL Certificate Type: Wildcard SSL Certificate
Validity Period: 1 year
Are you switching from a Competitor? No, I am not switching
Are you renewing this Certificate? Yes (paste current certificate)
30-day bonus: Yes (why not?)
Add specific Subject Alternative Names (SANs): No (*.csclub.uwaterloo.ca automatically adds csclub.uwaterloo.ca as a SAN)
Enter Certificate Signing Request (CSR): Yes (paste CSR)
Contact Information:
First Name: Computer Science Club
Last Name: Systems Committee
Telephone: +1 519 888 4567 x33870
Email Address: syscom@csclub.uwaterloo.ca

=== Helpful links ===
* [https://support.globalsign.com/ssl/ssl-certificates-installation/generate-csr-openssl How to generate a new CSR and private key]
* [https://uwaterloo.atlassian.net/wiki/spaces/ISTKB/pages/262013183/How+to+obtain+a+new+GlobalSign+certificate+or+renew+an+existing+one How to obtain a new GlobalSign certificate or renew an existing one]
* [https://system.globalsign.com/bm/public/certificate/poporder.do?domain=PAR12271n5w6s27pvg8d92v4150t GlobalSign UWaterloo self-service page]
* [https://support.globalsign.com/ca-certificates/intermediate-certificates/organizationssl-intermediate-certificates GlobalSign intermediate certificate] (needed to create a certificate chain; see below)

=== OpenSSL cheat sheet ===
<ul>
<li>
Generate a new CSR and private key (do this in a new directory):
<pre>
openssl req -out csclub.uwaterloo.ca.csr -new -newkey rsa:2048 -keyout csclub.uwaterloo.ca.key -nodes
</pre>
Enter the following information at the prompts:
<pre>
Country Name (2 letter code) [AU]:CA
State or Province Name (full name) [Some-State]:Ontario
Locality Name (eg, city) []:Waterloo
Organization Name (eg, company) [Internet Widgits Pty Ltd]:University of Waterloo
Organizational Unit Name (eg, section) []:Computer Science Club
Common Name (e.g. server FQDN or YOUR name) []:*.csclub.uwaterloo.ca
Email Address []:systems-committee@csclub.uwaterloo.ca

Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:
An optional company name []:
</pre>
</li>
<li>
View the information inside a CSR:
<pre>
openssl req -noout -text -in csclub.uwaterloo.ca.csr
</pre>
</li>
<li>
View the information inside a private key:
<pre>
openssl pkey -noout -text -in csclub.uwaterloo.ca.key
</pre>
</li>
<li>
View information inside a certificate:
<pre>
openssl x509 -noout -text -in csclub.uwaterloo.ca.crt
</pre>
</li>
</ul>

=== csclub.cloud ===
Once a year, someone from IST will ask us to create a temporary TXT record for csclub.cloud to prove to GlobalSign that we own it. This must be created at the root of the domain. Since this zone is managed dynamically (via the acme.sh script on biloba, see below), we need to freeze the domain and update /var/lib/bind/db.csclub.cloud directly.

Once you're in the correct server (not Biloba). Here are the steps:
<ol>
<li>Run <code>rndc freeze csclub.cloud</code>.</li>
<li>
Open /var/lib/bind/db.csclub.cloud and add a new TXT record. It'll look something like
<pre>
TXT "_globalsign-domain-verification=blablabla"
</pre>
</li>
<li>
In the same file, make sure to also update the SOA serial number. It should generally be YYYYMMDDNN where NN is a monotonically increasing counter (YYYYMMDD is the current date).
</li>
<li>Run <code>rndc reload</code>.</li>
<li>
Run a DNS query to make sure you can see the TXT record:
<pre>
dig -t txt @dns1 csclub.cloud
dig -t txt @dns2 csclub.cloud
</pre>
</li>
<li>Email back the person from IST and let them know that we created the TXT record.</li>
<li>
Once the certificate has been renewed, delete the TXT record, update the SOA serial number, and run <code>rndc reload</code>.
</li>
<li>Run <code>rndc thaw csclub.cloud</code>.</li>
</ol>

== Certificate Files ==
Let's say you obtain a new certificate for *.csclub.uwaterloo.ca. Here are the files which should be stored in the certs folder:
<ul>
<li>csclub.uwaterloo.ca.key: private key created by openssl</li>
<li>csclub.uwaterloo.ca.csr: certificate signing request created by openssl</li>
<li>order: order number from GlobalSign</li>
<li>csclub.uwaterloo.ca.crt: certificate created by GlobalSign</li>
<li>globalsign-intermediate.crt: intermediate certificate from GlobalSign, obtainable from [https://support.globalsign.com/ca-certificates/intermediate-certificates/organizationssl-intermediate-certificates here]. As of this writing, we use the "OrganizationSSL SHA-256 R3 Intermediate Certificate". Just click the "View in Base64" button and copy the contents.
<ul>
<li>There is an alternative way to get the intermediate certificate: if you run <code>openssl x509 -noout -text -in csclub.uwaterloo.ca.crt</code>, under X509v3 extensions > Authority Information Access, there should be a field called "CA Issuers" which has a URL which looks like http://secure.globalsign.com/cacert/gsrsaovsslca2018.crt. You can download that file and convert it to PEM:
<pre>
wget https://secure.globalsign.com/cacert/gsrsaovsslca2018.crt
openssl x509 -inform der -in gsrsaovsslca2018.crt -out globalsign-intermediate.crt
rm gsrsaovsslca2018.crt
</pre>
</li>
</ul>
</li>
</ul><ul>
<li>csclub.uwaterloo.ca.chain: create this with the following command:
<pre>
cat csclub.uwaterloo.ca.crt globalsign-intermediate.crt > csclub.uwaterloo.ca.chain
</pre>
</li>
<li>csclub.uwaterloo.ca.pem: create this with the following command:
<pre>
cat csclub.uwaterloo.ca.key csclub.uwaterloo.ca.chain > csclub.uwaterloo.ca.pem
chmod 600 csclub.uwaterloo.ca.pem
</pre>
</li>
</ul>

== Certificate Locations ==

Keep a copy of newly generated certificates in /users/sysadmin/certs.

A list of places you'll need to put the new certificate to keep our services running. Private key (if applicable) should be kept next to the certificate with the extension .key.

* caffeine:/etc/ssl/private/csclub-wildcard.crt (for Apache)
* coffee:/etc/ssl/private/csclub.uwaterloo.ca (for PostgreSQL and MariaDB)
* <s>mail:/etc/ssl/private/csclub-wildcard.crt (for Apache, Postfix and Dovecot)</s> (UPDATE: we use certbot now for these)
* mailman:/etc/ssl/private/csclub-wildcard-chain.crt (for Apache)
* rt:/etc/ssl/private/csclub-wildcard.crt (for Apache)
* potassium-benzoate:/etc/ssl/private/csclub-wildcard.crt (for nginx)
* phosphoric-acid:/etc/ssl/private/csclub-wildcard-chain.crt (for ceod)
* auth1:/etc/ssl/private/csclub-wildcard.crt (for slapd, make sure to <code>sudo service slapd restart</code>)
* auth2:/etc/ssl/private/csclub-wildcard.crt (for slapd, make sure to <code>sudo service slapd restart</code>)
* mattermost:/etc/ssl/private/csclub-wildcard.crt (for nginx)
* load-balancer-0(1|2):/etc/ssl/private/csclub.uwaterloo.ca (for haproxy) [temporarily down 2020]
* chat:/etc/ssl/private/csclub-wildcard-chain.crt (for nginx)
* prometheus:/etc/ssl/private/csclub-wildcard-chain.crt (for Apache)
* bigbluebutton:/etc/nginx/ssl/csclub-wildcard-chain.crt (podman container on xylitol)
* icy:/etc/ssl/private/csclub-wildcard.pem (for Icecast)
* chamomile:/etc/ssl/private/cloud.csclub.uwaterloo.ca.chain.crt, /etc/ssl/private/csclub.cloud.chain, /etc/ssl/private/csclub.uwaterloo.ca.chain (for nginx)
* biloba:/etc/ssl/private/cloud.csclub.uwaterloo.ca.chain.crt, /etc/ssl/private/csclub.cloud.chain, /etc/ssl/private/csclub.uwaterloo.ca.chain (for nginx)
* nextcloud (nspawn container inside guayusa): /etc/ssl/private/csclub.uwaterloo.ca.chain (for nginx)
* citric-acid (runs vaultwarden): /etc/ssl/private/csclub.uwaterloo.ca.{chain,key} (for nginx)

Some services (e.g. Dovecot, Postfix) prefer to have the certificate chain in one file. Concatenate the appropriate intermediate root to the end of the certificate and store this as csclub-wildcard-chain.crt.

=== More certificate locations ===
We have some SSL certificates which are not used by web servers, but still need to be renewed eventually.

==== Prometheus node exporter ====
All of our Prometheus node exporters are using mTLS via stunnel (every bare-metal host, as well as caffeine, coffee and mail, is running this exporter). The certificates (both client and server) are set to expire in September 2031; before then, create new keypairs in /opt/prometheus/tls, and deploy the new server.crt, node.crt and node.key to /etc/stunnel/tls on all machines. Restart prometheus and all of the node exporters.

==== ADFS ====
See [[ADFS]]. When the university's IdP certificate expires (October 2025), we can just download a new one and restart Apache; when our own certificate expires (July 2031), we need to submit a new form to IST (please do this before the cert expires).

==== Keycloak ====
See [[Keycloak]]. When the saml-passthrough certificate expires (January 2032), you need to create a new keypair in /srv/saml-passthrough on caffeine, and upload the new certificate into the Keycloak UI (IdP settings). When the Keycloak SP certificate expires (December 2031), make sure to create a new keypair and upload it to the Keycloak UI (Realm Settings).

== letsencrypt ==

We support letsencrypt for our virtual hosts with custom domains. We use the <tt>cerbot</tt> from debian repositories with a configuration file at <tt>/etc/letsencrypt/cli.ini</tt>, and a systemd timer to handle renewals.

The setup for a new domain is:

# Become <tt>certbot</tt> on caffine with <tt>sudo -u certbot bash</tt> or similar.
# Run <tt>certbot certonly -c /etc/letsencrypt/cli.ini -d DOMAIN --logs-dir /tmp</tt>. The logs-dir isn't important and is only needed for troubleshooting.
# Set up the Apache site configuration using the example below. (apache config is in /etc/apache2) Note the permanent redirect to https.
# Make sure to commit your changes when you're done.
# Reloading apache config is <tt>sudo systemctl reload apache2</tt>.

<VirtualHost *:80>
ServerName example.com
ServerAlias *.example.com
ServerAdmin example@csclub.uwaterloo.ca

#DocumentRoot /users/example/www/
Redirect permanent / https://example.com/

ErrorLog /var/log/apache2/example-error.log
CustomLog /var/log/apache2/example-access.log combined
</VirtualHost>

<VirtualHost csclub:443>
SSLEngine on
SSLCertificateFile /etc/letsencrypt/live/example.com/fullchain.pem
SSLCertificateKeyFile /etc/letsencrypt/live/example.com/privkey.pem
SSLStrictSNIVHostCheck on

ServerName example.com
ServerAlias *.example.com
ServerAdmin example@csclub.uwaterloo.ca

DocumentRoot /users/example/www

ErrorLog /var/log/apache2/example-error.log
CustomLog /var/log/apache2/example-access.log combined
</VirtualHost>

== acme.sh ==
We are using [https://github.com/acmesh-official/acme.sh acme.sh] for provisioning SSL certificates for some of our *.csclub.cloud domains. It is currently set up under /root/.acme.sh on biloba.

NOTE: acme.sh has a cron job which automatically renews certificates before they expire and reloads NGINX, so you do not have to do anything after issuing and installing a certificate (i.e. "set-and-forget").

=== How to add a new SSL cert for a custom domain on CSC cloud ===
Note: you do not need to acquire a new cert if the requested domain is directly on csclub.cloud, e.g. app1.csclub.cloud. We can re-use our wildcard cert on csclub.cloud for that. However, if a user requests a multi-level domain on csclub.cloud, or a domain hosted on an external registrar, then you will need to create a new cert.

Let's say user <code>ctdalek</code> wants <code>mydomain.com</code> to point to a VM on CSC cloud.
 
TLDR:
<pre>
# Obtain the cert.
# If a subdomain was also requested, pass the -d option multiple times, e.g.
# `-d mydomain.com -d sub.mydomain.com`. Make sure the "main" domain is specified first.
acme.sh --issue -d mydomain.com -w /var/www

# Install the cert.
# If a subdomain was also requested, only specify the "main" domain.
acme.sh --install-cert -d mydomain.com \
--key-file /etc/nginx/ceod/member-ssl/mydomain.com.key \
--fullchain-file /etc/nginx/ceod/member-ssl/mydomain.com.chain \
--reloadcmd "/root/bin/reload-nginx.sh"

# Create a vhost file.
# Look at the other files in the same directory for inspiration.
# Make sure the file starts with the username and an underscore, e.g. "ctdalek_",
# because this is how ceod keeps track of the vhosts.
# Make sure to set the custom domain name(s) and paths to the SSL key/cert.
vim /etc/nginx/ceod/member-vhosts/ctdalek_mydomain.com

# Finally, reload NGINX on both biloba and chamomile. The /etc/nginx/ceod directory
# is shared between them.
/root/bin/reload-nginx.sh
</pre>

=== Installation ===
<pre>
cd /opt
git clone --depth 1 https://github.com/acmesh-official/acme.sh
cd acme.sh
./acme.sh --install -m syscom@csclub.uwaterloo.ca
. "/root/.acme.sh/acme.sh.env"
</pre>
Important: If invoking acme.sh from another program, it needs the environment variables set in acme.sh.env. Currently, that is just
<pre>
LE_WORKING_DIR="/root/.acme.sh"
</pre>

For testing purposes, make sure to use the Let's Encrypt test server:
<pre>
acme.sh --set-default-ca --server letsencrypt_test
</pre>

=== NGINX setup ===
<pre>
mkdir -p /var/www/.well-known/acme-challenge
</pre>
Add the following snippet to your default NGINX file (e.g. /etc/nginx/sites-enabled/default):
<pre>
# For Let's Encrypt
location /.well-known/acme-challenge/ {
alias /var/www/.well-known/acme-challenge/;
}
</pre>

Now assuming that biloba has the IP address for *.csclub.cloud, you can test that everything is working:
<pre>
acme.sh --issue -d app.merenber.csclub.cloud -w /var/www
</pre>
To install a certificate after it's been issued:
<pre>
acme.sh --install-cert -d app.merenber.csclub.cloud \
--key-file /etc/nginx/ceod/member-ssl/app.merenber.csclub.cloud.key \
--fullchain-file /etc/nginx/ceod/member-ssl/app.merenber.csclub.cloud.chain \
--reloadcmd "/root/bin/reload-nginx.sh"
</pre>
At this point, you should add your NGINX vhost file which uses that SSL certificate.
 
To remove a certificate:
<pre>
acme.sh --remove -d app.merenber.csclub.cloud
rm -r /root/.acme.sh/app.merenber.csclub.cloud
rm /etc/nginx/ceod/member-ssl/app.merenber.csclub.cloud.chain
rm /etc/nginx/ceod/member-ssl/app.merenber.csclub.cloud.key
</pre>
Don't forget to remove the NGINX vhost file too.

Once you think you're ready, use a real ACME provider, e.g.
<pre>
acme.sh --set-default-ca --server letsencrypt
</pre>

Since we have a [https://zerossl.com ZeroSSL] account, and ZeroSSL has no rate limit, we are going to use that instead:
<pre>
acme.sh --register-account --server zerossl \
--eab-kid xxxxxxxxxxxx \
--eab-hmac-key xxxxxxxxx
acme.sh --set-default-ca --server zerossl
</pre>

=== DNS challenge ===
To obtain a wildcard certificate (e.g. *.k8s.csclub.cloud), you will need to perform the DNS-01 challenge. We are going to use nsupdate to interact with our BIND9 server on dns1.

On dns1, run:
<pre>
tsig-keygen csc-cloud
</pre>
Paste the output into the appropriate section in /etc/bind/named.conf.local. Also paste it into a file somewhere on biloba, e.g. /etc/csc/csc-cloud-tsig.key.

Add the following to the csclub.cloud zone block:
<pre>
allow-update {
!{
!127.0.0.1;
!::1;
!129.97.134.0/24;
!2620:101:f000:4901::/64;
any;
};
key csc-cloud;
};
</pre>
(We're basically trying to restrict updates to the given IP ranges. See https://serverfault.com/a/417229.)

The 'bind' user can't write to files under /etc/bind, so we're going to move our zone file to /var/lib/bind instead.
Comment out 'file "/etc/bind/db.csclub.cloud";' from named.conf.local and add this line below it:
<pre>
file "/var/lib/bind/db.csclub.cloud";
</pre>
Then run:
<pre>
cp /etc/bind/db.csclub.cloud /var/lib/bind/db.csclub.cloud
chown bind:bind /var/lib/bind/db.csclub.cloud
rndc reload
</pre>

On biloba, check that everything's working:
<pre>
nsupdate -k /etc/csc/csc-cloud-tsig.key -v <<EOF
update add test.csclub.cloud 300 A 0.0.0.0
send
EOF
</pre>
Use a tool such as <code>dig</code> to make sure that the update was successful.
If it worked, you can delete the record:
<pre>
nsupdate -k /etc/csc/csc-cloud-tsig.key -v <<EOF
delete test.csclub.cloud
send
EOF
</pre>
Now we are ready to actually perform the challenge with acme.sh:
<pre>
export NSUPDATE_SERVER="dns1.csclub.uwaterloo.ca"
export NSUPDATE_KEY="/etc/csc/csc-cloud-tsig.key"
acme.sh --issue --dns dns_nsupdate -d 'k8s.csclub.cloud' -d '*.k8s.csclub.cloud'
</pre>
(If something goes wrong, use the <code>--debug</code> flag.)

If all went well, just install the certificate as usual:
<pre>
acme.sh --install-cert -d k8s.csclub.cloud \
--key-file /etc/nginx/ceod/syscom-ssl/k8s.csclub.cloud.key \
--fullchain-file /etc/nginx/ceod/syscom-ssl/k8s.csclub.cloud.chain \
--reloadcmd 'systemctl reload nginx'
</pre>

SSL

2024-09-11T13:00:49Z

Merenber: /* How to add a new SSL cert for a custom domain on CSC cloud */

== GlobalSign ==

The CSC currently has an SSL Certificate from GlobalSign for *.csclub.uwaterloo.ca provided at no cost to us through IST. GlobalSign likes to take a long time to respond to certificate signing requests (CSR) for wildcard certs, so our CSR really needs to be handed off to IST at least 2 weeks in advance. You can do it sooner – the certificate expiry date will be the old expiry date + 1 year (+ a bonus ) Having an invalid cert for any length of time leads to terrible breakage, followed by terrible workarounds and prolonged problems.

When the certificate is due to expire in a month or two, syscom should (but apparently doesn't always) get an email notification. This will include a renewal link. Otherwise, use the [https://uwaterloo.ca/information-systems-technology/about/organizational-structure/information-security-services/certificate-authority/globalsign-signed-x5093-certificates/self-service-globalsign-ssl-certificates IST-CA self service system]. Please keep a copy of the key, CSR and (once issued) certificate in <tt>/home/sysadmin/certs</tt>. The OpenSSL examples linked there are good to generate a 2048-bit RSA key and a corresponding CSR. It's probably a good idea to change the private key (as it's not that much effort anyways). Just sure your CSR is for <tt>*.csclub.uwaterloo.ca</tt>.

At the self-service portal, these options worked in 2013. If you need IST assistance, [mailto:ist-ca@uwaterloo.ca ist-ca@uwaterloo.ca] is the email address you should contact.
Products: OrganizationSSL
SSL Certificate Type: Wildcard SSL Certificate
Validity Period: 1 year
Are you switching from a Competitor? No, I am not switching
Are you renewing this Certificate? Yes (paste current certificate)
30-day bonus: Yes (why not?)
Add specific Subject Alternative Names (SANs): No (*.csclub.uwaterloo.ca automatically adds csclub.uwaterloo.ca as a SAN)
Enter Certificate Signing Request (CSR): Yes (paste CSR)
Contact Information:
First Name: Computer Science Club
Last Name: Systems Committee
Telephone: +1 519 888 4567 x33870
Email Address: syscom@csclub.uwaterloo.ca

=== Helpful links ===
* [https://support.globalsign.com/ssl/ssl-certificates-installation/generate-csr-openssl How to generate a new CSR and private key]
* [https://uwaterloo.atlassian.net/wiki/spaces/ISTKB/pages/262013183/How+to+obtain+a+new+GlobalSign+certificate+or+renew+an+existing+one How to obtain a new GlobalSign certificate or renew an existing one]
* [https://system.globalsign.com/bm/public/certificate/poporder.do?domain=PAR12271n5w6s27pvg8d92v4150t GlobalSign UWaterloo self-service page]
* [https://support.globalsign.com/ca-certificates/intermediate-certificates/organizationssl-intermediate-certificates GlobalSign intermediate certificate] (needed to create a certificate chain; see below)

=== OpenSSL cheat sheet ===
<ul>
<li>
Generate a new CSR and private key (do this in a new directory):
<pre>
openssl req -out csclub.uwaterloo.ca.csr -new -newkey rsa:2048 -keyout csclub.uwaterloo.ca.key -nodes
</pre>
Enter the following information at the prompts:
<pre>
Country Name (2 letter code) [AU]:CA
State or Province Name (full name) [Some-State]:Ontario
Locality Name (eg, city) []:Waterloo
Organization Name (eg, company) [Internet Widgits Pty Ltd]:University of Waterloo
Organizational Unit Name (eg, section) []:Computer Science Club
Common Name (e.g. server FQDN or YOUR name) []:*.csclub.uwaterloo.ca
Email Address []:systems-committee@csclub.uwaterloo.ca

Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:
An optional company name []:
</pre>
</li>
<li>
View the information inside a CSR:
<pre>
openssl req -noout -text -in csclub.uwaterloo.ca.csr
</pre>
</li>
<li>
View the information inside a private key:
<pre>
openssl pkey -noout -text -in csclub.uwaterloo.ca.key
</pre>
</li>
<li>
View information inside a certificate:
<pre>
openssl x509 -noout -text -in csclub.uwaterloo.ca.crt
</pre>
</li>
</ul>

=== csclub.cloud ===
Once a year, someone from IST will ask us to create a temporary TXT record for csclub.cloud to prove to GlobalSign that we own it. This must be created at the root of the domain. Since this zone is managed dynamically (via the acme.sh script on biloba, see below), we need to freeze the domain and update /var/lib/bind/db.csclub.cloud directly.

Once you're in the correct server (not Biloba). Here are the steps:
<ol>
<li>Run <code>rndc freeze csclub.cloud</code>.</li>
<li>
Open /var/lib/bind/db.csclub.cloud and add a new TXT record. It'll look something like
<pre>
TXT "_globalsign-domain-verification=blablabla"
</pre>
</li>
<li>
In the same file, make sure to also update the SOA serial number. It should generally be YYYYMMDDNN where NN is a monotonically increasing counter (YYYYMMDD is the current date).
</li>
<li>Run <code>rndc reload</code>.</li>
<li>
Run a DNS query to make sure you can see the TXT record:
<pre>
dig -t txt @dns1 csclub.cloud
dig -t txt @dns2 csclub.cloud
</pre>
</li>
<li>Email back the person from IST and let them know that we created the TXT record.</li>
<li>
Once the certificate has been renewed, delete the TXT record, update the SOA serial number, and run <code>rndc reload</code>.
</li>
<li>Run <code>rndc thaw csclub.cloud</code>.</li>
</ol>

== Certificate Files ==
Let's say you obtain a new certificate for *.csclub.uwaterloo.ca. Here are the files which should be stored in the certs folder:
<ul>
<li>csclub.uwaterloo.ca.key: private key created by openssl</li>
<li>csclub.uwaterloo.ca.csr: certificate signing request created by openssl</li>
<li>order: order number from GlobalSign</li>
<li>csclub.uwaterloo.ca.crt: certificate created by GlobalSign</li>
<li>globalsign-intermediate.crt: intermediate certificate from GlobalSign, obtainable from [https://support.globalsign.com/ca-certificates/intermediate-certificates/organizationssl-intermediate-certificates here]. As of this writing, we use the "OrganizationSSL SHA-256 R3 Intermediate Certificate". Just click the "View in Base64" button and copy the contents.
<ul>
<li>There is an alternative way to get the intermediate certificate: if you run <code>openssl x509 -noout -text -in csclub.uwaterloo.ca.crt</code>, under X509v3 extensions > Authority Information Access, there should be a field called "CA Issuers" which has a URL which looks like http://secure.globalsign.com/cacert/gsrsaovsslca2018.crt. You can download that file and convert it to PEM:
<pre>
wget https://secure.globalsign.com/cacert/gsrsaovsslca2018.crt
openssl x509 -inform der -in gsrsaovsslca2018.crt -out globalsign-intermediate.crt
rm gsrsaovsslca2018.crt
</pre>
</li>
</ul>
</li>
</ul><ul>
<li>csclub.uwaterloo.ca.chain: create this with the following command:
<pre>
cat csclub.uwaterloo.ca.crt globalsign-intermediate.crt > csclub.uwaterloo.ca.chain
</pre>
</li>
<li>csclub.uwaterloo.ca.pem: create this with the following command:
<pre>
cat csclub.uwaterloo.ca.key csclub.uwaterloo.ca.chain > csclub.uwaterloo.ca.pem
chmod 600 csclub.uwaterloo.ca.pem
</pre>
</li>
</ul>

== Certificate Locations ==

Keep a copy of newly generated certificates in /users/sysadmin/certs.

A list of places you'll need to put the new certificate to keep our services running. Private key (if applicable) should be kept next to the certificate with the extension .key.

* caffeine:/etc/ssl/private/csclub-wildcard.crt (for Apache)
* coffee:/etc/ssl/private/csclub.uwaterloo.ca (for PostgreSQL and MariaDB)
* <s>mail:/etc/ssl/private/csclub-wildcard.crt (for Apache, Postfix and Dovecot)</s> (UPDATE: we use certbot now for these)
* mailman:/etc/ssl/private/csclub-wildcard-chain.crt (for Apache)
* rt:/etc/ssl/private/csclub-wildcard.crt (for Apache)
* potassium-benzoate:/etc/ssl/private/csclub-wildcard.crt (for nginx)
* phosphoric-acid:/etc/ssl/private/csclub-wildcard-chain.crt (for ceod)
* auth1:/etc/ssl/private/csclub-wildcard.crt (for slapd, make sure to <code>sudo service slapd restart</code>)
* auth2:/etc/ssl/private/csclub-wildcard.crt (for slapd, make sure to <code>sudo service slapd restart</code>)
* mattermost:/etc/ssl/private/csclub-wildcard.crt (for nginx)
* load-balancer-0(1|2):/etc/ssl/private/csclub.uwaterloo.ca (for haproxy) [temporarily down 2020]
* chat:/etc/ssl/private/csclub-wildcard-chain.crt (for nginx)
* prometheus:/etc/ssl/private/csclub-wildcard-chain.crt (for Apache)
* bigbluebutton:/etc/nginx/ssl/csclub-wildcard-chain.crt (podman container on xylitol)
* icy:/etc/ssl/private/csclub-wildcard.pem (for Icecast)
* chamomile:/etc/ssl/private/cloud.csclub.uwaterloo.ca.chain.crt, /etc/ssl/private/csclub.cloud.chain, /etc/ssl/private/csclub.uwaterloo.ca.chain (for nginx)
* biloba:/etc/ssl/private/cloud.csclub.uwaterloo.ca.chain.crt, /etc/ssl/private/csclub.cloud.chain, /etc/ssl/private/csclub.uwaterloo.ca.chain (for nginx)
* nextcloud (nspawn container inside guayusa): /etc/ssl/private/csclub.uwaterloo.ca.chain (for nginx)
* citric-acid (runs vaultwarden): /etc/ssl/private/csclub.uwaterloo.ca.{chain,key} (for nginx)

Some services (e.g. Dovecot, Postfix) prefer to have the certificate chain in one file. Concatenate the appropriate intermediate root to the end of the certificate and store this as csclub-wildcard-chain.crt.

=== More certificate locations ===
We have some SSL certificates which are not used by web servers, but still need to be renewed eventually.

==== Prometheus node exporter ====
All of our Prometheus node exporters are using mTLS via stunnel (every bare-metal host, as well as caffeine, coffee and mail, is running this exporter). The certificates (both client and server) are set to expire in September 2031; before then, create new keypairs in /opt/prometheus/tls, and deploy the new server.crt, node.crt and node.key to /etc/stunnel/tls on all machines. Restart prometheus and all of the node exporters.

==== ADFS ====
See [[ADFS]]. When the university's IdP certificate expires (October 2025), we can just download a new one and restart Apache; when our own certificate expires (July 2031), we need to submit a new form to IST (please do this before the cert expires).

==== Keycloak ====
See [[Keycloak]]. When the saml-passthrough certificate expires (January 2032), you need to create a new keypair in /srv/saml-passthrough on caffeine, and upload the new certificate into the Keycloak UI (IdP settings). When the Keycloak SP certificate expires (December 2031), make sure to create a new keypair and upload it to the Keycloak UI (Realm Settings).

== letsencrypt ==

We support letsencrypt for our virtual hosts with custom domains. We use the <tt>cerbot</tt> from debian repositories with a configuration file at <tt>/etc/letsencrypt/cli.ini</tt>, and a systemd timer to handle renewals.

The setup for a new domain is:

# Become <tt>certbot</tt> on caffine with <tt>sudo -u certbot bash</tt> or similar.
# Run <tt>certbot certonly -c /etc/letsencrypt/cli.ini -d DOMAIN --logs-dir /tmp</tt>. The logs-dir isn't important and is only needed for troubleshooting.
# Set up the Apache site configuration using the example below. (apache config is in /etc/apache2) Note the permanent redirect to https.
# Make sure to commit your changes when you're done.
# Reloading apache config is <tt>sudo systemctl reload apache2</tt>.

<VirtualHost *:80>
ServerName example.com
ServerAlias *.example.com
ServerAdmin example@csclub.uwaterloo.ca

#DocumentRoot /users/example/www/
Redirect permanent / https://example.com/

ErrorLog /var/log/apache2/example-error.log
CustomLog /var/log/apache2/example-access.log combined
</VirtualHost>

<VirtualHost csclub:443>
SSLEngine on
SSLCertificateFile /etc/letsencrypt/live/example.com/fullchain.pem
SSLCertificateKeyFile /etc/letsencrypt/live/example.com/privkey.pem
SSLStrictSNIVHostCheck on

ServerName example.com
ServerAlias *.example.com
ServerAdmin example@csclub.uwaterloo.ca

DocumentRoot /users/example/www

ErrorLog /var/log/apache2/example-error.log
CustomLog /var/log/apache2/example-access.log combined
</VirtualHost>

== acme.sh ==
We are using [https://github.com/acmesh-official/acme.sh acme.sh] for provisioning SSL certificates for some of our *.csclub.cloud domains. It is currently set up under /root/.acme.sh on biloba.

NOTE: acme.sh has a cron job which automatically renews certificates before they expire and reloads NGINX, so you do not have to do anything after issuing and installing a certificate (i.e. "set-and-forget").

=== How to add a new SSL cert for a custom domain on CSC cloud ===
Note: you do not need to acquire new cert if the requested domain is directly on csclub.cloud, e.g. app1.csclub.cloud. We can re-use our wildcart cert on csclub.cloud for that. However, if a user requests a multi-level domain on csclub.cloud, or a domain hosted on an external registrar, then you will need to create a new cert.

Let's say user <code>ctdalek</code> wants <code>mydomain.com</code> to point to a VM on CSC cloud.
 
TLDR:
<pre>
# Obtain the cert.
# If a subdomain was also requested, pass the -d option multiple times, e.g.
# `-d mydomain.com -d sub.mydomain.com`. Make sure the "main" domain is specified first.
acme.sh --issue -d mydomain.com -w /var/www

# Install the cert.
# If a subdomain was also requested, only specify the "main" domain.
acme.sh --install-cert -d mydomain.com \
--key-file /etc/nginx/ceod/member-ssl/mydomain.com.key \
--fullchain-file /etc/nginx/ceod/member-ssl/mydomain.com.chain \
--reloadcmd "/root/bin/reload-nginx.sh"

# Create a vhost file.
# Look at the other files in the same directory for inspiration.
# Make sure the file starts with the username and an underscore, e.g. "ctdalek_",
# because this is how ceod keeps track of the vhosts.
# Make sure to set the custom domain name(s) and paths to the SSL key/cert.
vim /etc/nginx/ceod/member-vhosts/ctdalek_mydomain.com

# Finally, reload NGINX on both biloba and chamomile. The /etc/nginx/ceod directory
# is shared between them.
/root/bin/reload-nginx.sh
</pre>

=== Installation ===
<pre>
cd /opt
git clone --depth 1 https://github.com/acmesh-official/acme.sh
cd acme.sh
./acme.sh --install -m syscom@csclub.uwaterloo.ca
. "/root/.acme.sh/acme.sh.env"
</pre>
Important: If invoking acme.sh from another program, it needs the environment variables set in acme.sh.env. Currently, that is just
<pre>
LE_WORKING_DIR="/root/.acme.sh"
</pre>

For testing purposes, make sure to use the Let's Encrypt test server:
<pre>
acme.sh --set-default-ca --server letsencrypt_test
</pre>

=== NGINX setup ===
<pre>
mkdir -p /var/www/.well-known/acme-challenge
</pre>
Add the following snippet to your default NGINX file (e.g. /etc/nginx/sites-enabled/default):
<pre>
# For Let's Encrypt
location /.well-known/acme-challenge/ {
alias /var/www/.well-known/acme-challenge/;
}
</pre>

Now assuming that biloba has the IP address for *.csclub.cloud, you can test that everything is working:
<pre>
acme.sh --issue -d app.merenber.csclub.cloud -w /var/www
</pre>
To install a certificate after it's been issued:
<pre>
acme.sh --install-cert -d app.merenber.csclub.cloud \
--key-file /etc/nginx/ceod/member-ssl/app.merenber.csclub.cloud.key \
--fullchain-file /etc/nginx/ceod/member-ssl/app.merenber.csclub.cloud.chain \
--reloadcmd "/root/bin/reload-nginx.sh"
</pre>
At this point, you should add your NGINX vhost file which uses that SSL certificate.
 
To remove a certificate:
<pre>
acme.sh --remove -d app.merenber.csclub.cloud
rm -r /root/.acme.sh/app.merenber.csclub.cloud
rm /etc/nginx/ceod/member-ssl/app.merenber.csclub.cloud.chain
rm /etc/nginx/ceod/member-ssl/app.merenber.csclub.cloud.key
</pre>
Don't forget to remove the NGINX vhost file too.

Once you think you're ready, use a real ACME provider, e.g.
<pre>
acme.sh --set-default-ca --server letsencrypt
</pre>

Since we have a [https://zerossl.com ZeroSSL] account, and ZeroSSL has no rate limit, we are going to use that instead:
<pre>
acme.sh --register-account --server zerossl \
--eab-kid xxxxxxxxxxxx \
--eab-hmac-key xxxxxxxxx
acme.sh --set-default-ca --server zerossl
</pre>

=== DNS challenge ===
To obtain a wildcard certificate (e.g. *.k8s.csclub.cloud), you will need to perform the DNS-01 challenge. We are going to use nsupdate to interact with our BIND9 server on dns1.

On dns1, run:
<pre>
tsig-keygen csc-cloud
</pre>
Paste the output into the appropriate section in /etc/bind/named.conf.local. Also paste it into a file somewhere on biloba, e.g. /etc/csc/csc-cloud-tsig.key.

Add the following to the csclub.cloud zone block:
<pre>
allow-update {
!{
!127.0.0.1;
!::1;
!129.97.134.0/24;
!2620:101:f000:4901::/64;
any;
};
key csc-cloud;
};
</pre>
(We're basically trying to restrict updates to the given IP ranges. See https://serverfault.com/a/417229.)

The 'bind' user can't write to files under /etc/bind, so we're going to move our zone file to /var/lib/bind instead.
Comment out 'file "/etc/bind/db.csclub.cloud";' from named.conf.local and add this line below it:
<pre>
file "/var/lib/bind/db.csclub.cloud";
</pre>
Then run:
<pre>
cp /etc/bind/db.csclub.cloud /var/lib/bind/db.csclub.cloud
chown bind:bind /var/lib/bind/db.csclub.cloud
rndc reload
</pre>

On biloba, check that everything's working:
<pre>
nsupdate -k /etc/csc/csc-cloud-tsig.key -v <<EOF
update add test.csclub.cloud 300 A 0.0.0.0
send
EOF
</pre>
Use a tool such as <code>dig</code> to make sure that the update was successful.
If it worked, you can delete the record:
<pre>
nsupdate -k /etc/csc/csc-cloud-tsig.key -v <<EOF
delete test.csclub.cloud
send
EOF
</pre>
Now we are ready to actually perform the challenge with acme.sh:
<pre>
export NSUPDATE_SERVER="dns1.csclub.uwaterloo.ca"
export NSUPDATE_KEY="/etc/csc/csc-cloud-tsig.key"
acme.sh --issue --dns dns_nsupdate -d 'k8s.csclub.cloud' -d '*.k8s.csclub.cloud'
</pre>
(If something goes wrong, use the <code>--debug</code> flag.)

If all went well, just install the certificate as usual:
<pre>
acme.sh --install-cert -d k8s.csclub.cloud \
--key-file /etc/nginx/ceod/syscom-ssl/k8s.csclub.cloud.key \
--fullchain-file /etc/nginx/ceod/syscom-ssl/k8s.csclub.cloud.chain \
--reloadcmd 'systemctl reload nginx'
</pre>

Podman

2024-08-16T13:28:43Z

Merenber: /* Networking */

[https://podman.io/ Podman] is a very neat Docker-compatible container solution. Some of the advantages it has over Docker are:

* no daemon (uses a fork-and-exec model)
* systemd can run inside containers very easily
* containers can become systemd services on the host
* non-root users can run containers

== Installation ==
As of bullseye, podman is available in the official Debian repositories. I suggest installing it from the unstable distribution, since podman 3.2 has many useful improvements over previous versions:
<pre>
apt install -t unstable podman podman-docker
</pre>
The podman-docker package provides a wrapper script so that running the command 'docker' will invoke podman. Recent versions of podman also provide API compatibility with Docker, which means that docker-compose will actually work out of the box. (For non-root users, you will need to set the DOCKER_HOST environment variable to <code>unix://$XDG_RUNTIME_DIR/podman/podman.sock</code>).

I suggest adding the following to /etc/containers/registries.conf so that podman automatically pulls packages from docker.io instead of quay.io:
<pre>
[registries.search]
registries = ['docker.io']
</pre>

== Networking ==
As of this writing (2024-08-16), the latest network backend in Podman is [https://github.com/containers/netavark netavark]. Hosts which are still using the legacy CNI backend should switch to netavark as soon as possible, because support for CNI will be removed in Podman 5.0. Unfortunately, the officially recommended way to migrate from CNI to netavark is to run "podman system reset", which deletes '''everything''' (containers, images, networks, etc.). This is usually undesirable. Here's what I suggest instead (assuming you don't have custom Podman networks):

<ol>
<li>Stop all running containers.</li>
<li>Run <code>echo -n netavark > /var/lib/containers/storage/defaultNetworkBackend</code>.</li>
<li>Restart the stopped containers.</li>
</ol>

If you had custom networks before, this is trickier. You will need to manually convert the CNI JSON file into the netavark JSON format (under /etc/containers/networks).

=== Directly exposing a container to a public network ===
The easiest way to do this, in my opinion, is with a macvlan network. Here's an example of how this was done for [[BigBlueButton]] on xylitol:

<pre>
podman network create \
--driver=macvlan \
--ipv6 \
--opt parent=br0 \
--subnet=129.97.134.0/24 \
--gateway=129.97.134.1 \
--subnet=2620:101:f000:4901:c5c::0/64 \
--gateway=2620:101:f000:4901::1 \
bbbnet
</pre>

Then create a pod in which the containers will be run:

<pre>
podman pod create \
--name bbbpod \
--network bbbnet \
--share net \
--ip=129.97.134.173 \
--ip6=2620:101:f000:4901:c5c::173
</pre>

== Systemd ==
Podman integrates with systemd in both directions - systemd can run in podman, and podman can run in systemd.

=== Systemd in podman ===
To run systemd in podman, just create a Dockerfile like the following:
<pre>
FROM ubuntu:bionic

ENV DEBIAN_FRONTEND=noninteractive
RUN apt update && apt install -y systemd
RUN passwd -d root

CMD [ "/bin/systemd" ]
</pre>
Then run:
<pre>
podman build --privileged -t ubuntu-systemd:bionic -f ubuntu-bionic-systemd.Dockerfile
</pre>
If you're running this as root, I suggest using the --privileged flag. I am pretty sure that there some specific capabilities you can add instead to make it work (via the --cap-add flag), but this is easier.

Then, to run a container with this image:
<pre>
podman run -it --privileged ubuntu-systemd:bionic
</pre>

=== Podman in systemd ===
Podman has a built-in command to generate systemd service files to start containers and pods. For example, let's say we have a pod named bbbpod. Run the following:
<pre>
podman generate systemd --files --name bbbpod
</pre>
This will create .service files for the pod and the containers inside it. Now you just need to enable them:
<pre>
mv *.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable pod-bbbpod.service
</pre>
If you now run <code>systemctl start pod-bbbpod</code>, the pod and its containers will start.

== Pods ==
Podman pods are similar to Kubernetes pods; they can share namespaces with each other, such as network namespaces and UTS namespaces. In this example, we will use a network namespace.

First, we create a pod in the network we previously created:
<pre>
podman pod create --network bbbnet --name bbbpod --share net
</pre>
Then run a container inside the pod:
<pre>
podman run -it --name bbb --hostname bbb --pod bbbpod --privileged ubuntu-systemd:bionic
</pre>
You can add more containers to the pod:
<pre>
podman run -d --name greenlight --pod bbbpod --env-file $PWD/env bigbluebutton/greenlight:v2
</pre>
The bbb and greenlight containers can now communicate with each other over localhost.

Important: Make sure to edit /etc/hostname and /etc/network/interfaces (or whichever network manager you decide to use) in each container.

== Volumes ==
Unfortunately podman does not currently have functionality to allocate a separate volume to each container. Instead, I suggest mounting each root-level folder in a separate volume.

Let's say you created a new LVM volume mounted at /vm/bigbluebutton. So create your container like the following:
<pre>
podman run ... --name bbb -v /vm/bigbluebutton/bin:/bin -v /vm/bigbluebutton/boot:/boot -v /vm/bigbluebutton/etc:/etc -v /vm/bigbluebutton/home:/home -v /vm/bigbluebutton/lib:/lib -v /vm/bigbluebutton/lib64:/lib64 -v /vm/bigbluebutton/media:/media -v /vm/bigbluebutton/mnt:/mnt -v /vm/bigbluebutton/opt:/opt -v /vm/bigbluebutton/root:/root -v /vm/bigbluebutton/sbin:/sbin -v /vm/bigbluebutton/srv:/srv -v /vm/bigbluebutton/usr:/usr -v /vm/bigbluebutton/var:/var ubuntu-systemd:bionic
</pre>

It is also a good idea to mount /var/lib/containers in a separate LVM volume to avoid running out of space on the host.

Podman

2024-08-16T06:03:30Z

Merenber: /* Networking */

[https://podman.io/ Podman] is a very neat Docker-compatible container solution. Some of the advantages it has over Docker are:

* no daemon (uses a fork-and-exec model)
* systemd can run inside containers very easily
* containers can become systemd services on the host
* non-root users can run containers

== Installation ==
As of bullseye, podman is available in the official Debian repositories. I suggest installing it from the unstable distribution, since podman 3.2 has many useful improvements over previous versions:
<pre>
apt install -t unstable podman podman-docker
</pre>
The podman-docker package provides a wrapper script so that running the command 'docker' will invoke podman. Recent versions of podman also provide API compatibility with Docker, which means that docker-compose will actually work out of the box. (For non-root users, you will need to set the DOCKER_HOST environment variable to <code>unix://$XDG_RUNTIME_DIR/podman/podman.sock</code>).

I suggest adding the following to /etc/containers/registries.conf so that podman automatically pulls packages from docker.io instead of quay.io:
<pre>
[registries.search]
registries = ['docker.io']
</pre>

== Networking ==
As of this writing (2024-08-16), the latest network backend in Podman is [https://github.com/containers/netavark netavark]. Hosts which are still using the legacy CNI backend should switch to netavark as soon as possible, because support for CNI will be removed in Podman 5.0. Unfortunately, the officially recommended way to migrate from CNI to netavark is to run "podman system reset", which deletes '''everything''' (containers, images, networks, etc.). This is usually undesirable. Here's what I suggest instead (assuming you don't have custom Podman networks):

<ol>
<li>Stop all running containers.</li>
<li>Run <code>echo -n netavark > /var/lib/containers/storage/defaultNetworkBackend</code>.</li>
<li>Restart the stopped containers.</li>
</ol>

If you had custom networks before, this is trickier. You will need to manually convert the CNI JSON file into the netavark JSON format (under /etc/containers/networks).

=== Directly exposing a container to a public network ===
The easiest way to do this, in my opinion, is with a macvlan network. Here's an example of how this was done for [[BigBlueButton]] on xylitol:

<pre>
podman network create \
--driver=macvlan \
--ipv6 \
--opt parent=br0 \
--subnet=129.97.134.0/24 \
--gateway=129.97.134.1 \
--subnet=2620:101:f000:4901:c5c::0/64 \
--gateway=2620:101:f000:4901::1 \
bbbnet
</pre>

== Systemd ==
Podman integrates with systemd in both directions - systemd can run in podman, and podman can run in systemd.

=== Systemd in podman ===
To run systemd in podman, just create a Dockerfile like the following:
<pre>
FROM ubuntu:bionic

ENV DEBIAN_FRONTEND=noninteractive
RUN apt update && apt install -y systemd
RUN passwd -d root

CMD [ "/bin/systemd" ]
</pre>
Then run:
<pre>
podman build --privileged -t ubuntu-systemd:bionic -f ubuntu-bionic-systemd.Dockerfile
</pre>
If you're running this as root, I suggest using the --privileged flag. I am pretty sure that there some specific capabilities you can add instead to make it work (via the --cap-add flag), but this is easier.

Then, to run a container with this image:
<pre>
podman run -it --privileged ubuntu-systemd:bionic
</pre>

=== Podman in systemd ===
Podman has a built-in command to generate systemd service files to start containers and pods. For example, let's say we have a pod named bbbpod. Run the following:
<pre>
podman generate systemd --files --name bbbpod
</pre>
This will create .service files for the pod and the containers inside it. Now you just need to enable them:
<pre>
mv *.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable pod-bbbpod.service
</pre>
If you now run <code>systemctl start pod-bbbpod</code>, the pod and its containers will start.

== Pods ==
Podman pods are similar to Kubernetes pods; they can share namespaces with each other, such as network namespaces and UTS namespaces. In this example, we will use a network namespace.

First, we create a pod in the network we previously created:
<pre>
podman pod create --network bbbnet --name bbbpod --share net
</pre>
Then run a container inside the pod:
<pre>
podman run -it --name bbb --hostname bbb --pod bbbpod --privileged ubuntu-systemd:bionic
</pre>
You can add more containers to the pod:
<pre>
podman run -d --name greenlight --pod bbbpod --env-file $PWD/env bigbluebutton/greenlight:v2
</pre>
The bbb and greenlight containers can now communicate with each other over localhost.

Important: Make sure to edit /etc/hostname and /etc/network/interfaces (or whichever network manager you decide to use) in each container.

== Volumes ==
Unfortunately podman does not currently have functionality to allocate a separate volume to each container. Instead, I suggest mounting each root-level folder in a separate volume.

Let's say you created a new LVM volume mounted at /vm/bigbluebutton. So create your container like the following:
<pre>
podman run ... --name bbb -v /vm/bigbluebutton/bin:/bin -v /vm/bigbluebutton/boot:/boot -v /vm/bigbluebutton/etc:/etc -v /vm/bigbluebutton/home:/home -v /vm/bigbluebutton/lib:/lib -v /vm/bigbluebutton/lib64:/lib64 -v /vm/bigbluebutton/media:/media -v /vm/bigbluebutton/mnt:/mnt -v /vm/bigbluebutton/opt:/opt -v /vm/bigbluebutton/root:/root -v /vm/bigbluebutton/sbin:/sbin -v /vm/bigbluebutton/srv:/srv -v /vm/bigbluebutton/usr:/usr -v /vm/bigbluebutton/var:/var ubuntu-systemd:bionic
</pre>

It is also a good idea to mount /var/lib/containers in a separate LVM volume to avoid running out of space on the host.

Podman

2024-08-16T06:02:44Z

Merenber: /* Networking */

[https://podman.io/ Podman] is a very neat Docker-compatible container solution. Some of the advantages it has over Docker are:

* no daemon (uses a fork-and-exec model)
* systemd can run inside containers very easily
* containers can become systemd services on the host
* non-root users can run containers

== Installation ==
As of bullseye, podman is available in the official Debian repositories. I suggest installing it from the unstable distribution, since podman 3.2 has many useful improvements over previous versions:
<pre>
apt install -t unstable podman podman-docker
</pre>
The podman-docker package provides a wrapper script so that running the command 'docker' will invoke podman. Recent versions of podman also provide API compatibility with Docker, which means that docker-compose will actually work out of the box. (For non-root users, you will need to set the DOCKER_HOST environment variable to <code>unix://$XDG_RUNTIME_DIR/podman/podman.sock</code>).

I suggest adding the following to /etc/containers/registries.conf so that podman automatically pulls packages from docker.io instead of quay.io:
<pre>
[registries.search]
registries = ['docker.io']
</pre>

== Networking ==
As of this writing (2024-08-16), the latest network backend in Podman is [https://github.com/containers/netavark netavark]. Hosts which are still using the legacy CNI backend should switch to netavark as soon as possible, because support for CNI will be removed in Podman 5.0. Unfortunately, the officially recommended way to migrate from netavark to CNI is to run "podman system reset", which deletes '''everything''' (containers, images, networks, etc.). This is usually undesirable. Here's what I suggest instead (assuming you don't have custom Podman networks):

<ol>
<li>Stop all running containers.</li>
<li>Run <code>echo -n netavark > /var/lib/containers/storage/defaultNetworkBackend</code>.</li>
<li>Restart the stopped containers.</li>
</ol>

If you had custom networks before, this is trickier. You will need to manually convert the CNI JSON file into the netavark JSON format (under /etc/containers/networks).

=== Directly exposing a container to a public network ===
The easiest way to do this, in my opinion, is with a macvlan network. Here's an example of how this was done for [[BigBlueButton]] on xylitol:

<pre>
podman network create \
--driver=macvlan \
--ipv6 \
--opt parent=br0 \
--subnet=129.97.134.0/24 \
--gateway=129.97.134.1 \
--subnet=2620:101:f000:4901:c5c::0/64 \
--gateway=2620:101:f000:4901::1 \
bbbnet
</pre>

== Systemd ==
Podman integrates with systemd in both directions - systemd can run in podman, and podman can run in systemd.

=== Systemd in podman ===
To run systemd in podman, just create a Dockerfile like the following:
<pre>
FROM ubuntu:bionic

ENV DEBIAN_FRONTEND=noninteractive
RUN apt update && apt install -y systemd
RUN passwd -d root

CMD [ "/bin/systemd" ]
</pre>
Then run:
<pre>
podman build --privileged -t ubuntu-systemd:bionic -f ubuntu-bionic-systemd.Dockerfile
</pre>
If you're running this as root, I suggest using the --privileged flag. I am pretty sure that there some specific capabilities you can add instead to make it work (via the --cap-add flag), but this is easier.

Then, to run a container with this image:
<pre>
podman run -it --privileged ubuntu-systemd:bionic
</pre>

=== Podman in systemd ===
Podman has a built-in command to generate systemd service files to start containers and pods. For example, let's say we have a pod named bbbpod. Run the following:
<pre>
podman generate systemd --files --name bbbpod
</pre>
This will create .service files for the pod and the containers inside it. Now you just need to enable them:
<pre>
mv *.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable pod-bbbpod.service
</pre>
If you now run <code>systemctl start pod-bbbpod</code>, the pod and its containers will start.

== Pods ==
Podman pods are similar to Kubernetes pods; they can share namespaces with each other, such as network namespaces and UTS namespaces. In this example, we will use a network namespace.

First, we create a pod in the network we previously created:
<pre>
podman pod create --network bbbnet --name bbbpod --share net
</pre>
Then run a container inside the pod:
<pre>
podman run -it --name bbb --hostname bbb --pod bbbpod --privileged ubuntu-systemd:bionic
</pre>
You can add more containers to the pod:
<pre>
podman run -d --name greenlight --pod bbbpod --env-file $PWD/env bigbluebutton/greenlight:v2
</pre>
The bbb and greenlight containers can now communicate with each other over localhost.

Important: Make sure to edit /etc/hostname and /etc/network/interfaces (or whichever network manager you decide to use) in each container.

== Volumes ==
Unfortunately podman does not currently have functionality to allocate a separate volume to each container. Instead, I suggest mounting each root-level folder in a separate volume.

Let's say you created a new LVM volume mounted at /vm/bigbluebutton. So create your container like the following:
<pre>
podman run ... --name bbb -v /vm/bigbluebutton/bin:/bin -v /vm/bigbluebutton/boot:/boot -v /vm/bigbluebutton/etc:/etc -v /vm/bigbluebutton/home:/home -v /vm/bigbluebutton/lib:/lib -v /vm/bigbluebutton/lib64:/lib64 -v /vm/bigbluebutton/media:/media -v /vm/bigbluebutton/mnt:/mnt -v /vm/bigbluebutton/opt:/opt -v /vm/bigbluebutton/root:/root -v /vm/bigbluebutton/sbin:/sbin -v /vm/bigbluebutton/srv:/srv -v /vm/bigbluebutton/usr:/usr -v /vm/bigbluebutton/var:/var ubuntu-systemd:bionic
</pre>

It is also a good idea to mount /var/lib/containers in a separate LVM volume to avoid running out of space on the host.

IPMI101

2024-04-03T05:08:21Z

Merenber: /* carbonated-water */

= Guide to IPMI (IPMI 101) =

IPMI is a necessary evil. Let’s learn to make the best of it.

== Setting up IPMI ==

# Install ipmitool

<pre># apt-get install ipmitool</pre>
<ol start="2" style="list-style-type: decimal;">
<li>Load IPMI modules (they are included in most upstream kernels)</li></ol>

You may also need a kernel module specific to your motherboard’s manufacture as some BMC/LOMs do not conform to IPMI spec and thus need a translation layer.

<pre># modprobe ipmi_*</pre>
<ol start="3" style="list-style-type: decimal;">
<li>Locally connect to the <code>/dev/ipmi</code> interface</li></ol>

<pre># ipmitool shell
> help
> mc info</pre>
== Securing IPMI ==

Note that root on the machine is root on the BMC and vice versa.

# User administration

(re)set the password, rename the admin account to root and delete any extra users as they can have surprising privilege. You may have to use the BMC’s web interface delete accounts.

<pre># ipmitool shell
> user list 1
ID Name ...
2 ADMIN ...
> user set password 2
User id 2: *******
User id 2: *******
> user set username 2 root
> user disable $other_user_ids</pre>
<ol start="2" style="list-style-type: decimal;">
<li>Disable NULL password and cipher suite 0</li></ol>

Note that the $channel is usually 0 but can range from 0-10 and there can be multiple NICs and so multiple channels to fix.

<pre># ipmitool shell
> lan print $channel
> lan set $channel auth ADMIN MD5
> lan set $channel auth CALLBACK MD5
> lan set $channel auth USER MD5
> lan set $channel auth OPERATOR MD5
> lan set $channel cipher_privs XXXaXXXXXXXXXXX
> lan print $channel</pre>
== Configuring networking ==

Note once again that there are sometimes multiple channels, to find the correct channel it is helpful to use either trial and error and/or an ARP scanner to find the correct MAC address. Usually the channel is 0 but I have seen 1, 8 and 17. Especially when there are multiple NICs.

<pre># ipmitool shell
> lan print $channel
> lan set $channel ipsrc static
> lan set $channel ipaddr 10.15.134.?
> lan set $channel defgw ipaddr 10.15.134.1
> lan set $channel netmask 255.255.255.0
// if you have vlan tagging enabled on the switch port, useful for a shared NIC
> lan set $channel vlan id 520</pre>
== Configuring Serial over LAN ==

To enable serial over LAN you need to ensure that it is enabled in your BIOS or EFI setup utility and further note the baud rate. 115200 is used as an example below. Note that GRUB is the only boot loader that takes input via serial properly, in my experience. Syslinux failed horribly on corn-syrup.

Paste the following into /etc/default/grub.d/99-csclub.cfg:

<pre>
GRUB_CMDLINE_LINUX="console=tty1 console=ttyS1,115200n8"
GRUB_TERMINAL_INPUT="console serial"
GRUB_TERMINAL_OUTPUT="console serial"
GRUB_SERIAL_COMMAND="serial --speed=115200 --unit=1 --word=8 --parity=no --stop=1"
</pre>
and then run:

<pre>// on debian based distros
// Yay, Debian magic :\
# update-grub
// on upstream packages (Arch, Fedora, etc.)
# grub-mkconfig -o /boot/grub/grub.cfg

# reboot</pre>

= iDRAC =
== riboflavin ==
riboflavin is using iDRAC 6. The web console can be viewed from https://riboflavin-ipmi.csclub.uwaterloo.ca; if you are not on campus, you can use a [[How_to_SSH#SOCKS_proxy|SOCKS proxy]]. Unfortunately, the virtual console uses Java Web Start, which is now deprecated. Here's a workaround which you can use instead.

From the web UI, go to the "Console/Media" tab and click the "Launch virtual console" button. This will download a file whose name starts with "viewer.jnlp". Now go to https://www.java.com and download JRE 8; any later version will not have support for JWS (note that OpenJDK will not work; JWS was a proprietary framework from Sun/Oracle). Unpack the tarball, open jre1.8.0_391/lib/security/java.security in a text editor, and comment out the following properties (note that each property spans multiple lines):
<ul>
<li>jdk.certpath.disabledAlgorithms</li>
<li>jdk.jar.disabledAlgorithms</li>
<li>jdk.tls.disabledAlgorithms</li>
</ul>

If you are off-campus, you will need to setup some proxying so that the Java application can access ports 443 and 5900 on riboflavin-ipmi. In the example below, I am using caffeine as a jump host, but any machine on campus should do:
<pre>
ssh -L 5443:localhost:5443 -L 5900:localhost:5900 caffeine.csclub.uwaterloo.ca
</pre>

Now on caffeine, open a tmux/screen session, and run the following commands in two different panes:
<ul>
<li><code>socat TCP-LISTEN:5443,fork TCP:riboflavin-ipmi:443</code></li>
<li><code>socat TCP-LISTEN:5900,fork TCP:riboflavin-ipmi:5900</code></li>
</ul>

Back on your personal machine, open the viewer.jnlp file in a text editor and perform the following:
<ol>
<li>Replace all instances of <code>riboflavin-ipmi.csclub.uwaterloo.ca:443</code> with <code>localhost:5443</code></li>
<li>Under the <code>application-desc</code> element, the first <code>argument</code> child element should say <code>ip=riboflavin-ipmi</code>. Replace this with <code>ip=localhost</code></li>.
<li>Under the <code>application-desc</code> element, there are child <code>argument</code>elements for <code>user</code> and <code>passwd</code>. For some reason these are set to numbers; set these to the username and password for IPMI (username should be <code>root</code>).</li>
</ol>

Now run:
<pre>
jre1.8.0_391/bin/javaws viewer.jnlp
</pre>

If all goes well, the virtual console should eventually appear:
[[File:Riboflavin-idrac-virtual-console.png|1000px]]

== carbonated-water ==
carbonated-water is also using iDRAC 6, but seems to have some kind of TLS certificate configuration which prevents modern browsers from loading its web UI. So we're going to run an old version of Firefox inside a Podman container instead:
<pre>
podman run --name firefox -it -e DISPLAY --net=host -v $XAUTHORITY:/root/.Xauthority -v /tmp/.X11-unix:/tmp/.X11-unix debian:9-slim bash
sed -i 's/deb\.debian\.org/archive.debian.org/' /etc/apt/sources.list
sed -i 's/security\.debian\.org/archive.debian.org/' /etc/apt/sources.list
sed -i '/stretch-updates/d' /etc/apt/sources.list
apt update
apt install firefox-esr
firefox
</pre>
Next, follow the instructions here to set up a SOCKS proxy: [[How to SSH#SOCKS proxy]]

Now visit https://carbonated-water-ipmi.csclub.uwaterloo.ca from Firefox, login using the IPMI credentials, and download the JNLP file. Copy it from the Podman container to your computer (replace "viewer.jnlp" with the full file name):
<pre>
podman cp firefox:/root/Downloads/viewer.jnlp launch.jnlp
</pre>
Follow the same steps as done for riboflavin to edit the JDK settings and JNLP file. In addition, there are a few more settings which we need to tweak:
<ul>
<li>Run <code>jre1.8.0_391/bin/ControlPanel</code>, go to the Advanced tab, scroll down and check "TLS 1.0" and "TLS 1.1"</li>
<li>We also need to disable OCSP. In the same window, set "Check for signed code certificate revocation using" to "Certificate Revocation Lists (CRLs)" and set "Check for TLS certificate revocation using" to "Certificate Revocation Lists (CRLs)" (see [https://www.kunxi.org/2015/01/bypass-the-certpathvalidatorexception-caused-by-malformed-ocsp-response/ here] for the reference).</li>
</ul>
[[File:java-control-panel-advanced.png]]

Now you can launch the JNLP file as usual.

= Supermicro =
== ginkgo ==
To access the virtual console on ginkgo, the steps are the same as those for riboflavin, with the following changes:
<ul>
<li>In the launch.jnlp file, in the root <code><jnlp></code> tag, change the value of the <code>codebase</code> attribute from <code>https<nowiki/>://ginkgo-ipmi.csclub.uwaterloo.ca:443</code> to <code>https<nowiki/>://localhost:5443</code>. Next, in the first <code><argument></code> element under <code><application-desc></code>, replace <code>ginkgo-ipmi.csclub.uwaterloo.ca</code> with <code>localhost</code>. These are the only changes which you should make to this file (unless you are already on the campus network, in which case you do not need to modify this file at all).</li>
<li>Run <code>jre1.8.0_391/bin/ControlPanel</code>, go to the Security tab, click "Edit Site List", and add <code>https<nowiki/>://ginkgo-ipmi.csclub.uwaterloo.ca</code> as an exception.
</ul>

IPMI101

2024-04-03T05:07:15Z

Merenber: /* carbonated-water */

= Guide to IPMI (IPMI 101) =

IPMI is a necessary evil. Let’s learn to make the best of it.

== Setting up IPMI ==

# Install ipmitool

<pre># apt-get install ipmitool</pre>
<ol start="2" style="list-style-type: decimal;">
<li>Load IPMI modules (they are included in most upstream kernels)</li></ol>

You may also need a kernel module specific to your motherboard’s manufacture as some BMC/LOMs do not conform to IPMI spec and thus need a translation layer.

<pre># modprobe ipmi_*</pre>
<ol start="3" style="list-style-type: decimal;">
<li>Locally connect to the <code>/dev/ipmi</code> interface</li></ol>

<pre># ipmitool shell
> help
> mc info</pre>
== Securing IPMI ==

Note that root on the machine is root on the BMC and vice versa.

# User administration

(re)set the password, rename the admin account to root and delete any extra users as they can have surprising privilege. You may have to use the BMC’s web interface delete accounts.

<pre># ipmitool shell
> user list 1
ID Name ...
2 ADMIN ...
> user set password 2
User id 2: *******
User id 2: *******
> user set username 2 root
> user disable $other_user_ids</pre>
<ol start="2" style="list-style-type: decimal;">
<li>Disable NULL password and cipher suite 0</li></ol>

Note that the $channel is usually 0 but can range from 0-10 and there can be multiple NICs and so multiple channels to fix.

<pre># ipmitool shell
> lan print $channel
> lan set $channel auth ADMIN MD5
> lan set $channel auth CALLBACK MD5
> lan set $channel auth USER MD5
> lan set $channel auth OPERATOR MD5
> lan set $channel cipher_privs XXXaXXXXXXXXXXX
> lan print $channel</pre>
== Configuring networking ==

Note once again that there are sometimes multiple channels, to find the correct channel it is helpful to use either trial and error and/or an ARP scanner to find the correct MAC address. Usually the channel is 0 but I have seen 1, 8 and 17. Especially when there are multiple NICs.

<pre># ipmitool shell
> lan print $channel
> lan set $channel ipsrc static
> lan set $channel ipaddr 10.15.134.?
> lan set $channel defgw ipaddr 10.15.134.1
> lan set $channel netmask 255.255.255.0
// if you have vlan tagging enabled on the switch port, useful for a shared NIC
> lan set $channel vlan id 520</pre>
== Configuring Serial over LAN ==

To enable serial over LAN you need to ensure that it is enabled in your BIOS or EFI setup utility and further note the baud rate. 115200 is used as an example below. Note that GRUB is the only boot loader that takes input via serial properly, in my experience. Syslinux failed horribly on corn-syrup.

Paste the following into /etc/default/grub.d/99-csclub.cfg:

<pre>
GRUB_CMDLINE_LINUX="console=tty1 console=ttyS1,115200n8"
GRUB_TERMINAL_INPUT="console serial"
GRUB_TERMINAL_OUTPUT="console serial"
GRUB_SERIAL_COMMAND="serial --speed=115200 --unit=1 --word=8 --parity=no --stop=1"
</pre>
and then run:

<pre>// on debian based distros
// Yay, Debian magic :\
# update-grub
// on upstream packages (Arch, Fedora, etc.)
# grub-mkconfig -o /boot/grub/grub.cfg

# reboot</pre>

= iDRAC =
== riboflavin ==
riboflavin is using iDRAC 6. The web console can be viewed from https://riboflavin-ipmi.csclub.uwaterloo.ca; if you are not on campus, you can use a [[How_to_SSH#SOCKS_proxy|SOCKS proxy]]. Unfortunately, the virtual console uses Java Web Start, which is now deprecated. Here's a workaround which you can use instead.

From the web UI, go to the "Console/Media" tab and click the "Launch virtual console" button. This will download a file whose name starts with "viewer.jnlp". Now go to https://www.java.com and download JRE 8; any later version will not have support for JWS (note that OpenJDK will not work; JWS was a proprietary framework from Sun/Oracle). Unpack the tarball, open jre1.8.0_391/lib/security/java.security in a text editor, and comment out the following properties (note that each property spans multiple lines):
<ul>
<li>jdk.certpath.disabledAlgorithms</li>
<li>jdk.jar.disabledAlgorithms</li>
<li>jdk.tls.disabledAlgorithms</li>
</ul>

If you are off-campus, you will need to setup some proxying so that the Java application can access ports 443 and 5900 on riboflavin-ipmi. In the example below, I am using caffeine as a jump host, but any machine on campus should do:
<pre>
ssh -L 5443:localhost:5443 -L 5900:localhost:5900 caffeine.csclub.uwaterloo.ca
</pre>

Now on caffeine, open a tmux/screen session, and run the following commands in two different panes:
<ul>
<li><code>socat TCP-LISTEN:5443,fork TCP:riboflavin-ipmi:443</code></li>
<li><code>socat TCP-LISTEN:5900,fork TCP:riboflavin-ipmi:5900</code></li>
</ul>

Back on your personal machine, open the viewer.jnlp file in a text editor and perform the following:
<ol>
<li>Replace all instances of <code>riboflavin-ipmi.csclub.uwaterloo.ca:443</code> with <code>localhost:5443</code></li>
<li>Under the <code>application-desc</code> element, the first <code>argument</code> child element should say <code>ip=riboflavin-ipmi</code>. Replace this with <code>ip=localhost</code></li>.
<li>Under the <code>application-desc</code> element, there are child <code>argument</code>elements for <code>user</code> and <code>passwd</code>. For some reason these are set to numbers; set these to the username and password for IPMI (username should be <code>root</code>).</li>
</ol>

Now run:
<pre>
jre1.8.0_391/bin/javaws viewer.jnlp
</pre>

If all goes well, the virtual console should eventually appear:
[[File:Riboflavin-idrac-virtual-console.png|1000px]]

== carbonated-water ==
carbonated-water is also using iDRAC 6, but seems to have some kind of TLS certificate configuration which prevents modern browsers from its web UI. So we're going to run an old version of Firefox inside a Podman container instead:
<pre>
podman run --name firefox -it -e DISPLAY --net=host -v $XAUTHORITY:/root/.Xauthority -v /tmp/.X11-unix:/tmp/.X11-unix debian:9-slim bash
sed -i 's/deb\.debian\.org/archive.debian.org/' /etc/apt/sources.list
sed -i 's/security\.debian\.org/archive.debian.org/' /etc/apt/sources.list
sed -i '/stretch-updates/d' /etc/apt/sources.list
apt update
apt install firefox-esr
firefox
</pre>
Next, follow the instructions here to set up a SOCKS proxy: [[How to SSH#SOCKS proxy]]

Now visit https://carbonated-water-ipmi.csclub.uwaterloo.ca from Firefox, login using the IPMI credentials, and download the JNLP file. Copy it from the Podman container to your computer (replace "viewer.jnlp" with the full file name):
<pre>
podman cp firefox:/root/Downloads/viewer.jnlp launch.jnlp
</pre>
Follow the same steps as done for riboflavin to edit the JDK settings and JNLP file. In addition, there are a few more settings which we need to tweak:
<ul>
<li>Run <code>jre1.8.0_391/bin/ControlPanel</code>, go to the Advanced tab, scroll down and check "TLS 1.0" and "TLS 1.1"</li>
<li>We also need to disable OCSP. In the same window, set "Check for signed code certificate revocation using" to "Certificate Revocation Lists (CRLs)" and set "Check for TLS certificate revocation using" to "Certificate Revocation Lists (CRLs)" (see [https://www.kunxi.org/2015/01/bypass-the-certpathvalidatorexception-caused-by-malformed-ocsp-response/ here] for the reference).</li>
</ul>
[[File:java-control-panel-advanced.png]]

Now you can launch the JNLP file as usual.

= Supermicro =
== ginkgo ==
To access the virtual console on ginkgo, the steps are the same as those for riboflavin, with the following changes:
<ul>
<li>In the launch.jnlp file, in the root <code><jnlp></code> tag, change the value of the <code>codebase</code> attribute from <code>https<nowiki/>://ginkgo-ipmi.csclub.uwaterloo.ca:443</code> to <code>https<nowiki/>://localhost:5443</code>. Next, in the first <code><argument></code> element under <code><application-desc></code>, replace <code>ginkgo-ipmi.csclub.uwaterloo.ca</code> with <code>localhost</code>. These are the only changes which you should make to this file (unless you are already on the campus network, in which case you do not need to modify this file at all).</li>
<li>Run <code>jre1.8.0_391/bin/ControlPanel</code>, go to the Security tab, click "Edit Site List", and add <code>https<nowiki/>://ginkgo-ipmi.csclub.uwaterloo.ca</code> as an exception.
</ul>

IPMI101

2024-04-03T05:06:53Z

Merenber: /* carbonated-water */

= Guide to IPMI (IPMI 101) =

IPMI is a necessary evil. Let’s learn to make the best of it.

== Setting up IPMI ==

# Install ipmitool

<pre># apt-get install ipmitool</pre>
<ol start="2" style="list-style-type: decimal;">
<li>Load IPMI modules (they are included in most upstream kernels)</li></ol>

You may also need a kernel module specific to your motherboard’s manufacture as some BMC/LOMs do not conform to IPMI spec and thus need a translation layer.

<pre># modprobe ipmi_*</pre>
<ol start="3" style="list-style-type: decimal;">
<li>Locally connect to the <code>/dev/ipmi</code> interface</li></ol>

<pre># ipmitool shell
> help
> mc info</pre>
== Securing IPMI ==

Note that root on the machine is root on the BMC and vice versa.

# User administration

(re)set the password, rename the admin account to root and delete any extra users as they can have surprising privilege. You may have to use the BMC’s web interface delete accounts.

<pre># ipmitool shell
> user list 1
ID Name ...
2 ADMIN ...
> user set password 2
User id 2: *******
User id 2: *******
> user set username 2 root
> user disable $other_user_ids</pre>
<ol start="2" style="list-style-type: decimal;">
<li>Disable NULL password and cipher suite 0</li></ol>

Note that the $channel is usually 0 but can range from 0-10 and there can be multiple NICs and so multiple channels to fix.

<pre># ipmitool shell
> lan print $channel
> lan set $channel auth ADMIN MD5
> lan set $channel auth CALLBACK MD5
> lan set $channel auth USER MD5
> lan set $channel auth OPERATOR MD5
> lan set $channel cipher_privs XXXaXXXXXXXXXXX
> lan print $channel</pre>
== Configuring networking ==

Note once again that there are sometimes multiple channels, to find the correct channel it is helpful to use either trial and error and/or an ARP scanner to find the correct MAC address. Usually the channel is 0 but I have seen 1, 8 and 17. Especially when there are multiple NICs.

<pre># ipmitool shell
> lan print $channel
> lan set $channel ipsrc static
> lan set $channel ipaddr 10.15.134.?
> lan set $channel defgw ipaddr 10.15.134.1
> lan set $channel netmask 255.255.255.0
// if you have vlan tagging enabled on the switch port, useful for a shared NIC
> lan set $channel vlan id 520</pre>
== Configuring Serial over LAN ==

To enable serial over LAN you need to ensure that it is enabled in your BIOS or EFI setup utility and further note the baud rate. 115200 is used as an example below. Note that GRUB is the only boot loader that takes input via serial properly, in my experience. Syslinux failed horribly on corn-syrup.

Paste the following into /etc/default/grub.d/99-csclub.cfg:

<pre>
GRUB_CMDLINE_LINUX="console=tty1 console=ttyS1,115200n8"
GRUB_TERMINAL_INPUT="console serial"
GRUB_TERMINAL_OUTPUT="console serial"
GRUB_SERIAL_COMMAND="serial --speed=115200 --unit=1 --word=8 --parity=no --stop=1"
</pre>
and then run:

<pre>// on debian based distros
// Yay, Debian magic :\
# update-grub
// on upstream packages (Arch, Fedora, etc.)
# grub-mkconfig -o /boot/grub/grub.cfg

# reboot</pre>

= iDRAC =
== riboflavin ==
riboflavin is using iDRAC 6. The web console can be viewed from https://riboflavin-ipmi.csclub.uwaterloo.ca; if you are not on campus, you can use a [[How_to_SSH#SOCKS_proxy|SOCKS proxy]]. Unfortunately, the virtual console uses Java Web Start, which is now deprecated. Here's a workaround which you can use instead.

From the web UI, go to the "Console/Media" tab and click the "Launch virtual console" button. This will download a file whose name starts with "viewer.jnlp". Now go to https://www.java.com and download JRE 8; any later version will not have support for JWS (note that OpenJDK will not work; JWS was a proprietary framework from Sun/Oracle). Unpack the tarball, open jre1.8.0_391/lib/security/java.security in a text editor, and comment out the following properties (note that each property spans multiple lines):
<ul>
<li>jdk.certpath.disabledAlgorithms</li>
<li>jdk.jar.disabledAlgorithms</li>
<li>jdk.tls.disabledAlgorithms</li>
</ul>

If you are off-campus, you will need to setup some proxying so that the Java application can access ports 443 and 5900 on riboflavin-ipmi. In the example below, I am using caffeine as a jump host, but any machine on campus should do:
<pre>
ssh -L 5443:localhost:5443 -L 5900:localhost:5900 caffeine.csclub.uwaterloo.ca
</pre>

Now on caffeine, open a tmux/screen session, and run the following commands in two different panes:
<ul>
<li><code>socat TCP-LISTEN:5443,fork TCP:riboflavin-ipmi:443</code></li>
<li><code>socat TCP-LISTEN:5900,fork TCP:riboflavin-ipmi:5900</code></li>
</ul>

Back on your personal machine, open the viewer.jnlp file in a text editor and perform the following:
<ol>
<li>Replace all instances of <code>riboflavin-ipmi.csclub.uwaterloo.ca:443</code> with <code>localhost:5443</code></li>
<li>Under the <code>application-desc</code> element, the first <code>argument</code> child element should say <code>ip=riboflavin-ipmi</code>. Replace this with <code>ip=localhost</code></li>.
<li>Under the <code>application-desc</code> element, there are child <code>argument</code>elements for <code>user</code> and <code>passwd</code>. For some reason these are set to numbers; set these to the username and password for IPMI (username should be <code>root</code>).</li>
</ol>

Now run:
<pre>
jre1.8.0_391/bin/javaws viewer.jnlp
</pre>

If all goes well, the virtual console should eventually appear:
[[File:Riboflavin-idrac-virtual-console.png|1000px]]

== carbonated-water ==
carbonated-water is also using iDRAC 6, but seems to have some kind of TLS certificate configuration which prevents modern browsers from its web UI. So we're going to run an old version of Firefox inside a Podman container instead:
<pre>
podman run --name firefox -it -e DISPLAY --net=host -v $XAUTHORITY:/root/.Xauthority -v /tmp/.X11-unix:/tmp/.X11-unix debian:9-slim bash
sed -i 's/deb\.debian\.org/archive.debian.org/' /etc/apt/sources.list
sed -i 's/security\.debian\.org/archive.debian.org/' /etc/apt/sources.list
sed -i '/stretch-updates/d' /etc/apt/sources.list
apt update
apt install firefox-esr
firefox
</pre>
Next, follow the instructions here to set up a SOCKS proxy: [[How to SSH#SOCKS proxy]]

Now visit https://carbonated-water-ipmi.csclub.uwaterloo.ca from Firefox, login using the IPMI credentials, and download the JNLP file. Copy it from the Podman container to your computer (replace "viewer.jnlp" with the full file name):
<pre>
podman cp firefox:/root/Downloads/viewer.jnlp launch.jnlp
</pre>
Follow the same steps as done for riboflavin to edit the JDK settings and JNLP file. In addition, there are a few more settings which we need to tweak:
<ul>
<li>Run <code>jre1.8.0_391/bin/ControlPanel</code>, go to the Advanced tab, scroll down and check "TLS 1.0" and "TLS 1.1"</li>
<li>We also need to disable OCSP. In the same window, set "Check for signed code certificate revocation using" to "Certificate Revocation Lists (CRLs)" and set "Check for TLS certificate revocation using" to "Certificate Revocation Lists (CRLs)" (see [https://www.kunxi.org/2015/01/bypass-the-certpathvalidatorexception-caused-by-malformed-ocsp-response/ here] for the reference).</li>
</ul>
[[File:java-control-panel-advanced.png]]
Now you can launch the JNLP file as usual.

= Supermicro =
== ginkgo ==
To access the virtual console on ginkgo, the steps are the same as those for riboflavin, with the following changes:
<ul>
<li>In the launch.jnlp file, in the root <code><jnlp></code> tag, change the value of the <code>codebase</code> attribute from <code>https<nowiki/>://ginkgo-ipmi.csclub.uwaterloo.ca:443</code> to <code>https<nowiki/>://localhost:5443</code>. Next, in the first <code><argument></code> element under <code><application-desc></code>, replace <code>ginkgo-ipmi.csclub.uwaterloo.ca</code> with <code>localhost</code>. These are the only changes which you should make to this file (unless you are already on the campus network, in which case you do not need to modify this file at all).</li>
<li>Run <code>jre1.8.0_391/bin/ControlPanel</code>, go to the Security tab, click "Edit Site List", and add <code>https<nowiki/>://ginkgo-ipmi.csclub.uwaterloo.ca</code> as an exception.
</ul>

2023-12-12T06:07:51Z

Merenber:

= Guide to IPMI (IPMI 101) =

IPMI is a necessary evil. Let’s learn to make the best of it.

== Setting up IPMI ==

# Install ipmitool

<pre># apt-get install ipmitool</pre>
<ol start="2" style="list-style-type: decimal;">
<li>Load IPMI modules (they are included in most upstream kernels)</li></ol>

You may also need a kernel module specific to your motherboard’s manufacture as some BMC/LOMs do not conform to IPMI spec and thus need a translation layer.

<pre># modprobe ipmi_*</pre>
<ol start="3" style="list-style-type: decimal;">
<li>Locally connect to the <code>/dev/ipmi</code> interface</li></ol>

<pre># ipmitool shell
> help
> mc info</pre>
== Securing IPMI ==

Note that root on the machine is root on the BMC and vice versa.

# User administration

(re)set the password, rename the admin account to root and delete any extra users as they can have surprising privilege. You may have to use the BMC’s web interface delete accounts.

<pre># ipmitool shell
> user list 1
ID Name ...
2 ADMIN ...
> user set password 2
User id 2: *******
User id 2: *******
> user set username 2 root
> user disable $other_user_ids</pre>
<ol start="2" style="list-style-type: decimal;">
<li>Disable NULL password and cipher suite 0</li></ol>

Note that the $channel is usually 0 but can range from 0-10 and there can be multiple NICs and so multiple channels to fix.

<pre># ipmitool shell
> lan print $channel
> lan set $channel auth ADMIN MD5
> lan set $channel auth CALLBACK MD5
> lan set $channel auth USER MD5
> lan set $channel auth OPERATOR MD5
> lan set $channel cipher_privs XXXaXXXXXXXXXXX
> lan print $channel</pre>
== Configuring networking ==

Note once again that there are sometimes multiple channels, to find the correct channel it is helpful to use either trial and error and/or an ARP scanner to find the correct MAC address. Usually the channel is 0 but I have seen 1, 8 and 17. Especially when there are multiple NICs.

<pre># ipmitool shell
> lan print $channel
> lan set $channel ipsrc static
> lan set $channel ipaddr 10.15.134.?
> lan set $channel defgw ipaddr 10.15.134.1
> lan set $channel netmask 255.255.255.0
// if you have vlan tagging enabled on the switch port, useful for a shared NIC
> lan set $channel vlan id 520</pre>
== Configuring Serial over LAN ==

To enable serial over LAN you need to ensure that it is enabled in your BIOS or EFI setup utility and further note the baud rate. 115200 is used as an example below. Note that GRUB is the only boot loader that takes input via serial properly, in my experience. Syslinux failed horribly on corn-syrup.

Paste the following into /etc/default/grub.d/99-csclub.cfg:

<pre>
GRUB_CMDLINE_LINUX="console=tty1 console=ttyS1,115200n8"
GRUB_TERMINAL_INPUT="console serial"
GRUB_TERMINAL_OUTPUT="console serial"
GRUB_SERIAL_COMMAND="serial --speed=115200 --unit=1 --word=8 --parity=no --stop=1"
</pre>
and then run:

<pre>// on debian based distros
// Yay, Debian magic :\
# update-grub
// on upstream packages (Arch, Fedora, etc.)
# grub-mkconfig -o /boot/grub/grub.cfg

# reboot</pre>

= iDRAC =
== riboflavin ==
riboflavin is using iDRAC 6. The web console can be viewed from https://riboflavin-ipmi.csclub.uwaterloo.ca; if you are not on campus, you can use a [[How_to_SSH#SOCKS_proxy|SOCKS proxy]]. Unfortunately, the virtual console uses Java Web Start, which is now deprecated. Here's a workaround which you can use instead.

From the web UI, go to the "Console/Media" tab and click the "Launch virtual console" button. This will download a file whose name starts with "viewer.jnlp". Now go to https://www.java.com and download JRE 8; any later version will not have support for JWS (note that OpenJDK will not work; JWS was a proprietary framework from Sun/Oracle). Unpack the tarball, open jre1.8.0_391/lib/security/java.security in a text editor, and comment out the following properties (note that each property spans multiple lines):
<ul>
<li>jdk.certpath.disabledAlgorithms</li>
<li>jdk.jar.disabledAlgorithms</li>
<li>jdk.tls.disabledAlgorithms</li>
</ul>

If you are off-campus, you will need to setup some proxying so that the Java application can access ports 443 and 5900 on riboflavin-ipmi. In the example below, I am using caffeine as a jump host, but any machine on campus should do:
<pre>
ssh -L 5443:localhost:5443 -L 5900:localhost:5900 caffeine.csclub.uwaterloo.ca
</pre>

Now on caffeine, open a tmux/screen session, and run the following commands in two different panes:
<ul>
<li><code>socat TCP-LISTEN:5443,fork TCP:riboflavin-ipmi:443</code></li>
<li><code>socat TCP-LISTEN:5900,fork TCP:riboflavin-ipmi:5900</code></li>
</ul>

Back on your personal machine, open the viewer.jnlp file in a text editor and perform the following:
<ol>
<li>Replace all instances of <code>riboflavin-ipmi.csclub.uwaterloo.ca:443</code> with <code>localhost:5443</code></li>
<li>Under the <code>application-desc</code> element, the first <code>argument</code> child element should say <code>ip=riboflavin-ipmi</code>. Replace this with <code>ip=localhost</code></li>.
<li>Under the <code>application-desc</code> element, there are child <code>argument</code>elements for <code>user</code> and <code>passwd</code>. For some reason these are set to numbers; set these to the username and password for IPMI (username should be <code>root</code>).</li>
</ol>

Now run:
<pre>
jre1.8.0_391/bin/javaws viewer.jnlp
</pre>

If all goes well, the virtual console should eventually appear:
[[File:Riboflavin-idrac-virtual-console.png|1000px]]

= Supermicro =
== ginkgo ==
To access the virtual console on ginkgo, the steps are the same as those for riboflavin, with the following changes:
<ul>
<li>In the launch.jnlp file, in the first <code><argument></code> element under <code><application-desc></code>, replace <code>ginkgo-ipmi.csclub.uwaterloo.ca</code> with <code>localhost</code>. This is the only change which you should make to this file (unless you are already on the campus network, in which case you do not need to modify this file at all).</li>
<li>Run <code>jre1.8.0_391/bin/ControlPanel</code>, go to the Security tab, click "Edit Site List", and add <code>https://ginkgo-ipmi.csclub.uwaterloo.ca</code> as an exception.
</ul>

IPMI101

2023-12-12T06:06:53Z

Merenber:

= Guide to IPMI (IPMI 101) =

IPMI is a necessary evil. Let’s learn to make the best of it.

== Setting up IPMI ==

# Install ipmitool

<pre># apt-get install ipmitool</pre>
<ol start="2" style="list-style-type: decimal;">
<li>Load IPMI modules (they are included in most upstream kernels)</li></ol>

You may also need a kernel module specific to your motherboard’s manufacture as some BMC/LOMs do not conform to IPMI spec and thus need a translation layer.

<pre># modprobe ipmi_*</pre>
<ol start="3" style="list-style-type: decimal;">
<li>Locally connect to the <code>/dev/ipmi</code> interface</li></ol>

<pre># ipmitool shell
> help
> mc info</pre>
== Securing IPMI ==

Note that root on the machine is root on the BMC and vice versa.

# User administration

(re)set the password, rename the admin account to root and delete any extra users as they can have surprising privilege. You may have to use the BMC’s web interface delete accounts.

<pre># ipmitool shell
> user list 1
ID Name ...
2 ADMIN ...
> user set password 2
User id 2: *******
User id 2: *******
> user set username 2 root
> user disable $other_user_ids</pre>
<ol start="2" style="list-style-type: decimal;">
<li>Disable NULL password and cipher suite 0</li></ol>

Note that the $channel is usually 0 but can range from 0-10 and there can be multiple NICs and so multiple channels to fix.

<pre># ipmitool shell
> lan print $channel
> lan set $channel auth ADMIN MD5
> lan set $channel auth CALLBACK MD5
> lan set $channel auth USER MD5
> lan set $channel auth OPERATOR MD5
> lan set $channel cipher_privs XXXaXXXXXXXXXXX
> lan print $channel</pre>
== Configuring networking ==

Note once again that there are sometimes multiple channels, to find the correct channel it is helpful to use either trial and error and/or an ARP scanner to find the correct MAC address. Usually the channel is 0 but I have seen 1, 8 and 17. Especially when there are multiple NICs.

<pre># ipmitool shell
> lan print $channel
> lan set $channel ipsrc static
> lan set $channel ipaddr 10.15.134.?
> lan set $channel defgw ipaddr 10.15.134.1
> lan set $channel netmask 255.255.255.0
// if you have vlan tagging enabled on the switch port, useful for a shared NIC
> lan set $channel vlan id 520</pre>
== Configuring Serial over LAN ==

To enable serial over LAN you need to ensure that it is enabled in your BIOS or EFI setup utility and further note the baud rate. 115200 is used as an example below. Note that GRUB is the only boot loader that takes input via serial properly, in my experience. Syslinux failed horribly on corn-syrup.

Paste the following into /etc/default/grub.d/99-csclub.cfg:

<pre>
GRUB_CMDLINE_LINUX="console=tty1 console=ttyS1,115200n8"
GRUB_TERMINAL_INPUT="console serial"
GRUB_TERMINAL_OUTPUT="console serial"
GRUB_SERIAL_COMMAND="serial --speed=115200 --unit=1 --word=8 --parity=no --stop=1"
</pre>
and then run:

<pre>// on debian based distros
// Yay, Debian magic :\
# update-grub
// on upstream packages (Arch, Fedora, etc.)
# grub-mkconfig -o /boot/grub/grub.cfg

# reboot</pre>

= iDRAC =
== riboflavin ==
riboflavin is using iDRAC 6. The web console can be viewed from https://riboflavin-ipmi.csclub.uwaterloo.ca; if you are not on campus, you can use a [[How_to_SSH#SOCKS_proxy|SOCKS proxy]]. Unfortunately, the virtual console uses Java Web Start, which is now deprecated. Here's a workaround which you can use instead.

From the web UI, go to the "Console/Media" tab and click the "Launch virtual console" button. This will download a file whose name starts with "viewer.jnlp". Now go to https://www.java.com and download JRE 8; any later version will not have support for JWS (note that OpenJDK will not work; JWS was a proprietary framework from Sun/Oracle). Unpack the tarball, open jre1.8.0_391/lib/security/java.security in a text editor, and comment out the following properties (note that each property spans multiple lines):
<ul>
<li>jdk.certpath.disabledAlgorithms</li>
<li>jdk.jar.disabledAlgorithms</li>
<li>jdk.tls.disabledAlgorithms</li>
</ul>

If you are off-campus, you will need to setup some proxying so that the Java application can access ports 443 and 5900 on riboflavin-ipmi. In the example below, I am using caffeine as a jump host, but any machine on campus should do:
<pre>
ssh -L 5443:localhost:5443 -L 5900:localhost:5900 caffeine.csclub.uwaterloo.ca
</pre>

Now on caffeine, open a tmux/screen session, and run the following commands in two different panes:
<ul>
<li><code>socat TCP-LISTEN:5443,fork TCP:riboflavin-ipmi:443</code></li>
<li><code>socat TCP-LISTEN:5900,fork TCP:riboflavin-ipmi:5900</code></li>
</ul>

Back on your personal machine, open the viewer.jnlp file in a text editor and perform the following:
<ol>
<li>Replace all instances of <code>riboflavin-ipmi.csclub.uwaterloo.ca:443</code> with <code>localhost:5443</code></li>
<li>Under the <code>application-desc</code> element, the first <code>argument</code> child element should say <code>ip=riboflavin-ipmi</code>. Replace this with <code>ip=localhost</code></li>.
<li>Under the <code>application-desc</code> element, there are child <code>argument</code>elements for <code>user</code> and <code>passwd</code>. For some reason these are set to numbers; set these to the username and password for IPMI (username should be <code>root</code>).</li>
</ol>

Now run:
<pre>
jre1.8.0_391/bin/javaws viewer.jnlp
</pre>

If all goes well, the virtual console should eventually appear:
[[File:Riboflavin-idrac-virtual-console.png|1000px]]

== Supermicro ==
=== ginkgo ===
To access the virtual console on ginkgo, the steps are the same as those for riboflavin, with the following changes:
<ul>
<li>In the launch.jnlp file, in the first <code><argument></code> element under <code><application-desc></code>, replace <code>ginkgo-ipmi.csclub.uwaterloo.ca</code> with <code>localhost</code>. This is the only change which you should make to this file (unless you are already on the campus network, in which case you do not need to modify this file at all).</li>
<li>Run <code>jre1.8.0_391/bin/ControlPanel</code>, go to the Security tab, click "Edit Site List", and add <code>https://ginkgo-ipmi.csclub.uwaterloo.ca</code> as an exception.
</ul>

IPMI101

2023-12-08T06:42:40Z

Merenber:

File:Riboflavin-idrac-virtual-console.png

2023-12-08T04:58:51Z

2023-11-25T16:52:52Z

Merenber: /* Guides */

This is the Wiki of the [[Computer Science Club]]. Feel free to start adding pages and information.

[[Special:AllPages]]

== Member/Club Rep Documentation ==
To access our Linux machines, see [[How to SSH]] and select one of the general-use machines from [[Machine List#General-Use Servers]].

To host a website, see [[Web Hosting]]. If you are trying to host websites for clubs, see [[Club Hosting]].

To use our VPS services (similar to Linode and Amazon EC2), see [https://docs.cloud.csclub.uwaterloo.ca/ CSC Cloud Documentation]. Note that you'll need to activate your account on one of CSC's machines before using the management panel.

To view instruction on playing music at the office, see [[Music]].

To use our Nextcloud instance (similar to Google Drive and Dropbox), go to [https://files.csclub.uwaterloo.ca CSC Files].

=== Guides ===
<div style="-webkit-column-count:3; -moz-column-count:3; column-count:3;">
* [[New Member Guide]]
* [[Club Hosting]]
* [[Web Hosting]]
* [[Git Hosting]]
* [[How to IRC]]
* [[How to SSH]]
* [[MySQL]]
* [[PostgreSQL]]
* [https://docs.cloud.csclub.uwaterloo.ca/ CSC Cloud Documentation]
</div>

=== News and Events ===
<div style="-webkit-column-count:3; -moz-column-count:3; column-count:3;">
* [[Meetings]]
* [[Talks]]
* [[Projects]]
</div>

== Committees Documentation ==
=== Club Operation ===
<div style="-webkit-column-count:3; -moz-column-count:3; column-count:3;">
* [[Budget Guide]]
* [[ceo]]
* [[Exec Manual]]
* [[MEF Guide]]
* [[Office Policies]]
* [[Office Staff]]
* [[Sysadmin Guide]]
* [[How to (Extra) Ban Someone]]
* [[SCS Guide]]
* [[Kerberos |Password Reset]]
* [[Keys and Fobs]]

* [[Talks Guide]]
</div>

=== Hardware Infrastructure (the bare metals) ===
<div style="-webkit-column-count:3; -moz-column-count:3; column-count:3;">
* [[Disk Drive RMA Process]]
* [[Machine List]]
* [[IPMI101]]
* [[New NetApp]]
* [[Switches]]
</div>

=== Software Infrastructure ===
<div style="-webkit-column-count:3; -moz-column-count:3; column-count:3;">
* [[ADFS]]
* [[Backups]]
* [[DNS]]
* [[Debian Repository]]
* [[Firewall]]
* [[Kerberos]]
* [[Keycloak]]
* [[KVM]]
* [[LDAP]]
* [[Network]]
* [[New CSC Machine]]
* [[Observability]]
* [[OID Assignment]]
* [[Podman]]
* [[Scratch]]
* [[SNMP]]
* [[SSL]]
* [[Syscom Todo]]
* [[Systemd-nspawn]]
* [[Two-Factor Authentication]]
* [[UID/GID Assignment]]
</div>

=== Services ===
<div style="-webkit-column-count:3; -moz-column-count:3; column-count:3;">
* [[Application List]]
* [[BigBlueButton]]
* [[Mail]]
* [[Mailing Lists]]
* [[Mirror]]
* [[Music]]
* [[Nextcloud]]
* [[Printing]]
* [[Pulseaudio]]
* [[Webmail]]
</div>

=== CSC Cloud ===
<div style="-webkit-column-count:3; -moz-column-count:3; column-count:3;">
* [[Ceph]]
* [[Cloud Networking]]
* [[CloudStack]]
* [[CloudStack Templates]]
* [[Kubernetes]]
</div>

== Miscellaneous ==
<div style="-webkit-column-count:3; -moz-column-count:3; column-count:3;">
* [[Acronyms]]
* [[Budget]]
* [[Executive]]
* [[Past Executive]]
* [[History]]
</div>

== Historical ==
<div style="-webkit-column-count:3; -moz-column-count:3; column-count:3;">
* [[Robot Arm]]
* [[Webcams]]
* [[Website]]
* [[Digital Cutter]]
* [[Electronics]]
* [[NetApp]]
* [[Frosh]]
* [[Virtualization (LXC Containers)]]
* [[Serial Connections]]
* [[Library]]
* [[MEF Proposals]]
* [[Proposed Constitution Changes]]
* [[NFS/Kerberos]]
* [[Hardware]]
* [[Imapd Guide]]
__NOTOC__

PostgreSQL

2023-11-25T16:22:46Z

Merenber: /* Backups */

== For members ==
PostgreSQL is available as a service for members on caffeine. Just run <code>ceo postgresql create</code> to create a new database for your account. As of this writing, club reps cannot create PostgreSQL databases for their clubs via ceo, so they will need to send an email to syscom instead.

== For syscom ==
We are also running a Postgres database on coffee, which is not available to members. Any software installed by syscom should use this database instead of the one on caffeine.

=== Creating a database manually on caffeine ===
See [https://git.csclub.uwaterloo.ca/public/pyceo/src/commit/392ec153d0a1a9f4068a5ba3c4e4ecb2279ebab4/ceod/db/PostgreSQLService.py#L58 how ceo does it].

=== Upgrades ===
Upgrading Postgres is more difficult than upgrading MySQL; when you upgrade the Debian version on a machine, a newer version of Postgres will be installed but the old version will remain and the data will not be migrated. You are responsible for manually upgrading the database yourself on all machines where Postgres is installed (currently, just coffee and caffeine).

Here's the Debian-specific way to do it (steps adapted from [https://www.pontikis.net/blog/update-postgres-major-version-in-debian here]). In the example below, we will assume that we are upgrading from Postgres 13 to 15.
<ol>
<li>
First, take a full backup of the database. DO NOT SKIP THIS STEP.
<pre>
pg_dumpall | xz -T0 > dump.sql.xz
</pre>
</li>
<li>
Drop the new database, which should be empty at this point. Make sure that you are not dropping the old database instead! You can run <code>pg_lsclusters</code> to see which database versions are present.
<pre>
# Make sure that this is the NEW version, not the old version!
pg_dropcluster --stop 15 main
</pre>
</li>
<li>
Upgrade the cluster:
<pre>
pg_upgradecluster -v 15 13 main
</pre>
</li>
<li>
Run psql and make sure that the databases are present:
<pre>
su - postgres -c psql
\l
\q
</pre>
</li>
<li>
Once we are sure that everything is working, drop the old database:
<pre>
# Make sure that this is the OLD version, not the new version!
pg_dropcluster --stop 13
</pre>
</li>
<li>
It is now safe to purge the old postgres package:
<pre>
apt purge postgresql-13
</pre>
</li>
</ol>

=== Backups ===
We use [https://pgbackrest.org pgBackRest] for Postgres backups. It has already been installed on coffee and caffeine.

==== Installation ====
In the example below, we will be installing pgbackrest on coffee, and using corn-syrup to store the backups (via SSH).

The pgbackrest package in bookworm is too old and doesn't support SFTP, so we're going to download the packages we need from trixie instead (starting from trixie and higher, this should no longer be necessary):
<pre>
# On coffee
wget http://mirror.csclub.uwaterloo.ca/debian/pool/main/p/pgbackrest/pgbackrest_2.48-1_amd64.deb
wget http://mirror.csclub.uwaterloo.ca/debian/pool/main/libz/libzstd/libzstd1_1.5.5+dfsg2-2_amd64.deb
apt install ./pgbackrest_2.48-1_amd64.deb ./libzstd1_1.5.5+dfsg2-2_amd64.deb
</pre>
Switch to the postgres user and create a new SSH key:
<pre>
su - postgres
ssh-keygen -t ed25519
</pre>
Login to corn-syrup, switch to the syscom user, and paste the public key you created earlier into ~/.ssh/authorized_keys:
<pre>
restrict ssh-ed25519 AAAAC3Nza... postgres@coffee
</pre>
Create a folder to store the backups:
<pre>
mkdir ~/backups/coffee/pgbackrest
</pre>
Next, on coffee, paste something like the following into /etc/pgbackrest.conf. Make sure to adjust repo1-path and pg1-path.
<pre>
[global]
repo1-retention-full=2
repo1-retention-diff=4
repo1-bundle=y
repo1-type=sftp
repo1-sftp-host=corn-syrup
repo1-sftp-host-user=syscom
repo1-path=/users/syscom/backups/coffee/pgbackrest
repo1-sftp-private-key-file=/var/lib/postgresql/.ssh/id_ed25519
repo1-sftp-public-key-file=/var/lib/postgresql/.ssh/id_ed25519.pub
repo1-sftp-host-key-hash-type=sha256
repo1-sftp-host-key-check-type=none
start-fast=y
log-level-console=info
process-max=4
compress-type=lz4

[main]
pg1-path=/var/lib/postgresql/15/main
</pre>
The config above will keep two full backups and at least four differential backups. See https://pgbackrest.org/user-guide.html#retention for more details.

Next, open /etc/postgresql/15/main/postgresql.conf and add/edit the following lines:
<pre>
archive_mode = on
archive_command = 'pgbackrest --stanza=main archive-push %p'
</pre>
See https://pgbackrest.org/user-guide.html#quickstart/configure-archiving for more details.

Next, restart Postgres:
<pre>
systemctl restart postgresql@15-main
</pre>

Switch to the postgres user, create the main stanza, and run the first backup:
<pre>
su - postgres
pgbackrest --stanza=main stanza-create
pgbackrest --stanza=main check
pgbackrest --stanza=main backup --type=full
</pre>

==== Upgrades ====
Normally, whenever you upgrade Postgres, you have to manually edit /etc/pgbackrest.conf and run the "stanza-upgrade" command. To make this easier for future sysadmins, I wrote a wrapper script around pgbackrest which does this automatically if it detects that Postgres was upgraded. Paste the following into /var/lib/postgresql/bin/pgbackrest-wrapper.sh and make it executable:
<pre>
#!/bin/bash

set -ex
if [ $USER != postgres ]; then
echo "This script should run as the postgres user" >&2
exit 1
fi
# Use the full path to ls to avoid bash aliases
mapfile -t pg_versions < <(/bin/ls -1 /var/lib/postgresql | grep -P '^\d+$')
if [ ${#pg_versions[@]} -ne 1 ]; then
echo "Expected to find 1 Postgres version, found ${#pg_versions[@]} instead: ${pg_versions[*]}" >&2
exit 1
fi
pg_ver=${pg_versions[0]}
mapfile -t pgbr_versions < <(grep -oP '/var/lib/postgresql/\K(\d+)' /etc/pgbackrest.conf)
if [ ${#pgbr_versions[@]} -ne 1 ]; then
echo "Expected to find 1 pgBackRest folder, found ${#pgbr_versions[@]} instead: ${pgbr_versions[*]}" >&2
exit 1
fi
pgbr_ver=${pgbr_versions[0]}
if [ $pg_ver -eq $pgbr_ver ]; then
# pgbackrest.conf is up to date, so just run the backup normally
pgbackrest "$@"
exit 0
elif [ $pg_ver -lt $pgbr_ver ]; then
echo "pgBackRest does not support downgrades - you will have to fix this manually" >&2
exit 1
fi
# sed -i needs to create a temporary file, and the postgres user doesn't have
# write permissions on /etc, so write to a temporary file first
sed "s,/var/lib/postgresql/$pgbr_ver,/var/lib/postgresql/$pg_ver," /etc/pgbackrest.conf > /tmp/pgbackrest.conf
cp /tmp/pgbackrest.conf /etc/pgbackrest.conf
rm /tmp/pgbackrest.conf
pgbackrest --stanza=main stanza-upgrade
pgbackrest --stanza=main check
# Run the backup
pgbackrest "$@"
</pre>
Now we can just pass pgbackrest parameters directly to this script, e.g. <code>pgbackrest-wrapper.sh --stanza=main backup</code>.

==== Cron ====
We want backups to be taken periodically. Paste the following into e.g. /etc/cron.d/postgres_backup (this file must be owned by root):
<pre>
MAILTO=root@csclub.uwaterloo.ca

# Full back up at 00:15 every Sunday and Wednesday
15 0 * * 0,3 postgres chronic ~/bin/pgbackrest-wrapper.sh --stanza=main backup --type=full
# Differential backup at 00:30 every day
30 0 * * * postgres chronic ~/bin/pgbackrest-wrapper.sh --stanza=main backup --type=diff
# Incremental backup at the 45th minute of every hour
45 * * * * postgres chronic ~/bin/pgbackrest-wrapper.sh --stanza=main backup --type=incr
</pre>

==== Restore ====
Suppose we want to restore the latest backup, and the installed Postgres is 15. First, make sure that you actually have at least one backup present for this version:
<pre>
su -c postgres -c 'pgbackrest --stanza=main info'
</pre>
Next, stop the database and delete all of the files:
<pre>
systemctl stop postgresql@15-main
rm -rf /var/lib/postgresql/15/main/*
</pre>
Now switch to the postgres user and run the "restore" command:
<pre>
su - postgres
pgbackrest --stanza=main restore
</pre>
If you start Postgres, everything should be in a working state:
<pre>
systemctl start postgresql@15-main
</pre>

If you want to restore a backup which is not the latest version, pass the <code>--set</code> argument to pgbackrest. See https://pgbackrest.org/user-guide.html#restore for more details.

PostgreSQL

2023-11-25T16:13:26Z

Merenber: /* Upgrades */

== For members ==
PostgreSQL is available as a service for members on caffeine. Just run <code>ceo postgresql create</code> to create a new database for your account. As of this writing, club reps cannot create PostgreSQL databases for their clubs via ceo, so they will need to send an email to syscom instead.

== For syscom ==
We are also running a Postgres database on coffee, which is not available to members. Any software installed by syscom should use this database instead of the one on caffeine.

=== Creating a database manually on caffeine ===
See [https://git.csclub.uwaterloo.ca/public/pyceo/src/commit/392ec153d0a1a9f4068a5ba3c4e4ecb2279ebab4/ceod/db/PostgreSQLService.py#L58 how ceo does it].

=== Upgrades ===
Upgrading Postgres is more difficult than upgrading MySQL; when you upgrade the Debian version on a machine, a newer version of Postgres will be installed but the old version will remain and the data will not be migrated. You are responsible for manually upgrading the database yourself on all machines where Postgres is installed (currently, just coffee and caffeine).

Here's the Debian-specific way to do it (steps adapted from [https://www.pontikis.net/blog/update-postgres-major-version-in-debian here]). In the example below, we will assume that we are upgrading from Postgres 13 to 15.
<ol>
<li>
First, take a full backup of the database. DO NOT SKIP THIS STEP.
<pre>
pg_dumpall | xz -T0 > dump.sql.xz
</pre>
</li>
<li>
Drop the new database, which should be empty at this point. Make sure that you are not dropping the old database instead! You can run <code>pg_lsclusters</code> to see which database versions are present.
<pre>
# Make sure that this is the NEW version, not the old version!
pg_dropcluster --stop 15 main
</pre>
</li>
<li>
Upgrade the cluster:
<pre>
pg_upgradecluster -v 15 13 main
</pre>
</li>
<li>
Run psql and make sure that the databases are present:
<pre>
su - postgres -c psql
\l
\q
</pre>
</li>
<li>
Once we are sure that everything is working, drop the old database:
<pre>
# Make sure that this is the OLD version, not the new version!
pg_dropcluster --stop 13
</pre>
</li>
<li>
It is now safe to purge the old postgres package:
<pre>
apt purge postgresql-13
</pre>
</li>
</ol>

=== Backups ===
We use [https://pgbackrest.org pgBackRest] for Postgres backups.

==== Installation ====
In the example below, we will be installing pgbackrest on coffee, and using corn-syrup to store the backups (via SSH).

The pgbackrest package in bookworm is too old and doesn't support SFTP, so we're going to download the packages we need from trixie instead (starting from trixie and higher, this should no longer be necessary):
<pre>
# On coffee
wget http://mirror.csclub.uwaterloo.ca/debian/pool/main/p/pgbackrest/pgbackrest_2.48-1_amd64.deb
wget http://mirror.csclub.uwaterloo.ca/debian/pool/main/libz/libzstd/libzstd1_1.5.5+dfsg2-2_amd64.deb
apt install ./pgbackrest_2.48-1_amd64.deb ./libzstd1_1.5.5+dfsg2-2_amd64.deb
</pre>
Switch to the postgres user and create a new SSH key:
<pre>
su - postgres
ssh-keygen -t ed25519
</pre>
Login to corn-syrup, switch to the syscom user, and paste the public key you created earlier into ~/.ssh/authorized_keys:
<pre>
restrict ssh-ed25519 AAAAC3Nza... postgres@coffee
</pre>
Create a folder to store the backups:
<pre>
mkdir ~/backups/coffee/pgbackrest
</pre>
Next, on coffee, paste something like the following into /etc/pgbackrest.conf. Make sure to adjust repo1-path and pg1-path.
<pre>
[global]
repo1-retention-full=2
repo1-retention-diff=4
repo1-bundle=y
repo1-type=sftp
repo1-sftp-host=corn-syrup
repo1-sftp-host-user=syscom
repo1-path=/users/syscom/backups/coffee/pgbackrest
repo1-sftp-private-key-file=/var/lib/postgresql/.ssh/id_ed25519
repo1-sftp-public-key-file=/var/lib/postgresql/.ssh/id_ed25519.pub
repo1-sftp-host-key-hash-type=sha256
repo1-sftp-host-key-check-type=none
start-fast=y
log-level-console=info
process-max=4
compress-type=lz4

[main]
pg1-path=/var/lib/postgresql/15/main
</pre>
The config above will keep two full backups and at least four differential backups. See https://pgbackrest.org/user-guide.html#retention for more details.

Next, open /etc/postgresql/15/main/postgresql.conf and add/edit the following lines:
<pre>
archive_mode = on
archive_command = 'pgbackrest --stanza=main archive-push %p'
</pre>
See https://pgbackrest.org/user-guide.html#quickstart/configure-archiving for more details.

Next, restart Postgres:
<pre>
systemctl restart postgresql@15-main
</pre>

Switch to the postgres user, create the main stanza, and run the first backup:
<pre>
su - postgres
pgbackrest --stanza=main stanza-create
pgbackrest --stanza=main check
pgbackrest --stanza=main backup --type=full
</pre>

==== Upgrades ====
Normally, whenever you upgrade Postgres, you have to manually edit /etc/pgbackrest.conf and run the "stanza-upgrade" command. To make this easier for future sysadmins, I wrote a wrapper script around pgbackrest which does this automatically if it detects that Postgres was upgraded. Paste the following into /var/lib/postgresql/bin/pgbackrest-wrapper.sh and make it executable:
<pre>
#!/bin/bash

set -ex
if [ $USER != postgres ]; then
echo "This script should run as the postgres user" >&2
exit 1
fi
# Use the full path to ls to avoid bash aliases
mapfile -t pg_versions < <(/bin/ls -1 /var/lib/postgresql | grep -P '^\d+$')
if [ ${#pg_versions[@]} -ne 1 ]; then
echo "Expected to find 1 Postgres version, found ${#pg_versions[@]} instead: ${pg_versions[*]}" >&2
exit 1
fi
pg_ver=${pg_versions[0]}
mapfile -t pgbr_versions < <(grep -oP '/var/lib/postgresql/\K(\d+)' /etc/pgbackrest.conf)
if [ ${#pgbr_versions[@]} -ne 1 ]; then
echo "Expected to find 1 pgBackRest folder, found ${#pgbr_versions[@]} instead: ${pgbr_versions[*]}" >&2
exit 1
fi
pgbr_ver=${pgbr_versions[0]}
if [ $pg_ver -eq $pgbr_ver ]; then
# pgbackrest.conf is up to date, so just run the backup normally
pgbackrest "$@"
exit 0
elif [ $pg_ver -lt $pgbr_ver ]; then
echo "pgBackRest does not support downgrades - you will have to fix this manually" >&2
exit 1
fi
# sed -i needs to create a temporary file, and the postgres user doesn't have
# write permissions on /etc, so write to a temporary file first
sed "s,/var/lib/postgresql/$pgbr_ver,/var/lib/postgresql/$pg_ver," /etc/pgbackrest.conf > /tmp/pgbackrest.conf
cp /tmp/pgbackrest.conf /etc/pgbackrest.conf
rm /tmp/pgbackrest.conf
pgbackrest --stanza=main stanza-upgrade
pgbackrest --stanza=main check
# Run the backup
pgbackrest "$@"
</pre>
Now we can just pass pgbackrest parameters directly to this script, e.g. <code>pgbackrest-wrapper.sh --stanza=main backup</code>.

==== Cron ====
We want backups to be taken periodically. Paste the following into e.g. /etc/cron.d/postgres_backup (this file must be owned by root):
<pre>
MAILTO=root@csclub.uwaterloo.ca

# Full back up at 00:15 every Sunday and Wednesday
15 0 * * 0,3 postgres chronic ~/bin/pgbackrest-wrapper.sh --stanza=main backup --type=full
# Differential backup at 00:30 every day
30 0 * * * postgres chronic ~/bin/pgbackrest-wrapper.sh --stanza=main backup --type=diff
# Incremental backup at the 45th minute of every hour
45 * * * * postgres chronic ~/bin/pgbackrest-wrapper.sh --stanza=main backup --type=incr
</pre>

==== Restore ====
Suppose we want to restore the latest backup, and the installed Postgres is 15. First, make sure that you actually have at least one backup present for this version:
<pre>
su -c postgres -c 'pgbackrest --stanza=main info'
</pre>
Next, stop the database and delete all of the files:
<pre>
systemctl stop postgresql@15-main
rm -rf /var/lib/postgresql/15/main/*
</pre>
Now switch to the postgres user and run the "restore" command:
<pre>
su - postgres
pgbackrest --stanza=main restore
</pre>
If you start Postgres, everything should be in a working state:
<pre>
systemctl start postgresql@15-main
</pre>

If you want to restore a backup which is not the latest version, pass the <code>--set</code> argument to pgbackrest. See https://pgbackrest.org/user-guide.html#restore for more details.

PostgreSQL

2023-11-25T16:09:07Z

Merenber: /* Installation */

== For members ==
PostgreSQL is available as a service for members on caffeine. Just run <code>ceo postgresql create</code> to create a new database for your account. As of this writing, club reps cannot create PostgreSQL databases for their clubs via ceo, so they will need to send an email to syscom instead.

== For syscom ==
We are also running a Postgres database on coffee, which is not available to members. Any software installed by syscom should use this database instead of the one on caffeine.

=== Creating a database manually on caffeine ===
See [https://git.csclub.uwaterloo.ca/public/pyceo/src/commit/392ec153d0a1a9f4068a5ba3c4e4ecb2279ebab4/ceod/db/PostgreSQLService.py#L58 how ceo does it].

=== Upgrades ===
Upgrading Postgres is more difficult than upgrading MySQL; when you upgrade the Debian version on a machine, a newer version of Postgres will be installed but the old version will remain and the data will not be migrated. You are responsible for manually upgrading the database yourself on all machines where Postgres is installed (currently, just coffee and caffeine).

Here's the Debian-specific way to do it (steps adapted from [https://www.pontikis.net/blog/update-postgres-major-version-in-debian here]). In the example below, we will assume that we are upgrading from Postgres 13 to 15.
<ol>
<li>
First, take a full backup of the database. DO NOT SKIP THIS STEP.
<pre>
pg_dumpall | xz -T0 > dump.sql.xz
</pre>
</li>
<li>
Drop the new database, which should be empty at this point. Make sure that you are not dropping the old database instead! You can run <code>pg_lsclusters</code> to see which database versions are present.
<pre>
# Make sure that this is the NEW version, not the old version!
pg_dropcluster --stop 15 main
</pre>
</li>
<li>
Upgrade the cluster:
<pre>
pg_upgradecluster -v 15 13 main
</pre>
</li>
<li>
Run psql and make sure that the databases are present:
<pre>
su - postgres -c psql
\l
\q
</pre>
</li>
<li>
Once we are sure that everything is working, drop the old database:
<pre>
# Make sure that this is the OLD version, not the new version!
pg_dropcluster --stop 13
</pre>
</li>
<li>
It is now safe to purge the old postgres package:
<pre>
apt purge postgresql-13
</pre>
</li>
</ol>

=== Backups ===
We use [https://pgbackrest.org pgBackRest] for Postgres backups.

==== Installation ====
In the example below, we will be installing pgbackrest on coffee, and using corn-syrup to store the backups (via SSH).

The pgbackrest package in bookworm is too old and doesn't support SFTP, so we're going to download the packages we need from trixie instead (starting from trixie and higher, this should no longer be necessary):
<pre>
# On coffee
wget http://mirror.csclub.uwaterloo.ca/debian/pool/main/p/pgbackrest/pgbackrest_2.48-1_amd64.deb
wget http://mirror.csclub.uwaterloo.ca/debian/pool/main/libz/libzstd/libzstd1_1.5.5+dfsg2-2_amd64.deb
apt install ./pgbackrest_2.48-1_amd64.deb ./libzstd1_1.5.5+dfsg2-2_amd64.deb
</pre>
Switch to the postgres user and create a new SSH key:
<pre>
su - postgres
ssh-keygen -t ed25519
</pre>
Login to corn-syrup, switch to the syscom user, and paste the public key you created earlier into ~/.ssh/authorized_keys:
<pre>
restrict ssh-ed25519 AAAAC3Nza... postgres@coffee
</pre>
Create a folder to store the backups:
<pre>
mkdir ~/backups/coffee/pgbackrest
</pre>
Next, on coffee, paste something like the following into /etc/pgbackrest.conf. Make sure to adjust repo1-path and pg1-path.
<pre>
[global]
repo1-retention-full=2
repo1-retention-diff=4
repo1-bundle=y
repo1-type=sftp
repo1-sftp-host=corn-syrup
repo1-sftp-host-user=syscom
repo1-path=/users/syscom/backups/coffee/pgbackrest
repo1-sftp-private-key-file=/var/lib/postgresql/.ssh/id_ed25519
repo1-sftp-public-key-file=/var/lib/postgresql/.ssh/id_ed25519.pub
repo1-sftp-host-key-hash-type=sha256
repo1-sftp-host-key-check-type=none
start-fast=y
log-level-console=info
process-max=4
compress-type=lz4

[main]
pg1-path=/var/lib/postgresql/15/main
</pre>
The config above will keep two full backups and at least four differential backups. See https://pgbackrest.org/user-guide.html#retention for more details.

Next, open /etc/postgresql/15/main/postgresql.conf and add/edit the following lines:
<pre>
archive_mode = on
archive_command = 'pgbackrest --stanza=main archive-push %p'
</pre>
See https://pgbackrest.org/user-guide.html#quickstart/configure-archiving for more details.

Next, restart Postgres:
<pre>
systemctl restart postgresql@15-main
</pre>

Switch to the postgres user, create the main stanza, and run the first backup:
<pre>
su - postgres
pgbackrest --stanza=main stanza-create
pgbackrest --stanza=main check
pgbackrest --stanza=main backup --type=full
</pre>

==== Upgrades ====
Normally, whenever you upgrade Postgres, you have to manually edit /etc/pgbackrest.conf and run the "stanza-upgrade" command. To make this easier for future sysadmins, I wrote a wrapper script around pgbackrest which does this automatically if it detects that Postgres was upgraded. Paste the following into /var/lib/postgresql/bin/pgbackrest-wrapper.sh and make it executable:
<pre>
#!/bin/bash

set -ex
if [ $USER != postgres ]; then
echo "This script should run as the postgres user" >&2
exit 1
fi
# Use the full path to ls to avoid bash aliases
mapfile -t pg_versions < <(/bin/ls -1 /var/lib/postgresql | grep -P '^\d+$')
if [ ${#pg_versions[@]} -ne 1 ]; then
echo "Expected to find 1 Postgres version, found ${#pg_versions[@]} instead: ${pg_versions[*]}" >&2
exit 1
fi
pg_ver=${pg_versions[0]}
mapfile -t pgbr_versions < <(grep -oP '/var/lib/postgresql/\K(\d+)' /etc/pgbackrest.conf)
if [ ${#pgbr_versions[@]} -ne 1 ]; then
echo "Expected to find 1 pgBackRest folder, found ${pgbr_versions[@]} instead: ${pgbr_versions[*]}" >&2
exit 1
fi
pgbr_ver=${pgbr_versions[0]}
if [ $pg_ver -eq $pgbr_ver ]; then
# pgbackrest.conf is up to date, so just run the backup normally
pgbackrest "$@"
exit 0
elif [ $pg_ver -lt $pgbr_ver ]; then
echo "pgBackRest does not support downgrades - you will have to fix this manually" >&2
exit 1
fi
# sed -i needs to create a temporary file, and the postgres user doesn't have
# write permissions on /etc, so write to a temporary file first
sed "s,/var/lib/postgresql/$pgbr_ver,/var/lib/postgresql/$pg_ver," /etc/pgbackrest.conf > /tmp/pgbackrest.conf
cp /tmp/pgbackrest.conf /etc/pgbackrest.conf
rm /tmp/pgbackrest.conf
pgbackrest --stanza=main stanza-upgrade
pgbackrest --stanza=main check
# Run the backup
pgbackrest "$@"
</pre>
Now we can just pass pgbackrest parameters directly to this script, e.g. <code>pgbackrest-wrapper.sh --stanza=main backup</code>.

==== Cron ====
We want backups to be taken periodically. Paste the following into e.g. /etc/cron.d/postgres_backup (this file must be owned by root):
<pre>
MAILTO=root@csclub.uwaterloo.ca

# Full back up at 00:15 every Sunday and Wednesday
15 0 * * 0,3 postgres chronic ~/bin/pgbackrest-wrapper.sh --stanza=main backup --type=full
# Differential backup at 00:30 every day
30 0 * * * postgres chronic ~/bin/pgbackrest-wrapper.sh --stanza=main backup --type=diff
# Incremental backup at the 45th minute of every hour
45 * * * * postgres chronic ~/bin/pgbackrest-wrapper.sh --stanza=main backup --type=incr
</pre>

==== Restore ====
Suppose we want to restore the latest backup, and the installed Postgres is 15. First, make sure that you actually have at least one backup present for this version:
<pre>
su -c postgres -c 'pgbackrest --stanza=main info'
</pre>
Next, stop the database and delete all of the files:
<pre>
systemctl stop postgresql@15-main
rm -rf /var/lib/postgresql/15/main/*
</pre>
Now switch to the postgres user and run the "restore" command:
<pre>
su - postgres
pgbackrest --stanza=main restore
</pre>
If you start Postgres, everything should be in a working state:
<pre>
systemctl start postgresql@15-main
</pre>

If you want to restore a backup which is not the latest version, pass the <code>--set</code> argument to pgbackrest. See https://pgbackrest.org/user-guide.html#restore for more details.

PostgreSQL

2023-11-25T16:08:05Z

Merenber: /* Installation */

== For members ==
PostgreSQL is available as a service for members on caffeine. Just run <code>ceo postgresql create</code> to create a new database for your account. As of this writing, club reps cannot create PostgreSQL databases for their clubs via ceo, so they will need to send an email to syscom instead.

== For syscom ==
We are also running a Postgres database on coffee, which is not available to members. Any software installed by syscom should use this database instead of the one on caffeine.

=== Creating a database manually on caffeine ===
See [https://git.csclub.uwaterloo.ca/public/pyceo/src/commit/392ec153d0a1a9f4068a5ba3c4e4ecb2279ebab4/ceod/db/PostgreSQLService.py#L58 how ceo does it].

=== Upgrades ===
Upgrading Postgres is more difficult than upgrading MySQL; when you upgrade the Debian version on a machine, a newer version of Postgres will be installed but the old version will remain and the data will not be migrated. You are responsible for manually upgrading the database yourself on all machines where Postgres is installed (currently, just coffee and caffeine).

Here's the Debian-specific way to do it (steps adapted from [https://www.pontikis.net/blog/update-postgres-major-version-in-debian here]). In the example below, we will assume that we are upgrading from Postgres 13 to 15.
<ol>
<li>
First, take a full backup of the database. DO NOT SKIP THIS STEP.
<pre>
pg_dumpall | xz -T0 > dump.sql.xz
</pre>
</li>
<li>
Drop the new database, which should be empty at this point. Make sure that you are not dropping the old database instead! You can run <code>pg_lsclusters</code> to see which database versions are present.
<pre>
# Make sure that this is the NEW version, not the old version!
pg_dropcluster --stop 15 main
</pre>
</li>
<li>
Upgrade the cluster:
<pre>
pg_upgradecluster -v 15 13 main
</pre>
</li>
<li>
Run psql and make sure that the databases are present:
<pre>
su - postgres -c psql
\l
\q
</pre>
</li>
<li>
Once we are sure that everything is working, drop the old database:
<pre>
# Make sure that this is the OLD version, not the new version!
pg_dropcluster --stop 13
</pre>
</li>
<li>
It is now safe to purge the old postgres package:
<pre>
apt purge postgresql-13
</pre>
</li>
</ol>

=== Backups ===
We use [https://pgbackrest.org pgBackRest] for Postgres backups.

==== Installation ====
In the example below, we will be installing pgbackrest on coffee, and using corn-syrup to store the backups (via SSH).

The pgbackrest package in bookworm is too old and doesn't support SFTP, so we're going to download the packages we need from trixie instead (starting from trixie and higher, this should no longer be necessary):
<pre>
# On coffee
wget http://mirror.csclub.uwaterloo.ca/debian/pool/main/p/pgbackrest/pgbackrest_2.48-1_amd64.deb
wget http://mirror.csclub.uwaterloo.ca/debian/pool/main/libz/libzstd/libzstd1_1.5.5+dfsg2-2_amd64.deb
apt install ./pgbackrest_2.48-1_amd64.deb ./libzstd1_1.5.5+dfsg2-2_amd64.deb
</pre>
Switch to the postgres user and create a new SSH key:
<pre>
su - postgres
ssh-keygen -t ed25519
</pre>
Login to corn-syrup, switch to the syscom user, and paste the public key you created earlier into ~/.ssh/authorized_keys:
<pre>
restrict ssh-ed25519 AAAAC3Nza... postgres@coffee
</pre>
Create a folder to store the backups:
<pre>
mkdir ~/backups/coffee/pgbackrest
</pre>
Next, on coffee, paste the following into /etc/pgbackrest.conf (adjust repo1-path and pg1-path as appropriate):
<pre>
[global]
repo1-retention-full=2
repo1-retention-diff=4
repo1-bundle=y
repo1-type=sftp
repo1-sftp-host=corn-syrup
repo1-sftp-host-user=syscom
repo1-path=/users/syscom/backups/coffee/pgbackrest
repo1-sftp-private-key-file=/var/lib/postgresql/.ssh/id_ed25519
repo1-sftp-public-key-file=/var/lib/postgresql/.ssh/id_ed25519.pub
repo1-sftp-host-key-hash-type=sha256
repo1-sftp-host-key-check-type=none
start-fast=y
log-level-console=info
process-max=4
compress-type=lz4

[main]
pg1-path=/var/lib/postgresql/15/main
</pre>
The config above will keep two full backups and at least four differential backups. See https://pgbackrest.org/user-guide.html#retention for more details.

Next, open /etc/postgresql/15/main/postgresql.conf and add/edit the following lines:
<pre>
archive_mode = on
archive_command = 'pgbackrest --stanza=main archive-push %p'
</pre>
See https://pgbackrest.org/user-guide.html#quickstart/configure-archiving for more details.

Next, restart Postgres:
<pre>
systemctl restart postgresql@15-main
</pre>

Switch to the postgres user, create the main stanza, and run the first backup:
<pre>
su - postgres
pgbackrest --stanza=main stanza-create
pgbackrest --stanza=main check
pgbackrest --stanza=main backup --type=full
</pre>

==== Upgrades ====
Normally, whenever you upgrade Postgres, you have to manually edit /etc/pgbackrest.conf and run the "stanza-upgrade" command. To make this easier for future sysadmins, I wrote a wrapper script around pgbackrest which does this automatically if it detects that Postgres was upgraded. Paste the following into /var/lib/postgresql/bin/pgbackrest-wrapper.sh and make it executable:
<pre>
#!/bin/bash

set -ex
if [ $USER != postgres ]; then
echo "This script should run as the postgres user" >&2
exit 1
fi
# Use the full path to ls to avoid bash aliases
mapfile -t pg_versions < <(/bin/ls -1 /var/lib/postgresql | grep -P '^\d+$')
if [ ${#pg_versions[@]} -ne 1 ]; then
echo "Expected to find 1 Postgres version, found ${#pg_versions[@]} instead: ${pg_versions[*]}" >&2
exit 1
fi
pg_ver=${pg_versions[0]}
mapfile -t pgbr_versions < <(grep -oP '/var/lib/postgresql/\K(\d+)' /etc/pgbackrest.conf)
if [ ${#pgbr_versions[@]} -ne 1 ]; then
echo "Expected to find 1 pgBackRest folder, found ${pgbr_versions[@]} instead: ${pgbr_versions[*]}" >&2
exit 1
fi
pgbr_ver=${pgbr_versions[0]}
if [ $pg_ver -eq $pgbr_ver ]; then
# pgbackrest.conf is up to date, so just run the backup normally
pgbackrest "$@"
exit 0
elif [ $pg_ver -lt $pgbr_ver ]; then
echo "pgBackRest does not support downgrades - you will have to fix this manually" >&2
exit 1
fi
# sed -i needs to create a temporary file, and the postgres user doesn't have
# write permissions on /etc, so write to a temporary file first
sed "s,/var/lib/postgresql/$pgbr_ver,/var/lib/postgresql/$pg_ver," /etc/pgbackrest.conf > /tmp/pgbackrest.conf
cp /tmp/pgbackrest.conf /etc/pgbackrest.conf
rm /tmp/pgbackrest.conf
pgbackrest --stanza=main stanza-upgrade
pgbackrest --stanza=main check
# Run the backup
pgbackrest "$@"
</pre>
Now we can just pass pgbackrest parameters directly to this script, e.g. <code>pgbackrest-wrapper.sh --stanza=main backup</code>.

==== Cron ====
We want backups to be taken periodically. Paste the following into e.g. /etc/cron.d/postgres_backup (this file must be owned by root):
<pre>
MAILTO=root@csclub.uwaterloo.ca

# Full back up at 00:15 every Sunday and Wednesday
15 0 * * 0,3 postgres chronic ~/bin/pgbackrest-wrapper.sh --stanza=main backup --type=full
# Differential backup at 00:30 every day
30 0 * * * postgres chronic ~/bin/pgbackrest-wrapper.sh --stanza=main backup --type=diff
# Incremental backup at the 45th minute of every hour
45 * * * * postgres chronic ~/bin/pgbackrest-wrapper.sh --stanza=main backup --type=incr
</pre>

==== Restore ====
Suppose we want to restore the latest backup, and the installed Postgres is 15. First, make sure that you actually have at least one backup present for this version:
<pre>
su -c postgres -c 'pgbackrest --stanza=main info'
</pre>
Next, stop the database and delete all of the files:
<pre>
systemctl stop postgresql@15-main
rm -rf /var/lib/postgresql/15/main/*
</pre>
Now switch to the postgres user and run the "restore" command:
<pre>
su - postgres
pgbackrest --stanza=main restore
</pre>
If you start Postgres, everything should be in a working state:
<pre>
systemctl start postgresql@15-main
</pre>

If you want to restore a backup which is not the latest version, pass the <code>--set</code> argument to pgbackrest. See https://pgbackrest.org/user-guide.html#restore for more details.