August 2018 Power Outage Plan

There is a planned power outage in MC from Tuesday, August 21 to Thursday, August 30.

There is also a one-day outage in DC, which will complicate keeping services up during the entire outage.

Impact

All services in MC are affected for Aug. 21-30, and services in DC are affected for two days during that window.

Services in PHY are not affected. PHY hosts only redundant DNS and authentication services; there are no other services (general-use or otherwise) there. PHY is also on a different network than MC and DC.

Timeline

Before Sunday, August 19

Complete plan for outage
Send notifications (and reminders) to csc-general
Take backups of LDAP and Kerberos, and download offsite
Take backups of system passwords, and download offsite

Sunday, August 19

Copy the CSC website to caffeine-dr
Shut down general-use computing services
Shut down csclub.cloud components (they won't really work since not everything is redundant yet)
Transfer computing services to redundant / temporary systems
Revoke access to home directories on aspartame for all machines

Sometime during the outage window

Shut down DC systems before the building outage

After the outage

Begin restoring normal services

Networking

Our network is announced from both MC and DC. No impact to networking is expected when MC goes offline.

DHCP is hosted in MC (on caffeine). It is not strictly required, since our servers use static IPs, but we can move it to DC so it remains available.

Systems

Mirror

CSCF will provide some generator power for mirror in MC.

CSCF is also setting up a second node in DC.

Website

A copy of the CSC website will be hosted on caffeine-dr. All pages not found on the local machine (including member and club sites) will return a 503 Service Unavailable error page.

Sample status page: https://www-dr.csclub.uwaterloo.ca/test
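The fallback behaviour described above (serve local pages, answer 503 for everything else) can be sketched with Python's standard library. This is only an illustration of the idea, not the configuration used on caffeine-dr, which presumably runs a production web server. The class name `OutageHandler`, the helper `make_server`, and the explanatory message are all hypothetical.

```python
from http import HTTPStatus
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class OutageHandler(SimpleHTTPRequestHandler):
    """Serve files from the current directory; turn 404 into 503."""

    def send_error(self, code, message=None, explain=None):
        # SimpleHTTPRequestHandler calls send_error(404) for missing files.
        # During the outage a missing page most likely lives on a host that
        # is powered off, so report it as temporarily unavailable instead.
        if code == HTTPStatus.NOT_FOUND:
            code = HTTPStatus.SERVICE_UNAVAILABLE
            message = "Service Unavailable"
            explain = "This service is offline during the planned power outage."
        super().send_error(code, message, explain)


def make_server(host="127.0.0.1", port=8080):
    """Build (but do not start) a server; host and port are assumptions."""
    return ThreadingHTTPServer((host, port), OutageHandler)
```

Calling `make_server().serve_forever()` from the directory holding the copied site would serve existing files normally and return 503 Service Unavailable for any other path.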

The following IP addresses should be added to caffeine-dr during the outage to serve the error page for other CSC services:

caffeine: 129.97.134.17 / 2620:101:f000:4901:c5c::caff:e12e
git: 129.97.134.49 / 2620:101:f000:4901:c5c:3eb::49
wiki: 129.97.134.44 / 2620:101:f000:4901:c5c:3eb::44
munin: 129.97.134.51 / 2620:101:f000:4901:c5c::51
prometheus: 129.97.134.15 / 2620:101:f000:4901:c5c::15
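As a rough sketch, the addresses above could be turned into `ip addr add` commands with a short script. The interface name `eth0` and the /32 and /128 prefix lengths are assumptions, not part of the plan; adjust them to match the network configuration on caffeine-dr. The script only builds and prints the commands, it does not run them.

```python
import ipaddress

# Addresses copied from the list above.
SERVICE_ADDRESSES = {
    "caffeine": ("129.97.134.17", "2620:101:f000:4901:c5c::caff:e12e"),
    "git": ("129.97.134.49", "2620:101:f000:4901:c5c:3eb::49"),
    "wiki": ("129.97.134.44", "2620:101:f000:4901:c5c:3eb::44"),
    "munin": ("129.97.134.51", "2620:101:f000:4901:c5c::51"),
    "prometheus": ("129.97.134.15", "2620:101:f000:4901:c5c::15"),
}


def ip_commands(interface="eth0"):
    """Return one `ip addr add` argument list per address.

    The interface name and the host-only prefix lengths (/32, /128)
    are assumptions made for this sketch.
    """
    commands = []
    for addresses in SERVICE_ADDRESSES.values():
        for raw in addresses:
            parsed = ipaddress.ip_address(raw)  # raises ValueError on a typo
            prefix = 32 if parsed.version == 4 else 128
            commands.append(
                ["ip", "addr", "add", f"{raw}/{prefix}", "dev", interface]
            )
    return commands


if __name__ == "__main__":
    for command in ip_commands():
        print(" ".join(command))
```

Printing the commands first makes it easy to review them before running anything as root, and the same list with `del` in place of `add` removes the addresses after the outage.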

Mail

Since the outage lasts more than a week, we need to maintain email services throughout. An initial plan by ztseguin and jxpryde:

rsync users' .forward, .procmailrc and .maildir to a local directory on mail, allowing mail to continue as expected

However, this requires:

Users must not reference any scripts, programs, etc. in their .procmailrc file that reference things in their home directory
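One way to find users affected by this requirement is to scan each copied .procmailrc for pipe actions that point into the home directory. The sketch below is a heuristic only: it checks lines starting with `|` (procmail's "pipe to program" action) and would miss other forms such as backtick assignments. The function name and the example home path `/users/alice` are hypothetical.

```python
import re


def find_home_references(procmailrc_text, home_dir):
    """Return (line_number, line) for pipe actions that mention the home dir.

    Matches $HOME, ${HOME}, ~/ and the absolute home path. Heuristic only;
    results are meant for manual review.
    """
    home = re.escape(home_dir.rstrip("/"))
    pattern = re.compile(r"\$HOME\b|\$\{HOME\}|~/|" + home + r"(?=/|\s|$)")
    hits = []
    for number, line in enumerate(procmailrc_text.splitlines(), start=1):
        stripped = line.strip()
        # Only pipe actions run external programs.
        if stripped.startswith("|") and pattern.search(stripped):
            hits.append((number, stripped))
    return hits


if __name__ == "__main__":
    sample = (
        ":0\n"
        "* ^Subject:.*spam\n"
        "| $HOME/bin/filter.sh\n"
        "\n"
        ":0\n"
        "| /usr/bin/spamc\n"
    )
    for number, line in find_home_references(sample, "/users/alice"):
        print(f"line {number}: {line}")
```

Any user whose file produces hits could be contacted before the outage, since those recipes would fail once home directories are unavailable.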

Authentication

Authentication is located in both MC and PHY.

While the MC node is down, the PHY node can continue to answer authentication requests. However, updating membership and changing passwords will not be possible.

We may consider moving auth1 to DC for the outage.

DNS

CSC's DNS service is located in both MC and PHY.

We may consider moving the MC DNS node to DC, but this is not necessary to maintain services during the outage.

NOTE: The MC node is the master node, so we will need to ensure that the SOA record contains a long enough expiry time that the PHY node doesn't stop serving zones.
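The expiry check in the note above comes down to simple arithmetic: the SOA EXPIRE value must exceed the time the secondary cannot reach the master. A minimal sketch follows, assuming the SOA RDATA is available as text in the standard field order (MNAME RNAME SERIAL REFRESH RETRY EXPIRE MINIMUM). The sample record, its timer values, and the two-day safety margin are made up for illustration.

```python
from datetime import datetime, timedelta


def soa_expire_seconds(soa_rdata):
    """Extract EXPIRE (the sixth field) from SOA RDATA text."""
    fields = soa_rdata.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 SOA fields, got {len(fields)}")
    return int(fields[5])


def expire_covers_outage(soa_rdata, outage_start, outage_end,
                         margin=timedelta(days=2)):
    """True if EXPIRE is at least the outage length plus a safety margin.

    The margin (an assumption) allows for the last successful refresh
    happening before the outage starts and for a late restoration.
    """
    needed = (outage_end - outage_start) + margin
    return timedelta(seconds=soa_expire_seconds(soa_rdata)) >= needed


if __name__ == "__main__":
    # Hypothetical record with a 14-day EXPIRE (1209600 seconds).
    sample = "ns1.example. hostmaster.example. 2018081901 3600 600 1209600 3600"
    start = datetime(2018, 8, 21)
    end = datetime(2018, 8, 31)  # end of Thursday, August 30
    print(expire_covers_outage(sample, start, end))
```

With these sample values the outage spans 10 days, so a 14-day EXPIRE passes the check while a 7-day EXPIRE would not.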
