NetApp

From CSCWiki
Revision as of 21:52, 16 December 2015 by Ehashman (talk | contribs) (→‎Terminology: jxpryde nitpicking me)

As of Fall 2013, the CSC has a NetApp FAS3000 series filer, which is capable of hosting network shares. It was donated to us by CSCF. It is also pretty old. Like, Pentium IV old.

Documentation

All the manuals are hosted in ~sysadmin/netapp-docs/

Relevant docs for storage modification are: smg.pdf, sysadmin.pdf

iSCSI documentation is in ontop/bsag.pdf

Background

While the NetApp supports both NFS and CIFS, neither of these export options provides the versatility or the options we want from a network fileshare (for instance, no device authentication is supported). Instead, we have configured the NetApp to export iSCSI block devices to be mounted on aspartame. As a result, aspartame now replaces ginseng as the primary CSC fileserver.

Terminology

  • Filer: the controller unit for the NetApp. Currently psilodump.
  • Disk shelf: where the physical disks live. Can be plugged into a filer or directly into another machine.
  • RAID: "Redundant Array of Independent Disks", used to improve reliability and protect against disk failures.
  • RAID-DP: "Double Parity" RAID, similar to RAID6 in failure tolerance (implemented like RAID4 but with two dedicated parity disks). It can survive up to two simultaneous disk failures in a RAID group without data loss.
  • aggr: An aggregate of disks. This is a list of physical disks, similar to selecting the physical devices used for LVM.
  • vol: A volume consisting of some space on an aggregate. In general, we use the whole aggregate for a volume. RAID level is set at the volume. Similar to an LVM volume group.
  • lun: "Logical Unit Number". The LUN is a device addressed by the SCSI protocol, and looks like a disk to the user. We usually use the whole volume for a single LUN. This is similar to an LVM logical volume.

Common Commands

aggr status -r aggr_name
  Shows aggregate status
disk show -v
  Shows disks, and which filer they are owned by (currently all by psilodump)
storage
  Storage-related subcommands (adapters, shelves, disks)
disk assign
  Assigns orphaned disks to a filer
vol
  Volume management subcommands (vol create, vol size, and vol options are used below)

NetApp Configuration

Should aspartame get totally hosed, or should things stay stable for so long that all the sysadmin folk who set this up have graduated, here is how to access, configure, and completely set up iSCSI on the NetApp and aspartame.

Access

Configuration is possible over SSH or the serial interface, but only from aspartame, which the NetApp is plugged into directly. The NetApp is not visible on 134net at all.

The private IP is 10.15.134.130, only available from aspartame on the interface with IP 10.15.134.1. You may have to remove the default route from the routing table in order to successfully contact the machine with ssh.

Disk information

  • shelf 1
    • 14x136GB 10,000RPM FibreChannel disks
    • Currently disconnected, could be connected to psilodump or directly to another machine.
  • shelf 2
    • 14x136GB 10,000RPM FibreChannel disks
    • Currently assigned to psilodump
  • shelf 3
    • 14x500GB 7,200RPM ATA disks
    • Currently assigned to psilodump
  • shelf 4
    • 14x500GB 7,200RPM ATA disks
    • Currently assigned to psilodump
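
For planning purposes, the usable space of a RAID-DP aggregate can be estimated from the disk counts above. The sketch below is only a rough guide: it assumes a single RAID group with two parity disks, and it also prints the 2/3 rule of thumb used later on this page. The function name estimate_raid_dp is made up for illustration, and real figures will likely be lower once the filer's own overhead is taken out.

```shell
#!/bin/sh
# Rough capacity estimate for a RAID-DP aggregate (illustrative only).
# usage: estimate_raid_dp <number-of-disks> <disk-size-in-GB>
estimate_raid_dp() {
    disks=$1
    size_gb=$2
    raw=$(( disks * size_gb ))             # total raw space
    data=$(( (disks - 2) * size_gb ))      # assumption: two disks hold parity
    rule=$(( raw * 2 / 3 ))                # the 2/3 rule of thumb
    echo "raw=${raw}GB data=${data}GB two_thirds=${rule}GB"
}

# Example: ten of the 136GB FibreChannel disks.
estimate_raid_dp 10 136
```

The two estimates bracket what to expect; the exact figure has to be read from the filer after the aggregate exists.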

Aggregates

  • aggr0
    • Root aggregate volume, in RAID-DP
  • aggr1
    • Music aggregate volume, in RAID-DP
  • aggr2
    • Users aggregate volume, in RAID-DP
  • aggr3
    • Backups volume for CSC videos, in RAID-DP

Volumes

  • /vol/vol0
    • Root volume.
  • /vol/vol1music
    • Music volume. This volume is not accessible via NFS or CIFS. It contains only the iSCSI LUN /vol/vol1music/lun0.
  • /vol/vol2users
    • Users volume. This volume is not accessible via NFS or CIFS. It contains only the iSCSI LUN /vol/vol2users/lun0.
  • /vol/vol3backup
    • Backup volume for videos. This volume is not accessible via NFS or CIFS. It contains only the iSCSI LUN /vol/vol3backup/lun0.

Enabling iSCSI and Auth (one-time setup)

Enable iSCSI and configure default authentication.

options iscsi.enable on
iscsi nodename iqn.1992-08.com.netapp:psilodump.csclub.uwaterloo.ca
iscsi security default -s CHAP -p yoursecurepassword -n psilodump

where yoursecurepassword is replaced by an actual secure password. For iSCSI hosts, the target will be on node iqn.1992-08.com.netapp:psilodump.csclub.uwaterloo.ca with username psilodump and password yoursecurepassword.
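
One way to come up with the password is to generate it on a Linux host. The following is a minimal sketch, not the procedure that was actually used here: gen_chap_secret is a hypothetical helper, and the length of 16 hexadecimal characters is an arbitrary choice, so check the NetApp documentation for the lengths the filer accepts.

```shell
#!/bin/sh
# Print a random 16-character hexadecimal string for use as a CHAP secret.
gen_chap_secret() {
    # 8 random bytes, printed as hex, with od's spaces and newline removed
    head -c 8 /dev/urandom | od -An -tx1 | tr -d ' \n'
    echo
}

secret=$(gen_chap_secret)
# Print the filer command for review; nothing is sent to the NetApp.
echo "iscsi security default -s CHAP -p $secret -n psilodump"
```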

Setting up a new disk aggregate, volume, and LUN

1. Log in to the NetApp. You'll either need access to the physical serial console or to ssh as root to psilodump's private IP (10.15.134.130). Credentials are stored in /users/sysadmin.

2. To get information on the available disks, run the command:

aggr status -r

This command will return three lists: Active aggregates with their assigned disks, spare disks, and disks managed by the partner. An aggregate is roughly equivalent to an LVM volume group: It is a collection of physical disks, possibly across multiple disk shelves and with various RAID levels applied, which may host one or more logical volumes. Do not proceed if there are fewer than three spare disks of each type available. Refer to the NetApp documentation to add more disks or release disks from existing aggregates.

3. Choose a list of disks for your new aggregate. The available space will be approximately 2/3 of the total disk space.

4. Create the aggregate as follows:

aggr create aggrN -t raid_dp -d [disk-list]

where [disk-list] is a space-separated list of identifiers for the disks you wish to use to create the aggregate, as shown in the Device column of aggr status -r (for example, 0a.39 0c.40).

5. Retrieve the aggregate information. You will need to know the available space for the next step.

aggr show_space aggrN

6. Create a volume in the aggregate:

vol create volNfoo -s volume aggrN XXXK

where XXX is the total available space in aggrN. You may need to choose a smaller number due to hidden size constraints and rounding. If you can't seem to find the right size, pick one much smaller, and then use the command

vol size volNfoo +XXX

to grow the volume. This command will tell you how much available space remains, unlike `vol create`, so you don't need to keep guessing.

7. Disable snapshots and access time updates. Neither is needed for exporting an iSCSI LUN.

vol options volNfoo no_atime_update on
vol options volNfoo nosnap on
snap reserve volNfoo 0

8. Create a LUN on your volume:

lun create -s XXXK -t linux /vol/volNfoo/lun0

where XXXK is the amount of available space on the volume, as shown by the command df.

9. Create an iSCSI initiator group and add all of your hosts to it:

igroup create -i -t linux volNfoo_group
igroup add volNfoo_group iqn.1993-08.org.debian:01:123456789
igroup add volNfoo_group iqn.1993-08.org.debian:01:981287231
...

The hosts whose node identifiers are given to the igroup add command will be able to access the iSCSI LUN you created above once it is mapped in the next step.

10. Map the LUN to the iSCSI initiator group:

lun map /vol/volNfoo/lun0 volNfoo_group

You're done! Any host in the initiator group should now be able to access the LUN you've created as a block device.
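
To make steps 4 to 10 easier to repeat, the filer commands can be generated from a few parameters and reviewed before being pasted into the console. This is a sketch only: netapp_setup_cmds and its arguments are hypothetical, the sizes still have to be found as described in steps 5, 6, and 8, and nothing here talks to the filer.

```shell
#!/bin/sh
# Print the filer commands from steps 4-10 for review; runs nothing itself.
# usage: netapp_setup_cmds <N> <name> <vol-size> <lun-size> <initiator-iqn> <disk>...
netapp_setup_cmds() {
    n=$1; name=$2; vol_size=$3; lun_size=$4; iqn=$5
    shift 5                                # remaining arguments are the disks
    aggr="aggr${n}"
    vol="vol${n}${name}"
    echo "aggr create $aggr -t raid_dp -d $*"
    echo "vol create $vol -s volume $aggr $vol_size"
    echo "vol options $vol no_atime_update on"
    echo "vol options $vol nosnap on"
    echo "snap reserve $vol 0"
    echo "lun create -s $lun_size -t linux /vol/$vol/lun0"
    echo "igroup create -i -t linux ${vol}_group"
    echo "igroup add ${vol}_group $iqn"
    echo "lun map /vol/$vol/lun0 ${vol}_group"
}

# Example with placeholder sizes, disks, and initiator name.
netapp_setup_cmds 4 foo 100000000k 90000000k \
    iqn.1993-08.org.debian:01:123456789 0a.39 0a.44 0c.40 0c.45
```

Printing instead of executing keeps a human in the loop, which matters because vol create and lun create tend to need a few attempts to find a size the filer accepts.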

Expanding an aggregate, volume, and LUN

1. Start by getting the aggregate's status, e.g.

psilodump> aggr status -r aggr3
Aggregate aggr3 (online, raid_dp) (block checksums)
  Plex /aggr3/plex0 (online, normal, active)
    RAID group /aggr3/plex0/rg0 (normal)
    
    RAID Disk Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
    --------- ------  ------------- ---- ---- ---- ----- --------------    --------------
    dparity   0c.32   0c    2   0   FC:B   -  FCAL 10000 136000/278528000  139072/284820800
    parity    0c.33   0c    2   1   FC:B   -  FCAL 10000 136000/278528000  139072/284820800
    data      0a.34   0a    2   2   FC:A   -  FCAL 10000 136000/278528000  139072/284820800
    ...

2. Now determine the available spare disks:

psilodump> aggr status -s

Spare disks

RAID Disk       Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
---------       ------  ------------- ---- ---- ---- ----- --------------    --------------
Spare disks for block or zoned checksum traditional volumes or aggregates
spare           0a.41   0a    2   9   FC:A   -  FCAL 10000 136000/278528000  137104/280790184 
spare           0c.38   0c    2   6   FC:B   -  FCAL 10000 136000/278528000  137104/280790184 
spare           0c.37   0c    2   5   FC:B   -  FCAL 10000 136000/278528000  137422/281442144
...

3. Select disks by device number and add them to the aggregate, using the following command. (Use the -n flag if you want to test your command syntax with a dry run.)

psilodump> aggr add aggr3 -g rg0 -d 0a.39 0a.44 0c.40 0c.45
Addition of 4 disks to the aggregate has completed.
Wed Dec 16 19:55:09 EST [psilodump: raid.vol.disk.add.done:notice]: Addition of Disk /aggr3/plex0/rg0/0c.45 Shelf 2 Bay 13 [NETAPP   X274_HJURE146F10 NA14] S/N [404W6272] to aggregate aggr3 has completed successfully
...

4. Now fight with `vol size` to resize the volume:

psilodump> df -A aggr3
Aggregate               kbytes       used      avail capacity  
aggr3                833369408  357122492  476246916      43%  
psilodump> vol size vol3backup +476246000k
vol size: Insufficient space to grow this volume with its guarantee enabled; maximum growth is +473602692k.
psilodump> vol size vol3backup +473602692k
vol size: Flexible volume 'vol3backup' size set to 828725892k.

5. Last, fight with `lun resize` to increase the lun size:

psilodump> lun resize /vol/vol3backup/lun0 +473602692k
lun resize: No space left on device
lun resize: max size: 788g (846844657664)
psilodump> lun resize /vol/vol3backup/lun0 846844657664
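
Both error messages above already contain the largest value the filer will accept, so the second attempt can be derived from the first. The following sketch pulls those values out of saved console output. The function names are invented here, and the patterns are based only on the two messages shown above, so other Data ONTAP versions may word them differently.

```shell
#!/bin/sh
# Extract the maximum growth (e.g. +473602692k) from a failed `vol size`.
max_vol_growth() {
    sed -n 's/.*maximum growth is \(+[0-9][0-9]*[kmgt]\)\..*/\1/p'
}

# Extract the maximum LUN size in bytes from a failed `lun resize`.
max_lun_bytes() {
    sed -n 's/.*max size: [^(]*(\([0-9][0-9]*\)).*/\1/p'
}

# Feed in the two messages from the transcript above.
echo 'vol size: Insufficient space to grow this volume with its guarantee enabled; maximum growth is +473602692k.' | max_vol_growth
echo 'lun resize: max size: 788g (846844657664)' | max_lun_bytes
```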

Host Configuration

aspartame Configuration

Install open-iscsi:

apt-get install open-iscsi

Edit /etc/iscsi/iscsid.conf:

node.startup = manual
discovery.sendtargets.auth.authmethod=CHAP
discovery.sendtargets.auth.username=username
discovery.sendtargets.auth.password=password
node.session.auth.authmethod=CHAP
node.session.auth.username=username
node.session.auth.password=password

Start open-iscsi service:

service open-iscsi start

Scan for iSCSI devices from the NetApp:

iscsiadm --mode discovery --type st --portal psilodump

This should dump out a ton of information, for example:

[fe80::XXXX:XXXX:XXXX:XXXX]:3260,2001 iqn.1992-08.com.netapp:psilodump.csclub.uwaterloo.ca
[fe80::XXXX:XXXX:XXXX:XXXX]:3260,2000 iqn.1992-08.com.netapp:psilodump.csclub.uwaterloo.ca
[fe80::XXXX:XXXX:XXXX:XXXX]:3260,2002 iqn.1992-08.com.netapp:psilodump.csclub.uwaterloo.ca
[fe80::XXXX:XXXX:XXXX:XXXX]:3260,1000 iqn.1992-08.com.netapp:psilodump.csclub.uwaterloo.ca
10.15.134.131:3260,2002 iqn.1992-08.com.netapp:psilodump.csclub.uwaterloo.ca
129.97.134.131:3260,2001 iqn.1992-08.com.netapp:psilodump.csclub.uwaterloo.ca
10.15.134.130:3260,2000 iqn.1992-08.com.netapp:psilodump.csclub.uwaterloo.ca
129.97.134.130:3260,1000 iqn.1992-08.com.netapp:psilodump.csclub.uwaterloo.ca

The .130 IPs correspond to one filer, and the .131 IPs correspond to the other filer. Currently we are only using one of the filers (psilodump).
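
Since only the private-network portal of psilodump is used in the steps that follow, it can help to filter the discovery output down to that one line. A minimal sketch, assuming the output format shown above (portal, a comma, a numeric tag, then the target name) and the default port 3260; portals_for_ip is a hypothetical helper, not part of open-iscsi.

```shell
#!/bin/sh
# Print "portal tag target" for discovery lines whose portal IP matches $1.
# Reads `iscsiadm --mode discovery` output on stdin.
portals_for_ip() {
    awk -v ip="$1" '
        {
            split($1, parts, ",")          # parts[1] = ip:port, parts[2] = tag
            if (parts[1] == (ip ":3260"))
                print parts[1], parts[2], $2
        }'
}

portals_for_ip 10.15.134.130 <<'EOF'
10.15.134.131:3260,2002 iqn.1992-08.com.netapp:psilodump.csclub.uwaterloo.ca
10.15.134.130:3260,2000 iqn.1992-08.com.netapp:psilodump.csclub.uwaterloo.ca
129.97.134.130:3260,1000 iqn.1992-08.com.netapp:psilodump.csclub.uwaterloo.ca
EOF
```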

This also populates the /etc/iscsi/nodes/iqn.1992-08.com.netapp:psilodump.csclub.uwaterloo.ca directory with all possible ways to access the NetApp. For testing purposes (i.e. node.startup = manual), this is okay.

Test to see if you can get the iSCSI device to show up correctly:

iscsiadm --mode node --targetname "iqn.1992-08.com.netapp:psilodump.csclub.uwaterloo.ca" --portal 10.15.134.130:3260 --login

This should produce output similar to:

Logging in to [iface: default, target: iqn.1992-08.com.netapp:psilodump.csclub.uwaterloo.ca, portal: 10.15.134.130,3260]
Login to [iface: default, target: iqn.1992-08.com.netapp:psilodump.csclub.uwaterloo.ca, portal: 10.15.134.130,3260]: successful

Check /dev/disk/by-path/ip* to ensure new disks show up:

# ls -l /dev/disk/by-path/ip*
   /dev/disk/by-path/ip-10.15.134.130:3260-iscsi-iqn.1992-08.com.netapp:psilodump.csclub.uwaterloo.ca-lun-0 -> ../../sda
   /dev/disk/by-path/ip-10.15.134.130:3260-iscsi-iqn.1992-08.com.netapp:psilodump.csclub.uwaterloo.ca-lun-0-part1 -> ../../sda1
   /dev/disk/by-path/ip-10.15.134.130:3260-iscsi-iqn.1992-08.com.netapp:psilodump.csclub.uwaterloo.ca-lun-1 -> ../../sdb
   /dev/disk/by-path/ip-10.15.134.130:3260-iscsi-iqn.1992-08.com.netapp:psilodump.csclub.uwaterloo.ca-lun-1-part1 -> ../../sdb1

If this fails, check all your configuration again.

If this succeeds, you are now ready to try autoconnecting the iSCSI device.

Delete all extraneous entries from /etc/iscsi/nodes/iqn.1992-08.com.netapp:psilodump.csclub.uwaterloo.ca. This prevents the startup script from (a) hanging, and (b) being very upset. The only entry left should be the interface you intend to connect through:

# ls -l /etc/iscsi/nodes/iqn.1992-08.com.netapp:psilodump.csclub.uwaterloo.ca/
    10.15.134.130,3260,2000
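
Before deleting anything by hand, it may be worth listing exactly which entries would go. The sketch below is a dry run only: it prints every entry in a node directory except the one to keep, and deletes nothing. The name extra_node_entries is made up, and the directory layout is assumed to match the listing above.

```shell
#!/bin/sh
# List entries in a node directory other than the one we want to keep.
# usage: extra_node_entries <node-dir> <entry-to-keep>
extra_node_entries() {
    dir=$1
    keep=$2
    for entry in "$dir"/*; do
        [ -e "$entry" ] || continue        # no entries at all
        name=${entry##*/}                  # strip the directory part
        [ "$name" = "$keep" ] || echo "$entry"
    done
}

extra_node_entries \
    /etc/iscsi/nodes/iqn.1992-08.com.netapp:psilodump.csclub.uwaterloo.ca \
    10.15.134.130,3260,2000
```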

Edit /etc/iscsi/iscsid.conf:

node.startup = automatic

For the init.d script to work correctly (i.e. properly mount things), we need to add a sleep to allow the device to settle. Edit /etc/init.d/open-iscsi, roughly around line 127, to add a "sleep 1":

 ...
       # Now let's mount
       sleep 1
       log_daemon_msg "Mounting network filesystems"
       MOUNT_RESULT=1
       if mount -a -O _netdev >/dev/null 2>&1; then
               MOUNT_RESULT=0
               break
       fi
       log_end_msg $MOUNT_RESULT
 ...

Now we can restart the service:

service open-iscsi restart

Now you can configure partitions and mountpoints.

Exporting Kerberized NFS from Debian Squeeze

The default kernel in Debian squeeze (stable, 2.6.32) does not support the crypto suites needed to export kerberized NFS to newer kernels. You MUST upgrade the kernel, nfs-common, and nfs-kernel-server packages to AT LEAST the versions in squeeze-backports.

iSCSI block device mount optimizations

tmyklebu made some changes to /sys/block/sda/queue. The following is now in /etc/rc.local on aspartame:

echo 2048 > /sys/block/sda/queue/read_ahead_kb
echo 32768 > /sys/block/sda/queue/max_sectors_kb
echo 4096 > /sys/block/sda/queue/nr_requests
echo noop > /sys/block/sda/queue/scheduler

We should increase the iSCSI settings node.session.queue_depth and node.session.cmds_max in /etc/iscsi/iscsid.conf during the next maintenance window.
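
The rc.local lines above hard-code sda, which only works while the iSCSI LUN keeps that name. As a sketch, the same four writes can be wrapped in a function that takes the device name. Both tune_iscsi_queue and its optional sysfs-root argument (handy for trying it against a scratch directory) are assumptions of this example, not what is deployed on aspartame.

```shell
#!/bin/sh
# Apply the queue settings from rc.local to one block device.
# usage: tune_iscsi_queue <device> [sysfs-root]   (sysfs-root defaults to /sys)
tune_iscsi_queue() {
    queue="${2:-/sys}/block/$1/queue"
    [ -d "$queue" ] || { echo "no such queue directory: $queue" >&2; return 1; }
    echo 2048  > "$queue/read_ahead_kb"
    echo 32768 > "$queue/max_sectors_kb"
    echo 4096  > "$queue/nr_requests"
    echo noop  > "$queue/scheduler"
}

# Example (as root): tune_iscsi_queue sda
```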

Transferring old files from ginseng

Method A

  • On ginseng, use parted to set up the mounted iSCSI drive as an ext4 primary partition (setting up a partition of size >2TB requires care and a GPT)
  • Compile star in /root on ginseng
  • Transfer files with the following Makefile (assuming original user directories in /export/users, destination volume in /mnt/iscsi, run with make -j8):
foo := $(wildcard /export/users/*)
bar := $(patsubst /export/users/%,/mnt/iscsi/%,$(foo))
all: $(bar)
/mnt/iscsi/%: /export/users/%
	# echo $@ $<
	~/star-1.5.2/star/OBJ/x86_64-linux-cc/star \
	    -copy -p -acl artype=exustar \
	    -C /export/users $(notdir $<) /mnt/iscsi
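
The 2TB limit mentioned in the first bullet of Method A comes from the MBR partition table, which stores sector counts in 32 bits; with 512-byte sectors that caps a partition at about 2 TiB. A small sketch of the check, where needs_gpt is a hypothetical helper and the 512-byte sector size is an assumption:

```shell
#!/bin/sh
# Report whether a device of the given size (in bytes) is too big for MBR.
# Assumes 512-byte sectors; MBR stores sector counts in 32 bits.
needs_gpt() {
    mbr_limit=$(( 4294967296 * 512 ))      # 2^32 sectors * 512 bytes = 2 TiB
    if [ "$1" -gt "$mbr_limit" ]; then
        echo yes
    else
        echo no
    fi
}

needs_gpt 846844657664      # the 788g LUN from the resize example
needs_gpt 3000000000000     # a hypothetical 3 TB LUN
```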

Method B

  • On ginseng, authenticate with iSCSI target (psilodump.csclub.uwaterloo.ca lun0).
  • Unmount /dev/mapper/vg0-users
  • Copy users filesystem directly to iSCSI target:
dd if=/dev/mapper/vg0-users of=/path/to/psilodump:lun0 bs=8M
  • Resize users filesystem on destination partition to fit:
resize2fs /path/to/psilodump:lun0