Tuesday, 22 December 2015

ZFS, like a work of art

A subtle thing with ZFS is that the drive LEDs flash quite differently from those on typical storage arrays; once you understand more of what goes on under the hood, you'll know why. Just by looking around a DC you could tell which servers are running it. This video illustrates the effect: https://www.youtube.com/watch?v=LS3cfl-7n-4

Of course, that's ZFS on Linux. Run through FUSE (zfs-fuse), it is less efficient than a filesystem in kernel space, as elaborated across various posts, for example: https://lkml.org/lkml/2007/4/16/133 , https://lkml.org/lkml/2007/4/16/83

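Here is an example pool using raidz2 with hot spares, which get swapped in automatically if a drive or two fail. Listing devices with shell brace expansion, like c4t{0..1}d0, is always easier. The order of the arguments also has to be right, with the vdev type before the devices it groups, or you may end up second-guessing yourself. With raidz2 at the end of the line, the first attempt below fails.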
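Since the braces are expanded by the shell before zpool ever sees them, it can help to echo the expression first and check the device list. A minimal sketch, assuming a shell that supports brace expansion (bash, ksh93 or zsh; plain POSIX sh does not); the variable name is just for illustration:

```shell
#!/usr/bin/env bash
# The shell, not zpool, turns c4t{0..1}d0 into separate device names.
# Capture the expanded list so it can be inspected before building the pool.
expanded=$(echo c4t{0..1}d0)
echo "$expanded"   # prints: c4t0d0 c4t1d0
```

If the output is not the set of disks you meant, fix the expression before handing it to zpool create.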

# zpool create data c0t50004CF210AD1C22d0 c0t50004CF210BE51F1d0 c0t50004CF210BE51F3d0 c0t50004CF210BE5214d0 c4t{0..1}d0 raidz2
Unable to build pool from specified devices: invalid vdev specification: raidz2 requires at least 3 devices

# zpool create -O atime=off -O compress=lz4 data raidz2 c0t50004CF210AD1C22d0 c0t50004CF210BE51F1d0 c0t50004CF210BE51F3d0 c0t50004CF210BE5214d0 c4t{0..1}d0
# zpool add data spare c4t3d0 c5t3d0
# zpool status
  pool: data
 state: ONLINE
  scan: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        data                       ONLINE       0     0     0
          raidz2-0                 ONLINE       0     0     0
            c0t50004CF210AD1C22d0  ONLINE       0     0     0
            c0t50004CF210BE51F1d0  ONLINE       0     0     0
            c0t50004CF210BE51F3d0  ONLINE       0     0     0
            c0t50004CF210BE5214d0  ONLINE       0     0     0
            c4t0d0                 ONLINE       0     0     0
            c4t1d0                 ONLINE       0     0     0
        spares
          c4t3d0                   AVAIL  
          c5t3d0                   AVAIL

Then, as always, test the assumption, and it works as expected. I've got hot-swap capability, so I pulled a drive out to simulate a failure and then tried writing some data, and it looks to have worked.

# zpool status -xv
  pool: data
 state: DEGRADED
status: One or more devices are unavailable in response to persistent errors.
    Sufficient replicas exist for the pool to continue functioning in a
    degraded state.
action: Determine if the device needs to be replaced, and clear the errors
    using 'zpool clear' or 'fmadm repaired', or replace the device
    with 'zpool replace'.
  scan: resilvered 136K in 1s with 0 errors on Wed Dec 23 05:53:44 2015

config:

    NAME                         STATE     READ WRITE CKSUM
    data                         DEGRADED     0     0     0
      raidz2-0                   DEGRADED     0     0     0
        c0t50004CF210AD1C22d0    ONLINE       0     0     0
        c0t50004CF210BE51F1d0    ONLINE       0     0     0
        spare-2                  DEGRADED     0     0     0
          c0t50004CF210BE51F3d0  UNAVAIL      0    24     0
          c4t3d0                 ONLINE       0     0     0
        c0t50004CF210BE5214d0    ONLINE       0     0     0
        c4t0d0                   ONLINE       0     0     0
        c4t1d0                   ONLINE       0     0     0
    spares
      c4t3d0                     INUSE  
      c5t3d0                     AVAIL  

device details:

    c0t50004CF210BE51F3d0      UNAVAIL       too many errors
    status: FMA has faulted this device.
    action: Run 'fmadm faulty' for more information. Clear the errors
        using 'fmadm repaired'.
       see: http://support.oracle.com/msg/ZFS-8000-FD for recovery
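
For checking a pool like this from a script rather than by eye, the status output can be filtered for devices in a failed state. This is only a rough sketch: the function name unhealthy_devices is made up for illustration, it assumes the config table layout shown above (a NAME/STATE header row, ended by a blank line), and it only looks for the UNAVAIL, FAULTED, REMOVED and OFFLINE states.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ZFS): read `zpool status` output on stdin
# and print "device state" for each entry in the config table whose state
# is UNAVAIL, FAULTED, REMOVED or OFFLINE.
unhealthy_devices() {
  awk '
    # The header row marks the start of the config table.
    $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
    # A blank line ends the table, so the later "device details"
    # section is not counted a second time.
    in_config && NF == 0 { in_config = 0 }
    # Report only rows that are inside the table and in a failed state.
    in_config && $2 ~ /^(UNAVAIL|FAULTED|REMOVED|OFFLINE)$/ { print $1, $2 }
  '
}
```

Usage would be along the lines of `zpool status data | unhealthy_devices`. Fed the degraded output above, it should print just the pulled drive, c0t50004CF210BE51F3d0, with its UNAVAIL state.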