Setting up a ZFS pool on Ubuntu 16.04

Posted on Thu 15 February 2018 in Technical Solutions

Introduction

Previously in this series, the new NAS was assembled. Ubuntu 16.04 has been installed and updated. It's time to do something with all those hard drives!

I'll be setting up the seven 4TB drives as a single ZFS pool. I'm using ZFS for its protection against data corruption, though it offers several other features too. I'll be using dual parity, which means I could lose two drives and still be able to recover. The goal is never to test this, but I'd rather not go through a data loss scare again.
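For a rough sense of the capacity trade-off, dual parity sets aside two drives' worth of space. A back-of-the-envelope estimate (actual usable space will be somewhat lower due to ZFS overhead):

usable space ≈ (7 drives - 2 for parity) x 4TB = 20TB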

Before we begin, it's a good idea to ensure Ubuntu has been updated.

sudo apt-get update && sudo apt-get upgrade

With the update complete, let's get started.

Installing ZFS

Installing the ZFS file system is simple on Ubuntu.

sudo apt-get install zfsutils-linux parted

Ta-da! Your system is now capable of setting up ZFS pools. The parted package will be used shortly to prepare the drives for the pool.

Setting up our pool

Pools are the basic building block of ZFS. A pool is made up of the underlying devices that will store the data. Setting up our ZFS pool requires a little bit of prep work for our new drives. First, ensure that the ZFS tools installed correctly by running:

sudo zpool status

At this point in the process, you should get the message "no pools available".

Adding the GPT label

I'm setting up this pool with brand new drives. We need to add a GPT label to each disk so that ZFS doesn't complain about disks having an invalid vdev specification when we create the pool. To do this, we'll first find the names of our drives:

ls -l /dev/sd*

On my system, I get a result similar to this:

brw-rw---- 1 root disk 8,   0 Feb 13 09:23 /dev/sda
brw-rw---- 1 root disk 8,   1 Feb 13 09:23 /dev/sda1
brw-rw---- 1 root disk 8,   2 Feb 13 09:23 /dev/sda2
brw-rw---- 1 root disk 8,   5 Feb 13 09:23 /dev/sda5
brw-rw---- 1 root disk 8,  16 Feb 13 09:23 /dev/sdb
brw-rw---- 1 root disk 8,  32 Feb 13 09:23 /dev/sdc
brw-rw---- 1 root disk 8,  48 Feb 13 09:23 /dev/sdd
brw-rw---- 1 root disk 8,  64 Feb 13 09:23 /dev/sde
brw-rw---- 1 root disk 8,  80 Feb 13 09:23 /dev/sdf
brw-rw---- 1 root disk 8,  96 Feb 13 09:23 /dev/sdg
brw-rw---- 1 root disk 8, 112 Feb 13 09:23 /dev/sdh

We'll be adding the GPT labels to each of the unformatted drives. The unformatted ones are the drives listed without any numbered partitions. For me, that means we'll be working with sdb, sdc, sdd, sde, sdf, sdg and sdh. The sda drive has already been formatted and contains the partitions sda1, sda2 and sda5.
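If that output is hard to scan, lsblk presents the same drives and partitions as a tree, which makes the empty disks easy to spot:

lsblk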

For each drive, except sda in my case, we need to run the parted command.

sudo parted /dev/sdb

This will give you a short dialog. All you need to do is issue the mklabel GPT command and then quit (using q):

GNU Parted 3.2
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) mklabel GPT
(parted) q
Information: You may need to update /etc/fstab.
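
Repeating that dialog for seven drives gets tedious. parted can also be run non-interactively; here's a sketch, assuming your unformatted drives are sdb through sdh as above. Double-check the list before running it, since mklabel wipes any existing partition table:

for d in sdb sdc sdd sde sdf sdg sdh; do
    sudo parted --script /dev/$d mklabel gpt
done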

Getting device IDs

Once the GPT labels are added, we can create our pool. However, we're not going to use the device paths returned above. Theoretically, those can change (especially if you replace a drive), which would be bad and would mess with the entire ZFS pool. Instead, we're going to create the pool using the device IDs.

ls -l /dev/disk/by-id/*

This returns output similar to this:

...
lrwxrwxrwx 1 root root  9 Feb 13 09:23 /dev/disk/by-id/wwn-0x50014ee20f1d3114 -> ../../sdc
lrwxrwxrwx 1 root root  9 Feb 13 09:23 /dev/disk/by-id/wwn-0x50014ee20f3ba2b9 -> ../../sdg
lrwxrwxrwx 1 root root  9 Feb 13 09:23 /dev/disk/by-id/wwn-0x50014ee2647227b7 -> ../../sdb
lrwxrwxrwx 1 root root  9 Feb 13 09:23 /dev/disk/by-id/wwn-0x50014ee26490a21e -> ../../sdd
lrwxrwxrwx 1 root root  9 Feb 13 09:23 /dev/disk/by-id/wwn-0x50014ee2b9c81501 -> ../../sdh
lrwxrwxrwx 1 root root  9 Feb 13 09:23 /dev/disk/by-id/wwn-0x50014ee2b9e6ab61 -> ../../sdf
lrwxrwxrwx 1 root root  9 Feb 13 09:23 /dev/disk/by-id/wwn-0x50014ee2b9e6b857 -> ../../sde
lrwxrwxrwx 1 root root  9 Feb 13 09:23 /dev/disk/by-id/wwn-0x5001b444a9525c87 -> ../../sda
lrwxrwxrwx 1 root root 10 Feb 13 09:23 /dev/disk/by-id/wwn-0x5001b444a9525c87-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Feb 13 09:23 /dev/disk/by-id/wwn-0x5001b444a9525c87-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Feb 13 09:23 /dev/disk/by-id/wwn-0x5001b444a9525c87-part5 -> ../../sda5

...
lrwxrwxrwx 1 root root  9 Feb 13 09:23 /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K0HLKUXT -> ../../sdb
lrwxrwxrwx 1 root root  9 Feb 13 09:23 /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K0HLKXS0 -> ../../sdc
lrwxrwxrwx 1 root root  9 Feb 13 09:23 /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K4YJ6T0U -> ../../sdg
lrwxrwxrwx 1 root root  9 Feb 13 09:23 /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K5LCEYN4 -> ../../sdf
lrwxrwxrwx 1 root root  9 Feb 13 09:23 /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K6VNN6TA -> ../../sde
lrwxrwxrwx 1 root root  9 Feb 13 09:23 /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K6VNNTXY -> ../../sdd
lrwxrwxrwx 1 root root  9 Feb 13 09:23 /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K7TSA4VS -> ../../sdh
lrwxrwxrwx 1 root root  9 Feb 13 09:23 /dev/disk/by-id/ata-WDC_WDS100T1B0A-00H9H0_174256421671 -> ../../sda
lrwxrwxrwx 1 root root 10 Feb 13 09:23 /dev/disk/by-id/ata-WDC_WDS100T1B0A-00H9H0_174256421671-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Feb 13 09:23 /dev/disk/by-id/ata-WDC_WDS100T1B0A-00H9H0_174256421671-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Feb 13 09:23 /dev/disk/by-id/ata-WDC_WDS100T1B0A-00H9H0_174256421671-part5 -> ../../sda5

Notice that both formats symlink to the same location. This means you can pick whichever format you like better. However, I recommend the second one, which contains the device serial number. It'll make it easier to identify problem disks in the future.
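If you only want to see the serial-number style links, you can filter on the ata- prefix:

ls -l /dev/disk/by-id/ata-*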

Create the pool

Now that we've determined the device IDs for each of our hard drives, it's time to actually create the pool. As I mentioned above, we'll be creating it with dual parity (raidz2). We'll be naming our pool data. Once this command is complete, /data will be where this pool resides.

sudo zpool create data raidz2 \
    /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K0HLKUXT \
    /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K0HLKXS0 \
    /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K6VNNTXY \
    /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K6VNN6TA \
    /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K5LCEYN4 \
    /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K4YJ6T0U \
    /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K7TSA4VS

This will take a little while, though surprisingly less time than I initially expected.

Once the creation is complete, take a look at the status of your new pool:

sudo zpool status

  pool: data
 state: ONLINE
  scan: none requested
config:

        NAME                                          STATE     READ WRITE CKSUM
        data                                          ONLINE       0     0     0
          raidz2-0                                    ONLINE       0     0     0
            ata-WDC_WD40EFRX-68N32N0_WD-WCC7K0HLKUXT  ONLINE       0     0     0
            ata-WDC_WD40EFRX-68N32N0_WD-WCC7K0HLKXS0  ONLINE       0     0     0
            ata-WDC_WD40EFRX-68N32N0_WD-WCC7K6VNNTXY  ONLINE       0     0     0
            ata-WDC_WD40EFRX-68N32N0_WD-WCC7K6VNN6TA  ONLINE       0     0     0
            ata-WDC_WD40EFRX-68N32N0_WD-WCC7K5LCEYN4  ONLINE       0     0     0
            ata-WDC_WD40EFRX-68N32N0_WD-WCC7K4YJ6T0U  ONLINE       0     0     0
            ata-WDC_WD40EFRX-68N32N0_WD-WCC7K7TSA4VS  ONLINE       0     0     0

errors: No known data errors

That last line is important: "No known data errors" is exactly what we want to see.

Create datasets

Where pools are the basic building blocks of ZFS, a dataset is the term for a ZFS file system, volume, snapshot or clone. Each dataset can be managed and configured independently. This means that you can compress one dataset, but leave the others alone. You can put a quota on one, but leave the others without a quota (there's an example of this further down). Creating a dataset is pretty easy:

cd /
sudo zfs create data/storage

This will create a dataset named storage inside the data pool, mounted at /data/storage. You can have child datasets that inherit attributes from parents (or even grandparents) by doing something like:

sudo zfs create data/storage/music

This will create the new dataset, music, mounted at /data/storage/music.

Once you've set up your datasets, you can see they were all created and how much space they have available by issuing sudo zfs list.
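
As an example of the per-dataset configuration mentioned earlier, here's a sketch that compresses one dataset and puts a quota on another (the property values are illustrative; pick ones that suit your data):

# Enable lz4 compression on storage; children inherit it by default
sudo zfs set compression=lz4 data/storage

# Cap the music dataset at 500G
sudo zfs set quota=500G data/storage/music

# Confirm the settings on both datasets
sudo zfs get compression,quota data/storage data/storage/music

The SOURCE column of the zfs get output should show that music inherits its compression setting from data/storage, while the quota is set locally.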

Set up complete

With that, we've finished setting up ZFS on Ubuntu 16.04. I set up a few datasets for my purposes. I'm one step closer to getting this NAS running and handling all of the digital data in the house.

