I struggled to muddle through all the forum posts looking for the best way to replace a ZFS drive in Proxmox, so here are my documented notes in case you ever, unfortunately, need to do the same.
So you've got an email like the one above, and it's time to replace the drive in your ZFS pool in Proxmox. Let's go through how to find the dying drive, replace it, match the partitions of the other drives, and get the pool to resilver.
Finding the bad drive
Let’s start by running the following command on the machine with the dying drive:
zpool status -P
You should see output similar to this:
root@novigrad:~# zpool status -P
pool: rpool
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Sun Sep 8 00:38:52 2024
1.12T / 1.12T scanned, 810G / 927G issued at 18.7M/s
181G resilvered, 87.39% done, 01:46:40 to go
config:
NAME STATE READ WRITE CKSUM
rpool DEGRADED 0 0 0
raidz1-0 DEGRADED 0 0 0
/dev/disk/by-id/scsi-350000c0f0101bc24-part3 ONLINE 0 0 0
/dev/disk/by-id/scsi-350014ee7aab023c4-part3 ONLINE 0 0 0
/dev/disk/by-id/scsi-350014ee7aaaff38c-part3 FAULTED 220 0 0 too many errors
But what the heck is /dev/disk/by-id/scsi-350014ee7aaaff38c-part3? That doesn't look like the normal /dev/sdX format. It turns out that this is best practice for referencing disks in a ZFS pool, because the /dev/sdX names are handed out at boot and can change between reboots, while the by-id paths stay tied to the physical drive. If you have a 3-disk pool consisting of /dev/sda, /dev/sdb, and /dev/sdc, with a hot spare of /dev/sdd, and you pull the drives (or reboot), you don't want whichever three of your four installed drives happen to wake up first to become the main part of your pool, with the "last" drive to wake up ending up as your hot spare. That's not going to make for a good time.
But how do we go about finding what disk this actually is? By running the following command and then following the symlinks:
ls -l /dev/disk/by-id/
You should see output like the following:
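Each line in that listing is a symlink from the stable ID to the current /dev/sdX kernel name, so a quick way to zero in on the faulted device is to grep for its ID from the zpool status output above (swap in your own ID):

ls -l /dev/disk/by-id/ | grep scsi-350014ee7aaaff38c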
So now we know that the dying drive is /dev/sdd. But which bay is it in, or what's the serial number of the drive so I know which one it is? You'll want to run the following command, replacing /dev/sdd with the actual /dev name of the dying device (in my instance it's sdd):
smartctl -a /dev/sdd
You should then see output like the following:
root@novigrad:~# smartctl -a /dev/sdd
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.8.12-1-pve] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor: HP
Product: EG0600FBDSR
Revision: HPD2
Compliance: SPC-3
User Capacity: 600,127,266,816 bytes [600 GB]
Logical block size: 512 bytes
Rotation Rate: 10025 rpm
Form Factor: 2.5 inches
Logical Unit id: 0x500003931808c4c0
Serial number: EA01PB20AM331105
Device type: disk
Transport protocol: SAS (SPL-4)
Local Time is: Mon Apr 14 11:07:44 2025 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Enabled
Cool, so we know that it's a SAS drive with the serial number EA01PB20AM331105 and the product ID EG0600FBDSR. Let's go ahead and offline the "bad" drive in the ZFS pool with the following command (making sure to replace scsi-350014ee7aaaff38c-part3 with the actual name of the dying disk):
zpool offline rpool scsi-350014ee7aaaff38c-part3
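If you'd like to double-check before touching any hardware, run zpool status again; the offlined device should now show a state of OFFLINE instead of FAULTED:

zpool status rpool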
Physically replace the drive
Now, if you have the extra bay/caddy/drive, go ahead and physically put it in the server (as long as it supports hot-adding drives; if it doesn't, you'll need to shut down, add the drive, then power it back on). If you don't have an extra bay/caddy/drive, you'll need to shut down your machine and physically remove the "bad" drive.
If you are running Proxmox on an HP or Dell machine, this is where iLO and iDRAC really come in handy: you'll want to log in and check which bay this drive is in. If you are running on another machine, you now have the fun job of figuring out where that drive physically lives in your machine.
Find the new drive and format it
With your new/additional drive now installed and Proxmox running, let's see where that new drive is. My favorite way of doing so is logging into the Proxmox GUI, clicking on the server ("node"), and then checking which disk (or disks, in the image below) isn't partitioned:
So in the above image, the newly added disks are /dev/sdi and /dev/sdj, since they're the ones with a Usage column of No. Obviously you will have just replaced (or added) a single drive, so you'll only see one.
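If you'd rather do this from the shell than the GUI, lsblk gives you roughly the same view; the freshly added drive will be the one with no partitions listed under it:

lsblk -o NAME,SIZE,SERIAL,MODEL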
Now that we know the /dev device name of our new drive (let's just pretend it's /dev/sdi), let's create the partitions we need. Thankfully there's a pretty nifty utility, sgdisk, that will go ahead and do the hard work for us. In this example, I take the known "good" drive /dev/sda and copy all of its partition table information to the new/replaced drive at /dev/sdi:
sgdisk /dev/sda -R /dev/sdi
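One extra step worth doing here: since replicating the partition table also copies the partition GUIDs, it's generally recommended to randomize them on the new drive so the two disks don't end up with duplicate identifiers (again assuming the new drive is /dev/sdi):

sgdisk -G /dev/sdi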
At least in my instance, there were 3 newly created partitions, with the 2nd partition being the EFI partition that the system uses to boot:
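If you want to confirm the layout from the shell rather than the GUI, lsblk on the new drive should show those same three partitions:

lsblk /dev/sdi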
So since we know that the 2nd partition is the EFI partition, and the 3rd partition is used for ZFS data (since it's the largest and has "ZFS" as the marked usage), let's go ahead and format/initialize that EFI partition so the system can boot from the new drive:
proxmox-boot-tool format /dev/sdi2
proxmox-boot-tool init /dev/sdi2
proxmox-boot-tool clean
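If you want to verify that the new ESP was actually registered, proxmox-boot-tool can list the partitions it is currently managing:

proxmox-boot-tool status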
Getting the new drive in the ZFS pool
Sweet, the drives should now be ready to boot from, if they're ever called upon to do so. Now let's go ahead and add the drive back to the pool to begin resilvering. We'll need to run the following command again to see what /dev/disk/by-id path it received:
ls -l /dev/disk/by-id/
Which should output something like this:
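Since we already know the kernel name of the new drive (we've been pretending it's /dev/sdi), you can narrow that listing down by grepping for it:

ls -l /dev/disk/by-id/ | grep sdi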
So you'd want to grab the ID of the "scsi-" entry (or the "wwn-" one), which should be something like:
scsi-350014ee7aab023c4-part3
So now let's replace the dying drive in our pool with the good one, swapping in your own old and new drive IDs (and yes, you should still use replace even if the old drive isn't connected anymore):
zpool replace rpool /dev/disk/by-id/scsi-350000c0f0101bc24-part3 /dev/disk/by-id/scsi-350014ee7aab023c4-part3
Then, run the following command again:
zpool status -P
You should see output like this:
root@novigrad:~# zpool status -P
pool: rpool
state: ONLINE
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Fri Sep 13 13:40:34 2024
949G / 949G scanned, 247G / 787G issued at 29.6M/s
67.9G resilvered, 31.34% done, 05:11:07 to go
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
/dev/disk/by-id/scsi-350000c0f0101bc24-part3 ONLINE 0 0 0
/dev/disk/by-id/scsi-350014ee7aab023c4-part3 ONLINE 0 0 0
/dev/disk/by-id/scsi-350014ee700054e1c-part3 ONLINE 0 0 0
/dev/disk/by-id/scsi-3500003931808c4c0-part3 ONLINE 0 0 0 (resilvering)
spares
/dev/disk/by-id/scsi-350000393e811e644-part3 AVAIL
/dev/disk/by-id/scsi-350000394882a0510-part3 AVAIL
/dev/disk/by-id/scsi-350000393080846bc-part3 AVAIL
errors: No known data errors
Resilvering will definitely take some time, so let it run. It took mine about 24 hours in total.
Then, once the pool has resilvered the data, if the "bad" disk still shows in the pool, we can detach it:
zpool detach rpool /dev/disk/by-id/scsi-350000c0f0101bc24-part3
And you’re done!