Large Expansion of ZFS dataset

I have a TrueNAS server where the main data pool is reaching capacity. It currently uses a single vdev with 5x8TB drives in a RAIDZ1 configuration. RAIDZ2/mirroring is not a requirement.

Over time, I have added drives and expanded the array (including a full content rewrite to ensure the most efficient rebalancing). Each of these only required a few minutes of downtime to install the new drive (which was awesome), and I’d like to minimise downtime this time too.

I have one slot left in the server to add a drive, but I know that adding that drive is only a temporary measure. I want to make a longer-lasting change, perhaps with much larger drives (e.g. 3x20TB in RAIDZ1, with 3 more slots available, ultimately giving 6x20TB).

There are so many options for doing this… for example:

  • Fail a drive, replace it with a larger drive, rinse and repeat 5 times, then set autoexpand (roughly as sketched after this list).
  • Replace the drive, then fail it (I don’t understand this one, but it apparently keeps redundancy).
  • External eSATA enclosure with a new array of new, larger drives (q1: can I then just swap the server/enclosure drives? q2: can I just set up an external vdev/mirror with large drives, then fail the internal mirror?)
  • Just keep adding drives in an eSATA enclosure (seems risky to have parts of a vdev internal and external).
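If I understand the first option correctly, the mechanics would look something like this (just a sketch; pool name tank and the device paths are placeholders for whatever my system actually uses):

```
# Option 1, roughly (placeholder pool/device names -- adjust for the real system).
# Replace one drive at a time and wait for each resilver to finish before the next.
zpool set autoexpand=on tank
zpool replace tank /dev/disk/by-id/old-8tb-1 /dev/disk/by-id/new-20tb-1
zpool status tank        # wait for "resilvered ... with 0 errors"
# ...repeat the replace for the remaining four drives...

# After the last replacement the vdev can grow into the new space;
# "zpool online -e" nudges it if autoexpand didn't pick it up automatically.
zpool online -e tank /dev/disk/by-id/new-20tb-1
```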

So what is the best strategy when one needs to expand, but is running out of slots internally?

Many thanks for your advice.

p.s. Yes I know that a backup, build new pool, restore would also work - but where is the ZFS magic in that :-). Plus I would still need to use an enclosure to keep uptime high.

There was an episode of 2.5 Admins that mentioned a problem with expanding a pool to more than 10x its capacity at initial pool creation. I don’t remember the term off the top of my head… it might have been metaslab? (I’m going to use this term; someone correct me if it is the wrong one.) But essentially, at pool creation your total capacity is divided into some fixed number of metaslabs, and the size of each metaslab is immutable upon expansion, so when you expand the pool you are increasing the number of metaslabs by the same multiplier.

The discussion on the podcast episode explained this is bad for {performance | something else}.

If you get an external e-sata enclosure, one option is to create a new pool on it, so the metaslab size is “optimal” (loosely used) for the new pool capacity. Then send | receive into the new pool.
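As a rough sketch, that looks something like this (pool names tank/newtank and the device paths are placeholders; adapt to your hardware):

```
# Create the new pool on the enclosure drives (all names here are placeholders).
zpool create newtank raidz1 \
  /dev/disk/by-id/new-20tb-1 /dev/disk/by-id/new-20tb-2 /dev/disk/by-id/new-20tb-3

# Snapshot the whole existing pool, then replicate it into the new one.
# -R sends all descendant datasets, their properties and snapshots;
# -u receives them without mounting, so nothing collides with the live pool.
zfs snapshot -r tank@migrate1
zfs send -R tank@migrate1 | zfs receive -u newtank/migrated
```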

No need to fail a drive, or risk your current pool. The new pool is now (hopefully, should be!) a 2nd backup.

There is also no need for a “restore” step with this method. (Technically, physically moving the drives around in the next paragraph is your “restore”, but that physical version of a “restore” takes a minute of cabling rather than the hours of writing all the data that a software version would.)

When send | receive is complete, turn off the machine, move the drives from the external enclosure into the main box, and remove the old drives. Power on. That 2nd backup is now your primary, production pool. You may have to alter the root mountpoint, but if you set things up correctly and used the right parameters in send | receive, that mountpoint will inherit (cascade) into the received datasets, the mount paths will all be the same as before, and your programs shouldn’t know anything is different.
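Sketched out, the cutover could look like this (same placeholder names as above; the incremental pass is only needed if data changed during the first copy, and /mnt/tank is a stand-in for whatever the old mount path was):

```
# Optional catch-up pass for anything written to the old pool during the first copy.
zfs snapshot -r tank@migrate2
zfs send -R -i tank@migrate1 tank@migrate2 | zfs receive -u newtank/migrated

# Cut over: export both pools, power off, swap the drives into the chassis.
zpool export tank
zpool export newtank
# ...physically move the enclosure drives into the server, pull the old drives...

# Import without mounting, fix the root mountpoint, then mount everything.
zpool import -N newtank
zfs set mountpoint=/mnt/tank newtank/migrated   # children inherit this path
zfs mount -a
```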

Downtime is just a couple of minutes.

This method only requires writing all of your data once, and all the while your current production pool remains capable of active service.

Not sure there is anything more magic about it. After all, "it’s just ZFS"™ haha


Thanks for the hints around this. It clarified some of the operations for me:

a) get an eSATA enclosure that supports port multiplication, fill it with drives, and create a new pool (with its own vdev) on it
b) send all the data across to it
c) fix the mountpoint that currently points to the main pool so it points to the new pool in the enclosure
d) move the enclosure drives inside the server; safely store the old drives

After step d and a restart, the new pool and vdev should be recognised in their new location and just work.
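My plan for a sanity check at that point is roughly (names being whatever I used above):

```
# Confirm the pool imported cleanly and the mounts look right.
zpool status newtank
zfs list -r -o name,mountpoint,used,avail newtank
```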

As to your comment about expanding a pool too much: you usually need to rebalance the files in your array to free up more space. There is a script to do this which I’ve used in the past (https://github.com/markusressel/zfs-inplace-rebalancing), but it appears that ZFS now has a command to do it: zfs rewrite
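On OpenZFS releases recent enough to include it, I believe the usage is roughly this (the path is a placeholder, and I haven’t actually run it myself yet, so check the zfs-rewrite man page on your version first):

```
# Rewrite every file under a dataset's mount path in place, so its blocks get
# re-laid-out with the pool's current settings and geometry (-r recurses).
zfs rewrite -r /mnt/tank/mydata
```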

Thanks for your advice.

I am pretty sure the metaslab issue is separate from the rebalancing issue. You may want to read up on that, for your own future reference and knowledge.

Yes, your summary looks about right. Good luck!

zfs rewrite probably behaves a bit differently from what you expect, especially around snapshots. I’d recommend watching this video before going ahead with it.

Thanks - I’ll review the metaslab docs. There’s a video provided in this thread that will help.

My one worry is moving the disks from the external eSATA enclosure to the internal bays. In theory, ZFS identifies pool members by the GUIDs in their on-disk labels rather than by which bus or slot they sit on, so moving the disks internally should be very simple and the ids consistent.
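To be safe, I’m planning to note the disk GUIDs before the move and import by stable ids afterwards, something like (pool name is a placeholder):

```
# Before the move: record the member-disk GUIDs.
zpool status -g newtank

# After moving the disks internally: import using stable by-id paths and verify.
zpool import -d /dev/disk/by-id newtank
zpool status newtank
```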

Thanks for your help.

Not the same issue. You’re referring to simple fragmentation with metaslabs of an appropriate size. The issue we’re talking about here is having too many metaslabs that are too small, making fragmentation effectively inevitable. And feinedsquirrel is correct; metaslab size is immutable after pool creation. The pool creates more metaslabs when you expand it, but the size of each one doesn’t get any larger.
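If you want to see this on your own pool, zdb will show the per-vdev metaslab layout (pool name below is a placeholder):

```
# Per-vdev metaslab listing: after an expansion the count goes up,
# but each metaslab stays the same size.
zdb -m tank

# The fixed size shows up as "metaslab_shift" in the cached pool config.
zdb -C tank | grep -i metaslab
```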