Does ZFS overwrite folders if you create a dataset with the same name?

Alright, guys, I’ve finally come to the conclusion that I am a complete idiot. I’m in the middle of cleaning up some of the data I’ve amassed over the years, so, to speed things up, I temporarily copied it over to an SSD pool. Of course, this pool does not have automatic snapshots enabled (since it’s just a temporary place), and this is the scenario:

  • The SSD pool is called flash_temp and it does not contain any datasets; just loose folders.
  • I had a folder there called backupDisk which may or may not have contained some important files - hopefully nothing too important, and certainly not family photos or anything (thank God!).
  • I thought backupDisk was a dataset and wanted to share it over NFS but, since it wasn’t one, I couldn’t.
  • Now, since I don’t use my brain to think, but merely to breathe, I ran zfs create flash_temp/backupDisk so that I could share it over NFS… and now, of course, that folder appears empty. “Yay”.
  • The pool’s used space doesn’t seem to have changed, but that could just as well mean the folder was small to begin with…

So, my question is: is there any way to find out whether I’m f****d, or whether it’s possible to recover the data? I haven’t dared to run zfs destroy flash_temp/backupDisk.

Thank you in advance and, please, don’t pick on me.

I believe it’s just as if you mounted a filesystem over top, so you should be able to just unmount it (note that the mounted property is read-only, so you can’t flip it with zfs set; use zfs umount instead).
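Something along these lines should bring the hidden folder contents back into view (I’m going by the dataset name in your post, so double-check it first; the second line is optional and just an idea to stop the new dataset from mounting over that folder again):

zfs umount flash_temp/backupDisk
zfs set canmount=off flash_temp/backupDisk   # optional: prevents the dataset from auto-mounting over the folder again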

First up: you’re fine; all you did was mount a filesystem onto a non-empty folder. This does not destroy the contents of the non-empty folder on the underlying filesystem, regardless of the filesystem type you mount.

Second up: you find out by experimenting! So let’s experiment. First, we’ll set up a test pool using a sparse file, and make a test folder and file on it:

root@elden:/tmp# truncate -s 1G temppool.raw
root@elden:/tmp# zpool create temppool /tmp/temppool.raw
root@elden:/tmp# cd /temppool
root@elden:/temppool# mkdir testfolder
root@elden:/temppool# touch testfolder/veryimportantfile

Now that we’ve got our test environment, let’s create a dataset in such a way that it mounts on top of our testfolder:

root@elden:/temppool# zfs create temppool/testfolder
root@elden:/temppool# ls testfolder

As you can see, when we created temppool/testfolder as a dataset, it mounted on top of /temppool/testfolder on the actual filesystem, since our veryimportantfile isn’t showing up anymore. How do we make sure veryimportantfile still exists, and regain access to it?

root@elden:/temppool# zfs umount temppool/testfolder
root@elden:/temppool# ls testfolder
veryimportantfile

There it is! veryimportantfile was there all along, we just had to unmount the filesystem that was hiding it. Now we can change the mountpoint of temppool/testfolder to somewhere less inconvenient, or even just destroy it entirely if we don’t need it.
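For example, relocating it might look something like this (the new mountpoint here is just a path I made up for the demo; pick whatever suits you):

root@elden:/temppool# zfs set mountpoint=/temppool/relocated temppool/testfolder
root@elden:/temppool# zfs mount temppool/testfolder

…or, if you decide you don’t want the dataset at all, zfs destroy temppool/testfolder gets rid of it - it was only just created, so there’s nothing on it to lose.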

Finally, always remember to clean up your toys when you’re done playing in test pools:

root@elden:/temppool# cd /tmp
root@elden:/tmp# zpool destroy temppool
root@elden:/tmp# rm temppool.raw

@quartsize
You were absolutely right! :sweat_smile:

@mercenary_sysadmin
Appreciate the in-depth explanation, Jim! It felt like a cold shower when I realised that I might have managed to lose those files… again, no biggie - at least I think so - since the most important stuff is backed up.

Also, if you don’t mind me asking: is this the correct way to test out pools?

I’ve never tried this tbh. Very interesting.

Yes, it’s a SUPER fast, easy, and stress-free way to try things out without risking your real infrastructure. The first command creates a sparse file–which means the file is permitted to grow up to the size you specify, but starts out at zero bytes on disk and only grows as you add data to it.

Once you’ve got the sparse file, you can create a toy pool using it as its only “disk.” This time I’ll use a 1TiB sparse file to make the point obvious: the 1T we pass to the -s argument only controls how large the file is allowed to become, but in the limited interactions we do here, it never breaks 2MiB:

root@elden:/tmp# truncate -s 1T temppool.raw
root@elden:/tmp# zpool create temppool /tmp/temppool.raw
root@elden:/tmp# touch /temppool/veryimportantfile
root@elden:/tmp# ls -lah | grep raw
-rw-r--r--  1 root  root  1.0T Jun 21 13:49 temppool.raw
root@elden:/tmp# du -h temppool.raw
1.2M	temppool.raw

As you can see above, temppool.raw appears to be 1.0T in size when queried with ls–but du tells the real story: it has only reached 1.2MiB so far. If we dump more data into our test pool, we’ll see that .raw file grow:

root@elden:/tmp# dd if=/dev/urandom bs=4M count=10 of=/temppool/random.bin
10+0 records in
10+0 records out
41943040 bytes (42 MB, 40 MiB) copied, 0.086724 s, 484 MB/s
root@elden:/tmp# du -h temppool.raw
42M	temppool.raw

This also means that when you’re trying to plan around final pool sizes before ordering new hardware, you don’t need some goofy webpage with a javascript calculator that may or may not be accurate: you can test it directly with ZFS and sparse files.

For example, what if I wanted to see how much space ZFS would report available on a pool composed of six 18T drives, even though the machine I’m in front of now only has 2T of storage?

First, we destroy our existing temppool and get rid of its sparse file. ALWAYS clean up your toys!

root@elden:/tmp# zpool destroy temppool
root@elden:/tmp# rm temppool.raw
root@elden:/tmp# zpool list
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
rpool  1.81T   957G   899G        -         -    18%    51%  1.00x    ONLINE  -

Like I said, I’ve only got a bit under 2TiB total to work with on this machine. But thanks to the magic of sparse files, I can create–although not actually fill–a pool of essentially any size. I believe we were speculating about 18T drives in a six-wide Z2.

First, remember that an “18T” drive isn’t 18TiB, it’s 18TB. Convert 18TB to TiB and you get roughly 16.4TiB per “18T” drive (I round that down slightly to 16400G in the command below). Now, create six of them (this is possible directly on a ZFS dataset like I’m using, but in an ext4 /tmp you may be limited to a smaller maximum file size):

root@elden:/tmp# for drive in {0..5}; do truncate -s 16400G drive$drive.raw ; done
root@elden:/tmp# ls -lh drive*raw
-rw-r--r-- 1 root root 17T Jun 21 13:59 drive0.raw
-rw-r--r-- 1 root root 17T Jun 21 13:59 drive1.raw
-rw-r--r-- 1 root root 17T Jun 21 13:59 drive2.raw
-rw-r--r-- 1 root root 17T Jun 21 13:59 drive3.raw
-rw-r--r-- 1 root root 17T Jun 21 13:59 drive4.raw
-rw-r--r-- 1 root root 17T Jun 21 13:59 drive5.raw
root@elden:/tmp# zpool create testpool -oashift=12 raidz2 /tmp/drive0.raw /tmp/drive1.raw /tmp/drive2.raw /tmp/drive3.raw /tmp/drive4.raw /tmp/drive5.raw
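As an aside, if you’d rather not take my word for that TB-to-TiB conversion, it’s easy to sanity-check with plain bc (nothing ZFS-specific about this one):

root@elden:/tmp# echo "scale=2; 18 * 10^12 / 2^40" | bc
16.37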

Despite my machine only having 1.8TiB of storage (even if you ignore how much of it I’m already using), I could easily create a set of six 18TB “drives” and use them to create a new pool.

Let’s take a look at that pool, and what its reported size is:

root@elden:/tmp# zpool status testpool
  pool: testpool
 state: ONLINE
config:

	NAME                 STATE     READ WRITE CKSUM
	testpool             ONLINE       0     0     0
	  raidz2-0           ONLINE       0     0     0
	    /tmp/drive0.raw  ONLINE       0     0     0
	    /tmp/drive1.raw  ONLINE       0     0     0
	    /tmp/drive2.raw  ONLINE       0     0     0
	    /tmp/drive3.raw  ONLINE       0     0     0
	    /tmp/drive4.raw  ONLINE       0     0     0
	    /tmp/drive5.raw  ONLINE       0     0     0

errors: No known data errors

root@elden:/tmp# zpool list testpool
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
testpool  96.1T  1.12M  96.1T        -         -     0%     0%  1.00x    ONLINE  -

And there you have it! According to ZFS itself, if you create a pool based on a single six-wide Z2 vdev of 18T drives, each of which can be expected to have 16.4TiB of storage, you wind up with 96.1TiB total space… except that’s a bit misleading, because it refers to total space before parity, not after. We’ll get the estimate for usable space if we do zfs list instead of zpool list:

root@elden:/tmp# zfs list testpool
NAME       USED  AVAIL     REFER  MOUNTPOINT
testpool   767K  63.9T      192K  /testpool

There we go: we should be able to fit roughly 64TiB on our vdev, which aligns well with the (16.4TiB * (6-2) == 65.6TiB) that we’d naively expect from back-of-the-napkin math.

Would that change if we instead put those six drives in two three-wide Z1 vdevs? Let’s find out!

root@elden:/tmp# zpool destroy testpool
root@elden:/tmp# zpool create testpool -oashift=12 raidz1 /tmp/drive0.raw /tmp/drive1.raw /tmp/drive2.raw raidz1 /tmp/drive3.raw /tmp/drive4.raw /tmp/drive5.raw
root@elden:/tmp# zpool status testpool
  pool: testpool
 state: ONLINE
config:

	NAME                 STATE     READ WRITE CKSUM
	testpool             ONLINE       0     0     0
	  raidz1-0           ONLINE       0     0     0
	    /tmp/drive0.raw  ONLINE       0     0     0
	    /tmp/drive1.raw  ONLINE       0     0     0
	    /tmp/drive2.raw  ONLINE       0     0     0
	  raidz1-1           ONLINE       0     0     0
	    /tmp/drive3.raw  ONLINE       0     0     0
	    /tmp/drive4.raw  ONLINE       0     0     0
	    /tmp/drive5.raw  ONLINE       0     0     0

errors: No known data errors
root@elden:/tmp# zfs list testpool
NAME       USED  AVAIL     REFER  MOUNTPOINT
testpool   527K  63.9T      128K  /testpool

Nope; ZFS estimates (unsurprisingly) that the usable storage available on a pool of six drives is the same whether you configure them as a single Z2 (offering greater protection from failure) or dual Z1 (offering significantly higher performance).

With that said, in some circumstances ZFS can’t and won’t get this kind of estimate entirely correct: it’s complicated under the hood, and wide vdevs in particular often won’t get the storage efficiency you might naively expect. If you write a 4KiB file to a six-wide Z2 array, it’s going to eat three blocks–one for data and two for parity–meaning you took 12KiB to store 4KiB, or 33% storage efficiency.

If you stored the same 4KiB file on the same drives, but in dual Z1, you’d still get lower storage efficiency than you’d expect–but it would be 50% instead of 33%, since the file would be stored in a single data and parity block on one of your two vdevs.

The moral here is, ZFS’ own estimate is the best available, but it’s a long way from infallible. The type of data you store and how you store it will have a big impact on performance and storage efficiency alike!
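If you’d like to see that small-block penalty with your own eyes rather than taking my word for it, a rough sketch along these lines should show it on a scratch pool (I’m forcing recordsize down to 4K to simulate a small-block workload and turning compression off so incompressible random data doesn’t muddy the numbers; the exact figures depend on ashift, topology, and metadata overhead):

root@elden:/tmp# zfs create testpool/smallblocks
root@elden:/tmp# zfs set compression=off testpool/smallblocks
root@elden:/tmp# zfs set recordsize=4K testpool/smallblocks
root@elden:/tmp# dd if=/dev/urandom bs=4K count=10000 of=/testpool/smallblocks/padded.bin
root@elden:/tmp# zfs list testpool/smallblocks

Compare USED against the roughly 39MiB of data actually written; with 4KiB records on a RAIDz vdev it should come out noticeably higher, which is exactly that parity-and-padding overhead showing up in practice.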

But enough about RAIDz. What if we used those six drives as mirrors, instead?

root@elden:/tmp# zpool destroy testpool
root@elden:/tmp# zpool create testpool -oashift=12 mirror /tmp/drive0.raw /tmp/drive1.raw mirror /tmp/drive2.raw /tmp/drive3.raw mirror /tmp/drive4.raw /tmp/drive5.raw
root@elden:/tmp# zpool status testpool
  pool: testpool
 state: ONLINE
config:

	NAME                 STATE     READ WRITE CKSUM
	testpool             ONLINE       0     0     0
	  mirror-0           ONLINE       0     0     0
	    /tmp/drive0.raw  ONLINE       0     0     0
	    /tmp/drive1.raw  ONLINE       0     0     0
	  mirror-1           ONLINE       0     0     0
	    /tmp/drive2.raw  ONLINE       0     0     0
	    /tmp/drive3.raw  ONLINE       0     0     0
	  mirror-2           ONLINE       0     0     0
	    /tmp/drive4.raw  ONLINE       0     0     0
	    /tmp/drive5.raw  ONLINE       0     0     0

errors: No known data errors
root@elden:/tmp# zfs list testpool
NAME       USED  AVAIL     REFER  MOUNTPOINT
testpool   408K  47.9T       96K  /testpool

And there we have it: if we wanted to get the massively higher performance and faster resilvers of a mirror topology instead of RAIDz, we’d drop from 63.9TiB to 47.9TiB estimated usable capacity.

You can see how easy it is to play around here and plan your next moves. This is a fantastic technique that I highly recommend everyone interested in storage, even at a hobbyist level, learn for themselves.

Yeah, when I heard you and Allan discussing sparse files on 2.5 Admins, it blew my mind. I had no idea, and now I use it all the time.

Guilty as charged :joy:

A lot of the stuff you’re talking about has made me curious, so I’m sitting here reading up on workload tuning. That means I’ll most likely open a new thread to ask a couple more questions.
