3x SSDs as a stripe and 1x large HDD as a mirror

Hi all,

I am trying to figure out whether it is possible to have 3x SSDs as a stripe and then mirror that to 1x large HDD.

Like this, maybe?

sudo zpool create -f -o ashift=13 -O normalization=formD -O atime=off -m /mnt/pool -O compression=lz4 -O recordsize=1M nas-pool /dev/mapper/mirror_hdd mirror /dev/mapper/NAS_1 /dev/mapper/NAS_2 /dev/mapper/NAS_3

But I am not sure this is correct… I know the other way around doesn’t work, as ZFS then complains that you need a minimum of 2 disks to mirror… hmmm…

Would appreciate any pointers on this, thanks!

That’s not how ZFS works.

A pool is a collection of vdevs.

A vdev is a collection of disks.

Vdevs can be mirror vdevs (any number of disks, all of which contain the exact same blocks) or they can be RAIDz vdevs (disks arranged in a striped parity array with 1, 2, or 3 blocks of parity per stripe).

You cannot mirror one vdev to a different vdev. The pool makes its own decisions about where to put incoming writes, you don’t get to decide for yourself. :slight_smile:
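
To make that concrete (the device names here are purely hypothetical), redundancy lives inside each vdev at pool creation time:

# Hypothetical devices: a pool of two mirror vdevs, each redundant on its own...
sudo zpool create tank mirror /dev/sda /dev/sdb mirror /dev/sdc /dev/sdd

# ...or a pool with one RAIDz1 vdev (striped data with one parity block per stripe):
sudo zpool create tank raidz1 /dev/sda /dev/sdb /dev/sdc /dev/sdd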

1 Like

Thank you for the information.

A bit odd that ZFS doesn’t provide this feature, no? I mean, is my idea that wrong?

it’s “wrong” in the sense of “that’s not how ZFS is designed,” not necessarily “universally wrong and you should feel bad for thinking it,” if you catch my drift. :slight_smile:

Letting individual machines arbitrarily change the write distribution method for the entire pool would have some pretty serious, wide-ranging repercussions in terms of design and maintainability. OpenZFS is designed to provide fault recovery at the vdev level, not the pool level.

You build your vdevs redundant, because your pool is not.

1 Like

Understood, thank you. In a nutshell, what I want to achieve (with my parameters) is not possible with ZFS?

If you want a RAID1 consisting of a RAID0 of 3 SSDs and an HDD, you can’t do that with ZFS alone (technically it might support it, but the command-line tools don’t). You could technically achieve it by (ab)using device-mapper or LVM to create one big device from the 3 SSDs, and use that and the HDD to create a mirror.
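
A rough sketch of what that LVM route could look like, with invented device names (and see the next paragraph before you actually try it):

# Invented names: sdb/sdc/sdd = the 3 SSDs, sde = the HDD.
sudo pvcreate /dev/sdb /dev/sdc /dev/sdd
sudo vgcreate vg_ssd /dev/sdb /dev/sdc /dev/sdd
sudo lvcreate -n ssd_stripe -i 3 -l 100%FREE vg_ssd   # one LV striped across all 3 SSDs
sudo zpool create nas-pool mirror /dev/vg_ssd/ssd_stripe /dev/sde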

But I don’t think this is a good idea: first, ZFS really expects to write to disks directly, and any abstraction in between might break the guarantees ZFS makes, because some assumptions are no longer valid (data that is reported as written might not actually have been written, or it might get written to another place, etc.).

Also, I think a write is only completed once all members of a mirror have completed it, so you’re basically reducing the write speed to HDD speeds.

It’s probably better to create a pool from the 3 SSDs instead, and regularly create snapshots and send them to a second pool consisting of the HDD.
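
Done by hand, with made-up pool and snapshot names, that looks roughly like:

# Made-up names: fastpool = the 3 SSDs (striped), slowpool = the single HDD.
sudo zpool create fastpool /dev/sdb /dev/sdc /dev/sdd
sudo zpool create slowpool /dev/sde

# Initial full copy:
sudo zfs snapshot -r fastpool@backup-1
sudo zfs send -R fastpool@backup-1 | sudo zfs receive -F slowpool/backup

# Later runs only send what changed since the previous snapshot:
sudo zfs snapshot -r fastpool@backup-2
sudo zfs send -R -i fastpool@backup-1 fastpool@backup-2 | sudo zfs receive -F slowpool/backup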

2 Likes

In a nutshell, no.

If you wanted to do really crazy shit, you could for example use mdraid to create a stripe of your three SSDs, then use ZFS to mirror the stripe of SSDs to the large drive.
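
Concretely, with invented device names, that would be something along the lines of:

# Invented names: sdb/sdc/sdd = the SSDs, sde = the big HDD.
sudo mdadm --create /dev/md0 --level=0 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd
sudo zpool create nas-pool mirror /dev/md0 /dev/sde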

I would not advise doing that for any reason other than “the lolz” and “I don’t mind if this is weird and crappy and might make problems down the road,” mind you. But it could be done. =)

There may be better answers for what you really want, though, so let’s talk about that. Why do you want to mirror three small SSDs to one large HDD? You may not be aware of this, but typically speaking, a mirrored array performs as though it’s composed entirely of its slowest member, so that RAID0 of three SSDs will, for the most part, not speed up your mirror array any more than just a perfectly normal second HDD would.

1 Like

@Joghurt thank you!
@mercenary_sysadmin: The idea was/is to have 3x 8TB SSDs as the stripe, which will be fast in read/day-to-day use and a 24TB HDD as the mirror.

I don’t mind the slow write speed and was aware of that, but I was of the (wrong?) opinion that reads would be super fast from the SSDs, irrespective of the slow HDD? I have probably a 90% read load. That’s why I came up with my idea…

You DEFINITELY don’t want to try to mirror them in real time. You want two separate pools, with regular replication from the fast SSD to the slow rust using a tool like syncoid.

Most of my sites have hourly replication to a completely separate machine over the network using syncoid. In your case, I’d recommend doing the same, but if you don’t have a separate PC to put your rust drive in (which would be better), you can just make a second pool in the same PC, and replicate locally.
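
Assuming you make that second pool on the rust drive, the local version is roughly this (the pool names and the device are placeholders):

# Placeholder names: nas-pool = your existing SSD pool, rustpool = a new pool on the HDD.
sudo zpool create rustpool /dev/sdX          # /dev/sdX = the big rust drive (placeholder)
sudo syncoid -r nas-pool rustpool/nas-pool-backup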

1 Like

Ah? You mean like with rsync? I was hoping ZFS could do that itself; I’m not a fan of adding yet another tool.

ZFS replication is frequently as much as 1,000x faster than rsync for re-synchronizing a workload, with an even bigger difference in the storage impact on the system.

It’s absolutely practical to replicate a 50TiB volume over a 1Gbps LAN every hour. You couldn’t manage the same thing with rsync in a week, let alone an hour.

1 Like

Ok, I will read on this a bit more. Maybe I will come back here with questions :slight_smile:

Thank you very much for taking the time to respond!

I do something like this with ZFS+syncoid. My home server has a fast SSD and a slow rust disk, then replicates using syncoid to a second big rust disk every 30mins. This means I stand to lose at most 30 mins of data from the SSD, which is fine for my use case, but I still get full SSD speed off of it.
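
The schedule itself is just a cron entry, something along these lines (the pool names are placeholders, and the syncoid path may differ on your distro):

# Placeholder pool names; runs replication every 30 minutes from root's crontab.
*/30 * * * * /usr/sbin/syncoid -r fastpool rustpool/backup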

1 Like

Is there a good guide for syncoid (I can google, but if you know a trusted one that actually works…)?

The hardest part is sharing SSH keys between your systems.

If you already know how to do that, there’s nothing to syncoid; it works from a user’s perspective VERY much like rsync does. (edit: and since you’ve only GOT one system to begin with, you don’t have to worry about ssh keys.)
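
(And if you do end up replicating to a second machine later, the key sharing is just the usual pair of commands; the hostname here is a placeholder:)

ssh-keygen -t ed25519                  # generate a key pair, if you don't already have one
ssh-copy-id root@backup-host.example   # copy the public key to the target machine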

ZFS replication is snapshot based, so with default arguments like the following:

syncoid -r sourcepool/dataset targetpool/dataset

… it first creates a new snapshot on sourcepool/dataset, then replicates it (along with any prior snapshots) to targetpool/dataset, then looks for old automatically-created sync snapshots and deletes them if it finds any.

You can see arguments and simple usage examples at https://github.com/jimsalterjrs/sanoid. I would also recommend setting Sanoid up to take regular snapshots of your source dataset, by the way. It sounds like you’ve got a lot of learning ahead of you, but it’s really, really useful learning, and you’ll likely be amazed at how much you can do, and how easily, once you wrap your head around it. :slight_smile:
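
To give a feel for what a Sanoid policy looks like, an /etc/sanoid/sanoid.conf entry is roughly shaped like this (the dataset name is a placeholder; see the repo docs for the real options):

# Placeholder dataset; Sanoid takes and prunes snapshots per this policy.
[nas-pool/data]
        use_template = production
        recursive = yes

[template_production]
        hourly = 36
        daily = 30
        monthly = 3
        yearly = 0
        autosnap = yes
        autoprune = yes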

1 Like

Thank you very much!!

1 Like

Hi,
I think you are trying to create a setup with a striped SSD pool mirrored to a single HDD, but ZFS does not support mirroring a stripe to a single disk, as a mirror requires at least two disks for redundancy. You might need to reconsider your setup or use a different approach for redundancy.

Thanks

I don’t want to annoy you guys here…feel free to ignore me, but I might have found a way?

zpool status
  pool: nas-pool
 state: ONLINE
config:

	NAME          STATE     READ WRITE CKSUM
	nas-pool      ONLINE       0     0     0
	  mirror-0    ONLINE       0     0     0
	    ssd_1     ONLINE       0     0     0
	    backup_1  ONLINE       0     0     0
	  mirror-1    ONLINE       0     0     0
	    ssd_2     ONLINE       0     0     0
	    backup_2  ONLINE       0     0     0
	  mirror-2    ONLINE       0     0     0
	    ssd_3     ONLINE       0     0     0
	    backup_3  ONLINE       0     0     0

errors: No known data errors

“backup” is a single large HDD on which I simply created 3 partitions, and the “ssd” devices are 3 separate SSDs.

sudo zpool create -f -o ashift=13 -O normalization=formD -O atime=off -m none -R /mnt \
  -O compression=lz4 -O recordsize=1M nas-pool \
  mirror /dev/mapper/ssd_1 /dev/mapper/backup_1 \
  mirror /dev/mapper/ssd_2 /dev/mapper/backup_2 \
  mirror /dev/mapper/ssd_3 /dev/mapper/backup_3

sdb                           8:16   0     1G  0 disk  
└─ssd_1                     252:2    0  1008M  0 crypt 
sdc                           8:32   0     3G  0 disk  
├─sdc1                        8:33   0     1G  0 part  
│ └─backup_1                252:3    0  1008M  0 crypt 
├─sdc2                        8:34   0     1G  0 part  
│ └─backup_2                252:6    0  1008M  0 crypt 
└─sdc3                        8:35   0  1022M  0 part  
  └─backup_3                252:7    0  1006M  0 crypt 
sdd                           8:48   0     1G  0 disk  
└─ssd_2                     252:4    0  1008M  0 crypt 
sde                           8:64   0     1G  0 disk  
└─ssd_3                     252:5    0  1008M  0 crypt 

1 Like

Reads could be fast, but writes and deletes would be slow. I’m also wondering about the LUKS crypt devices instead of native ZFS encryption. :slight_smile: And each SSD has to be the same size as its HDD partner, so if you take a 20TB HDD, which SSD do you pair with it?
Edit: Oops, I should have read more carefully, you mentioned partitioning the HDD into 3 parts… but then you need an HDD that is 3 times as big as one of your SSDs, otherwise some space is lost on either the HDD or the SSD side.

I don’t mind slow writes and deletes. I read a lot of debate on ZFS encryption and I think (without any proof or experience at all) that LUKS might be simpler for me to handle… My OS is also encrypted with LUKS, and then I can create an encrypted file on /root/ and auto-mount ZFS… that’s the basic idea…