Syncoid Used/Refer Totals Very Different Between Source And Target Systems

Hello,

I am using Syncoid to copy an existing dataset called “tank/data1” to a target
system using the same pool/dataset name.

There is a discrepancy between the USED/REFER amounts on the source and the
target systems: the source shows 183G used and 169G referenced, while the
target shows 253G used and 236G referenced.

I wanted to ask what would cause this significant difference between the
source and the target systems. Note that the transfer completes successfully.

LZ4 compression is enabled on both pools.

#SOURCE
NAME USED AVAIL REFER MOUNTPOINT
tank/data1 183G 4.82T 169G /tank/data1
#TARGET
NAME USED AVAIL REFER MOUNTPOINT
tank/data1 253G 5.6T 236G tank/data1

Source System:

zfs --version
zfs-2.1.5-1ubuntu6~22.04.1
zfs-kmod-2.1.5-1ubuntu6~22.04.1
OS: Ubuntu 22.04.3 LTS
Syncoid:
/usr/sbin/syncoid version 2.2.0
(Getopt::Long::GetOptions version 2.52; Perl version 5.34.0)

NAME USED AVAIL REFER MOUNTPOINT
tank/data1 183G 4.82T 169G /tank/data1

Target System:

zfs --version
zfs-2.1.5-1ubuntu6~22.04.4
zfs-kmod-2.1.5-1ubuntu6~22.04.4
OS: Ubuntu 22.04.5 LTS

NAME USED AVAIL REFER MOUNTPOINT
tank/data1 253G 5.6T 236G tank/data1

The two pool sizes suggest that the two pools are not identical - this could affect the disk usage. It also seems that it is the target system that takes up the most room. This could be due to different snapshot retention rules - have you checked the number of snapshots on each side?
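For example, a quick way to compare snapshot counts on both sides would be something like this (assuming the dataset is tank/data1 on each system):

$ zfs list -t snapshot -r tank/data1 | wc -l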

You can also run:

$ zfs list -o "name,used,usedds,usedsnap,avail,mountpoint"

The USEDDS (used by dataset) column should be about the same on both sides if
the dataset has only recently been sent.

Thanks for the quick reply.

Actually, I forgot to mention that the target system is a brand-new setup
and does not have any snapshot rules set up at all at this time, which is why the difference
surprises me. I would think that since the system is new and the syncoid transfer
creates the data1 dataset on the target from scratch, the totals would be a bit closer.

SOURCE SYSTEM:
NAME USED USEDDS USEDSNAP AVAIL MOUNTPOINT
tank/data1 183G 169G 13.9G 4.82T /tank/data1

TARGET SYSTEM:
NAME USED USEDDS USEDSNAP AVAIL MOUNTPOINT
tank/data1 253G 236G 16.9G 5T /tank/data1

What’s the ashift on both sides? What is the pool topology?

zdb -C tank | grep ashift
zpool status tank

Need to see that information on both sides. The answer you’re looking for should be in that data.

So the source might purge older snapshots, which the target does not - that could be part of the reason for the difference - and then, as Jim Salter says, the pool topology and ashift, and I guess possibly the block size.
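If you want to compare the relevant dataset properties directly, something like this on each side should show them (assuming tank/data1 is the dataset on both systems):

$ zfs get recordsize,compression,compressratio,copies tank/data1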

Thanks guys!

The pool topology is the following:

SOURCE SYSTEM:
Ashift: 14
pool: tank
state: ONLINE
config:

NAME                    STATE     READ WRITE CKSUM
tank                    ONLINE       0     0     0
  raidz2-0              ONLINE       0     0     0
    nvme-disk1          ONLINE       0     0     0
    nvme-disk2          ONLINE       0     0     0
    nvme-disk3          ONLINE       0     0     0
    nvme-disk4          ONLINE       0     0     0
    nvme-disk5          ONLINE       0     0     0
    nvme-disk6          ONLINE       0     0     0
    nvme-disk7          ONLINE       0     0     0
    nvme-disk8          ONLINE       0     0     0

TARGET SYSTEM:
Ashift: 13
pool: tank
state: ONLINE
config:

NAME                    STATE     READ WRITE CKSUM
tank                    ONLINE       0     0     0
  raidz2-0              ONLINE       0     0     0
    nvme-disk1          ONLINE       0     0     0
    nvme-disk2          ONLINE       0     0     0
    nvme-disk3          ONLINE       0     0     0
    nvme-disk4          ONLINE       0     0     0
    nvme-disk5          ONLINE       0     0     0
    nvme-disk6          ONLINE       0     0     0
    nvme-disk7          ONLINE       0     0     0
    nvme-disk8          ONLINE       0     0     0
    nvme-disk9          ONLINE       0     0     0
    nvme-disk10         ONLINE       0     0     0

Mismatched ashift is the cause. Replication will not tear down blocks and rebuild them, so when the physical sector size (ashift) is different from one system to the next, you wind up with magically expanding data like this.
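One way to confirm this is to compare logical sizes instead of allocated sizes - assuming tank/data1 on both systems, something like:

$ zfs get used,logicalused,logicalreferenced tank/data1

LOGICALUSED and LOGICALREFERENCED should match closely between source and target even while USED and REFER differ, because the logical values ignore allocation overhead such as ashift padding and raidz parity.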

Changes in topology can have some effect also, but generally not as large as what you see here. I also discovered this the hard way, going from an ashift=9 pool to a new ashift=12 pool… (ashift is the base-2 logarithm of the sector size, so that was 512 B sectors versus 4 KiB). At least your sector sizes only differ by a factor of two (8 KiB vs 16 KiB), instead of a factor of eight!

You either need to set ashift(target) = ashift(source), or, if you want to permanently change the sector size inside your blocks, do a brute-force copy operation (e.g. rsync) instead of replication.

Note that ashift is immutable, so we’re talking about blowing the entire pool away and recreating from scratch. Sorry 'bout that.
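If you do rebuild the target to match the source's ashift, a minimal sketch would look something like the following - the disk names here are just the placeholders from your zpool status output, so substitute your real /dev/disk/by-id paths, and note that destroying the pool deletes everything on it:

$ zpool destroy tank
$ zpool create -o ashift=14 tank raidz2 \
    nvme-disk1 nvme-disk2 nvme-disk3 nvme-disk4 nvme-disk5 \
    nvme-disk6 nvme-disk7 nvme-disk8 nvme-disk9 nvme-disk10

After that, re-running the same syncoid job should give you USED/REFER numbers closer to the source.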

I am wondering about your pool topology. First of all, congrats on all NVMe - that is currently just a dream for me :slight_smile:

I do not know the roles of the two pools - they may both be primary pools that also serve as backup/replication targets in a secondary role - and the drive sizes are not mentioned. The optimal RAIDZx topology is a power-of-two number of data disks (2, 4, 8, etc.) plus the parity disks.
In your case the target is an optimal 10-wide vdev (8 data + 2 parity), whereas the source is a non-optimal 8-wide vdev (6 data + 2 parity). This may not be the cause of the size difference, but it can hurt storage efficiency and performance.
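If you want a quick pool-level view of raw capacity and allocation on each side, something like this should do (assuming the pool is named tank on both systems):

$ zpool list -o name,size,allocated,free,capacity tank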

Just out of curiosity, how do you connect 8 and 10 NVMe drives respectively?