Cost of using dedup

I’m migrating some Solaris NFS/ZFS fileservers over to FreeBSD and am contemplating the tradeoffs of enabling dedup. Legacy dedup is enabled on an existing backup fileserver (we don’t care if the backup is slow) so I know that the dedupratio is about 1.8x. Where the old servers used magnetic disks with SSDs for cache/log, the new servers are all SSD (so we’re not gaining any extra capacity, but the performance is better). I was under the impression that the main cost of dedup was the huge amount of RAM required for the hash tables. Does fast dedup address that at all, or is the only improvement performance related, as implied by the Klara Systems article “Introducing OpenZFS Fast Dedup”?

If I were to enable dedup, rsync the data (can’t zfs send from Solaris 11) and then disable dedup, does it drop the table of hashes and only keep the reference counts? Or would I still pay the RAM usage penalty?

Are there any offline dedup solutions? It would be good to be able to interrogate the backup system to identify duplicate blocks. As far as I can tell, block cloning uses separate data structures from the dedup feature, though I’d have naively supposed reference counts might be shared. Does block cloning scale well with lots of deduplication?
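To make the "identify duplicate blocks" idea concrete, here is a minimal sketch of a file-level scan that hashes fixed-size blocks and counts repeats. It is an approximation only, not how ZFS itself does it: the function names and the 128 KiB default block size are assumptions for illustration, and ZFS checksums blocks as written at each dataset's recordsize, so the ratio this reports will not match dedupratio exactly.

```python
import hashlib
import os
from collections import Counter


def scan_duplicate_blocks(root, block_size=128 * 1024):
    """Hash every fixed-size block of every regular file under root.

    Returns a Counter mapping block digest -> number of occurrences.
    block_size is an assumed stand-in for the dataset recordsize.
    """
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as f:
                    while True:
                        block = f.read(block_size)
                        if not block:
                            break
                        counts[hashlib.sha256(block).digest()] += 1
            except OSError:
                # Unreadable file: ignore it rather than abort the scan.
                continue
    return counts


def estimated_dedup_ratio(counts):
    """Total blocks seen divided by distinct blocks seen."""
    total = sum(counts.values())
    unique = len(counts)
    return total / unique if unique else 1.0
```

Calling `estimated_dedup_ratio(scan_duplicate_blocks("/path/to/backup"))` (the path is a placeholder) gives a rough figure. Note that the Counter holds one entry per distinct block, so on a large filesystem this scan runs into the same memory scaling problem as the dedup table. For an existing pool, `zdb -S <pool>` simulates dedup and prints a histogram without changing anything, which is probably closer to what ZFS would actually achieve.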

Thanks

If you do this, you’ll dedup the data being written. It will remain deduplicated after dedup is disabled, but additional writes after that will NOT be deduplicated.

The RAM “penalty” you’re referring to is that while dedup is active, you really REALLY want to be able to keep the entire dedup table in RAM, because you’ve got to grovel over it with every write. With dedup disabled, you no longer need to consult a master tree of block hashes before every write, so you no longer need more RAM than you would if dedup had never been enabled.
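For a feel of how big "the entire dedup table" gets, here is a back-of-the-envelope sketch. It assumes one table entry per unique block and roughly 320 bytes of RAM per entry, which is a commonly quoted rule of thumb for legacy dedup and not a measured value; the 10 TiB of logical data and the 128 KiB average block size in the example are hypothetical, and only the 1.8x ratio comes from this thread.

```python
def estimate_ddt_ram_bytes(logical_bytes, dedup_ratio, avg_block_bytes,
                           bytes_per_entry=320):
    """Rough in-RAM size of a dedup table with one entry per unique block.

    bytes_per_entry=320 is an assumed rule-of-thumb figure, not a spec value.
    """
    unique_bytes = logical_bytes / dedup_ratio      # data left after dedup
    unique_blocks = unique_bytes / avg_block_bytes  # one table entry each
    return unique_blocks * bytes_per_entry


TIB = 2 ** 40
GIB = 2 ** 30

# Hypothetical pool: 10 TiB logical data, 1.8x dedup ratio, 128 KiB blocks.
estimate = estimate_ddt_ram_bytes(10 * TIB, 1.8, 128 * 1024)
print(f"estimated dedup table size: {estimate / GIB:.1f} GiB")
```

Smaller average block sizes push the estimate up quickly (halving the block size doubles it), which is why pools full of small records are the worst case. On a pool that already has dedup enabled, `zpool status -D <pool>` reports the actual entry counts and sizes, so real numbers beat this estimate whenever they are available.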

Yeah, I understand that after turning off dedup, new writes will not be deduped. On the old Solaris system there are a couple of pools where that had been done and they still report a dedupratio of around 1.2 after quite a few years. Do you happen to know whether, if you turn off dedup, it drops the entire dedup table, or are the hashes kept around for as long as the corresponding data? (I can probably determine that experimentally.)

Thanks for providing the confirmation that turning off dedup does allow the RAM it used to be reclaimed.

It slightly surprises me that dedup works on the backup system, where all the data comes in via zfs receive. I guess it is replaying transactions rather than cloning data in a more verbatim form.