Is there such a thing as too many datasets?

swimminginpools · September 9, 2023, 3:00pm

I’m in the middle of reorganizing all my data and isolating it onto my server. I’m making a dataset for myself under my username. Is there any downside to making some of the folders you’d normally find in a user folder (documents, music, photos, etc.) into their own datasets?

And then I have another dataset for my professional photos, based off the archive hard drive they came from. I’m planning on combining them into one master photo dataset, but would there be any downside to making a master photo dataset and then a dataset for each year? I’m thinking I could have different sanoid policies for the current year vs prior years, and the per-year datasets would make it easy for me to replicate a year to an external drive for sneakernetting over to an offsite backpack.

So is there such a thing as too many datasets?
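
For reference, the per-year layout described in this question could be sketched roughly as follows. This is only a dry run that prints commands rather than running them, and every name in it (the pool `tank`, the parent dataset `tank/photos`, the external pool `sneaker`, the snapshot name `offsite`) is a hypothetical placeholder. Plain `zfs send` / `zfs receive` is assumed here for the copy to the external drive.

```shell
#!/bin/sh
# Dry-run sketch: print commands for a per-year photo layout and for
# replicating one year to a pool on an external drive.
# All dataset, pool, and snapshot names are hypothetical.
year_plan() {
    parent="$1"   # parent dataset, e.g. tank/photos
    year="$2"     # year to create and replicate, e.g. 2023
    target="$3"   # pool on the external drive, e.g. sneaker

    ds="${parent}/${year}"
    # -p creates any missing parent datasets as well.
    echo "zfs create -p ${ds}"
    # A snapshot is needed before the dataset can be sent.
    echo "zfs snapshot ${ds}@offsite"
    echo "zfs send ${ds}@offsite | zfs receive ${target}/photos-${year}"
}

year_plan tank/photos 2023 sneaker
```

Because each year is its own dataset, only that year's snapshot has to be sent to the external drive.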

deviantintegral · September 9, 2023, 9:12pm

I think this is a good case for multiple datasets. For example, you could then have different snapshot policies on each dataset. I do something similar. However, you can also split datasets later fairly easily if you want. Do zfs clone ... to clone the original dataset (from a snapshot of it), and then remove from each dataset the files you don’t want there. As far as I know, merging datasets means you’ll need 2x the space at least temporarily, or longer if you want to maintain snapshots.
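
A minimal sketch of the clone-based split mentioned above, assuming a hypothetical dataset `tank/photos` that is being split into a new `tank/photos-2022`. Since `zfs clone` works from a snapshot, one is taken first (the snapshot name `split` is made up). The function only prints the commands so they can be reviewed before being run for real.

```shell
#!/bin/sh
# Dry-run sketch: print the commands for splitting a dataset via a clone.
# "tank/photos" and "tank/photos-2022" are hypothetical names.
split_plan() {
    src="$1"    # dataset to split, e.g. tank/photos
    dst="$2"    # new dataset that will keep part of the files
    snap="${src}@split"

    # zfs clone operates on a snapshot, so take one first.
    echo "zfs snapshot ${snap}"
    echo "zfs clone ${snap} ${dst}"
    # Afterwards, delete from each dataset the files that belong in the other.
}

split_plan tank/photos tank/photos-2022
```

Keep in mind that the clone stays dependent on its origin snapshot, so deleting files from the original does not free their space while that snapshot exists.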

The big issue in the long run is that having many datasets slows down basic operations like zfs list and zpool import. However, I’ve noticed this only on spinning disks, so if you’re using flash storage it’s unlikely you’ll have an issue.

Having many snapshots can lead to similar situations. I’ve never run into a situation where it made a server unusable. But if I run something like zfs list -r -t all <pool> on my spinning-disk pool, which has ~6300 snapshots and datasets in total, it takes a minute the first time.
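
To see how many snapshots and datasets a pool has accumulated, the names from zfs list can be piped through a small counter. This is a rough sketch: the function name `count_zfs_objects` and the pool name `tank` are made up, and it relies only on the fact that snapshot names contain an `@`.

```shell
#!/bin/sh
# Count snapshots vs. everything else from a list of ZFS names on stdin.
# Snapshot names have the form dataset@snapname, so "@" marks a snapshot.
count_zfs_objects() {
    awk '
        /@/ { snaps++; next }    # line names a snapshot
        NF  { other++ }          # any other non-empty line
        END { printf "snapshots=%d other=%d\n", snaps, other }
    '
}

# Usage on a real system (pool name "tank" is hypothetical):
#   zfs list -H -o name -t all -r tank | count_zfs_objects
```

The `-H` flag drops the header line and `-o name` limits the output to names, which keeps the count from being skewed by extra columns.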