Zpool operations don't take effect

On Linux I had a mirrored pool consisting of 4 disks and decided to destroy it and create a plain (non-mirrored) pool instead.

After running zpool destroy backup and zpool create maintank -f /dev/disk/by-id/{ata-foo1,ata-foo2,ata-foo3,ata-foo4} everything was OK and I created my datasets, but after a reboot everything was back to “backup”. I then tried zpool export backup and zpool import maintank, which looked like it worked because I could see maintank and its datasets, but after a second reboot I couldn’t do that anymore.

This was after the first reboot:

 $  lsblk -fs
NAME  FSTYPE     FSVER LABEL    UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
sda1  zfs_member 5000  maintank 1385790277081281569                                 
└─sda zfs_member 5000  backup   8465628505380369490                                 
sda9                                                                                
└─sda zfs_member 5000  backup   8465628505380369490                                 
sdb1  zfs_member 5000  maintank 1385790277081281569                                 
└─sdb zfs_member 5000  backup   8465628505380369490                                 
sdb9                                                                                
└─sdb zfs_member 5000  backup   8465628505380369490                                 
sdc1  vfat       FAT32          1E46-4225                                           
└─sdc                                                                               
sdc2  zfs_member 5000  hpserver 14954012052040600445                                
└─sdc                                                                               
sdd1  zfs_member 5000  maintank 1385790277081281569                                 
└─sdd zfs_member 5000  backup   8465628505380369490                                 
sdd9                                                                                
└─sdd zfs_member 5000  backup   8465628505380369490                                 
sde1  zfs_member 5000  maintank 1385790277081281569                                 
└─sde zfs_member 5000  backup   8465628505380369490                                 
sde9                                                                                
└─sde zfs_member 5000  backup   8465628505380369490                      

I probably forgot to do something, like clearing a cache. Or do I have to wipefs each of these disks and then create the new pool?

Hello darukutsu, run “zpool destroy maintank” against your new pool again.
After that, clean up your drives:
“for d in sda sdb sdd sde; do dd if=/dev/zero of=/dev/$d bs=1M count=10 ; done”
Then you can create a new zpool the way you want and test it with a reboot.
But your original decision to use a mirrored pool was still the best one: it’s much faster than a raidz pool, doesn’t give up much usable space, and is even more flexible for future expansion.

It would be helpful to know the distro in use, because the policies for managing the pool cache may differ between distros, if that’s the cause of this issue.
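
If a stale cachefile is the cause, here is a sketch of what I’d look at first (assuming a systemd-based distro with the default OpenZFS cachefile path and import unit names, which may not match your setup):

$ zpool get cachefile maintank
$ ls -l /etc/zfs/zpool.cache
$ systemctl status zfs-import-cache.service zfs-import-scan.service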

For clearing the drives, I’d suggest wipefs: ZFS scatters information across the drive (there are labels at the end of the device as well), so zeroing out just the first 10MB is probably not sufficient.
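
Something like this would do it (just a sketch reusing the ata-fooN placeholder names from your zpool create command; double-check the paths before running it, since wipefs -a is destructive, and the partitions are wiped before the parent disk so their device nodes still exist):

$ zpool destroy maintank
$ for d in /dev/disk/by-id/{ata-foo1,ata-foo2,ata-foo3,ata-foo4}; do wipefs -a "$d"-part* "$d"; done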

Partition 9 is what ZFS creates to provide a space cushion when it is handed the entire device for zpool create (vs. creating from partitions.) It’s odd to see it included in pools.

It would also be helpful to see the exact commands you executed (with drive IDs obfuscated as needed.) You should be able to copy them from your notes. If you don’t have notes, you should!

Disk IDs in the form of /dev/sdX are a bit of a red flag, as they can change on every boot. I wonder if you can import the pool using something like zpool import -d /dev/disk/by-id/<some-wildcard>, or even by listing the specific disks that way. Worth a try.
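
For example, if the pool is currently imported under sdX names (a sketch, assuming the pool name is still maintank; -d can also be pointed at the whole by-id directory, so no wildcard is needed):

$ zpool export maintank
$ zpool import -d /dev/disk/by-id maintank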

Good luck!


Thanks for the tip. Clearing the first 10MB truly didn’t help, but wipefs did; IIRC I don’t remember having to do anything similar back when I was on FreeBSD. (I don’t know why I thought wipefs was going to take hours to finish…) So everything works now, even after a reboot.

I’m on Arch Linux (6.9.7-hardened1-1-hardened); this is what I used for creation:

$ zpool destroy backup

$ zpool create maintank -f -O compression=zstd -O acltype=posix -O xattr=sa -O normalization=formD -O mountpoint=none -O canmount=off -O devices=off /dev/disk/by-id/{ata-TOSHIBA_MQ04ABF100_Z7I8PAPUT,ata-ST1000DM003-1ER162_S4Y4QF87,ata-ST500LT012-1DG142_WBY4J9X1,ata-ST500LT012-1DG142_WBY4JB3M}

I don’t understand why it shows sdX9 too, since it doesn’t for other pools.

 $  ls -la /dev/disk/by-id/
total 0
drwxr-xr-x 2 root root 580 Jul 29 19:01 .
drwxr-xr-x 9 root root 180 Jul 29 19:01 ..
lrwxrwxrwx 1 root root   9 Jul 29 19:01 ata-P3-128_9D10512005174 -> ../../sdc
lrwxrwxrwx 1 root root  10 Jul 29 19:01 ata-P3-128_9D10512005174-part1 -> ../../sdc1
lrwxrwxrwx 1 root root  10 Jul 29 19:01 ata-P3-128_9D10512005174-part2 -> ../../sdc2
lrwxrwxrwx 1 root root   9 Jul 29 19:01 ata-ST1000DM003-1ER162_S4Y4QF87 -> ../../sdd
lrwxrwxrwx 1 root root  10 Jul 29 19:01 ata-ST1000DM003-1ER162_S4Y4QF87-part1 -> ../../sdd1
lrwxrwxrwx 1 root root  10 Jul 29 19:01 ata-ST1000DM003-1ER162_S4Y4QF87-part9 -> ../../sdd9
lrwxrwxrwx 1 root root   9 Jul 29 19:01 ata-ST500LT012-1DG142_WBY4J9X1 -> ../../sdb
lrwxrwxrwx 1 root root  10 Jul 29 19:01 ata-ST500LT012-1DG142_WBY4J9X1-part1 -> ../../sdb1
lrwxrwxrwx 1 root root  10 Jul 29 19:01 ata-ST500LT012-1DG142_WBY4J9X1-part9 -> ../../sdb9
lrwxrwxrwx 1 root root   9 Jul 29 19:01 ata-ST500LT012-1DG142_WBY4JB3M -> ../../sde
lrwxrwxrwx 1 root root  10 Jul 29 19:01 ata-ST500LT012-1DG142_WBY4JB3M-part1 -> ../../sde1
lrwxrwxrwx 1 root root  10 Jul 29 19:01 ata-ST500LT012-1DG142_WBY4JB3M-part9 -> ../../sde9
lrwxrwxrwx 1 root root   9 Jul 29 19:01 ata-TOSHIBA_MQ04ABF100_Z7I8PAPUT -> ../../sda
lrwxrwxrwx 1 root root  10 Jul 29 19:01 ata-TOSHIBA_MQ04ABF100_Z7I8PAPUT-part1 -> ../../sda1
lrwxrwxrwx 1 root root  10 Jul 29 19:01 ata-TOSHIBA_MQ04ABF100_Z7I8PAPUT-part9 -> ../../sda9
lrwxrwxrwx 1 root root   9 Jul 29 19:01 wwn-0x50000398325071e5 -> ../../sda
lrwxrwxrwx 1 root root  10 Jul 29 19:01 wwn-0x50000398325071e5-part1 -> ../../sda1
lrwxrwxrwx 1 root root  10 Jul 29 19:01 wwn-0x50000398325071e5-part9 -> ../../sda9
lrwxrwxrwx 1 root root   9 Jul 29 19:01 wwn-0x5000c5008c7b89eb -> ../../sdd
lrwxrwxrwx 1 root root  10 Jul 29 19:01 wwn-0x5000c5008c7b89eb-part1 -> ../../sdd1
lrwxrwxrwx 1 root root  10 Jul 29 19:01 wwn-0x5000c5008c7b89eb-part9 -> ../../sdd9
lrwxrwxrwx 1 root root   9 Jul 29 19:01 wwn-0x5000c5009d53f102 -> ../../sdb
lrwxrwxrwx 1 root root  10 Jul 29 19:01 wwn-0x5000c5009d53f102-part1 -> ../../sdb1
lrwxrwxrwx 1 root root  10 Jul 29 19:01 wwn-0x5000c5009d53f102-part9 -> ../../sdb9
lrwxrwxrwx 1 root root   9 Jul 29 19:01 wwn-0x5000c5009d540e80 -> ../../sde
lrwxrwxrwx 1 root root  10 Jul 29 19:01 wwn-0x5000c5009d540e80-part1 -> ../../sde1
lrwxrwxrwx 1 root root  10 Jul 29 19:01 wwn-0x5000c5009d540e80-part9 -> ../../sde9

 $  lsblk -f
NAME   FSTYPE     FSVER LABEL    UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
sda                                                                                  
├─sda1 zfs_member 5000  maintank 14549055206291813215                                
└─sda9                                                                               
sdb                                                                                  
├─sdb1 zfs_member 5000  maintank 14549055206291813215                                
└─sdb9                                                                               
sdc                                                                                  
├─sdc1 vfat       FAT32          1E46-4225                                           
└─sdc2 zfs_member 5000  hpserver 14954012052040600445                                
sdd                                                                                  
├─sdd1 zfs_member 5000  maintank 14549055206291813215                                
└─sdd9                                                                               
sde                                                                                  
├─sde1 zfs_member 5000  maintank 14549055206291813215                                
└─sde9    

You have now created a dangerous “raid0-like”, purely striped zpool without any redundancy, which is not preferable at all unless it’s only used temporarily for scratch data. PS: On Linux the sdX9 partition is always there, even if unused; there has been discussion about removing it in a future release. I wonder why it doesn’t show up for the other pools you saw.
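
If you decide you do want redundancy after all, here is a sketch of a striped-mirror (2x2) layout, reusing the placeholder ata-fooN names from the first post (substitute your real by-id paths and your -O options):

$ zpool destroy maintank
$ zpool create maintank mirror /dev/disk/by-id/ata-foo1 /dev/disk/by-id/ata-foo2 mirror /dev/disk/by-id/ata-foo3 /dev/disk/by-id/ata-foo4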

When you feed OpenZFS raw disks, it always partitions them first, using partition 1 for the data and partition 9 to reserve a small amount of space at the end of the drive. IIRC, the idea is that the small partition at the end “standardizes” drives of a given nominal size, which aren’t really identical: a Seagate “2TB” drive and a Western Digital “2TB” drive typically aren’t quite the same size, and without something like the small part9 at the end to absorb the difference, you wouldn’t be able to replace a larger “2TB” drive with a smaller “2TB” drive in an existing pool.

Anyway, here you can see the same behavior in one of my pools that has some raw drives in it:

root@jrs-dr0:/# zpool status clients | grep -A2 mirror-0
	  mirror-0                  ONLINE       0     0     0
	    wwn-0x5000c500a52f6315  ONLINE       0     0     0
	    wwn-0x5000c500a5369061  ONLINE       0     0     0

root@jrs-dr0:/# sfdisk -d /dev/disk/by-id/wwn-*6315 | sed 's/type.*$//' | sed 's/.*6315//'
label: gpt
label-id: EB283696-A40B-934B-8B13-FFD1A311A1E4

unit: sectors
first-lba: 34
last-lba: 23437770718

-part1 : start=        2048, size= 23437750272, 
-part9 : start= 23437752320, size=       16384, 

BTW, exporting and re-importing a pool created with raw drives can sometimes result in the specific partition showing up in zpool status rather than the raw drive name, but that’s not a problem, just a quirk. Ultimately, the partition is all ZFS is using, whether you fed it “a whole disk” or specified which partition to use yourself.
