Freeze and unfreeze

I have 12 SATA disks in my pool. Everything is fine and the pool is healthy, but I would like to document which disk serial number is in which slot. The easiest way I can think of is to run lsblk --scsi to see the disk designations and their serial numbers, and print that out on paper as a list. Then I would pop each disk out very slightly and mark in the list which slot it is in. I’ve looked at the undocumented freeze and unfreeze commands, and I would like to know from any experts here how safe they are and whether they are applicable to my purpose. Thanks in advance.
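The list-making step could look roughly like the sketch below. The function name slot_worksheet is made up for illustration; it only formats "name serial" pairs read from standard input into a worksheet with an empty slot column, and it assumes your lsblk supports the SERIAL output column.

```shell
#!/bin/sh
# Hypothetical helper: turn "NAME SERIAL" lines into a printable worksheet.
# It never touches a disk itself; it only formats what it reads on stdin.
slot_worksheet() {
    printf '%-10s %-24s %s\n' "DEVICE" "SERIAL" "SLOT"
    while read -r name serial; do
        # Skip blank lines.
        [ -n "$name" ] || continue
        # Devices that report no serial are marked "unknown".
        printf '%-10s %-24s %s\n' "$name" "${serial:-unknown}" "____"
    done
}

# Intended use (assumes lsblk supports these options on your system):
#   lsblk --scsi --nodeps --noheadings -o NAME,SERIAL | slot_worksheet
```

Print the output and fill in the SLOT column by hand as each drive is identified.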

Neither safe nor applicable. The zpool freeze command describes itself in its own comments as “a vile debugging abomination, so we treat it as such.”

If used in a production rather than a filesystem debugging context, zpool freeze is extremely likely to result in a deadlocked system that requires a forced reset to recover.

The stone axe way to do this is just to start with a particular disklabel, hit it with a ton of read traffic, and see which bay’s LED goes into disco mode.

root@box:~# pv < /dev/disk/by-id/wwn-*FAF0 > /dev/null

If you don’t have pv available and don’t want to install it, cat will work just fine for this purpose as well. Just don’t forget to redirect it to /dev/null!

This is a safe operation, because you’re only reading from the drive, not writing to it, and I have difficulty imagining the “honest typo” that results in an accidental write. (Even if you get the redirection backwards, /dev/null is empty, so you don’t actually write anything to your drive.)

There are also some packages in the repo that will supposedly blink a bay light for you without requiring actual disk traffic, but in my experience they don’t tend to work universally on any hardware you might come across.

Whereas reading all the data off an individual disk as rapidly as possible will always light up a bay!


But if you’re worried about it… Write a ridiculously simple Bourne shell script to make it impossible to screw up.

#!/bin/sh
# Read the given device to /dev/null; quoting "$1" keeps odd paths intact.
pv < "$1" > /dev/null
# or cat "$1" > /dev/null if you don't have pv

If you save that as /usr/local/bin/discolights and sudo chmod 755 /usr/local/bin/discolights, you can then sudo discolights /dev/disk/by-id/<disklabel> to EXTREMELY safely identify a drive’s bay by activity light.
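If you want a bit more belt-and-suspenders, here is a minimal sketch of a guarded variant. It is my own illustration rather than anything shipped anywhere: it refuses to run unless the argument is a block device, and falls back to cat when pv isn't installed.

```shell
#!/bin/sh
# Hypothetical hardened variant of the discolights script.
# Only ever opens the device for reading; output goes to /dev/null.
discolights() {
    dev="$1"
    if [ -z "$dev" ]; then
        echo "usage: discolights /dev/disk/by-id/<disklabel>" >&2
        return 2
    fi
    if [ ! -b "$dev" ]; then
        echo "discolights: $dev is not a block device" >&2
        return 1
    fi
    # Prefer pv for the progress bar; fall back to cat if pv is missing.
    if command -v pv >/dev/null 2>&1; then
        pv < "$dev" > /dev/null
    else
        cat "$dev" > /dev/null
    fi
}
```

To use it as a standalone script, add a final line that calls discolights "$@".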

The more I thought about this, the more I thought it deserved some quick tooling in the Sanoid project itself. You can find a fledgling version of the new tool findbay in MASTER of the project’s GitHub, from which it should make its way out to distros after our next release.

I’m still working on the tool, but for the moment, you can pass it any string sufficient to uniquely identify a drive–for example, the last 4 of the WWN, or whatever else you can see in the output of zpool status–and it will return a raw devicename.

You can in turn pass the raw devicename to pv or cat in order to force a drive’s activity light on:

root@elden:~# zpool status | grep SHGP31
	  nvme-SHGP31-2000GM_[elided]-part3  ONLINE       0     0     0

root@elden:~# findbay SHGP31
/dev/nvme0n1

root@elden:/usr/local/bin# pv < `findbay SHGP31` > /dev/null
54.5GiB 0:00:09 [3.06GiB/s] [====>                        ]   2% ETA 0:04:20

Future versions of the tool will blip the hard drive activity lights for you directly–and I plan to support multiple access patterns also, since different bays behave differently–but this is (IMO) already a useful tool, since it makes it MUCH quicker and easier to get the raw devicename you really want out of any arbitrary chunk of what you can see from zpool status.
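To make the "any identifying string in, raw devicename out" idea concrete, here is a rough sketch of that kind of lookup. To be clear, this is not findbay's actual implementation; the function name find_dev_by_fragment and its optional directory argument are assumptions for illustration. It simply scans the /dev/disk/by-id symlinks for a substring and resolves the match with readlink -f.

```shell
#!/bin/sh
# Hypothetical lookup: NOT the real findbay code, just the same idea.
# Usage: find_dev_by_fragment FRAGMENT [DIRECTORY]
find_dev_by_fragment() {
    fragment="$1"
    dir="${2:-/dev/disk/by-id}"
    if [ -z "$fragment" ]; then
        echo "usage: find_dev_by_fragment FRAGMENT [DIRECTORY]" >&2
        return 2
    fi
    # Collect the resolved targets of every matching whole-disk symlink.
    matches=$(
        for link in "$dir"/*"$fragment"*; do
            [ -e "$link" ] || continue
            case "$link" in
                *-part[0-9]*) continue ;;   # skip partition entries
            esac
            readlink -f "$link"
        done | sort -u
    )
    count=$(printf '%s\n' "$matches" | grep -c . || true)
    # Refuse to guess: the fragment must identify exactly one device.
    if [ "$count" -ne 1 ]; then
        echo "expected exactly one device for '$fragment', found $count" >&2
        return 1
    fi
    printf '%s\n' "$matches"
}
```

Several by-id names usually point at the same disk, which is why the resolved targets are de-duplicated before counting.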

Sounds cool! I’m glad I inspired this activity. My consultant, who is a great scripter, made a script to identify disks with a burst read pattern. This worked perfectly on my backup server because it’s not online and there isn’t much activity on the drives, so I was able to identify those drives. I can show you the script if you like. But the main server is currently showing a lot of blinking activity on a lot of disks because I have an Rclone script running to back up to Amazon S3, and there’s a lot of data still in progress. Anyway, thank you all, and especially @mercenary_sysadmin, for working on this. I think a successful script for this is still a work in progress?
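For anyone wondering what a burst read pattern might look like, here is a minimal sketch. It is not the consultant's script (which isn't shown in this thread); the name burst_read and its parameters are made up for illustration, and it assumes the coreutils timeout command is installed. The on/off rhythm is meant to be easier to pick out than a steady read when other drives are also busy.

```shell
#!/bin/sh
# Hypothetical burst reader: read for ON seconds, pause for OFF seconds, repeat.
# Usage: burst_read DEVICE [BURSTS] [ON_SECONDS] [OFF_SECONDS]
burst_read() {
    dev="$1"
    bursts="${2:-5}"
    on="${3:-2}"
    off="${4:-2}"
    if [ ! -r "$dev" ]; then
        echo "burst_read: cannot read $dev" >&2
        return 1
    fi
    # Open the device once, read-only, on fd 3 so each burst continues
    # where the previous one stopped instead of re-reading cached data.
    {
        i=0
        while [ "$i" -lt "$bursts" ]; do
            # timeout exits non-zero when it cuts the read short; that is expected.
            timeout "$on" cat <&3 > /dev/null || true
            sleep "$off"
            i=$((i + 1))
        done
    } 3< "$dev"
}
```

Something like burst_read /dev/disk/by-id/<disklabel> 10 1 1 would give ten one-second bursts with one-second gaps.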


Yes, I need another hour or so of gumption to sit down and bang at the code to implement the actual drive activity stuff directly.

Ideally, that will happen today. But it is Saturday. :cowboy_hat_face: