Do you use notifications from zed, smartd, etc., if you use monitoring tools?

muay_throwaway · July 12, 2023, 6:13am

If you use general monitoring software (such as Nagios, Zabbix, PRTG, Prometheus, etc.), do you also configure and activate the native monitoring tools for particular applications/services with email notifications (e.g., zed from ZFS, smartd from smartmontools, backup notifications)? Or do you completely centralize your monitoring? I’m considering adopting Zabbix, Nagios, etc., but I’ve already set up some monitoring services/daemons (mostly the standard ones). I know you can, for example, implement smartd and zfs-monitoring plugins in Nagios, but I am not sure if it makes sense to completely abandon the existing ones. I was thinking it may offer some redundancy if I leave them in place.

jblondreddit · July 12, 2023, 9:53am

With ZFS on Linux I use /etc/zfs/zed.d/zed.rc and set:

ZED_EMAIL_ADDR=someValid@example.com
ZED_EMAIL_PROG="mail"
ZED_EMAIL_OPTS="-s '@SUBJECT@' @ADDRESS@"

That works 99% of the time.
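For anyone wondering how those three settings fit together: as far as I know, zed's notification helper (zed-functions.sh) substitutes @ADDRESS@ and @SUBJECT@ in ZED_EMAIL_OPTS and then runs ZED_EMAIL_PROG with the result, feeding the message body on stdin. The Python sketch below only mimics that placeholder expansion so you can see the resulting command line. build_mail_argv and the example subject are hypothetical, not part of ZFS.

```python
import shlex


def build_mail_argv(prog: str, opts: str, address: str, subject: str) -> list[str]:
    """Hypothetical helper: expand zed-style placeholders and split into argv."""
    # Substitute the two placeholders used in ZED_EMAIL_OPTS.
    expanded = opts.replace("@ADDRESS@", address).replace("@SUBJECT@", subject)
    # shlex.split honours the single quotes around @SUBJECT@, so a subject
    # with spaces stays a single argument (as long as it has no single quote).
    return [prog] + shlex.split(expanded)


if __name__ == "__main__":
    argv = build_mail_argv(
        prog="mail",
        opts="-s '@SUBJECT@' @ADDRESS@",
        address="someValid@example.com",
        subject="ZFS device fault for pool tank",  # made-up example subject
    )
    print(argv)
```

If the command line this produces doesn't suit your mail client, ZED_EMAIL_OPTS is the setting to adjust.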

There is also https://github.com/in-famous-raccoon/proxmox-snmp/blob/main/snmp-zfs-used.sh for SNMP. It isn’t real-time, but it’s usually fast enough.
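I haven't reproduced that script here. As a rough, hypothetical illustration of the kind of number such a check reports, this Python sketch computes percent allocated per pool from zpool list in scripted mode (-H: tab-separated, no header) with exact byte values (-p). Publishing the value over SNMP (for example with net-snmp's extend directive) is left out, and the function names are made up.

```python
import subprocess


def parse_zpool_list(text: str) -> dict[str, float]:
    """Return {pool: percent_allocated} from `zpool list -H -p -o name,size,allocated`."""
    usage: dict[str, float] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, size, allocated = line.split("\t")
        total = int(size)
        usage[name] = 100.0 * int(allocated) / total if total else 0.0
    return usage


def pool_usage() -> dict[str, float]:
    """Run zpool list on the local host and parse its output."""
    out = subprocess.run(
        ["zpool", "list", "-H", "-p", "-o", "name,size,allocated"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_zpool_list(out)


if __name__ == "__main__":
    # Parse a canned sample so this runs without ZFS installed.
    sample = "tank\t1000\t250\nbackup\t2000\t1500\n"
    print(parse_zpool_list(sample))
```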

Stack · July 12, 2023, 12:40pm

I’ve been using Zabbix since the 1.4 days, and over the years I’ve contributed some very minor things to it. I mention this only to say that I’ve long since learned how to make it work well for me, and I obviously like it. I know not everyone feels that way, but I have a slight bias.

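Anyway, I use Zabbix with this ZFS template: Cosium/zabbix_zfs-on-linux on GitHub (Zabbix template and user parameters to monitor ZFS on Linux).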

I will say, it hasn’t been updated in a while, and one of the pull requests has a fix I needed, but I don’t recall which one. Still, I’ve been running that template since I saw the announcement back in 2019 and realized it was better than what I’d been doing, and it has served me well. I used to run bleeding-edge Zabbix, but life happens, so I’ve been on 6.0 LTS since it was released sometime last year.

The ZFS template plus the built-in SMART data from zabbix_agent2 pulls in a ton of stats and alerts, which I’ve been quite happy with. And of course, Zabbix can be configured to send alerts a million different ways; my own setup is pretty simple, though. If you have questions about a stat, feel free to ping me and I will do what I can.
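For a sense of what the SMART side looks like: smartmontools 7.x can emit JSON (smartctl --json), and my understanding is that Zabbix agent 2's SMART plugin builds on that output. The sketch below is not the plugin's code; it only pulls the overall health verdict (smart_status.passed) out of that JSON. The function names are hypothetical, and it assumes the drive reports an overall health status at all.

```python
import json
import subprocess
from typing import Optional


def smart_passed(smartctl_json: str) -> Optional[bool]:
    """Return smart_status.passed from smartctl JSON, or None if it is missing."""
    data = json.loads(smartctl_json)
    status = data.get("smart_status")
    if not isinstance(status, dict) or "passed" not in status:
        return None  # device did not report an overall verdict
    return bool(status["passed"])


def check_device(device: str) -> Optional[bool]:
    """Run `smartctl --json -H <device>` and return the health verdict."""
    # smartctl's exit status is a bit mask, so check=False: a nonzero code
    # does not necessarily mean the JSON on stdout is unusable.
    proc = subprocess.run(
        ["smartctl", "--json", "-H", device],
        capture_output=True, text=True, check=False,
    )
    return smart_passed(proc.stdout)
```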

Hope that helps.

muay_throwaway · July 12, 2023, 5:30pm

Thanks. Do you use any monitoring tools (like Zabbix)? I already have zed set up; my question is: do you still allow zed, etc., to send notifications independent of your primary monitoring software?

mercenary_sysadmin · July 12, 2023, 8:11pm

I use Nagios, but do not use zed. I use Sanoid’s checks as Nagios plugins: one plugin runs sanoid --monitor-health and another runs sanoid --monitor-snapshots via NRPE.
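Sanoid's --monitor-health and --monitor-snapshots are written to behave like Nagios plugins, so NRPE only needs a command line that runs them. To show the contract NRPE relies on (one status line on stdout, exit code 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN), here is a stand-in check in Python. It is not Sanoid's logic: it uses zpool status -x, which prints "all pools are healthy" when nothing is wrong, and the function names are hypothetical.

```python
import subprocess

# Nagios plugin exit codes: OK, WARNING, CRITICAL, UNKNOWN.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(zpool_output: str) -> tuple[int, str]:
    """Map `zpool status -x` output to (exit code, one-line status)."""
    text = zpool_output.strip()
    if text == "all pools are healthy":
        return OK, "ZFS OK - all pools are healthy"
    if not text:
        return UNKNOWN, "ZFS UNKNOWN - no output from zpool status -x"
    # Any other output (e.g. a degraded pool) is treated as critical;
    # the first line serves as the summary.
    return CRITICAL, "ZFS CRITICAL - " + text.splitlines()[0]


def main() -> int:
    """Run the check, print the status line, and return the exit code."""
    try:
        proc = subprocess.run(["zpool", "status", "-x"],
                              capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        print(f"ZFS UNKNOWN - {exc}")
        return UNKNOWN
    code, line = evaluate(proc.stdout)
    print(line)
    return code
```

In a real plugin the script would end with sys.exit(main()), and NRPE would run it through a command[...] line in nrpe.cfg. A host with no pools at all prints something like "no pools available", which this sketch reports as CRITICAL.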

muay_throwaway · July 12, 2023, 9:46pm

Thank you! That makes sense. Do you containerize/virtualize your Nagios instance? Or do you just let it sit at the hypervisor level?

bladewdr · July 13, 2023, 10:15pm

If I remember correctly, he runs Nagios on a cloud-based VPS and connects back to his servers via WireGuard tunnels, or possibly Nebula.

muay_throwaway · July 13, 2023, 11:07pm

Thank you! That makes sense.

mercenary_sysadmin · July 16, 2023, 7:16pm

Correct, for my own infra. When I set Nagios up for clients to have their own monitoring, I use a dedicated Nagios VM on the client’s infra, also with WireGuard tunnels to get to any of the client’s devices which aren’t on the primary network.

Cupbearer · July 25, 2023, 6:19am

Stack:

Anyway, I use Zabbix with this ZFS template:
GitHub - Cosium/zabbix_zfs-on-linux: zabbix template and user parameters to monitor zfs on linux

The unmerged fix is for the %1 and %2 placeholders showing beneath graphs. It’s easy to fix: take the template from the PR and import it on top of your ZFS on Linux template. The files can be picked up here:

While you’re at it, also pick up the changes to the config file for your hosts; it has some improvements too.

There is another problem, though: it will throw errors on your clients about cat /proc/spl/kstat/zfs/puddle/io. The fix is here:

I hope that was helpful, because I have a question for you. Do you know how to trigger an alert when an encryption key hasn’t been loaded for a particular dataset?

Stack · July 26, 2023, 12:58pm

Do you know how to trigger an alert when an encryption key hasn’t been loaded for a particular dataset?

I don’t. Sorry.
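One possible approach, offered as an untested sketch rather than a known-good setup: the keystatus property reads available or unavailable on encrypted datasets (and - on unencrypted ones), so a small check can alert when it is unavailable for the datasets you care about. Zabbix could run something like this as a user parameter, or Nagios via NRPE. The function names and the tank/secret example are hypothetical.

```python
import subprocess


def keystatus_argv(datasets: list[str]) -> list[str]:
    """Build the zfs command; with no datasets given, every filesystem/volume is listed."""
    return ["zfs", "get", "-H", "-o", "name,value", "-t", "filesystem,volume",
            "keystatus", *datasets]


def locked_datasets(zfs_get_output: str) -> list[str]:
    """Return datasets whose keystatus is 'unavailable' (key not loaded).

    Unencrypted datasets report '-' and are ignored.
    """
    locked = []
    for line in zfs_get_output.splitlines():
        if not line.strip():
            continue
        name, value = line.split("\t")
        if value == "unavailable":
            locked.append(name)
    return locked


def main(datasets: list[str]) -> int:
    """Nagios-style result: 0 OK, 2 CRITICAL, 3 UNKNOWN."""
    try:
        proc = subprocess.run(keystatus_argv(datasets), capture_output=True, text=True)
    except OSError as exc:
        print(f"UNKNOWN - could not run zfs: {exc}")
        return 3
    if proc.returncode != 0:
        print(f"UNKNOWN - zfs get failed: {proc.stderr.strip()}")
        return 3
    locked = locked_datasets(proc.stdout)
    if locked:
        print("CRITICAL - key not loaded for: " + ", ".join(locked))
        return 2
    print("OK - all checked keys are loaded")
    return 0
```

For an NRPE-style check, the script would end with sys.exit(main(sys.argv[1:])) and take the dataset name (e.g. tank/secret) as its argument.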

galezer · August 20, 2023, 9:39pm

What would be the best way to monitor ZFS events on a FreeBSD machine?

jblondreddit · March 14, 2024, 10:46am

I use https://www.observium.org/