Well, mostly. But instead of “mechanical workload,” let’s go with “individual workload.”
Essentially, your pool performance is derived from the individual performance of each of your drives, combined with the characteristics of the pool topology you choose. Adding more total drives (“spindles”) CAN increase overall pool performance, but it doesn’t necessarily do so, and when it does, it’s usually to nowhere near the degree people expect.
> I arrived tentatively at the number 18 to figure in two backup drives. I’ve been pointed in the direction of Z3 (what I use currently)…
This is a bit ambiguous, but it sounds like you mean you’re only looking at a single vdev plus some SPAREs, which would be a mistake with this many drives. You can build a pool out of essentially any number of drives and/or vdevs, but since you aren’t locked into a specific set of hardware yet, it makes sense to buy your hardware to fit the best pool design, rather than designing the best pool you can around hardware you’re already locked into.
This means pick topology first, then worry about how many drives you need (and what capacities) later. The usual choices here are as follows:
- 2n mirrors – highest possible performance, single redundancy, 50% SE
- 3n Z1 – very high performance, single redundancy, 67% SE
- 4n Z2 – high performance, dual redundancy, 50% SE
- 6n Z2 – moderate performance, dual redundancy, 67% SE
- 10n Z2 – acceptable performance, dual redundancy, 80% SE
I don’t really recommend Z3 at this scale.
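If you want to double-check the SE column, maximum storage efficiency falls straight out of (width − parity) / width, counting a mirror’s redundant copy as “parity” for this purpose. A quick sanity check in Python:

```python
# (width, parity) pairs for the topologies listed above.
topologies = {
    "2n mirror": (2, 1),
    "3n Z1":     (3, 1),
    "4n Z2":     (4, 2),
    "6n Z2":     (6, 2),
    "10n Z2":    (10, 2),
}

for name, (width, parity) in topologies.items():
    se = (width - parity) / width
    print(f"{name:>9}: {se:.0%} maximum SE")
```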
The “SE” above refers to Storage Efficiency: the ratio of usable capacity to raw capacity. Worth noting: that’s a maximum storage efficiency, not a guarantee. If you store a lot of small blocks, you won’t get the efficiency shown above. For example, even with recordsize=1M set, a lone 4KiB file on a 10-wide Z2 gets stored as a single 4KiB block (recordsize is a ceiling, not a floor), and that block still needs its own parity: one 4KiB sector of data plus two 4KiB sectors of parity, for 12KiB on disk (assuming ashift=12). That’s 33% storage efficiency instead of the 80% maximum.
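If you want to see where that comes from, here’s a minimal sketch of the RAIDZ allocation math as I understand it, assuming 4KiB sectors (ashift=12): parity sectors are written per stripe row, and each allocation gets padded up to a multiple of parity+1 sectors. The function is mine, for illustration only:

```python
import math

def raidz_allocated(block_bytes, width, parity, sector=4096):
    """Rough on-disk allocation for one block on a RAIDZ vdev.

    Illustrative sketch, assuming 4KiB sectors (ashift=12): parity is
    written per stripe row, and allocations are padded to a multiple
    of (parity + 1) sectors.
    """
    data = math.ceil(block_bytes / sector)        # data sectors needed
    rows = math.ceil(data / (width - parity))     # stripe rows needed
    total = data + rows * parity                  # data + parity sectors
    total += -total % (parity + 1)                # padding sectors
    return total * sector

# A lone 4KiB file on a 10-wide Z2: 1 data + 2 parity sectors = 12KiB.
print(raidz_allocated(4096, width=10, parity=2))          # 12288
# A full 1MiB block on the same vdev stays close to the 80% maximum:
print(raidz_allocated(1024 * 1024, width=10, parity=2))   # 1314816, ~80% SE
```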
While you absolutely can do different widths than the ones above, you should be aware that when it comes to incompressible data, they won’t perform optimally. For example, a 10-wide Z2 needs to store each block on eight data disks and two parity disks. Recordsize is always a power of 2, which means recordsize/8 divides out into whole, equal shares per data disk–no padding needed. An 11-wide Z2 would try to store the same 1MiB of data divided among nine data disks, which doesn’t add up evenly, which means “padding” to make up the difference–wasted space on disk, and wasted performance coming back off the disk.
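You can see the mismatch in simple arithmetic (again assuming 4KiB sectors):

```python
# A 1MiB block is 256 sectors of 4KiB. A 10-wide Z2 has 8 data disks;
# an 11-wide Z2 has 9.
sectors = (1024 * 1024) // 4096
print(sectors / 8)   # 32.0    -- whole stripe rows, no padding needed
print(sectors / 9)   # 28.44.. -- partial final row, padding required
```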
So, now that you know all that, the question becomes: which topology appeals to you the most? Whichever one it is, you build your pool out of those building blocks: an array of 2n mirror, 3n Z1, 4n Z2, 6n Z2, or 10n Z2 vdevs. Use as many vdevs as you need to meet your capacity goals; each additional vdev adds performance capability as well.
Hold up, didn’t I just get done telling you that adding disks doesn’t often help? Yes, but that’s in terms of disk count, not vdev count. Four 2n mirrors outperform three 2n mirrors. Four 10n Z2 vdevs outperform three 10n Z2 vdevs. You get the idea. What doesn’t fly is thinking that 20 disks in two 10n Z2 vdevs will outperform 10 disks in 2n mirror vdevs. Even though the mirrored pool has only half the Z2 pool’s drive count, it has five vdevs to the Z2 pool’s two, and each of those vdevs is a far higher-performing topology–so the much smaller pool ends up the faster one.
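To put toy numbers on that: the usual rule of thumb is that a single vdev, no matter how wide, delivers roughly the small-block random IOPS of one member disk (mirrors do better on reads). The 250 IOPS figure below is made up for illustration, not a benchmark:

```python
# Toy model: small-block random write IOPS scales with vdev count,
# not disk count. Assumes one vdev ~= one disk's worth of IOPS.
DISK_IOPS = 250  # hypothetical figure for a single spinning disk

def pool_write_iops(vdevs, disk_iops=DISK_IOPS):
    return vdevs * disk_iops

print(pool_write_iops(5))  # 10 disks as five 2n mirrors  -> 1250
print(pool_write_iops(2))  # 20 disks as two 10n Z2 vdevs -> 500
```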