Now, after watching that for a few days and confirming there are no network or storage errors, we should add to the load. Install Netperf or a similar network stress tool on each of those test VMs, and write a quick script that sends TCP traffic between all the VMs with randomized sizes, payloads, and test durations, and loop it the same way. Run that concurrently with the storage workload. If you want to add to the misery, throw in a few other VMs with large numbers of virtual CPUs and plenty of RAM, then run CPU and RAM stress routines on them. At this point we should be hammering the hell out of just about every aspect of the cluster, from CPU to storage, from RAM to the network. If something's going to break, this is where it should happen, at least in theory.
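As a rough illustration of the "quick script" idea, here is a minimal Python sketch that picks a random peer VM, a random test duration, and a random send size, then runs netperf's TCP_STREAM test in a loop. The VM addresses, the size list, and the duration bounds are placeholders invented for this example, and it assumes netserver is already running on every test VM. It randomizes sizes and durations only, not payload contents.

```python
import random
import subprocess

# Hypothetical addresses of the test VMs; replace with your own.
TEST_VMS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]

# Send sizes in bytes, from tiny writes to large ones (arbitrary choices).
SEND_SIZES = [64, 512, 1460, 8192, 65536]


def pick_target(vms, local, rng):
    """Pick a random peer, never the VM this script is running on."""
    return rng.choice([vm for vm in vms if vm != local])


def build_netperf_cmd(target, rng, min_secs=10, max_secs=300):
    """Build one netperf TCP_STREAM run with a random duration and send size."""
    duration = rng.randint(min_secs, max_secs)
    send_size = rng.choice(SEND_SIZES)
    return [
        "netperf",
        "-H", target,          # remote host running netserver
        "-l", str(duration),   # test length in seconds
        "-t", "TCP_STREAM",    # bulk TCP transfer test
        "--",                  # test-specific options follow
        "-m", str(send_size),  # local send size in bytes
    ]


def run_loop(vms, local, iterations=None, rng=None, runner=subprocess.run):
    """Run randomized tests back to back; iterations=None loops until killed."""
    rng = rng or random.Random()
    count = 0
    while iterations is None or count < iterations:
        cmd = build_netperf_cmd(pick_target(vms, local, rng), rng)
        runner(cmd, check=False)  # keep looping even if one run fails
        count += 1
    return count
```

On each VM, something like `run_loop(TEST_VMS, "10.0.0.11")` with that VM's own address would keep traffic flowing until the process is killed. Passing `iterations` gives a bounded run, which is handy for a dry run before leaving it going for days.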
That's the point where I'd start trying to break things. Pull a host's power and make sure any fail-over actions happen appropriately. Run an automated host upgrade process and watch it carefully. Yank a network cable or shut down the relevant switch port, and make sure that bonding and fail-over network links work like they're supposed to. Do all of this while the cluster is under load, because that's when it matters most.
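One way to put a number on "work like they're supposed to" is to probe a test VM continuously while you pull the cable, then see how long it was unreachable. The sketch below is only an illustration under assumed conditions: it treats a successful TCP connection to some listening port on the VM (SSH on port 22, say) as "up," and the host, port, and intervals are hypothetical.

```python
import socket
import time


def tcp_probe(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def monitor(host, port, duration, interval=0.5,
            probe=tcp_probe, clock=time.monotonic, sleep=time.sleep):
    """Probe repeatedly for `duration` seconds; return (timestamp, ok) samples."""
    samples = []
    end = clock() + duration
    while clock() < end:
        samples.append((clock(), probe(host, port)))
        sleep(interval)
    return samples


def find_outages(samples):
    """Collapse samples into (start, end) windows where probes were failing."""
    outages = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts                    # first failed probe opens a window
        elif ok and start is not None:
            outages.append((start, ts))   # first good probe closes it
            start = None
    if start is not None:                 # still down when monitoring ended
        outages.append((start, samples[-1][0]))
    return outages
```

Start `monitor("10.0.0.11", 22, duration=120)` from a machine outside the cluster, yank the cable partway through, and feed the result to `find_outages`. The length of each window is only as precise as the probe interval and timeout, so treat it as a rough measure of how long fail-over took.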
For some, this is one of the best parts: coming up with ways to beat the stuffing out of fresh gear, poking for weaknesses and holes. For everyone, the benefits are indispensable. For one thing, it brings a certain peace of mind once the production workload shifts over; for another, it's vastly easier than trying to fix a big problem that was missed early on and winds up causing production outages.
So test, test, and test some more. Have some fun cooking up creative ways to stress every subsystem, every component, and ease everything into production after a reasonable breaking-in period. That light will still be on when you get there, perhaps a bit brighter and more soothing than before.
This story, "Didn't test? Then don't deploy," was originally published at InfoWorld.com. Read more of Paul Venezia's The Deep End blog at InfoWorld.com. For the latest business technology news, follow InfoWorld.com on Twitter.